1: Core Concepts
Lesson 1: Describing Data Services
Lesson introduction
Data Services provides a graphical interface for creating and staging jobs for data integration and data quality purposes.
After completing this lesson, you will be able to:
• Describe the purpose of Data Services
• Describe Data Services architecture
Describing the purpose of Data Services
BusinessObjects Data Services provides a graphical interface that allows you to easily create jobs that extract data from heterogeneous sources, transform that data to meet the business requirements of your organization, and load the data into a single location.
Describing Data Services benefits
The BusinessObjects Data Services platform enables you to perform enterprise-level data integration and data quality functions. With Data Services, your enterprise can:
• Create a single infrastructure for data movement to enable faster and lower-cost implementation.
• Manage data as a corporate asset independent of any single system.
• Integrate data across many systems and re-use that data for many purposes.
• Improve performance.
• Reduce the burden on enterprise systems.
• Prepackage data solutions for fast deployment and quick return on investment (ROI).
• Cleanse customer and operational data anywhere across the enterprise.
• Enhance customer and operational data by appending additional information to increase the value of the data.
• Match and consolidate data at multiple levels within a single pass for individuals, households, or corporations.
Understanding data integration processes
Data Services combines both batch and real-time data movement and management with intelligent caching to provide a single data integration platform for information management from any information source, for any information use. This unique combination allows you to:
• Stage data in an operational datastore, data warehouse, or data mart.
• Update staged data in batch or real-time modes.
• Create a single environment for developing, testing, and deploying the entire data integration platform.
• Manage a single metadata repository to capture the relationships between different extraction and access methods and provide integrated lineage and impact analysis.

Data Services performs three key functions that can be combined to create a scalable, high-performance data platform. It:
• Loads Enterprise Resource Planning (ERP) or enterprise application data into an operational datastore (ODS) or analytical data warehouse, and updates it in batch or real-time modes.
• Creates routing requests to a data warehouse or ERP system using complex rules.
• Applies transactions against ERP systems.

Data mapping and transformation can be defined using the Data Services Designer graphical user interface. Data Services automatically generates the appropriate interface calls to access the data in the source system.
Describing Data Services architecture
Introduction
Data Services relies on several unique components to accomplish the data integration and data quality activities required to manage your corporate data. After completing this unit, you will be able to:
• Describe standard Data Services components
• Describe Data Services management tools

Defining Data Services components
Data Services includes the following standard components:
• Designer
• Repository
• Job Server
• Engines
• Access Server
• Adapters
• Real-time Services
• Address Server
• Cleansing Packages, Dictionaries, and Directories
• Management Console

This diagram illustrates the relationships between these components:
Describing the Designer
Data Services Designer is a Windows client application used to create, test, and manually execute jobs that transform data and populate a data warehouse. Using Designer, you create data management applications that consist of data mappings, transformations, and control logic. You can create objects that represent data sources, and then drag, drop, and configure them in flow diagrams. Designer allows you to manage metadata stored in a local repository. From the Designer, you can also trigger the Job Server to run your jobs for initial application testing.

To log in to Designer
1. From the Start menu, click Programs > BusinessObjects XI 3.0/3.1 > BusinessObjects Data Services > Data Services Designer to launch Designer. The path may be different, depending on how the product was installed.
2. In the BusinessObjects Data Services Repository Login dialog box, enter the connection information for the local repository.
3. Click OK.
4. To verify the Job Server is running in Designer, hover the cursor over the Job Server icon in the bottom right corner of the screen. The details for the Job Server display in the status bar in the lower left portion of the screen.
Describing the repository
The Data Services repository is a set of tables that holds user-created and predefined system objects, source and target metadata, and transformation rules. Each repository is stored on an existing Relational Database Management System (RDBMS) and is set up on an open client/server platform to facilitate sharing metadata with other enterprise tools. Each repository is associated with one or more Data Services Job Servers.

There are three types of repositories:
• A local repository (known in Designer as the Local Object Library) is used by an application designer to store definitions of source and target metadata and Data Services objects.
• A central repository (known in Designer as the Central Object Library) is an optional component that can be used to support multi-user development. The Central Object Library provides a shared library that allows developers to check objects in and out for development.
• A profiler repository is used to store information that is used to determine the quality of data.

To create a local repository
1. From the Start menu, click Programs > BusinessObjects XI 3.0/3.1 > BusinessObjects Data Services > Data Services Repository Manager to launch the Repository Manager. The path may be different, depending on how the product was installed.
2. In the BusinessObjects Data Services Repository Manager dialog box, enter the connection information for the local repository.
3. Click Create. You may need to confirm that you want to overwrite the existing repository, if it already exists. If you select the Show Details check box, you can see the SQL that is applied to create the repository. System messages confirm that the local repository is created.
4. To see the version of the repository, click Get Version. The version displays in the pane at the bottom of the dialog box. Note that the version number refers only to the last major point release number.
5. Click Close.

Describing the Job Server
Each repository is associated with at least one Data Services Job Server, which retrieves the job from its associated repository and starts the data movement engine. The data movement engine integrates data from multiple heterogeneous sources, performs complex data transformations, and manages extractions and transactions from ERP systems and other sources. The Job Server can move data in batch or real-time mode and uses distributed query optimization, multithreading, in-memory caching, in-memory data transformations, and parallel processing to deliver high data throughput and scalability. While designing a job, you can run it from the Designer. In your production environment, the Job Server runs jobs triggered by a scheduler or by a real-time service managed by the Data Services Access Server.

Data Services provides distributed processing capabilities through Server Groups. A Server Group is a collection of Job Servers that each reside on different Data Services server computers. Each Data Services server can contribute one, and only one, Job Server to a specific Server Group. In production environments, you can balance job loads by creating a Job Server Group (multiple Job Servers), which executes jobs according to overall system load. Each Job Server collects resource utilization information for its computer. This information is utilized by Data Services to determine where a job, data flow, or sub-data flow (depending on the distribution level specified) should be executed.

To verify the connection between the repository and the Job Server
1. From the Start menu, click Programs > BusinessObjects XI 3.0/3.1 > BusinessObjects Data Services > Data Services Server Manager to launch the Server Manager. The path may be different, depending on how the product was installed.
2. In the BusinessObjects Data Services Server Manager dialog box, click Edit Job Server Config.
3. In the Job Server Configuration Editor dialog box, select the Job Server.
4. In the Job Server Properties dialog box, select the repository.
5. Click Resync with Repository. A system message displays indicating that the Job Server will be resynchronized with the selected repository.
6. Click OK to acknowledge the warning message.
7. In the Password field, enter the password for the repository.
8. Click OK.
9. Click Resync.
10. Click OK to close the Job Server Properties dialog box.
11. Click OK to close the Job Server Configuration Editor dialog box.
12. In the BusinessObjects Data Services Server Manager dialog box, click Apply.
13. Click Restart to restart the Job Server. A system message displays indicating that the Job Server will be restarted.

Describing the engines
When Data Services jobs are executed, the Job Server starts Data Services engine processes to perform data extraction, transformation, and movement. Data Services engine processes use parallel processing and in-memory data transformations to deliver high data throughput and scalability.

Describing the Access Server
The Access Server is a real-time, request-reply message broker that collects incoming XML message requests, routes them to a real-time service, and delivers a message reply within a user-specified time frame. The Access Server queues messages and sends them to the next available real-time service across any number of computing resources. This approach provides automatic scalability because the Access Server can initiate additional real-time services on additional computing resources if traffic for a given real-time service is high. You can configure multiple Access Servers.

Describing the adapters
Adapters are additional Java-based programs that can be installed on the Job Server to provide connectivity to other systems such as Salesforce.com or the Java Messaging Queue. There is also a Software Development Kit (SDK) to allow customers to create adapters for custom applications.

Describing the real-time services
The Data Services real-time client communicates with the Access Server when processing real-time jobs. Real-time services are configured in the Data Services Management Console.
Describing the Address Server
The Address Server is used specifically for processing European addresses using the Data Quality Global Address Cleanse transform. It provides access to detailed address line information for most European countries.

Describing the Cleansing Packages, dictionaries, and directories
The Data Quality Cleansing Packages, dictionaries, and directories provide referential data for the Data Cleanse and Address Cleanse transforms to use when parsing, standardizing, and cleansing name and address data. Cleansing Packages are packages that enhance the ability of Data Cleanse to accurately process various forms of global data by including language-specific reference data and parsing rules. Dictionary files are used to identify, parse, and standardize data such as names, titles, firm data, and more. Dictionaries also contain acronym, gender, capitalization, match standard, and address information. Directories provide information on addresses from postal authorities.

Describing the Management Console
The Data Services Management Console provides access to the following features:
• Administrator
• Auto Documentation
• Data Validation
• Impact and Lineage Analysis
• Operational Dashboard
• Data Quality Reports

Administrator
Administer Data Services resources, including:
• Scheduling, monitoring, and executing batch jobs
• Configuring, starting, and stopping real-time services
• Configuring Job Server, Access Server, and repository usage
• Configuring and managing adapters
• Managing users
• Publishing batch jobs and real-time services via web services
• Reporting on metadata

Auto Documentation
View, analyze, and print graphical representations of all objects as depicted in Data Services Designer, including their relationships and properties.
Data Validation
Evaluate the reliability of your target data based on the validation rules you create in your Data Services batch jobs in order to quickly review, assess, and identify potential inconsistencies or errors in source data.

Impact and Lineage Analysis
Analyze end-to-end impact and lineage for Data Services tables and columns, and Business Objects Enterprise objects such as universes, business views, and reports.

Operational Dashboard
View dashboards of status and performance execution statistics of Data Services jobs for one or more repositories over a given time period.

Data Quality Reports
Use data quality reports to view and export Crystal reports for batch and real-time jobs that include statistics-generating transforms. Report types include job summaries, transform-specific reports, and transform group reports. To generate reports for Match, US Regulatory Address Cleanse, and Global Address Cleanse transforms, you must enable the Generate report data option in the Transform Editor.

Defining other Data Services tools
There are also several tools to assist you in managing your Data Services installation.

Describing the License Manager
The License Manager displays the Data Services components for which you currently have a license.

Describing the Repository Manager
The Data Services Repository Manager allows you to create, upgrade, and check the versions of local, central, and profiler repositories.

Describing the Server Manager
The Data Services Server Manager allows you to add, delete, or edit the properties of Job Servers. It is automatically installed on each computer on which you install a Job Server. Use the Server Manager to define links between Job Servers and repositories. You can link multiple Job Servers on different machines to a single repository (for load balancing) or each Job Server to multiple repositories (with one default) to support individual repositories (for example, separating test and production environments).
Describing the Metadata Integrator
The Metadata Integrator allows Data Services to seamlessly share metadata with Business Objects Intelligence products. Run the Metadata Integrator to collect metadata into the Data Services repository for Business Views and Universes used by Crystal Reports, Desktop Intelligence documents, and Web Intelligence documents.
Lesson 2: Defining Source and Target Metadata
Lesson introduction
To define data movement requirements in Data Services, you must import source and target metadata.
After completing this lesson, you will be able to:
• Use datastores
• Use datastore and system configurations
• Define file formats for flat files
• Define file formats for Excel files
Using datastores
Introduction
Datastores represent connections between Data Services and databases or applications. After completing this unit, you will be able to:
• Explain datastores
• Create a database datastore
• Change a datastore definition
• Import metadata

Explaining datastores
A datastore provides a connection or multiple connections to data sources such as a database. Through the datastore connection, Data Services can import the metadata that describes the data from the data source. Data Services uses these datastores to read data from source tables or load data to target tables. Each source or target must be defined individually, and the datastore options available depend on which Relational Database Management System (RDBMS) or application is used for the datastore. The specific information that a datastore contains depends on the connection. When your database or application changes, you must make corresponding changes in the datastore information in Data Services; Data Services does not automatically detect structural changes to the datastore.

There are three kinds of datastores:
• Database datastores: provide a simple way to import metadata directly from an RDBMS. Database datastores can be created for the following sources:
  - IBM DB2, Microsoft SQL Server, Oracle, Sybase, and Teradata databases (using native connections)
  - Other databases (through ODBC)
  - A simple memory storage mechanism using a memory datastore
  - IMS, VSAM, and various additional legacy systems using BusinessObjects Data Services Mainframe Interfaces such as Attunity and IBM Connectors
• Application datastores: let users easily import metadata from most Enterprise Resource Planning (ERP) systems.
• Adapter datastores: can provide access to an application's data and metadata or just metadata. For example, if the data source is SQL-compatible, the adapter might be designed to access metadata, while Data Services extracts data from or loads data directly to the application.
Using adapters
Adapters provide access to a third-party application's data and metadata. Depending on the adapter implementation, adapters can provide:
• Application metadata browsing
• Application metadata importing into the Data Services repository

For batch and real-time data movement between Data Services and applications, Business Objects offers an Adapter Software Development Kit (SDK) to develop your own custom adapters. You can also buy Data Services prepackaged adapters to access application data and metadata in any application. For more information on these adapters, see Chapter 5 in the Data Services Designer Guide. You can use the Data Mart Accelerator for Crystal Reports adapter to import metadata from BusinessObjects Enterprise. See the documentation folder under Adapters located in your Data Services installation for more information on the Data Mart Accelerator for Crystal Reports.

Creating a database datastore
You need to create at least one datastore for each database or file system with which you are exchanging data. To create a datastore, you must have appropriate access privileges to the database or file system that the datastore describes. If you do not have access, ask your database administrator to create an account for you.

The values you select for the datastore type and database type determine the options available when you create a database datastore. The entries that you must make to create a datastore depend on the selections you make for these two options. Note that if you are using MySQL, any ODBC connection provides access to all of the available MySQL schemas.

To create a database datastore
1. On the Datastores tab of the Local Object Library, right-click the white space and select New from the menu. The Create New Datastore dialog box displays.
2. In the Datastore name field, enter the name of the new datastore. The name can contain any alphanumeric characters or underscores (_). It cannot contain spaces.
3. In the Datastore Type drop-down list, ensure that the default value of Database is selected.
4. In the Database type drop-down list, select the RDBMS for the data source.
5. Enter the other connection details, as required.
6. Leave the Enable automatic data transfer check box selected.
7. Click OK.

Changing a datastore definition
Like all Data Services objects, datastores are defined by both options and properties:
• Options control the operation of objects. The Edit Datastore dialog box allows you to edit all connection properties except datastore name and datastore type for adapter and application datastores. For database datastores, you can edit all connection properties except datastore name, datastore type, database type, and database version. These include the database server name, database name, user name, and password for the specific database.
• Properties document the object. Properties are descriptive of the object and do not affect its operation. For example, the name of the datastore and the date on which it is created are datastore properties. You cannot change the name of a datastore after creation. The datastore name appears on the object in the Local Object Library and in calls to the object.

Properties Tab      Description
General             Contains the name and description of the datastore.
Attributes          Includes the date you created the datastore. This value cannot be changed.
Class Attributes    Includes overall datastore information such as description and date created.
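As a concrete but purely hypothetical illustration of the options above, a database datastore for a Microsoft SQL Server source might capture values such as the following (all names and values are invented):

    Datastore name:        DS_SALES_SRC
    Datastore type:        Database
    Database type:         Microsoft SQL Server
    Database server name:  dbserver01
    Database name:         SALES
    User name / Password:  ds_reader / ********

The datastore name cannot be changed after creation; most of the other connection values can be edited later, as described below.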
To change datastore options
1. On the Datastores tab of the Local Object Library, right-click the datastore name and select Edit from the menu. The Edit Datastore dialog box displays the connection information.
2. Change the database server name, username, database name, and password options, as required.
3. Click OK. The changes take effect immediately.

To change datastore properties
1. On the Datastores tab of the Local Object Library, right-click the datastore name and select Properties from the menu. The Properties dialog box lists the datastore's description, attributes, and class attributes.
2. Change the datastore properties, as required.
3. Click OK.

Importing metadata from data sources
Data Services determines and stores a specific set of metadata information for tables. You can import metadata by name, searching, and browsing. After importing metadata, you can edit column names, descriptions, and datatypes. The edits are propagated to all objects that call these objects.

Metadata            Description
Table name          The name of the table as it appears in the database.
Table description   The description of the table.
Column name         The name of the table column.
Column description  The description of the column.
Column datatype     The datatype for each column. If a column is defined as an unsupported datatype (see the datatypes listed below), Data Services converts the datatype to one that is supported. In some cases, if Data Services cannot convert the datatype, it ignores the column entirely.
Primary key column  The column that comprises the primary key for the table. After a table has been added to a data flow diagram, this column is indicated in the column list by a key icon next to the column name.
Table attribute     Information Data Services records about the table, such as the date created and date modified, if these values are available.
Owner name          Name of the table owner.

The following datatypes are supported: BLOB, CLOB, date, datetime, decimal, double, int, interval, long, numeric, real, time, timestamp, and varchar.

You can also import stored procedures from DB2, MS SQL Server, Oracle, and Sybase databases, and stored functions and packages from Oracle. You can use these functions and procedures in the extraction specifications you give Data Services. Information that is imported for functions includes:
• Function parameters
• Return type
• Name
• Owner

Imported functions and procedures appear in the Function branch of each datastore tree on the Datastores tab of the Local Object Library. You can configure imported functions and procedures through the Function Wizard and the Smart Editor in a category identified by the datastore name.

Importing metadata by browsing
The easiest way to import metadata is by browsing. Note that functions cannot be imported using this method. For more information on importing by searching and importing by name, see "Ways of importing metadata", Chapter 5 in the Data Services Designer Guide.

To import metadata by browsing
1. On the Datastores tab of the Local Object Library, right-click the datastore and select Open from the menu. The items available to import appear in the workspace.
2. Navigate to and select the tables for which you want to import metadata.
You can hold down the Ctrl or Shift keys and click to select multiple tables. The workspace contains columns that indicate whether the table has already been imported into Data Services (Imported) and if the table schema has changed since it was imported (Changed).
3. Right-click the selected items and select Import from the menu.
4. In the Local Object Library, expand the datastore to display the list of imported objects, organized into Functions, Tables, and Template Tables.
5. To view data for an imported datastore, right-click a table and select View Data from the menu.

To verify whether the repository contains the most recent metadata for an object, right-click the object and select Reconcile.
Defining file formats for flat files
Introduction
File formats are connections to flat files in the same way that datastores are connections to databases. After completing this unit, you will be able to:
• Explain file formats
• Create a file format for a flat file

Explaining file formats
A file format is a generic description that can be used to describe one file or multiple data files if they share the same format. It is a set of properties describing the structure of a flat file (ASCII). File formats are used to connect to source or target data when the data is stored in a flat file. The Local Object Library stores file format templates that you use to define specific file formats as sources and targets in data flows. File format objects can describe files in:
• Delimited format: delimiter characters such as commas or tabs separate each field.
• Fixed width format: the fixed column width is specified by the user.
• SAP R/3 format: this is used with the predefined Transport_Format or with a custom SAP R/3 format.

Creating file formats
Use the file format editor to set properties for file format templates and source and target file formats. The file format editor has three work areas:
• Property Value: Edit file format property values. Expand and collapse the property groups by clicking the leading plus or minus.
• Column Attributes: Edit and define columns or fields in the file.
• Data Preview: View how the settings affect sample data.
The properties and appearance of the work areas vary with the format of the file.

Date formats
In the Property Values work area, you can override default date formats for files at the field level. Field-specific formats override the default format set in the Property Values area. The following data format codes can be used:
Code    Description
DD      2-digit day of the month
MM      2-digit month
MONTH   Full name of the month
MON     3-character name of the month
YY      2-digit year
YYYY    4-digit year
HH24    2-digit hour of the day (0-23)
MI      2-digit minute (0-59)
SS      2-digit second (0-59)
FF      Up to 9-digit sub-seconds
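For example, combining the codes above, a hypothetical date field holding values such as 25/06/2014 14:30:05 could be described with the following field-level format:

    Format:        DD/MM/YYYY HH24:MI:SS
    Sample value:  25/06/2014 14:30:05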
To create a new file format
1. On the Formats tab of the Local Object Library, right-click Flat Files and select New from the menu to open the File Format Editor. To make sure your file format definition works properly, it is important to finish inputting the values for the file properties before moving on to the Column Attributes work area.
2. In the Name field, enter a name that describes this file format template. Once the name has been created, it cannot be changed. If an error is made, the file format must be deleted and a new format created.
3. In the Type field, specify the file type:
• Delimited: select this file type if the file uses a character sequence to separate columns.
• Fixed width: select this file type if the file uses specified widths for each column. If a fixed-width file format uses a multi-byte code page, then no data is displayed in the Data Preview section of the file format editor for its files.
4. Specify the location information of the data file, including Location, Root directory, and File name. By substituting a wild card character or list of file names for the single file name, multiple files can be read. The Group File Read can read multiple flat files with identical formats through a single file format.
5. Complete the other properties to describe files that this template represents.
6. Overwrite the existing schema as required. This happens automatically when you open a file; click Yes to overwrite the existing schema.
7. For source files, specify the structure of each column in the Column Attributes work area as follows:

Column      Description
Field Name  Enter the name of the column.
Data Type   Select the appropriate datatype from the drop-down list.
Field Size  For columns with a datatype of varchar, specify the length of the field.
Precision   For columns with a datatype of decimal or numeric, specify the precision of the field.
Scale       For columns with a datatype of decimal or numeric, specify the scale of the field.
Format      For columns with any datatype but varchar, select a format for the field, if desired. For a decimal or real datatype, this information overrides the default format set in the Property Values work area for that datatype.

You do not need to specify columns for files used as targets. If you do specify columns and they do not match the output schema from the preceding transform, Data Services writes to the target file using the transform's output schema. Note that if you only specify a source column format and the column names and datatypes in the target schema do not match those in the source schema, Data Services cannot use the source column format specified. Instead, it defaults to the format used by the code page on the computer where the Job Server is installed.
8. Click Save & Close to save the file format and close the file format editor.
9. In the Local Object Library, right-click the file format and select View Data from the menu to see the data.
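To make the column attributes concrete, the following sketch shows a hypothetical comma-delimited source file (all field names and values are invented) and one way its columns might be described in the Column Attributes work area:

    CustomerID,FirstName,LastName,OrderTotal,OrderDate
    1001,Maria,Garcia,149.95,25/06/2014
    1002,John,Smith,89.50,26/06/2014

    CustomerID  int
    FirstName   varchar, Field Size 30
    LastName    varchar, Field Size 30
    OrderTotal  decimal, Precision 5, Scale 2
    OrderDate   date, Format DD/MM/YYYY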
Defining file formats for Excel files
Introduction
You can create file formats for Excel files in the same way that you would for flat files. After completing this unit, you will be able to:
• Create a file format for an Excel file

Using Excel as a native data source
It is possible to connect to Excel workbooks natively as a source, with no ODBC connection setup and configuration needed. You can select specific data in the workbook using custom ranges or auto-detect, and you can specify variables for file and sheet names for more flexibility. As with file formats and datastores, these Excel formats show up as sources in impact and lineage analysis reports.

To import and configure an Excel source
1. On the Formats tab of the Local Object Library, right-click Excel Workbooks and select New from the menu.
The Import Excel Workbook dialog box displays.
2. In the Format name field, enter a name for the format. The name may contain underscores but not spaces.
3. On the Format tab, click the drop-down button beside the Directory field and select <Select folder...>.
4. Navigate to and select a new directory, and then click OK.
5. Click the drop-down button beside the File name field and select <Select file...>.
6. Navigate to and select an Excel file, and then click Open.
7. Do one of the following:
• To reference a named range for the Excel file, select the Named range radio button and enter a value in the field provided.
• To reference an entire worksheet, select the Worksheet radio button and then select the All fields radio button.
• To reference a custom range, select the Worksheet radio button and the Custom range radio button, click the ellipses (...) button, select the cells, and close the Excel file by clicking X in the top right corner of the worksheet.
8. If required, select the Extend range checkbox. The Extend range checkbox provides a means to extend the spreadsheet in the event that additional rows of data are added at a later time. If this checkbox is checked, at execution time Data Services searches row by row until a null value row is reached. All rows above the null value row are included.
9. If applicable, select the Use first row values as column names option. If this option is selected, field names are based on the first row of the imported Excel sheet.
10. Click Import schema. The schema is displayed at the top of the dialog box.
11. Specify the structure of each column as follows:
Column       Description
Field Name   Enter the name of the column.
Data Type    Select the appropriate datatype from the drop-down list.
Field Size   For columns with a datatype of varchar, specify the length of the field.
Precision    For columns with a datatype of decimal or numeric, specify the precision of the field.
Scale        For columns with a datatype of decimal or numeric, specify the scale of the field.
Description  If desired, enter a description of the column.
12. If required, on the Data Access tab, enter any changes that are required. The Data Access tab provides options to retrieve the file via FTP or execute a custom application (such as unzipping a file) before reading the file.
13. Click OK. The newly imported file format appears in the Local Object Library with the other Excel workbooks. The sheet is now available to be selected for use as a native data source.
Lesson 3: Creating Batch Jobs
Lesson introduction
Once metadata has been imported for your datastores, you can create data flows to define data movement requirements.
After completing this lesson, you will be able to:
• Work with objects
• Create a data flow
• Use the Query transform
• Use target tables
• Execute the job
Defining Data Services objects
Introduction
Data Services provides you with a variety of objects to use when you are building your data integration and data quality applications. After completing this unit, you will be able to:
• Define the objects available in Data Services
• Explain relationships between objects

Understanding Data Services objects
In Data Services, all entities you add, define, modify, or work with are objects. Some of the most frequently-used objects are:
• Projects
• Jobs
• Work flows
• Data flows
• Transforms
• Scripts

This diagram shows some common objects.

All objects have options, properties, and classes. Each can be modified to change the behavior of the object.

Options
Options control the object. For example, to set up a connection to a database, the database name is an option for the connection.
Properties
Properties describe the object. For example, the name and creation date describe what the object is used for and when it became active. Attributes are properties used to locate and organize objects.

Classes
Classes define how an object can be used. Every object is either re-usable or single-use.

Re-usable objects
A re-usable object has a single definition, and all calls to the object refer to that definition. If you change the definition of the object in one place, the change is reflected to all other calls to the object. Most objects created in Data Services are available for re-use. After you define and save a re-usable object, Data Services stores the definition in the repository. You can then re-use the definition as often as necessary by creating calls to it. You can edit re-usable objects at any time independent of the current open project. For example, if you open a new project, you can open a data flow and edit it. However, the changes you make to the data flow are not stored until you save them. For example, a data flow within a project is a re-usable object. Multiple jobs, such as a weekly load job and a daily load job, can call the same data flow. If this data flow is changed, both jobs call the new version of the data flow.

Single-use objects
Single-use objects appear only as components of other objects. They operate only in the context in which they were created. Note: You cannot copy single-use objects.

Defining projects and jobs
A project is the highest-level object in Designer. Projects provide a way to organize the other objects you create in Designer. A project is a single-use object that allows you to group jobs. For example, you can use a project to group jobs that have schedules that depend on one another or that you want to monitor together.

Projects have the following characteristics:
• Projects are listed in the Local Object Library.
• Only one project can be open at a time.
• Projects cannot be shared among multiple users.

The objects in a project appear hierarchically in the project area. If a plus sign (+) appears next to an object, you can expand it to view the lower-level objects contained in the object. Data Services displays the contents as both names and icons in the project area hierarchy and in the workspace.

A job is the smallest unit of work that you can schedule independently for execution.
Note: Jobs must be associated with a project before they can be executed in the project area of Designer.

Defining relationships between objects
Jobs are composed of work flows and/or data flows:
• A work flow is the incorporation of several data flows into a sequence. A work flow orders data flows and the operations that support them. It also defines the interdependencies between data flows. For example, if one target table depends on values from other tables, you can use the work flow to specify the order in which you want Data Services to populate the tables. You can also use work flows to define strategies for handling errors that occur during project execution.
• A data flow is the process by which source data is transformed into target data. A data flow defines the basic task that Data Services accomplishes, which involves moving data from one or more sources to one or more target tables or files. You define data flows by identifying the sources from which to extract data, the transformations the data should undergo, and the targets.

This diagram illustrates a typical work flow.

Using work flows
Jobs with data flows can be developed without using work flows. However, one should consider nesting data flows inside of work flows by default. This practice can provide various benefits: always using work flows makes jobs more adaptable to additional development and/or specification changes. For instance, if a job initially consists of four data flows that are to run sequentially, they could be set up without work flows. But what if specification changes require that they be merged into another job instead? The developer would have to replicate their sequence correctly in the other job. If these had been initially added to a work flow, the developer could then have simply copied that work flow into the correct position within the new job.
There would be no need to learn, copy, and verify the previous sequence. The change can be made more quickly with greater accuracy.

Not using work flows also opens up the possibility that units of recovery are not properly defined. Initially, it may have been decided that recovery units are not important, the expectation being that if the job fails, the whole process could simply be rerun. However, as data volumes tend to increase, it may be determined that a full reprocessing is too time consuming. The job may then be changed to incorporate work flows to benefit from recovery units to bypass reprocessing of successful steps. However, these changes can be complex and can consume more time than allotted for in a project plan. Setting these up during initial development, when the nature of the processing is being most fully analyzed, is preferred. Even if there is one data flow per work flow, there are benefits to adaptability.

Describing the object hierarchy
In the repository, objects are grouped hierarchically from a project, to jobs, to optional work flows, to data flows. In jobs, work flows define a sequence of processing steps, and data flows move data from source tables to target tables. This illustration shows the hierarchical relationships for the key object types within Data Services:
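In outline form, that hierarchy can be sketched as follows (a simplified view of the relationships described above):

    Project
      Job
        Work flow (optional; work flows can nest other work flows)
          Data flow
            Source objects -> Transforms -> Target objects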
This course focuses on creating batch jobs using database datastores and file formats.
Using the Data Services Designer interface
Introduction
The Data Services Designer interface allows you to plan and organize your data integration and data quality jobs in a visual way. Most of the components of Data Services can be programmed through this interface. After completing this unit, you will be able to:
• Explain how Designer is used
• Describe key areas in the Designer window

Tip: You can access the Data Services Technical Manuals for reference or help through the Designer interface Help menu. These manuals are also accessible by going through Start > Programs > Business Objects XI 3.0/3.1 > BusinessObjects Data Services > Data Services Documentation > Technical Manuals.

Describing the Designer window
The Data Services Designer interface consists of a single application window and several embedded supporting windows. The application window contains the menu bar, toolbar, Local Object Library, project area, tool palette, and workspace.
Using the Designer toolbar
In addition to many of the standard Windows toolbar buttons, Data Services provides the following unique toolbar buttons:

Button                        Description
Save All                      Saves all new or updated objects.
Close All Windows             Closes all open windows in the workspace.
Local Object Library          Opens and closes the Local Object Library window.
Central Object Library        Opens and closes the Central Object Library window.
Variables                     Opens and closes the Variables and Parameters window.
Project Area                  Opens and closes the project area.
Output                        Opens and closes the Output window.
View Enabled Descriptions     Enables the system-level setting for viewing object descriptions in the workspace.
Validate Current View         Validates the object definition open in the active tab of the workspace. Other objects included in the definition are also validated.
Validate All Objects in View  Validates all object definitions open in the workspace. Objects included in the definition are also validated.
Audit                         Opens the Audit window. You can collect audit statistics on the data that flows out of any Data Services object.
View Where Used               Opens the Output window, which lists parent objects (such as jobs) of the object currently open in the workspace (such as a data flow).
Back                          Moves back in the list of active workspace windows.
Forward                       Moves forward in the list of active workspace windows.
Management Console            Opens and closes the Data Services Management Console, which provides access to Administrator, Auto Documentation, Data Validation, Impact and Lineage Analysis, Operational Dashboard, and Data Quality Reports.
Assess and Monitor            Opens Data Insight, which allows you to assess and monitor the quality of your data.
Contents                      Opens the Data Services Technical Manuals.

Using the Local Object Library
The Local Object Library gives you access to the object types listed in the table below. The table shows the tab on which the object type appears in the Local Object Library and describes the Data Services context in which you can use each type of object.

Tab          Description
Projects     Projects are sets of jobs available at a given time.
Jobs         Jobs are executable work flows. There are two job types: batch jobs and real-time jobs.
Work flows   Work flows order data flows and the operations that support data flows, defining the interdependencies between them.
Data flows   Data flows describe how to process a task.
Transforms   Transforms operate on data, producing output data sets from the sources you specify. The Local Object Library lists Platform, Data Integrator, and Data Quality transforms.
Datastores   Datastores represent connections to databases and applications used in your project. Under each datastore is a list of the tables, documents, and functions imported into Data Services.
Formats            Formats describe the structure of a flat file, Excel file, XML file, or XML message.
Custom Functions   Custom functions are functions written in the Data Services Scripting Language.

You can import objects to and export objects from your Local Object Library as a file. Importing objects from a file overwrites existing objects with the same names in the destination Local Object Library. Whole repositories can be exported in either .atl or .xml format. Using the .xml file format can make repository content easier for you to read. It also allows you to export Data Services to other products.

To export a repository to a file
1. On any tab of the Local Object Library, right-click the white space and select Repository Export To File. The Write Repository Export File dialog box displays.
2. Browse to the destination for the export file.
3. In the File name field, enter the name of the export file.
4. In the Save as type list, select the file type for your export file.
5. Click Save. The repository is exported to the file.

To import a repository from a file
1. On any tab of the Local Object Library, right-click the white space and select Repository Import from File from the menu. The Open Import File dialog box displays.
2. Browse to the destination for the file.
3. Click Open. A warning message displays to let you know that it takes a long time to create new versions of existing objects.
4. Click OK. You must restart Data Services after the import process completes.

Using the project area
The project area provides a hierarchical view of the objects used in each project. Tabs on the bottom of the project area support different tasks. Tabs include:
Tab        Description
Designer   Create, view, and manage projects. This tab provides a hierarchical view of all objects used in each project.
Monitor    View the status of currently executing jobs, including which steps are complete and which steps are executing. Selecting a specific job execution displays its status. These tasks can also be done using the Data Services Management Console.
Log        View the history of complete jobs. Logs can also be viewed with the Data Services Management Console.

To change the docked position of the project area
1. Right-click the border of the project area.
2. From the menu, select Floating to remove the check mark and clear the docking option.
3. Click and drag the project area to dock and undock at any edge within Designer. When you position the project area where one of the directional arrows highlights a portion of the window, this signifies a placement option. The project area does not dock inside the workspace area.
4. To switch between the last docked and undocked locations, double-click the gray border.

To change the undocked position of the project area
1. Right-click the border of the project area.
2. From the menu, select Floating.
3. Click and drag the project area to any location on your screen. When you drag the project area away from a window edge, it stays undocked.

To lock and unlock the project area
1. Click the pin icon ( ) on the border to unlock the project area. The project area hides.
2. Move the mouse over the docked pane. The project area re-appears.
3. Click the pin icon to lock the pane in place again.
Using the tool palette
The tool palette is a separate window that appears by default on the right edge of the Designer workspace. You can move the tool palette anywhere on your screen or dock it on any edge of the Designer window. The icons in the tool palette allow you to create new objects in the workspace. The icons are disabled when they are invalid entries to the diagram open in the workspace. To show the name of each icon, hold the cursor over the icon until the tool tip for the icon appears.

When you create an object from the tool palette, you are creating a new definition of an object. If a new object is re-usable, it is automatically available in the Local Object Library after you create it. For example, if you select the data flow icon from the tool palette and define a new data flow called DF1, you can later drag that existing data flow from the Local Object Library and add it to another data flow called DF2.

The tool palette contains these objects:
• Pointer: Returns the tool pointer to a selection pointer for selecting and moving objects in a diagram. Available in: all objects in a diagram.
• Work flow: Creates a new work flow. Available in: jobs and work flows.
• Data flow: Creates a new data flow. Available in: jobs and work flows.
• R/3 data flow: Creates a new data flow with the SAP licensed extension only. Available in: SAP licensed extension.
• Query: Creates a query to define column transform mappings and row selections. Available in: data flows.
• Template table: Creates a new table for a target. Available in: data flows.
• Template XML: Creates a new XML file for a target. Available in: data flows.
• Data transport: Creates a data transport flow for the SAP Licensed extension. Available in: SAP Licensed extension.
• Try: Creates a new try object that tries an alternate work flow if an error occurs in a job. Available in: jobs and work flows.
• Catch: Creates a new catch object that catches errors in a job. Available in: jobs and work flows.
• Conditional: Creates a new conditional object. Available in: jobs and work flows.
• While Loop: Repeats a sequence of steps in a work flow as long as a condition is true. Available in: work flows.
• Script: Creates a new script object (a short example script appears after this section). Available in: jobs and work flows.
• Annotation: Creates an annotation used to describe jobs, work flows, and data flows. Available in: jobs, work flows, and data flows.

Using the workspace
When you open a job or any object within a job hierarchy, the workspace becomes active with your selection. The workspace provides a place to manipulate objects and graphically assemble data movement processes. These processes are represented by icons that you drag and drop into a workspace to create a diagram. This diagram is a visual representation of an entire data movement application or some part of a data movement application. You specify the flow of data by connecting objects in the workspace from left to right in the order you want the data to be moved.
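To give a feel for what the Script object referenced above contains, the following is a minimal sketch written in the Data Services Scripting Language. It is an illustration only: the global variable name is hypothetical, and the sketch assumes the variable has already been declared for the job.

    # Record the load start time and write a message to the trace log
    $G_LoadStart = sysdate();
    print('Nightly load started on: ' || to_char($G_LoadStart, 'YYYY.MM.DD'));

A script like this would typically sit at the start of a job or work flow to prepare values that later steps use.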
Creating a system configuration
System configurations define a set of datastore configurations that you want to use together when running a job. Create datastore configurations for the datastores in your repository before you create the system configurations for them. You cannot define a system configuration if your repository does not contain at least one datastore with multiple configurations. When designing jobs, determine and create datastore configurations and system configurations depending on your business environment and rules. In many organizations, a Data Services designer defines the required datastore and system configurations, and a system administrator determines which system configuration to use when scheduling or starting a job in the Administrator.

Data Services maintains system configurations separately. You cannot check in or check out system configurations. However, you can export system configurations to a separate flat file which you can later import. By maintaining system configurations in a separate file, you avoid modifying your datastore each time you import or export a job (particularly when exporting), or each time you check in and check out the datastore.

To create a system configuration
1. From the Tools menu, select System Configurations. The System Configuration Editor dialog box displays columns for each datastore.
2. In the Configuration name column, enter the system configuration name. Use the SC_ prefix in the system configuration name so that you can easily identify this file as a system configuration.
3. In the drop-down list for each datastore column, select the appropriate datastore configuration that you want to use when you run a job using this system configuration.
4. Click OK.
Working with objects
Introduction
Data flows define how information is moved from source to target. These data flows are organized into executable jobs, which are grouped into projects. After completing this unit, you will be able to:
• Create a project
• Create a job
• Add, connect, and delete objects in the workspace
• Create a work flow

Creating a project
A project is a single-use object that allows you to group jobs. It is the highest level of organization offered by Data Services. Opening a project makes one group of objects easily accessible in the user interface. A project is used solely for organizational purposes. For example, you can use a project to group jobs that have schedules that depend on one another or that you want to monitor together. Only one project can be open at a time. The objects in a project appear hierarchically in the project area in Designer, where you can drill down into additional levels. If a plus sign (+) appears next to an object, you can expand it to view the lower-level objects.

To create a new project
1. From the Project menu, select New Project. You can also right-click the white space on the Projects tab of the Local Object Library and select New from the menu.
The Project - New dialog box displays.
2. Enter a unique name in the Project name field. The name can include alphanumeric characters and underscores (_). It cannot contain blank spaces.
3. Click Create. The new project appears in the project area. As you add jobs and other lower-level objects to the project, they also appear in the project area.

To open an existing project
1. From the Project menu, select Open. The Project - Open dialog box displays.
2. Select the name of an existing project from the list.
3. Click Open. If another project is already open, Data Services closes that project and opens the new one in the project area.

To save a project
1. From the Project menu, select Save All. The Save all changes dialog box lists the jobs, work flows, and data flows that you edited since the last save.
2. Deselect any listed object to avoid saving it.
3. Click OK. You are also prompted to save all changes made in a job when you execute the job or exit the Designer.
Creating a job
A job is the only executable object in Data Services. A job is made up of steps that are executed together. Each step is represented by an object icon that you place in the workspace to create a job diagram. A job diagram is made up of two or more objects connected together. You can include any of the following objects in a job definition:
• Work flows
• Scripts
• Conditionals
• While loops
• Try/catch blocks
• Data flows, which can contain:
  - Source objects
  - Target objects
  - Transforms

If a job becomes complex, you can organize its content into individual work flows, and then create a single job that calls those work flows. When you are developing your data flows, you can manually execute and test jobs directly in Data Services. In production, you can schedule batch jobs and set up real-time jobs as services that execute a process when Data Services receives a message request.
To create a job in the project area
1. In the project area, right-click the project name and select New Batch Job from the menu. A new batch job is created in the project area, and Data Services opens a new workspace for you to define the job.
2. Edit the name of the job. The name can include alphanumeric characters and underscores (_). It cannot contain blank spaces.
3. Click the cursor outside of the job name or press Enter to commit the changes.

Tip: It is recommended that you follow consistent naming conventions to facilitate object identification across all systems in your enterprise.

You can also create a job and related objects from the Local Object Library. When you create a job in the Local Object Library, you must associate the job and all related objects to a project before you can execute the job.

Adding, connecting, and deleting objects in the workspace
After creating a job, you can add objects to the job workspace area using either the Local Object Library or the tool palette.

To add objects from the Local Object Library to the workspace
1. In the Local Object Library, click the tab for the type of object you want to add.
2. Click and drag the selected object on to the workspace.

To add objects from the tool palette to the workspace
• In the tool palette, click the desired object, move the cursor to the workspace, and then click the workspace to add the object.
Creating a work flow
A work flow is an optional object that defines the decision-making process for executing other objects. For example, elements in a work flow can determine the path of execution based on a value set by a previous job or can indicate an alternative path if something goes wrong in the primary path. Ultimately, the purpose of a work flow is to prepare for executing data flows and to set the state of the system after the data flows are complete. Work flows can contain data flows, conditionals, while loops, try/catch blocks, and scripts. They can also call other work flows, and you can nest calls to any depth. A work flow can even call itself.

Note: In essence, jobs are just work flows that can be executed. Almost all of the features documented for work flows also apply to jobs.

To create a work flow
1. Open the job or work flow to which you want to add the work flow.
2. Select the Work Flow icon in the tool palette.
3. Click the workspace where you want to place the work flow.
4. Enter a unique name for the work flow.
5. Click the cursor outside of the work flow name or press Enter to commit the changes.

To connect objects in the workspace area
• Click and drag from the triangle or square of an object to the triangle or square of the next object in the flow to connect the objects.

To disconnect objects in the workspace area
• Select the connecting line between the objects and press Delete.

Defining the order of execution in work flows
The connections you make between the icons in the workspace determine the order in which work flows execute, unless the jobs containing those work flows execute in parallel. Steps in a work flow execute in a sequence from left to right. You must connect the objects in a work flow when there is a dependency between the steps.
and skips subsequent occurrences in the job. such as jobs with try/catch blocks or conditionals. define Work Flow B: Finally. you can define each sequence as a separate work flow. you must define Work Flow A: Next. and then call each of the work flows from another work flow. as in this example: First. and you want to ensure that Data Services only executes a particular work flow or data flow one time. Data Services only executes the first occurrence of the work flow or data flow.To execute more complex work flows in parallel. create Work Flow C to call Work Flows A and B: You can specify a job to execute a particular work flow or data flow once only. You might use this feature when developing complex jobs with multiple paths. . If you specify that it should be executed only once.
Creating a data flow
Introduction
Data flows contain the source, transform, and target objects that represent the key activities in data integration and data quality processes. After completing this unit, you will be able to:
• Create a data flow
• Explain source and target objects
• Add source and target objects to a data flow
Using data flows
Data flows determine how information is extracted from sources, transformed, and loaded into targets. The lines connecting objects in a data flow represent the flow of data through data integration and data quality processes. Each icon you place in the data flow diagram becomes a step in the data flow. The objects that you can use as steps in a data flow are:
• Source and target objects
• Transforms
The connections you make between the icons determine the order in which Data Services completes the steps.
Using data flows as steps in work flows
Each step in a data flow, up to the target definition, produces an intermediate result. For example, the result of a SQL statement containing a WHERE clause flows to the next step in the data flow. The intermediate result consists of a set of rows from the previous operation and the schema in which the rows are arranged. This result is called a data set. This data set may, in turn, be further filtered and directed into yet another data set (a short SQL sketch follows the list below).
Data flows are closed operations, even when they are steps in a work flow. Any data set created within a data flow is not available to other steps in the work flow. A work flow does not operate on data sets and cannot provide more data to a data flow; however, a work flow can:
• Call data flows to perform data movement operations.
• Define the conditions appropriate to run data flows.
• Pass parameters to and from data flows.
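The following is a minimal illustration of an intermediate data set in SQL terms; the orders table and its columns are hypothetical and are not part of the course data:
    -- A source step reads every row of a (hypothetical) orders table.
    -- A following step with a WHERE clause then produces a smaller data set,
    -- roughly equivalent to:
    SELECT order_id, customer_id, order_date
    FROM   orders
    WHERE  order_date >= '2007-01-01';
    -- A later step can filter or reshape this data set again before it
    -- reaches the target; the data set is never visible outside the data flow.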
To create a new data flow
1. Open the job or work flow in which you want to add the data flow.
2. Select the Data Flow icon in the tool palette.
3. Click the workspace where you want to add the data flow.
4. Enter a unique name for your data flow.
   Data flow names can include alphanumeric characters and underscores (_). They cannot contain blank spaces.
5. Click the cursor outside of the data flow or press Enter to commit the changes.
6. Double-click the data flow to open the data flow workspace.
Changing data flow properties
You can specify the following advanced data properties for a data flow:
• Execute only once: When you specify that a data flow should only execute once, a batch job will never re-execute that data flow after the data flow completes successfully, even if the data flow is contained in a work flow that is a recovery unit that re-executes. You should not select this option if the parent work flow is a recovery unit.
• Use database links: Database links are communication paths between one database server and another. Database links allow local users to access data on a remote database, which can be on the local or a remote computer of the same or different database type. For more information see “Database link support for push-down operations across datastores” in the Data Services Performance Optimization Guide.
• Degree of parallelism: Degree of parallelism (DOP) is a property of a data flow that defines how many times each transform within a data flow replicates to process a parallel subset of data. For more information see “Degree of parallelism” in the Data Services Performance Optimization Guide.
• Cache type: You can cache data to improve performance of operations such as joins, groups, sorts, filtering, lookups, and table comparisons. Select one of the following values:
  ○ In Memory: Choose this value if your data flow processes a small amount of data that can fit in the available memory.
  ○ Pageable: Choose this value if you want to return only a subset of data at a time to limit the resources required. This is the default.
  For more information, see “Tuning Caches” in the Data Services Performance Optimization Guide.
To change data flow properties
1. Right-click the data flow and select Properties from the menu.
   The Properties window opens for the data flow.
2. Change the properties of the data flow as required.
   For more information about how Data Integrator processes data flows with multiple properties, see “Data Flow” in the Data Services Reference Guide.
3. Click OK.
Explaining source and target objects
A data flow directly reads data from source objects and loads data to target objects.
• Table (source and target): A file formatted with columns and rows as used in relational databases.
• Template table (source and target): A template table that has been created and saved in another data flow (used in development).
• File (source and target): A delimited or fixed-width flat file.
• Document (source and target): A file with an application-specific format (not readable by SQL or XML parser).
• XML file (source and target): A file formatted with XML tags.
• XML message (source only): A source in real-time jobs.
• XML template file (target only): An XML file whose format is based on the preceding transform output (used in development, primarily for debugging data flows).
• Transform (source only): A pre-built set of operations that can create new data, such as the Date Generation transform.
Adding source and target objects
Before you can add source and target objects to a data flow, you must first create the datastore and import the table metadata for any databases, or create the file format for flat files.
To add a source or target object to a data flow
1. In the workspace, open the data flow in which you want to place the object.
2. Do one of the following:
   • To add a database table, in the Datastores tab of the Local Object Library, select the table.
   • To add a flat file, in the Formats tab of the Local Object Library, select the file format.
3. Click and drag the object to the workspace.
   A pop-up menu appears for the source or target object.
4. Select Make Source or Make Target from the menu, depending on whether the object is a source or target object.
5. Add and connect objects in the data flow as appropriate.
Using the Query transform
Introduction
The Query transform is the most commonly-used transform and is included in most data flows. It enables you to select data from a source and filter or reformat it as it moves to the target. After completing this unit, you will be able to:
• Describe the transform editor
• Use the Query transform
Describing the transform editor
The transform editor is a graphical interface for defining the properties of transforms. The workspace can contain these areas:
• Input schema area
• Output schema area
• Parameters area
The input schema area displays the schema of the input data set. For source objects and some transforms, this area is not available. The output schema area displays the schema of the output data set, including any functions. Below the input and output schema areas is the parameters area; the options available on this tab differ based on which transform or object you are modifying, and an icon marks tabs containing user-defined entries.
For any data that needs to move from source to target, a relationship must be defined between the input and output schemas. To create this relationship, you must map each input column to the corresponding output column. For template tables, the output schema can be defined based on your preferences.
Explaining the Query transform
The Query transform is used so frequently that it is included in the tool palette with other standard objects. It retrieves a data set that satisfies conditions that you specify, similar to a SQL SELECT statement.
The Query transform can perform the following operations:
• Filter the data extracted from sources.
• Join data from multiple sources.
• Map columns from input to output schemas.
• Perform transformations and functions on the data.
• Perform data nesting and unnesting.
• Add new columns, nested schemas, and function results to the output schema.
• Assign primary keys to output columns.
For example, you could use the Query transform to select a subset of the data in a table to show only those records from a specific region (a short SQL sketch of this example appears below).
Note: When working with nested data from an XML file, you can use the Query transform to unnest the data using the right-click menu for the output schema, which provides options for unnesting.
The next section gives a brief description of the function, data input requirements, options, and data output results for the Query transform. For more information on the Query transform see “Transforms” Chapter 5 in the Data Services Reference Guide.
Input/Output
The data input is a data set from one or more sources with rows flagged with a NORMAL operation code. All the rows in a data set are flagged as NORMAL when they are extracted by a source table or file. The NORMAL operation code creates a new row in the target: if a row is flagged as NORMAL when loaded into a target table or file, it is inserted as a new row in the target. The data output is a data set based on the conditions you specify and using the schema specified in the output schema area.
Options
The input schema area displays all schemas input to the Query transform as a hierarchical tree. Each input schema can contain multiple columns. The output schema area displays the schema output from the Query transform as a hierarchical tree. The output schema can contain multiple columns and functions.
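As a rough SQL equivalent of the region example above; the customer table, its columns, and the region value are hypothetical, chosen only to illustrate the idea:
    SELECT customer_id, customer_name, region_id   -- columns mapped from input to output schema
    FROM   customer                                 -- input data set
    WHERE  region_id = 1;                           -- filter applied to the extracted data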
Icons preceding columns are combinations of these graphics:
• An icon indicating that the column is a primary key.
• An icon indicating that the column has a simple mapping. A simple mapping is either a single column or an expression with no input column.
• An icon indicating that the column has a complex mapping, such as a transformation or a merge between two source columns.
• An icon indicating that the column mapping is incorrect. Data Integrator does not perform a complete validation during design, so not all incorrect mappings will necessarily be flagged.
The parameters area of the Query transform includes the following tabs:
• Mapping: Specify how the selected output column is derived.
• SELECT: Select only distinct rows (discarding any duplicate rows).
• FROM: Specify the input schemas used in the current output schema.
• OUTER JOIN: Specify an inner table and an outer table for joins that you want treated as outer joins.
• WHERE: Set conditions that determine which rows are output.
• GROUP BY: Specify a list of columns for which you want to combine output. For each unique set of values in the group by list, Data Services combines or aggregates the values in the remaining columns.
• ORDER BY: Specify the columns you want used to sort the output data set.
• Advanced: Create separate sub data flows to process any of the following resource-intensive query clauses: DISTINCT, GROUP BY, JOIN, ORDER BY. For more information, see “Distributed Data Flow execution” in the Data Services Designer Guide.
• Find: Search for a specific word or item in the input schema or the output schema.
To map input columns to output columns
• In the transform editor, do any of the following:
  ○ Drag and drop a single column from the input schema area into the output schema area.
  ○ Drag a single input column over the corresponding output column, release the cursor, and select Remap Column from the menu.
  ○ Select multiple input columns (using Ctrl+click or Shift+click) and drag them onto the Query output schema for automatic mapping.
  ○ Select the output column and manually enter the mapping on the Mapping tab in the parameters area. You can either type the column name in the parameters area or click and drag the column from the input schema pane.
  ○ To clear an existing mapping, select the output column, then highlight and manually delete the mapping on the Mapping tab in the parameters area.
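As an illustration of simple versus complex mappings, here are two expressions of the kind that might appear on the Mapping tab; the column names are invented for the example, and the concatenation syntax assumes the Data Services expression language:
    customer.CUSTOMER_NAME                             -- simple mapping: a single input column
    customer.FIRST_NAME || ' ' || customer.LAST_NAME   -- complex mapping: merges two source columns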
Using target tables
Introduction
The target object for your data flow can be either a physical table or file, or a template table. After completing this unit, you will be able to:
• Access the target table editor
• Set target table options
• Use template tables
Accessing the target table editor
The target table editor provides a single location to change settings for your target tables.
To access the target table editor
1. In a data flow, double-click the target table.
   The target table editor opens in the workspace.
2. Change the values as required.
   Changes are automatically committed.
3. Click Back to return to the data flow.
Setting target table options
When your target object is a physical table in a database, the target table editor opens in the workspace with different tabs where you can set database type properties, table loading options, and tuning techniques for loading a job.
Note: Most of the tabs in the target table editor focus on migration or performance-tuning techniques, which are outside the scope of this course.
You can set the following table loading options in the Options tab of the target table editor:
• Rows per commit: Specifies the transaction size in number of rows.
• Column comparison: Specifies how the input columns are mapped to output columns. There are two options:
  ○ Compare_by_position — disregards the column names and maps source columns to target columns by position.
  ○ Compare_by_name — maps source columns to target columns by name.
  Validation errors occur if the datatypes of the columns do not match.
• Number of loaders: Specifies the number of loaders (to a maximum of five) and the number of rows per commit that each loader receives during parallel loading. For example, if you choose a Rows per commit of 1000 and set the number of loaders to three, the first 1000 rows are sent to the first loader, the second 1000 rows are sent to the second loader, the third 1000 rows to the third loader, and the next 1000 rows back to the first loader.
• Delete data from table before loading: Sends a TRUNCATE statement to clear the contents of the table before loading during batch jobs. Defaults to not selected.
• Use overflow file: Writes rows that cannot be loaded to the overflow file for recovery purposes. Options are enabled for the file name and file format. The overflow format can include the data rejected and the operation being performed (write_data) or the SQL command used to produce the rejected operation (write_sql).
• Ignore columns with value: Specifies a value that might appear in a source column that you do not want updated in the target table. When this value appears in the source column, the corresponding target column is not updated during auto correct loading. You can enter spaces.
• Ignore columns with null: Ensures that NULL source columns are not updated in the target table during auto correct loading.
• Use input keys: Enables Data Integrator to use the primary keys from the source table. By default, Data Integrator uses the primary key of the target table.
• Update key columns: Updates key column values when it loads data to the target.
• Auto correct load: Ensures that the same row is not duplicated in a target table. This is particularly useful for data recovery operations. When Auto correct load is selected, Data Integrator reads a row from the source and checks if a row exists in the target table with the same values in the primary key. If a matching row does not exist, it inserts the new row regardless of other options. If a matching row exists, it updates the row depending on the values of Ignore columns with value and Ignore columns with null. (A simplified SQL sketch follows this list.)
• Include in transaction: Indicates that this target is included in the transaction processed by a batch or real-time job. This option allows you to commit data to multiple tables as part of the same transaction. The tables must be from the same datastore. If loading fails for any one of the tables, no data is committed to any of the tables. If you choose to enable transactional loading, these options are not available: Rows per commit, Use overflow file, Number of loaders, Enable partitioning, and Delete data from table before loading. Transactional loading can require rows to be buffered to ensure the correct load order. If the data being buffered is larger than the virtual memory available, Data Integrator reports a memory error. Data Integrator also does not parameterize SQL or push operations to the database if transactional loading is enabled.
• Transaction order: Indicates where this table falls in the loading order of the tables being loaded. By default, there is no ordering; all loaders have a transaction order of zero. Tables with a transaction order of zero are loaded at the discretion of the data flow process. If you specify orders among the tables, the loading operations are applied according to the order. Tables with the same transaction order are loaded together.
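Conceptually, Auto correct load behaves like an upsert. The following is a simplified SQL sketch of the per-row logic; the table and column names are hypothetical, and the statements Data Integrator actually generates depend on the target database:
    -- Try to update the target row that matches the source row's primary key.
    UPDATE employee_dim
    SET    city = 'Berlin', phone = '555-0100'
    WHERE  employee_id = 1001;
    -- If no matching row exists, insert the row instead.
    INSERT INTO employee_dim (employee_id, city, phone)
    VALUES (1001, 'Berlin', '555-0100');
    -- Ignore columns with value / Ignore columns with null simply remove the
    -- corresponding columns from the UPDATE's SET list.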
Using template tables
During the initial design of an application, you might find it convenient to use template tables to represent database tables. With template tables, you do not have to initially create a new table in your RDBMS and import the metadata into Data Services. Instead, Data Services automatically creates the table in the database with the schema defined by the data flow when you execute a job. Template tables are particularly useful in early application development when you are designing and testing a project.
After creating a template table as a target in one data flow, you can use it as a source in other data flows. Although a template table can be used as a source table in multiple data flows, it can be used only as a target in one data flow. You can modify the schema of the template table in the data flow where the table is used as a target. Any changes are automatically applied to any other instances of the template table.
After a template table is created in the database, you can convert the template table in the repository to a regular table. You must convert template tables so that you can use the new table in expressions, functions, and transform options. After a template table is converted, you can no longer alter the schema.
See the Data Services Performance Optimization Guide and “Description of objects” in the Data Services Reference Guide for more information.
To create a template table
1. Open a data flow in the workspace.
2. In the tool palette, click the Template Table icon and click the workspace to add a new template table to the data flow.
   The Create Template dialog box displays.
3. In the In datastore drop-down list, select the datastore for the template table.
4. In the Table name field, enter the name for the template table.
5. Click OK.
You also can create a new template table in the Local Object Library Datastores tab by expanding a datastore and right-clicking Templates.
To convert a template table into a regular table from the Local Object Library
1. On the Datastores tab of the Local Object Library, expand the branch for the datastore to view the template table.
2. Right-click the template table you want to convert and select Import Table from the menu.
Data Services converts the template table in the repository into a regular table by importing it from the database. On the Datastores tab of the Local Object Library, the table is listed under Tables rather than Template Tables. To update the icon in all data flows, select Refresh from the View menu.
To convert a template table into a regular table from a data flow
1. Open the data flow containing the template table.
2. Right-click the template table you want to convert and select Import Table from the menu.
Executing the job
Introduction
Once you have created a data flow, you can execute the job in Data Services to see how the data moves from source to target. After completing this unit, you will be able to:
• Understand job execution
• Execute the job
Explaining job execution
After you create your project, jobs, and associated data flows, you can then execute the job. If a job has syntax errors, it does not execute. You can run jobs two ways:
• Immediate jobs
  Data Services initiates both batch and real-time jobs and runs them immediately from within the Designer. For these jobs, both the Designer and designated Job Server (where the job executes, usually on the same machine) must be running. You will likely run immediate jobs only during the development cycle.
• Scheduled jobs
  Batch jobs are scheduled. To schedule a job, use the Data Services Management Console or use a third-party scheduler. The Job Server must be running.
Setting execution properties
When you execute a job, the following options are available in the Execution Properties window:
• Print all trace messages: Records all trace messages in the log.
• Disable data validation statistics collection: Does not collect audit statistics for this specific job execution.
• Enable auditing: Collects audit statistics for this specific job execution.
• Enable recovery: Enables the automatic recovery feature. When enabled, Data Services saves the results from completed steps and allows you to resume failed jobs.
• Recover from last failed execution: Resumes a failed job. Data Services retrieves the results from any steps that were previously executed successfully and re-executes any other steps. This option is not available when a job has not yet been executed or when recovery mode was disabled during the previous run.
• Collect statistics for optimization: Collects statistics that the Data Services optimizer will use to choose an optimal cache type (in-memory or pageable). This option is a run-time property.
• Collect statistics for monitoring: Displays cache statistics in the Performance Monitor in Administrator. This option is a run-time property.
• Use collected statistics: Optimizes Data Services to use the cache statistics collected on a previous execution of the job.
• System configuration: Specifies the system configuration to use when executing this job. A system configuration defines a set of datastore configurations, which define the datastore connections. If a system configuration is not specified, Data Services uses the default datastore configuration for each datastore. This option is a run-time property that is only available if there are system configurations defined in the repository.
• Job Server or Server Group: Specifies the Job Server or server group to execute this job.
• Distribution level: Allows a job to be distributed to multiple Job Servers for processing. The options are:
  ○ Job — The entire job will execute on one server.
  ○ Data flow — Each data flow within the job will execute on a separate server.
  ○ Sub-data flow — Each sub-data flow (can be a separate transform or function) within a data flow will execute on a separate Job Server.
Executing the job
Immediate or on demand tasks are initiated from the Designer. Both the Designer and Job Server must be running for the job to execute.
To execute a job as an immediate task
1. In the project area, right-click the job name and select Execute from the menu.
   Data Services prompts you to save any objects that have not been saved.
2. Click OK.
   The Execution Properties dialog box displays.
3. Select the required job execution parameters.
4. Click OK.
Lesson 4
Using Platform Transforms
Lesson introduction
A transform enables you to control how data sets change in a data flow.
• Describe platform transforms
• Use the Map Operation transform
• Use the Validation transform
• Use the Merge transform
• Use the Case transform
• Use the SQL transform
Describing platform transforms
Introduction
Transforms are optional objects in a data flow that allow you to transform your data as it moves from source to target. After completing this unit, you will be able to:
• Explain transforms
• Describe the platform transforms available in Data Services
• Add a transform to a data flow
• Describe the Transform Editor window
Explaining transforms
Transforms are objects in data flows that operate on input data sets by changing them or by generating one or more new data sets. Transforms are added as components to your data flow in the same way as source and target objects. Each transform provides different options that you can specify based on the transform's function. You can choose to edit the input data, output data, and parameters in a transform.
Transforms are similar to functions in that they can produce the same or similar values during processing. However, transforms and functions operate on a different scale:
• Functions operate on single values, such as values in specific columns in a data set.
• Transforms operate on data sets by creating, updating, and deleting rows of data.
Transforms are often used in combination to create the output data set. For example, the Table Comparison, History Preserving, and Key Generation transforms are used for slowly changing dimensions. Some transforms, such as the Date Generation and SQL transforms, can be used as source objects, in which case they do not have input options. The Query transform is the most commonly-used transform.
Describing platform transforms
The following platform transforms are available on the Transforms tab of the Local Object Library:
• Case: Divides the data from an input data set into multiple output data sets based on IF-THEN-ELSE branch logic.
• Map Operation: Allows conversions between operation codes.
• Merge: Unifies rows from two or more input data sets into a single output data set.
• Query: Retrieves a data set that satisfies conditions that you specify. A Query transform is similar to a SQL SELECT statement.
• Row Generation: Generates a column filled with integers starting at zero and incrementing by one to the end value you specify.
• SQL: Performs the indicated SQL query operation.
• Validation: Allows you to specify validation criteria for an input data set. Data that fails validation can be filtered out or replaced. You can have one validation rule per column.
Using the Map Operation transform
Introduction
The Map Operation transform enables you to change the operation code for records. After completing this unit, you will be able to:
• Describe map operations
• Use the Map Operation transform
Describing map operations
Data Services maintains operation codes that describe the status of each row in each data set described by the inputs to and outputs from objects in data flows. The operation codes indicate how each row in the data set would be applied to a target table if the data set were loaded into a target. The operation codes are as follows:
• NORMAL: Creates a new row in the target. All rows in a data set are flagged as NORMAL when they are extracted by a source table or file. If a row is flagged as NORMAL when loaded into a target table or file, it is inserted as a new row in the target. Most transforms operate only on rows flagged as NORMAL.
• INSERT: Creates a new row in the target. Only the History Preserving and Key Generation transforms can accept data sets with rows flagged as INSERT as input.
• UPDATE: Overwrites an existing row in the target table. Only the History Preserving and Key Generation transforms can accept data sets with rows flagged as UPDATE as input.
• DELETE: Is ignored by the target. Rows flagged as DELETE are not loaded. Only the History Preserving transform, with the Preserve delete row(s) as update row(s) option selected, can accept data sets with rows flagged as DELETE.
Explaining the Map Operation transform
The Map Operation transform allows you to change operation codes on data sets to produce the desired output. For example, if a row in the input data set has been updated in some previous operation in the data flow, you can use this transform to map the UPDATE operation to an INSERT. The result could be to convert UPDATE rows to INSERT rows to preserve the existing row in the target (see the sketch below).
Data Services can push Map Operation transforms to the source database. Use caution when using columns of datatype real in this transform, because comparison results are unpredictable for this datatype.
The next section gives a brief description of the function, data input requirements, options, and data output results for the Map Operation transform.
Inputs/Outputs
Input for the Map Operation transform is a data set with rows flagged with any operation codes. It can contain hierarchical data. Output for the Map Operation transform is a data set with rows flagged as specified by the mapping operations.
Options
The Map Operation transform enables you to set the Output row type option to indicate the new operations desired for the input data set. Choose from the following operation codes: INSERT, UPDATE, DELETE, NORMAL, or DISCARD.
For more information on the Map Operation transform see “Transforms” Chapter 5 in the Data Services Reference Guide.
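To make the UPDATE-to-INSERT example concrete, here is a rough SQL sketch of what the target load would do before and after the mapping; the dimension table and its columns are hypothetical:
    -- Row flagged UPDATE (no Map Operation): the existing target row is overwritten.
    UPDATE employee_dim SET city = 'Berlin' WHERE employee_id = 1001;
    -- Same row after Map Operation maps UPDATE to INSERT: a new row is added,
    -- so the existing row in the target is preserved.
    INSERT INTO employee_dim (employee_id, city) VALUES (1001, 'Berlin');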
Activity: Using the Map Operation transform
End users of employee reports have requested that employee records in the data mart contain only current employees.
Objective
• Use the Map Operation transform to remove any employee records that have a value in the discharge_date column.
Instructions
1. In the Omega project, create a new batch job called Alpha_Employees_Current_Job with a data flow called Alpha_Employees_Current_DF.
2. In the data flow workspace, add the Employee table from the Alpha datastore as the source object.
3. Add the Employee table from the HR_datamart datastore as the target object.
4. Add the Query transform to the workspace and connect all objects.
5. In the transform editor for the Query transform, map all columns from the input schema to the same column in the output schema.
6. On the WHERE tab, create an expression to select only those rows where discharge_date is not empty. The expression should be:
   employee.discharge_date is not null
7. In the data flow workspace, disconnect the Query transform from the target table.
8. Add a Map Operation transform between the Query transform and the target table and connect it to both.
9. In the transform editor for the Map Operation transform, change the settings so that rows with an input operation code of NORMAL have an output operation code of DELETE.
10. Execute Alpha_Employees_Current_Job with the default execution properties and save all objects you have created.
11. Return to the data flow workspace and view data for both the source and target tables. Note that two rows were filtered from the target table.
A solution file called SOLUTION_MapOperation.atl is included in your Course Resources. To check the solution, import the file and open it to view the data flow design and mapping logic. Do not execute the solution job, as this may override the results in your target table.
Using the Validation transform
Introduction
The Validation transform enables you to create validation rules and move data into target objects based on whether they pass or fail validation. After completing this unit, you will be able to:
• Use the Validation transform
Explaining the Validation transform
Use the Validation transform in your data flows when you want to ensure that the data at any stage in the data flow meets your criteria. The Validation transform qualifies a data set based on rules for input schema columns. It filters out or replaces data that fails your criteria. You can have one validation rule per column. For example, you can set the transform to ensure that all values:
• Are within a specific range
• Have the same format
• Do not contain NULL values
The Validation transform allows you to define a re-usable business rule to validate each record and column. For example, if you want to load only sales records for October 2007, you would set up a validation rule that states: Sales Date is between 10/1/2007 and 10/31/2007. Data Services looks at this date field in each record to validate whether the data meets this requirement. If it does not, you can choose to pass the record into a Fail table, correct it in the Pass table, or do both.
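For illustration, criteria like those above could be expressed as conditions similar to the following; the table and column names are invented for the example, and the date literals use the YYYY.MM.DD format the transform expects:
    Sales.SALES_DATE between '2007.10.01' and '2007.10.31'   -- Between/and: only October 2007 records pass
    Order.STATUS in ('OPEN', 'SHIPPED', 'CLOSED')            -- In: value must be one of a list
    Customer.FAX is not null                                  -- Operator: reject NULL values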
Your validation rule consists of a condition and an action on failure:
• Use the condition to describe what you want for your valid data. For example, specify the condition IS NOT NULL if you do not want any NULLs in data passed to the specified target.
• Use the Action on Failure area to describe what happens to invalid or failed data. Continuing with the example above, for any NULL values you may want to select the Send to Fail option to send all NULL values to a specified FAILED target table.
You can also create a custom Validation function and select it when you create a validation rule. For more information on creating a custom Validation function, see “Validation Transform”, Chapter 12 in the Data Services Reference Guide.
The next section gives a brief description of the function, data input requirements, options, and data output results for the Validation transform. For more information on the Validation transform see “Transforms” Chapter 5 in the Data Services Reference Guide.
Input/Output
Only one source is allowed as a data input for the Validation transform. The Validation transform outputs up to two different data sets based on whether the records pass or fail the validation condition you specify. You can load pass and fail data into multiple targets.
The Pass output schema is identical to the input schema. You may want to substitute a value for failed data that you send to the Pass output because Data Services does not add columns to the Pass output. If you choose to send failed data to the Pass output, Data Services does not track the results. Data Services adds the following two columns to the Fail output schemas:
• The DI_ERRORACTION column indicates where failed data was sent: the letter B is used for data sent to both the Pass and Fail outputs, and the letter F is used for data sent only to the Fail output.
• The DI_ERRORCOLUMNS column displays all error messages for columns with failed rules, for example, “<ValidationTransformName> failed rule(s): c1:c2”. The names of input columns associated with each message are separated by colons.
If a row has conditions set for multiple columns and the Pass, Fail, and Both actions are specified for the row, then the precedence order is Fail, Both, Pass. For example, if one column’s action is Send to Fail and the column fails, then the whole row is sent only to the Fail output. Other actions for other validation columns in the row are ignored.
Options
When you use the Validation transform, you select a column in the input schema and create a validation rule in the Validation transform editor. The Validation transform offers several options for creating this validation rule:
• Enable Validation: Turn the validation rule on and off for the column.
• Do not validate when NULL: Send all NULL values to the Pass output automatically. Data Services will not apply the validation rule on this column when an incoming value for it is NULL.
• Condition: Define the condition for the validation rule:
  ○ Operator: select an operator for a Boolean expression (for example, <, =, >) and enter the associated value.
  ○ Between/and: specify a range of values for a column.
  ○ In: specify a list of possible values for a column.
  ○ Exists in table: specify that a column’s value must exist in a column in another table. This option uses the LOOKUP_EXT function. You can define the NOT NULL constraint for the column in the LOOKUP table to ensure the Exists in table condition executes properly.
  ○ Match pattern: enter a pattern of upper and lowercase alphanumeric characters to ensure the format of the column is correct.
  ○ Custom condition: create more complex expressions using the function and smart editors.
  ○ Custom validation function: select a function from a list for validation purposes. Data Services supports Validation functions that take one parameter and return an integer datatype. If a return value is not zero, then Data Services processes it as TRUE.
  Data Services converts substitute values in the condition to a corresponding column datatype: integer, decimal, varchar, date, datetime, time, or timestamp. The Validation transform requires that you enter some values in specific formats:
  ○ date (YYYY.MM.DD)
  ○ datetime (YYYY.MM.DD HH24:MI:SS)
  ○ time (HH24:MI:SS)
  ○ timestamp (YYYY.MM.DD HH24:MI:SS.FF)
  If, for example, you specify a date as 12-01-2004, Data Services produces an error because you must enter this date as 2004.12.01.
• Action on Fail: Define where a record is loaded if it fails the validation rule:
  ○ Send to Fail
  ○ Send to Pass
  ○ Send to both
  If you choose Send to Pass or Send to Both, you can choose to substitute a value or expression for the failed values that are sent to the Pass output.
To create a validation rule
1. Open the data flow workspace.
2. Add your source object to the workspace.
3. Add your target objects to the workspace.
   You will require one target object for records that pass validation, and an optional target object for records that fail validation.
4. On the Transforms tab of the Local Object Library, click and drag the Validation transform to the workspace to the right of your source object.
5. Connect the source object to the transform.
6. Double-click the Validation transform to open the transform editor.
7. In the input schema area, click to select an input schema column.
8. In the parameters area, on the Properties tab, enter a name and description for the validation rule.
9. Select the Enable Validation option.
10. In the Condition area, select a condition type and enter any associated value required.
    All conditions must be Boolean expressions.
11. On the Action On Failure tab, select an action.
12. If desired, select the For pass, substitute with option and enter a substitute value or expression for the failed value that is sent to the Pass output.
    This option is only available if you select Send to Pass or Send to Both.
13. Click Back to return to the data flow workspace.
14. Click and drag from the transform to the target object.
15. Release the mouse and select the appropriate label for that object from the pop-up menu.
adjust the datatypes for the columns based on their content: Column Datatype ORDERID SHIPPERNAME SHIPPERADDRESS SHIPPERCITY SHIPPERCOUNTRY SHIPPERPHONE SHIPPERFAX SHIPPERREGION int varchar(50) varchar(50) varchar(50) int varchar(20) varchar(20) int . You will use the Validation transform to validate order data from flat file sources and the alpha orders table before merging it. Use the structure of the text file to determine the appropriate settings.16. Create a file format called Order_Shippers_Format for the flat file Order_Shippers_04_20_07. 2. Activity: Using the Validation transform Order data is stored in multiple formats with different structures and different information. • Create a column on the target table for employee information so that orders taken by employees who are no longer with the company are assigned to a default current employee using the validation transform in a new column named order_assigned_to. • Create a column to hold the employee ID of the employee who originally made the sale. Instructions 1.txt.Repeat step 14 and step 15 for all target objects. Objectives • Join the data in the Orders flat files with that in the Order_Shippers flat files. • Replace null values in the shipper fax column with a value of 'No Fax' and send those rows to a separate table for follow up. In the Column Attributes pane.
create a new batch job called Alpha_Orders_Validated_Job and two data flows. enter the correct path. 9.SHIPPERNAME Order_Shippers_Format. Edit the Orders_Format source object to change the Capture Data Conversion Errors option to Yes.ORDERDATE Order_Shippers_Format.CUSTOMERID Orders_Format.Add the following mappings in the Query transform: S chema Out Mapping ORDERID CUSTOMERID ORDERDATE SHIPPERNAME SHIPPERADDRESS Orders_Format. 4. Add the file formats Orders_Format and Order_Shippers_Format as source objects to the Alpha_Orders_Files_DF data flow workspace.Column Datatype SHIPPERPOSTALCODE varchar(15) 3. In the Root directory. and the second named Alpha_Orders_DB_DF.ORDERID Orders_Format. 7. one named Alpha_Orders_Files_DF.ORDERID = Orders_Format. 10. Edit the source objects so that the Orders_Format source is using all three related orders flat files and the Order_Shippers_Format source is using all three order shippers files. The expression should be as follows: Order_Shippers_Format.SHIPPERADDRESS .Add a Query transform to the workspace and connect it to the two source objects. edit the source objects to point to the file on the Job Server. create a WHERE clause to join the data on the OrderID values. If necessary. 6. In the Omega project. The instructor will provide this information. 5.ORDERID 12. In the Location drop-down list. If the Job Server is on a different machine than Designer. 8. 11. Tip: You can use a wildcard to replace the dates in the file names. select Job Server.In the transform editor for the Query transform. this step is required.
16.Insert a new output column above ORDERDATE called ORDER_ASSIGNED_TO with a datatype of varchar(15) and map it to Orders_Format.SHIPPERPOSTALCODE 13. enable validation for the ORDER_ASSIGNED_TO column to verify the value in the column exists in the EMPLOYEEID column of the Employee table in the HR_datamart datastore. one called Orders_Files_Work and one called Orders_Files_No_Fax.SHIPPERFAX Order_Shippers_Format.DBO.Add two target tables in the Delta datastore as targets. The expression should be as follows: HR_DATAMART. 23. 22. 18.Add a Query transform to the workspace and connect it to the source.EM PLOYEE.S chema Out Mapping SHIPPERCITY SHIPPERCOUNTRY SHIPPERPHONE SHIPPERFAX SHIPPERREGION SHIPPERPOSTALCODE Order_Shippers_Format.In the transform editor for the Query transform.Add a Validation transform to the right of the Query transform and connect the transforms. 14.SHIPPERCITY Order_Shippers_Format.SHIPPERPHONE Order_Shippers_Format. 19.Set the action on failure for the Order_Assigned_To column to send to both pass and fail.SHIPPERCOUNTRY Order_Shippers_Format.Enable validation for the SHIPPERFAX column to send NULL values to both pass and fail. except the EMPLOYEEID column.EMPLOYEEID. 20. . For pass. substituting 'No Fax' for pass.EMPLOYEEID.SHIPPERREGION Order_Shippers_Format.Insert a new output column above ORDERDATE called ORDER_TAKEN_BY with a datatype of varchar(15) and map it to Orders_Format.In the Alpha_Orders_DB_DF workspace. 21.In the transform editor for the Validation transform.Connect the pass output from the Validation transform to Orders_Files_Work and the fail output to Orders_Files_No_Fax. add the Orders table from the Alpha datastore as the source object.EM PLOYEEID 17. 15. substitute '3Cla5' to assign it to the default employee. map all of the columns from the input schema to the output schema.
Execute Alpha_Orders_Validated_Job with the default execution properties and save all objects you have created. 26. as this may override the results in your target table.Insert a new output column above ORDERDATE called ORDER_ASSIGNED_TO with a data type of varchar(15) and map it to Orders. 30. Do not execute the solution job. 28. . To check the solution. 31.Insert a new output column above ORDERDATE called ORDER_TAKEN_BY with a data type of varchar(15) and map it to Orders.Add a Validation transform to the right of the Query transform and connect the transforms.Change the names of the following Schema Out columns: Old column name New column name SHIPPERCITYID SHIPPERCOUNTRYID SHIPPERREGIONID SHIPPERCITY SHIPPERCOUNTRY SHIPPERREGION 25.Add two target tables in the Delta datastore as targets. 32.EMPLOYEEID. For pass. substitute '3Cla5' to assign it to the default employee.View the data in the target tables to view the differences between passing and failing records. one named Orders_DB_Work and one named Orders_DB_No_Fax. 27. substituting 'No Fax' for pass.Connect the pass output from the Validation transform to Orders_DB_Work and the fail output to Orders_DB_No_Fax. 34.Enable validation for the ShipperFax column to send NULL values to both pass and fail. A solution file called SOLUTION_Validation.atl is included in your Course Resources. 29. import the file and open it to view the data flow design and mapping logic.EMPLOYEEID. 33.24.Set the action on failure for the Order_Assigned_To column to send to both pass and fail.Enable validation for Order_Assigned_To to verify the column value exists in the EMPLOYEEID column of the Employee table in the HR_datamart datastore.
Using the Merge transform
Introduction
The Merge transform allows you to combine multiple sources with the same schema into a single target. After completing this unit, you will be able to:
• Use the Merge transform
Explaining the Merge transform
The Merge transform combines incoming data sets with the same schema structure to produce a single output data set with the same schema as the input data sets. For example, you could use the Merge transform to combine two sets of address data (see the sketch below).
The next section gives a brief description of the function, data input requirements, options, and data output results for the Merge transform. For more information on the Merge transform see “Transforms” Chapter 5 in the Data Services Reference Guide.
Input/Output
The Merge transform performs a union of the sources. All sources must have the same schema, including:
• Number of columns
• Column names
• Column datatypes
If the input data set contains hierarchical data, the names and datatypes must match at every level of the hierarchy.
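In SQL terms, the Merge transform behaves like a UNION ALL rather than a UNION, because duplicate rows are not stripped out. A rough sketch of the address example with hypothetical table names:
    SELECT name, street, city, postal_code FROM address_usa
    UNION ALL
    SELECT name, street, city, postal_code FROM address_europe;
    -- Both inputs must expose the same columns and datatypes;
    -- rows appearing in both sources are kept as duplicates in the output.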
• Use the Merge transform to merge the validated orders data. The transform does not strip out duplicate rows. Instructions 1.The output data has the same schema as the source data. but the output is for two different sources: flat files and database tables. you can add the Query transform to one of the tables before the Merge transform to redefine the schema to match the other table. the nested data is passed through without change. In the data flow workspace. Change the datatype for the following Schema Out columns as specified: Column Type ORDERDATE SHIPPERADDRESS SHIPPERCOUNTRY SHIPPERREGION datetime varchar(100) varchar(50) varchar(50) . Objectives • Use the Query transforms to modify any column names and data types and to perform lookups for any columns that reference other tables. The next step in the process is to modify the structure of those data sets so they match. 2. In the Omega project. add the orders_file_work and orders_db_work tables from the Delta datastore as the source objects. Options The Merge transform does not offer any options. 3. and then merge them into a single data set. connecting each source object to its own Query transform. The output data set contains a row for every row in the source data sets. Add two Query transforms to the data flow. In the transform editor for the Query transform connected to the orders_files_work table. If columns in the input set contain nested schemas. 5. 4. Tip: If you want to merge tables that do not have the same schema. map all columns from input to output. Activity: Using the Merge transform The Orders data has now been validated. create a new batch job called Alpha_Orders_M erged_Job with a data flow called Alpha_Orders_M erged_DF .
change the mapping to perform a lookup of RegionName from the Region table in the Alpha datastore.'='. For the SHIPPERCOUNTRY column.SHIPPERCOUNTRY]) SET ("run_as_separate_process"-'no'.'MAX']. map all columns from input to output. For the SHIPPERREGION column.0" encoding="UTF-8"?><output_cols_info><col index="1" expression="no"/></output_cols_info>') 7.REGION.SOURCE. "output_cols_info"='<?xml version="1.SOURCE. [COUNTRYNAM E]. The expression should be as follows: lookup_ext([ALPHA. .[NULL].[NULL]. The expression should be as follows: lookup_ext([ALPHA.'PRE_LOAD_CACHE'.For the SHIPPERCITY column.ORDERS_FILE_WORK.'M AX'].'='.'PRE_LOAD_CACHE'. change the mapping to perform a lookup of CountryName from the Country table in the Alpha datastore. 9.Column Type SHIPPERPOSTALCODE varchar(50) 6. Change the datatype for the following Schema Out columns as specified: Column Type ORDER_TAKEN_BY ORDER_ASSIGNED_TO SHIPPERCITY SHIPPERCOUNTRY SHIPPERREGION varchar(15) varchar(15) varchar(50) varchar(50) varchar(50) 10.[REGIONID. [REGIONNAM E].[COUNTRYID.ORDERS_FILE_WORK.COUNTRY. change the mapping to perform a lookup of CityName from the City table in the Alpha datastore.SHIPPERREGION]) SET ("run_as_separate_process"-'no'. "output_cols_info"='<?xml version="1.0" encoding="UTF-8"?><output_cols_info><col index="1" expression="no"/></output_cols_info>') 8. In the transform editor for the Query transform connected to the orders_db_work table.
SHIPPERCITY]) SET ("run_as_separate_process"-'no'. import the file and open it to view the data flow design and mapping logic. "output_cols_info"='<?xml version="1. Note that the SHIPPERCITY. A solution file called SOLUTION_M erge. 15.SHIPPERCOUNTRY]) SET ("run_as_separate_process"-'no'.'='.0" encoding="UTF-8"?><output_cols_info><col index="1" expression="no"/></output_cols_info>') 11. as this may override the results in your target table.SOURCE.'M AX'].'PRE_LOAD_CACHE'.[CITYID.Execute Alpha_Orders_Merged_Job with the default execution properties and save all objects you have created. "output_cols_info"='<?xml version="1.atl is included in your Course Resources.'='.[NULL]. Do not execute the solution job. 14.[NULL]. SHIPPERCOUNTRY.CITY.Add a Merge transform to the data flow and connect both Query transforms to the Merge transform. [CITYNAM E].'M AX'].SHIPPERREGION]) SET ("run_as_separate_process"-'no'.ORDERS_DB_WORK. [REGIONNAM E]. 16. "output_cols_info"='<?xml version="1.For the SHIPPERCOUNTRY column.SOURCE.[COUNTRYID.[REGIONID.COUNTRY.'='.0" encoding="UTF-8"?><output_cols_info><col index="1" expression="no"/></output_cols_info>') 12. [COUNTRYNAM E]. To check the solution.0" encoding="UTF-8"?><output_cols_info><col index="1" expression="no"/></output_cols_info>') 13.SOURCE. The expression should be as follows: lookup_ext([ALPHA. The expression should be as follows: lookup_ext([ALPHA. .ORDERS_DB_WORK.REGION.The expression should be as follows: lookup_ext([ALPHA.View the data in the target table. and SHIPPERREGION columns for the 363 records in the template table consistently have names versus ID values.For the SHIPPERREGION column. change the mapping to perform a lookup of CountryName from the Country table in the Alpha datastore.Add a template table called Orders_M erged in the Delta datastore as the target table and connect it to the Merge transform.ORDERS_DB_WORK. change the mapping to perform a lookup of RegionName from the Region table in the Alpha datastore.'MAX'].'PRE_LOAD_CACHE'.'PRE_LOAD_CACHE'.[NULL].
Using the Case transform
Introduction
The Case transform supports separating data from a source into multiple targets based on branch logic. After completing this unit, you will be able to:
• Use the Case transform
Explaining the Case transform
You use the Case transform to simplify branch logic in data flows by consolidating case or decision-making logic into one transform. The transform allows you to split a data set into smaller sets based on logical branches. Depending on the data, only one of multiple branches is executed per row. For example, you can use the Case transform to read a table that contains sales revenue facts for different regions and separate the regions into their own tables for more efficient data access (see the sketch below).
The next section gives a brief description of the function, data input requirements, options, and data output results for the Case transform. For more information on the Case transform, see “Transforms” Chapter 5 in the Data Services Reference Guide.
Input/Output
Only one data flow source is allowed as a data input for the Case transform. The input and output schemas are also identical when using the Case transform.
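As a rough SQL sketch of the region example; the table names and region values are hypothetical, and in the actual transform each WHERE clause is a Case expression attached to a labeled output:
    -- Label "Region1": rows routed to the first target
    INSERT INTO sales_region1 SELECT * FROM sales_fact WHERE region_id = 1;
    -- Label "Region2": rows routed to the second target
    INSERT INTO sales_region2 SELECT * FROM sales_fact WHERE region_id = 2;
    -- With the "Produce default output with label" option, rows matching no
    -- expression go to a default target instead of being dropped.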
The connections between the Case transform and objects used for a particular case must be labeled. You connect the output of the Case transform with another object in the workspace. Each output label in the Case transform must be used at least once.
Options
The Case transform offers several options:
• Label: Define the name of the connection that describes where data will go if the corresponding Case condition is true.
• Expression: Define the Case expression for the corresponding label. Each label represents a case expression (WHERE clause).
• Produce default option with label: Specify that the transform must use the expression in this label when all other Case expressions evaluate to false.
• Row can be TRUE for one case only: Specify that the transform passes each row to the first case whose expression returns true.
To create a case statement
1. Open the data flow workspace.
2. Add your source object to the workspace.
3. Add your target objects to the workspace.
   You will require one target object for each possible condition in the case statement.
4. On the Transforms tab of the Local Object Library, click and drag the Case transform to the workspace to the right of your source object.
5. Connect the source object to the transform.
6. Double-click the Case transform to open the transform editor.
7. In the parameters area of the transform editor, click Add to add a new expression.
8. In the Label field, enter a label for the expression.
9. Click and drag an input schema column to the Expression pane at the bottom of the window.
10. Enter the rest of the expression to define the condition.
    For example, to specify that you want all Customers with a RegionID of 1, create the following statement: Customer.RegionID = 1
11. Repeat step 7 to step 10 for all expressions.
12. To direct records that meet multiple conditions to only one target, select the Row can be TRUE for one case only option.
    In this case, records are placed in the target associated with the first condition that evaluates as true.
13. To direct records that do not meet any defined conditions to a separate target object, select the Produce default option with label option and enter the label name in the associated field.
14. Click Back to return to the data flow workspace.
7.ORDERDATE. In the Omega project. 17. 2. the resulting data set must be split out by quarter for reporting purposes. Add the following two output columns: Column Type Mapping ORDERQUARTER int quarter (orders_merged.ORDERYEAR = '2006' and Query. Instructions 1.Connect the transform to the target object. 3. Objective • Use the Case transform to create separate tables for orders occurring in fiscal quarters 3 and 4 for the year 2007 and quarter 1 of 2008. 5. create the following labels and associated expressions: Label Expression Q42006 Query. Add a Case transform to the data flow and connect it to the Query transform.Release the mouse and select the appropriate label for that object from the pop-up menu. add the Orders_Merged table from the Delta datastore as the source object. In the transform editor for the Case transform. 'YYYY') ORDERYEAR varchar(4) 6. Activity: Using the Case transform Once the orders have been validated and merged. map all columns from input to output. In the transform editor for the Query transform.15. In the data flow workspace. create a new batch job called Alpha_Orders_By_Quarter_Job with a data flow named Alpha_Orders_By_Quarter_DF.ORDERDATE) to_char (orders_merged. Add a Query transform to the data flow and connect it to the source table. 16.Repeat step 15 and step 16 for all target objects.ORDERQUARTER = 4 . 4.
ORDERYEAR = '2007' and Query. Choose the settings to not produce a default output set for the Case transform and to specify that rows can be true for one case only. .ORDERQUARTER = 1 Query.ORDERYEAR = '2007' and Query. and Orders_Q4_2007. 9.Execute Alpha_Orders_By_Quarter_Job with the default execution properties and save all objects you have created. Orders_Q1_2007.ORDERYEAR = '2007' and Query. Do not execute the solution job. import the file and open it to view the data flow design and mapping logic.ORDERYEAR = '2007' and Query. A solution file called SOLUTION_Case.ORDERQUARTER = 4 Q22007 Q32007 Q42007 8.ORDERQUARTER = 3 Query. Add five template tables in the Delta datastore called Orders_Q4_2006. 10. To check the solution.Label Expression Q12007 Query.Connect the output from the Case transform to the target tables selecting the corresponding labels. Orders_Q2_2007.ORDERQUARTER = 2 Query.View the data in the target tables and confirm that there are 103 orders that were placed in Q1 of 2007. Orders_Q3_2007. as this may override the results in your target table. 11.atl is included in your Course Resources. 12.
Using the SQL transform
Introduction
The SQL transform allows you to submit SQL commands to generate data to be moved into target objects.
After completing this unit, you will be able to:
• Use the SQL transform

Explaining the SQL transform
Use this transform to perform standard SQL operations when other built-in transforms cannot perform them. The SQL transform can be used to submit general select statements as well as to call stored procedures and views. The next section gives a brief description of the function, options, data input requirements, and data output results for the SQL transform. For more information on the SQL transform, see “Transforms,” Chapter 5 in the Data Services Reference Guide.
You can use the SQL transform as a replacement for the Merge transform when you are dealing with database tables only. The SQL transform performs more efficiently because the merge is pushed down to the database. However, you cannot use this functionality if your source objects include file formats.
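For illustration only, the following is a minimal sketch of the SQL text you might use in place of a Merge transform. It assumes two hypothetical source tables, orders_na and orders_eu, that live in the same datastore and share an identical column layout; the table and column names are placeholders, not objects from this course.

select ORDERID, CUSTOMERID, ORDERDATE from orders_na
union all
select ORDERID, CUSTOMERID, ORDERDATE from orders_eu

Because the UNION ALL is evaluated inside the source database, the rows are combined before they reach the Data Services engine, which is why this approach typically performs better than an engine-side Merge.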
Inputs/Outputs
There is no input data set for the SQL transform. There are two ways of defining the output schema for a SQL transform if the SQL submitted is expected to return a result set:
• Automatic: After you type the SQL statement, click Update schema to execute a select statement against the database that obtains the column information returned by the select statement and populates the output schema.
• Manual: Output columns must be defined in the output portion of the SQL transform if the SQL operation is returning a data set. The number of columns defined in the output of the SQL transform must equal the number of columns returned by the SQL query, but the column names and data types of the output columns do not need to match the column names or data types in the SQL query.

Options
The SQL transform has the following options:
• Datastore: Specify the datastore for the tables referred to in the SQL statement.
• Database type: Specify the type of database for the datastore where there are multiple datastore configurations.
• Join rank: Indicate the weight of the output data set if the data set is used in a join. The highest ranked source is accessed first to construct the join.
• Array fetch size: Indicate the number of rows retrieved in a single request to a source database. The default value is 1000.
• Cache: Hold the output from this transform in memory for use in subsequent transforms. Use this only if the data set is small enough to fit in memory.
• SQL text: Enter the text of the SQL query.

To create a SQL statement
1. Open the data flow workspace.
2. On the Transforms tab of the Local Object Library, click and drag the SQL transform to the workspace.
3. Add your target object to the workspace.
4. Double-click the SQL transform to open the transform editor.
5. In the parameters area, select the source datastore from the Datastore drop-down list.
6. If there is more than one datastore configuration, select the appropriate configuration from the Database type drop-down list.
7. Change the other available options, if required.
8. In the SQL text area, enter the SQL statement. For example, to copy the entire contents of a table into the target object, you would use the following statement:
Select * from Customers
9. Click Update Schema to update the output schema with the appropriate values. If required, you can change the names and datatypes of these columns. You can also create the output columns manually.
10. Click Back to return to the data flow workspace.
11. Click and drag from the transform to the target object to connect the transform to the target object.
Lesson 5 Setting up Error Handling
Lesson introduction
For sophisticated error handling, you can use recoverable work flows and try/catch blocks to recover data.
• Set up recoverable work flows
• Use recovery mode
• Use try/catch blocks and automatic recovery
Using recovery mechanisms
Introduction
If a Data Services job does not complete properly, you must resolve the problems that prevented the successful execution of the job. Some of those situations are unavoidable, such as server failures. Others, however, can easily be sidestepped by constructing your jobs so that they take into account the issues that frequently cause them to fail.
After completing this unit, you will be able to:
• Explain how to avoid data recovery situations
• Explain the levels of data recovery strategies
• Recover a failed job using automatic recovery
• Recover missing values and rows
• Define alternative work flows

Avoiding data recovery situations
The best solution to data recovery situations is obviously not to get into them in the first place. One example is when an external file is required to run a job. In this situation, you could use the wait_for_file function or a while loop and the file_exists function to check that the file exists in a specified location before executing the job.

While loops
The while loop is a single-use object that you can use in a work flow. The while loop repeats a sequence of steps as long as a condition is true. Typically, the steps done during the while loop result in a change in the condition so that the condition is eventually no longer satisfied and the work flow exits from the while loop. If the condition does not change, the while loop does not end.
For example, you might want a work flow to wait until the system writes a particular file. You can use a while loop to check for the existence of the file using the file_exists function. As long as the file does not exist, you can have the work flow go into sleep mode for a particular length of time before checking again.
Because the system might never write the file, you must add another check to the loop, such as a counter, to ensure that the while loop eventually exits. In other words, change the while loop to check for the existence of the file and the value of the counter. As long as the file does not exist and the counter is less than a particular value, repeat the while loop. In each iteration of the loop, put the work flow in sleep mode and then increment the counter.
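A minimal sketch of that pattern follows. It is illustrative only: the file path, the limit of 10 attempts, the 60-second sleep, and the global variable $G_counter (assumed to be initialized to 0 before the loop) are placeholder choices, not values from this course.

# Condition defined on the while loop: repeat while the file is still absent
# and fewer than 10 checks have been made.
file_exists('/data/incoming/orders.txt') = 0 AND $G_counter < 10

# Script placed inside the while loop:
sleep(60000);                  # pause for 60 seconds (the value is in milliseconds)
$G_counter = $G_counter + 1;   # increment the counter so the loop can eventually exit

After the loop, a conditional can test the file or the counter once more to decide whether to run the data flow or to raise an error.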
Describing levels of data recovery strategies
When a job fails to complete successfully during execution, some data flows may not have completed. When this happens, some tables may have been loaded, partially loaded, or altered. You need to design your data movement jobs so that you can recover your data by rerunning the job and retrieving all the data without introducing duplicate or missing data.
There are different levels of data recovery and recovery strategies. You can:
• Recover your entire database: Use your standard RDBMS services to restore crashed data cache to an entire database. This option is outside of the scope of this course.
• Recover a partially-loaded job: Use automatic recovery.
• Recover from partially-loaded tables: Use the Table Comparison transform, do a full replacement of the target, use the auto-correct load feature, include a preload SQL command to avoid duplicate loading of rows when recovering from partially loaded tables, and use overflow files to manage rows that could not be inserted.
• Recover missing values or rows: Use the Validation transform or the Query transform with WHERE clauses to identify missing values.
• Define alternative work flows: Use conditionals, try/catch blocks, and scripts to ensure all exceptions are managed in a work flow.
Depending on the relationships between data flows in your application, you may use a combination of these techniques to recover from exceptions.
Note: It is important to note that some recovery mechanisms are for use in production systems and are not supported in development environments.

Configuring work flows and data flows
In some cases, steps in a work flow depend on each other and must be executed together. When there is a dependency like this, you should designate the work flow as a recovery unit. This requires the entire work flow to complete successfully. If the work flow does not complete successfully, Data Services executes the entire work flow during recovery, including the steps that executed successfully in prior work flow runs.
Conversely, you may need to specify that a work flow or data flow should only execute once. When this setting is enabled, the job never re-executes that object. It is not recommended to mark a work flow or data flow as “Execute only once” if the parent work flow is a recovery unit.

To specify a work flow as a recovery unit
1. In the project area or on the Work Flows tab of the Local Object Library, right-click the work flow and select Properties from the menu. The Properties dialog box displays.
2. On the General tab, select the Recover as a unit check box.
3. Click OK.
To specify that an object executes only once
1. In the project area or on the appropriate tab of the Local Object Library, right-click the work flow or data flow and select Properties from the menu. The Properties dialog box displays.
2. On the General tab, select the Execute only once check box.
3. Click OK.

Using recovery mode
If a job with automated recovery enabled fails during execution, you can execute the job again in recovery mode. In recovery mode, Data Services executes the steps or recovery units that did not complete successfully in a previous execution. This includes steps that failed and steps that generated an exception but completed successfully, such as those in a try/catch block. During recovery mode, Data Services retrieves the results for successfully-completed steps and reruns uncompleted or failed steps under the same conditions as the original job. As in normal job execution, Data Services executes the steps in parallel if they are not connected in the work flow diagrams and in serial if they are connected.
For example, suppose a daily update job running overnight successfully loads dimension tables in a warehouse. However, while the job is running, the database log overflows and stops the job from loading fact tables. The next day, you truncate the log file and run the job again in recovery mode. The recovery job does not reload the dimension tables in a failed job because the original job, even though it failed, successfully loaded the dimension tables.
To ensure that the fact tables are loaded with the data that corresponds properly to the data already loaded in the dimension tables, ensure the following:
• Your recovery job must use the same extraction criteria that your original job used when loading the dimension tables. If your recovery job uses new extraction criteria, such as basing data extraction on the current system date, the data in the fact tables will not correspond to the data previously extracted into the dimension tables.
• Your recovery job must follow the exact execution path that the original job followed. Data Services records any external inputs to the original job so that your recovery job can use these stored values and follow the same execution path. If your recovery job uses new values, the job execution may follow a completely different path through conditional steps or try/catch blocks.

To enable automatic recovery in a job
1. In the project area, right-click the job and select Execute from the menu. The Execution Properties dialog box displays.
2. On the Parameters tab, select the Enable recovery check box.
3. Click OK.
If this check box is not selected, Data Services does not record the results from the steps during the job and cannot recover the job if it fails.

To recover from last execution
1. In the project area, right-click the job that failed and select Execute from the menu. The Execution Properties dialog box displays.
2. On the Parameters tab, select the Recover from last execution check box. This option is not available when a job has not yet been executed, the previous job run succeeded, or recovery mode was disabled during the previous run.
3. Click OK.

Recovering from partially-loaded data
Executing a failed job again may result in duplication of rows that were loaded successfully during the first job run. Within your recoverable work flow, you can use several methods to ensure that you do not insert duplicate rows:
• Include the Table Comparison transform (available in Data Integrator packages only) in your data flow when you have tables with more rows and fewer fields, such as fact tables. Consider this technique when the target table is large and the changes to the table are relatively few.
• Change the target table options to use the auto-correct load feature when you have tables with fewer rows and more fields, such as dimension tables. The auto-correct load checks the target table for existing rows before adding new rows to the table. Using the auto-correct load option, however, can slow jobs executed in non-recovery mode.
• Change the target table options to completely replace the target table during each execution. This technique can be optimal when the changes to the target table are numerous compared to the size of the table.
• Include a SQL command to execute before the table loads. Preload SQL commands can remove partial database updates that occur during incomplete execution of a step in a job. Typically, the preload SQL command deletes rows based on a variable that is set before the partial insertion step began, as in the sketch after this list. For more information on preloading SQL commands, see “Using preload SQL to allow re-executable Data Flows,” Chapter 18 in the Data Services Designer Guide.
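For illustration only, a preload SQL command of that kind might look like the following. The fact table sales_fact, its load_id column, and the global variable $G_load_id are hypothetical placeholders; the square-bracket substitution shown is one common way of embedding a variable's value in a SQL string, so confirm the exact syntax in the Designer Guide chapter cited above.

-- Remove any rows written by the interrupted load before the step re-executes.
delete from sales_fact where load_id = [$G_load_id]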
Recovering missing values or rows
Missing values that are introduced into the target data during data integration and data quality processes can be managed using the Validation or Query transforms. Missing rows are rows that cannot be inserted into the target table. For example, rows may be missing in instances where a primary key constraint is violated. Overflow files help you process this type of data problem.
When you specify an overflow file and Data Services cannot load a row into a table, Data Services writes the row to the overflow file instead. The trace log indicates the data flow in which the load failed and the location of the file. You can use the overflow information to identify invalid data in your source or problems introduced in the data movement. Every new run will overwrite the existing overflow file.

To use an overflow file in a job
1. Open the target table editor for the target table in your data flow.
2. On the Options tab, under Error handling, select the Use overflow file check box.
3. In the File name field, enter or browse to the full path and file name for the file. When you specify an overflow file, give a full path name to ensure that Data Services creates a unique file when more than one file is created in the same job.
4. In the File format drop-down list, select what you want Data Services to write to the file about the rows that failed to load:
• If you select Write data, you can use Data Services to specify the format of the error-causing records in the overflow file.
• If you select Write sql, you can use the commands to load the target manually when the target is accessible.

Defining alternative work flows
You can set up your jobs to use alternative work flows that cover all possible exceptions and have recovery mechanisms built in. This technique allows you to automate the process of recovering your results.
Alternative work flows consist of several components:
1. A script to determine if recovery is required. This script reads the value in a status table and populates a global variable with the same value. The initial value in the table is set to indicate that recovery is not required.
2. A conditional that calls the appropriate work flow based on whether recovery is required.
The conditional contains an If/Then/Else statement to specify that work flows that do not require recovery are processed one way, and those that do require recovery are processed another way.
3. A work flow with a try/catch block to execute a data flow without recovery. The data flow where recovery is not required is set up without the auto correct load option set. This ensures that, wherever possible, the data flow is executed in a less resource-intensive mode.
4. A script in the catch object to update the status table. The script specifies that recovery is required if any exceptions are generated.
5. A work flow to execute a data flow with recovery and a script to update the status table. The data flow is set up for more resource-intensive processing that will resolve the exceptions. The script updates the status table to indicate that recovery is not required.
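For reference, a minimal status table of this kind could be created with SQL along the following lines. The names recovery_status and recovery_flag match the example scripts later in this lesson, but the exact DDL and the way the initial value is seeded depend on your RDBMS.

-- One-row status table: 0 = recovery not required, 1 = recovery required.
create table recovery_status (recovery_flag int);
insert into recovery_status values (0);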
Conditionals are single-use objects used to implement conditional logic in a work flow. When you define a conditional, you must specify a condition and two logical branches:
• IF: A Boolean expression that evaluates to TRUE or FALSE. You can use functions, variables, and standard operators to construct the expression.
• Then: Work flow element to execute if the IF expression evaluates to TRUE.
• Else: Work flow element to execute if the IF expression evaluates to FALSE.
Both the Then and Else branches of the conditional can contain any object that you can have in a work flow, including other work flows, data flows, nested conditionals, try/catch blocks, scripts, and so on.
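As a purely hypothetical illustration, an IF expression that combines a function, a global variable, and standard operators might look like this ($G_load_type is a placeholder variable, not an object from this course):

($G_load_type = 'FULL') AND (month(sysdate()) = 1)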
A try/catch block allows you to specify alternative work flows if errors occur during job execution. Try/catch blocks catch classes of errors, apply solutions that you provide, and continue execution. For each catch in the try/catch block, you will specify: • One exception or group of exceptions handled by the catch. To handle more than one exception or group of exceptions, add more catches to the try/catch block. • The work flow to execute if the indicated exception occurs. Use an existing work flow or define a work flow in the catch editor. If an exception is thrown during the execution of a try/catch block, and if no catch is looking for that exception, then the exception is handled by normal error logic.
Using try/catch blocks and automatic recovery
Data Services does not save the result of a try/catch block for re-use during recovery. If an exception is thrown inside a try/catch block, during recovery Data Services executes the step that threw the exception and subsequent steps. Because the execution path through the try/catch block might be different in the recovered job, using variables set in the try/catch block could alter the results during automatic recovery.
For example, suppose you create a job that defines the value of variable $I within a try/catch block. If an exception occurs, you set an alternate value for $I. Subsequent steps are based on the new value of $I.
During the first job execution, the first work flow contains an error that generates an exception, which is caught. However, the job fails in the subsequent work flow.
You fix the error and run the job in recovery mode. During the recovery execution, the first work flow no longer generates the exception. Thus the value of variable $I is different, and the job selects a different subsequent work flow, producing different results.
To ensure proper results with automatic recovery when a job contains a try/catch block, do not use values set inside the try/catch block or reference output variables from a try/catch block in any subsequent steps.
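As a purely illustrative sketch of the situation described above (the variable values are arbitrary), the pattern looks like this:

# Script inside the try block
$I = 10;

# Script inside the catch block (runs only when a caught exception occurs)
$I = 0;

# A later conditional branches on $I. In a recovery run where the exception no
# longer occurs, $I keeps the value 10 and a different branch is taken, which is
# why the results can differ from the original run.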
To create an alternative work flow
1. Create a job.
2. Add a global variable to your job called $G_recovery_needed with a datatype of int. The purpose of this global variable is to store a flag that indicates whether or not recovery is needed. This flag is based on the value in a recovery status table, which contains a flag of 1 or 0, depending on whether recovery is needed.
3. In the job workspace, add a work flow using the tool palette.
4. In the work flow workspace, add a script called GetStatus using the tool palette.
5. In the script workspace, construct an expression to update the value of the $G_recovery_needed global variable to the same value as is in the recovery status table. The script content depends on the RDBMS on which the status table resides. The following is an example of the expression:
$G_recovery_needed = sql('DEMO_Target', 'select recovery_flag from recovery_status');
6. Return to the work flow workspace. 7. Add a conditional to the workspace using the tool palette and connect it to the script. 8. Open the conditional. The transform editor for the conditional allows you to specify the IF expression and Then/Else branches.
9. In the IF field, enter the expression that evaluates whether recovery is required. The following is an example of the expression:
$G_recovery_needed = 0
This means the objects in the Then pane will run if recovery is not required. If recovery is needed, the objects in the Else pane will run.
10. Add a try object to the Then pane of the transform editor using the tool palette.
11. In the Local Object Library, click and drag a work flow or data flow to the Then pane after the try object.
12. Add a catch object to the Then pane after the work flow or data flow using the tool palette.
13. Connect the objects in the Then pane.
14. Open the workspace for the catch object. By default, Data Services catches all exceptions. All exception types are listed in the Available exceptions pane.
15. To change which exceptions act as triggers, expand the tree in the Available exceptions pane, select the appropriate exceptions, and click Set to move them to the Trigger on these exceptions pane.
16. Add a script called Fail to the lower pane using the tool palette. This object will be executed if there are any exceptions. If desired, you can add a data flow here instead of a script.
17. In the script workspace, construct an expression to update the flag in the recovery status table to 1, indicating that recovery is needed.
The script content depends on the RDBMS on which the status table resides. The following is an example of the expression:
sql('DEMO_Target', 'update recovery_status set recovery_flag = 1');
18. Return to the conditional workspace.
19. In the Local Object Library, click and drag the work flow or data flow that represents the recovery process to the Else pane.
20. Add a script called Pass to the lower pane using the tool palette.
21. In the script workspace, construct an expression to update the flag in the recovery status table to 0, indicating that recovery is not needed. The script content depends on the RDBMS on which the status table resides. The following is an example of the expression:
sql('DEMO_Target', 'update recovery_status set recovery_flag = 0');
22. Return to the conditional workspace.
23. Connect the objects in the Then pane.
24. Connect the objects in the Else pane. This combination means that if recovery is not needed, then the first object will be executed; if recovery is required, the second object will be executed.
25. Validate and save all objects.
26. Execute the job. The first time this job is executed, the job succeeds because the recovery_flag value in the status table is set to 0 and the target table is empty, so there is no primary key violation.
27. Check the contents of the status table. The recovery_flag field contains a value of 0.
28. Execute the job again. The second time this job is executed, the job fails because the target table already contains records, so there is a primary key exception.
29. Check the contents of the status table again. The recovery_flag field now contains a value of 1.
30. Execute the job again. The third time this job is executed, the version of the data flow with the Auto correct load option selected runs because the recovery_flag value in the status table is set to 1. The job succeeds because the auto correct load feature checks for existing values before trying to insert new rows.