BusinessObjects Data Services XI 3.0/3.1: Core Concepts


Lesson 1: Describing Data Services

Lesson introduction
Data Services is a graphical interface for creating and staging jobs for data integration and data quality purposes.
• Describe the purpose of Data Services
• Describe Data Services architecture

Describing the purpose of Data Services
Introduction
BusinessObjects Data Services provides a graphical interface that allows you to easily create jobs that extract data from heterogeneous sources, transform that data to meet the business requirements of your organization, and load the data into a single location.

Describing Data Services benefits
The Business Objects Data Services platform enables you to perform enterprise-level data integration and data quality functions. With Data Services, your enterprise can:
• Create a single infrastructure for data movement to enable faster and lower cost implementation.
• Manage data as a corporate asset independent of any single system.
• Integrate data across many systems and re-use that data for many purposes.
• Improve performance.
• Reduce burden on enterprise systems.
• Prepackage data solutions for fast deployment and quick return on investment (ROI).
• Cleanse customer and operational data anywhere across the enterprise.
• Enhance customer and operational data by appending additional information to increase the value of the data.
• Match and consolidate data at multiple levels within a single pass for individuals, households, or corporations.

Understanding data integration processes
Data Services combines both batch and real-time data movement and management with intelligent caching to provide a single data integration platform for information management from any information source and for any information use. This unique combination allows you to:
• Stage data in an operational datastore, data warehouse, or data mart.
• Update staged data in batch or real-time modes.

• Create a single environment for developing, testing, and deploying the entire data integration platform.
• Manage a single metadata repository to capture the relationships between different extraction and access methods and provide integrated lineage and impact analysis.

Data Services performs three key functions that can be combined to create a scalable, high-performance data platform. It:
• Loads Enterprise Resource Planning (ERP) or enterprise application data into an operational datastore (ODS) or analytical data warehouse, and updates in batch or real-time modes.
• Creates routing requests to a data warehouse or ERP system using complex rules.
• Applies transactions against ERP systems.

Data mapping and transformation can be defined using the Data Services Designer graphical user interface. Data Services automatically generates the appropriate interface calls to access the data in the source system.

Describing Data Services architecture

Introduction
Data Services relies on several unique components to accomplish the data integration and data quality activities required to manage your corporate data.

After completing this unit, you will be able to:
• Describe standard Data Services components
• Describe Data Services management tools

Defining Data Services components
Data Services includes the following standard components:
• Designer
• Repository
• Job Server
• Engines
• Access Server
• Adapters
• Real-time Services
• Address Server
• Cleansing Packages, Dictionaries, and Directories
• Management Console

This diagram illustrates the relationships between these components:

Describing the Designer
Data Services Designer is a Windows client application used to create, test, and manually execute jobs that transform data and populate a data warehouse. Using Designer, you create data management applications that consist of data mappings, transformations, and control logic. You can create objects that represent data sources, and then drag, drop, and configure them in flow diagrams. Designer allows you to manage metadata stored in a local repository. From the Designer, you can also trigger the Job Server to run your jobs for initial application testing.

To log in to Designer
1. From the Start menu, click Programs > BusinessObjects XI 3.0/3.1 > BusinessObjects Data Services > Data Services Designer to launch Designer.
   The path may be different, depending on how the product was installed.
2. In the BusinessObjects Data Services Repository Login dialog box, enter the connection information for the local repository.
3. Click OK.
4. To verify the Job Server is running in Designer, hover the cursor over the Job Server icon in the bottom right corner of the screen.
   The details for the Job Server display in the status bar in the lower left portion of the screen.

Describing the repository
The Data Services repository is a set of tables that holds user-created and predefined system objects, source and target metadata, and transformation rules. It is set up on an open client/server platform to facilitate sharing metadata with other enterprise tools. Each repository is stored on an existing Relational Database Management System (RDBMS). Each repository is associated with one or more Data Services Job Servers.

There are three types of repositories:
• A local repository (known in Designer as the Local Object Library) is used by an application designer to store definitions of source and target metadata and Data Services objects.
• A central repository (known in Designer as the Central Object Library) is an optional component that can be used to support multi-user development. The Central Object Library provides a shared library that allows developers to check objects in and out for development.
• A profiler repository is used to store information that is used to determine the quality of data.

To create a local repository
1. From the Start menu, click Programs > BusinessObjects XI 3.0/3.1 > BusinessObjects Data Services > Data Services Repository Manager to launch the Repository Manager.
   The path may be different, depending on how the product was installed.
2. In the BusinessObjects Data Services Repository Manager dialog box, enter the connection information for the local repository.

3. Click Create.
   You may need to confirm that you want to overwrite the existing repository if it already exists. If you select the Show Details check box, you can see the SQL that is applied to create the repository. System messages confirm that the local repository is created.
4. To see the version of the repository, click Get Version.
   The version displays in the pane at the bottom of the dialog box. Note that the version number refers only to the last major point release number.
5. Click Close.

Describing the Job Server
Each repository is associated with at least one Data Services Job Server, which retrieves the job from its associated repository and starts the data movement engine. The data movement engine integrates data from multiple heterogeneous sources, performs complex data transformations, and manages extractions and transactions from ERP systems and other sources. The Job Server can move data in batch or real-time mode and uses distributed query optimization, multithreading, in-memory caching, in-memory data transformations, and parallel processing to deliver high data throughput and scalability.

While designing a job, you can run it from the Designer. In production environments, the Job Server runs jobs triggered by a scheduler or by a real-time service managed by the Data Services Access Server.

Data Services provides distributed processing capabilities through Server Groups. In your production environment, you can balance job loads by creating a Job Server Group (multiple Job Servers), which executes jobs according to overall system load. A Server Group is a collection of Job Servers that each reside on different Data Services server computers. Each Data Services server can contribute one, and only one, Job Server to a specific Server Group. Each Job Server collects resource utilization information for its computer. This information is utilized by Data Services to determine where a job, data flow, or sub-data flow (depending on the distribution level specified) should be executed.

To verify the connection between repository and Job Server
1. From the Start menu, click Programs > BusinessObjects XI 3.0/3.1 > BusinessObjects Data Services > Data Services Server Manager to launch the Server Manager.
   The path may be different, depending on how the product was installed.

2. In the BusinessObjects Data Services Server Manager dialog box, click Edit Job Server Config.
3. In the Job Server Configuration Editor dialog box, select the Job Server.

4. Click Resync with Repository.
5. In the Job Server Properties dialog box, select the repository.
6. Click Resync.

7. In the Password field, enter the password for the repository.
8. Click Apply.
   A system message displays indicating that the Job Server will be resynchronized with the selected repository.
9. Click OK.
10. Click OK to close the Job Server Properties dialog box.
11. Click OK to close the Job Server Configuration Editor dialog box.
12. In the BusinessObjects Data Services Server Manager dialog box, click Restart to restart the Job Server.
    A system message displays indicating that the Job Server will be restarted.
13. Click OK to acknowledge the warning message.

Describing the engines
When Data Services jobs are executed, the Job Server starts Data Services engine processes to perform data extraction, transformation, and movement. Data Services engine processes use parallel processing and in-memory data transformations to deliver high data throughput and scalability.

Describing the Access Server
The Access Server is a real-time, request-reply message broker that collects incoming XML message requests, routes them to a real-time service, and delivers a message reply within a user-specified time frame. You can configure multiple Access Servers.

Describing the real-time services
The Data Services real-time client communicates with the Access Server when processing real-time jobs. Real-time services are configured in the Data Services Management Console. The Access Server queues messages and sends them to the next available real-time service across any number of computing resources. This approach provides automatic scalability because the Access Server can initiate additional real-time services on additional computing resources if traffic for a given real-time service is high.

Describing the adapters
Adapters are additional Java-based programs that can be installed on the Job Server to provide connectivity to other systems such as Salesforce.com or the Java Messaging Queue. There is also a Software Development Kit (SDK) to allow customers to create adapters for custom applications.

Describing the Address Server
The Address Server is used specifically for processing European addresses using the Data Quality Global Address Cleanse transform. It provides access to detailed address line information for most European countries.

Describing the Cleansing Packages, dictionaries, and directories
The Data Quality Cleansing Packages, dictionaries, and directories provide referential data for the Data Cleanse and Address Cleanse transforms to use when parsing, standardizing, and cleansing name and address data. Cleansing Packages are packages that enhance the ability of Data Cleanse to accurately process various forms of global data by including language-specific reference data and parsing rules. Dictionary files are used to identify, parse, and standardize data such as names, titles, and firm data. Dictionaries also contain acronym, match standard, gender, capitalization, and address information. Directories provide information on addresses from postal authorities.

Describing the Management Console
The Data Services Management Console provides access to the following features:
• Administrator
• Auto Documentation
• Data Validation
• Impact and Lineage Analysis
• Operational Dashboard
• Data Quality Reports

Administrator
Administer Data Services resources, including:
• Scheduling, monitoring, and executing batch jobs
• Configuring, starting, and stopping real-time services
• Configuring Job Server, Access Server, and repository usage
• Configuring and managing adapters
• Managing users
• Publishing batch jobs and real-time services via web services
• Reporting on metadata

Auto Documentation
View, analyze, and print graphical representations of all objects as depicted in Data Services Designer, including their relationships, properties, and more.

Data Validation
Evaluate the reliability of your target data based on the validation rules you create in your Data Services batch jobs in order to quickly review, assess, and identify potential inconsistencies or errors in source data.

Impact and Lineage Analysis
Analyze end-to-end impact and lineage for Data Services tables and columns, and BusinessObjects Enterprise objects such as universes, business views, and reports.

Operational Dashboard
View dashboards of status and performance execution statistics of Data Services jobs for one or more repositories over a given time period.

Data Quality Reports
Use data quality reports to view and export Crystal reports for batch and real-time jobs that include statistics-generating transforms. Report types include job summaries, transform-specific reports, and transform group reports. To generate reports for Match, US Regulatory Address Cleanse, and Global Address Cleanse transforms, you must enable the Generate report data option in the Transform Editor.

Defining other Data Services tools
There are also several tools to assist you in managing your Data Services installation.

Describing the Repository Manager
The Data Services Repository Manager allows you to create, upgrade, and check the versions of local, central, and profiler repositories.

Describing the Server Manager
The Data Services Server Manager allows you to add, delete, or edit the properties of Job Servers. It is automatically installed on each computer on which you install a Job Server. Use the Server Manager to define links between Job Servers and repositories. You can link multiple Job Servers on different machines to a single repository (for load balancing) or each Job Server to multiple repositories (with one default) to support individual repositories (for example, separating test and production environments).

Describing the License Manager
The License Manager displays the Data Services components for which you currently have a license.

Describing the Metadata Integrator
The Metadata Integrator allows Data Services to seamlessly share metadata with Business Objects Intelligence products. Run the Metadata Integrator to collect metadata into the Data Services repository for Business Views and Universes used by Crystal Reports, Desktop Intelligence documents, and Web Intelligence documents.

Lesson 2: Defining Source and Target Metadata

Lesson introduction
To define data movement requirements in Data Services, you must import source and target metadata.
• Use datastores
• Use datastore and system configurations
• Define file formats for flat files
• Define file formats for Excel files

Using datastores

Introduction
Datastores represent connections between Data Services and databases or applications. Through the datastore connection, Data Services can import the metadata that describes the data from the data source. Data Services uses these datastores to read data from source tables or load data to target tables. Each source or target must be defined individually, and the datastore options available depend on which Relational Database Management System (RDBMS) or application is used for the datastore.

After completing this unit, you will be able to:
• Explain datastores
• Create a database datastore
• Change a datastore definition
• Import metadata

Explaining datastores
A datastore provides a connection or multiple connections to data sources such as a database. There are three kinds of datastores:
• Database datastores: provide a simple way to import metadata directly from an RDBMS.
• Application datastores: let users easily import metadata from most Enterprise Resource Planning (ERP) systems.
• Adapter datastores: can provide access to an application's data and metadata or just metadata. For example, if the data source is SQL-compatible, the adapter might be designed to access metadata, while Data Services extracts data from or loads data directly to the application.

Database datastores can be created for the following sources:
• IBM DB2, Microsoft SQL Server, Oracle, Sybase, and Teradata databases (using native connections)
• Other databases (through ODBC)
• A simple memory storage mechanism using a memory datastore
• IMS, VSAM, and various additional legacy systems using BusinessObjects Data Services Mainframe Interfaces such as Attunity and IBM Connectors

The specific information that a datastore contains depends on the connection. When your database or application changes, you must make corresponding changes in the datastore information in Data Services. Data Services does not automatically detect structural changes to the datastore.
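To make the idea concrete, the sketch below is only an analogy (it uses Python and the third-party pyodbc library, not anything shipped with Data Services): a database datastore is essentially a named bundle of connection details through which table metadata can be read. The driver, server, database, and credential values are placeholders.

```python
# Analogy only: a database datastore bundles connection options and lets
# Data Services import table metadata from the source RDBMS.
import pyodbc  # generic ODBC access library, not part of Data Services

# Placeholder connection details; in Designer these are the datastore options.
conn = pyodbc.connect(
    "DRIVER={SQL Server};SERVER=dbserver01;DATABASE=sales;UID=ds_user;PWD=secret"
)

# Listing the available tables is roughly what importing metadata by browsing exposes.
for row in conn.cursor().tables(tableType="TABLE"):
    print(row.table_schem, row.table_name)
```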

Using adapters
Adapters provide access to a third-party application's data and metadata. Depending on the adapter implementation, adapters can provide:
• Application metadata browsing
• Application metadata importing into the Data Services repository
• Batch and real-time data movement between Data Services and applications

Business Objects offers an Adapter Software Development Kit (SDK) to develop your own custom adapters. You can also buy Data Services prepackaged adapters to access application data and metadata in any application. For more information on these adapters, see Chapter 5 in the Data Services Designer Guide.

You can use the Data Mart Accelerator for Crystal Reports adapter to import metadata from BusinessObjects Enterprise. See the documentation folder under Adapters located in your Data Services installation for more information on the Data Mart Accelerator for Crystal Reports.

Creating a database datastore
You need to create at least one datastore for each database or file system with which you are exchanging data. To create a datastore, you must have appropriate access privileges to the database or file system that the datastore describes. If you do not have access, ask your database administrator to create an account for you. Note that if you are using MySQL, any ODBC connection provides access to all of the available MySQL schemas.

To create a database datastore
1. On the Datastores tab of the Local Object Library, right-click the white space and select New from the menu.
   The Create New Datastore dialog box displays.
2. In the Datastore name field, enter the name of the new datastore.
   The name can contain any alphanumeric characters or underscores (_). It cannot contain spaces.
3. In the Datastore Type drop-down list, ensure that the default value of Database is selected.
   The values you select for the datastore type and database type determine the options available when you create a database datastore. The entries that you must make to create a datastore depend on the selections you make for these two options.
4. In the Database type drop-down list, select the RDBMS for the data source.
5. Enter the other connection details, as required.
6. Leave the Enable automatic data transfer check box selected.
7. Click OK.

Changing a datastore definition
Like all Data Services objects, datastores are defined by both options and properties:
• Options control the operation of objects. These include the database server name, database name, user name, and password for the specific database.
• Properties document the object. For example, the name of the datastore and the date on which it is created are datastore properties. Properties are descriptive of the object and do not affect its operation.

The Edit Datastore dialog box allows you to edit all connection properties except datastore name and datastore type for adapter and application datastores. For database datastores, you can edit all connection properties except datastore name, datastore type, database type, and database version (if available).

The Properties dialog box for a datastore includes the following tabs:
General: Contains the name and description of the datastore. The datastore name appears on the object in the Local Object Library and in calls to the object. You cannot change the name of a datastore after creation.
Attributes: Includes the date you created the datastore. This value cannot be changed.
Class Attributes: Includes overall datastore information such as description and date created.

To change datastore options
1. On the Datastores tab of the Local Object Library, right-click the datastore name and select Edit from the menu.
   The Edit Datastore dialog box displays the connection information.
2. Change the database server name, database name, username, and password options, as required.
3. Click OK.
   The changes take effect immediately.

To change datastore properties
1. On the Datastores tab of the Local Object Library, right-click the datastore name and select Properties from the menu.
   The Properties dialog box lists the datastore's description, attributes, and class attributes.
2. Change the datastore properties, as required.
3. Click OK.

Importing metadata from data sources
Data Services determines and stores a specific set of metadata information for tables. You can import metadata by name, by searching, and by browsing. After importing metadata, you can edit column names, descriptions, and datatypes. The edits are propagated to all objects that call these objects.

Metadata: Description
Table name: The name of the table as it appears in the database.
Table description: The description of the table.
Column name: The name of the table column.
Column description: The description of the column.
Column datatype: The datatype for each column. If a column is defined as an unsupported datatype, Data Services converts the datatype to one that is supported. The following datatypes are supported: BLOB, CLOB, date, datetime, decimal, double, int, interval, long, numeric, real, time, timestamp, and varchar. In some cases, if Data Services cannot convert the datatype, it ignores the column entirely.
Primary key column: The column that comprises the primary key for the table. After a table has been added to a data flow diagram, this column is indicated in the column list by a key icon next to the column name.
Table attribute: Information Data Services records about the table, such as the date created and date modified, if these values are available.
Owner name: Name of the table owner.

You can also import stored procedures from DB2, MS SQL Server, Oracle, and Sybase databases, and stored functions and packages from Oracle. You can use these functions and procedures in the extraction specifications you give Data Services. Information that is imported for functions includes:
• Function parameters
• Return type
• Name
• Owner
Imported functions and procedures appear in the Function branch of each datastore tree on the Datastores tab of the Local Object Library. You can configure imported functions and procedures through the Function Wizard and the Smart Editor in a category identified by the datastore name.

Importing metadata by browsing
The easiest way to import metadata is by browsing. Note that functions cannot be imported using this method. For more information on importing by searching and importing by name, see "Ways of importing metadata", Chapter 5 in the Data Services Designer Guide.

To import metadata by browsing
1. On the Datastores tab of the Local Object Library, right-click the datastore and select Open from the menu.
   The items available to import appear in the workspace. The workspace contains columns that indicate whether the table has already been imported into Data Services (Imported) and if the table schema has changed since it was imported (Changed).
2. Navigate to and select the tables for which you want to import metadata.
   You can hold down the Ctrl or Shift keys and click to select multiple tables.
3. Right-click the selected items and select Import from the menu.
4. In the Local Object Library, expand the datastore to display the list of imported objects, organized into Functions, Tables, and Template Tables.
5. To view data for an imported datastore, right-click a table and select View Data from the menu.

To verify whether the repository contains the most recent metadata for an object, right-click the object and select Reconcile.

Defining file formats for flat files

Introduction
File formats are connections to flat files in the same way that datastores are connections to databases.

After completing this unit, you will be able to:
• Explain file formats
• Create a file format for a flat file

Explaining file formats
A file format is a generic description that can be used to describe one file or multiple data files if they share the same format. It is a set of properties describing the structure of a flat file (ASCII). File formats are used to connect to source or target data when the data is stored in a flat file. The Local Object Library stores file format templates that you use to define specific file formats as sources and targets in data flows.

File format objects can describe files in:
• Delimited format: delimiter characters such as commas or tabs separate each field.
• Fixed width format: the fixed column width is specified by the user.
• SAP R/3 format: this is used with the predefined Transport_Format or with a custom SAP R/3 format.

Creating file formats
Use the file format editor to set properties for file format templates and source and target file formats. The file format editor has three work areas:
• Property Value: Edit file format property values.
• Column Attributes: Edit and define columns or fields in the file.
• Data Preview: View how the settings affect sample data.
Expand and collapse the property groups by clicking the leading plus or minus. The properties and appearance of the work areas vary with the format of the file.

Date formats
In the Property Values work area, you can override default date formats for files at the field level. Field-specific formats override the default format set in the Properties-Values area. The following data format codes can be used:

Code: Description
DD: 2-digit day of the month
MM: 2-digit month
MONTH: Full name of the month
MON: 3-character name of the month
YY: 2-digit year
YYYY: 4-digit year
HH24: 2-digit hour of the day (0-23)
MI: 2-digit minute (0-59)
SS: 2-digit second (0-59)
FF: Up to 9-digit sub-seconds

To create a new file format
1. On the Formats tab of the Local Object Library, right-click Flat Files and select New from the menu to open the File Format Editor.
   To make sure your file format definition works properly, it is important to finish inputting the values for the file properties before moving on to the Column Attributes work area.

2. In the Name field, enter a name that describes this file format template.
   Once the name has been created, it cannot be changed. If an error is made, the file format must be deleted and a new format created.
3. In the Type field, specify the file type:
   • Delimited: select this file type if the file uses a character sequence to separate columns.
   • Fixed width: select this file type if the file uses specified widths for each column.
   If a fixed-width file format uses a multi-byte code page, then no data is displayed in the Data Preview section of the file format editor for its files.
4. Specify the location information of the data file, including Location, Root directory, and File name.
   The Group File Read can read multiple flat files with identical formats through a single file format. By substituting a wild card character or list of file names for the single file name, multiple files can be read.
5. Overwrite the existing schema as required.
   This happens automatically when you open a file. Click Yes to overwrite the existing schema.
6. Complete the other properties to describe files that this template represents.
7. Specify the structure of each column in the Column Attributes work area as follows:
   Column: Description
   Field Name: Enter the name of the column.
   Data Type: Select the appropriate datatype from the drop-down list.
   Field Size: For columns with a datatype of varchar, specify the length of the field.
   Precision: For columns with a datatype of decimal or numeric, specify the precision of the field.
   Scale: For columns with a datatype of decimal or numeric, specify the scale of the field.
   Format: For columns with any datatype but varchar, select a format for the field, if desired. For a decimal or real datatype, this information overrides the default format set in the Property Values work area for that datatype.
   You do not need to specify columns for files used as targets. If you do specify columns and they do not match the output schema from the preceding transform, Data Services writes to the target file using the transform's output schema. For source files, if you only specify a source column format and the column names and datatypes in the target schema do not match those in the source schema, Data Services cannot use the source column format specified. Instead, it defaults to the format used by the code page on the computer where the Job Server is installed.
8. Click Save & Close to save the file format and close the file format editor.
9. In the Local Object Library, right-click the file format and select View Data from the menu to see the data.
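The date format codes listed earlier are specific to Data Services, but it can help to sanity-check a format string against a sample row before you finish the file format definition. The following sketch is an unofficial, approximate mapping onto Python's strptime directives (FF is only approximated, since Python parses at most six fractional-second digits); the delimiter, column layout, and sample values are invented for illustration.

```python
# Illustration only (not Data Services code): check a Data Services-style date
# format against a sample delimited row by translating it to strptime directives.
from datetime import datetime

# Data Services code -> Python strptime directive (longer codes listed first so
# that YYYY is not partially matched as YY, and MONTH is not matched as MON).
CODE_MAP = [("MONTH", "%B"), ("HH24", "%H"), ("YYYY", "%Y"), ("MON", "%b"),
            ("MM", "%m"), ("DD", "%d"), ("YY", "%y"), ("MI", "%M"),
            ("SS", "%S"), ("FF", "%f")]

def to_strptime(ds_format: str) -> str:
    for code, directive in CODE_MAP:
        ds_format = ds_format.replace(code, directive)
    return ds_format

# A sample comma-delimited row whose last column uses the format YYYY.MM.DD HH24:MI:SS.
row = "1001,Smith,2009.03.15 17:42:08"
cust_id, last_name, created = row.split(",")
print(datetime.strptime(created, to_strptime("YYYY.MM.DD HH24:MI:SS")))
```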

Defining file formats for Excel files

Introduction
You can create file formats for Excel files in the same way that you would for flat files.

After completing this unit, you will be able to:
• Create a file format for an Excel file

Using Excel as a native data source
It is possible to connect to Excel workbooks natively as a source, with no ODBC connection setup and configuration needed. You can select specific data in the workbook using custom ranges or auto-detect, and you can specify variables for file and sheet names for more flexibility. As with file formats and datastores, these Excel formats show up as sources in impact and lineage analysis reports.

To import and configure an Excel source
1. On the Formats tab of the Local Object Library, right-click Excel Workbooks and select New from the menu.

The Import Excel Workbook dialog box displays.

2. In the Format name field, enter a name for the format. The name may contain underscores but not spaces. 3. On the Format tab, click the drop-down button beside the Directory field and select <Select folder...>. 4. Navigate to and select a new directory, and then click OK. 5. Click the drop-down button beside the File name field and select <Select file...>. 6. Navigate to and select an Excel file, and then click Open. 7. Do one of the following: • To reference a named range for the Excel file, select the Named range radio button and enter a value in the field provided.

   • To reference an entire worksheet, select the Worksheet radio button and then select the All fields radio button.
   • To reference a custom range, select the Worksheet radio button and the Custom range radio button, click the ellipses (...) button, select the cells, and close the Excel file by clicking X in the top right corner of the worksheet.
8. If required, select the Extend range checkbox.
   The Extend range checkbox provides a means to extend the spreadsheet in the event that additional rows of data are added at a later time. If this checkbox is checked, at execution time, Data Services searches row by row until a null value row is reached. All rows above the null value row are included.
9. If applicable, select the Use first row values as column names option.
   If this option is selected, field names are based on the first row of the imported Excel sheet.
10. Click Import schema.
    The schema is displayed at the top of the dialog box.
11. Specify the structure of each column as follows:
Column: Description
Field Name: Enter the name of the column.
Data Type: Select the appropriate datatype from the drop-down list.
Field Size: For columns with a datatype of varchar, specify the length of the field.
Precision: For columns with a datatype of decimal or numeric, specify the precision of the field.
Scale: For columns with a datatype of decimal or numeric, specify the scale of the field.
Description: If desired, enter a description of the column.

12. If required, on the Data Access tab, enter any changes that are required.
    The Data Access tab provides options to retrieve the file via FTP or execute a custom application (such as unzipping a file) before reading the file.
13. Click OK.
    The newly imported file format appears in the Local Object Library with the other Excel workbooks. The sheet is now available to be selected for use as a native data source.
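Outside of Data Services, the Extend range and Use first row values as column names options behave much like the standalone sketch below (it uses the third-party openpyxl library purely as an analogy; the file and sheet names are made up): the first row supplies the column names, and reading stops at the first row whose values are all null.

```python
# Analogy only: mimic "Use first row values as column names" and the
# "Extend range" behaviour of stopping at the first null-value row.
from openpyxl import load_workbook  # third-party library, not part of Data Services

wb = load_workbook("customers.xlsx", read_only=True, data_only=True)
ws = wb["Sheet1"]

rows = ws.iter_rows(values_only=True)
header = next(rows)                          # first row supplies the column names
records = []
for row in rows:
    if all(cell is None for cell in row):    # a null-value row ends the range
        break
    records.append(dict(zip(header, row)))

print(records[:3])
```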

Lesson 3: Creating Batch Jobs

Lesson introduction
Once metadata has been imported for your datastores, you can create data flows to define data movement requirements.
• Work with objects
• Create a data flow
• Use the Query transform
• Use target tables
• Execute the job

Defining Data Services objects

Introduction
Data Services provides you with a variety of objects to use when you are building your data integration and data quality applications.

After completing this unit, you will be able to:
• Define the objects available in Data Services
• Explain relationships between objects

Understanding Data Services objects
In Data Services, all entities you add, define, modify, or work with are objects. Some of the most frequently-used objects are:
• Projects
• Jobs
• Work flows
• Data flows
• Transforms
• Scripts
This diagram shows some common objects.

All objects have options, properties, and classes. Each can be modified to change the behavior of the object.

Options
Options control the object. For example, to set up a connection to a database, the database name is an option for the connection.

Properties
Properties describe the object. For example, the name and creation date describe what the object is used for and when it became active. Attributes are properties used to locate and organize objects.

Classes
Classes define how an object can be used. Every object is either re-usable or single-use.

Re-usable objects
A re-usable object has a single definition and all calls to the object refer to that definition. If you change the definition of the object in one place, the change is reflected to all other calls to the object. Most objects created in Data Services are available for re-use. After you define and save a re-usable object, Data Services stores the definition in the repository. You can then re-use the definition as often as necessary by creating calls to it.

You can edit re-usable objects at any time independent of the current open project. For example, if you open a new project, you can open a data flow, edit it, and then save the object. However, the changes you make to the data flow are not stored until you save them. For example, a data flow within a project is a re-usable object. Multiple jobs, such as a weekly load job and a daily load job, can call the same data flow. If this data flow is changed, both jobs call the new version of the data flow.

Single-use objects
Single-use objects appear only as components of other objects. They operate only in the context in which they were created. Note: You cannot copy single-use objects.

Defining projects and jobs
A project is the highest-level object in Designer. Projects provide a way to organize the other objects you create in Designer. A project is a single-use object that allows you to group jobs. For example, you can use a project to group jobs that have schedules that depend on one another or that you want to monitor together. Projects have the following characteristics:
• Projects are listed in the Local Object Library.
• Only one project can be open at a time.
• Projects cannot be shared among multiple users.
The objects in a project appear hierarchically in the project area. If a plus sign (+) appears next to an object, you can expand it to view the lower-level objects contained in the object. Data Services displays the contents as both names and icons in the project area hierarchy and in the workspace.

A job is the smallest unit of work that you can schedule independently for execution. Note: Jobs must be associated with a project before they can be executed in the project area of Designer.

Defining relationship between objects
Jobs are composed of work flows and/or data flows:
• A work flow is the incorporation of several data flows into a sequence. A work flow orders data flows and the operations that support them. It also defines the interdependencies between data flows. For example, if one target table depends on values from other tables, you can use the work flow to specify the order in which you want Data Services to populate the tables. You can also use work flows to define strategies for handling errors that occur during project execution, or to define conditions for running sections of a project.
• A data flow is the process by which source data is transformed into target data, which involves moving data from one or more sources to one or more target tables or files. A data flow defines the basic task that Data Services accomplishes. You define data flows by identifying the sources from which to extract data, the transformations the data should undergo, and targets.
This diagram illustrates a typical work flow.

Using work flows
Jobs with data flows can be developed without using work flows. However, one should consider nesting data flows inside of work flows by default. This practice can provide various benefits. Always using work flows makes jobs more adaptable to additional development and/or specification changes. For instance, if a job initially consists of four data flows that are to run sequentially, they could be set up without work flows. But what if specification changes require that they be merged into another job instead? The developer would have to replicate their sequence correctly in the other job. If these had been initially added to a work flow, the developer could then have simply copied that work flow into the correct position within the new job.

There would be no need to learn, copy, and verify the previous sequence. The change can be made more quickly with greater accuracy. Even if there is one data flow per work flow, there are benefits to adaptability.

Skipping work flows also opens up the possibility that units of recovery are not properly defined. Initially, it may have been decided that recovery units are not important, the expectation being that if the job fails, the whole process could simply be rerun. However, as data volumes tend to increase, it may be determined that a full reprocessing is too time consuming. The job may then be changed to incorporate work flows to benefit from recovery units to bypass reprocessing of successful steps. These changes can be complex and can consume more time than allotted for in a project plan. Setting these up during initial development, when the nature of the processing is being most fully analyzed, is preferred.

Describing the object hierarchy
In the repository, objects are grouped hierarchically from a project, to jobs, to optional work flows, to data flows. In jobs, work flows define a sequence of processing steps, and data flows move data from source tables to target tables. This illustration shows the hierarchical relationships for the key object types within Data Services:

This course focuses on creating batch jobs using database datastores and file formats.

Using the Data Services Designer interface

Introduction
The Data Services Designer interface allows you to plan and organize your data integration and data quality jobs in a visual way. Most of the components of Data Services can be programmed through this interface.

After completing this unit, you will be able to:
• Explain how Designer is used
• Describe key areas in the Designer window

Describing the Designer window
The Data Services Designer interface consists of a single application window and several embedded supporting windows. The application window contains the menu bar, toolbar, Local Object Library, project area, tool palette, and workspace.

Tip: You can access the Data Services Technical Manuals for reference or help through the Designer interface Help menu. These manuals are also accessible by going through Start > Programs > BusinessObjects XI 3.0/3.1 > BusinessObjects Data Services > Data Services Documentation > Technical Manuals.

Using the Designer toolbar
In addition to many of the standard Windows toolbar buttons, Data Services provides the following unique toolbar buttons:

Tool: Description
Save All: Saves all new or updated objects.
Close All Windows: Closes all open windows in the workspace.
Local Object Library: Opens and closes the Local Object Library window.
Central Object Library: Opens and closes the Central Object Library window.
Variables: Opens and closes the Variables and Parameters window.
Project Area: Opens and closes the project area.
Output: Opens and closes the Output window.
View Enabled Descriptions: Enables the system-level setting for viewing object descriptions in the workspace.
Validate Current View: Validates the object definition open in the active tab of the workspace. Objects included in the definition are also validated.
Validate All Objects in View: Validates all object definitions open in the workspace. Other objects included in the definition are also validated.
Audit: Opens the Audit window. You can collect audit statistics on the data that flows out of any Data Services object.
View Where Used: Opens the Output window, which lists parent objects (such as jobs) of the object currently open in the workspace (such as a data flow).
Back: Moves back in the list of active workspace windows.
Forward: Moves forward in the list of active workspace windows.
Management Console: Opens and closes the Data Services Management Console, which provides access to Administrator, Auto Documentation, Data Validation, Impact and Lineage Analysis, Operational Dashboard, and Data Quality Reports.
Assess and Monitor: Opens Data Insight, which allows you to assess and monitor the quality of your data.
Contents: Opens the Data Services Technical Manuals.

Using the Local Object Library
The Local Object Library gives you access to the object types listed in the table below. The table shows the tab on which the object type appears in the Local Object Library and describes the Data Services context in which you can use each type of object.

Tab: Description
Projects: Projects are sets of jobs available at a given time.
Jobs: Jobs are executable work flows. There are two job types: batch jobs and real-time jobs.
Work flows: Work flows order data flows and the operations that support data flows, defining the interdependencies between them.
Data flows: Data flows describe how to process a task.
Transforms: Transforms operate on data, producing output data sets from the sources you specify. The Local Object Library lists platform, Data Integrator, and Data Quality transforms.
Datastores: Datastores represent connections to databases and applications used in your project. Under each datastore is a list of the tables, documents, and functions imported into Data Services.

Formats: Formats describe the structure of a flat file, XML file, Excel file, or XML message.
Custom Functions: Custom functions are functions written in the Data Services Scripting Language.

You can import objects to and export objects from your Local Object Library as a file. Importing objects from a file overwrites existing objects with the same names in the destination Local Object Library. Whole repositories can be exported in either .atl or .xml format. Using the .xml file format can make repository content easier for you to read. It also allows you to export Data Services to other products.

To export a repository to a file
1. On any tab of the Local Object Library, right-click the white space and select Repository > Export To File.
   The Write Repository Export File dialog box displays.
2. Browse to the destination for the export file.
3. In the File name field, enter the name of the export file.
4. In the Save as type list, select the file type for your export file.
5. Click Save.
   The repository is exported to the file.

To import a repository from a file
1. On any tab of the Local Object Library, right-click the white space and select Repository > Import from File from the menu.
   The Open Import File dialog box displays.
2. Browse to the destination for the file.
3. Click Open.
   A warning message displays to let you know that it takes a long time to create new versions of existing objects.
4. Click OK.
   You must restart Data Services after the import process completes.

Using the project area
The project area provides a hierarchical view of the objects used in each project. Tabs on the bottom of the project area support different tasks. Tabs include:

• Create, view, and manage projects. This provides a hierarchical view of all objects used in each project.
• View the status of currently executing jobs, including which steps are complete and which steps are executing. Selecting a specific job execution displays its status. These tasks can also be done using the Data Services Management Console.
• View the history of complete jobs. Logs can also be viewed with the Data Services Management Console.

To change the docked position of the project area
1. Right-click the border of the project area.
2. From the menu, select Floating to remove the check mark and clear the docking option.
3. Click and drag the project area to dock and undock at any edge within Designer.
   When you position the project area where one of the directional arrows highlights a portion of the window, this signifies a placement option. When you drag the project area away from a window edge, it stays undocked.
4. To switch between the last docked and undocked locations, double-click the gray border.

To change the undocked position of the project area
1. Right-click the border of the project area.
2. From the menu, select Floating.
3. Click and drag the project area to any location on your screen.
   The project area does not dock inside the workspace area.

To lock and unlock the project area
1. Click the pin icon ( ) on the border to unlock the project area.
   The project area hides.
2. Move the mouse over the docked pane.
   The project area re-appears.
3. Click the pin icon to lock the pane in place again.

To hide/show the project area
1. Right-click the border of the project area.
2. From the menu, select Hide.
   The project area disappears from the Designer window.
3. To show the project area, click Project Area in the toolbar.

Using the tool palette
The tool palette is a separate window that appears by default on the right edge of the Designer workspace. You can move the tool palette anywhere on your screen or dock it on any edge of the Designer window. The icons in the tool palette allow you to create new objects in the workspace. The icons are disabled when they are invalid entries to the diagram open in the workspace. To show the name of each icon, hold the cursor over the icon until the tool tip for the icon appears.

When you create an object from the tool palette, you are creating a new definition of an object. If a new object is re-usable, it is automatically available in the Local Object Library after you create it. For example, if you select the data flow icon from the tool palette and define a new data flow called DF1, you can later drag that existing data flow from the Local Object Library and add it to another data flow called DF2.

The tool palette contains these objects:

Tool: Description (Available in)
Pointer: Returns the tool pointer to a selection pointer for selecting and moving objects in a diagram. (All objects)
Work flow: Creates a new work flow. (Jobs and work flows)
Data flow: Creates a new data flow. (Jobs and work flows)
R/3 data flow: Creates a new data flow with the SAP licensed extension only. (Jobs and work flows)
Query: Creates a query transform to define column mappings and row selections. (Data flows)

Template table: Creates a new table for a target. (Data flows)
Template XML: Creates a new XML file for a target. (Data flows)
Data transport: Creates a data transport flow for the SAP licensed extension. (SAP licensed extension)
Script: Creates a new script object. (Jobs and work flows)
Conditional: Creates a new conditional object. (Jobs and work flows)
While Loop: Repeats a sequence of steps in a work flow as long as a condition is true. (Jobs and work flows)
Try: Creates a new try object that tries an alternate work flow if an error occurs in a job. (Jobs and work flows)
Catch: Creates a new catch object that catches errors in a job. (Jobs and work flows)
Annotation: Creates an annotation used to describe objects. (Jobs, work flows, and data flows)

Using the workspace
When you open a job or any object within a job hierarchy, the workspace becomes active with your selection. The workspace provides a place to manipulate objects and graphically assemble data movement processes. These processes are represented by icons that you drag and drop into a workspace to create a diagram. This diagram is a visual representation of an entire data movement application or some part of a data movement application. You specify the flow of data by connecting objects in the workspace from left to right in the order you want the data to be moved.

Creating a system configuration
System configurations define a set of datastore configurations that you want to use together when running a job. When designing jobs, determine and create datastore configurations and system configurations depending on your business environment and rules. In many organizations, a Data Services designer defines the required datastore and system configurations, and a system administrator determines which system configuration to use when scheduling or starting a job in the Administrator.

You cannot define a system configuration if your repository does not contain at least one datastore with multiple configurations. Create datastore configurations for the datastores in your repository before you create the system configurations for them.

Data Services maintains system configurations separately. You cannot check in or check out system configurations. However, you can export system configurations to a separate flat file which you can later import. By maintaining system configurations in a separate file, you avoid modifying your datastore each time you import or export a job, or each time you check in and check out the datastore, particularly when exporting.

To create a system configuration
1. From the Tools menu, select System Configurations.
   The System Configuration Editor dialog box displays columns for each datastore.
2. In the Configuration name column, enter the system configuration name.
   Use the SC_ prefix in the system configuration name so that you can easily identify this file as a system configuration.
3. In the drop-down list for each datastore column, select the appropriate datastore configuration that you want to use when you run a job using this system configuration.
4. Click OK.
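Conceptually, a system configuration is just a named mapping that picks one configuration for each datastore. The minimal sketch below illustrates that idea only; the SC_ names, datastore names, and configuration names are invented, and this is not a Data Services file format.

```python
# Illustration only: a system configuration selects one configuration per datastore.
SYSTEM_CONFIGS = {
    "SC_Dev":  {"DS_Sales": "dev_oracle",  "DS_Finance": "dev_sqlserver"},
    "SC_Prod": {"DS_Sales": "prod_oracle", "DS_Finance": "prod_sqlserver"},
}

def datastore_config(system_config: str, datastore: str) -> str:
    """Return the datastore configuration a job would use under a system configuration."""
    return SYSTEM_CONFIGS[system_config][datastore]

print(datastore_config("SC_Prod", "DS_Sales"))   # -> prod_oracle
```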

Working with objects

Introduction
Data flows define how information is moved from source to target. These data flows are organized into executable jobs, which are grouped into projects.

After completing this unit, you will be able to:
• Create a project
• Create a job
• Add, connect, and delete objects in the workspace
• Create a work flow

Creating a project
A project is a single-use object that allows you to group jobs. It is the highest level of organization offered by Data Services. Opening a project makes one group of objects easily accessible in the user interface. A project is used solely for organizational purposes. For example, you can use a project to group jobs that have schedules that depend on one another or that you want to monitor together. Only one project can be open at a time.

The objects in a project appear hierarchically in the project area in Designer. If a plus sign (+) appears next to an object, you can expand it to view the lower-level objects. The objects in the project area also display in the workspace, where you can drill down into additional levels.

To create a new project
1. From the Project menu, select New Project.

   The Project - New dialog box displays. You can also right-click the white space on the Projects tab of the Local Object Library and select New from the menu.
2. Enter a unique name in the Project name field.
   The name can include alphanumeric characters and underscores (_). It cannot contain blank spaces.
3. Click Create.
   The new project appears in the project area. As you add jobs and other lower-level objects to the project, they also appear in the project area.

To open an existing project
1. From the Project menu, select Open.
   The Project - Open dialog box displays.
2. Select the name of an existing project from the list.
3. Click Open.
   If another project is already open, Data Services closes that project and opens the new one in the project area.

To save a project
1. From the Project menu, select Save All.

Creating a job A job is the only executable object in Data Services. A job diagram is made up of two or more objects connected together. Deselect any listed object to avoid saving it. In production. . you can manually execute and test jobs directly in Data Services. You can include any of the following objects in a job definition: • • • • Work flows Scripts Conditionals While loops • Try/catch blocks • Data flows 0 Source objects 0 Target objects 0 Transforms If a job becomes complex. 3. and then create a single job that calls those work flows. you can organize its content into individual work flows.The Save all changes dialog box lists the jobs. and data flows that you edited since the last save. you can schedule batch jobs and set up real-time jobs as services that execute a process when Data Services receives a message request. Click OK. 2. Each step is represented by an object icon that you place in the workspace to create a job diagram. work flows. When you are developing your data flows. You are also prompted to save all changes made in a job when you execute the job or exit the Designer. A job is made up of steps that are executed together.

Tip: It is recommended that you follow consistent naming conventions to facilitate object identification across all systems in your enterprise.

To create a job in the project area
1. In the project area, right-click the project name and select New Batch Job from the menu.
2. Edit the name of the job.
   The name can include alphanumeric characters and underscores (_). It cannot contain blank spaces.
3. Click the cursor outside of the job name or press Enter to commit the changes.
   A new batch job is created in the project area. Data Services opens a new workspace for you to define the job.

You can also create a job and related objects from the Local Object Library. When you create a job in the Local Object Library, you must associate the job and all related objects to a project before you can execute the job.

Adding, connecting, and deleting objects in the workspace
After creating a job, you can add objects to the job workspace area using either the Local Object Library or the tool palette.

To add objects from the Local Object Library to the workspace
1. In the Local Object Library, click the tab for the type of object you want to add.
2. Click and drag the selected object on to the workspace.

To add objects from the tool palette to the workspace
• In the tool palette, click the desired object, move the cursor to the workspace, and then click the workspace to add the object.

Creating a work flow
A work flow is an optional object that defines the decision-making process for executing other objects. For example, elements in a work flow can determine the path of execution based on a value set by a previous job or can indicate an alternative path if something goes wrong in the primary path. Ultimately, the purpose of a work flow is to prepare for executing data flows and to set the state of the system after the data flows are complete. Work flows can contain data flows, conditionals, while loops, try/catch blocks, and scripts. They can also call other work flows, and you can nest calls to any depth. A work flow can even call itself.

Note: In essence, jobs are just work flows that can be executed. Almost all of the features documented for work flows also apply to jobs.

To create a work flow
1. Open the job or work flow to which you want to add the work flow.
2. Select the Work Flow icon in the tool palette.
3. Click the workspace where you want to place the work flow.
4. Enter a unique name for the work flow.
5. Click the cursor outside of the work flow name or press Enter to commit the changes.

To connect objects in the workspace area
• Click and drag from the triangle or square of an object to the triangle or square of the next object in the flow to connect the objects.

To disconnect objects in the workspace area
• Select the connecting line between the objects and press Delete.

Defining the order of execution in work flows
The connections you make between the icons in the workspace determine the order in which work flows execute, unless the jobs containing those work flows execute in parallel. Steps in a work flow execute in a sequence from left to right. You must connect the objects in a work flow when there is a dependency between the steps.

To execute more complex work flows in parallel, you can define each sequence as a separate work flow, and then call each of the work flows from another work flow, as in this example: first, you must define Work Flow A; next, define Work Flow B; finally, create Work Flow C to call Work Flows A and B.

You can specify that a job executes a particular work flow or data flow only once. If you specify that it should be executed only once, Data Services only executes the first occurrence of the work flow or data flow and skips subsequent occurrences in the job. You might use this feature when developing complex jobs with multiple paths, such as jobs with try/catch blocks or conditionals, and you want to ensure that Data Services only executes a particular work flow or data flow one time.
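Before moving on to data flows, here is a minimal sketch of the decision-making pieces a work flow can contain: a script step that sets a value, and the expression a conditional could use to choose a path based on it. The global variable name and the full-load-on-the-first-of-the-month rule are hypothetical examples, not part of any standard repository.

    # Script step: decide which path this run should take
    $G_LoadType = ifthenelse(to_char(sysdate(), 'DD') = '01', 'FULL', 'DELTA');
    print('Load type for this run: [$G_LoadType]');

    # Expression entered in a conditional's If field to select the full-load branch:
    $G_LoadType = 'FULL'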

Creating a data flow

Introduction

Data flows contain the source, transform, and target objects that represent the key activities in data integration and data quality processes. After completing this unit, you will be able to:
• Create a data flow
• Explain source and target objects
• Add source and target objects to a data flow

Using data flows

Data flows determine how information is extracted from sources, transformed, and loaded into targets. The lines connecting objects in a data flow represent the flow of data through data integration and data quality processes. The objects that you can use as steps in a data flow are:
• Source and target objects
• Transforms

Each icon you place in the data flow diagram becomes a step in the data flow. The connections you make between the icons determine the order in which Data Services completes the steps.

Using data flows as steps in work flows

Each step in a data flow, up to the target definition, produces an intermediate result called a data set. For example, the results of a SQL statement containing a WHERE clause flow to the next step in the data flow. The intermediate result consists of a set of rows from the previous operation and the schema in which the rows are arranged. This data set may, in turn, be further filtered and directed into yet another data set.

Data flows are closed operations, even when they are steps in a work flow. Any data set created within a data flow is not available to other steps in the work flow. A work flow does not operate on data sets and cannot provide more data to a data flow. However, a work flow can:
• Call data flows to perform data movement operations.
• Define the conditions appropriate to run data flows.
• Pass parameters to and from data flows.

To create a new data flow
1. Open the job or work flow in which you want to add the data flow.
2. Select the Data Flow icon in the tool palette.
3. Click the workspace where you want to add the data flow.
4. Enter a unique name for your data flow. Data flow names can include alphanumeric characters and underscores (_). They cannot contain blank spaces.
5. Click the cursor outside of the data flow or press Enter to commit the changes.
6. Double-click the data flow to open the data flow workspace.

Changing data flow properties

You can specify the following advanced properties for a data flow:

Data Flow Property: Description

Execute only once: When you specify that a data flow should only execute once, a batch job will never re-execute that data flow after the data flow completes successfully, even if the data flow is contained in a work flow that is a recovery unit that re-executes. You should not select this option if the parent work flow is a recovery unit.

Use database links: Database links are communication paths between one database server and another. Database links allow local users to access data on a remote database, which can be on the local or a remote computer of the same or a different database type. For more information, see "Database link support for push-down operations across datastores" in the Data Services Performance Optimization Guide.

Degree of parallelism: Degree of parallelism (DOP) is a property of a data flow that defines how many times each transform within a data flow replicates to process a parallel subset of data. For more information, see "Degree of parallelism" in the Data Services Performance Optimization Guide.

Cache type: You can cache data to improve performance of operations such as joins, groups, sorts, filtering, lookups, and table comparisons. Select one of the following values:

• In Memory: Choose this value if your data flow processes a small amount of data that can fit in the available memory.
• Pageable: Choose this value if you want to return only a subset of data at a time to limit the resources required. This is the default.

For more information, see "Tuning Caches" in the Data Services Performance Optimization Guide. For more information about how Data Integrator processes data flows with multiple properties, see "Data Flow" in the Data Services Reference Guide.

To change data flow properties
1. Right-click the data flow and select Properties from the menu. The Properties window opens for the data flow.
2. Change the properties of the data flow as required.
3. Click OK.

Explaining source and target objects

A data flow directly reads data from source objects and loads data to target objects.

Object: Description (Type)

Table: A file formatted with columns and rows as used in relational databases. (Source and target)

Template table: A template table that has been created and saved in another data flow (used in development). (Source and target)

File: A delimited or fixed-width flat file. (Source and target)

Document: A file with an application-specific format (not readable by SQL or XML parser). (Source and target)

XML file: A file formatted with XML tags. (Source and target)

XML message: A source in real-time jobs. (Source only)

XML template file: An XML file whose format is based on the preceding transform output (used in development, primarily for debugging data flows). (Target only)

Transform: A pre-built set of operations that can create new data, such as the Date Generation transform. (Source only)

Adding source and target objects

Before you can add source and target objects to a data flow, you must first create the datastore and import the table metadata for any databases, or create the file format for flat files.

To add a source or target object to a data flow
1. In the workspace, open the data flow in which you want to place the object.
2. Do one of the following:
   • To add a database table, in the Datastores tab of the Local Object Library, select the table.
   • To add a flat file, in the Formats tab of the Local Object Library, select the file format.
3. Click and drag the object to the workspace. A pop-up menu appears for the source or target object.
4. Select Make Source or Make Target from the menu, depending on whether the object is a source or a target object.
5. Add and connect objects in the data flow as appropriate.

Using the Query transform

Introduction

The Query transform is the most commonly-used transform and is included in most data flows. It enables you to select data from a source and filter or reformat it as it moves to the target.

After completing this unit, you will be able to:
• Describe the transform editor
• Use the Query transform

Describing the transform editor

The transform editor is a graphical interface for defining the properties of transforms. The workspace can contain these areas:
• Input schema area
• Output schema area
• Parameters area

The input schema area displays the schema of the input data set. For source objects and some transforms, this area is not available. The output schema area displays the schema of the output data set. For template tables, the output schema can be defined based on your preferences.

For any data that needs to move from source to target, including any functions, a relationship must be defined between the input and output schemas. To create this relationship, you must map each input column to the corresponding output column.

Below the input and output schema areas is the parameters area. The options available in this area differ based on which transform or object you are modifying. The I icon indicates tabs containing user-defined entries.

Explaining the Query transform

The Query transform is used so frequently that it is included in the tool palette with other standard objects. It retrieves a data set that satisfies conditions that you specify, similar to a SQL SELECT statement.

The Query transform can perform the following operations:
• Filter the data extracted from sources.
• Join data from multiple sources.
• Map columns from input to output schemas.
• Perform transformations and functions on the data.
• Perform data nesting and unnesting.
• Add new columns, nested schemas, and function results to the output schema.
• Assign primary keys to output columns.

For example, you could use the Query transform to select a subset of the data in a table to show only those records from a specific region.

Note: When working with nested data from an XML file, you can use the Query transform to unnest the data using the right-click menu for the output schema, which provides options for unnesting.

For more information on the Query transform, see "Transforms", Chapter 5 in the Data Services Reference Guide. The next section gives a brief description of the function, data input requirements, options, and data output results for the Query transform.

Input/Output

The data input is a data set from one or more sources with rows flagged with a NORMAL operation code. All the rows in a data set are flagged as NORMAL when they are extracted by a source table or file. If a row is flagged as NORMAL when loaded into a target table or file, it is inserted as a new row in the target. The data output is a data set based on the conditions you specify, using the schema specified in the output schema area.

Options

The input schema area displays all schemas input to the Query transform as a hierarchical tree. Each input schema can contain multiple columns. The output schema area displays the schema output from the Query transform as a hierarchical tree. The output schema can contain multiple columns and functions.
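Returning to the region example above, the WHERE tab of the Query transform might contain an expression such as the following; the Customer table and REGION_ID column are hypothetical. Only rows for which the expression evaluates to true are passed to the output schema:

    Customer.REGION_ID = 1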

Icons preceding columns are combinations of graphics indicating that:
• the column is a primary key.
• the column has a simple mapping. A simple mapping is either a single column or an expression with no input column.
• the column has a complex mapping, such as a transformation or a merge between two source columns (an example follows the mapping procedure below).
• the column mapping is incorrect. Data Integrator does not perform a complete validation during design, so not all incorrect mappings will necessarily be flagged.

The parameters area of the Query transform includes the following tabs:

Tab: Description

Mapping: Specify how the selected output column is derived.

SELECT: Select only distinct rows (discarding any duplicate rows).

FROM: Specify the input schemas used in the current output schema.

OUTER JOIN: Specify an inner table and an outer table for joins that you want treated as outer joins.

WHERE: Set conditions that determine which rows are output.

GROUP BY: Specify a list of columns for which you want to combine output. For each unique set of values in the group by list, Data Services combines or aggregates the values in the remaining columns.

ORDER BY: Specify the columns you want used to sort the output data set.

Advanced: Create separate sub data flows to process any of the following resource-intensive query clauses:

• DISTINCT
• GROUP BY
• JOIN
• ORDER BY

For more information, see "Distributed Data Flow execution" in the Data Services Designer Guide.

Find: Search for a specific word or item in the input schema or the output schema.

To map input columns to output columns
• In the transform editor, do any of the following:
  • Drag and drop a single column from the input schema area into the output schema area.
  • Drag a single input column over the corresponding output column, release the cursor, and select Remap Column from the menu.
  • Select multiple input columns (using Ctrl+click or Shift+click) and drag them onto the Query output schema for automatic mapping.
  • Select the output column and manually enter the mapping on the Mapping tab in the parameters area. You can either type the column name in the parameters area or click and drag the column from the input schema pane.
  • To remove a mapping, select the output column, then highlight and manually delete the mapping on the Mapping tab in the parameters area.
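As an example of the simple and complex mappings described earlier, the Mapping tab might contain either of the following; the CUSTOMER column names are hypothetical. A simple mapping is just the input column:

    CUSTOMER.CUSTOMER_ID

A complex mapping applies a function or combines columns, for example:

    upper(CUSTOMER.LAST_NAME) || ', ' || CUSTOMER.FIRST_NAME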

Using target tables

Introduction

The target object for your data flow can be either a physical table or file, or a template table. After completing this unit, you will be able to:
• Access the target table editor
• Set target table options
• Use template tables

Accessing the target table editor

The target table editor provides a single location to change settings for your target tables.

To access the target table editor
1. In a data flow, double-click the target table. The target table editor opens in the workspace.
2. Change the values as required. Changes are automatically committed.
3. Click Back to return to the data flow.

Setting target table options

When your target object is a physical table in a database, the target table editor opens in the workspace with different tabs where you can set database type properties, table loading options, and tuning techniques for loading a job.

Note: Most of the tabs in the target table editor focus on migration or performance-tuning techniques, which are outside the scope of this course.

You can set the following table loading options in the Options tab of the target table editor:

Option: Description

Rows per commit: Specifies the transaction size in number of rows.

Column comparison: Specifies how the input columns are mapped to output columns. There are two options:

• Compare_by_position — disregards the column names and maps source columns to target columns by position.
• Compare_by_name — maps source columns to target columns by name.
Validation errors occur if the datatypes of the columns do not match.

Number of loaders: Specifies the number of loaders (to a maximum of five) and the number of rows per commit that each loader receives during parallel loading. For example, if you choose a Rows per commit of 1000 and set the number of loaders to three, the first 1000 rows are sent to the first loader, the second 1000 rows to the second loader, the third 1000 rows to the third loader, and the next 1000 rows back to the first loader.

Delete data from table before loading: Sends a TRUNCATE statement to clear the contents of the table before loading during batch jobs. Defaults to not selected.

Use overflow file: Writes rows that cannot be loaded to the overflow file for recovery purposes. Options are enabled for the file name and file format. The overflow format can include the data rejected and the operation being performed (write_data) or the SQL command used to produce the rejected operation (write_sql).

Ignore columns with value: Specifies a value that might appear in a source column that you do not want updated in the target table. When this value appears in the source column, the corresponding target column is not updated during auto correct loading. You can enter spaces.

Ignore columns with null: Ensures that NULL source columns are not updated in the target table during auto correct loading.

Use input keys: Enables Data Integrator to use the primary keys from the source table. By default, Data Integrator uses the primary key of the target table.

Update key columns: Updates key column values when it loads data to the target.

Auto correct load: Ensures that the same row is not duplicated in a target table. When Auto correct load is selected, Data Integrator reads a row from the source and checks whether a row exists in the target table with the same values in the primary key. If a matching row does not exist, it inserts the new row regardless of other options. If a matching row exists, it updates the row depending on the values of Ignore columns with value and Ignore columns with null. This is particularly useful for data recovery operations (a conceptual sketch follows this table).

Include in transaction: Indicates that this target is included in the transaction processed by a batch or real-time job. This option allows you to commit data to multiple tables as part of the same transaction. If loading fails for any one of the tables, no data is committed to any of the tables. The tables must be from the same datastore. If you choose to enable transactional loading, these options are not available: Rows per commit, Use overflow file, Number of loaders, Enable partitioning, Delete data from table before loading, and the overflow file specification. Data Integrator also does not parameterize SQL or push operations to the database if transactional loading is enabled. Transactional loading can require rows to be buffered to ensure the correct load order. If the data being buffered is larger than the virtual memory available, Data Integrator reports a memory error.

Transaction order: Indicates where this table falls in the loading order of the tables being loaded. By default, there is no ordering; all loaders have a transaction order of zero. Tables with a transaction order of zero are loaded at the discretion of the data flow process. If you specify orders among the tables, the loading operations are applied according to the order. Tables with the same transaction order are loaded together.

See the Data Services Performance Optimization Guide and "Description of objects" in the Data Services Reference Guide for more information.
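As a rough mental model of the Auto correct load option described above, the behavior resembles an upsert keyed on the primary key. This is not the SQL that Data Integrator actually generates, and the table and column names are hypothetical:

    -- Update rows whose primary key already exists in the target
    UPDATE tgt_customer t
       SET city = (SELECT s.city FROM src_customer s WHERE s.customer_id = t.customer_id)
     WHERE EXISTS (SELECT 1 FROM src_customer s WHERE s.customer_id = t.customer_id);

    -- Insert rows whose primary key is not yet in the target
    INSERT INTO tgt_customer (customer_id, city)
    SELECT s.customer_id, s.city
      FROM src_customer s
     WHERE NOT EXISTS (SELECT 1 FROM tgt_customer t WHERE t.customer_id = s.customer_id);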

Using template tables

During the initial design of an application, you might find it convenient to use template tables to represent database tables. With template tables, you do not have to initially create a new table in your RDBMS and import the metadata into Data Services. Instead, Data Services automatically creates the table in the database with the schema defined by the data flow when you execute a job. Template tables are particularly useful in early application development when you are designing and testing a project.

After creating a template table as a target in one data flow, you can use it as a source in other data flows. Although a template table can be used as a source table in multiple data flows, it can be used only as a target in one data flow. You can modify the schema of the template table in the data flow where the table is used as a target. Any changes are automatically applied to any other instances of the template table.

After a template table is created in the database, you can convert the template table in the repository to a regular table. You must convert template tables so that you can use the new table in expressions, functions, and transform options. After a template table is converted, you can no longer alter the schema.

To create a template table
1. Open a data flow in the workspace.
2. In the tool palette, click the Template Table icon and click the workspace to add a new template table to the data flow.

The Create Template dialog box displays.
3. In the Table name field, enter the name for the template table.
4. In the In datastore drop-down list, select the datastore for the template table.
5. Click OK.

You can also create a new template table on the Datastores tab of the Local Object Library by expanding a datastore and right-clicking Templates.

To convert a template table into a regular table from the Local Object Library
1. On the Datastores tab of the Local Object Library, expand the branch for the datastore to view the template table.
2. Right-click the template table you want to convert and select Import Table from the menu.

Data Services converts the template table in the repository into a regular table by importing it from the database. On the Datastores tab of the Local Object Library, the table is now listed under Tables rather than Template Tables.

To convert a template table into a regular table from a data flow
1. Open the data flow containing the template table.
2. Right-click the template table you want to convert and select Import Table from the menu.
3. To update the icon in all data flows, select Refresh from the View menu.

Executing the job

Introduction

Once you have created a data flow, you can execute the job in Data Services to see how the data moves from source to target. After completing this unit, you will be able to:
• Understand job execution
• Execute the job

Explaining job execution

After you create your project, jobs, and associated data flows, you can then execute the job. If a job has syntax errors, it does not execute.

You can run jobs in two ways:
• Immediate jobs: Data Services initiates both batch and real-time jobs and runs them immediately from within the Designer. For these jobs, both the Designer and the designated Job Server (where the job executes, usually on the same machine) must be running. You will likely run immediate jobs only during the development cycle.
• Scheduled jobs: Batch jobs are scheduled. To schedule a job, use the Data Services Management Console or a third-party scheduler. The Job Server must be running.

Setting execution properties

When you execute a job, the following options are available in the Execution Properties window:

Option: Description

Print all trace messages: Records all trace messages in the log.

Disable data validation statistics collection: Does not collect data validation statistics for this specific job execution.

Enable auditing: Collects audit statistics for this specific job execution.

Enable recovery: Enables the automatic recovery feature. When enabled, Data Services saves the results from completed steps and allows you to resume failed jobs.

Recover from last failed execution: Resumes a failed job. Data Services retrieves the results from any steps that were previously executed successfully and re-executes any other steps. This option is not available when a job has not yet been executed or when recovery mode was disabled during the previous run.

Collect statistics for optimization: Collects statistics that the Data Services optimizer will use to choose an optimal cache type (in-memory or pageable).

Collect statistics for monitoring: Displays cache statistics in the Performance Monitor in the Administrator.

Use collected statistics: Optimizes Data Services to use the cache statistics collected on a previous execution of the job.

System configuration: Specifies the system configuration to use when executing this job. A system configuration defines a set of datastore configurations, which define the datastore connections. If a system configuration is not specified, Data Services uses the default datastore configuration for each datastore. This option is a run-time property that is only available if there are system configurations defined in the repository.

Job Server or Server Group: Specifies the Job Server or server group to execute this job.

Distribution level: Allows a job to be distributed to multiple Job Servers for processing. This option is a run-time property. The options are:
• Job — The entire job will execute on one server.
• Data flow — Each data flow within the job will execute on a separate server.
• Sub-data flow — Each sub-data flow (which can be a separate transform or function) within a data flow will execute on a separate Job Server.

Executing the job

Immediate or on-demand tasks are initiated from the Designer. Both the Designer and the Job Server must be running for the job to execute.

To execute a job as an immediate task
1. In the project area, right-click the job name and select Execute from the menu. Data Services prompts you to save any objects that have not been saved.
2. Click OK. The Execution Properties dialog box displays.
3. Select the required job execution parameters.
4. Click OK.

Lesson 4 Using Platform Transforms

Lesson introduction

A transform enables you to control how data sets change in a data flow.

• Describe platform transforms
• Use the Map Operation transform
• Use the Validation transform
• Use the Merge transform
• Use the Case transform
• Use the SQL transform

Describing platform transforms

Introduction

Transforms are optional objects in a data flow that allow you to transform your data as it moves from source to target. Transforms are added as components to your data flow in the same way as source and target objects. Some transforms, such as the Date Generation and SQL transforms, can be used as source objects, in which case they do not have input options. The Query transform is the most commonly-used transform.

After completing this unit, you will be able to:
• Explain transforms
• Describe the platform transforms available in Data Services
• Add a transform to a data flow
• Describe the Transform Editor window

Explaining transforms

Transforms are objects in data flows that operate on input data sets by changing them or by generating one or more new data sets. Transforms are similar to functions in that they can produce the same or similar values during processing. However, transforms and functions operate on a different scale:
• Functions operate on single values, such as values in specific columns in a data set.
• Transforms operate on data sets by creating, updating, and deleting rows of data.

Transforms are often used in combination to create the output data set. You can choose to edit the input data, output data, and parameters in a transform. Each transform provides different options that you can specify based on the transform's function. For example, the Table Comparison, History Preserving, and Key Generation transforms are used for slowly changing dimensions.

Describing platform transforms

The following platform transforms are available on the Transforms tab of the Local Object Library:

Transform: Description

Case: Divides the data from an input data set into multiple output data sets based on IF-THEN-ELSE branch logic.

Map Operation: Allows conversions between operation codes.

Merge: Unifies rows from two or more input data sets into a single output data set.

Query: Retrieves a data set that satisfies conditions that you specify. A Query transform is similar to a SQL SELECT statement.

Row Generation: Generates a column filled with integers starting at zero and incrementing by one to the end value you specify.

SQL: Performs the indicated SQL query operation.

Validation: Allows you to specify validation criteria for an input data set. Data that fails validation can be filtered out or replaced. You can have one validation rule per column.

Using the Map Operation transform

Introduction

The Map Operation transform enables you to change the operation code for records. After completing this unit, you will be able to:
• Describe map operations
• Use the Map Operation transform

Describing map operations

Data Services maintains operation codes that describe the status of each row in each data set described by the inputs to and outputs from objects in data flows. The operation codes indicate how each row in the data set would be applied to a target table if the data set were loaded into a target. The operation codes are as follows:

Operation Code: Description

NORMAL: Creates a new row in the target. All rows in a data set are flagged as NORMAL when they are extracted by a source table or file. If a row is flagged as NORMAL when loaded into a target table or file, it is inserted as a new row in the target. Most transforms operate only on rows flagged as NORMAL.

INSERT: Creates a new row in the target. Only the History Preserving and Key Generation transforms can accept data sets with rows flagged as INSERT as input.

UPDATE: Overwrites an existing row in the target table. Only the History Preserving and Key Generation transforms can accept data sets with rows flagged as UPDATE as input.

DELETE: Is ignored by the target. Rows flagged as DELETE are not loaded. Only the History Preserving transform, with the Preserve delete row(s) as update row(s) option selected, can accept data sets with rows flagged as DELETE.

Explaining the Map Operation transform

The Map Operation transform allows you to change operation codes on data sets to produce the desired output. For example, if a row in the input data set has been updated in some previous operation in the data flow, you can use this transform to map the UPDATE operation to an INSERT. The result could be to convert UPDATE rows to INSERT rows to preserve the existing row in the target. Data Services can push Map Operation transforms to the source database.

For more information on the Map Operation transform, see "Transforms", Chapter 5 in the Data Services Reference Guide. The next section gives a brief description of the function, data input requirements, options, and data output results for the Map Operation transform.

Inputs/Outputs

Input for the Map Operation transform is a data set with rows flagged with any operation codes. It can contain hierarchical data. Output for the Map Operation transform is a data set with rows flagged as specified by the mapping operations. Use caution when using columns of datatype real in this transform, because comparison results are unpredictable for this datatype.

Options

The Map Operation transform enables you to set the Output row type option to indicate the new operations desired for the input data set. Choose from the following operation codes: INSERT, UPDATE, DELETE, NORMAL, or DISCARD.

Activity: Using the Map Operation transform

End users of employee reports have requested that employee records in the data mart contain only current employees.

Objective
• Use the Map Operation transform to remove any employee records that have a value in the discharge_date column.

Instructions
1. In the Omega project, create a new batch job called Alpha_Employees_Current_Job with a data flow called Alpha_Employees_Current_DF.
2. In the data flow workspace, add the Employee table from the Alpha datastore as the source object.
3. Add the Employee table from the HR_datamart datastore as the target object.
4. Add the Query transform to the workspace and connect all objects.
5. In the transform editor for the Query transform, map all columns from the input schema to the same column in the output schema.
6. On the WHERE tab, create an expression to select only those rows where discharge_date is not empty. The expression should be:
   employee.discharge_date is not null
7. In the data flow workspace, disconnect the Query transform from the target table.
8. Add a Map Operation transform between the Query transform and the target table and connect it to both.
9. In the transform editor for the Map Operation transform, change the settings so that rows with an input operation code of NORMAL have an output operation code of DELETE.
10. Execute Alpha_Employees_Current_Job with the default execution properties and save all objects you have created.
11. Return to the data flow workspace and view data for both the source and target tables. Note that two rows were filtered from the target table.

A solution file called SOLUTION_MapOperation.atl is included in your Course Resources. To check the solution, import the file and open it to view the data flow design and mapping logic. Do not execute the solution job, as this may override the results in your target table.

Using the Validation transform

Introduction

The Validation transform enables you to create validation rules and move data into target objects based on whether they pass or fail validation. After completing this unit, you will be able to:
• Use the Validation transform

Explaining the Validation transform

Use the Validation transform in your data flows when you want to ensure that the data at any stage in the data flow meets your criteria. The Validation transform qualifies a data set based on rules for input schema columns. It filters out or replaces data that fails your criteria. The available outputs are pass and fail. For example, you can set the transform to ensure that all values:
• Are within a specific range
• Have the same format
• Do not contain NULL values

The Validation transform allows you to define a re-usable business rule to validate each record and column. You can have one validation rule per column.

For example, if you want to load only sales records for October 2007, you would set up a validation rule that states: Sales Date is between 10/1/2007 and 10/31/2007. Data Services looks at this date field in each record to validate whether the data meets this requirement. If it does not, you can choose to pass the record into a Fail table, correct it in the Pass table, or do both.
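As a minimal sketch of the rule above: for a hypothetical SALES_DATE column you could either choose the Between/and condition with the values 2007.10.01 and 2007.10.31, or enter an equivalent custom condition such as the following. Note the YYYY.MM.DD format the transform expects, described with the condition options later in this unit:

    SALES_DATE >= '2007.10.01' and SALES_DATE <= '2007.10.31'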

Your validation rule consists of a condition and an action on failure:
• Use the condition to describe what you want for your valid data. For example, specify the condition IS NOT NULL if you do not want any NULL values in data passed to the specified target.
• Use the Action on Failure area to describe what happens to invalid or failed data. Continuing with the example above, for any NULL values you may want to select the Send to Fail option to send all NULL values to a specified FAILED target table.

You can also create a custom Validation function and select it when you create a validation rule. For more information on creating custom Validation functions, see "Validation Transform", Chapter 12 in the Data Services Reference Guide.

For more information on the Validation transform, see "Transforms", Chapter 5 in the Data Services Reference Guide. The next section gives a brief description of the function, data input requirements, options, and data output results for the Validation transform.

Input/Output

Only one source is allowed as a data input for the Validation transform. The Validation transform outputs up to two different data sets based on whether the records pass or fail the validation condition you specify. You can load pass and fail data into multiple targets.

The Pass output schema is identical to the input schema. Because Data Services does not add columns to the Pass output, you may want to substitute a value for failed data that you send to the Pass output. If you choose to send failed data to the Pass output, Data Services does not track the results.

Data Services adds the following two columns to the Fail output schema:
• The DI_ERRORACTION column indicates where failed data was sent: the letter B is used for data sent to both the Pass and Fail outputs, and the letter F is used for data sent only to the Fail output.
• The DI_ERRORCOLUMNS column displays all error messages for columns with failed rules, for example, "<ValidationTransformName> failed rule(s): c1:c2". The names of input columns associated with each message are separated by colons.

If a row has conditions set for multiple columns and the Pass, Fail, and Both actions are specified for the row, then the precedence order is Fail, Both, Pass. For example, if one column's action is Send to Fail and the column fails, then the whole row is sent only to the Fail output. Other actions for other validation columns in the row are ignored.

Options

When you use the Validation transform, you select a column in the input schema and create a validation rule in the Validation transform editor. The Validation transform offers several options for creating this validation rule:

Option: Description

Enable Validation: Turn the validation rule on and off for the column.

Do not validate when NULL: Send all NULL values to the Pass output automatically. Data Services will not apply the validation rule on this column when an incoming value for it is NULL.

Condition: Define the condition for the validation rule:
• Operator: select an operator for a Boolean expression (for example, =, <, >) and enter the associated value.
• Between/and: specify a range of values for a column.
• In: specify a list of possible values for a column.
• Exists in table: specify that a column's value must exist in a column in another table. This option uses the LOOKUP_EXT function. You can define the NOT NULL constraint for the column in the LOOKUP table to ensure that the Exists in table condition executes properly.
• Match pattern: enter a pattern of upper and lowercase alphanumeric characters to ensure the format of the column is correct (see the example after this table).
• Custom condition: create more complex expressions using the function and smart editors.
• Custom validation function: select a function from a list for validation purposes. Data Services supports Validation functions that take one parameter and return an integer datatype. If the return value is not zero, Data Services processes it as TRUE.

Data Services converts substitute values in the condition to the corresponding column datatype: integer, decimal, varchar, date, datetime, time, or timestamp. The Validation transform requires that you enter some values in specific formats:
• date (YYYY.MM.DD)
• datetime (YYYY.MM.DD HH24:MI:SS)
• time (HH24:MI:SS)
• timestamp (YYYY.MM.DD HH24:MI:SS.FF)
For example, if you specify a date as 12-01-2004, Data Services produces an error because you must enter this date as 2004.12.01.

Action on Fail: Define where a record is loaded if it fails the validation rule:
• Send to Fail
• Send to Pass
• Send to Both
If you choose Send to Pass or Send to Both, you can choose to substitute a value or expression for the failed values that are sent to the Pass output.
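For illustration, the Match pattern idea can also be expressed as a custom condition. The following sketch assumes a hypothetical PHONE column and uses the match_pattern function, in which 9 stands for any digit; the condition is TRUE only for values such as 555-123-4567:

    PHONE is not null and match_pattern(PHONE, '999-999-9999') = 1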

To create a validation rule
1. Open the data flow workspace.
2. Add your source object to the workspace.
3. Add your target objects to the workspace. You will require one target object for records that pass validation, and an optional target object for records that fail validation.
4. On the Transforms tab of the Local Object Library, click and drag the Validation transform to the workspace to the right of your source object.
5. Connect the source object to the transform.
6. Double-click the Validation transform to open the transform editor.

7. In the input schema area, click to select an input schema column.
8. On the Properties tab, enter a name and description for the validation rule.
9. In the parameters area, select the Enable Validation option.
10. In the Condition area, select a condition type and enter any associated value required. All conditions must be Boolean expressions.
11. On the Action On Failure tab, select an action.
12. If desired, select the For pass, substitute with option and enter a substitute value or expression for the failed value that is sent to the Pass output. This option is only available if you select Send to Pass or Send to Both.
13. Click Back to return to the data flow workspace.
14. Click and drag from the transform to the target object.
15. Release the mouse and select the appropriate label for that object from the pop-up menu.

16. Repeat step 14 and step 15 for all target objects.

Activity: Using the Validation transform

Order data is stored in multiple formats with different structures and different information. You will use the Validation transform to validate order data from flat file sources and the Alpha Orders table before merging it.

Objectives
• Join the data in the Orders flat files with that in the Order_Shippers flat files.
• Create a column to hold the employee ID of the employee who originally made the sale.
• Create a column on the target table for employee information so that orders taken by employees who are no longer with the company are assigned to a default current employee, using the Validation transform in a new column named order_assigned_to.
• Replace null values in the shipper fax column with a value of 'No Fax' and send those rows to a separate table for follow-up.

Instructions
1. Create a file format called Order_Shippers_Format for the flat file Order_Shippers_04_20_07.txt. Use the structure of the text file to determine the appropriate settings.
2. In the Column Attributes pane, adjust the datatypes for the columns based on their content:

Column: Datatype
ORDERID: int
SHIPPERNAME: varchar(50)
SHIPPERADDRESS: varchar(50)
SHIPPERCITY: varchar(50)
SHIPPERCOUNTRY: int
SHIPPERPHONE: varchar(20)
SHIPPERFAX: varchar(20)
SHIPPERREGION: int
SHIPPERPOSTALCODE: varchar(15)

3. In the Location drop-down list, select Job Server.
4. In the Root directory, enter the correct path. The instructor will provide this information. If the Job Server is on a different machine than the Designer, this step is required.
5. In the Omega project, create a new batch job called Alpha_Orders_Validated_Job and two data flows, one named Alpha_Orders_Files_DF and the second named Alpha_Orders_DB_DF.
6. Add the file formats Orders_Format and Order_Shippers_Format as source objects to the Alpha_Orders_Files_DF data flow workspace.
7. Edit the source objects so that the Orders_Format source is using all three related orders flat files and the Order_Shippers_Format source is using all three order shippers files.
   Tip: You can use a wildcard to replace the dates in the file names.
8. If necessary, edit the source objects to point to the file on the Job Server.
9. Edit the Orders_Format source object to change the Capture Data Conversion Errors option to Yes.
10. Add a Query transform to the workspace and connect it to the two source objects.
11. In the transform editor for the Query transform, create a WHERE clause to join the data on the OrderID values. The expression should be as follows:
    Order_Shippers_Format.ORDERID = Orders_Format.ORDERID
12. Add the following mappings in the Query transform:

Schema Out: Mapping
ORDERID: Orders_Format.ORDERID
CUSTOMERID: Orders_Format.CUSTOMERID
ORDERDATE: Orders_Format.ORDERDATE
SHIPPERNAME: Order_Shippers_Format.SHIPPERNAME
SHIPPERADDRESS: Order_Shippers_Format.SHIPPERADDRESS

SHIPPERCITY: Order_Shippers_Format.SHIPPERCITY
SHIPPERCOUNTRY: Order_Shippers_Format.SHIPPERCOUNTRY
SHIPPERPHONE: Order_Shippers_Format.SHIPPERPHONE
SHIPPERFAX: Order_Shippers_Format.SHIPPERFAX
SHIPPERREGION: Order_Shippers_Format.SHIPPERREGION
SHIPPERPOSTALCODE: Order_Shippers_Format.SHIPPERPOSTALCODE

13. Insert a new output column above ORDERDATE called ORDER_TAKEN_BY with a datatype of varchar(15) and map it to Orders_Format.EMPLOYEEID.
14. Insert a new output column above ORDERDATE called ORDER_ASSIGNED_TO with a datatype of varchar(15) and map it to Orders_Format.EMPLOYEEID.
15. Add a Validation transform to the right of the Query transform and connect the transforms.
16. In the transform editor for the Validation transform, enable validation for the ORDER_ASSIGNED_TO column to verify that the value in the column exists in the EMPLOYEEID column of the Employee table in the HR_datamart datastore. The expression should be as follows:
    HR_DATAMART.DBO.EMPLOYEE.EMPLOYEEID
17. Set the action on failure for the ORDER_ASSIGNED_TO column to send to both pass and fail. For pass, substitute '3Cla5' to assign it to the default employee.
18. Enable validation for the SHIPPERFAX column to send NULL values to both pass and fail, substituting 'No Fax' for pass.
19. Add two target tables in the Delta datastore as targets, one called Orders_Files_Work and one called Orders_Files_No_Fax.
20. Connect the pass output from the Validation transform to Orders_Files_Work and the fail output to Orders_Files_No_Fax.
21. In the Alpha_Orders_DB_DF workspace, add the Orders table from the Alpha datastore as the source object.
22. Add a Query transform to the workspace and connect it to the source.
23. In the transform editor for the Query transform, map all of the columns from the input schema to the output schema, except the EMPLOYEEID column.

24. Change the names of the following Schema Out columns:

Old column name: New column name
SHIPPERCITYID: SHIPPERCITY
SHIPPERCOUNTRYID: SHIPPERCOUNTRY
SHIPPERREGIONID: SHIPPERREGION

25. Insert a new output column above ORDERDATE called ORDER_TAKEN_BY with a datatype of varchar(15) and map it to Orders.EMPLOYEEID.
26. Insert a new output column above ORDERDATE called ORDER_ASSIGNED_TO with a datatype of varchar(15) and map it to Orders.EMPLOYEEID.
27. Add a Validation transform to the right of the Query transform and connect the transforms.
28. Enable validation for ORDER_ASSIGNED_TO to verify that the column value exists in the EMPLOYEEID column of the Employee table in the HR_datamart datastore.
29. Set the action on failure for the ORDER_ASSIGNED_TO column to send to both pass and fail. For pass, substitute '3Cla5' to assign it to the default employee.
30. Enable validation for the SHIPPERFAX column to send NULL values to both pass and fail, substituting 'No Fax' for pass.
31. Add two target tables in the Delta datastore as targets, one named Orders_DB_Work and one named Orders_DB_No_Fax.
32. Connect the pass output from the Validation transform to Orders_DB_Work and the fail output to Orders_DB_No_Fax.
33. Execute Alpha_Orders_Validated_Job with the default execution properties and save all objects you have created.
34. View the data in the target tables to see the differences between passing and failing records.

A solution file called SOLUTION_Validation.atl is included in your Course Resources. To check the solution, import the file and open it to view the data flow design and mapping logic. Do not execute the solution job, as this may override the results in your target table.

Using the Merge transform

Introduction

The Merge transform allows you to combine multiple sources with the same schema into a single target. After completing this unit, you will be able to:
• Use the Merge transform

Explaining the Merge transform

The Merge transform combines incoming data sets with the same schema structure to produce a single output data set with the same schema as the input data sets. For example, you could use the Merge transform to combine two sets of address data.

For more information on the Merge transform, see "Transforms", Chapter 5 in the Data Services Reference Guide. The next section gives a brief description of the function, data input requirements, options, and data output results for the Merge transform.

Input/Output

The Merge transform performs a union of the sources. All sources must have the same schema, including:
• Number of columns
• Column names
• Column datatypes

If the input data set contains hierarchical data, the names and datatypes must match at every level of the hierarchy. The output data has the same schema as the source data. The output data set contains a row for every row in the source data sets; the transform does not strip out duplicate rows. If columns in the input set contain nested schemas, the nested data is passed through without change.

Tip: If you want to merge tables that do not have the same schema, you can add a Query transform to one of the tables before the Merge transform to redefine the schema to match the other table.

Options

The Merge transform does not offer any options.
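Because the Merge transform performs a union of its inputs without removing duplicate rows, its behavior is conceptually close to a SQL UNION ALL. The following is only a mental model, not SQL that Data Services generates, and it assumes the two work tables from the previous activity share the same column layout:

    SELECT * FROM ORDERS_FILES_WORK
    UNION ALL
    SELECT * FROM ORDERS_DB_WORK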

Activity: Using the Merge transform

The orders data has now been validated, but the output is in two different sources: flat files and database tables. The next step in the process is to modify the structure of those data sets so they match, and then merge them into a single data set.

Objectives
• Use the Query transforms to modify any column names and datatypes and to perform lookups for any columns that reference other tables.
• Use the Merge transform to merge the validated orders data.

Instructions
1. In the Omega project, create a new batch job called Alpha_Orders_Merged_Job with a data flow called Alpha_Orders_Merged_DF.
2. In the data flow workspace, add the Orders_Files_Work and Orders_DB_Work tables from the Delta datastore as the source objects.
3. Add two Query transforms to the data flow, connecting each source object to its own Query transform.
4. In the transform editor for the Query transform connected to the Orders_Files_Work table, map all columns from input to output.
5. Change the datatype for the following Schema Out columns as specified:

Column: Type
ORDERDATE: datetime
SHIPPERADDRESS: varchar(100)
SHIPPERCOUNTRY: varchar(50)
SHIPPERREGION: varchar(50)
SHIPPERPOSTALCODE: varchar(50)

6. For the SHIPPERCOUNTRY column, change the mapping to perform a lookup of CountryName from the Country table in the Alpha datastore. The expression should be as follows:
   lookup_ext([ALPHA.SOURCE.COUNTRY,'PRE_LOAD_CACHE','MAX'],[COUNTRYNAME],[NULL],[COUNTRYID,'=',ORDERS_FILE_WORK.SHIPPERCOUNTRY]) SET ("run_as_separate_process"='no', "output_cols_info"='<?xml version="1.0" encoding="UTF-8"?><output_cols_info><col index="1" expression="no"/></output_cols_info>')
7. For the SHIPPERREGION column, change the mapping to perform a lookup of RegionName from the Region table in the Alpha datastore. The expression should be as follows:
   lookup_ext([ALPHA.SOURCE.REGION,'PRE_LOAD_CACHE','MAX'],[REGIONNAME],[NULL],[REGIONID,'=',ORDERS_FILE_WORK.SHIPPERREGION]) SET ("run_as_separate_process"='no', "output_cols_info"='<?xml version="1.0" encoding="UTF-8"?><output_cols_info><col index="1" expression="no"/></output_cols_info>')
8. In the transform editor for the Query transform connected to the Orders_DB_Work table, map all columns from input to output.
9. Change the datatype for the following Schema Out columns as specified:

Column: Type
ORDER_TAKEN_BY: varchar(15)
ORDER_ASSIGNED_TO: varchar(15)
SHIPPERCITY: varchar(50)
SHIPPERCOUNTRY: varchar(50)
SHIPPERREGION: varchar(50)

10. For the SHIPPERCITY column, change the mapping to perform a lookup of CityName from the City table in the Alpha datastore. The expression should be as follows:
    lookup_ext([ALPHA.SOURCE.CITY,'PRE_LOAD_CACHE','MAX'],[CITYNAME],[NULL],[CITYID,'=',ORDERS_DB_WORK.SHIPPERCITY]) SET ("run_as_separate_process"='no', "output_cols_info"='<?xml version="1.0" encoding="UTF-8"?><output_cols_info><col index="1" expression="no"/></output_cols_info>')
11. For the SHIPPERCOUNTRY column, change the mapping to perform a lookup of CountryName from the Country table in the Alpha datastore. The expression should be as follows:
    lookup_ext([ALPHA.SOURCE.COUNTRY,'PRE_LOAD_CACHE','MAX'],[COUNTRYNAME],[NULL],[COUNTRYID,'=',ORDERS_DB_WORK.SHIPPERCOUNTRY]) SET ("run_as_separate_process"='no', "output_cols_info"='<?xml version="1.0" encoding="UTF-8"?><output_cols_info><col index="1" expression="no"/></output_cols_info>')
12. For the SHIPPERREGION column, change the mapping to perform a lookup of RegionName from the Region table in the Alpha datastore. The expression should be as follows:
    lookup_ext([ALPHA.SOURCE.REGION,'PRE_LOAD_CACHE','MAX'],[REGIONNAME],[NULL],[REGIONID,'=',ORDERS_DB_WORK.SHIPPERREGION]) SET ("run_as_separate_process"='no', "output_cols_info"='<?xml version="1.0" encoding="UTF-8"?><output_cols_info><col index="1" expression="no"/></output_cols_info>')
13. Add a Merge transform to the data flow and connect both Query transforms to the Merge transform.
14. Add a template table called Orders_Merged in the Delta datastore as the target table and connect it to the Merge transform.
15. Execute Alpha_Orders_Merged_Job with the default execution properties and save all objects you have created.
16. View the data in the target table. Note that the SHIPPERCITY, SHIPPERCOUNTRY, and SHIPPERREGION columns for the 363 records in the template table consistently have names rather than ID values.

A solution file called SOLUTION_Merge.atl is included in your Course Resources. To check the solution, import the file and open it to view the data flow design and mapping logic. Do not execute the solution job, as this may override the results in your target table.

Using the Case transform

Introduction

The Case transform supports separating data from a source into multiple targets based on branch logic. After completing this unit, you will be able to:
• Use the Case transform

Explaining the Case transform

You use the Case transform to simplify branch logic in data flows by consolidating case or decision-making logic into one transform. The transform allows you to split a data set into smaller sets based on logical branches. For example, you can use the Case transform to read a table that contains sales revenue facts for different regions and separate the regions into their own tables for more efficient data access.

For more information on the Case transform, see "Transforms", Chapter 5 in the Data Services Reference Guide. The next section gives a brief description of the function, data input requirements, options, and data output results for the Case transform.

Input/Output

Only one data flow source is allowed as a data input for the Case transform. Depending on the data, only one of multiple branches is executed per row. The input and output schemas are identical when using the Case transform. You connect the output of the Case transform to other objects in the workspace. The connections between the Case transform and the objects used for a particular case must be labeled, and each output label in the Case transform must be used at least once.
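To make the sales-by-region example concrete, the labels and expressions in a Case transform might look like the following sketch; the SALES_FACT table, its REGION_ID column, and the label names are hypothetical, and each expression is evaluated for every input row:

    NorthRegion:   SALES_FACT.REGION_ID = 1
    SouthRegion:   SALES_FACT.REGION_ID = 2
    OtherRegions:  (default label used when no other expression is true)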

Options

The Case transform offers several options:

Option: Description

Label: Define the name of the connection that describes where data will go if the corresponding Case condition is true.

Expression: Define the Case expression for the corresponding label. Each label represents a case expression (WHERE clause).

Produce default option with label: Specify that the transform must use the expression in this label when all other Case expressions evaluate to false.

Row can be TRUE for one case only: Specify that the transform passes each row to the first case whose expression returns true.

To create a case statement
1. Open the data flow workspace.
2. Add your source object to the workspace.
3. Add your target objects to the workspace. You will require one target object for each possible condition in the case statement.
4. On the Transforms tab of the Local Object Library, click and drag the Case transform to the workspace to the right of your source object.
5. Connect the source object to the transform.
6. Double-click the Case transform to open the transform editor.

7. In the parameters area of the transform editor, click Add to add a new expression.
8. In the Label field, enter a label for the expression.
9. Click and drag an input schema column to the Expression pane at the bottom of the window.
10. Enter the rest of the expression to define the condition. For example, to specify that you want all Customers with a RegionID of 1, create the following statement:
    Customer.RegionID = 1
11. Repeat step 7 to step 10 for all expressions.
12. To direct records that do not meet any defined conditions to a separate target object, select the Produce default option with label option and enter the label name in the associated field.
13. To direct records that meet multiple conditions to only one target, select the Row can be TRUE for one case only option. In this case, records are placed in the target associated with the first condition that evaluates as true.
14. Click Back to return to the data flow workspace.

15. Connect the transform to the target object.
16. Release the mouse and select the appropriate label for that object from the pop-up menu.
17. Repeat step 15 and step 16 for all target objects.

Activity: Using the Case transform

Once the orders have been validated and merged, the resulting data set must be split out by quarter for reporting purposes.

Objective
• Use the Case transform to create separate tables for orders occurring in fiscal quarters 3 and 4 for the year 2007 and quarter 1 of 2008.

Instructions
1. In the Omega project, create a new batch job called Alpha_Orders_By_Quarter_Job with a data flow named Alpha_Orders_By_Quarter_DF.
2. In the data flow workspace, add the Orders_Merged table from the Delta datastore as the source object.
3. Add a Query transform to the data flow and connect it to the source table.
4. In the transform editor for the Query transform, map all columns from input to output.
5. Add the following two output columns:

Column: Type: Mapping
ORDERQUARTER: int: quarter(orders_merged.ORDERDATE)
ORDERYEAR: varchar(4): to_char(orders_merged.ORDERDATE, 'YYYY')

6. Add a Case transform to the data flow and connect it to the Query transform.
7. In the transform editor for the Case transform, create the following labels and associated expressions:

Label: Expression
Q42006: Query.ORDERYEAR = '2006' and Query.ORDERQUARTER = 4
Q12007: Query.ORDERYEAR = '2007' and Query.ORDERQUARTER = 1
Q22007: Query.ORDERYEAR = '2007' and Query.ORDERQUARTER = 2
Q32007: Query.ORDERYEAR = '2007' and Query.ORDERQUARTER = 3
Q42007: Query.ORDERYEAR = '2007' and Query.ORDERQUARTER = 4

8. Choose the settings to not produce a default output set for the Case transform and to specify that rows can be true for one case only.
9. Add five template tables in the Delta datastore called Orders_Q4_2006, Orders_Q1_2007, Orders_Q2_2007, Orders_Q3_2007, and Orders_Q4_2007.
10. Connect the output from the Case transform to the target tables, selecting the corresponding labels.
11. Execute Alpha_Orders_By_Quarter_Job with the default execution properties and save all objects you have created.
12. View the data in the target tables and confirm that there are 103 orders that were placed in Q1 of 2007.

A solution file called SOLUTION_Case.atl is included in your Course Resources. To check the solution, import the file and open it to view the data flow design and mapping logic. Do not execute the solution job, as this may override the results in your target table.

Using the SQL transform

Introduction
The SQL transform allows you to submit SQL commands to generate data to be moved into target objects. After completing this unit, you will be able to:
• Use the SQL transform

Explaining the SQL transform
Use this transform to perform standard SQL operations when other built-in transforms cannot perform them. The SQL transform can be used to extract data with general select statements as well as through stored procedures and views. The next section gives a brief description of the function, data input requirements, options, and data output results for the SQL transform. For more information on the SQL transform, see "Transforms", Chapter 5 in the Data Services Reference Guide.

You can use the SQL transform as a replacement for the Merge transform when you are dealing with database tables only. The SQL transform performs more efficiently because the merge is pushed down to the database. However, you cannot use this functionality if your source objects include file formats.
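As a sketch of the Merge replacement described above, assuming two source tables with identical column layouts in the same datastore (the table names are illustrative), the SQL text could be:
select * from ORDERS_2006
union all
select * from ORDERS_2007
Here union all keeps duplicate rows, which matches the behavior of the Merge transform; the database performs the union, so the rows never have to be combined inside the job engine.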

Inputs/Outputs
There is no input data set for the SQL transform. There are two ways of defining the output schema for a SQL transform if the SQL submitted is expected to return a result set:
• Automatic: After you type the SQL statement, click Update schema to execute a select statement against the database that obtains the column information returned by the select statement and populates the output schema.
• Manual: Output columns must be defined in the output portion of the SQL transform if the SQL operation is returning a data set. The number of columns defined in the output of the SQL transform must equal the number of columns returned by the SQL query, but the column names and data types of the output columns do not need to match the column names or data types in the SQL query (see the illustration after the procedure below).

Options
The SQL transform has the following options:
• Datastore: Specify the datastore for the tables referred to in the SQL statement.
• Database type: Specify the type of database for the datastore where there are multiple datastore configurations.
• Join rank: Indicate the weight of the output data set if the data set is used in a join. The highest ranked source is accessed first to construct the join.
• Array fetch size: Indicate the number of rows retrieved in a single request to a source database. The default value is 1000.
• Cache: Hold the output from this transform in memory for use in subsequent transforms. Use this only if the data set is small enough to fit in memory.
• SQL text: Enter the text of the SQL query.

To create a SQL statement
1. Open the data flow workspace.
2. Add your target object to the workspace.
3. On the Transforms tab of the Local Object Library, click and drag the SQL transform to the workspace.
4. Double-click the SQL transform to open the transform editor.

5. In the parameters area, select the source datastore from the Datastore drop-down list.
6. If there is more than one datastore configuration, select the appropriate configuration from the Database type drop-down list.
7. Change the other available options, if required.
8. In the SQL text area, enter the SQL statement. For example, to copy the entire contents of a table into the target object, you would use the following statement: Select * from Customers
9. Click Update Schema to update the output schema with the appropriate values. You can also create the output columns manually. If required, you can change the names and datatypes of these columns.
10. Click Back to return to the data flow workspace.
11. Connect the transform to the target object: click and drag from the transform to the target object.
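To illustrate the column-count rule described under Inputs/Outputs, suppose the SQL text were the following (the column names are hypothetical):
select CUSTOMERID, COMPANYNAME from Customers
The output schema must then define exactly two columns. With the Automatic method, Update Schema creates and names them for you; with the Manual method you could call them CUST_ID and CUST_NAME with compatible data types, because the output names and types do not have to match those returned by the query.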

Lesson 5
Setting up Error Handling

Lesson introduction
For sophisticated error handling, you can use recoverable work flows and try/catch blocks to recover data.
• Set up recoverable work flows
• Use recovery mode
• Use try/catch blocks and automatic recovery

Using recovery mechanisms

Introduction
If a Data Services job does not complete properly, you must resolve the problems that prevented the successful execution of the job. After completing this unit, you will be able to:
• Explain how to avoid data recovery situations
• Explain the levels of data recovery strategies
• Recover a failed job using automatic recovery
• Recover missing values and rows
• Define alternative work flows

Avoiding data recovery situations
The best solution to data recovery situations is obviously not to get into them in the first place. Some of those situations are unavoidable, such as server failures. Others, however, can easily be sidestepped by constructing your jobs so that they take into account the issues that frequently cause them to fail. One example is when an external file is required to run a job. In this situation, you could use the wait_for_file function or a while loop and the file_exists function to check that the file exists in a specified location before executing the job.

While loops
The while loop is a single-use object that you can use in a work flow. The while loop repeats a sequence of steps as long as a condition is true. Typically, the steps done during the while loop result in a change in the condition so that the condition is eventually no longer satisfied and the work flow exits from the while loop. If the condition does not change, the while loop does not end.

For example, you might want a work flow to wait until the system writes a particular file. You can use a while loop to check for the existence of the file using the file_exists function. As long as the file does not exist, you can have the work flow go into sleep mode for a particular length of time before checking again. Because the system might never write the file, you must add another check to the loop, such as a counter, to ensure that the while loop eventually exits. In other words, change the while loop to check for the existence of the file and the value of the counter. As long as the file does not exist and the counter is less than a particular value, repeat the while loop. In each iteration of the loop, put the work flow in sleep mode and then increment the counter.
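A minimal sketch of this pattern follows. The file path, the 60-second sleep interval, the limit of 10 attempts, and the $G_wait_count global variable are assumptions for illustration only; initialize $G_wait_count to 0 in a script before the loop. The while loop condition could be:
file_exists('C:/data/incoming/orders.txt') = 0 and $G_wait_count < 10
and the script inside the while loop could contain:
sleep(60000);                           # pause for 60 seconds (the value is in milliseconds)
$G_wait_count = $G_wait_count + 1;      # count the attempts so the loop eventually exits
This checks for the file roughly once a minute and gives up after about ten attempts, so the work flow can continue or fail gracefully instead of looping forever.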

Describing levels of data recovery strategies
When a job fails to complete successfully during execution, some data flows may not have completed. When this happens, some tables may have been loaded, partially loaded, or altered. You need to design your data movement jobs so that you can recover your data by rerunning the job and retrieving all the data without introducing duplicate or missing data.

There are different levels of data recovery and recovery strategies. You can:
• Recover your entire database: Use your standard RDBMS services to restore crashed data to an entire database. This option is outside of the scope of this course.
• Recover a partially-loaded job: Use automatic recovery.
• Recover from partially-loaded tables: Use the Table Comparison transform, use the auto-correct load feature, include a preload SQL command to avoid duplicate loading of rows when recovering from partially loaded tables, do a full replacement of the target, or use overflow files to manage rows that could not be inserted.
• Recover missing values or rows: Use the Validation transform or the Query transform with WHERE clauses to identify missing values.
• Define alternative work flows: Use conditionals, try/catch blocks, and scripts to ensure all exceptions are managed in a work flow.
Depending on the relationships between data flows in your application, you may use a combination of these techniques to recover from exceptions.

Note: It is important to note that some recovery mechanisms are for use in production systems and are not supported in development environments.

Configuring work flows and data flows
In some cases, steps in a work flow depend on each other and must be executed together. When there is a dependency like this, you should designate the work flow as a recovery unit. This requires the entire work flow to complete successfully. If the work flow does not complete successfully, Data Services executes the entire work flow during recovery, including the steps that executed successfully in prior work flow runs.

Conversely, you may need to specify that a work flow or data flow should only execute once. When this setting is enabled, the job never re-executes that object. It is not recommended to mark a work flow or data flow as "Execute only once" if the parent work flow is a recovery unit.

To specify a work flow as a recovery unit
1. In the project area or on the Work Flows tab of the Local Object Library, right-click the work flow and select Properties from the menu. The Properties dialog box displays.
2. On the General tab, select the Recover as a unit check box.
3. Click OK.

To specify that an object executes only once
1. In the project area or on the appropriate tab of the Local Object Library, right-click the work flow or data flow and select Properties from the menu. The Properties dialog box displays.
2. On the General tab, select the Execute only once check box.
3. Click OK.

Using recovery mode
If a job with automated recovery enabled fails during execution, you can execute the job again in recovery mode. During recovery mode, Data Services retrieves the results for successfully-completed steps and reruns uncompleted or failed steps under the same conditions as the original job. In recovery mode, Data Services executes the steps or recovery units that did not complete successfully in a previous execution. This includes steps that failed and steps that generated an exception but completed successfully, such as those in a try/catch block. As in normal job execution, Data Services executes the steps in parallel if they are not connected in the work flow diagrams and in serial if they are connected.

For example, suppose a daily update job running overnight successfully loads dimension tables in a warehouse. However, while the job is running, the database log overflows and stops the job from loading fact tables. The next day, you truncate the log file and run the job again in recovery mode. The recovery job does not reload the dimension tables because the original job, even though it failed, successfully loaded the dimension tables.

To ensure that the fact tables are loaded with the data that corresponds properly to the data already loaded in the dimension tables, ensure the following:
• Your recovery job must use the same extraction criteria that your original job used when loading the dimension tables. If your recovery job uses new extraction criteria, such as basing data extraction on the current system date, the data in the fact tables will not correspond to the data previously extracted into the dimension tables.
• Your recovery job must follow the exact execution path that the original job followed. Data Services records any external inputs to the original job so that your recovery job can use these stored values and follow the same execution path. If your recovery job uses new values, the job execution may follow a completely different path through conditional steps or try/catch blocks.

To enable automatic recovery in a job
1. In the project area, right-click the job and select Execute from the menu. The Execution Properties dialog box displays.
2. On the Parameters tab, select the Enable recovery check box.

3. Click OK.
If this check box is not selected, Data Services does not record the results from the steps during the job and cannot recover the job if it fails.

To recover from last execution
1. In the project area, right-click the job that failed and select Execute from the menu. The Execution Properties dialog box displays.
2. On the Parameters tab, select the Recover from last execution check box. This option is not available when a job has not yet been executed, the previous job run succeeded, or recovery mode was disabled during the previous run.
3. Click OK.

Recovering from partially-loaded data
Executing a failed job again may result in duplication of rows that were loaded successfully during the first job run. Within your recoverable work flow, you can use several methods to ensure that you do not insert duplicate rows:
• Include the Table Comparison transform (available in Data Integrator packages only) in your data flow when you have tables with more rows and fewer fields, such as fact tables.
• Change the target table options to use the auto-correct load feature when you have tables with fewer rows and more fields, such as dimension tables. The auto-correct load checks the target table for existing rows before adding new rows to the table. Using the auto-correct load option, however, can slow jobs executed in non-recovery mode. Consider this technique when the target table is large and the changes to the table are relatively few.
• Include a SQL command to execute before the table loads. Preload SQL commands can remove partial database updates that occur during incomplete execution of a step in a job. Typically, the preload SQL command deletes rows based on a variable that is set before the partial insertion step began (see the sketch after this list). For more information on preloading SQL commands, see "Using preload SQL to allow re-executable Data Flows", Chapter 18 in the Data Services Designer Guide.
• Change the target table options to completely replace the target table during each execution. This technique can be optimal when the changes to the target table are numerous compared to the size of the table.
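As a sketch of such a preload SQL command, assume a hypothetical SALES_FACT target with a LOAD_DATE column and a $G_load_date global variable set by a script before the load step; none of these names come from the course data. The command removes any rows left behind by the failed run so they can be reloaded cleanly:
delete from SALES_FACT where LOAD_DATE = {$G_load_date}
The curly-brace substitution inserts the variable value in quotes; confirm the substitution syntax and the date format against your own environment before relying on it.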

Recovering missing values or rows
Missing values that are introduced into the target data during data integration and data quality processes can be managed using the Validation or Query transforms. Missing rows are rows that cannot be inserted into the target table. For example, rows may be missing in instances where a primary key constraint is violated. Overflow files help you process this type of data problem.

When you specify an overflow file and Data Services cannot load a row into a table, Data Services writes the row to the overflow file instead. The trace log indicates the data flow in which the load failed and the location of the file. You can use the overflow information to identify invalid data in your source or problems introduced in the data movement. Every new run will overwrite the existing overflow file.

To use an overflow file in a job
1. Open the target table editor for the target table in your data flow.
2. On the Options tab, under Error handling, select the Use overflow file check box.
3. In the File name field, enter or browse to the full path and file name for the file. When you specify an overflow file, give a full path name to ensure that Data Services creates a unique file when more than one file is created in the same job.
4. In the File format drop-down list, select what you want Data Services to write to the file about the rows that failed to load:
• If you select Write data, you can use Data Services to specify the format of the error-causing records in the overflow file.
• If you select Write sql, you can use the commands to load the target manually when the target is accessible.

Defining alternative work flows
You can set up your jobs to use alternative work flows that cover all possible exceptions and have recovery mechanisms built in. This technique allows you to automate the process of recovering your results. Alternative work flows consist of several components:
1. A script to determine if recovery is required. This script reads the value in a status table and populates a global variable with the same value. The initial value in the table is set to indicate that recovery is not required.
2. A conditional that calls the appropriate work flow based on whether recovery is required. The conditional contains an If/Then/Else statement to specify that work flows that do not require recovery are processed one way, and those that do require recovery are processed another way.
3. A work flow with a try/catch block to execute a data flow without recovery. The data flow where recovery is not required is set up without the auto correct load option set. This ensures that, wherever possible, the data flow is executed in a less resource-intensive mode.
4. A script in the catch object to update the status table. The script specifies that recovery is required if any exceptions are generated.
5. A work flow to execute a data flow with recovery and a script to update the status table. The data flow is set up for more resource-intensive processing that will resolve the exceptions. The script updates the status table to indicate that recovery is not required.
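The recovery status table referenced in these components (and in the procedure later in this lesson) can be very simple. A minimal sketch in SQL, assuming the single-column layout implied by the recovery_flag expressions used below and with data types adjusted to your RDBMS:
create table recovery_status (recovery_flag int);
insert into recovery_status values (0);   -- 0 = recovery not required, 1 = recovery required
Seeding the row with 0 matches the component description above: the initial value indicates that recovery is not required.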

Conditionals
Conditionals are single-use objects used to implement conditional logic in a work flow. When you define a conditional, you must specify a condition and two logical branches:
• If: A Boolean expression that evaluates to TRUE or FALSE. You can use functions, variables, and standard operators to construct the expression.
• Then: Work flow element to execute if the If expression evaluates to TRUE.
• Else: Work flow element to execute if the If expression evaluates to FALSE.

Both the Then and Else branches of the conditional can contain any object that you can have in a work flow, including other work flows, data flows, nested conditionals, try/catch blocks, scripts, and so on.
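For example, the If expression can test a global variable populated by an earlier script, as in the alternative work flow procedure later in this lesson:
$G_recovery_needed = 0
When the expression evaluates to TRUE, the Then branch (the non-recovery work flow) runs; otherwise the Else branch runs.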

Try/Catch Blocks
A try/catch block allows you to specify alternative work flows if errors occur during job execution. Try/catch blocks catch classes of errors, apply solutions that you provide, and continue execution. For each catch in the try/catch block, you will specify: • One exception or group of exceptions handled by the catch. To handle more than one exception or group of exceptions, add more catches to the try/catch block. • The work flow to execute if the indicated exception occurs. Use an existing work flow or define a work flow in the catch editor. If an exception is thrown during the execution of a try/catch block, and if no catch is looking for that exception, then the exception is handled by normal error logic.

Using try/catch blocks and automatic recovery
Data Services does not save the result of a try/catch block for re-use during recovery. If an exception is thrown inside a try/catch block, during recovery Data Services executes the step that threw the exception and subsequent steps. Because the execution path through the try/catch block might be different in the recovered job, using variables set in the try/catch block could alter the results during automatic recovery.

For example, suppose you create a job that defines the value of variable $I within a try/catch block. If an exception occurs, you set an alternate value for $I. Subsequent steps are based on the new value of $I.

During the first job execution, the first work flow contains an error that generates an exception, which is caught. However, the job fails in the subsequent work flow.

You fix the error and run the job in recovery mode. During the recovery execution, the first work flow no longer generates the exception. Thus the value of variable $I is different, and the job selects a different subsequent work flow, producing different results.

To ensure proper results with automatic recovery when a job contains a try/catch block, do not use values set inside the try/catch block or reference output variables from a try/catch block in any subsequent steps.

To create an alternative work flow
1. Create a job.
2. Add a global variable to your job called $G_recovery_needed with a datatype of int. The purpose of this global variable is to store a flag that indicates whether or not recovery is needed. This flag is based on the value in a recovery status table, which contains a flag of 1 or 0, depending on whether recovery is needed.
3. In the job workspace, add a work flow using the tool palette.
4. In the work flow workspace, add a script called GetStatus using the tool palette.
5. In the script workspace, construct an expression to update the value of the $G_recovery_needed global variable to the same value as is in the recovery status table. The script content depends on the RDBMS on which the status table resides. The following is an example of the expression:
$G_recovery_needed = sql('DEMO_Target', 'select recovery_flag from recovery_status');

6. Return to the work flow workspace.
7. Add a conditional to the workspace using the tool palette and connect it to the script.
8. Open the conditional. The transform editor for the conditional allows you to specify the IF expression and Then/Else branches.

9. In the IF field, enter the expression that evaluates whether recovery is required. The following is an example of the expression: $G_recovery_needed = 0
This means the objects in the Then pane will run if recovery is not required. If recovery is needed, the objects in the Else pane will run.
10. Add a try object to the Then pane of the transform editor using the tool palette.
11. In the Local Object Library, click and drag a work flow or data flow to the Then pane after the try object.
12. Add a catch object to the Then pane after the work flow or data flow using the tool palette.
13. Connect the objects in the Then pane.
14. Open the workspace for the catch object. By default, Data Services catches all exceptions. All exception types are listed in the Available exceptions pane.
15. To change which exceptions act as triggers, expand the tree in the Available exceptions pane, select the appropriate exceptions, and click Set to move them to the Trigger on these exceptions pane.
16. Add a script called Fail to the lower pane using the tool palette. This object will be executed if there are any exceptions. If desired, you can add a data flow here instead of a script.
17. In the script workspace, construct an expression to update the flag in the recovery status table to 1, indicating that recovery is needed. The script content depends on the RDBMS on which the status table resides. The following is an example of the expression: sql('DEMO_Target', 'update recovery_status set recovery_flag = 1');
18. Return to the conditional workspace.
19. In the Local Object Library, click and drag the work flow or data flow that represents the recovery process to the Else pane.
20. Add a script called Pass to the lower pane using the tool palette.
21. In the script workspace, construct an expression to update the flag in the recovery status table to 0, indicating that recovery is not needed. The script content depends on the RDBMS on which the status table resides. The following is an example of the expression: sql('DEMO_Target', 'update recovery_status set recovery_flag = 0');
22. Connect the objects in the Else pane. This combination means that if recovery is not needed, the first object will be executed; if recovery is required, the second object will be executed.
23. Return to the conditional workspace.
24. Validate and save all objects.
25. Execute the job. The first time this job is executed, the job succeeds because the recovery_flag value in the status table is set to 0 and the target table is empty.
26. Check the contents of the status table. The recovery_flag field contains a value of 0.
27. Execute the job again. The second time this job is executed, the job fails because the target table already contains records, so there is a primary key exception.
28. Check the contents of the status table again. The recovery_flag field now contains a value of 1.
29. Execute the job again. The third time this job is executed, the version of the data flow with the Auto correct load option selected runs because the recovery_flag value in the status table is set to 1. The job succeeds because the auto correct load feature checks for existing values before trying to insert new rows, so there is no primary key conflict.
