1: Core Concepts
Lesson 1: Describing Data Services

Lesson introduction
Data Services is a graphical interface for creating and staging jobs for data integration and data quality purposes.
After completing this lesson, you will be able to:
• Describe the purpose of Data Services
• Describe Data Services architecture
Describing the purpose of Data Services
BusinessObjects Data Services provides a graphical interface that allows you to easily create jobs that extract data from heterogeneous sources, transform that data to meet the business requirements of your organization, and load the data into a single location.
Describing Data Services benefits
The BusinessObjects Data Services platform enables you to perform enterprise-level data integration and data quality functions. With Data Services, your enterprise can:
• Create a single infrastructure for data movement to enable faster and lower-cost implementation.
• Manage data as a corporate asset independent of any single system.
• Integrate data across many systems and re-use that data for many purposes.
• Improve performance.
• Reduce the burden on enterprise systems.
• Prepackage data solutions for fast deployment and quick return on investment (ROI).
• Cleanse customer and operational data anywhere across the enterprise.
• Enhance customer and operational data by appending additional information to increase the value of the data.
• Match and consolidate data at multiple levels within a single pass for individuals, households, or corporations.
Understanding data integration processes
Data Services combines both batch and real-time data movement and management with intelligent caching to provide a single data integration platform for information management from any information source and for any information use. This unique combination allows you to:
• Stage data in an operational datastore, data warehouse, or data mart.
• Update staged data in batch or real-time modes.
• Create a single environment for developing, testing, and deploying the entire data integration platform.
• Manage a single metadata repository to capture the relationships between different extraction and access methods and provide integrated lineage and impact analysis.

Data Services performs three key functions that can be combined to create a scalable, high-performance data platform. It:
• Loads Enterprise Resource Planning (ERP) or enterprise application data into an operational datastore (ODS) or analytical data warehouse, and updates it in batch or real-time modes.
• Creates routing requests to a data warehouse or ERP system using complex rules.
• Applies transactions against ERP systems.

Data mapping and transformation can be defined using the Data Services Designer graphical user interface. Data Services automatically generates the appropriate interface calls to access the data in the source system.
Describing Data Services architecture

Introduction
Data Services relies on several unique components to accomplish the data integration and data quality activities required to manage your corporate data. After completing this unit, you will be able to:
• Describe standard Data Services components
• Describe Data Services management tools

Defining Data Services components
Data Services includes the following standard components:
• Designer
• Repository
• Job Server
• Engines
• Access Server
• Adapters
• Real-time Services
• Address Server
• Cleansing Packages, Dictionaries, and Directories
• Management Console

This diagram illustrates the relationships between these components:
Describing the Designer
Data Services Designer is a Windows client application used to create, test, and manually execute jobs that transform data and populate a data warehouse. Using Designer, you create data management applications that consist of data mappings, transformations, and control logic. You can create objects that represent data sources, and then drag, drop, and configure them in flow diagrams. Designer allows you to manage metadata stored in a local repository. From the Designer, you can also trigger the Job Server to run your jobs for initial application testing.

To log in to Designer
1. From the Start menu, click Programs > BusinessObjects XI 3.0/3.1 > BusinessObjects Data Services > Data Services Designer to launch Designer. The path may be different, depending on how the product was installed.
2. In the BusinessObjects Data Services Repository Login dialog box, enter the connection information for the local repository.
3. Click OK.
4. To verify the Job Server is running in Designer, hover the cursor over the Job Server icon in the bottom right corner of the screen. The details for the Job Server display in the status bar in the lower left portion of the screen.
Describing the repository
The Data Services repository is a set of tables that holds user-created and predefined system objects, source and target metadata, and transformation rules. It is set up on an open client/server platform to facilitate sharing metadata with other enterprise tools. Each repository is stored on an existing Relational Database Management System (RDBMS), and each repository is associated with one or more Data Services Job Servers. There are three types of repositories:
• A local repository (known in Designer as the Local Object Library) is used by an application designer to store definitions of source and target metadata and Data Services objects.
• A central repository (known in Designer as the Central Object Library) is an optional component that can be used to support multi-user development. The Central Object Library provides a shared library that allows developers to check objects in and out for development.
• A profiler repository is used to store information that is used to determine the quality of data.

To create a local repository
1. From the Start menu, click Programs > BusinessObjects XI 3.0/3.1 > BusinessObjects Data Services > Data Services Repository Manager to launch the Repository Manager. The path may be different, depending on how the product was installed.
2. In the BusinessObjects Data Services Repository Manager dialog box, enter the connection information for the local repository.
3. Click Create. If the repository already exists, you may need to confirm that you want to overwrite it. If you select the Show Details check box, you can see the SQL that is applied to create the repository. System messages confirm that the local repository is created.
4. To see the version of the repository, click Get Version. The version displays in the pane at the bottom of the dialog box. Note that the version number refers only to the last major point release number.
5. Click Close.

Describing the Job Server
Each repository is associated with at least one Data Services Job Server, which retrieves the job from its associated repository and starts the data movement engine. The data movement engine integrates data from multiple heterogeneous sources, performs complex data transformations, and manages extractions and transactions from ERP systems and other sources. The Job Server can move data in batch or real-time mode and uses distributed query optimization, multithreading, in-memory caching, in-memory data transformations, and parallel processing to deliver high data throughput and scalability. While designing a job, you can run it from the Designer. In your production environment, the Job Server runs jobs triggered by a scheduler or by a real-time service managed by the Data Services Access Server.

In production environments, you can balance job loads by creating a Job Server Group (multiple Job Servers), which executes jobs according to overall system load. Data Services provides distributed processing capabilities through Server Groups. A Server Group is a collection of Job Servers that each reside on different Data Services server computers. Each Data Services server can contribute one, and only one, Job Server to a specific Server Group. Each Job Server collects resource utilization information for its computer. This information is utilized by Data Services to determine where a job, data flow, or sub-data flow (depending on the distribution level specified) should be executed.

To verify the connection between repository and Job Server
1. From the Start menu, click Programs > BusinessObjects XI 3.0/3.1 > BusinessObjects Data Services > Data Services Server Manager to launch the Server Manager. The path may be different, depending on how the product was installed.
2. In the BusinessObjects Data Services Server Manager dialog box, click Edit Job Server Config.
3. In the Job Server Configuration Editor dialog box, select the Job Server.
4. Click Resync with Repository.
5. In the Job Server Properties dialog box, select the repository.
6. Click Resync.
7. In the Password field, enter the password for the repository.
8. Click OK. A system message displays indicating that the Job Server will be resynchronized with the selected repository.
9. Click OK to acknowledge the warning message.
10. Click OK to close the Job Server Properties dialog box.
11. Click OK to close the Job Server Configuration Editor dialog box.
12. Click Apply.
13. In the BusinessObjects Data Services Server Manager dialog box, click Restart to restart the Job Server. A system message displays indicating that the Job Server will be restarted.

Describing the engines
When Data Services jobs are executed, the Job Server starts Data Services engine processes to perform data extraction, transformation, and movement. Data Services engine processes use parallel processing and in-memory data transformations to deliver high data throughput and scalability.

Describing the Access Server
The Access Server is a real-time, request-reply message broker that collects incoming XML message requests, routes them to a real-time service, and delivers a message reply within a user-specified time frame. The Access Server queues messages and sends them to the next available real-time service across any number of computing resources. This approach provides automatic scalability because the Access Server can initiate additional real-time services on additional computing resources if traffic for a given real-time service is high. You can configure multiple Access Servers.

Describing the adapters
Adapters are additional Java-based programs that can be installed on the Job Server to provide connectivity to other systems such as Salesforce.com or the Java Messaging Queue. There is also a Software Development Kit (SDK) to allow customers to create adapters for custom applications.

Describing the real-time services
The Data Services real-time client communicates with the Access Server when processing real-time jobs. Real-time services are configured in the Data Services Management Console.
Describing the Address Server
The Address Server is used specifically for processing European addresses using the Data Quality Global Address Cleanse transform. It provides access to detailed address line information for most European countries.

Describing the Cleansing Packages, dictionaries, and directories
The Data Quality Cleansing Packages, dictionaries, and directories provide referential data for the Data Cleanse and Address Cleanse transforms to use when parsing, standardizing, and cleansing name and address data. Cleansing Packages are packages that enhance the ability of Data Cleanse to accurately process various forms of global data by including language-specific reference data and parsing rules. Dictionary files are used to identify, parse, and standardize data such as names, titles, and firm data. Dictionaries also contain acronym, match standard, gender, capitalization, and address information. Directories provide information on addresses from postal authorities.

Describing the Management Console
The Data Services Management Console provides access to the following features:
• Administrator
• Auto Documentation
• Data Validation
• Impact and Lineage Analysis
• Operational Dashboard
• Data Quality Reports

Administrator
Administer Data Services resources, including:
• Scheduling, monitoring, and executing batch jobs
• Configuring, starting, and stopping real-time services
• Configuring Job Server, Access Server, and repository usage
• Configuring and managing adapters
• Managing users
• Publishing batch jobs and real-time services via web services
• Reporting on metadata

Auto Documentation
View, analyze, and print graphical representations of all objects as depicted in Data Services Designer, including their relationships, properties, and more.
Data Validation
Evaluate the reliability of your target data based on the validation rules you create in your Data Services batch jobs in order to quickly review, assess, and identify potential inconsistencies or errors in source data.

Impact and Lineage Analysis
Analyze end-to-end impact and lineage for Data Services tables and columns and BusinessObjects Enterprise objects such as universes, business views, and reports.

Operational Dashboard
View dashboards of status and performance execution statistics of Data Services jobs for one or more repositories over a given time period.

Data Quality Reports
Use data quality reports to view and export Crystal reports for batch and real-time jobs that include statistics-generating transforms. Report types include job summaries, transform-specific reports, and transform group reports. To generate reports for the Match, US Regulatory Address Cleanse, and Global Address Cleanse transforms, you must enable the Generate report data option in the Transform Editor.

Defining other Data Services tools
There are also several tools to assist you in managing your Data Services installation.

Describing the License Manager
The License Manager displays the Data Services components for which you currently have a license.

Describing the Repository Manager
The Data Services Repository Manager allows you to create, upgrade, and check the versions of local, central, and profiler repositories.

Describing the Server Manager
The Data Services Server Manager allows you to add, delete, or edit the properties of Job Servers. It is automatically installed on each computer on which you install a Job Server. Use the Server Manager to define links between Job Servers and repositories. You can link multiple Job Servers on different machines to a single repository (for load balancing), or link each Job Server to multiple repositories (with one default) to support individual repositories (for example, separating test and production environments).
Describing the Metadata Integrator
The Metadata Integrator allows Data Services to seamlessly share metadata with BusinessObjects Intelligence products. Run the Metadata Integrator to collect metadata into the Data Services repository for Business Views and Universes used by Crystal Reports, Desktop Intelligence documents, and Web Intelligence documents.
Lesson 2: Defining Source and Target Metadata

Lesson introduction
To define data movement requirements in Data Services, you must import source and target metadata. After completing this lesson, you will be able to:
• Use datastores
• Use datastore and system configurations
• Define file formats for flat files
• Define file formats for Excel files
Using datastores

Introduction
Datastores represent connections between Data Services and databases or applications. Data Services uses these datastores to read data from source tables or load data to target tables. After completing this unit, you will be able to:
• Explain datastores
• Create a database datastore
• Change a datastore definition
• Import metadata

Explaining datastores
A datastore provides a connection or multiple connections to data sources such as a database. Through the datastore connection, Data Services can import the metadata that describes the data from the data source. Each source or target must be defined individually, and the datastore options available depend on which Relational Database Management System (RDBMS) or application is used for the datastore. When your database or application changes, you must make corresponding changes in the datastore information in Data Services; Data Services does not automatically detect structural changes to the datastore.

There are three kinds of datastores:
• Database datastores: provide a simple way to import metadata directly from an RDBMS. Database datastores can be created for the following sources:
  • IBM DB2, Microsoft SQL Server, Oracle, Sybase, and Teradata databases (using native connections)
  • Other databases (through ODBC)
  • A simple memory storage mechanism using a memory datastore
  • IMS, VSAM, and various additional legacy systems using BusinessObjects Data Services Mainframe Interfaces such as Attunity and IBM Connectors
• Application datastores: let users easily import metadata from most Enterprise Resource Planning (ERP) systems.
• Adapter datastores: can provide access to an application's data and metadata or just metadata. For example, if the data source is SQL-compatible, the adapter might be designed to access metadata, while Data Services extracts data from or loads data directly to the application.

The specific information that a datastore contains depends on the connection.
Using adapters
Adapters provide access to a third-party application's data and metadata. Depending on the adapter implementation, adapters can provide:
• Application metadata browsing
• Application metadata importing into the Data Services repository

For batch and real-time data movement between Data Services and applications, Business Objects offers an Adapter Software Development Kit (SDK) to develop your own custom adapters. You can also buy Data Services prepackaged adapters to access application data and metadata in any application. For more information on these adapters, see Chapter 5 in the Data Services Designer Guide. You can use the Data Mart Accelerator for Crystal Reports adapter to import metadata from BusinessObjects Enterprise. See the documentation folder under Adapters located in your Data Services installation for more information on the Data Mart Accelerator for Crystal Reports.

Creating a database datastore
You need to create at least one datastore for each database or file system with which you are exchanging data. To create a datastore, you must have appropriate access privileges to the database or file system that the datastore describes. If you do not have access, ask your database administrator to create an account for you. The values you select for the datastore type and database type determine the options available when you create a database datastore. The entries that you must make to create a datastore depend on the selections you make for these two options. Note that if you are using MySQL, any ODBC connection provides access to all of the available MySQL schemas.

To create a database datastore
1. On the Datastores tab of the Local Object Library, right-click the white space and select New from the menu. The Create New Datastore dialog box displays.
2. In the Datastore name field, enter the name of the new datastore. The name can contain any alphanumeric characters or underscores (_). It cannot contain spaces.
3. In the Datastore Type drop-down list, ensure that the default value of Database is selected.
4. In the Database type drop-down list, select the RDBMS for the data source.
5. Leave the Enable automatic data transfer check box selected.
6. Enter the other connection details, as required.
7. Click OK.
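For example, the completed dialog box for a SQL Server source might contain values along these lines (the names shown here are illustrative only, not prescribed by Data Services):

  Datastore name: DS_Sales_Source
  Datastore type: Database
  Database type: Microsoft SQL Server
  Database server name: dbserver01
  Database name: SALES
  User name and password: the account created for you by your database administrator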
Changing a datastore definition
Like all Data Services objects, datastores are defined by both options and properties:
• Options control the operation of objects. For database datastores, these include the database server name, database name, user name, and password for the specific database, and the database version.
• Properties document the object. For example, the name of the datastore and the date on which it was created are datastore properties. Properties are descriptive of the object and do not affect its operation.

The Edit Datastore dialog box allows you to edit all connection properties except datastore name and datastore type for adapter and application datastores. For database datastores, you can edit all connection properties except datastore name, datastore type, database type, and, if available, database version.

The datastore Properties dialog box includes the following tabs:
• General: Contains the name and description of the datastore. The datastore name appears on the object in the Local Object Library and in calls to the object. You cannot change the name of a datastore after creation.
• Attributes: Includes the date you created the datastore. This value cannot be changed.
• Class Attributes: Includes overall datastore information such as description and date created.
To change datastore options
1. On the Datastores tab of the Local Object Library, right-click the datastore name and select Edit from the menu. The Edit Datastore dialog box displays the connection information.
2. Change the database server name, database name, username, and password options, as required.
3. Click OK. The changes take effect immediately.

To change datastore properties
1. On the Datastores tab of the Local Object Library, right-click the datastore name and select Properties from the menu. The Properties dialog box lists the datastore's description, attributes, and class attributes.
2. Change the datastore properties, as required.
3. Click OK.

Importing metadata from data sources
Data Services determines and stores a specific set of metadata information for tables. After importing metadata, you can edit column names, descriptions, and datatypes. The edits are propagated to all objects that call these objects. You can import metadata by name, by searching, and by browsing. Data Services imports the following metadata:
• Table name: The name of the table as it appears in the database.
• Table description: The description of the table.
• Column name: The name of the table column.
• Column description: The description of the column.
• Column datatype: The datatype for each column. If a column is defined as an unsupported datatype (see the supported datatypes listed below), Data Services converts the datatype to one that is supported. In some cases, if Data Services cannot convert the datatype, it ignores the column entirely.
The following datatypes are supported: BLOB, CLOB, date, datetime, decimal, double, int, interval, long, numeric, real, time, timestamp, and varchar.
• Primary key column: The column that comprises the primary key for the table. After a table has been added to a data flow diagram, this column is indicated in the column list by a key icon next to the column name.
• Table attribute: Information Data Services records about the table, such as the date created and date modified, if these values are available.
• Owner name: Name of the table owner.

You can also import stored procedures from DB2, MS SQL Server, Oracle, and Sybase databases, and stored functions and packages from Oracle. You can use these functions and procedures in the extraction specifications you give Data Services. Information that is imported for functions includes:
• Function parameters
• Return type
• Name
• Owner

Imported functions and procedures appear in the Function branch of each datastore tree on the Datastores tab of the Local Object Library. You can configure imported functions and procedures through the Function Wizard and the Smart Editor in a category identified by the datastore name.

Importing metadata by browsing
The easiest way to import metadata is by browsing. Note that functions cannot be imported using this method. For more information on importing by searching and importing by name, see "Ways of importing metadata" in Chapter 5 of the Data Services Designer Guide.

To import metadata by browsing
1. On the Datastores tab of the Local Object Library, right-click the datastore and select Open from the menu. The items available to import appear in the workspace.
2. Navigate to and select the tables for which you want to import metadata.
You can hold down the Ctrl or Shift keys and click to select multiple tables. The workspace contains columns that indicate whether the table has already been imported into Data Services (Imported) and whether the table schema has changed since it was imported (Changed). To verify whether the repository contains the most recent metadata for an object, right-click the object and select Reconcile.
3. Right-click the selected items and select Import from the menu.
4. In the Local Object Library, expand the datastore to display the list of imported objects, organized into Functions, Tables, and Template Tables.
5. To view data for an imported datastore, right-click a table and select View Data from the menu.
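As an illustration, the metadata imported for one source table might include values like these (the table, owner, and column names are hypothetical):

  Table name: CUSTOMER (owner name: DBO)
  Columns: CUST_ID int (primary key, shown with a key icon in the column list), LAST_NAME varchar(40), CITY varchar(30), CREATED datetime
  Table attributes: date created and date modified, if the source database provides them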
Defining file formats for flat files

Introduction
File formats are connections to flat files in the same way that datastores are connections to databases. After completing this unit, you will be able to:
• Explain file formats
• Create a file format for a flat file

Explaining file formats
A file format is a generic description that can be used to describe one file or multiple data files if they share the same format. It is a set of properties describing the structure of a flat file (ASCII). File formats are used to connect to source or target data when the data is stored in a flat file. The Local Object Library stores file format templates that you use to define specific file formats as sources and targets in data flows. File format objects can describe files in:
• Delimited format: delimiter characters such as commas or tabs separate each field.
• Fixed width format: the fixed column width is specified by the user.
• SAP R/3 format: this is used with the predefined Transport_Format or with a custom SAP R/3 format.

Creating file formats
Use the file format editor to set properties for file format templates and source and target file formats. The file format editor has three work areas:
• Property Values: Edit file format property values.
• Column Attributes: Edit and define columns or fields in the file.
• Data Preview: View how the settings affect sample data.
The properties and appearance of the work areas vary with the format of the file. Expand and collapse the property groups by clicking the leading plus or minus.

Date formats
In the Property Values work area, you can override default date formats for files at the field level. Field-specific formats override the default format set in the Property Values work area. The following date format codes can be used:
DD: 2-digit day of the month
MM: 2-digit month
MONTH: Full name of the month
MON: 3-character name of the month
YY: 2-digit year
YYYY: 4-digit year
HH24: 2-digit hour of the day (0-23)
MI: 2-digit minute (0-59)
SS: 2-digit second (0-59)
FF: Up to 9-digit sub-seconds

To create a new file format
1. On the Formats tab of the Local Object Library, right-click Flat Files and select New from the menu to open the File Format Editor.
2. In the Name field, enter a name that describes this file format template. Once the name has been created, it cannot be changed. If an error is made, the file format must be deleted and a new format created.
3. In the Type field, specify the file type:
• Delimited: select this file type if the file uses a character sequence to separate columns.
• Fixed width: select this file type if the file uses specified widths for each column. If a fixed-width file format uses a multi-byte code page, then no data is displayed in the Data Preview section of the file format editor for its files.
4. Specify the location information of the data file, including Location, Root directory, and File name. By substituting a wild card character or list of file names for the single file name, multiple files can be read. The Group File Read can read multiple flat files with identical formats through a single file format.
5. Overwrite the existing schema as required. This happens automatically when you open a file; click Yes to overwrite the existing schema.
6. Complete the other properties to describe files that this template represents. To make sure your file format definition works properly, it is important to finish inputting the values for the file properties before moving on to the Column Attributes work area.
7. For source files, specify the structure of each column in the Column Attributes work area as follows:
• Field Name: Enter the name of the column.
• Data Type: Select the appropriate datatype from the drop-down list.
• Field Size: For columns with a datatype of varchar, specify the length of the field.
• Precision: For columns with a datatype of decimal or numeric, specify the precision of the field.
• Scale: For columns with a datatype of decimal or numeric, specify the scale of the field.
• Format: For columns with any datatype but varchar, select a format for the field, if desired. This information overrides the default format set in the Property Values work area for that datatype.
You do not need to specify columns for files used as targets. If you do specify columns and they do not match the output schema from the preceding transform, Data Services writes to the target file using the transform's output schema. For a decimal or real datatype, if you only specify a source column format and the column names and datatypes in the target schema do not match those in the source schema, Data Services cannot use the source column format specified. Instead, it defaults to the format used by the code page on the computer where the Job Server is installed.
8. Click Save & Close to save the file format and close the file format editor.
9. In the Local Object Library, right-click the file format and select View Data from the menu to see the data.
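To make the settings concrete, here is a small hypothetical delimited file and the column attributes that could describe it (the file name, field names, and values are illustrative only):

  orders.txt (comma-delimited; the first row contains the field names):
  ORDER_ID,CUSTOMER,ORDER_DATE,AMOUNT
  1001,Baker,25.03.2009,149.90
  1002,Chan,26.03.2009,88.50

Possible column attributes: ORDER_ID as int; CUSTOMER as varchar with field size 30; ORDER_DATE as date with the field-level format DD.MM.YYYY (overriding the default date format); AMOUNT as decimal with precision 10 and scale 2.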
Defining file formats for Excel files

Introduction
You can create file formats for Excel files in the same way that you would for flat files. After completing this unit, you will be able to:
• Create a file format for an Excel file

Using Excel as a native data source
It is possible to connect to Excel workbooks natively as a source, with no ODBC connection setup and configuration needed. You can select specific data in the workbook using custom ranges or auto-detect, and you can specify variables for file and sheet names for more flexibility. As with file formats and datastores, these Excel formats show up as sources in impact and lineage analysis reports.

To import and configure an Excel source
1. On the Formats tab of the Local Object Library, right-click Excel Workbooks and select New from the menu.
The Import Excel Workbook dialog box displays.
2. In the Format name field, enter a name for the format. The name may contain underscores but not spaces.
3. On the Format tab, click the drop-down button beside the Directory field and select <Select folder...>.
4. Navigate to and select a new directory, and then click OK.
5. Click the drop-down button beside the File name field and select <Select file...>.
6. Navigate to and select an Excel file, and then click Open.
7. Do one of the following:
• To reference a named range for the Excel file, select the Named range radio button and enter a value in the field provided.
• To reference an entire worksheet, select the Worksheet radio button and then select the All fields radio button.
• To reference a custom range, select the Worksheet radio button and the Custom range radio button, click the ellipsis (...) button, select the cells, and close the Excel file by clicking X in the top right corner of the worksheet.
8. If required, select the Extend range check box. The Extend range check box provides a means to extend the spreadsheet in the event that additional rows of data are added at a later time. If this check box is selected, at execution time Data Services searches row by row until a null value row is reached. All rows above the null value row are included.
9. If applicable, select the Use first row values as column names option. If this option is selected, field names are based on the first row of the imported Excel sheet.
10. Click Import schema. The schema is displayed at the top of the dialog box.
11. Specify the structure of each column as follows:
• Field Name: Enter the name of the column.
• Data Type: Select the appropriate datatype from the drop-down list.
• Field Size: For columns with a datatype of varchar, specify the length of the field.
• Precision: For columns with a datatype of decimal or numeric, specify the precision of the field.
• Scale: For columns with a datatype of decimal or numeric, specify the scale of the field.
• Description: If desired, enter a description of the column.
12. If required, on the Data Access tab, enter any changes that are required. The Data Access tab provides options to retrieve the file via FTP or to execute a custom application (such as unzipping a file) before reading the file.
13. Click OK. The newly imported file format appears in the Local Object Library with the other Excel workbooks. The sheet is now available to be selected for use as a native data source.
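As a concrete illustration, an Excel format might be filled in with values like the following (the format, file, and range names are hypothetical):

  Format name: XLS_Sales_Forecast
  File name: forecast_2009.xls
  Named range: SalesData (a range defined in the workbook, for example A1:E250)
  Use first row values as column names: selected, so the first row of the range supplies the field names
  Extend range: selected, so rows added below the range later are still read, up to the first null value row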
Lesson 3: Creating Batch Jobs

Lesson introduction
Once metadata has been imported for your datastores, you can create data flows to define data movement requirements. After completing this lesson, you will be able to:
• Work with objects
• Create a data flow
• Use the Query transform
• Use target tables
• Execute the job
Defining Data Services objects

Introduction
Data Services provides you with a variety of objects to use when you are building your data integration and data quality applications. After completing this unit, you will be able to:
• Define the objects available in Data Services
• Explain relationships between objects

Understanding Data Services objects
In Data Services, all entities you add, define, modify, or work with are objects. Some of the most frequently-used objects are:
• Projects
• Jobs
• Work flows
• Data flows
• Transforms
• Scripts
This diagram shows some common objects. All objects have options, properties, and classes. Each can be modified to change the behavior of the object.

Options
Options control the object. For example, to set up a connection to a database, the database name is an option for the connection.
Properties
Properties describe the object. For example, the name and creation date describe what the object is used for and when it became active. Attributes are properties used to locate and organize objects.

Classes
Classes define how an object can be used. Every object is either re-usable or single-use.

Re-usable objects
A re-usable object has a single definition, and all calls to the object refer to that definition. If you change the definition of the object in one place, the change is reflected to all other calls to the object. Most objects created in Data Services are available for re-use. After you define and save a re-usable object, Data Services stores the definition in the repository. You can then re-use the definition as often as necessary by creating calls to it. For example, a data flow within a project is a re-usable object. Multiple jobs, such as a weekly load job and a daily load job, can call the same data flow. If this data flow is changed, both jobs call the new version of the data flow. You can edit re-usable objects at any time independent of the current open project. For example, you can open a data flow and edit it; however, the changes you make to the data flow are not stored until you save them.

Single-use objects
Single-use objects appear only as components of other objects. They operate only in the context in which they were created. Note: You cannot copy single-use objects.

Defining projects and jobs
A project is the highest-level object in Designer. Projects provide a way to organize the other objects you create in Designer. A project is a single-use object that allows you to group jobs. For example, you can use a project to group jobs that have schedules that depend on one another or that you want to monitor together. Projects have the following characteristics:
• Projects are listed in the Local Object Library.
• Only one project can be open at a time.
• Projects cannot be shared among multiple users.

The objects in a project appear hierarchically in the project area. If a plus sign (+) appears next to an object, you can expand it to view the lower-level objects contained in the object. A job is the smallest unit of work that you can schedule independently for execution.
Data Services displays the contents as both names and icons in the project area hierarchy and in the workspace.

Note: Jobs must be associated with a project before they can be executed in the project area of Designer.

Defining relationships between objects
Jobs are composed of work flows and/or data flows:
• A work flow is the incorporation of several data flows into a sequence.
• A data flow is the process by which source data is transformed into target data.

A work flow orders data flows and the operations that support them. It also defines the interdependencies between data flows. For example, if one target table depends on values from other tables, you can use the work flow to specify the order in which you want Data Services to populate the tables. You can also use work flows to define strategies for handling errors that occur during project execution, or to define conditions for running sections of a project. This diagram illustrates a typical work flow.

A data flow defines the basic task that Data Services accomplishes, which involves moving data from one or more sources to one or more target tables or files. You define data flows by identifying the sources from which to extract data, the transformations the data should undergo, and the targets.

Using work flows
Jobs with data flows can be developed without using work flows. However, one should consider nesting data flows inside of work flows by default. This practice can provide various benefits. Always using work flows makes jobs more adaptable to additional development and/or specification changes. For instance, if a job initially consists of four data flows that are to run sequentially, they could be set up without work flows. But what if specification changes require that they be merged into another job instead? The developer would have to replicate their sequence correctly in the other job. If these had been initially added to a work flow, the developer could then have simply copied that work flow into the correct position within the new job.
There would be no need to learn, copy, and verify the previous sequence. The change can be made more quickly with greater accuracy. Even if there is one data flow per work flow, there are benefits to adaptability.

Developing without work flows also opens up the possibility that units of recovery are not properly defined. Initially, it may have been decided that recovery units are not important, the expectation being that if the job fails, the whole process could simply be rerun. However, as data volumes tend to increase, it may be determined that a full reprocessing is too time consuming. The job may then be changed to incorporate work flows to benefit from recovery units to bypass reprocessing of successful steps. However, these changes can be complex and can consume more time than allotted for in a project plan. Setting these up during initial development, when the nature of the processing is being most fully analyzed, is preferred.

In jobs, work flows define a sequence of processing steps, and data flows move data from source tables to target tables.

Describing the object hierarchy
In the repository, objects are grouped hierarchically from a project, to jobs, to optional work flows, to data flows. This illustration shows the hierarchical relationships for the key object types within Data Services:
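The relationships can also be sketched in text; the names in parentheses are illustrative examples only, not required naming:

  Project
    Job (for example, JOB_Sales_Load)
      Work flow, optional (for example, WF_Load_Dimensions)
        Data flow (for example, DF_Customer_Dim)
          Sources, transforms (such as a Query transform), and targets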
Using the Data Services Designer interface

Introduction
The Data Services Designer interface allows you to plan and organize your data integration and data quality jobs in a visual way. Most of the components of Data Services can be programmed through this interface. After completing this unit, you will be able to:
• Explain how Designer is used
• Describe key areas in the Designer window

Describing the Designer window
The Data Services Designer interface consists of a single application window and several embedded supporting windows. The application window contains the menu bar, toolbar, Local Object Library, project area, tool palette, and workspace.

Tip: You can access the Data Services Technical Manuals for reference or help through the Designer interface Help menu. These manuals are also accessible by going through Start > Programs > BusinessObjects XI 3.0/3.1 > BusinessObjects Data Services > Data Services Documentation > Technical Manuals.
Using the Designer toolbar
In addition to many of the standard Windows toolbar buttons, Data Services provides the following unique toolbar buttons:
• Save All: Saves all new or updated objects.
• Close All Windows: Closes all open windows in the workspace.
• Local Object Library: Opens and closes the Local Object Library window.
• Central Object Library: Opens and closes the Central Object Library window.
• Variables: Opens and closes the Variables and Parameters window.
• Project Area: Opens and closes the project area.
• Output: Opens and closes the Output window.
• View Enabled Descriptions: Enables the system-level setting for viewing object descriptions in the workspace.
• Validate Current View: Validates the object definition open in the active tab of the workspace. Objects included in the definition are also validated.
• Validate All Objects in View: Validates all object definitions open in the workspace. Other objects included in the definition are also validated.
• Audit: Opens the Audit window. You can collect audit statistics on the data that flows out of any Data Services object.
• View Where Used: Opens the Output window, which lists parent objects (such as jobs) of the object currently open in the workspace (such as a data flow).
• Back: Moves back in the list of active workspace windows.
• Forward: Moves forward in the list of active workspace windows.
• Assess and Monitor: Opens Data Insight, which allows you to assess and monitor the quality of your data.
• Management Console: Opens and closes the Data Services Management Console, which provides access to Administrator, Auto Documentation, Data Validation, Impact and Lineage Analysis, Operational Dashboard, and Data Quality Reports.
• Contents: Opens the Data Services Technical Manuals.

Using the Local Object Library
The Local Object Library gives you access to the object types listed below. Each object type appears on its own tab of the Local Object Library, and the descriptions indicate the Data Services context in which you can use each type of object.
• Projects: Projects are sets of jobs available at a given time.
• Jobs: Jobs are executable work flows. There are two job types: batch jobs and real-time jobs.
• Work flows: Work flows order data flows and the operations that support data flows, defining the interdependencies between them.
• Data flows: Data flows describe how to process a task.
• Transforms: Transforms operate on data, producing output data sets from the sources you specify. The Local Object Library lists platform, Data Integrator, and Data Quality transforms.
• Datastores: Datastores represent connections to databases and applications used in your project. Under each datastore is a list of the tables, documents, and functions imported into Data Services.
• Formats: Formats describe the structure of a flat file, XML file, or XML message.
• Custom functions: Custom functions are functions written in the Data Services Scripting Language.

You can import objects to and export objects from your Local Object Library as a file. Importing objects from a file overwrites existing objects with the same names in the destination Local Object Library. Whole repositories can be exported in either .atl or .xml format. Using the .xml file format can make repository content easier for you to read. It also allows you to export Data Services to other products.

To export a repository to a file
1. On any tab of the Local Object Library, right-click the white space and select Repository > Export To File. The Write Repository Export File dialog box displays.
2. Browse to the destination for the export file.
3. In the Save as type list, select the file type for your export file.
4. In the File name field, enter the name of the export file.
5. Click Save. The repository is exported to the file.

To import a repository from a file
1. On any tab of the Local Object Library, right-click the white space and select Repository > Import from File from the menu. The Open Import File dialog box displays.
2. Browse to the destination for the file.
3. Click Open. A warning message displays to let you know that it takes a long time to create new versions of existing objects.
4. Click OK. You must restart Data Services after the import process completes.

Using the project area
The project area provides a hierarchical view of the objects used in each project. Tabs on the bottom of the project area support different tasks. Tabs include:
• A tab to create, view, and manage projects. This provides a hierarchical view of all objects used in each project.
• A tab to view the status of currently executing jobs. Selecting a specific job execution displays its status, including which steps are complete and which steps are executing. These tasks can also be done using the Data Services Management Console.
• A tab to view the history of complete jobs. Logs can also be viewed with the Data Services Management Console.

To change the docked position of the project area
1. Right-click the border of the project area.
2. From the menu, select Floating to remove the check mark and clear the docking option.
3. Click and drag the project area to dock and undock at any edge within Designer. When you position the project area where one of the directional arrows highlights a portion of the window, this signifies a placement option. The project area does not dock inside the workspace area. To switch between the last docked and undocked locations, double-click the gray border.

To change the undocked position of the project area
1. Right-click the border of the project area.
2. From the menu, select Floating.
3. Click and drag the project area to any location on your screen. When you drag the project area away from a window edge, it stays undocked.

To lock and unlock the project area
1. Click the pin icon ( ) on the border to unlock the project area. The project area hides.
2. Move the mouse over the docked pane. The project area re-appears.
3. Click the pin icon to lock the pane in place again.

To hide and show the project area
1. Right-click the border of the project area.
2. From the menu, select Hide. The project area disappears from the Designer window.
3. To show the project area, click Project Area in the toolbar.

Using the tool palette
The tool palette is a separate window that appears by default on the right edge of the Designer workspace. You can move the tool palette anywhere on your screen or dock it on any edge of the Designer window. The icons in the tool palette allow you to create new objects in the workspace. The icons are disabled when they are invalid entries to the diagram open in the workspace. To show the name of each icon, hold the cursor over the icon until the tool tip for the icon appears.

When you create an object from the tool palette, you are creating a new definition of an object. If a new object is re-usable, it is automatically available in the Local Object Library after you create it. For example, if you select the data flow icon from the tool palette and define a new data flow called DF1, you can later drag that existing data flow from the Local Object Library and add it to another data flow called DF2.

The tool palette contains these objects (each entry gives the tool, what it creates, and where it is available):
• Pointer: Returns the tool pointer to a selection pointer for selecting and moving objects. Available for all objects in a diagram.
• Work flow: Creates a new work flow. Available in jobs and work flows.
• Data flow: Creates a new data flow. Available in jobs and work flows.
• R/3 data flow: Creates a new data flow with the SAP licensed extension only. Available in jobs and work flows (SAP licensed extension).
• Query: Creates a query to define column transform mappings and row selections. Available in data flows.
• Template table: Creates a new table for a target. Available in data flows.
• Template XML: Creates a new XML file for a target. Available in data flows.
• Data transport: Creates a data transport flow for the SAP Licensed extension. Available in data flows (SAP Licensed extension).
• Script: Creates a new script object. Available in jobs and work flows.
• Conditional: Creates a new conditional object. Available in jobs and work flows.
• Try: Creates a new try object that tries an alternate work flow if an error occurs in a job. Available in jobs and work flows.
• Catch: Creates a new catch object that catches errors in a job. Available in jobs and work flows.
• While Loop: Repeats a sequence of steps in a work flow as long as a condition is true. Available in work flows.
• Annotation: Creates an annotation used to describe objects. Available in jobs, work flows, and data flows.

Using the workspace
When you open a job or any object within a job hierarchy, the workspace becomes active with your selection. The workspace provides a place to manipulate objects and graphically assemble data movement processes. These processes are represented by icons that you drag and drop into a workspace to create a diagram. This diagram is a visual representation of an entire data movement application or some part of a data movement application. You specify the flow of data by connecting objects in the workspace from left to right in the order you want the data to be moved.
Creating a system configuration
System configurations define a set of datastore configurations that you want to use together when running a job. In many organizations, a Data Services designer defines the required datastore and system configurations, and a system administrator determines which system configuration to use when scheduling or starting a job in the Administrator. When designing jobs, determine and create datastore configurations and system configurations depending on your business environment and rules. Create datastore configurations for the datastores in your repository before you create the system configurations for them. You cannot define a system configuration if your repository does not contain at least one datastore with multiple configurations.

You cannot check in or check out system configurations. However, Data Services maintains system configurations separately, particularly when exporting: you can export system configurations to a separate flat file which you can later import. By maintaining system configurations in a separate file, you avoid modifying your datastore each time you import or export a job, or each time you check in and check out the datastore.

To create a system configuration
1. From the Tools menu, select System Configurations. The System Configuration Editor dialog box displays columns for each datastore.
2. In the Configuration name column, enter the system configuration name. Use the SC_ prefix in the system configuration name so that you can easily identify this file as a system configuration.
3. In the drop-down list for each datastore column, select the appropriate datastore configuration that you want to use when you run a job using this system configuration.
4. Click OK.
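For example (the configuration and datastore names here are purely illustrative), a repository containing one datastore with a development and a production configuration might define two system configurations:

  SC_DEV: selects the development configuration of datastore DS_Sales
  SC_PROD: selects the production configuration of datastore DS_Sales

At execution time, the administrator chooses SC_DEV or SC_PROD, and the job runs against the corresponding database without any change to the job design.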
Working with objects

Introduction
Data flows define how information is moved from source to target. These data flows are organized into executable jobs, which are grouped into projects. After completing this unit, you will be able to:
• Create a project
• Create a job
• Add, connect, and delete objects in the workspace
• Create a work flow

Creating a project
A project is a single-use object that allows you to group jobs. It is the highest level of organization offered by Data Services, and it is used solely for organizational purposes. For example, you can use a project to group jobs that have schedules that depend on one another or that you want to monitor together. Opening a project makes one group of objects easily accessible in the user interface. Only one project can be open at a time. The objects in a project appear hierarchically in the project area in Designer, where you can drill down into additional levels. If a plus sign (+) appears next to an object, you can expand it to view the lower-level objects. The objects in the project area also display in the workspace.

To create a new project
1. From the Project menu, select New Project.
The Project - New dialog box displays. You can also right-click the white space on the Projects tab of the Local Object Library and select New from the menu.
2. Enter a unique name in the Project name field. The name can include alphanumeric characters and underscores (_). It cannot contain blank spaces.
3. Click Create. The new project appears in the project area. As you add jobs and other lower-level objects to the project, they also appear in the project area.

To open an existing project
1. From the Project menu, select Open. The Project - Open dialog box displays.
2. Select the name of an existing project from the list.
3. Click Open. If another project is already open, Data Services closes that project and opens the new one in the project area.

To save a project
1. From the Project menu, select Save All.
The Save all changes dialog box lists the jobs, work flows, and data flows that you edited since the last save.
2. Deselect any listed object to avoid saving it.
3. Click OK.
You are also prompted to save all changes made in a job when you execute the job or exit the Designer.

Creating a job
A job is the only executable object in Data Services. When you are developing your data flows, you can manually execute and test jobs directly in Data Services. In production, you can schedule batch jobs and set up real-time jobs as services that execute a process when Data Services receives a message request. A job is made up of steps that are executed together. Each step is represented by an object icon that you place in the workspace to create a job diagram. A job diagram is made up of two or more objects connected together. You can include any of the following objects in a job definition:
• Work flows
• Scripts
• Conditionals
• While loops
• Try/catch blocks
• Data flows
  • Source objects
  • Target objects
  • Transforms
If a job becomes complex, you can organize its content into individual work flows, and then create a single job that calls those work flows.

Tip: It is recommended that you follow consistent naming conventions to facilitate object identification across all systems in your enterprise.

To create a job in the project area
1. In the project area, right-click the project name and select New Batch Job from the menu. A new batch job is created in the project area.
2. Edit the name of the job. The name can include alphanumeric characters and underscores (_). It cannot contain blank spaces.
3. Click the cursor outside of the job name or press Enter to commit the changes.

You can also create a job and related objects from the Local Object Library. When you create a job in the Local Object Library, you must associate the job and all related objects to a project before you can execute the job.

Adding, connecting, and deleting objects in the workspace
After creating a job, you can add objects to the job workspace area using either the Local Object Library or the tool palette.

To add objects from the Local Object Library to the workspace
1. In the Local Object Library, click the tab for the type of object you want to add.
2. Click and drag the selected object on to the workspace.

To add objects from the tool palette to the workspace
• In the tool palette, click the desired object, move the cursor to the workspace, and then click the workspace to add the object.
Creating a work flow
A work flow is an optional object that defines the decision-making process for executing other objects. For example, elements in a work flow can determine the path of execution based on a value set by a previous job or can indicate an alternative path if something goes wrong in the primary path. Ultimately, the purpose of a work flow is to prepare for executing data flows and to set the state of the system after the data flows are complete. Work flows can contain data flows, conditionals, while loops, try/catch blocks, and scripts. They can also call other work flows, and you can nest calls to any depth. A work flow can even call itself.

Note: In essence, jobs are just work flows that can be executed. Almost all of the features documented for work flows also apply to jobs.

To create a work flow
1. Open the job or work flow to which you want to add the work flow.
2. Select the Work Flow icon in the tool palette.
3. Click the workspace where you want to place the work flow.
4. Enter a unique name for the work flow.
5. Click the cursor outside of the work flow name or press Enter to commit the changes.

Defining the order of execution in work flows
The connections you make between the icons in the workspace determine the order in which work flows execute. Steps in a work flow execute in a sequence from left to right. You must connect the objects in a work flow when there is a dependency between the steps.

To connect objects in the workspace area
• Click and drag from the triangle or square of an object to the triangle or square of the next object in the flow to connect the objects.

To disconnect objects in the workspace area
• Select the connecting line between the objects and press Delete.
To execute more complex work flows in parallel, you can define each sequence as a separate work flow, and then call each of the work flows from another work flow, as in this example: First, you must define Work Flow A. Next, define Work Flow B. Finally, create Work Flow C to call Work Flows A and B.
You can specify that a job executes a particular work flow or data flow only once. If you specify that it should be executed only once, Data Services only executes the first occurrence of the work flow or data flow and skips subsequent occurrences in the job. You might use this feature when developing complex jobs with multiple paths, such as jobs with try/catch blocks or conditionals, where you want to ensure that Data Services executes a particular work flow or data flow only one time.
Creating a data flow
Introduction
Data flows contain the source, transform, and target objects that represent the key activities in data integration and data quality processes. After completing this unit, you will be able to:
• Create a data flow
• Explain source and target objects
• Add source and target objects to a data flow
Using data flows
Data flows determine how information is extracted from sources, transformed, and loaded into targets. Each icon you place in the data flow diagram becomes a step in the data flow. The objects that you can use as steps in a data flow are:
• Source and target objects
• Transforms
The lines connecting objects in a data flow represent the flow of data through data integration and data quality processes, and the connections you make between the icons determine the order in which Data Services completes the steps.
Using data flows as steps in work flows
Each step in a data flow, up to the target definition, produces an intermediate result: for example, the results of a SQL statement containing a WHERE clause flow to the next step in the data flow. The intermediate result consists of a set of rows from the previous operation and the schema in which the rows are arranged. This result is called a data set. This data set may, in turn, be further filtered and directed into yet another data set.
Data flows are closed operations, even when they are steps in a work flow. Any data set created within a data flow is not available to other steps in the work flow. A work flow does not operate on data sets and cannot provide more data to a data flow. However, a work flow can:
• Call data flows to perform data movement operations.
• Define the conditions appropriate to run data flows.
• Pass parameters to and from data flows.
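To make the idea of an intermediate data set concrete, the following sketch shows the kind of SELECT statement that a source table followed by a filtering Query step might be pushed down to. The table and column names are illustrative assumptions, not objects from the course data:

SELECT ORDERID, CUSTOMERID, ORDERDATE
FROM ORDERS
WHERE REGIONID = 1

The rows returned by this statement, together with their schema, form the data set that flows into the next step of the data flow; that data set exists only inside the data flow and is not visible to the surrounding work flow.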
To create a new data flow
1. Open the job or work flow in which you want to add the data flow.
2. Select the Data Flow icon in the tool palette.
3. Click the workspace where you want to add the data flow.
4. Enter a unique name for your data flow. Data flow names can include alphanumeric characters and underscores (_). They cannot contain blank spaces.
5. Click the cursor outside of the data flow name or press Enter to commit the changes.
6. Double-click the data flow to open the data flow workspace.
Changing data flow properties
You can specify the following advanced data properties for a data flow:
Execute only once: When you specify that a data flow should only execute once, a batch job will never re-execute that data flow after the data flow completes successfully, even if the data flow is contained in a work flow that is a recovery unit that re-executes. You should not select this option if the parent work flow is a recovery unit.
Use database links: Database links are communication paths between one database server and another. Database links allow local users to access data on a remote database, which can be on the local or a remote computer of the same or different database type. For more information, see "Database link support for push-down operations across datastores" in the Data Services Performance Optimization Guide.
Degree of parallelism: Degree of parallelism (DOP) is a property of a data flow that defines how many times each transform within the data flow replicates to process a parallel subset of data. For more information, see "Degree of parallelism" in the Data Services Performance Optimization Guide.
Cache type: You can cache data to improve performance of operations such as joins, groups, sorts, filtering, lookups, and table comparisons. Select one of the following values:
• In Memory: Choose this value if your data flow processes a small amount of data that can fit in the available memory.
• Pageable: Choose this value if you want to return only a subset of data at a time to limit the resources required. This is the default.
For more information, see "Tuning Caches" in the Data Services Performance Optimization Guide.
To change data flow properties
1. Right-click the data flow and select Properties from the menu. The Properties window opens for the data flow.
2. Change the properties of the data flow as required. For more information about how Data Integrator processes data flows with multiple properties, see "Data Flow" in the Data Services Reference Guide.
3. Click OK.
Explaining source and target objects
A data flow directly reads data from source objects and loads data to target objects. The following object types can be used:
Table: A file formatted with columns and rows as used in relational databases. (Source and target)
Template table: A template table that has been created and saved in another data flow (used in development). (Source and target)
File: A delimited or fixed-width flat file. (Source and target)
Document: A file with an application-specific format (not readable by SQL or XML parser). (Source and target)
XML file: A file formatted with XML tags. (Source and target)
XML message: A source in real-time jobs. (Source only)
XML template file: An XML file whose format is based on the preceding transform output (used in development, primarily for debugging data flows). (Target only)
Transform: A pre-built set of operations that can create new data, such as the Date Generation transform. (Source only)
Adding source and target objects
Before you can add source and target objects to a data flow, you must first create the datastore and import the table metadata for any databases, or create the file format for flat files.
To add a source or target object to a data flow
1. In the workspace, open the data flow in which you want to place the object.
2. Do one of the following:
• To add a database table, in the Datastores tab of the Local Object Library, select the table.
• To add a flat file, in the Formats tab of the Local Object Library, select the file format.
3. Click and drag the object to the workspace.
4. A pop-up menu appears for the source or target object. Select Make Source or Make Target from the menu, depending on whether the object is a source or target object.
5. Add and connect objects in the data flow as appropriate.
Using the Query transform
Introduction
The Query transform is the most commonly used transform, and is included in most data flows. It enables you to select data from a source and filter or reformat it as it moves to the target. After completing this unit, you will be able to:
• Describe the transform editor
• Use the Query transform
Describing the transform editor
The transform editor is a graphical interface for defining the properties of transforms. The workspace can contain these areas:
• Input schema area
• Output schema area
• Parameters area
The input schema area displays the schema of the input data set. For source objects and some transforms, this area is not available.
The output schema area displays the schema of the output data set, including any functions. For template tables, the output schema can be defined based on your preferences. For any data that needs to move from source to target, a relationship must be defined between the input and output schemas. To create this relationship, you must map each input column to the corresponding output column.
Below the input and output schema areas is the parameters area. The options available on this tab differ based on which transform or object you are modifying. The I icon indicates tabs containing user-defined entries.
Explaining the Query transform
The Query transform is used so frequently that it is included in the tool palette with other standard objects. It retrieves a data set that satisfies conditions that you specify, similar to a SQL SELECT statement.
The Query transform can perform the following operations:
• Filter the data extracted from sources.
• Join data from multiple sources.
• Map columns from input to output schemas.
• Perform transformations and functions on the data.
• Perform data nesting and unnesting.
• Add new columns, nested schemas, and function results to the output schema.
• Assign primary keys to output columns.
For example, you could use the Query transform to select a subset of the data in a table to show only those records from a specific region.
Note: When working with nested data from an XML file, you can use the Query transform to unnest the data using the right-click menu for the output schema, which provides options for unnesting.
For more information on the Query transform, see "Transforms", Chapter 5 in the Data Services Reference Guide. The next section gives a brief description of the function, data input requirements, options, and data output results for the Query transform.
Input/Output
The data input is a data set from one or more sources with rows flagged with the NORMAL operation code. All rows in a data set are flagged as NORMAL when they are extracted by a source table or file. If a row is flagged as NORMAL when it is loaded into a target table or file, it is inserted as a new row in the target.
The data output is a data set based on the conditions you specify, using the schema specified in the output schema area.
Options
The input schema area displays all schemas input to the Query transform as a hierarchical tree. Each input schema can contain multiple columns. The output schema area displays the schema output from the Query transform as a hierarchical tree. The output schema can contain multiple columns and functions.
Icons preceding columns in the schema areas are combinations of these graphics:
• Primary key icon: indicates that the column is a primary key.
• Simple mapping icon: indicates that the column has a simple mapping. A simple mapping is either a single column or an expression with no input column.
• Complex mapping icon: indicates that the column has a complex mapping, such as a transformation or a merge between two source columns.
• Incorrect mapping icon: indicates that the column mapping is incorrect. Data Integrator does not perform a complete validation during design, so not all incorrect mappings will necessarily be flagged.
The parameters area of the Query transform includes the following tabs:
Mapping: Specify how the selected output column is derived.
SELECT: Select only distinct rows (discarding any duplicate rows).
FROM: Specify the input schemas used in the current output schema.
OUTER JOIN: Specify an inner table and an outer table for joins that you want treated as outer joins.
WHERE: Set conditions that determine which rows are output.
GROUP BY: Specify a list of columns for which you want to combine output. For each unique set of values in the group by list, Data Services combines or aggregates the values in the remaining columns.
ORDER BY: Specify the columns you want used to sort the output data set.
Advanced: Create separate sub data flows to process any of the following resource-intensive query clauses: DISTINCT, GROUP BY, JOIN, ORDER BY. For more information, see "Distributed Data Flow execution" in the Data Services Designer Guide.
Find: Search for a specific word or item in the input schema or the output schema.
To map input columns to output columns
• In the transform editor, do any of the following:
• Drag and drop a single column from the input schema area into the output schema area.
• Drag a single input column over the corresponding output column, release the cursor, and select Remap Column from the menu.
• Select multiple input columns (using Ctrl+click or Shift+click) and drag them onto the Query output schema for automatic mapping.
• Select the output column and manually enter the mapping on the Mapping tab in the parameters area. You can either type the column name in the parameters area or click and drag the column from the input schema pane.
To delete a column mapping
• Select the output column, then highlight and manually delete the mapping on the Mapping tab in the parameters area.
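Because the Query transform's tabs correspond closely to the clauses of a SQL SELECT statement, it can help to keep the equivalent statement in mind. The sketch below is illustrative only; the CUSTOMER table and its columns are assumed names rather than objects from the course data:

SELECT C.REGIONID, COUNT(C.CUSTOMERID)
FROM CUSTOMER C
WHERE C.COUNTRY = 'US'
GROUP BY C.REGIONID
ORDER BY C.REGIONID

The output schema mappings play the role of the select list, while the FROM, WHERE, GROUP BY, and ORDER BY tabs correspond to the matching clauses of the statement.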
Using target tables
Introduction
The target object for your data flow can be either a physical table or file, or a template table. After completing this unit, you will be able to:
• Access the target table editor
• Set target table options
• Use template tables
Accessing the target table editor
The target table editor provides a single location to change settings for your target tables.
To access the target table editor
1. In a data flow, double-click the target table. The target table editor opens in the workspace.
2. Change the values as required. Changes are automatically committed.
3. Click Back to return to the data flow.
Setting target table options
When your target object is a physical table in a database, the target table editor opens in the workspace with different tabs where you can set database type properties, table loading options, and tuning techniques for loading a job. Note: Most of the tabs in the target table editor focus on migration or performance-tuning techniques, which are outside the scope of this course.
You can set the following table loading options in the Options tab of the target table editor:
Rows per commit: Specifies the transaction size in number of rows.
Column comparison: Specifies how the input columns are mapped to output columns. There are two options:
• Compare_by_position: disregards the column names and maps source columns to target columns by position.
• Compare_by_name: maps source columns to target columns by name. Validation errors occur if the datatypes of the columns do not match.
Delete data from table before loading: Sends a TRUNCATE statement to clear the contents of the table before loading during batch jobs. Defaults to not selected.
Number of loaders: Specifies the number of loaders (to a maximum of five) and the number of rows per commit that each loader receives during parallel loading. For example, if you choose a Rows per commit of 1000 and set the number of loaders to three, the first 1000 rows are sent to the first loader, the second 1000 rows to the second loader, the third 1000 rows to the third loader, and the next 1000 rows back to the first loader.
Use overflow file: Writes rows that cannot be loaded to the overflow file for recovery purposes. Options are enabled for the file name and file format. The overflow format can include the data rejected and the operation being performed (write_data) or the SQL command used to produce the rejected operation (write_sql).
Ignore columns with value: Specifies a value that might appear in a source column that you do not want updated in the target table. When this value appears in the source column, the corresponding target column is not updated during auto correct loading. You can enter spaces.
Ignore columns with null: Ensures that NULL source columns are not updated in the target table during auto correct loading.
Use input keys: Enables Data Integrator to use the primary keys from the source table. By default, Data Integrator uses the primary key of the target table.
Update key columns: Updates key column values when it loads data to the target.
Auto correct load: Ensures that the same row is not duplicated in a target table. This is particularly useful for data recovery operations. When Auto correct load is selected, Data Integrator reads a row from the source and checks whether a row exists in the target table with the same values in the primary key. If a matching row does not exist, it inserts the new row regardless of other options. If a matching row exists, it updates the row depending on the values of Ignore columns with value and Ignore columns with null.
Include in transaction: Indicates that this target is included in the transaction processed by a batch or real-time job. This option allows you to commit data to multiple tables as part of the same transaction. The tables must be from the same datastore. If loading fails for any one of the tables, no data is committed to any of the tables. If you choose to enable transactional loading, these options are not available: Rows per commit, Use overflow file, Enable partitioning, Number of loaders, Delete data from table before loading, and the overflow file specification. Transactional loading can require rows to be buffered to ensure the correct load order. If the data being buffered is larger than the available virtual memory, Data Integrator reports a memory error. Data Integrator also does not parameterize SQL or push operations down to the database if transactional loading is enabled.
Transaction order: Indicates where this table falls in the loading order of the tables being loaded.
All loaders have a transaction order of zero by default. Tables with the same transaction order are loaded together. Tables with a transaction order of zero are loaded at the discretion of the data flow process. If you specify orders among the tables, the loading operations are applied according to the order.
See the Data Services Performance Optimization Guide and "Description of objects" in the Data Services Reference Guide for more information.
Using template tables
During the initial design of an application, you might find it convenient to use template tables to represent database tables. With template tables, you do not have to initially create a new table in your RDBMS and import the metadata into Data Services. Instead, Data Services automatically creates the table in the database with the schema defined by the data flow when you execute a job. Template tables are particularly useful in early application development when you are designing and testing a project.
After creating a template table as a target in one data flow, you can use it as a source in other data flows. Although a template table can be used as a source table in multiple data flows, it can be used only as a target in one data flow. You can modify the schema of the template table in the data flow where the table is used as a target. Any changes are automatically applied to any other instances of the template table.
After a template table is created in the database, you can convert the template table in the repository to a regular table. You must convert template tables so that you can use the new table in expressions, functions, and transform options. After a template table is converted, you can no longer alter the schema.
To create a template table
1. Open a data flow in the workspace.
2. In the tool palette, click the Template Table icon and click the workspace to add a new template table to the data flow.
The Create Template dialog box displays.
3. In the Table name field, enter the name for the template table.
4. In the In datastore drop-down list, select the datastore for the template table.
5. Click OK.
You can also create a new template table on the Datastores tab of the Local Object Library by expanding a datastore and right-clicking Templates.
To convert a template table into a regular table from the Local Object Library
1. On the Datastores tab of the Local Object Library, expand the branch for the datastore to view the template table.
2. Right-click the template table you want to convert and select Import Table from the menu.
Data Services converts the template table in the repository into a regular table by importing it from the database. On the Datastores tab of the Local Object Library, the table is listed under Tables rather than Template Tables. To update the icon in all data flows, select Refresh from the View menu.
To convert a template table into a regular table from a data flow
1. Open the data flow containing the template table.
2. Right-click the template table you want to convert and select Import Table from the menu.
Executing the job
Introduction
Once you have created a data flow, you can execute the job in Data Services to see how the data moves from source to target. After completing this unit, you will be able to:
• Understand job execution
• Execute the job
Explaining job execution
After you create your project, jobs, and associated data flows, you can then execute the job. If a job has syntax errors, it does not execute. You can run jobs two ways:
• Immediate jobs
Data Services initiates both batch and real-time jobs and runs them immediately from within the Designer. For these jobs, both the Designer and the designated Job Server (where the job executes, usually on the same machine) must be running. You will likely run immediate jobs only during the development cycle.
• Scheduled jobs
Batch jobs are scheduled. To schedule a job, use the Data Services Management Console or a third-party scheduler. The Job Server must be running.
Setting execution properties
When you execute a job, the following options are available in the Execution Properties window:
Print all trace messages: Records all trace messages in the log.
Disable data validation statistics collection: Does not collect audit statistics for this specific job execution.
Enable auditing: Collects audit statistics for this specific job execution.
Enable recovery: Enables the automatic recovery feature. When enabled, Data Services saves the results from completed steps and allows you to resume failed jobs.
Recover from last failed execution: Resumes a failed job. Data Services retrieves the results from any steps that were previously executed successfully and re-executes any other steps. This option is not available when a job has not yet been executed or when recovery mode was disabled during the previous run.
Collect statistics for optimization: Collects statistics that the Data Services optimizer will use to choose an optimal cache type (in-memory or pageable).
Collect statistics for monitoring: Displays cache statistics in the Performance Monitor in the Administrator.
Use collected statistics: Optimizes Data Services to use the cache statistics collected on a previous execution of the job.
System configuration: Specifies the system configuration to use when executing this job. A system configuration defines a set of datastore configurations, which define the datastore connections. This option is a run-time property that is only available if there are system configurations defined in the repository. If a system configuration is not specified, Data Services uses the default datastore configuration for each datastore.
Job Server or Server Group: Specifies the Job Server or server group to execute this job.
Distribution level: Allows a job to be distributed to multiple Job Servers for processing. This option is a run-time property. The options are:
• Job: The entire job will execute on one server.
• Data flow: Each data flow within the job will execute on a separate server.
• Sub-data flow: Each sub-data flow (which can be a separate transform or function) within a data flow will execute on a separate Job Server.
Executing the job
Immediate or on-demand tasks are initiated from the Designer. Both the Designer and the Job Server must be running for the job to execute.
To execute a job as an immediate task
1. In the project area, right-click the job name and select Execute from the menu. Data Services prompts you to save any objects that have not been saved.
2. Click OK. The Execution Properties dialog box displays.
3. Select the required job execution parameters.
4. Click OK.
Lesson 4: Using Platform Transforms
Lesson introduction
A transform enables you to control how data sets change in a data flow. After completing this lesson, you will be able to:
• Describe platform transforms
• Use the Map Operation transform
• Use the Validation transform
• Use the Merge transform
• Use the Case transform
• Use the SQL transform
Describing platform transforms
Introduction
Transforms are optional objects in a data flow that allow you to transform your data as it moves from source to target. Transforms are added as components to your data flow in the same way as source and target objects. Each transform provides different options that you can specify based on the transform's function, and transforms are often used in combination to create the output data set.
After completing this unit, you will be able to:
• Explain transforms
• Describe the platform transforms available in Data Services
• Add a transform to a data flow
• Describe the Transform Editor window
Explaining transforms
Transforms are objects in data flows that operate on input data sets by changing them or by generating one or more new data sets. You can choose to edit the input data, output data, and parameters in a transform. Some transforms, such as the Date Generation and SQL transforms, can be used as source objects, in which case they do not have input options.
Transforms are similar to functions in that they can produce the same or similar values during processing. However, transforms and functions operate on a different scale:
• Functions operate on single values, such as values in specific columns in a data set.
• Transforms operate on data sets by creating, updating, and deleting rows of data.
The Query transform is the most commonly used transform. Other transforms serve more specialized purposes: for example, the Table Comparison, History Preserving, and Key Generation transforms are used for slowly changing dimensions.
Describing platform transforms
The following platform transforms are available on the Transforms tab of the Local Object Library:
Case: Divides the data from an input data set into multiple output data sets based on IF-THEN-ELSE branch logic.
Map Operation: Allows conversions between operation codes.
Merge: Unifies rows from two or more input data sets into a single output data set.
Query: Retrieves a data set that satisfies conditions that you specify. A Query transform is similar to a SQL SELECT statement.
Row Generation: Generates a column filled with integers starting at zero and incrementing by one to the end value you specify.
SQL: Performs the indicated SQL query operation.
Validation: Allows you to specify validation criteria for an input data set. Data that fails validation can be filtered out or replaced. You can have one validation rule per column.
Using the Map Operation transform
Introduction
The Map Operation transform enables you to change the operation code for records. After completing this unit, you will be able to:
• Describe map operations
• Use the Map Operation transform
Describing map operations
Data Services maintains operation codes that describe the status of each row in each data set described by the inputs to and outputs from objects in data flows. The operation codes indicate how each row in the data set would be applied to a target table if the data set were loaded into a target. The operation codes are as follows:
NORMAL: Creates a new row in the target. All rows in a data set are flagged as NORMAL when they are extracted by a source table or file. If a row is flagged as NORMAL when loaded into a target table or file, it is inserted as a new row in the target. Most transforms operate only on rows flagged as NORMAL.
INSERT: Creates a new row in the target. Only the History Preserving and Key Generation transforms can accept data sets with rows flagged as INSERT as input.
UPDATE: Overwrites an existing row in the target table. Only the History Preserving and Key Generation transforms can accept data sets with rows flagged as UPDATE as input.
DELETE: Is ignored by the target; rows flagged as DELETE are not loaded. Only the History Preserving transform, with the Preserve delete row(s) as update row(s) option selected, can accept data sets with rows flagged as DELETE.
Explaining the Map Operation transform
The Map Operation transform allows you to change operation codes on data sets to produce the desired output. For example, if a row in the input data set has been updated in some previous operation in the data flow, you can use this transform to map the UPDATE operation to an INSERT. The result could be to convert UPDATE rows to INSERT rows to preserve the existing row in the target. Data Services can push Map Operation transforms down to the source database.
For more information on the Map Operation transform, see "Transforms", Chapter 5 in the Data Services Reference Guide. The next section gives a brief description of the function, data input requirements, options, and data output results for the Map Operation transform.
Input/Output
Input for the Map Operation transform is a data set with rows flagged with any operation codes. It can contain hierarchical data. Output for the Map Operation transform is a data set with rows flagged as specified by the mapping operations. Use caution when using columns of datatype real in this transform, because comparison results are unpredictable for this datatype.
Options
The Map Operation transform enables you to set the Output row type option to indicate the new operations desired for the input data set. Choose from the following operation codes: INSERT, UPDATE, DELETE, NORMAL, or DISCARD.
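As a brief illustration of the UPDATE-to-INSERT case described above (the upstream comparison step is assumed for the example and is not part of the course data): suppose a transform earlier in the data flow flags changed customer rows as UPDATE. Setting the Output row type for UPDATE rows to INSERT in the Map Operation transform causes each changed row to be written as a new row in the target rather than overwriting the existing one, so the previous version of the record is preserved. The activity that follows uses the same mechanism in the other direction, mapping NORMAL rows to DELETE so that unwanted records are filtered out of the target.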
Activity: Using the Map Operation transform
End users of employee reports have requested that employee records in the data mart contain only current employees.
Objective
• Use the Map Operation transform to remove any employee records that have a value in the discharge_date column.
Instructions
1. In the Omega project, create a new batch job called Alpha_Employees_Current_Job with a data flow called Alpha_Employees_Current_DF.
2. In the data flow workspace, add the Employee table from the Alpha datastore as the source object.
3. Add the Employee table from the HR_datamart datastore as the target object.
4. Add the Query transform to the workspace and connect all objects.
5. In the transform editor for the Query transform, map all columns from the input schema to the same column in the output schema.
6. On the WHERE tab, create an expression to select only those rows where discharge_date is not empty. The expression should be: employee.discharge_date is not null
7. In the data flow workspace, disconnect the Query transform from the target table.
8. Add a Map Operation transform between the Query transform and the target table and connect it to both.
9. In the transform editor for the Map Operation transform, change the settings so that rows with an input operation code of NORMAL have an output operation code of DELETE.
10. Execute Alpha_Employees_Current_Job with the default execution properties and save all objects you have created.
11. Return to the data flow workspace and view data for both the source and target tables. Note that two rows were filtered from the target table.
A solution file called SOLUTION_MapOperation.atl is included in your Course Resources. To check the solution, import the file and open it to view the data flow design and mapping logic. Do not execute the solution job, as this may override the results in your target table.
Using the Validation transform
Introduction
The Validation transform enables you to create validation rules and move data into target objects based on whether they pass or fail validation. After completing this unit, you will be able to:
• Use the Validation transform
Explaining the Validation transform
Use the Validation transform in your data flows when you want to ensure that the data at any stage in the data flow meets your criteria. The Validation transform qualifies a data set based on rules for input schema columns. It filters out or replaces data that fails your criteria. You can have one validation rule per column. For example, you can set the transform to ensure that all values:
• Are within a specific range
• Have the same format
• Do not contain NULL values
The Validation transform allows you to define a re-usable business rule to validate each record and column. For example, if you want to load only sales records for October 2007, you would set up a validation rule that states: Sales Date is between 10/1/2007 and 10/31/2007. Data Services looks at this date field in each record to validate whether the data meets this requirement. If it does not, you can choose to pass the record into a Fail table, correct it in the Pass table, or do both.
Your validation rule consists of a condition and an action on failure:
• Use the condition to describe what you want for your valid data. For example, specify the condition IS NOT NULL if you do not want any NULL values in data passed to the specified target.
• Use the Action on Failure area to describe what happens to invalid or failed data. Continuing with the example above, for any NULL values you may want to select the Send to Fail option to send all NULL values to a specified FAILED target table.
You can also create a custom Validation function and select it when you create a validation rule. For more information on creating a custom Validation function, see "Validation Transform", Chapter 12 in the Data Services Reference Guide.
For more information on the Validation transform, see "Transforms", Chapter 5 in the Data Services Reference Guide. The next section gives a brief description of the function, data input requirements, options, and data output results for the Validation transform.
Input/Output
Only one source is allowed as a data input for the Validation transform. The Validation transform outputs up to two different data sets, based on whether the records pass or fail the validation condition you specify. The available outputs are Pass and Fail, and you can load pass and fail data into multiple targets.
The Pass output schema is identical to the input schema; Data Services does not add columns to the Pass output. For this reason, you may want to substitute a value for failed data that you send to the Pass output. If you choose to send failed data to the Pass output, Data Services does not track the results.
Data Services adds the following two columns to the Fail output schema:
• The DI_ERRORACTION column indicates where failed data was sent: the letter B is used for data sent to both the Pass and Fail outputs, and the letter F is used for data sent only to the Fail output.
• The DI_ERRORCOLUMNS column displays all error messages for columns with failed rules, for example "<ValidationTransformName> failed rule(s): c1:c2". The names of the input columns associated with each message are separated by colons.
If a row has conditions set for multiple columns and the Pass, Fail, and Both actions are specified for the row, then the precedence order is Fail, Both, Pass. For example, if one column's action is Send to Fail and the column fails, the whole row is sent only to the Fail output; other actions for other validation columns in the row are ignored.
Options
When you use the Validation transform, you select a column in the input schema and create a validation rule in the Validation transform editor. The Validation transform offers several options for creating this validation rule:
Enable Validation: Turn the validation rule on and off for the column.
Do not validate when NULL: Send all NULL values to the Pass output automatically. Data Services will not apply the validation rule on this column when an incoming value for it is NULL.
Condition: Define the condition for the validation rule:
• Operator: select an operator for a Boolean expression (for example, =, <, >) and enter the associated value.
• In: specify a list of possible values for a column.
• Between/and: specify a range of values for a column.
• Exists in table: specify that a column's value must exist in a column in another table. This option also uses the LOOKUP_EXT function. You can define the NOT NULL constraint for the column in the LOOKUP table to ensure the Exists in table condition executes properly.
• Match pattern: enter a pattern of upper and lowercase alphanumeric characters to ensure the format of the column is correct.
• Custom condition: create more complex expressions using the function and smart editors.
• Custom validation function: select a function from a list for validation purposes. Data Services supports Validation functions that take one parameter and return an integer datatype. If a return value is not zero, then Data Services processes it as TRUE.
Data Services converts substitute values in the condition to a corresponding column datatype: integer, decimal, date, datetime, varchar, timestamp, or time.
The Validation transform requires that you enter some values in specific formats:
• date (YYYY.MM.DD)
• datetime (YYYY.MM.DD HH24:MI:SS)
• time (HH24:MI:SS)
• timestamp (YYYY.MM.DD HH24:MI:SS.FF)
If, for example, you specify a date as 12-01-2004, Data Services produces an error because you must enter this date as 2004.12.01.
Action on Fail: Define where a record is loaded if it fails the validation rule:
• Send to Fail
• Send to Pass
• Send to both
If you choose Send to Pass or Send to Both, you can choose to substitute a value or expression for the failed values that are sent to the Pass output.
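To make the condition types concrete, here are hedged examples of conditions you might enter; the column names (SALES_DATE, REGION_ID) are illustrative assumptions rather than columns from the course data:

SALES_DATE between '2007.10.01' and '2007.10.31'
REGION_ID in (1, 2, 3)

A Match pattern condition on a phone-number column could use a pattern such as '999-999-9999', where, in the pattern syntax, 9 typically stands for a single digit. Each condition must evaluate to TRUE for a row to be sent to the Pass output.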
To create a validation rule
1. Open the data flow workspace.
2. Add your source object to the workspace.
3. Add your target objects to the workspace, depending on the options you select. You will require one target object for records that pass validation, and an optional target object for records that fail validation.
4. On the Transforms tab of the Local Object Library, click and drag the Validation transform to the workspace to the right of your source object.
5. Connect the source object to the transform.
6. Double-click the Validation transform to open the transform editor.
7. In the input schema area, click to select an input schema column.
8. In the parameters area, select the Enable Validation option.
9. On the Properties tab, enter a name and description for the validation rule.
10. In the Condition area, select a condition type and enter any associated value required. All conditions must be Boolean expressions.
11. On the Action On Failure tab, select an action.
12. If desired, select the For pass, substitute with option and enter a substitute value or expression for the failed value that is sent to the Pass output. This option is only available if you select Send to Pass or Send to Both.
13. Click Back to return to the data flow workspace.
14. Click and drag from the transform to the target object.
15. Release the mouse and select the appropriate label for that object from the pop-up menu.
16. Repeat step 14 and step 15 for all target objects.

Activity: Using the Validation transform
Order data is stored in multiple formats with different structures and different information. You will use the Validation transform to validate order data from flat file sources and the alpha orders table before merging it.
Objectives
• Join the data in the Orders flat files with that in the Order_Shippers flat files.
• Create a column to hold the employee ID of the employee who originally made the sale.
• Create a column on the target table for employee information, so that orders taken by employees who are no longer with the company are assigned to a default current employee using the Validation transform in a new column named order_assigned_to.
• Replace null values in the shipper fax column with a value of 'No Fax' and send those rows to a separate table for follow up.
Instructions
1. Create a file format called Order_Shippers_Format for the flat file Order_Shippers_04_20_07.txt. Use the structure of the text file to determine the appropriate settings.
2. In the Column Attributes pane, adjust the datatypes for the columns based on their content:
ORDERID: int
SHIPPERNAME: varchar(50)
SHIPPERADDRESS: varchar(50)
SHIPPERCITY: varchar(50)
SHIPPERCOUNTRY: int
SHIPPERPHONE: varchar(20)
SHIPPERFAX: varchar(20)
SHIPPERREGION: int
SHIPPERPOSTALCODE: varchar(15)
3. In the Omega project, create a new batch job called Alpha_Orders_Validated_Job and two data flows, one named Alpha_Orders_Files_DF and the second named Alpha_Orders_DB_DF.
4. Add the file formats Orders_Format and Order_Shippers_Format as source objects to the Alpha_Orders_Files_DF data flow workspace.
5. Edit the source objects so that the Orders_Format source is using all three related orders flat files and the Order_Shippers_Format source is using all three order shippers files. Tip: You can use a wildcard to replace the dates in the file names.
6. If necessary, edit the source objects to point to the files on the Job Server. If the Job Server is on a different machine than the Designer, this step is required. In the Location drop-down list, select Job Server. In the Root directory, enter the correct path. The instructor will provide this information.
7. Edit the Orders_Format source object to change the Capture Data Conversion Errors option to Yes.
8. Add a Query transform to the workspace and connect it to the two source objects.
9. In the transform editor for the Query transform, create a WHERE clause to join the data on the OrderID values. The expression should be as follows: Order_Shippers_Format.ORDERID = Orders_Format.ORDERID
10. Add the following mappings in the Query transform:
ORDERID: Orders_Format.ORDERID
CUSTOMERID: Orders_Format.CUSTOMERID
ORDERDATE: Orders_Format.ORDERDATE
SHIPPERNAME: Order_Shippers_Format.SHIPPERNAME
SHIPPERADDRESS: Order_Shippers_Format.SHIPPERADDRESS
SHIPPERCITY: Order_Shippers_Format.SHIPPERCITY
SHIPPERCOUNTRY: Order_Shippers_Format.SHIPPERCOUNTRY
SHIPPERPHONE: Order_Shippers_Format.SHIPPERPHONE
SHIPPERFAX: Order_Shippers_Format.SHIPPERFAX
SHIPPERREGION: Order_Shippers_Format.SHIPPERREGION
SHIPPERPOSTALCODE: Order_Shippers_Format.SHIPPERPOSTALCODE
11. Insert a new output column above ORDERDATE called ORDER_TAKEN_BY with a datatype of varchar(15) and map it to Orders_Format.EMPLOYEEID.
12. Insert a new output column above ORDERDATE called ORDER_ASSIGNED_TO with a datatype of varchar(15) and map it to Orders_Format.EMPLOYEEID.
13. Add a Validation transform to the right of the Query transform and connect the transforms.
14. In the transform editor for the Validation transform, enable validation for the ORDER_ASSIGNED_TO column to verify that the value in the column exists in the EMPLOYEEID column of the Employee table in the HR_datamart datastore. The expression should be as follows: HR_DATAMART.DBO.EMPLOYEE.EMPLOYEEID
15. Set the action on failure for the ORDER_ASSIGNED_TO column to send to both pass and fail. For pass, substitute '3Cla5' to assign it to the default employee.
16. Enable validation for the SHIPPERFAX column to send NULL values to both pass and fail, substituting 'No Fax' for pass.
17. Add two target tables in the Delta datastore as targets, one called Orders_Files_Work and one called Orders_Files_No_Fax.
18. Connect the pass output from the Validation transform to Orders_Files_Work and the fail output to Orders_Files_No_Fax.
19. In the Alpha_Orders_DB_DF workspace, add the Orders table from the Alpha datastore as the source object.
20. Add a Query transform to the workspace and connect it to the source.
21. In the transform editor for the Query transform, map all of the columns from the input schema to the output schema, except the EMPLOYEEID column.
22. Change the names of the following Schema Out columns:
SHIPPERCITYID: SHIPPERCITY
SHIPPERCOUNTRYID: SHIPPERCOUNTRY
SHIPPERREGIONID: SHIPPERREGION
23. Insert a new output column above ORDERDATE called ORDER_TAKEN_BY with a datatype of varchar(15) and map it to Orders.EMPLOYEEID.
24. Insert a new output column above ORDERDATE called ORDER_ASSIGNED_TO with a datatype of varchar(15) and map it to Orders.EMPLOYEEID.
25. Add a Validation transform to the right of the Query transform and connect the transforms.
26. Enable validation for the ORDER_ASSIGNED_TO column to verify that the column value exists in the EMPLOYEEID column of the Employee table in the HR_datamart datastore.
27. Set the action on failure for the ORDER_ASSIGNED_TO column to send to both pass and fail. For pass, substitute '3Cla5' to assign it to the default employee.
28. Enable validation for the SHIPPERFAX column to send NULL values to both pass and fail, substituting 'No Fax' for pass.
29. Add two target tables in the Delta datastore as targets, one named Orders_DB_Work and one named Orders_DB_No_Fax.
30. Connect the pass output from the Validation transform to Orders_DB_Work and the fail output to Orders_DB_No_Fax.
31. Execute Alpha_Orders_Validated_Job with the default execution properties and save all objects you have created.
32. View the data in the target tables to view the differences between passing and failing records.
A solution file called SOLUTION_Validation.atl is included in your Course Resources. To check the solution, import the file and open it to view the data flow design and mapping logic. Do not execute the solution job, as this may override the results in your target table.
Using the Merge transform
Introduction
The Merge transform allows you to combine multiple sources with the same schema into a single target. After completing this unit, you will be able to:
• Use the Merge transform
Explaining the Merge transform
The Merge transform combines incoming data sets with the same schema structure to produce a single output data set with the same schema as the input data sets. For example, you could use the Merge transform to combine two sets of address data.
For more information on the Merge transform, see "Transforms", Chapter 5 in the Data Services Reference Guide. The next section gives a brief description of the function, data input requirements, options, and data output results for the Merge transform.
Input/Output
The Merge transform performs a union of the sources. All sources must have the same schema, including:
• Number of columns
• Column names
• Column datatypes
If the input data set contains hierarchical data, the names and datatypes must match at every level of the hierarchy.
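Conceptually, the Merge transform behaves like a SQL UNION ALL over its inputs. The sketch below is an analogy only, with a shortened column list; it is not a statement you enter anywhere in Data Services:

SELECT ORDERID, CUSTOMERID, ORDERDATE FROM ORDERS_FILES_WORK
UNION ALL
SELECT ORDERID, CUSTOMERID, ORDERDATE FROM ORDERS_DB_WORK

Because the operation is a UNION ALL rather than a UNION, duplicate rows are preserved, which matches the Merge transform behavior described below.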
The output data has the same schema as the source data. The output data set contains a row for every row in the source data sets. The transform does not strip out duplicate rows. If columns in the input set contain nested schemas, the nested data is passed through without change.
Tip: If you want to merge tables that do not have the same schema, you can add a Query transform to one of the tables before the Merge transform to redefine the schema to match the other table.
Options
The Merge transform does not offer any options.

Activity: Using the Merge transform
The Orders data has now been validated, but the output is for two different sources: flat files and database tables. The next step in the process is to modify the structure of those data sets so they match, and then merge them into a single data set.
Objectives
• Use the Query transforms to modify any column names and datatypes, and to perform lookups for any columns that reference other tables.
• Use the Merge transform to merge the validated orders data.
Instructions
1. In the Omega project, create a new batch job called Alpha_Orders_Merged_Job with a data flow called Alpha_Orders_Merged_DF.
2. In the data flow workspace, add the Orders_Files_Work and Orders_DB_Work tables from the Delta datastore as the source objects.
3. Add two Query transforms to the data flow, connecting each source object to its own Query transform.
4. In the transform editor for the Query transform connected to the Orders_Files_Work table, map all columns from input to output.
5. Change the datatype for the following Schema Out columns as specified:
ORDERDATE: datetime
SHIPPERADDRESS: varchar(100)
SHIPPERCOUNTRY: varchar(50)
SHIPPERREGION: varchar(50)
SHIPPERPOSTALCODE: varchar(50)
6. For the SHIPPERCOUNTRY column, change the mapping to perform a lookup of CountryName from the Country table in the Alpha datastore. The expression should be as follows:
lookup_ext([ALPHA.SOURCE.COUNTRY,'PRE_LOAD_CACHE','MAX'],[COUNTRYNAME],[NULL],[COUNTRYID,'=',ORDERS_FILES_WORK.SHIPPERCOUNTRY]) SET ("run_as_separate_process"='no', "output_cols_info"='<?xml version="1.0" encoding="UTF-8"?><output_cols_info><col index="1" expression="no"/></output_cols_info>')
7. For the SHIPPERREGION column, change the mapping to perform a lookup of RegionName from the Region table in the Alpha datastore. The expression should be as follows:
lookup_ext([ALPHA.SOURCE.REGION,'PRE_LOAD_CACHE','MAX'],[REGIONNAME],[NULL],[REGIONID,'=',ORDERS_FILES_WORK.SHIPPERREGION]) SET ("run_as_separate_process"='no', "output_cols_info"='<?xml version="1.0" encoding="UTF-8"?><output_cols_info><col index="1" expression="no"/></output_cols_info>')
8. In the transform editor for the Query transform connected to the Orders_DB_Work table, map all columns from input to output.
9. Change the datatype for the following Schema Out columns as specified:
ORDER_TAKEN_BY: varchar(15)
ORDER_ASSIGNED_TO: varchar(15)
SHIPPERCITY: varchar(50)
SHIPPERCOUNTRY: varchar(50)
SHIPPERREGION: varchar(50)
10. For the SHIPPERCITY column, change the mapping to perform a lookup of CityName from the City table in the Alpha datastore. The expression should be as follows:
lookup_ext([ALPHA.SOURCE.CITY,'PRE_LOAD_CACHE','MAX'],[CITYNAME],[NULL],[CITYID,'=',ORDERS_DB_WORK.SHIPPERCITY]) SET ("run_as_separate_process"='no', "output_cols_info"='<?xml version="1.0" encoding="UTF-8"?><output_cols_info><col index="1" expression="no"/></output_cols_info>')
11. For the SHIPPERCOUNTRY column, change the mapping to perform a lookup of CountryName from the Country table in the Alpha datastore. The expression should be as follows:
lookup_ext([ALPHA.SOURCE.COUNTRY,'PRE_LOAD_CACHE','MAX'],[COUNTRYNAME],[NULL],[COUNTRYID,'=',ORDERS_DB_WORK.SHIPPERCOUNTRY]) SET ("run_as_separate_process"='no', "output_cols_info"='<?xml version="1.0" encoding="UTF-8"?><output_cols_info><col index="1" expression="no"/></output_cols_info>')
12. For the SHIPPERREGION column, change the mapping to perform a lookup of RegionName from the Region table in the Alpha datastore. The expression should be as follows:
lookup_ext([ALPHA.SOURCE.REGION,'PRE_LOAD_CACHE','MAX'],[REGIONNAME],[NULL],[REGIONID,'=',ORDERS_DB_WORK.SHIPPERREGION]) SET ("run_as_separate_process"='no', "output_cols_info"='<?xml version="1.0" encoding="UTF-8"?><output_cols_info><col index="1" expression="no"/></output_cols_info>')
13. Add a Merge transform to the data flow and connect both Query transforms to the Merge transform.
14. Add a template table called Orders_Merged in the Delta datastore as the target table and connect it to the Merge transform.
15. Execute Alpha_Orders_Merged_Job with the default execution properties and save all objects you have created.
16. View the data in the target table. Note that the SHIPPERCITY, SHIPPERCOUNTRY, and SHIPPERREGION columns for the 363 records in the template table consistently have names rather than ID values.
A solution file called SOLUTION_Merge.atl is included in your Course Resources. To check the solution, import the file and open it to view the data flow design and mapping logic. Do not execute the solution job, as this may override the results in your target table.
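For reference, every lookup_ext call in this activity follows the same argument pattern; the breakdown below is a reading aid rather than a complete description of the function (see the Data Services Reference Guide for all options):
• [ALPHA.SOURCE.COUNTRY, 'PRE_LOAD_CACHE', 'MAX']: the lookup table, the caching strategy, and the policy for choosing among multiple matching rows.
• [COUNTRYNAME]: the column whose value is returned into the mapped output column.
• [NULL]: the default value returned when no match is found.
• [COUNTRYID, '=', ORDERS_DB_WORK.SHIPPERCOUNTRY]: the lookup condition comparing the lookup table's key column to the input column.
The SET clause with "run_as_separate_process" and "output_cols_info" is generated automatically when you build the lookup through the function wizard and can normally be left as it is.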
Using the Case transform
Introduction
The Case transform supports separating data from a source into multiple targets based on branch logic. After completing this unit, you will be able to:
• Use the Case transform
Explaining the Case transform
You use the Case transform to simplify branch logic in data flows by consolidating case or decision-making logic into one transform. The transform allows you to split a data set into smaller sets based on logical branches. For example, you can use the Case transform to read a table that contains sales revenue facts for different regions and separate the regions into their own tables for more efficient data access.
For more information on the Case transform, see "Transforms", Chapter 5 in the Data Services Reference Guide. The next section gives a brief description of the function, data input requirements, options, and data output results for the Case transform.
Input/Output
Only one data flow source is allowed as a data input for the Case transform. The input and output schemas are identical when using the Case transform. Depending on the data, only one of multiple branches is executed per row.
The connections between the Case transform and the objects used for a particular case must be labeled. Each label represents a case expression (WHERE clause), and each output label in the Case transform must be used at least once. You connect the output of the Case transform with another object in the workspace.
Options
The Case transform offers several options:
Label: Define the name of the connection that describes where data will go if the corresponding Case condition is true.
Expression: Define the Case expression for the corresponding label.
Produce default option with label: Specify that the transform must use the expression in this label when all other Case expressions evaluate to false.
Row can be TRUE for one case only: Specify that the transform passes each row to the first case whose expression returns true.
To create a case statement
1. Open the data flow workspace.
2. Add your source object to the workspace.
3. Add your target objects to the workspace. You will require one target object for each possible condition in the case statement.
4. On the Transforms tab of the Local Object Library, click and drag the Case transform to the workspace to the right of your source object.
5. Connect the source object to the transform.
6. Double-click the Case transform to open the transform editor.
7. In the parameters area of the transform editor, click Add to add a new expression.
8. In the Label field, enter a label for the expression.
9. Click and drag an input schema column to the Expression pane at the bottom of the window.
10. Enter the rest of the expression to define the condition. For example, to specify that you want all customers with a RegionID of 1, create the following statement: Customer.RegionID = 1
11. Repeat step 7 to step 10 for all expressions.
12. To direct records that do not meet any defined conditions to a separate target object, select the Produce default option with label option and enter the label name in the associated field.
13. To direct records that meet multiple conditions to only one target, select the Row can be TRUE for one case only option. In this case, records are placed in the target associated with the first condition that evaluates as true.
14. Click Back to return to the data flow workspace.
15. Click and drag from the transform to the target object.
16. Release the mouse and select the appropriate label for that object from the pop-up menu.
17. Repeat step 15 and step 16 for all target objects.

Activity: Using the Case transform
Once the orders have been validated and merged, the resulting data set must be split out by quarter for reporting purposes.
Objective
• Use the Case transform to create separate tables for orders occurring in fiscal quarters 3 and 4 for the year 2007 and quarter 1 of 2008.
Instructions
1. In the Omega project, create a new batch job called Alpha_Orders_By_Quarter_Job with a data flow named Alpha_Orders_By_Quarter_DF.
2. In the data flow workspace, add the Orders_Merged table from the Delta datastore as the source object.
3. Add a Query transform to the data flow and connect it to the source table.
4. In the transform editor for the Query transform, map all columns from input to output.
5. Add the following two output columns:
ORDERQUARTER (int), mapped to quarter(orders_merged.ORDERDATE)
ORDERYEAR (varchar(4)), mapped to to_char(orders_merged.ORDERDATE, 'YYYY')
6. Add a Case transform to the data flow and connect it to the Query transform.
7. In the transform editor for the Case transform, create the following labels and associated expressions:
Q42006: Query.ORDERYEAR = '2006' and Query.ORDERQUARTER = 4
Q12007: Query.ORDERYEAR = '2007' and Query.ORDERQUARTER = 1
Q22007: Query.ORDERYEAR = '2007' and Query.ORDERQUARTER = 2
Q32007: Query.ORDERYEAR = '2007' and Query.ORDERQUARTER = 3
Q42007: Query.ORDERYEAR = '2007' and Query.ORDERQUARTER = 4
8. Choose the settings to not produce a default output set for the Case transform and to specify that rows can be true for one case only.
9. Add five template tables in the Delta datastore called Orders_Q4_2006, Orders_Q1_2007, Orders_Q2_2007, Orders_Q3_2007, and Orders_Q4_2007.
10. Connect the output from the Case transform to the target tables, selecting the corresponding labels.
11. Execute Alpha_Orders_By_Quarter_Job with the default execution properties and save all objects you have created.
12. View the data in the target tables and confirm that there are 103 orders that were placed in Q1 of 2007.
A solution file called SOLUTION_Case.atl is included in your Course Resources. To check the solution, import the file and open it to view the data flow design and mapping logic. Do not execute the solution job, as this may override the results in your target table.
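For readers who think in SQL, the Case transform in this activity routes rows much as a set of separate WHERE clauses would. The statements below are an analogy only, not something you build in the activity:

SELECT * FROM ORDERS_MERGED WHERE ORDERYEAR = '2006' AND ORDERQUARTER = 4
SELECT * FROM ORDERS_MERGED WHERE ORDERYEAR = '2007' AND ORDERQUARTER = 1

Each statement corresponds to one labeled branch and its target table. With the Row can be TRUE for one case only option selected, a row is written to the target of the first branch whose expression evaluates to true.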
Using the SQL transform

Introduction
The SQL transform allows you to submit SQL commands to generate data to be moved into target objects. You can use the SQL transform as a replacement for the Merge transform when you are dealing with database tables only. In that case the SQL transform performs more efficiently because the merge is pushed down to the database. However, you cannot use this functionality if your source objects include file formats. The SQL transform can be used for general select statements as well as stored procedures and views.

After completing this unit, you will be able to:
• Use the SQL transform

Explaining the SQL transform
Use this transform to perform standard SQL operations when other built-in transforms cannot perform them. The next section gives a brief description of the function, data input requirements, options, and data output results for the SQL transform. For more information on the SQL transform, see "Transforms", Chapter 5 in the Data Services Reference Guide.

Inputs/Outputs
There is no input data set for the SQL transform. There are two ways of defining the output schema for a SQL transform if the SQL submitted is expected to return a result set:
• Automatic: After you type the SQL statement, click Update schema to execute a select statement against the database that obtains the column information returned by the select statement and populates the output schema.
• Manual: Output columns must be defined in the output portion of the SQL transform if the SQL operation is returning a data set. The number of columns defined in the output of the SQL transform must equal the number of columns returned by the SQL query, but the column names and data types of the output columns do not need to match the column names or data types in the SQL query.

Options
The SQL transform has the following options:

Datastore: Specify the datastore for the tables referred to in the SQL statement.
Database type: Specify the type of database for the datastore where there are multiple datastore configurations.
Join rank: Indicate the weight of the output data set if the data set is used in a join. The highest ranked source is accessed first to construct the join.
Array fetch size: Indicate the number of rows retrieved in a single request to a source database. The default value is 1000.
Cache: Hold the output from this transform in memory for use in subsequent transforms. Use this only if the data set is small enough to fit in memory.
SQL text: Enter the text of the SQL query.

To create a SQL statement
1. Open the data flow workspace.
2. Add your target object to the workspace.
3. On the Transforms tab of the Local Object Library, click and drag the SQL transform to the workspace.
4. Connect the transform to the target object.
5. Double-click the SQL transform to open the transform editor.
6. In the parameters area, select the source datastore from the Datastore drop-down list.
7. If there is more than one datastore configuration, select the appropriate configuration from the Database type drop-down list.
8. Change the other available options, if required.
9. In the SQL text area, enter the SQL statement. For example, to copy the entire contents of a table into the target object, you would use the following statement:
Select * from Customers
10. Click Update Schema to update the output schema with the appropriate values. You can also create the output columns manually. If required, you can change the names and datatypes of these columns.
11. Click Back to return to the data flow workspace.
12. Click and drag from the transform to the target object.
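Because the SQL transform can stand in for the Merge transform when all sources are database tables, the SQL text can also combine two tables directly in the database. The following is only a small sketch; the table and column names are hypothetical and would need to match your own datastore:

select CUSTOMERID, NAME, REGIONID from CUSTOMERS_US
union all
select CUSTOMERID, NAME, REGIONID from CUSTOMERS_EU

After entering a statement like this, click Update Schema so that the output schema reflects the three columns returned by the query.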
Lesson 5 Setting up Error Handling

Lesson introduction
For sophisticated error handling, you can use recoverable work flows and try/catch blocks to recover data. In this lesson you will learn about:
• Setting up recoverable work flows
• Using recovery mode
• Using try/catch blocks and automatic recovery
Using recovery mechanisms

Introduction
If a Data Services job does not complete properly, you must resolve the problems that prevented the successful execution of the job.

After completing this unit, you will be able to:
• Explain how to avoid data recovery situations
• Explain the levels of data recovery strategies
• Recover a failed job using automatic recovery
• Recover missing values and rows
• Define alternative work flows

Avoiding data recovery situations
The best solution to data recovery situations is obviously not to get into them in the first place. Some of those situations are unavoidable, such as server failures. Others, however, can easily be sidestepped by constructing your jobs so that they take into account the issues that frequently cause them to fail. One example is when an external file is required to run a job. In this situation, you could use the wait_for_file function, or a while loop and the file_exists function, to check that the file exists in a specified location before executing the job.

While loops
The while loop is a single-use object that you can use in a work flow. The while loop repeats a sequence of steps as long as a condition is true. Typically, the steps done during the while loop result in a change in the condition so that the condition is eventually no longer satisfied and the work flow exits from the while loop. If the condition does not change, the while loop does not end.

For example, you might want a work flow to wait until the system writes a particular file. You can use a while loop to check for the existence of the file using the file_exists function. As long as the file does not exist, you can have the work flow go into sleep mode for a particular length of time before checking again.

Because the system might never write the file, you must add another check to the loop, such as a counter, to ensure that the while loop eventually exits. In other words, change the while loop to check for the existence of the file and the value of the counter. As long as the file does not exist and the counter is less than a particular value, repeat the while loop. In each iteration of the loop, put the work flow in sleep mode and then increment the counter.
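As a minimal sketch of this pattern, assume the file path and counter variable below are hypothetical: a global variable $G_counter is initialized to 0 in an earlier script, the condition is entered in the while loop editor, and the script is placed inside the loop:

While loop condition:
file_exists('C:/incoming/orders.txt') = 0 and $G_counter < 60

Script inside the while loop:
sleep(60000);                  # pause for 60 seconds before checking again
$G_counter = $G_counter + 1;   # guarantee that the loop eventually exits

Because file_exists returns 1 when the file is found, the loop stops either when the file arrives or after 60 attempts, whichever comes first.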
Describing levels of data recovery strategies
When a job fails to complete successfully during execution, some data flows may not have completed. When this happens, some tables may have been loaded, partially loaded, or altered. You need to design your data movement jobs so that you can recover your data by rerunning the job and retrieving all the data without introducing duplicate or missing data.

There are different levels of data recovery and recovery strategies. You can:
• Recover your entire database: Use your standard RDBMS services to restore a crashed data cache to an entire database. This option is outside of the scope of this course.
• Recover a partially-loaded job: Use automatic recovery.
• Recover from partially-loaded tables: Use the Table Comparison transform, do a full replacement of the target, use the auto-correct load feature, include a preload SQL command to avoid duplicate loading of rows when recovering from partially loaded tables, and use overflow files to manage rows that could not be inserted.
• Recover missing values or rows: Use the Validation transform or the Query transform with WHERE clauses to identify missing values.
• Define alternative work flows: Use conditionals, try/catch blocks, and scripts to ensure all exceptions are managed in a work flow.
Depending on the relationships between data flows in your application, you may use a combination of these techniques to recover from exceptions.

Note: It is important to note that some recovery mechanisms are for use in production systems and are not supported in development environments.

Configuring work flows and data flows
In some cases, steps in a work flow depend on each other and must be executed together. When there is a dependency like this, you should designate the work flow as a recovery unit. This requires the entire work flow to complete successfully. If the work flow does not complete successfully, Data Services executes the entire work flow during recovery, including the steps that executed successfully in prior work flow runs.

Conversely, you may need to specify that a work flow or data flow should only execute once. When this setting is enabled, the job never re-executes that object. It is not recommended to mark a work flow or data flow as "Execute only once" if the parent work flow is a recovery unit.

To specify a work flow as a recovery unit
1. In the project area or on the Work Flows tab of the Local Object Library, right-click the work flow and select Properties from the menu. The Properties dialog box displays.
2. On the General tab, select the Recover as a unit check box.
3. Click OK.
To specify that an object executes only once
1. In the project area or on the appropriate tab of the Local Object Library, right-click the work flow or data flow and select Properties from the menu. The Properties dialog box displays.
2. On the General tab, select the Execute only once check box.
3. Click OK.

Using recovery mode
If a job with automated recovery enabled fails during execution, you can execute the job again in recovery mode. During recovery mode, Data Services retrieves the results for successfully-completed steps and reruns uncompleted or failed steps under the same conditions as the original job. In recovery mode, Data Services executes the steps or recovery units that did not complete successfully in a previous execution. This includes steps that failed and steps that generated an exception but completed successfully, such as those in a try/catch block. As in normal job execution, Data Services executes the steps in parallel if they are not connected in the work flow diagrams and in serial if they are connected.

For example, suppose a daily update job running overnight successfully loads dimension tables in a warehouse. However, while the job is running, the database log overflows and stops the job from loading fact tables. The next day, you truncate the log file and run the job again in recovery mode. The recovery job does not reload the dimension tables because the original job, even though it failed, successfully loaded the dimension tables.

To ensure that the fact tables are loaded with the data that corresponds properly to the data already loaded in the dimension tables, ensure the following:
• Your recovery job must use the same extraction criteria that your original job used when loading the dimension tables. If your recovery job uses new extraction criteria, such as basing data extraction on the current system date, the data in the fact tables will not correspond to the data previously extracted into the dimension tables.
• Your recovery job must follow the exact execution path that the original job followed. Data Services records any external inputs to the original job so that your recovery job can use these stored values and follow the same execution path. If your recovery job uses new values, the job execution may follow a completely different path through conditional steps or try/catch blocks.

To enable automatic recovery in a job
1. In the project area, right-click the job and select Execute from the menu. The Execution Properties dialog box displays.
2. On the Parameters tab, select the Enable recovery check box.
3. Click OK.

If this check box is not selected, Data Services does not record the results from the steps during the job and cannot recover the job if it fails.
To recover from last execution
1. In the project area, right-click the job that failed and select Execute from the menu. The Execution Properties dialog box displays.
2. On the Parameters tab, select the Recover from last execution check box. This option is not available when a job has not yet been executed, the previous job run succeeded, or recovery mode was disabled during the previous run.
3. Click OK.

Recovering from partially-loaded data
Executing a failed job again may result in duplication of rows that were loaded successfully during the first job run. Within your recoverable work flow, you can use several methods to ensure that you do not insert duplicate rows:
• Include the Table Comparison transform (available in Data Integrator packages only) in your data flow when you have tables with more rows and fewer fields, such as fact tables.
• Change the target table options to use the auto-correct load feature when you have tables with fewer rows and more fields, such as dimension tables. The auto-correct load checks the target table for existing rows before adding new rows to the table. Using the auto-correct load option, however, can slow jobs executed in non-recovery mode. Consider this technique when the target table is large and the changes to the table are relatively few.
• Change the target table options to completely replace the target table during each execution. This technique can be optimal when the changes to the target table are numerous compared to the size of the table.
• Include a SQL command to execute before the table loads. Preload SQL commands can remove partial database updates that occur during incomplete execution of a step in a job. Typically, the preload SQL command deletes rows based on a variable that is set before the partial insertion step began (a sketch follows this list). For more information on preloading SQL commands, see "Using preload SQL to allow re-executable Data Flows", Chapter 18 in the Data Services Designer Guide.
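As a minimal sketch of a preload SQL command, assume a hypothetical fact table SALES_FACT with a LOAD_ID column, and a global variable $G_load_id that a script sets before the load step begins. A command like the following, entered as a preload SQL command for the target table, removes any rows left behind by an incomplete earlier run:

delete from SALES_FACT where LOAD_ID = {$G_load_id}

The curly braces substitute the variable value, in quotes, into the SQL text; square brackets substitute the value without quotes. Check the preload SQL section of the Designer Guide for the exact syntax supported by your version.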
Recovering missing values or rows
Missing values that are introduced into the target data during data integration and data quality processes can be managed using the Validation or Query transforms. Missing rows are rows that cannot be inserted into the target table. For example, rows may be missing in instances where a primary key constraint is violated. Overflow files help you process this type of data problem.

When you specify an overflow file and Data Services cannot load a row into a table, Data Services writes the row to the overflow file instead. The trace log indicates the data flow in which the load failed and the location of the file. You can use the overflow information to identify invalid data in your source or problems introduced in the data movement. Every new run will overwrite the existing overflow file.

To use an overflow file in a job
1. Open the target table editor for the target table in your data flow.
2. On the Options tab, under Error handling, select the Use overflow file check box.
3. In the File name field, enter or browse to the full path and file name for the file. When you specify an overflow file, give a full path name to ensure that Data Services creates a unique file when more than one file is created in the same job.
4. In the File format drop-down list, select what you want Data Services to write to the file about the rows that failed to load:
• If you select Write data, you can use Data Services to specify the format of the error-causing records in the overflow file.
• If you select Write sql, you can use the commands to load the target manually when the target is accessible.

Defining alternative work flows
You can set up your jobs to use alternative work flows that cover all possible exceptions and have recovery mechanisms built in. This technique allows you to automate the process of recovering your results.

Alternative work flows consist of several components:
1. A script to determine if recovery is required. This script reads the value in a status table and populates a global variable with the same value. The initial value in the table is set to indicate that recovery is not required (a sketch of this table follows the list).
2. A conditional that calls the appropriate work flow based on whether recovery is required. The conditional contains an If/Then/Else statement to specify that work flows that do not require recovery are processed one way, and those that do require recovery are processed another way.
3. A work flow with a try/catch block to execute a data flow without recovery. The data flow where recovery is not required is set up without the auto-correct load option set. This ensures that, wherever possible, the data flow is executed in a less resource-intensive mode.
4. A script in the catch object to update the status table. The script specifies that recovery is required if any exceptions are generated.
5. A work flow to execute a data flow with recovery and a script to update the status table. The data flow is set up for more resource-intensive processing that will resolve the exceptions. The script updates the status table to indicate that recovery is not required.
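The status table referenced by these components is simply a one-row table in the target database. As a minimal sketch, assuming the recovery_status table and recovery_flag column names used in the script examples later in this lesson, the DDL might look like this (exact data types depend on your RDBMS):

create table recovery_status (recovery_flag int);
insert into recovery_status values (0);  -- initial value: recovery is not required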
Conditionals are single-use objects used to implement conditional logic in a work flow. When you define a conditional, you must specify a condition and two logical branches:
If: A Boolean expression that evaluates to TRUE or FALSE. You can use functions, variables, and standard operators to construct the expression.
Then: The work flow elements to execute if the If expression evaluates to TRUE.
Else: The work flow elements to execute if the If expression evaluates to FALSE.
Both the Then and Else branches of the conditional can contain any object that you can have in a work flow, including other work flows, data flows, nested conditionals, try/catch blocks, scripts, and so on.
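For example, a conditional that chooses between a full load and a delta load might be defined as follows; the global variable and work flow names here are hypothetical:

If:    $G_full_load = 1
Then:  WF_Full_Load
Else:  WF_Delta_Load

At run time, the expression is evaluated once and exactly one of the two branches executes.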
A try/catch block allows you to specify alternative work flows if errors occur during job execution. Try/catch blocks catch classes of errors, apply solutions that you provide, and continue execution. For each catch in the try/catch block, you will specify: • One exception or group of exceptions handled by the catch. To handle more than one exception or group of exceptions, add more catches to the try/catch block. • The work flow to execute if the indicated exception occurs. Use an existing work flow or define a work flow in the catch editor. If an exception is thrown during the execution of a try/catch block, and if no catch is looking for that exception, then the exception is handled by normal error logic.
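In addition to running an alternative work flow, a catch can record what went wrong. As a small sketch, assuming your version provides the catch-only error functions error_message() and error_number(), a script placed in the catch might write the details to the trace log:

print('Exception caught: ' || error_message());
print('Error number: ' || error_number());

The message text is only an example; the point is that the catch has access to the error details at the moment the exception is handled.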
Using try/catch blocks and automatic recovery
Data Services does not save the result of a try/catch block for re-use during recovery. If an exception is thrown inside a try/catch block, during recovery Data Services executes the step that threw the exception and subsequent steps. Because the execution path through the try/catch block might be different in the recovered job, using variables set in the try/catch block could alter the results during automatic recovery.
For example, suppose you create a job that defines the value of variable $I within a try/catch block. If an exception occurs, you set an alternate value for $I. Subsequent steps are based on the new value of $I.
During the first job execution, the first work flow contains an error that generates an exception, which is caught. However, the job fails in the subsequent work flow.
You fix the error and run the job in recovery mode. During the recovery execution, the first work flow no longer generates the exception. Thus the value of variable $I is different, and the job selects a different subsequent work flow, producing different results.
To ensure proper results with automatic recovery when a job contains a try/catch block, do not use values set inside the try/catch block or reference output variables from a try/catch block in any subsequent steps.
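As a brief sketch of the pattern to avoid, with hypothetical work flow names: a script inside the try block sets

$I = 1;

a script inside the catch sets

$I = 0;

and a later conditional, outside the try/catch block, is defined as If: $I = 1, Then: WF_Standard, Else: WF_Alternate. If the original run takes the catch path ($I = 0) and then fails in a later step, a recovery run in which the exception does not recur leaves $I = 1, so the conditional selects the other work flow and produces different results. Basing the decision on a value persisted to a status table, as in the procedure that follows, avoids this.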
To create an alternative work flow
1. Create a job.
2. Add a global variable to your job called $G_recovery_needed with a datatype of int. The purpose of this global variable is to store a flag that indicates whether or not recovery is needed. This flag is based on the value in a recovery status table, which contains a flag of 1 or 0, depending on whether recovery is needed.
3. In the job workspace, add a work flow using the tool palette.
4. In the work flow workspace, add a script called GetStatus using the tool palette.
5. In the script workspace, construct an expression to update the value of the $G_recovery_needed global variable to the same value as is in the recovery status table. The script content depends on the RDBMS on which the status table resides. The following is an example of the expression:
$G_recovery_needed = sql('DEMO_Target', 'select recovery_flag from recovery_status');
6. Return to the work flow workspace.
7. Add a conditional to the workspace using the tool palette and connect it to the script.
8. Open the conditional. The transform editor for the conditional allows you to specify the IF expression and Then/Else branches.
9. In the IF field, enter the expression that evaluates whether recovery is required. The following is an example of the expression:
$G_recovery_needed = 0
This means the objects in the Then pane will run if recovery is not required. If recovery is needed, the objects in the Else pane will run.
10. Add a try object to the Then pane of the transform editor using the tool palette.
11. In the Local Object Library, click and drag a work flow or data flow to the Then pane after the try object.
12. Add a catch object to the Then pane after the work flow or data flow using the tool palette.
13. Connect the objects in the Then pane.
14. Open the workspace for the catch object. By default, Data Services catches all exceptions.
15. To change which exceptions act as triggers, expand the tree in the Available exceptions pane, select the appropriate exceptions, and click Set to move them to the Trigger on these exceptions pane. All exception types are listed in the Available exceptions pane.
16. Add a script called Fail to the lower pane using the tool palette. This object will be executed if there are any exceptions. If desired, you can add a data flow here instead of a script.
17. In the script workspace, construct an expression to update the flag in the recovery status table to 1, indicating that recovery is needed.
The script content depends on the RDBMS on which the status table resides. The following is an example of the expression:
sql('DEMO_Target', 'update recovery_status set recovery_flag = 1');
18. Return to the conditional workspace.
19. In the Local Object Library, click and drag the work flow or data flow that represents the recovery process to the Else pane.
20. Add a script called Pass to the lower pane using the tool palette.
21. In the script workspace, construct an expression to update the flag in the recovery status table to 0, indicating that recovery is not needed. The script content depends on the RDBMS on which the status table resides. The following is an example of the expression:
sql('DEMO_Target', 'update recovery_status set recovery_flag = 0');
22. Connect the objects in the Else pane.
23. Return to the conditional workspace. This combination means that if recovery is not needed, the first object (the try/catch work flow in the Then pane) will be executed; if recovery is required, the second object (the recovery work flow in the Else pane) will be executed.
24. Validate and save all objects.
25. Execute the job. The first time this job is executed, the job succeeds because the recovery_flag value in the status table is set to 0 and the target table is empty.
26. Execute the job again. The second time this job is executed, the job fails because the target table already contains records, so there is a primary key exception.
27. Check the contents of the status table. The recovery_flag field now contains a value of 1.
28. Execute the job again. The third time this job is executed, the version of the data flow with the Auto correct load option selected runs because the recovery_flag value in the status table is set to 1. The job succeeds because the auto-correct load feature checks for existing values before trying to insert new rows, so there is no primary key constraint violation.
29. Check the contents of the status table again. The recovery_flag field contains a value of 0.