SAP BusinessObjects Data Services Getting Started Guide

SAP BusinessObjects Data Services XI 3.2 SP1 (12.2.1)

Copyright

© 2009 SAP AG. All rights reserved. SAP, R/3, SAP NetWeaver, Duet, PartnerEdge, ByDesign, SAP Business ByDesign, and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of SAP AG in Germany and other countries. Business Objects and the Business Objects logo, BusinessObjects, Crystal Reports, Crystal Decisions, Web Intelligence, Xcelsius, and other Business Objects products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of Business Objects S.A. in the United States and in other countries. Business Objects is an SAP company. All other product and service names mentioned are the trademarks of their respective companies.

Data contained in this document serves informational purposes only. National product specifications may vary. These materials are subject to change without notice. These materials are provided by SAP AG and its affiliated companies ("SAP Group") for informational purposes only, without representation or warranty of any kind, and SAP Group shall not be liable for errors or omissions with respect to the materials. The only warranties for SAP Group products and services are those that are set forth in the express warranty statements accompanying such products and services, if any. Nothing herein should be construed as constituting an additional warranty.

2009-10-24

Contents

Chapter 1 Overview of SAP BusinessObjects Data Services
    SAP BusinessObjects Data Services and the SAP BusinessObjects solution portfolio
    Software benefits
        Unification with the platform
        Ease of use and high productivity
        High availability and performance
    Associated software
        SAP BusinessObjects Metadata Management
    Interfaces

Chapter 2 Architecture
    Standard components
        Designer
        Repository
        Job Server
        Engine
        Access Server
        Address Server
        Administrator
        Metadata Reports applications
        Metadata Integrator
        Service
        SNMP Agent
        Adapter SDK
    Optional components
        Multi-user
    Management tools
        License Manager
        Repository Manager
        Server Manager
    Operating system platforms
    Distributed architecture
        Host names and port numbers

Appendix A Glossary

Index

Overview of SAP BusinessObjects Data Services

About this section

This section introduces SAP BusinessObjects Data Services and explains its place in the SAP BusinessObjects solution portfolio.
Related Topics

• SAP BusinessObjects Data Services and the SAP BusinessObjects solution portfolio
• Software benefits
• Interfaces

SAP BusinessObjects Data Services and the SAP BusinessObjects solution portfolio
The SAP BusinessObjects solution portfolio delivers extreme insight through specialized end-user tools on a single, trusted business intelligence platform. This entire platform is supported by SAP BusinessObjects Data Services. On top of SAP BusinessObjects Data Services, the SAP BusinessObjects solution portfolio layers the most reliable, scalable, flexible, and manageable business intelligence (BI) platform which supports the industry's best integrated end-user interfaces: reporting, query and analysis, and performance management dashboards, scorecards, and applications.

True data integration blends batch extraction, transformation, and loading (ETL) technology with real-time bi-directional data flow across multiple applications for the extended enterprise.

By building a relational datastore and intelligently blending direct real-time and batch data-access methods to access data from enterprise resource planning (ERP) systems and other sources, SAP has created a powerful, high-performance data integration product that allows you to fully leverage your ERP and enterprise application infrastructure for multiple uses.

SAP provides a batch and real-time data integration system to drive today's new generation of analytic and supply-chain management applications. Using the highly scalable data integration solution provided by SAP, your enterprise can maintain a real-time, on-line dialogue with customers, suppliers, employees, and partners, providing them with the critical information they need for transactions and business analysis.

Software benefits
Use SAP BusinessObjects Data Services to develop enterprise data integration for batch and real-time uses. With the software:
• You can create a single infrastructure for batch and real-time data movement to enable faster and lower cost implementation.
• Your enterprise can manage data as a corporate asset independent of any single system. Integrate data across many systems and reuse that data for many purposes.
• You have the option of using pre-packaged data solutions for fast deployment and quick ROI. These solutions extract historical and daily data from operational systems and cache this data in open relational databases.

The software customizes and manages data access and uniquely combines industry-leading, patent-pending technologies for delivering data to analytic, supply-chain management, customer relationship management, and Web applications.

Unification with the platform


SAP BusinessObjects Data Services provides several points of platform unification:
• Get end-to-end data lineage and impact analysis
• Create the semantic layer (universe) and manage change within the ETL design environment

SAP deeply integrates the entire ETL process with the business intelligence platform so you benefit from:
• Easy metadata management
• Simplified and unified administration
• Life cycle management
• Trusted information

Ease of use and high productivity


SAP BusinessObjects Data Services combines both batch and real-time data movement and management to provide a single data integration platform for information management from any information source, for any information use. Using the software, you can:
• Stage data in an operational datastore, data warehouse, or data mart.
• Update staged data in batch or real-time modes.
• Create a single graphical development environment for developing, testing, and deploying the entire data integration platform.
• Manage a single metadata repository to capture the relationships between different extraction and access methods and provide integrated lineage and impact analysis.

High availability and performance


The high-performance engine and proven data movement and management capabilities of SAP BusinessObjects Data Services include:
• Scalable, multi-instance data-movement for fast execution
• Load balancing
• Changed-data capture
• Parallel processing

Associated software
Choose from other SAP BusinessObjects solution portfolio software options to further support and enhance the power of your SAP BusinessObjects Data Services software.

SAP BusinessObjects Metadata Management


SAP BusinessObjects Metadata Management provides an integrated view of metadata and its multiple relationships for a complete Business Intelligence project spanning some or all of the SAP BusinessObjects solution portfolio. Use the software to:
• View metadata about reports, documents, and data sources from a single repository.
• Analyze lineage to determine data sources of documents and reports.
• Analyze the impact of changing a source table, column, element, or field on existing documents and reports.
• Track different versions (changes) to each object over time.
• View operational metadata (such as the number of rows processed and CPU utilization) as historical data with a datetime.
• View metadata in different languages.

For more information on SAP BusinessObjects Metadata Management, contact your SAP sales representative.

Interfaces
SAP BusinessObjects Data Services provides many types of interface components. Your version of the software may provide some or all of them.

You can use the Interface Development Kit to develop adapters that read from and/or write to other applications.

In addition to the interfaces listed above, the Nested Relational Data Model (NRDM) allows you to apply the full power of SQL transforms to manipulate, process, and enrich hierarchical business documents.

For a detailed list of supported environments and hardware requirements, see the Supported Platforms document available in the SAP BusinessObjects Support > Documentation > Supported Platforms/PARs section of the SAP Service Marketplace: https://service.sap.com/bosap-support. This document includes specific version and patch-level requirements for databases, applications, web application servers, web browsers, and operating systems.

Related Topics

Designer Guide: Nested Data

Architecture

This section describes SAP BusinessObjects Data Services components and their distribution on your network. This section contains the following topics:
• Standard components
• Optional components
• Management tools
• Operating system platforms
• Distributed architecture

The architecture is layered to allow data integration to occur over a variety of open, industry-standard APIs for optimal data and metadata management.
Related Topics

• Standard components
• Optional components
• Management tools
• Operating system platforms
• Distributed architecture

Standard components
The following diagram summarizes the relationships among SAP BusinessObjects Data Services components.

For a detailed list of supported environments and hardware requirements, see the Supported Platforms document available in the SAP BusinessObjects Support > Documentation > Supported Platforms/PARs section of the SAP Service Marketplace: https://service.sap.com/bosap-support. This document includes specific version and patch-level requirements for databases, applications, web application servers, web browsers, and operating systems.
Related Topics

• Designer
• Repository
• Job Server
• Engine
• Access Server
• Address Server
• Administrator
• Metadata Reports applications
• Service
• SNMP Agent
• Adapter SDK

Designer
The Designer is a development tool with an easy-to-use graphical user interface. It enables developers to define data management applications that consist of data mappings, transformations, and control logic. Use the Designer to create applications containing work flows (job execution definitions) and data flows (data transformation definitions). To use the Designer, create objects, then drag, drop, and configure them by selecting icons in flow diagrams, table layouts, and nested workspace pages. The objects in the Designer represent metadata. The Designer interface allows you to manage metadata stored in a repository. From the Designer, you can also trigger the Job Server to run your jobs for initial application testing.
Related Topics

• Repository
• Job Server

Repository
The SAP BusinessObjects Data Services repository is a set of tables that hold user-created and predefined system objects, source and target metadata, and transformation rules. Set up repositories on an open client/server platform to facilitate sharing metadata with other enterprise tools. Store each repository on an existing RDBMS.

Each repository is associated with one or more Job Servers which run the jobs you create. There are two types of repositories:
• A local repository is used by an application designer to store definitions of objects (like projects, jobs, work flows, and data flows) and source/target metadata.
• A central repository is an optional component that can be used to support multi-user development. The central repository provides a shared object library allowing developers to check objects in and out of their local repositories.

Job Server
The SAP BusinessObjects Data Services Job Server starts the data movement engine that integrates data from multiple heterogeneous sources, performs complex data transformations, and manages extractions and transactions from ERP systems and other sources. The Job Server can move data in either batch or real-time mode and uses distributed query optimization, multi-threading, in-memory caching, in-memory data transformations, and parallel processing to deliver high data throughput and scalability.

While designing a job, you can run it from the Designer, which tells the Job Server to run the job. The Job Server gets the job from its associated repository, then starts an engine to process the job.

In your production environment, the Job Server runs jobs triggered by a scheduler or by a real-time service managed by the Access Server. In production environments, you can balance job loads by creating a Job Server Group (multiple Job Servers), which executes jobs according to overall system load.
Related Topics

• Engine
• Access Server
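
The idea of a Job Server Group that runs jobs according to overall system load can be pictured with a small sketch. This is not the product's algorithm: the `JobServer` class, the `load` figure, and the least-loaded selection rule are all assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class JobServer:
    # Hypothetical stand-in for one Job Server in a group.
    name: str
    load: float  # assumed scale: 0.0 (idle) to 1.0 (saturated)


def pick_job_server(group):
    """Return the least-loaded server; ties go to the first one listed."""
    if not group:
        raise ValueError("Job Server Group is empty")
    return min(group, key=lambda server: server.load)


# Illustrative group; names and load values are made up.
group = [JobServer("js_a", 0.72), JobServer("js_b", 0.31), JobServer("js_c", 0.31)]
print(pick_job_server(group).name)
```

A real group would weigh more than one number per server, so treat the single `load` value as a placeholder for whatever load measure applies in your environment.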

Engine
When SAP BusinessObjects Data Services jobs are executed, the Job Server starts engine processes to perform data extraction, transformation, and movement. The engine processes use parallel processing and in-memory data transformations to deliver high data throughput and scalability.

Access Server
The SAP BusinessObjects Data Services Access Server is a real-time, request-reply message broker that collects message requests, routes them to a real-time service, and delivers a message reply within a user-specified time frame. The Access Server queues messages and sends them to the next available real-time service across any number of computing resources. This approach provides automatic scalability because the Access Server can initiate additional real-time services on additional computing resources if traffic for a given real-time service is high. You can configure multiple Access Servers.
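
The request-reply pattern described above (queue the message, hand it to the next available service, return a reply within a time limit) can be sketched in a few lines. Everything here is hypothetical: `ToyBroker`, its methods, and the worker threads standing in for real-time services are illustrative only and are not the Access Server API.

```python
import queue
import threading


class ToyBroker:
    """Hypothetical in-memory request-reply broker (not the Access Server API)."""

    def __init__(self, service, workers=2):
        self._requests = queue.Queue()  # pending messages wait here
        for _ in range(workers):
            # Each worker thread plays the role of one real-time service instance.
            threading.Thread(target=self._serve, args=(service,), daemon=True).start()

    def _serve(self, service):
        while True:
            # Whichever worker is free first takes the next queued message.
            message, reply_box = self._requests.get()
            reply_box.put(service(message))

    def request(self, message, timeout_s):
        """Queue a message and wait up to timeout_s seconds for its reply."""
        reply_box = queue.Queue(maxsize=1)
        self._requests.put((message, reply_box))
        try:
            return reply_box.get(timeout=timeout_s)
        except queue.Empty:
            raise TimeoutError(f"no reply within {timeout_s} s") from None


broker = ToyBroker(service=lambda text: text.upper())
print(broker.request("status?", timeout_s=1.0))
```

The sketch fixes the number of workers up front. The text above describes additional services being started when traffic is high, which this toy version does not attempt.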

Address Server
The SAP BusinessObjects Data Services Address Server provides address validation and correction for the Global Address Cleanse EMEA engine and Global Suggestion Lists. The Address Server must be started prior to processing data flows that contain the Global Suggestion List transform or the Global Address Cleanse transform with the EMEA engine enabled.

Administrator
The Administrator provides browser-based administration of SAP BusinessObjects Data Services resources including:
• Scheduling, monitoring, and executing batch jobs
• Configuring, starting, and stopping real-time services
• Configuring Job Server, Access Server, and repository usage
• Configuring and managing adapters
• Managing users
• Publishing batch jobs and real-time services via Web services

Metadata Reports applications


The Metadata Reports applications provide browser-based analysis and reporting capabilities on metadata that is associated with:
• your SAP BusinessObjects Data Services jobs
• other SAP BusinessObjects solution portfolio applications associated with SAP BusinessObjects Data Services

Metadata Reports provide four applications for exploring your metadata:
• Impact and lineage analysis
• Operational dashboards
• Auto documentation
• Data validation

Impact and Lineage Analysis reports


Impact and Lineage Analysis reports include:

• Datastore analysis
  For each datastore connection, view overview, table, function, and hierarchy reports. SAP BusinessObjects Data Services users can determine:
  • What data sources populate their tables
  • What target tables their tables populate
  • Whether one or more of the following SAP BusinessObjects solution portfolio reports uses data from their tables:
    • Business Views
    • Crystal Reports
    • SAP BusinessObjects BW Universes Builder
    • SAP BusinessObjects Web Intelligence documents
    • SAP BusinessObjects Desktop Intelligence documents

• Universe analysis
  View Universe, class, and object lineage. Universe users can determine what data sources populate their Universes and what reports use their Universes.

• Business View analysis
  View the data sources for Business Views in the Central Management Server (CMS). You can view business element and business field lineage reports for each Business View. Crystal Business View users can determine what data sources populate their Business Views and what reports use their views.

• Report analysis
  View data sources for reports in the Central Management Server (CMS). You can view table and column lineage reports for each Crystal Report and Web Intelligence Document managed by CMS. Report writers can determine what data sources populate their reports.

• Dependency analysis
  Search for specific objects in your repository and understand how those objects impact or are impacted by other SAP BusinessObjects Data Services or SAP BusinessObjects BW Universe Builder objects and reports. Metadata search results provide links back into associated reports.

To view impact and lineage analysis for SAP BusinessObjects solution portfolio applications, you must configure the Metadata Integrator.
Related Topics

Installation Guide: Installing and Configuring the Metadata Integrator
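
Impact and lineage questions are two directions of the same graph walk: impact follows data forward from a source, and lineage follows it backward from a report. The sketch below illustrates that idea only. The object names, the `FEEDS` mapping, and both helper functions are hypothetical and do not reflect how the repository stores metadata.

```python
from collections import deque

# Hypothetical metadata: each object maps to the objects it feeds.
FEEDS = {
    "src.orders": ["stage.orders"],
    "src.customers": ["stage.customers"],
    "stage.orders": ["dw.sales_fact"],
    "stage.customers": ["dw.sales_fact"],
    "dw.sales_fact": ["report.monthly_sales"],
}


def reachable(start, edges):
    """Breadth-first walk returning every object reachable from start."""
    seen, pending = set(), deque([start])
    while pending:
        for nxt in edges.get(pending.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                pending.append(nxt)
    return seen


def invert(edges):
    """Flip edge direction so the same walk answers lineage questions."""
    flipped = {}
    for source, targets in edges.items():
        for target in targets:
            flipped.setdefault(target, []).append(source)
    return flipped


# Impact: what is affected if the source table changes?
impact = reachable("src.orders", FEEDS)
# Lineage: which objects supply data to the report?
lineage = reachable("report.monthly_sales", invert(FEEDS))
print(sorted(impact))
print(sorted(lineage))
```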

Operational Dashboard reports


Operational dashboard reports provide graphical depictions of SAP BusinessObjects Data Services job execution statistics. This feedback allows you to view at a glance the status and performance of your job executions for one or more repositories over a given time period. You can then use this information to streamline and monitor your job scheduling and management for maximizing overall efficiency and performance.

Auto Documentation reports


Auto documentation reports provide a convenient and comprehensive way to create printed documentation for all of the objects you create in SAP BusinessObjects Data Services. Auto documentation reports capture critical information for understanding your jobs so you can see at a glance the entire ETL process. After creating a project, you can use Auto documentation reports to quickly create a PDF or Microsoft Word file that captures a selection of job, work flow, and/or data flow information including graphical representations and key mapping details.

Data Validation dashboard


Data Validation dashboard reports provide graphical depictions that let you evaluate the reliability of your target data based on the validation rules you created in your SAP BusinessObjects Data Services batch jobs. This feedback allows business users to quickly review, assess, and identify potential inconsistencies or errors in source data.
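
The kind of figure such a dashboard charts can be approximated as a pass/fail count per validation rule. The sketch below is an illustration under assumptions: the rule names, the row layout, and `validation_summary` are hypothetical, and real validation rules are defined inside batch jobs rather than as Python functions.

```python
# Hypothetical validation rules: rule name -> check applied to one row.
RULES = {
    "amount_non_negative": lambda row: row["amount"] >= 0,
    "country_present": lambda row: bool(row["country"]),
}


def validation_summary(rows, rules):
    """Count passing and failing rows for each rule."""
    summary = {name: {"passed": 0, "failed": 0} for name in rules}
    for row in rows:
        for name, check in rules.items():
            summary[name]["passed" if check(row) else "failed"] += 1
    return summary


# Made-up sample rows for illustration.
rows = [
    {"amount": 120.0, "country": "DE"},
    {"amount": -5.0, "country": "US"},
    {"amount": 40.0, "country": ""},
]
for name, counts in validation_summary(rows, RULES).items():
    print(name, counts)
```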

Metadata Integrator
The Metadata Integrator allows SAP BusinessObjects Data Services to seamlessly share metadata with SAP BusinessObjects business intelligence (BI) solutions. Run the Metadata Integrator to collect metadata into the SAP BusinessObjects Data Services repository for Business Views and Universes used by Crystal Reports, SAP BusinessObjects Desktop Intelligence documents, and SAP BusinessObjects Web Intelligence documents.

Service
The SAP BusinessObjects Data Services Service is installed when Job and Access Servers are installed. The Service starts Job Servers and Access Servers when you restart your system. The Windows service name is Data Services Service. The UNIX equivalent is a daemon named AL_JobService.

SNMP Agent
SAP BusinessObjects Data Services error events can be communicated using applications supported by simple network management protocol (SNMP) for better error monitoring. Install an SAP BusinessObjects Data Services SNMP agent on any computer running a Job Server. The SNMP agent monitors and records information about the Job Servers and jobs running on the computer where the agent is installed. You can configure network management software (NMS) applications to communicate with the SNMP agent. Thus, you can use your NMS application to monitor the status of jobs.

Adapter SDK
The SAP BusinessObjects Data Services Adapter SDK provides a Java platform for rapid development of adapters to other applications and middleware products such as EAI systems. Adapters use industry-standard XML and Java technology to ease the learning curve. Adapters provide all necessary styles of interaction including:
• reading, writing, and request-reply from SAP BusinessObjects Data Services to other systems
• request-reply from other systems to SAP BusinessObjects Data Services

Optional components
Multi-user
SAP BusinessObjects Data Services Multi-user is an advanced optional component that enables your development team to work together on interdependent parts of an application through all phases of development. While each user works on applications in a unique local repository, the team uses a central repository to store the master copy of the entire project. The central repository preserves all versions of an application's objects, so you can revert to a previous version if needed.

Multi-user development includes other advanced features such as labeling and filtering to provide you with more flexibility and control in managing application objects. For more details, see the Management Console: Administrator Guide and the Advanced Development Guide.
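
The relationship between local work and a central repository that keeps every version can be modeled roughly as follows. This is a toy model under stated assumptions: `ToyCentralRepository`, its method names, and the exclusive check-out rule are all hypothetical, and the product's actual check-out semantics may differ.

```python
class ToyCentralRepository:
    """Hypothetical in-memory model; not the product's storage or API."""

    def __init__(self):
        self._versions = {}     # object name -> list of checked-in definitions
        self._checked_out = {}  # object name -> developer currently holding it

    def check_out(self, name, developer):
        """Reserve an object and return its latest definition, if any."""
        holder = self._checked_out.get(name)
        if holder is not None and holder != developer:
            # Assumption: only one developer may hold an object at a time.
            raise RuntimeError(f"{name} is checked out by {holder}")
        self._checked_out[name] = developer
        history = self._versions.get(name, [])
        return history[-1] if history else None

    def check_in(self, name, developer, definition):
        """Store a new version and release the object; returns the version number."""
        if self._checked_out.get(name) != developer:
            raise RuntimeError(f"{developer} has not checked out {name}")
        self._versions.setdefault(name, []).append(definition)
        del self._checked_out[name]
        return len(self._versions[name])

    def get_version(self, name, version):
        """Return an earlier definition; versions are numbered from 1."""
        history = self._versions.get(name, [])
        if not 1 <= version <= len(history):
            raise IndexError(f"{name} has no version {version}")
        return history[version - 1]


repo = ToyCentralRepository()
repo.check_out("df_load_sales", "ana")
repo.check_in("df_load_sales", "ana", "first definition")
repo.check_out("df_load_sales", "ben")
repo.check_in("df_load_sales", "ben", "second definition")
print(repo.get_version("df_load_sales", 1))
```

Because every check-in appends rather than overwrites, reverting amounts to reading an earlier entry from the history.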

Management tools
SAP BusinessObjects Data Services has several management tools to assist you in managing your components.

License Manager
The License Manager displays the SAP BusinessObjects Data Services components for which you currently have a license.

Repository Manager
The Repository Manager allows you to create, upgrade, and check the versions of local and central repositories.

Server Manager
The Server Manager allows you to add, delete, or edit the properties of Job Servers and Access Servers. It is automatically installed on each computer on which you install a Job Server or Access Server. Use the Server Manager to define links between Job Servers and repositories. You can link multiple Job Servers on different machines to a single repository (for load balancing) or each Job Server to multiple repositories (with one default) to support individual repositories (separating test from production, for example). You can also specify a Job Server as SNMP-enabled.

The Server Manager is also where you specify SMTP server settings for the smtp_to email function.
Related Topics

• Designer Guide: Monitoring Jobs, SNMP support
• Reference Guide: To define and enable the smtp_to function

Operating system platforms


For a detailed list of supported environments and hardware requirements, see the Supported Platforms document available in the SAP BusinessObjects Support > Documentation > Supported Platforms/PARs section of the SAP Service Marketplace: https://service.sap.com/bosap-support. This document includes specific version and patch-level requirements for databases, applications, web application servers, web browsers, and operating systems.

Distributed architecture
SAP BusinessObjects Data Services has a distributed architecture. An Access Server can serve multiple Job Servers and repositories. The multi-user licensed extension allows multiple Designers to work from a central repository. The following diagram illustrates both of these features.

You can distribute software components across multiple computers, subject to the following rules:
• Engine processes run on the same computer as the Job Server that spawns them
• Adapters require a local Job Server
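
When planning a layout on paper, the two rules above can be checked mechanically. The sketch below is one hypothetical way to do so: the dictionaries describing hosts, the component names, and `layout_problems` are illustrative assumptions, not a product feature.

```python
def layout_problems(job_servers, engines, adapters):
    """Return violations of the two placement rules.

    job_servers: {job server name: host}
    engines:     list of (host, name of the Job Server that spawns the engine)
    adapters:    {adapter name: host}
    """
    problems = []
    # Rule 1: an engine runs on the same computer as its Job Server.
    for host, spawned_by in engines:
        if job_servers.get(spawned_by) != host:
            problems.append(f"engine on {host} is not co-located with {spawned_by}")
    # Rule 2: an adapter needs a Job Server on its own computer.
    hosts_with_job_server = set(job_servers.values())
    for adapter, host in adapters.items():
        if host not in hosts_with_job_server:
            problems.append(f"adapter {adapter} on {host} has no local Job Server")
    return problems


# Made-up planning data: one valid engine, one misplaced engine and adapter.
job_servers = {"js_prod": "host1", "js_test": "host2"}
engines = [("host1", "js_prod"), ("host3", "js_test")]
adapters = {"crm_adapter": "host2", "erp_adapter": "host4"}
for problem in layout_problems(job_servers, engines, adapters):
    print(problem)
```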

Distribute components across a number of computers to best support the traffic and connectivity requirements of your network. You can create a minimally distributed system, designed for developing and testing or a highly distributed system designed to scale with the demands of a production environment.

Host names and port numbers


Communication between a Web application, the Access Server, the Job Server, and real-time services occurs through TCP/IP connections specified by IP addresses (or host names) and port numbers. If your network does not use static addresses, use the name of the computer as the host name. If connecting to a computer that uses a static IP address, use that number as the host name for Access Server and Job Server configurations. To allow for a highly scalable system, each component maintains its own list of connections. You define these connections through the Server Manager, the Administrator, Repository Manager, and the Message Client library calls (from Web client).
Related Topics

Installation Guide: Preparing to Install the software, Check port assignments
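
Because every connection is defined by a host name (or IP address) and a port number, a quick TCP reachability check can help confirm that the values you configured are usable from a given computer. The sketch below uses only the Python standard library. The endpoint names and port numbers are placeholders chosen for illustration, so substitute the values from your own configuration.

```python
import socket


def can_connect(host, port, timeout_s=2.0):
    """Return True if a TCP connection to host:port opens within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable host names.
        return False


# Placeholder endpoints; replace with the hosts and ports you configured.
endpoints = {"access_server": ("localhost", 4000), "job_server": ("localhost", 3500)}
for name, (host, port) in endpoints.items():
    print(name, "reachable" if can_connect(host, port) else "unreachable")
```

A successful connection only shows that something is listening on that port. It does not confirm which component answered.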

Glossary

ABAP
    Advanced Business Application Programming. A fourth-generation programming language developed by SAP in which SAP Applications are written.

ABAP data flow
    A data flow that extracts data from an SAP Applications source table. Data Services translates steps you define in an ABAP data flow into ABAP and then passes the ABAP program back to your SAP Application system for execution. The resulting table or file resides on the SAP Application system to be used as a source in the parent data flow.

ABAP program
    A program that executes database operations on an SAP Applications server. Data Services ABAP data flows generate ABAP programs.

Access Server
    The Access Server dispatches requests to real-time services, ensuring optimal load balancing and complete life cycle management.

Adapter
    An external Data Services interface. There are two types of adapters:
    • Custom adapters: adapters developed using the Adapter SDK (Software Development Kit)
    • Prepackaged adapters: adapters prebuilt and purchased from SAP, such as the Data Services Salesforce.com adapter

Address Cleanse
    Transforms that produce a correct and complete standardized form of an input address. The transform can also assign codes for postal automation and append other useful address information.

address line
    A line of data in an address that contains the primary and, possibly, secondary address. The primary address contains components such as the primary range, primary name, directionals (post- and pre-), and the suffix. The secondary address normally contains components such as the unit designator and the secondary range.

Address Server
    A process that provides address validation and correction for the Global Address Cleanse transform's EMEA engine and Global Suggestion Lists transform.

Administrator
    A browser-based system administration application on the Data Services Management Console. Use the Administrator to do the following:
    • Execute, schedule, and monitor batch jobs
    • Add connections to repositories
    • Configure the profiler
    • Define users for multi-user development (central repository)
    • Manage the retention of log files
    • Monitor Access Server status and inbound/outbound messages
    • Configure Adapter instances (a prerequisite for creating adapter datastores)
    • Configure SAP application client interfaces (to read IDocs)
    • Configure, start, stop, and monitor real-time services
    • Configure Data Services jobs callable as Web services and generate WSDL
    • Set up the SAP RFC Server (to load data into or read data from an SAP NetWeaver BW system)

after-image
    The values in an UPDATE row after the row changes. You use before- and after-images of UPDATE rows for log-based changed-data capture (CDC) jobs, which Data Services supports.

aggregate function
    A function that summarizes data (sums, calculates an average, identifies a maximum value, and so on). Where possible, Data Services pushes down the execution of the aggregate function to the underlying relational database management system (RDBMS).

aggregated data
    Data that results when a process combines elements. This data can be presented collectively or in summary form.

ALE (Application Link Enabling)
    An SAP Applications programming-related interface designed to allow reliable communication across a distributed environment. Implemented in Data Services with the IDoc interface.

alias
Alternate form or name. Data Services uses aliases in multiple ways, including the following:
- Aliases are alternate forms that could potentially be matched to the word. For example, Robert is a personal name alias for Bob. Alias data is output in the Match_Std fields.
- In the Address Cleanse transforms, an alias is an alternative form of a primary address line. Aliases apply only to primary addresses (usually streets), not secondary addresses or last lines.
- You can also create multiple aliases for table owners in a datastore and then use datastore configurations to change the alias values. By using aliases instead of real owner names, you limit the amount of time it takes to port jobs to different environments.

AMAS
Australia Post's Address Matching Approval System (AMAS). To receive postal discounts in Australia, you are required to file an AMAS report.

application
Another term for a software program.

association matching
A method of matching that combines the results of two or more Match transforms by using the Associate transform. Association matching is used to find duplicates based on multiple different match criteria (for example, based on Name+Address and then SSN+DOB) and bring them together. A common use for association matching is to identify customers who have multiple residences. Examples of such customers could include students and snowbirds.

attribute
A property created for a type of object.

BAPI
Business Application Programming Interface. A standardized SAP Applications programming interface that allows non-SAP applications to access specific business processes and data.

Basis
The SAP infrastructure. Basis is the foundation for all SAP products based on ABAP.

batch
Executes one job or a series of jobs all at one time. After batch processing begins, it continues until it is done or until an error occurs.

batch job
The unit of work that can be scheduled independently for execution by the Administrator. Jobs are special work flows that can be scheduled for execution, but cannot be called by other work flows or jobs.

before-image
The values in an UPDATE row before the row changes. You use before- and after-images of UPDATE rows for log-based changed-data capture (CDC) jobs, which Data Services supports.

best record
Contains the most complete, accurate, and up-to-date information. A best record is created by consolidating data elements from matching records into a single record. For example, suppose you found two records that match. One record has a phone number that is different and more current than the other. You can move the more current phone number into the other record to create your best record. A master record in a match group is also considered a best record, based on the best record priority assigned to the source that the record was in.

best record priority
Best record priority is a way for you to designate data from a particular source as having more importance than other data. For example, because your data warehouse meets your standards for data, it might carry more weight in the matching process than would a rented source. The smaller the priority number, the higher the priority, and the more likely that records from that source will rise to the top of their match groups to become master records. Assign a priority of 0 to your best source, and larger numbers to other sources. The blank penalty can affect the value of the best record priority.

blank penalty
In the Match transform, tells Data Services that records with blank fields should be considered less important (as driver or as master record) than records with completed fields (blank data = bad data). Blank penalties increase the value of the best record priority for the source that the blank field exists in, thereby reducing the priority of the source. Lowering the priority of a source helps ensure that the records in that source will not become the master record (or best record) of a match group.

BLOB
A field whose data consists of Binary Large Objects, such as bitmap graphics, images, OLE objects, metafiles, and so on.

blueprint
A sample Data Quality job that can be used by Data Services without modification.

Boolean expression
An expression that defines a logical relationship between two or more items. The expression is either TRUE or FALSE.

breadcrumb
A visual path of your location in the application.

break group
Places records into groups that are more likely to match. For example, you might want to create a break group based on the first three digits of the postcode. This break group will ensure that records with a postcode of 546 are never even compared with records that have a postcode of 611, saving valuable processing time for all but the smallest jobs. Break groups consist of driver and passenger records. Fields commonly used for creating break groups are postcodes, account or Social Security numbers, or the first two positions of a street name.

break key
A user-defined field that is used to create break groups. Create a break key if the data you want to break on is contained in multiple fields, such as the postcode and street name.

bulk loading
A software-based mechanism that moves large amounts of data into a database to achieve optimal performance. Bulk loading is faster
than traditional INSERT statements. This mechanism supports compression, blocking, and buffering to optimize transfer times.

business component
A set of tables Siebel applications use to create a logical object called a business object.

business rules
1. Settings within your Data Quality transforms that explain how you want to process your data. These include things like telling the Global Address Cleanse transform how to case output data, or setting up match criteria for a matching process.
2. Business rules can also be used to group validation rules from Validation transforms for display in the Data Validation reports in the Management Console.

Business views
Business views in Crystal Reports enable you to control the presentation of your database to report designers and users.

case-sensitive
Pertaining to the differentiation between upper-case and lower-case letters. A case-sensitive program differentiates between upper-case and lower-case letters when evaluating a text string.

CASS
A United States Postal Service (USPS) certification that requires software vendors to go through a series of tests to prove that their software correctly codes addresses according to USPS requirements, and produces the required USPS reports. Long form: Coding Accuracy Support System

CDC checkpoint
A CDC checkpoint enables Data Services to restrict CDC subscription reads. After you enable a checkpoint, the next time the CDC job runs, it reads only the rows inserted into the CDC table since the last checkpoint.

CDC datastore
A CDC datastore allows you to limit extracted data to changed data only. A CDC datastore connects a changed-data capture table on a source database to Data Services.

CDC subscription
A CDC subscription is an option on a source CDC table. You can define multiple subscriptions on the same CDC table to allow different data flows to extract data from the same table without corrupting data extracted by other data flows. A subscription defines the start and end of your data set, and it is often used with the checkpoint option.

changed-data capture (CDC)
The process of retrieving changes made to a production data source. This process consolidates units of work, ensures data is synchronized with the original source, and reduces load times by loading only changed data in a warehouse environment.

Citrix MetaFrame XP
Citrix MetaFrame XP software provides an access infrastructure for enterprise applications. You can use this software to run Data Services on a server which publishes instances of the Designer and other Data Services interfaces to users on client computers.

classifications
Indicators to Data Cleanse of the types of situations that apply to a word. For example, Hewlett is assigned the Firm_Name and Name_Weak_Family_Name classifications, because it can be used in both firm and personal names.

client/server
A distributed technology approach where the processing is divided by function. The server performs shared functions (such as managing communications and providing database services), while the client performs individual user functions.

command
A directive given to a program to initiate an action.

Communication Structure
In SAP NetWeaver BW, a data structure that defines a set of InfoObjects available from an InfoSource to put into InfoCubes.

compare buffer
A part of memory reserved for processing break groups (one break group at a time) in the Match or Associate transform. A larger buffer typically helps improve performance.

conditional
A single-use object, available in work flows, that allows you to branch the execution logic based on the results of an expression. The conditional takes the form of an if/then/else statement.

constant
A data string that does not change from one record to the next.

content type
Specifies the type of data in a field in your data source. This helps you map your fields when you set up downstream transforms.

contribution value
A value you assign to a match criterion that represents the importance (or weight) you place on that criterion's data. For example, your organization may place a high degree of importance on the customer number. For these types of criteria you would assign a higher contribution value to reflect a higher importance. The contribution value is part of weighted scoring.

Crystal Reports
A reporting tool that allows users to create feature-rich reports and integrate them into web and Windows applications.

Ctrl-click
An action to select multiple values within an application. This is accomplished by pressing the Control key and using the mouse.

cube
1. A multi-dimensional or OLAP database in which data is summarized, consolidated, and stored in "dimensions" (each representing information such as customer or product line) and "measures" (for example sales, cost, or profit), enabling improved processing time and storage space requirements over traditional data storage methods such as relational databases.
2. The combination of indexes (dimensions and measures) stored in SAP NetWeaver BW Accelerator.

custom ABAP program
A custom ABAP program runs an ABAP program and generates a data set. With a custom ABAP program, you can run an existing ABAP program as part of a job. Use a custom ABAP program as a source in a data flow or an ABAP data flow.

custom adapter
An adapter developed using the Data Services Adapter Development Kit.

custom function
A script you create to evaluate or make calculations on input values and produce a return value.

Data Cleanse
A transform that identifies and isolates specific parts of mixed data, and then standardizes the data based on information stored in the parsing dictionary, business rules defined in the rule file, and expressions defined in the pattern file.

data extraction
The process of moving data from a database or application source to a database target (either from a legacy database to a data mart, or from one data mart to another).

data flow
A reusable object containing steps to define the transformation of data from source to target. Data flows are called from inside a work flow or job. You can pass information into or out of data flows using parameters.

data loading
The process of populating a data warehouse. Data loading is provided by DBMS-specific load processes, DBMS insert processes, and independent fast-load processes.

data mapping
The process of assigning a source data element to a target data element.

data mart
A highly focused version of a data warehouse. Typically created by a department or division of a company, data marts contain data for a specific subject area, such as finance or sales. Data Services can populate a data mart.

data movement
The aspect of the data integration process that includes extraction, data transformation, and loading (ETL). Data movement is what the application accomplishes as a whole. Do not confuse with data transformation, which is what happens within one phase of a data flow.

data record
A row of data that is constructed at runtime. The data remains in the form of the data record throughout the Data Services job.

data salvage
The process of temporarily copying data from a passenger record to the driver record after the two records are compared. The data that is copied is data that is found in the passenger record, but is missing or incomplete (initials, for example) in the driver record. Data salvaging prevents blank matching or initials matching from matching records that you may not want to match.

Data Services
A software system that allows users to build and execute applications with which they can create and maintain data warehouses. Data Services consists of several components:

Data Services engine
The core process that reads job information from the Data Services repository and sets up run-time processes that execute the job. The run-time processes extract, transform, and load relational and hierarchical data. The Job Server starts the Data Services engine to execute batch or real-time jobs.

Data Services interface
A program that Data Services uses to access data sources. Specific interfaces vary by installation. There are internal interfaces (those native to the installation) and external interfaces (those that you install separately). Internal interfaces allow Data Services to access applications like SAP Applications and SAP NetWeaver BW, messages, relational database systems, and legacy systems. An external interface is also known as an adapter. It allows Data Services to access applications using information exchange technologies such as JMS (Java Message Service) or Salesforce.com.

Data Services repository
The database that contains information about a Data Services application. The repository contains information about defined reusable objects, the metadata for sources and targets, transforms and functions. The repository also contains the job history and runtime statistics information.
When you invoke Data Services, you log in to the repository containing the objects you want to use. You can use a local repository or a central (shared) repository.

The Data Services profiler uses a profiler repository to store profiling data. The Cleansing Packages repository stores reference data for the data cleansing transform. All repositories are created and maintained with the Repository Manager.

Data Services service
The process that ensures that the Access Server and the Job Server are running. You can configure the Data Services service to restart the Access Server and Job Server whenever the computer where they are located restarts.

data set
Rows of data with a defined schema. A step in a data flow, such as reading data from a source, joining data in a Query transform, or transforming data through another transform, yields a data set. You can view individual data sets by placing a target table or file at that point in the data flow.

data source name (DSN)
Provides connectivity for a Windows user to a database through an Open Database Connectivity (ODBC) driver. The DSN may contain: database name, directory, database driver, user ID, password, and other information.

data transformation
The phase of the data movement process that occurs between extraction and loading. Do not confuse with data movement, which is what the data flow accomplishes as a whole. Data transformation describes a process, while a transform is a tool (a step, icon, or object) in Data Services that enacts the transformation (such as query, merge, or data cleanse).

data transport
A step in an ABAP data flow that defines a target to store the data set extracted during the flow. You can locate the target file on the SAP Application server or in a location accessible to both the SAP Application server and to Data Services across a network.

data type
The format used to store a value. Data types can imply a default format for displaying and entering the value. Data read from a source is converted to the appropriate Data Services data types; data loaded
to a target is converted from its Data Services data type to the type appropriate for the target.

data validation
Defining rules to which correct data should conform. In Data Services, you define these rules in the Validation transform. You can separate data that passes the validation rules from failed data.

Data Validation dashboard
A category of graphical reports in the Management Console to evaluate the reliability of your target data based on the validation rules you created in your Data Services batch jobs. This feedback allows business users to quickly review, assess, and identify potential inconsistencies or errors in source data.

data warehouse
A data warehouse houses a standardized, consistent, clean, and integrated form of data sourced from various operational systems in use in the organization, structured specifically to address reporting and analytic requirements. Data Services can populate a data warehouse.

database
A collection of tables managed by a DBMS such as Microsoft SQL Server or Oracle.

database link
Communication path from one database server to another. The datastores in a database link relationship are called linked datastores. Data Services uses linked datastores to enhance its performance by pushing down operations to a target database using a target datastore.

DataConnector
DataConnector operator instances are used to read data files generated by Data Services when performing bulk loading using the Teradata Warehouse Builder.

datastore
A logical channel connecting Data Services to a source or target application. Different datastore types include database, application, web service, and adapters. The datastore definition typically includes the name and location of the database as well as user authentication information. Data Services uses a datastore definition to qualify a
table name wherever a table is indicated in a diagram or expression. You can access the datastore definition through the object library.

datastore configuration
Defines a connection to a particular database from a single datastore.

DBMS (database management system)
A software system that builds and maintains database tables.

debug mode
Allows you to diagnose errors while executing a job using the interactive debugger features in the Designer.

degree of parallelism (DOP)
A property of a data flow that defines how many times each transform defined in the data flow replicates for use on a parallel subset of data. For example, if you set the Degree of parallelism to 4, then when the job executes, Data Services replicates each transform in the data flow four times. Each of these replicated transforms executes in parallel using a separate thread. The operating system will distribute the threads among the available CPUs.

delimited flat file
A data file in which each column value is separated by a delimiter, such as a comma, semicolon, tab, space, and so on. Each row starts a new line.

delimiter
Data Services has three types of delimiters: column, row, and text (character string). To separate columns, a delimiter can be a tab, semicolon, comma, space, or any character sequence. To separate rows of data, a delimiter can be a {new line} or any other character sequence. To denote the start and end of a character string, a delimiter can be single quotation marks ('), double quotation marks ("), or {none}.

delivery point code
A two-digit number derived from the primary range (house number). This number is used in the generation of a DPBC barcode.

Delivery Point Validation (DPV)
A technology that assists you in validating the accuracy of your address information with the USA Regulatory Address Cleanse
transform. With DPV, you can identify addresses that are undeliverable as addressed and determine whether or not an address is a Commercial Mail Receiving Agency (CMRA).

Designer
A graphical user interface that allows you to design and test Data Services jobs.

destination record
A location where you place your updated or best data when creating a best record. A destination record can be either a master record, a subordinate record, or both in a match group.

diacritical character
A character that contains an accent, dieresis (umlaut), tilde, cedilla, or other distinguishing marks. You can choose to have standardized data with these types of characters. The application uses the Latin-1 code page for assigning these accents.

diagram
The icons and connections between the icons that make up the definition of a job, work flow, or data flow. Diagrams appear in the Designer workspace.

dictionary
Relational database that contains a lexicon of words and phrases that the data cleansing packages and the Data Cleanse transform use to identify, parse, and standardize data.

directional
A component of the address line that indicates direction. For example, North in 211 N. 115th St.

discrete field
Input or output data that has separate fields for each piece of information, such as addresses and names.

discrete format
Input source format in which pieces of data are parsed down to nearly the most distinct level. For example, a first name field would be discrete, whereas a name field that could contain first, middle, or last name information would not be discrete.

domain value
In PeopleSoft, the category name (or link) between a value and its description.

downstream
A data flow object, such as a transform, that is placed after another data flow object in a job.

DPBC (Delivery Point Barcode)
A form of Postnet barcode, consisting of 62 bars and based on the combination of ZIP Code, ZIP+4, delivery point code, and a check digit.

drill down
A method of exploring detailed data that was used in creating a summary level of data. Drill-down levels depend on the granularity of the data in the data warehouse.

driver record
A record that drives the comparison process. Driver records are part of a break group and are compared with passenger records to determine matches. Driver records are chosen based on the driver order you assign to a source. (In general, a source with your best data should be used first.) After a driver record has been compared with all of the passenger records, the next passenger record in the break group becomes the driver record. If you do not reorder your break groups using Group Prioritization, the driver record is the first record in the break group.

DTD
Document type definition. A text file that describes the elements (tags) in an XML document and the relationship among them. When an XML document is used to describe a transaction, the DTD describes the data schema used in the transaction.

dual address
A dual address occurs when a record contains two address lines. Two combinations are typical:
- PO box and street address:
  1000 Main Street, Suite 51
  PO Box 2342
- Rural route or Highway Contract and street address:
  RR 1 Box 345
  12784 Old Columbus Road
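Returning to the DPBC entry earlier in this glossary, the 62-bar figure can be checked with a short calculation. The sketch below assumes the standard POSTNET layout, which this guide does not spell out: five bars per digit, one frame bar at each end, and a check digit chosen so that the sum of all digits is a multiple of 10. The function names and the sample ZIP+4 value are made up for illustration.

```python
BARS_PER_DIGIT = 5  # assumption: each POSTNET digit is encoded as five bars
FRAME_BARS = 2      # assumption: one frame bar at the start, one at the end


def postnet_check_digit(digits: str) -> int:
    """Return the digit that brings the digit sum up to a multiple of 10."""
    total = sum(int(d) for d in digits)
    return (10 - total % 10) % 10


def dpbc_bar_count(zip5: str, plus4: str, delivery_point: str) -> int:
    """Count the bars in a DPBC built from ZIP, +4, and delivery point code."""
    digits = zip5 + plus4 + delivery_point
    if len(digits) != 11 or not digits.isdigit():
        raise ValueError("expected 5 + 4 + 2 numeric digits")
    full = digits + str(postnet_check_digit(digits))  # 12 digits in total
    return FRAME_BARS + BARS_PER_DIGIT * len(full)


if __name__ == "__main__":
    # Sample values only: ZIP 55555, +4 1237, delivery point code 01.
    print(postnet_check_digit("55555123701"))
    print(dpbc_bar_count("55555", "1237", "01"))
```

Under these assumptions, 12 digits at 5 bars each give 60 bars, and the two frame bars bring the total to 62.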

dual names
Two names included on an address line, for example, John and Jane Doe.

Early Warning System (EWS)
A solution for matching valid delivery points that have been created between updates to the national ZIP+4 directory. EWS uses four months of rolling data found in an intermediate directory that is updated weekly with data from the USPS.

EDI
Electronic Data Interchange. Electronic exchange of structured data between businesses. This exchange is not dependent on hardware, software, or communication protocols.

element
A component found within XML Schemas and DTDs.

eLOT
Enhanced Line of Travel (eLOT) takes Line of Travel one step further in the presorting process. The original line of travel (LOT) narrowed down the mail carrier's delivery route to the block face level (ZIP+4 level) by discerning whether an address resided on the odd or even side of a street or thoroughfare. eLOT narrows the mail carrier's delivery route walk sequence to the house (delivery point) level. This allows you to sort your mailings to a more precise level.

embedded data flow
A data flow with an open begin or an open end point that can be used inside another data flow. An embedded data flow can be a combination of sources or targets and transforms, and is mainly used to reduce the visual complexity of a diagram in a data flow. An embedded data flow can be reused in multiple other data flows.

Enterprise application
Enterprise applications enable enterprises to execute and optimize business and IT strategies in domains like ERP (Enterprise Resource
Planning), CRM (Customer Relationship Management) or SCM (Supply Chain Management). Enterprise applications usually store data in a relational database optimized for operational use. SAP provides these solutions through the SAP Business Suite. Data Services supports SAP's own solution as well as third-party solutions like Oracle e-Business Suite, Siebel, JD Edwards, or PeopleSoft.

ERP system (Enterprise resource planning system)
An enterprise application from which Data Services can extract data. SAP offers this system as part of the SAP Business Suite.

exception
An error that occurs while executing a job. You can catch individual or groups of exceptions using a try/catch block inside a work flow. Catching an exception allows you to automatically execute a solution for the error.

expression
A combination of variables, parameters, constants, and functions linked by operation symbols and any required punctuation that describe a rule for calculating a value. Expressions are used in conditionals, functions, scripts, transforms, and while conditions to route information and change fields.

extract date
The date that data was extracted.

extract frequency
The interval at which data is extracted, such as daily, weekly, monthly, or quarterly. The frequency that data extracts are needed in the data warehouse is determined by the shortest frequency requested through an order, or by the frequency required to maintain consistency of the other associated data types in the source data warehouse.

fault code
A numeric value that is assigned to a record after the USA Regulatory Address Cleanse transform validation process that signifies that the particular record was not successfully validated. Each numeric value represents a different type of fault.

file format
A description of how data is or should be organized in a file that Data Services reads from or loads to. A file format can be specific to a single file or generic for many files.

filter
An expression that limits the data returned.

fixed-width flat file
A data file in which each column of data is the same width.

flat file
A flat file is a file containing records, generally one record per line. Fields may have a fixed width with padding, or be delimited by tabs, commas (CSV), or other characters. There are no structural relationships. The data is flat like a sheet of paper, unlike more complex models such as a relational database.

FSA (Forward Sortation Area)
The first three characters of a Canadian alphanumeric postal code. For example, K1A in the postal code for Canada Post's Ottawa headquarters, K1A 0B1.

function
A program that operates on values that are passed to it. Data Services functions are available through a function wizard in a script, conditional, or Query transform. Data Services also gives you access to functions provided by the DBMS you are using. In addition, you can define your own functions using the Data Services scripting language.

gathering
Recombines terms that belong together, such as alphanumeric terms that you would look up together in the dictionary. For example, if Data Cleanse breaks 1st into "1" and "st", then gathering recombines them to 1st.

gender
A code that indicates the likelihood of a record being a certain gender. This code is derived from the name and has the following possible values: strong male, strong female, weak male, weak female, ambiguous, and unassigned. For example, a record marked as strong male indicates a high likelihood that the person is male.

generated field
A field that is generated on output by a transform. For example, a postcode field generated by the Global Address Cleanse transform.

GeoCensus
A directory that contains latitude, longitude, census tract, and block information. That information sets the stage for mapping, demographic marketing, and other applications of your address data.

global suggestion lists
Global suggestion lists offer a way to complete and populate addresses with minimal data, or to offer suggestions for possible matches. This address-entry system is ideal in call center environments or any transactional environment where data cleansing is necessary at the point of entry. It is also a research tool to manage bad addresses from a previous batch process. Global suggestion lists are available with the Global Suggestion Lists transform.

highest level object
The object that is not a dependent of any object in the object hierarchy.

host name
The computer's network name (or IP address). Used most often in Data Services to specify a computer where the Web application, the Access Server, the Job Server, and real-time services reside.

hybrid format
A format for records in which some fields are discrete, whereas others are in a multiline format.

IDoc
Intermediate Document. An SAP-specific format. Used for EDI (Electronic Data Interchange) and ALE (Application Link Enabling).

IDoc type
Indicates the SAP format that is used to interpret the data of a business transaction. Consists of the following components:
- A control record: Identical for each IDoc type.
- Several data records: A single data record consists of a fixed key part and a variable data part. The data part is interpreted using segments, which differ depending on the IDoc type selected.
- Several status records: Identical for each IDoc type. They describe the statuses an IDoc has already passed through or the status an IDoc has attained.
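The three components listed for an IDoc type can be pictured as a simple data structure. The following Python sketch is only a mental model under stated assumptions: every class name, field name, and status value here is hypothetical and does not reflect the actual SAP record layouts.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ControlRecord:
    """One per IDoc; same layout for every IDoc type (fields are invented)."""
    idoc_type: str
    sender: str
    receiver: str


@dataclass
class DataRecord:
    """Fixed key part plus a variable data part interpreted via a segment."""
    key: str      # fixed key part
    segment: str  # segment name; depends on the IDoc type
    data: str     # variable data part


@dataclass
class StatusRecord:
    """One status an IDoc has passed through or currently has."""
    status: str


@dataclass
class IDoc:
    control: ControlRecord
    data_records: List[DataRecord] = field(default_factory=list)
    status_records: List[StatusRecord] = field(default_factory=list)

    def current_status(self) -> Optional[str]:
        """Return the most recently recorded status, if any."""
        return self.status_records[-1].status if self.status_records else None


if __name__ == "__main__":
    idoc = IDoc(ControlRecord("EXAMPLE_TYPE", "SYSTEM_A", "SYSTEM_B"))
    idoc.data_records.append(DataRecord("000001", "HEADER_SEGMENT", "payload"))
    idoc.status_records.append(StatusRecord("created"))
    idoc.status_records.append(StatusRecord("dispatched"))
    print(idoc.current_status())
```

Modeling the status records as an ordered list mirrors the entry's wording: earlier items are statuses already passed through, and the last item is the status attained.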

impact and lineage analysis
The category of reports on the Management Console that shows the relationship between source and target tables on Data Services, and with SAP BusinessObjects Enterprise objects such as universes, business views, and reports.

import
The process of acquiring information for the Data Services repository. Import the following kinds of information into Data Services:
- The metadata for source and target databases
- Descriptions and code for user-defined and DBMS functions and transforms
- ATL or XML files with definitions of Data Services objects that were previously exported out of another Data Services repository

InfoArea
In SAP NetWeaver BW, an element for grouping meta-objects in the BW system. Each InfoProvider is assigned an InfoArea. The resulting hierarchy is displayed in the Data Warehousing Workbench. In addition to their properties as InfoProviders, InfoObjects can also be assigned to different InfoAreas.

InfoCube
In SAP NetWeaver BW, a type of InfoProvider. An InfoCube describes a self-contained dataset (from the reporting view), for example, for a business-oriented area. This dataset can be evaluated with the BEx query. An InfoCube is a set of relational tables that are created in accordance with the star schema: a large fact table in the center, with several dimension tables surrounding it.

InfoObject
In SAP NetWeaver BW, business evaluation objects (for example, customers or sales) are called InfoObjects.


InfoObjects are subdivided into characteristics, key figures, units, time characteristics, and technical characteristics (such as request numbers).

InfoPackage
In SAP NetWeaver BW, describes which data in a DataSource should be requested from a source system. The data can be precisely selected using selection parameters (for example, only controlling area 001 in period 10.1997). An InfoPackage can request the following types of data:
Transaction data
Attributes for master data
Hierarchies for master data
Master data texts
InfoPackages are also used to start Data Services jobs to load data into SAP NetWeaver BW.

InfoSource
In SAP NetWeaver BW, a structure that consists of InfoObjects and is used as a non-persistent store to connect two transformations.

input fields
Original fields in your input sources.

interactive debugger
A Designer feature that allows you to step through the data of a job one row at a time using filters and breakpoints on a line. As with executing a job, you can start the interactive debugger from the Debug menu when a job is active in the workspace. While in debug mode, all other Designer features are set to read-only.

interface
Data Services offers two types of interfaces:
An internal Data Services interface allows you to create datastore connections to natively supported applications.
An external Data Services interface (or adapter) allows Data Services to communicate with information exchange technologies such as the Salesforce.com adapter.

intersource match


Match between records of different sources.

intrasource match
Match between records within a source.

JDBC
A Java API developed by Sun Microsystems that acts as an interface between a developer's Java code and a database. It provides a mechanism for the developer to connect to a specified database, request information about the database, and then select information from it. Long form: Java Database Connectivity.

job
The unit of work that can be scheduled independently for execution by the Administrator. Jobs are special work flows that can be scheduled for execution, but cannot be called by other work flows or jobs.

Job Server
A process that receives requests from the Designer and the Administrator to start and stop jobs. To start batch or real-time jobs, the Job Server triggers the Data Services engine. Engine processes run on the same computer as the Job Server process that triggers them.

join rank
A value given to or calculated for all data sets in a data flow. Data Services uses the join rank to determine which source to read first when assembling the data set in a join. Data Services uses the source with the lower join rank as the inner source of the join and uses the source with the higher join rank as the outer source of the join.

key
A value used to identify a record in a database.

key figure
In SAP NetWeaver BW, an InfoObject that represents a numeric fact.

lastline
The lastline of an address contains components such as the locality, region, and postcode (and it may contain the country name).

license-controlled feature


A Data Services feature that is enabled or disabled based on the product license. The product license controls which icons and settings are available in Data Services as an internal Data Services interface.

line of travel (LOT)
A sorting sequence in which ZIP+4 codes are arranged in the order that they are served by the mail carrier. LOT sequencing is required for some bulk mailing discounts.

linked datastores
The datastores in a database link relationship. A database link stores information about how to connect to a remote data source, such as its host name, database name, user name, password, and database type. Data Services uses linked datastores to enhance its performance by pushing down operations to a target database using a target datastore.

Local Delivery Unit (LDU)
The last three characters of a Canadian alphanumeric postal code. For example, 0B1 in the postal code for Canada Post's Ottawa headquarters, K1A 0B1.

locale
A set of parameters that define the user's language, country, and any special variant preferences that the user wants to see in their user interface. A locale identifier consists of a codepage, a language identifier, and a region identifier.

locality
A part of the address line of a record. Locality most often refers to the city or town. In some countries, such as the United Kingdom, locality can extend to include district.

Locatable Address Conversion System (LACS)
A database of addresses that have been permanently converted, usually due to 911 emergency system implementation. The changes often consist of conversion from rural-style addressing to standardized, city-style addressing, or renumbering of existing city-style addresses.

lookup table
Contains data that other tables can reference with lookup functions that return one or more output columns.

mail piece unit


Typically referred to as a version identifier for printers, it represents the unique characteristics of a portion of a mailing. Every segment within a Mail.dat must have at least one mail piece unit.

mapped field
A field in a transform that has been defined to read from a specific field in an upstream transform.

master record
The first record in a match group. You can control which record is the master record by using the Group Prioritization operation in the Match transform.

match criteria
A group of options that determine the rules for matching on particular data.

match group
A group of records found to be matching with each other. A match group consists of a master record and subordinate records.

match level
A match level designates a level in hierarchical matching. One match set can have one or more match levels. Duplicates that are found at one level are passed to the next level, where they are compared based on that level's keys, and so on. For example, you could use multiple match levels if you wanted to detect duplicates at the household (residence), family, and individual level. The order of the match levels is important because duplicates are found at each level, and only the results are made available for the next level. Usually, you will define your broadest match levels first, followed by more specific match levels.

match set
A group of criteria used to perform matching on your data. A typical setup might have only select data reaching each match set for comparison. For example, you might want to exclude blank SSNs (Social Security Numbers), certain foreign addresses, and so on from reaching a particular match set. A match set also allows for multiple match sets to be considered for association in a combined match set.

matching record


A group of records found to be matches based on the criteria and business rules you choose. The records do not necessarily have the same data.

memory datastore
A datastore connection/container for memory tables.

memory table
Internal Data Services table used to store a data set in memory while a job runs. Use instead of staging tables to improve performance of a real-time job built with multiple data flows. Use a memory table to move a data set between data flows.

message
Represents hierarchical data (such as a header with line items) for document-oriented transactions (such as a purchase order).

metadata
In Data Services, information acquired and maintained to describe tables in source and target databases. This information includes the names of tables and their columns, and the data types of the columns. In general, metadata typically includes a description of data models, a description of the layouts used in database design, the definition of the system of record, the mapping of data from the system of record to other places in the environment, and specific database design definitions.

multi-source
Records that appear on two or more sources. For example, let's say you're bringing together customer sources from several direct marketers or publishers. Your best prospects may be the people whose names appear on two or more sources, indicating they may be most receptive to your offer.

multiline
The multiline format is a database record format in which address data is not consistently located in the same arrangement in all records. That is, data items float among fields. For example, an input source may have fields named Line1, Line2, Line3, and Line4 that contain various categories of name and address data, as well as non-address data.

nested data


Data in one table that is related to a single row of another table. A nested table appears in Data Services as a column in a parent table. Columns in the nested table can themselves contain tables.

normal source
A source of records that the application should consider to be good, eligible records in a matching or association process.

North American Numbering Plan (NANP)
Telephone numbering plan shared by 19 North American countries. These countries include the United States and territories, Canada, Bermuda, Anguilla, Antigua & Barbuda, the Bahamas, Barbados, the British Virgin Islands, the Cayman Islands, Dominica, the Dominican Republic, Grenada, Jamaica, Montserrat, St. Kitts and Nevis, St. Lucia, St. Vincent and the Grenadines, Trinidad and Tobago, and Turks & Caicos.

null
The absence of a value within a database field for a given record. It does not mean zero because zero is a value.

object
Any item that you create in the Designer. Data Services distinguishes two classes of objects: reusable objects that are complete and can be reused in your projects (such as data flows) and single-use objects that only appear as components of other objects (such as a try/catch block). This distinction affects how you create and retrieve each type of object.

object definition
The options that describe the operation of an object. To view and modify an object definition, open the object so that its definition appears in the workspace.

object dependent
An object associated beneath the highest level object in the hierarchy.

object library
A tool in the Designer that gives you access to reusable objects.

object version
An instance of an object. Each time you add or check in an object to the central repository, Data Services creates a new version of the


object. The latest version of an object is the last or most recent version created.

ODBC (Open Database Connectivity)
A standard developed by the Microsoft Corporation. It is an interface that gives applications the ability to access data in database management systems using SQL. Such an interface allows a developer to develop, compile, and ship applications without targeting specific database management systems.

ODS (Operational data store)
An OLAP-designed relational database that an enterprise has designated as the operational database of record (for example, a finance department might use an ODS to close its books).

OLAP (Online Analytical Processing)
An approach to quickly answer multi-dimensional analytical queries. Databases configured for OLAP use a multidimensional data model, allowing for complex analytical and ad-hoc queries with a rapid execution time. OLAP systems are used in a query environment, such as for a business intelligence application.

OLTP
Online transaction processing. A relational database design optimized for operational use. OLTP systems are used in an operational environment, such as for an enterprise application.

open hub destination
An object within the open hub service that contains all information about a target system for data in an InfoProvider. The open hub service enables you to share data from an SAP NetWeaver BW system with non-SAP data marts, analytical applications, and other applications such as Data Services. It ensures controlled distribution and the consistency of data across several systems.

operation code
A flag associated with a row in a data set that indicates the status of the data in the row. The operation codes are INSERT, UPDATE, DELETE, and NORMAL.

operational dashboard
A category of reports on the Management Console that shows at a glance the status and performance of job and data flow executions over a given time period.
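To make the operation code entry above more concrete, here is a minimal Python sketch of how a flag on each row could drive what happens to a target table. The OpCode enum, the apply_rows function, and the dictionary standing in for a table are all hypothetical names for illustration, not Data Services APIs. Treating NORMAL rows like INSERT rows is an assumption of this sketch.

```python
from enum import Enum


class OpCode(Enum):
    NORMAL = "NORMAL"
    INSERT = "INSERT"
    UPDATE = "UPDATE"
    DELETE = "DELETE"


def apply_rows(target, rows):
    """Apply (op_code, key, value) rows to a dict that stands in for a target table."""
    for op, key, value in rows:
        if op in (OpCode.NORMAL, OpCode.INSERT):
            # Assumption: NORMAL rows are added in the same way as INSERT rows.
            target[key] = value
        elif op is OpCode.UPDATE:
            # Only change rows that already exist in the target.
            if key in target:
                target[key] = value
        elif op is OpCode.DELETE:
            target.pop(key, None)
    return target


table = apply_rows(
    {1: "old"},
    [
        (OpCode.INSERT, 2, "new"),
        (OpCode.UPDATE, 1, "changed"),
        (OpCode.DELETE, 2, None),
        (OpCode.NORMAL, 3, "plain"),
    ],
)
```

After these four rows are applied, the stand-in table holds key 1 with the updated value and key 3 from the NORMAL row, while key 2 has been inserted and then deleted.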


option
Business rules that can be set for a Data Quality transform that specify how you want to process your data. Each Data Quality transform has a different set of available options. Options and their values are displayed in the Option Editor.

Option Editor
A tab in a Data Quality transform editor through which you can change the value for each option within the transform.

Option Explorer
A pane in the Associate, Match, and User-Defined transform editors. The Option Explorer shows a list of the option groups within a transform.

option group
Contains a set of options that allow you to set different business rules for a transform. Option groups are displayed in the Option Explorer.

other source
In a Match transform, a source of records that should be treated as transparent, such as seed sources. They are not counted in determining how to characterize a match group (for example, as multi-source or single-source). For example, some mailers use a seed source of potential buyers who report back to the mailer when they receive a mail piece so that the mailer can measure delivery.

parameter
A value passed to a work flow or data flow when that flow is called.

partition
To divide table data into sets based on criteria such as a range or list of values in each row. You can configure Data Services to read and write partitioned table data in parallel threads. Designing jobs with partitioned table data can improve job performance if a Job Server's computer memory and number of CPUs support the job's parallel-processing configuration settings.

passenger record
The records that are compared against driver records in a break group. After a driver record has been compared with every passenger record in a break group, a passenger record can become the new driver record in the break group, or it can be found to be a match


with a driver record. At this point it is taken out of the comparison process.

pattern file
User-defined patterns are stored in a pattern file. The pattern file is a plain text file and can be edited in any text editing program. The pattern file is used by the Data Cleanse transform.

pick list
A type of list returned by the Global Suggestion Lists transform that is used to narrow down an address by starting with minimal information. A pick list returns possibilities in a similar manner to a suggestion list. You can pick an entry from this list to continue processing.

PMB (Private mail box)
Private mail boxes are like post-office boxes, but they are hosted by private companies. The USA Regulatory Address Cleanse and the Global Address Cleanse transforms can recognize certain forms of PMB data when it appears in an address line.

postal address
A delivery address that is a rural route or box number.

postal code
A system of letters and/or digits used for sorting mail. Examples include the ZIP Code used in the United States and the alphanumeric FSA LDU system used in Canada.

postcode move
A valid postcode that has been split or moved, so that the area formerly covered by one postcode is now covered by two or more postcodes, including the original one.

Postcode2
The secondary part of a postal code. For example, in the United States, a postcode is composed of two parts (54601-4051). The first five digits are followed by a hyphen and a four-digit code. The four-digit code is the Postcode2 for a US postcode.

prepackaged adapter
An adapter prebuilt and purchased from SAP, such as the Data Services Salesforce.com adapter.

primary entry


A word or phrase in the dictionary that the data cleansing packages and Data Cleanse transform use to identify, parse, and standardize data.

primary key
A column that is guaranteed to contain unique values, and whose values identify all of the rows in a table.

project
The collection of jobs available in the Designer at a given time. A project provides a way to organize the objects you create.

property
Detailed descriptive information about objects that you display in the Designer. It includes information such as when the object was created.

Query transform
A data transformation object that you can use to map columns from a source to a target schema, add new columns to the target schema, determine the data to extract, and perform operations on the data. Similar to an SQL SELECT statement, a query creates a data set that satisfies the conditions you specify.

Rapid Mart
Rapid Mart packages provide prebuilt data mart solutions for enterprise applications, such as SAP, PeopleSoft, Oracle, and Siebel. These powerful solutions combine domain knowledge and data integration best practices in prebuilt data models, transformation logic, and data extraction. Rapid Mart packages are add-ons to Data Services.

real-time job
A group of objects (data flows, work flows, conditionals, scripts, and so forth) that execute on demand as a "request-response" system. You design real-time jobs in the Designer, then configure them as real-time services and associate them with an Access Server in the Administrator, where they are started, managed, and monitored. When a real-time service receives a request from a caller, it processes the request and returns a reply.

reference file
A file of address data used by Data Services to match, assign, standardize, and verify addresses. Reference files are also referred to as postal directories. These files have a .dir extension.
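The Query transform entry above compares a query to an SQL SELECT statement. The following Python sketch illustrates that idea only: it maps source columns to a target schema, adds a new column, and keeps the rows that satisfy a condition. The query function, its parameters, and the sample column names are hypothetical and do not correspond to any Data Services API.

```python
def query(rows, mapping, where=None):
    """Return rows reshaped to a target schema, like a SELECT with a WHERE clause.

    mapping: target column name -> function that computes it from a source row.
    where:   optional predicate that decides which source rows to keep.
    """
    result = []
    for row in rows:
        if where is None or where(row):
            result.append({col: expr(row) for col, expr in mapping.items()})
    return result


source = [
    {"cust_id": 1, "first": "Ann", "last": "Lee", "country": "US"},
    {"cust_id": 2, "first": "Raj", "last": "Rao", "country": "IN"},
]

target = query(
    source,
    mapping={
        # Column mapped from the source schema to the target schema.
        "id": lambda r: r["cust_id"],
        # New target column computed from two source columns.
        "full_name": lambda r: r["first"] + " " + r["last"],
    },
    # Condition that determines which rows to extract.
    where=lambda r: r["country"] == "US",
)
```

The mapping plays the role of the SELECT list and the predicate plays the role of the WHERE clause; an equivalent SQL statement would select cust_id and a concatenated name from the source where country is 'US'.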


relational data
A data set in which data in each column contains a scalar value. Data Services can process relational data; it can also process nested data.

repository
See Data Services repository.

request/acknowledge operation
This operation is used to execute a remote HTTP service in the Request Acknowledge mode. In other words, it makes the request to the remote machine where the HTTP Adapter server is running and does not wait for the reply; instead, it sends an acknowledgement if the operation is successful.

request/reply
This operation is used to execute a remote HTTP service in the Request Reply mode. In other words, it makes the request to the remote machine where the HTTP server is running and waits for the reply.

reusable object
An object (such as a data flow, datastore, or job) that can be defined, stored, and reused independent of other objects. Any object that is visible in the object library.

RFC (Remote Function Call) server
The Data Services RFC server allows third-party programs, including SAP Applications and SAP NetWeaver BW, to schedule and initiate Data Services jobs and return the results to Data Services.

RFC server Interface
The node on the Administrator application of the Data Services Management Console where you configure SAP connections to load data into or read data from an SAP NetWeaver BW system. Data Services uses the RFC server interface to schedule SAP jobs, read from SAP open hub destinations, load data into SAP NetWeaver BW, and view Data Services logs from SAP NetWeaver BW.

rule file
For the Data Cleanse transform, the rule file controls how the application parses groups of output type subcomponents for name, firm, phone, SSN, and other non-address data.


For example, if you input "Mr. and Mrs. John Smith", the application could parse it into the individual components "Mr.", "and", "Mrs.", "John", and "Smith". This is very useful, but generally, you would also want to parse the whole group of related data, "Mr. and Mrs. John Smith". To parse data in this way, you must create rules.

rule matching
Matches the token classifications against defined rules.

sample size
The number of rows to display in the View Data feature.

sampling rate
The number of rows processed after which Data Services writes information to the monitor log file and updates job events.

sampling rows
The interval at which a sample row is selected for profiling, starting with the first row of the specified number of sampling rows. For example, if you set Profiling size to 1000000 and set Sampling rows to 100, the Profiler profiles rows number 1, 101, 201, and so forth until 1000000 rows are profiled.

SAP Applications
An ERP system. Formerly known as SAP R/3 or SAP ERP.

SAP BusinessObjects Enterprise
A business intelligence platform that powers the management and secure deployment of specialized end-user tools for reporting, query and analysis, and performance management on a scalable and open services-oriented architecture.

SAP BusinessObjects InfoView
A web-based interface that end users access to view, schedule, and keep track of published reports. InfoView consolidates the presentation of a company's business intelligence information and allows it to be accessed in a way that is secure, focused, and personalized to users inside and outside an organization.

SAP BusinessObjects Rapid Mart
SAP BusinessObjects Rapid Mart packages provide prebuilt data mart solutions for enterprise applications, such as SAP, PeopleSoft, Oracle, and Siebel. These powerful solutions combine domain knowledge and data integration best practices in prebuilt data


models, transformation logic, and data extraction. Rapid Mart packages are add-ons to Data Services.

SAP BusinessObjects Web Intelligence
A web-based query and analysis tool that enables users to track, understand, and manage corporate data using a simple browser as their interface, while maintaining tight security over data access.

SAP NetWeaver Business Warehouse (SAP NetWeaver BW)
Long form: SAP NetWeaver Business Warehouse. Formerly known as SAP Business Information Warehouse.

script
A step in a job or work flow that allows you to calculate values to pass to other parts of the job or work flow. The script can call functions, execute if-then-else statements, and assign values to variables. Write a script in the Data Services scripting language.

secondary information
Assists Data Cleanse in determining how to process the word when it is used in different ways. Secondary information can include how Data Cleanse will standardize the output data for the word or alternate forms that could potentially be matched to the word.

segment
Format with which the data records of IDocs are interpreted.

SERP
Canada Post Corporation's Software Evaluation and Recognition Program. Data Quality is certified under this program, allowing you to receive postage discounts for mailings to and within Canada.

server group
A defined collection of Job Servers on different computers. A server group automatically measures resource availability on each Job Server in the group and distributes batch jobs or part of a job to the Job Server with the lightest load at run time. Use the Server Groups node in the Administrator's navigation tree to group Job Servers that are associated with the same repository into a server group.

service request
Any message sent from a Web client that requires processing by a real-time job.

similarity score

A percentage that indicates how much two fields or values are considered alike. This percentage is calculated by the application after the comparison process. For example, Ron and Rob are considered 67% alike because two of the three characters are alike. Similarity scores are used in a number of situations, not just in the Match transform. For example, they can be used to determine which suggestions to return for suggestion lists. The similarity score is not always a direct result of a one-to-one comparison; it can be altered by some options, such as those defined in the Match transform.

single-use object
A step in a work flow or data flow that cannot be saved independently of the flow. Create single-use objects (such as a try/catch block, script, or conditional) from the tool palette.

smart editor
A flexible editing tool in Data Services used for creating scripts, expressions, and custom functions without having to type the names of existing elements such as columns, functions, and variables.

SNMP (Simple Network Management Protocol)
A protocol that helps network administrators manage network routing hardware. The protocol can manage a variety of hardware and software devices. Data Services supports monitoring through SNMP.

snowbird
A casual term to describe someone who has multiple residences. This term is derived from individuals who reside in a cooler-climate region during the summer, and relocate to a home in a warmer-climate region during the winter.

SOAP (Simple Object Access Protocol)
An XML-based message protocol used to encode the information in web service request and response messages before sending them over a network or the Internet.

source
1. An object (table, file, or legacy system) from which Data Services reads data.
2. For the Match transform, the grouping of records on the basis of some data characteristic that you can identify. A source might be all records from one input file, or all records that contain a


particular value in a particular field. Sources are abstract and arbitrary; there is no physical boundary line between sources. Source membership can cut across data sources as well as distinguish among records within a data source, based on how you define the source.

source group
A group of sources that you can use to prepare a second set of match statistics, combining the statistics for two or more regular sources. For example, suppose you define five sources: two house sources and three rented sources. You would get match statistics for each individual source. But suppose that you also wanted a summary for the house sources and a summary for the rented sources. You could create two source groups: one for the house sources and one for the rented sources. Source groups affect only the way that match statistics are reported. They do not affect matching or record priority.

source record
The record that contains the data you want to use to update or create your best record. A source record can be the master or subordinate record of a match group.

SQL (Structured Query Language)
A query language for accessing relational, ODBC, DRDA (Distributed Relational Database Architecture), or non-relational database systems.

SQL query tool
An end-user tool that accepts SQL to be processed against one or more relational databases.

standards
Define how Data Cleanse will standardize capitalization or other output formatting on data.

star schema
A database design you can use to format data in a data mart. This design is based on a single fact table to which any number of dimensional tables may be joined. This type of database design supports multi-dimensional database analysis.

step


An object that is part of the definition of a work flow or data flow. Each step is represented by an icon in the diagram of the flow and is connected to other steps to indicate the flow of data through the data flow or the order of execution in the work flow.

street address
A delivery address that is the street name and house number.

subordinate record
Records that are part of a match group, and are found to be matches with (and subordinate to) a master record. Subordinate records can contain data that may be used to update a master record and, thus, create a best record.

substitution parameter
A text string "alias" that you can use within your job and transforms. You define a substitution parameter and its value in a substitution parameter configuration. Then, at runtime, that parameter is replaced with its value anywhere it is used in your job.

substitution parameter configuration
The definition of the substitution parameters used throughout your job in a particular run-time environment. If you change the run-time environment, you can change the substitution parameter configuration before you execute the job.

suggestion lists
Normally, when an address cleansing transform looks up an address in the postal directories, it finds one matching record. Sometimes, due to incomplete information, there may be two or more records (or suggestions) in the postal directories that could possibly be the correct record. Suggestion lists provide you with a list of matching addresses, so that you can choose which is the best address.

suppression source
A source that contains records of information that should be excluded from other output destinations. The records in the suppression source are used for matching in other sources. The records that match the suppression source could then be removed from further processing.
For example, suppression sources may be your own bad-account file or no-mail sources provided by the government or direct-marketing association (DMA) to prevent wasted mailings and offending consumers.
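A minimal Python sketch of the suppression idea described above: records that match an entry in a suppression source are removed before further processing. The suppress and normalize functions and the record layout are hypothetical. The sketch assumes that two records match when their name and postcode agree after ignoring case and spaces, which is a much simpler rule than the configurable matching a Match transform performs.

```python
def normalize(record):
    # Assumption: records match when name and postcode agree,
    # ignoring letter case and spaces in the postcode.
    name = record["name"].strip().lower()
    postcode = record["postcode"].replace(" ", "").upper()
    return (name, postcode)


def suppress(records, suppression_source):
    """Return the records that do not match any record in the suppression source."""
    blocked = {normalize(r) for r in suppression_source}
    return [r for r in records if normalize(r) not in blocked]


mailing = [
    {"name": "Ann Lee", "postcode": "54601"},
    {"name": "Raj Rao", "postcode": "K1A 0B1"},
]
no_mail = [{"name": "raj rao", "postcode": "k1a0b1"}]

kept = suppress(mailing, no_mail)
```

Building a set of normalized keys from the suppression source keeps each lookup cheap, so the mailing list is scanned only once.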


system configuration
Groups together a set of datastore configurations and a substitution parameter configuration. Datastore configurations define datastore connections. A substitution parameter configuration can be associated with one or more system configurations. For example, you might create one system configuration for your local system and a different system configuration for another system. When executing a job, you can specify which system configuration to use.

table
A database table that Data Services reads data from or loads data into. The path and mechanisms for reading and loading data and apportioning the data among rows and columns are defined in the datastore that the table is associated with. Writing a data set to a database table means sending a combination of rows with the appropriate operation codes to the database table.

target
An object in which Data Services loads extracted and transformed data in a data flow. Data Services loads rows flagged as INSERT, UPDATE, or DELETE.

TCP/IP
Transmission Control Protocol/Internet Protocol. The basic communication protocol of the internet, and often intranets and extranets. A computer having direct access to the internet contains a copy of the TCP/IP program. TCP/IP makes it possible for computers to communicate with each other.

Tdpid (Teradata Director Program ID)
The server name Data Services uses when loading with the bulk loader option. Data Services uses tdpid as a Teradata Warehouse Builder operator attribute.

territory
The locale value for a geographical location (usually the country) where a locale language is used. The pairing of a language with a territory determines factors such as date format, time format, decimal separator, currency format, and so on.

thread
The instance of the program running on behalf of some process. Data Services typically creates one thread per data flow object. If you are using parallel objects in data flows, the thread count will


increase to approximately one thread for each source or target table partition. If you set the Degree of parallelism (DOP) option for your data flow to a value greater than one, the thread count per transform will increase. The operating system will distribute the threads among the available CPUs.

tokenization
Assigns specific meanings to each of the pieces that result from word breaking. Data Cleanse looks up each individual input word in the dictionary. A list of tokens is created using the classifications associated with each word in the dictionary.

tooltip
A small pop-up window with descriptive text.

transfer rule
In SAP NetWeaver BW, transfer rules help you determine how the fields for the transfer structure are assigned to the InfoObjects of the communication structure.

transfer structure
In SAP NetWeaver BW, a structure in which data is transferred from the source system into BW. It displays a selection of fields for an extract structure of the source system. To an ETL tool like Data Services, a transfer structure looks like a table.

transform
A step in a data flow that acts on a data set. Data Services transforms are available through the object library in three categories: Data Integrator, Data Quality, and Platform.

transparent network substrate (TNS)
The Oracle networking technology that provides a single application interface to all industry-standard networking protocols. It is stored in the tnsnames.ora network configuration file. Use a TNS to connect to your Oracle database or the Data Services Repository (stored in an Oracle database).

try/catch block
A combination of a try object and one or more catch objects that define alternate execution paths in case an error occurs during the execution of a job. You can tune try/catch blocks to trap specific errors and to provide general alerts or messages if an error occurs.

Unicode

A standard that was designed to create a universal character set. It accomplishes this by providing a unique number for every character in every language. The Unicode Standard describes more than 50,000 characters, including all the characters of the common character sets in use when Unicode was established around 1990, as well as many that have been added since then. Unicode is an open character set, meaning it can continue to incorporate characters as needed. Unicode can handle letters, punctuation, and technical symbols regardless of platform, program, writing system, or language.

unique identifier
In a Data Quality transform, an ID that is unique to a record or group of matching records. It is sequential, static, and will not change when records are updated or re-processed through the application.

unique record
A record that does not have any matching or subordinate records and, therefore, does not belong to any match group after the matching process is complete.

universe
In SAP BusinessObjects Enterprise, an abstraction of a data source that presents data to users in non-technical terms.

upstream
A data flow object, such as a transform, that is placed before another data flow object in a job.

variable
A symbolic placeholder for a value. Data Services lets you define local variables and global variables. Local variables pertain to the work flow or custom function in which they are defined. You can pass the value into another work flow or data flow using a parameter. Global variables pertain to the job in which they are defined. With global variables, there is no need to define parameters between objects in the job. Global variables can also be selected at execution time, which eliminates the need to open the Designer to set global variable values.
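
As a brief illustration of the Unicode entry in this glossary, the following Python sketch prints the unique number (code point) assigned to a few characters. It relies only on Python's built-in ord function. The sample characters and the helper name code_points are chosen for illustration and are not part of Data Services.

```python
# Illustrative sketch: each character maps to exactly one Unicode code point.
# The sample characters are arbitrary; escapes are used to avoid encoding issues.
samples = ["A", "\u00e9", "\u20ac", "\u4e2d"]  # A, e-acute, euro sign, CJK "middle"


def code_points(chars):
    """Return a dict mapping each character to its code point in U+XXXX notation."""
    return {ch: f"U+{ord(ch):04X}" for ch in chars}


table = code_points(samples)
for ch, cp in table.items():
    print(ch, cp)
```

The mapping is the same on every platform and in every program, which is the property the glossary entry describes.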

web server
A machine or application that serves web pages over the Internet or an intranet. A web server hosts pages, scripts, programs, and multimedia files and then serves them using HTTP, which sends files to a client web browser.

web services
A standard platform for integrating applications. Web services allow different programs, constructed in different languages, on different platforms to communicate with each other.

weighted scoring
A method of comparison that provides you with a greater degree of control in the matching process. This method allows you to use contribution values to place more or less importance on various match criteria.

word breaking
Breaks the input line down into smaller, more usable pieces. By default, Data Cleanse breaks an input line on white space, punctuation, and alphanumeric transitions. Terms such as 20GB, 4G, 1st, and U2 each break into two tokens at the alphanumeric transition. For example, "20GB" breaks into "20" and "GB" tokens.

work flow
A reusable object containing steps to define the order of job execution. Work flows call data flows, but cannot manipulate data themselves. You call work flows from inside other work flows or jobs. You pass information into or out of work flows using parameters. You reuse work flows by dragging existing work flows from the object library.

workspace
The window inside the Designer in which you define, display, and modify objects. The workspace for a data flow contains an area to build a diagram representing the data flow definition. The workspace for a transform contains an editor for modifying the transform options.

WSDL
Web Services Description Language. Web services are self-contained, modular business process applications based on open Internet standards.

XML
Extensible Markup Language. This markup language is like HTML (Hypertext Markup Language) in that it specifies a standard with which you can define your own markup languages with their own sets of tags. XML allows you to define various tags with various rules, such as tags that represent business rules, tags that represent data description, or tags that represent data relationships.

XML Schema
The XML format used by Data Services to support message processing that includes Web Services. XML Schemas describe the data structure of an XML file or message. Data flows can read and write data to messages or files based on a specified XML Schema format. You can use the same XML Schema to describe multiple XML sources or targets. XML Schema properties include: Name, Description, Imported from, Root element name, and Namespace.

Z4Change
The Z4Change directory lists all the ZIP and ZIP+4 Codes in the country. A record in this file is tagged if it has changed within the last 12 months. The change might be a postal-code change (ZIP, ZIP+4, or CART), or even a change in the standardized form of the address-line or city name.

ZCF
The ZIP-City File directory that is used by the USA Regulatory Address Cleanse transform when processing data from the U.S.

ZIP Code
ZIP is an acronym that stands for "Zone Improvement Plan." This is a 3-, 5-, or 9-digit number that represents a geographic region of the United States. The ZIP Code is important in determining entry eligibility and presort containerization. Note that this code is different from a facility code.

ZIP+4
A nine-digit number, consisting of the ordinary ZIP Code and a four-digit, add-on code.

zone
The ZIP-City File directory that is used by the USA Regulatory Address Cleanse transform when processing data from the U.S.
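
The default word breaking behavior described in the word breaking entry of this glossary (breaks on white space, punctuation, and alphanumeric transitions) can be approximated with a short Python sketch. This is not the Data Cleanse implementation: the function name word_break, the regular expression, the restriction to ASCII letters, and the choice to emit each punctuation mark as its own token are all assumptions made for illustration.

```python
import re

# Hypothetical pattern (an assumption, not the Data Cleanse rule set):
# a run of ASCII letters, a run of digits, or one non-space punctuation character.
# White space matches none of the alternatives, so it only separates tokens.
_TOKEN = re.compile(r"[A-Za-z]+|\d+|[^\sA-Za-z\d]")


def word_break(line):
    """Split a line on white space, punctuation, and letter/digit transitions."""
    return _TOKEN.findall(line)


for term in ["20GB", "4G", "1st", "U2"]:
    print(term, word_break(term))

print(word_break("Mr. Smith, 20GB"))
```

Under these assumptions, each of the four terms from the glossary entry yields two tokens, for example "20" and "GB" for "20GB". Assigning a meaning to each resulting piece is the separate tokenization step described in the tokenization entry.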

Index
A
Access Server description 16
Adapter SDK 21
Address Server 16
Administrator description 16
Auto Documentation reports 18

C
central repository 14
components description 12

D
Designer description 14
distributed architecture 23
distributing components across network 23

E
engine 15

H
host names using IP address 25

I
Impact and Lineage Analysis reports 17
IP addresses
    host name, using for 25
    specifying connection 25

J
Job Server description 15

L
License Manager 22
local repository 14

M
management tools 22
Metadata Integrator description 19
metadata, reporting tool 17

N
network, models of distribution 23

O
operating systems supported 23
Operational Dashboard reports 18

P
ports requirement for 25

R
repository
    central 14
    description 14
    local 14
Repository Manager 22

S
scalability 25
Server Manager 22
standard components 12

T
TCP/IP
    connections required 25
    connections, defining 25

U
utilities 22
