Reading Sample
In this selection from Chapter 5, you’ll get a taste of some of the
main building blocks that make the jobs run in SAP Data Services.
This chapter provides both the information you need to understand
each of the objects, as well as helpful step-by-step instructions to
get them set up and running in your system.
“Introduction”
“Objects”
Contents
Index
The Authors
Bing Chen, James Hanck, Patrick Hanck, Scott Hertel, Allen Lissarrague, Paul Médaille
www.sap-press.com/3688
© 2015 by Rheinwerk Publishing, Inc. This reading sample may be distributed free of charge. In no way must the file be altered, or
individual pages be removed. The use for any commercial purpose other than promoting the book is strictly prohibited.
Introduction
Welcome to SAP Data Services: The Comprehensive Guide. The mission of this book
is to capture information in one source that will show you how to plan, develop,
implement, and perfect SAP Data Services jobs to perform data-provisioning pro-
cesses simply, quickly, and accurately.
This book is intended for those who have years of experience with SAP Data Ser-
vices (which we’ll mainly refer to as just Data Services) and its predecessor, Data
Integrator, as well as those who are brand new to this toolset. The book is
designed to be useful for architects, developers, data stewards, IT operations, data
acquisition and BI teams, and management teams.
and other key functionality for organizations' EIM strategy and solutions.
Finally, the book looks into the outlook of Data Services.

With these four parts, the aim is to give you a guide to where you should start
reading first. Thus, if your focus is more operational in nature, you'll likely
focus on Part I, and if you're a developer new to Data Services, you'll want to
focus on Parts II and III.

The following is a detailed description of each chapter.

Chapter 1, System Considerations, leads technical infrastructure resources,
developers, and IT managers through planning and implementation of the Data
Services environment. The chapter starts with a look at identifying the require-
ments of the Data Services environments for your organization over a five-year
horizon and how to size the environments appropriately.

Chapter 2, Installation, is for technical personnel. We explain the process of
installing Data Services on Windows and Linux environments.

Chapter 3, Configuration and Administration, reviews components of the land-
scape and how to make sure your jobs are executed in a successful and timely
manner.

Chapter 4, Application Navigation, walks new users through a step-by-step pro-
cess on how to create a simple batch job using each of the development environ-
ments in Data Services.

Chapter 5, Objects, dives into the primary building blocks that make a job run.
Along the way, we provide hints to make jobs run more efficiently and make
them easier to maintain over their useful life expectancy and then some!

Chapter 6, Variables, Parameters, and Substitution Parameters, explores the
process of defining and using variables and parameters. We'll look at a few exam-
ples that use these value placeholders to make your Data Services jobs more flex-
ible and easier to maintain over time.

Chapter 7, Programming with SAP Data Services Scripting Language and
Python, explores how Data Services makes its toolset extensible through various
coding languages and coding objects. Before that capability is tapped, the ques-
tion "Why code?" needs to be asked and the answer formulated to ensure that
the coding activity is being entered into for the right reasons.

Chapter 8, Change Data Capture, is for developers and technical personnel who
design and maintain jobs that leverage change data capture (CDC). It starts with
what CDC is and the use cases that it supports. This chapter builds a data flow that
leverages CDC and ensures a voluminous source table is efficiently replicated to a
data warehouse.

Chapter 9, Social Media Analytics, explores the text data processing (TDP) capa-
bility of Data Services, describes how it can be used to analyze social media data,
and suggests more use cases for this important functionality.

Chapter 10, Design Considerations, builds on the prior chapters to make sure
jobs are architected with performance, transparency, supportability, and long-
term total cost in mind.

Chapter 11, Integration into Data Warehouses, takes you through data ware-
housing scenarios and strategies to solve common integration challenges incurred
with dimensional and factual data leveraging Data Services. The chapter builds
data flows to highlight proper provisioning of data to data warehouses.

Chapter 12, Industry-Specific Integrations, takes the reader through integration
strategies for the distribution and retail industries leveraging Data Services.

Chapter 13, SAP Information Steward, is for stewardship and developer
resources and explores common functionality in Information Steward that
enables transparency and trust in the data provisioned by Data Services, and how
the two technology solutions work together.

Chapter 14, Where Is SAP Data Services Headed?, explores the potential future
of Data Services.

You can also find the code that's detailed in the book available for download at
www.sap-press.com/3688.

Common themes presented in the book include simplicity in job creation and the
use of standards. Through simplicity and standards, we're able to quickly hand
jobs off to support teams, or to understand from the development team how jobs
work. Best practice is to design and build jobs with the perspective that someone
will have to support them while half asleep in the middle of the night, or from
the other side of the world with limited or no access to the development team.
The goal of this book is that you'll be empowered with the information to create
and improve your organization's data-provisioning processes through simplicity
and standards, ultimately achieving efficiency in cost and scale.
This chapter dives into the primary building blocks that make a job run.
Along the way, we provide hints to make jobs run more efficiently and
make them easier to maintain over their useful life expectancy and then
some!
5 Objects
Now that you know how to create a simple batch integration job using the two
integrated development environments (IDEs) that come with SAP Data Services,
we dive deeper and explore the objects available to specify processes that enable
reliable delivery of trusted and timely information. Before we get started, we
want to remind you about the Data Services object hierarchy, illustrated in Figure
5.1. This chapter traverses this object hierarchy and describes the objects in
detail.
As you can see from the object hierarchy, the project object is at the top. The
project object is a collection of jobs that is used as an organizational method to
create job groupings.
We’ll spend the remainder of this chapter discussing the commonly used objects
in detail.
5.1 Jobs
A workflow's Recover as a unit option will cause Recover last failed execution to
start at the beginning of that workflow instead of at the last failed object within
that workflow.

To create a real-time job from the Data Services Designer IDE, right-click in the
Project Area or on the Jobs tab of the local repository, and choose New Real-
time Job, as shown in Figure 5.2.

The new job is opened with process begin and end items. These two items seg-
ment the job into three sections: initialization (prior to process begin), the real-
time processing loop (between process begin and end), and cleanup (after process
end), as shown in Figure 5.3.

Figure 5.3 Real-Time Job Logical Sections

If we were to replace the annotations with objects that commonly exist in a real-
time job, it might look like Figure 5.4. In this figure, you can see that initializa-
tion, the real-time processing loop, and cleanup are represented by SC_Init,
DF_GetCustomerLocation, and SC_CleanUp, respectively.

Figure 5.4 Real-Time Job with Objects

Within the DF_GetCustomerLocation data flow, you'll notice two sources and one
target. Although there can be many objects within a real-time job's data flow, the
data flow must contain at least one XML Message Source object and one XML
Message Target object.

A real-time job can even be executed via Data Services Designer the same way
you execute a batch job: by pressing the (F8) key (or choosing the menu option
Debug • Execute) while the job window is active. Even though real-time jobs can
be executed this way in development and testing situations, in production
implementations execution is managed by setting the real-time job to execute as
a service via the access server within the Data Services Management Console.

Note
For a real-time job to be executed successfully via Data Services Designer, the XML
Message Source and XML Message Target objects within the data flow must have their
XML test file specified, and all sources must exist.
To set up the real-time job to execute as a service, navigate to the Real-Time
Services folder, as shown in Figure 5.5, within the access server under which you
want it to run.
After the configuration has been applied, you can start the service by clicking the
Real-Time Services folder, selecting the newly added service, and then clicking
Start (see Figure 5.7). The status changes to Service Started.
The Web Services Configuration – Add Real-Time Services window appears as
shown in Figure 5.9. Click Add to expose the web service, and a confirmation
status message is shown.

The status of the web service can now be viewed from the Web Services Status
tab, where it appears as a published web service (see Figure 5.10).

Note
At this point, you can also view the history log from the Web Services Status page,
where you can see the status from the initialization and processing steps. The clean-up
process status isn't included because the service is still running and hasn't yet been shut
down.

To test the web service externally from Data Services, a third-party product such
as SoapUI can be used. After importing the web service's Web Services Descrip-
tion Language (WSDL) information into the third-party product, a request can be
created and executed to generate a response from your real-time job. A sample
request and response are shown in Figure 5.11 and Figure 5.12.

Figure 5.11 Sample SOAP Web Service Request
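To give a feel for what a request like the one in Figure 5.11 looks like on the wire, the following Python sketch builds a minimal SOAP envelope for a hypothetical real-time service. The service namespace, operation name, and field names here are illustrative assumptions, not values from the book.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
# Hypothetical namespace for a published real-time service.
SVC_NS = "http://example.com/DataServices/GetCustomerLocation"

def build_request(customer_id: str) -> bytes:
    """Build a minimal SOAP envelope like the one SoapUI would send."""
    ET.register_namespace("soapenv", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    ET.SubElement(envelope, f"{{{SOAP_NS}}}Header")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    # Request payload; element names are invented for illustration.
    req = ET.SubElement(body, f"{{{SVC_NS}}}CustomerRequest")
    ET.SubElement(req, f"{{{SVC_NS}}}CUSTOMER_ID").text = customer_id
    return ET.tostring(envelope, xml_declaration=True, encoding="utf-8")

payload = build_request("10042")
```

In practice a tool such as SoapUI generates this envelope for you from the imported WSDL; the sketch only shows the shape of the message the access server receives.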
Figure 5.10 Web Services Status

The rest of this chapter focuses on the remaining Figure 5.1 objects that job
objects rely on to enable their functionality and support their processing.

5.2 Workflow

A workflow and a job are very similar. As shown in the object hierarchy, many of
the same objects that interact with a job can optionally also interact with a work-
flow, and both can contain objects and be configured to execute in a specified
order of operation. Following are the primary differences between a job and a
workflow:

왘 Although both have variables, a job has global variables, whereas a workflow
has parameters.

왘 A job can be executed, whereas a workflow must ultimately be contained by a
job to be executed.

왘 A workflow can contain other workflows and even recursively call itself,
although a job isn't able to contain other jobs.

왘 A workflow has the Recover as a unit option, described later in this section, as
well as the Continuous option, also described later in this section.

You might wonder why you need workflows when you can just specify the order
of operations within a job. In addition to some of the functionality mentioned
later in this section, the use of workflows in an organization will depend heavily
on the organization's standards and design patterns. Such design patterns might
be influenced by the following:

왘 Organizations that leverage an enterprise scheduling system sometimes prefer
workflow logic to be built into the scheduler's streams instead of into the jobs
that it calls. In such instances, a rule can emerge that each logical unit of work
is a separate job. This translates into each job having one activity (e.g., a data
flow to effect the insertion of data from table A to table B). On the other side
of the spectrum, organizations that build all their workflow logic into Data
Services jobs might have a single job provision customer data from the source
of record to all pertinent systems. This latter design pattern might have a
separate workflow for each pertinent system so that logic can be compartmen-
talized to enable easier support and/or rerun capability.

왘 Organizations may have unique logging or auditing activities that are common
across a class of jobs. In such cases, rather than rebuilding that logic over and
over again in each job, your best option is to create one workflow in a generic
manner to contain that logic so that it can be written once and used by all jobs
within that class.

Recommendation
Perform a yearly review of the jobs released into the production landscape. Look for
opportunities for simplification where an activity has been written multiple times in
multiple jobs. These activities can be encapsulated within a common workflow. Doing
so will make future updates to that activity easier to implement and ensure that all con-
suming jobs of the activity get the update. When creating the workflows to contain the
common activity, make sure to denote that commonality with the name of the work-
flow. This is often done with a CWF_ prefix instead of the standard WF_.

5.2.1 Areas of a Workflow

A workflow has two tabs, the General tab and the Continuous Options tab. In
addition to the standard object attributes such as name and description, the Gen-
eral tab also allows specification of workflow-only properties, as shown in the
following list and in Figure 5.13:

1 Execution Type
There are three workflow execution types:

왘 Regular is a traditional encapsulation of operations and enables subwork-
flows.

왘 Continuous is a workflow type introduced in Data Services version 4.1 (this
is discussed in detail in Section 5.2.2).

왘 Single specifies that all objects encapsulated within the workflow are to
execute in one operating system (OS) process.

2 Execute only once
This applies in the rare case where a job includes the same workflow more
than once and only one execution of the workflow is required. You may be
wondering when you would ever include the same workflow in the same job.
Although rare, this does come in handy when you have a job with parallel pro-
cesses, the workflow contains an activity that is dependent on more than one
subprocess, and it varies which subprocess finishes first. In that case, you
would include the workflow prior to each subprocess and check the Execute
only once checkbox on both instances. During execution, the first instance to
execute will perform the operations, and the second execution will be
bypassed.

3 Recover as a unit
This forces all objects within a workflow to reexecute when a job executed
with the Enable recovery option fails and is then reexecuted with the
Recover from last failed execution option. If the job is executed with Enable
recovery, and the workflow doesn't have the Recover as a unit option
selected, the job restarts in recovery mode from the last failed object within
that workflow (assuming the prior failure occurred within the workflow).

4 Bypass
This option was introduced in Data Services 4.2 SP3 to enable certain work-
flows (and data flows) to be bypassed by passing a value of YES. Its stated
purpose is to facilitate testing processes where not all activities are required;
it isn't intended for production use.

Setting the stage, an organization wants a process to wait for some event to occur
before it kicks off a set of activities, and then, upon finishing the activities, it
needs to await the next event occurrence to repeat. This needs to occur from
some start time to some end time. To effect a standard, highly efficient process,
Data Services 4.1 released the continuous workflow execution type setting. After
setting the execution type to Continuous, the Continuous Options tab becomes
enabled (see Figure 5.14).

Prior to Data Services 4.1, this functionality could be accomplished by coding
your own polling process to check for an event occurrence, perform some activi-
ties, and then wait for a defined interval period. Although the same functional
requirements can be met without a continuous workflow, technical performance
and resource utilization are greatly improved by using it. As shown in Figure
5.15, the instantiation and cleanup processes only need to occur once (or as often
as resources are released), whereas, as shown in Figure 5.16, without a continu-
ous workflow the instantiation and cleanup processes need to occur for each
cycle, resulting in more system resource utilization.

Figure 5.15 Continuous Workflow
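The pre-4.1 hand-rolled pattern just described (check for an event, run the activities, wait out the interval, repeat until an end time) can be sketched as a plain polling loop. This Python sketch only illustrates the control flow; it is not Data Services code, and every name in it is invented for the example.

```python
import time

def continuous_like_loop(event_occurred, do_activities, interval_secs,
                         end_time, clock=time.monotonic, sleep=time.sleep):
    """Hand-rolled polling loop: check for an event, run the activities,
    then wait out the interval -- repeating until the end time is reached.
    clock and sleep are injectable so the loop can be driven in tests."""
    cycles = 0
    while clock() < end_time:
        if event_occurred():
            do_activities()   # in Data Services this would be the workflow body
            cycles += 1
        sleep(interval_secs)  # the defined interval period between polls
    return cycles
```

The point of the continuous workflow type is that the engine keeps this loop (and the costly instantiation/cleanup around it) inside one long-lived process instead of your re-creating it for every cycle.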
Figure 5.16 Continuous Operation Prior to the Continuous Workflow

5.3 Logical Flow Objects

Logical flow objects are used to map business logic in jobs and workflows. These
objects are available in the job and workflow workspace; they are not available in
the data flow workspace. The logical flow objects consist of conditional, while
loop, and try and catch block objects.

5.3.1 Conditional

A conditional object can be dragged onto the workspace of a job or workflow
object to enable an IF-THEN-ELSE condition. There are three areas of a conditional
object:

왘 Condition

The try object is dragged onto the beginning of a sequence of objects in a job or
workflow, and a catch object is dragged to where you want the execution encap-
sulation to end. If an error is raised in any of the sequenced objects between
these two objects, execution moves directly to the catch object, where a new
sequence of steps can be specified.

5.4 Data Flows

Data flows are collections of objects that represent source data, transform objects,
and target data. As the name suggests, a data flow also defines the path the data
takes through the collection of objects.

Data flows are reusable and can be attached to workflows to join multiple data
flows together, or they can be attached directly to jobs.

There are two types of data flows: the standard data flow and the ABAP data flow.
The ABAP data flow is an object that can be called from the standard data flow.
The advantage of the ABAP data flow is that it integrates with SAP applications for
better performance. A prerequisite for using ABAP data flows is to have transport
files in place on the source SAP application.

The following example demonstrates the use of a standard data flow and an ABAP
data flow. We'll create a standard data flow that contains an ABAP data flow that
accesses an SAP ECC table, a pass-through Query transform, and a target template
table.
</ENTRY>
<ENTRY>
<SELECTION>11</SELECTION>
<PRIMARY_NAME1>TRINITY PL</PRIMARY_NAME1>
</ENTRY>
</SUGGESTION_LIST>
Listing 5.1 SUGGESTION_LIST Output Field
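A downstream consumer can pull the individual suggestions out of XML shaped like Listing 5.1 with a few lines of code. The Python sketch below parses a reconstructed fragment (partly hypothetical, since the listing above is truncated; the TRINITY AVE entry is invented to make the sample self-contained):

```python
import xml.etree.ElementTree as ET

# Reconstructed sample shaped like Listing 5.1; the first ENTRY is hypothetical.
SUGGESTIONS = """
<SUGGESTION_LIST>
  <ENTRY>
    <SELECTION>10</SELECTION>
    <PRIMARY_NAME1>TRINITY AVE</PRIMARY_NAME1>
  </ENTRY>
  <ENTRY>
    <SELECTION>11</SELECTION>
    <PRIMARY_NAME1>TRINITY PL</PRIMARY_NAME1>
  </ENTRY>
</SUGGESTION_LIST>
"""

def parse_suggestions(xml_text):
    """Map each SELECTION number to its PRIMARY_NAME1 street name."""
    root = ET.fromstring(xml_text)
    return {e.findtext("SELECTION"): e.findtext("PRIMARY_NAME1")
            for e in root.iter("ENTRY")}

choices = parse_suggestions(SUGGESTIONS)
# {'10': 'TRINITY AVE', '11': 'TRINITY PL'}
```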
The DSF2 Walk Sequencer transform enables the following types of postal dis-
counts:
왘 Sortcode discount
왘 Walk Sequence discount
왘 Total Active Deliveries discount
왘 Residential Saturation discount
5.6 Datastores
Beyond connecting to relational databases, as explored in Chapter 3 and Chapter
4, datastores can also source and target applications such as SAP Business Ware-
house (SAP BW). A new datastore type introduced in Data Services version 4.2,
RESTful web services, will also be explored in this section.
왘 SAP BW source datastore for reading data from SAP BW

왘 SAP BW target datastore for loading data into SAP BW

An SAP BW source datastore is very similar to the SAP application datastore in
that you can import the same objects as SAP applications, except for hierarchies.
SAP BW provides additional objects to browse and import, including InfoAreas,
InfoCubes, Operational Datastore (ODS) objects, and Open Hub tables.

Figure 5.109 SAP Data Services—SAP BW Transactional Layout

There are several steps to setting up a job to read data from SAP BW. We'll detail
them all here.

To set up the RFC server configuration, follow these steps:

1. On the Data Services side, through the Administrator module on the Data Ser-
vices Management Console, go into SAP Connections • RFC Server Interface.

2. Click the RFC Server Interface Configuration tab, and then click the Add
button.

3. Enter the information in the fields as shown in Figure 5.110. Click Apply.

Figure 5.110 RFC Configuration in Data Services

The RFC ProgramID is the key identifier between the two systems, and although
it can be whatever you want, it needs to be identical between the systems and
descriptive. RFC ProgramID is also case sensitive. Username and Password are
connection details for SAP BW; it's recommended that this be a communication
user ID and not a dialog user ID. The SAP Application Server name and SAP
Gateway Hostname fields correspond with the server name of your SAP BW
instance. The Client Number and System Number also correspond with those of
your SAP BW instance.

To set up the source system, follow these steps:

1. On the SAP BW side, through the BW Workbench's Modeling view, create a
new source system by going to Source System • External Source • Create.

2. You'll be prompted for a Logical System Name and Source System Name. (For
consistency, we tend to make the Logical System Name the same as the RFC
ProgramID we entered for the RFC server configuration. The Source System
Name we treat as a short description field.) Click the checkmark button to con-
tinue.

3. You'll be presented with the RFC Destination page as shown in Figure 5.111.
The Logical System Name is shown in the RFC Destination field. The Source
System Name is shown in the Description 1 field. The Activation Type should
be set to Registered Server Program. The Program ID field needs to match
(exactly) the RFC ProgramID. Click the Apply button to complete the configu-
ration.
InfoCubes should be created and activated where the extracted data will ulti-
mately be placed.

Figure 5.113 Create New Datastore—SAP BW Target
http://maps.googleapis.com/maps/api/geocode/xml?address=Chicago
<request>
<param xmlns:xs="http://www.w3.org/2001/XMLSchema"
type="xs:string" style="query" name="search"/>
</request>
<response>
<representation mediaType="application/xml"/>
</response>
</method>
<resource path="(item: .+)">
<param xmlns:xs="http://www.w3.org/2001/XMLSchema"
type="xs:string" style="template" name="item"/>
<method name="PUT" id="putItem">
<request>
<representation mediaType="*/*"/>
</request>
<response>
<representation mediaType="*/*"/>
</response>
</method>
<method name="DELETE" id="deleteItem"/>
<method name="GET" id="getItem">
<response>
<representation mediaType="*/*"/>
</response>
</method>
</resource>
</resource>
</resource>
</resources>
</application>
Listing 5.3 WADL File Example
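To make the WADL structure concrete, the following Python sketch parses a trimmed-down fragment modeled loosely on the item resource in Listing 5.3 (the fragment itself is hypothetical and omits the WADL namespace of a real file) and lists the functions it exposes, much as the Datastore Explorer does when building its nested function view:

```python
import xml.etree.ElementTree as ET

# Hypothetical, self-contained fragment modeled on the item resource above.
WADL = """
<application>
  <resources base="http://example.com/rest/">
    <resource path="container">
      <resource path="{item}">
        <method name="PUT" id="putItem"/>
        <method name="DELETE" id="deleteItem"/>
        <method name="GET" id="getItem"/>
      </resource>
    </resource>
  </resources>
</application>
"""

def list_functions(wadl_text):
    """Return (method id, HTTP verb) pairs for every method in the WADL."""
    root = ET.fromstring(wadl_text)
    return [(m.get("id"), m.get("name")) for m in root.iter("method")]

functions = list_functions(WADL)
# [('putItem', 'PUT'), ('deleteItem', 'DELETE'), ('getItem', 'GET')]
```

Each (id, verb) pair corresponds to one importable function in the datastore, which is why the seven methods of the full WADL file show up as seven functions in Figure 5.114.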
In addition to the WADL file, the web service datastore objects can be configured
for security and encryption as well as to use XML- or JSON-formatted data. After
the web service datastore has been created, the available functions can be
browsed through the Datastore Explorer in the Data Services Designer applica-
tion. Figure 5.114 shows the view from the Datastore Explorer in Data Services
Designer for the same REST application, with functions for manipulating con-
tainer and item data. Notice that all seven functions from the WADL file are pres-
ent, along with a nested structure for the functions.

Figure 5.114 Data Services REST Web Service Functions Example

Any of the available functions can then be imported into the repository through
the Datastore Explorer. Once imported, they become ready for use in data flows.
Figure 5.115 shows the getItem function from our WADL file with both a request
schema and a reply schema defined. Notice that the function is based on the HTTP
method GET. The request schema includes the available input parameters for a
function, which in this case is the parameter item. The reply schema includes
the XML- or JSON-formatted data in addition to HTTP error codes (AL_ERROR_NUM)
and error messages (AL_ERROR_MSG). These fields can then be integrated and used
in downstream data flow processing logic.

Figure 5.115 Data Services REST Function Example

5.7 File Formats

As datastore structures provide metadata regarding column attribution for data-
bases and applications, file formats provide the same for files. We'll go over the
different file format types and explain how to use them in the following sections.
The first step in creating a flat file format template is to right-click Flat Files, and
select New as shown in Figure 5.116. The Flat Files objects are located in the
Data Services Designer, in the Local Object Library area, in the File Format sec-
tion.
Figure 5.116 Creating a Flat File Object

Creating a Flat File Template

The File Format Editor screen will appear after selecting New, as shown in Fig-
ure 5.117. This editor is used to configure the flat file's structure, data format,
and source or target file location. We'll explain the common configuration steps
as we work through examples of creating flat file templates. There are several
ways to define the flat file template; one way is to create one manually. To do
this, you modify the settings in the left-hand column (Figure 5.117) and define
the data structure in the upper-right section.

In the left-hand column, the first option to consider is the type of flat file. The
default choice, Delimited, is appropriately the typical selection. The other
choices include Fixed Width, in which the data fields have fixed lengths. These
files tend to be easier to read but can be inefficient because they store empty
space for values smaller than the defined widths. The SAP Transports type is
used when you want to create your own SAP application file format; this is the
case if the predefined Transport_Format isn't suitable to read from or write to a
file in an SAP data flow. The last two types are Unstructured Text and Unstruc-
tured Binary. Unstructured Text is used to consume unstructured text-based
files such as plain text, HTML, or XML. Unstructured Binary is used to read
unstructured binary files such as Microsoft Office documents (Word, Excel,
PowerPoint, Outlook emails), generic .eml files, PDF files, and other binary files
(OpenDocument, Corel WordPerfect, etc.). You can then use these sources with
Text Data Processing transforms to extract and manipulate data.

The second configuration option is Name. It can't be blank, so a generated name
is entered by default. Assuming you want a more descriptive name to reference,
you'll need to change the name prior to saving. After the template is created, the
Name can't be changed.

The third item of the Delimited file type is Adaptable Schema. This is a useful
option when processing several files whose formats aren't consistent. For exam-
ple, you're processing 10 files, and some have a comments column at the end,
whereas others do not. In this case, you'll receive an error because your schema
expects a fixed number of elements and finds either more or fewer. With
Adaptable Schema set to Yes, you define the maximum number of columns. For
the files with fewer columns, null values are inserted into the missing column
placeholders. The assumption here is that the missing columns are at the end of
the row and that the order of the fields isn't changed.

Creating a Flat File Template with an Existing File

Another method to create a flat file format is to import an existing file. This can
be done by using the second section (Data Files) of the left-hand column of the
File Format Editor screen. Using the yellow folder icons, select the directory
and file to import. You'll receive a pop-up asking to overwrite the current
schema, as shown in Figure 5.118.

왘 The default type of delimiter is Comma. If your results show only one column
of combined data, adjust the Column setting in the Delimiter section as nec-
essary.

왘 If the first rows of the files are headers, you can select Skip row header to
change the column names to the headers. If you want the headers written
when using the template as a target, select Yes for the Write row header
option.

왘 If you'll be using the template for multiple files, review the data types and field
sizes in the right-hand section. The values are determined from the file you
imported; if the field sizes are variable, ensure the values are set to their
maximum. Data types can also be incorrect based on a small set of values; for
example, the part IDs in the file may be all numbers, but in another file, they
may include alphanumeric values.
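The Adaptable Schema behavior described above (padding short rows with nulls up to a defined maximum column count, assuming the missing columns are trailing) can be imitated in a few lines of Python. This is a sketch of the idea, not of the Data Services engine's implementation:

```python
import csv
import io

def read_adaptable(text, max_columns, delimiter=","):
    """Parse delimited rows, padding missing trailing columns with None
    (mirroring Adaptable Schema = Yes with a defined maximum column count)."""
    rows = []
    for row in csv.reader(io.StringIO(text), delimiter=delimiter):
        if len(row) > max_columns:
            # More columns than the schema allows is still an error.
            raise ValueError(f"row has {len(row)} columns, schema allows {max_columns}")
        rows.append(row + [None] * (max_columns - len(row)))
    return rows

data = "1,widget,9.99\n2,gadget,4.50,discontinued\n"
rows = read_adaptable(data, max_columns=4)
# [['1', 'widget', '9.99', None], ['2', 'gadget', '4.50', 'discontinued']]
```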
can change the Adaptable Schema setting. You can modify performance settings If you intend to process multiple files in a given directory, you can use a wildcard
that weren’t present on the template such as Join Rank, Cache, and Parallel (*). For example, if you want to read all text files in a directory, then the File
process threads. At the bottom of the left-hand section is the Source Informa- Name(s) setting would be *.txt. You can also use the wildcard for just a portion
tion. The Include file name column setting will add a column to the file's data source that contains the file name. This is a useful setting when processing multiple files.

The Root Directory setting can be modified from what is in the template and be specific to that one flat file source or target. You can hard-code the path into this setting, but this path will likely change as you move the project through environments. A better practice is to use substitution parameters for the path. Using the substitution parameter configurations for each environment allows the same parameter to represent different values in each environment (or configuration).

The File Name(s) setting is similar to the Root Directory setting. The value comes from the template, but it can be changed and is independent of the template. You can hard-code the name, but that assumes the same file will be used each time. It's not uncommon to have timestamps, department codes, or other identifiers/differentiators appended to file names. In the situation where the name is changed and set during the execution of the job, you'll use a parameter or global variables. Wildcards can also be used as part of the file name. This is useful if the directory contains different types of data. For example, a directory contains part data and price data, and the files are appended with timestamps. To process only part data from the year 2014, the setting is part_update_2014*.txt.

When using the wildcard, Data Services expects a file to exist. An error occurs when the specified file can't be found. This error is avoided by checking existence prior to reading the flat file. One method of validating file existence is to use the file_exists(filename) function, which, given the path to the file, will return 0 if no files are located. The one shortcoming of this method is that the file_exists function doesn't process file paths that contain wildcards, which in many cases is the scenario presented. There is another function, wait_for_file(filename, timeout, interval), which does allow for wildcards in the file name and can be set to accomplish the same results. Figure 5.121 shows the setup to accomplish this. First a script calls the wait_for_file function. Then a conditional transform is used to either process the flat file or catch when no file exists.

Figure 5.121 Validating That Files Are Present to Read
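Data Services script is the native way to express this check, but the polling logic itself is easy to sketch outside the tool. The following Python sketch mimics the wait_for_file semantics described above: poll for files matching a wildcard pattern until one appears or a timeout expires. The function name and millisecond units simply mirror the Data Services function; this is not Data Services code.

```python
import glob
import time

def wait_for_file(pattern, timeout_ms, interval_ms):
    """Poll for files matching a wildcard pattern.

    Mimics the behavior described for Data Services' wait_for_file():
    return the matching file names as soon as any appear, or an empty
    list once the timeout expires (the "no file" branch).
    """
    deadline = time.monotonic() + timeout_ms / 1000.0
    while True:
        matches = glob.glob(pattern)  # glob understands * and ? wildcards
        if matches:
            return sorted(matches)
        if time.monotonic() >= deadline:
            return []
        time.sleep(interval_ms / 1000.0)
```

A job would then branch on the result, just as the conditional in Figure 5.121 either processes the flat file or catches the no-file case.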
5 Objects File Formats 5.7
5.7.2 Creating a Flat File Template from a Query Transform

The methods to create flat file templates discussed so far can be used for either source or target data sources. Target data sources have one other method to quickly create a flat file template. Within the Query Editor, you can right-click the output schema and select Create File Format as shown in Figure 5.122. This will use the output schema as the data type definition and create a template.

5.7.3 Excel File Format

Data Services supports multiple Excel versions and Excel file extensions. The following versions are compatible as of Data Services version 4.2: Microsoft Excel 2003, 2007, and 2010.

Preliminary System Requirements

This section describes the initial steps to complete before using an Excel component in Data Services.

UNIX Server: Adding an Excel Adapter

To read the data from an Excel file, you must first define an Excel adapter on the job server in the Data Services Management Console.

Warning

Only one Excel adapter can be configured for each job server.

4. On the Adapter Configuration page, enter "BOExcelAdapter" in the Adapter Instance Name field (see Figure 5.124). The entry is case sensitive.

5. All the other entries may be left at their default values. The only exception to that rule is when processing files larger than 1 MB, in which case you would need to modify the Additional Java Launcher Options value to -Xms64m -Xmx512m or -Xms128m -Xmx1024m (the default is -Xms64m -Xmx256m).
Warning

Data Services references Excel workbooks as sources only. It's not possible to define an Excel workbook as a target. For situations where we've needed to output to Excel workbooks, we've used a User-Defined transform; an example is given in the "User-Defined Transform" section found under Section 5.5.3.

At this point, you can't import or read a password-protected Excel workbook.

6. Start the adapter from the Adapter Instance Status tab (see Figure 5.125).
If the first row of your spreadsheet contains the column names, don't forget to click the corresponding option as shown in Figure 5.129.

Warning

To import an Excel workbook into Data Services, it must first be available on the Windows file system. You can later change its actual location (e.g., to the UNIX server). The same goes if you want to reimport the workbook definition or view its content.

After confirming and updating column attributes if required, click the OK button, and the Excel format object will be saved.

Alternatively, an Excel format can be manually defined by entering the column names, data types, content types, and description. These can be entered in the Schema pane at the top of the New File Creation tab. After the file format has been defined, it can be used in any workflow as a source or a target.
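The first-row-as-header idea is easy to see in miniature. This hypothetical Python sketch uses CSV text rather than a real workbook (Excel parsing itself is handled by the adapter): it treats the first row as column names and applies a manually declared data type per column, analogous to filling in the Schema pane by hand.

```python
import csv
import io

def read_with_header(text, column_types):
    """Parse delimited text whose first row holds the column names,
    applying a declared data type to each column -- a toy analogue of
    a manually defined file format."""
    reader = csv.reader(io.StringIO(text))
    columns = next(reader)  # first row -> column names
    rows = []
    for raw in reader:
        # cast each value with its declared type; default to str
        rows.append({name: column_types.get(name, str)(value)
                     for name, value in zip(columns, raw)})
    return columns, rows
```

The `column_types` mapping plays the role of the data type column in the Schema pane; names and structure here are illustrative, not a Data Services API.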
If both options (FTP/Custom) are left unchecked, Data Services assumes that the file is
located on the same machine where the job server is installed.
왘 Data Services can deal with Excel formulas. However, if an invalid formula results in an error such as #DIV/0, #VALUE, or #REF, then the software will process the cell as a NULL value.

왘 Be careful when opening the file in Excel when using it from Data Services. Because the application reads stored formula values, there is a risk of reading incorrect values if Excel isn't closed properly. Ideally, the file shouldn't be opened by any other software when used in Data Services.

왘 Also see Excel usage in Chapter 4, Section 4.2.

5.7.4 Hadoop

With the rise of big data, Apache Hadoop has become a widely used framework for storing and processing very large volumes of structured and unstructured data on clusters of commodity hardware. While Hadoop has many components that can be used in various configurations, Data Services has the capability to connect and integrate with the Hadoop framework specifically in the following ways:

왘 Hadoop distributed file system (HDFS)
A distributed file system that provides high aggregate throughput access to data.

왘 MapReduce
A programming model for parallel processing of large data sets.

왘 Hive
An SQL-like interface used for read-only queries written in HiveQL.

왘 Pig
A scripting language used to simplify the generation of MapReduce programs written in PigLatin.

Before attempting to connect Data Services to your Hadoop instance, make sure to validate the following prerequisites and requirements:

왘 The Data Services job server must be installed on Linux.

왘 Hadoop must be installed on the same server as the Data Services job server. The job server may or may not be part of the Hadoop cluster.

왘 The job server must start from an environment that has sourced the Hadoop environment script.

왘 Text processing components must be installed on Hadoop Distributed File System (HDFS).

Loading and Reading Data with HDFS

To connect to HDFS from Data Services, you must first configure an HDFS file format in Data Services Designer. After this file format is created, it can be used to load and read data from an HDFS file in a data flow.

In this example, we'll connect to HDFS in a Merge transform to read files from HDFS.

1. Open Data Services Designer, and create a new data flow. Select the Formats tab at the bottom (see Figure 5.132).
2. Create a new HDFS file format for the source file (see Figure 5.133).

3. Set the appropriate Data File(s) properties for your Hadoop instance, such as the hostname for the Hadoop cluster and path to the file you want to read in.

The HDFS file can now be connected to an object (in this case, a Merge transform) and loaded into a number of different types of targets, such as a database table, flat file, or even another file in HDFS.

In Figure 5.134, we connect HDFS to a Merge transform and then load the data into a database table. The HDFS file can be read and sourced much like any other source object.

5.8 Summary

This chapter has traversed and described the majority of the Data Services object hierarchy. With these objects at your disposal within Data Services, virtually any data provisioning requirement can be solved. In the next chapter, we explore how to use value placeholders to enable values to be set, shared, and passed between objects.
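The Merge transform used with the HDFS source behaves as a union-all: every input must share the same schema, and rows are simply concatenated without de-duplication. A minimal Python sketch of that behavior (illustrative only, not Data Services code, and the row data is hypothetical):

```python
def merge_transform(*sources):
    """Union-all semantics of the Merge transform: inputs share one
    schema; rows are concatenated and duplicates are kept."""
    merged = []
    for rows in sources:
        merged.extend(rows)  # no de-duplication, unlike a distinct union
    return merged

# Two row sets as if read from separate HDFS files (made-up data)
hdfs_a = [{"part_id": 100}, {"part_id": 101}]
hdfs_b = [{"part_id": 101}, {"part_id": 102}]
merged = merge_transform(hdfs_a, hdfs_b)  # 4 rows, duplicate kept
```

Downstream, the merged row set would flow on to the target table exactly as the single combined input in Figure 5.134.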
Contents

Acknowledgments ..... 15
Introduction ..... 17

2 Installation ..... 57
2.2.2 Preparing for Repository Creation ..... 67
2.2.3 Creating Repositories ..... 67
2.3 Postal Directories ..... 74
2.3.1 USA Postal Directories ..... 74
2.3.2 Global Postal Directories ..... 81
2.4 Installing SAP Server Functions ..... 82
2.5 Configuration for Excel Sources in Linux ..... 84
2.5.1 Enabling Adapter Management in a Linux Job Server ..... 84
2.5.2 Configuring an Adapter for Excel on a Linux Job Server ..... 86
2.6 SAP Information Steward ..... 89
2.7 Summary ..... 90

3 Configuration and Administration ..... 91
3.1 Server Tools ..... 91
3.1.1 Central Management Console ..... 91
3.1.2 Data Services Management Console ..... 94
3.1.3 Data Services Server Manager for Linux/UNIX ..... 113
3.1.4 Data Services Server Manager for Windows ..... 125
3.1.5 Data Services Repository Manager ..... 132
3.2 Set Up Landscape Components ..... 132
3.2.1 Datastores ..... 133
3.2.2 Table Owner Aliases ..... 136
3.2.3 Substitute Parameters ..... 138
3.2.4 System Configuration ..... 139
3.3 Security and Securing Sensitive Data ..... 140
3.3.1 Encryption ..... 141
3.3.2 Secure Socket Layer (SSL) ..... 142
3.3.3 Enable SSL Communication via CMS ..... 143
3.3.4 Data Services SSL Configuration ..... 147
3.4 Path to Production ..... 150
3.4.1 Repository-Based Promotion ..... 150
3.4.2 File-Based Promotion ..... 152
3.4.3 Central Repository-Based Promotion ..... 153
3.4.4 Object Promotion ..... 155
3.4.5 Object Promotion with CTS+ ..... 157
3.5 Operation Readiness ..... 157
3.5.1 Scheduling ..... 157
3.5.2 Support ..... 164
3.6 How to Troubleshoot Execution Exceptions ..... 166
3.6.1 Viewing Job Execution Logs ..... 166
3.6.2 Common Causes of Job Execution Failures ..... 168
3.7 Summary ..... 172

PART II Jobs in SAP Data Services

4 Application Navigation ..... 175
4.1 Introduction to Data Services Object Types ..... 175
4.2 Hypothetical Work Request ..... 177
4.3 Data Services Designer ..... 180
4.3.1 Datastore ..... 181
4.3.2 File Format ..... 183
4.3.3 Data Flow ..... 184
4.4 Data Services Workbench ..... 190
4.4.1 Project ..... 192
4.4.2 Datastore ..... 193
4.4.3 File Format ..... 196
4.4.4 Data Flow ..... 198
4.4.5 Validation ..... 202
4.4.6 Data Flow Execution ..... 202
4.5 Summary ..... 205

5 Objects ..... 207
5.1 Jobs ..... 208
5.1.1 Batch Job Object ..... 208
5.1.2 Real-Time Job Object ..... 211
5.2 Workflow ..... 217
5.2.1 Areas of a Workflow ..... 219
5.2.2 Continuous Workflow ..... 220
5.3 Logical Flow Objects ..... 222
5.3.1 Conditional ..... 222
5.3.2 While Loop ..... 223
5.3.3 Try and Catch Blocks ..... 223
5.4 Data Flows ..... 223
5.4.1 Creating a Standard Data Flow with an Embedded ABAP Data Flow ..... 224
5.4.2 Creating an Embedded ABAP Data Flow ..... 225
5.5 Transforms ..... 231
5.5.1 Platform Transforms ..... 231
5.5.2 Data Integrator Transforms ..... 249
5.5.3 Data Quality Transforms ..... 271
5.6 Datastores ..... 299
5.6.1 SAP BW Source Datastores ..... 299
5.6.2 SAP BW Target Datastore ..... 303
5.6.3 RESTful Web Services ..... 305
5.6.4 Using RESTful Applications in Data Services ..... 307
5.7 File Formats ..... 309
5.7.1 Flat File Format ..... 310
5.7.2 Creating a Flat File Template from a Query Transform ..... 316
5.7.3 Excel File Format ..... 316
5.7.4 Hadoop ..... 324
5.8 Summary ..... 327

6 Variables, Parameters, and Substitution Parameters ..... 329
6.1 Substitution Parameters ..... 331
6.2 Global Variables ..... 335
6.3 Variables ..... 336
6.4 Local Variables ..... 337
6.5 Parameters ..... 339
6.6 Example: Variables and Parameters in Action ..... 340
6.7 Summary ..... 342

7 Programming with SAP Data Services Scripting Language and Python ..... 343
7.1 Why Code? ..... 343
7.2 Functions ..... 345
7.3 Script Object ..... 349
7.4 Coding within Transform Objects ..... 352
7.4.1 User-Defined Transforms ..... 352
7.4.2 SQL Transform ..... 356
7.5 Summary ..... 357

8 Change Data Capture ..... 359
8.1 Comparing CDC Types ..... 360
8.2 CDC Design Considerations ..... 362
8.3 Source-Based CDC Solutions ..... 363
8.3.1 Using CDC Datastores in Data Services ..... 363
8.3.2 Oracle ..... 369
8.3.3 SQL Server ..... 369
8.4 Target-Based CDC Solution ..... 372
8.5 Timestamp CDC Process ..... 376
8.5.1 Limitations ..... 376
8.5.2 Salesforce ..... 377
8.5.3 Example ..... 377
8.6 Summary ..... 379

PART III Applied Integrations and Design Considerations

9 Social Media Analytics ..... 383
9.1 The Use Case for Social Media Analytics ..... 384
9.1.1 It's Not Just Social ..... 385
9.1.2 The Voice of Your Customer ..... 386
9.2 The Process of Structuring Unstructured Data ..... 387
9.2.1 Text Data Processing Overview ..... 387
9.2.2 Entity and Fact Extraction ..... 388
9.2.3 Grammatical Parsing and Disambiguation ..... 393
9.3 The Entity Extraction Transform ..... 394
9.3.1 Language Support ..... 395
9.3.2 Entity Extraction Transform: Input, Output, and Options ..... 396
9.4 Approach to Creating a Social Media Analytics Application ..... 404
9.4.1 A Note on Data Sources ..... 405
9.4.2 The Data Services Social Media Analysis Project ..... 407
9.5 Summary ..... 411

10 Design Considerations ..... 413
10.1 Performance ..... 414
10.1.1 Constraining Results ..... 414
10.1.2 Pushdown ..... 414
10.1.3 Enhancing Performance When Joins Occur on the Job Server ..... 422
10.1.4 Caching ..... 423
10.1.5 Degree of Parallelism (DoP) ..... 424
10.1.6 Bulk Loading ..... 425
Index
A

ABAP data flow, 223
  create, 225
  extraction, 229
ABAP program, dynamically load/execute, 82
Access server
  configure, 123, 127
  parameters, 129
  SSL, 148
Accumulating snapshot, 437, 450
Acta Transformation Language, 493
Adapter Management, 84
Adapter SDK, 504
Adaptive Job Server, 93
Adaptive Process Server, 93
Address
  census data, 81
  cleansing and standardization, 282
  data, clean and standardize, 275
  global validation, 81
  ISO country codes, 295
  latitude/longitude, 294
  list of potential matches, 297
  street-level validation, 82
Address Cleanse transform, 74
Address SHS Directory, 75
Administrator module, 95
  Adapter Instances submodule, 99
  Central Repository, 100
  Management configuration, 104
  Object Promotion submodule, 101
  real-time job, 97
  Real-Time submodule, 97
  SAP Connections submodule, 98
  Server Group, 100
  Web Services submodule, 98
Aggregate fact, 437, 450
Alias, create, 136
All-world directories, 81
Apache Hadoop, 324
Application
  authorization, 92
  framework, 46
  settings, 92
Architecture
  performance considerations, 24
  scenario, 24
  system availability/uptime considerations, 24
  web application performance, 27
Associate transform, 292
Asynchronous changes, 361
ATL, export files from Data Services Designer, 32
Auditing, prevent pushdown, 421
Authentication, 58
Auto Documentation
  module, 109
  run, 109

B

BAPI function call, 466
BAPI function call, read data into Data Services, 470
Batch job, 208
  auditing, 210
  create, 189
  execution properties, 209
  logging, 209
  monitoring sample rate, 209
  performance, 210
  statistics, 210
  trace message, 209
Batch Job Configuration tab, 97
Big data, 500
  unstructured repositories, 451
BIP, 24, 57, 62, 63
  CMC, 58
  licensing, 63
  patch, 63
  user mapping, 59
Blueprints, 404
BOE Scheduler, 160
BOE 씮 BIP
Brand loyalty, 384
Global address, single transform, 282 Installation Job service Mapping, 200
Global digital landscape, 499 commands, 42 start, 115 input fields to content types, 272
Global Postal Directories, 81 deployment decisions, 62 stop, 115 multiple candidates, 200
Global Suggestion List transform, 297 Linux, 84 Join pair, 201 parent and child objects, 264
Global variable, 335 settings, 41 Join rank, 238, 422 Master data management, 479
data type, 336 Integrated development environment, 180 option, 232 Match results, create through
Integration, 30 Data Services, 481
IPS, 24, 57 Match Review, 189, 477
H cluster installation, 24 K approvers/reviewers, 486
CMC, 58 best record, 484
Hadoop, 451 licensing, 63 Key Generation transform, 249, 441 configuration, 482
Hadoop distributed file system (HDFS), 324 system availability/performance, 24 Kimball approach, 429 configure job status, 485
Haversine formula, 462 user mapping, 59 principles, 430 data migration, 481
HDFS Kimball methodology, 429 process, 478
connect to Merge transform, 326 Klout™ scores, 385 results table, 483
load/read data, 325 J Knowledge transfer, 165 task review, 487
Hierarchical data structure, 265 terminology, 479
read/write, 247 Job, 176, 208 use cases, 479
Hierarchy Flattening transform, 261 add global variable, 335 L Match transform, 288, 479
High-rise business building address, 78 blueprint, 404 break key, 288
History Preserving transform, 256 common execution failures, 168 LACSLink Directory files, 77 consolidate results from, 292
effective dates, 256 common processes, 426 Lastline, 276 Group Prioritization option, 288
History table, 444 data provisioning, 191 Latency, 178 output fields, 291
Horizontal flattening, 261 dependency, 160 Left outer join, 232 scoring method, 289
HotLog mode, 369 execution exception, 166 Lineage, 64, 496 Memory, 40
filter, 96 diagrams, 421 page caching, 41
graphic portrayal, 109 Linux Merge transform, 237
I processing tips, 426 configure Excel sources, 84 Metadata Management, 64, 496
read data from SAP BW, 300 enable Adapter Management, 84 Mirrored mount point, 25
I/O replication, 191 Local repository, 64 Monitor log, 97
performance, 37 rerunnable, 426 Local variable, 337 Monitor, system resources, 37
pushdown, 414 schedule, 157 Log file, retention history, 92 Multi-developer environment, 65
reduce, 414 specify specific execution, 158 Logical flow object, 222 Multi-developer landscape, 31
speed of memory versus disk, 423 standards, 330 lookup_ext(), 447, 448 Multiline field, 276
IDE, 175, 179 Job Error Log tab, 167
primary objects, 175 Job execution
Idea Place, 505 log, 166 M N
IDoc message transform, 471 statistics, 113
Impact and Lineage Analysis module, 112 Job server, 211 Mail carrier's route walk sequence, 79 Name Order option, 273
InfoPackage, 469 associate local repository, 119 Management or intelligence tier, 46 National Change of Address (NCOALink)
Information Platform Services 씮 IPS configure, 115 Map CDC Operation transform, 259, 366 Directory files, 80
Information Steward, 89, 477 create, 116 input data and results, 260 National Directory, 74
expose lineage, 496 default repository, 121 Map Operation transform, 246, 374 Native Component Supportability (NCS), 131
Match Review, 477 delete, 117 inverse ETL job, 246 Natural language processing, 394
InfoSource, create, 303 edit, 117 Map to output, 187 Nested data, 263
Inner join, 232, 422 join performance enhancement, 422 Map_CDC_Operation, 467 Nested Relational Data Model (NRDM), 235
Input and Output File Repository, 94 remove repository, 120 Map_Operation, 444, 446
SSL, 148 transform, 439
Network traffic, 45 Product Availability Matrix (PAM), Real-time processing loop, 213 S
New numeric sequence, 238 repository planning, 65 Replication server, 363
Production, 30 repoman, 72 Salesforce source tables, CDC func-
Profiler repository, 65 Reporting environment, 62 tionality, 377
O Project, 176, 180, 191 patch, 63 Sandbox, 29
create, 189 Repository, 64 SAP APO, 455, 465
Object, 207 data source, 192 create, 67 interface, 469
common, 426 lifecycle, 34 create outside of installation program, 67 load calculations, 467
connect in Data Services Designer, 186 new, 192 default, 121 read data from, 470
hierarchy, 207 Public sector extraction, 391 execution of jobs, 95 SAP BusinessObjects BI platform 씮 BIP
pairs, 223 Pushdown, 414 import ATL, 493 SAP BusinessObjects Business Intelligence
types, 175 Data Services version, 419 manage users and groups, 100 platform, 24
Object Promotion, 155 determine degree of, 415 number of, 65 SAP BusinessObjects reporting, 63
execute, 156 manual, 421 planning for, 65 SAP BW, 465, 469
with CTS+, 157 multiple datastores, 415 pre-creation tasks, 67 load data, 303
Operating system, 36 prevented by auditing, 421 resync, 119 set up InfoCubes/InfoSources, 303
I/O operations, 37 transform instructions, 419 sizing, 66 source datastore, 299, 302
optimize to read from disk, 37 Python, 343, 352 update password, 118 source system, 470
Operation code, UPDATE/DELETE, 372 calculate geographic positions, 463 update with UNIX command line, 72 target datastore, 303, 304
Operational Dashboard module, 113 Expression Editor, 354 update with Windows command line, 70 SAP Change and Transport System (CTS),
Optimized SQL, 415, 416 options, 354 Repository Manager, 61 install, 83
perform insert, 417 programming language, 292 Windows, 68 SAP Customer Connection, 507
Python script, analysis project, 406 Representational State Transfer (REST), 305 SAP Customer Engagement Intelligence
Residential and business addresses, 80 application, 409
P Residential Delivery Indicator SAP Data Services roadmap, 499
Q Directory files, 77 SAP Data Warehousing Workbench, 303
Pageable cache, 122, 130 REST, use applications in Data Services, 307 SAP ECC, 455, 465, 466
Parameters, 339 Quality assurance, 30 RESTful web service, 47, 305 data load, 467
Parse Discrete Input option, 274 Query Editor, mapping derivation, 187 Reverse Soundex Address Directory, 75 extraction, 467
Parsing rule, 271 Query join, compare data source/target, 444 Reverse-Pivot transform, 269 interfaces, 471
Performance metrics and scoping, 131 Query transform, 229, 231 RFC Server configuration, 300 SAP HANA, 451, 505
Performance Monitor, 97 constraint violation, 170 RFC Server Interface, add/status, 98 SAP HANA, integrate platforms, 452
Periodic snapshot, 437, 450 data field modification, 232 Round-robin split, 425 SAP Information Steward 씮
Permissions, users and groups, 60 define joins, 232 Row Generation transform, 238 Information Steward
Pipeline forecast, 465 define mapping, 186 Row, normalized to canonical, 267 SAP server functions, 82
Pivot transform, 267 filter records output, 233 Row-by-row comparison, 361 SCD
Placeholder naming, 331 mapping, 232 Rule, topic subentity, 391 change history, 435
Platform transform, 231 primary key fields, 235 Rule-based scoring, 289 dimension entities, 435
Postal directory, 74, 278 sorting criteria, 234 RuleViolation output pipe, 241 type 1, 438
global, 81 Runbook, 165 type 1 implement in data flow, 439
USA, 74 Runtime type 2, 441
Postal discount, 299 R configure, 130 type 2 implement in data flow, 442
Postcode directory, 76 edit resource, 122 type 3, 443
Postcode reverse directory, 75 Range, date sequence, 249 type 3 implement in data flow, 444
Predictive analytics, 385 Real-time job, 211, 456 type 4, 444
Privileges, troubleshoot, 168 execute as service, 214 type 4 implement in data flow, 445
execute via Data Services Designer, 213 typical scenario, process and load, 437
Script, 176 SQL, execution statements, 415 TDP (Cont.) USA Regulatory Address Cleanse
object, 349 SSL extraction, 388 transform, 275
Scripting language and Python, 343 communication via CMS, 143 grammatical parsing, 393 add-on mailing products, 279
Secure Socket Layer (SSL), 142 configuration for Data Services, 147 public sector extraction, 391 address assignment, 275
Semantic disambiguation, 393 configure, 130 SAP Customer Engagement Intelligence address class, 281
Semantics and linguistic context, 388 enable for Data Services Designer Clients, 146 application, 409 address standardized on output, 279
Sentiment extraction demonstration, 407 enable for the Server Intelligent Agent, 143 SAP HANA, 410 CASS certification rules, 280
Server enable for web servers, 145 SAP Lumira, 404 Dual Address option, 279
group, 100 key and certificate files, 143 semantic disambiguation, 393 field class, 281
list of, 93 path, 124 Temporary cache file, encrypt, 149 map your input fields, 276
Server-based tools, 91 Staging, 30 Text Data Processing Extraction Customization output fields, 281
Services configuration, 50 tier, 426 Guide, 391 Reference Files option group, 278
Shared directory, export, 103 Star schema, 431, 437 Text data processing 씮 TDP User
Shared object library, 65 dimensional (typical), 433 Text data, social media analysis, 384 acceptance, 30
SIA Start and end dates, 257 Threshold, Match Review, 482 authenticate, 58
clustering, 24 Statistic-generation transform object, 113 Tier, 46 manage for central repository, 100
clustering example, 28 Street name spelling variations, 75 Tier to server to services mapping, 49 permissions, 92
subnet, 27 Structured data, 387 Time dimension table, 250 User acceptance testing (UAT), 31
Simple, 500 Substitute parameter, 133, 138 Timestamp User-defined transform, 292, 352
Simple Object Access Protocol (SOAP), 306 Substitution parameter, 331 CDC, 361, 376 map input, 353
Sizing, 45 configuration, 105 Salesforce table, 377 USPS
Sizing, planning, 54 create, 332 TNS-less connection, 69 change of address, 80
Slowly changing dimensions (SCD), 434 design pattern, 332 Trace log, 97 discounts, 79
SMTP value override, 334 Trace message, 209 UTF-8 locale, 41
configuration, 124 Substitution variable configuration, 171 Transaction fact, 437
for Windows, 131 SuiteLink Directory file, 78 load, 448
Snowflaking, 436 Surrogate key, 435 Transform, 176, 231 V
SoapUI, 217 pipeline, 446 compare set of records, 374
Social media analytics, 383 svrcfg, 113 Data Integrator, 249 Validation rule, 112
create application, 404 Synchronous changes, 361 data quality, 271 definition, 241
Voice of the Customer domain, 390 System availability, 24 Entity Extraction, 394 fail, 241
Social media data, filter, 409 System resources, 37 object, 352 failed records, 243
Source data, surrogate key, 249 platform, 231 reporting, 242
Source object, join multiple, 237 Transparency, decreased, 421 Validation transform, 240
Source system T Try and catch block, 223 define rules, 241
audit column, 414 Twitter data, 408 Varchar, 202
tune statement, 421 Table Comparison transform, 254, 361, 372, Variable, 336
Source table 375, 439, 442 pass, 339
prepare/review match results, 478 comparison columns, 255 U vs. local variable, 339
run snapshot CDC, 376 primary key, 255 Vertical flattening, 261
Source-based CDC, 360 Table, pivot, 267 Unauthorized access, 30 VOC, requests, 390
datastore configuration, 363 Target table, load only updated rows, 367 UNIX, adding an Excel Adapter, 317 Voice of the Customer (VOC) domain, 390
SQL Query transform, 244 Target-based CDC, 360, 372 Unstructured data, 387
SQL Server database CDC, 369 data flow design consideration, 375 Unstructured text, TDP, 387
SQL Server Management Studio, 369 Targeted mailings, 79 Upgrades, 63 W
SQL transform, 356 TDP, 387 US zip codes, 78
SQL transform, custom code, 356 dictionary, 390 Warranty support, 165
Web presence, 384
First-hand knowledge.
524 Pages, 2015, $79.95/€79.95
ISBN 978-1-4932-1167-8
www.sap-press.com/3688

We hope you have enjoyed this reading sample. You may recommend or pass it on to others, but only in its entirety, including all pages. This reading sample and all its parts are protected by copyright law. All usage and exploitation rights are reserved by the author and the publisher.