SAP BusinessObjects Data Services

Technical Overview

Agenda

1. Data Services Technical Architecture
2. Developing in Data Services
3. Data Quality Management
4. Deploying in Data Services
5. Deploying for Performance
6. Managing with Data Services
7. Maintenance with Data Services

© SAP 2008 / Page 2

Are you spending too many resources learning different tools?
DATA INTEGRATOR
• DI Development UI
• DI Metadata
• DI Engine

DATA QUALITY
• DQ Development (UI)
• DQ Metadata
• DQ Engine

Separate products create the need for redundant investments in training, software management, and life-cycle support

SAP BusinessObjects Data Services
Data Services is the first single solution combining data integration and data quality

Data Integrator
• Development User Interface
• Metadata Repository
• Runtime Architecture
• Administration and Connectors

Data Quality
• Development User Interface
• Metadata Repository
• Runtime Architecture
• Administration and Connectors

Data Services
• One Development User Interface
• One Metadata Repository
• One Runtime Architecture
• One Administration Environment
• One Set of Connectors

Access, Transform, Improve, Deliver


Introducing SAP BusinessObjects Data Services
First single, enterprise-class data integration and quality application
One tool for enterprise-class data integration and data quality

Data Services
• One intuitive development user interface (UI)
• One metadata repository
• One runtime architecture
• One administration environment
• One set of connectors

Access, Transform, Improve, Deliver


SAP BusinessObjects Data Services
Gold Award Winner – 2008 Product of the Year
Source: SearchDataManagement.com, April 2009

What is Data Services?
Technology with capabilities to
• Explore (profile), Transform, and Move data anywhere at any frequency
• IMPROVE data (parse, cleanse, enhance, match, consolidate)
• Deliver auditable information
• Maximize developer productivity
• Deliver extreme scalability
  • On single servers
  • On multiple servers (Grid computing)
Benefits:
• Database / Application agnostic
• Total graphical environment maximizes developer productivity with little or no coding
• Investment protected with extreme scalability possibility
Diagram labels: Semantic Layer, Physical Data Integration, Real Time, Batch

Use Case: Data Integrator
Integrate Heterogeneous Data
• Disparate sources: CRM, ERP, SAP R/3, SAP BW, RDBMS, Mainframes, Web, Emails, Excel, Files (Flat, XML)
• Data Integrator extracts and transforms the original, raw data
• Loads properly formatted and structured data into one or more Targets: Data Warehouse, Data Mart, Business Intelligence
[Figure: sample raw records from the sources, each a differently spelled and formatted variant of the same Singapore customer name and address]

Use Case: Data Quality Management
Scheduled Cleansing/Matching
• Input: uncleaned, disparate data (original, raw data)
• Data Quality Management cleanses using Dictionaries + Directories and Rules
• Cleansed data: standardized, corrected data
• Apply “Fuzzy” match techniques
• De-duped data: matched, de-duplicated data
Cleansed sample records:
• Richard Tan Hock Guan, Future Electronics, 2 Shenton Way #07-01 SGX Centre 1, Singapore 068804
• Tan Hock Guan, Future & Electronics Co., 3 Shenton Way #07-01 Shenton House, Singapore 068805
[Figure: raw input records showing misspelled and reordered variants of the two records above]
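To illustrate what "fuzzy" matching means here, the following is a minimal sketch using Python's standard `difflib`. It is not the Data Services Match transform; the function names and the 0.8 threshold are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def normalize(record: str) -> str:
    """Lowercase and replace punctuation with spaces so formatting noise is ignored."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in record.lower())
    return " ".join(cleaned.split())


def similarity(a: str, b: str) -> float:
    """Return a similarity ratio between 0.0 and 1.0 for two records."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def is_probable_duplicate(a: str, b: str, threshold: float = 0.8) -> bool:
    """Hypothetical rule: treat records at or above the threshold as one entity."""
    return similarity(a, b) >= threshold


print(is_probable_duplicate("Future Electornics", "Future Electronics"))
```

A real matching engine would compare parsed fields (name, firm, address) separately and weight them; a single string ratio is only the simplest form of the idea.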

Data Services Technical Architecture
Architecture

Data Services Architecture
• Client-based (for job design) and mostly Web-based (for all other tasks)
• Servers run on Windows and UNIX (including Linux), while Client runs only on Windows
Diagram components:
• Designer (Windows)
• Administrator (Web), Web Applications
• Central Repository, Local Repository, Profiler Repository
• Dictionaries + Directories
• Real-time Services, Request-Response Access Server
• Job Server and Engine
• Heterogeneous Data Sources, Heterogeneous Data Targets

Data Services Physical Architecture
Client:
• DI Designer (Windows)
• DI Administrator (Web), Web Applications
Repository (Logical – repositories can be separate/different databases on different servers):
• Central Repository
• Local Repository
• Profiler Repository
• Global Parsing Repository
Server (Windows, UNIX, Linux) (Windows/Linux 32-bit, UNIX 64-bit):
• Job Server and Engine
• Request-Response Access Server
• Real-time Services
• Address Directories

Developing in Data Services
Development Environment

Data Services Designer
Screen areas: Project, Job, Workflow / Dataflow Canvas, Dataflow, Source, Target, Repository Window
Status:
• Job Server
• Profiler Server

Pre-Built Interfaces
Offers tremendous productivity
• Pre-built (no coding) access to common databases, common ERP applications
• Access to ERP applications is via ERP application layer, maintaining application integrity
• Exposes the application’s metadata layer

Databases:
• Oracle
• DB2
• Sybase & IQ
• SQL Server
• Informix
• Teradata
• ODBC
• MySQL
• Netezza

Applications:
• JD Edwards
• Oracle Apps
• PeopleSoft
• Siebel
• SFDC
• SAP BI
• SAP ERP (R/3)
  • ABAP
  • BAPI
  • IDoc

Files & Transport:
• Text delimited
• Text fixed width
• EBCDIC
• XML
• Cobol
• Excel
• HTTP
• JMS
• SOAP (Web Services)

Mainframe* (with partner):
• ADABAS
• ISAM
• VSAM
• Enscribe
• IMS/DB
• RMS
• Both direct and change data

Connecting to Datastore Source / Target
1 Right-click to create Connection
2 Connection dialog
• Select Type
• Enter Name, Connection Details
Screenshot label: Message

Connecting to File Source
1 Right-click to create File Schema and Connection
2 Enter Connection Details

Data Profiling
Analysis of data beyond viewing
• Frequency distribution
• Distinct values
• Null values
• Minimum/Maximum values
• Data Patterns (e.g. Xxx Xxxx99, 99-Xxx)
Can drill down to view specific records
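The profiling measures listed on this slide can be sketched in a few lines of Python. This is an illustration of the concepts only, not the Data Services Profiler; the function names are hypothetical, and the pattern convention (uppercase letter to X, lowercase letter to x, digit to 9) is inferred from the slide's "Xxx Xxxx99" example.

```python
from collections import Counter


def pattern_of(value: str) -> str:
    """Map each character to a pattern symbol: X (upper), x (lower), 9 (digit)."""
    symbols = []
    for ch in value:
        if ch.isdigit():
            symbols.append("9")
        elif ch.isalpha():
            symbols.append("X" if ch.isupper() else "x")
        else:
            symbols.append(ch)  # keep spaces and punctuation as they are
    return "".join(symbols)


def profile_column(values):
    """Compute basic column statistics; None stands for a null value."""
    present = [v for v in values if v is not None]
    return {
        "frequency": Counter(present),
        "distinct": len(set(present)),
        "nulls": len(values) - len(present),
        "min": min(present) if present else None,
        "max": max(present) if present else None,
        "patterns": Counter(pattern_of(str(v)) for v in present),
    }


print(profile_column(["Tan Hock88", None, "Lee Siew12", "Tan Hock88"]))
```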

Relationship Analysis
Comparison of values between data sets to determine fit
• Shows % of non-matching values among
  • Table - Table
  • Table - Flat file
  • Flat file - Flat file
Can drill down to view actual records
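The "% of non-matching values" figure can be understood as the share of values in one data set that have no counterpart in the other. The sketch below assumes both data sets are already loaded as Python lists; the function name is hypothetical and this is not how the Profiler is implemented.

```python
def non_matching_percent(child_values, parent_values):
    """Return (percent of child values missing from parent, the unmatched values)."""
    parent = set(parent_values)
    child = [v for v in child_values if v is not None]
    if not child:
        return 0.0, []
    orphans = [v for v in child if v not in parent]
    return 100.0 * len(orphans) / len(child), orphans


# Example: order rows referencing customer ids, two of which do not exist.
percent, unmatched = non_matching_percent([1, 2, 3, 4], [1, 2])
print(percent, unmatched)
```

Returning the unmatched values alongside the percentage mirrors the drill-down to actual records mentioned on the slide.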

Interactive Debugger
Very useful for checking the logic of a dataflow
• Can examine and modify data row-by-row
• Can place filters and breakpoints to pause execution of a job and return control to the user
• Can break after processing a number of rows

Data Quality Management
Data Quality

Address Assignment Levels for Countries
• Last-Line (or Country, Locality/City, Postcode) (248 countries/territories)
• Primary/Street name (41 countries)
• Primary/Street range (to unit level - 25 countries)
Countries and territories named across the three levels: American Samoa, Australia*, Austria, Belgium, Brazil, Canada*, Cyprus, Denmark, Faroe Islands, Finland, France, French Guiana, French Polynesia, Germany, Greece, Greenland, Guadeloupe, Guam, India, Italy, Japan*, Liechtenstein, Luxembourg, Malaysia, Martinique, Mayotte, Monaco, Netherlands, New Caledonia, New Zealand, Northern Mariana Islands, Norway, Palau, Poland, Portugal, Puerto Rico, Reunion, Saint Pierre and Miquelon, San Marino, Singapore, Spain, Sweden, Switzerland, United Kingdom, United States*, U.S. Virgin Islands, Wallis and Futuna, etc. (including the ones listed under Primary/Street range)
Note:
1. * - Requires Country-specific Engine
2. Address-line (or Unit) assignment level for Singapore before end 2008 through license with SingaporePost

More on Address Cleansing / Verification
Original Addresses, Cleansed Addresses
For some countries, Address Cleanse can correct or add postal codes

More on Data (Customer) Cleansing
Original Customer Data, Cleansed Customer Data
Parsing (identification and isolation of specific parts of mixed data) can be extended to non-Customer information

Universal Data Cleanse
• Input: uncleaned Product data (original, raw data)
• Data Quality Management with Custom (customized) Dictionary / Rules
• Output: cleansed data (standardized and properly formatted data)

Universal Data Cleanse Custom Parser
Stages: Word Break, Tokenize, Rule Match, Action
• Word breaking: break the input down into smaller pieces
• Tokenization: assign meaning to the pieces
• Rule matching (pattern): match the pieces to rules
• Actions & Action item assignment: create output from rule matches
Input: Large mushrooms sausage pepperoni stuffed crust
Rule:
Rule1 = Size + Topping + Topping + Topping + Crust + Crust.
Action = Pizza.
Pizza = 1 : Pizza_Size : 1.
Pizza = 1 : Pizza_Topping : 2.
Pizza = 1 : Pizza_Topping : 3.
Pizza = 1 : Pizza_Topping : 4.
Pizza = 1 : Pizza_Crust : 5.
Pizza = 1 : Pizza_Crust : 6.
End_Action.
Output fields: Pizza, Pizza_Size, Pizza_Topping, Pizza_Crust
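The four parser stages can be traced with a small Python sketch built on the slide's pizza example. The dictionary contents, the `Unknown` classification, and the function name are assumptions for illustration; the actual Universal Data Cleanse rule engine is not shown in the source.

```python
# Hypothetical dictionary: word -> classification (tokenization stage).
DICTIONARY = {
    "large": "Size",
    "mushrooms": "Topping",
    "sausage": "Topping",
    "pepperoni": "Topping",
    "stuffed": "Crust",
    "crust": "Crust",
}

# Rule1 from the slide: the token sequence that must match.
RULE1 = ["Size", "Topping", "Topping", "Topping", "Crust", "Crust"]

# Action items from the slide: output field -> 1-based token positions.
ACTION = [
    ("Pizza_Size", [1]),
    ("Pizza_Topping", [2, 3, 4]),
    ("Pizza_Crust", [5, 6]),
]


def parse_pizza(text):
    words = text.split()                                            # word breaking
    tokens = [DICTIONARY.get(w.lower(), "Unknown") for w in words]  # tokenization
    if tokens != RULE1:                                             # rule matching
        return None
    # Action: assemble each output field from the matched positions.
    return {field: " ".join(words[i - 1] for i in positions)
            for field, positions in ACTION}


print(parse_pizza("Large mushrooms sausage pepperoni stuffed crust"))
```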

Universal Data Cleanse Custom Dictionaries
• Dictionary entries can be added or modified
• Entire new dictionary can be created
Rule: Replace this word 白い with “white” when it is classified as COLOR
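A classification-dependent replacement rule like the one above can be modelled as a lookup keyed on both the word and its classification. This is a minimal sketch; the data structure and function name are hypothetical, not the product's dictionary format.

```python
# Hypothetical custom dictionary: (word, classification) -> standard form.
CUSTOM_DICTIONARY = {
    ("白い", "COLOR"): "white",
}


def standardize(word, classification, dictionary=CUSTOM_DICTIONARY):
    """Replace the word only when it carries the classification named in the rule."""
    return dictionary.get((word, classification), word)


print(standardize("白い", "COLOR"))
```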

Deploying in Data Services
Deployment Environment

Multi-User Development Environment
[Diagram labels: DEV Job Server; TEST Job Server; PROD Job Servers in a Server Group; Local Repos for DEV, TEST, and PROD; Central Repos; Check in and Get between Local Repos and Central Repos]

Check-In/Check-Out Management
Sophisticated method for sharing/moving DI objects
• Involves check-in/out of objects with versioning
Screenshot labels: Local, Centralized

Real-Time vs Batch (Scheduled) Processing Support
• Typical Batch Job Converted to Real-Time Job
• Real-Time capabilities built into the engine in the form of Request-Response message processing
Diagram (steps 1 to 4):
• ERP or Web applications send a Message
• Message Listener (Access Server): immediate action upon receiving Message
• Real-time Services run Real-time Jobs on the Data Services Job Server and Engine
• Response: immediate action upon receiving Response

Invoking/Consuming Data Services-Produced Web Services
Diagram (steps 1 to 5b):
• External Web Service client
• Data Services Web Server, Web Services server
• Request-Response Access Server and Real-time Services: Invoke Real-time jobs
• Job Server and Engine: Invoke Batch jobs
• Repository

Consuming 3rd Party Web Services
1 Define Web Service source
2 Use imported Web Service operation as standard function

Deploying for Performance
Deployment Environment

Data Services Datastore Configurations
Data Sources/Targets can be switched between different development environments

Data Services System Configurations

Substitution Parameter Configuration
Parameter      DEV        PROD
$$FILE_PATH    C:\TEMP    C:\PROD
$$LOG_PATH     C:\LOG     C:\LOG

Datastore configuration (Datastore Source, Datastore Target)
Field            Configuration1    LocalServer    ProdServer
Database type    Oracle            mySQL          Oracle
Server name      SJ-PROD           Localhost      SJ-PROD
Database name    Source_Data       Source_Data    Source_Data
User             sd_5263           sa             sd_5263
Password         *****             *****          *****

System configurations: System configuration DEV, System configuration PROD
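Substitution parameters let the same job definition resolve to different paths per system configuration. The sketch below reproduces the DEV and PROD values from the slide in a plain Python dictionary; the `resolve` function and the file name `input.csv` are hypothetical and only illustrate the lookup.

```python
# Values taken from the slide; the structure itself is an assumption.
SUBSTITUTION_PARAMETERS = {
    "DEV": {"$$FILE_PATH": r"C:\TEMP", "$$LOG_PATH": r"C:\LOG"},
    "PROD": {"$$FILE_PATH": r"C:\PROD", "$$LOG_PATH": r"C:\LOG"},
}


def resolve(template: str, system_configuration: str) -> str:
    """Replace every substitution parameter in the template with its configured value."""
    parameters = SUBSTITUTION_PARAMETERS[system_configuration]
    # Longest names first so one parameter name cannot clobber a longer one.
    for name in sorted(parameters, key=len, reverse=True):
        template = template.replace(name, parameters[name])
    return template


print(resolve(r"$$FILE_PATH\input.csv", "DEV"))
print(resolve(r"$$FILE_PATH\input.csv", "PROD"))
```

The job refers only to `$$FILE_PATH`; switching the system configuration changes where it reads from without editing the job.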

Improving Performance - 1
Parallel Execution
• via explicit Parallel Data flows
The Dimension data flows run in parallel
The Dimension data flows run before the Facts data flow
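The ordering described here (dimension data flows in parallel, then the facts data flow) can be sketched with Python's standard `concurrent.futures`. The data flows are stand-in callables and `run_job` is a hypothetical name; the Data Services engine schedules this itself from the workflow layout.

```python
from concurrent.futures import ThreadPoolExecutor


def run_job(dimension_flows, fact_flow):
    """Run all dimension flows concurrently, then the fact flow once they finish."""
    workers = max(1, len(dimension_flows))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Leaving the with-block waits for every dimension flow to complete.
        dimension_results = list(pool.map(lambda flow: flow(), dimension_flows))
    return dimension_results, fact_flow()


results, fact = run_job(
    [lambda: "customer_dim loaded", lambda: "product_dim loaded"],
    lambda: "sales_fact loaded",
)
print(results, fact)
```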

Improving Performance - 2
Parallel Execution
• via Partitioned source and target
Non-partitioned data flow at Design Time
Specifying Partitioning scheme
Partitioned at Execution time
• If Source is partitioned 2 ways
• If Target is partitioned 2 ways

Improving Performance - 3
Parallel Execution
• via specifying Degree of Parallelism (DOP) > 1
At Design Time
At Execution Time (DOP=2)
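Conceptually, a degree of parallelism of N means the same transform runs as N copies, each on its own share of the rows. The following is a rough Python sketch of that idea under the assumption of a round-robin split; the function name is hypothetical and the engine's actual row distribution is not described in the source.

```python
from concurrent.futures import ThreadPoolExecutor


def run_with_dop(rows, transform, dop=2):
    """Split rows round-robin into `dop` partitions and transform them concurrently."""
    partitions = [rows[i::dop] for i in range(dop)]

    def process(partition):
        return [transform(row) for row in partition]

    with ThreadPoolExecutor(max_workers=dop) as pool:
        processed = list(pool.map(process, partitions))
    # Merge partition outputs; note the original row order is not preserved.
    return [row for partition in processed for row in partition]


print(run_with_dop([1, 2, 3, 4], lambda x: x * 10, dop=2))
```

Because the merged output is ordered by partition, any step that depends on row order would need an explicit sort afterwards.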

Distributing Data Flow Execution - 1
Run as a separate process for the following resource-intensive operations
• Hierarchy_Flattening
• Join
• GROUP_BY
• ORDER_BY
• DISTINCT
• Table_Comparison
• Lookup_ext function
• Count_distinct function
• Associate
• CountryID
• Global Address Cleanse
• Global Suggestion List
• Match
• User-Defined

Distributing Data Flow Execution - 2
Various ways available to improve performance
• Traditional ETL
  Data Integrator Flexible ETL (traditional transform)
• Hybrid “ELT” (transform at target, transform at source)
• Concept is “Push-Down” processing
Diagrams (each showing Source, Data Integrator, Target): traditional transform, Transform at Target, Transform at Source

Distributing Data Flow Execution - 3
“Push-Down” in detail
• Push-Down settings
“Push-Down” can be staged onto database or file directory
Screenshots:
• Original Data Flow
• Data Flow with Push-Downs
  • To push down the GroupBy operation to the target server
  • To push down operations to the source server
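The difference between transforming in the engine and pushing an operation down to the database can be shown with Python's built-in `sqlite3`. The table, column names, and sample rows are invented for illustration; Data Services generates the pushed-down SQL itself.

```python
import sqlite3


def make_source():
    """Create an in-memory source table with a few sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("EAST", 10), ("WEST", 5), ("EAST", 7)],
    )
    return conn


def totals_in_engine(conn):
    """Traditional ETL: fetch every row, then aggregate outside the database."""
    totals = {}
    for region, amount in conn.execute("SELECT region, amount FROM sales"):
        totals[region] = totals.get(region, 0) + amount
    return totals


def totals_pushed_down(conn):
    """Push-down: the database performs the GROUP BY and returns only the result."""
    query = "SELECT region, SUM(amount) FROM sales GROUP BY region"
    return dict(conn.execute(query))


conn = make_source()
print(totals_in_engine(conn), totals_pushed_down(conn))
```

Both functions return the same totals; the pushed-down version moves fewer rows across the connection, which is the performance argument behind the technique.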

Distributing Data Flow Execution - 4
Grid deployment (across multiple servers) is easy
• No changes required in Job flows
• Simply specify at execution time what to distribute across servers – Job, Data flow, or sub-Data flow

Distributing Data Flow Execution - 5
Distribution Level 101
• Job level distribution
• All processes belonging to job execute on the same computer
All sub-data flows run on same computer

Distributing Data Flow Execution - 6
Distribution Level 101
• Data flow level distribution
• All processes of each data flow can execute on a different computer
Diagram: Computer 1, Computer 2

Distributing Data Flow Execution - 7
Distribution Level 101
• Sub data flow level distribution
• Each sub data flow can execute on a different computer
Diagram: Computer 1, Computer 2, Computer 3, Computer 4

Managing with Data Services
Management Environment

General Management
Web-based Management with graphical dashboard info

Recovery From Unsuccessful Job Execution
Automated Recovery
• With this set, DI records the result of each successful step in a job
• During recovery, DI re-runs uncompleted or failed steps under the same conditions as the original job
There is Manual Recovery, too
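The automated recovery behaviour (record each successful step, then re-run only what did not complete) can be sketched as follows. The in-memory set standing in for the recorded step results and the function name are assumptions; the source does not describe how DI stores this information.

```python
def run_with_recovery(steps, completed):
    """Run (name, action) steps in order, skipping names already in `completed`.

    `completed` is a set that survives between runs. If an action raises,
    execution stops and the set still holds every step that succeeded.
    """
    for name, action in steps:
        if name in completed:
            continue  # succeeded in an earlier run, do not repeat it
        action()
        completed.add(name)
    return completed


# Usage: a first run fails at "transform"; the recovery run resumes from there.
state = {"fail": True}


def flaky_transform():
    if state["fail"]:
        raise RuntimeError("transform failed")


steps = [("extract", lambda: None), ("transform", flaky_transform), ("load", lambda: None)]
done = set()
try:
    run_with_recovery(steps, done)
except RuntimeError:
    print("first run stopped after:", sorted(done))
state["fail"] = False
print("after recovery:", sorted(run_with_recovery(steps, done)))
```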

Maintenance with Data Services
Maintenance Environment

Auto-Reporting
Via Web-based Management Console
• Reports automatically generated
• Report details can be interactively chosen and drilled into
• Reports can be printed
Interactive Dataflow graphics

Lineage And Impact
Provides dependency analysis for
• Impact (source to which target/s)
• Lineage (target back to which source/s)
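Impact and lineage are the two directions of the same dependency graph. A minimal Python sketch, assuming the mappings are available as (source, target) pairs; the table names are invented and the repository's real metadata model is not shown in the source.

```python
# Hypothetical mappings: each pair means "source feeds target".
EDGES = [
    ("crm.customer", "stage.customer"),
    ("erp.orders", "stage.orders"),
    ("stage.customer", "dw.sales_fact"),
    ("stage.orders", "dw.sales_fact"),
]


def _reachable(start, edges):
    """Return every node reachable from `start` by following the edges."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, set()).add(dst)
    seen, stack = set(), [start]
    while stack:
        for nxt in graph.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def impact(source, edges=EDGES):
    """Impact: which targets does this source feed, directly or indirectly?"""
    return _reachable(source, edges)


def lineage(target, edges=EDGES):
    """Lineage: which sources does this target come from? Walk the edges backwards."""
    return _reachable(target, [(dst, src) for src, dst in edges])


print(sorted(impact("crm.customer")))
print(sorted(lineage("dw.sales_fact")))
```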

Thank you!

Copyright 2009 SAP AG
All Rights Reserved

No part of this publication may be reproduced or transmitted in any form or for any purpose without the express permission of SAP AG. The information contained herein may be changed without prior notice.

Some software products marketed by SAP AG and its distributors contain proprietary software components of other software vendors.

SAP, R/3, xApps, xApp, SAP NetWeaver, Duet, SAP Business ByDesign, ByDesign, PartnerEdge and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of SAP AG in Germany and in several other countries all over the world. Business Objects and the Business Objects logo, BusinessObjects, Crystal Reports, Crystal Decisions, Web Intelligence, Xcelsius and other Business Objects products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of Business Objects S.A. in the United States and in several other countries. Business Objects is an SAP Company. All other product and service names mentioned and associated logos displayed are the trademarks of their respective companies. Data contained in this document serves informational purposes only. National product specifications may vary.

The information in this document is proprietary to SAP. No part of this document may be reproduced, copied, or transmitted in any form or for any purpose without the express prior written permission of SAP AG. This document is a preliminary version and not subject to your license agreement or any other agreement with SAP. This document contains only intended strategies, developments, and functionalities of the SAP® product and is not intended to be binding upon SAP to any particular course of business, product strategy, and/or development. Please note that this document is subject to change and may be changed by SAP at any time without notice. SAP assumes no responsibility for errors or omissions in this document. SAP does not warrant the accuracy or completeness of the information, text, graphics, links, or other items contained within this material. This document is provided without a warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability, fitness for a particular purpose, or non-infringement.

SAP shall have no liability for damages of any kind including without limitation direct, special, indirect, or consequential damages that may result from the use of these materials. This limitation shall not apply in cases of intent or gross negligence. The statutory liability for personal injury and defective products is not affected. SAP has no control over the information that you may access through the use of hot links contained in these materials and does not endorse your use of third-party Web pages nor provide any warranty whatsoever relating to third-party Web pages.
