
Real-time Data Quality for SAP

Dietrich O. Banschbach, Manager, R&D EMEA, SAS International

Copyright 2005, SAS Institute Inc. All rights reserved.

Agenda

- Overview
- dfConnector for SAP
- Scenarios
- Technology
- Additional Information


Overview: Companies
Companies involved:

- SAP AG: the world's largest Enterprise Resource Planning (ERP) software company
- DataFlux Corporation (a SAS company): a leading provider of data management solutions covering data quality, data profiling, data integration, data augmentation and data monitoring


Overview: SAP partnership


- SAS is an SAP Software Partner with several SAP-certified interfaces.
- DataFlux, an SAP Software Partner in its own right, has attained SAP interface certification for its DataFlux dfConnector for SAP product.


dfConnector for SAP


- DataFlux dfConnector for SAP enhances data quality in SAP systems in real time.
- It facilitates communication between SAP applications and DataFlux dfIntelliServer.
- It offers transparent access from SAP applications to DataFlux dfIntelliServer services for data validation, standardization, deduplication, error-tolerant search, etc.


dfConnector for SAP


- Provides a remote function call (RFC) server that channels function calls from within SAP systems to dfIntelliServer and returns the results to SAP.
- A framework consisting of a set of DataFlux-supplied ABAP functions that map to dfIntelliServer functions. These can be called by any SAP application.
- The functions can be used to build new data quality solutions in SAP, or to extend existing ones, using DataFlux methods.


dfConnector for SAP: Architecture

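[Architecture diagram. Components: SAP Web Application Server hosting a Business Add-In (BAdI, ABAP); the dfConnector RFC server, based on SAP Java Connector, which calls dfIntelliServer through its API; dfIntelliServer (data quality algorithms, reference database); a search index accessed via JDBC and stored in SAS, Oracle, MySQL, DB2 or MS SQL.]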



dfConnector for SAP: Framework


- Function modules written in ABAP use a standard CALL FUNCTION ... DESTINATION statement to invoke a method that is not part of the current SAP system.
- The call to that destination invokes dfConnector, which is listening at the specified destination.
- dfConnector gathers all parameters and initiates the appropriate call into dfIntelliServer using its Java client API.
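A minimal sketch of the server-side routing, under a simplified model: the class `RfcDispatcher`, its methods and the string-to-string parameter maps are hypothetical and do not reflect the SAP Java Connector API or the dfIntelliServer client API. It only illustrates mapping an incoming function module name to the handler that would perform the corresponding dfIntelliServer call. It is written in current Java (lambdas), not the Java 1.4/1.5 the product targeted.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for the dfConnector RFC server. It does NOT use the
// SAP Java Connector API: a plain map of parameters replaces the RFC import and
// export parameters, and each handler is a placeholder for a call into the
// dfIntelliServer client API.
class RfcDispatcher {
    // Function module name (e.g. "/DATAFLUX/STANDARDIZE") -> handler.
    private final Map<String, Function<Map<String, String>, Map<String, String>>> handlers =
            new HashMap<>();

    // Register the handler that serves one function module name.
    void register(String functionName,
                  Function<Map<String, String>, Map<String, String>> handler) {
        handlers.put(functionName, handler);
    }

    // Route the gathered import parameters to the matching handler and return
    // its results, which a real server would send back to SAP.
    Map<String, String> dispatch(String functionName, Map<String, String> importParams) {
        Function<Map<String, String>, Map<String, String>> handler = handlers.get(functionName);
        if (handler == null) {
            throw new IllegalArgumentException("No handler registered for " + functionName);
        }
        return handler.apply(importParams);
    }
}
```

Registering a handler under `/DATAFLUX/STANDARDIZE` and calling `dispatch` with that name routes the parameters to it; an unregistered name raises an exception, much as an RFC server would report an unknown function back to the caller.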


dfConnector for SAP: Postal Address Validation


- ABAP programmers can use the framework functions in any SAP application.
- As an example application built on this framework, dfConnector for SAP supports postal address validation as defined in SAP's BC-BASPV certification scenario.
- It enhances SAP's Business Address Services (formerly Central Address Management).
- dfConnector is certified for SAP NetWeaver.
- Formally tested with R/3 Enterprise (4.7).

dfConnector for SAP: Postal Address Validation


- Customer, vendor and other addresses in SAP are checked in real time for correct city names, street names, house numbers and zip codes.
- Missing information is auto-completed from a reference database.
- A quarterly adjustment process keeps addresses up to date via a batch run:
  - It reports which addresses are correct and which ones could not be validated (stating the reason).
  - The process can also be used for the initial validation of all addresses in SAP.
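The real-time check and auto-completion can be pictured as a lookup against reference data. The sketch below is hypothetical: `AddressValidator`, its methods, the failure reasons and the tiny in-memory table are illustrative stand-ins for the postal reference database, not dfIntelliServer's interface. A failed check returns a reason, mirroring the kind of report the quarterly process produces.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Simplified, hypothetical address check. A small in-memory table stands in for
// the postal reference database; real postal validation is considerably more involved.
class AddressValidator {
    // Outcome of one check: validated with completed fields, or failed with a reason.
    static final class Result {
        final boolean valid;
        final String city;
        final String postalCode;
        final String reason; // null when valid

        Result(boolean valid, String city, String postalCode, String reason) {
            this.valid = valid;
            this.city = city;
            this.postalCode = postalCode;
            this.reason = reason;
        }
    }

    // Reference data keyed by upper-cased street: {city, postal code}.
    private final Map<String, String[]> reference = new HashMap<>();

    void addReference(String street, String city, String postalCode) {
        reference.put(street.toUpperCase(Locale.ROOT), new String[] {city, postalCode});
    }

    Result validate(String street, String city, String postalCode) {
        String[] ref = reference.get(street.toUpperCase(Locale.ROOT));
        if (ref == null) {
            return new Result(false, city, postalCode, "Street not found in reference data");
        }
        // Fields the user did enter must agree with the reference entry.
        if (!isBlank(city) && !city.equalsIgnoreCase(ref[0])) {
            return new Result(false, city, postalCode, "City does not match street");
        }
        // A basic ZIP code is accepted as a prefix of the full ZIP+4 reference value.
        if (!isBlank(postalCode) && !ref[1].startsWith(postalCode)) {
            return new Result(false, city, postalCode, "Postal code does not match street");
        }
        // Missing or partial fields are auto-completed from the reference entry.
        return new Result(true, ref[0], ref[1], null);
    }

    private static boolean isBlank(String s) {
        return s == null || s.trim().isEmpty();
    }
}
```

With made-up reference data such as `addReference("MAIN ST", "ANYTOWN", "12345-6789")`, validating `Main St` with only the basic ZIP `12345` succeeds and fills in the city and the full ZIP+4 code.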

dfConnector for SAP: Deduplication


- In addition to postal address validation, a duplicate check is carried out before a new entry can be saved in SAP.
- This avoids multiple entries of the same customer or vendor name with slight differences in spelling.
- Offers error-tolerant (fuzzy) search.
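Error-tolerant search means that a query still finds records whose spelling differs slightly. As a stand-in for the DataFlux matching technology, the sketch below uses plain Levenshtein edit distance, a generic technique chosen for illustration; the class `FuzzySearch` and the distance threshold are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Error-tolerant search sketch using Levenshtein edit distance. This is a
// generic technique chosen for illustration, not DataFlux's matching algorithm.
class FuzzySearch {
    // Dynamic-programming edit distance (insert, delete, substitute each cost 1),
    // keeping only two rows of the table.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // the finished row becomes "previous" for the next pass
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Return every candidate within maxDistance edits of the query (case-insensitive).
    static List<String> search(String query, List<String> candidates, int maxDistance) {
        List<String> hits = new ArrayList<>();
        String q = query.toUpperCase(Locale.ROOT);
        for (String c : candidates) {
            if (editDistance(q, c.toUpperCase(Locale.ROOT)) <= maxDistance) hits.add(c);
        }
        return hits;
    }
}
```

A query for `Wesston Road` with a threshold of 2 finds `Weston Road`, because the two strings differ by a single deleted letter.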


Scenarios: Postal Address Validation


- This scenario enhances data quality within SAP in real time as address data is entered interactively.
- Addresses are checked for correct:
  - city names
  - street names
  - house numbers
  - zip codes
- Input is standardized according to postal authority requirements (e.g. USPS rules).
- Missing information can be auto-completed.
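Standardization can be sketched as normalizing case and spacing and mapping street-type words to the postal authority's abbreviations. The hypothetical `StreetStandardizer` below uses only a small subset of USPS-style suffix abbreviations (including the DR = Drive and PKWY = Parkway pairs seen in the scenarios). It handles formatting only; correcting a wrong street type, such as Street instead of Drive, requires a reference-data lookup.

```java
import java.util.Locale;
import java.util.Map;

// Simplified street-line standardizer: upper-cases the input, collapses
// whitespace and replaces a trailing street-type word with its USPS-style
// abbreviation. The table is a small illustrative subset, not the rule set
// dfIntelliServer applies.
class StreetStandardizer {
    private static final Map<String, String> SUFFIXES = Map.of(
            "DRIVE", "DR",
            "PARKWAY", "PKWY",
            "STREET", "ST",
            "AVENUE", "AVE",
            "ROAD", "RD");

    static String standardize(String street) {
        String[] words = street.trim().toUpperCase(Locale.ROOT).split("\\s+");
        // Strip a trailing period (e.g. "St.") before looking up the street type.
        String last = words[words.length - 1].replace(".", "");
        words[words.length - 1] = SUFFIXES.getOrDefault(last, last);
        return String.join(" ", words);
    }
}
```

For example, `standardize("100 Main Parkway")` yields `100 MAIN PKWY`.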



Scenario 1: Create new customer


- Create a new customer in SAPGUI using the standard SAP transaction XD01.
- Fill in the data:
  - Company name
  - City
  - Country
  - (No street)


Scenario 1: Create new customer

Required entry


Scenario 1: Create new customer


- The field with missing information is highlighted in color and the cursor is positioned in that field.
- An error message appears in the status line.


Scenario 1: Create new customer


- Street name entered incorrectly (Street instead of Drive)
- Click the Check button when all data has been entered.
- Region required to resolve the address


Scenario 1: Create new customer


- Address is validated by dfIntelliServer:
  - City name converted to uppercase
  - Postal code (ZIP) added
  - Street name uppercased and standardized (DR = Drive)
  - District added automatically


Scenario 2: Creating a customer with minimal data entry


Data entered in SAP:
- Part of a street name, with a spelling mistake
- Postal code
- Country (required by SAP)


Scenario 2: Creating a customer with minimal data entry

- Partial street name with spelling mistake
- Basic postal code
- No region specified


Scenario 2: Creating a new customer with minimal data entry


- Address is validated by dfIntelliServer:
  - City name uppercased
  - Postal code added (ZIP+4)
  - Street name uppercased and standardized (PKWY = Parkway)
  - Spelling mistake corrected
  - District added automatically
  - Region added automatically


Scenario 3: Inconsistent or unresolvable addresses


- Neither postal code nor city is specified.
- The user insists on saving a record even though the entry could not be validated.
- To ensure high availability of the SAP system, address data can still be entered and saved if dfConnector and/or dfIntelliServer are temporarily unavailable. Such entries are marked as not having been checked against official address reference data. These addresses can be corrected in the dfConnector Quarterly Address Adjustment process, which checks and updates them in batch mode.
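The availability rule can be expressed as a check that never blocks the save. In this hypothetical sketch, `AddressSaver`, its `Status` values and the convention that an unreachable service throws a `RuntimeException` are assumptions; the point is only that an unreachable service yields an `UNCHECKED` flag rather than an error, so the quarterly batch run can pick the record up later.

```java
import java.util.function.Predicate;

// Hypothetical sketch of the high-availability rule: the record can always be
// saved; the returned status records whether and how it was checked, so the
// quarterly batch process can revisit entries that were never checked.
class AddressSaver {
    enum Status { VALIDATED, NOT_VALIDATED, UNCHECKED }

    // The validator returns true/false when the service answers and throws a
    // RuntimeException when dfConnector or dfIntelliServer cannot be reached.
    static Status checkBeforeSave(String address, Predicate<String> validator) {
        try {
            return validator.test(address) ? Status.VALIDATED : Status.NOT_VALIDATED;
        } catch (RuntimeException serviceUnavailable) {
            // Save anyway, but mark the entry as not checked against reference data.
            return Status.UNCHECKED;
        }
    }
}
```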

Scenario 3: Inconsistent or unresolvable addresses

Error message: No zip code and/or city specified


Scenario 4: Duplicate search


- The following scenario shows the duplicate search and elimination capabilities of DataFlux dfConnector for SAP.
- It first shows how easy it is to create a duplicate customer record in the SAP database without dfConnector: a single small typo is enough.
- For comparison, the same process is then performed using dfConnector for SAP to identify potential duplicates and resolve the situation.


Scenario 4: Duplicate search


- Using the standard SAP search, the user first checks whether the customer he would like to create already exists in SAP. However, he accidentally makes a small typo in the street name (Wesston instead of Weston).


Scenario 4: Duplicate search


- The search returns no hits, and the user proceeds on the assumption that he can now create a unique customer.
- He creates and saves a new customer entry, thus creating a duplicate.


Scenario 4: Duplicate search

- After that, the duplicate search capabilities of dfConnector are triggered. Based on matchcodes created by dfIntelliServer, potential duplicates are detected.
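Matchcodes turn spelling variants into the same key, so duplicates can be found by exact key comparison. The algorithm used by dfIntelliServer is not described in this deck; the toy `Matchcodes.matchcode` below is a hypothetical substitute (upper-case, drop non-alphanumerics and non-leading vowels, skip repeats of the last kept character, truncate) that merely demonstrates the principle on the Wesston/Weston typo from this scenario.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Toy matchcode for illustration only. DataFlux matchcodes are produced by
// dfIntelliServer with its own algorithms; this hypothetical version just shows
// the idea that spelling variants collapse to the same key.
class Matchcodes {
    static String matchcode(String value, int maxLength) {
        StringBuilder sb = new StringBuilder();
        char lastKept = 0;
        for (char ch : value.toUpperCase(Locale.ROOT).toCharArray()) {
            if (!Character.isLetterOrDigit(ch)) continue;   // drop spaces and punctuation
            boolean vowel = "AEIOU".indexOf(ch) >= 0;
            if (vowel && sb.length() > 0) continue;         // keep only a leading vowel
            if (ch == lastKept) continue;                   // skip repeats of the last kept character
            sb.append(ch);
            lastKept = ch;
            if (sb.length() == maxLength) break;            // fixed maximum key length
        }
        return sb.toString();
    }

    // Existing records whose matchcode equals the new entry's are potential duplicates.
    static List<String> findPotentialDuplicates(String newEntry, List<String> existing, int maxLength) {
        String key = matchcode(newEntry, maxLength);
        List<String> hits = new ArrayList<>();
        for (String candidate : existing) {
            if (matchcode(candidate, maxLength).equals(key)) hits.add(candidate);
        }
        return hits;
    }
}
```

Both `Wesston` and `Weston` reduce to `WSTN`, so `10 Wesston Rd` is reported as a potential duplicate of `10 Weston Road`.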


Scenario 4: Duplicate search (transaction flow)


- Address data is entered in SAPGUI, and postal address validation executes.
- The /DATAFLUX/ADDR_SEARCH implementation of the BAdI ADDRESS_SEARCH is invoked.
- Function module /DATAFLUX/DUPLICATE_CHECK searches for duplicates: it calls dfConnector, which gathers the entered SAP data.
- Matchcodes are generated dynamically, and a JDBC call is made to retrieve results from the external RDBMS. The search results are returned to dfConnector, which passes them to SAP to display a list of potential duplicates.
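The lookup step of this flow can be sketched behind a small index interface. Everything here is hypothetical: `SearchIndex`, `InMemorySearchIndex` and `DuplicateCheck` are illustrative names, and the in-memory map replaces the external RDBMS. The method names loosely echo the framework functions /DATAFLUX/MAINTAIN_INDEX_ENTRY and /DATAFLUX/DELETE_INDEX_ENTRY, but this sketch does not reproduce their actual signatures. A JDBC-backed implementation would instead run a parameterized query, for example `SELECT record_id FROM search_index WHERE matchcode = ?` against assumed table and column names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical abstraction over the external search index. In the described
// flow the index lives in an RDBMS reached via JDBC; an in-memory map stands in
// here so the sketch runs without a database.
interface SearchIndex {
    void maintainEntry(String recordId, String matchcode); // add or update an entry
    void deleteEntry(String recordId);                     // remove an entry
    List<String> lookup(String matchcode);                 // ids sharing a matchcode
}

class InMemorySearchIndex implements SearchIndex {
    private final Map<String, String> matchcodeById = new HashMap<>();

    public void maintainEntry(String recordId, String matchcode) {
        matchcodeById.put(recordId, matchcode);
    }

    public void deleteEntry(String recordId) {
        matchcodeById.remove(recordId);
    }

    public List<String> lookup(String matchcode) {
        List<String> ids = new ArrayList<>();
        for (Map.Entry<String, String> e : matchcodeById.entrySet()) {
            if (e.getValue().equals(matchcode)) ids.add(e.getKey());
        }
        Collections.sort(ids); // stable order for display
        return ids;
    }
}

class DuplicateCheck {
    // Generate the matchcode for the entered data, query the index, and return
    // the candidate record ids for the list of potential duplicates.
    static List<String> potentialDuplicates(String enteredData,
                                            Function<String, String> matchcoder,
                                            SearchIndex index) {
        return index.lookup(matchcoder.apply(enteredData));
    }
}
```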

Scenario 5: Quarterly adjustment process


- Quarterly Adjustment is a batch process that ensures address data stays up to date.
- If new address data is available, e.g. from USPS, it can be activated in the system in three steps by running:
  1. an SAP report to get all addresses
  2. a DataFlux-provided report to check, standardize and auto-complete the addresses
  3. an SAP report to write the updated addresses back to the SAP database

Scenario 5: Quarterly adjustment process


- The RSADRQU1 report scans all addresses for a given country and inserts them into an index table.
- /DATAFLUX/RSADRQU2 reads all SAP addresses from the index table and validates each address. Addresses are checked, auto-completed and standardized. If an address cannot be validated, it is flagged for later reporting. The report indicates the level of address quality, i.e. how many addresses are correct and how many are incorrect.
- RSADRQU3 writes the validated and corrected addresses back to the operational SAP database, or reports the reason why they could not be written back.
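The validation step (the role of /DATAFLUX/RSADRQU2) can be sketched as a loop that validates each collected address, records a `+` or `-` line per record plus a summary, and keeps only the corrected addresses for write-back. The class `QuarterlyAdjustment`, its `Check` type and the report line format are hypothetical; the SAP reports for collecting (RSADRQU1) and writing back (RSADRQU3) addresses are not modeled.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of the middle step of the quarterly adjustment.
// Collecting addresses (RSADRQU1) and writing them back (RSADRQU3) are SAP
// reports and are not modeled here.
class QuarterlyAdjustment {
    static final class Check {
        final String corrected;      // standardized, completed address (null on failure)
        final String failureReason;  // why validation failed (null on success)

        Check(String corrected, String failureReason) {
            this.corrected = corrected;
            this.failureReason = failureReason;
        }

        static Check ok(String corrected) { return new Check(corrected, null); }
        static Check failed(String reason) { return new Check(null, reason); }
    }

    // Validate every collected address, append a "+"/"-" line per record and a
    // summary line to the report, and return the corrected addresses by id.
    static Map<String, String> validateAll(Map<String, String> addressesById,
                                           Function<String, Check> validator,
                                           List<String> report) {
        Map<String, String> toWriteBack = new LinkedHashMap<>();
        int ok = 0;
        int failed = 0;
        for (Map.Entry<String, String> e : addressesById.entrySet()) {
            Check c = validator.apply(e.getValue());
            if (c.failureReason == null) {
                ok++;
                toWriteBack.put(e.getKey(), c.corrected);
                report.add("+ " + e.getKey());
            } else {
                failed++;
                report.add("- " + e.getKey() + ": " + c.failureReason);
            }
        }
        report.add("Summary: " + ok + " ok, " + failed + " failed");
        return toWriteBack;
    }
}
```

With a validator that accepts the first of two addresses and rejects the second with the reason `Street not found`, the report contains `+ A1`, `- A2: Street not found` and `Summary: 1 ok, 1 failed`.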


Scenario 5: Quarterly adjustment process


Checked addresses: + = ok, - = failed

Summary


Technology
- Java 1.4.x/1.5 to interface SAP with DataFlux dfIntelliServer 6, using SAP Java Connector 2.1.3
- ABAP programming to hook into the predefined interfaces (SAP Business Add-Ins) for address validation and deduplication
- SAP Add-on Assembly Kit (AAK) to allow for SAP certification (e.g. namespaces, installation, deployment, upgrade)
- Search index creation in SAS data sets or in any external JDBC-compliant RDBMS

Technology: dfConnector Framework Functions



- /DATAFLUX/AREA_CODE
- /DATAFLUX/DETERMINE_GENDER
- /DATAFLUX/DETERMINE_LOCALE
- /DATAFLUX/DETERMINE_ENTITY
- /DATAFLUX/DIRECTORY_SEARCH
- /DATAFLUX/DUPLICATE_CHECK
- /DATAFLUX/GENERATE_MATCHCODE
- /DATAFLUX/GEN_MATCHCODE_PARSED
- /DATAFLUX/GEOCODE
- /DATAFLUX/LOOKUP_COUNTY
- /DATAFLUX/LOOKUP_PHONE
- /DATAFLUX/PARSE
- /DATAFLUX/QUERY_SERVER
- /DATAFLUX/STANDARDIZE
- /DATAFLUX/STANDARDIZE_PARSED
- /DATAFLUX/STANDARDIZE_SCHEME
- /DATAFLUX/DELETE_INDEX_ENTRY
- /DATAFLUX/VERIFY_ADDRESS
- /DATAFLUX/MAINTAIN_INDEX_ENTRY

Technology: /DATAFLUX/VERIFY_ADDRESS

Input data

Results


Technology: External Search Index


- The external search index can be stored in any RDBMS that supports the JDBC interface.
- Examples:
  - SAS data sets
  - MySQL
  - Microsoft SQL Server
  - MaxDB (formerly known as SAP DB)
  - Oracle
  - ...


Technology: External search index (example: stored in SAS)


Technology: RFC server platforms


SAP-supported Java Connector (JCo) platforms (used by the RFC server component of dfConnector):
- Windows NT SP4 or later, Windows 2000, XP, Windows 2003 Server
- Sun Solaris/SPARC 8 or later
- IBM AIX 4.3 or later
- HP-UX 11.0 or later (PA-RISC processors only)
- OS/400 V5R1 or later (not for SAP JCo 2.0.5)
- Compaq Tru64 5.0 or later (not for SAP JCo 2.1.x)
- z/Linux on S/390 (Linux on zSeries, GLIBC 2.2.4 or later)
- Linux kernel 2.2.14 or later (Intel-compatible processors)


Additional Information
- SUGI Birds-of-a-Feather (BoF) session "Enhancing SAP with SAS", room 107, Tuesday at 6 p.m.
- www.dataflux.com
