
Real-time Data Quality for SAP
Dietrich O. Banschbach
Manager, R&D EMEA
SAS International

Copyright 2005, SAS Institute Inc. All rights reserved.


Agenda

Overview
dfConnector for SAP
Scenarios
Technology
Additional Information



Overview: Companies

Companies involved:
SAP AG: the world's largest Enterprise Resource Planning (ERP) software company
DataFlux Corporation (a SAS company): a leading provider of data management solutions consisting of data quality, data profiling, data integration, data augmentation and data monitoring



Overview: SAP partnership

SAS is an SAP Software Partner with several SAP certified interfaces
DataFlux, an SAP Software Partner in its own right, has attained SAP interface certification for its DataFlux dfConnector for SAP product



dfConnector for SAP

DataFlux dfConnector for SAP enhances data quality in SAP systems in real time
Facilitates communication between SAP applications and DataFlux dfIntelliServer
Offers transparent access from SAP applications to DataFlux dfIntelliServer services for data validation, standardization, deduplication, error-tolerant search, etc.



dfConnector for SAP

Provides a remote function call (RFC) server that channels function calls from within SAP systems to dfIntelliServer and returns the results to SAP
Framework consisting of a set of DataFlux-supplied ABAP functions that map to dfIntelliServer functions. These can be called by any SAP application.
Functions can be used to build new or extend existing data quality solutions in SAP using DataFlux methods



dfConnector for SAP: Architecture

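[Architecture diagram. Components shown: SAP Web Application Server (ABAP) with a Business Add-In (BAdI); RFC server, based on the SAP Java Connector; dfIntelliServer (data quality algorithms, reference database), reached through its API; search index, reached through JDBC and stored in SAS, Oracle, MySQL, DB/2 or MS SQL]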



dfConnector for SAP: Framework

Function modules written in ABAP use a standard call function destination to invoke a method that is not part of the current SAP system
The call function destination invokes dfConnector listening at the specified destination
dfConnector gathers all parameters and initiates the appropriate call into dfIntelliServer using its Java client API
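
The call chain described on this slide can be pictured with a small sketch. It is written in Python purely for illustration (the actual framework consists of ABAP function modules and a Java-based RFC server), and the class, the handler and the parameter names are hypothetical, not part of the DataFlux product. Only the function name /DATAFLUX/STANDARDIZE is taken from the slides.

```python
from typing import Callable, Dict

Params = Dict[str, str]


def standardize(params: Params) -> Params:
    """Toy stand-in for a back-end service: uppercase every input value."""
    return {key: value.upper() for key, value in params.items()}


class RfcServerSketch:
    """Illustrative dispatcher (hypothetical): receives a function name and
    its parameters, forwards the call to the matching back-end service and
    returns the result to the caller."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[[Params], Params]] = {}

    def register(self, function_name: str, handler: Callable[[Params], Params]) -> None:
        self._handlers[function_name] = handler

    def handle_call(self, function_name: str, params: Params) -> Params:
        handler = self._handlers.get(function_name)
        if handler is None:
            raise KeyError(f"no handler registered for {function_name}")
        # Gather the parameters (copied here) and initiate the back-end call.
        return handler(dict(params))


server = RfcServerSketch()
server.register("/DATAFLUX/STANDARDIZE", standardize)
result = server.handle_call("/DATAFLUX/STANDARDIZE", {"CITY": "Cary"})
print(result)
```

The point of the sketch is the indirection: the caller only names a function and passes parameters, while the server decides which back-end service handles the request.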



dfConnector for SAP: Postal Address Validation

ABAP programmers can use the framework functions in any SAP application
As an example application that uses this framework, dfConnector for SAP supports postal address validation as defined in SAP's BC-BAS-PV certification scenario
Enhances SAP's Business Address Services (formerly Central Address Management)
dfConnector is Certified for SAP NetWeaver
Formally tested with R/3 Enterprise (4.7)



dfConnector for SAP: Postal Address Validation

Customer, vendor and other addresses in SAP are checked in real time for correct city names, street names, house numbers and zip codes
Missing information is auto-completed from a reference database
Quarterly adjustment process keeps addresses up to date via a batch run
Reports which addresses are correct and which ones could not be validated (stating the reason)
Process can be used for the initial validation of all addresses in SAP



dfConnector for SAP: Deduplication

In addition to postal address validation, a duplicate check is carried out before a new entry can be saved in SAP
Avoids multiple entries of the same customer or vendor name with slight differences in spelling
Offers error-tolerant (fuzzy) search



Scenarios: Postal Address Validation

This scenario enhances data quality within SAP in real time as address data is entered interactively
Addresses are checked for correct:
city names
street names
house numbers
zip codes
Input is standardized according to postal authority requirements (e.g. USPS rules)
Missing information can be auto-completed

Scenario 1: Create new customer

Create new customer in SAPGUI using standard SAP transaction XD01
Fill in data:
Company name
City
Country
(No street)



Scenario 1: Create new customer

[Screenshot callout: required entry]



Scenario 1: Create new customer
[Screenshot callouts: the field with the missing information is colored and the cursor is positioned in that field; an error message appears in the status line]



Scenario 1: Create new customer

[Screenshot callouts: street name entered incorrectly (Street instead of Drive); region required to resolve the address; click on the Check button when all data has been entered]



Scenario 1: Create new customer

Address is validated by dfIntelliServer:
City name converted to uppercase
Postal code (ZIP) added
Street name converted to uppercase and standardized (DR = Drive)
District added automatically
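
As a rough illustration of the uppercasing and suffix standardization listed above, here is a minimal Python sketch. It is not the dfIntelliServer algorithm. The two abbreviations (DR, PKWY) appear in the slides, but the lookup table, the function names and the sample input are assumptions made for this example, and real postal reference data is far more extensive.

```python
# Abbreviations DR and PKWY are mentioned in the slides; the table itself is
# an assumption and much smaller than real postal reference data.
SUFFIXES = {"DRIVE": "DR", "PARKWAY": "PKWY"}


def standardize_street(street: str) -> str:
    """Uppercase the street name and abbreviate a known trailing suffix."""
    words = street.upper().split()
    if words and words[-1] in SUFFIXES:
        words[-1] = SUFFIXES[words[-1]]
    return " ".join(words)


def standardize_city(city: str) -> str:
    """Uppercase the city name and collapse repeated whitespace."""
    return " ".join(city.upper().split())


street = standardize_street("Example Drive")   # hypothetical sample input
city = standardize_city("  Sample   Town ")    # hypothetical sample input
print(street, "/", city)
```

Adding the postal code and the district requires a lookup in reference data, which this sketch deliberately leaves out.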



Scenario 2: Creating a customer with minimal data entry
Data entered in SAP:
Part of a street name with a spelling mistake
Postal code
Country (required by SAP)



Scenario 2: Creating a customer with minimal data entry

[Screenshot callouts: partial street name with spelling mistake; basic postal code; no region specified]



Scenario 2: Creating a customer with minimal data entry
Address is validated by dfIntelliServer:
City name converted to uppercase
Postal code added (ZIP+4)
Street name converted to uppercase and standardized (PKWY = Parkway)
Spelling mistake corrected
District added automatically
Region added automatically



Scenario 3: Inconsistent or unresolvable addresses

Neither postal code nor city is specified
User insists on saving a record even though the entry could not be validated
To ensure high availability of the SAP system, address data can still be entered and saved if dfConnector and/or dfIntelliServer are temporarily unavailable. Entries are marked as not having been checked against official address reference data. Those addresses can be corrected in the dfConnector Quarterly Address Adjustment process, which checks and updates them in batch mode
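
The availability rule above (save anyway, but mark the entry as unchecked) might look roughly like the following Python sketch. All names here (SavedAddress, ServiceUnavailable, save_address and the two validators) are hypothetical and only illustrate the control flow; the product implements this inside SAP.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


class ServiceUnavailable(Exception):
    """Raised by a validator when the validation service cannot be reached."""


@dataclass
class SavedAddress:
    street: str
    city: str
    checked: bool  # False: not checked against address reference data


Validator = Callable[[str, str], Tuple[str, str]]


def save_address(street: str, city: str, validate: Validator) -> SavedAddress:
    """Save the address in any case; record whether it could be validated."""
    try:
        street, city = validate(street, city)
        return SavedAddress(street, city, checked=True)
    except ServiceUnavailable:
        # Keep the user's input unchanged and leave it for a later batch run.
        return SavedAddress(street, city, checked=False)


def toy_validator(street: str, city: str) -> Tuple[str, str]:
    return street.upper(), city.upper()


def offline_validator(street: str, city: str) -> Tuple[str, str]:
    raise ServiceUnavailable("validation service not reachable")


ok = save_address("Example Drive", "Sample Town", toy_validator)
pending = save_address("Example Drive", "Sample Town", offline_validator)
print(ok.checked, pending.checked)
```

A later batch run can then select every record whose flag is False and validate it once the service is reachable again.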

Scenario 3: Inconsistent or unresolvable addresses

[Screenshot callout: error message "No zip code and/or city specified"]



Scenario 4: Duplicate search

The following scenario shows the duplicate search and elimination capabilities of DataFlux dfConnector for SAP
The scenario first shows how easily a small typo can create a duplicate customer record in the SAP database without dfConnector
In comparison, the same process is then performed using dfConnector for SAP to identify potential duplicates and resolve the situation



Scenario 4: Duplicate search

Using the standard SAP search, the user first checks whether the customer to be created already exists in SAP, but accidentally mistypes the street name (Wesston instead of Weston)



Scenario 4: Duplicate search
The search returns no hits and the user proceeds under the assumption that a unique customer can now be created
The user creates and saves a new customer entry, thus creating a duplicate



Scenario 4: Duplicate search

After that, the duplicate search capabilities of dfConnector are triggered. Based on matchcodes created by dfIntelliServer, potential duplicates are detected
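
To show why a matchcode can catch the typo from this scenario, here is a deliberately simple Python sketch. The matchcode rule used here (uppercase, keep letters only, collapse repeated letters, drop vowels after the first letter) is an assumption invented for this example. It is not the algorithm dfIntelliServer uses, which the slides do not describe.

```python
import itertools


def matchcode(text: str) -> str:
    """Toy matchcode (hypothetical rule): uppercase, keep letters only,
    collapse runs of the same letter, drop vowels after the first letter."""
    letters = [ch for ch in text.upper() if ch.isalpha()]
    collapsed = [key for key, _group in itertools.groupby(letters)]
    if not collapsed:
        return ""
    head, tail = collapsed[0], collapsed[1:]
    return head + "".join(ch for ch in tail if ch not in "AEIOU")


def is_potential_duplicate(first: str, second: str) -> bool:
    """Two values are potential duplicates if their matchcodes are equal."""
    return matchcode(first) == matchcode(second)


print(matchcode("Wesston"), matchcode("Weston"))
print(is_potential_duplicate("Wesston", "Weston"))
```

An exact string comparison treats "Wesston" and "Weston" as different values, whereas both reduce to the same toy matchcode, so the pair is reported as a potential duplicate.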



Scenario 4: Duplicate search
Transaction flow
Address data is entered in SAPGUI. Postal address validation executes
The /DATAFLUX/ADDR_SEARCH implementation of the BAdI ADDRESS_SEARCH is invoked
Function module /DATAFLUX/DUPLICATE_CHECK searches for duplicates
/DATAFLUX/DUPLICATE_CHECK calls dfConnector, which gathers the entered SAP data
Matchcodes are generated dynamically and a JDBC call is made to retrieve results from the external RDBMS. The results of the search are returned to dfConnector, which passes them to SAP to display a list of potential duplicates
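
The last step of this flow, looking up a matchcode in an external search index, can be sketched as follows. The sketch uses Python with an in-memory SQLite database as a stand-in for the JDBC-accessed RDBMS. The table layout, the customer numbers, the sample streets and the toy matchcode rule are all assumptions for illustration and do not reflect the actual index schema.

```python
import itertools
import sqlite3
from typing import List, Tuple


def matchcode(text: str) -> str:
    """Toy matchcode (hypothetical rule, not the DataFlux algorithm)."""
    letters = [ch for ch in text.upper() if ch.isalpha()]
    collapsed = [key for key, _group in itertools.groupby(letters)]
    if not collapsed:
        return ""
    return collapsed[0] + "".join(ch for ch in collapsed[1:] if ch not in "AEIOU")


def build_index(rows: List[Tuple[str, str]]) -> sqlite3.Connection:
    """Create an in-memory search index with one matchcode per record."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE search_index (customer_id TEXT, street TEXT, matchcode TEXT)"
    )
    conn.executemany(
        "INSERT INTO search_index VALUES (?, ?, ?)",
        [(cid, street, matchcode(street)) for cid, street in rows],
    )
    return conn


def find_duplicates(conn: sqlite3.Connection, street: str) -> List[Tuple[str, str]]:
    """Generate the matchcode for the new entry and query the index with it."""
    cursor = conn.execute(
        "SELECT customer_id, street FROM search_index "
        "WHERE matchcode = ? ORDER BY customer_id",
        (matchcode(street),),
    )
    return cursor.fetchall()


index = build_index([("0001", "Weston Parkway"), ("0002", "Harrison Avenue")])
hits = find_duplicates(index, "Wesston Parkway")
print(hits)
```

Because the matchcode is computed when the index is built, the lookup itself is an ordinary equality query that any relational database can answer.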



Scenario 5: Quarterly adjustment process

Quarterly Adjustment is a batch process that ensures address data stays up to date
If new address data are available, e.g. from USPS, they can be activated in the system in three steps by running:
SAP report to get all addresses
DataFlux-provided report to check, standardize and auto-complete addresses
SAP report to write the updated addresses back to the SAP database



Scenario 5: Quarterly adjustment process

RSADRQU1 report scans all addresses for a certain country and inserts them into an index table
/DATAFLUX/RSADRQU2 reads all SAP addresses from the index table and validates each address. Addresses are checked, auto-completed and standardized.
If an address cannot be validated, it is flagged for later reporting purposes. The report indicates the level of address quality, i.e. how many addresses are correct and how many are incorrect
RSADRQU3 writes validated and corrected addresses back to the operational SAP database. Alternatively, it reports the reason for not being able to write them back
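
The three reports form a collect, validate, write-back pipeline, which the following Python sketch imitates on plain dictionaries. The function names, record fields, reference data and sample values are hypothetical. The sketch only mirrors the roles the slides assign to RSADRQU1, /DATAFLUX/RSADRQU2 and RSADRQU3 and says nothing about how those reports are implemented.

```python
from typing import Dict, List

Record = Dict[str, object]


def collect_addresses(database: List[Record], country: str) -> List[Record]:
    """Step 1 (role of RSADRQU1): copy all addresses of one country into an index."""
    return [dict(rec) for rec in database if rec["country"] == country]


def validate_index(index: List[Record], reference: Dict[str, str]) -> List[Record]:
    """Step 2 (role of /DATAFLUX/RSADRQU2): check and auto-complete each entry,
    flagging entries that cannot be validated together with a reason."""
    for entry in index:
        city = reference.get(str(entry["postal_code"]))
        if city is None:
            entry["status"], entry["reason"] = "-", "postal code not in reference data"
        else:
            entry["city"], entry["status"], entry["reason"] = city, "+", ""
    return index


def write_back(database: List[Record], index: List[Record]) -> int:
    """Step 3 (role of RSADRQU3): write validated entries back; return their count."""
    by_id = {rec["id"]: rec for rec in database}
    written = 0
    for entry in index:
        if entry["status"] == "+":
            by_id[entry["id"]]["city"] = entry["city"]
            written += 1
    return written


reference = {"12345": "SAMPLETOWN"}  # made-up reference data
database = [
    {"id": 1, "country": "US", "postal_code": "12345", "city": ""},
    {"id": 2, "country": "US", "postal_code": "99999", "city": "Nowhere"},
    {"id": 3, "country": "DE", "postal_code": "12345", "city": ""},
]
index = validate_index(collect_addresses(database, "US"), reference)
written = write_back(database, index)
statuses = [entry["status"] for entry in index]
print(written, statuses)
```

Working on a copy in the index table means the operational data is only touched in the final step, and only for entries that passed validation.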



Scenario 5: Quarterly adjustment process

[Screenshot callouts: checked addresses (+ = ok, - = failed); summary]



Technology

Java 1.4.x/1.5 to interface SAP with the DataFlux dfIntelliServer 6 using SAP Java Connector 2.1.3
ABAP programming to hook into the predefined interfaces (SAP Business Add-Ins) for address validation and deduplication
SAP Add-on Assembly Kit (AAK) to allow for SAP certification (e.g. namespaces, installation, deployment, upgrade, etc.)
Search index creation in SAS data sets or in any external JDBC-compliant RDBMS



Technology: dfConnector Framework Functions
/DATAFLUX/AREA_CODE
/DATAFLUX/DETERMINE_GENDER
/DATAFLUX/DETERMINE_LOCALE
/DATAFLUX/DETERMINE_ENTITY
/DATAFLUX/DIRECTORY_SEARCH
/DATAFLUX/DUPLICATE_CHECK
/DATAFLUX/GENERATE_MATCHCODE
/DATAFLUX/GEN_MATCHCODE_PARSED
/DATAFLUX/GEOCODE
/DATAFLUX/LOOKUP_COUNTY
/DATAFLUX/LOOKUP_PHONE
/DATAFLUX/PARSE
/DATAFLUX/QUERY_SERVER
/DATAFLUX/STANDARDIZE
/DATAFLUX/STANDARDIZE_PARSED
/DATAFLUX/STANDARDIZE_SCHEME
/DATAFLUX/DELETE_INDEX_ENTRY
/DATAFLUX/VERIFY_ADDRESS
/DATAFLUX/MAINTAIN_INDEX_ENTRY

Technology: /DATAFLUX/VERIFY_ADDRESS

[Screenshots: input data and results of a /DATAFLUX/VERIFY_ADDRESS call]


Technology: External Search Index

The external search index can be stored in any RDBMS that supports the JDBC interface
Examples:
SAS data sets
MySQL
Microsoft SQL Server
MaxDB (formerly known as SAP DB)
Oracle
...



Technology: External Search Index

[Screenshot: example of a search index stored in SAS]



Technology: RFC server platforms

SAP-supported Java Connector (JCo) platforms (used by the RFC server component of dfConnector):
Windows NT SP4 or later, Windows 2000, XP, Windows 2003 Server
Sun Solaris/SPARC 8 or later
IBM AIX 4.3 or later
HP-UX 11.0 or later (PA-RISC processors only)
OS/400 V5R1 or later (not for SAP JCo 2.0.5)
Compaq Tru64 5.0 or later (not for SAP JCo 2.1.x)
Z/Linux on S/390 (Linux / Z-series GLIBC 2.2.4 or later)
Linux Kernel 2.2.14 or later (Intel-compatible processors)



Additional Information

SUGI Birds-of-a-Feather (BoF) session: Enhancing SAP with SAS, room 107, Tuesday at 6 p.m.
www.dataflux.com
