SCEIS Data Cleansing General Guidelines

189867663.doc

Objective
The purpose of this document is to outline the course of action to cleanse data in the legacy systems or in the corresponding staging area before it is loaded into SAP. It defines general guidelines, which may be customized for each conversion object when detailed cleansing instructions are rolled out. This is a living document that will be updated as Blue Print and Data Conversion decisions are made in the following weeks.

Versions
The following table documents the revision history of this document:

VERSION | VERSION DATE | DESCRIPTION | UPDATED BY
1.0 | 2/?/200? | Final version reviewed and approved by R. Wicker |
1.1 | 2/13/200? | Editorial review | BFord

Data Cleansing
Data Cleansing is the process of reviewing and maintaining legacy application data so that it can be converted into the SCEIS SAP solution without intervention at final conversion time. Data cleansing is one of the most important processes for data conversion. Cleansing of the data must occur prior to loading it into the Production SAP environment. Loading poor quality data into SAP could result in incorrect business decisions and may be more difficult to correct later.

As part of the SCEIS Deployment Strategy, legacy data must be cleansed before loading it into the SAP solution. State Agencies will cleanse their own data per scope indicated in the Data Cleansing Scope charts below. Resources will be needed from the Agencies who are currently using the legacy data. The Deployment team will coordinate this process.

Data Cleansing Guiding Principles and Assumptions

- Legacy data must undergo data cleansing to improve quality, minimize data integrity issues, and reduce data volume and extract-program run time.
- State Agencies will be responsible for cleansing master and transactional data to be converted to SAP.
- If necessary, Agencies will be required to supply additional resources to complete high volume, low complexity manual cleansing activities.
- Agencies will ensure that extracted data is validated before and after it is loaded to SAP.
- An Agency data owner will be assigned for each conversion and will be responsible for the cleanliness of the source data to be converted.
- It is the responsibility of the Agency data owners to communicate with one another to identify dependencies between cleansing efforts.
- SCEIS Functional Teams will provide the SAP data requirements and the corresponding support to help Agencies understand SAP data fields and map legacy systems data to SAP.
- Work plan and metrics will be used by the Deployment SCEIS team to track progress over the course of the implementation.

Data in scope to be cleansed by State Agencies


ONLY the following data objects need to be cleansed by Agency resources. The rest of the Master and Transactional data objects will either be loaded in SAP by the SCEIS functional teams (such as Chart of Accounts or Material Master), derived from other data objects (such as Commitment Items and Fund Centers) or entered manually in SAP as part of final Cutover (such as open Purchase Orders, current year Budget).

Master Data Cleansing objects in Scope for State Agencies

BUSINESS PROCESS/SAP MODULE | CONVERSION OBJECT | SOURCE SYSTEM/INPUT FILE | DATA TO BE CLEANSED | RESPONSIBLE
Assets Management | Fixed Assets Master & Balances. Also include Capital and Operational Leases | GAFRS, BARS, Manual/Excel Spreadsheet | All active assets | Agency Finance Department
Accounts Receivable | Customer Master | Manual/Excel Spreadsheet | Active agency Customer list | Agency Finance Department
Cash Management | Bank/Bank Accounts | Manual/Excel Spreadsheet | Bank files/Current Bank Accounts | STO Only
Cost Control/Controlling | Cost Centers | Manual/Excel Spreadsheet | New SAP Cost Centers based on agency org structure | Agency Finance Department
Cost Control/Controlling | Internal Orders | Manual/Excel Spreadsheet/STARS | New SAP Internal Orders based on SPIRS non-capital and capital projects | Agency Finance Department
Grants Management | Sponsor | Manual/Excel Spreadsheet, CFDA Website | Agency active Sponsor lists combined with CFDA information | Agency Finance Department
Grants Management | Sponsored Programs | Manual/Excel Spreadsheet | New SAP Sponsored Programs | Agency Finance Department
Grants Management | Open Grant | Manual/Excel Spreadsheet | Active agency Grants list | Agency Finance Department
Purchasing & SRM/MM/FI | Vendor Master | STARS/Extract Program | Active Vendors in the last 24 months | Agency Finance Department

SCEIS Transactional Data Cleansing objects in Scope for Agencies

BUSINESS PROCESS | CONVERSION OBJECT | SOURCE SYSTEM/INPUT FILE | DATA TO BE CLEANSED | RESPONSIBLE
General Ledger | GL Balances | STARS/Extract Programs or Excel Spreadsheet | Ending balances of last fiscal period before go-live date | Agency Finance Department
Accounts Payable | Vendor Open Items | Manual/Excel Spreadsheet | Outstanding vendor invoices | Agency Finance Department
Accounts Receivable | AR Open Items | Manual/Excel Spreadsheet | Outstanding customer invoices | Agency Finance Department
Procurement | Open Contracts | APS/Extract Program or Excel Spreadsheet | Contract Balances by go-live date | Agency Procurement Department

General Cleansing Guidelines


Data that can be cleansed in the legacy system without knowing SAP requirements

ISSUE: Duplicates
EXPLANATION: The same data entity (fixed asset, vendor, customer, etc.) is named two or more times in the same system.
RESOLUTION: Data cleansing is required. Flag one or more of the data elements so that it is not included in the "to be" extract file.

ISSUE: Obsoletes or inactive records
EXPLANATION: Data that is not up to date or no longer active. Obsolete data should remain in the legacy system since it is not needed in SAP. Example: vendors no longer purchased from.
RESOLUTION: Data cleansing is required. The rules to declare a record obsolete are as follows:
- Vendors: no activity in the last two years
- Fixed Assets: Retired or scrapped Assets after X years
- Customers: TBD
- Bank Accounts: TBD
- Projects: TBD
- Grants: TBD
Cleansing involves using a field in the legacy system to identify the obsolete records and using it to filter them out when extracting data.

ISSUE: Incorrect Data
EXPLANATION: Inconsistencies that are related to typing or data entry errors. Typical problems include spelling errors (e.g., Bank of America vs. Banc of America) and reference inconsistencies (e.g., 2nd Street vs. Second Street, or Inc vs. Corporation).
RESOLUTION: Data cleansing is required. Review file and correct manually. If the error is present in multiple records, there may be a way to correct this automatically. Consult with Agency Technical support.

ISSUE: Incomplete Records
EXPLANATION: Missing data in current legacy system.
RESOLUTION: Data cleansing is required. Correct incomplete records since some of this data may be required by SAP.

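The duplicate and obsolete rules in the chart above can often be applied programmatically before manual review. The following is a minimal sketch only, assuming a vendor extract with hypothetical fields id, name and last_activity and illustrative dates; it applies the "no activity in the last two years" rule as 24 months and flags records instead of deleting them, so they can be left out of the "to be" extract file. Spelling variants such as "Banc of America" are not caught by this simple name comparison and still need manual review.

```python
from datetime import date


def normalize_name(name):
    """Collapse case, punctuation and spacing so near-identical names compare equal."""
    kept = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def flag_vendors(vendors, as_of, inactive_months=24):
    """Return a copy of each record with an 'exclude' reason, or None to keep it."""
    seen = set()
    flagged = []
    for vendor in vendors:
        key = normalize_name(vendor["name"])
        last = vendor["last_activity"]
        months_idle = (as_of.year - last.year) * 12 + (as_of.month - last.month)
        if key in seen:
            reason = "duplicate"      # same name already seen earlier in the file
        elif months_idle > inactive_months:
            reason = "obsolete"       # no activity within the allowed window
        else:
            reason = None
        seen.add(key)
        flagged.append({**vendor, "exclude": reason})
    return flagged


# Hypothetical sample data, for illustration only.
vendors = [
    {"id": "V001", "name": "Bank of America", "last_activity": date(2024, 11, 5)},
    {"id": "V002", "name": "BANK OF AMERICA ", "last_activity": date(2024, 12, 1)},
    {"id": "V003", "name": "Acme Supply, Inc.", "last_activity": date(2021, 3, 15)},
]
result = flag_vendors(vendors, as_of=date(2025, 1, 31))
for record in result:
    print(record["id"], record["exclude"])
```

Flagging keeps the decision reversible: the Agency data owner can review the flagged records before they are excluded from the extract.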
Cleansing Process
- Run the corresponding Legacy System report and download it to an Excel spreadsheet.
- Depending on the size and/or complexity of the data file, determine, either programmatically or manually, duplicates, obsoletes, incorrect or incomplete records.
- Correct records per suggested solutions in the previous chart. If necessary, consult with your Agency Technical support and/or corresponding SCEIS Team member.
- Report status to Deployment team per project plan and metrics sheet.
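Where the data file is large, the "determine programmatically" and "report status" steps of this process might be supported by a small script. The sketch below is an assumption-laden illustration: the required field names (name, street, city) and the sample records are hypothetical, and the summary layout is not the project's metrics sheet. It lists incomplete records and counts the gaps per field so that progress can be reported.

```python
from collections import Counter

# Hypothetical list of fields treated as required; the real list comes from SAP requirements.
REQUIRED_FIELDS = ("name", "street", "city")


def find_incomplete(records, required=REQUIRED_FIELDS):
    """Map record id -> list of required fields that are blank or missing."""
    problems = {}
    for record in records:
        missing = [f for f in required if not str(record.get(f) or "").strip()]
        if missing:
            problems[record["id"]] = missing
    return problems


def status_summary(records, problems):
    """Summarize counts that could be copied into a status report."""
    per_field = Counter(f for fields in problems.values() for f in fields)
    return {
        "total": len(records),
        "incomplete": len(problems),
        "missing_by_field": dict(per_field),
    }


# Hypothetical sample data, for illustration only.
records = [
    {"id": "C1", "name": "Smith Hardware", "street": "2nd Street", "city": "Columbia"},
    {"id": "C2", "name": "Jones Supply", "street": "", "city": "Columbia"},
    {"id": "C3", "name": "  ", "street": None, "city": "Greenville"},
]
problems = find_incomplete(records)
summary = status_summary(records, problems)
print(problems)
print(summary)
```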

Data that should be cleansed based on SAP requirements

o Detailed Data Mapping and understanding of SAP data fields will be required
o Agencies will be given the corresponding support from the SCEIS team to understand SAP requirements and complete mapping
o The following guidelines may be revised and customized for each conversion object

ISSUE: Missing required values or intermittent data
EXPLANATION: The current system does not require a certain field, so it has been left blank, or a given field should be filled per up to date procedure but it is skipped when information is not known at the time of data entry. This field is required in SAP per defined business process.
RESOLUTION: Cleansing required. It might be possible to automatically populate the field (a) by plugging in a constant value, or (b) by referencing some other file to "look up" the information. If not, manual data cleansing will be needed. Consult with Agency technical support for assistance.

ISSUE: Overloaded data fields
EXPLANATION: Two organizations use the same field to store different elements of information.
RESOLUTION: Cleansing required in one database or the other, or both, based on what the field will be used for in SAP.

ISSUE: Compound data fields
EXPLANATION: The current system does not provide a separate field for some desired piece of information. That piece of information is being stored along with another one in its designated field. Example: the current system includes a field named "Contact" which would typically contain the name of the appropriate contact individual. Because the system does not include a separate field for the contact's telephone number, both the name and phone number are being stored in the "Contact" field.
RESOLUTION: It may not be possible to reliably separate the two values. Manual cleansing may be required.

ISSUE: Inconsistent similar data
EXPLANATION: Similar data entered into separate or independent systems. Example: consider two departments defining projects in their systems. The same type of data (project related) is entered into different systems, but since it is not validated against each other or a central system, the data format is different.
RESOLUTION: Cleansing required in one database or the other, or both, based on what the field will be used for in SAP.

ISSUE: Free form text fields
EXPLANATION: Free form text fields may have data that varies in meaning based on the user who entered the data into the system.
RESOLUTION: Data cleansing may be required based on SAP requirements.

ISSUE: Different data values to represent the same thing
EXPLANATION: Inconsistencies due to different data structures used in different source systems. Typical problems include using different data values to represent the same thing (e.g., System A uses 1 for "yes", System B uses Y for "yes" and System C uses a flag for "yes").
RESOLUTION: Cleansing required in one database or the other, or all, based on what the field will be used for in SAP.

ISSUE: Intelligent data fields
EXPLANATION: Various positions of the data field imply additional information. SAP typically provides a separate field for the implied additional information. Example: consider a system which includes a fixed-length field named "Invoice Number". A value of "G" in the first position indicates a sale to the US Government; a value of "D" in the first position indicates a sale to a non-government US customer. The remaining characters in the field contain a unique serial number. Thus, it is possible to determine some additional information from the invoice number: the customer type. Is the customer type US Government or domestic?
RESOLUTION: If there is a regular pattern to the coding, the separation can probably be done programmatically. If not, manual conversion may be required. SCEIS functional team will determine the solution.

ISSUE: Encoded data fields
EXPLANATION: The data field in the current system contains a code to represent a full value. SAP requires the full value, or SAP uses a different code to represent the same full value. Example: consider a system which includes a 1-character field named "Name Prefix", where a code of "1" indicates "Mr.", a code of "2" indicates "Miss", and a code of "3" indicates "Mrs.". SAP wants the full value (that is, "Mr.", "Mrs.", or "Miss"), not the code.
RESOLUTION: The full value can be programmatically generated from a look-up table. SCEIS Functional Team will propose solution.

ISSUE: Formatting
EXPLANATION: A data field in the current system contains a value not allowed by the corresponding SAP field. Example: consider a field where the current system allows alpha-numeric values, but the SAP field is only numeric.
RESOLUTION: Manual data cleansing will be required.

ISSUE: Field lengths
EXPLANATION: The length of the data field in the current system is longer than the corresponding field in SAP. Example: consider a current system with a description field of length 30. Suppose SAP provides a description field of length 24.
RESOLUTION: Should the field be unilaterally truncated? Or should each description be evaluated by a human and abbreviated to retain maximum readability? Per proposed solution, manual data cleansing may be required.

ISSUE: Data requiring translation tables
EXPLANATION: A valid field entry in legacy is not valid in SAP.
RESOLUTION: Establish the need for a translation table in the data cleansing procedures and describe its fields and valid entries.

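Several of the resolutions in the chart above (encoded fields, intelligent fields, differing "yes" values, field lengths) lend themselves to small programmatic checks. The sketch below is illustrative only. The "Name Prefix" codes, the "G"/"D" invoice prefixes and the length of 24 come from the examples in this document; the function names, the assumption that the serial number is all digits, and the per-system "yes" table are hypothetical. The actual solution is determined by the SCEIS functional team.

```python
# Look-up table taken from the "Name Prefix" example in this document.
NAME_PREFIX = {"1": "Mr.", "2": "Miss", "3": "Mrs."}

# First-position codes from the "Invoice Number" example in this document.
CUSTOMER_TYPE = {"G": "US Government", "D": "Domestic"}

# Hypothetical table of how each source system writes "yes".
YES_BY_SYSTEM = {"A": "1", "B": "Y"}


def decode_prefix(code):
    """Return the full prefix for a code, or None if the code needs manual review."""
    return NAME_PREFIX.get(code.strip())


def split_invoice_number(invoice_number):
    """Split an intelligent invoice number into (customer type, serial number).

    Returns None when the value does not follow the expected pattern,
    so the record can be routed to manual conversion.
    """
    value = invoice_number.strip()
    customer_type = CUSTOMER_TYPE.get(value[:1].upper())
    serial = value[1:]
    if customer_type is None or not serial.isdigit():  # assumes a numeric serial
        return None
    return customer_type, serial


def is_yes(system, value):
    """Interpret a source system's own 'yes' marker as a common True/False."""
    return value.strip().upper() == YES_BY_SYSTEM[system]


def too_long(description, max_length=24):
    """True when a legacy description would not fit the target field length."""
    return len(description) > max_length


print(decode_prefix("2"))
print(split_invoice_number("G000123"))
print(is_yes("B", "y"))
print(too_long("Replacement parts for building HVAC"))
```

Returning None for unrecognized codes, and only reporting over-length descriptions without truncating them, leaves the truncate-or-abbreviate decision to a person, as the chart suggests.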
Cleansing Process
- Attend meeting to gain understanding of SAP field requirements.
- Team up with SCEIS functional team member to develop the legacy system vs. SAP fields mapping. An Excel spreadsheet tool will be used to create the "to be" file.
- Run the corresponding Legacy System report and download data to an Excel spreadsheet per the previously defined data file.
- Depending on the size and/or complexity of the data file, determine, either programmatically or manually, the data to be cleansed as per the guidelines indicated earlier in this document.
- Correct records per suggested solutions in the previous chart. If necessary, consult with your Agency Technical support and/or corresponding SCEIS Team member.
- Report status to Deployment team per project plan and metrics sheet.
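Once the legacy vs. SAP field mapping has been agreed, producing the "to be" file from a legacy extract can be sketched as below. This is a rough illustration under stated assumptions: the legacy column names (VEND_NO, VEND_NAME, ZIP, FAX), the target column names and the sample row are all hypothetical and are not SAP field names, and CSV is used here in place of the Excel spreadsheet tool described in the process.

```python
import csv
import io

# Hypothetical mapping: legacy column -> column name in the "to be" file.
FIELD_MAP = {
    "VEND_NO": "vendor_number",
    "VEND_NAME": "name",
    "ZIP": "postal_code",
}


def build_to_be_rows(legacy_rows, field_map=FIELD_MAP):
    """Keep only mapped columns, rename them, and trim stray whitespace."""
    to_be = []
    for row in legacy_rows:
        to_be.append(
            {target: (row.get(source) or "").strip() for source, target in field_map.items()}
        )
    return to_be


def write_to_be_file(rows, handle, field_map=FIELD_MAP):
    """Write the mapped rows as CSV with the target column names as header."""
    writer = csv.DictWriter(handle, fieldnames=list(field_map.values()))
    writer.writeheader()
    writer.writerows(rows)


# Hypothetical legacy extract, held in memory for illustration only.
legacy_export = io.StringIO(
    "VEND_NO,VEND_NAME,ZIP,FAX\n"
    "V001, Acme Supply ,29201,555-0100\n"
)
rows = build_to_be_rows(csv.DictReader(legacy_export))
out = io.StringIO()
write_to_be_file(rows, out)
print(out.getvalue())
```

Columns that are not in the mapping (FAX in this sample) are simply left out of the "to be" file, which mirrors the idea that unmapped legacy data stays in the legacy system.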
