
Data Audit Findings

Data Quality Assessment Results for <CUSTOMER>


Data Object: <OBJECT>
INTERNAL
TABLE OF CONTENTS
EXECUTIVE SUMMARY 3
Key Findings..................................................................................................................................................... 3
SCOPE 4
QUALITY CRITERIA – GENERAL DESCRIPTION 6
Fill Rates........................................................................................................................................................... 6
RESULTS – FILL RATE 8
<Object Name / Description, repeat section as necessary for each object>.....................................................8
RESULTS – PLACEHOLDERS 9
<Object Name / Description, repeat section as necessary for each object>.....................................................9
RESULTS – PATTERNS 10
<Object Name / Description, repeat section as necessary for each object>...................................................10
<Object Name / Description – Field Name / Description, repeat section as necessary for each field>...........10
RESULTS – DUPLICATES 11
<Object Name / Description, repeat section as necessary for each object>...................................................11
FINDINGS SUMMARY 12
NEXT STEPS 13
EXECUTIVE SUMMARY
Provide an executive summary based upon the analysis and results from this data quality assessment.
This paper documents the results of the SAP Service:
Object Level Assessment for Advanced Data Quality (Data Audit)

This service was carried out on <##> distinct <Customer> <application name> systems:
- <System #1> (<System #1 location or description>)
- <System #2> (<System #2 location or description>)
for the data object <Object>.

From each system, <Customer> provided SAP with a representative sample of approximately <##>% of the
overall <Object> data. SAP then performed an analysis of this data with regard to:
- completeness (measured by fill rates of fields)
- use of placeholders (invalid or inaccurate values)
- correct formats (measured by validity of patterns)
- duplication (multiple records representing the same business entity)

Key Findings
The overall results show that there are some significant issues with regard to <to be completed …>

<Summarize where there was high data quality.>

<Summarize the use of data within the systems – fill rates, columns used, keys or patterns used, etc.>

<Summarize duplication within each source system and across all data sets combined.>

<Summarize additional information discovered about the data.>

<Summarize the potential impact of the identified data quality issues on data based decisions or proposed
changes to business processes.>

Analysis Disclaimer
If the full scope of the project was not completed, provide those caveats here. Else remove this section.
Due to limited <Customer> resource availability and abbreviated access to <Customer> data, this data
quality assessment was halted after about eight hours of analysis. The information provided in this executive
summary is based upon the completed analysis. However, not all information was gathered. To demonstrate
the types of analysis available, some fictional reports are provided within this document and are noted as
such.

SCOPE
Outline the systems and data reviewed during this assessment.
This service was carried out on <##> distinct <Customer> <application name> systems:
- <System #1> (<System #1 location or description>)
- <System #2> (<System #2 location or description>)
for the data object <Object>.

Following our initial analysis, it was agreed up-front between <Customer> and SAP that <##> relevant tables
from the source system(s) would be analyzed. The tables included in the analysis were:

Table Name   System #1                     System #2
             # Data Records   # Fields    # Data Records   # Fields
XXX          ###,###          ##          ###,###          ##
XXX          ###,###          ##          ###,###          ##
XXX          ###,###          ##          ###,###          ##
XXX          ###,###          ##          ###,###          ##
XXX          ###,###          ##          ###,###          ##

The data provided represents approximately <##>% of the total dataset from each source system and is
considered representative of the overall data quality. <Table> represents the main table and relates the key
<Object> field to the standard company name, address, and contact fields (among others). The remaining
tables contain additional information about the <object> and also include lookup and check tables.

Each source system was analyzed separately using the SAP BusinessObjects toolset, and the overall data
quality was compared with respect to the following criteria:
- Field fill rates
- Placeholders
- Patterns
- Duplication
<Note that whereas it makes sense to analyze the fill rates on all the tables concerned, it only makes sense
to analyze the use of placeholders and patterns on the tables that comprise the main attributes of the
<Object> object. These tables are touched by end users while creating and maintaining data.>

The field fill rate analysis shows which of the fields in the respective tables are actually used and to what
extent. This can provide useful information with regard to the underlying business processes and
demonstrates quite clearly which tables and fields are actually required when making data-based decisions,
for example when determining the source of truth for a data object. Note that a fill rate analysis only takes into
consideration whether a field actually contains a value – it does not provide an indication of whether the
value is reasonable or correct.
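The fill rate computation described above can be sketched in a few lines. The following is an illustrative Python sketch only; the record layout and field names (NAME1, PSTLZ, PFACH) are hypothetical examples in SAP style, not values from an actual analysis:

```python
def fill_rates(records):
    """Return the fill rate (%) per field for a non-empty list of dict records.

    A field counts as filled when its value is neither None nor an
    empty/whitespace-only string. Fields that are never filled do not
    appear in the result at all (their fill rate is effectively 0%).
    """
    counts = {}
    total = len(records)
    for rec in records:
        for field, value in rec.items():
            if value is not None and str(value).strip() != "":
                counts[field] = counts.get(field, 0) + 1
    return {field: 100.0 * c / total for field, c in counts.items()}

# Tiny illustrative sample: NAME1 is always filled, PSTLZ half the time,
# PFACH never.
sample = [
    {"NAME1": "ACME Corp", "PSTLZ": "69160", "PFACH": ""},
    {"NAME1": "Widgets Ltd", "PSTLZ": "", "PFACH": ""},
]
rates = fill_rates(sample)
```

In practice the same measurement is produced by the profiling toolset; the sketch only makes the definition of "filled" explicit.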

The analysis of the use of placeholders serves primarily to find out whether values entered into fields are
indeed valid, or merely entered to complete a transaction. For example, a data record must be entered
but not all business-required data is available. Instead of a valid value, a placeholder is specified at the time
of data creation. The person entering the new record may intend to correct the value later, but often this is
forgotten. Typical examples that are often found here include post codes, where a combination such as
“11111” is commonly found, or telephone numbers, where a dummy number or simply “999” is entered.
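A simple placeholder check along the lines just described can be sketched as follows. This is an illustrative sketch only; the placeholder list and the heuristics are examples, not the rule set used by the toolset:

```python
# Hypothetical set of literal placeholder values; real analyses extend
# this list as new placeholders are discovered in the data.
PLACEHOLDERS = {"*", "-", ".", "X", "N/A", "TBD"}

def looks_like_placeholder(value):
    """Heuristically flag values that look like placeholders rather than data."""
    v = str(value).strip()
    if not v or v.upper() in PLACEHOLDERS:
        return True
    # Repeating single character, e.g. "11111", "zzz", "999".
    if len(v) > 1 and len(set(v.lower())) == 1:
        return True
    # Simple ascending sequences such as "123" or "abc".
    if len(v) > 2 and all(ord(b) - ord(a) == 1 for a, b in zip(v, v[1:])):
        return True
    return False
```

A value such as "11111" or "999" is flagged, while a plausible post code such as "69160" is not; the heuristic deliberately errs on the side of flagging candidates for manual review.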

The pattern analysis is slightly more complex and consists of establishing whether the values entered
match pre-defined formats, standards, or patterns. An example of this is the format of post codes
depending on the country where a business partner is located. In Germany, for example, the only valid
format is a post code with 5 numeric digits (e.g. “69160”), whereas in the UK there are 6 possible formats:
A9 9AA
A99 9AA
A9A 9AA
AA9 9AA
AA99 9AA
AA9A 9AA
(Note: ‘A’ signifies an alphabetic letter and ‘9’ signifies a numeric digit. Variations may also include the use
of a dash (-) or a single space.) Other countries may have similar or different valid formats.
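The two format families above can be expressed as regular expressions. This is an illustrative sketch only: the UK expression below is slightly broader than the six listed formats, and it ignores the dash/space variations mentioned above; real validation would need country-specific exception handling.

```python
import re

# Post code format checks for the two countries discussed above.
# 'DE': exactly five digits. 'GB': 1-2 letters, 1-2 digits, an optional
# letter, a space, then one digit and two letters.
FORMATS = {
    "DE": re.compile(r"\d{5}"),
    "GB": re.compile(r"[A-Z]{1,2}\d{1,2}[A-Z]? \d[A-Z]{2}"),
}

def postcode_valid(country, code):
    """Return True if the code matches the known format for the country."""
    pattern = FORMATS.get(country)
    return bool(pattern and pattern.fullmatch(code.strip()))
```

For example, "69160" is valid for Germany and "SW1A 1AA" matches the UK AA9A 9AA pattern, whereas a four-digit German code is rejected.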

QUALITY CRITERIA – GENERAL DESCRIPTION

This section describes general rules used to assess the data. If different criteria were used, adjust this section.

Fill Rates
Experience has shown that fields can be usefully divided into various fill rate ranges and then analyzed
further.

Fill Rate = 100%:


These fields generally represent the backbone of the data object model. As a rule, these are mandatory
fields where an entry is either ensured by the system itself and/or by governance processes. Experience
shows that mandatory fields as imposed by the system itself can be significantly subject to the use of
placeholders. For these fields it is thus generally advisable to use a combination of system-related and
governance processes to ensure that correct or default values are entered, e.g. by introducing
approval steps.

Caution: Even if a field listed in the report may be 100% complete, it is possible that this is merely a system
field and not necessarily relevant for a particular business process.

Fill Rate between 90% - 100%:


As a rule, these fields are significant for the underlying business processes and a great deal of effort has
gone into maintaining them. Missing entries can thus lead to disruptions and/or delays in the business
processes, potentially even to a process not completing. It is thus correspondingly important to introduce
procedures to ensure that the required information is 100% complete.

Fill Rate between 35% - 90%:


Fields with fill rates in this range tend to belong to those fields that are relevant for business processes.
There may be very valid reasons why they are not filled to 100% (e.g. name of a second contact person in an
organization may well be just a “nice to have”, but not essential). In other cases, there may be a need for the
data to be filled to 100%, but for organizational or processing reasons the fields are not used as
intended. An example is one in-house worker entering the street and house number together in a single
field, whereas another worker enters the same information in two separate fields. These fields thus need
to be scrutinized to investigate how
they are currently used and how they should be used in the future CRUD (create, read, update, delete)
processes.

Common causes for incomplete data in these fields thus include information that is generally difficult to
obtain, and the entry of equivalent data values in different fields. The latter in particular can lead to disruption
in the business processes.

Fill Rate between 1% - 35%:


As a rule, these fields are not required at all for the successful running of the business processes.
Nevertheless, they may in some cases contain important information. Typical fields that fall into this
category are for example those which have secondary information content. Examples of these could be
PFACH (post office box) or PSTL2 (post code of the post office box). In the analysis it should be taken into
consideration that non-essential fields can represent ballast and lead to an unwieldy system. It is also
possible that important information relevant to these fields is present, but distributed over several fields.

Fill Rate = 0%:


These fields are evidently not used by any business process at all. For this reason, these fields
can safely be ignored in most future data-based decisions. If such a field is referenced as a source in an
ETL process or data migration process, it may be necessary to derive or construct data in place of the
missing values in this field.
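The fill rate ranges described above can be summarized in a small classification helper. This is an illustrative sketch only; the band labels are shorthand, and the handling of boundary values (exactly 90% or 35%) is a simplification of the ranges as stated:

```python
def fill_rate_band(rate):
    """Map a fill rate percentage to the bands used in this assessment."""
    if rate == 100.0:
        return "backbone (100%)"          # usually mandatory / system-ensured
    if rate >= 90.0:
        return "business-critical (90-100%)"  # missing entries disrupt processes
    if rate >= 35.0:
        return "needs scrutiny (35-90%)"  # usage should be investigated
    if rate > 0.0:
        return "secondary (1-35%)"        # often non-essential information
    return "unused (0%)"                  # can generally be ignored
```

Applied to the output of a fill rate analysis, this gives each field a first rough triage category before the detailed review.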
RESULTS – FILL RATE
The following sections provide recommendations for different types of data analysis. Additional charts, graphs, statistics,
etc. are also recommended in these sections. The final results and analysis should be updated based upon the data
audit.

A proportion of these fields represent the structural framework of the data model on which the business
processes run. It should be noted, however, that some fields may be simply system fields that are required
by the application backend, and they have no actual relevance to the business processes themselves.

<Object Name / Description, repeat section as necessary for each object>


<Key findings from the analysis of the given object should be described in detail in this section. Some
findings that may be described could include …>
<Summarize the table contents and importance of high data quality>
<Provide tables or graphs. Side-by-side comparison of multiple source systems will highlight differences in
data usage between multiple sources.>
<Possible stat: # fields with 100% fill rate.>
<Possible stat: # fields with high fill rate. These columns are business essential.>
<Possible stat: # fields with <100% fill rate, but appear to be key. List fields.>
<Possible stat: # fields with 0% fill rate. These columns are not used at all.>
<Is there a difference in field usage or fill rate between the different sources? This will highlight different
application configuration or different business processes.>
<Tables or graphs listing the fill rate by column within this table>
<Highlight fields within this table that correspond to the criteria or statistics noted above.>
RESULTS – PLACEHOLDERS
Placeholders are generally used in mandatory fields where an entry must be made in order to create a
record, but the correct value is not known at the time of creation. Other important fields may also be
affected; however, the person entering data typically does not enter a default or placeholder value unless the
field requires mandatory entry.

<Object Name / Description, repeat section as necessary for each object>


<Key findings from the analysis of the given object should be described in detail in this section. Some
findings that may be described could include …>
<Summarize the table contents and importance of high data quality>
<Provide tables or graphs. Side-by-side comparison of multiple source systems will highlight differences in
placeholder data between multiple sources.>
<Concentrate on fields with 100% fill rate. Mandatory fields often contain placeholders.>
<Look for values: asterisk (*), dash (-), period (.), ‘X’, blank spaces, etc.>
<Look for repeating characters: ‘aaa’, ‘zzz’, ‘000’, ‘111’, ‘999’, etc.>
<Look for sequences: ‘abc’, ‘xyz’, ‘asdf’, ‘123’, etc.>
<As placeholders are found in columns, test the same placeholders in other columns.>
<Possible stat: # fields containing some placeholder data.>
<Chart/graph: List columns and % placeholder data by field.>
<Is there a difference in placeholder data between the different sources? This will highlight different
governance practices or different business processes.>
RESULTS – PATTERNS
By analyzing data standards and patterns, it is possible to identify data that does not meet data quality
standards or to identify multiple standards used during the data entry process. It can be readily determined
whether the data is valid based upon the format of the entries.
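One common way to perform such an analysis is to reduce each value to a pattern signature, reusing the 'A'/'9' notation from the scope section, and then count how often each signature occurs. This is an illustrative sketch, not the toolset's actual profiling routine:

```python
from collections import Counter

def signature(value):
    """Reduce a value to a pattern: 'A' for letters, '9' for digits,
    other characters (spaces, dashes, ...) kept as-is."""
    out = []
    for ch in str(value):
        if ch.isalpha():
            out.append("A")
        elif ch.isdigit():
            out.append("9")
        else:
            out.append(ch)
    return "".join(out)

def pattern_profile(values):
    """Count the occurrences of each pattern signature in a field."""
    return Counter(signature(v) for v in values)

profile = pattern_profile(["69160", "W1A 1AA", "EC1 2AB", "ABCDE"])
# e.g. '69160' -> '99999', 'W1A 1AA' -> 'A9A 9AA'
```

A field with one dominant signature and a long tail of rare ones is a typical indicator of inconsistent entry standards worth investigating.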

<Object Name / Description, repeat section as necessary for each object>


<Key findings from the analysis of the given object should be described in detail in this section. Some
findings that may be described could include …>
<Summarize the table contents and importance of high data quality>
<Provide tables or graphs. Side-by-side comparison of multiple source systems will highlight differences in
pattern usage between multiple sources.>
<Concentrate on fields that are masked at the application level or typically conform to a special format or
standard.>
<Possible stat: by column, # patterns within each column.>
<Do the patterns differ by data source, region, etc.?>
<Is there a difference in patterns between the different sources? This will highlight different governance
practices or different business processes.>

<Object Name / Description – Field Name / Description, repeat section as necessary for each field>
<Key findings from the analysis of the given object should be described in detail in this section. Some
findings that may be described could include …>
<Summarize the table/column contents and importance of high data quality>
<Provide tables or graphs. Side-by-side comparison of multiple source systems will highlight differences in
pattern usage between multiple sources.>
<Possible stat: # patterns within field and the occurrence of each pattern.>
<Possible stat: % column data that conforms to defined formats or standards.>
<Do the patterns conform to defined formats or standards?>
RESULTS – DUPLICATES
By analyzing key fields for master data, it is possible to identify duplicate entities within a given data set or
across all data sets. Within a single source, duplicate entries are created due to a lack of governance within
the master data create process or by integration processes that lack checks for existing master data entities.
Duplicates are also often created when multiple data sets are integrated, for example during mergers and
acquisitions. Across multiple data sets, duplication may be acceptable if it is managed by a master data process.

<Object Name / Description, repeat section as necessary for each object>


<Key findings from the analysis of the given object should be described in detail in this section. Some
findings that may be described could include …>
<Summarize the table contents and importance of high data quality>
<Provide tables or graphs. Side-by-side comparison of multiple source systems will highlight differences in
duplication between multiple sources.>
<Possible stat: % of entities flagged as duplicates of an existing entity.>
<Possible stat: # of entities that could be removed or logically deleted because they duplicate an existing entity.>
<Provide statistics within each source and across all sources.>
<Is there a difference in duplication between the different sources? This will highlight different governance
practices or different business processes.>
<Are the matching requirements clearly defined or is this an initial assessment?>
FINDINGS SUMMARY
This section should recap the key points and/or concerns identified during the data quality assessment. These key
points and/or concerns will be addressed in the recommendations documented in the next steps section.
NEXT STEPS
This section should outline a plan to address the data quality issues, existing business disruption, or business process
risks. The plan may include recommended changes to the customer’s business process. The plan may also include a
data quality improvement program or the definition of or changes to an information governance program.

www.sap.com/contactsap

© 2018 SAP SE or an SAP affiliate company. All rights reserved.


No part of this publication may be reproduced or transmitted in any form or for any purpose without the express permission of SAP SE or an SAP affiliate company.

The information contained herein may be changed without prior notice. Some software products marketed by SAP SE and its distributors contain proprietary software components of other software vendors.
National product specifications may vary.

These materials are provided by SAP SE or an SAP affiliate company for informational purposes only, without representation or warranty of any kind, and SAP or its affiliated companies shall not be liable
for errors or omissions with respect to the materials. The only warranties for SAP or SAP affiliate company products and services are those that are set forth in the express warranty statements
accompanying such products and services, if any. Nothing herein should be construed as constituting an additional warranty.

In particular, SAP SE or its affiliated companies have no obligation to pursue any course of business outlined in this document or any related presentation, or to develop or release any functionality
mentioned therein. This document, or any related presentation, and SAP SE’s or its affiliated companies’ strategy and possible future developments, products, and/or platform directions and functionality are
all subject to change and may be changed by SAP SE or its affiliated companies at any time for any reason without notice. The information in this document is not a commitment, promise, or legal obligation
to deliver any material, code, or functionality. All forward-looking statements are subject to various risks and uncertainties that could cause actual results to differ materially from expectations. Readers are
cautioned not to place undue reliance on these forward-looking statements, and they should not be relied upon in making purchasing decisions.

SAP and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of SAP SE (or an SAP affiliate company) in Germany and other
countries. All other product and service names mentioned are the trademarks of their respective companies. See http://www.sap.com/corporate-en/legal/copyright/index.epx for additional trademark
information and notices.
