Professional Documents
Culture Documents
4 The Practitioner's Guide To Data Quality Improvement Cap 8
4 The Practitioner's Guide To Data Quality Improvement Cap 8
8
CHAPTER OUTLINE
8.1 What Are Dimensions of Data Quality? 130
8.2 Categorization of Dimensions 131
8.3 Describing Data Quality Dimensions 134
8.4 Intrinsic Dimensions 135
8.5 Contextual 138
8.6 Qualitative Dimensions 142
8.7 Finding Your Own Dimensions 146
8.8 Summary 146
Applications
Data sets
Records
5. Completeness
6. Consistency
7. Currency
8. Timeliness
9. Reasonableness
10. Identifiability
The relationships between the dimensions are shown in
Figure 8.2.
The main consideration of isolating critical dimensions of
data quality is to provide universal metrics for assessing the level
of data quality in different operational or analytic contexts.
Using this approach, the data stewards can work with the line-
of-business managers to:
• Define data quality rules that represent the validity expectations,
• Determine minimum thresholds for acceptability, and
• Measure against those acceptability thresholds.
In other words, the assertions that correspond to these
thresholds can be transformed into rules used for monitoring
the degree to which measured levels of quality meet defined
and agreed-to business expectations. In turn, metrics that
Intrinsic Contextual
Accuracy Completeness
Lineage Consistency
Semantic Currency
Structure Timeliness
Reasonableness
8.4.1 Accuracy
Accuracy is one of the more challenging dimensions to assess
because it refers to the degree to which data values agree with an
identified source of correct information. There may be many
potential sources of correct information; some examples include
a database of record; a similar, corroborative set of data values
from another table; dynamically computed values; or perhaps
the results of a manual process. In many cases there is no defin-
itive source of correct information.
The characteristics of accuracy include the definition of a sys-
tem of record or a set of values of record, the precision of data
values, value acceptance, and domain definition, as is described
in Table 8.2.
8.4.2 Lineage
Trustworthiness of data is of critical importance to all par-
ticipants across the enterprise. One aspect for the measure of
trustworthiness is the ability to identify the source of any new
or updated data element. In addition, documenting the flow of
information better enables root cause analysis. Therefore, a
dimension measuring the historical sources of data, called line-
age, is valuable in overall assessment.
136 Chapter 8 DIMENSIONS OF DATA QUALITY
8.5 Contextual
The contextual dimensions provide a way for the analyst to
review conformance with data quality expectations associated
with how data items are related to each other.
8.5.1 Completeness
A data model should not include extra information, nor
should its deployment be missing relevant data values. Unused
data, by nature of its being ignored, will tend toward entropy,
leading to low data quality levels along with introducing a prob-
lem of maintaining data consistency across any copies or
replicas. Completeness refers to the expectation that certain
attributes are expected to have assigned values in a data set.
Completeness rules can be assigned to a data set in three levels
of constraints:
1. Mandatory attributes require a value;
2. Optional attributes, which may have a value (potentially
under specific circumstances); and
Chapter 8 DIMENSIONS OF DATA QUALITY 139
8.5.2 Consistency
In any enterprise environment, consistency is relevant to the
different levels of the data hierarchy – within tables, databases,
across different applications, as well as with externally supplied
data. By virtue of the growing trend to consolidate and exchange
data across the lines of business, there may be discovered
inconsistencies in different data sets that may have eluded
scrutiny in the past. Policies and procedures for reporting
140 Chapter 8 DIMENSIONS OF DATA QUALITY
8.5.3 Currency
Currency refers to the degree to which information is current
with the world that it models. Currency can measure how “up-
to-date” information is, and whether it is correct despite the pos-
sibility of modifications or changes that impact time and date
values. Data currency may be measured as a function of the
expected frequency rate at which different data elements are
expected to be refreshed, as well as verifying that the data is
up-to-date; these may require some automated and manual pro-
cesses. Currency rules may be defined to assert limits to the
Chapter 8 DIMENSIONS OF DATA QUALITY 141
8.5.4 Timeliness
Timeliness refers to the time expectation for accessibility of
information. Timeliness can be measured as the time between
when information is expected and when it is readily available
for use. Some examples of timeliness measures are provided in
Table 8.9.
8.5.5 Reasonableness
General statements associated with expectations of consis-
tency or reasonability of values, either in the context of existing
data or over a time series, are included in this dimension.
Table 8.10 provides examples.
142 Chapter 8 DIMENSIONS OF DATA QUALITY
8.5.6 Identifiability
Identifiability refers to the unique naming and representation
of core conceptual objects as well as the ability to link data
instances containing entity data together based on identifying
attribute values. Examples are shown in Table 8.11.
8.8 Summary
The basis for assessing the quality of data is to create a frame-
work that can be used for asserting expectations, providing a
means for quantification, establishing performance objectives,
and applying the oversight process to ensure that the par-
ticipants conform to the policies. This framework is based on
dimensions of data quality – those discussed in this chapter,
along with any others that are specifically relevant within your
industry, organization, or even just between the IT department
and its business clients.
These measures complete the description of the processes
that can be used to collect measurements and report them to
management (as introduced in chapter 7). Before any significant
improvement will be manifested across the enterprise, though,
the individuals in the organization must understand the virtues
of performance-oriented data quality management and be
prepared to make the changes needed for quality management.