© 2001 Giga Information Group

June 26, 2001

Data Quality Market Segments

Lou Agosta

A client inquiry

Can you please more precisely define the market segments identified in Planning Assumption, Market
Overview: Data Quality, Lou Agosta?

The data quality market is diverse and dynamic. Without wasting too much time on assumptions regarding
method, Giga’s approach is to describe the market and abstract the diverse categories within it rather than
attempting to impose categories or distinctions from a formal or academic point of view.

“Data quality extenders” are applications that leverage data quality technologies, such as searching, matching
and de-duplication, in the domains of operational and analytic customer relationship management (CRM),
product management and catalog standardization. Since market multiples are more favorable in CRM than
they are in moving and manipulating dumb data, many vendors have shifted their technologies, products and
offerings in this direction. The key differentiator here is the extension of a technology from data quality in the
narrow sense to an application domain such as CRM or product and catalog standardization.

“Name and address processing, householding and geocoding” is data quality technology that focuses on the
issues relating to leveraging name and address data. This is relevant to direct marketers whose mailing
expenses are substantial and that have called upon automation to manage the process of producing a specific
output — namely, a name and address suitable for printing on an envelope in a national postal system. This is
a very particular market segment — that of optimizing mailing and other marketing initiatives based on
geographic data. Geocoding is closely related to the problem of physically locating the individual on the
(postal) map and describing that location using relevant demographic data. Name and address
preparation for the US Postal Service (or any national postal system) is one of those dull-as-dirt applications
that could cost a small fortune if a firm makes an error; thus, a specialized application is of the essence here.

“Identification and searching” are technologies that address the particular computer science problem of how
we know entities are properly identified in the context of searching. For example, “Church” can refer to a
street, a person’s name or a place of worship, depending on whether the context is 212 Church Street, Alonzo
Church or The First Church of Gainesville. Though closely related to the operation of matching, identification
is, in fact, a distinct function and is the presupposition for matching, which becomes a special case of finding
something again after having indexed and stored it.
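
The "Church" example can be made concrete with a minimal sketch. It assumes a tiny, hypothetical rule set (the suffix list and the function name `classify_token` are illustrative and not taken from any vendor's product); commercial identification engines rely on far richer reference data.

```python
import re

# Hypothetical, deliberately tiny rule set for illustration only.
STREET_SUFFIXES = {"street", "st", "avenue", "ave", "road", "rd"}


def classify_token(text: str, token: str = "church") -> str:
    """Guess the role that `token` plays in `text` from its neighbors."""
    words = re.findall(r"[A-Za-z0-9]+", text.lower())
    if token not in words:
        return "absent"
    i = words.index(token)
    before = words[i - 1] if i > 0 else ""
    after = words[i + 1] if i + 1 < len(words) else ""
    # A leading house number plus a street suffix suggests an address.
    if after in STREET_SUFFIXES and words[0].isdigit():
        return "street"
    # "Church of ..." or "First Church" suggests an organization.
    if after == "of" or before in {"first", "the"}:
        return "place of worship"
    # A preceding alphabetic word suggests a given name plus surname.
    if before.isalpha():
        return "person name"
    return "unknown"


for sample in ("212 Church Street", "Alonzo Church", "The First Church of Gainesville"):
    print(sample, "->", classify_token(sample))
```

Under these assumed rules the three samples are classified as a street, a person name and a place of worship, respectively. The point is only that the same token is identified differently depending on context, which is the step that must precede matching.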

“Matching and de-duplication” is the application of functions and algorithms to the specific problem of
eliminating ambiguity, that is, determining whether two names refer to the same entity (person, place, thing). When an
organization has two data stores that refer to the same, possibly overlapping (but not identical), population
(whether people, products or transactions), the problems presented include building a single master file
(database) or extending the amount of information available by aggregating the two sources in a meaningful
way. Government organizations conducting censuses and related surveys have faced this problem for a long
time. The result is the need for de-duplication, that is, knowing how to make a determination that eliminates
duplicates. In fact, this category (matching and de-duplication) is closely related to identification and
searching. In subsequent Giga research, reference is sometimes made to matching and searching.
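
As a rough illustration of matching and de-duplication, the sketch below merges two overlapping name lists into a single master list. The normalization rule, the 0.85 similarity threshold and the function names are assumptions chosen for the example; it uses the general-purpose `difflib.SequenceMatcher` from the Python standard library rather than any vendor's matching algorithm.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase, drop punctuation and sort the words, so that
    'Church, Alonzo' and 'Alonzo Church' compare as equal."""
    cleaned = "".join(c if c.isalnum() or c.isspace() else " " for c in name.lower())
    return " ".join(sorted(cleaned.split()))


def is_match(a: str, b: str, threshold: float = 0.85) -> bool:
    """Treat two names as the same entity if their normalized forms are similar enough."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold


def deduplicate(records: list[str], threshold: float = 0.85) -> list[str]:
    """Keep the first record of each group of near-duplicates.
    Pairwise comparison is O(n^2), acceptable only for small illustrative inputs."""
    kept: list[str] = []
    for rec in records:
        if not any(is_match(rec, k, threshold) for k in kept):
            kept.append(rec)
    return kept


source_a = ["Alonzo Church", "Kurt Godel"]
source_b = ["Church, Alonzo", "Alonso Church"]
print(deduplicate(source_a + source_b))
```

With these inputs the master list retains "Alonzo Church" and "Kurt Godel"; the reordered and misspelled variants are treated as duplicates. Where to set the threshold is a business decision, since a lower value merges more aggressively at the risk of false matches.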

“Data profiling/metadata/analytics” is the description of the structure, data elements and content of data
stores, including the validity and semantics of that content. This includes
information engineering and reengineering. The profiling of defined codes, statuses and permissible business
values against the actual content of data stores is especially useful if the results are captured in a central
metadata repository for leveraged reuse. Data quality is relative to allowable values, that is, to data standards.
The idea of a data dictionary is not new. What is new is the possibility of capturing the data in an automated
way to a local metadata repository as the data is mapped from the transactional system of record to the
decision-support data store.
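
Profiling actual content against permissible business values can be sketched as follows. The status codes, the function name `profile_column` and the shape of the result are hypothetical; in practice the resulting summary is the kind of record that would be written to a metadata repository.

```python
from collections import Counter


def profile_column(values, permitted):
    """Compare the actual content of a column with its permitted business values."""
    counts = Counter(values)
    # Values present in the data store but not allowed by the data standard.
    violations = {v: n for v, n in counts.items() if v not in permitted}
    return {
        "rows": len(values),
        "distinct": len(counts),
        "violations": violations,
        # Permitted values that never occur may indicate a stale standard.
        "unused_permitted": sorted(set(permitted) - set(counts)),
    }


# Assumed example: account status codes A (active), I (inactive), P (pending).
status_column = ["A", "A", "I", "X", ""]
print(profile_column(status_column, permitted={"A", "I", "P"}))
```

Here the profile reports five rows, four distinct values, two violations (the undefined code "X" and an empty string) and one permitted code, "P", that is never used.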

“Standardization/scrubbing/correction/parsing” is modifying and enhancing the quality and content of data
against a specified set of rules, canons or standards that indicates the proper structure and semantic content of
the data. At least one data quality vendor, SSA, makes a strong case that standardization is unrelated to data
quality. But even SSA acknowledges that standardization is necessary in order to qualify for discounts on
direct mail through the national postal service.
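
A minimal sketch of parsing and standardization against a rule set follows. The suffix table is a small illustrative assumption and is not an actual postal standard; the function name `standardize_address` is likewise hypothetical.

```python
import re

# Illustrative rule table only; real postal standards are far more extensive.
SUFFIX_RULES = {
    "street": "ST", "st": "ST",
    "avenue": "AVE", "ave": "AVE",
    "road": "RD", "rd": "RD",
}


def standardize_address(line: str) -> dict:
    """Parse a 'number name suffix' address line and apply the rule table."""
    tokens = re.findall(r"[A-Za-z0-9]+", line)
    if len(tokens) < 3 or not tokens[0].isdigit():
        raise ValueError(f"cannot parse address line: {line!r}")
    suffix = SUFFIX_RULES.get(tokens[-1].lower())
    if suffix is None:
        raise ValueError(f"unknown street suffix in: {line!r}")
    return {
        "number": tokens[0],
        "street": " ".join(t.upper() for t in tokens[1:-1]),
        "suffix": suffix,
    }


print(standardize_address("212 church street."))
```

Lines that do not fit the assumed pattern raise an error rather than being silently "corrected", which keeps questionable records visible for manual review.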

“Data augmentation” is the correlating of demographic information with basic customer or product data. The
large credit reporting firms and data aggregators specialize in this area. These are not software products, and the
vendors' infrastructures may also be involved in the delivery of the content. As discussed in previous research, they
include Acxiom, Equifax, Experian, CACI, Claritas, Harte-Hanks, Polk and TransUnion (see Planning
Assumption, Market Overview: Data Quality, Lou Agosta).

When data quality services, such as those specified in any of the above, are offered as part of a centralized
service to a variety of clients from a single source, then a service bureau model is being invoked. An
application service provider (ASP) is an example of a modern approach to a service bureau. Data quality
vendors that are trying to generate revenues using the service bureau or ASP model include: Firstlogic
(eDataQualityService ASP), Harte-Hanks (in a variety of service forms) and Group1 (

For additional research regarding this topic see the following:

• Planning Assumption, Data Quality Methodologies and Technologies, Lou Agosta
• Planning Assumption, Vendor ScoreCard: Data Quality, Part 1, Lou Agosta
• Planning Assumption, Vendor ScoreCard: Data Quality, Part 2, Lou Agosta

IdeaByte ♦ RIB-062001-00226 ♦
