June 20, 2002

‘Data Quality’ Is a Misnomer

Lou Agosta

A client inquiry

What is the definition and value of data as opposed to information?

“Data quality” is a misnomer. Data in itself is meaningless; it is what is given, the basic raw material.
Whether the content is structured or unstructured, it is data. If a person asks the value of data, the answer is
easy: data itself is worthless. It is what you do with the data that has value. Data is the content, and when it is
structured in such a way as to reduce uncertainty, it has value as information. Thus, data plus structure
produces information. Information provides differences and distinctions that reduce uncertainty.

A simple example is that the attribute of gender tells us something about a customer. If I am confident that
the customer is either male or female but I am not sure which one, then I have not reduced my uncertainty
one bit. I do not have any more information than when I started. Whereas if I have the distinction
male/female and, literally, the bit of information that the customer is male, then I will plan on selling them a
Father’s Day gift rather than one for Mother’s Day. The data without the structure is meaningless; the
structure without the data is empty. The structure — the simple male/female distinction — is not in itself
information. The application of the structure to the data yields information and provides a reduction in
uncertainty. Sell the individual a tie, not flowers.
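
The "one bit" in this example can be made concrete with Shannon entropy, a standard measure of uncertainty. The sketch below is illustrative only: the function name `entropy_bits` and the assumption that male and female are equally likely beforehand are not from the original text.

```python
import math


def entropy_bits(probabilities):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return sum(-p * math.log2(p) for p in probabilities if p > 0)


# Assumption: before looking at the data, male and female are equally likely.
before = entropy_bits([0.5, 0.5])

# After applying the male/female distinction to the customer record,
# only one outcome remains, so no uncertainty is left.
after = entropy_bits([1.0])

# The reduction in uncertainty is the information gained.
gained = before - after
print(f"uncertainty before: {before} bit(s), after: {after}, gained: {gained}")
```

Under the equal-likelihood assumption, the distinction alone leaves one bit of uncertainty; only applying it to the customer's data removes that bit.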

Thus, Giga’s working definition of information, and of how to transform dumb data into quality information,
is depicted in the figure below. As the attributes of the data are structured according to a defined process for
transforming the data along the three high-level dimensions of objectivity, usability and trustworthiness, the
information quality improves in precisely those dimensions. In particular, information = objective(data) +
usable(data) + trustworthy(data). Knowledge is not on the same continuum as data and information. The
commitment needed to reach knowledge might be represented as a point in one of the quadrants or as a
circle encompassing the entire diagram. Knowledge = commitment(information).

[Figure: From Data to Information. Recoverable labels: “Ease of use” and “Hard to use”; “Subjective” and “Objective”; “quality”.]

Source: Giga Information Group

From a business perspective, knowledge is qualitatively different from information. There is a gap separating
information, no matter how high the quality, from knowledge. The “best available information” never results
in knowledge without something additional mixed in to strengthen it. That something is commitment:
commitment to goals relevant to the business enterprise, such as customer service, launching a new product
or attaining operational excellence. (Knowledge = commitment(information).)

Data, information and knowledge are overlapping categories that describe different aspects of the world of
business. They are different ways of describing the same phenomena. One person’s data may be another’s
information and vice versa. Yet the distinctions are valid, or why would they exist in the first place? Data is
what is given: subjective, uncertain and unclear in its use or interpretation. Add structure to data in the
interest of reducing uncertainty and the result is information. (Information = structure(data).) Information is
built out of data by applying structure, categories and processes (including data models, functional
transformations (ETL), queries and representation) in a process that generates increasing objectivity,
usability and certainty. Each of these dimensions is further decomposed. Objectivity includes such aspects
as accuracy, existence, causality, consistency, timeliness, completeness, unambiguousness and precision (not
being vague); usability includes ease of interpretation, availability and security; and trustworthiness includes
credibility, believability and the accumulated lessons of experience. Thus, information quality is improved.
However, no matter how much it is improved and how certain it is, information is still not knowledge. It is
not as if the information were getting more and more certain and finally resulted in knowledge. To get
knowledge from information, something else — a commitment to a business decision — must be added.

Philosophers, marketing executives, linguists and scientists have struggled with the distinctions between data,
information and knowledge for decades if not centuries. Giga has taken a pragmatic approach to defining
these distinctions. We know that lack of data quality costs money — misdirected mail is returned, effort is
wasted, rework is incurred, sales are lost and inventory outages occur. Quality implies differences,
differences imply distinctions of value and distinctions of value imply market value. Market value implies
dollar value. Like so many things, information quality is a bootstrap operation requiring iteration, a process of
learning from one’s mistakes and commitment to business results. Start by employing data profiling to build
an inventory of data assets and evaluate the state of information quality within the enterprise on a
system-by-system basis. Be prepared for “roll-up-the-sleeves” hard work. This is likely to be both a top-down
and bottom-up task, since the impact on information quality of relations between systems can only be
evaluated by including both sides of the interface.
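
As a rough illustration of what a first data profiling pass might look like, the sketch below counts missing values, distinct values and the most frequent value for each column. It is a minimal example under stated assumptions, not a description of any particular profiling tool: the `profile` function, the column names and the sample `customers` records are all hypothetical.

```python
from collections import Counter


def profile(rows):
    """Summarize each column of a list of dict records."""
    columns = sorted({key for row in rows for key in row})
    report = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        # Treat None and the empty string as missing (an assumption).
        present = [v for v in values if v not in (None, "")]
        counts = Counter(present)
        report[col] = {
            "rows": len(values),
            "missing": len(values) - len(present),
            "distinct": len(counts),
            "top": counts.most_common(1)[0] if counts else None,
        }
    return report


# Hypothetical sample records, invented for illustration only.
customers = [
    {"id": 1, "gender": "M", "postal_code": "60601"},
    {"id": 2, "gender": "F", "postal_code": ""},
    {"id": 3, "gender": "M", "postal_code": None},
    {"id": 4, "gender": "m", "postal_code": "60601"},
]

report = profile(customers)
for column, stats in report.items():
    print(column, stats)
```

Even this small pass surfaces the kinds of issues described above: the `gender` column shows three distinct values where the male/female distinction allows two, and half of the `postal_code` values are missing, which is the sort of gap that leads to misdirected mail.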

