
DATA QUALITY SUMMARY

Data Quality and Data Quality Dimensions

By Rupa Mahanti

The foundation of a building plays a major role in the successful construction and longevity of the building. The stronger the foundation, the stronger the building. In the same way, data are the foundation upon which high-performance organizations rest in this competitive age. Data are no longer a by-product of an organization’s information technology (IT) systems and applications, but are an organization’s most valuable asset and resource, with a real, measurable value. Besides the importance of data as a resource, it is also appropriate to view data as a commodity. These issues are compounded because the value of the data does not rest only with the data themselves, but also with the actions that arise from the data and their usage (Mahanti 2019). This article summarizes the case for data quality, succinctly discusses the different data quality dimensions and their role in ensuring quality data, and explains the central role data quality plays in ensuring product quality, process quality, and compliance in the digital age.

KEYWORDS
data, data quality, data quality dimensions, digital age, Industry 4.0

INDUSTRIAL REVOLUTION, DIGITAL AGE, AND DATA
The first industrial revolution (late 1700s and early 1800s) was characterized by steam-powered machines, and the second industrial revolution (late 1800s) was characterized by electricity and assembly lines. The introduction of computers, innovations in computing, and industrial automation defined the third industrial revolution (Radziwill 2018). The current and fourth industrial revolution, commonly known as Industry 4.0 (Kagermann et al. 2011), is characterized by machine intelligence, pervasive computing, affordable storage, robust connectivity (Radziwill 2018), the Internet of Systems, and the Industrial Internet of Things (IIoT), and fosters the vision of a smart factory. Data are the enabler and the differentiator in the Industry 4.0 era.

In this connected, digital age, high-quality data form the basis for every solid operational and strategic process. Good quality data are essential to providing excellent customer service, making operations efficient, ensuring compliance with regulatory requirements, engaging in effective decision making, and conducting effective strategic business planning. The data used for all these decisions need to be managed efficiently in order to generate a return. Additionally, the same data are often used several times for multiple purposes. For example, address data are used for deliveries, billing, invoices, and marketing. Product data are used for sales, inventory, forecasting, marketing, financial forecasts, and supply chain management. Although one often thinks of software systems as driving organizational processes, data are the true foundation for the various applications and systems for business functions in an organization (Mahanti 2019).

DATA-CENTRIC ORGANIZATIONS
In some organizations, data are the primary product or service. Insurance companies, banks, online retailers, credit card companies, financial services companies, and the Internal Revenue Service (IRS) are all organizations in which business is data centric. These organizations rely heavily on data and processing data as their primary activities. These organizations primarily process and trade information products (Mahanti 2019).

Other organizations, such as manufacturing, utilities, and healthcare organizations, may appear to be less involved with information systems because their products or activities are not information-specific. However, if one looks beyond the products into operations, it becomes clear that most of their activities and decisions are driven by data. For instance, manufacturing organizations process raw materials to produce and ship products. However, data drive the processes of material acquisition, inventory management, supply chain management, determining final product quality, order processing, shipping, and billing (Mahanti 2019).

For utility companies, assets and asset maintenance are the primary concerns. To make these processes effective, they require good quality data about their assets and asset performance (in addition to customer data, sales information, marketing data, billing, and service data) to be able to provide good service and gain a competitive advantage (Mahanti 2019).

4 | SQP VOL. 22 | NO. 1



For hospitals and healthcare organizations, the primary activity is medical and patient care. While these activities on their own are not information-centric, hospitals need to store and process patient data, physician data, encounter data, information about care protocols, information about resource utilization and scheduling, and patient billing to provide good quality service (Mahanti 2019). Furthermore, the Internet of Things (IoT) is making it possible to establish health monitoring networks around patients, and to connect patients and healthcare professionals via wearable sensors that collect vital data about the human body for further use (Wu et al. 2017; Cicirelli et al. 2016; Gope and Hwang 2016).

DATA AND ORGANIZATIONAL IMPACT
New trends in data warehousing, business intelligence, data mining, data analytics, decision support, enterprise resource planning, and customer relationship management systems draw attention to the fact that data play an ever-growing and important role in organizations (Mahanti 2019).

More information is available because people and devices (such as sensors and actuators) are producing data at greater rates than ever before (Radziwill 2018). Large volumes of data across the various applications and systems in organizations present a number of challenges to the organization. From executive-level decisions about mergers and acquisition activity to call-center representatives making split-second decisions about customer service, the data an enterprise collects on virtually every aspect of the organization (customers, prospects, products, inventory, finances, assets, or employees) can have a significant effect on the organization’s ability to meet quality and performance objectives. These may include satisfying customers, reducing costs, improving productivity, mitigating risks (Dorr and Murnane 2011), or increasing operational efficiency.

Accurate, complete, current, consistent, and timely data are critical to accurate, timely, and unbiased decisions. Since data and information are the basis of decision making, they must be carefully managed to ensure they can be located easily; relied upon for their currency, completeness, and accuracy; and obtained when and where the data are needed (Mahanti 2019).

DATA QUALITY
While good data are a source of information, knowledge, and myriad opportunities, bad data are a tremendous burden and only present problems. There are many ways of defining data quality. Data quality is the capability of data to satisfy the stated business, system, and technical requirements of an enterprise. Data quality can be defined as the data’s fitness for use or purpose for a given context or specific task. Data quality reflects insights into, or direct evaluation of, data’s fitness to serve their purpose in a given context. In this sense, data quality reflects J. M. Juran’s “fitness for use” criterion as applied to any other product or entity, and data on performance are essential for all quality planning (Bisgaard 2007).

Data quality is accomplished when a business uses data that are, at a minimum, complete, relevant, and timely. Determination of quality is dynamic, as a certain level of excellence is not universal, not an absolute, and not a constant but is assessed to a relative degree. The same applies in the case of data quality (Mahanti 2019).

When people talk about data quality, they usually mean data accuracy only and do not consider or assess other important data quality dimensions in their quest to achieve better quality data. Undeniably, data are normally considered of poor quality if erroneous values are associated with the real-world entity or event, such as an incorrect zip code used in an address, a wrong date of birth, an incorrect title or gender for employees or customers, incorrect phone numbers or email IDs in contact information, or incorrect product specifications in the case of products in a retail store (Mahanti 2019).

However, data quality is not one-dimensional, but rather multidimensional and hierarchical, and hence complex. While accuracy is definitely an important characteristic of data quality and therefore should not be overlooked, accuracy alone does not completely characterize the quality of data. Data quality has many more attributes than the evident characteristic of data accuracy. Other substantial dimensions, such as completeness, consistency, currency, and timeliness, are needed to illustrate the quality of data holistically (Mahanti 2019).

Although fitness for use or purpose does capture the principle of quality, it is still abstract. Thus, it is a challenge to measure data quality using only this holistic construct or definition. It needs to be broken into measurable facets or characteristics, known as data quality dimensions, for data quality assessment to be actionable. Hence, to measure data quality, one needs to measure one or more of the dimensions of data quality, depending on the context, situation, and task for which the data are to be used. In short, these data quality dimensions enable one to operationalize data quality (Mahanti 2019).

DATA QUALITY DIMENSIONS
Each data quality dimension captures a particular measurable aspect of data quality. In other words, the data quality dimensions represent the views, benchmarks, or measures for data quality issues that can be understood, analyzed,

www.asq.org | 5

and (eventually) resolved or minimized. Enterprise data must conform to the various dimensions of data quality an organization has determined are important to be fit for operational and analytical use (Mahanti 2019).

When defining data quality measures, one should try to focus at a minimum on the dimensions that are meaningful and pertinent for the business with maximum return on investment. On the other hand, measuring all the different dimensions of data quality gives the complete picture. Each organization needs to identify the appropriate balance based on its unique competitive position and risk appetite. Also, data quality dimensions are interconnected, and dependencies and correlations exist between them, which must be taken into account when measurement is being planned (Mahanti 2019). The different data quality dimensions are summarized in Table 1.

MEASURING DATA QUALITY DIMENSIONS
The management axiom “what gets measured gets managed” (Willcocks and Lester 1996) applies to data quality and, in this light, data quality dimensions signify a fundamental management element in the data quality arena. Measurement exposes the hidden truths and thus is essentially the first step toward diagnosing and fixing data quality.

With data quality being such a broad topic, and with the huge amounts of data and number of data elements that organizations have and continue to capture, store, and accumulate (thanks to the capabilities of digitization), measurement can feel overwhelming. The myth that data need to be 100 percent error-free makes things even more difficult. Not all data quality dimensions need to be measured for data, nor do all data elements need to be subject to measurement. Only those data elements that drive significant benefits should be measured for quality purposes. In short, data do not need to be 100 percent error-free and, although data quality is broad, the various data quality dimensions make measurement an achievable exercise (Mahanti 2019).

The degree of data quality excellence that should be attained and sustained is driven by the criticality of the data, the business need, and the cost and time to achieve the defined degree of data quality. The costs in time, resources, and dollars to achieve and sustain the desired quality level must be balanced against the return on investment and benefits derived from that degree of quality (Mahanti 2019).

Measurement of data quality dimensions for a data set involves understanding the data set as a whole, as well as understanding the constituent data elements, the context of data use, and the characteristics of the data elements. This can include size, data type, and default values. Metadata is the first input to the process of measuring and assessing data quality and includes the foundational information necessary to comprehend common-sense assumptions about data, thus providing a starting point for defining expectations related to data quality (Sebastian-Coleman 2013).

In the absence of metadata or given inadequate metadata, subject matter experts need to be consulted to get an understanding of the data. When measuring data quality dimensions, it is also imperative to consider the data granularity level at which they are applicable, so the measurements are practically useful. In studying data quality dimensions, the author observes that some dimensions (for example, data coverage and timeliness) are applicable at higher granularity levels, such as the data set level. Alternatively, dimensions such as completeness can be applicable at lower levels of data granularity, namely the data element level. Granularity may depend on the types of the dimensions that are selected for measurement (Mahanti 2019).

Data quality dimensions that are related to characteristics of the data themselves (for example, completeness, accuracy, consistency, uniqueness, integrity, and validity) are defined primarily at the data element and/or data record level. Measurements in this case generally involve objectively examining data values stored in the data set against business rules to measure the data quality dimensions. On the other hand, the data quality dimensions that deal with the usage of data and contribute to users’ judgment about the data’s fitness for use, such as interpretability, accessibility, and credibility, may be defined based on any arbitrary abstraction of data elements, records, or data sets (Mahanti 2019).

CONCLUDING THOUGHTS
Data without quality can neither contribute any value nor serve any purpose. Hence, high-quality data are not a “nice-to-have” requirement but a “must-have” requirement. A data quality improvement program can be driven with the Six Sigma approach (Mahanti 2019). However, Six Sigma improvement projects will not yield reliable outcomes without measurements that take into account all the required data quality dimensions.

While measurement is an integral part of the data quality journey, data quality management involves much more than measurement. It also involves the management of people, processes, policies, technology, standards, and data within an enterprise. Data quality management is data-, people-, process-, and technology-intensive, with data being at the core, and, as such, success requires all these elements to work in an integrated manner (Mahanti 2019). Good data quality and effective management of data and processes to improve and sustain data quality can reduce costs and risks,


Table 1. Data quality dimensions in a nutshell (adapted from Mahanti 2019).

Data Quality Dimension | Definition
Accessibility | The ease with which the existence of data can be determined, and the suitability of the form or medium through which the data can be quickly and easily retrieved.
Accuracy | The extent to which data are the true representation of reality, be it features of the real-world entity, situation, object, phenomenon, or event, that they intend to model.
Believability | The extent to which the data are regarded as being trustworthy and credible by the user.
Credibility | The extent to which the good faith of a provider of data or source of data can be relied upon to ensure what the data actually represent is what the data are supposed to represent, and there is no intent to misrepresent what the data are supposed to represent (Chisholm 2014).
Trustworthiness | The extent to which the data originate from trustworthy sources.
Reputation | The extent to which the data are highly regarded in terms of their source or content (Pipino et al. 2002).
Timeliness | The time expectation of availability of data for consumption.
Currency | The extent to which the stored data values are sufficiently up to date for the intent of use despite lapse of time.
Volatility | The frequency with which the data elements change over time.
Correctness | Refers to freedom from errors.
Precision | The extent to which the data elements contain a sufficient level of detail.
Reliability | Whether the data can be counted on to convey the right information (Wand and Wang 1996).
Consistency | The extent to which the same data are equivalent across different data tables, sources, or systems.
Integrity | The extent to which data are not missing important relationship linkages (Faltin et al. 2012) and the relationship linkages are valid.
Completeness | The extent to which the applicable data (data element, records, or data set) are not absent.
Conformance/Validity | The extent to which data elements comply with a set of internal or external standards or guidelines or standard data definitions, including data type, size, format, and other features.
Interpretability | The extent to which the user can easily understand and properly use and analyze the data.
Security | The extent to which access to data is restricted and regulated appropriately to prevent unauthorized access.
Conciseness | The extent to which the data are represented in a compact manner but at the same time are complete.
Uniqueness | The extent to which an entity is recorded only once and there are no repetitions. Duplication is the inverse of uniqueness.
Duplication | The extent of unwanted duplication of an entity. Uniqueness is the inverse of duplication.
Cardinality | Refers to the uniqueness of the data values that are contained in a particular column (known as an attribute) of a database table.
Data coverage | The extent of the availability and comprehensiveness of the data when compared to the total data universe or population of interest (McGilvray 2008).
Relevance | The extent to which the data content and coverage are relevant for the purpose for which they are used and the extent to which they meet current and potential future needs.
Ease of manipulation | The extent to which the data can be easily manipulated or transformed for different tasks.
Objectivity | The extent to which the data are free from bias and judgment.
Traceability/Lineage | The extent to which data can be verified with respect to the origin, history, first inserted date and time, updated date and time, and audit trail by means of documented recorded identification.
Data Specification | A measure of the existence, completeness, quality, and documentation of data standards, data models, business rules, metadata, and reference data (McGilvray 2008).
Granularity | The extent to which data elements can be subdivided.
Redundancy | The extent to which data are replicated and captured in two different systems in different storage locations.

©2019 ASQ
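
As a rough illustration of how a few of the dimensions in Table 1 might be turned into numbers, the Python sketch below scores a tiny, made-up customer data set for completeness, uniqueness, validity, and currency. Everything specific in it is an assumption for illustration only: the records, the field names, the five-digit zip rule, the 180-day freshness window, and the simple ratio formulas. These ratios are one common way to operationalize the definitions, not a method prescribed by this article or by Mahanti (2019).

```python
import re
from datetime import date

# Hypothetical customer records; None or "" marks a missing value.
records = [
    {"id": 1, "email": "ann@example.com", "zip": "53203", "updated": date(2019, 1, 10)},
    {"id": 2, "email": None,              "zip": "5320",  "updated": date(2018, 6, 1)},
    {"id": 3, "email": "bob@example.com", "zip": "60601", "updated": date(2019, 2, 20)},
    {"id": 3, "email": "bob@example.com", "zip": "60601", "updated": date(2019, 2, 20)},
    {"id": 4, "email": "",                "zip": "10001", "updated": date(2017, 12, 31)},
]

def is_present(value):
    """Treat None and the empty string as missing."""
    return value not in (None, "")

def completeness(rows, field):
    """Data element level: share of rows in which `field` is present."""
    return sum(1 for r in rows if is_present(r.get(field))) / len(rows)

def uniqueness(rows, key):
    """Record level: distinct `key` values divided by total rows."""
    return len({r[key] for r in rows}) / len(rows)

def validity(rows, field, pattern):
    """Share of non-missing `field` values that fully match a format rule."""
    values = [r[field] for r in rows if is_present(r.get(field))]
    if not values:
        return 1.0  # assumption: nothing to validate counts as valid
    return sum(1 for v in values if re.fullmatch(pattern, v)) / len(values)

def currency(rows, field, as_of, max_age_days):
    """Share of rows whose `field` date is within `max_age_days` of `as_of`."""
    return sum(1 for r in rows if (as_of - r[field]).days <= max_age_days) / len(rows)

# Assumed business rules: zip must be five digits; data older than 180 days is stale.
scores = {
    "completeness(email)": completeness(records, "email"),
    "uniqueness(id)": uniqueness(records, "id"),
    "validity(zip)": validity(records, "zip", r"\d{5}"),
    "currency(updated)": currency(records, "updated", date(2019, 3, 1), 180),
}

for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```

Each score is a fraction between 0 and 1 computed at the data element or record level and then rolled up to the data set, which mirrors the point that dimensions apply at different levels of granularity. Which fields to score, and what threshold counts as acceptable, would depend on the criticality of the data and the business need.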


eliminate waste, empower decision making, improve customer satisfaction, enhance brand image, and help organizations be compliant and satisfy privacy, security, and regulatory requirements.

Data quality is just as important whether an organization is dealing with Big Data or “small data.” However, data management technologies that are suitable for smaller data will not work for Big Data, owing to the different Vs of Big Data (namely, volume, velocity, variety, and veracity). For example, fancy charts, graphs, and analysis provided for Big Data may not offer accuracy comparable to that of the “tried-and-true” methods for smaller data (Duarte and Dame 2019). Future research will be focused on Big Data.

ACKNOWLEDGEMENTS
To learn more about data quality, such as the myths, challenges, critical success factors, strategy, DQ dimensions, data profiling, and more, including how to measure data quality dimensions, implement methodologies for data quality management, and data quality aspects to consider when undertaking data-intensive projects, please read Data Quality: Dimensions, Measurement, Strategy, Management and Governance, published by ASQ Quality Press in 2019. This article draws significantly from the research presented in that book. Special thanks to Nicole Radziwill, SQP editor, for reviewing and editing this article.

REFERENCES

Bisgaard, S. 2007. Quality management and Juran’s legacy. Quality and Reliability Engineering International 23, no. 6:665-677.

Chisholm, M. 2014. Data credibility: A new dimension of data quality? Available at: https://www.information-management.com/news/data-credibility-a-new-dimension-of-data-quality.

Cicirelli, F., G. Fortino, A. Giordano, A. Guerrieri, G. Spezzano, and A. Vinci. 2016. On the design of smart homes: A framework for activity recognition in home environment. Journal of Medical Systems 40, no. 9:1–17.

Dorr, B., and R. Murnane. 2011. Using data profiling, data quality, and data monitoring to improve enterprise information. Software Quality Professional 13, no. 4:9.

Duarte, J., and J. Dame. 2019. Data science and the quality professional. Software Quality Professional 21, no. 3:13-19.

Faltin, F., R. S. Kenett, and F. Ruggeri. 2012. Statistical methods in healthcare. New York: Wiley.

Kagermann, Henning, Wolf-Dieter Lukas, and Wolfgang Wahlster. 2011. Industrie 4.0: Mit dem Internet der Dinge auf dem Weg zur 4. Industriellen Revolution, VDI Nachrichten 13, no. 11. Available at: https://tinyurl.com/ly6vkgf.

Mahanti, R. 2019. Data quality: Dimensions, measurement, strategy, management and governance. Milwaukee: ASQ Quality Press.

McGilvray, D. 2008. Executing data quality projects. Burlington, MA: Morgan Kaufmann.

Pipino, L. L., Y. W. Lee, and R. W. Wang. 2002. Data quality assessment. Communications of the ACM 45, 211-218.

Radziwill, N. M. 2018. Let’s get digital. Quality Progress (October).

Sebastian-Coleman, L. 2013. Measuring data quality for ongoing improvement: A data quality assessment framework. Burlington, MA: Morgan Kaufmann.

Wand, Y., and R. Y. Wang. 1996. Anchoring data quality dimensions in ontological foundations. Communications of the ACM 39, 11.

Willcocks, L., and S. Lester. 1996. Beyond the IT productivity paradox. European Management Journal 14, no. 3:279-290.

Wu, T., F. Wu, J. M. Redouté, and M. R. Yuce. 2017. An autonomous wireless body area network implementation towards IoT connected healthcare applications. IEEE Access 5, 11413-11422.

BIOGRAPHY
Rupa Mahanti is a business and information management consultant with extensive and diversified consulting experience in different technologies, solution environments, business areas, industry sectors, and geographies (United States, United Kingdom, India, and Australia). With work experience that spans industry, academia, and research, Mahanti has guided a doctoral dissertation, published a large number of research articles, and authored the book Data Quality: Dimensions, Measurement, Strategy, Management and Governance. She is an associate editor with Software Quality Professional and a reviewer for several international journals. She can be reached at rupa.mahanti0@gmail.com.
