
contributed articles

DOI: 10.1145/1629175.1629210

COMMUNICATIONS OF THE ACM | JANUARY 2010 | VOL. 53 | NO. 1 | PAGES 148-152

Designing Data Governance

BY VIJAY KHATRI AND CAROL V. BROWN

Organizations are becoming increasingly serious about the notion of “data as an asset” as they face increasing pressure for reporting a “single version of the truth.” In a 2006 survey of 359 North American organizations that had deployed business intelligence and analytic systems, a program for the governance of data was reported to be one of the five success “practices” for deriving business value from data assets.[a] In light of the opportunities to leverage data assets as well as ensure legislative compliance with mandates such as the Sarbanes-Oxley (SOX) Act and Basel II, data governance has also recently been given significant prominence in practitioners’ conferences, such as the TDWI (The Data Warehousing Institute) World Conference and the DAMA (Data Management Association) International Symposium.

a. http://mediakit.businessweek.com/pdf/research/KnightsbridgeWhitePaper.pdf (last viewed on August 2, 2007)

The objective of this article is to provide an overall framework for data governance that can be used by researchers to focus on important data governance issues, and by practitioners to develop an effective data governance approach, strategy and design. Designing data governance requires stepping back from day-to-day decision making and focusing on identifying the fundamental decisions that need to be made and who should be making them. Based on Weill and Ross [10], we also differentiate between governance and management as follows:
• Governance refers to what decisions must be made to ensure effective management and use of IT (decision domains) and who makes the decisions (locus of accountability for decision-making).
• Management involves making and implementing decisions.

For example, governance includes establishing who in the organization holds decision rights for determining standards for data quality. Management involves determining the actual metrics employed for data quality. Here, we focus on the former.

Corporate governance has been defined as a set of relationships between a company’s management, its board, its shareholders and other stakeholders that provide a structure for determining organizational objectives and monitoring performance, thereby ensuring that corporate objectives are attained. Considering the synergy between macroeconomic and structural policies, corporate governance is a key element in not only improving economic efficiency and growth, but also enhancing corporate confidence.[b] A framework for linking corporate and IT governance (see Figure 1) has been proposed by Weill and Ross [10].

b. http://www.oecd.org/dataoecd/32/18/31557724.pdf

Unlike these authors, however, we differentiate between IT assets and information assets: IT assets refer to technologies (computers, communication and databases) that help support the automation of well-defined tasks, while information assets (or data) are defined as facts having value or potential value that are documented. Note that in the context of this article, we do not differentiate between data and information.

Next, we use the Weill and Ross framework for IT governance as a starting point for our own framework for data governance. We then propose a set
of five data decision domains, why they are important, and guidelines for what governance is needed for each decision domain. By operationalizing the locus of accountability of decision making (the “who”) for each decision domain, we create a data governance matrix, which can be used by practitioners to design their data governance. The insights presented here have been informed by field research, and address an area that is of growing interest to the information systems (IS) research and practice community.

Figure 1: Key organizational assets to be governed; adapted from Weill and Ross [10].

IT Governance as the Context for Data Governance

IT governance refers to who holds the decision rights and is held accountable for an organization’s decision-making about IT assets. In their IT governance framework, Weill and Ross propose that governance design includes five major decision domains: IT principles; IT architecture; IT infrastructure; Business application needs; and IT investment and prioritization. Although the five key decisions are interrelated, each of these decisions deals with a distinctive set of core issues. IT principles clarify the role that IT plays in the organization and drive the IT architecture decisions that establish the IT infrastructure. The organization’s IT infrastructure capabilities enable its business application needs, and the need for new IT applications can create new IT infrastructure requirements. IT investment and prioritization decisions are in turn shaped by the organization’s IT principles, architecture, infrastructure, and application needs.

Data Governance: The Five Decision Domains

Data governance refers to who holds the decision rights and is held accountable for an organization’s decision-making about its data assets. Our framework for data governance includes five interrelated decision domains: Data principles; Data quality; Metadata; Data access; and Data lifecycle. Figure 2 emphasizes the interconnections between these decision domains. Data principles, shown at the top of the framework, establish the direction for all other decisions. An organization’s data principles set the boundary requirements for the intended uses of data, which set the organization’s standards for data quality, which in turn are the basis for how data is interpreted (metadata) and accessed (data access) by users. Decisions that define the production, retention and retirement of data (data lifecycle) play a key role in operationalizing the data principles into IT infrastructure.

Figure 2: Decision domains for data governance.

Table 1: Framework for data decision domains.

Data Governance Domain: Data Principles
• Clarifying the role of data as an asset
Domain Decisions:
• What are the uses of data for the business?
• What are the mechanisms for communicating business uses of data on an ongoing basis?
• What are the desirable behaviors for employing data as assets?
• How are opportunities for sharing and reuse of data identified?
• How does the regulatory environment influence the business uses of data?
Potential Roles or Locus of Accountability:
• Data owner/trustee
• Data custodian
• Data steward
• Data producer/supplier
• Data consumer
• Enterprise Data Committee/Council

Data Governance Domain: Data Quality
• Establishing the requirements of intended use of data
Domain Decisions:
• What are the standards for data quality with respect to accuracy, timeliness, completeness, and credibility?
• What is the program for establishing and communicating data quality?
• How will data quality as well as the associated program be evaluated?
Potential Roles or Locus of Accountability:
• Data owner
• Subject matter expert
• Data quality manager
• Data quality analyst

Data Governance Domain: Metadata
• Establishing the semantics or “content” of data so that it is interpretable by the users
Domain Decisions:
• What is the program for documenting the semantics of data?
• How will data be consistently defined and modeled so that it is interpretable?
• What is the plan to keep different types of metadata up-to-date?
Potential Roles or Locus of Accountability:
• Enterprise data architect
• Enterprise data modeler
• Data modeling engineer
• Data architect
• Enterprise Architecture Committee

Data Governance Domain: Data Access
• Specifying access requirements of data
Domain Decisions:
• What is the business value of data?
• How will risk assessment be conducted on an ongoing basis?
• How will assessment results be integrated with the overall compliance monitoring efforts?
• What are data access standards and procedures?
• What is the program for periodic monitoring and audit for compliance?
• How is security awareness and education disseminated?
• What is the program for backup and recovery?
Potential Roles or Locus of Accountability:
• Data owner
• Data beneficiary
• Chief information security officer
• Data security officer
• Technical security analyst
• Enterprise Architecture Development Committee

Data Governance Domain: Data Lifecycle
• Determining the definition, production, retention and retirement of data
Domain Decisions:
• How is data inventoried?
• What is the program for data definition, production, retention, and retirement for different types of data?
• How do the compliance issues related to legislation affect data retention and archiving?
Potential Roles or Locus of Accountability:
• Enterprise data architect
• Information chain manager

Figure 3: Framework for IT and data decision domains.

Table 1 summarizes the scope of
each decision domain with examples of the types of decisions to be made for each domain. The far right-hand column in Table 1 also provides examples of potential organizational roles that could be vested with decision rights for the various domains, that is, the “locus of accountability.” A case study that we conducted with a large insurance company revealed several such roles for data governance: for example, the governance of data access was vested in an Enterprise Architecture Development Committee.

Data Principles. Effective data principles establish the linkage with the business. For example, the organizational decision to standardize business processes implies that there should be a clearly defined business owner of data assets (data principle). By delineating the business uses of data, data principles therefore establish the extent to which data is an enterprise-wide asset, and thus what specific policies, standards and guidelines are appropriate. In keeping with the notion of data as an asset, data principles also establish and foster opportunities for sharing and reusing data. Each principle is supported by a rationale and a set of implications. Data principles take into account the usage of external data, such as customer data from third-party service providers. An organization’s data principles also take into consideration the regulatory environment that could influence the business uses of data.

Data principles therefore define the desirable behaviors both for IS professionals and business users. For example, the notion of business owners of data implies that business users have an important role in managing data quality as well as its lifecycle, interpretability and access. On the other hand, IS professionals play the role of data stewards wherein they employ IT tools (such as DataFlux and Informatica Data Quality) that help surface quality issues for the business owners (or data owners/trustees).

Data Quality. Poor data quality can impact an enterprise at both operational and strategic levels [7]; current problems in data quality reportedly cost US businesses more than $611 billion every year in postage, printing, and staff overhead.[c] Similar to product quality [3], the quality of data refers to its ability to satisfy its usage requirements [5]. While data quality has multiple dimensions, such as accuracy, timeliness, completeness and credibility, these dimensions are relative and need to be defined in the context of the end use of data [1, 5, 9]. For example, while 85% accuracy of the name, address, and phone number of physicians may be acceptable for an insurance company that is targeting physicians as potential customers, this metric would not be acceptable for organizations that need to notify prescribing physicians about a drug recall.
• Accuracy refers to correctness of data, that is, whether the recorded value is in conformity with the actual value, with respect to its intended use.
• Timeliness indicates that the recorded value is up-to-date for the task at hand.
• Completeness suggests that the requisite values are recorded (not missing) and that they are of adequate depth/breadth.
• Credibility indicates the trustworthiness of the source as well as its content.

The data quality decision domain (which could be vested with roles such as data quality manager, data quality analyst, data quality trainer and subject matter expert) provides underlying standards with respect to various dimensions of data quality, defines mechanisms for communicating business uses of data on an ongoing basis, and delineates procedures for evaluating the quality of data. By providing a roadmap for interpreting (metadata) and assessing data, data quality decisions are pivotal in the effective governance of data assets.

c. http://www.dw-institute.com/research/display.aspx?ID=6626

Metadata. Defined as “data about data,” metadata describes what the data is about and provides a mechanism for a concise and consistent description of the representation of data, thereby helping interpret the meaning or “semantics” of data. Different types of metadata, such as physical, domain-independent, domain-specific, and user metadata [8], play a role in the discovery, retrieval, collation and analysis of data. At the lowest level, physical metadata includes information about the physical storage of data. Domain-independent metadata includes descriptions such as the creator/modifier of data and authorization/audit/lineage information related to the data. By providing a set of mappings from a representation language to agreed-upon concepts in the real world, domain-specific metadata connects a database to the “real world.” Domain-specific metadata, for example, can be specified at different levels, such as division and organization; at the division level it provides descriptions of the application data for individual units, while at the organization level it supports reconciliation of domain-specific (data) descriptions for the entire organization. Finally, user metadata includes annotations that users may associate with data items or collections; such annotations can, for example, capture user preferences and usage history.

The metadata that is employed in
an enterprise depends on the intended use of and access to the data, as well as the management of its life cycle. To support retrieval and analysis of data, the metadata decision domain may be vested in such roles as enterprise data architects and data modeling engineers to develop a programmatic approach for documenting the semantics of data. To ensure that the data is interpretable, standardizing metadata provides the ability to effectively use and track information. As the environment for a business changes, the way an organization conducts business, and consequently the associated data, also changes. As such, there is a need to manage changes in metadata as well.

Data Access. Data access is premised on the ability of data beneficiaries to assign a value to different categories of data. Effective risk analysis by data security officers, for example, identifies the data needs of the business and addresses safeguards to ensure the confidentiality, integrity and availability of data. By integrating risk assessment with an organization’s legal and regulatory compliance monitoring efforts (such as requirements of the Gramm-Leach-Bliley Act for the financial industry), industry standards serve as a guide for the writing and updating of an organization’s access policies and standards. The data access standards (and the associated service level agreements) can be based on the definition of “unacceptable” uses of data and external requirements for auditability (the ability to track who/what has accessed/modified data), privacy and availability. Data access decisions also provide standards at the physical and logical level [6]. The standards for physical data integrity ensure that the data is immune to physical harm such as power failure; the standards for logical data integrity ensure that the structure of a database is preserved. Developing integrated, enterprise-wide data access decisions can also help automate the migration of data from over-utilized volumes into under-utilized volumes across DAS/NAS/SAN environments.

Data Life cycle. Realizing that all data moves through life-cycle stages is central to designing data governance. From the perspective of data in an electronic health record (EHR) maintained by a hospital, the uses and thereby the value of the diagnostic information of a patient admitted to the hospital change as the patient undergoes surgery, moves to an acute care center, is discharged, receives a follow-up consultation, and transitions from sick-care to wellness-care. By understanding how data is used, and how long it must be retained, organizations can develop approaches to map usage patterns to the optimal storage media, thereby minimizing the total cost of storing data over its life cycle.

Many organizations do not know what data they have, how critical that data is, the sources that exist for critical data, or the degree of redundancy of their data assets [4]. In order to manage the inventory of data as well as its various data sources, information chain managers[d] develop an understanding of the different types of data that are the most/least prevalent, their storage requirements, and the growth trends. A data taxonomy can help in the management of the lifecycle of data, which in turn can be embedded as metadata; additionally, service level agreements (for data access/use) can also be embedded as metadata. By placing data on an appropriate storage medium according to business needs, data can be more effectively distributed across multiple resources, thus leading to improved storage utilization and reduced storage acquisition costs.

d. https://www.research.ibm.com/journal/sj/464/vayghan.pdf

Besides cost imperatives, compliance issues related to legislation, such as HIPAA, SOX and Basel II, determine how organizations must deal with the lifecycle of data, its retention and archival. Archive and backup are not synonymous. When a file is archived, it is usually deleted from the source and replaced with a metadata pointer that enables its retrieval from the archive; additionally, the archive is usually indexed. In contrast, a backup involves saving a large block of (snapshot) data on a secondary storage medium, which provides temporary protection of data.

Assessing Data Governance

To design data governance, we have presented an overall framework that provides a set of five data decision domains. By specifying data decision domains that are consonant with IT decision domains, we have also provided an overarching framework to align the IT assets with the data assets (see Table 1). IT infrastructure includes decisions that determine shared and enabling services and the capabilities to enable tracking, storing, analyzing, modeling and presenting data. As may be evident, the decisions related to IT governance are related to those for data governance; similarly, data governance decisions should be tightly integrated with those in IT governance. As such, defining common mechanisms across data and IT assets could induce improved performance. For example, the same committee that establishes the role of IT in business (IT principles) could be employed to clarify the role of data as an asset (data principles).

Table 2: Potential example of data governance matrix.

Locus of accountability | Data Principles | Data Quality | Metadata | Data Access | Data Lifecycle
Centralized             |        ✓        |              |          |             |
                        |                 |              |          |      ✓      |       ✓
                        |                 |              |    ✓     |             |
                        |                 |              |          |             |
Decentralized           |                 |      ✓       |          |             |

In designing data governance, the assignment of the locus of accountability for each decision domain will be somewhere on a continuum between centralized and decentralized [2]. Table 2 provides an example of what a data governance matrix, which includes the locus of data decision-making accountability for each of the five decision domains, could be for a given organization. For example, the decision rights for defining the organization’s data principles could be highly centralized within a group of corporate
executives who serve as data trustees. In contrast, decision rights for data quality may belong to business managers who are data owners in many different business units, and thus be highly decentralized. Decisions related to data access and data life cycle may be vested with an enterprise data architect and a data security officer, respectively, as the hub, with business unit participation, though not authority (such as data beneficiaries), as the spokes. Finally, the decision rights for the metadata domain may involve both data consumers and data modeling engineers, and a more balanced approach to responsibility and accountability; hence, it is modeled here as at the midpoint on the continuum.

Both structural and non-structural mechanisms [2] can be employed to implement the governance structure shown in Table 2. For example, a committee of business leaders may review and approve IT project requests and/or act as the governing body for developing and enforcing a set of data principles. For other decision domains that require collaboration across business unit and IS professionals, similar standing committee mechanisms can also be employed, as well as processes that help ensure consistent behaviors across multiple business and IS units. Corporate announcements and other central communications using Web-based portals could be the mechanisms employed to disseminate policy decisions and procedures, as well as to convey the organization’s data governance objectives. Finally, organizational incentives and reward systems could be designed to reinforce the value that the organization places on managing data as an organizational asset.

Conclusion

We have presented a data governance framework that can be used by practitioners to develop a data governance strategy and approach for managing data as an organizational asset. We have identified five decision domains, presented arguments for why each of these domains is important, described some key decisions to be made for each domain, and provided some examples of organizational positions that may be given accountability.

We also have proposed that differing levels of centralized, decentralized, and shared decision rights may be appropriate for different decision domains in the same organization. Similar to Weill and Ross [10], we also suggest that a “one page” design matrix (Table 2) may be useful for communicating a given organization’s data governance approach. The proposed framework also provides a common terminology that can be used by researchers to share their findings with other members of the IS community.

References
1. Ballou, D. P. and Pazer, H. L. Modeling data and process quality in multi-input, multi-output information systems. Management Science 31 (1985), 150-162.
2. Brown, C. V. Horizontal mechanisms under differing IS organization contexts. MIS Quarterly 23 (1999), 421-454.
3. Griffin, A. and Hauser, J. R. The voice of customer. Marketing Science 12 (1993), 1-27.
4. Levitin, A. V. and Redman, T. C. Data as resource: Properties, implications, and prescriptions. Sloan Management Review (1998), 89-101.
5. Olson, J. E. Data Quality: The Accuracy Dimension. Morgan Kaufmann, San Francisco, CA, 2003.
6. Pfleeger, C. P. and Pfleeger, S. L. Security in computing. Prentice Hall, Upper Saddle River, NJ, 2003.
7. Redman, T. C. The impact of poor data quality on the typical enterprise. Comm. ACM 41 (1998), 79-82.
8. Singh, G., Bharathi, S., Chervenak, A., Deelman, E., Kesselman, C., Manohar, M., Patil, S., and Pearlman, L. A metadata catalog service for data intensive applications. In Proceedings of the ACM/IEEE SC2003 Conference on High Performance Networking and Computing (Phoenix, AZ, 2003).
9. Wang, R. Y. and Strong, D. M. Beyond accuracy: What data quality means to data consumers. Journal of Management Information Systems 12 (1996), 5-34.
10. Weill, P. and Ross, J. W. IT governance: How top performers manage IT decision rights for superior results. Harvard Business School Press, Boston, MA, 2004.

Vijay Khatri (vkhatri@indiana.edu) is an associate professor at the Kelley School of Business of Indiana University, Bloomington, Indiana, USA.

Carol V. Brown (Carol.Brown@stevens.edu) is a distinguished professor at the Howe School of Technology Management of Stevens Institute of Technology, Hoboken, New Jersey, USA.

© 2010 ACM 0001-0782/10/0100 $10.00
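As an illustration of the Data Quality discussion, the following is a minimal sketch of how the accuracy, completeness and timeliness dimensions might be scored and then judged against the requirements of a particular end use. It is not taken from the article: the `PhysicianRecord` type, the function names, the sample data, and the numeric thresholds (including the 99% figure for a drug recall) are all hypothetical assumptions chosen to mirror the physician-contact example. Credibility is omitted because it depends on judgments about the source.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass
class PhysicianRecord:
    """Hypothetical contact record; None marks a value that was not recorded."""
    name: str
    phone: Optional[str]
    last_verified: date


def completeness(records: List[PhysicianRecord]) -> float:
    """Share of records whose requisite phone value is recorded (not missing)."""
    return sum(1 for r in records if r.phone is not None) / len(records)


def accuracy(records: List[PhysicianRecord], actual_phones: Dict[str, str]) -> float:
    """Share of records whose recorded phone conforms to the actual value."""
    correct = sum(
        1 for r in records
        if r.phone is not None and r.phone == actual_phones.get(r.name)
    )
    return correct / len(records)


def timeliness(records: List[PhysicianRecord], today: date, max_age_days: int) -> float:
    """Share of records verified recently enough for the task at hand."""
    fresh = sum(1 for r in records if (today - r.last_verified).days <= max_age_days)
    return fresh / len(records)


def fit_for_use(scores: Dict[str, float], required: Dict[str, float]) -> bool:
    """True only if every dimension meets the minimum set for this end use."""
    return all(scores[dim] >= minimum for dim, minimum in required.items())


# Illustrative thresholds (assumptions): one set of requirements per intended use.
MARKETING_USE = {"accuracy": 0.85}
DRUG_RECALL_USE = {"accuracy": 0.99, "completeness": 0.99}

# Twenty made-up records: one has no phone, two have an outdated phone.
records = [
    PhysicianRecord(f"Dr. {i}", None if i == 19 else f"555-01{i:02d}", date(2009, 12, 1))
    for i in range(20)
]
actual_phones = {f"Dr. {i}": f"555-01{i:02d}" for i in range(20)}
actual_phones["Dr. 0"] = "555-0190"
actual_phones["Dr. 1"] = "555-0191"

scores = {
    "completeness": completeness(records),
    "accuracy": accuracy(records, actual_phones),
    "timeliness": timeliness(records, date(2010, 1, 1), max_age_days=180),
}

print(scores)
print("marketing:", fit_for_use(scores, MARKETING_USE))
print("drug recall:", fit_for_use(scores, DRUG_RECALL_USE))
```

The same scores pass one set of requirements and fail the other, which is the sense in which the dimensions are relative to the end use. Deciding who sets the thresholds is the governance question; choosing the metrics themselves is management.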


Copyright of Communications of the ACM is the property of Association for Computing Machinery and its
content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's
express written permission. However, users may print, download, or email articles for individual use.
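The “one page” data governance matrix described in the section on assessing data governance can also be held as a small data structure. The sketch below is an assumption-laden illustration rather than the authors' method: the article describes a continuum between centralized and decentralized, and the five-point `Locus` scale, the names `validate` and `render`, and the use of "x" as a mark are hypothetical choices. The example placements follow the article's narrative (data principles centralized, data quality decentralized, metadata at the midpoint, data access and data lifecycle arranged as hub and spoke).

```python
from enum import IntEnum


class Locus(IntEnum):
    """Assumed five-point discretization of the centralized-decentralized continuum."""
    CENTRALIZED = 0
    MOSTLY_CENTRALIZED = 1
    MIDPOINT = 2
    MOSTLY_DECENTRALIZED = 3
    DECENTRALIZED = 4


DOMAINS = ["Data Principles", "Data Quality", "Metadata", "Data Access", "Data Lifecycle"]

# Placements follow the example discussed in the article's text.
example_matrix = {
    "Data Principles": Locus.CENTRALIZED,
    "Data Quality": Locus.DECENTRALIZED,
    "Metadata": Locus.MIDPOINT,
    "Data Access": Locus.MOSTLY_CENTRALIZED,
    "Data Lifecycle": Locus.MOSTLY_CENTRALIZED,
}


def validate(matrix):
    """Require exactly one locus of accountability for each of the five domains."""
    missing = [d for d in DOMAINS if d not in matrix]
    unknown = [d for d in matrix if d not in DOMAINS]
    if missing or unknown:
        raise ValueError(f"missing={missing}, unknown={unknown}")


def render(matrix):
    """Return the matrix as text: one row per locus, one column per domain."""
    validate(matrix)
    width = max(len(d) for d in DOMAINS) + 2
    label_width = max(len(locus.name) for locus in Locus) + 2
    lines = ["".ljust(label_width) + "".join(d.ljust(width) for d in DOMAINS)]
    for locus in Locus:
        cells = "".join(
            ("x" if matrix[d] == locus else "").ljust(width) for d in DOMAINS
        )
        lines.append(locus.name.ljust(label_width) + cells)
    return "\n".join(lines)


print(render(example_matrix))
```

Keeping the matrix as data makes the completeness check explicit: a design that leaves a decision domain without an assigned locus of accountability is rejected before it is communicated.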
