Data governance has always been important for minimizing risk, but never more so than now. Today, organizations of all sizes want to collect, integrate, access, and share volumes of data from diverse sources to drive data science, strategy development, process optimization, and daily operational decisions. The digital transformation of applications and processes is accelerating data growth and heightening its importance.

In TDWI research, 84 percent of organizations surveyed say that data governance is extremely important, with 13 percent calling it moderately important.1 Nearly all of those surveyed (94 percent) regard modernizing data governance as an opportunity to solve an array of problems in data collection, integration, and access. Regulatory adherence is often at the top of the list. Organizations must comply with regulations such as the European Union’s General Data Protection Regulation (GDPR), the California Consumer Privacy Act (CCPA), and similar laws. Additional industry-specific regulations demand governance to minimize data exposure risks.

1 Unless otherwise noted, survey results are from 2021 TDWI Best Practices Report: Modern Data Governance, online at tdwi.org/bpreports.

Organizations need comprehensive and up-to-date policies, rules, and enforcement practices to guide how people and applications collect, access, share, and protect sensitive data, including personally identifiable information (PII), to comply with regulations. Policies and rules are vital to business stakeholders’ understanding of their responsibilities for the data.

One of the keys to governance is data lineage documentation and tracking. With data lineage capabilities, organizations have visibility into data life cycles and the person responsible for the data from its initial collection through transformation, reporting, analytics (including data science), and external sharing. With data volumes growing, data lineage tracking and other important processes for data quality, profiling, and validation require techniques such as AI and machine learning (AI/ML) and automation to keep pace.

Although regulatory compliance is a major focus, it is not the only data governance concern. Many organizations see governance as providing a framework for improving enterprise-wide data quality, data findability, and standards for how individual users and teams transform, enrich, and use data in reports, dashboards, and data science. Governance can help organizations minimize risks by establishing policies and using technologies to improve data quality, reduce unnecessary data duplication, and improve efficiency and repeatability.

Data governance has never been easy. Organizations face a variety of challenges in making it effective, continuous, and appropriately restrictive. In addition to keeping up with changing data privacy regulations, these challenges include:

• Data growth and diversity, which require governance scalability and drive the need for automation
• Data silos that fragment data governance oversight
• Democratization of data and analytics, which puts pressure on organizations to expand data governance and balance those requirements with users’ interests in self-service functionality
• Maintaining data governance during cloud migration and in hybrid, multicloud environments
• New data governance risks that arise as data science grows and dashboards and applications are augmented with data-rich AI

This TDWI Checklist Report discusses six steps for minimizing risk through modern data governance. The report examines the role of automation for streamlining governance, supporting critical data stewardship, and enabling organizations to achieve continuous protection even as data grows, moves, and changes.

Minimizing risk with modern data governance:
1 Modernize data governance strategies, policies, and frameworks
2 Empower data stewards with autonomous governance capabilities
3 Modernize data catalogs to drive comprehensive and proactive governance
4 Centralize governance policies to overcome silos and ensure compliance
5 Aim for continuous governance via tight integration with workflows and curation
6 Reduce governance risks in cloud migration and hybrid multicloud environments
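
To make the data lineage idea from this introduction more concrete, the following is a minimal sketch of lineage tracking as a directed graph in which each asset records what it was derived from and who is accountable for it. It is an illustration only, not a description of any catalog product: the class name LineageGraph, its methods, and the asset and owner names are all hypothetical.

```python
from collections import defaultdict


class LineageGraph:
    """Toy lineage store: each edge records that one data asset feeds another."""

    def __init__(self):
        self._parents = defaultdict(set)  # asset -> direct upstream assets
        self._owners = {}                 # asset -> accountable person or team

    def add_asset(self, name, owner):
        self._owners[name] = owner

    def add_flow(self, source, target):
        """Record that `target` is derived from `source`."""
        self._parents[target].add(source)

    def upstream(self, name):
        """Return every asset that `name` depends on, directly or indirectly."""
        seen, stack = set(), [name]
        while stack:
            for parent in self._parents[stack.pop()]:
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    def accountable(self, name):
        """Map each upstream asset to its owner; None marks a documentation gap."""
        return {a: self._owners.get(a) for a in sorted(self.upstream(name))}


# Hypothetical assets, used only to exercise the sketch.
graph = LineageGraph()
graph.add_asset("crm_export", owner="sales-ops")
graph.add_asset("customer_table", owner="data-engineering")
graph.add_flow("crm_export", "customer_table")
graph.add_flow("web_signup_api", "customer_table")  # source with no recorded owner
graph.add_flow("customer_table", "churn_dashboard")

print(graph.accountable("churn_dashboard"))
```

In this sketch, an upstream source with no recorded owner is reported as None. That is the kind of missing lineage information a data steward would be asked to fill in.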
follow, more automated, and less intrusive. Finally, governance is not a “one-and-done” project; the committee must meet periodically to assess whether the current policies are up-to-date and effective for minimizing risk.

However, TDWI finds that many organizations struggle with data stewardship and regulatory compliance when it comes to data access and use; 40 percent of organizations surveyed say that trying to complement governance with data stewardship is one of their leading challenges.
data catalogs to scale up to handle large volumes of diverse and changing data and to manage tasks autonomously. Some enterprise data catalogs use AI/ML algorithms to parse large sets of data to extract metadata automatically. These algorithms can infer missing data, determine data types, and uncover relationships between data elements. This builds the catalog automatically under the guidance of data stewards and SMEs.

Modern enterprise data catalogs can play an active role in improving users’ collaborative experiences through proactive, AI/ML-driven recommendations and shared knowledge. With these advances, organizations can use the enterprise data catalog as the hub for autonomous data governance.

A modern enterprise data catalog aids governance in many ways. Here are three key areas:

Knowing what data you have and where it is. The first step in data governance is to inventory data assets. To adhere to data privacy regulations, organizations must locate and classify sensitive data such as PII; 43 percent of organizations surveyed by TDWI regard this step as a significant challenge. As data volumes rise, manually documenting the location of high-volume data in thousands of tables with hundreds of columns becomes daunting. Modern enterprise data catalogs can automate inventorying, tagging, and classifying, as well as data profiling and quality assessment. Through the training of ML models, modern catalogs can identify content patterns that indicate PII or other sensitive data that systems need to protect from unauthorized access.

Monitoring and tracking data lineage. Understanding, monitoring, and documenting data lineage (that is, where the data comes from, where it is now, who is accountable for it, and how it is being used) can draw on the strengths of a modern enterprise data catalog. Data lineage tracking is increasingly difficult because data elements might have come from an API, a data lake, a spreadsheet, a BI report, or a data warehouse. Modern enterprise data catalogs offer tools that automate discovery and documentation of data lineage, including noting missing information and using AI/ML to suggest the possible lineage. Some tools provide dashboards that enable data stewards and SMEs to visualize data lineage and track the data’s life cycle. Having smarter and more automated data lineage documentation and monitoring is critical to ensuring sensitive data protection and effective data governance as data grows and becomes more diverse.

Responding to governance audits and advising users of risks. With accurate inventories and data lineage information, enterprise data catalogs can help organizations maintain a complete audit trail of the data, which is essential for responding to governance audits. Organizations can use the catalog’s resources to develop and maintain reports showing compliance with data privacy regulations. Autonomous capabilities in modern data catalogs are also able to warn users when certain data sets house sensitive content. Automation is essential as the number of users and workloads increases.

4 Centralize governance policies to overcome silos and ensure compliance

Today’s diverse and distributed data environments present governance challenges that affect regulatory compliance as well as other governance objectives
such as improving data consistency, quality, and ease of use. Data silos and the resulting need to move data between systems can increase data integration and governance costs and create exposure risks.

In TDWI research, 38 percent of organizations surveyed cite distributed data silos as one of their biggest governance challenges.2 Each silo’s data owners often set up narrow governance policies; without continuous oversight, rules are applied inconsistently to workloads. Across the organization, governance policies become both redundant and conflicting. Narrow policies fall out of date when it becomes difficult for part-time data stewards to keep up with data changes manually and users add new data from different sources. This creates delays and complexity when users in finance or data science, for example, have projects that call for cross-domain access or single views of data held in silos.

Data silos are often due to established legacy systems still in production, including older data warehouses and data marts. However, the biggest driver of growth in data silos is the democratization of self-service data and analytics; 52 percent of organizations surveyed by TDWI say that this trend presents one of their biggest governance challenges.3 Many organizations do not have good visibility into data silos; they do not know what self-service users are doing with the data and how they may be sharing it.

These issues are behind the growing interest in holistic data governance, where most or all of the organization is governed at an enterprise layer above silos. Key objectives are to make data governance leaner, more consistent, scalable, and easier to maintain. Organizations seek to centralize policy management to reduce duplication and inconsistency; they want to monitor and audit the effectiveness of data governance across the enterprise.

An enterprise data catalog is important to holistic data governance because it provides a centralized source of knowledge about the data, its lineage, its location, and who is accountable for it. Modern enterprise data catalogs automate collection of metadata from distributed silos, including data lakes that contain diverse structured, semistructured, and unstructured data types. They can crawl metadata in both on-premises and cloud-based data systems. AI/ML in modern enterprise data catalogs can provide automated discovery of business glossary terms to improve classification of data and mapping of related metadata.

Along with enabling holistic data governance, an enterprise data catalog can make it easier to gain cross-domain visibility into the data and access to it to develop single views of all relevant data. AI/ML can enable automated discovery and surfacing of proactive recommendations to provide stewardship and accelerate users’ data exploration.

Self-service data visualization, analytics, and collaboration are vital to organizations, yet organizations have to balance self-service with data governance. Holistic, centralized data governance resources such as an enterprise data catalog make it easier to achieve the right balance and avoid the downsides of data silos. Data stewards will have a more complete resource for guiding users to trusted data. Autonomous data stewardship capabilities in enterprise data catalogs can embed automated constraints in applications that inform users about sensitive data so they avoid risks of misuse and governance exposure.
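
As a rough illustration of centralizing policy management, the sketch below gathers access rules written separately in each silo and flags the two problems this section names: redundant rules and conflicting rules. It is a simplified model under stated assumptions, not a vendor feature. The Rule fields, the review_rules function, and the silo, role, and data class names are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    silo: str        # system whose data owner wrote the rule
    data_class: str  # kind of data the rule covers, e.g. "pii"
    role: str        # user role the rule applies to
    action: str      # "allow" or "deny"


def review_rules(rules):
    """Group rules by (data_class, role) and flag overlaps across silos."""
    grouped = defaultdict(list)
    for rule in rules:
        grouped[(rule.data_class, rule.role)].append(rule)

    redundant, conflicting = {}, {}
    for key, group in grouped.items():
        if len(group) < 2:
            continue  # only one silo has a rule here, nothing to reconcile
        actions = {rule.action for rule in group}
        silos = sorted(rule.silo for rule in group)
        if len(actions) == 1:
            redundant[key] = silos    # same decision written more than once
        else:
            conflicting[key] = silos  # silos disagree; a steward must decide
    return redundant, conflicting


# Hypothetical rules collected from three silos.
silo_rules = [
    Rule("warehouse", "pii", "analyst", "deny"),
    Rule("data_lake", "pii", "analyst", "allow"),
    Rule("warehouse", "finance", "controller", "allow"),
    Rule("data_mart", "finance", "controller", "allow"),
    Rule("data_lake", "clickstream", "analyst", "allow"),
]

redundant, conflicting = review_rules(silo_rules)
print("redundant:", redundant)
print("conflicting:", conflicting)
```

A real deployment would likely need finer-grained matching (column-level scopes, purposes, regions), but even this coarse grouping shows why a single enterprise-level view makes duplication and inconsistency visible.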
2 2020 TDWI Best Practices Report: Evolving from Traditional Business Intelligence to Modern Business Analytics, online at tdwi.org/bpreports.
3 Ibid.
4 2021 TDWI Best Practices Report: Modernizing Data and Information Integration for Business Innovation, online at tdwi.org/bpreports.
5 2020 TDWI Best Practices Report: Evolving from Traditional Business Intelligence to Modern Business Analytics, online at tdwi.org/bpreports.
6 2021 TDWI Best Practices Report: Modernizing Data and Information Integration for Business Innovation, online at tdwi.org/bpreports.
Product and company names mentioned herein may be trademarks and/or registered trademarks of their respective companies. Inclusion of a vendor, product, or service in TDWI research does not constitute an endorsement by TDWI or its management. Sponsorship of a publication should not be construed as an endorsement of the sponsor organization or validation of its claims.

E info@tdwi.org
tdwi.org