
BEST PRACTICES REPORT Q3 2023

Achieving Scalable, Agile, and Comprehensive Data Management and Data Governance

By David Stodder

Research Sponsors
Alation
Dataiku
Denodo
erwin by Quest
Hitachi Vantara
SAP
Snowflake

Table of Contents

Executive Summary
Modernizing the Data Foundation for Business Insights
Managing and Governing Data for Business Objectives
Modernization Drivers
Satisfying Analytics and AI/ML Requirements
Data Governance for Protecting Data and Driving Value
Meeting Compliance and Data Protection Priorities
Data Governance for Improving Data Trust
Top Data Governance Challenges
Data Management for Expanding Opportunities
Cloud Data Migration and Management
Data Platforms and Patterns: Current and Planned Usage
Data Management Modernization Priorities
Data Integration: Evolving to Meet New Challenges
Objectives for Improving Data Integrations
Data Catalogs: Current Satisfaction and Where to Expand
Distributed Data: Challenges and Opportunities
Data Fabric and Data Mesh Architectures
DataOps for Planning and Scalability
Data Objectives for Optimizing Analytics and AI/ML
Recommendations
Research Sponsors

© 2023 by TDWI, a division of 1105 Media, Inc. All rights reserved. Reproductions in whole or in part are prohibited except by written permission. Email requests or feedback to info@tdwi.org.

Product and company names mentioned herein may be trademarks and/or registered trademarks of their respective companies. Inclusion of a vendor, product, or service in TDWI research does not constitute an endorsement by TDWI or its management. Sponsorship of a publication should not be construed as an endorsement of the sponsor organization or validation of its claims.

This report is based on independent research and represents TDWI’s findings; reader experience may differ. The information contained in this report was obtained from sources believed to be reliable at the time of publication. Features and specifications can and do change frequently; readers are encouraged to visit vendor websites for updated information. TDWI shall not be liable for any omissions or errors in the information in this report.

TDWI RESEARCH 1 tdwi.org


Achieving Scalable, Agile, and Comprehensive Data Management and Data Governance

About the Author

DAVID STODDER is senior director of TDWI Research for business intelligence. He focuses on providing research-based insights and best practices for organizations implementing BI, analytics, data discovery, data visualization, performance management, and related technologies and methods and has been a thought leader in the field for over two decades. Previously, he headed up his own independent firm and served as vice president and research director with Ventana Research. He was the founding chief editor of Intelligent Enterprise where he also served as editorial director for nine years. You can reach him by email (dstodder@tdwi.org), on Twitter, and on LinkedIn.

About TDWI Research

TDWI, a division of 1105 Media, Inc., is the premier provider of in-depth, high-quality education and research in the business intelligence and data warehousing industry. TDWI is dedicated to educating business and information technology professionals about the best practices, strategies, techniques, and tools required to successfully design, build, maintain, and enhance business intelligence and data warehousing solutions. TDWI also fosters the advancement of business intelligence and data warehousing research and contributes to knowledge transfer and the professional development of its members. TDWI offers a worldwide membership program, educational conferences, topical educational seminars, role-based training, onsite courses, certification, solution provider partnerships, an awards program for best practices, live webinars, resource-filled publications, an in-depth research program, and a comprehensive website: tdwi.org.

About the TDWI Best Practices Reports Series

This series is designed to educate technical and business professionals about new data and analytics technologies, concepts, or approaches that address a significant problem or issue. Research for the reports is conducted via interviews with industry experts and leading-edge user companies, and it is supplemented by surveys of data professionals.

To support the program, TDWI seeks vendors that collectively wish to evangelize a new approach to solving data and analytics problems or an emerging technology discipline. By banding together, sponsors can validate a new market niche and educate organizations about alternative solutions to critical business intelligence issues. To suggest a topic that meets these requirements, please contact TDWI Senior Research Directors David Stodder (dstodder@tdwi.org), James Kobielus (jkobielus@tdwi.org), or Fern Halper (fhalper@tdwi.org).

Acknowledgments

TDWI would like to thank many people who contributed to this report. First, we appreciate the many users who responded to our survey, especially those who agreed to our requests for phone interviews. Second, our report sponsors, who diligently reviewed outlines, survey questions, and report drafts. Finally, we would like to recognize TDWI’s production team: James Powell, Lindsay Stares, Pete Considine, Rod Gosser, and John Bardell.

Sponsors

Alation, Denodo, Dataiku, erwin by Quest, Hitachi Vantara, SAP, and Snowflake sponsored the research and writing of this report.

Report purpose. Data demands are great as organizations deploy analytics, artificial intelligence and machine learning (AI/ML), and data-rich applications. Data democratization is multiplying use cases. Organizations face data governance challenges to protect sensitive data and improve trust in data quality. Data management and integration require scalability, speed, and agility. This TDWI Best Practices Report examines priorities and discusses how to overcome challenges to achieve value.

Survey methodology. In June and July 2023, TDWI sent invitations via email to business and IT professionals in our database, asking them to participate in an internet-based survey. The invitation was also posted online and in publications from TDWI and other firms. The survey collected responses from 209 respondents, with 158 completing every question. For this research, all responses are valuable and are included in this report’s sample. This explains why the number of respondents varies per question.

Research methods. In addition to the survey, TDWI conducted interviews with business and IT executives and managers, application developers, and data management and analytics experts. TDWI also received briefings from vendors that offer products and services related to the topics addressed in this report.

Survey demographics. Nearly two in five respondents are business or IT executives and VPs (39%). The second-largest group consists of business or data analysts and data scientists (26%). Third largest is developers and data, application, or enterprise architects (18%). Line-of-business (LOB) managers and business sponsors account for 8% of the respondent population. Other IT staff, consultants, and other titles account for 9% of the total.

Industries vary considerably. Computer/network manufacturing is the largest (11%), followed by financial services (10%), software manufacturing and publishing (9%), data services (8%), construction/engineering and government (8% each), consulting/professional services and education (6% each), healthcare (5%), and insurance (4%). Just under half of respondents reside in the U.S. (48%), with South Asia (primarily India and Pakistan) (25%), Asia and Pacific Islands (10%), Europe (6%), Canada (4%), and other regions following. Respondents come from enterprises of all sizes.

Position
Business/IT exec/VP 39%
Business, data analyst/scientist 26%
Developers/architects 18%
LOB managers/sponsors 8%
IT-other; consultants 7%
Other 2%

Industry
Computer/network manufacturing 11%
Financial services 10%
Software manufacturer/publisher 9%
Data service provider 8%
Construction/engineering 8%
Government 8%
Consulting/professional services 6%
Education 6%
Healthcare 5%
Insurance 4%
Manufacturing (non-computer) 4%
Other 21%
(“Other” consists of multiple industries, each represented by less than 4% of respondents.)

Geography
United States 48%
South Asia (India, Pakistan) 25%
Asia/Pacific Islands 10%
Europe 6%
Canada 4%
Africa 3%
Central/South America 3%
Australia/New Zealand 2%
Middle East 1%

Company Size by Revenue
$10 billion or more 15%
$1 billion to $9.99 billion 18%
$500 million to $999 million 9%
$100 million to $499 million 20%
$50 million to $99 million 9%
$20 million to $49 million 8%
Less than $20 million 8%
Don’t know or unable to disclose 13%

Number of Employees
100,000 or more 8%
10,000 to 99,999 18%
1,000 to 9,999 41%
100 to 999 25%
Fewer than 100 7%
Don’t know 1%
Based on 209 respondents.
Executive Summary

The data explosion continues to accelerate across distributed landscapes with data on premises and on multiple cloud data platforms. Organizations face challenges as well as tremendous potential for increasing the value of data assets, including through data monetization, but that potential can go untapped without good data management and governance. Most organizations have a democratized spectrum of users, creating ever-greater demand for data inside and outside their organizations. This drives the need for diverse types of data and different levels of data timeliness, as well as differing requirements for data governance, integration, and quality.

This TDWI Best Practices Report focuses on understanding current challenges and providing best practices insights for modernizing processes and deploying technologies to solve them. Analytics workloads augmented with AI/ML are critical to competing in every industry. Data-driven business initiatives depend on scalable, agile, and comprehensive data management and governance. The latest applications embed sophisticated analytics using AI/ML capabilities that must be provisioned with continuous, integrated, curated data to deliver insights to all users. Flexibility is key to keeping pace with business demand and unanticipated events.

Organizations are advancing with AI/ML through easier, automated capabilities such as autoML. Some are investigating large language models (LLMs) and generative AI. To move forward, organizations need to modernize data management, integration, and governance and align investments with evolving business requirements. Legacy technologies and practices often force data scientists, data and business analysts, and business users to spend too much time acquiring, integrating, and preparing data. We discuss how AI-infused automation in data integration and preparation processes is maturing to enable users to focus more time on solving business challenges and achieving data-driven innovation.

Our research finds that most organizations have only isolated success in managing and governing data to meet objectives. Accelerating growth in data volume, workloads, and users across distributed and disparate data landscapes generates pressure that can lead to chaos and higher costs. Our report discusses how organizations can improve the balance between enterprise data governance and the agility required for self-service user empowerment.

Limited data access is a problem when organizations are trying to develop new insights about concerns such as customer behavior, supply chains, public health, operational cost drivers, and business performance. Distributed data dispersed across silos is a major challenge to gaining complete views of all data about these concerns. It also presents challenges to holistic data governance and management. Research in this report shows many enterprises now have experience with or plans for bridging distributed data through data virtualization, data mesh, and data fabric architectures or consolidating disparate data into a unified cloud data platform.

All data architectures today rely on technology modernization to capture and manage metadata and other knowledge about all the data, including data lineage. Organizations are expanding use of data catalogs and additional data intelligence and semantic layer systems. This report discusses current satisfaction with data catalogs, business glossaries, and metadata management systems and where organizations need to improve to increase satisfaction.

The report concludes with a discussion of how modern technologies and practices are coming together to create unified data environments. It discusses the importance of making this unity flexible rather than restrictive. Finding the right balance enables organizations to empower teams to maximize the value of enterprise data assets. We close the report with 10 recommended best practices for success.

Modernizing the Data Foundation for Business Insights

Opportunities to drive higher value from data and analytics can be squandered if organizations lack scalable, agile, and comprehensive data management and governance. Data democratization is accelerating demands in organizations of all sizes and across all industries. Users require trusted data for strategic forecasting, risk assessment, and customer engagement as well as to drive daily decisions. Data fuels simple operational dashboards as well as complex analytics projects augmented by AI/ML. Automated decisions in data-driven applications embedded with analytics and AI/ML require continuous access to trusted data for real-time response to customer behavior or business events and for pattern detection.

Data management, data integration, and data governance technologies and practices together form the foundation for achieving ambitious objectives with data. Organizations cannot advance if they stand pat with inadequate legacy tools and platforms when data environments are constantly changing. Agility is essential for responding to often unanticipated changes in business requirements and conditions. Organizations need to continuously evolve data management and data governance to overcome obstacles that prevent the right people and applications from accessing the right data at the right time.

This TDWI Best Practices Report focuses on trends, challenges, and opportunities to modernize data management, data integration, and data governance technologies and practices. Updating data platforms, distributed data integration layers, and tools for data management and data governance is essential for supporting growth in the use of data and increasing efficiency and satisfaction. This report will examine current experiences and practices and discuss how to address evolving challenges.

How successful are organizations currently in managing and governing data, especially to support analytics and AI/ML? The prevailing answer, according to our research, is “it depends.” Most organizations we surveyed experience inconsistency; 38% say they have isolated areas of success and agility, but capabilities across their organizations are uneven (figure not shown). Only 14% say that they are highly successful and
confident in how they manage and govern data today and can respond to change. Twice that percentage (28%) indicate that although they regard their organizations’ current data management and data governance as successful, they anticipate challenges in responding to change.

Fortunately, only a small percentage of respondents indicate a total lack of confidence; just 8% agree with the statement that they are “struggling to meet current requirements and are not well prepared for change.” However, a slightly larger share in our research (10%) say that they are just getting started with managing and governing data for more than one project, application, or team.

Inexperience is often the case with small and midsize businesses (SMBs). As they grow, SMBs often bump up against the limitations of inexpensive and readily available tools such as spreadsheets; hand-coded, point-to-point data pipelines; and siloed data platforms. As data grows in volume and variety and more users attempt complex analytics, these users spend too much time on ingesting, extracting, and preparing data. Redundant or inconsistent data pipelines and transformation processes increase costs and lead to unsatisfactory performance. Decision makers are reluctant to trust the resulting data findings. Risks of sensitive data exposure are everywhere, demanding more holistic data governance to at least ensure adherence to data privacy regulations.

Managing and Governing Data for Business Objectives

When traditional IT systems and practices are isolated from business functions, it is not always clear how data management and governance relate to the achievement of business objectives. They can seem like fixed services; organizations that do not consider themselves “in the IT business” are mostly focused on reducing IT costs. However, business-driven adoption of cloud computing and trends toward funding some or all data management through operations rather than as standard capital expenditures have put increased emphasis on aligning resource utilization with business operations and objectives.

Usage-based service pricing models offered today by cloud data platform, application, and tool providers make it easier to see the relationship between resource consumption and business activity. With these models, business functions can understand how business activities drive consumption of storage, processing, data management, and data integration services. With holistic data observability monitoring, organizations have visibility into how bottlenecks slowing data access (such as having to wait for high-volume data movement or slow data transformation processes) ultimately affect operational performance, customer satisfaction, and the ability to achieve business objectives.

Cloud service monitoring tools play an essential role in tracking usage so companies can avoid paying for unused or underutilized resources. Organizations can deploy metrics and data observability capabilities, which will be discussed in more detail later, to understand usage patterns
and to know, for example, which data sets, reports, dashboards, analytics models, data warehouses, and data applications are delivering the most value.

Monitoring and metrics enable organizations to analyze usage data to determine where modernization, such as greater automation, would be most beneficial. Monitoring visibility also helps to evaluate whether resources are being devoted to unimportant projects or where your organization needs to improve access to underutilized data assets. With analytics augmented by AI/ML (including generative AI and LLMs) rising in importance, organizations must ensure that they have resources and governance practices in place as they scale up in numbers of users, workloads, and data volumes.

Understanding how investments contribute to business objectives is critical. To drill down deeper into the contribution of data management and governance investments to business success, we asked research participants to indicate how successfully their organizations are managing and governing data and realizing value through analytics, including AI/ML, to achieve objectives across business functions and initiatives. In Figure 1, we can see that organizations surveyed are most successful at meeting objectives important to regulatory compliance and protecting data; 30% are very successful and 43% are somewhat successful. These results show the importance organizations place on succeeding with core data governance priorities for regulatory compliance, often led by enterprise IT.

Organizations indicate the most success in three additional areas:

• Customer experience. Big data analytics is revolutionizing how organizations personalize marketing offers and shape customer experiences. The volume of behavioral data generated by customer activity across channels can give marketers a tremendous resource for gaining a detailed view of customers’ historical and real-time behavior, which organizations can analyze to generate predictive insights into how customers will respond to future offers. Figure 1 shows that 28% are very successful and 43% are somewhat successful with data management and governance for improving customer experiences.

• Customer service and support. Integrating data to gain 360-degree views of customers is valuable for all customer-centric objectives, including excelling in service and support. Valuable data often includes semi- and unstructured customer service records, call center interaction records, customer satisfaction survey data, and information generated by online behavior and engagement. Figure 1 shows that 22% are very successful and 46% are somewhat successful with data management and governance for improving customer service and support.

• Financial planning, budgeting, and forecasting. Organizations are reasonably satisfied with their data management and governance for these activities; 22% are very successful and 44% somewhat successful. Users depend on data management and governance to provision trusted, high-quality data and enable timely single views drawn from multiple sources for accurate snapshots and forecasting decisions. Agility is an important attribute for organizations to update plans and budgets as conditions change.

Figure 1. To achieve business objectives in the following functions and initiatives, how successful is your organization in managing and governing data and realizing value through analytics, including AI/ML?

Percentages for each function appear in the chart's left-to-right order: very successful, somewhat successful, somewhat unsuccessful, not at all successful, and a fifth response category whose label is not shown.

Regulatory compliance and protecting data: 30%, 43%, 17%, 5%, 5%
Customer experience with your company: 28%, 43%, 13%, 6%, 10%
Customer service and support: 22%, 46%, 16%, 8%, 8%
Financial planning, budgeting, and forecasting: 22%, 44%, 19%, 7%, 8%
Operational and LOB decisions by people: 19%, 47%, 18%, 8%, 8%
Marketing and sales to customers and partners: 21%, 44%, 16%, 4%, 15%
Fraud, abuse, and cybercrime detection: 22%, 42%, 15%, 6%, 15%
Collaboration with business partners and network: 17%, 45%, 19%, 8%, 11%
Risk management: 23%, 37%, 21%, 8%, 11%
Strategic decisions and new business development: 16%, 42%, 25%, 9%, 8%
Systems and equipment diagnostics and maintenance: 19%, 37%, 26%, 6%, 12%
Automated decisions and embedded analytics: 17%, 37%, 22%, 14%, 10%
Supply chain management and manufacturing: 16%, 36%, 17%, 5%, 26%
Resource and/or energy consumption: 16%, 34%, 19%, 6%, 25%

Based on answers from 209 respondents. Ordered by combined “very successful” and “somewhat successful” responses.
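
As a quick arithmetic check on how Figure 1 is ordered, the following Python sketch adds the “very successful” and “somewhat successful” percentages for each function and confirms that the combined totals never increase down the list. It is illustrative only and not part of TDWI's methodology; the names `figure_1`, `combined_success`, `scores`, and `is_ordered` are assumptions introduced here, and the numbers are transcribed from the figure.

```python
# (label, very successful %, somewhat successful %), transcribed from Figure 1.
figure_1 = [
    ("Regulatory compliance and protecting data", 30, 43),
    ("Customer experience with your company", 28, 43),
    ("Customer service and support", 22, 46),
    ("Financial planning, budgeting, and forecasting", 22, 44),
    ("Operational and LOB decisions by people", 19, 47),
    ("Marketing and sales to customers and partners", 21, 44),
    ("Fraud, abuse, and cybercrime detection", 22, 42),
    ("Collaboration with business partners and network", 17, 45),
    ("Risk management", 23, 37),
    ("Strategic decisions and new business development", 16, 42),
    ("Systems and equipment diagnostics and maintenance", 19, 37),
    ("Automated decisions and embedded analytics", 17, 37),
    ("Supply chain management and manufacturing", 16, 36),
    ("Resource and/or energy consumption", 16, 34),
]


def combined_success(rows):
    """Return (label, very + somewhat) pairs, preserving the input order."""
    return [(label, very + somewhat) for label, very, somewhat in rows]


scores = combined_success(figure_1)
for label, score in scores:
    print(f"{score:>3}%  {label}")

# The figure's stated ordering implies the combined score never increases.
is_ordered = all(a[1] >= b[1] for a, b in zip(scores, scores[1:]))
print("Ordered by combined success:", is_ordered)
```

Under this transcription, the combined totals range from 73% for regulatory compliance and protecting data down to 50% for resource and/or energy consumption.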

When we combine “somewhat unsuccessful” and “not at all successful” responses, Figure 1 shows that organizations are less successful in these three areas:

• Automated decisions and embedded analytics. Cutting-edge applications embed analytics to detect events, anomalies, and patterns and provide users with recommendations for action. Some use analytics to drive automated decisions. The research indicates that many organizations are not yet fully satisfied with data management and governance for supporting these capabilities, with 22% somewhat unsuccessful and 14% not at all successful.

• Strategic decisions and new business development. When organizations are undergoing significant, sometimes unexpected changes in markets, business networks, and economic environments, they need agile data management and governance. The research indicates some frustration with support for strategic decision-making and new business development, with 25% somewhat unsuccessful and 9% not at all successful.

• Systems and equipment diagnostics and maintenance. Analytics augmented with AI/ML is revolutionizing these functions, enabling organizations to develop predictive and prescriptive algorithms to know when machinery is likely to break down. They can prepare maintenance and repair based on data rather than traditional schedules. Data management, integration, and governance are critical to gathering and preparing data streamed from end points and enabling integrated views of historical and real-time data. About one quarter (26%) are somewhat unsuccessful in supporting these advanced capabilities and 6% are not at all successful.

Modernization Drivers

Organizations are updating existing data platforms or migrating to new ones and using modern data integration, data management, and governance practices and systems to achieve greater success with the objectives shown in Figure 1. To gain insight into current modernization priorities, we asked research participants to identify their organizations’ most significant drivers (see Figure 2).

The top two choices demonstrate the importance of addressing data governance as part of modernization. We will discuss specific data governance issues in depth in the next section, but here we gain a broader view of the relative importance of various drivers and priorities. Half of those surveyed (50%) say that improving trust in data quality, accuracy, and completeness is one of their most significant drivers. This issue is the focus of what many in the industry call “offensive” data governance. Protecting sensitive data and preventing unauthorized use and sharing, regarded as “defensive” data governance, ranks second, with 38% identifying it as one of their top three drivers. Just under one-third say that complying with regulations and enabling regulatory reporting is a significant driver.

Both offensive and defensive data governance are essential to maximizing the value of all data, whether structured, semistructured, or unstructured. Figure 2 shows that this ranks third among drivers with 37% choosing it as one of their top three. This selection also indicates the importance of collecting and curating diverse data for analytics and AI/ML development. Data lakes, unified data lakehouse platforms that integrate data warehouses and data lakes, and NoSQL systems have become critical technologies for managing diverse data. Many organizations are focusing on addressing distributed data challenges; eliminating data silos and unifying them in a single platform ranks fifth, with 32% saying it is one of their top drivers.

Analytics and AI/ML development and operationalization are significant drivers. Ranked fourth by participants is the usage of analytics for operational efficiency and effectiveness. Organizations want to enable managers in business functions and lines of business (LOBs) to examine different perspectives on data used in performance metrics and dashboards to determine, for example, the best ways to reduce unnecessary costs and delays.

Figure 2. What are the most significant drivers behind your organization’s current efforts and plans to modernize data management and governance?

Improving trust in data quality, accuracy, and completeness 50%
Protecting sensitive data and preventing unauthorized use and sharing 38%
Maximizing the value of all data (structured, semi-, and unstructured) 37%
Using analytics for operational efficiency and effectiveness 34%
Eliminating data silos and unifying data in a single platform 32%
Complying with regulations; regulatory reporting 32%
Enabling access and governance across a distributed data environment 28%
Managing and orchestrating data pipelines, ETL, and data connectivity 26%
Empowering users to find and analyze trusted data on their own 26%
Developing, testing, and deploying analytics and AI/ML 25%
Embedding analytics and AI/ML in applications 14%
Reducing data management and governance costs 14%

Based on answers from 196 respondents, who were asked to select at least their top three objectives.
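
Because respondents to the Figure 2 question could choose several drivers, the percentages add up to well over 100%. The short Python sketch below, offered only as an illustration with numbers transcribed from the figure, sums the percentages and divides by 100 to estimate the average number of drivers selected per respondent. The names `figure_2`, `total_pct`, and `avg_selections` are assumptions introduced here, and because the published percentages are rounded, the result is approximate.

```python
# Share of respondents (in %) naming each driver, transcribed from Figure 2.
figure_2 = {
    "Improving trust in data quality, accuracy, and completeness": 50,
    "Protecting sensitive data and preventing unauthorized use and sharing": 38,
    "Maximizing the value of all data": 37,
    "Using analytics for operational efficiency and effectiveness": 34,
    "Eliminating data silos and unifying data in a single platform": 32,
    "Complying with regulations; regulatory reporting": 32,
    "Enabling access and governance across a distributed data environment": 28,
    "Managing and orchestrating data pipelines, ETL, and data connectivity": 26,
    "Empowering users to find and analyze trusted data on their own": 26,
    "Developing, testing, and deploying analytics and AI/ML": 25,
    "Embedding analytics and AI/ML in applications": 14,
    "Reducing data management and governance costs": 14,
}

# In a multi-select question, each selection adds to exactly one driver's count,
# so the sum of the percentages equals 100 times the mean selections per person.
total_pct = sum(figure_2.values())
avg_selections = total_pct / 100

print(f"Sum of percentages: {total_pct}%")
print(f"Approximate drivers selected per respondent: {avg_selections:.2f}")
```

The estimate of roughly 3.6 selections per respondent is consistent with the instruction to select at least three drivers.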

Nearly half of research participants (48%, not shown) who indicated earlier that their organizations are highly successful and confident in how they manage and govern data say that using analytics for operational efficiency and effectiveness is one of their most significant drivers. Among this selection of research participants, 72% say that developing, testing, and deploying analytics and AI/ML is a top driver (compared to 25% of all respondents saying this driver is significant).

Satisfying Analytics and AI/ML Requirements

The importance of analytics and AI/ML demands that organizations develop a comprehensive strategy for addressing projects’ data management, integration, and governance requirements. Uncoordinated, piecemeal approaches tend to increase challenges as well as costs. However, rather than constrain analytics, data science, and data engineering teams with too much centralized control, many organizations are seeking to empower teams so they have the ability and independence to develop sharable data products and analytics that meet their domain’s business objectives. Later, this report will discuss strategies such as the data mesh for finding the right balance between centralized governance and self-service empowerment.

Because data management and data governance are part of the foundation, addressing their pain points is vital to moving forward with analytics and AI/ML. We asked research participants for their data management and governance improvement priorities specifically for analytics and AI/ML. In Figure 3, we can see that reducing time and costs associated with data collection and preparation is the highest priority for the largest percentage of respondents (43%). Increasing data availability for AI model development, training, and testing is a close second (40%).

Nearly one-third say that making it easier for users to operationalize data pipelines and transformation is a top priority (32%). A significant percentage prioritize increasing automation in data pipelines and AI/ML workflows (29%). These and additional results in Figure 3 suggest that data problems are hindering analytics and AI/ML projects and solving them could accelerate value.

Strong interest exists in extending data governance to AI model governance. In Figure 3, nearly one-third say that monitoring models continuously for accuracy, data drift, and performance is a top priority (30%). Organizations need such visibility to detect data changes after an AI model has been deployed so they can understand the reasons for any poor model performance. Nearly as many prioritize defensive model governance through prevention of unauthorized data exposure in models and processes (29%).

Explainability is often considered a key element of AI model governance; 29% say they prioritize better transparency into data processing, including for explainability. Regulatory compliance is a major driver behind explainability, which is the ability to explain business decisions that are based on a model, such as denial of credit to a customer. In addition to regulatory compliance, however, organizations also need explainability to improve decision-makers’ trust in AI models and to know how to tune model performance for better results.

Additionally, nearly one-quarter
say that provisioning data for feature engineering
is a major focus (22%).
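
The continuous monitoring for data drift described above can be illustrated with a short
sketch. The following Python example is not part of the TDWI research; it assumes a single
numeric feature, equal-width bins derived from the training sample, and the population
stability index (PSI) as the drift measure. The function names, the bin count, and the 0.2
alert threshold are illustrative assumptions, not a recommended standard.

```python
import math
from typing import List, Sequence


def psi(expected: Sequence[float], actual: Sequence[float], bins: int = 10) -> float:
    """Population stability index between two samples of one numeric feature."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the feature is constant

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            # Clamp so values outside the training range land in the edge bins.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # A small floor avoids log(0) when a bin is empty.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def drift_alert(score: float, threshold: float = 0.2) -> bool:
    """Flag drift when the PSI exceeds an assumed, illustrative threshold."""
    return score > threshold


# Hypothetical samples: data seen at training time vs. data seen after deployment.
training_sample = [i / 100 for i in range(100)]
production_sample = [0.5 + i / 200 for i in range(100)]

score = psi(training_sample, production_sample)
print(f"PSI = {score:.3f}, alert = {drift_alert(score)}")
```

In practice, a team would likely compute such a score on a schedule for each important
feature and route alerts to the model owner; which measure and threshold are appropriate
depends on the model and the data.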


Figure 3. To improve data management and governance for analytics and AI/ML specifically,
which of the following objectives are your organization’s highest priority?
Based on answers from 184 respondents, who were asked to select at least their top three
objectives.

Reduce time and costs associated with data collection and preparation: 43%
Increase data availability for model development, training, and testing: 40%
Make it easier for users to operationalize data pipelines and transformation: 32%
Monitor models continuously for accuracy, data drift, and performance: 30%
Prevent unauthorized data exposure in models and processes: 29%
Increase transparency into data processes, including for explainability: 29%
Increase automation of data pipelines and AI/ML workflows: 29%
Improve data sharing from external sources to complement internal data: 26%
Provision data for feature engineering: 22%
Develop knowledge graphs and semantic layers for analytics and AI/ML: 22%
Manage data for large language models (LLMs) and generative AI: 9%

Data Governance for Protecting Data and Driving Value

Modern data governance fuses two critical sets of objectives. As noted earlier, the defensive
objectives focus on articulating and implementing rules and policies for protecting sensitive
data such as customers’ personally identifiable information (PII) from unauthorized access
and sharing. Organizations need to comply with legal regulations as well as industry and
internal policies; they need the ability to respond to audits to demonstrate compliance.
Users must know and adhere to rules and policies as they access and interact with data.

Offensive data governance objectives aim at creating business value through increased
organizational understanding and trust in the data and its fitness for each purpose. Users
need to be able to easily discover available data and be confident in the data they choose
to consume and share in reports, metrics, dashboards, and notifications. If people lose trust
in the data (for example, by depending on a chart that turns out to contain inaccurate or
incomplete data), they will be reluctant to turn to data again. Analytics models built and
tested with poor and incomplete data can lead to bad business decisions and unsatisfactory
customer interactions.

Thus, improving the value of data assets through a data curation process focused on
discoverability, data quality, accuracy, consistency, and completeness is vital. Some
organizations integrate offensive data governance with programs for improving user data
literacy. Data literacy addresses primarily human aspects of how people interact with data.
The primary goal is to raise individuals’ proficiency with understanding what data means and
their ability to communicate and share analytics insights.

Overall, most research participants have a positive view of how their organizations are
achieving defensive and offensive data governance objectives. More than half (54%) regard
their organizations as somewhat successful and 19% say they are very successful (see
Figure 4). The percentages are consistent across all sizes of organizations surveyed.
Demonstrating the importance of data governance to growth in analytics and AI/ML, 43% of
respondents who say their organization is very successful indicate that developing, testing,
and deploying analytics and AI/ML is their top data management and governance modernization
driver (not shown).

Meeting Compliance and Data Protection Priorities

As organizations grow more reliant on data for decisions and application processes, data
volumes and the number and complexity of data-intensive workloads make both defensive and
offensive data governance more challenging. Looking first at defensive data governance,
Figure 5 offers a fuller view of where organizations are most and least successful in
addressing regulatory compliance and sensitive data protection priorities.

Here are areas where organizations are most successful:

• Complying with data privacy and other industry regulations. The imperative of acting in
compliance with regulations and avoiding penalties understandably gets the most attention.
Nearly one-third say they are very successful (30%) and more than half are somewhat
successful (55%).

• Controlling access to sensitive data such as PII. Access controls supported by tools and
up-to-date policies are critical to protecting sensitive data. Distributed data landscapes
such as hybrid multicloud environments are a challenge for controlling access; some
organizations use a data fabric or data
virtualization layer to create a central point of control. Organizations show measured
success in controlling access, with 30% very successful and 50% somewhat successful.

• Setting data governance rules and policies and keeping them up to date. Data environments
are constantly changing as organizations collect new data and democratize access. This makes
it critical to review rules and policies and ensure they remain accurate and relevant. In our
research, nearly three-quarters of organizations (73%) are successful to some degree, with
31% very successful.

• Training users in governance and responsible data use. Training users so they know their
responsibilities and accountability for compliance is important. In Figure 5, we can see that
most organizations surveyed regard themselves as successful in training users (64%), although
just 21% say they are very successful and 30% are either somewhat unsuccessful or not at all
successful, indicating room for improvement.

• Ensuring data governance in self-service BI and analytics. As organizations democratize
data access and analytics, balancing users’ needs with data governance priorities becomes
challenging. Training to ensure users understand guardrails, establishing data stewardship
roles for guidance and mentoring, and using modern tools that can automate and embed data
governance constraints are all critical to success. Almost two-thirds of respondents indicate
success (64%). However, only 20% say their organizations are very successful and 30% say they
are either somewhat unsuccessful or not at all successful. About one-fifth (21%) say finding
the right balance between self-service data access and enterprise data governance is one of
their top challenges (not shown).

The research shows that organizations are least successful in improving discoverability and
having an accurate inventory of sensitive data, which is a requirement for most data privacy
regulations including GDPR. One-quarter of respondents say their organizations are somewhat
unsuccessful (25%) and 10% are not at all successful. We will discuss later how data catalogs
and similar metadata management systems are valuable for knowing the location of sensitive
data and establishing an inventory.

Other issues that organizations indicate are challenging include:

• Coordinating data governance with data security and use of masking techniques. One-third
of organizations surveyed say they are unsuccessful in establishing good coordination (33%),
with 10% reporting that they are not at all successful. Governance committees should bring
leaders together to ensure alignment between data governance and data security measures such
as masking, anonymizing, and de-identifying data. Regulations often have conditions regarding
acceptable use of these techniques for certain types of customer data. Organizations should
evaluate solutions such as automated tag-based masking capabilities in data platforms and
tools that offer flexibility in whether your organization assigns masking to databases,
tables, schema, or columns.

• Embedding governance constraints in pipelines, tools, and applications. One-third of
organizations surveyed are unsuccessful in implementing embedded constraints (33%), with 12%
indicating they are not at all successful. This priority involves using embedded capabilities
that may not be available in legacy data integration and management systems. Traditional
practices often require users to consult policy documentation to determine whether their data
use and sharing complies with governance policies, which can lead to inconsistent compliance.

• Monitoring governance during data loading, movement, and replication. Organizations need
to monitor exposure risks not only while data assets are at rest in data lakes and data
warehouses but also while they are in motion during data integration or cloud migration.
Governing data in motion to avoid exposure appears to be challenging for some organizations;
22% say they are somewhat unsuccessful and 11% are not at all successful. Leading data
integration tools and data platforms offer automated capabilities for monitoring data
governance during loading, movement, and replication.

Figure 4. Overall, how successful is your organization currently with its data governance
for ensuring regulatory compliance, protecting sensitive data, and improving trust in the
data people use and share?
Based on answers from 183 respondents.

Very successful: 19%
Somewhat successful: 54%
Somewhat unsuccessful: 9%
Not very successful: 7%
Just getting started with data governance: 10%
Don’t know or NA: 1%

Figure 5. Regarding regulatory compliance and protecting sensitive data, how successful is
your organization with its data governance for addressing the following priorities?
Based on answers from 177 respondents. Ordered by combined “very successful” and “somewhat
successful” responses.
Values in order: very successful / somewhat successful / somewhat unsuccessful / not at all
successful / don’t know or NA.

Complying with data privacy and other industry regulations: 30% / 55% / 9% / 4% / 2%
Controlling access to sensitive data such as personally identifiable information (PII): 30% / 50% / 14% / 4% / 2%
Setting data governance rules and policies and keeping them up to date: 31% / 42% / 15% / 7% / 5%
Training users in governance and responsible data use: 21% / 43% / 22% / 8% / 6%
Ensuring data governance in self-service BI and analytics: 20% / 44% / 23% / 7% / 6%
Embedding governance constraints in pipelines, tools, and applications: 21% / 40% / 21% / 12% / 6%
Monitoring external data sharing: 25% / 35% / 23% / 8% / 9%
Coordinating data governance with data security and use of masking techniques: 23% / 37% / 23% / 10% / 7%
Monitoring governance during data loading, movement, and replication: 21% / 39% / 22% / 11% / 7%
Improving discoverability and having sensitive data inventory: 23% / 35% / 25% / 10% / 7%
Monitoring data lakes and disparate data marts for governance compliance: 21% / 37% / 22% / 7% / 13%
Ensuring compliance and data protection in AI/ML model processes: 18% / 35% / 19% / 7% / 21%

Data Governance for Improving Data Trust

The offensive side of data governance is about building data trust. Data curation steps for
improving the data’s quality, validity, integrity, and authenticity are central to this
objective. Integrating data governance and data trust is how organizations can avoid a “Wild
West” in which the expansion in self-service BI and analytics happens without proper data
governance guardrails. Through data governance, you can clarify who is responsible for data
authoring and data curation.

Data quality monitoring is valuable for ensuring requisite profiling, cleansing, validation,
and enrichment, as well as uncovering problems such as data inconsistency and redundancy.
Data governance policies can indicate when data owners or other experts should be notified
about problems with the data. Modern tools can automate monitoring. AI-infused automation is
playing an increasing role in data curation processes, enabling organizations to scale up to
higher and more diverse data volumes faster and spot anomalies that require attention.

TDWI research asked which of the actions listed in Figure 6 are being undertaken within
respondents’ organizations to increase users’ trust in data quality, including its accuracy
and completeness. The most common action is to validate new data and address data quality
issues at sources (50%). Organizations can use data intelligence tools and processes such as
data catalogs to know more about source data quality. Scalability is often a challenge when
sources become numerous and voluminous.

The second most common action is to monitor data quality and ensure fit for each purpose
(45%). With the spectrum of use cases in most organizations stretching from simple data
consumption to deep analytics and AI/ML, it is important to target data curation carefully.
Data quality monitoring and processes need to be fit for each purpose. AI-augmented tools
for automatically detecting
data quality issues are evolving to make it easier for users to learn about data quality and
address problems; 25% say they are currently deploying these tools and 16% are automating
data quality recommendations to users.

Data stewardship is a priority. Almost half of organizations (45%) say that enhancing data
stewardship to include data quality and trust is an action they are taking. Data stewardship
is valuable for both defensive and offensive data governance. Along with being experts in
regulatory compliance, data stewards can oversee data quality processes, manage metadata and
master data, and guide users to data that is fit for purpose. Automated data intelligence
tools such as data catalogs (or similar functionality embedded within some data intelligence
tools and data platforms), as well as embedded data governance constraints and data
governance workflows, are essential to data stewards because they reduce manual work, offer
more complete information about data, and streamline updates to metadata as the data
environment changes.

Tracking data lineage is also being undertaken by 45% of organizations surveyed. Data lineage
information includes visual documentation of the data sources, ownership, and transformations
throughout an organization’s data landscape. Some rely on data lineage–specific tools. Others
leverage data catalogs that can track data lineage and serve as a central repository for
information, combining visibility of data quality scoring and data classification as well.
Along with enabling better defensive data governance and managing the inventory of sensitive
data, data lineage information makes it easier for self-service business users to find and
contextually understand high-value, relevant, and trusted data.

Many organizations employ a data catalog for data governance. Our research shows that 42%
use a data catalog to inventory data and identify data quality issues. Automation in modern
data catalogs is critical to meeting scalability and data diversity challenges. As noted,
data catalogs can help organizations document data inventories as required by regulations
and data lineage. Easier-to-use interfaces in modern data catalogs enable different types of
users to access data intelligence and determine whether they can trust data to be fit for
their needs.

Top Data Governance Challenges

We close this section of the report with a look at the most pressing data governance
challenges confronting organizations in our research. These fall into the following areas:

Governing distributed data across silos. Data distributed across a hybrid multicloud
environment makes it difficult to improve data quality, track data lineage, and monitor data
use for protecting sensitive data. Data silos, including independent data marts and data
lakes, fragment data across systems; 41% cite data silos as one of their top three challenges
(figure not shown for research reported in this section). Silo creation often accelerates
with data democratization as project teams or individual users set up their own data
repositories.


Figure 6. To increase users’ trust in the data quality, including accuracy and completeness,
which of the following actions are being undertaken within your organization?
Based on answers from 177 respondents, who could select all that applied.

Validate new data and address data quality issues at sources: 50%
Monitor data quality and ensure fit for each purpose: 45%
Enhance data stewardship to include data quality and trust: 45%
Track data lineage; document sources, ownership, and transformations: 45%
Use a data catalog to inventory data and identify data quality issues: 42%
Eliminate data silos by moving data to a central platform: 41%
Train users in data quality and literacy: 35%
Deploy AI-augmented tools to automatically detect data quality issues: 25%
Create a virtual data fabric to manage and improve data quality: 24%
Automate data quality recommendations to users: 16%
Use a data marketplace or exchange to provide trusted data sets or products: 9%
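
As a rough illustration of the most common action in Figure 6, validating new data at the
source, the following Python sketch checks incoming records for completeness and duplicate
keys before they are loaded. It is a minimal example under stated assumptions, not a
description of any surveyed organization's process: the field names, the record layout, and
the validate_records function are hypothetical.

```python
from typing import Any, Dict, Iterable, List


def validate_records(
    records: Iterable[Dict[str, Any]], required: List[str], key: str
) -> Dict[str, List[int]]:
    """Return the row indexes that fail completeness or uniqueness checks."""
    issues: Dict[str, List[int]] = {"missing_required": [], "duplicate_key": []}
    seen = set()
    for i, row in enumerate(records):
        # Completeness: every required field must be present and non-empty.
        if any(row.get(field) in (None, "") for field in required):
            issues["missing_required"].append(i)
        # Uniqueness: the key value must not repeat within the batch.
        k = row.get(key)
        if k is not None:
            if k in seen:
                issues["duplicate_key"].append(i)
            seen.add(k)
    return issues


# Hypothetical batch arriving from a source system.
batch = [
    {"customer_id": 1, "email": "a@example.org"},
    {"customer_id": 2, "email": ""},               # incomplete record
    {"customer_id": 1, "email": "b@example.org"},  # repeated key
]
report = validate_records(batch, required=["customer_id", "email"], key="customer_id")
print(report)
```

A report like this could be handed to a data steward or logged against the source in a data
catalog, so that quality problems are addressed where the data originates rather than
downstream.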

One-third (33%) say that unifying data governance across a distributed data architecture is
a major challenge and 28% say it is difficult to ensure data governance across on-premises
and cloud-based data. Governance challenges are motivating a significant percentage of
organizations surveyed (41%) to eliminate data silos by moving data to a central data
platform. Some are instead choosing to create a virtual data fabric to improve holistic data
governance, data quality, and data management across a distributed data environment (24%).
These options will be discussed in more detail in sections to follow.

Integrating data governance with data integration and data pipelines. Organizations should
ensure that data governance is part of processes for ingesting, collecting, and preparing
data, not just once the data is at rest in storage and managed as part of a data warehouse or
data lake. Nearly one-third (30%) say that ensuring data trust and protection in data
pipelines, data loading, and extract, transform, and load (ETL) processes is one of their top
challenges. Organizations should employ modern, automated capabilities embedded in data
integration tools and platforms. Allowing data pipeline processes to automatically access a
data catalog or other metadata management will enable users to find trusted data faster and
help data stewards keep track of sensitive data.

Addressing people and process issues. Earlier, our research noted that most organizations are
confident in their ability to set governance rules and policies and keep them up to date.
Yet, a significant percentage (40%) say that ensuring clear rules, policies, and
responsibilities for data use is an ongoing challenge. This is because data landscapes are
constantly changing, as are data privacy regulations and the dangers of exposure through
errors and abuse, such as hacking. Organizations should continuously monitor rules and
policies so that they remain effective and up to date.

As we noted earlier, integration of data governance with analytics model governance is an
emerging priority. In our research, 14% of organizations surveyed regard this as one of their
top challenges. Organizations should facilitate communication between data governance and
data science teams to align policies and processes.

Many organizations surveyed say that finding the right people to be data stewards is a
primary challenge (38%). Organizations often struggle to identify data stewards, most of whom
have “day jobs” as business subject matter experts (SMEs), data analysts, business analysts,
or data warehouse managers. Automated tools and data catalogs are critical to reducing
hands-on monitoring and manual policy enforcement so data stewards can work efficiently. By
collecting information about data activity among users, data knowledge, data ownership, and
relevant data governance constraints, data catalogs can help data stewards avoid having to
track down disparate information.

Using automation effectively. More than one-quarter of organizations surveyed (28%) say that
automating data governance, data quality, and data stewardship tasks is one of their top
challenges. The same percentage need solutions that will enable them to increase automated
scanning and tagging of new data sets (28%). Organizations should evaluate forward-looking
tools and platforms that provide automation of these and similar functions.

Data Management for Expanding Opportunities

Pressures on data management continue to rise as data grows in volume, variety, and velocity.
Rather than rely on a traditional monolithic system, organizations increasingly rely on a
portfolio of data management solutions to support access to the range of historical,
continuously updated, and real-time data for BI, analytics, and AI/ML. To maximize the value
of data assets, including through monetization and participation in data marketplaces,
organizations are creating data
services made available through application programming interfaces (APIs).

Data platforms are the centerpiece of data management. However, platforms have become more
diverse as organizations seek to manage, govern, and derive value from petabytes of
structured, semistructured, and unstructured data. In-memory platforms, massively parallel
processing (MPP), and other technology advancements are enabling organizations to improve
query performance and data availability for analytics and AI/ML.

Central to data management modernization for most organizations is cloud migration. Cloud
computing arrangements vary, but the trend toward serverless computing frees organizations
from having to devote most of their time to technical details such as server configuration,
system and software updates, maintenance, and security. They can devote greater attention to
tapping data’s business potential and taking advantage of scalable and elastic cloud
computing to respond to changing requirements.

Modernizing data management is not easy. As technologies and practices become established,
data management becomes like a big ship that takes time to change direction. Only 8% of
organizations surveyed say they are both very satisfied now with their data management and
confident they can modernize technologies, services, and practices to meet evolving
requirements (figure not shown).

The largest percentage is mostly satisfied with their current data management (45%), but
these organizations anticipate challenges in trying to modernize. Over one-third is more
pessimistic; 30% are somewhat dissatisfied with their current state of data management and
not very confident in their ability to modernize, and 5% are not at all satisfied or
confident they can modernize. Some say they are just getting started with data management
(9%) and 3% don’t know.

We will discuss data management challenges and priorities more fully later in this section.
First, we examine research results about which systems, platforms, storage, cloud services,
and technology standards are currently in use or planned (see Figure 7).

Not surprisingly, the most established options top the list: relational DBMS for data
warehouse (68% currently using) and spreadsheets (66% currently using). Spreadsheets are
ubiquitous, especially among organizations that are just starting with data management (87%
of these respondents currently use them for data management, not shown). Analysts extract
data from sources into spreadsheets where they then store, organize, categorize, manipulate,
and analyze data.

Spreadsheets offer functions for activities such as aggregating data, performing
calculations, and creating graphs. However, spreadsheets can become poor-quality data dumps
when analysts lack formal processes to incorporate new data and cleanse, enrich, and prepare
it. Files grow and become difficult to use and share for business insights. Data catalogs can
be useful for governing data so users working with spreadsheets or other BI tools can locate
and access trusted data sets.


Relational systems fit well for supporting BI reporting data access and deliver more complete data
and analytics requirements involving mostly management. If traditional relational DBMSs are
structured data. Transactional databases (OLTP) not appropriate for use cases, organizations turn to
are also primarily based on relational technology the variety of data management solutions grouped
and models (49% currently using). Application- under the NoSQL umbrella.
specific databases are frequently based on relational
technology. Many applications offer managed Columnar (or column-oriented) databases, a
reporting and analytics functionality based on variation on relational systems, are often included
relational databases; 50% are currently using applica- with NoSQL systems. Columnar databases store
tion-specific databases and 25% plan to use them. data by column rather than by row to deliver
faster query performance for analytics by
Current and planned use of in-memory database eliminating wasted processing from large table
technology is substantial. To accelerate access and scans. Figure 7 shows that 36% are currently using
improve performance, many organizations deploy columnar databases, which is about the same
in-memory databases. These have traditionally percentage (35%) as in our 2022 research. In 2022,
used relational models and technologies, but however, only 24% planned to use a columnar
today columnar databases are often in-memory database and 25% said they were not using
as are other types of NoSQL databases. As one and had no plans to use one. This report’s
memory space has expanded, organizations are research notes a rise to 35% planning to use a
able to store entire data warehouses, analytics columnar database, with only 16% saying they are
data platforms, data marts, and online analytical not using one and have no plans to use one.
processing (OLAP) cubes in memory.
A substantial number of NoSQL systems are
Some organizations couple in-memory OLAP key-value or document databases; 31% are
with a semantic layer to optimize access to data currently using one and 33% plan to use one.
in all locations and use the cube effectively to Numerous types of items could be stored as
collect and store pre-aggregated data structures, documents as required by various use cases
dimensions, and measures for faster analytics. that range from OLTP workloads to real-time,
Figure 7 shows that 35% are currently using an automated analytics embedded in applications.
in-memory database and 30% plan to use one. Document database solutions offer expressive
query languages and multiple levels of indexing
Adoption of columnar and other NoSQL systems to support diverse types of queries. Key-value
is growing. Growth in semi- and unstructured data databases (or stores), using a simple data model
and demand for complex analytics has focused of keys and values, similarly give developers
attention on non-relational systems such as NoSQL flexibility to meet a variety of application and
data management. Although many organizations analytics requirements.
use simple data or file storage systems to hold
a variety of data (55% currently use data or file Simplicity is part of the appeal of spreadsheet-
storage for data management and 29% plan to like DataFrame data structures for storing and
use them), most organizations eventually need interacting with a variety of data sourced from
substantial data models and enterprise data SQL and NoSQL files. DataFrames are used in data
access and management capabilities to optimize science, including for AI/ML projects. They are

TDWI RESEARCH 22 tdwi.org


Achieving Scalable, Agile, and Comprehensive Data Management and Data Governance

Figure 7. Which of the following types of data management systems, platforms, storage, cloud services, and technology standards are currently in use or planned for future use at your organization?

Based on answers from 171 respondents. Ordered by “currently using” responses.

System or technology                    Currently using   Plan to use   Not using; no plans to use   Don’t know or N/A
Relational DBMS for data warehouse      68%               19%           6%                           7%
Spreadsheets                            66%               25%           4%                           5%
Data or file storage system             55%               29%           10%                          6%
Application-specific database           50%               25%           13%                          12%
Transactional database (OLTP)           49%               29%           11%                          11%
Columnar database                       36%               35%           16%                          13%
In-memory database                      35%               30%           19%                          16%
Content management system               34%               40%           15%                          11%
Mainframe data management               32%               29%           24%                          15%
Key-value or document database          31%               33%           21%                          15%
DataFrames                              31%               29%           20%                          20%
Apache Hive and Hadoop technologies     31%               28%           27%                          14%
Distributed SQL query engine            30%               38%           16%                          16%
Graph database                          26%               29%           27%                          18%
Hive table format                       26%               26%           26%                          22%
Delta Lake                              19%               29%           33%                          19%
Apache Iceberg engine/processing        18%               29%           33%                          20%

the primary data types used in the pandas Python data analysis library and can also be used with other languages such as R and Scala. Unlike single-location spreadsheets, a DataFrame can function as a data structure shared across a distributed computing environment. In Figure 7, 31% say they are using DataFrames and 29% plan to use them.

Graph databases are NoSQL systems offering advantages over traditional relational databases for storing networked data relationships on a large and complex scale. Visual graph query languages enable users to avoid complex SQL programming and accelerate exploration and repeatable discovery of complex data relationships. Figure 7 shows that 26% are currently using a graph database and 29% plan to use one.
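
To make the idea of networked data relationships more concrete, the following is a minimal sketch of relationship discovery over connected records. It is not a graph database; it uses an in-memory adjacency list, and the entity names, the RELATIONSHIPS data, and the related_within function are all hypothetical. The point is only that "what is connected to this customer within two hops" is a short traversal over a graph model, whereas in a relational model it would typically require repeated or recursive joins.

```python
from collections import deque

# Hypothetical relationship data: each key is an entity, and each value
# lists the entities it is directly connected to.
RELATIONSHIPS = {
    "customer:ana": ["order:1001", "account:77"],
    "order:1001": ["product:desk", "shipment:55"],
    "account:77": ["customer:ben"],
    "customer:ben": ["order:1002"],
    "order:1002": ["product:desk"],
    "product:desk": [],
    "shipment:55": [],
}


def related_within(graph, start, max_hops):
    """Return {entity: hop_count} for entities reachable within max_hops."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distances[node] == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in graph.get(node, []):
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    del distances[start]  # the starting entity is not "related to itself"
    return distances


if __name__ == "__main__":
    for entity, hops in related_within(RELATIONSHIPS, "customer:ana", 2).items():
        print(hops, entity)
```

A production graph database would add persistence, indexing, and a query language on top of this kind of traversal.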


Cloud Data Migration and Management

Cloud data migration is a critical part of data management modernization for organizations of all sizes. Organizations want to take advantage of pay-as-you-go pricing models that offer elastic scalability. With storage and processing resource allocation in tighter sync with business needs, organizations are better prepared to respond to growth in data volume or the need for faster workload processing to accomplish seasonal or unanticipated requirements. Organizations are also taking advantage of cloud elasticity for AI/ML workloads and expansion in self-service BI dashboards and daily operational reporting.

TDWI asked research participants to choose which of a set of statements best describes their organization’s status regarding cloud data management and migration. We can see that organizations are adopting a range of strategies. No one approach is dominant.

About one-quarter are focused on phased migrations to move and modernize most or all of their data systems to cloud platforms (26%; figure not shown). Among larger companies with $1 billion or more in revenues, the percentage increases to 37%. With this approach, organizations migrate pieces incrementally. In each phase, organizations can test and validate related components and learn from experience how to improve succeeding migrations. Organizations need visibility into application data dependencies to avoid disrupting use cases that are unrelated to data assets, data integration processes, and applications they are migrating.

The second-largest percentage currently uses or plans to use a lift-and-shift strategy (22%) to migrate on-premises data and workloads as-is to the cloud. For example, an organization might keep the same data warehouse model and ETL routines using the same underlying software but deploy it on a new cloud provider’s platform. Organizations choose this approach to migrate more rapidly. However, it is important to balance speed benefits with taking advantage of modernization opportunities with the new cloud platforms and services.

Hybrid data environments that comprise on-premises and cloud-based systems are common; 17% say that they plan to continue having a hybrid environment. In some cases, this is due to local data residency laws and access control requirements that make it impossible to migrate data assets. In other cases, organizations want to continue running production applications and data platforms because they are working well and have high satisfaction rates. Some organizations will keep legacy systems in place but choose a cloud-first strategy for any new systems. However, our research finds that only 11% of organizations surveyed for this report are adopting this strategy.

Data Platforms and Patterns: Current and Planned Usage

The research in Figure 7 offered a view of current and planned usage of different data management technologies and models. In Figure 8, we take a different perspective and examine the status and plans for data management platforms, repositories, and patterns for supporting BI, analytics, AI/ML, and data applications.

The largest percentage of organizations surveyed uses a data warehouse on premises (67%). Nearly half have a data warehouse in the cloud (48%) and 38% plan to implement a cloud-based data warehouse. (Note that research participants provided an answer for each type of system or pattern listed in Figure 8; we did not ask for a choice between, for example, an on-premises or cloud data warehouse.)

Across TDWI research, we have seen a strong trend toward cloud data warehouses. However, the results here show that many organizations are continuing to work with on-premises data warehouses. Many organizations indicate that they have both. Among organizations that are taking a phased migration path to the cloud, 79% are currently using a data warehouse on premises. Yet, even among these organizations, 64% also have a data warehouse in the cloud (not shown).

As we saw in our 2022 research, more organizations have a data lake in the cloud (41%) than have one on premises (30%). One-third of research participants (33%) say they plan to use a cloud data lake. TDWI defines a data lake as a design pattern and architecture optimized to capture a wide range of data types, both old and new, at scale. The purposes of a data lake are manifold and can vary among organizations.

In general, unlike data warehouses that operate with well-defined preprocessing for data transformation, formatting, and cleansing to achieve trusted, carefully structured data, a data lake lets workloads dictate how they need to categorize, cleanse, and otherwise prepare data for analytics and AI/ML. Some organizations use a data lake as a staging area for data transformation and cleansing before loading data into a data warehouse.

Unifying data warehouses and data lakes. Indeed, an important current trend is toward deploying a unified data platform such as a data lakehouse that integrates data warehouse and data lake capabilities. Figure 8 shows that 22% are currently using a unified cloud data warehouse and data lake and 46% plan to use one, which is the highest “plan to use” percentage in the figure.

A data vault is an established data modeling design pattern that some organizations view as a flexible alternative to traditional data warehouse and star schema designs. Figure 8 shows that a data vault is currently in use by 20% of organizations surveyed. About one-third plan to use it (34%). Our interview research finds that some organizations use a data vault as part of their unified data lakehouse strategy.

Cloud data lakehouses typically employ massively parallel processing (MPP) database engines that can efficiently scale computing power to fit workload requirements. Organizations could eliminate separate staging areas and choose data lakehouses as their platform for extract, load, and transform (ELT) processes, which is a variation on traditional ETL. Organizations can target data lakehouse storage for collecting and staging data; then, for the transformation, the lakehouse uses the power and scale of in-database processing.
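
The ELT pattern described above (land raw data in the target platform first, then transform it with the target's own engine) can be sketched in a few lines. This is an illustrative sketch only: Python's built-in SQLite module stands in for a lakehouse or MPP engine, and the table names, the RAW_ORDERS sample data, and the run_elt function are hypothetical.

```python
import sqlite3

# Hypothetical raw records as they might arrive from a source system.
RAW_ORDERS = [
    ("1001", " Desk ", "249.50"),
    ("1002", "chair", "89.00"),
    ("1003", "DESK", "not-a-number"),  # bad amount, filtered out in the transform
]


def run_elt(conn, rows):
    """Load raw rows into a staging table, then transform them in-database."""
    # Extract + Load: land the data untouched inside the target platform.
    conn.execute(
        "CREATE TABLE staging_orders (order_id TEXT, product TEXT, amount TEXT)"
    )
    conn.executemany("INSERT INTO staging_orders VALUES (?, ?, ?)", rows)

    # Transform: the target engine cleans and types the data using SQL,
    # so no separate staging server or external transformation step is needed.
    conn.execute(
        """
        CREATE TABLE orders AS
        SELECT CAST(order_id AS INTEGER) AS order_id,
               LOWER(TRIM(product))      AS product,
               CAST(amount AS REAL)      AS amount
        FROM staging_orders
        WHERE amount GLOB '[0-9]*' AND amount NOT GLOB '*[^0-9.]*'
        """
    )
    return conn.execute(
        "SELECT order_id, product, amount FROM orders ORDER BY order_id"
    ).fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    for row in run_elt(connection, RAW_ORDERS):
        print(row)
```

In a traditional ETL flow, the cleansing in the second statement would instead run before the data reached the target. Keeping the raw staging table also leaves the original records available for other workloads.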


Unified data platforms such as data lakehouses are adopting evolving open source file formats such as Apache Iceberg and Delta Lake storage framework to enable standardized support for all workloads ranging from BI and OLAP to data science and analytics-driven applications. The standards improve support for a wider variety of programming languages and computation engines.

The standards also enable data lakehouses to standardize for ACID transactions, record-level updates, metadata standardization, schema enforcement, and additional features essential to optimizing performance for BI, analytics, and data application workloads. Referring to Figure 7, our research finds that 18% are currently using Apache Iceberg and 19% Delta Lake; 29% plan to use technologies based on each of these standards.

Nearly one-third currently use a data virtualization layer or data fabric. Figure 8 shows that 32% are currently using a data virtualization layer or fabric and 38% are planning to use one. A data virtualization layer connects to the sources to access metadata, which is then used to enable data transformations and joins that result in a new, logical data source. For users, data virtualization presents an abstraction layer, shielding them from the complexities of knowing the various source data formats and implementations to access them.

A data fabric builds on data virtualization to provide a universal and holistic approach to integrating diverse components of physically distributed data environments. This report will discuss data fabrics and related technologies in more depth later in our section on distributed data management and integration.

Data Management Modernization Priorities

As in 2022, addressing data quality problems and improving the ability to fix them ranks first among data management modernization priorities for increasing user satisfaction and maximizing the value of data (44%; see Figure 9). To address data quality challenges, organizations need consistent practices supported by automated tools. Organizations should integrate data quality steps into the automation of data preparation, which 38% of respondents cite as a data management modernization priority. Data preparation and quality are not one-size-fits-all; BI reporting use cases have different requirements than data science use cases.

Tracking data lineage, noted earlier in our discussion of data governance, is essential to increasing data intelligence and solving problems faster. Figure 9 shows that this is a top priority for 35% of research participants. With data volumes growing, data lineage tracking and processes for data quality, profiling, and validation require AI/ML techniques and automation to keep pace.

Modern enterprise data catalogs offer tools that automate discovery and documentation of data lineage, including noting missing information and using AI/ML to suggest the possible lineage. Some tools provide dashboards that enable data stewards and subject matter experts (SMEs) to visualize data lineage and track the data’s life cycle. Some additionally embed visibility of data quality scoring and sensitive data classification within data lineage for fuller context, impact analysis, and rapid problem identification.
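
As a rough illustration of the data virtualization layer discussed in this section, the sketch below exposes a joined, logical data source over two separate databases without copying any rows. It assumes Python's built-in SQLite module as a stand-in for real source systems; the schema names (crm, sales), the tables, and both functions are hypothetical, and commercial virtualization products work with far more varied sources.

```python
import sqlite3


def build_virtual_layer():
    """Return a connection that presents two sources as one logical view."""
    # The main connection plays the role of the virtualization layer.
    conn = sqlite3.connect(":memory:")
    # Two separately attached databases stand in for independent sources.
    conn.execute("ATTACH DATABASE ':memory:' AS crm")
    conn.execute("ATTACH DATABASE ':memory:' AS sales")

    conn.execute("CREATE TABLE crm.customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE sales.orders (customer_id INTEGER, total REAL)")
    conn.executemany(
        "INSERT INTO crm.customers VALUES (?, ?)", [(1, "Ana"), (2, "Ben")]
    )
    conn.executemany(
        "INSERT INTO sales.orders VALUES (?, ?)",
        [(1, 120.0), (1, 30.0), (2, 75.0)],
    )

    # The logical data source: a view that joins both sources at query time.
    # In SQLite, a TEMP view is allowed to reference any attached database.
    conn.execute(
        """
        CREATE TEMP VIEW customer_totals AS
        SELECT c.name AS customer, SUM(o.total) AS total_spend
        FROM crm.customers AS c
        JOIN sales.orders AS o ON o.customer_id = c.id
        GROUP BY c.id, c.name
        """
    )
    return conn


def source_metadata(conn, schema, table):
    """Read column names from a source's own metadata (trusted names only)."""
    return [row[1] for row in conn.execute(f"PRAGMA {schema}.table_info({table})")]


if __name__ == "__main__":
    layer = build_virtual_layer()
    print(source_metadata(layer, "crm", "customers"))
    print(layer.execute("SELECT * FROM customer_totals ORDER BY customer").fetchall())
```

Users query customer_totals without needing to know which source holds which table, which is the abstraction the text describes.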


Research in Figure 9 highlights the urgency to accelerate data management processes. More than one-third say it is a top priority to reduce delays in adding new data or modifying existing data (38%). Just over one-quarter of respondents (26%) are looking for alternatives to slow data movement, copying, and replication. As data volumes grow, these processes can become costly bottlenecks, which motivates organizations to evaluate new practices.

Some organizations use change data capture (CDC) technologies to capture only changes to data and data structures and provision these continuously rather than in large batch processes. Other use cases benefit from data virtualization, federation, and data fabrics to reduce data movement by connecting to and querying distributed sources, aggregating the resulting data, and providing views of it from a single point of access.

Data silo growth challenges represent a key reason behind consolidation of fragmented data into a single, unified data platform. Requiring users to locate and access data across numerous silos causes delays. Data silos make it difficult for users to gain a single view of the truth: that is, all data relevant to topics of interest. Thus, we see in Figure 9 that 34% of respondents put a priority on eliminating and consolidating disparate data silos onto a single data platform. Organizations often cannot consolidate all data silos immediately. They must prioritize which ones will most reduce data management complexity and costs and make the data environment easier to govern.
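
The core idea behind change data capture, moving only what changed rather than reloading everything, can be illustrated with a small sketch. Note the simplifying assumption: production CDC tools typically read database transaction logs, whereas this example compares two keyed snapshots. The function names and the sample records are hypothetical.

```python
def capture_changes(previous, current):
    """Compare two snapshots keyed by primary key and return change events."""
    events = []
    for key, row in current.items():
        if key not in previous:
            events.append(("insert", key, row))
        elif previous[key] != row:
            events.append(("update", key, row))
    for key in previous:
        if key not in current:
            events.append(("delete", key, None))
    return events


def apply_changes(target, events):
    """Apply change events to a downstream copy instead of reloading it."""
    for operation, key, row in events:
        if operation == "delete":
            target.pop(key, None)
        else:
            target[key] = row
    return target


if __name__ == "__main__":
    yesterday = {1: {"name": "Ana", "tier": "gold"}, 2: {"name": "Ben", "tier": "silver"}}
    today = {1: {"name": "Ana", "tier": "platinum"}, 3: {"name": "Cy", "tier": "bronze"}}

    changes = capture_changes(yesterday, today)
    for change in changes:
        print(change)

    replica = apply_changes(dict(yesterday), changes)
    print(replica == today)
```

Only three events travel downstream here, regardless of how many unchanged rows the source holds, which is why the approach helps when batch reloads become bottlenecks.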

Figure 8. Which of the following types of data management platforms, repositories, or patterns are currently in use or planned for use by your organization to support BI, analytics, AI/ML, and data applications?

Based on answers from 170 respondents. Ordered by “currently using” responses.

Platform, repository, or pattern                            Currently using   Plan to use   Not using; no plans to use   Don’t know or N/A
Data warehouse (DW) on premises                             67%               18%           11%                          4%
BI or analytics platform                                    62%               26%           8%                           4%
Data warehouse in the cloud                                 48%               38%           12%                          2%
Operational data store                                      47%               28%           15%                          10%
Data lake in the cloud                                      41%               33%           15%                          11%
Data streaming, including real-time data ingestion          40%               35%           13%                          12%
Data virtualization layer or data fabric                    32%               38%           18%                          12%
Data lake (DL) on premises                                  30%               30%           30%                          10%
Unified cloud DW/DL platform (e.g., cloud data lakehouse)   22%               46%           18%                          14%
Data Vault techniques                                       20%               34%           24%                          22%


In the following sections we will examine data integration and distributed data management. Modernization in these areas is key to addressing data management priorities and solving challenges holistically.

Data Integration: Evolving to Meet New Challenges

Data integration technologies and practices, including data pipelines and APIs, are essential to supplying users with trusted data at the right time for reports, dashboards, real-time notifications, and analytics (including AI/ML). Organizations need to solve data integration problems that hinder decision makers from gaining actionable insights and making faster decisions. Data-driven applications with embedded analytics and automated decision capabilities are important in areas such as e-commerce, customer support, logistics, and manufacturing. These applications cannot function without seamless data integration.

The sheer number of items listed in Figure 10 makes it clear that data integration requirements encompass a range of technologies. Solutions are continuing to evolve to meet the demands of new use cases. For example, APIs have become a popular way of connecting applications for transferring data and sharing application functionality easily. APIs enable organizations to create a layer for managing connections that shields users from complexity, reducing the technical expertise required to access data. Organizations can govern data access through the layer rather than open uncontrolled data interaction. Sharing data assets and services via APIs has become an important facilitator of modern collaborative business relationships (that is, the “API economy”).

Stateless Representational State Transfer (REST) APIs offer the most established approach and are popular for their simplicity and flexibility. However, an alternative query language for APIs, GraphQL, is growing in popularity among application developers and data analysts. GraphQL enables users to examine complete descriptions of data available through APIs and query the data more efficiently. GraphQL enables single-request aggregation of data from multiple sources or APIs. In Figure 10, we see that APIs are currently used by the largest percentage of organizations surveyed (59%) and a significant percentage plan to use APIs (28%).

The second-largest percentage say they are using data connectors and connectivity tools (46% currently using and 32% planning to use). This is a broad category of tools that are important for delivering data to applications either from original sources or intermediate data platforms. Data connectivity needs to support user demands for easier setup, expanded data discovery and findability, and significantly reduced latency through real-time reporting and analytics. It is important for organizations to monitor data connectivity to improve standardization; legacy data connectivity is often characterized by numerous point-to-point integrations that are conflicting, redundant, and difficult to update. Monitoring is also important to prevent data governance exposures.

Data transformation and pipelines are commonly used. ETL and ELT are next highest, used by


45% and 44% of organizations respectively. With ELT, organizations build data pipelines to load (or replicate) data into a target repository before data transformation, thereby saving data movement to a separate staging area. The research shows that traditional ETL, however, is still important and is the appropriate choice for certain workloads.

Data pipelines, which often contain ETL or ELT functionality, are in use by 43% of organizations and 35% plan to use them. Some data pipelines simply connect to sources and stream raw, even real-time, data into a data lake or data lakehouse. This is primarily for use by data scientists performing predictive analytics and running AI/ML programs. Other pipelines provide fuller data profiling, quality, and validation steps as needed by workloads.

Data preparation and preprocessing have a significant role in data integration. The data pipeline steps mentioned above (plus others such as data enrichment) fall under the umbrella

Figure 9. Overall, which of the following are your organization’s top current data management modernization priorities for increasing user satisfaction and maximizing the value of data?

Based on answers from 167 respondents, who were asked to select at least their top three objectives.

Priority                                                                      Respondents
Address data quality problems and improve ability to fix them                44%
Automate data preparation and enrichment for BI, analytics, and AI/ML        38%
Reduce delays in adding new data or modifying existing data                  38%
Track data lineage to increase data intelligence and solve problems faster   35%
Make it easier to view, query, and access distributed data                   34%
Increase processing availability for analytics and AI/ML                     34%
Eliminate and consolidate disparate data silos onto single data platform     34%
Find alternatives to slow data movement, copying, and replication            26%
Reduce data management costs                                                  17%
Adopt or expand continuous data streaming                                     13%
Enable authorized access to live, real-time data sets                         12%


of data preparation and preprocessing. Data preparation processes focus on determining what the data is and improving its quality and completeness. Organizations use data catalogs in data preparation steps to standardize how different users define and structure the data.

The life cycle of preparation and preprocessing steps starts with collecting and consolidating data and continues through transformation and enrichment to make data useful for purposes such as reporting, dashboards, OLAP analytics, and AI/ML. Self-service data preparation is a critical trend for empowering nontechnical users as well as data scientists to be less reliant on IT developers and analysts.

Two out of five respondents (40%) in Figure 10 say their organizations are using data preparation or preprocessing tools and 34% are planning to use them. Data preparation and preprocessing steps can be time-consuming and involve considerable manual work, which makes tools that automate steps and integrate preparation with data pipelines highly valuable. Some organizations are employing data fabric preparation and delivery layers to simplify data exploration and transformation in heavily distributed data environments.

Options for accelerating access to timely data are popular. Organizations are currently using or are planning to use several technologies that enable them to bypass lengthy data integration and preparation steps to allow users to access fresher, more frequently updated data. More than one-third are using CDC technologies (36%), and 39% plan to use them to capture changes to data continuously rather than force users to wait for batch updates to historical data.

Nearly as many are currently using data streaming (e.g., real-time) technologies (33%) and more are planning to use them (40%). Data streaming solutions offer continuous flows through pipelines that gather and process data as sources generate it. Organizations apply real-time analytics and AI/ML to track situations, understand patterns, and answer complex business questions.

Sizeable percentages use data intelligence technologies. Data intelligence consists of knowledge about diverse data built on metadata resources contained in a repository such as a data catalog. As shown in Figure 10, 36% are currently using a data catalog or metadata management system and 47% plan to use one, which is the highest “plan to use” percentage for any item in the figure.

Along with accurate data definitions and metadata, shared data intelligence systems will manage and include data profiles, information about the data’s location, taxonomy and classification documents, calculations, and data lineage information about sources and data ownership. Significant percentages of organizations surveyed are using solutions based on open source standards. Nearly one-third (30%) are currently using Apache Atlas, a data governance and metadata framework; 25% are planning to use it. About one-quarter are either currently using (23%) or planning to use (25%) Metastore, an open source Apache Hive metadata repository typically used with data lakes.
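
The notion of a streaming pipeline that processes data as sources generate it, discussed earlier in this section, can be sketched with Python generators. This is a toy model under stated assumptions: a fixed list stands in for a live event source, and the function names, the sensor name, and the threshold are hypothetical. Real deployments would use a streaming platform rather than in-process generators.

```python
from collections import deque


def sensor_readings():
    """Hypothetical source that yields events one at a time as they occur."""
    for value in [20.0, 21.0, 19.0, 35.0, 36.0, 22.0]:
        yield {"sensor": "line-1", "temperature": value}


def rolling_average(events, window_size):
    """Enrich each event with the average of the most recent readings."""
    window = deque(maxlen=window_size)  # older readings fall out automatically
    for event in events:
        window.append(event["temperature"])
        yield {**event, "rolling_avg": sum(window) / len(window)}


def alerts(events, threshold):
    """Pass through only events whose rolling average exceeds the threshold."""
    for event in events:
        if event["rolling_avg"] > threshold:
            yield event


if __name__ == "__main__":
    # Each stage pulls one event at a time, so nothing waits for a batch.
    pipeline = alerts(rolling_average(sensor_readings(), window_size=3), threshold=25.0)
    for alert in pipeline:
        print(alert)
```

Because every stage is lazy, an event flows through the whole pipeline as soon as it is produced, in contrast to a batch job that would first collect all six readings.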


Figure 10. Which of the following types of data integration, data catalog, and data intelligence technologies are currently in use or planned for use at your organization?

Based on answers from 167 respondents. Ordered by “currently using” responses.

Technology                                                Currently using   Plan to use   Not using; no plans to use   Don’t know or N/A
Application programming interfaces (APIs)                 59%               28%           8%                           5%
Data connectors and connectivity tools                    46%               32%           11%                          11%
ELT (data loaded first into target data platform)         44%               29%           15%                          12%
ETL with an intermediate staging area                     45%               31%           13%                          11%
Data pipelines                                            43%               35%           10%                          12%
Business glossary                                         40%               42%           12%                          6%
Data preparation or preprocessing tool                    40%               34%           14%                          12%
Data catalog or metadata management                       36%               47%           12%                          5%
Master data management                                    36%               40%           15%                          9%
Change data capture                                       36%               39%           12%                          13%
Data streaming (e.g., real-time)                          33%               40%           13%                          14%
Enterprise service bus or message-oriented middleware     33%               32%           18%                          17%
Apache Atlas (data governance and metadata framework)     30%               25%           34%                          11%
Integration Platform as a Service (iPaaS)                 29%               35%           19%                          17%
Product information management                            29%               32%           18%                          21%
Data lineage tool                                         28%               45%           16%                          11%
Data virtualization layer                                 28%               38%           20%                          14%
Semantic layer or ontology tool                           24%               40%           20%                          16%
Data fabric                                               24%               38%           23%                          15%
Knowledge graphs                                          24%               32%           25%                          19%
Metastore (Apache Hive metadata repository)               23%               25%           28%                          24%
Data mesh decentralized architecture                      21%               44%           21%                          14%


Figure 10 also shows strong interest in data lineage tools, which are incorporated as capabilities in some data catalogs and data integration suites or are offered as standalone data intelligence tools. TDWI finds that 28% are using data lineage tools now and 45% plan to use them. Later, we will discuss key goals with data catalogs, business glossaries, and metadata management, along with levels of satisfaction in achieving them.

Also contributing to data intelligence are business glossaries (40% currently use and 42% plan to use them) and master data management (MDM), which 36% currently use and 40% plan to. Business glossaries collect, organize, and coordinate business terms and definitions to provide clarity and data context across organizations. Business glossaries may also contain business rules, business policies, data sharing agreements, and other business assets to be used in conjunction with technical data assets. Modern data intelligence solutions merge business glossaries and data catalogs together.

MDM systems document and coordinate master data, which is higher-level data about products, suppliers, customers, and other business entities. MDM enables organizations to establish “golden” master data definitions that are consistent, valid, and accurate across applications and services. Many organizations today are establishing MDM systems that integrate master data from different domains (most commonly customer and product information) to reduce management overhead and enable cross-domain data access and analytics.

Figure 10 shows that 29% are currently using a product information management (PIM) system and more (32%) are planning to use one. In research for this report not shown here, TDWI finds that 29% are using a customer information management (CIM) system to capture, manage, and govern customer data while 20% are using a multidomain MDM system for this purpose.

Objectives for Improving Data Integration

With the preceding discussion of current and planned data integration systems in mind, we now turn attention to improvement priorities. Improving data quality, accuracy, and completeness is the top data integration objective (51%; figure for this section not shown), which aligns with what we reported for data management modernization. Organizations should create processes for determining data quality and use automated tools to monitor changes in the data. AI-infused automation in data intelligence tools is enabling organizations to move past the limits of manual data curation.

Reducing the amount of data movement, replication, and copying is an important objective for 34% of organizations surveyed. As data volumes rise, these processes often cause data latency and therefore slower time to insight. Automating data movement to save cloud costs and optimize processing is a top objective for 27% of organizations. As we noted earlier, heavy data movement, replication, and copying also raise data governance concerns because of potential exposure of sensitive data. Cost also rises. A primary driver behind interest in data virtualization and data fabrics is reducing complexity and costs due to data movement, replication, and copying.

Cloud migration of data integration tools and systems is a priority. One-third of organizations (33%) are seeking to migrate data integration, pipelines, and data intelligence to the cloud. It


is important to gather knowledge about existing on-premises data integration to inform cloud migration teams about dependencies and best strategies for migration. Knowledge about current ETL performance will help organizations eliminate unnecessary workloads (a priority of 25% of research participants) as they shift data integration processes to the cloud. Organizations can use migration as an opportunity to switch to ELT, which for 20% is a top objective.

In some cases, using phases to migrate data integration processes in smaller units is the best approach, but in others it may make sense to migrate all related processes and subprocesses together. With either strategy, using automated tools will increase consistency and accelerate testing of newly migrated processes.

Finally, 32% cite the importance of reducing costs and improving monitoring of cost drivers. Large organizations often have thousands of ETL, data pipeline, data streaming, and data preparation processes. Modernization should include monitoring to improve visibility into the value of data integration processes and whether resources need to be reallocated to optimize the most important processes and eliminate those that are no longer needed.

Performance optimization focuses on reducing costs through higher efficiency. Modern, automated tools enable performance optimization without user intervention. Optimization should ensure that workloads have the resources they need and show expected rates of consumption. Once you have granular visibility and proper controls in place, you can use tool capabilities in cloud data platforms to identify inefficient spending.

Cost optimization and FinOps practices can be valuable. Maintaining continuous awareness of cloud data resource consumption is key to keeping spending in line with overall financial plans. As part of cost optimization, some organizations establish cloud financial operations, or FinOps. Inspired by DevOps practices, FinOps frameworks facilitate collaboration among stakeholders in engineering, finance, technology, and business domains regarding spending decisions. Monitoring visibility into real-time data about costs, use, and allocation is essential to FinOps. Tools in the marketplace are evolving to provide granular visibility and use AI/ML to generate recommendations.

Data Catalogs: Current Satisfaction and Where to Expand

Data catalogs, business glossaries, and metadata management systems share capabilities. They can play complementary roles, with business glossaries specializing in business definitions. In the marketplace, data catalogs, business glossaries and metadata management systems are generally converging into single, all-in-one data intelligence solutions.

A centralized metadata repository such as a data catalog helps organizations reduce confusion about the data’s quality and consistency and


enhances data curation and governance. A complete and accurate data catalog includes what the data means and represents, the data’s origins, where to find it, who is responsible for it, and information to help the user quickly understand if it is relevant to a user’s objectives. With organizations supporting more diverse users and workloads than ever before, a data catalog provides a necessary foundation for both defensive and offensive data governance and plays an important role in making it easier to leverage and share high-value, trustworthy, and relevant data and data intelligence with other users.

To understand where organizations are realizing benefits and where they need to improve, TDWI research examined levels of satisfaction with data catalogs, business glossaries, and metadata management systems (see Figure 11). All the objectives listed in the figure are important; we discussed some of them earlier in the context of defensive and offensive data governance. Here, for brevity, we will highlight those with the highest levels of satisfaction.

• Making it easier for users to search for and find data. At the point of data use, users need the ability to access the data catalog through an intuitive interface to discover relevant, available data and learn about its consistency, lineage, age, and governance constraints. Organizations show three times greater satisfaction today than when we asked about this objective in 2021; 32% are very satisfied compared to just 10% in 2021. About the same percentages are and were somewhat satisfied (36% in this report versus 35% in 2021).

• Improving data governance and regulatory compliance. As noted earlier, data catalogs are valuable for their role in managing metadata to document accurate data inventories, which is necessary for the regulatory compliance aspects of data governance. Data catalogs can also manage data governance and security rules that drive constraints in data pipelines and applications. Overall, 63% are satisfied with their data catalog, business glossary, or other metadata management system for improving data governance and regulatory compliance, with 22% very satisfied. The percentages for inventorying data assets and documenting objects available to users are similar; 21% are very satisfied and 42% are somewhat satisfied.

• Improving data pipelines, transformation, and preparation. In Figure 11, just over three in five respondents (61%) are satisfied with their data catalog, business glossary, or metadata management system for this objective, with 20% very satisfied and 29% unsatisfied. This also shows improvement over our 2021 research findings, which showed 11% were very satisfied, 31% were somewhat satisfied, and 28% were unsatisfied. Closer integration of data catalogs with data curation processes saves users’ time in locating relevant data and accelerates development of trusted data products.

Automation in data catalogs enables faster identification of data quality, consistency, and completeness issues. Figure 11 shows that 60% are satisfied with these data intelligence technologies for improving these areas. During data pipeline development and transformation and preparation processes, some data catalogs can automatically expose which data sets are available, trusted, and governed.

Where organizations appear to be encountering the most difficulty is in establishing a single, location-transparent resource for metadata. Two
in five respondents (40%) say their organizations are unsatisfied, the largest percentage in Figure 11. When organizations have numerous departmental databases or multiple data platforms, data warehouses, and data lakes, they can suddenly be drowning in disparate metadata that is difficult to align.

Modern data catalogs have AI-infused capabilities for crawling distributed data platforms for metadata to coordinate data definitions and meaning across platforms. When organizations establish an enterprise data catalog, it can serve as a single, location-transparent resource regardless of the location of the data or type of data architecture.

Distributed Data: Challenges and Opportunities

Distributed data environments mostly just happen. As organizations grow, add subsidiaries or divisions, and reorganize to compete in new markets, business data requirements generate new data platforms and applications. Users need more than what monolithic enterprise data systems provide. Then, technology evolution, especially for analytics and data science, generates new systems that grow alongside existing legacy systems. Rapid migration to the cloud typically adds to the amount of distributed data, which spreads across a hybrid of on-premises and multiple cloud-based data platforms. As a result, data silos proliferate and become difficult to access, integrate, manage, and govern.

Although there is no single way to solve these problems, distributed data environments hold considerable potential if organizations can bridge data silos and manage the environments holistically. However, when growth is piecemeal rather than according to an enterprise strategy, data integration and management becomes dauntingly complex. In Figure 12, we can see that organizations are currently using or plan to use a variety of strategies to address distributed data challenges and maximize the potential of all the diverse data.

Organizations are consolidating data silos. Consolidating data from silos into a single data platform is the current strategy for the largest percentage of organizations surveyed (47%). Reducing the number of siloed data systems can make data management and governance easier and potentially reduce total cost of ownership (TCO). Organizations no longer need expert administrators for each platform or to carry licenses or subscriptions for more systems than they need.

Consolidating to a modern cloud data platform enables organizations to take advantage of elastic scalability. Capabilities such as automatic scaling (“auto-scaling”) in leading cloud platforms can add or subtract computational resources in response to changes in workloads without intervention from users or administrators. Modern platforms use automation and rolling technology updates to continuously optimize performance even as workloads increase in number and complexity.
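
To make the auto-scaling idea concrete, here is a minimal sketch of a scaling rule that adds or removes workers as the workload changes. It is an illustration only, not any vendor's algorithm: the function name `desired_workers`, the jobs-per-worker target, and the minimum and maximum limits are all assumptions chosen for the example.

```python
import math


def desired_workers(queued_jobs: int,
                    jobs_per_worker: int = 10,
                    min_workers: int = 1,
                    max_workers: int = 8) -> int:
    """Return how many workers a hypothetical auto-scaler would run.

    All parameters are illustrative assumptions, not platform settings.
    """
    if jobs_per_worker <= 0:
        raise ValueError("jobs_per_worker must be positive")
    # Enough workers to keep each one at or below its target load.
    needed = math.ceil(queued_jobs / jobs_per_worker)
    # Clamp to the configured floor and ceiling so costs stay bounded.
    return max(min_workers, min(max_workers, needed))


if __name__ == "__main__":
    # A workload that rises and then falls; no administrator intervenes.
    for load in (0, 25, 200, 40):
        print(f"{load:>3} queued jobs -> {desired_workers(load)} workers")
```

The ceiling in this sketch reflects the cost-control concern raised earlier in the report: elasticity is useful, but an upper bound keeps consumption predictable.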

Figure 11. How satisfied is your organization with its data catalog, business glossary, or other metadata management system for achieving the following objectives? Based on answers from 166 respondents. Ordered by combined “very satisfied” and “somewhat satisfied” responses.

Values for each objective are listed in this order: very satisfied / somewhat satisfied / somewhat unsatisfied / not at all satisfied / don’t know or NA.

• Make it easier for users to search for and find data: 32% / 36% / 13% / 12% / 7%
• Improve data governance and regulatory compliance: 22% / 41% / 19% / 9% / 9%
• Inventory data assets; document objects available for users: 21% / 42% / 19% / 9% / 9%
• Improve data pipelines, transformation, and preparation: 20% / 41% / 20% / 9% / 10%
• Improve data quality, consistency, and completeness: 19% / 41% / 23% / 9% / 8%
• Monitor data lineage, usage, and sharing: 23% / 32% / 24% / 9% / 12%
• Align metadata with business definitions of customers, products, etc.: 20% / 34% / 22% / 9% / 15%
• Integrate data catalog with data virtualization layer: 23% / 28% / 17% / 13% / 19%
• Provide descriptive semantic knowledge about diverse data: 19% / 30% / 22% / 12% / 17%
• Coordinate or consolidate metadata across multiple catalogs and glossaries: 17% / 32% / 25% / 12% / 14%
• Establish a single, location-transparent resource for metadata: 17% / 30% / 29% / 11% / 13%
• Enable user annotation and sharing of data intelligence: 16% / 31% / 25% / 10% / 18%

The research in Figure 12 shows the importance of data intelligence tools and systems such as a data catalog and semantic layer; 32% are currently using these to address distributed data challenges and 52% plan to use them. Earlier, this report discussed the value of location-transparent enterprise data catalogs. Additional semantic layer technologies play an important role in ensuring the quality and consistency of business representations of distributed data for BI reporting, calculations, and multidimensional modeling.

Some organizations are using enterprise knowledge graphs and ontologies to improve discovery of how distributed data sets relate to each other and to definitions of higher-level entities such as people, places, and things that might be contained in MDM. In Figure 12, multidomain MDM is currently in use by 28% of organizations surveyed and 39% plan to use this solution. One-quarter of respondents (25%) say their organizations are using knowledge graphs to visualize and analyze distributed data relationships and 38% plan to use them.

Accelerating analytics and AI/ML depends on distributed data management and integration. Nearly one-third are currently unifying analytics to work across domains, query engines, and storage repositories and 42% plan to do this. Access to all relevant data from across distributed systems is critical to analytics and AI/ML. A major challenge is to hide the technical complexity of accessing each system so that nontechnical users and data scientists can move faster to gain single views. Semantic data integration that brings
together data catalogs and data virtualization layers is an important strategy. Figure 12 shows that 30% are currently using a data virtualization layer and 40% plan to use one.

A similar percentage of respondents are currently using a distributed SQL engine to query data in multiple sources (28%) or planning to use one (42%). Often based on open source Presto, a distributed SQL engine can use a single SQL query to retrieve and interact with large volumes of data at multiple sources using models and systems ranging from traditional relational DBMSs to NoSQL systems and data lakes.

Data Fabric and Data Mesh Architectures

Figure 12 shows that over one-quarter of organizations are developing a data fabric to view and query distributed data (27%). A data fabric provides a location-transparent layer, usually built with data virtualization, to make it easier and faster to access distributed data and manage and govern it holistically. The fabric should be extensible and modular to ensure less dependence on central IT.

Instead of a piecemeal approach, the goal of a data fabric is to have an organized and orchestrated strategy for onboarding new data across the distributed environment. It enables organizations to set up policies and standards for governed data pipelines and data curation processes and for deploying computational resources to maximize benefits.

A key goal of a data fabric is to take advantage of opportunities to automate tedious and repetitive data engineering tasks. Solutions are also focusing on using AI/ML to generate recommendations for locating relevant data and streamlining preparation tasks. Organizations are also automating application of data governance constraints within data fabrics.

Data intelligence is critical to data fabrics and data meshes. Data catalogs, active metadata management, and semantic data integration technologies play important roles in providing shared knowledge about distributed data for faster, easier, and more consistent access, data preparation, and governance. Data intelligence is also important to creating a decentralized data mesh architecture, which 21% of organizations surveyed are currently using.

The data mesh has emerged as a leading strategy for enhancing the value of distributed data and avoiding the costs, delays, and management headaches of copying and moving massive amounts of data. Most data mesh strategies focus on strengthening distributed data ownership and governance within domains: that is, where users with a direct relationship to the business priorities build, manage, and share data products across the organization. A data mesh can complement a data fabric and take advantage of both data virtualization and open data lakehouses to enable distributed data governance from centralized policies and increase the value of data.

A data mesh supports “data as a product” development. It encourages teams working in domains to treat data sets as products that have “customers” who could be business users, data scientists, data engineers, or external users of monetized services, potentially made available through a data marketplace or exchange. The
notion is that organizations need to take the same care in developing and operationalizing data sets and data products that they do in delivering traditional products to customers. Depending on the use case, customers may need data sets enriched with metadata and additional capabilities. Data products increase in value when people can trust them for decisions and collaborate to achieve business objectives.

Figure 12. To address distributed data challenges, which of the following strategies for unifying management and governance and integrating data access are currently in use or planned to be used by your organization? Based on answers from 169 respondents. Ordered by “currently using” responses.

Values for each strategy are listed in this order: currently using / plan to use / not using; no plans to use / don’t know or N/A.

• Consolidate data from silos into a single data platform: 47% / 38% / 10% / 5%
• Use a data catalog, semantic layer, and/or data intelligence tools: 32% / 52% / 9% / 7%
• Unify analytics to work across domains, query engines, and storage repositories: 32% / 42% / 13% / 13%
• Use a data virtualization layer: 30% / 40% / 15% / 15%
• Establish multi-domain master data management: 28% / 39% / 19% / 14%
• Use a distributed SQL engine to query data in multiple sources: 28% / 42% / 19% / 11%
• Develop a data fabric to view and query distributed data: 27% / 42% / 17% / 14%
• Use knowledge graphs to visualize and analyze distributed data relationships: 25% / 38% / 24% / 13%
• Create a decentralized data mesh architecture: 21% / 38% / 25% / 16%

DataOps for Planning and Scalability

With development and production of data products, data pipelines, and analytics, organizations need frameworks to provide a structure for planning, orchestration, and scalability. DataOps builds on agile and DevOps methodologies to provide a structure that has similarities to software development life cycles. Organizations use DataOps and data observability capabilities to monitor and manage progress toward operationalization. They can spot bottlenecks and quality problems that are not only slowing progress but are also affecting the organization’s ability to achieve business objectives.

We asked research participants if their organizations are currently using or planning to use DataOps practices and whether they have had experience in using DataOps or related methods (see Figure 13). Only 15% say they are currently using some or all DataOps practices successfully. This is somewhat low, but we are still in the early stages of maturity in use of DataOps.
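
The kind of monitoring that data observability brings to DataOps can be pictured with a small sketch. The example below checks one batch of pipeline output for two common quality signals, missing values and stale data. It is a simplified illustration rather than a description of any particular tool: the function `check_batch`, the `loaded_at` field, and the thresholds are hypothetical names and values chosen for the example.

```python
from datetime import datetime, timedelta, timezone


def check_batch(rows, required_fields, max_null_rate=0.05,
                max_age=timedelta(hours=24), now=None):
    """Return a list of quality issues found in one batch of records.

    Each row is a dict with a timezone-aware 'loaded_at' timestamp
    (an assumed field name for this sketch).
    """
    now = now or datetime.now(timezone.utc)
    if not rows:
        return ["batch is empty"]
    issues = []
    # Completeness: flag fields whose share of missing values is too high.
    for field in required_fields:
        null_rate = sum(r.get(field) is None for r in rows) / len(rows)
        if null_rate > max_null_rate:
            issues.append(
                f"{field}: null rate {null_rate:.0%} is above {max_null_rate:.0%}"
            )
    # Freshness: flag the batch if even its newest row is too old.
    newest = max(r["loaded_at"] for r in rows)
    if now - newest > max_age:
        issues.append(f"stale data: newest row loaded at {newest.isoformat()}")
    return issues


if __name__ == "__main__":
    now = datetime(2023, 9, 1, 12, tzinfo=timezone.utc)
    batch = [
        {"customer_id": 1, "loaded_at": now - timedelta(hours=1)},
        {"customer_id": None, "loaded_at": now - timedelta(hours=2)},
    ]
    for issue in check_batch(batch, ["customer_id"], now=now):
        print(issue)
```

Running checks like these at each pipeline stage is one way teams could surface quality problems before they reach downstream data products.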

In the first stages, organizations may adopt only some elements of the full framework. As the figure shows, more organizations have experience in using established agile and DevOps methods (26%). A significant percentage are planning to use DataOps but have not yet begun (21%). Smaller percentages say they are not having success with DataOps or have no experience or interest in it.

To gauge what issues are driving organizations to use a framework such as DataOps, we asked research participants what their most challenging issues are in improving delivery of scalable and repeatable value from data, code, and analytics models. Here are the top issues for respondents (figure not shown):

• Improving communication and collaboration among stakeholders (45%). Collaboration is critical to ensuring quality, sorting out dependencies, and eliminating redundancy. Improving collaboration and communication are central objectives for establishing continuous feedback loops in the DataOps framework.

• Having flexibility to adjust when requirements change (39%). Projects often lose momentum and focus when users change requirements. Agile and/or DataOps principles for shorter iterations can help prepare project teams to handle new requirements.

• Increasing automation throughout data integration and preparation (37%). Using a framework plus modern data observability tools can help organizations understand which phases of integration and preparation processes need automated tools to reduce manual work and increase consistency.

• Ensuring data governance throughout data integration and preparation (36%). Data governance is a key focus of DataOps. Visibility through data observability monitoring tools can enable organizations to understand the complete health of data environments and be able to address both defensive and offensive data governance issues quickly.

• Streamlining delivery of data sets, models, analytics, and other artifacts (31%). Organizations cannot streamline delivery when quality is a problem. Agile, DevOps, and DataOps all help organizations spot quality problems sooner, before they have a negative impact on downstream data products, analytics models, and other artifacts. Data observability tools can improve end-to-end visibility to identify and resolve problems hindering the speed of delivery into production.

Data Objectives for Optimizing Analytics and AI/ML

With demand rising for accelerated development and deployment of advanced analytics and AI/ML, it is critical to address data access, preparation, and management problems that prevent projects from achieving success. To close this report’s research, we asked participants to identify their highest priority objectives for optimizing
analytics and AI/ML workloads and increasing business value (see Figure 14).

The priority shared by most organizations surveyed highlights the importance of data democratization: making it easier for users to discover and access all types of data (45%). Organizations are seeking solutions and practices that will empower users to uncover insights drawn from exploration of relationships between sets of structured, semistructured, and unstructured data. Nearly one-quarter prioritize making it easier and faster to discover data relationships (23%). Organizations will need to take measures to improve data availability and support for self-service ad hoc querying.

The second most common objective is to make it easier for non-coders to search, access, and prepare data (42%). This again highlights the trend toward data democratization; organizations do not want to unnecessarily limit deeper data interaction to only technically skilled programmers and developers. One-quarter of research participants see enabling self-service data preparation (e.g., data loading, blending, and transformation) as one of their keys to optimizing analytics and AI/ML workloads. Along with making data discovery and access easier, organizations need tools that empower nontechnical users, business analysts, and data scientists to prepare data as they see fit with less dependence on IT.

Trusting predictive models and the results of AI/ML depends on appropriate data curation. A significant percentage of respondents are looking for improvement in curating diverse data sets to ensure unbiased results (28%). Organizations need AI model governance tools that provide visibility into training data selection and curation to avoid “black box” models (that are not transparent). AutoML support is a focus, with some research participants looking to reduce delays in preparing data processes; 18% say it is a priority to set up accelerators and predefined data processes for autoML.

Using automation to accelerate data insights is a shared priority. Nearly one-third of research participants prioritize automating discovery of actionable data insights (31%). Technology advances across data integration, data intelligence, and front-end data applications are aligning to enable automated discovery. In Figure 14, 22% say that they regard it as a top priority to augment BI, KPIs, and metrics with AI-driven recommendations. Surfacing timely, intelligent recommendations to augment users’ dashboards can accelerate informed business decisions. To enhance predictive understanding of future directions in sales, operational expenditures, and business profitability, an even larger percentage say that enriching business forecasting with AI/ML insights is important (30%).

Generative AI is a rising objective. One in five organizations surveyed say that enabling use of LLMs and generative AI is a top objective (20%). Across industries, there is clearly tremendous interest today in the transformational capabilities of generative AI, a subset of AI/ML that offers techniques for generating new content in a variety of forms, including but not limited to text, code, images, and speech. Generative AI requires robust data platforms to support LLMs and similar ML models at the core of generative AI. Organizations will need to revise defensive and offensive data governance rules and policies to guide growth in generative AI.

Figure 13. Is your organization currently using or planning to use DataOps practices to optimize productivity and support growth in data pipelines and analytics and AI/ML workloads? Have you had experience in using DataOps or related methods? Based on answers from 163 respondents.

• We use agile and/or DevOps but not DataOps: 26%
• Not yet, but we plan to use DataOps: 21%
• Currently using some or all DataOps practices successfully: 15%
• Interested but we have no current plans: 14%
• We tried DataOps but did not have success: 6%
• We have no experience or interest in DataOps: 4%
• Don’t know or N/A: 14%

Recommendations

Here are 10 best practices for achieving scalable, agile, and comprehensive data management and governance. This report has discussed how data democratization, data-driven applications, and advancement with analytics and AI/ML are major drivers behind modernization. Our research shows that these 10 best practices are key for meeting demands generated by these drivers.

Harness modern technologies and practices to improve data quality. The top priority across much of our research is improving trust in data quality, accuracy, and completeness. With new data sources and types of workloads, data quality will never be a fully solved problem. Organizations need to take advantage of data catalogs, data preparation tools, and capabilities in data platforms and virtual layers to align data curation processes with user requirements. AI-infused automation can help organizations improve data quality in massive, high-velocity data.

Enhance data management and governance agility. When markets, economic conditions, and competition are changing quickly, business leaders and data scientists need agility so they can focus on solving business challenges. Good practices for data governance can fall by the wayside, but this eventually creates trouble, including higher costs. Our research shows dissatisfaction with data management and governance support for strategic decision-making and new business development. Bring business leadership and stakeholders together to find the right balance between agility and governance. Use the elasticity of cloud data platforms to handle changing requirements. Automating constraints can play a beneficial role in making data governance less obtrusive and easier to update.

Address analytics and AI/ML pain points. Our research shows that solving data management, integration, preparation, and governance challenges is key to moving forward with
augmented analytics, including generative AI. Reducing time and costs associated with data collection, preparation, and provisioning for feature engineering is a priority. Organizations should evaluate automated tools and examine processes to rationalize those that are redundant. Ease of use and automation are advancing in solutions to make it easier for non-coders to discover and prepare data.

Establish data stewardship. Many of the challenges faced by organizations in our research call for improved data stewardship. Data stewards have expertise in the data and can oversee both defensive and offensive data governance. They can mentor new users to follow rules and policies and apply best practices for selecting data sets that are fit for purpose. As noted, data stewards are usually busy with “day jobs” in IT as data analysts and managers or have business roles as subject matter experts in business domains. Thus, automated tools and automated processes in data catalogs are critical to reducing hands-on monitoring and manual work.

Improve data intelligence and maximize the value of data catalogs. Data intelligence, centered in a data catalog, is critical to both defensive and offensive data governance. Data catalogs and other metadata management tools help keep track of and classify sensitive data for protecting PII. They also aid in discovering data quality and consistency issues and streamline solving them. Modern data catalogs offer AI-infused automation to handle complex, high-volume data and feature easier-to-use interfaces for less technical users.

Improve self-service data integration, transformation, and pipeline development. Organizations would like to reduce the time users spend on data preparation and pipeline development. There is strong interest in empowering users to do more of their own work rather than depend entirely on IT. In this report, we find that demand for self-service capabilities continues to grow. However, organizations should increase the role of shared resources in data integration, transformation, and pipelines, such as data catalogs and the computational power of cloud data platforms, to improve speed, scale, and quality, and reduce redundancy.

Examine alternatives to heavy data movement, replication, and copying. Our research finds that reducing these activities is one of the top objectives among organizations surveyed. Automation, monitoring, and scheduling help identify and solve bottlenecks. Consolidation to a unified data platform can reduce some needs for data movement, such as between data staging areas and target platforms. In other cases, organizations can avoid movement by setting up a data virtualization layer and developing a data fabric architecture.

Evaluate a data fabric architecture for distributed data. Research in this report shows considerable interest in data fabrics to improve data access, governance, and management of distributed data. Growth in hybrid multicloud data environments is accelerating interest as organizations evaluate how to establish a virtual, location-independent layer above multiple data platforms. Data fabrics rely on semantic richness based on metadata and additional data intelligence, often managed by a data catalog. Organizations should evaluate current
components to determine if they can support a data fabric and modernize where needed.

Figure 14. Which of the following objectives are your highest priority to address for your organization to optimize analytics and AI/ML workloads and increase their business value? Based on answers from 159 respondents, who were asked to select at least their top three objectives.

• Make it easier for users to discover and access all types of data: 45%
• Make it easier for non-coders to search, access, and prepare data: 42%
• Automate discovery of actionable data insights: 31%
• Enrich business forecasting with AI/ML insights: 30%
• Curate diverse data sets to ensure unbiased results: 28%
• Enable self-service data preparation (e.g., data loading, blending, transformation): 25%
• Make it easier and faster to discover data relationships: 23%
• Augment BI, KPIs, and metrics with AI-driven recommendations: 22%
• Support users’ preferred model development frameworks and languages: 21%
• Enable use of large language models and generative AI: 20%
• Set up accelerators and predefined data processes for autoML: 18%
• Automate feature engineering recommendations: 4%

Consider a data mesh architecture. With business domains across organizations increasingly focused on developing various types of data products, they are seeking empowerment and reduced dependence on centralized data platforms and enterprise IT. A data mesh champions decentralization to enable domain teams closest to business objectives. As they empower domains, organizations also need to enable cross-domain projects, enterprise data governance, cost management,
and efficient resource sharing. Evaluate a data mesh architecture as a potential framework for decentralization and self-service to enhance business agility. Use federation and automation of constraints to standardize data governance.

Evaluate DataOps for better planning and scalability. Unplanned increases in data pipelines, data preparation, and development of data products can generate costly problems without a management framework. Our research finds growing interest in DataOps. Improving communication and collaboration among stakeholders is a challenge faced by many organizations and is a key goal of DataOps. Evaluate data observability monitoring as part of DataOps to improve end-to-end visibility into development and the business impact of data management and governance problems.
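
Several of the recommendations above mention automating governance constraints, with rules managed centrally (for example, in a data catalog) and applied wherever data is used. The sketch below illustrates one simple form this could take: column-level sensitivity tags drive masking of values before a row reaches a consumer. The tag names, the `apply_policy` function, and the masking token are assumptions made for illustration, not features of any specific product.

```python
MASK = "***"

# Hypothetical sensitivity tags, as a catalog might record them per column.
COLUMN_TAGS = {
    "email": "pii",
    "region": "public",
    "order_total": "public",
}


def apply_policy(row, column_tags, allowed_tags=frozenset({"public"})):
    """Mask every value whose column tag is not in allowed_tags.

    Columns with no tag are masked too, so new or uncataloged
    columns are protected by default.
    """
    return {
        col: (val if column_tags.get(col) in allowed_tags else MASK)
        for col, val in row.items()
    }


if __name__ == "__main__":
    row = {"email": "a@example.com", "region": "EMEA",
           "order_total": 120.5, "notes": "called twice"}
    # A general business user sees only public columns.
    print(apply_policy(row, COLUMN_TAGS))
    # A steward cleared for PII sees the email; untagged 'notes' stays masked.
    print(apply_policy(row, COLUMN_TAGS, allowed_tags=frozenset({"public", "pii"})))
```

Because the rule lives in one function driven by shared tags, updating a tag changes behavior everywhere the policy is applied, which is the sense in which automation can make governance less obtrusive and easier to update.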

Research Sponsors

Alation is a leader in enterprise data intelligence solutions, enabling self-service analytics, cloud transformation, and data governance. More than 450 enterprises, including more than 35% of Fortune 100 companies, build data culture and improve data-driven decision-making with Alation, including Cisco, Nasdaq, Pfizer, Salesforce, and Virgin Australia. The company’s data intelligence platform allows anyone in an organization to self-service governed data so that anyone can find, understand, and trust data for decision-making, to train AI models, and accelerate digital transformation. As a pioneer of the data catalog market and creator of the first-ever machine learning-driven data catalog, Alation combines machine learning with human insight to solve a key problem faced by almost every organization: the need to know exactly what data they have, where it lives, if it’s trustworthy, and how to make the best use of it. Alation’s data intelligence platform inspires collaboration and makes it possible for everyone in an organization to think data-first before making decisions. The company recently surpassed $100M in ARR, closely followed by raising a $123M Series E round, led by Thoma Bravo, Sanabil Investments, and Costanoa, with participation from Databricks Ventures, an all-equity tranche that values Alation over $1.7 billion, an impressive 1.5 times higher than the company’s previous valuation in a challenging economic climate.

Dataiku is a platform for Everyday AI, enabling data experts and domain experts to work together to build data into their daily operations, from advanced analytics to generative AI. Together, they design, develop, and deploy new AI capabilities, at all scales and in all industries. Organizations that use Dataiku enable their people to be extraordinary, creating the AI that will power their company into the future.

More than 500 companies worldwide use Dataiku, driving diverse use cases from predictive maintenance and supply chain optimization to quality control in precision engineering to marketing optimization, generative AI use cases, and everything in between.

Denodo is a leader in data management. The award-winning Denodo Platform is the leading data integration, management, and delivery platform using a logical approach to enable self-service BI, data science, hybrid/multicloud data integration, and enterprise data services. Realizing more than 400% ROI and millions of dollars in benefits, Denodo’s customers across large enterprises and mid-market companies in 30+ industries have received payback in less than 6 months.



Research Sponsors

Erwin by Quest is a trusted industry leader, empowering organizations with data management and governance solutions. Erwin Data Intelligence unlocks deep insights and enhances data governance, ensuring data assets are effectively harnessed. Erwin Data Marketplace fosters seamless collaboration and access to reliable data, fueling agile decision-making across teams. Erwin Data Modeler helps simplify complex data migration challenges with efficient data architecture management. Erwin by Quest empowers organizations to embrace the future of data management by empowering them to make informed decisions and unleash the true potential of their data assets while safeguarding sensitive information.

Hitachi Vantara is part of Hitachi Group. From transportation and energy to automotive systems, the group focuses on sectors that make an incredible impact on our future. Understanding how data enables mission-critical digital and industrial environments is what we do.

The Pentaho platform enables insights at scale into data by operationalizing data management with automation accelerating data discovery, onboarding, and tiering, as well as data pipeline orchestration and analytics. This improves data quality and trust by discovering, identifying, and categorizing structured and unstructured data into meaningful terms defined by customers, and standardizing them through business glossaries. This helps organizations meet their data governance gathering and usage requirements. In addition, the analytics tools provide users with observability of their pipelines and can provide dashboards to ensure data quality-driven outcomes.

The Pentaho platform portfolio includes:

• Pentaho Data Integration & Analytics: end-to-end IT and OT data integration and analytics for any data type

• Pentaho Data Catalog: provides trusted data by automating discovery for self-service analytics, compliance, and quality across an enterprise-wide data fabric

• Pentaho Data Storage Optimizer: tiers data between file systems such as Hadoop and S3 Object Storage seamlessly for data management and cost savings




As one of the world’s largest providers of enterprise application software, SAP strives to help every business run as an intelligent enterprise to ultimately help the world run better and improve people’s lives. The SAP Business Technology Platform accelerates innovation to unlock your business potential. SAP BTP brings together application development, data and analytics, integration, and AI capabilities into one unified environment optimized for SAP applications. Additionally, SAP data and analytics solutions enable organizations to easily integrate, model, manage, and use data from anywhere. Using SAP’s modern data stack, organizations retain the context of SAP data and have a trusted foundation that extends planning and analytics across the enterprise to make the most impactful decisions.

More information is available at sap.com.

Snowflake enables every organization to mobilize their data with Snowflake’s Data Cloud. Customers use the Data Cloud to unite siloed data, discover and securely share data, power data applications, and execute diverse AI/ML and analytic workloads. Wherever data or users live, Snowflake delivers a single data experience that spans multiple clouds and geographies. Thousands of customers across many industries, including 639 of the 2023 Forbes Global 2000 (G2K) as of July 31, 2023, use Snowflake Data Cloud to power their businesses.

Learn more at snowflake.com.



TDWI Research provides research and advice for
data professionals worldwide. TDWI Research
focuses exclusively on data management and
analytics issues and teams up with industry
thought leaders and practitioners to deliver both
broad and deep understanding of the business and
technical challenges surrounding the deployment
and use of data management and analytics
solutions. TDWI Research offers in-depth research
reports, commentary, inquiry services, and topical
conferences as well as strategic planning services
to user and vendor organizations.

A Division of 1105 Media


6300 Canoga Avenue, Suite 1150
Woodland Hills, CA 91367

E info@tdwi.org tdwi.org
