By David Stodder
Research Sponsors
Alation
Dataiku
Denodo
erwin by Quest
Hitachi Vantara
SAP
Snowflake
BEST PRACTICES REPORT Q3 2023
This report is based on independent research and represents TDWI's findings; reader experience may differ. The information contained in this report was obtained from sources believed to be reliable at the time of publication. Features and specifications can and do change frequently; readers are encouraged to visit vendor websites for updated information. TDWI shall not be liable for any omissions or errors in the information in this report.
Report purpose. Data demands are great as organizations deploy analytics, artificial intelligence and machine learning (AI/ML), and data-rich applications. Data democratization is multiplying use cases. Organizations face data governance challenges to protect sensitive data and improve trust in data quality. Data management and integration require scalability, speed, and agility. This TDWI Best Practices Report examines priorities and discusses how to overcome challenges to achieve value.

Survey methodology. In June and July 2023, TDWI sent invitations via email to business and IT professionals in our database, asking them to participate in an internet-based survey. The invitation was also posted online and in publications from TDWI and other firms. The survey collected responses from 209 respondents, with 158 completing every question. For this research, all responses are valuable and are included in this report's sample. This explains why the number of respondents varies per question.

Research methods. In addition to the survey, TDWI conducted interviews with business and IT executives and managers, application developers, and data management and analytics experts. TDWI also received briefings from vendors that offer products and services related to the topics addressed in this report.

Survey demographics. Nearly two in five respondents are business or IT executives and VPs (39%). The second-largest group consists of business or data analysts and data scientists (26%). Third largest is developers and data, application, or enterprise architects (18%). Line-of-business (LOB) managers and business sponsors account for 8% of the respondent population. Other IT staff, consultants, and other titles account for 9% of the total.

Industries vary considerably. Computer/network manufacturing is the largest (11%), followed by financial services (10%), software manufacturing and publishing (9%), data services (8%), construction/engineering and government (8% each), consulting/professional services and education (6% each), healthcare (5%), and insurance (4%). Just under half of respondents reside in the U.S. (48%), with South Asia (primarily India and Pakistan) (25%), Asia and Pacific Islands (10%), Europe (6%), Canada (4%), and other regions following. Respondents come from enterprises of all sizes.

Position (chart): Business/IT exec/VP 39%; business, data analyst/scientist 26%; developers/architects 18%; LOB managers/sponsors 8%; IT-other, consultants 7%; other 2%.

Industry (chart, excerpt): computer/network manufacturing 11%; consulting/professional services 6%.

Geography (chart): United States 48%; South Asia (India, Pakistan) 25%; Asia/Pacific Islands 10%; Europe 6%; Canada 4%; Africa 3%; Central/South America 3%; Australia/New Zealand 2%; Middle East 1%.
confident in how they manage and govern data today and can respond to change. Twice that percentage (28%) indicate that although they regard their organizations' current data management and data governance as successful, they anticipate challenges in responding to change.

Fortunately, only a small percentage of respondents indicate a total lack of confidence; just 8% agree with the statement that they are "struggling to meet current requirements and are not well prepared for change." However, nearly as many in our research (10%) say that they are just getting started with managing and governing data for more than one project, application, or team.

Inexperience is often the case with small and midsize businesses (SMBs). As they grow, SMBs often bump up against the limitations of inexpensive and readily available tools such as spreadsheets; hand-coded, point-to-point data pipelines; and siloed data platforms. As data grows in volume and variety and more users attempt complex analytics, these users spend too much time on ingesting, extracting, and preparing data. Redundant or inconsistent data pipelines and transformation processes increase costs and lead to unsatisfactory performance. Decision makers are reluctant to trust the resulting data findings. Risks of sensitive data exposure are everywhere, demanding more holistic data governance to at least ensure adherence to data privacy regulations.

Managing and Governing Data for Business Objectives

When traditional IT systems and practices are isolated from business functions, it is not always clear how data management and governance relate to the achievement of business objectives. They can seem like fixed services; organizations that do not consider themselves "in the IT business" are mostly focused on reducing IT costs. However, business-driven adoption of cloud computing and trends toward funding some or all data management through operations rather than as standard capital expenditures have put increased emphasis on aligning resource utilization with business operations and objectives.

Usage-based service pricing models offered today by cloud data platform, application, and tool providers make it easier to see the relationship between resource consumption and business activity. With these models, business functions can understand how business activities drive consumption of storage, processing, data management, and data integration services. With holistic data observability monitoring, organizations have visibility into how bottlenecks slowing data access—such as having to wait for high-volume data movement or slow data transformation processes—ultimately affect operational performance, customer satisfaction, and the ability to achieve business objectives.

Cloud service monitoring tools play an essential role in tracking usage so companies can avoid paying for unused or underutilized resources. Organizations can deploy metrics and data observability capabilities, which will be discussed in more detail later, to understand usage patterns and to know, for example, which data sets, reports, dashboards, analytics models, data warehouses, and data applications are delivering the most value.

Monitoring and metrics enable organizations to analyze usage data to determine where modernization, such as greater automation, would be most beneficial. Monitoring visibility also helps to evaluate whether resources are being devoted to unimportant projects or where your organization needs to improve access to underutilized data assets. With analytics augmented by AI/ML (including generative AI and LLMs) rising in importance, organizations must ensure that they have resources and governance practices in place as they scale up in numbers of users, workloads, and data volumes.

Understanding how investments contribute to business objectives is critical. To drill down deeper into the contribution of data management and governance investments to business success, we asked research participants to indicate how successfully their organizations are managing and governing data and realizing value through analytics, including AI/ML, to achieve objectives across business functions and initiatives. In Figure 1, we can see that organizations surveyed are most successful at meeting objectives important to regulatory compliance and protecting data; 30% are very successful and 43% are somewhat successful. These results show the importance organizations place on succeeding with core data governance priorities for regulatory compliance, often led by enterprise IT.

Organizations indicate the most success in three additional areas:

• Customer experience. Big data analytics is revolutionizing how organizations personalize marketing offers and shape customer experiences. The volume of behavioral data generated by customer activity across channels can give marketers a tremendous resource for gaining a detailed view of customers' historical and real-time behavior, which organizations can analyze to generate predictive insights into how customers will respond to future offers. Figure 1 shows that 28% are very successful and 43% are somewhat successful with data management and governance for improving customer experiences.

• Customer service and support. Integrating data to gain 360-degree views of customers is valuable for all customer-centric objectives, including excelling in service and support. Valuable data often includes semi- and unstructured customer service records, call center interaction records, customer satisfaction survey data, and information generated by online behavior and engagement. Figure 1 shows that 22% are very successful and 46% are somewhat successful with data management and governance for improving customer service and support.

• Financial planning, budgeting, and forecasting. Organizations are reasonably satisfied with their data management and governance for these activities; 22% are very successful and 44% somewhat successful. Users depend on data management and governance to provision trusted, high-quality data and enable timely single views drawn from multiple sources for accurate snapshots and forecasting decisions. Agility is an important attribute for organizations to update plans and budgets as conditions change.
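The usage monitoring described earlier in this section, rolling up which data assets consume resources and which see little use, amounts to a simple aggregation over metering records. A minimal sketch in Python; the record format, asset names, and threshold are illustrative assumptions, not taken from any particular cloud platform's API:

```python
from collections import defaultdict

# Hypothetical usage records, e.g., exported from a platform's metering logs.
# Fields: (asset_name, business_function, queries_run, compute_credits_used)
usage_log = [
    ("sales_dashboard", "marketing", 120, 40.0),
    ("churn_model_features", "data_science", 15, 250.0),
    ("legacy_inventory_mart", "operations", 2, 90.0),
    ("sales_dashboard", "sales", 300, 95.0),
]

def summarize_usage(records):
    """Roll up query counts and compute cost per data asset."""
    totals = defaultdict(lambda: {"queries": 0, "credits": 0.0})
    for asset, _function, queries, credits in records:
        totals[asset]["queries"] += queries
        totals[asset]["credits"] += credits
    return dict(totals)

def underutilized(totals, min_queries=10):
    """Flag assets that consume resources but see little use --
    candidates for retirement or for better discoverability."""
    return [asset for asset, t in totals.items()
            if t["queries"] < min_queries and t["credits"] > 0]

totals = summarize_usage(usage_log)
print(underutilized(totals))  # ['legacy_inventory_mart']
```

The same roll-up keyed by business function rather than by asset gives the consumption-to-activity view that usage-based pricing makes possible.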
Figure 1 (excerpt). Based on answers from 209 respondents; ordered by combined "very successful" and "somewhat successful" responses. Collaboration with business partners and network: 17%, 45%, 19%, 8%, 11%. Risk management: 23%, 37%, 21%, 8%, 11%.
When we combine "somewhat unsuccessful" and "not at all successful" responses, Figure 1 shows that organizations are less successful in these three areas:

• Automated decisions and embedded analytics. Cutting-edge applications embed analytics to detect events, anomalies, and patterns and provide users with recommendations for action. Some use analytics to drive automated decisions. The research indicates that many organizations are not yet fully satisfied with data management and governance for supporting these capabilities, with 22% somewhat unsuccessful and 14% not at all successful.

• Strategic decisions and new business development. When organizations are undergoing significant, sometimes unexpected changes in markets, business networks, and economic environments, they need agile data management and governance. The research indicates some frustration with support for strategic decision-making and new business development, with 25% somewhat unsuccessful and 9% not at all successful.

• Systems and equipment diagnostics and maintenance. Analytics augmented with AI/ML is revolutionizing these functions, enabling organizations to develop predictive and prescriptive algorithms to know when machinery is likely to break down. They can prepare maintenance and repair based on data rather than traditional schedules. Data management, integration, and governance are critical to gathering and preparing data streamed from end points and enabling integrated views of historical and real-time data. About one quarter (26%) are somewhat unsuccessful in supporting these advanced capabilities and 6% are not at all successful.

Modernization Drivers

Organizations are updating existing data platforms or migrating to new ones and using modern data integration, data management, and governance practices and systems to achieve greater success with the objectives shown in Figure 1. To gain insight into current modernization priorities, we asked research participants to identify their organizations' most significant drivers (see Figure 2).

The top two choices demonstrate the importance of addressing data governance as part of modernization. We will discuss specific data governance issues in depth in the next section, but here we gain a broader view of the relative importance of various drivers and priorities. Half of those surveyed (50%) say that improving trust in data quality, accuracy, and completeness is one of their most significant drivers. This issue is the focus of what many in the industry call "offensive" data governance. Protecting sensitive data and preventing unauthorized use and sharing, regarded as "defensive" data governance, ranks second, with 38% identifying it as one of their top three drivers. Just under one-third say that complying with regulations and enabling regulatory reporting is a significant driver.

Both offensive and defensive data governance are essential to maximizing the value of all data, whether structured, semistructured, or unstructured. Figure 2 shows that this ranks third among drivers, with 37% choosing it as one of their top three. This selection also indicates the importance of collecting and curating diverse data for analytics and AI/ML development. Data lakes, unified data lakehouse platforms that integrate data warehouses and data lakes, and NoSQL systems have become critical technologies for managing diverse data. Many organizations are focusing on addressing distributed data challenges; eliminating data silos and unifying them in a single platform ranks fifth, with 32% saying it is one of their top drivers.

Analytics and AI/ML development and operationalization are significant drivers. Ranked fourth by participants is the usage of analytics for operational efficiency and effectiveness. Organizations want to enable managers in business functions and lines of business (LOBs) to examine different perspectives on data used in performance metrics and dashboards to determine, for example, the best ways to reduce unnecessary costs and delays.

Nearly half of research participants (48%, not shown) who indicated earlier that their organizations are highly successful and confident in how they manage and govern data say that using analytics for operational efficiency and effectiveness is one of their most significant drivers. Among this selection of research participants, 72% say that developing, testing, and deploying analytics and AI/ML is a top driver (compared to 25% of all respondents saying this driver is significant).
(25%) and 10% are not at all successful. We will discuss later how data catalogs and similar metadata management systems are valuable for knowing the location of sensitive data and establishing an inventory.

The research shows that organizations are least successful in improving discoverability and having an accurate inventory of sensitive data.

Other issues that organizations indicate are challenging include:

• Coordinating data governance with data security and use of masking techniques. One-third of organizations surveyed say they are unsuccessful in establishing good coordination (33%), with 10% reporting that they are not at all successful. Governance committees should bring leaders together to ensure alignment between data governance and data security measures such as masking, anonymizing and de-identifying data. Regulations often have conditions regarding acceptable use of these techniques for certain types of customer data. Organizations should evaluate solutions such as automated tag-based masking capabilities in data platforms and tools that offer flexibility in whether your organization assigns masking to databases, tables, schema, or columns.

• Embedding governance constraints in pipelines, tools, and applications. One-third of organizations surveyed are unsuccessful in implementing embedded constraints (33%), with 12% indicating they are not at all successful. This priority involves using embedded capabilities that may not be available in legacy data integration and management systems. Traditional practices often require users to consult policy documentation to determine whether their data use and sharing complies with governance policies, which can lead to inconsistent compliance.

• Monitoring governance during data loading, movement, and replication. Organizations need to monitor exposure risks not only while data assets are at rest in data lakes and data warehouses but also while they are in motion during data integration or cloud migration. Governing data in motion to avoid exposure appears to be challenging for some organizations; 22% say they are somewhat unsuccessful and 11% are not at all successful. Leading data integration tools and data platforms offer automated capabilities for monitoring data governance during loading, movement, and replication.

Data Governance for Improving Data Trust

The offensive side of data governance is about building data trust. Data curation steps for improving the data's quality, validity, integrity, and authenticity are central to this objective. Integrating data governance and data trust is how organizations can avoid a "Wild West" in which the expansion in self-service BI and analytics happens without proper data governance guardrails. Through data governance, you can clarify who is responsible for data authoring and data curation.

Data quality monitoring is valuable for ensuring requisite profiling, cleansing, validation, and enrichment, as well as uncovering problems
Figure (excerpt). Regarding regulatory compliance and protecting sensitive data, how successful is your organization with its data governance for addressing the following priorities? Controlling access to sensitive data such as personally identifiable information (PII): 30%, 50%, 14%, 4%, 2%. Setting data governance rules and policies and keeping them up to date: 31%, 42%, 15%, 7%, 5%. Training users in governance and responsible data use: 21%, 43%, 22%, 8%, 6%. Ensuring data governance in self-service BI and analytics: 20%, 44%, 23%, 7%, 6%.
such as data inconsistency and redundancy. Data governance policies can indicate when data owners or other experts should be notified about problems with the data. Modern tools can automate monitoring. AI-infused automation is playing an increasing role in data curation processes, enabling organizations to scale up to higher and more diverse data volumes faster and spot anomalies that require attention.

TDWI research asked which of the actions listed in Figure 6 are being undertaken within respondents' organizations to increase users' trust in data quality, including its accuracy and completeness. The most common action is to validate new data and address data quality issues at sources (50%). Organizations can use data intelligence tools and processes such as data catalogs to know more about source data quality. Scalability is often a challenge when sources become numerous and voluminous.

The second most common action is to monitor data quality and ensure fit for each purpose (45%). With the spectrum of use cases in most organizations stretching from simple data consumption to deep analytics and AI/ML, it is important to target data curation carefully. Data quality monitoring and processes need to be fit for each purpose. AI-augmented tools for automatically detecting data quality issues are evolving to make it easier for users to learn about data quality and address problems; 25% say they are currently deploying these tools and 16% are automating data quality recommendations to users.

Data stewardship is a priority. Almost half of organizations (45%) say that enhancing data stewardship to include data quality and trust is an action they are taking. Data stewardship is valuable for both defensive and offensive data governance. Along with being experts in regulatory compliance, data stewards can oversee data quality processes, manage metadata and master data, and guide users to data that is fit for purpose. Automated data intelligence tools such as data catalogs (or similar functionality embedded within some data intelligence tools and data platforms), as well as embedded data governance constraints and data governance workflows, are essential to data stewards because they reduce manual work, offer more complete information about data, and streamline updates to metadata as the data environment changes.

Tracking data lineage is also being undertaken by 45% of organizations surveyed. Data lineage information includes visual documentation of the data sources, ownership, and transformations throughout an organization's data landscape. Some rely on data lineage–specific tools. Others leverage data catalogs that can track data lineage and serve as a central repository for information, combining visibility of data quality scoring and data classification as well. Along with enabling better defensive data governance and managing the inventory of sensitive data, data lineage information makes it easier for self-service business users to find and contextually understand high-value, relevant, and trusted data.

Many organizations employ a data catalog for data governance. Our research shows that 42% use a data catalog to inventory data and identify data quality issues. Automation in modern data catalogs is critical to meeting scalability and data diversity challenges. As noted, data catalogs can help organizations document data inventories and data lineage as required by regulations. Easier-to-use interfaces in modern data catalogs enable different types of users to access data intelligence and determine whether they can trust data to be fit for their needs.

Top Data Governance Challenges

We close this section of the report with a look at the most pressing data governance challenges confronting organizations in our research. These fall into the following areas:

Governing distributed data across silos. Data distributed across a hybrid multicloud environment makes it difficult to improve data quality, track data lineage, and monitor data use for protecting sensitive data. Data silos, including independent data marts and data lakes, fragment data across systems; 41% cite data silos as one of their top three challenges (figure not shown for research reported in this section). Silo creation often accelerates with data democratization as project teams or individual users set up their own data repositories.
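Lineage information of the kind described above, sources, ownership, and transformations across the landscape, is commonly modeled as a graph of datasets linked by the processes that derive them. A minimal sketch in Python; the dataset names and fields are hypothetical, not drawn from any particular catalog product:

```python
# Each entry maps a dataset to its direct upstream inputs, plus simple
# ownership metadata of the kind a data catalog might hold.
lineage = {
    "revenue_report": {"inputs": ["orders_clean", "fx_rates"], "owner": "finance"},
    "orders_clean":   {"inputs": ["orders_raw"],               "owner": "data_eng"},
    "orders_raw":     {"inputs": [],                           "owner": "data_eng"},
    "fx_rates":       {"inputs": [],                           "owner": "finance"},
}

def upstream_sources(dataset, graph):
    """Walk the lineage graph to find every root source feeding a dataset."""
    node = graph[dataset]
    if not node["inputs"]:          # no inputs: this is a root source
        return {dataset}
    roots = set()
    for parent in node["inputs"]:
        roots |= upstream_sources(parent, graph)
    return roots

print(sorted(upstream_sources("revenue_report", lineage)))
# ['fx_rates', 'orders_raw']
```

A walk like this is what lets a steward answer "which sources does this report depend on?" or, run in reverse, "which downstream assets are affected if this source contains sensitive data?"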
One-third (33%) say that unifying data governance across a distributed data architecture is a major challenge and 28% say it is difficult to ensure data governance across on-premises and cloud-based data. Governance challenges are motivating a significant percentage of organizations surveyed (41%) to eliminate data silos by moving data to a central data platform. Some are instead choosing to create a virtual data fabric to improve holistic data governance, data quality, and data management across a distributed data environment (24%). These options will be discussed in more detail in sections to follow.

Integrating data governance with data integration and data pipelines. Organizations should ensure that data governance is part of processes for ingesting, collecting, and preparing data, not just once the data is at rest in storage and managed as part of a data warehouse or data lake. Nearly one third (30%) say that ensuring data trust and protection in data pipelines, data loading, and extract, transform, and load (ETL) processes is one of their top challenges. Organizations should employ modern, automated capabilities embedded in data integration tools and platforms. Allowing data pipeline processes to automatically access a data catalog or other metadata management will enable users to find trusted data faster and help data stewards keep track of sensitive data.

Addressing people and process issues. Earlier, our research noted that most organizations are confident in their ability to set governance rules and policies and keep them up to date. Yet, a significant percentage (40%) say that ensuring clear rules, policies, and responsibilities for data use is an ongoing challenge. This is because data landscapes are constantly changing, as are data privacy regulations and the dangers of exposure through errors and abuse, such as hacking. Organizations should continuously monitor rules and policies so that they remain effective and up to date.

As we noted earlier, integration of data governance with analytics model governance is an emerging priority. In our research, 14% of organizations surveyed regard this as one of their top challenges. Organizations should facilitate communication between data governance and data science teams to align policies and processes.

Many organizations surveyed say that finding the right people to be data stewards is a primary challenge (38%). Organizations often struggle to identify data stewards, most of whom have "day jobs" as business subject matter experts (SMEs), data analysts, business analysts, or data warehouse managers. Automated tools and data catalogs are critical to reducing hands-on monitoring and manual policy enforcement so data stewards can work efficiently. By collecting information about data activity among users, data knowledge, data ownership, and relevant data governance constraints, data catalogs can help data stewards avoid having to track down disparate information.

Using automation effectively. More than one quarter of organizations surveyed (28%) say that automating data governance, data quality, and data stewardship tasks is one of their top challenges. The same percentage need solutions that will enable them to increase automated scanning and tagging of new data sets (28%). Organizations should evaluate forward-looking tools and platforms that provide automation of these and similar functions.

Data Management for Expanding Opportunities

Pressures on data management continue to rise as data grows in volume, variety, and velocity. Rather than rely on a traditional monolithic system, organizations increasingly rely on a portfolio of data management solutions to support access to the range of historical, continuously updated, and real-time data for BI, analytics, and AI/ML. To maximize the value of data assets, including through monetization and participation in data marketplaces, organizations are creating data
services made available through application programming interfaces (APIs).

Data platforms are the centerpiece of data management. However, platforms have become more diverse as organizations seek to manage, govern, and derive value from petabytes of structured, semistructured, and unstructured data. In-memory platforms, massively parallel processing (MPP), and other technology advancements are enabling organizations to improve query performance and data availability for analytics and AI/ML.

Central to data management modernization for most organizations is cloud migration. Cloud computing arrangements vary, but the trend toward serverless computing frees organizations from having to devote most of their time to technical details such as server configuration, system and software updates, maintenance, and security. They can devote greater attention to tapping data's business potential and taking advantage of scalable and elastic cloud computing to respond to changing requirements.

Modernizing data management is not easy. As technologies and practices become established, data management becomes like a big ship that takes time to change direction. Only 8% of organizations surveyed say they are both very satisfied now with their data management and confident they can modernize technologies, services, and practices to meet evolving requirements (figure not shown).

The largest percentage is mostly satisfied with their current data management (45%), but these organizations anticipate challenges in trying to modernize. Over one-third is more pessimistic; 30% are somewhat dissatisfied with their current state of data management and not very confident in their ability to modernize, and 5% are not at all satisfied or confident they can modernize. Some say they are just getting started with data management (9%) and 3% don't know.

We will discuss data management challenges and priorities more fully later in this section. First, we examine research results about which systems, platforms, storage, cloud services, and technology standards are currently in use or planned (see Figure 7).

Not surprisingly, the most established options top the list: relational DBMS for data warehouse (68% currently using) and spreadsheets (66% currently using). Spreadsheets are ubiquitous, especially among organizations that are just starting with data management (87% of these respondents currently use them for data management, not shown). Analysts extract data from sources into spreadsheets where they then store, organize, categorize, manipulate, and analyze data.

Spreadsheets offer functions for activities such as aggregating data, performing calculations, and creating graphs. However, spreadsheets can become poor-quality data dumps when analysts lack formal processes to incorporate new data and cleanse, enrich, and prepare it. Files grow and become difficult to use and share for business insights. Data catalogs can be useful for governing data so users working with spreadsheets or other BI tools can locate and access trusted data sets.
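The spreadsheet problems described above, missing values, unparseable entries, and duplicates accumulating without formal processes, are what basic profiling catches before data is shared. A minimal sketch using only the Python standard library; the column names and quality rules are illustrative assumptions:

```python
import csv
import io

# Simulated spreadsheet export with typical quality problems:
# a missing region, a non-numeric amount, and a duplicate row.
raw = """customer,region,amount
Acme,EMEA,1200
Beta,,450
Gamma,APAC,n/a
Acme,EMEA,1200
"""

def profile(csv_text):
    """Count rows, empty cells, non-numeric amounts, and duplicate rows."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    report = {"rows": len(rows), "missing": 0, "bad_amount": 0, "duplicates": 0}
    seen = set()
    for row in rows:
        report["missing"] += sum(1 for value in row.values() if not value)
        try:
            float(row["amount"])
        except ValueError:
            report["bad_amount"] += 1
        key = tuple(row.values())
        if key in seen:
            report["duplicates"] += 1
        seen.add(key)
    return report

print(profile(raw))
# {'rows': 4, 'missing': 1, 'bad_amount': 1, 'duplicates': 1}
```

Even a lightweight report like this, attached to a shared file or surfaced through a catalog entry, gives downstream users a basis for deciding whether the data is fit for their purpose.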
Relational systems fit well for supporting BI reporting and analytics requirements involving mostly structured data. Transactional databases (OLTP) are also primarily based on relational technology and models (49% currently using). Application-specific databases are frequently based on relational technology. Many applications offer managed reporting and analytics functionality based on relational databases; 50% are currently using application-specific databases and 25% plan to use them.

Current and planned use of in-memory database technology is substantial. To accelerate access and improve performance, many organizations deploy in-memory databases. These have traditionally used relational models and technologies, but today columnar databases are often in-memory, as are other types of NoSQL databases. As memory space has expanded, organizations are able to store entire data warehouses, analytics data platforms, data marts, and online analytical processing (OLAP) cubes in memory.

Some organizations couple in-memory OLAP with a semantic layer to optimize access to data in all locations and use the cube effectively to collect and store pre-aggregated data structures, dimensions, and measures for faster analytics. Figure 7 shows that 35% are currently using an in-memory database and 30% plan to use one.

Adoption of columnar and other NoSQL systems is growing. Growth in semi- and unstructured data and demand for complex analytics has focused attention on non-relational systems such as NoSQL data management. Although many organizations use simple data or file storage systems to hold a variety of data (55% currently use data or file storage for data management and 29% plan to use them), most organizations eventually need substantial data models and enterprise data access and management capabilities to optimize data access and deliver more complete data management. If traditional relational DBMSs are not appropriate for use cases, organizations turn to the variety of data management solutions grouped under the NoSQL umbrella.

Columnar (or column-oriented) databases, a variation on relational systems, are often included with NoSQL systems. Columnar databases store data by column rather than by row to deliver faster query performance for analytics by eliminating wasted processing from large table scans. Figure 7 shows that 36% are currently using columnar databases, which is about the same percentage (35%) as in our 2022 research. In 2022, however, only 24% planned to use a columnar database and 25% said they were not using one and had no plans to use one. This report's research notes a rise to 35% planning to use a columnar database, with only 16% saying they are not using one and have no plans to use one.

A substantial number of NoSQL systems are key-value or document databases; 31% are currently using one and 33% plan to use one. Numerous types of items could be stored as documents as required by various use cases that range from OLTP workloads to real-time, automated analytics embedded in applications. Document database solutions offer expressive query languages and multiple levels of indexing to support diverse types of queries. Key-value databases (or stores), using a simple data model of keys and values, similarly give developers flexibility to meet a variety of application and analytics requirements.

Simplicity is part of the appeal of spreadsheet-like DataFrame data structures for storing and interacting with a variety of data sourced from SQL and NoSQL files. DataFrames are used in data science, including for AI/ML projects. They are
[Figure 7. Current and planned use of data management technologies. Based on answers from 171 respondents; ordered by "currently using" responses. Columns: currently using / plan to use / not using, no plans to use / (no label).
Columnar database: 36% / 35% / 16% / 13%
In-memory database: 35% / 30% / 19% / 16%
Content management system: 34% / 40% / 15% / 11%
Mainframe data management: 32% / 29% / 24% / 15%
Key-value or document database: 31% / 33% / 21% / 15%
Graph database: 26% / 29% / 27% / 18%
Delta Lake: 19% / 29% / 33% / 19%]
the primary data types used in the pandas Python data analysis library and can also be used with other languages such as R and Scala. Unlike single-location spreadsheets, a DataFrame can function as a data structure shared across a distributed computing environment. In Figure 7, 31% say they are using DataFrames and 29% plan to use them.

Graph databases are NoSQL systems offering advantages over traditional relational databases for storing networked data relationships on a large and complex scale. Visual graph query languages enable users to avoid complex SQL programming and accelerate exploration and repeatable discovery of complex data relationships. Figure 7 shows that 26% are currently using a graph database and 29% plan to use one.
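The column-by-column layout described above can be sketched in a few lines of Python. This toy ColumnTable (a hypothetical illustration, not any product's API) stores each column contiguously, so an aggregate reads one column instead of scanning every field of every row, which is the wasted work the report attributes to large table scans in row stores:

```python
# Toy column-oriented table: a minimal sketch of why columnar storage
# speeds up analytics. Hypothetical illustration, not a real DBMS.

class ColumnTable:
    def __init__(self, columns):
        # Each column is stored contiguously as its own list,
        # instead of interleaving values row by row.
        self.columns = {name: list(values) for name, values in columns.items()}

    def sum(self, name):
        # An aggregate touches exactly one column; a row store would
        # scan every field of every row to answer the same query.
        return sum(self.columns[name])

    def rows(self):
        # Row-by-row access is still possible when a workload needs it.
        names = list(self.columns)
        return [dict(zip(names, vals)) for vals in zip(*self.columns.values())]

sales = ColumnTable({
    "region": ["east", "west", "east"],
    "amount": [100, 250, 75],
})
print(sales.sum("amount"))  # scans only the "amount" column
```

A real columnar engine adds compression and vectorized execution on top of this layout; a pandas DataFrame is organized in the same column-major spirit.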
Cloud Data Migration and Management

Cloud data migration is a critical part of data management modernization for organizations of all sizes. Organizations want to take advantage of pay-as-you-go pricing models that offer elastic scalability. With storage and processing resource allocation in tighter sync with business needs, organizations are better prepared to respond to growth in data volume or the need for faster workload processing to accomplish seasonal or unanticipated requirements. Organizations are also taking advantage of cloud elasticity for AI/ML workloads and expansion in self-service BI dashboards and daily operational reporting.

TDWI asked research participants to choose which of a set of statements best describes their organization's status regarding cloud data management and migration. We can see that organizations are adopting a range of strategies. No one approach is dominant.

About one-quarter are focused on phased migrations to move and modernize most or all their data systems to cloud platforms (26%; figure not shown). Among larger companies with $1 billion or more in revenues, the percentage increases to 37%. With this approach, organizations migrate pieces incrementally. In each phase, organizations can test and validate related components and learn from experience with each phase how they can improve succeeding migrations. Organizations need visibility into application data dependencies to avoid disrupting use cases that are unrelated to the data assets, data integration processes, and applications they are migrating.

The second-largest percentage currently uses or plans to use a lift-and-shift strategy (22%) to migrate on-premises data and workloads as-is to the cloud. For example, an organization might keep the same data warehouse model and ETL routines using the same underlying software but deploy it on a new cloud provider's platform. Organizations choose this approach to migrate more rapidly. However, it is important to balance speed benefits with taking advantage of modernization opportunities with the new cloud platforms and services.

Hybrid data environments that comprise on-premises and cloud-based systems are common; 17% say that they plan to continue having a hybrid environment. In some cases, this is due to local data residency laws and access control requirements that make it impossible to migrate data assets. In other cases, organizations want to continue running production applications and data platforms because they are working well and have high satisfaction rates. Some organizations will keep legacy systems in place but choose a cloud-first strategy for any new systems. However, our research finds that only 11% of organizations surveyed for this report are adopting this strategy.

Data Platforms and Patterns: Current and Planned Usage

The research in Figure 7 offered a view of current and planned usage of different data management technologies and models. In Figure 8, we take a different perspective and examine the status and plans for data management platforms, repositories, and patterns for supporting BI, analytics, AI/ML, and data applications.
Achieving Scalable, Agile, and Comprehensive Data Management and Data Governance
The largest percentage of organizations surveyed uses a data warehouse on premises (67%). Nearly half have a data warehouse in the cloud (48%) and 38% plan to implement a cloud-based data warehouse. (Note that research participants provided an answer for each type of system or pattern listed in Figure 8; we did not ask for a choice between, for example, an on-premises or cloud data warehouse.)

Across TDWI research, we have seen a strong trend toward cloud data warehouses. However, the results here show that many organizations are continuing to work with on-premises data warehouses. Many organizations indicate that they have both. Among organizations that are taking a phased migration path to the cloud, 79% are currently using a data warehouse on premises. Yet, even among these organizations, 64% also have a data warehouse in the cloud (not shown).

As we saw in our 2022 research, more organizations have a data lake in the cloud (41%) than have one on premises (30%). One-third of research participants (33%) say they plan to use a cloud data lake. TDWI defines a data lake as a design pattern and architecture optimized to capture a wide range of data types, both old and new, at scale. The purposes of a data lake are manifold and can vary among organizations.

In general, unlike data warehouses that operate with well-defined preprocessing for data transformation, formatting, and cleansing to achieve trusted, carefully structured data, a data lake lets workloads dictate how they need to categorize, cleanse, and otherwise prepare data for analytics and AI/ML. Some organizations use a data lake as a staging area for data transformation and cleansing before loading data into a data warehouse.

Unifying data warehouses and data lakes. Indeed, an important current trend is toward deploying a unified data platform such as a data lakehouse that integrates data warehouse and data lake capabilities. Figure 8 shows that 22% are currently using a unified cloud data warehouse and data lake and 46% plan to use one, which is the highest "plan to use" percentage in the figure.

A data vault is an established data modeling design pattern that some organizations view as a flexible alternative to traditional data warehouse and star schema designs. Figure 8 shows that a data vault is currently in use by 20% of organizations surveyed. About one-third plan to use it (34%). Our interview research finds that some organizations use a data vault as part of their unified data lakehouse strategy.

Cloud data lakehouses typically employ massively parallel processing (MPP) database engines that can efficiently scale computing power to fit workload requirements. Organizations could eliminate separate staging areas and choose data lakehouses as their platform for extract, load, and transform (ELT) processes, which is a variation on traditional ETL. Organizations can target data lakehouse storage for collecting and staging data; then, for the transformation, the lakehouse uses the power and scale of in-database processing.

Unified data platforms are adopting evolving open source file formats such as Apache Iceberg and the Delta Lake storage framework to enable standardized support for all workloads.
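The in-database ELT pattern described above, where raw data lands first and the engine itself performs the transformation, can be sketched with Python's built-in sqlite3 standing in for a lakehouse engine. The table and column names here are invented for illustration:

```python
# ELT sketch: land raw data as-is, then transform in-database,
# using SQLite as a stand-in for a lakehouse MPP engine.
import sqlite3

con = sqlite3.connect(":memory:")

# 1. Load: raw records go into a staging table untransformed.
con.execute("CREATE TABLE raw_orders (id INTEGER, amount TEXT, status TEXT)")
con.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [(1, "100.5", "shipped"), (2, "n/a", "canceled"), (3, "42", "shipped")],
)

# 2. Transform: the engine cleans and shapes the data in place,
#    with no separate staging server or external transformation pass.
con.execute("""
    CREATE TABLE orders AS
    SELECT id, CAST(amount AS REAL) AS amount
    FROM raw_orders
    WHERE status = 'shipped' AND amount GLOB '*[0-9]*'
""")

total = con.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
print(total)  # 142.5
```

The point of the pattern is the second step: the transformation SQL runs where the data already lives, which is what lets a lakehouse apply its full compute scale to it.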
Research in Figure 9 highlights the urgency to accelerate data management processes. More than one-third say it is a top priority to reduce delays in adding new data or modifying existing data (38%). Just over one-quarter of respondents (26%) are looking for alternatives to slow data movement, copying, and replication. As data volumes grow, these processes can become costly bottlenecks, which motivates organizations to evaluate new practices.

Some organizations use change data capture (CDC) technologies to capture only changes to data and data structures and provision these continuously rather than in large batch processes. Other use cases benefit from data virtualization, federation, and data fabrics to reduce data movement by connecting to and querying distributed sources, aggregating the resulting data, and providing views of it from a single point of access.

Data silo growth challenges represent a key reason behind consolidation of fragmented data into a single, unified data platform. Requiring users to locate and access data across numerous silos causes delays. Data silos make it difficult for users to gain a single view of the truth: that is, all data relevant to topics of interest. Thus, we see in Figure 9 that 34% of respondents put a priority on eliminating and consolidating disparate data silos onto a single data platform. Organizations often cannot consolidate all data silos immediately. They must prioritize which ones will most reduce data management complexity and costs and make the data environment easier to govern.
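The CDC idea mentioned above, provisioning only what changed rather than a full batch, can be seen in a minimal sketch. Production CDC tools typically read the DBMS transaction log; this toy (with made-up rows) just diffs two snapshots keyed by primary key to show what flows downstream:

```python
# Change data capture sketch: emit only inserts, updates, and deletes
# between two snapshots, rather than reprocessing the full table.
# (Production CDC reads the transaction log; this is a toy diff.)

def capture_changes(old, new):
    changes = []
    for key, row in new.items():
        if key not in old:
            changes.append(("insert", key, row))
        elif old[key] != row:
            changes.append(("update", key, row))
    for key in old:
        if key not in new:
            changes.append(("delete", key, old[key]))
    return changes

old = {1: {"name": "Ada"}, 2: {"name": "Bo"}}
new = {1: {"name": "Ada"}, 2: {"name": "Beth"}, 3: {"name": "Cy"}}

for op, key, row in capture_changes(old, new):
    print(op, key, row)
```

Only two change records leave this step, however large the unchanged remainder of the table is; that is the latency and cost saving CDC offers over batch reloads.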
[Figure 8. Which of the following types of data management platforms, repositories, or patterns are currently in use or planned for use by your organization to support BI, analytics, AI/ML, and data applications? Columns: currently using / plan to use / not using, no plans to use / (no label).
BI or analytics platform: 62% / 26% / 8% / 4%
Data warehouse in the cloud: 48% / 38% / 12% / 2%
Operational data store: 47% / 28% / 15% / 10%
Data lake in the cloud: 41% / 33% / 15% / 11%
Data streaming, including real-time data ingestion: 40% / 35% / 13% / 12%
Unified cloud DW/DL platform (e.g., cloud data lakehouse): 22% / 46% / 18% / 14%
Data Vault techniques: 20% / 34% / 24% / 22%]
In the following sections we will examine data integration and distributed data management. Modernization in these areas is key to addressing data management priorities and solving challenges holistically.

Stateless Representational State Transfer (REST) APIs offer the most established approach and are popular for their simplicity and flexibility. However, an alternative query language for APIs, GraphQL, is growing in popularity among application developers and data analysts.
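The contrast between the two styles can be sketched without any network plumbing. This toy (a hypothetical "customer" resource, not a real GraphQL implementation) shows why field selection appeals to data analysts: the caller names exactly the fields it wants, where a plain REST endpoint returns the whole representation:

```python
# Toy contrast of the two API styles: a hypothetical in-memory
# "customer" resource, with no network calls, just the shape of each.

CUSTOMER = {"id": 7, "name": "Acme", "region": "west", "orders": [101, 102]}

def rest_get_customer(customer_id):
    # REST style: one endpoint returns the full resource representation.
    return dict(CUSTOMER)

def graphql_query(selection):
    # GraphQL style (sketched): the client lists the fields it wants,
    # and the server returns only those, in one round trip.
    return {field: CUSTOMER[field] for field in selection}

print(rest_get_customer(7))               # full resource
print(graphql_query(["name", "orders"]))  # only the requested fields
```

In practice a GraphQL server also resolves nested selections across related resources in the same request, which is where it most reduces the multiple round trips a REST client would make.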
45% and 44% of organizations respectively. With ELT, organizations build data pipelines to load (or replicate) data into a target repository before data transformation, thereby saving data movement to a separate staging area. The research shows that traditional ETL, however, is still important and is the appropriate choice for certain workloads.

Data pipelines, which often contain ETL or ELT functionality, are in use by 43% of organizations and 35% plan to use them. Some data pipelines simply connect to sources and stream raw, even real-time, data into a data lake or data lakehouse. This is primarily for use by data scientists performing predictive analytics and running AI/ML programs. Other pipelines provide fuller data profiling, quality, and validation steps as needed by workloads.

Data preparation and preprocessing have a significant role in data integration. The data pipeline steps mentioned above (plus others such as data enrichment) fall under the umbrella of data preparation and preprocessing. Data preparation processes focus on determining what the data is and improving its quality and completeness. Organizations use data catalogs in data preparation steps to standardize how different users define and structure the data.

The life cycle of preparation and preprocessing steps starts with collecting and consolidating data and continues through transformation and enrichment to make data useful for purposes such as reporting, dashboards, OLAP analytics, and AI/ML. Self-service data preparation is a critical trend for empowering nontechnical users as well as data scientists to be less reliant on IT developers and analysts.

Two out of five respondents (40%) in Figure 10 say their organizations are using data preparation or preprocessing tools and 34% are planning to use them. Data preparation and preprocessing steps can be time-consuming and involve considerable manual work, which makes tools that automate steps and integrate preparation with data pipelines highly valuable. Some organizations are employing data fabric preparation and delivery layers to simplify data exploration and transformation in heavily distributed data environments.

Options for accelerating access to timely data are popular. Organizations are currently using or are planning to use several technologies that enable them to bypass lengthy data integration and preparation steps to allow users to access fresher, more frequently updated data. More than one-third are using CDC technologies (36%), and 39% plan to use them to capture changes to data continuously rather than force users to wait for batch updates to historical data.

Nearly as many are currently using data streaming (e.g., real-time) technologies (33%) and more are planning to use them (40%). Data streaming solutions offer continuous flows through pipelines that gather and process data as sources generate it. Organizations apply real-time analytics and AI/ML to track situations, understand patterns, and answer complex business questions.

Sizeable percentages use data intelligence technologies. Data intelligence consists of knowledge about diverse data built on metadata resources contained in a repository such as a data catalog. As shown in Figure 10, 36% are currently using a data catalog or metadata management system and 47% plan to use one, which is the highest "plan to use" percentage for any item in the figure.

Along with accurate data definitions and metadata, shared data intelligence systems will manage and include data profiles, information about the data's location, taxonomy and classification documents, calculations, and data lineage information about sources and data ownership. Significant percentages of organizations surveyed are using solutions based on open source standards. Nearly one-third (30%) are currently using Apache Atlas, a data governance and metadata framework; 25% are planning to use it. About one-quarter are either currently using (23%) or planning to use (25%) Metastore, an open source Apache Hive metadata repository typically used with data lakes.
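The first preparation concern named above, determining what the data is and how complete it is, is usually a profiling pass. A minimal sketch, with invented records and field names, that reports per-field completeness so preparation effort can target the gaps:

```python
# Data profiling sketch: measure per-field completeness so data
# preparation can focus on the gaps. Records and fields are made up.

def profile(records):
    fields = {f for rec in records for f in rec}
    total = len(records)
    return {
        f: sum(1 for rec in records if rec.get(f) not in (None, "")) / total
        for f in sorted(fields)
    }

records = [
    {"id": 1, "email": "a@x.com", "phone": ""},
    {"id": 2, "email": None,      "phone": "555-0100"},
    {"id": 3, "email": "c@x.com", "phone": "555-0101"},
]

print(profile(records))  # id is fully populated; email and phone are 2/3 complete
```

Commercial preparation tools run richer versions of this (type inference, value distributions, outlier detection) and feed the results into the catalog so different users see the same assessment of the data.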
[Figure 10. Based on answers from 167 respondents; ordered by "currently using" responses. Columns: currently using / plan to use / not using, no plans to use / (no label).
Business glossary: 40% / 42% / 12% / 6%
Data preparation or preprocessing tool: 40% / 34% / 14% / 12%
Data fabric: 24% / 38% / 23% / 15%
Knowledge graphs: 24% / 32% / 25% / 19%]
Figure 10 also shows strong interest in data lineage tools, which are incorporated as capabilities in some data catalogs and data integration suites or are offered as standalone data intelligence tools. TDWI finds that 28% are using data lineage tools now and 45% plan to use them. We will discuss key goals with data catalogs, business glossaries, and metadata management—and levels of satisfaction in achieving them—later.

Also contributing to data intelligence are business glossaries (40% currently use and 42% plan to use them) and master data management (MDM), which 36% currently use and 40% plan to. Business glossaries collect, organize, and coordinate business terms and definitions to provide clarity and data context across organizations. Business glossaries may also contain business rules, business policies, data sharing agreements, and other business assets to be used in conjunction with technical data assets. Modern data intelligence solutions merge business glossaries and data catalogs together.

MDM systems document and coordinate master data, which is higher-level data about products, suppliers, customers, and other business entities. MDM enables organizations to establish "golden" master data definitions that are consistent, valid, and accurate across applications and services. Many organizations today are establishing MDM systems that integrate master data from different domains (most commonly customer and product information) to reduce management overhead and enable cross-domain data access and analytics.

Figure 10 shows that 29% are currently using a product information management (PIM) system and more (32%) are planning to use one. In research for this report not shown here, TDWI finds that 29% are using a customer information management (CIM) system to capture, manage, and govern customer data while 20% are using a multidomain MDM system for this purpose.

Objectives for Improving Data Integration

With the preceding discussion of current and planned data integration systems in mind, we now turn attention to improvement priorities. Improving data quality, accuracy, and completeness is the top data integration objective (51%; figure for this section not shown), which aligns with what we reported for data management modernization. Organizations should create processes for determining data quality and use automated tools to monitor changes in the data. AI-infused automation in data intelligence tools is enabling organizations to move past the limits of manual data curation.

Reducing the amount of data movement, replication, and copying is an important objective for 34% of organizations surveyed. As data volumes rise, these processes often cause data latency and therefore slower time to insight. Automating data movement to save cloud costs and optimize processing is a top objective for 27% of organizations. As we noted earlier, heavy data movement, replication, and copying also raises data governance concerns because of potential exposure of sensitive data. Cost also rises. A primary driver behind interest in data virtualization and data fabrics is reducing complexity and costs due to data movement, replication, and copying.

Cloud migration of data integration tools and systems is a priority. One-third of organizations (33%) are seeking to migrate data integration, pipelines, and data intelligence to the cloud. It is important to gather knowledge about existing on-premises data integration to inform cloud migration teams about dependencies and best strategies for migration. Knowledge about current ETL performance will help organizations eliminate unnecessary workloads—a priority of 25% of research participants—as they shift data integration processes to the cloud. Organizations can use migration as an opportunity to switch to ELT, which for 20% is a top objective.

In some cases, using phases to migrate data integration processes in smaller units is the best approach, but in others it may make sense to migrate all related processes and subprocesses together. With either strategy, using automated tools will increase consistency and accelerate testing of newly migrated processes.

Finally, 32% cite the importance of reducing costs and improving monitoring of cost drivers. Large organizations often have thousands of ETL, data pipeline, data streaming, and data preparation processes. Modernization should include monitoring to improve visibility into the value of data integration processes and whether resources need to be reallocated to optimize the most important processes and eliminate those that are no longer needed.

Performance optimization focuses on reducing costs through higher efficiency. Modern, automated tools enable performance optimization without user intervention. Optimization should ensure that workloads have the resources they need and show expected rates of consumption. Once you have granular visibility and proper controls in place, you can use tool capabilities in cloud data platforms to identify inefficient spending.

Cost optimization and FinOps practices can be valuable. Maintaining continuous awareness of cloud data resource consumption is key to keeping spending in line with overall financial plans. As part of cost optimization, some organizations establish cloud financial operations, or FinOps. Inspired by DevOps practices, FinOps frameworks facilitate collaboration among stakeholders in engineering, finance, technology, and business domains regarding spending decisions. Monitoring visibility into real-time data about costs, use, and allocation is essential to FinOps. Tools in the marketplace are evolving to provide granular visibility and use AI/ML to generate recommendations.

Data Catalogs: Current Satisfaction and Where to Expand

Data catalogs, business glossaries, and metadata management systems share capabilities. They can play complementary roles, with business glossaries specializing in business definitions. In the marketplace, data catalogs, business glossaries, and metadata management systems are generally converging into single, all-in-one data intelligence solutions.

A centralized metadata repository such as a data catalog helps organizations reduce confusion about the data's quality and consistency and
enhances data curation and governance. A complete and accurate data catalog includes what the data means and represents, the data's origins, where to find it, who is responsible for it, and information to help the user quickly understand if it is relevant to a user's objectives. With organizations supporting more diverse users and workloads than ever before, a data catalog provides a necessary foundation for both defensive and offensive data governance and plays an important role in making it easier to leverage and share high-value, trustworthy, and relevant data and data intelligence with other users.

…data inventories, which is necessary for the regulatory compliance aspects of data governance. Data catalogs can also manage data governance and security rules that drive constraints in data pipelines and applications. Overall, 63% are satisfied with their data catalog, business glossary, or other metadata management system for improving data governance and regulatory compliance, with 22% very satisfied. The percentages for inventorying data assets and documenting objects available to users are similar; 21% are very satisfied and 42% are somewhat satisfied.
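The catalog-entry elements this passage lists (meaning, origins, location, responsibility, relevance) map naturally onto a simple record. A sketch with illustrative field names, not drawn from any particular catalog product:

```python
# Sketch of the minimum metadata a complete catalog entry carries,
# per the elements listed above. Field names are illustrative only.
from dataclasses import dataclass, field

@dataclass
class CatalogEntry:
    name: str
    description: str  # what the data means and represents
    origin: str       # lineage: where the data came from
    location: str     # where to find it
    owner: str        # who is responsible for it
    tags: list = field(default_factory=list)  # aids search and relevance

    def matches(self, term):
        # Helps a user quickly judge relevance to their objective.
        t = term.lower()
        return (t in self.name.lower()
                or t in self.description.lower()
                or any(t in tag.lower() for tag in self.tags))

entry = CatalogEntry(
    name="daily_sales",
    description="Cleansed daily sales facts by store and SKU",
    origin="pos_feed via nightly ELT",
    location="warehouse.analytics.daily_sales",
    owner="retail-data-team",
    tags=["sales", "certified"],
)
print(entry.matches("sales"))  # True
```

A production catalog adds profiles, classifications, lineage graphs, and governance rules on top of a record like this, but the searchable core is the same.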
The research in Figure 12 shows the importance of data intelligence tools and systems such as a data catalog and semantic layer; 32% are currently using these to address distributed data challenges and 52% plan to use them. Earlier, this report discussed the value of location-transparent enterprise data catalogs. Additional semantic layer technologies play an important role in ensuring the quality and consistency of business representations of distributed data for BI reporting, calculations, and multidimensional modeling.

Some organizations are using enterprise knowledge graphs and ontologies to improve discovery of how distributed data sets relate to each other and to definitions of higher-level entities such as people, places, and things that might be contained in MDM. In Figure 12, multidomain MDM is currently in use by 28% of organizations surveyed and 39% plan to use this solution. One-quarter of respondents (25%) say their organizations are using knowledge graphs to visualize and analyze distributed data relationships and 38% plan to use them.

Accelerating analytics and AI/ML depends on distributed data management and integration. Nearly one-third are currently unifying analytics to work across domains, query engines, and storage repositories and 42% plan to do this. Access to all relevant data from across distributed systems is critical to analytics and AI/ML. A major challenge is to hide the technical complexity of accessing each system so that nontechnical users and data scientists can move faster to gain single views. Semantic data integration that brings
[Figure 11. Satisfaction with the data catalog, business glossary, or other metadata management system. Columns: very satisfied / somewhat satisfied / (further levels unlabeled).
Make it easier for users to search for and find data: 32% / 36% / 13% / 12% / 7%
Inventory data assets; document objects available for users: 21% / 42% / 19% / 9% / 9%]
together data catalogs and data virtualization layers is an important strategy. Figure 12 shows that 30% are currently using a data virtualization layer and 40% plan to use one.

A similar percentage of respondents are currently using a distributed SQL engine to query data in multiple sources (28%) or planning to use one (42%). Often based on open source Presto, a distributed SQL engine can use a single SQL query to retrieve and interact with large volumes of data at multiple sources using models and systems ranging from traditional relational DBMSs to NoSQL systems and data lakes.

Data Fabric and Data Mesh Architectures

Figure 12 shows that over one-quarter of organizations are developing a data fabric to view and query distributed data (27%). A data fabric provides a location-transparent layer, usually built with data virtualization, to make it easier and faster to access distributed data and manage and govern it holistically. The fabric should be extensible and modular to ensure less dependence on central IT.

Instead of a piecemeal approach, the goal of a data fabric is to have an organized and orchestrated strategy for onboarding new data across the distributed environment. It enables organizations to set up policies and standards for governed data pipelines and data curation processes and for deploying computational resources to maximize benefits.

A key goal of a data fabric is to take advantage of opportunities to automate tedious and repetitive data engineering tasks. Solutions are also focusing on using AI/ML to generate recommendations for locating relevant data and streamlining preparation tasks. Organizations are also automating application of data governance constraints within data fabrics.

Data intelligence is critical to data fabrics and data meshes. Data catalogs, active metadata management, and semantic data integration technologies play important roles in providing shared knowledge about distributed data for faster, easier, and more consistent access, data preparation, and governance. Data intelligence is also important to creating a decentralized data mesh architecture, which 21% of organizations surveyed are currently using.

The data mesh has emerged as a leading strategy for enhancing the value of distributed data and avoiding the costs, delays, and management headaches of copying and moving massive amounts of data. Most data mesh strategies focus on strengthening distributed data ownership and governance within domains: that is, where users with a direct relationship to the business priorities build, manage, and share data products across the organization. A data mesh can complement a data fabric and take advantage of both data virtualization and open data lakehouses to enable distributed data governance from centralized policies and increase the value of data.

A data mesh supports "data as a product" development. It encourages teams working in domains to treat data sets as products that have "customers" who could be business users, data scientists, data engineers, or external users of monetized services, potentially made available through a data marketplace or exchange. The
[Figure 12. To address distributed data challenges, which of the following strategies for unifying management … your organization? Based on answers from 169 respondents; ordered by "currently using" responses. Columns: currently using / plan to use / (remaining categories unlabeled).
Use a data catalog, semantic layer, and/or data intelligence tools: 32% / 52% / 9% / 7%
Unify analytics to work across domains, query engines, and storage repositories: 32% / 42% / 13% / 13%
Use a distributed SQL engine to query data in multiple sources: 28% / 42% / 19% / 11%
Develop a data fabric to view and query distributed data: 27% / 42% / 17% / 14%
Use knowledge graphs to visualize and analyze distributed data relationships: 25% / 38% / 24% / 13%
Create a decentralized data mesh architecture: 21% / 38% / 25% / 16%]
notion is that organizations need to take the development life cycles. Organizations use DataOps
same care in developing and operationaliz- and data observability capabilities to monitor and
ing data sets and data products that they do in manage progress toward operationalization. They
delivering traditional products to customers. can spot bottlenecks and quality problems that are
Depending on the use case, customers may need not only slowing progress but are also affecting the
data sets enriched with metadata and additional organization’s ability to achieve business objectives.
capabilities. Data products increase in value
when people can trust them for decisions and We asked research participants if their
collaborate to achieve business objectives. organizations are currently using or planning to
use DataOps practices and whether they have had
DataOps for Planning and experience in using DataOps or related methods
(see Figure 13). Only 15% say they are currently
Scalability using some or all DataOps practices successfully.
With development and production of data products, This is somewhat low, but we are still in the early
data pipelines, and analytics, organizations need stages of maturity in use of DataOps.
frameworks to provide a structure for planning,
orchestration, and scalability. DataOps builds
on agile and DevOps methodologies to provide
a structure that has similarities to software
In the first stages, organizations may adopt only some elements of the full framework. As the figure shows, more organizations have experience in using established agile and DevOps methods (26%). A significant percentage are planning to use DataOps but have not yet begun (21%). Smaller percentages say they are not having success with DataOps or have no experience or interest in it.

To gauge what issues are driving organizations to use a framework such as DataOps, we asked research participants what their most challenging issues are in improving delivery of scalable and repeatable value from data, code, and analytics models. Here are the top issues for respondents (figure not shown):

• Improving communication and collaboration among stakeholders (45%). Collaboration is critical to ensuring quality, sorting out dependencies, and eliminating redundancy. Improving collaboration and communication are central objectives for establishing continuous feedback loops in the DataOps framework.

• Having flexibility to adjust when requirements change (39%). Projects often lose momentum and focus when users change requirements. Agile and/or DataOps principles for shorter iterations can help prepare project teams to handle new requirements.

• Increasing automation throughout data integration and preparation (37%). Using a framework plus modern data observability tools can help organizations understand which phases of integration and preparation processes need automated tools to reduce manual work and increase consistency.

• Ensuring data governance throughout data integration and preparation (36%). Data governance is a key focus of DataOps. Visibility through data observability monitoring tools can enable organizations to understand the complete health of data environments and address both defensive and offensive data governance issues quickly.

• Streamlining delivery of data sets, models, analytics, and other artifacts (31%). Organizations cannot streamline delivery when quality is a problem. Agile, DevOps, and DataOps all help organizations spot quality problems sooner, before they have a negative impact on downstream data products, analytics models, and other artifacts. Data observability tools can improve end-to-end visibility to identify and resolve problems hindering the speed of delivery into production.

Data Objectives for Optimizing Analytics and AI/ML

With demand rising for accelerated development and deployment of advanced analytics and AI/ML, it is critical to address data access, preparation, and management problems that prevent projects from achieving success. To close this report's research, we asked participants to identify their highest-priority objectives for optimizing analytics and AI/ML workloads and increasing business value (see Figure 14).

The priority shared by most organizations surveyed highlights the importance of data democratization: making it easier for users to discover and access all types of data (45%). Organizations are seeking solutions and practices that will empower users to uncover insights drawn from exploration of relationships between sets of structured, semistructured, and unstructured data. Nearly one-quarter prioritize making it easier and faster to discover data relationships (23%). Organizations will need to take measures to improve data availability and support for self-service ad hoc querying.

The second most common objective is to make it easier for non-coders to search, access, and prepare data (42%). This again highlights the trend toward data democratization; organizations do not want to unnecessarily limit deeper data interaction to only technically skilled programmers and developers. One-quarter of research participants see enabling self-service data preparation (e.g., data loading, blending, and transformation) as one of their keys to optimizing analytics and AI/ML workloads. Along with making data discovery and access easier, organizations need tools that empower nontechnical users, business analysts, and data scientists to prepare data as they see fit with less dependence on IT.

Trusting predictive models and the results of AI/ML depends on appropriate data curation. A significant percentage of respondents are looking for improvement in curating diverse data sets to ensure unbiased results (28%). Organizations need AI model governance tools that provide visibility into training data selection and curation to avoid "black box" models that are not transparent. AutoML support is also a focus, with some research participants looking to reduce delays in preparing data processes; 18% say it is a priority to set up accelerators and predefined data processes for autoML.

Using automation to accelerate data insights is a shared priority. Nearly one-third of research participants prioritize automating discovery of actionable data insights (31%). Technology advances across data integration, data intelligence, and front-end data applications are aligning to enable automated discovery. In Figure 14, 22% say that they regard it as a top priority to augment BI, KPIs, and metrics with AI-driven recommendations. Surfacing timely, intelligent recommendations in users' dashboards can accelerate informed business decisions. To enhance predictive understanding of future directions in sales, operational expenditures, and business profitability, an even larger percentage (30%) say that enriching business forecasting with AI/ML insights is important.

Generative AI is a rising objective. One in five organizations surveyed say that enabling use of LLMs and generative AI is a top objective (20%). Across industries, there is clearly tremendous interest today in the transformational capabilities of generative AI, a subset of AI/ML that offers techniques for generating new content in a variety of forms, including but not limited to text, code, images, and speech. Generative AI requires robust data platforms to support the LLMs and similar ML models at its core. Organizations will need to revise defensive and offensive data governance rules and policies to guide growth in generative AI.
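Several of the survey objectives center on self-service data preparation: loading, blending, and transforming data without heavy IT involvement. As a rough illustration of those three steps, here is a sketch using pandas; the tables and column names are invented for the example:

```python
import pandas as pd

# Load: two small source data sets (in practice read from files or databases).
orders = pd.DataFrame({
    "customer_id": [1, 2, 1, 3],
    "amount": [120.0, 80.0, 200.0, 50.0],
})
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "region": ["East", "West", "East"],
})

# Blend: join the sources on a shared key.
blended = orders.merge(customers, on="customer_id", how="left")

# Transform: aggregate to a business-friendly shape.
revenue_by_region = (
    blended.groupby("region", as_index=False)["amount"]
    .sum()
    .rename(columns={"amount": "total_revenue"})
)
print(revenue_by_region)
```

Self-service tools expose these same load, blend, and transform steps through visual interfaces so that nontechnical users can perform them without writing code.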
augmented analytics, including generative AI. Reducing time and costs associated with data collection, preparation, and provisioning for feature engineering is a priority. Organizations should evaluate automated tools and examine processes to rationalize those that are redundant. Ease of use and automation are advancing in solutions to make it easier for non-coders to discover and prepare data.

empowering users to do more of their own work rather than depend entirely on IT. In this report, we find that demand for self-service capabilities continues to grow. However, organizations should increase the role of shared resources in data integration, transformation, and pipelines—such as data catalogs and the computational power of cloud data platforms—to improve speed, scale, and quality, and reduce redundancy.
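To make the feature-engineering point concrete, the sketch below derives a few model-ready features from raw records; the data and column names are hypothetical, and real automated tools would generate far richer feature sets:

```python
import pandas as pd

# Hypothetical raw records with a timestamp and a numeric measure.
raw = pd.DataFrame({
    "order_ts": pd.to_datetime(["2023-07-01 09:15",
                                "2023-07-01 18:40",
                                "2023-07-02 11:05"]),
    "amount": [120.0, 80.0, 200.0],
})

features = pd.DataFrame({
    # Calendar features that models can use directly.
    "day_of_week": raw["order_ts"].dt.dayofweek,
    "hour": raw["order_ts"].dt.hour,
    # Simple numeric transforms.
    "amount": raw["amount"],
    "amount_vs_mean": raw["amount"] / raw["amount"].mean(),
})
print(features)
```

Automating even simple derivations like these is where much of the time and cost in data provisioning for feature engineering can be reduced.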
components to determine if they can support a data fabric and modernize where needed.

Consider a data mesh architecture. With business domains across organizations increasingly focused on developing various types of data products, they are seeking empowerment and reduced dependence on centralized data platforms and enterprise IT. A data mesh champions decentralization to enable domain teams closest to business objectives. As they empower domains, organizations also need to enable cross-domain projects, enterprise data governance, cost management,
Alation is a leader in enterprise data intelligence solutions, enabling self-service analytics, cloud transformation, and data governance. More than 450 enterprises, including more than 35% of Fortune 100 companies, build data culture and improve data-driven decision-making with Alation, including Cisco, Nasdaq, Pfizer, Salesforce, and Virgin Australia. The company's data intelligence platform allows anyone in an organization to self-service governed data so that anyone can find, understand, and trust data for decision-making, to train AI models, and accelerate digital transformation. As a pioneer of the data catalog market and creator of the first-ever machine learning-driven data catalog, Alation combines machine learning with human insight to solve a key problem faced by almost every organization: the need to know exactly what data they have, where it lives, if it's trustworthy, and how to make the best use of it. Alation's data intelligence platform inspires collaboration and makes it possible for everyone in an organization to think data-first before making decisions. The company recently surpassed $100M in ARR, closely followed by raising a $123M Series E round, led by Thoma Bravo, Sanabil Investments, and Costanoa, with participation from Databricks Ventures, an all-equity tranche that values Alation over $1.7 billion—an impressive 1.5 times higher than the company's previous valuation in a challenging economic climate.

Dataiku is a platform for Everyday AI, enabling data experts and domain experts to work together to build data into their daily operations, from advanced analytics to generative AI. Together, they design, develop, and deploy new AI capabilities, at all scales and in all industries. Organizations that use Dataiku enable their people to be extraordinary, creating the AI that will power their company into the future. More than 500 companies worldwide use Dataiku, driving diverse use cases from predictive maintenance and supply chain optimization to quality control in precision engineering to marketing optimization, generative AI use cases, and everything in between.

Denodo is a leader in data management. The award-winning Denodo Platform is the leading data integration, management, and delivery platform using a logical approach to enable self-service BI, data science, hybrid/multicloud data integration, and enterprise data services. Realizing more than 400% ROI and millions of dollars in benefits, Denodo's customers across large enterprises and mid-market companies in 30+ industries have received payback in less than 6 months.
Erwin by Quest is a trusted industry leader, empowering organizations with data management and governance solutions. Erwin Data Intelligence unlocks deep insights and enhances data governance, ensuring data assets are effectively harnessed. Erwin Data Marketplace fosters seamless collaboration and access to reliable data, fueling agile decision-making across teams. Erwin Data Modeler helps simplify complex data migration challenges with efficient data architecture management. Erwin by Quest empowers organizations to embrace the future of data management, enabling them to make informed decisions and unleash the true potential of their data assets while safeguarding sensitive information.

Hitachi Vantara is part of Hitachi Group. From transportation and energy to automotive systems, the group focuses on sectors that make an incredible impact on our future. Understanding how data enables mission-critical digital and industrial environments is what we do. The Pentaho platform enables insights at scale into data by operationalizing data management with automation, accelerating data discovery, onboarding, and tiering, as well as data pipeline orchestration and analytics. This improves data quality and trust by discovering, identifying, and categorizing structured and unstructured data into meaningful terms defined by customers, and standardizing them through business glossaries. This helps organizations meet their data governance, gathering, and usage requirements. In addition, the analytics tools provide users with observability of their pipelines and can provide dashboards to ensure data quality-driven outcomes.
As one of the world's largest providers of enterprise application software, SAP strives to help every business run as an intelligent enterprise to ultimately help the world run better and improve people's lives. The SAP Business Technology Platform accelerates innovation to unlock your business potential. SAP BTP brings together application development, data and analytics, integration, and AI capabilities into one unified environment optimized for SAP applications. Additionally, SAP data and analytics solutions enable organizations to easily integrate, model, manage, and use data from anywhere. Using SAP's modern data stack, organizations retain the context of SAP data and have a trusted foundation that extends planning and analytics across the enterprise to make the most impactful decisions.

Snowflake enables every organization to mobilize their data with Snowflake's Data Cloud. Customers use the Data Cloud to unite siloed data, discover and securely share data, power data applications, and execute diverse AI/ML and analytic workloads. Wherever data or users live, Snowflake delivers a single data experience that spans multiple clouds and geographies. Thousands of customers across many industries, including 639 of the 2023 Forbes Global 2000 (G2K) as of July 31, 2023, use Snowflake Data Cloud to power their businesses. Learn more at snowflake.com.