The Forrester Wave™ - Enterprise Data Catalogs For DataOps, Q2 2022
Michele Goetz
Summary
In our 26-criterion evaluation of enterprise data catalogs for DataOps providers, we identified the 14 most
significant ones — Alation, Amazon Web Services, Atlan, Cloudera, Collibra, [Link], Google, Hitachi
Vantara, IBM, Informatica, Microsoft, Oracle, Talend, and TIBCO — and researched, analyzed, and scored
them. This report shows how each provider measures up and helps technology architecture and delivery
leaders select the right one for their needs.
As a result of these trends, enterprise data catalog (EDC) customers should look for providers that:
Address the diversity, granularity, and dynamic nature of data and metadata. According to a
2021 Forrester survey of data and analytics decision-makers, a top challenge for DataOps
teams whose firms are implementing or expanding agile development for data management is
managing data products. The EDC can overcome this challenge by becoming the system of
record to capture and manage the portfolio of data and analytics assets, policies, code, and
components throughout the data product lifecycle. Additionally, data product requirements
and backlogs have a home in the EDC to keep in sync with CI/CD and harmonize the
development and delivery processes between data engineers, data scientists, and application
developers.
Generate deep transparency of the nature and path of data flow and delivery. Adoption of
CI/CD practices by DataOps requires detailed intelligence of data movement and
transformation. EDCs must help data engineers visualize existing and future data sources and
integrations to support impact analysis, root cause analysis, bug fixes, and data policy
compliance. Machine learning that surfaces these issues through processing inspection and
SQL/code parsing to guide, or even suggest, next steps is critical to stay on top of the
complexity of modern insight applications.
Deliver a UI/UX that reinforces modern DataOps and engineering best practices. Data
engineering responsibilities are extending beyond data warehouses and data lakes to support
the development of insight applications. CI/CD practices and software engineering skills
require data administration workbenches designed to integrate with developer workbenches,
testing platforms, and data automation. Bidirectional communication, collaboration, and
workflows between the data and developer environments simplify data delivery to speed up
and scale data for analytics and solutions.
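The impact analysis and root cause analysis described above both reduce to walking a lineage graph: downstream from a changed asset to see what it affects, or upstream from a broken asset to find candidate causes. The following is a minimal sketch of that idea, not any evaluated vendor's implementation. The asset names, the `LINEAGE` dictionary, and both function names are hypothetical and chosen only for illustration.

```python
from collections import deque

# Hypothetical lineage edges: each key feeds the assets listed as its value.
LINEAGE = {
    "crm.orders": ["staging.orders"],
    "staging.orders": ["warehouse.fact_sales", "ml.churn_features"],
    "warehouse.fact_sales": ["bi.revenue_dashboard"],
    "ml.churn_features": [],
    "bi.revenue_dashboard": [],
}


def _reachable(edges, start):
    """Breadth-first walk over edges, returning assets reachable from start."""
    seen = set()
    order = []
    queue = deque([start])
    while queue:
        asset = queue.popleft()
        for neighbor in edges.get(asset, []):
            if neighbor not in seen:
                seen.add(neighbor)
                order.append(neighbor)
                queue.append(neighbor)
    return order


def downstream_impact(lineage, changed_asset):
    """Impact analysis: every asset that consumes changed_asset, directly or indirectly."""
    return _reachable(lineage, changed_asset)


def upstream_sources(lineage, broken_asset):
    """Root cause analysis: every asset that feeds broken_asset, directly or indirectly."""
    inverted = {}
    for parent, children in lineage.items():
        for child in children:
            inverted.setdefault(child, []).append(parent)
    return _reachable(inverted, broken_asset)


if __name__ == "__main__":
    print(downstream_impact(LINEAGE, "staging.orders"))
    print(upstream_sources(LINEAGE, "bi.revenue_dashboard"))
```

In practice a catalog would populate the edge list from scanned metadata or parsed SQL rather than a hand-written dictionary; the traversal itself stays the same.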
Evaluation Summary
The Forrester Wave™ evaluation highlights Leaders, Strong Performers, Contenders, and
Challengers. It’s an assessment of the top vendors in the market and does not represent the entire
vendor landscape. You’ll find more information about this market in our Now Tech: Enterprise Data
Catalogs For DataOps, Q1 2022 and other reports on enterprise data catalogs for DataOps.
We intend this evaluation to be a starting point only and encourage clients to view product
evaluations and adapt criteria weightings using the Excel-based vendor comparison tool (see
Figure 1 and Figure 2). Click the link at the beginning of this report on [Link] to
download the tool.
Figure 1
Figure 2
Vendor Offerings
Forrester included 14 vendors in this assessment: Alation, Amazon Web Services, Atlan, Cloudera,
Collibra, [Link], Google, Hitachi Vantara, IBM, Informatica, Microsoft, Oracle, Talend, and TIBCO
(see Figure 3).
Figure 3
Vendor Profiles
Our analysis uncovered the following strengths and weaknesses of individual vendors.
Leaders
Atlan is the tool of choice for DataOps and data product deployment. Atlan’s vision is to create
frictionless data product deployment through a single metadata and data automation platform.
The tool was built by data engineers and for data engineers, and Atlan’s customers represent
both digital natives and kitchen-table brands running their business on modern distributed
platforms. As a result, Atlan maintains a strong focus for continued innovation in this metadata-
driven data and application ecosystem. Three years in the market, it has shifted its strategy
from product building and market entry to pragmatic priorities matched to customer needs
and sustainable growth. Atlan will maintain its trajectory through continued expansion of its
strategic partner ecosystem and continued support and investment from notable VCs.
Atlan is more than metadata and data governance, standing out from the competition. Key strengths
are connectivity, DAG, dbt, and collaboration for full DataOps support. Extensive integration
makes data sharing easy, flexible, and scalable within hybrid distributed ecosystems for
analytics and operational use cases. For example, its use of dbt extends data profiling and
lineage with advanced monitoring. Its advanced capabilities and use of DAG provide light
data virtualization and data unification (MDM-light), a modern alternative used in advanced
data organizations. Most noteworthy are its bidirectional services and communication with
developer and collaboration tools that simplify and speed up data development. However,
Informatica delivers holistically on DataOps capabilities. Strong, steady, and pragmatic is a
winning strategy for Informatica. A data management veteran, Informatica has a large, deep,
and strategic partner ecosystem among technology and service providers. One reference
customer called this out as a differentiating factor for working with Informatica. New kids on
the block attempt to disrupt Informatica’s footprint, but Informatica’s product innovation and
planned enhancements stay timed to client needs over shiny objects. Modern data capabilities will
lag, but platform enhancements will check the boxes for what a critical mass of customers
want and need. Product vision and market approach are tightly coupled, pragmatic, and
connected to solve the big picture data and data governance challenges for both CIO and
CDO objectives.
Informatica EDC is ready for modern deployment with the investments into
cloud native APIs and Kubernetes. Data quality and analytics governance are above par and
complete with the integration of their data quality and MDM tools, and native data preparation.
Axon, its data governance solution, will further extend DataOps for an end-to-end data
engineer and data steward workbench. Reference customers are loyal and give Informatica
the highest overall satisfaction marks for evaluated capabilities. Specifically, customers
highlight the user experience, data product support, and developer support. Customers say
metadata management, discovery and profiling, and monitoring and alerts are as expected.
Informatica EDC is a good fit for organizations that are extending their Informatica investment and
want DataOps enablement, democratization, and a proven track record.
[Link] provides deep knowledge of the data estate. [Link] successfully navigates the
space between data engineers, stewards, and analytic professionals. Its product vision and
planned enhancements stand out and evolve through relationships with digital natives and
data startups, keeping one foot in innovation and the other in pragmatic requirements for the
current offering. With significant capabilities in place, planned enhancements focus on a better
UI/UX over innovation to move code-based capabilities to no-code and low-code to
democratize DataOps. [Link] also boasts a growing list of marquee customers and
deployment expansion. Partnerships with major technology vendors exist along with a notable
relationship with Deloitte, but its overall partner ecosystem is in growth mode and lags behind
others in this evaluation.
[Link] gives a deeper view of data for DataOps than most data
catalogs. A robust knowledge graph delivers strong metadata capture, management, and data
visibility. It embraces Python and SQL code capture from code libraries and data science
notebooks. It augments the UI with code views for granular data understanding. Integration
with cloud, data fabric, analytics, and DevOps environments is also above par with [Link].
However, data governance is still evolving — lagging behind other data catalogs — but it
supports data protection. Additionally, data products can be defined, but this requires
configuration. Reference customers appreciate the knowledge graph, elastic search, and data
virtualization capabilities; [Link] is a strong fit for organizations that need to translate data
product requirements to production-ready capabilities quickly and easily.
Strong Performers
Google gets active on metadata for data engineers but needs Dataplex to deliver. Google lifts
the distinction between data and metadata. Its approach centers on “everything is metadata”
to create modern digital applications. Its market approach and product vision address the
technical challenges of data and metadata within the data catalog and throughout GCP’s
Dataplex. It has a uniquely technical value proposition with planned enhancements for active
metadata specifically designed with modern and edge use cases and the DataOps practice in
mind. Google markets its support for a distributed data estate, but planned enhancements for
connectivity, multicloud, and governance still lag. While Google embraces a metadata-driven
architecture, the catalog is a small component of Dataplex rather than a DataOps center of
gravity.
Google enables active metadata, a differentiating capability to enable DaaS and data
fabric. It addresses data at rest and data flows through sophisticated schema management
and compatibility assessments to ensure that data movement (i.e., batch, real-time, and
streams) resolves correctly between source and target environments, not typical in other tools.
For DataOps support, all services and processing are monitored through Apache Airflow. Data
governance is limited to security and privacy. Overall data quality and analytics governance
lag due to limited semantic support. Reference customers are excited by the promise of active
metadata but give the tool low satisfaction scores for fully delivering that capability. Google is a
good fit for firms that need a metadata foundation for cloud analytics on GCP and an
environment that can deliver greater data product sophistication in the future.
Microsoft delivers data understanding but lags in data governance support. Purview is the
focal point for data, metadata, and data governance in the wider 365 ecosystem and
distributed cloud estates of customers. Its accelerated and wide adoption by Azure and
PowerBI customers is attributable to its strong customer codevelopment program. Customers
influence Microsoft Purview’s planned enhancements and get to kick the tires within the
prerelease and beta programs. This tight customer relationship also shapes the innovation
strategy beyond the block-and-tackle of engineering data for data lakes and warehouses.
However, beta and early release capabilities frustrate reference customers who see desired
capabilities tested and rolled out in other regions first, resulting in missed opportunities.
The current version of Purview is strongest in data discovery, profiling, and lineage, to
communicate data conditions for data engineers, BI, and governance teams. Purview’s
connectivity to Synapse extends the catalog functionality beyond metadata to facilitate DaaS
for real-time data flows, orchestration, and provisioning data to marketplaces and exchanges.
Data governance lags. Robust data quality and risk management support for regulations seen
in other evaluated tools is missing. Purview checks off the functionality boxes, but reference
customers indicate it has room to grow, even on the basics of metadata management. Purview
is a good fit for organizations getting started with Azure data and BI environments. It can help
them establish DataOps best practices with the ability to grow with the tool and influence new
capabilities as they mature.
Collibra delivers no-code data governance but lags in DataOps enablement. Collibra brings
data intelligence and governance to the data estate. Its penetration into enterprise data
governance programs gives DataOps teams a portal to understand data from a business point
of view, as well as data access in a marketplace. Collibra’s partner ecosystem and customer
base are diverse, demonstrating value for multiple use cases and industries. Notably, service
providers extend Collibra’s value with data product enablement. A strategic relationship with
Google is shaping planned enhancements for modern data and cloud enablement. However,
capabilities specific to DataOps requirements are not on its planned enhancements, as data
engineers are not the primary role for Collibra. Instead, Collibra’s go-to-market and product
strategy continue to focus on data consumers.
Collibra’s data governance capabilities are
above par, validated by reference customers. Data stewards give data engineers an easy path
to enable data governance in the data fabric by linking and cataloging data policies to
domains, data objects, and roles. Collibra extracts technical and logical information from
sources and pipeline tools through connectors to give data stewards and consumers profiling
and lineage across the data fabric. Data engineers need to configure objects and types to
accommodate the complete metadata of the technical landscape, data engines, and flows.
Oracle excels at real-time operational metadata but lags in integrated data governance. Part of
Oracle’s Data Infrastructure, Data Catalog is a quiet powerhouse enabling business and
industrial use cases in modern complex ecosystems. Its data catalog is positioned for Oracle
customers with modern digital and industry use cases delivered through a strategic legacy of
technology and service provider partnerships. Oracle is placing bets on the Data Mesh market
trend, one of the first vendors to define data and architectural principles by business value. Its
data catalog is key to this strategy to harness metadata and drive distributed, adaptive
architectures and processes for IoT and edge computing systems. But sexier data offerings
such as autonomous data warehouse, edge, business applications, and cloud overshadow
planned enhancements and broader DataOps enablement.
Oracle’s current data catalog
delivers detailed metadata scans for data engineers to understand and navigate complex
systems and data flows. Its data preparation capability gives data engineers an easy method
to further profile and transform data and metadata. Data fabric, cloud, and analytics integration
provides strong bidirectional metadata and schema sharing within the Oracle ecosystem. But
the user experience needs improvement. Its UI/UX feels familiar for data engineers who are
used to OCI and ODI, and it has little DevOps integration and automation. Additionally, data
governance is controls oriented with minimal business data policy support to keep data
engineers connected to contextual data requirements. Oracle’s data catalog is a good fit for
organizations building their next-generation data platform on OCI. Oracle declined to
participate in the full Forrester Wave evaluation process.
Talend delivers data assurance in the data fabric but lags in business context. Included in its
overall data fabric platform, Talend positions its data catalog to address data challenges as a
team, helping to organize, scale, and create better data accessibility. Built for data engineers
from the start, Talend has not looked back, even as other competitors augmented their
catalogs for nontechnical users. Thus, product vision and planned enhancements continue to
deliver on what DataOps teams need for data management and development capabilities.
However, its market approach minimizes the data catalog as a strategic capability even as
Talend increasingly deploys data capabilities on modern metadata-driven, distributed, cloud,
and IoT environments that require active metadata strengths.
Talend’s data and metadata
capture, data profiling, and lineage provide a strong and holistic technical and logical view of
the data, pipelines, and schemas. Forrester client inquiries indicate data engineers easily
address quality, compliance, root cause, and impact analysis. Compared to standalone and
some data management catalogs evaluated, the catalog works natively with its data
preparation and data quality tools to scale trusted data. While other vendors add data product
support, Talend does not directly support those business artifacts and objects. It also lags in
support of data marketplace and exchange capabilities that organizations seek for data self-
service. Organizations investing in Talend’s data fabric offerings get value from an integrated
data catalog tuned for DataOps to create a single source of trust, truth, and control of data in
the data fabric. Talend declined to participate in the full Forrester Wave evaluation process.
Alation excels at ML and SQL support, but analytic UX inhibits DataOps requirements. Alation
delivers a data intelligence platform to enable data discovery, literacy, and governance for
analytics professionals and data stewards. Alation is addressing the DataOps practice by
actively working on enhancements to support the needs of data engineers to bring them
closer to their business counterparts. A product vision for DataOps lags behind others in this
evaluation. Investments in dbt for pipeline ingestion, and recent acquisition of Lyngo, position
Alation to address end-to-end data governance and engineering processes. But Alation’s
planned enhancements focus on analytics professionals and data stewards. Alation’s
customers are loyal, but limited capabilities for cohesive data management and governance
between business and IT have led organizations to keep Alation concentrated in analytic functions.
Alation’s machine learning is one of the most sophisticated in this market. Its data intelligence
classifies and labels data for consumption and governance. And its profiling capabilities
examine SQL statements and data labels to assist and govern SQL statement building and
ensure that correct sources and data elements are used. Reference customers are particularly
happy with the ability to build, deploy, and manage data products through self-service. An
added, unique surprise is a notebook-like experience for the data steward for documentation
and communication. However, the UI/UX is not friendly for technical teams, and customer
references confirm a lack of deeper lineage and integration with data quality tools. Alation
provides organizations a worthy tool to scale out data self-service that removes data product
delivery bottlenecks, opening DataOps resources for other data needs.
Hitachi Vantara excels at data lineage but must work to transition to DataOps. Hitachi Vantara
delivers DataOps through its Lumada DataOps Platform. The data catalog sits at its core to
inventory, analyze, and govern data artifacts. Hitachi Vantara differentiates in this category by
addressing both enterprise use cases and industry operational technology use cases. Its
product vision and planned enhancements strike the right balance to handle typical business
data and IoT. However, data catalog integration with other tools and acquired solutions is still
disjointed and lags behind the vision for DataOps. While Hitachi Vantara has the right strategy
in place, customers will likely hit some bumps until the platform is more integrated.
Incorporating the Waterline Data and IOTahoe acquisitions in the catalog delivers robust metadata
introspection and extraction of data sources and pipelines. Machine learning provides rare
support for lineage inferencing to close gaps in the source and provenance of the data.
Additional intelligence brings wide profiling and classification, a strength validated by
reference customers. More work is needed to connect data catalog capabilities to the
industrial use cases Hitachi Vantara delivers that demand real-time, IoT, and edge data
support. Its product catalogs data assets, but more introspection and data preparation in batch
and real-time could improve contextualization and governance of streaming data. Hitachi
Vantara is a good partner for customers on a DataOps journey that connects IT and OT use
cases and that need strong classification and labeling for data.
Contenders
IBM creates a business view of data but lags in logical and technical details. IBM’s strategy
focuses on active metadata for the enterprise data supply chain. Its knowledge catalog lives in
IBM CloudPak and IBM Cloud to support data governance, curation, and self-service. IBM’s
market approach ensures data access, but the product vision and planned enhancements are
limiting for data engineers that need more data automation and development support. Data
fabric innovation as a metadata-driven ecosystem drives the knowledge catalog R&D, with a
vast partner ecosystem for go to market. However, the tool shows little evolution since the last
evaluation and IBM did not provide customer references, making use of Knowledge Catalog
within its customer environments an enigma.
IBM’s data catalog strengths are clearly
connected to its heritage data governance and glossary solutions. Data consumers, analysts,
and data scientists have a strong marketplace and data shopping experience with clearly
described and governed data assets. The Knowledge Catalog also has significant integration
capabilities within its own ecosystem and other platforms. However, a lack of fine-grained
visibility into data quality, a missing engineering UI/UX, lack of orchestration, and minimal
DevOps platform connectivity keep Knowledge Catalog out of step with DataOps and
Cloudera’s EDC keeps customer data lakes organized but sticks to catalog basics. Cloudera
provides a scalable analytic environment and positions its Apache Atlas-based data catalog to
manage data in secure trusted data lakes. One of the first and foremost big data Hadoop-
based platforms, Cloudera built a vast and strategic ecosystem of technology and service
providers racing to help customers with big data. However, its overall product vision and
planned enhancements show the investment has peaked with its data catalog. Cloudera
focuses on its analytics solutions, preferring to partner with other EDC providers for added
metadata management and governance.
Cloudera’s EDC delivers on-par data profiling and
lineage to describe the customer’s data lake. Many APIs are available to connect cloud
sources, data fabric, and analytics environments, extract schemas, and provide visibility into
forward and backward lineage. Cloudera’s use and enhancement of Apache Atlas for
metadata capture is vast compared to most data catalogs. But it leaves a goldmine of
captured metadata untouched, which could offset manual data classification and labeling and
enable DAG and automation. The UI is dated compared to other catalogs, inhibiting a UX that
keeps data engineers and data stewards in sync. Organizations won’t use Cloudera as an
enterprise data catalog, but they will greatly benefit by connecting to and extracting its rich
metadata for data fabric platforms. Cloudera declined to participate in the full Forrester Wave
evaluation process.
Challengers
AWS brings technical prowess for Glue ETL but lags in wider data estate support. The Amazon
Web Services (AWS) data and analytic platform offers Glue ETL for easy data movement into
the AWS environment. Glue Data Catalog is a feature within this tool. AWS and its partners
minimize the use of the data catalog, maintaining its ETL metadata status rather than a stand-
alone capability. Thus, Glue Data Catalog is the ETL repository in the platform and metadata
source for other enterprise data catalogs. AWS, as a leading cloud platform for data, analytics,
and applications, misses the opportunity to play a more central role as the metadata, data
intelligence, data automation, and governance hub of the entire cloud platform for its
customers, as the other hyperscalers do.
AWS maintains and provides the necessary
metadata and schema information for data pipeline building and running ETL/ELT jobs
specifically for AWS Glue ETL. For real-time and streams, the data catalog is adept at handling
time-series data. But without visibility into all data and data processes, pipeline impact analysis
is limited for pipeline building. Reduced visibility also constrains views into automation,
workflows, and governance. By expanding beyond Glue, to capture metadata and schemas for
all pipelines, point to point, and pub/sub processes, AWS could offer a competitive data
catalog. Organizations should include the Glue Data Catalog in their shortlist as an additional
metadata repository in a broader data fabric environment for enterprise lineage analysis and
added technical and logical metadata. AWS declined to participate in the full Forrester Wave
evaluation process.
TIBCO delivers basic data catalog functionality but must clarify its path forward. A push to the
cloud, advanced analytics, and investments in data management awakened TIBCO to the
importance of a metadata capability and broader DataOps support. Its product strategy
concentrates on data source inventory and description to support data sourcing and access
for analytics. The acquisition of Orchestra Networks MDM brought subject matter expertise
and technical know-how, but the vision and planned enhancements for its data catalog are still
ongoing. As a result, its data catalog market approach is minimal and quiet, hiding its actual
capabilities. At a product vision crossroad, TIBCO risks advances in its broad platform
solutions without stronger metadata, automation, and DataOps support that its competitors
have.
TIBCO’s data catalog delivers the basic metadata management and profiling of data
sources and source lineage. Data engineers can measure data quality with basic data
statistics and discover sensitive data to apply controls and protection. Today, its data catalog is
designed for analytics use cases for a better user experience, integration, and data profiling
for data governance teams. DataOps and data engineer capabilities are missing, limiting the
ability to move insights and data applications into TIBCO’s wider process, event, and
streaming portfolio. Organizations choosing TIBCO will get a foundation for data access and
self-service for TIBCO analytics but should integrate the catalog into an enterprise tool for
enterprise data management and governance objectives. TIBCO declined to participate in the
full Forrester Wave evaluation process.
Evaluation Overview
We evaluated vendors against 26 criteria, which we grouped into three high-level categories:
Current offering. Each vendor’s position on the vertical axis of the Forrester Wave graphic
indicates the strength of its current offering. Key criteria for these solutions include cataloging,
data intelligence, DataOps and engineering, data governance, connectivity, interoperability,
portability, and data as a service.
Strategy. Placement on the horizontal axis indicates the strength of the vendors’ strategies. We
evaluated product vision, market approach, performance, planned enhancements, innovation
roadmap, and partner ecosystem.
Market presence. Represented by the size of the markers on the graphic, our market presence
scores reflect each vendor’s revenue, number of customers, and average deal size.
Bundled data catalogs in data fabric platforms or standalone vendors with revenue above $25
million.
Data engineers, data architects, and DataOps teams that are significant or primary users of the
solution.
Supplemental Material
Online Resource
We publish all our Forrester Wave scores and weightings in an Excel file that provides detailed
product evaluations and customizable rankings; download this tool by clicking the link at the
beginning of this report on [Link]. We intend these scores and default weightings to serve
only as a starting point and encourage readers to adapt the weightings to fit their individual needs.
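Adapting the weightings amounts to recomputing a weighted average of criterion scores and re-ranking. The sketch below illustrates that arithmetic only. The vendor names, criterion names, scores, and weights are all hypothetical, the 0-to-5 scale is an assumption, and nothing here reproduces Forrester's actual scores, default weightings, or spreadsheet logic.

```python
def weighted_score(scores, weights):
    """Weighted average of criterion scores; weights need not sum to 1 or 100."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("at least one criterion needs a positive weight")
    return sum(scores[name] * weight for name, weight in weights.items()) / total_weight


def rank_vendors(vendor_scores, weights):
    """Return (vendor, score) pairs sorted from highest to lowest weighted score."""
    ranked = [(vendor, weighted_score(scores, weights))
              for vendor, scores in vendor_scores.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Hypothetical criterion scores on an assumed 0-to-5 scale.
VENDOR_SCORES = {
    "Vendor A": {"cataloging": 5, "dataops": 3, "data_governance": 1},
    "Vendor B": {"cataloging": 3, "dataops": 3, "data_governance": 5},
}

# Hypothetical custom weights for a team that cares most about cataloging
# and does not weigh governance at all.
CUSTOM_WEIGHTS = {"cataloging": 3, "dataops": 1, "data_governance": 0}

if __name__ == "__main__":
    for vendor, score in rank_vendors(VENDOR_SCORES, CUSTOM_WEIGHTS):
        print(f"{vendor}: {score:.2f}")
```

With equal weights the second hypothetical vendor ranks first; with the custom weights above the order flips, which is the kind of sensitivity readers may want to check before relying on a default ranking.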
We include the Forrester Wave publishing date (quarter and year) clearly in the title of each
Forrester Wave report. We evaluated the vendors participating in this Forrester Wave using
materials they provided to us by March 30, 2022, and did not allow additional information after that
point. We encourage readers to evaluate how the market and vendor offerings change over time.
In accordance with The Forrester Wave™ And New Wave™ Vendor Review Policy, Forrester asks
vendors to review our findings prior to publishing to check for accuracy. Vendors marked as
nonparticipating vendors in the Forrester Wave graphic met our defined inclusion criteria but
declined to participate in or contributed only partially to the evaluation. We score these vendors in
accordance with The Forrester Wave™ And The Forrester New Wave™ Nonparticipating And
Incomplete Participation Vendor Policy and publish their positioning along with those of the
participating vendors.
Integrity Policy
We conduct all our research, including Forrester Wave evaluations, in accordance with the Integrity
Policy posted on our website.
© 2022, Forrester Research, Inc. and/or its subsidiaries. All rights reserved.









