
Market Guide for DSML Engineering Platforms
Published 2 May 2022 - ID G00763493 - 16 min read
By Afraz Jaffri, Erick Brethenoux, and 7 more

Data science and machine learning engineering platforms promote
code-first development to build and support ML models used in
critical business applications. Data and analytics leaders scaling
their DSML initiatives must reassess their engineering requirements
and vendor selection criteria.

Overview
Key Findings
• Creating an efficient and maintainable path from prototype to production for
artificial intelligence (AI) and machine learning (ML) models continues to be a
major challenge, fueled by the increasing complexity of data and
infrastructure.
• Data science teams in many organizations have not sufficiently adopted
and strengthened software engineering and DevOps practices for AI and ML
model development and operationalization, leaving significant gaps in model
life cycle execution and management.
• Data science and machine learning (DSML) engineering platforms combine
the flexibility and utility of open-source ML frameworks with proprietary
controls and procedures that support repeatable and reliable delivery of
models into production.
• The DSML engineering platform market is fragmented, with established
vendors adding functionality across their platforms and smaller, specialist
vendors focusing on frameworks for abstracting common DSML tasks such as
experiment tracking and pipeline development.

Recommendations
Data and analytics leaders responsible for delivering data science solutions and the
necessary technology to move prototypes into production should:
• Select a DSML engineering platform by identifying gaps in current model
development practices, paying specific attention to capabilities around model
deployment, management, governance and security. Fill any remaining gaps
with specialist tools.
• Include flexibility, openness and composability as core requirements for
DSML platform selection by assessing API completeness, modularity of
platform architecture and metadata access.
• Match the skill sets of data science and ML engineering teams to DSML
engineering platform capabilities by assessing the level of collaboration and
abstraction provided across the model development and management life
cycle.
• Select a DSML engineering platform only when you have a pipeline of use
cases and a strategic business need for the delivery of machine
learning products with quantifiable scalability and performance requirements.

Market Definition
This document was revised on 3 May 2022. The document you are viewing is the
corrected version. For more information, see the Corrections page on gartner.com.

DSML engineering platforms consist of a core product and supporting portfolio of
integrated products, components, libraries and frameworks (including proprietary,
partner-sourced and open-source) for the development and operations of machine
learning solutions integrated with typically complex, innovative and highly scalable
applications. These solutions are engineered by personas who have deep technical
expertise in data science and machine learning or have other skills in digital
technology, such as data, software or system engineers. The platforms provide a
code-centric user interface, using a variety of programming languages. To boost
productivity, they also facilitate composition and automation through visual interfaces
and through open APIs.

Market Description
DSML engineering platforms have the primary purpose of developing ML models
that can drive critical business systems such as credit approval, predictive
maintenance, medical diagnosis and fraud detection. In order to do this, DSML
engineering platforms have evolved from supporting a core data science audience
with code-driven model development to now also supporting data engineering,
application development and infrastructure roles. DSML engineering platform
development has been driven by the need to enable collaboration between these
roles, including their activities and tasks, to effectively deliver ML systems. The focus
of these platforms has shifted in all areas of model development. Figure 1 illustrates
this shift by summarizing the main characteristics of a DSML engineering platform,
and Table 1 identifies 11 key aspects of platform functionality.
Figure 1: DSML Engineering Platforms
Table 1: Key Aspects of Platform Functionality


| Capability | DSML Engineering Platform Focus |
| --- | --- |
| Data access and preparation | Data access is provided for streaming data and unstructured data, typically achieved through prebuilt connectors provided with the platform. Data-centric AI is supported through data labeling and synthetic data generation. Metadata generated throughout the development life cycle is stored and can be accessed programmatically. Feature stores are also provided. |
| Data exploration and visualization | Support for notebooks is the de facto way for exploring and visualizing data. There are also platform-specific functions for a variety of exploratory statistics and geolocation and graph analytics. Integrations are provided for visualizations in external analytics platforms. |
| User interface modalities | Primarily code-based UI, which includes notebooks and IDEs for the majority of DSML tasks, although visual-based drag-and-drop environments are available for pipeline building and process design. Graphical user interfaces are generally used for administration and monitoring. |
| Collaboration | Collaboration is supported across all stages in the end-to-end pipeline, especially between development and operationalization. Increasingly, support is provided for implementing custom approval workflows and integration with marketplaces and workplace applications. |
| Model development | Engineering platforms facilitate the development of novel ML and deep learning techniques, such as graph neural networks, transfer learning, federated learning, ensembles and reinforcement learning, by supporting the most recent open-source frameworks or proprietary implementation. |
| Advanced analytics | Support for different types of analyses on geospatial, audio, images, video and text. Some platforms also provide composite AI capabilities to combine ML with other techniques such as simulation, rules or optimization. |
| Infrastructure | Infrastructure can be provisioned, scaled, monitored and spun down as required for data engineering, model training, testing and deployment. Support is provided for mixing and matching workloads across on-premises, hybrid, multicloud and edge environments. |
| Performance and scalability | Performance is primarily supported by utilizing multinode server clustering, distributed autoscaling, model compression and optionally distributed training. Support for training models with specialist hardware such as GPUs is a standard feature. Scalability is supported by orchestration of workloads to optimize compute resources. |
| Precanned solutions | Accelerate expert data science development by providing template code specialized for different use cases by industry or domain, and provide accessibility to newly developed model architectures produced by third parties. |
| Operationalization | DSML engineering platforms support the abstracting of components in the model development process into pipelines. These components may include model catalogs, model validation, performance monitoring, containerization and integration with XOps (DataOps, ModelOps, MLOps and DevOps) processes. Models can be deployed in batch and real-time modes for inferencing and retraining on different endpoints. |
| Governance and responsible AI | Common components found in engineering platforms include model auditing and policy compliance, risk management and security profiling, privacy protection, bias mitigation, and fairness metrics through a variety of explainable AI techniques. |

Source: Gartner (May 2022)

Market Direction
The AI and data science platform market is due to grow to over $10 billion by 2025 at
a 21.6% compound annual growth rate [1]. This growth in the market mirrors the
investments made by organizations in data science and ML initiatives, which are
largely turning from strategy to execution. The DSML engineering market is
representative of this shift in dynamics between business need and technical
implementation.
Buyers of these platforms have typically had success in building and deploying
DSML solutions in pockets within their enterprise and are now looking to formalize
DSML development practices, platforms and architectures to provide sustainable
growth in the use of DSML enterprisewide. DSML engineering platforms will continue
to focus on enterprisewide deployments, which are managed by centralized teams,
often within IT, but also give visibility to lines of business (LOBs) for decision making.
The capabilities that will drive the development of these platforms and have the most
impact for these users are:

• Data access across hybrid and multicloud data sources with provisioning for
on-demand scalable compute for data engineering and model training.
• Experiment tracking and model management that increase the
number of metadata points stored, powering catalogs for easy search
and reuse with some augmentation.
• Applying CI/CD practices to model development with a focus on unit
testing and validation of code and model behavior before entering production.
• The merger of model monitoring functionality with AIOps functionality to
include augmented monitoring of models, data and infrastructure, as part of
observability with highlighting of root causes and problem resolution
(see Market Guide for AIOps Platforms).
• Openness and flexibility of platforms via software development kits (SDKs)
and APIs to build a complete DSML stack through integration with other
DSML engineering platforms, data catalogs, DevOps and MLOps tools, and
collaboration platforms. This openness also enables organizations to build their own
services and integrate with other API-driven services.
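
The CI/CD capability in the list above can be illustrated with a minimal sketch of a validation gate that a pipeline might run before a model enters production. Everything in it is an assumption made for illustration: the `validation_gate` function, the toy `threshold_model`, the holdout data and the 0.9 accuracy threshold are hypothetical and do not come from any platform named in this guide.

```python
from typing import Callable, List, Sequence

# Hypothetical model interface: one numeric feature in, a class label (0 or 1) out.
Model = Callable[[float], int]


def accuracy(model: Model, features: Sequence[float], labels: Sequence[int]) -> float:
    """Fraction of holdout examples the model labels correctly."""
    correct = sum(1 for x, y in zip(features, labels) if model(x) == y)
    return correct / len(labels)


def validation_gate(
    model: Model,
    features: Sequence[float],
    labels: Sequence[int],
    min_accuracy: float = 0.9,  # assumed threshold, chosen only for this sketch
) -> List[str]:
    """Return the list of failed checks; an empty list means the model may be promoted."""
    failures: List[str] = []
    # Behavior check: every prediction must be a valid class label.
    if any(model(x) not in (0, 1) for x in features):
        failures.append("model returned an invalid label")
    # Quality check: holdout accuracy must reach the agreed minimum.
    score = accuracy(model, features, labels)
    if score < min_accuracy:
        failures.append(f"accuracy {score:.2f} below {min_accuracy:.2f}")
    return failures


def threshold_model(x: float) -> int:
    """Toy model standing in for a trained classifier."""
    return 1 if x >= 0.5 else 0


holdout_x = [0.1, 0.4, 0.6, 0.9]
holdout_y = [0, 0, 1, 1]
failures = validation_gate(threshold_model, holdout_x, holdout_y)
print("promote" if not failures else failures)
```

In a real pipeline, a check of this kind would typically run as an automated test stage, with a non-empty failure list blocking the release.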
Recent investment by cloud service providers (CSPs) in their DSML products,
together with the commonality of data stored in the cloud and criticality of cloud
compute resources for DSML engineering, means Amazon SageMaker, Google
Vertex AI and Microsoft Azure Machine Learning will further solidify their positions as
dominant players in the market. However, other vendors will continue to flourish as
part of cloud ecosystems through tight integration and focus on different parts of the
DSML stack and specialized solutions. DataRobot’s AI cloud for industries and SAS
industry solutions are examples of this type of integration.

Market Analysis
The DSML engineering platform market is still emerging and immature, but it
has many established vendors that have been adding functionality to their DSML
platforms to ease the frustration organizations face when deploying and running
models in production. The issues these organizations face include:

• Building a data pipeline that supports the usage of the model in the correct
context (low latency or high throughput, for example)
• Building a model deployment pipeline, which includes code refactoring,
automated testing, monitoring and updating, quality validation, packaging and
deploying into the correct target state (container, service, code)
• Ensuring the observability of models, including mapping model outputs to
model versions and training data and being able to identify, explain and
correct data and concept drift
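
As one illustration of the drift item in the list above, the sketch below computes a population stability index (PSI) between a training sample and a live sample of a single feature. PSI is a swapped-in example technique, not one named in this guide. The bin edges, the sample values and the 0.25 alert level (a commonly cited rule of thumb) are assumptions. PSI only flags a shift in input data; detecting concept drift would additionally need labeled outcomes.

```python
import bisect
import math
from typing import List, Sequence


def bin_shares(values: Sequence[float], edges: Sequence[float]) -> List[float]:
    """Share of values per bin; values outside the edges are clamped to the end bins."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        i = bisect.bisect_right(edges, v) - 1
        i = min(max(i, 0), len(counts) - 1)
        counts[i] += 1
    return [c / len(values) for c in counts]


def psi(
    expected: Sequence[float],
    actual: Sequence[float],
    edges: Sequence[float],
    eps: float = 1e-6,  # floor that avoids log(0) for empty bins
) -> float:
    """Population stability index between a reference sample and a live sample."""
    total = 0.0
    for e, a in zip(bin_shares(expected, edges), bin_shares(actual, edges)):
        e, a = max(e, eps), max(a, eps)
        total += (a - e) * math.log(a / e)
    return total


# Made-up feature values: the live sample has shifted entirely into the upper bin.
train = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
live = [0.6, 0.7, 0.8, 0.9, 0.6, 0.7, 0.8, 0.9]
edges = [0.0, 0.5, 1.0]

score = psi(train, live, edges)
print("drift suspected" if score > 0.25 else "stable")
```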
Not all barriers are technical in nature; the nontechnical ones are often resolved by improving
processes and collaboration (see 4 Machine Learning Best Practices to Achieve
Project Success). This demand and gap in the market have also allowed smaller
vendors to create offerings, either as all-purpose platforms or targeted at certain
tasks in the ML model life cycle, referred to as MLOps. Gartner’s social media
analysis of MLOps and related terms shows that the topic of MLOps platforms had
the biggest share of voice in 2021, along with MLOps capabilities such as continuous
monitoring, model governance and continuous delivery, as shown in Figure 2.
Figure 2: Share of Voice for MLOps in Social Media 2021
The emergence of MLOps has fragmented the DSML market into four broad
categories:
Multipersona DSML — A number of platforms fall in the category of being both
DSML multipersona and engineering platforms. They typically provide a low-code
interface for domain experts and citizen data scientists to work on predictive
analytics tasks, but also contain a coding environment where experts can take a
human or autogenerated ML model and deploy it as a service with basic monitoring
capability. Vendors that offer such platforms are expanding their capability and
appeal to all points on the user spectrum.
DSML Engineering — These platforms are focused on serving the needs of expert
data scientists and delivery teams responsible for building and maintaining ML and
AI solutions. They provide an end-to-end platform with developer tools for managing
code, data, experiments, models, model outputs and associated pipelines, often
integrating with DevOps tools and open-source frameworks. They also provide and
manage their own compute servers, while also being able to connect to external compute
resources.
MLOps — Platforms in this category are focused on easing the process of
operationalizing models by integrating with existing developer tools. They typically
provide an abstraction layer for managing and tracking experiments and model
training, deployment and monitoring. These processes are abstracted through APIs
and libraries, which data scientists can utilize as they are writing code. The functions
provided integrate to back-end services such as model registries, feature stores,
monitoring services and infrastructure usage logs.
Specialists — A number of tools and platforms within the MLOps category focus on a
subset of capabilities. This can include explainability, security, deployment,
monitoring and governance. More information on these capabilities and platforms
can be found in Market Guide for AI Trust, Risk and Security Management.
As the market evolves, mergers and acquisitions between
vendors that offer a full-stack engineering platform and those that are MLOps or
specialist vendors will continue. Specialists are also likely to work together to enable
interoperability and build their own ecosystem of tightly integrated tools.
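
The abstraction layer described for the MLOps category above can be sketched as a small tracking library that data scientists call while writing code. This is a hypothetical, in-memory stand-in: the `InMemoryTracker` and `Run` names are assumptions, and a real platform would send these records to back-end services such as a model registry rather than to a Python list.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Run:
    """One tracked experiment: its parameters, metrics and start time."""
    name: str
    params: Dict[str, Any] = field(default_factory=dict)
    metrics: Dict[str, float] = field(default_factory=dict)
    started_at: float = field(default_factory=time.time)


class InMemoryTracker:
    """Hypothetical stand-in for a tracking back end."""

    def __init__(self) -> None:
        self.runs: List[Run] = []

    def start_run(self, name: str, **params: Any) -> Run:
        """Record a new run with its parameters and return it for metric logging."""
        run = Run(name=name, params=dict(params))
        self.runs.append(run)
        return run

    def best(self, metric: str) -> Run:
        """Return the run with the highest value of the given metric."""
        scored = [r for r in self.runs if metric in r.metrics]
        return max(scored, key=lambda r: r.metrics[metric])


tracker = InMemoryTracker()
for lr in (0.1, 0.01):
    run = tracker.start_run("demo", learning_rate=lr)
    # Placeholder for real training: the metric is a made-up function of the parameter.
    run.metrics["accuracy"] = 1.0 - lr

best = tracker.best("accuracy")
print(best.params, best.metrics)
```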

Open Source Remains Central to DSML Engineering Platforms
Open-source libraries have become the standard utility used in the DSML domain.
Therefore, DSML platforms in general de-emphasize the provision of proprietary
libraries, algorithms and techniques in favor of supporting the comprehensive and
continuously advancing open-source set of frameworks. DSML engineering
platforms continue this trend by incorporating many open-source packages, with
their associated runtime environment, within the platform. A small number of
platforms also support libraries for multiple development languages including Python,
R, Go, C++, Scala and Julia.
Notebooks are a key tool in a data scientist’s toolbox for data exploration,
experimentation, collaboration and sharing and remain front and center in DSML
engineering platforms. Recent notebook innovations include real-time
collaboration, autopackaging, and deployment and auditability. Future innovations
will continue to focus on bringing notebook-based experiments into live production
settings.
As the development phase becomes increasingly commoditized by open source,
DSML engineering platforms turn their attention to developing supporting tools and
functions that make development scalable and collaborative, both between different
life cycle phases and between different teams. Many open-source frameworks have
emerged to also support these core needs of DSML teams (such as MLflow,
Metaflow and TensorFlow Extended [TFX]), yet being able to manage each one and
fit them together in a cohesive manner is out of reach for most organizations. DSML
engineering platforms either replicate and enhance the ideas from open source or
build a commercial offering on top of an open-source framework in order to abstract
away many low-level functions that require deep technical expertise to execute.

The Emerging Role of Composability, Metadata and Orchestration


The DSML development process has changed significantly since the introduction of
the Cross-Industry Standard Process for Data Mining (CRISP-DM) life cycle.
Traditionally, each step in the process was performed linearly, with a single output
from one step forming the input to the next. Now, agile development and the
need for greater speed and reproducibility have led to almost every stage of the
process being decomposed into components with associated metadata
(see Table 2).

Table 2: Metadata Generated From the DSML Engineering Life Cycle


| Business Understanding | Data | ML Models | Model Outputs |
| --- | --- | --- | --- |
| User stories | ETL store | Hyperparameters | Model accuracy |
| Ownership | Data versioning | Experiments | Infrastructure utilization |
| Task management | Feature store | Code versioning | Model explainability |
| Business KPIs | Security and access | Lineage | Execution cycles |

Source: Gartner (May 2022)


DSML engineering platforms store some or all of the above metadata types, but
crucially, many also embrace composability by giving API access to each individual
type. This trend brings dynamism to the market, as organizations need not be tied to
monolithic platforms but can build their own solutions in a decentralized, composable
environment. As metadata continues to proliferate, it is inevitable that metadata
management and cataloging will become a core part of DSML engineering platform
architecture.
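
To make the idea of API access to each metadata type concrete, the following is a minimal sketch of a catalog keyed by the four column groups of Table 2. The `MetadataCatalog` class, its `put` and `get` methods, the key names and the sample records are all hypothetical; a real platform would expose its own API for this.

```python
from typing import Any, Dict, List

# The four metadata groups from Table 2; the key names are assumptions for this sketch.
METADATA_TYPES = ("business_understanding", "data", "ml_models", "model_outputs")


class MetadataCatalog:
    """Hypothetical catalog that stores and serves each metadata type independently."""

    def __init__(self) -> None:
        self._records: Dict[str, List[Dict[str, Any]]] = {t: [] for t in METADATA_TYPES}

    def put(self, metadata_type: str, record: Dict[str, Any]) -> None:
        """Store one record under a known metadata type."""
        if metadata_type not in self._records:
            raise ValueError(f"unknown metadata type: {metadata_type}")
        self._records[metadata_type].append(dict(record))

    def get(self, metadata_type: str, **filters: Any) -> List[Dict[str, Any]]:
        """Return records of one type whose fields match all given filters."""
        if metadata_type not in self._records:
            raise ValueError(f"unknown metadata type: {metadata_type}")
        return [
            r for r in self._records[metadata_type]
            if all(r.get(k) == v for k, v in filters.items())
        ]


catalog = MetadataCatalog()
# Made-up records for a hypothetical "churn" model.
catalog.put("ml_models", {"model": "churn", "hyperparameters": {"depth": 4}})
catalog.put("model_outputs", {"model": "churn", "accuracy": 0.91})

print(catalog.get("model_outputs", model="churn"))
```

Because each type is queried on its own, a separate tool could read only the model outputs, for example, without depending on how the other types are stored.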
Supporting the ecosystem of platforms and tools needed to sustain high-velocity
DSML development are AI orchestration platforms (see Cool Vendors in Enterprise
AI Operationalization and Engineering). These services are the part of the DSML stack
that acts as the glue between the different services participating in a modular DSML
architecture, and they utilize the generated metadata. This orchestration will increasingly
be augmented by DSML engineering platforms themselves, reducing the
need for manual configuration, in a trend similar to the emergence of augmentation
for infrastructure monitoring (see Platform Teams and AIOps Will Redefine DevOps
Approaches by 2025).
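
The role of orchestration as glue between modular services can be sketched with a small dependency-ordered pipeline runner. This is an illustrative sketch only: the step names, the shared `context` dictionary and the `run_pipeline` helper are assumptions, and it uses Python's standard `graphlib` module (Python 3.9 or later) rather than any orchestration product.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Tuple

Step = Callable[[dict], None]


def run_pipeline(
    steps: Dict[str, Step], depends_on: Dict[str, List[str]]
) -> Tuple[List[str], dict]:
    """Run each step after the steps it depends on, sharing metadata via a context dict."""
    context: dict = {}
    order = list(TopologicalSorter(depends_on).static_order())
    for name in order:
        steps[name](context)
    return order, context


def prepare_data(ctx: dict) -> None:
    ctx["rows"] = 100  # made-up row count standing in for a prepared dataset


def train(ctx: dict) -> None:
    ctx["model"] = f"model trained on {ctx['rows']} rows"


def validate(ctx: dict) -> None:
    ctx["approved"] = ctx["rows"] >= 50  # assumed approval rule for this sketch


def deploy(ctx: dict) -> None:
    ctx["deployed"] = ctx["approved"]


steps = {"prepare_data": prepare_data, "train": train, "validate": validate, "deploy": deploy}
# Each key lists the steps that must finish before it can start.
depends_on = {
    "prepare_data": [],
    "train": ["prepare_data"],
    "validate": ["train"],
    "deploy": ["validate"],
}

order, context = run_pipeline(steps, depends_on)
print(order, context["deployed"])
```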
Further details on these providers and others can be found in the Gartner
research Tool: Vendor Identification for Data Science and Machine Learning
Platforms (the Market Guide is limited to a maximum of 40 vendors [see Note 1
and Note 2]).

Representative Vendors
The vendors listed in this Market Guide do not constitute an exhaustive list. This section
is intended to provide more understanding of the market and its offerings.

Market Introduction
Table 3: Representative Vendors in DSML Engineering Platform Market

| Vendor | Product(s) |
| --- | --- |
| 4Paradigm | 4Paradigm Sage AIOS, Sage Studio, Sage HyperCycle |
| Activeeon | ProActive Machine Learning |
| Alibaba Cloud | Machine Learning Platform for AI (PAI) |
| Altair | Altair Data Analytics, Knowledge Studio, SmartWorks, Panopticon, WPS Analytics, HyperStudy, Monarch |
| Amazon Web Services (AWS) | Amazon SageMaker |
| C3 AI | C3 AI Application Platform, C3 AI Studio, C3 AI Ex Machina |
| Cloudera | Cloudera Data Platform (CDP) |
| cnvrg.io | cnvrg.io (self-hosted), cnvrg.io Metacloud (SaaS) |
| Comet | Comet Experiment Management, Comet Model Production Monitoring (MPM) |
| Databricks | Databricks Lakehouse Platform |
| Dataiku | Dataiku |
| DataRobot | DataRobot AI Cloud |
| DataVision | BeeYard |
| Deepnote | Deepnote |
| Domino Data Lab | Domino Enterprise MLOps Platform |
| Exponential AI | Enso |
| FICO | FICO Platform |
| ForePaaS | ForePaaS Platform |
| Google | Vertex AI, BigQuery |
| HPE | HPE Ezmeral ML Ops |
| IBM | IBM Watson Studio on Cloud Pak for Data, IBM SPSS |
| Iguazio | The Iguazio MLOps Platform |
| KNIME | KNIME Analytics Platform, KNIME Server |
| MathWorks | MATLAB |
| Microsoft | Azure Machine Learning |
| Neo4j | Neo4j Graph Data Science (includes Neo4j Graph Database, Bloom, Browser), AuraDS |
| Oracle | OCI Data and AI Platform, Oracle Machine Learning available in Oracle Database, Oracle Analytics Cloud |
| Palantir Technologies | Foundry |
| RapidMiner | RapidMiner Platform consists of Studio, AI Hub as a bundle |
| Red Hat | Red Hat OpenShift, Red Hat OpenShift Data Science |
| RStudio PBC | RStudio Team (Bundle of RStudio Workbench, RStudio Connect, RStudio Package Manager) |
| Run:AI | Run:AI Atlas Platform |
| SAS | SAS Visual Data Science Decisioning |
| Scale AI | Scale Rapid, Nucleus, Launch, Validate, Document AI, Collect, Image/Video, Mapping, Synthetic |
| Teradata | Vantage includes in-database analytics, Bring Your Own Model (BYOM), Open Analytics Framework and others |
| TIBCO Software | TIBCO Data Science, TIBCO Spotfire, TIBCO Streaming, TIBCO Model Ops |
| TigerGraph | TigerGraph Enterprise Graph Database, TigerGraph Graph Data Science Library, TigerGraph ML Workbench |
| TruEra | TruEra Diagnostics, TruEra Monitoring |
| Valohai | Valohai |
| Verta | Verta Platform |

Source: Gartner (May 2022)

Market Recommendations
Data and analytics leaders must capitalize on trends and configure their strategy for
DSML engineering platforms by:

• Assessing the current state of model development practices across data, data
science, machine learning engineering and operations. Assess the DSML
engineering platforms against current process limitations, and consider
specialists for acute needs such as explainability, testing and monitoring.
• Creating upskilling initiatives for the key roles involved in DSML
engineering (see Leading Upskilling Initiatives in Data Science and Machine
Learning). Data scientists should be required to learn foundational DevOps
practices, and operational experts should be upskilled to have a better
understanding of DSML techniques.
• Identifying whether the capabilities of a given DSML engineering platform
match the requirements and ambition of the organization. The level of DSML
maturity and skills, AI product centricity and organizational value need to be
considered when deciding the type of DSML platform required.

Evidence
Approved Methodology: Gartner conducts social listening analysis leveraging
third-party data tools to complement or supplement the other fact bases presented in this
document. Due to its qualitative and organic nature, the results should not be used
separately from the rest of this research. No conclusions should be drawn from this
data alone. The social media data referenced is from 1 January 2019 through 31
December 2021 in all geographies (except China) and recognized languages.
The SMA Team: Mani Ratnam and Talmeez Fahim from the Social Media Analytics
Team contributed to this research.
[1] Forecast Analysis: Artificial Intelligence Software, Worldwide

Note 1: Representative Vendor Selection


The list of vendors is not exhaustive, and it represents vendors that Gartner has
identified under the scope of the emerging data science and machine learning
engineering platforms market.

Note 2: Gartner’s Initial Market Coverage


This Market Guide provides Gartner’s initial coverage of the market and focuses on
the market definition, rationale for the market and market dynamics.
