Engineering Platforms
Published 2 May 2022 - ID G00763493 - 16 min read
By Afraz Jaffri, Erick Brethenoux, and 7 more
Overview
Key Findings
Creating an efficient and maintainable path from prototype to production for
artificial intelligence (AI) and machine learning (ML) models continues to be a
major challenge, fueled by the increasing complexity of data and
infrastructure.
Data science teams in many organizations have not sufficiently adopted
and strengthened software engineering and DevOps practices for AI and ML
model development and operationalization, leaving significant gaps in model
life cycle execution and management.
Data science and machine learning (DSML) engineering platforms combine
the flexibility and utility of open-source ML frameworks with proprietary
controls and procedures that support repeatable and reliable delivery of
models into production.
The DSML engineering platform market is fragmented, with established
vendors adding functionality across their platforms and smaller, specialist
vendors focusing on frameworks for abstracting common DSML tasks such as
experiment tracking and pipeline development.
Recommendations
Data and analytics leaders responsible for delivering data science solutions and the
necessary technology to move prototypes into production should:
Select a DSML engineering platform by identifying gaps in current model
development practices, paying specific attention to capabilities around model
deployment, management, governance and security. Fill any remaining gaps
with specialist tools.
Include flexibility, openness and composability as core requirements for
DSML platform selection by assessing API completeness, modularity of
platform architecture and metadata access.
Match the skill sets of data science and ML engineering teams to DSML
engineering platform capabilities by assessing the level of collaboration and
abstraction provided across the model development and management life
cycle.
Select a DSML engineering platform only when you have a pipeline of use
cases and a strategic business need for the delivery of machine
learning products with quantifiable scalability and performance requirements.
Market Definition
This document was revised on 3 May 2022. The document you are viewing is the
corrected version. For more information, see the Corrections page on gartner.com.
DSML engineering platforms consist of a core product and supporting portfolio of
integrated products, components, libraries and frameworks (including proprietary,
partner-sourced and open-source) for the development and operations of machine
learning solutions integrated with typically complex, innovative and highly scalable
applications. These solutions are engineered by personas who have deep technical
expertise in data science and machine learning or have other skills in digital
technology, such as data, software or system engineers. The platforms provide a
code-centric user interface, using a variety of programming languages. To boost
productivity, they also facilitate composition and automation through visual interfaces
and through open APIs.
Market Description
DSML engineering platforms have the primary purpose of developing ML models
that can drive critical business systems such as credit approval, predictive
maintenance, medical diagnosis and fraud detection. To do this, DSML
engineering platforms have evolved from supporting a core data science audience
with code-driven model development to also supporting data engineering,
application development and infrastructure roles. DSML engineering platform
development has been driven by the need to enable these roles to collaborate
across their activities and tasks to deliver ML systems effectively. The focus
of these platforms has shifted in all areas of model development. Figure 1 illustrates
this shift by summarizing the main characteristics of a DSML engineering platform,
and Table 1 identifies 11 key aspects of platform functionality.
Figure 1: DSML Engineering Platforms
Table 1: Key Aspects of Platform Functionality
Capability | DSML Engineering Platform Focus
Data access and preparation | Data access is provided for streaming data and unstructured data, typically through prebuilt connectors provided with the platform. Data-centric AI is supported through data labeling and synthetic data generation. Metadata generated throughout the development life cycle is stored and can be accessed programmatically. Feature stores are also provided.
Data exploration and visualization | Notebooks are the de facto way to explore and visualize data. There are also platform-specific functions for a variety of exploratory statistics, geolocation analytics and graph analytics. Integrations are provided for visualizations in external analytics platforms.
Market Direction
The AI and data science platform market is forecast to grow to over $10 billion by 2025 at
a 21.6% compound annual growth rate (note 1). This growth in the market mirrors the
investments made by organizations in data science and ML initiatives, which are
largely turning from strategy to execution. The DSML engineering market is
representative of this shift in dynamics between business need and technical
implementation.
Buyers of these platforms have typically had success in building and deploying
DSML solutions in pockets within their enterprise and are now looking to formalize
DSML development practices, platforms and architectures to provide sustainable
growth in the use of DSML enterprisewide. DSML engineering platforms will continue
to focus on enterprisewide deployments, which are managed by centralized teams,
often within IT, but also give visibility to lines of business (LOBs) for decision making.
The capabilities that will drive the development of these platforms and have the most
impact for these users are:
Market Analysis
The DSML engineering platform market is still an emerging and immature market but
has many established vendors that have been adding functionality to their DSML
platforms to ease the frustration organizations face when deploying and running
models in production. These issues include:
Building a data pipeline that supports the usage of the model in the correct
context (low latency or high throughput, for example)
Building a model deployment pipeline, which includes code refactoring,
automated testing, monitoring and updating, quality validation, packaging and
deploying into the correct target state (container, service, code)
Ensuring the observability of models, including mapping model outputs to
model versions and training data and being able to identify, explain and
correct data and concept drift
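The observability requirement above includes identifying data drift. One widely used and simple measure is the population stability index (PSI), which compares the binned distribution of a feature in live traffic against its distribution in the training data. The sketch below is illustrative only: the function name, the equal-width binning on the reference range and the `eps` smoothing are assumptions, and the platforms in this guide may use other drift metrics. Concept drift, a change in the relationship between inputs and outcomes, requires labeled outcomes and is not covered here.

```python
import math


def population_stability_index(expected, actual, bins=10, eps=1e-6):
    """PSI between a reference (training) sample and a live sample of one feature.

    Bin edges are equal-width over the reference range; eps avoids log(0).
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant reference sample

    def proportions(values):
        counts = [0] * bins
        for v in values:
            # Clamp live values outside the reference range into the edge bins.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        return [max(c / len(values), eps) for c in counts]

    p_ref, p_live = proportions(expected), proportions(actual)
    # Each term (live - ref) * ln(live / ref) is non-negative.
    return sum((a - e) * math.log(a / e) for e, a in zip(p_ref, p_live))


reference = [i / 100 for i in range(100)]
shifted = [x + 0.5 for x in reference]
print(population_stability_index(reference, reference))
print(population_stability_index(reference, shifted))
```

Identical samples score 0, while the shifted sample scores far higher. A frequently quoted rule of thumb treats PSI below 0.1 as stable and above 0.25 as a significant shift, but thresholds are usually tuned per feature.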
Not all barriers are technical in nature; many are resolved by improving
processes and collaboration (see 4 Machine Learning Best Practices to Achieve
Project Success). This demand, and the gap it exposes in the market, has also allowed
smaller vendors to create offerings, either as all-purpose platforms or as tools targeted
at specific tasks in the ML model life cycle (collectively referred to as MLOps). Gartner’s
social media analysis of MLOps and related terms shows that the topic of MLOps platforms had
the biggest share of voice in 2021, along with MLOps capabilities such as continuous
monitoring, model governance and continuous delivery, as shown in Figure 2.
Figure 2: Share of Voice for MLOps in Social Media 2021
The emergence of MLOps has fragmented the DSML market into four broad
categories:
Multipersona DSML — A number of platforms fall in the category of being both
DSML multipersona and engineering platforms. They typically provide a low-code
interface for domain experts and citizen data scientists to work on predictive
analytics tasks, but also contain a coding environment where experts can take a
human-built or autogenerated ML model and deploy it as a service with basic monitoring
capability. Vendors that offer such platforms are expanding their capability and
appeal to all points on the user spectrum.
DSML Engineering — These platforms are focused on serving the needs of expert
data scientists and delivery teams responsible for building and maintaining ML and
AI solutions. They provide an end-to-end platform with developer tools for managing
code, data, experiments, models, model outputs and associated pipelines, often
integrating with DevOps tools and open-source frameworks. They also provide and
manage their own compute servers, while also being able to connect to external compute
resources.
MLOps — Platforms in this category are focused on easing the process of
operationalizing models by integrating with existing developer tools. They typically
provide an abstraction layer for managing and tracking experiments and model
training, deployment and monitoring. These processes are abstracted through APIs
and libraries, which data scientists can utilize as they are writing code. The functions
provided integrate to back-end services such as model registries, feature stores,
monitoring services and infrastructure usage logs.
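To make the abstraction-layer idea concrete, the sketch below shows the kind of calls a data scientist might make while writing code: start a run, log parameters and metrics, register a model version, and later trace that version back to its parameters, metrics and training data, which is the mapping the observability requirement earlier in this section calls for. `ExperimentTracker`, `RunRecord` and all method names are hypothetical and kept in memory; real MLOps tools persist this metadata in back-end services such as a model registry.

```python
import hashlib
import json
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RunRecord:
    run_id: str
    params: dict
    data_fingerprint: str
    metrics: dict = field(default_factory=dict)
    model_version: Optional[str] = None


class ExperimentTracker:
    """Hypothetical in-memory stand-in for an MLOps tracking back end."""

    def __init__(self):
        self._runs = {}

    @staticmethod
    def fingerprint(rows):
        # Stable hash of the training data so a model can be traced back to it.
        payload = json.dumps(rows, sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()[:12]

    def start_run(self, params, training_rows):
        run = RunRecord(run_id=uuid.uuid4().hex, params=dict(params),
                        data_fingerprint=self.fingerprint(training_rows))
        self._runs[run.run_id] = run
        return run.run_id

    def log_metric(self, run_id, name, value):
        self._runs[run_id].metrics[name] = value

    def register_model(self, run_id):
        # Versions increase with each registered run, like a simple model registry.
        registered = sum(r.model_version is not None for r in self._runs.values())
        version = f"v{registered + 1}"
        self._runs[run_id].model_version = version
        return version

    def lineage(self, model_version):
        """Map a deployed model version back to its params, metrics and data."""
        for run in self._runs.values():
            if run.model_version == model_version:
                return run
        raise KeyError(model_version)


tracker = ExperimentTracker()
rows = [{"amount": 120.0, "label": 0}, {"amount": 9800.0, "label": 1}]
run_id = tracker.start_run({"max_depth": 4}, rows)
tracker.log_metric(run_id, "auc", 0.91)
version = tracker.register_model(run_id)
print(version, tracker.lineage(version).params)
```

Keeping lineage behind a small API is what lets platforms attach monitoring, governance and deployment services to the same metadata without changing the data scientist's code.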
Specialists — A number of tools and platforms within the MLOps category focus on a
subset of capabilities. This can include explainability, security, deployment,
monitoring and governance. More information on these capabilities and platforms
can be found in Market Guide for AI Trust, Risk and Security Management.
As the market evolves, mergers and acquisitions between vendors that offer a
full-stack engineering platform and MLOps or specialist vendors are likely to
continue. Specialists are also likely to work together to enable
interoperability and build their own ecosystem of tightly integrated tools.
[Figure: Business Understanding | Data | ML Models | Model Outputs]
Representative Vendors
The vendors listed in this Market Guide do not constitute an exhaustive list. This section
is intended to provide more understanding of the market and its offerings.
Market Introduction
Table 3: Representative Vendors in DSML Engineering Platform Market
Vendor | Product(s)
Dataiku | Dataiku
DataVision | BeeYard
Deepnote | Deepnote
Exponential AI | Enso
MathWorks | MATLAB
Palantir Technologies | Foundry
Red Hat | Red Hat OpenShift, Red Hat OpenShift Data Science
Valohai | Valohai
Market Recommendations
Data and analytics leaders must capitalize on trends and configure their strategy for
DSML engineering platforms by:
Assessing the current state of model development practices across data, data
science, machine learning engineering and operations. Assess the DSML
engineering platforms against current process limitations, and consider
specialists for acute needs such as explainability, testing and monitoring.
Creating upskilling initiatives for the key roles involved in DSML
engineering (see Leading Upskilling Initiatives in Data Science and Machine
Learning). Data scientists should be required to learn foundational DevOps
practices, and operational experts should be upskilled to have a better
understanding of DSML techniques.
Identifying whether the capabilities of a given DSML engineering platform
match the requirements and ambition of the organization. The level of DSML
maturity and skills, AI product centricity and organizational value need to be
considered when deciding the type of DSML platform required.
Evidence
Approved Methodology: Gartner conducts social listening analysis leveraging third-
party data tools to complement or supplement the other fact bases presented in this
document. Due to its qualitative and organic nature, the results should not be used
separately from the rest of this research. No conclusions should be drawn from this
data alone. Social media data in reference is from 1 January 2019 through 31
December 2021 in all geographies (except China) and recognized languages.
The SMA Team: Mani Ratnam and Talmeez Fahim from the Social Media Analytics
Team contributed to this research.
1 Forecast Analysis: Artificial Intelligence Software, Worldwide