Upgrade to v2
All articles in this section document the use of the first version of Azure Machine
Learning Python SDK (v1) or Azure CLI ml extension (v1). For information on the current
SDK and CLI, see Azure Machine Learning SDK and CLI v2.
SDK v1
The Azure SDK examples in articles in this section require the azureml-core package, or Python SDK v1 for Azure Machine Learning. The Python SDK v2 is now available.
The v1 and v2 Python SDK packages are incompatible, and the v2 style of coding won't work for articles in this directory. However, machine learning workspaces and all underlying resources can be interacted with from either version, meaning that one user can create a workspace with the SDK v1 and another can submit jobs to the same workspace with the SDK v2.
We recommend that you don't install both versions of the SDK in the same environment, since doing so can cause clashes and confusion in your code.
To find out whether you have the Azure Machine Learning Python SDK v2, run pip show azure-ai-ml. (Or, in a Jupyter notebook, use %pip show azure-ai-ml.) Based on the results of pip show, you can determine which version of the SDK you have.
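The same check can also be done from Python with the standard library. The following is a generic sketch (not part of the Azure Machine Learning SDK itself) that reports which of the two SDK packages are present:

```python
from importlib import metadata

def installed_aml_sdks():
    """Report which Azure ML SDK packages (v1 and v2) are installed here."""
    found = {}
    for package, label in (("azureml-core", "SDK v1"), ("azure-ai-ml", "SDK v2")):
        try:
            found[label] = metadata.version(package)
        except metadata.PackageNotFoundError:
            found[label] = None  # not installed in this environment
    return found

print(installed_aml_sdks())
```

Each entry is either the installed package version string or None, so the result tells you at a glance whether v1, v2, or both are present.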
CLI v1
The Azure CLI commands in articles in this section require the azure-cli-ml, or v1, extension for Azure Machine Learning. The enhanced v2 CLI, which uses the ml extension, is now available and recommended.
The extensions are incompatible, so v2 CLI commands will not work for articles in this
directory. However, machine learning workspaces and all underlying resources can be
interacted with from either, meaning one user can create a workspace with the v1 CLI
and another can submit jobs to the same workspace with the v2 CLI.
Next steps
For more information on installing and using the different extensions, see the following
articles:
For more information on installing and using the different SDK versions:
azureml-core - Install the Azure Machine Learning SDK (v1) for Python
azure-ai-ml - Install the Azure Machine Learning SDK (v2) for Python
How Azure Machine Learning works:
Architecture and concepts (v1)
Article • 06/01/2023
This article applies to the first version (v1) of the Azure Machine Learning CLI & SDK. For
version two (v2), see How Azure Machine Learning works (v2).
Learn about the architecture and concepts for Azure Machine Learning. This article gives
you a high-level understanding of the components and how they work together to
assist in the process of building, deploying, and maintaining machine learning models.
Workspace
A machine learning workspace is the top-level resource for Azure Machine Learning.
Manage resources you use for training and deployment of models, such as
computes
Store assets you create when you use Azure Machine Learning, including:
Environments
Experiments
Pipelines
Datasets
Models
Endpoints
A workspace includes other Azure resources that are used by the workspace:
Azure Container Registry (ACR) : Registers docker containers that you use during
training and when you deploy a model. To minimize costs, ACR is only created
when deployment images are created.
Azure Storage account : Is used as the default datastore for the workspace.
Jupyter notebooks that are used with your Azure Machine Learning compute
instances are stored here as well.
Azure Application Insights : Stores monitoring information about your models.
Azure Key Vault : Stores secrets that are used by compute targets and other
sensitive information that's needed by the workspace.
Computes
A compute target is any machine or set of machines you use to run your training script
or host your service deployment. You can use your local machine or a remote compute
resource as a compute target. With compute targets, you can start training on your local
machine and then scale out to the cloud without changing your training script.
Azure Machine Learning introduces two fully managed cloud-based virtual machine (VM) options that are configured for machine learning tasks:
Compute instances: A compute instance is a configured development environment for machine learning, used primarily for development and testing.
Compute clusters: A compute cluster is a set of VMs with multi-node scaling capabilities. Compute clusters are better suited as compute targets for large jobs and production. The cluster scales up automatically when a job is submitted. Use it as a training compute target or for dev/test deployment.
For more information about training compute targets, see Training compute targets. For
more information about deployment compute targets, see Deployment targets.
Datasets
For more information, see Create and register Azure Machine Learning Datasets. For more examples using Datasets, see the sample notebooks .
Datasets use datastores to securely connect to your Azure storage services. Datastores store connection information without putting your authentication credentials and the integrity of your original data source at risk. They store connection information, like your subscription ID and token authorization, in the Key Vault associated with the workspace, so you can securely access your storage without having to hard-code it in your script.
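The idea can be pictured with a small stdlib-only sketch (not the azureml-core API): connection details live in a separate secrets store, standing in here for the workspace Key Vault, and the script only knows where to look them up.

```python
import json
import os
import tempfile

# Stand-in for the workspace Key Vault: secrets live outside the script.
vault_path = os.path.join(tempfile.mkdtemp(), "datastore_secrets.json")
with open(vault_path, "w") as f:
    json.dump({"subscription_id": "<subscription-id>", "sas_token": "<token>"}, f)

def connect_to_storage(path):
    """Load connection info at run time instead of embedding it in code."""
    with open(path) as f:
        secrets = json.load(f)
    return "connected with subscription " + secrets["subscription_id"]

print(connect_to_storage(vault_path))
```

A real datastore uses the stored credentials to reach Azure Storage; the point of the sketch is only that the script itself never contains them.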
Environments
Workspace > Environments
For code samples, see the "Manage environments" section of How to use environments.
Experiments
Workspace > Experiments
For an example of using an experiment, see Tutorial: Train your first model.
Runs
Workspace > Experiments > Run
You produce a run when you submit a script to train a model. A run can have zero or
more child runs. For example, the top-level run might have two child runs, each of which
might have its own child run.
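The parent/child structure described above can be modeled with a small toy class (illustrative only, not the SDK's Run class):

```python
class ToyRun:
    """Minimal stand-in for a run that can spawn child runs."""
    def __init__(self, name):
        self.name = name
        self.children = []

    def child_run(self, name):
        child = ToyRun(name)
        self.children.append(child)
        return child

# Mirror the example in the text: a top-level run with two children,
# one of which has its own child run.
top = ToyRun("top-level")
a = top.child_run("child-a")
top.child_run("child-b")
a.child_run("grandchild")
print(len(top.children), len(a.children))  # 2 1
```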
Run configurations
Workspace > Experiments > Run > Run configuration
A run configuration defines how a script should be run in a specified compute target.
You use the configuration to specify the script, the compute target and Azure Machine
Learning environment to run on, any distributed job-specific configurations, and some
additional properties. For more information on the full set of configurable options for
runs, see ScriptRunConfig.
A run configuration can be persisted into a file inside the directory that contains your
training script. Or it can be constructed as an in-memory object and used to submit a
run.
Snapshots
Workspace > Experiments > Run > Snapshot
When you submit a run, Azure Machine Learning compresses the directory that contains
the script as a zip file and sends it to the compute target. The zip file is then extracted,
and the script is run there. Azure Machine Learning also stores the zip file as a snapshot
as part of the run record. Anyone with access to the workspace can browse a run record
and download the snapshot.
Logging
Azure Machine Learning automatically logs standard run metrics for you. However, you
can also use the Python SDK to log arbitrary metrics.
There are multiple ways to view your logs: monitoring run status in real time, or viewing
results after completion. For more information, see Monitor and view ML run logs.
Note
To prevent unnecessary files from being included in the snapshot, make an ignore
file ( .gitignore or .amlignore ) in the directory. Add the files and directories to
exclude to this file. For more information on the syntax to use inside this file, see
syntax and patterns for .gitignore . The .amlignore file uses the same syntax. If
both files exist, the .amlignore file is used and the .gitignore file is unused.
For more information, see Git integration for Azure Machine Learning.
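The precedence rule from the note above can be sketched with fnmatch-style patterns. This is a simplified model: fnmatch only approximates real .gitignore syntax, and the function below is not the actual snapshot code.

```python
from fnmatch import fnmatch

def excluded(path, amlignore_patterns, gitignore_patterns):
    """True if the snapshot would skip this path.

    Simplified model: when .amlignore exists, .gitignore is not consulted.
    """
    patterns = amlignore_patterns if amlignore_patterns is not None else gitignore_patterns
    return any(fnmatch(path, p) for p in (patterns or []))

print(excluded("logs/run.txt", ["logs/*"], ["*.csv"]))  # True: matched by .amlignore
print(excluded("data.csv", ["logs/*"], ["*.csv"]))      # False: .gitignore is unused
```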
Training workflow
When you run an experiment to train a model, the following steps happen. These are
illustrated in the training workflow diagram below:
Azure Machine Learning is called with the snapshot ID for the code snapshot saved
in the previous section.
Azure Machine Learning creates a run ID (optional) and a Machine Learning service
token, which is later used by compute targets like Machine Learning Compute/VMs
to communicate with the Machine Learning service.
You can choose either a managed compute target (like Machine Learning
Compute) or an unmanaged compute target (like VMs) to run training jobs. Here
are the data flows for both scenarios:
VMs/HDInsight, accessed by SSH credentials in a key vault in the Microsoft
subscription. Azure Machine Learning runs management code on the compute
target that:
1. Prepares the environment. (Docker is an option for VMs and local computers.
See the following steps for Machine Learning Compute to understand how
running experiments on Docker containers works.)
2. Downloads the code.
3. Sets up environment variables and configurations.
4. Runs user scripts (the code snapshot mentioned in the previous section).
After the run completes, you can query runs and metrics. In the flow diagram
below, this step occurs when the training compute target writes the run metrics
back to Azure Machine Learning from storage in the Azure Cosmos DB database.
Clients can call Azure Machine Learning. Machine Learning then pulls metrics from the Azure Cosmos DB database and returns them to the client.
Models
At its simplest, a model is a piece of code that takes an input and produces output.
Creating a machine learning model involves selecting an algorithm, providing it with
data, and tuning hyperparameters. Training is an iterative process that produces a
trained model, which encapsulates what the model learned during the training process.
You can bring a model that was trained outside of Azure Machine Learning. Or you can
train a model by submitting a run of an experiment to a compute target in Azure
Machine Learning. Once you have a model, you register the model in the workspace.
Azure Machine Learning is framework agnostic. When you create a model, you can use
any popular machine learning framework, such as Scikit-learn, XGBoost, PyTorch,
TensorFlow, and Chainer.
For an example of training a model using Scikit-learn, see Tutorial: Train an image
classification model with Azure Machine Learning.
Model registry
Workspace > Models
The model registry lets you keep track of all the models in your Azure Machine Learning
workspace.
Models are identified by name and version. Each time you register a model with the
same name as an existing one, the registry assumes that it's a new version. The version
is incremented, and the new model is registered under the same name.
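The versioning behavior described here can be illustrated with a toy registry (this is a conceptual sketch, not the real Model.register API):

```python
class ToyModelRegistry:
    """Toy illustration of name + version bookkeeping. Not the SDK API."""
    def __init__(self):
        self._models = {}

    def register(self, name, path):
        # Re-registering an existing name increments its version.
        version = self._models.get(name, {}).get("version", 0) + 1
        self._models[name] = {"version": version, "path": path}
        return name, version

registry = ToyModelRegistry()
print(registry.register("sklearn-mnist", "outputs/model_v1.pkl"))  # ('sklearn-mnist', 1)
print(registry.register("sklearn-mnist", "outputs/model_v2.pkl"))  # ('sklearn-mnist', 2)
```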
When you register the model, you can provide additional metadata tags and then use
the tags when you search for models.
Tip
A registered model is a logical container for one or more files that make up your
model. For example, if you have a model that is stored in multiple files, you can
register them as a single model in your Azure Machine Learning workspace. After
registration, you can then download or deploy the registered model and receive all
the files that were registered.
You can't delete a registered model that is being used by an active deployment.
For an example of registering a model, see Train an image classification model with
Azure Machine Learning.
Deployment
You deploy a registered model as a service endpoint. You need the following
components:
For more information about these components, see Deploy models with Azure Machine
Learning.
Endpoints
Workspace > Endpoints
An endpoint is an instantiation of your model into a web service that can be hosted in
the cloud.
When deploying a model as a web service, the endpoint can be deployed on Azure
Container Instances, Azure Kubernetes Service, or FPGAs. You create the service from
your model, script, and associated files. These are placed into a base container image,
which contains the execution environment for the model. The image has a load-
balanced, HTTP endpoint that receives scoring requests that are sent to the web service.
You can enable Application Insights telemetry or model telemetry to monitor your web
service. The telemetry data is accessible only to you. It's stored in your Application
Insights and storage account instances. If you've enabled automatic scaling, Azure
automatically scales your deployment.
The following diagram shows the inference workflow for a model deployed as a web
service endpoint:
The user registers a model by using a client like the Azure Machine Learning SDK.
The user creates an image by using a model, a score file, and other model
dependencies.
The Docker image is created and stored in Azure Container Registry.
The web service is deployed to the compute target (Container Instances/AKS)
using the image created in the previous step.
Scoring request details are stored in Application Insights, which is in the user's
subscription.
Telemetry is also pushed to the Microsoft Azure subscription.
For an example of deploying a model as a web service, see Tutorial: Train and deploy a
model.
Real-time endpoints
When you deploy a trained model in the designer, you can deploy the model as a real-
time endpoint. A real-time endpoint commonly receives a single request via the REST
endpoint and returns a prediction in real-time. This is in contrast to batch processing,
which processes multiple values at once and saves the results after completion to a
datastore.
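The real-time versus batch contrast can be sketched with a stand-in model function (purely illustrative, with a hypothetical predict function in place of a deployed model):

```python
def predict(x):
    """Stand-in for a trained model."""
    return 2 * x + 1

# Real-time endpoint: one request in, one prediction out immediately.
single = predict(3)
print(single)  # 7

# Batch processing: score many inputs, then persist all results at once.
batch_results = [predict(x) for x in range(5)]
print(batch_results)  # [1, 3, 5, 7, 9]
```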
Pipeline endpoints
Pipeline endpoints let you call your ML Pipelines programmatically via a REST endpoint.
Pipeline endpoints let you automate your pipeline workflows.
Automation
Pipeline steps are reusable, and can be run without rerunning the previous steps if the
output of those steps hasn't changed. For example, you can retrain a model without
rerunning costly data preparation steps if the data hasn't changed. Pipelines also allow
data scientists to collaborate while working on separate areas of a machine learning
workflow.
For Data Scientists, you can monitor your experiments and log information from
your training runs. For more information, see the following articles:
Start, monitor, and cancel training runs
Log metrics for training runs
Track experiments with MLflow
Visualize runs with TensorBoard
For Administrators, you can monitor information about the workspace, related
Azure resources, and events such as resource creation and deletion by using Azure
Monitor. For more information, see How to monitor Azure Machine Learning.
For DevOps or MLOps, you can monitor information generated by models
deployed as web services to identify problems with the deployments and gather
data submitted to the service. For more information, see Collect model data and
Monitor with Application Insights.
Studio
Azure Machine Learning studio provides a web view of all the artifacts in your
workspace. You can view results and details of your datasets, experiments, pipelines,
models, and endpoints. You can also manage compute resources and datastores in the
studio.
The studio is also where you access the interactive tools that are part of Azure Machine
Learning:
Azure Machine Learning designer to perform workflow steps without writing code
Web experience for automated machine learning
Azure Machine Learning notebooks to write and run your own code in integrated
Jupyter notebook servers.
Data labeling projects to create, manage, and monitor projects for labeling images
or text.
Programming tools
Important
Tools marked (preview) below are currently in public preview. The preview version is
provided without a service level agreement, and it's not recommended for
production workloads. Certain features might not be supported or might have
constrained capabilities. For more information, see Supplemental Terms of Use for
Microsoft Azure Previews .
Interact with the service in any Python environment with the Azure Machine
Learning SDK for Python.
Use Azure Machine Learning designer to perform the workflow steps without
writing code.
Use Azure Machine Learning CLI for automation.
Next steps
To get started with Azure Machine Learning, see:
The following table shows each development environment covered in this article, along
with pros and cons.
Azure Machine Learning compute instance
Pros: Easiest way to get started. The entire SDK is already installed in your workspace VM, and notebook tutorials are pre-cloned and ready to run.
Cons: Lack of control over your development environment and dependencies. Additional cost incurred for the Linux VM (the VM can be stopped when not in use to avoid charges). See pricing details .

Azure Databricks
Pros: Ideal for running large-scale intensive machine learning workflows on the scalable Apache Spark platform.
Cons: Overkill for experimental machine learning, or smaller-scale experiments and workflows. Additional cost incurred for Azure Databricks. See pricing details .
This article also provides additional usage tips for the following tools:
Jupyter Notebooks: If you're already using Jupyter Notebooks, the SDK has some
extras that you should install.
Visual Studio Code: If you use Visual Studio Code, the Azure Machine Learning
extension includes extensive language support for Python as well as features to
make working with the Azure Machine Learning much more convenient and
productive.
Prerequisites
Azure Machine Learning workspace. If you don't have one, you can create an Azure
Machine Learning workspace through the Azure portal, Azure CLI, and Azure
Resource Manager templates.
JSON
{
"subscription_id": "<subscription-id>",
"resource_group": "<resource-group>",
"workspace_name": "<workspace-name>"
}
This JSON file must be in the directory structure that contains your Python scripts or
Jupyter Notebooks. It can be in the same directory, a subdirectory named .azureml, or in
a parent directory.
To use this file from your code, use the Workspace.from_config method. This code loads
the information from the file and connects to your workspace.
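The search order can be sketched in plain Python. This is a simplified model of how from_config locates the file (not the SDK code itself): check the directory, then its .azureml subdirectory, then walk up through parent directories.

```python
import json
import os
import tempfile

def find_config(start_dir):
    """Simplified model of the config.json lookup described above."""
    current = os.path.abspath(start_dir)
    while True:
        for candidate in (os.path.join(current, "config.json"),
                          os.path.join(current, ".azureml", "config.json")):
            if os.path.isfile(candidate):
                return candidate
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            return None
        current = parent

# Set up: config.json lives in <root>/.azureml, scripts run in a nested folder.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, ".azureml"))
with open(os.path.join(root, ".azureml", "config.json"), "w") as f:
    json.dump({"workspace_name": "<workspace-name>"}, f)

project = os.path.join(root, "project", "notebooks")
os.makedirs(project)
print(find_config(project))  # found via the parent-directory search
```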
Azure portal
Download the file: In the Azure portal , select Download config.json from the
Overview section of your workspace.
Azure Machine Learning Python SDK
Create a script to connect to your Azure Machine Learning workspace and use the
write_config method to generate your file and save it as .azureml/config.json. Make
sure to replace subscription_id , resource_group , and workspace_name with your
own.
Python
from azureml.core import Workspace

subscription_id = '<subscription-id>'
resource_group = '<resource-group>'
workspace_name = '<workspace-name>'

try:
    ws = Workspace(subscription_id=subscription_id,
                   resource_group=resource_group,
                   workspace_name=workspace_name)
    ws.write_config()
    print('Library configuration succeeded')
except Exception:
    print('Workspace not found')
Important
If you're on Linux or macOS and use a shell other than bash (for example, zsh)
you might receive errors when you run some commands. To work around this
problem, use the bash command to start a new bash shell and run the
commands there.
Now that you have your local environment set up, you're ready to start working with
Azure Machine Learning. See the Azure Machine Learning Python getting started guide
to get started.
Jupyter Notebooks
When running a local Jupyter Notebook server, it's recommended that you create an
IPython kernel for your Python virtual environment. This helps ensure the expected
kernel and package import behavior.
Bash
2. Create a kernel for your Python virtual environment. Make sure to replace <myenv>
with the name of your Python virtual environment.
Bash
python -m ipykernel install --user --name <myenv> --display-name "Python (<myenv>)"
See the Azure Machine Learning notebooks repository to get started with Azure
Machine Learning and Jupyter Notebooks. Also see the community-driven repository,
AzureML-Examples .
2. Install the Azure Machine Learning Visual Studio Code extension (preview).
Once you have the Visual Studio Code extension installed, use it to:
Create one anytime from within your Azure Machine Learning workspace. Provide just a
name and specify an Azure VM type. Try it now with Create resources to get started.
To learn more about compute instances, including how to install packages, see Create
and manage an Azure Machine Learning compute instance.
Tip
To prevent incurring charges for an unused compute instance, stop the compute
instance. Or enable idle shutdown for the compute instance.
In addition to a Jupyter Notebook server and JupyterLab, you can use compute
instances in the integrated notebook feature inside of Azure Machine Learning studio.
You can also use the Azure Machine Learning Visual Studio Code extension to connect
to a remote compute instance using VS Code.
For a more comprehensive list of the tools, see the Data Science VM tools guide.
Important
If you plan to use the Data Science VM as a compute target for your training or
inferencing jobs, only Ubuntu is supported.
Azure CLI
Azure CLI
2. Activate the conda environment containing the Azure Machine Learning SDK.
Bash
Bash
3. To configure the Data Science VM to use your Azure Machine Learning workspace,
create a workspace configuration file or use an existing one.
Similar to local environments, you can use Visual Studio Code and the Azure Machine
Learning Visual Studio Code extension to interact with Azure Machine Learning.
For more information, see Data Science Virtual Machines .
Next steps
Train and deploy a model on Azure Machine Learning with the MNIST dataset.
See the Azure Machine Learning SDK for Python reference.
Install & use the CLI (v1)
Article • 06/01/2023
Important
The Azure CLI commands in this article require the azure-cli-ml , or v1, extension
for Azure Machine Learning. Support for the v1 extension will end on September
30, 2025. You will be able to install and use the v1 extension until that date.
The Azure Machine Learning CLI is an extension to the Azure CLI, a cross-platform
command-line interface for the Azure platform. This extension provides commands for
working with Azure Machine Learning. It allows you to automate your machine learning
activities. The following list provides some example actions that you can do with the CLI
extension:
Package, deploy, and track the lifecycle of your machine learning models
The CLI isn't a replacement for the Azure Machine Learning SDK. It's a complementary tool that is optimized to handle highly parameterized tasks that lend themselves well to automation.
Prerequisites
To use the CLI, you must have an Azure subscription. If you don't have an Azure
subscription, create a free account before you begin. Try the free or paid version of
Azure Machine Learning today.
To use the CLI commands in this document from your local environment, you
need the Azure CLI.
If you use the Azure Cloud Shell , the CLI is accessed through the browser and
lives in the cloud.
Important
If you're using the Azure Cloud Shell, you can skip this section. The cloud shell automatically authenticates you using the account you use to sign in to your Azure subscription.
There are several ways that you can authenticate to your Azure subscription from the
CLI. The most basic is to interactively authenticate using a browser. To authenticate
interactively, open a command line or terminal and use the following command:
Azure CLI
az login
If the CLI can open your default browser, it will do so and load a sign-in page.
Otherwise, you need to open a browser and follow the instructions on the command
line. The instructions involve browsing to https://aka.ms/devicelogin and entering an
authorization code.
Tip
After logging in, you see a list of subscriptions associated with your Azure account.
The subscription information with isDefault: true is the currently activated
subscription for Azure CLI commands. This subscription must be the same one that
contains your Azure Machine Learning workspace. You can find the subscription ID
from the Azure portal by visiting the overview page for your workspace. You can
also use the SDK to get the subscription ID from the workspace object. For
example, Workspace.from_config().subscription_id .
Azure CLI
Azure CLI
Azure CLI
Resource management
The following commands demonstrate how to use the CLI to manage resources used by
Azure Machine Learning.
Azure CLI
Azure CLI
Azure CLI
Azure CLI
Azure CLI
Azure CLI
Azure CLI
Azure CLI
Azure CLI
Azure CLI
Compute instance
Manage compute instances. In all of the following examples, the name of the compute instance is cpu.
Azure CLI
Stop a compute instance.
Azure CLI
Start a compute instance.
Azure CLI
Restart a compute instance.
Azure CLI
Delete a compute instance.
Azure CLI
Run experiments
Start a run of your experiment. When using this command, specify the name of the
runconfig file (the text before *.runconfig if you're looking at your file system)
against the -c parameter.
Azure CLI
Tip
The full runconfig schema can be found in this JSON file . The schema is
self-documenting through the description key of each object. Additionally,
there are enums for possible values, and a template snippet at the end.
Azure CLI
az ml experiment list
yml
# hdconfig.yml
sampling:
    type: random # Supported options: Random, Grid, Bayesian
    parameter_space: # specify a name|expression|values tuple for each parameter.
    - name: --penalty # The name of a script parameter to generate values for.
      expression: choice # supported options: choice, randint, uniform, quniform, loguniform, qloguniform, normal, qnormal, lognormal, qlognormal
      values: [0.5, 1, 1.5] # The list of values, the number of values is dependent on the expression specified.
policy:
    type: BanditPolicy # Supported options: BanditPolicy, MedianStoppingPolicy, TruncationSelectionPolicy, NoTerminationPolicy
    evaluation_interval: 1 # Policy properties are policy specific. See the above link for policy specific parameter details.
    slack_factor: 0.2
primary_metric_name: Accuracy # The metric used when evaluating the policy
primary_metric_goal: Maximize # Maximize|Minimize
max_total_runs: 8 # The maximum number of runs to generate
max_concurrent_runs: 2 # The number of runs that can run concurrently.
max_duration_minutes: 100 # The maximum length of time to run the experiment before cancelling.
Add this file alongside the run configuration files. Then submit a HyperDrive run using:
Azure CLI
Note the arguments section in runconfig and parameter space in HyperDrive config.
They contain the command-line arguments to be passed to training script. The value in
runconfig stays the same for each iteration, while the range in HyperDrive config is
iterated over. Don't specify the same argument in both files.
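As a rough illustration of what random sampling over such a parameter space produces, here's a stdlib sketch (not the HyperDrive implementation) using the choice values from the hdconfig.yml example:

```python
import random

random.seed(0)  # for reproducibility of the sketch

# The choice expression and values from the hdconfig.yml example.
parameter_space = {"--penalty": [0.5, 1, 1.5]}
max_total_runs = 8

# Each run gets one sampled value per script argument; HyperDrive would pass
# these as command-line arguments to the training script.
samples = [{name: random.choice(values) for name, values in parameter_space.items()}
           for _ in range(max_total_runs)]
print(len(samples))  # 8
```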
Dataset management
The following commands demonstrate how to work with datasets in Azure Machine
Learning:
Register a dataset:
Azure CLI
For information on the format of the JSON file used to define the dataset, use az
ml dataset register --show-template .
Azure CLI
az ml dataset list
Azure CLI
Unregister a dataset:
Azure CLI
Environment management
The following commands demonstrate how to create, register, and list Azure Machine
Learning environments for your workspace:
Register an environment:
Azure CLI
Azure CLI
az ml environment list
Azure CLI
JSON
{
"name": "testenv",
"version": null,
"environmentVariables": {
"EXAMPLE_ENV_VAR": "EXAMPLE_VALUE"
},
"python": {
"userManagedDependencies": false,
"interpreterPath": "python",
"condaDependenciesFile": null,
"baseCondaEnvironment": null
},
"docker": {
"enabled": false,
"baseImage": "mcr.microsoft.com/azureml/openmpi3.1.2-
ubuntu18.04:20210615.v1",
"baseDockerfile": null,
"sharedVolumes": true,
"shmSize": "2g",
"arguments": [],
"baseImageRegistry": {
"address": null,
"username": null,
"password": null
}
},
"spark": {
"repositories": [],
"packages": [],
"precachePackages": true
},
"databricks": {
"mavenLibraries": [],
"pypiLibraries": [],
"rcranLibraries": [],
"jarLibraries": [],
"eggLibraries": []
},
"inferencingStackVersion": null
}
The following table details each top-level field in the JSON file, its type, and a
description. If an object type is linked to a class from the Python SDK, there's a loose 1:1
match between each JSON field and the public variable name in the Python class. In
some cases, the field may map to a constructor argument rather than a class variable.
For example, the environmentVariables field maps to the environment_variables
variable in the Environment class.
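The field-name mapping follows a simple camelCase-to-snake_case convention, which can be sketched as:

```python
import re

def to_snake(field):
    """camelCase JSON field name -> snake_case Python variable name."""
    # Insert an underscore before each interior capital letter, then lowercase.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", field).lower()

print(to_snake("environmentVariables"))  # environment_variables
print(to_snake("interpreterPath"))       # interpreter_path
```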
ML pipeline management
The following commands demonstrate how to work with machine learning pipelines:
Azure CLI
For more information on the pipeline YAML file, see Define machine learning
pipelines in YAML.
Run a pipeline:
Azure CLI
Schedule a pipeline:
Azure CLI
Azure CLI
Optional: Profile your model to get optimal CPU and memory values for deployment.
Azure CLI
Azure CLI
entryScript (entry_script): Path to a local file that contains the code to run for the image.

sourceDirectory (source_directory): Optional. Path to folders that contain all files to create the image, which makes it easy to access any files within this folder or subfolder. You can upload an entire folder from your local machine as dependencies for the Webservice. Note: your entry_script, conda_file, and extra_docker_file_steps paths are relative to the source_directory path.
You can include full specifications of an Azure Machine Learning environment in the
inference configuration file. If this environment doesn't exist in your workspace, Azure
Machine Learning will create it. Otherwise, Azure Machine Learning will update the
environment if necessary. The following JSON is an example:
JSON
{
"entryScript": "score.py",
"environment": {
"docker": {
"arguments": [],
"baseDockerfile": null,
"baseImage": "mcr.microsoft.com/azureml/intelmpi2018.3-
ubuntu18.04",
"enabled": false,
"sharedVolumes": true,
"shmSize": null
},
"environmentVariables": {
"EXAMPLE_ENV_VAR": "EXAMPLE_VALUE"
},
"name": "my-deploy-env",
"python": {
"baseCondaEnvironment": null,
"condaDependencies": {
"channels": [
"conda-forge"
],
"dependencies": [
"python=3.7",
{
"pip": [
"azureml-defaults",
"azureml-telemetry",
"scikit-learn==0.22.1",
"inference-schema[numpy-support]"
]
}
],
"name": "project_environment"
},
"condaDependenciesFile": null,
"interpreterPath": "python",
"userManagedDependencies": false
},
"version": "1"
}
}
You can also use an existing Azure Machine Learning environment in separate CLI parameters and remove the "environment" key from the inference configuration file. Use -e for the environment name, and --ev for the environment version. If you don't specify --ev, the latest version will be used. Here's an example of an inference configuration file:
JSON
{
"entryScript": "score.py",
"sourceDirectory": null
}
The following command demonstrates how to deploy a model using the previous
inference configuration file (named myInferenceConfig.json).
It also uses the latest version of an existing Azure Machine Learning environment
(named AzureML-Minimal).
Azure CLI
computeType (NA): The compute target. For local targets, the value must be local.

port (port): The local port on which to expose the service's HTTP endpoint.
This JSON is an example deployment configuration for use with the CLI:
JSON
{
"computeType": "local",
"port": 32267
}
The following JSON is an example deployment configuration for use with the CLI:
JSON
{
"computeType": "aci",
"containerResourceRequirements":
{
"cpu": 0.5,
"memoryInGB": 1.0
},
"authEnabled": true,
"sslEnabled": false,
"appInsightsEnabled": false
}
JSON
{
"computeType": "aks",
"autoScaler":
{
"autoscaleEnabled": true,
"minReplicas": 1,
"maxReplicas": 3,
"refreshPeriodInSeconds": 1,
"targetUtilization": 70
},
"dataCollection":
{
"storageEnabled": true
},
"authEnabled": true,
"containerResourceRequirements":
{
"cpu": 0.5,
"memoryInGB": 1.0
}
}
Next steps
Command reference for the Machine Learning CLI extension.
Learn how to set up the Azure Machine Learning Visual Studio Code extension for your
machine learning workflows. You only need to do this setup when using the VS Code
desktop application. If you use VS Code for the Web, this is handled for you.
The Azure Machine Learning extension for VS Code provides a user interface to:
Important
This feature is currently in public preview. This preview version is provided without
a service-level agreement, and it's not recommended for production workloads.
Certain features might not be supported or might have constrained capabilities. For
more information, see Supplemental Terms of Use for Microsoft Azure
Previews .
Prerequisites
Azure subscription. If you don't have one, sign up to try the free or paid version of
Azure Machine Learning.
Visual Studio Code. If you don't have it, install it.
Python
(Optional) To create resources using the extension, you need to install the CLI (v2).
For setup instructions, see Install, set up, and use the CLI (v2).
Clone the community driven repository
Bash
git clone https://github.com/Azure/azureml-examples.git --depth 1
2. Select Extensions icon from the Activity Bar to open the Extensions view.
3. In the Extensions view search bar, type "Azure Machine Learning" and select the
first extension.
4. Select Install.
Note
The Azure Machine Learning VS Code extension uses the CLI (v2) by default. To
switch to the 1.0 CLI, set the azureML.CLI Compatibility Mode setting in Visual
Studio Code to 1.0. For more information on modifying your settings in Visual
Studio Code, see the user and workspace settings documentation.
To sign into your Azure account, select the Azure: Sign In button in the bottom right
corner on the Visual Studio Code status bar to start the sign in process.
Schema validation
Autocompletion
Diagnostics
If you don't have a workspace, create one. For more information, see manage Azure
Machine Learning resources with the VS Code extension.
To choose your default workspace, select the Set Azure Machine Learning Workspace
button on the Visual Studio Code status bar and follow the prompts to set your
workspace.
Alternatively, use the > Azure ML: Set Default Workspace command in the command
palette and follow the prompts to set your workspace.
Next steps
Manage your Azure Machine Learning resources
Develop on a remote compute instance locally
Train an image classification model using the Visual Studio Code extension
Run and debug machine learning experiments locally (CLI v1)
Tutorial: Designer - train a no-code
regression model
Article • 05/29/2023
Train a linear regression model that predicts car prices using the Azure Machine
Learning designer. This tutorial is part one of a two-part series.
This tutorial uses the Azure Machine Learning designer. For more information, see
What is Azure Machine Learning designer?
Note
Designer supports two types of components: classic prebuilt components (v1) and
custom components (v2). These two types of components are NOT compatible.
Custom components allow you to wrap your own code as a component. They
support sharing components across workspaces and seamless authoring across
studio, CLI v2, and SDK v2 interfaces.
For new projects, we highly suggest you use custom components, which are
compatible with AzureML v2 and will keep receiving new updates.
This article applies to classic prebuilt components, which aren't compatible with
CLI v2 and SDK v2.
In part two of the tutorial, you deploy your model as a real-time inferencing endpoint to
predict the price of any car based on technical specifications you send it.
Note
To find it, go to the designer in your workspace. In the New pipeline section, select
Sample 1 - Regression: Automobile Price Prediction (Basic).
Important
If you do not see graphical elements mentioned in this document, such as buttons
in studio or designer, you may not have the right level of permissions to the
workspace. Please contact your Azure subscription administrator to verify that you
have been granted the correct level of access. For more information, see Manage
users and roles.
To create an Azure Machine Learning pipeline, you need an Azure Machine Learning
workspace. In this section, you learn how to create both these resources.
Note
If your workspace uses a virtual network, there are additional configuration steps
you must complete to use the designer. For more information, see Use Azure
Machine Learning studio in an Azure virtual network.
1. Sign in to ml.azure.com, and select the workspace you want to work with.
4. Click the pencil icon beside the automatically generated pipeline draft name,
rename it to Automobile price prediction. The name doesn't need to be unique.
Set the default compute target
A pipeline runs on a compute target, which is a compute resource that's attached to
your workspace. After you create a compute target, you can reuse it for future jobs.
Important
You can set a Default compute target for the entire pipeline, which will tell every
component to use the same compute target by default. However, you can specify
compute targets on a per-module basis.
1. Select Settings to the right of the canvas to open the Settings pane.
If you already have an available compute target, you can select it from the
Select Azure Machine Learning compute instance drop-down to run this
pipeline.
4. Select Create.
Note
The compute resource autoscales to zero nodes when it's idle to save cost.
When you use it again after a delay, you might experience approximately five
minutes of wait time while it scales back up.
Import data
There are several sample datasets included in the designer for you to experiment with.
For this tutorial, use Automobile price data (Raw).
1. To the left of the pipeline canvas is a palette of datasets and components. Select
Component -> Sample data.
2. Select the dataset Automobile price data (Raw), and drag it onto the canvas.
1. Right-click the Automobile price data (Raw) and select Preview Data.
2. Select the different columns in the data window to view information about each
one.
Each row represents an automobile, and the variables associated with each
automobile appear as columns. There are 205 rows and 26 columns in this dataset.
Prepare data
Datasets typically require some preprocessing before analysis. You might have noticed
some missing values when you inspected the dataset. These missing values must be
cleaned so that the model can analyze the data correctly.
Remove a column
When you train a model, you have to do something about the data that's missing. In this
dataset, the normalized-losses column is missing many values, so you'll exclude that
column from the model altogether.
1. In the datasets and component palette to the left of the canvas, click Component
and search for the Select Columns in Dataset component.
2. Drag the Select Columns in Dataset component onto the canvas. Drop the
component below the dataset component.
3. Connect the Automobile price data (Raw) dataset to the Select Columns in
Dataset component. Drag from the dataset's output port, which is the small circle
at the bottom of the dataset on the canvas, to the input port of Select Columns in
Dataset, which is the small circle at the top of the component.
Tip
You create a flow of data through your pipeline when you connect the output
port of one component to an input port of another.
5. Click on the arrow icon under Settings to the right of the canvas to open the
component details pane. Alternatively, you can double-click the Select Columns in
Dataset component to open the details pane.
6. Select Edit column to the right of the pane.
7. Expand the Column names drop down next to Include, and select All columns.
11. In the lower right, select Save to close the column selector.
12. In the Select Columns in Dataset component details pane, expand Node info.
13. Select the Comment text box and enter Exclude normalized losses.
Comments will appear on the graph to help you organize your pipeline.
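What Select Columns in Dataset does here (drop normalized-losses, keep everything else) can be sketched in plain Python; representing rows as dicts is an assumption for illustration only, not how the designer stores data:

```python
def exclude_columns(rows, excluded):
    """Return rows with the named columns removed (rows are dicts)."""
    return [{k: v for k, v in row.items() if k not in excluded} for row in rows]

rows = [{"make": "audi", "normalized-losses": None, "price": 13950}]
print(exclude_columns(rows, {"normalized-losses"}))
# [{'make': 'audi', 'price': 13950}]
```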
Tip
Cleaning the missing values from input data is a prerequisite for using most of the
components in the designer.
1. In the datasets and component palette to the left of the canvas, click Component
and search for the Clean Missing Data component.
2. Drag the Clean Missing Data component to the pipeline canvas. Connect it to the
Select Columns in Dataset component.
4. Click on the arrow icon under Settings to the right of the canvas to open the
component details pane. Alternatively, you can double-click the Clean Missing
Data component to open the details pane.
6. In the Columns to be cleaned window that appears, expand the drop-down menu
next to Include. Select All columns.
7. Select Save.
8. In the Clean Missing Data component details pane, under Cleaning mode, select
Remove entire row.
9. In the Clean Missing Data component details pane, expand Node info.
10. Select the Comment text box and enter Remove missing value rows.
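The Remove entire row cleaning mode keeps only rows with no missing values. A minimal plain-Python sketch, assuming rows are dicts and missing values are None (an assumption for illustration, not the designer's internal representation):

```python
def remove_rows_with_missing(rows):
    """Keep only rows in which every column has a value."""
    return [row for row in rows if all(v is not None for v in row.values())]

rows = [
    {"make": "audi", "price": 13950},
    {"make": "bmw", "price": None},  # missing price: whole row is dropped
]
print(remove_rows_with_missing(rows))
# [{'make': 'audi', 'price': 13950}]
```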
Because you want to predict price, which is a number, you can use a regression
algorithm. For this example, you use a linear regression model.
1. In the datasets and component palette to the left of the canvas, click Component
and search for the Split Data component.
3. Connect the left port of the Clean Missing Data component to the Split Data
component.
Important
Make sure that the left output port of Clean Missing Data connects to Split
Data. The left port contains the cleaned data. The right port contains the
discarded data.
5. Click on the arrow icon under Settings to the right of the canvas to open the
component details pane. Alternatively, you can double-click the Split Data
component to open the details pane.
6. In the Split Data details pane, set the Fraction of rows in the first output dataset
to 0.7.
This option splits 70 percent of the data to train the model and 30 percent for
testing it. The 70 percent dataset will be accessible through the left output port.
The remaining data will be available through the right output port.
8. Select the Comment text box and enter Split the dataset into training set (0.7) and
test set (0.3).
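The 70/30 split the component performs is conceptually this (a plain-Python sketch; the designer's actual shuffling and seeding are internal to the component):

```python
import random

def split_rows(rows, fraction=0.7, seed=0):
    """Shuffle rows and split them into (train, test) by `fraction`."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * fraction)
    return shuffled[:cut], shuffled[cut:]

train, test = split_rows(range(205))  # the dataset has 205 rows
print(len(train), len(test))
```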
1. In the datasets and component palette to the left of the canvas, click Component
and search for the Linear Regression component.
3. In the datasets and component palette to the left of the canvas, click Component
and search for the Train Model component.
5. Connect the output of the Linear Regression component to the left input of the
Train Model component.
6. Connect the training data output (left port) of the Split Data component to the
right input of the Train Model component.
Important
Make sure that the left output port of Split Data connects to Train Model. The
left port contains the training set. The right port contains the test set.
8. Click on the arrow icon under Settings to the right of the canvas to open the
component details pane. Alternatively, you can double-click the Train Model
component to open the details pane.
10. In the Label column window that appears, expand the drop-down menu and select
Column names.
11. In the text box, enter price to specify the value that your model is going to predict.
Important
Make sure you enter the column name exactly. Do not capitalize price.
1. In the datasets and component palette to the left of the canvas, click Component
and search for the Score Model component.
3. Connect the output of the Train Model component to the left input port of Score
Model. Connect the test data output (right port) of the Split Data component to
the right input port of Score Model.
1. In the datasets and component palette to the left of the canvas, click Component
and search for the Evaluate Model component.
Note
b. Select Submit.
c. You'll see a submission list in the left pane of the canvas, and a notification will
pop up at the top right corner of the page. You can select the Job detail link to
go to job detail page for debugging.
If this is the first job, it may take up to 20 minutes for your pipeline to finish
running. The default compute settings have a minimum node size of 0, which
means that the designer must allocate resources after being idle. Repeated
pipeline jobs will take less time since the compute resources are already allocated.
Additionally, the designer uses cached results for each component to further
improve efficiency.
1. Right-click the Score Model component, and select Preview data > Scored
dataset to view its output.
Here you can see the predicted prices and the actual prices from the testing data.
Evaluate models
Use the Evaluate Model to see how well the trained model performed on the test
dataset.
1. Right-click the Evaluate Model component and select Preview data > Evaluation
results to view its output.
Mean Absolute Error (MAE): The average of absolute errors. An error is the
difference between the predicted value and the actual value.
Root Mean Squared Error (RMSE): The square root of the average of squared
errors of predictions made on the test dataset.
Relative Absolute Error: The average of absolute errors relative to the absolute
difference between actual values and the average of all actual values.
Relative Squared Error: The average of squared errors relative to the squared
difference between the actual values and the average of all actual values.
Coefficient of Determination: Also known as the R squared value, this statistical
metric indicates how well a model fits the data.
For each of the error statistics, smaller is better. A smaller value indicates that the
predictions are closer to the actual values. For the coefficient of determination, the
closer its value is to one (1.0), the better the predictions.
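For reference, these metrics can be computed directly from predicted and actual values. This is an illustrative plain-Python implementation of the standard formulas, not designer code:

```python
from math import sqrt

def regression_metrics(actual, predicted):
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    mean_actual = sum(actual) / n
    mae = sum(abs(e) for e in errors) / n
    rmse = sqrt(sum(e * e for e in errors) / n)
    # Relative errors compare against a model that always predicts the mean.
    rae = sum(abs(e) for e in errors) / sum(abs(a - mean_actual) for a in actual)
    rse = sum(e * e for e in errors) / sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1 - rse  # coefficient of determination
    return {"MAE": mae, "RMSE": rmse, "RAE": rae, "RSE": rse, "R2": r2}

print(regression_metrics([10, 20, 30], [12, 18, 33]))
```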
Clean up resources
Skip this section if you want to continue on with part 2 of the tutorial, deploying models.
Important
You can use the resources that you created as prerequisites for other Azure
Machine Learning tutorials and how-to articles.
Delete everything
If you don't plan to use anything that you created, delete the entire resource group so
you don't incur any charges.
1. In the Azure portal, select Resource groups on the left side of the window.
2. In the list, select the resource group that you created.
Deleting the resource group also deletes all resources that you created in the designer.
The compute target that you created here automatically autoscales to zero nodes when
it's not being used. This action is taken to minimize charges. If you want to delete the
compute target, take these steps:
You can unregister datasets from your workspace by selecting each dataset and
selecting Unregister.
To delete a dataset, go to the storage account by using the Azure portal or Azure
Storage Explorer and manually delete those assets.
Next steps
In part two, you'll learn how to deploy your model as a real-time endpoint.
Use the designer to deploy a machine learning model to predict the price of cars. This
tutorial is part two of a two-part series.
Note
Designer supports two types of components: classic prebuilt components (v1) and
custom components (v2). These two types of components are NOT compatible.
Custom components allow you to wrap your own code as a component. They
support sharing components across workspaces and seamless authoring across
studio, CLI v2, and SDK v2 interfaces.
For new projects, we highly suggest you use custom components, which are
compatible with AzureML v2 and will keep receiving new updates.
This article applies to classic prebuilt components, which aren't compatible with
CLI v2 and SDK v2.
In part one of the tutorial you trained a linear regression model on car prices. In part
two, you deploy the model to give others a chance to use it. In this tutorial, you:
Prerequisites
Complete part one of the tutorial to learn how to train and score a machine learning
model in the designer.
Important
If you do not see graphical elements mentioned in this document, such as buttons
in studio or designer, you may not have the right level of permissions to the
workspace. Please contact your Azure subscription administrator to verify that you
have been granted the correct level of access. For more information, see Manage
users and roles.
Note
Create inference pipeline only supports training pipelines which contain only the
designer built-in components and must have a component like Train Model which
outputs the trained model.
Note
By default, the Web Service Input will expect the same data schema as the
component output data that connects to the same downstream port. In this
sample, Web Service Input and Automobile price data (Raw) connect to the same
downstream component, so Web Service Input expects the same data schema as
Automobile price data (Raw), and the target variable column price is included in
the schema. However, when you score the data you usually won't know the target
variable values. In that case, you can remove the target variable column in the
inference pipeline by using the Select Columns in Dataset component. Make sure
that the output of the Select Columns in Dataset component that removes the
target variable column is connected to the same port as the output of the Web
Service Input component.
2. Select Submit, and use the same compute target and experiment that you used in
part one.
If this is the first job, it may take up to 20 minutes for your pipeline to finish
running. The default compute settings have a minimum node size of 0, which
means that the designer must allocate resources after being idle. Repeated
pipeline jobs will take less time since the compute resources are already allocated.
Additionally, the designer uses cached results for each component to further
improve efficiency.
3. Go to the real-time inference pipeline job detail by selecting Job detail link in the
left pane.
1. Select Compute in the dialog box that appears to go to the Compute page.
6. Select Create.
Note
It takes approximately 15 minutes to create a new AKS service. You can check
the provisioning state on the Inference Clusters page.
You can also change Advanced settings for your real-time endpoint:
Auto scale enabled: Whether to enable autoscaling for the web service. Default:
true.
Target utilization: The target utilization (in percent out of 100) that the autoscaler
should attempt to maintain for this web service. Default: 70.
Refresh period: How often (in seconds) the autoscaler attempts to scale this web
service. Default: 1.
CPU reserve capacity: The number of CPU cores to allocate for this web service.
Default: 0.1.
Memory reserve capacity: The amount of memory (in GB) to allocate for this web
service. Default: 0.5.
4. Select Deploy.
Tip
You can also deploy to Azure Container Instance (ACI) if you select Azure
Container Instance for Compute type in the real-time endpoint setting box. Azure
Container Instance is used for testing or development. Use ACI for low-scale CPU-
based workloads that require less than 48 GB of RAM.
In the Details tab, you can see more information such as the REST URI, Swagger
definition, status, and tags.
In the Consume tab, you can find sample consumption code, security keys, and set
authentication methods.
In the Deployment logs tab, you can find the detailed deployment logs of your
real-time endpoint.
2. To test your endpoint, go to the Test tab. From here, you can enter test data and
select Test to verify the output of your endpoint.
1. You can directly find and modify your training pipeline draft in the designer
homepage.
Or you can open the training pipeline job link and then clone it into a new pipeline
draft to continue editing.
2. After you submit the modified training pipeline, go to the job detail page.
3. When the job completes, right-click Train Model and select Register data.
5. If you need to update the data preprocessing part in your training pipeline, and
would like to update that into the inference pipeline, the processing is similar as
steps above.
6. After modifying your inference pipeline with the newly trained model or
transformation, submit it. When the job is completed, deploy it to the existing
online endpoint deployed previously.
Limitations
Due to datastore access limitations, if your inference pipeline contains an Import
Data or Export Data component, it's automatically removed when you deploy to a
real-time endpoint.
If you have datasets in the real-time inference pipeline and want to deploy it to a
real-time endpoint, currently this flow only supports datasets registered from a
Blob datastore. If you want to use datasets from other types of datastores, you can
use Select Columns in Dataset to connect to your initial dataset with settings that
select all columns, register the output of Select Columns in Dataset as a File
dataset, and then replace the initial dataset in the real-time inference pipeline with
this newly registered dataset.
If your inference graph contains an Enter Data Manually component that isn't
connected to the same port as the Web Service Input component, the Enter Data
Manually component won't be executed during HTTP call processing. A
workaround is to register the output of that Enter Data Manually component as a
dataset, and then in the inference pipeline draft, replace the Enter Data Manually
component with the registered dataset.
Clean up resources
Important
You can use the resources that you created as prerequisites for other Azure
Machine Learning tutorials and how-to articles.
Delete everything
If you don't plan to use anything that you created, delete the entire resource group so
you don't incur any charges.
1. In the Azure portal, select Resource groups on the left side of the window.
2. In the list, select the resource group that you created.
Deleting the resource group also deletes all resources that you created in the designer.
The compute target that you created here automatically autoscales to zero nodes when
it's not being used. This action is taken to minimize charges. If you want to delete the
compute target, take these steps:
You can unregister datasets from your workspace by selecting each dataset and
selecting Unregister.
To delete a dataset, go to the storage account by using the Azure portal or Azure
Storage Explorer and manually delete those assets.
Next steps
In this tutorial, you learned the key steps in how to create, deploy, and consume a
machine learning model in the designer. To learn more about how you can use the
designer see the following links:
Designer samples: Learn how to use the designer to solve other types of problems.
Use Azure Machine Learning studio in an Azure virtual network.
Tutorial: Get started with a Python script
in Azure Machine Learning (SDK v1, part
1 of 3)
Article • 04/20/2023
In this tutorial, you run your first Python script in the cloud with Azure Machine
Learning. This tutorial is part 1 of a three-part tutorial series.
This tutorial avoids the complexity of training a machine learning model. You will run a
"Hello World" Python script in the cloud. You will learn how a control script is used to
configure and create a run in Azure Machine Learning.
Prerequisites
Complete Create resources you need to get started to create a workspace and
compute instance to use in this tutorial series.
Create a cloud-based compute cluster. Name it 'cpu-cluster' to match the code
in this tutorial.
1. Sign in to the Azure Machine Learning studio and select your workspace if
prompted.
2. On the left, select Notebooks
3. In the Files toolbar, select +, then select Create new folder.
6. Name the new folder src. Use the Edit location link if the file location is not
correct.
7. To the right of the src folder, use the ... to create a new file in the src folder.
8. Name your file hello.py. Switch the File type to Python (.py).
Python
# src/hello.py
print("Hello world!")
If you have previously stopped your compute instance, start it now with the Start
compute tool to the right of the compute dropdown. Wait about a minute for state to
change to Running.
You'll see the output of the script in the terminal window that opens. Close the tab and
select Terminate to close the session.
Create a control script
A control script allows you to run your hello.py script on different compute resources.
You use the control script to control how and where your machine learning code is run.
Select the ... at the end of get-started folder to create a new file. Create a Python file
called run-hello.py and copy/paste the following code into that file:
Python
# get-started/run-hello.py
from azureml.core import Workspace, Experiment, Environment, ScriptRunConfig

ws = Workspace.from_config()
experiment = Experiment(workspace=ws, name='day1-experiment-hello')
config = ScriptRunConfig(source_directory='./src',
                         script='hello.py',
                         compute_target='cpu-cluster')
run = experiment.submit(config)
aml_url = run.get_portal_url()
print(aml_url)
Tip
If you used a different name when you created your compute cluster, make sure to
adjust the name in the code compute_target='cpu-cluster' as well.
ws = Workspace.from_config()
Workspace connects to your Azure Machine Learning workspace, so that you can
communicate with your Azure Machine Learning resources.
Experiment provides a simple way to organize multiple jobs under a single name. Later
you can see how experiments make it easy to compare metrics between dozens of jobs.
run = experiment.submit(config)
Submits your script. This submission is called a run. In v2, it has been renamed to a job.
A run/job encapsulates a single execution of your code. Use a job to monitor the script
progress, capture the output, analyze the results, visualize metrics, and more.
aml_url = run.get_portal_url()
The run object provides a handle on the execution of your code. Monitor its progress
from the Azure Machine Learning studio with the URL that's printed from the Python
script.
2. In the terminal, you may be asked to sign in to authenticate. Copy the code and
follow the link to complete this step.
3. Once you're authenticated, you'll see a link in the terminal. Select the link to view
the job.
Note
You may see some warnings starting with Failure while loading
azureml_run_type_providers.... You can ignore these warnings. Use the link at
the bottom of these warnings to view your output.
Follow the link. At first, you'll see a status of Queued or Preparing. The very first run will
take 5-10 minutes to complete. This is because the following occurs:
Subsequent jobs are much quicker (~15 seconds) as the docker image is cached on the
compute. You can test this by resubmitting the code below after the first job has
completed.
Wait about 10 minutes. You'll see a message that the job has completed. Then use
Refresh to see the status change to Completed. Once the job completes, go to the
Outputs + logs tab. There you can see a std_log.txt file that looks like this:
txt
The std_log.txt file contains the standard output from the job. This file can be
useful when you're debugging remote jobs in the cloud.
Next steps
In this tutorial, you took a simple "Hello world!" script and ran it on Azure. You saw how
to connect to your Azure Machine Learning workspace, create an experiment, and
submit your hello.py code to the cloud.
In the next tutorial, you build on these learnings by running something more interesting
than print("Hello world!") .
Note
If you want to finish the tutorial series here and not progress to the next step,
remember to clean up your resources.
Tutorial: Train your first machine
learning model (SDK v1, part 2 of 3)
Article • 04/04/2023
This tutorial shows you how to train a machine learning model in Azure Machine
Learning. This tutorial is part 2 of a three-part tutorial series.
In Part 1: Run "Hello world!" of the series, you learned how to use a control script to run
a job in the cloud.
In this tutorial, you take the next step by submitting a script that trains a machine
learning model. This example will help you understand how Azure Machine Learning
eases consistent behavior between local debugging and remote runs.
Prerequisites
Completion of part 1 of the series.
The training code is taken from this introductory example from PyTorch. Note that the
Azure Machine Learning concepts apply to any machine learning code, not just PyTorch.
1. Create a model.py file in the src subfolder. Copy this code into the file:
Python
import torch.nn as nn
import torch.nn.functional as F
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
2. On the toolbar, select Save to save the file. Close the tab if you wish.
3. Next, define the training script, also in the src subfolder. This script downloads the
CIFAR10 dataset by using PyTorch torchvision.dataset APIs, sets up the network
defined in model.py, and trains it for two epochs by using standard SGD and cross-
entropy loss.
Python
import torch
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
if __name__ == "__main__":
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # unpack the data
        inputs, labels = data

        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:
            loss = running_loss / 2000
            print(f"epoch={epoch + 1}, batch={i + 1:5}: loss {loss:.2f}")
            running_loss = 0.0

    print("Finished Training")
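The running-loss bookkeeping in train.py (accumulate, report every 2000 batches, reset) can be isolated without PyTorch; this standalone sketch averages values over fixed-size windows the same way:

```python
def windowed_averages(values, window):
    """Average `values` over consecutive windows of `window` items,
    resetting the accumulator after each window, as train.py does
    every 2000 batches."""
    averages = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i % window == window - 1:
            averages.append(running / window)
            running = 0.0
    return averages

print(windowed_averages([1, 2, 3, 4, 5, 6], 3))  # [2.0, 5.0]
```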
After the script completes, select Refresh above the file folders. You'll see the new data
folder called get-started/data. Expand this folder to view the downloaded data.
yml
name: pytorch-env
channels:
- defaults
- pytorch
dependencies:
- python=3.7
- pytorch
- torchvision
2. On the toolbar, select Save to save the file. Close the tab if you wish.
Python
# run-pytorch.py
from azureml.core import Workspace
from azureml.core import Experiment
from azureml.core import Environment
from azureml.core import ScriptRunConfig
if __name__ == "__main__":
    ws = Workspace.from_config()
    experiment = Experiment(workspace=ws, name='day1-experiment-train')
    config = ScriptRunConfig(source_directory='./src',
                             script='train.py',
                             compute_target='cpu-cluster')
    run = experiment.submit(config)
    aml_url = run.get_portal_url()
    print(aml_url)
Tip
If you used a different name when you created your compute cluster, make sure to
adjust the name in the code compute_target='cpu-cluster' as well.
env = ...
config.run_config.environment = env
2. You'll see a link in the terminal window that opens. Select the link to view the job.
Note
You may see some warnings starting with Failure while loading
azureml_run_type_providers.... You can ignore these warnings. Use the link at
the bottom of these warnings to view your output.
txt
Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to
../data/cifar-10-python.tar.gz
Extracting ../data/cifar-10-python.tar.gz to ../data
epoch=1, batch= 2000: loss 2.19
epoch=1, batch= 4000: loss 1.82
epoch=1, batch= 6000: loss 1.66
...
epoch=2, batch= 8000: loss 1.51
epoch=2, batch=10000: loss 1.49
epoch=2, batch=12000: loss 1.46
Finished Training
If you see the error Your total snapshot size exceeds the limit, it means the data folder
is located in the source_directory value used in ScriptRunConfig.
Select the ... at the end of the folder, then select Move to move data to the get-started
folder.
The current training script prints metrics to the terminal. Azure Machine Learning
provides a mechanism for logging metrics with more functionality. By adding a few lines
of code, you gain the ability to visualize metrics in the studio and to compare metrics
between multiple jobs.
Python
import torch
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
from model import Net
from azureml.core import Run
if __name__ == "__main__":
    # define convolutional network
    net = Net()

    # set up pytorch loss / optimizer
    criterion = torch.nn.CrossEntropyLoss()
    optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

    # train the network
    for epoch in range(2):
        running_loss = 0.0
        for i, data in enumerate(trainloader, 0):
            # unpack the data
            inputs, labels = data

            # zero the parameter gradients
            optimizer.zero_grad()

            # forward + backward + optimize
            outputs = net(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

            # print statistics
            running_loss += loss.item()
            if i % 2000 == 1999:
                loss = running_loss / 2000

                # ADDITIONAL CODE: log loss metric to AML
                run.log('loss', loss)

                print(f'epoch={epoch + 1}, batch={i + 1:5}: loss {loss:.2f}')
                running_loss = 0.0

    print('Finished Training')
Python
# ADDITIONAL CODE: get run from the current context
run = Run.get_context()
...
# ADDITIONAL CODE: log loss metric to AML
run.log('loss', loss)
Unlike printed output, metrics logged this way are:

- Organized by experiment and run, so it's easy to keep track of and compare metrics.
- Equipped with a UI so you can visualize training performance in the studio.
- Designed to scale, so you keep these benefits even as you run hundreds of experiments.
yml
name: pytorch-env
channels:
  - defaults
  - pytorch
dependencies:
  - python=3.7
  - pytorch
  - torchvision
  - pip
  - pip:
    - azureml-sdk
Make sure you save this file before you submit the run.
This time when you visit the studio, go to the Metrics tab, where you can now see live
updates on the model training loss! It may take 1 to 2 minutes before the training
begins.
Next steps
In this session, you upgraded from a basic "Hello world!" script to a more realistic
training script that required a specific Python environment to run. You saw how to use
curated Azure Machine Learning environments. Finally, you saw how in a few lines of
code you can log metrics to Azure Machine Learning.
There are other ways to create Azure Machine Learning environments, including from a
pip requirements.txt file or from an existing local Conda environment.
In the next session, you'll see how to work with data in Azure Machine Learning by
uploading the CIFAR10 dataset to Azure.
Note
If you want to finish the tutorial series here and not progress to the next step,
remember to clean up your resources.
Tutorial: Upload data and train a model (SDK v1, part 3 of 3)
Article • 04/04/2023
This tutorial shows you how to upload and use your own data to train machine learning
models in Azure Machine Learning. This tutorial is part 3 of a three-part tutorial series.
In Part 2: Train a model, you trained a model in the cloud, using sample data from
PyTorch . You also downloaded that data through the torchvision.datasets.CIFAR10
method in the PyTorch API. In this tutorial, you'll use the downloaded data to learn the
workflow for working with your own data in Azure Machine Learning.
Prerequisites
You'll need the data that was downloaded in the previous tutorial. Make sure you have
completed these steps:
Our training script is currently set to download the CIFAR10 dataset on each run. The
following Python code has been adjusted to read the data from a directory.
Note
Python
import os
import argparse
import torch
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
from model import Net
from azureml.core import Run

run = Run.get_context()

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument(
        '--data_path',
        type=str,
        help='Path to the training data'
    )
    parser.add_argument(
        '--learning_rate',
        type=float,
        default=0.001,
        help='Learning rate for SGD'
    )
    parser.add_argument(
        '--momentum',
        type=float,
        default=0.9,
        help='Momentum for SGD'
    )
    args = parser.parse_args()

    print("===== DATA =====")
    print("DATA PATH: " + args.data_path)
    print("LIST FILES IN DATA PATH...")
    print(os.listdir(args.data_path))
    print("================")

    # prepare DataLoader for CIFAR10 data
    transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
    ])
    trainset = torchvision.datasets.CIFAR10(
        root=args.data_path,
        train=True,
        download=False,
        transform=transform,
    )
    trainloader = torch.utils.data.DataLoader(
        trainset,
        batch_size=4,
        shuffle=True,
        num_workers=2
    )

    # define convolutional network
    net = Net()

    # set up pytorch loss / optimizer
    criterion = torch.nn.CrossEntropyLoss()
    optimizer = optim.SGD(
        net.parameters(),
        lr=args.learning_rate,
        momentum=args.momentum,
    )

    # train the network
    for epoch in range(2):
        running_loss = 0.0
        for i, data in enumerate(trainloader, 0):
            # unpack the data
            inputs, labels = data

            # zero the parameter gradients
            optimizer.zero_grad()

            # forward + backward + optimize
            outputs = net(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

            # print statistics
            running_loss += loss.item()
            if i % 2000 == 1999:
                loss = running_loss / 2000
                run.log('loss', loss)  # log loss metric to AML
                print(f'epoch={epoch + 1}, batch={i + 1:5}: loss {loss:.2f}')
                running_loss = 0.0

    print('Finished Training')
Also, the train.py script was adapted to update the optimizer to use the user-defined
parameters:
Python
optimizer = optim.SGD(
    net.parameters(),
    lr=args.learning_rate,    # get learning rate from command-line argument
    momentum=args.momentum,   # get momentum from command-line argument
)
Note
Azure Machine Learning allows you to connect other cloud-based datastores that
store your data. For more details, see the datastores documentation.
1. Create a new Python control script in the get-started folder (make sure it is in
get-started, not in the /src folder). Name the script upload-data.py and copy this code
into the file:
Python
# upload-data.py
from azureml.core import Workspace
from azureml.core import Dataset
from azureml.data.datapath import DataPath
ws = Workspace.from_config()
datastore = ws.get_default_datastore()
Dataset.File.upload_directory(src_dir='data',
                              target=DataPath(datastore, "datasets/cifar10"))
The target value specifies the path on the datastore where the CIFAR10 data
will be uploaded.
Tip
This tutorial uses Azure Machine Learning to upload the data, but you can also use
Azure Storage Explorer to upload ad hoc files. If you need an ETL tool, you
can use Azure Data Factory to ingest your data into Azure.
2. Select Save and run script in terminal to run the upload-data.py script.
txt
Uploading ./data\cifar-10-batches-py\data_batch_2
Uploaded ./data\cifar-10-batches-py\data_batch_2, 4 files out of an
estimated total of 9
.
.
Uploading ./data\cifar-10-batches-py\data_batch_5
Uploaded ./data\cifar-10-batches-py\data_batch_5, 9 files out of an
estimated total of 9
Uploaded 9 files
Python
# run-pytorch-data.py
from azureml.core import Workspace
from azureml.core import Experiment
from azureml.core import Environment
from azureml.core import ScriptRunConfig
from azureml.core import Dataset

if __name__ == "__main__":
    ws = Workspace.from_config()
    datastore = ws.get_default_datastore()
    dataset = Dataset.File.from_files(path=(datastore, 'datasets/cifar10'))

    experiment = Experiment(workspace=ws, name='day1-experiment-data')

    config = ScriptRunConfig(
        source_directory='./src',
        script='train.py',
        compute_target='cpu-cluster',
        arguments=[
            '--data_path', dataset.as_named_input('input').as_mount(),
            '--learning_rate', 0.003,
            '--momentum', 0.92],
    )

    # set up the pytorch environment
    env = Environment.from_conda_specification(
        name='pytorch-env',
        file_path='pytorch-env.yml'  # the Conda file you saved earlier
    )
    config.run_config.environment = env

    run = experiment.submit(config)
    aml_url = run.get_portal_url()
    print("Submitted to compute cluster. Click link below")
    print("")
    print(aml_url)
Tip
If you used a different name when you created your compute cluster, make sure to
adjust the name in the code compute_target='cpu-cluster' as well.
A dataset is used to reference the data you uploaded to Azure Blob Storage. Datasets
are an abstraction layer on top of your data that are designed to improve reliability and
trustworthiness.
config = ScriptRunConfig(...)
ScriptRunConfig is modified to include a list of arguments that will be passed into
train.py . The dataset.as_named_input('input').as_mount() argument means the
specified directory will be mounted to the compute target.
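At runtime, the training script receives that mount point as an ordinary filesystem path through its `--data_path` argument. As a minimal sketch of the hand-off (the mount path below is an illustrative stand-in for the value Azure Machine Learning injects):

```python
import argparse

# AML expands dataset.as_named_input('input').as_mount() to a local mount
# path and passes it as --data_path; '/tmp/mounted-cifar10' stands in for
# that injected value here.
parser = argparse.ArgumentParser()
parser.add_argument('--data_path', type=str)
args = parser.parse_args(['--data_path', '/tmp/mounted-cifar10'])
print(args.data_path)  # the mount point, usable like any local directory
```

Inside train.py, this is exactly why reading the data with `root=args.data_path` works unchanged whether the data is local or mounted from Blob Storage.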
This code will print a URL to the experiment in the Azure Machine Learning studio. If you
go to that link, you'll be able to see your code running.
Note
You may see some warnings starting with Failure while loading
azureml_run_type_providers.... You can ignore these warnings. Use the link at the
bottom of these warnings to view your output.
txt
Processing 'input'.
Processing dataset FileDataset
{
"source": [
"('workspaceblobstore', 'datasets/cifar10')"
],
"definition": [
"GetDatastoreFiles"
],
"registration": {
"id": "XXXXX",
"name": null,
"version": null,
"workspace": "Workspace.create(name='XXXX', subscription_id='XXXX',
resource_group='X')"
}
}
Mounting input to /tmp/tmp9kituvp3.
Mounted input to /tmp/tmp9kituvp3 as folder.
Exit __enter__ of DatasetContextManager
Entering Job History Context Manager.
Current directory: /mnt/batch/tasks/shared/LS_root/jobs/dsvm-
aml/azureml/tutorial-session-
3_1600171983_763c5381/mounts/workspaceblobstore/azureml/tutorial-session-
3_1600171983_763c5381
Preparing to call script [ train.py ] with arguments: ['--data_path',
'$input', '--learning_rate', '0.003', '--momentum', '0.92']
After variable expansion, calling script [ train.py ] with arguments: ['--
data_path', '/tmp/tmp9kituvp3', '--learning_rate', '0.003', '--momentum',
'0.92']
Notice:

- Azure Machine Learning has mounted Blob Storage to the compute cluster
automatically for you.
- The dataset.as_named_input('input').as_mount() used in the control script
resolves to the mount point.
Clean up resources
If you plan to continue now to another tutorial, or to start your own training jobs, skip to
Next steps.
Important
The resources that you created can be used as prerequisites to other Azure
Machine Learning tutorials and how-to articles.
If you don't plan to use any of the resources that you created, delete them so you don't
incur any charges:
1. In the Azure portal, select Resource groups on the far left.
2. From the list, select the resource group that you created.
3. Select Delete resource group.
You can also keep the resource group but delete a single workspace. Display the
workspace properties and select Delete.
Next steps
In this tutorial, we saw how to upload data to Azure by using Datastore . The datastore
served as cloud storage for your workspace, giving you a persistent and flexible place to
keep your data.
You saw how to modify your training script to accept a data path via the command line.
By using Dataset , you were able to mount a directory to the remote job.
Now that you have a model, learn:
In this tutorial, you train a machine learning model on remote compute resources. You'll
use the training and deployment workflow for Azure Machine Learning in a Python
Jupyter Notebook. You can then use the notebook as a template to train your own
machine learning model with your own data.
This tutorial trains a simple logistic regression by using the MNIST dataset and
scikit-learn with Azure Machine Learning. MNIST is a popular dataset consisting of 70,000
grayscale images. Each image is a handwritten digit of 28 x 28 pixels, representing a
number from zero to nine. The goal is to create a multi-class classifier to identify the
digit a given image represents.
Prerequisites
Complete the Quickstart: Get started with Azure Machine Learning to:
Create a workspace.
Create a cloud-based compute instance to use for your development
environment.
6. Select the ... button at the right of the tutorials folder, and then select Clone.
7. A list of folders shows each user who accesses the workspace. Select your folder to
clone the tutorials folder there.
2. Add the following into the cell and then run the cell, either by using the Run tool
or by using Shift+Enter.
Bash
You may see a few install warnings. These can safely be ignored.
) Important
The rest of this article contains the same content as you see in the notebook.
Switch to the Jupyter Notebook now if you want to run the code while you read
along. To run a single code cell in a notebook, click the code cell and hit
Shift+Enter. Or, run the entire notebook by choosing Run all from the top toolbar.
Import data
Before you train a model, you need to understand the data you're using to train it. In
this section, learn how to:
You'll use Azure Open Datasets to get the raw MNIST data files. Azure Open Datasets
are curated public datasets that you can use to add scenario-specific features to
machine learning solutions for better models. Each dataset has a corresponding class,
MNIST in this case, to retrieve the data in different ways.
Python
import os
from azureml.opendatasets import MNIST

data_folder = os.path.join(os.getcwd(), "data")
os.makedirs(data_folder, exist_ok=True)

mnist_file_dataset = MNIST.get_file_dataset()
mnist_file_dataset.download(data_folder, overwrite=True)
Note that this step requires a load_data function that's included in a utils.py file. This file
is placed in the same folder as this notebook. The load_data function simply parses the
compressed files into numpy arrays.
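For readers curious what such a parser involves, here's a minimal sketch of reading the gzipped IDX layout that the MNIST files use. The actual utils.py helper may differ in naming and details:

```python
import gzip
import struct
import numpy as np

def load_idx_images(path):
    """Parse a gzipped IDX image file (the on-disk MNIST layout) into an
    (n_images, rows*cols) uint8 array. Illustrative sketch only."""
    with gzip.open(path, "rb") as f:
        magic, n, rows, cols = struct.unpack(">IIII", f.read(16))
        pixels = np.frombuffer(f.read(), dtype=np.uint8)
    return pixels.reshape(n, rows * cols)

# demonstrate on a tiny synthetic file: 2 "images" of 28 x 28 zeros
header = struct.pack(">IIII", 2051, 2, 28, 28)
with gzip.open("demo-images.gz", "wb") as f:
    f.write(header + bytes(2 * 28 * 28))

print(load_idx_images("demo-images.gz").shape)  # (2, 784)
```

The 16-byte header holds a magic number, the image count, and the row and column sizes; everything after it is raw pixel bytes.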
Python
import os
import glob

import numpy as np
import matplotlib.pyplot as plt

from utils import load_data

# note we also shrink the intensity values (X) from 0-255 to 0-1.
# This helps the model converge faster.
X_train = (
    load_data(
        glob.glob(
            os.path.join(data_folder, "**/train-images-idx3-ubyte.gz"),
            recursive=True
        )[0],
        False,
    )
    / 255.0
)
X_test = (
    load_data(
        glob.glob(
            os.path.join(data_folder, "**/t10k-images-idx3-ubyte.gz"),
            recursive=True
        )[0],
        False,
    )
    / 255.0
)
y_train = load_data(
    glob.glob(
        os.path.join(data_folder, "**/train-labels-idx1-ubyte.gz"),
        recursive=True
    )[0],
    True,
).reshape(-1)
y_test = load_data(
    glob.glob(
        os.path.join(data_folder, "**/t10k-labels-idx1-ubyte.gz"),
        recursive=True
    )[0],
    True,
).reshape(-1)

# now let's show some randomly chosen images from the training set.
count = 0
sample_size = 30
plt.figure(figsize=(16, 6))
for i in np.random.permutation(X_train.shape[0])[:sample_size]:
    count = count + 1
    plt.subplot(1, sample_size, count)
    plt.axhline("")
    plt.axvline("")
    plt.text(x=10, y=-10, s=y_train[i], fontsize=18)
    plt.imshow(X_train[i].reshape(28, 28), cmap=plt.cm.Greys)
plt.show()
The code above displays a random set of images with their labels.

You'll use the LogisticRegression classifier from the scikit-learn framework to
classify the data.
Note
Python
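As an illustration of the approach (not the tutorial's exact cell or hyperparameters), fitting a LogisticRegression to flattened image arrays looks like this; the data here is a small random stand-in for the MNIST arrays prepared above:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Stand-in for the MNIST arrays: 200 random "images" flattened to 784
# features, with labels 0-9. Values are illustrative only.
rng = np.random.default_rng(0)
X_small = rng.random((200, 784))
y_small = rng.integers(0, 10, 200)

clf = LogisticRegression(solver="lbfgs", max_iter=200)
clf.fit(X_small, y_small)

preds = clf.predict(X_small[:5])  # one predicted digit per input image
print(preds.shape)
```

In the real notebook, the classifier is fitted on X_train and y_train and evaluated against X_test and y_test.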
View experiment
In the left-hand menu in Azure Machine Learning studio, select Jobs and then select
your job (azure-ml-in10-mins-tutorial). A job is a grouping of many runs from a
specified script or piece of code. Multiple jobs can be grouped together as an
experiment.
Information for the run is stored under that job. If the name doesn't exist when you
submit a job, a new job is created. When you select your run, you'll see various tabs
containing metrics, logs, explanations, and so on.
Version control your models with the model registry
You can use model registration to store and version your models in your workspace.
Registered models are identified by name and version. Each time you register a model
with the same name as an existing one, the registry increments the version. The code
below registers and versions the model you trained above. Once you have executed the
code cell below you will be able to see the model in the registry by selecting Models in
the left-hand menu in Azure Machine Learning studio.
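The name-plus-version behavior can be pictured with a toy sketch (plain Python, not the azureml-core API):

```python
# A conceptual model of register-by-name with auto-incrementing versions.
registry = {}

def register_model(name, payload):
    # same name -> next version; new name -> version 1
    version = registry[name]["version"] + 1 if name in registry else 1
    registry[name] = {"version": version, "payload": payload}
    return version

register_model("my-sklearn-model", b"weights-v1")
print(register_model("my-sklearn-model", b"weights-v2"))  # 2
```

Registering under the same name a second time produces version 2, which mirrors how the workspace registry behaves.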
Python
Python
Deploy model
This next code cell deploys the model to Azure Container Instance.
Note
Python
%%time
import uuid
from azureml.core.model import InferenceConfig
from azureml.core.environment import Environment
from azureml.core.model import Model
service.wait_for_deployment(show_output=True)
The scoring script file referenced in the code above can be found in the same folder as
this notebook, and has two functions:
1. An init function that executes once when the service starts - in this function you
normally get the model from the registry and set global variables
2. A run(data) function that executes each time a call is made to the service. In this
function, you normally format the input data, run a prediction, and output the
predicted result.
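The two entry points can be sketched as follows; the stand-in model and input format here are illustrative, not the tutorial's actual score.py:

```python
import json

# Skeleton of a scoring script. In a real score.py, init() would load the
# registered model (for example with joblib from the model directory that
# Azure Machine Learning provides) instead of this stand-in lambda.
model = None

def init():
    # runs once when the service starts
    global model
    model = lambda rows: [sum(row) for row in rows]  # stand-in predictor

def run(raw_data):
    # runs on each call: format the input, predict, return the result
    data = json.loads(raw_data)["data"]
    return model(data)

init()
print(run('{"data": [[1, 2], [3, 4]]}'))  # [3, 7]
```

The service host calls init() once at startup and run() per request, which is why the model load belongs in init().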
View endpoint
Once the model has been successfully deployed, you can view the endpoint by
navigating to Endpoints in the left-hand menu in Azure Machine Learning studio. You
will be able to see the state of the endpoint (healthy/unhealthy), logs, and consume
(how applications can consume the model).
Python
Clean up resources
If you're not going to continue to use this model, delete the deployed model service
using:
Python
# If you want to keep the workspace and only delete the endpoint
# (a deployed endpoint incurs cost while running):
service.delete()
If you want to control cost further, stop the compute instance by selecting the "Stop
compute" button next to the Compute dropdown. Then start the compute instance
again the next time you need it.
Delete everything
Use these steps to delete your Azure Machine Learning workspace and all compute
resources.
Important
The resources that you created can be used as prerequisites to other Azure
Machine Learning tutorials and how-to articles.
If you don't plan to use any of the resources that you created, delete them so you don't
incur any charges:
1. In the Azure portal, select Resource groups on the far left.
2. From the list, select the resource group that you created.
3. Select Delete resource group.
Next steps
Learn about all of the deployment options for Azure Machine Learning.
Learn how to authenticate to the deployed model.
Make predictions on large quantities of data asynchronously.
Monitor your Azure Machine Learning models with Application Insights.
Tutorial: Build an Azure Machine Learning pipeline for image classification
Article • 06/01/2023
Note
For a tutorial that uses SDK v2 to build a pipeline, see Tutorial: Use ML pipelines
for production ML workflows with Python SDK v2 in a Jupyter Notebook.
In this tutorial, you learn how to build an Azure Machine Learning pipeline to prepare
data and train a machine learning model. Machine learning pipelines optimize your
workflow with speed, portability, and reuse, so you can focus on machine learning
instead of infrastructure and automation.
The example trains a small Keras convolutional neural network to classify images in
the Fashion MNIST dataset.
" Configure workspace
" Create an Experiment to hold your work
" Provision a ComputeTarget to do the work
" Create a Dataset in which to store compressed data
" Create a pipeline step to prepare the data for training
" Define a runtime Environment in which to perform training
" Create a pipeline step to define the neural network and perform the training
" Compose a Pipeline from the pipeline steps
" Run the pipeline in the experiment
" Review the output of the steps and the trained neural network
" Register the model for further use
If you don't have an Azure subscription, create a free account before you begin. Try the
free or paid version of Azure Machine Learning today.
Prerequisites
Complete Create resources to get started if you don't already have an Azure
Machine Learning workspace.
A Python environment in which you've installed both the azureml-core and
azureml-pipeline packages. This environment is for defining and controlling your
Azure Machine Learning resources and is separate from the environment used at
runtime for training.
Important
See the documentation for your Python virtual environment manager ( venv , conda ,
and so on) for instructions.
A clone of the Azure Machine Learning Examples repository. The source code for the
steps themselves is in the keras-mnist-fashion subdirectory.
Import types
Import all the Azure Machine Learning types that you'll need for this tutorial:
Python
import os
import azureml.core
from azureml.core import (
Workspace,
Experiment,
Dataset,
Datastore,
ComputeTarget,
Environment,
ScriptRunConfig
)
from azureml.data import OutputFileDatasetConfig
from azureml.core.compute import AmlCompute
from azureml.core.compute_target import ComputeTargetException
from azureml.pipeline.steps import PythonScriptStep
from azureml.pipeline.core import Pipeline
The Azure Machine Learning SDK version should be 1.37 or greater. If it isn't, upgrade
with pip install --upgrade azureml-core .
Configure workspace
Create a workspace object from the existing Azure Machine Learning workspace.
Python
workspace = Workspace.from_config()
Important
This code snippet expects the workspace configuration to be saved in the current
directory or its parent. For more information on creating a workspace, see Create
workspace resources. For more information on saving the configuration to file, see
Create a workspace configuration file.
Create a ComputeTarget that represents the machine resource on which your pipeline
will run. The simple neural network used in this tutorial trains in just a few minutes even
on a CPU-based machine. If you wish to use a GPU for training, set use_gpu to True .
Provisioning a compute target generally takes about five minutes.
Python
use_gpu = False

# The name of your compute target; adjust this to match your workspace.
cluster_name = "cpu-cluster"

found = False
# Check if this compute target already exists in the workspace.
cts = workspace.compute_targets
if cluster_name in cts and cts[cluster_name].type == "AmlCompute":
    found = True
    print("Found existing compute target.")
    compute_target = cts[cluster_name]

if not found:
    print("Creating a new compute target...")
    compute_config = AmlCompute.provisioning_configuration(
        vm_size="STANDARD_NC6" if use_gpu else "STANDARD_D2_V2",
        # vm_priority = 'lowpriority',  # optional
        max_nodes=4,
    )
    compute_target = ComputeTarget.create(workspace, cluster_name, compute_config)

    # Can poll for a minimum number of nodes and for a specific timeout.
    # If no min_node_count is provided, it will use the scale settings for the cluster.
    compute_target.wait_for_completion(
        show_output=True, min_node_count=None, timeout_in_minutes=10
    )

# For a more detailed view of current AmlCompute status, use get_status():
# print(compute_target.get_status().serialize())
Note
GPU availability depends on the quota of your Azure subscription and upon Azure
capacity. See Manage and increase quotas for resources with Azure Machine
Learning.
Python
data_urls = ["https://data4mldemo6150520719.blob.core.windows.net/demo/mnist-fashion"]
fashion_ds = Dataset.File.from_files(data_urls)
This code completes quickly. The underlying data remains in the Azure storage resource
specified in the data_urls array.
Python
datastore = workspace.get_default_datastore()
prepared_fashion_ds = OutputFileDatasetConfig(
destination=(datastore, "outputdataset/{run-id}")
).register_on_complete(name="prepared_fashion_ds")
The above code specifies a dataset that is based on the output of a pipeline step. The
underlying processed files will be put in the workspace's default datastore's blob
storage at the path specified in destination . The dataset will be registered in the
workspace with the name prepared_fashion_ds .
If you're following along with the example in the Azure Machine Learning Examples
repo , the source file is already available as keras-mnist-fashion/prepare.py .
# prepare.py
# Converts MNIST-formatted files at the passed-in input path to a passed-in output path
import os
import sys

def convert(imgf, labelf, outf, n):
    # (surrounding context reconstructed around the excerpt; see
    # keras-mnist-fashion/prepare.py in the examples repository for the full script)
    f = open(imgf, "rb")
    l = open(labelf, "rb")
    o = open(outf, "w")

    f.read(16)  # skip the image-file header
    l.read(8)   # skip the label-file header
    images = []

    for i in range(n):
        image = [ord(l.read(1))]          # first column: the label
        for j in range(28 * 28):
            image.append(ord(f.read(1)))  # then the 784 pixel values
        images.append(image)

    for image in images:
        o.write(",".join(str(pix) for pix in image) + "\n")
    f.close()
    l.close()
    o.close()
The code in prepare.py takes two command-line arguments: the first is assigned to
mounted_input_path and the second to mounted_output_path . If that subdirectory doesn't
exist, the call to os.makedirs creates it. Then, the program converts the training and
testing data and outputs the comma-separated files to the mounted_output_path .
Python
script_folder = "./keras-mnist-fashion"

prep_step = PythonScriptStep(
    name="prepare step",
    script_name="prepare.py",
    # On the compute target, mount fashion_ds dataset as input, prepared_fashion_ds as output
    arguments=[fashion_ds.as_named_input("fashion_ds").as_mount(), prepared_fashion_ds],
    source_directory=script_folder,
    compute_target=compute_target,
    allow_reuse=True,
)
The call to PythonScriptStep specifies that, when the pipeline step is run:
- All the files in the script_folder directory are uploaded to the compute_target
- Among those uploaded source files, the file prepare.py will be run
- The fashion_ds and prepared_fashion_ds datasets will be mounted on the
compute_target and appear as directories
- The path to the fashion_ds files will be the first argument to prepare.py . In
prepare.py , this argument is assigned to mounted_input_path
- The path to the prepared_fashion_ds will be the second argument to prepare.py .
In prepare.py , this argument is assigned to mounted_output_path
- Because allow_reuse is True , it won't be rerun until its source files or inputs
change
- This PythonScriptStep will be named prepare step
Modularity and reuse are key benefits of pipelines. Azure Machine Learning can
automatically determine source code or Dataset changes. The output of a step that isn't
affected will be reused without rerunning the steps again if allow_reuse is True . If a
step relies on a data source external to Azure Machine Learning that may change (for
instance, a URL that contains sales data), set allow_reuse to False and the pipeline step
will run every time the pipeline is run.
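The reuse rule can be pictured as a cache keyed by a fingerprint of a step's source and inputs. This is a conceptual sketch, not Azure Machine Learning's actual mechanism:

```python
import hashlib

# Completed step outputs, keyed by a fingerprint of source code and inputs.
_completed = {}

def run_step(source_code, inputs, allow_reuse=True):
    key = hashlib.sha256((source_code + repr(inputs)).encode()).hexdigest()
    if allow_reuse and key in _completed:
        return _completed[key], "reused"  # nothing changed: skip the work
    _completed[key] = f"output-{key[:8]}"  # stand-in for actually running it
    return _completed[key], "ran"

out1, status1 = run_step("prepare.py v1", ["fashion_ds"])
out2, status2 = run_step("prepare.py v1", ["fashion_ds"])
print(status1, status2)  # ran reused
```

Changing either the source string or the inputs produces a new key, so the step runs again, which is exactly why an external data source that can change silently calls for allow_reuse=False.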
Python
import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras.layers.normalization import BatchNormalization
from keras.utils import to_categorical
from keras.callbacks import Callback
import numpy as np
import pandas as pd
import os
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from azureml.core import Run
X = np.array(data_train.iloc[:, 1:])
y = to_categorical(np.array(data_train.iloc[:, 0]))
# test data
X_test = np.array(data_test.iloc[:, 1:])
y_test = to_categorical(np.array(data_test.iloc[:, 0]))
X_train = (
    X_train.reshape(X_train.shape[0], img_rows, img_cols, 1).astype("float32") / 255
)
X_test = X_test.reshape(X_test.shape[0], img_rows, img_cols, 1).astype("float32") / 255
X_val = X_val.reshape(X_val.shape[0], img_rows, img_cols, 1).astype("float32") / 255
batch_size = 256
num_classes = 10
epochs = 10
model.compile(
loss=keras.losses.categorical_crossentropy,
optimizer=keras.optimizers.Adam(),
metrics=["accuracy"],
)
# start an Azure ML run
run = Run.get_context()
class LogRunMetrics(Callback):
    # callback at the end of every epoch
    def on_epoch_end(self, epoch, log):
        # log a value repeated which creates a list
        run.log("Loss", log["loss"])
        run.log("Accuracy", log["accuracy"])
history = model.fit(
    X_train,
    y_train,
    batch_size=batch_size,
    epochs=epochs,
    verbose=1,
    validation_data=(X_val, y_val),
    callbacks=[LogRunMetrics()],
)
plt.figure(figsize=(6, 3))
plt.title("Fashion MNIST with Keras ({} epochs)".format(epochs), fontsize=14)
plt.plot(history.history["accuracy"], "b-", label="Accuracy", lw=4, alpha=0.5)
plt.plot(history.history["loss"], "r--", label="Loss", lw=4, alpha=0.5)
plt.legend(fontsize=12)
plt.grid(True)

# log an image
run.log_image("Loss v.s. Accuracy", plot=plt)
- The data is partitioned into train and validation sets for training, and a separate
test subset for final scoring
- The input shape is 28x28x1 (only 1 because the input is grayscale), there will be
256 inputs in a batch, and there are 10 classes
- The number of training epochs will be 10
- The model has three convolutional layers, with max pooling and dropout, followed
by a dense layer and softmax head
- The model is fitted for 10 epochs and then evaluated
- The model architecture is written to outputs/model/model.json and the weights to
outputs/model/model.h5
The train.py source uses this run object to retrieve the input dataset via its name (an
alternative to the code in prepare.py that retrieved the dataset via the argv array of
script arguments).
The run object is also used to log the training progress at the end of every epoch and,
at the end of training, to log the graph of loss and accuracy over time.
yml
dependencies:
  - python=3.7
  - pip:
    - azureml-core
    - azureml-dataset-runtime
    - keras==2.4.3
    - tensorflow==2.4.3
    - numpy
    - scikit-learn
    - pandas
    - matplotlib
The Environment class represents the runtime environment in which a machine learning
task runs. Associate the above specification with the training code:
Python
keras_env = Environment.from_conda_specification(
name="keras-env", file_path="./conda_dependencies.yml"
)
train_cfg = ScriptRunConfig(
source_directory=script_folder,
script="train.py",
compute_target=compute_target,
environment=keras_env,
)
Creating the training step itself uses code similar to the code used to create the
preparation step:
Python
train_step = PythonScriptStep(
    name="train step",
    arguments=[
        prepared_fashion_ds.read_delimited_files().as_input(name="prepared_fashion_ds")
    ],
    source_directory=train_cfg.source_directory,
    script_name=train_cfg.script,
    runconfig=train_cfg.run_config,
)
Python
Note
This pipeline has a simple dependency graph: the training step relies on the
preparation step and the preparation step relies on the fashion_ds dataset.
Production pipelines will often have much more complex dependencies. Steps may
rely on multiple upstream steps, a source code change in an early step may have
far-reaching consequences, and so on. Azure Machine Learning tracks these
concerns for you. You need only pass in the array of steps and Azure Machine
Learning takes care of calculating the execution graph.
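Conceptually, that calculation is a topological sort of the step dependencies. Here's a sketch using Python's standard library; the step names match this tutorial, but the code is illustrative, not part of the SDK:

```python
from graphlib import TopologicalSorter

# The dependency graph of this tutorial's two-step pipeline, written out by
# hand: each step maps to the set of things it depends on.
dependencies = {
    "prepare step": {"fashion_ds"},
    "train step": {"prepare step"},
}

order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

Any valid execution order must place fashion_ds before the prepare step and the prepare step before the train step; with more complex graphs, independent steps can also run in parallel.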
The call to submit the Experiment completes quickly, and produces output similar to:
txt
You can monitor the pipeline run by opening the link or you can block until it completes
by running:
Python
run.wait_for_completion(show_output=True)
Important
The first pipeline run takes roughly 15 minutes. All dependencies must be
downloaded, a Docker image is created, and the Python environment is
provisioned and created. Running the pipeline again takes significantly less time
because those resources are reused instead of created. However, total run time for
the pipeline depends on the workload of your scripts and the processes that are
running in each pipeline step.
Once the pipeline completes, you can retrieve the metrics you logged in the training
step:
Python
run.find_step_run("train step")[0].get_metrics()
If you're satisfied with the metrics, you can register the model in your workspace:
Python
run.find_step_run("train step")[0].register_model(
model_name="keras-model",
model_path="outputs/model/",
datasets=[("train test data", fashion_ds)],
)
Clean up resources
Don't complete this section if you plan to run other Azure Machine Learning tutorials.
3. Select Stop.
Delete everything
If you don't plan to use the resources you created, delete them, so you don't incur any
charges:
You can also keep the resource group but delete a single workspace. Display the
workspace properties, and then select Delete.
Next steps
In this tutorial, you used the following types:
The Workspace object contains references to other resources (notebooks, endpoints, and
so on) that weren't used in this tutorial. For more, see What is an Azure Machine
Learning workspace?.
For more on compute targets and environments, see What are compute targets in Azure
Machine Learning? and What are Azure Machine Learning environments?
For more examples of how to build pipelines by using the machine learning SDK, see the
example repository .
Tutorial: Power BI integration - Create the predictive model with a Jupyter Notebook (part 1 of 2)
Article • 04/23/2023
In part 1 of this tutorial, you train and deploy a predictive machine learning model by
using code in a Jupyter Notebook. You also create a scoring script to define the input
and output schema of the model for integration into Power BI. In part 2, you use the
model to predict outcomes in Microsoft Power BI.
Prerequisites
An Azure subscription. If you don't already have a subscription, you can use a free
trial .
An Azure Machine Learning workspace. If you don't already have a workspace, see
Create workspace resources.
Introductory knowledge of the Python language and machine learning workflows.
Next, to run code cells, create a compute instance and attach it to your notebook. Start
by selecting the plus icon at the top of the notebook:
1. Choose a CPU virtual machine size. For this tutorial, you can choose a
Standard_D11_v2, with 2 cores and 14 GB of RAM.
2. Select Next.
3. On the Configure Settings page, provide a valid Compute name. Valid characters
are uppercase and lowercase letters, digits, and hyphens (-).
4. Select Create.
In the notebook, you might notice the circle next to Compute turned cyan. This color
change indicates that the compute instance is being created:
Note
Python
import numpy as np
np.sin(3)
Then press Shift + Enter (or press Control + Enter, or select the Play button next to the
cell). You should see the following output:

0.1411200080598672
Import data
To import your data, copy the following code and paste it into a new code cell in your
notebook.
Python
from azureml.opendatasets import Diabetes

diabetes = Diabetes.get_tabular_dataset()
X = diabetes.drop_columns("Y")
y = diabetes.keep_columns("Y")
X_df = X.to_pandas_dataframe()
y_df = y.to_pandas_dataframe()
X_df.info()
The X_df pandas data frame contains 10 baseline input variables. These variables
include age, sex, body mass index, average blood pressure, and six blood serum
measurements. The y_df pandas data frame is the target variable. It contains a
quantitative measure of disease progression one year after the baseline. The data frame
contains 442 records.
Python
import joblib
from sklearn.linear_model import Ridge
model = Ridge().fit(X_df,y_df)
joblib.dump(model, 'sklearn_regression_model.pkl')
Metadata is useful when you're managing and deploying models in your workspace. By
using tags, for instance, you can categorize your models and apply filters when you list
models in your workspace. Also, if you mark this model with the scikit-learn framework,
you'll simplify deploying it as a web service.
Copy the following code and then paste it into a new code cell in your notebook.
Python
import sklearn
from azureml.core import Workspace
from azureml.core.model import Model
from azureml.core.resource_configuration import ResourceConfiguration

ws = Workspace.from_config()

model = Model.register(workspace=ws,
                       model_name='my-sklearn-model',                # Name of the registered model in your workspace.
                       model_path='./sklearn_regression_model.pkl',  # Local file to upload and register as a model.
                       model_framework=Model.Framework.SCIKITLEARN,  # Framework used to create the model.
                       model_framework_version=sklearn.__version__,  # Version of scikit-learn used to create the model.
                       sample_input_dataset=X,
                       sample_output_dataset=y,
                       resource_configuration=ResourceConfiguration(cpu=2, memory_in_gb=4),
                       description='Ridge regression model to predict diabetes progression.',
                       tags={'area': 'diabetes', 'type': 'regression'})

print('Name:', model.name)
print('Version:', model.version)
You can also view the model in Azure Machine Learning Studio. In the menu on the left,
select Models:
The init() function runs when the service starts. It loads the model (which is
automatically downloaded from the model registry) and deserializes it.
The run(data) function runs when a call to the service includes input data that
needs to be scored.
Note
The Python decorators in the code below define the schema of the input and
output data, which is important for integration into Power BI.
Copy the following code and paste it into a new code cell in your notebook. The
following code snippet has cell magic that writes the code to a file named score.py.
Python
%%writefile score.py
import json
import pickle
import numpy as np
import pandas as pd
import os
import joblib
from azureml.core.model import Model

from inference_schema.schema_decorators import input_schema, output_schema
from inference_schema.parameter_types.numpy_parameter_type import NumpyParameterType
from inference_schema.parameter_types.pandas_parameter_type import PandasParameterType


def init():
    global model
    # Replace filename if needed.
    path = os.getenv('AZUREML_MODEL_DIR')
    model_path = os.path.join(path, 'sklearn_regression_model.pkl')
    # Deserialize the model file back into a sklearn model.
    model = joblib.load(model_path)


input_sample = pd.DataFrame(data=[{
    "AGE": 5,
    "SEX": 2,
    "BMI": 3.1,
    "BP": 3.1,
    "S1": 3.1,
    "S2": 3.1,
    "S3": 3.1,
    "S4": 3.1,
    "S5": 3.1,
    "S6": 3.1
}])

# This is an integer type sample. Use the data type that reflects the expected result.
output_sample = np.array([0])


@input_schema('data', PandasParameterType(input_sample))
@output_schema(NumpyParameterType(output_sample))
def run(data):
    try:
        result = model.predict(data)
        # You can return any data type, as long as it's JSON serializable.
        return result.tolist()
    except Exception as e:
        error = str(e)
        return error
To define the environment, copy the following code and paste it into a new code cell in
your notebook.
Python
from azureml.core import Environment
from azureml.core.conda_dependencies import CondaDependencies
from azureml.core.model import InferenceConfig

environment = Environment('my-sklearn-environment')
environment.python.conda_dependencies = CondaDependencies.create(pip_packages=[
    'azureml-defaults',
    'inference-schema[numpy-support]',
    'joblib',
    'numpy',
    'pandas',
    'scikit-learn=={}'.format(sklearn.__version__)
])

inference_config = InferenceConfig(entry_script='./score.py', environment=environment)
Deploy the model
To deploy the model, copy the following code and paste it into a new code cell in your
notebook:
Python
service_name = 'my-diabetes-model'

service = Model.deploy(ws, service_name, [model], overwrite=True)
service.wait_for_deployment(show_output=True)
If the service deploys successfully, you should see the following output:
You can also view the service in Azure Machine Learning Studio. In the menu on the left,
select Endpoints:
We recommend that you test the web service to ensure it works as expected. To return
to your notebook, in Azure Machine Learning Studio, in the menu on the left, select
Notebooks. Then copy the following code and paste it into a new code cell in your
notebook to test the service.
Python
import json
input_payload = json.dumps({
'data': X_df[0:2].values.tolist()
})
output = service.run(input_payload)
print(output)
The output should look like this JSON structure: {'predict': [[205.59], [68.84]]} .
Next steps
In this tutorial, you saw how to build and deploy a model so that it can be consumed by
Power BI. In the next part, you'll learn how to consume this model in a Power BI report.
Create jobs - Jobs are training runs you use to build your models. You can group
jobs into experiments to compare metrics.
Author pipelines - Pipelines are reusable workflows for training and retraining your
model.
Register data assets - Data assets aid in management of the data you use for
model training and pipeline creation.
Register models - Once you have a model you want to deploy, you create a
registered model.
Deploy a model - Use the registered model and a scoring script to deploy a model.
Besides grouping your machine learning results, workspaces also host resource
configurations:
Organizing workspaces
For machine learning team leads and administrators, workspaces serve as containers for
access management, cost management and data isolation. Below are some tips for
organizing workspaces:
Use user roles to manage permissions in the workspace, for example for a data
scientist, a machine learning engineer, or an admin.
Assign access to user groups: By using Azure Active Directory user groups, you
don't have to add individual users to each workspace, and to other resources the
same group of users requires access to.
Create a workspace per project: While a workspace can be used for multiple
projects, limiting it to one project per workspace allows for cost reporting accrued
to a project level. It also allows you to manage configurations like datastores in the
scope of each project.
Share Azure resources: Workspaces require you to create several associated
resources. Share these resources between workspaces to save repetitive setup
steps.
Enable self-serve: Pre-create and secure associated resources as an IT admin, and
use user roles to let data scientists create workspaces on their own.
Share assets: You can share assets between workspaces using Azure Machine
Learning registries.
Associated resources
When you create a new workspace, you're required to bring other Azure resources to
store your data. If not provided by you, these resources will automatically be created by
Azure Machine Learning.
Azure Storage account . Stores machine learning artifacts such as job logs. By
default, this storage account is used when you upload data to the workspace.
Jupyter notebooks that are used with your Azure Machine Learning compute
instances are stored here as well.
Azure Container Registry . Stores created docker containers, when you build
custom environments via Azure Machine Learning. Scenarios that trigger creation
of custom environments include AutoML when deploying models and data
profiling.
Note
Workspaces can be created without Azure Container Registry as a dependency
if you do not have a need to build custom docker containers. To read
container images, Azure Machine Learning also works with external container
registries. Azure Container Registry is automatically provisioned when you
build custom docker images. Use Azure RBAC to prevent customer docker
containers from being built.
Note
If your subscription requires tags on the resources under it, creation of the Azure
Container Registry (ACR) by Azure Machine Learning will fail, because Azure
Machine Learning can't set tags on the ACR.
Azure Application Insights . Helps you monitor and collect diagnostic information
from your inference endpoints.
Azure Key Vault . Stores secrets that are used by compute targets and other
sensitive information that's needed by the workspace.
Create a workspace
There are multiple ways to create a workspace. To get started, use one of the following
options:
The Azure Machine Learning studio lets you quickly create a workspace with
default settings.
Use Azure portal for a point-and-click interface with more security options.
Use the VS Code extension if you work in Visual Studio Code.
Use the Azure Machine Learning CLI or Azure Machine Learning SDK for Python for
prototyping and as part of your MLOps workflows.
On the web:
Azure Machine Learning studio
Azure Machine Learning designer
Workspace management task    Portal    Studio    Python SDK    Azure CLI    VS Code
Create a workspace           ✓         ✓         ✓             ✓            ✓
Sub resources
When you create compute clusters and compute instances in Azure Machine Learning,
sub resources are created.
VMs: provide computing power for compute instances and compute clusters,
which you use to run jobs.
Load Balancer: a network load balancer is created for each compute instance and
compute cluster to manage traffic even while the compute instance/cluster is
stopped.
Virtual Network: these help Azure resources communicate with one another, the
internet, and other on-premises networks.
Bandwidth: encapsulates all outbound data transfers across regions.
Next steps
To learn more about planning a workspace for your organization's requirements, see
Organize and set up Azure Machine Learning.
Compute instances make it easy to get started with Azure Machine Learning
development and provide management and enterprise readiness capabilities for IT
administrators.
For compute instance Jupyter functionality to work, ensure that web socket
communication isn't disabled. Ensure your network allows websocket connections to
*.instances.azureml.net and *.instances.azureml.ms.
Important
Items marked (preview) in this article are currently in public preview. The preview
version is provided without a service level agreement, and it's not recommended
for production workloads. Certain features might not be supported or might have
constrained capabilities. For more information, see Supplemental Terms of Use for
Microsoft Azure Previews .
Productivity: You can build and deploy models using integrated notebooks and the
following tools in Azure Machine Learning studio:
- Jupyter
- JupyterLab
- VS Code (preview)
The compute instance is fully integrated with the Azure Machine Learning workspace
and studio. You can share notebooks and data with other data scientists in the
workspace.
Managed & secure: Reduce your security footprint and add compliance with enterprise
security requirements. Compute instances provide robust management policies and
secure networking configurations.
Preconfigured for ML: Save time on setup tasks with pre-configured and up-to-date ML
packages, deep learning frameworks, and GPU drivers.
Fully customizable: Broad support for Azure VM types, including GPUs, and persisted
low-level customization, such as installing packages and drivers, make advanced
scenarios a breeze. You can also use setup scripts to automate customization.
You can install packages and add kernels to your compute instance.
The following tools and environments are already installed on the compute instance:

General tools & environments:
- Drivers: CUDA, cuDNN, NVIDIA, Blob FUSE
- Azure CLI
- Docker
- Nginx
- NCCL 2.0
- Protobuf

R tools & environments:
- R kernel (you can add RStudio or Posit Workbench, formerly RStudio Workbench,
  when you create the instance)

Python tools & environments:
- Anaconda Python
- Jupyter and extensions
- JupyterLab and extensions
- Azure Machine Learning SDK for Python (from PyPI): includes azure-ai-ml and many
  common azure extra packages. To see the full list, open a terminal window on your
  compute instance and run conda list -n azureml_py310_sdkv2 ^azure
- Azure Machine Learning Python samples
Python packages are all installed in the Python 3.8 - AzureML environment. The compute
instance has Ubuntu 20.04 as the base OS.
Accessing files
Notebooks and Python scripts are stored in the default storage account of your
workspace in Azure file share. These files are located under your "User files" directory.
This storage makes it easy to share notebooks between compute instances. The storage
account also keeps your notebooks safely preserved when you stop or delete a compute
instance.
The Azure file share account of your workspace is mounted as a drive on the compute
instance. This drive is the default working directory for Jupyter, Jupyter Labs, RStudio,
and Posit Workbench. This means that the notebooks and other files you create in
Jupyter, JupyterLab, RStudio, or Posit are automatically stored on the file share and
available to use in other compute instances as well.
The files in the file share are accessible from all compute instances in the same
workspace. Any changes to these files on the compute instance will be reliably persisted
back to the file share.
You can also clone the latest Azure Machine Learning samples to your folder under the
user files directory in the workspace file share.
Writing small files can be slower on network drives than writing to the compute instance
local disk itself. If you're writing many small files, try using a directory directly on the
compute instance, such as a /tmp directory. Note that these files won't be accessible
from other compute instances.
Don't store training data on the notebooks file share. You can use the /tmp directory on
the compute instance for your temporary data. However, don't write large data files to
the OS disk of the compute instance; the OS disk has 128-GB capacity. You can also store
temporary training data on the temporary disk mounted on /mnt. The temporary disk
size is based on the VM size chosen, so a larger VM size can store larger amounts of
data. Any software packages you install are saved on the OS disk of the compute
instance. Note that customer-managed key encryption is currently not supported for the
OS disk; the OS disk for a compute instance is encrypted with Microsoft-managed keys.
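The local-disk guidance above can be illustrated with a short sketch that writes many small files under a /tmp-style temporary directory instead of the mounted file share. The paths and file contents are illustrative only:

```python
import os
import tempfile


def write_small_files(base_dir: str, count: int = 100) -> list[str]:
    """Write `count` small files under base_dir and return their paths.

    On a compute instance, pointing base_dir at local disk (for example, a
    directory under /tmp) avoids the network round-trips that make many
    small writes slow on the mounted file share.
    """
    os.makedirs(base_dir, exist_ok=True)
    paths = []
    for i in range(count):
        path = os.path.join(base_dir, f"part-{i:04d}.txt")
        with open(path, "w") as f:
            f.write(f"record {i}\n")
        paths.append(path)
    return paths


# A temporary directory stands in for local scratch space on the instance.
with tempfile.TemporaryDirectory() as scratch:
    written = write_small_files(scratch, count=10)
    print(len(written))  # 10
```

Remember that anything written this way is local to one compute instance; copy final artifacts back to the file share or a datastore if other instances need them.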
Create
Follow the steps in Create resources you need to get started to create a basic compute
instance.
You can also use a setup script for an automated way to customize and configure the
compute instance.
The dedicated cores per region per VM family quota and total regional quota, which
applies to compute instance creation, is unified and shared with the Azure Machine
Learning training compute cluster quota. Stopping the compute instance doesn't release
quota, which ensures you'll be able to restart the compute instance. Don't stop the
compute instance through the OS terminal by doing a sudo shutdown.
Compute instance comes with P10 OS disk. Temp disk type depends on the VM size
chosen. Currently, it isn't possible to change the OS disk type.
Compute target
Compute instances can be used as a training compute target similar to Azure Machine
Learning compute training clusters. But a compute instance has only a single node,
while a compute cluster can have more nodes.
A compute instance:
You can use a compute instance as a local inferencing deployment target for test/debug
scenarios.
Tip
The compute instance has a 120-GB OS disk. If you run out of disk space and get into
an unusable state, clear at least 5 GB of disk space on the OS disk (mounted on /)
through the compute instance terminal by removing files or folders, and then run
sudo reboot. The temporary disk is freed after restart; you don't need to clear space
on the temp disk manually. To access the terminal, go to the compute list page or the
compute instance details page and select the Terminal link. You can check available
disk space by running df -h in the terminal. Clear at least 5 GB of space before
running sudo reboot. Don't stop or restart the compute instance through the studio
until 5 GB of disk space has been cleared. Auto shutdowns, including scheduled start
or stop as well as idle shutdowns, won't work if the compute instance disk is full.
Next steps
Create resources you need to get started.
Tutorial: Train your first ML model shows how to use a compute instance with an
integrated notebook.
What are compute targets in Azure Machine Learning?
Article • 06/15/2023
A compute target is a designated compute resource or environment where you run your
training script or host your service deployment. This location might be your local
machine or a cloud-based compute resource. Using compute targets makes it easy for
you to later change your compute environment without having to change your code.
The compute resources you use for your compute targets are attached to a workspace.
Compute resources other than the local machine are shared by users of the workspace.
Compute targets can be reused from one training job to the next. For example, after
you attach a remote VM to your workspace, you can reuse it for multiple jobs. For
machine learning pipelines, use the appropriate pipeline step for each compute target.
You can use any of the following resources for a training compute target for most jobs.
Not all resources can be used for automated machine learning, machine learning
pipelines, or designer. Azure Databricks can be used as a training resource for local runs
and machine learning pipelines, but not as a remote target for other training.
Training targets | Automated machine learning | Machine learning pipelines | Azure Machine Learning designer
Tip
The compute instance has a 120-GB OS disk. If you run out of disk space, use the
terminal to clear at least 1-2 GB before you stop or restart the compute instance.
The compute target you use to host your model will affect the cost and availability of
your deployed endpoint. Use this table to choose an appropriate compute target.
Note
When choosing a cluster SKU, first scale up and then scale out. Start with a machine
that has 150% of the RAM your model requires, profile the result and find a
machine that has the performance you need. Once you've learned that, increase the
number of machines to fit your need for concurrent inference.
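The sizing rule in this note can be sketched numerically. The VM names and RAM figures below are illustrative placeholders, not a catalog of what's available in your region:

```python
# Pick the smallest VM (by RAM) offering at least 150% of the model's RAM needs.
# The size table is illustrative; check the actual offerings in your region.
VM_SIZES = {
    "Standard_DS2_v2": 7,   # GB of RAM (illustrative values)
    "Standard_DS3_v2": 14,
    "Standard_DS4_v2": 28,
    "Standard_DS5_v2": 56,
}


def pick_vm(model_ram_gb: float, sizes: dict[str, int]) -> str:
    """Return the smallest VM size with at least 150% of the model's RAM."""
    required = model_ram_gb * 1.5  # scale up first: 150% of model RAM
    candidates = [(ram, name) for name, ram in sizes.items() if ram >= required]
    if not candidates:
        raise ValueError("No VM size is large enough; consider scaling out instead.")
    return min(candidates)[1]


print(pick_vm(8.0, VM_SIZES))  # needs >= 12 GB, so "Standard_DS3_v2"
```

After profiling on the chosen size, you would then scale out by increasing the node count to meet your concurrent-inference needs, as the note describes.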
Note
Container instances require the SDK or CLI v1 and are suitable only for small
models less than 1 GB in size.
Note
When created, these compute resources are automatically part of your workspace,
unlike other kinds of compute targets.
Note
For a compute cluster, make sure the minimum number of nodes is set to 0, or
use serverless compute (preview).
For a compute instance, enable idle shutdown.
Important
If your compute instance or compute clusters are based on any of these series,
recreate with another VM size before their retirement date to avoid service
disruption.
These series are retiring on August 31, 2023:
Azure NC-series
Azure NCv2-series
Azure ND-series
Azure NV- and NV_Promo series
Azure Av1-series
Azure HB-series
When you select a node size for a managed compute resource in Azure Machine
Learning, you can choose from among select VM sizes available in Azure. Azure offers a
range of sizes for Linux and Windows for different workloads. To learn more, see VM
types and sizes.
While Azure Machine Learning supports these VM series, they might not be available in
all Azure regions. To check whether VM series are available, see Products available by
region .
Note
Azure Machine Learning doesn't support all VM sizes that Azure Compute supports.
To list the available VM sizes, use one of the following methods:
REST API
If you use GPU-enabled compute targets, make sure that the correct CUDA drivers are
installed in the training environment. Use the following table to determine the correct
CUDA version to use:
In addition to ensuring the CUDA version and hardware are compatible, also ensure that
the CUDA version is compatible with the version of the machine learning framework you
are using:
For PyTorch, you can check the compatibility by visiting Pytorch's previous versions
page .
For Tensorflow, you can check the compatibility by visiting Tensorflow's build from
source page .
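A compatibility check like the one described above can be sketched as a simple lookup. The framework-to-CUDA table below is hypothetical; consult the PyTorch previous-versions page or the TensorFlow build matrix for real values:

```python
# Map (framework, version) pairs to the CUDA versions they were built against.
# These entries are hypothetical examples; verify against the official matrices.
COMPAT = {
    ("pytorch", "1.13"): {"11.6", "11.7"},
    ("tensorflow", "2.11"): {"11.2"},
}


def is_compatible(framework: str, fw_version: str, cuda_version: str) -> bool:
    """Return True if this framework version lists the given CUDA version."""
    return cuda_version in COMPAT.get((framework, fw_version), set())


print(is_compatible("pytorch", "1.13", "11.7"))  # True
print(is_compatible("pytorch", "1.13", "10.2"))  # False
```

Encoding the matrix this way lets an environment-validation script fail fast before a training job is submitted to a GPU target with mismatched drivers.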
Compute isolation
Azure Machine Learning compute offers VM sizes that are isolated to a specific
hardware type and dedicated to a single customer. Isolated VM sizes are best suited for
workloads that require a high degree of isolation from other customers' workloads for
reasons that include meeting compliance and regulatory requirements. Utilizing an
isolated size guarantees that your VM will be the only one running on that specific
server instance.
Standard_M128ms
Standard_F72s_v2
Standard_NC24s_v3
Standard_NC24rs_v3*
*RDMA capable
To learn more about isolation, see Isolation in the Azure public cloud.
Unmanaged compute
An unmanaged compute target is not managed by Azure Machine Learning. You create
this type of compute target outside Azure Machine Learning and then attach it to your
workspace. Unmanaged compute resources can require additional steps for you to
maintain or to improve performance for machine learning workloads.
Next steps
Learn how to:
In this article, learn how to create and manage Azure Machine Learning environments
using CLI v1. Use the environments to track and reproduce your projects' software
dependencies as they evolve. The Azure Machine Learning CLI v1 mirrors most of the
functionality of the Python SDK v1. You can use it to create and manage environments.
For a high-level overview of how environments work in Azure Machine Learning, see
What are ML environments? For information about managing environments in the Azure
Machine Learning studio, see Manage environments in the studio. For information
about configuring development environments, see Set up a Python development
environment for Azure Machine Learning.
Prerequisites
An Azure Machine Learning workspace
Important
The Azure CLI commands in this article require the azure-cli-ml , or v1, extension
for Azure Machine Learning. Support for the v1 extension will end on September
30, 2025. You will be able to install and use the v1 extension until that date.
Scaffold an environment
The following command scaffolds the files for a default environment definition in the
specified directory. These files are JSON files. They work like the corresponding class in
the SDK. You can use the files to create new environments that have custom settings.
Azure CLI
az ml environment scaffold -n myenv -d myenvdir
Register an environment
Run the following command to register an environment from a specified directory:
Azure CLI
az ml environment register -d myenvdir
List environments
Run the following command to list all registered environments:
Azure CLI
az ml environment list
Download an environment
To download a registered environment, use the following command:
Azure CLI
az ml environment download -n myenv -d downloaddir
Next steps
After you have a trained model, learn how and where to deploy models.
View the Environment class SDK reference.
Where to save and write files for Azure Machine Learning experiments
Article • 01/26/2023
In this article, you learn where to save input files, and where to write output files from
your experiments to prevent storage limit errors and experiment latency.
When you launch training jobs on a compute target, they are isolated from outside
environments. The purpose of this design is to ensure reproducibility and portability of
the experiment. If you run the same script twice, on the same or another compute
target, you receive the same results. With this design, you can treat compute targets as
stateless computation resources, each having no affinity to the jobs that have finished
running.
Azure Machine Learning runs training scripts by copying the entire source directory. If
you have sensitive data that you don't want to upload, use a .ignore file or don't include
it in the source directory. Instead, access your data by using a datastore.
The storage limit for experiment snapshots is 300 MB and/or 2000 files.
If you only need a couple data files and dependency scripts and can't use a
datastore, place the files in the same folder directory as your training script.
Specify this folder as your source_directory directly in your training script, or in
the code that calls your training script.
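Before submitting, you can sanity-check a candidate source directory against the snapshot limits described here (300 MB and/or 2000 files) with a quick local sketch:

```python
import os

SNAPSHOT_MAX_BYTES = 300 * 1024 * 1024  # 300 MB snapshot size limit
SNAPSHOT_MAX_FILES = 2000               # snapshot file-count limit


def snapshot_stats(source_directory: str) -> tuple[int, int]:
    """Return (total_bytes, file_count) for everything under source_directory."""
    total_bytes, file_count = 0, 0
    for root, _dirs, files in os.walk(source_directory):
        for name in files:
            total_bytes += os.path.getsize(os.path.join(root, name))
            file_count += 1
    return total_bytes, file_count


def fits_snapshot_limit(source_directory: str) -> bool:
    """True if the directory is within both snapshot limits."""
    total_bytes, file_count = snapshot_stats(source_directory)
    return total_bytes <= SNAPSHOT_MAX_BYTES and file_count <= SNAPSHOT_MAX_FILES
```

Note this check ignores any .gitignore or .amlignore exclusions, so it's a conservative upper bound on what the snapshot would contain.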
Storage limits of experiment snapshots
For experiments, Azure Machine Learning automatically makes an experiment snapshot
of your code based on the directory you suggest when you configure the job. This has a
total limit of 300 MB and/or 2000 files. If you exceed this limit, you'll see the following
error:
Python
To resolve this error, store your experiment files on a datastore. If you can't use a
datastore, the below table offers possible alternate solutions.
- Less than 2000 files and can't use a datastore: Override the snapshot size limit with
  azureml._restclient.snapshots_client.SNAPSHOT_MAX_SIZE_BYTES =
  'insert_desired_size'. This may take several minutes depending on the number and
  size of files.
- Must use a specific script directory: To prevent unnecessary files from being included
  in the snapshot, make an ignore file ( .gitignore or .amlignore ) in the directory. Add
  the files and directories to exclude to this file. For more information on the syntax to
  use inside this file, see syntax and patterns for .gitignore . The .amlignore file uses
  the same syntax. If both files exist, the .amlignore file is used and the .gitignore file
  is unused.
- Jupyter notebooks: Create a .amlignore file or move your notebook into a new,
  empty subdirectory and run your code again.
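The ignore-file approach can be approximated locally; fnmatch gives a rough sketch of the pattern matching (real .amlignore parsing follows gitignore semantics, which are richer), and the patterns below are illustrative:

```python
import fnmatch
import os

# Patterns you might put in an .amlignore file (illustrative examples).
IGNORE_PATTERNS = ["*.log", "data/*", "__pycache__/*"]


def is_ignored(relative_path: str, patterns: list[str]) -> bool:
    """Rough check of whether a path would be excluded by the patterns."""
    # Normalize separators so the check behaves the same on Windows.
    path = relative_path.replace(os.sep, "/")
    return any(fnmatch.fnmatch(path, pat) for pat in patterns)


print(is_ignored("train.log", IGNORE_PATTERNS))     # True
print(is_ignored("data/raw.csv", IGNORE_PATTERNS))  # True
print(is_ignored("score.py", IGNORE_PATTERNS))      # False
```

Running a check like this over your source directory before submitting helps confirm that large data files really will be excluded from the snapshot.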
Important
Two folders, outputs and logs, receive special treatment by Azure Machine Learning.
During training, when you write files to ./outputs and ./logs folders, the files will
automatically upload to your job history, so that you have access to them once
your job is finished.
For output such as status messages or scoring results, write files to the ./outputs
folder, so they are persisted as artifacts in job history. Be mindful of the number
and size of files written to this folder, as latency may occur when the contents are
uploaded to job history. If latency is a concern, writing files to a datastore is
recommended.
To save written files as logs in job history, write them to the ./logs folder. The logs are
uploaded in real time, so this method is suitable for streaming live updates from a
remote job.
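A minimal sketch of a training script following this convention (the folder names outputs and logs are the ones Azure Machine Learning treats specially; the metric values and file names are illustrative):

```python
import json
import os


def persist_results(metrics: dict, log_lines: list[str]) -> str:
    """Write final metrics to ./outputs and streaming logs to ./logs.

    Azure Machine Learning uploads ./outputs once the job finishes and
    streams ./logs in real time, so status messages belong in ./logs and
    final artifacts in ./outputs.
    """
    os.makedirs("outputs", exist_ok=True)
    os.makedirs("logs", exist_ok=True)

    metrics_path = os.path.join("outputs", "metrics.json")
    with open(metrics_path, "w") as f:
        json.dump(metrics, f)

    with open(os.path.join("logs", "progress.log"), "a") as f:
        for line in log_lines:
            f.write(line + "\n")
    return metrics_path


# Illustrative values only.
persist_results({"rmse": 53.8}, ["epoch 1 done", "epoch 2 done"])
```

Keeping bulky intermediate files out of both folders avoids the upload latency the paragraphs above warn about.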
Next steps
Learn more about accessing data from storage.
Learn more about Create compute targets for model training and deployment
Use private Python packages with Azure Machine Learning
Article • 03/02/2023
In this article, learn how to use private Python packages securely within Azure Machine
Learning. Use cases for private Python packages include:
You've developed a private package that you don't want to share publicly.
You want to use a curated repository of packages stored within an enterprise
firewall.
The recommended approach depends on whether you have few packages for a single
Azure Machine Learning workspace, or an entire repository of packages for all
workspaces within an organization.
The private packages are used through the Environment class. Within an environment,
you declare which Python packages to use, including private ones. To learn about
environments in Azure Machine Learning in general, see How to use environments.
Prerequisites
The Azure Machine Learning SDK for Python
An Azure Machine Learning workspace
Point the file path argument to a local wheel file and run the add_private_pip_wheel
command. The command returns a URL used to track the location of the package within
your workspace. Capture the storage URL and pass it to the add_pip_package() method.
Python
from azureml.core import Environment
from azureml.core.conda_dependencies import CondaDependencies

whl_url = Environment.add_private_pip_wheel(workspace=ws, file_path="my-custom.whl")

myenv = Environment(name="myenv")
conda_dep = CondaDependencies()
conda_dep.add_pip_package(whl_url)
myenv.python.conda_dependencies = conda_dep
Internally, the Azure Machine Learning service replaces the URL with a secure SAS URL,
so your wheel file is kept private and secure.
This approach uses a personal access token to authenticate against the repository. The
same approach is applicable to other repositories with token-based authentication, such
as private GitHub repositories.
1. Create a Personal Access Token (PAT) for your Azure DevOps instance. Set the
scope of the token to Packaging > Read.
2. Add the Azure DevOps URL and PAT as workspace properties, using the
Workspace.set_connection method.
Python
pat_token = input("Enter secret token")
ws.set_connection(name="connection-1",
                  category="PythonFeed",
                  target="https://pkgs.dev.azure.com/<MY-ORG>",
                  authType="PAT",
                  value=pat_token)
3. Create an Azure Machine Learning environment and add Python packages from
the feed.
Python
env = Environment(name="my-env")
cd = CondaDependencies()
cd.add_pip_package("<my-package>")
cd.set_pip_option("--extra-index-url https://pkgs.dev.azure.com/<MY-ORG>/_packaging/<MY-FEED>/pypi/simple")
env.python.conda_dependencies = cd
The environment is now ready to be used in training runs or web service endpoint
deployments. When building the environment, Azure Machine Learning service uses the
PAT to authenticate against the feed with the matching base URL.
To set up such private storage, see Secure an Azure Machine Learning workspace and
associated resources. You must also place the Azure Container Registry (ACR) behind the
VNet.
) Important
You must complete this step to be able to train or deploy models using the private
package repository.
After completing these configurations, you can reference the packages in the Azure
Machine Learning environment definition by their full URL in Azure blob storage.
Next steps
Learn more about enterprise security in Azure Machine Learning
Launch Visual Studio Code integrated with Azure Machine Learning (preview)
Article • 06/15/2023
In this article, you learn how to launch Visual Studio Code remotely connected to an
Azure Machine Learning compute instance. Use VS Code as your integrated
development environment (IDE) with the power of Azure Machine Learning resources.
Use VS Code in the browser with VS Code for the Web, or use the VS Code desktop
application.
) Important
This feature is currently in public preview. This preview version is provided without
a service-level agreement, and it's not recommended for production workloads.
Certain features might not be supported or might have constrained capabilities. For
more information, see Supplemental Terms of Use for Microsoft Azure
Previews .
There are two ways you can connect to a compute instance from Visual Studio Code. We
recommend the first approach.
1. Remote compute instance. You can open VS Code from your workspace either in
the browser (VS Code for the Web) or in the desktop application (VS Code Desktop).
We recommend VS Code for the Web, as you can do all your machine learning work
directly from the browser, without any required installations or dependencies.
2. Remote Jupyter Notebook server. This option allows you to set a compute
instance as a remote Jupyter Notebook server. This option is only available in VS
Code (Desktop).
2. Sign in to studio and select your workspace if it's not already open.
3. In the Manage preview features panel, scroll down and enable Connect compute
instances to Visual Studio Code for the Web.
VS Code for the Web provides you with a full-featured development environment
for building your machine learning projects, all from the browser and without
required installations or dependencies. And by connecting your Azure Machine
Learning compute instance, you get the rich and integrated development
experience VS Code offers, enhanced by the power of Azure Machine Learning.
Launch VS Code for the Web with one select from the Azure Machine Learning
studio, and seamlessly continue your work.
Sign in to Azure Machine Learning studio and follow the steps to launch a VS
Code (Web) browser tab, connected to your Azure Machine Learning compute
instance.
You can create the connection from either the Notebooks or Compute section of
Azure Machine Learning studio.
Notebooks
3. If the compute instance is stopped, select Start compute and wait until
it's running.
Compute
If you pick one of the click-out experiences, a new VS Code window is opened, and a
connection attempt made to the remote compute instance. When attempting to make
this connection, the following steps are taking place:
1. Authorization. Some checks are performed to make sure the user attempting to
make a connection is authorized to use the compute instance.
2. VS Code Remote Server is installed on the compute instance.
3. A WebSocket connection is established for real-time interaction.
Once the connection is established, it's persisted. A token is issued at the start of the
session, which gets refreshed automatically to maintain the connection with your
compute instance.
After you connect to your remote compute instance, use the editor to:
Author and manage files on your remote compute instance or file share.
Use the VS Code integrated terminal to run commands and applications on your
remote compute instance.
Debug your scripts and applications.
Use VS Code to manage your Git repositories.
Azure Machine Learning Visual Studio Code extension. For more information, see
the Azure Machine Learning Visual Studio Code Extension setup guide.
3. Choose Azure ML Compute Instances from the list of Jupyter server options.
4. Select your subscription from the list of subscriptions. If you previously
configured your default Azure Machine Learning workspace, this step is skipped.
5. Select your compute instance from the list. If you don't have one, select Create
new Azure Machine Learning Compute Instance and follow the prompts to create
one.
6. For the changes to take effect, reload Visual Studio Code.
) Important
At this point, you can continue to run cells in your Jupyter Notebook.
Tip
You can also work with Python script files (.py) containing Jupyter-like code cells.
For more information, see the Visual Studio Code Python interactive
documentation.
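As a minimal illustration of such a script (the file contents and variable names here are hypothetical, not from the documentation), VS Code treats lines beginning with # %% as cell delimiters, so each block can be run interactively like a notebook cell:

```python
# %% [markdown]
# # Notebook-style script
# VS Code renders "# %% [markdown]" blocks as formatted text and treats
# each "# %%" line as the start of a runnable code cell.

# %%
# This cell can be run on its own with "Run Cell" in VS Code.
values = [1, 2, 3, 4]
total = sum(values)
print(total)  # 10
```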
Next steps
Now that you've launched Visual Studio Code remotely connected to a compute
instance, you can prep your data, edit and debug your code, and submit training jobs
with the Azure Machine Learning extension.
To learn more about how to make the most of VS Code integrated with Azure Machine
Learning, see Work in VS Code remotely connected to a compute instance (preview).
Manage Azure Machine Learning
resources with the VS Code Extension
(preview)
Article • 04/04/2023
Learn how to manage Azure Machine Learning resources with the VS Code extension.
) Important
This feature is currently in public preview. This preview version is provided without
a service-level agreement, and it's not recommended for production workloads.
Certain features might not be supported or might have constrained capabilities. For
more information, see Supplemental Terms of Use for Microsoft Azure
Previews .
Prerequisites
Azure subscription. If you don't have one, sign up to try the free or paid version of
Azure Machine Learning .
Visual Studio Code. If you don't have it, install it .
Azure Machine Learning extension. Follow the Azure Machine Learning VS Code
extension installation guide to set up the extension.
Create resources
The quickest way to create resources is using the extension's toolbar.
Version resources
Some resources, such as environments and models, allow you to make changes to a
resource and store the different versions.
To version a resource:
1. Use the existing specification file that created the resource or follow the create
resources process to create a new specification file.
2. Increment the version number in the template.
3. Right-click the specification file and select AzureML: Execute YAML.
As long as the name of the updated resource is the same as the previous version, Azure
Machine Learning picks up the changes and creates a new version.
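The version bump in step 2 is just an edit to the version: field of the YAML specification. As an illustrative sketch only (the extension expects you to edit the file by hand; this helper is not part of the extension), the change could be automated like this:

```python
import re

def bump_version(spec_text: str) -> str:
    """Increment the integer 'version:' field in a YAML specification string.

    Only the first occurrence is changed, mirroring the manual edit in step 2.
    """
    return re.sub(
        r"(version:\s*)(\d+)",
        lambda m: f"{m.group(1)}{int(m.group(2)) + 1}",
        spec_text,
        count=1,
    )

# A hypothetical environment specification; the name must stay the same
# so Azure Machine Learning registers the result as a new version.
spec = "name: my-environment\nversion: 1\n"
print(bump_version(spec))
```

After saving the updated file, right-click it and select AzureML: Execute YAML as in step 3.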
Workspaces
For more information, see workspaces.
Create a workspace
1. In the Azure Machine Learning view, right-click your subscription node and select
Create Workspace.
2. A specification file appears. Configure the specification file.
3. Right-click the specification file and select AzureML: Execute YAML.
Alternatively, use the > Azure ML: Create Workspace command in the command palette.
Remove workspace
1. Expand the subscription node that contains your workspace.
2. Right-click the workspace you want to remove.
3. Select whether you want to remove:
Only the workspace: This option deletes only the workspace Azure resource.
The resource group, storage accounts, and any other resources the
workspace was attached to are still in Azure.
With associated resources: This option deletes the workspace and all
resources associated with it.
Alternatively, use the > Azure ML: Remove Workspace command in the command palette.
Datastores
The extension currently supports datastores of the following types:
Azure Blob
Azure Data Lake Gen 1
Azure Data Lake Gen 2
Azure File
Create a datastore
1. Expand the subscription node that contains your workspace.
2. Expand the workspace node you want to create the datastore under.
3. Right-click the Datastores node and select Create Datastore.
4. Choose the datastore type.
5. A specification file appears. Configure the specification file.
6. Right-click the specification file and select AzureML: Execute YAML.
Alternatively, use the > Azure ML: Create Datastore command in the command palette.
Manage a datastore
1. Expand the subscription node that contains your workspace.
2. Expand your workspace node.
3. Expand the Datastores node inside your workspace.
4. Right-click the datastore you want to:
Alternatively, use the > Azure ML: Unregister Datastore and > Azure ML: View
Datastore commands respectively in the command palette.
Datasets
The extension currently supports the following dataset types:
Create dataset
1. Expand the subscription node that contains your workspace.
2. Expand the workspace node you want to create the dataset under.
3. Right-click the Datasets node and select Create Dataset.
4. A specification file appears. Configure the specification file.
5. Right-click the specification file and select AzureML: Execute YAML.
Alternatively, use the > Azure ML: Create Dataset command in the command palette.
Manage a dataset
1. Expand the subscription node that contains your workspace.
2. Expand your workspace node.
3. Expand the Datasets node.
4. Right-click the dataset you want to:
View Dataset Properties. Lets you view metadata associated with a specific
dataset. If you have multiple versions of a dataset, you can choose to only
view the dataset properties of a specific version by expanding the dataset
node and performing the same steps described in this section on the version
of interest.
Preview dataset. View your dataset directly in the VS Code Data Viewer. Note
that this option is only available for tabular datasets.
Unregister dataset. Removes a dataset and all versions of it from your
workspace.
Alternatively, use the > Azure ML: View Dataset Properties and > Azure ML: Unregister
Dataset commands respectively in the command palette.
Environments
For more information, see environments.
Create environment
1. Expand the subscription node that contains your workspace.
2. Expand the workspace node you want to create the environment under.
3. Right-click the Environments node and select Create Environment.
4. A specification file appears. Configure the specification file.
5. Right-click the specification file and select AzureML: Execute YAML.
Alternatively, use the > Azure ML: Create Environment command in the command
palette.
Alternatively, use the > Azure ML: View Environment command in the command palette.
Experiments
For more information, see experiments.
Create job
The quickest way to create a job is by clicking the Create Job icon in the extension's
activity bar.
Alternatively, use the > Azure ML: Create Job command in the command palette.
View job
To view your job in Azure Machine Learning studio:
Alternatively, use the > Azure ML: View Experiment in Studio command in the
command palette.
Alternatively, use the > Azure ML: Download Outputs and > Azure ML: Download Logs
commands respectively in the command palette.
Compute instances
For more information, see compute instances.
Alternatively, use the > Azure ML: Create Compute command in the command palette.
Alternatively, use the > Azure ML: Stop Compute instance and > Azure ML: Restart
Compute instance commands respectively in the command palette.
Alternatively, use the AzureML: View Compute instance Properties command in the
command palette.
Alternatively, use the AzureML: Delete Compute instance command in the command
palette.
Compute clusters
For more information, see training compute targets.
Alternatively, use the > Azure ML: Create Compute command in the command palette.
Alternatively, use the > Azure ML: View Compute Properties command in the command
palette.
Alternatively, use the > Azure ML: Remove Compute command in the command palette.
Inference Clusters
For more information, see compute targets for inference.
Alternatively, use the > Azure ML: Remove Compute command in the command palette.
Attached Compute
For more information, see unmanaged compute.
Alternatively, use the > Azure ML: View Compute Properties and > Azure ML: Detach
Compute commands respectively in the command palette.
Models
For more information, see train machine learning models.
Create model
1. Expand the subscription node that contains your workspace.
2. Expand your workspace node.
3. Right-click the Models node in your workspace and select Create Model.
4. A specification file appears. Configure the specification file.
5. Right-click the specification file and select AzureML: Execute YAML.
Alternatively, use the > Azure ML: Create Model command in the command palette.
Alternatively, use the > Azure ML: View Model Properties command in the command
palette.
Download model
1. Expand the subscription node that contains your workspace.
2. Expand the Models node inside your workspace.
3. Right-click the model you want to download and select Download Model File.
Alternatively, use the > Azure ML: Download Model File command in the command
palette.
Delete a model
1. Expand the subscription node that contains your workspace.
2. Expand the Models node inside your workspace.
3. Right-click the model you want to delete and select Remove Model.
4. A prompt appears confirming you want to remove the model. Select Ok.
Alternatively, use the > Azure ML: Remove Model command in the command palette.
Endpoints
For more information, see endpoints.
Create endpoint
1. Expand the subscription node that contains your workspace.
2. Expand your workspace node.
3. Right-click the Endpoints node in your workspace and select Create Endpoint.
4. Choose your endpoint type.
5. A specification file appears. Configure the specification file.
6. Right-click the specification file and select AzureML: Execute YAML.
Alternatively, use the > Azure ML: Create Endpoint command in the command palette.
Delete endpoint
1. Expand the subscription node that contains your workspace.
2. Expand the Endpoints node inside your workspace.
3. Right-click the deployment you want to remove and select Remove Service.
4. A prompt appears confirming you want to remove the service. Select Ok.
Alternatively, use the > Azure ML: Remove Service command in the command palette.
Alternatively, use the > Azure ML: View Service Properties command in the command
palette.
Next steps
Train an image classification model with the VS Code extension.
MLflow and Azure Machine Learning
(v1)
Article • 07/06/2023
MLflow is an open-source library for managing the life cycle of your machine learning
experiments. MLflow's tracking URI and logging API are collectively known as MLflow
Tracking. This component of MLflow logs and tracks your training run metrics and
model artifacts, no matter where your experiment's environment is: on your computer,
on a remote compute target, on a virtual machine, or in an Azure Databricks cluster.
Together, MLflow Tracking and Azure Machine Learning allow you to track an
experiment's run metrics and store model artifacts in your Azure Machine Learning
workspace.
MLflow Tracking offers metric logging and artifact storage functionalities that are
otherwise available only through the Azure Machine Learning Python SDK.
| Capability | MLflow Tracking and deployment | Azure Machine Learning Python SDK | Azure Machine Learning CLI | Azure Machine Learning studio |
| --- | --- | --- | --- | --- |
| Manage a workspace | | ✓ | ✓ | ✓ |
| Log metrics | ✓ | ✓ | | |
| Upload artifacts | ✓ | ✓ | | |
| View metrics | ✓ | ✓ | ✓ | ✓ |
| Manage compute | | ✓ | ✓ | ✓ |
| Deploy models | ✓ | ✓ | ✓ | ✓ |
| Monitor model performance | ✓ | | | |
Track experiments
With MLflow Tracking, you can connect Azure Machine Learning as the back end of your
MLflow experiments. You can then do the following tasks:
Track and log experiment metrics and artifacts in your Azure Machine Learning
workspace. If you already use MLflow Tracking for your experiments, the
workspace provides a centralized, secure, and scalable location to store training
metrics and models. Learn more at Track machine learning models with MLflow
and Azure Machine Learning.
Track and manage models in MLflow and the Azure Machine Learning model
registry.
) Important
Items marked (preview) in this article are currently in public preview. The preview
version is provided without a service level agreement, and it's not recommended
for production workloads. Certain features might not be supported or might have
constrained capabilities. For more information, see Supplemental Terms of Use for
Microsoft Azure Previews .
You can use MLflow Tracking to submit training jobs with MLflow Projects and Azure
Machine Learning back-end support.
2 Warning
Support for MLflow Projects in Azure Machine Learning will end on September 30,
2023. You'll be able to submit MLflow Projects ( MLproject files) to Azure Machine
Learning until that date.
We recommend that you transition to Azure Machine Learning Jobs, using either
the Azure CLI or the Azure Machine Learning SDK for Python (v2) before September
2026, when MLflow Projects will be fully retired in Azure Machine Learning. For
more information on Azure Machine Learning jobs, see Track ML experiments and
models with MLflow.
You can submit jobs locally with Azure Machine Learning tracking or migrate your runs
to the cloud via Azure Machine Learning compute.
Learn more at Train machine learning models with MLflow projects and Azure Machine
Learning.
Next steps
Track machine learning models with MLflow and Azure Machine Learning
Train machine learning models with MLflow projects and Azure Machine Learning
(preview)
Track Azure Databricks runs with MLflow
Deploy models with MLflow
Data in Azure Machine Learning v1
Article • 05/31/2023
Azure Machine Learning makes it easy to connect to your data in the cloud. It provides
an abstraction layer over the underlying storage service, so you can securely access and
work with your data without having to write code specific to your storage type. Azure
Machine Learning also provides the following data capabilities:
Data workflow
When you're ready to use the data in your cloud-based storage solution, we
recommend the following data delivery workflow. This workflow assumes you have an
Azure storage account and data in a cloud-based storage service in Azure.
3. To use that dataset in your machine learning experiment, you can either mount it
to your experiment's compute target or download it onto that compute target.
4. Create dataset monitors for your model output dataset to detect for data drift.
5. If data drift is detected, update your input dataset and retrain your model
accordingly.
The following diagram provides a visual demonstration of this recommended workflow.
Tip
Because datasets are lazily evaluated, and the data remains in its existing location, you
incur no extra storage cost and don't risk the integrity of your data sources.
To interact with your data in storage, create a dataset to package your data into a
consumable object for machine learning tasks. Register the dataset to your workspace
to share and reuse it across different experiments without data ingestion complexities.
Datasets can be created from local files, public urls, Azure Open Datasets , or Azure
storage services via datastores.
Azure Machine Learning gives you a central location to create, manage, and monitor
labeling projects. Labeling projects help coordinate the data, labels, and team members,
allowing you to more efficiently manage the labeling tasks. Currently supported tasks
are image classification, either multi-label or multi-class, and object identification using
bounding boxes.
Create an image labeling project or text labeling project, and output a dataset for use in
machine learning experiments.
See the Create a dataset monitor article, to learn more about how to detect and alert to
data drift on new data in a dataset.
Next steps
Create a dataset in Azure Machine Learning studio or with the Python SDK using
these steps.
Try out dataset training examples with our sample notebooks .
Network data access with Azure
Machine Learning studio
Article • 04/04/2023
Data access is complex and it's important to recognize that there are many pieces to it.
For example, accessing data from Azure Machine Learning studio is different than using
the SDK. When using the SDK on your local development environment, you're directly
accessing data in the cloud. When using studio, you aren't always directly accessing the
data store from your client. Studio relies on the workspace to access data on your
behalf.
) Important
The information in this article is intended for Azure administrators who are creating
the infrastructure required for an Azure Machine Learning solution.
Tip
Studio only supports reading data from the following datastore types in a VNet:
Data access
In general, data access from studio involves the following checks:
1. Who is accessing?
Are the credentials correct? If so, does the service principal, managed
identity, etc., have the necessary permissions on the storage? Permissions are
granted using Azure role-based access controls (Azure RBAC).
Reader of the storage account reads metadata of the storage.
Storage Blob Data Reader reads data within a blob container.
Contributor allows write access to a storage account.
More roles may be required depending on the type of storage.
5. Where is this operation being run; compute resources in your Azure subscription
or resources hosted in a Microsoft subscription?
All calls to dataset and datastore services (except the "Generate Profile"
option) use resources hosted in a Microsoft subscription to run the
operations.
Jobs, including the "Generate Profile" option for datasets, run on a compute
resource in your subscription, and access the data from there. So the
compute identity needs permission to the storage rather than the identity of
the user submitting the job.
The following diagram shows the general flow of a data access call. In this example, a
user is trying to make a data access call through a machine learning workspace, without
using any compute resource.
Scenarios and identities
The following table lists what identities should be used for specific scenarios:
Tip
If you need to access data from outside Azure Machine Learning, such as using
Azure Storage Explorer, user identity is probably what is used. Consult the
documentation for the tool or service you are using for specific information. For
more information on how Azure Machine Learning works with data, see Identity-
based data access to storage services on Azure.
For more information, see Use Azure Machine Learning studio in an Azure Virtual
Network.
See the following sections for information on limitations when using Azure Storage
Account with your workspace in a VNet.
If the storage account uses a service endpoint, the workspace private endpoint
and storage service endpoint must be in the same subnet of the VNet.
If the storage account uses a private endpoint, the workspace private endpoint
and storage service endpoint must be in the same VNet. In this case, they can be in
different subnets.
To use Azure RBAC, follow the steps in the Datastore: Azure Storage Account section of
the 'Use Azure Machine Learning studio in an Azure Virtual Network' article. Data Lake
Storage Gen2 is based on Azure Storage, so the same steps apply when using Azure
RBAC.
To use ACLs, the managed identity of the workspace can be assigned access just like any
other security principal. For more information, see Access control lists on files and
directories.
After you create a SQL contained user, grant permissions to it by using the GRANT T-
SQL command.
Both options allow the possibility of services outside your subscription connecting
to your SQL Database. Make sure that your SQL login and user permissions limit
access to authorized users only.
Allow Azure services and resources to access the Azure SQL Database server.
Enabling this setting allows all connections from Azure, including connections from
the subscriptions of other customers, to your database server.
For information on enabling this setting, see IP firewall rules - Azure SQL Database
and Synapse Analytics.
Allow the IP address range of the Azure Machine Learning service in Firewalls
and virtual networks for the Azure SQL Database. Allowing the IP addresses
through the firewall limits connections to the Azure Machine Learning service for
a region.
2 Warning
The IP ranges for the Azure Machine Learning service may change over time.
There is no built-in way to automatically update the firewall rules when the IPs
change.
To get a list of the IP addresses for Azure Machine Learning, download the Azure IP
Ranges and Service Tags and search the file for AzureMachineLearning.<region> ,
where <region> is the Azure region that contains your Azure Machine Learning
workspace.
To add the IP addresses to your Azure SQL Database, see IP firewall rules - Azure
SQL Database and Synapse Analytics.
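The lookup described above (download the Service Tags file, then search for AzureMachineLearning.<region>) can be sketched as a small helper. The JSON excerpt below is hypothetical sample data with placeholder prefixes, not real Azure ranges; the structure follows the published Service Tags file format:

```python
# Hypothetical excerpt of the downloaded "Azure IP Ranges and Service Tags"
# JSON. The address prefixes are placeholders, not real Azure ranges.
service_tags = {
    "values": [
        {"name": "AzureMachineLearning.EastUS",
         "properties": {"addressPrefixes": ["20.42.0.0/24"]}},
        {"name": "Storage.EastUS",
         "properties": {"addressPrefixes": ["52.239.0.0/17"]}},
    ]
}

def aml_prefixes(tags: dict, region: str) -> list:
    """Return the address prefixes listed under AzureMachineLearning.<region>."""
    target = f"AzureMachineLearning.{region}"
    return [prefix
            for value in tags["values"] if value["name"] == target
            for prefix in value["properties"]["addressPrefixes"]]

# Each returned prefix would be added to the SQL Database firewall rules.
print(aml_prefixes(service_tags, "EastUS"))
```

Because the IP ranges can change over time, this search needs to be repeated against a freshly downloaded file whenever you update the firewall rules.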
Next steps
For information on enabling studio in a network, see Use Azure Machine Learning studio
in an Azure Virtual Network.
Connect to storage services on Azure with datastores
Article • 04/04/2023
In this article, learn how to connect to data storage services on Azure with Azure Machine Learning datastores and the Azure Machine
Learning Python SDK.
Datastores securely connect to your storage service on Azure without putting your authentication credentials or the integrity of your
original data source at risk. They store connection information, like your subscription ID and token authorization, in the Key Vault
associated with the workspace, so you can securely access your storage without having to hard-code it in your scripts. You can create
datastores that connect to these Azure storage solutions.
To understand where datastores fit in Azure Machine Learning's overall data access workflow, see the Securely access data article.
For a low code experience, see how to use the Azure Machine Learning studio to create and register datastores.
Tip
This article assumes you want to connect to your storage service with credential-based authentication, like a service
principal or a shared access signature (SAS) token. Keep in mind that if credentials are registered with datastores, all users with the
workspace Reader role are able to retrieve them. Learn more about the workspace Reader role.
If this is a concern, learn how to Connect to storage services with identity-based access.
Prerequisites
An Azure subscription. If you don't have an Azure subscription, create a free account before you begin. Try the free or paid version of
Azure Machine Learning .
Either create an Azure Machine Learning workspace or use an existing one via the Python SDK.
Import the Workspace and Datastore classes, and load your subscription information from the file config.json using the function
from_config() . This looks for the JSON file in the current directory by default, but you can also specify a path parameter to point to
the file.
Python
import azureml.core
from azureml.core import Workspace, Datastore
ws = Workspace.from_config()
When you create a workspace, an Azure blob container and an Azure file share are automatically registered as datastores to the
workspace. They're named workspaceblobstore and workspacefilestore , respectively. The workspaceblobstore is used to store
workspace artifacts and your machine learning experiment logs. It's also set as the default datastore and can't be deleted from the
workspace. The workspacefilestore is used to store notebooks and R scripts authorized via compute instance.
7 Note
Azure Machine Learning designer will create a datastore named azureml_globaldatasets automatically when you open a sample
in the designer homepage. This datastore only contains sample datasets. Please do not use this datastore for any confidential
data access.
Tip
For unsupported storage solutions (those not listed in the table below), you may run into issues connecting and working with your
data. We suggest you move your data to a supported Azure storage solution. Doing this may also help with additional scenarios, like
saving data egress cost during ML experiments.
Storage type | Authentication type | Azure Machine Learning studio | Azure Machine Learning Python SDK | Azure Machine Learning CLI | Azure Machine Learning REST API
Storage guidance
We recommend creating a datastore for an Azure Blob container. Both standard and premium storage are available for blobs. Although
premium storage is more expensive, its faster throughput speeds might improve the speed of your training runs, particularly if you train
against a large dataset. For information about the cost of storage accounts, see the Azure pricing calculator .
Azure Data Lake Storage Gen2 is built on top of Azure Blob storage and designed for enterprise big data analytics. A fundamental part of
Data Lake Storage Gen2 is the addition of a hierarchical namespace to Blob storage. The hierarchical namespace organizes objects/files into
a hierarchy of directories for efficient data access.
7 Note
This guidance also applies to datastores created with identity-based data access.
Virtual network
Azure Machine Learning requires extra configuration steps to communicate with a storage account that is behind a firewall or within a
virtual network. If your storage account is behind a firewall, you can add your client's IP address to an allowlist via the Azure portal.
Azure Machine Learning can receive requests from clients outside of the virtual network. To ensure that the entity requesting data from the
service is safe, and to enable data to be displayed in your workspace, use a private endpoint with your workspace.
For Python SDK users, to access your data via your training script on a compute target, the compute target needs to be inside the same
virtual network and subnet of the storage. You can use a compute instance/cluster in the same virtual network.
For Azure Machine Learning studio users, several features rely on the ability to read data from a dataset, such as dataset previews, profiles,
and automated machine learning. For these features to work with storage behind virtual networks, use a workspace managed identity in
the studio to allow Azure Machine Learning to access the storage account from outside the virtual network.
7 Note
If your data storage is an Azure SQL Database behind a virtual network, be sure to set Deny public access to No via the Azure portal
to allow Azure Machine Learning to access the storage account.
Access validation
2 Warning
Cross tenant access to storage accounts is not supported. If cross tenant access is needed for your scenario, please reach out to the
Azure Machine Learning Data Support team alias at amldatasupport@microsoft.com for assistance with a custom code solution.
As part of the initial datastore creation and registration process, Azure Machine Learning automatically validates that the underlying
storage service exists and that the user-provided principal (username, service principal, or SAS token) has access to the specified storage.
After datastore creation, this validation is only performed for methods that require access to the underlying storage container, not each
time datastore objects are retrieved. For example, validation happens if you want to download files from your datastore; but if you just want
to change your default datastore, then validation doesn't happen.
To authenticate your access to the underlying storage service, you can provide either your account key, shared access signatures (SAS)
tokens, or service principal in the corresponding register_azure_*() method of the datastore type you want to create. The storage type
matrix lists the supported authentication types that correspond to each datastore type.
You can find account key, SAS token, and service principal information on your Azure portal .
If you plan to use an account key or SAS token for authentication, select Storage Accounts on the left pane, and choose the storage
account that you want to register.
The Overview page provides information such as the account name, container, and file share name.
For account keys, go to Access keys on the Settings pane.
For SAS tokens, go to Shared access signatures on the Settings pane.
If you plan to use a service principal for authentication, go to your App registrations and select which app you want to use.
Its corresponding Overview page will contain required information like tenant ID and client ID.
) Important
If you need to change your access keys for an Azure Storage account (account key or SAS token), be sure to sync the new credentials
with your workspace and the datastores connected to it. Learn how to sync your updated credentials.
Permissions
For Azure blob container and Azure Data Lake Gen 2 storage, make sure your authentication credentials have Storage Blob Data Reader
access. Learn more about Storage Blob Data Reader. An account SAS token defaults to no permissions.
For data read access, your authentication credentials must have a minimum of list and read permissions for containers and objects.
For data write access, write and add permissions are also required.
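The minimum permissions above can be expressed as simple sets. This is an illustrative client-side check only (the names mirror the SAS permission flags described here; nothing in the SDK performs this check for you):

```python
# Minimum permissions for data read and data write access, as described above.
READ_PERMISSIONS = {"list", "read"}
WRITE_PERMISSIONS = READ_PERMISSIONS | {"write", "add"}

def missing_for_write(granted: set) -> set:
    """Return which permissions a credential still lacks for data write access."""
    return WRITE_PERMISSIONS - set(granted)

# A read-only SAS token would still need 'write' and 'add' for write access.
print(sorted(missing_for_write({"list", "read"})))
```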
Within this section are examples for how to create and register a datastore via the Python SDK for the following storage types. The
parameters provided in these examples are the required parameters to create and register a datastore.
To create datastores for other supported storage services, see the reference documentation for the applicable register_azure_* methods.
If you prefer a low code experience, see Connect to data with Azure Machine Learning studio.
) Important
If you unregister and re-register a datastore with the same name, and it fails, the Azure Key Vault for your workspace may not have
soft-delete enabled. By default, soft-delete is enabled for the key vault instance created by your workspace, but it may not be enabled
if you used an existing key vault or have a workspace created prior to October 2020. For information on how to enable soft-delete, see
Turn on Soft Delete for an existing key vault.
7 Note
Datastore name should only consist of lowercase letters, digits and underscores.
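The naming rule in the note can be checked up front with a short regular expression. This is a convenience check for your own scripts, not something the SDK exposes:

```python
import re

# Lowercase letters, digits, and underscores only, per the note above.
DATASTORE_NAME = re.compile(r"^[a-z0-9_]+$")

def is_valid_datastore_name(name: str) -> bool:
    """Return True if the name satisfies the datastore naming rule."""
    return bool(DATASTORE_NAME.match(name))

print(is_valid_datastore_name("blob_datastore_1"))   # True
print(is_valid_datastore_name("My-Blob-Datastore"))  # False
```

Validating the name before calling a register_azure_*() method gives a clearer error than a failed registration call.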
The following code creates and registers the blob_datastore_name datastore to the ws workspace. This datastore accesses the
my-container-name blob container on the my-account-name storage account, by using the provided account access key. Review the storage
access & permissions section for guidance on virtual network scenarios, and where to find required authentication credentials.
Python
blob_datastore = Datastore.register_azure_blob_container(
    workspace=ws,
    datastore_name=blob_datastore_name,
    container_name=container_name,
    account_name=account_name,
    account_key=account_key)
The following code creates and registers the file_datastore_name datastore to the ws workspace. This datastore accesses the
my-fileshare-name file share on the my-account-name storage account, by using the provided account access key. Review the storage access &
permissions section for guidance on virtual network scenarios, and where to find required authentication credentials.
Python
file_datastore = Datastore.register_azure_file_share(
    workspace=ws,
    datastore_name=file_datastore_name,
    file_share_name=file_share_name,
    account_name=account_name,
    account_key=account_key)
The following code creates and registers the adlsgen2_datastore_name datastore to the ws workspace. This datastore accesses the file
system test in the account_name storage account, by using the provided service principal credentials. Review the storage access &
permissions section for guidance on virtual network scenarios, and where to find required authentication credentials.
Python
adlsgen2_datastore_name = 'adlsgen2datastore'
adlsgen2_datastore = Datastore.register_azure_data_lake_gen2(
    workspace=ws,
    datastore_name=adlsgen2_datastore_name,
    account_name=account_name,          # ADLS Gen2 account name
    filesystem='test',                  # ADLS Gen2 filesystem
    tenant_id=tenant_id,                # tenant ID of the service principal
    client_id=client_id,                # client ID of the service principal
    client_secret=client_secret)        # secret of the service principal
For information on using these templates, see Use an Azure Resource Manager template to create a workspace for Azure Machine Learning.
VS Code extension
If you prefer to create and manage datastores using the Azure Machine Learning VS Code extension, visit the VS Code resource
management how-to guide to learn more.
With datasets, you can download or mount files of any format from Azure storage services for model training on a compute target. Learn
more about how to train ML models with datasets.
To get the list of datastores registered with a given workspace, you can use the datastores property on a workspace object:
Python
# List all datastores registered in the current workspace
datastores = ws.datastores
for name, datastore in datastores.items():
    print(name, datastore.datastore_type)
To get the workspace's default datastore, use the get_default_datastore() method:
Python
datastore = ws.get_default_datastore()
You can also change the default datastore with the following code. This ability is only supported via the SDK.
Python
ws.set_default_datastore(new_default_datastore)
For situations where the SDK doesn't provide access to datastores, you might be able to create custom code by using the relevant Azure
SDK to access the data. For example, the Azure Storage SDK for Python is a client library that you can use to access data stored in blobs
or files.
Azure Data Factory provides efficient and resilient data transfer with more than 80 prebuilt connectors at no extra cost. These connectors
include Azure data services, on-premises data sources, Amazon S3 and Redshift, and Google BigQuery.
Next steps
Create an Azure machine learning dataset
Train a model
Deploy a model
Connect to storage by using identity-based data access with SDK v1
Article • 03/02/2023
In this article, you'll learn how to connect to storage services on Azure, with identity-
based data access and Azure Machine Learning datastores via the Azure Machine
Learning SDK for Python.
To create datastores with identity-based data access via the Azure Machine Learning
studio UI, see Connect to data with the Azure Machine Learning studio.
To create datastores that use credential-based authentication, like access keys or service
principals, see Connect to storage services on Azure.
Warning
Your authentication credentials are kept in a datastore, which is used to ensure you have
permission to access the storage service. When these credentials are registered via
datastores, any user with the workspace Reader role can retrieve them. That scale of
access can be a security concern for some organizations. Learn more about the
workspace Reader role.
When you use identity-based data access, Azure Machine Learning prompts you for
your Azure Active Directory token for data access authentication, instead of keeping
your credentials in the datastore. That approach allows for data access management at
the storage level and keeps credentials confidential.
Prerequisites
An Azure subscription. If you don't have an Azure subscription, create a free
account before you begin. Try the free or paid version of Azure Machine
Learning .
An Azure storage account with a supported storage type. These storage types are
supported:
Azure Blob Storage
Azure Data Lake Storage Gen1
Azure Data Lake Storage Gen2
Azure SQL Database
Either create an Azure Machine Learning workspace or use an existing one via the
Python SDK.
See Work with virtual networks for details on how to connect to data storage behind
virtual networks.
In the following code, notice the absence of authentication parameters like sas_token ,
account_key , subscription_id , and the service principal client_id . This omission
indicates that Azure Machine Learning will use identity-based data access for
authentication. Creation of datastores typically happens interactively in a notebook or
via the studio. So your Azure Active Directory token is used for data access
authentication.
Python
# Create blob datastore without credentials.
blob_datastore = Datastore.register_azure_blob_container(workspace=ws,
                                                         datastore_name='credentialless_blob',
                                                         container_name='my_container_name',
                                                         account_name='my_account_name')
Python
# Create Azure Data Lake Storage Gen1 datastore without credentials.
adls1_dstore = Datastore.register_azure_data_lake(workspace=ws,
                                                  datastore_name='credentialless_adls1',
                                                  store_name='adls_storage')
Python
# Create Azure Data Lake Storage Gen2 datastore without credentials.
adls2_dstore = Datastore.register_azure_data_lake_gen2(workspace=ws,
                                                       datastore_name='credentialless_adls2',
                                                       filesystem='tabular',
                                                       account_name='myadls2')
The following code creates and registers the credentialless_sqldb datastore to the ws
workspace and assigns it to the variable, sqldb_dstore . This datastore accesses the
database mydb in the myserver SQL DB server.
Python
sqldb_dstore = Datastore.register_azure_sql_database(workspace=ws,
                                                     datastore_name='credentialless_sqldb',
                                                     server_name='myserver',
                                                     database_name='mydb')
Warning
Cross tenant access to storage accounts is not supported. If cross tenant access is
needed for your scenario, please reach out to the Azure Machine Learning Data
Support team alias at amldatasupport@microsoft.com for assistance with a custom
code solution.
Identity-based data access supports connections to only the following storage services.
To access these storage services, you must have at least Storage Blob Data Reader
access to the storage account. Only storage account owners can change your access
level via the Azure portal.
If you prefer to not use your user identity (Azure Active Directory), you can also grant a
workspace managed-system identity (MSI) permission to create the datastore. To do so,
you must have Owner permissions to the storage account and add the
grant_workspace_access=True parameter to your data register method.
If you're training a model on a remote compute target and want to access the data for
training, the compute identity must be granted at least the Storage Blob Data Reader
role from the storage service. Learn how to set up managed identity on a compute
cluster.
You can configure storage accounts to allow access only from within specific virtual
networks. This configuration requires more steps, to ensure that data doesn't leak
outside of the network. This behavior is the same for credential-based data access. For
more information, see How to configure virtual network scenarios.
If your storage account has virtual network settings, they dictate the needed identity type and permissions access. For example, for data preview and data profile, the virtual network settings determine what type of identity is used to authenticate data access.
In scenarios where only certain IPs and subnets are allowed to access the storage,
then Azure Machine Learning uses the workspace MSI to accomplish data previews
and profiles.
If your storage is ADLS Gen 2 or Blob and has virtual network settings, you can use either user identity or workspace MSI, depending on the datastore settings defined during creation.
If the virtual network setting is "Allow Azure services on the trusted services list to access this storage account", then the workspace MSI is used.
Important
Datasets using identity-based data access are not supported for automated ML
experiments.
Datasets package your data into a lazily evaluated consumable object for machine
learning tasks like training. Also, with datasets you can download or mount files of any
format from Azure storage services like Azure Blob Storage and Azure Data Lake Storage
to a compute target.
To create a dataset, you can reference paths from datastores that also use identity-
based data access.
If your underlying storage account type is Blob or ADLS Gen 2, your user identity needs the Blob Reader role.
If your underlying storage is ADLS Gen 1, permissions can be set via the storage's Access Control List (ACL).
In the following example, blob_datastore already exists and uses identity-based data
access.
Python
blob_dataset = Dataset.Tabular.from_delimited_files(path=(blob_datastore, 'test.csv'))
Another option is to skip datastore creation and create datasets directly from storage
URLs. This functionality currently supports only Azure blobs and Azure Data Lake
Storage Gen1 and Gen2. For creation based on storage URL, only the user identity is
needed to authenticate.
Python
blob_dset = Dataset.File.from_files('https://myblob.blob.core.windows.net/may/keras-mnist-fashion/')
When you submit a training job that consumes a dataset created with identity-based
data access, the managed identity of the training compute is used for data access
authentication. Your Azure Active Directory token isn't used. For this scenario, ensure
that the managed identity of the compute is granted at least the Storage Blob Data
Reader role from the storage service. For more information, see Set up managed identity
on compute clusters.
Next steps
Create an Azure Machine Learning dataset
Train with datasets
Create a datastore with key-based data access
Create Azure Machine Learning datasets
Article • 04/13/2023
In this article, you learn how to create Azure Machine Learning datasets to access data for your local or remote
experiments with the Azure Machine Learning Python SDK. To understand where datasets fit in Azure Machine
Learning's overall data access workflow, see the Securely access data article.
By creating a dataset, you create a reference to the data source location, along with a copy of its metadata.
Because the data remains in its existing location, you incur no extra storage cost, and don't risk the integrity of
your data sources. Also datasets are lazily evaluated, which aids in workflow performance speeds. You can create
datasets from datastores, public URLs, and Azure Open Datasets.
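Lazy evaluation here means that work is deferred until the data is actually materialized. Conceptually, it behaves like a Python generator; this is a plain-Python analogy of the idea, not SDK code (the names and fake row values are illustrative only):

```python
def lazy_rows(paths):
    """Generator that defers 'reading' each path until iteration,
    the way a lazily evaluated dataset defers loading its source data."""
    for p in paths:
        # in a real dataset, the file at p would only be opened and parsed here
        yield f"row-from-{p}"

rows = lazy_rows(['a.csv', 'b.csv'])  # nothing is read yet
first = next(rows)                    # work happens on demand, one item at a time
print(first)
```

Because nothing is read until iteration, defining a dataset is cheap regardless of how large the underlying data is.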
For a low-code experience, Create Azure Machine Learning datasets with the Azure Machine Learning studio.
Seamlessly access data during model training without worrying about connection strings or data paths. Learn
more about how to train with datasets.
Important
Items in this article marked as "preview" are currently in public preview. The preview version is provided
without a service level agreement, and it's not recommended for production workloads. Certain features
might not be supported or might have constrained capabilities. For more information, see Supplemental
Terms of Use for Microsoft Azure Previews .
Prerequisites
To create and work with datasets, you need:
An Azure subscription. If you don't have one, create a free account before you begin. Try the free or paid
version of Azure Machine Learning .
The Azure Machine Learning SDK for Python installed, which includes the azureml-datasets package.
Create an Azure Machine Learning compute instance, which is a fully configured and managed
development environment that includes integrated notebooks and the SDK already installed.
OR
Work on your own Jupyter notebook and install the SDK yourself.
Note
Some dataset classes have dependencies on the azureml-dataprep package, which is only compatible with
64-bit Python. If you are developing on Linux, these classes rely on .NET Core 2.1, and are only supported on
specific distributions. For more information on the supported distros, see the .NET Core 2.1 column in the
Install .NET on Linux article.
Important
While the package may work on older versions of Linux distros, we don't recommend using a distro that is out of mainstream support. Distros that are out of mainstream support may have security vulnerabilities, as they don't receive the latest updates. We recommend using the latest supported version of your distro that is compatible with .NET Core 2.1.
If your data is compressed, it can expand further; 20 GB of relatively sparse data stored in compressed parquet
format can expand to ~800 GB in memory. Since Parquet files store data in a columnar format, if you only need
half of the columns, then you only need to load ~400 GB in memory.
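These sizing figures are simple multiplicative estimates. As an illustrative sketch (the function and the 40x expansion factor are assumptions chosen to reproduce the numbers above, not an SDK API):

```python
def estimate_in_memory_gb(compressed_gb, expansion_factor, column_fraction):
    """Rough estimate of in-memory size when loading a fraction of the
    columns from a columnar format such as Parquet."""
    return compressed_gb * expansion_factor * column_fraction

# 20 GB of sparse compressed Parquet expanding ~40x in memory
full = estimate_in_memory_gb(20, 40, 1.0)   # all columns -> ~800 GB
half = estimate_in_memory_gb(20, 40, 0.5)   # half the columns -> ~400 GB
print(full, half)
```

The key point is that, because the format is columnar, the column fraction scales memory linearly: reading fewer columns reads proportionally less data.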
Dataset types
There are two dataset types, based on how users consume them in training: FileDatasets and TabularDatasets. Both types can be used in Azure Machine Learning training workflows that involve estimators, AutoML, HyperDrive, and pipelines.
FileDataset
A FileDataset references single or multiple files in your datastores or public URLs. If your data is already cleansed,
and ready to use in training experiments, you can download or mount the files to your compute as a FileDataset
object.
We recommend FileDatasets for your machine learning workflows, since the source files can be in any format,
which enables a wider range of machine learning scenarios, including deep learning.
Create a FileDataset with the Python SDK or the Azure Machine Learning studio .
TabularDataset
A TabularDataset represents data in a tabular format by parsing the provided file or list of files. This provides you
with the ability to materialize the data into a pandas or Spark DataFrame so you can work with familiar data
preparation and training libraries without having to leave your notebook. You can create a TabularDataset object
from .csv, .tsv, .parquet, .jsonl files, and from SQL query results.
With TabularDatasets, you can specify a time stamp from a column in the data, or from wherever the path pattern for the data is stored, to enable a time series trait. This specification allows for easy and efficient filtering by time. For an example, see the Tabular time series-related API demo with NOAA weather data.
Create a TabularDataset with the Python SDK or Azure Machine Learning studio.
Note
Automated ML workflows generated via the Azure Machine Learning studio currently only support
TabularDatasets.
Note
TabularDatasets generated from SQL query results don't support T-SQL (for example, 'WITH' subqueries) or duplicate column names. Complex queries like T-SQL can cause performance issues. Duplicate column names in a dataset can cause ambiguity issues.
Tip
You can create datasets directly from storage urls with identity-based data access. Learn more at Connect to
storage with identity-based data access.
1. Verify that you have contributor or owner access to the underlying storage service of your registered Azure
Machine Learning datastore. Check your storage account permissions in the Azure portal.
2. Create the dataset by referencing paths in the datastore. You can create a dataset from multiple paths in
multiple datastores. There is no hard limit on the number of files or data size that you can create a dataset
from.
Note
For each data path, a few requests are sent to the storage service to check whether it points to a file or a folder. This overhead can lead to degraded performance or failure. A dataset referencing one folder with 1,000 files inside is considered to reference one data path. For optimal performance, we recommend creating datasets that reference fewer than 100 paths in datastores.
Create a FileDataset
Use the from_files() method on the FileDatasetFactory class to load files in any format and to create an
unregistered FileDataset.
If your storage is behind a virtual network or firewall, set the parameter validate=False in your from_files()
method. This bypasses the initial validation step, and ensures that you can create your dataset from these secure
files. Learn more about how to use datastores and datasets in a virtual network.
Python
# create a FileDataset pointing to files in 'animals' folder and its subfolders recursively
datastore_paths = [(datastore, 'animals')]
animal_ds = Dataset.File.from_files(path=datastore_paths)

# create a FileDataset from image and label files behind public web urls
web_paths = ['https://azureopendatastorage.blob.core.windows.net/mnist/train-images-idx3-ubyte.gz',
             'https://azureopendatastorage.blob.core.windows.net/mnist/train-labels-idx1-ubyte.gz']
mnist_ds = Dataset.File.from_files(path=web_paths)
If you want to upload all the files from a local directory, create a FileDataset in a single method with upload_directory(). This method uploads data to your underlying storage, and as a result incurs storage costs.
Python
ws = Workspace.from_config()
datastore = Datastore.get(ws, '<name of your datastore>')
ds = Dataset.File.upload_directory(src_dir='<path to your data>',
                                   target=DataPath(datastore, '<path on the datastore>'),
                                   show_progress=True)
To reuse and share datasets across experiments in your workspace, register your dataset.
Create a TabularDataset
Use the from_delimited_files() method on the TabularDatasetFactory class to read files in .csv or .tsv format, and to
create an unregistered TabularDataset. To read in files from .parquet format, use the from_parquet_files() method. If
you're reading from multiple files, results will be aggregated into one tabular representation.
See the TabularDatasetFactory reference documentation for information about supported file formats, as well as
syntax and design patterns such as multiline support.
If your storage is behind a virtual network or firewall, set the parameter validate=False in your
from_delimited_files() method. This bypasses the initial validation step, and ensures that you can create your
dataset from these secure files. Learn more about how to use datastores and datasets in a virtual network.
The following code gets the existing workspace and the desired datastore by name, and then passes the datastore and file locations to the path parameter to create a new TabularDataset, weather_ds .
Python
from azureml.core import Workspace, Datastore, Dataset

datastore_name = 'your datastore name'

# get existing workspace
workspace = Workspace.from_config()

# retrieve an existing datastore in the workspace by name
datastore = Datastore.get(workspace, datastore_name)

# create a TabularDataset from paths in the datastore
datastore_paths = [(datastore, 'weather/2018/11.csv'),
                   (datastore, 'weather/2018/12.csv'),
                   (datastore, 'weather/2019/*.csv')]

weather_ds = Dataset.Tabular.from_delimited_files(path=datastore_paths)
Python
# create a TabularDataset from a delimited file behind a public web url and convert column "Survived" to boolean
web_path = 'https://dprepdata.blob.core.windows.net/demo/Titanic.csv'
titanic_ds = Dataset.Tabular.from_delimited_files(path=web_path, set_column_types={'Survived': DataType.to_bool()})
(A preview table displays the columns PassengerId, Survived, Pclass, Name, Sex, Age, SibSp, Parch, Ticket, Fare, Cabin, and Embarked.)
To reuse and share datasets across experiments in your workspace, register your dataset.
Wrangle data
After you create and register your dataset, you can load it into your notebook for data wrangling and exploration
prior to model training.
If you don't need to do any data wrangling or exploration, see how to consume datasets in your training scripts for
submitting ML experiments in Train with datasets.
Filtering datasets with the filter() method is an experimental preview feature that may change at any time.
For TabularDatasets, you can keep or remove columns with the keep_columns() and drop_columns() methods.
To filter out rows by a specific column value in a TabularDataset, use the filter() method (preview).
The following examples return an unregistered dataset based on the specified expressions.
Python
# TabularDataset that only contains records where the age column value is greater than 15
tabular_dataset = tabular_dataset.filter(tabular_dataset['age'] > 15)
# TabularDataset that contains records where the name column value contains 'Bri' and the age column
value is greater than 15
tabular_dataset = tabular_dataset.filter((tabular_dataset['name'].contains('Bri')) &
(tabular_dataset['age'] > 15))
In FileDatasets, each row corresponds to a path of a file, so filtering by column value isn't helpful. However, you can filter() out rows by metadata, like CreationTime and Size.
The following examples return an unregistered dataset based on the specified expressions.
Python
# FileDataset that only contains files where Size is less than 100000
file_dataset = file_dataset.filter(file_dataset.file_metadata['Size'] < 100000)

# FileDataset that only contains files that were either created prior to Jan 1, 2020 or where CanSeek is False
file_dataset = file_dataset.filter((file_dataset.file_metadata['CreatedTime'] < datetime(2020,1,1)) | (file_dataset.file_metadata['CanSeek'] == False))
Labeled datasets created from image labeling projects are a special case. These datasets are a type of
TabularDataset made up of image files. For these types of datasets, you can filter() images by metadata, and by
column values like label and image_details .
Python
# Dataset that only contains records where the label column value is dog
labeled_dataset = labeled_dataset.filter(labeled_dataset['label'] == 'dog')

# Dataset that only contains records where the label and isCrowd columns are True and where the file size is larger than 100000
labeled_dataset = labeled_dataset.filter((labeled_dataset['label']['isCrowd'] == True) & (labeled_dataset.file_metadata['Size'] > 100000))
Partition data
You can partition a dataset by including the partition_format parameter when creating a TabularDataset or FileDataset.
When you partition a dataset, the partition information of each file path is extracted into columns based on the specified format. The format should start from the position of the first partition key and continue to the end of the file path.
For example, given the path ../Accounts/2019/01/01/data.jsonl where the partition is by department name and
time; the partition_format='/{Department}/{PartitionDate:yyyy/MM/dd}/data.jsonl' creates a string column
'Department' with the value 'Accounts' and a datetime column 'PartitionDate' with the value 2019-01-01 .
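The mapping that partition_format describes can be sketched in plain Python. This mimics only the path-to-column extraction (the SDK performs it internally); the function name is an assumption for this example, and literal regex characters in the format aren't escaped:

```python
import re
from datetime import datetime

def extract_partitions(path, partition_format='/{Department}/{PartitionDate:yyyy/MM/dd}/data.jsonl'):
    """Mimic how partition_format maps path components to columns.
    Handles plain {Name} keys and {Name:yyyy/MM/dd} date keys."""
    pattern = partition_format
    # turn date-formatted keys into named regex groups
    date_keys = re.findall(r'\{(\w+):yyyy/MM/dd\}', partition_format)
    for key in date_keys:
        pattern = pattern.replace('{%s:yyyy/MM/dd}' % key,
                                  r'(?P<%s>\d{4}/\d{2}/\d{2})' % key)
    # turn plain keys into named regex groups matching one path segment
    pattern = re.sub(r'\{([A-Za-z_]\w*)\}', r'(?P<\1>[^/]+)', pattern)
    match = re.search(pattern, path)
    result = dict(match.groupdict())
    for key in date_keys:
        result[key] = datetime.strptime(result[key], '%Y/%m/%d')
    return result

cols = extract_partitions('../Accounts/2019/01/01/data.jsonl')
print(cols)  # Department='Accounts', PartitionDate=2019-01-01
```

Applied to the path above, the format yields the string column 'Department' and the datetime column 'PartitionDate', exactly as described.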
If your data already has existing partitions and you want to preserve that format, include the partition_format parameter in your from_files() method to create a FileDataset.
To create a TabularDataset that preserves existing partitions, include the partition_format parameter in the from_parquet_files() or the from_delimited_files() method.
Python
# access partition_keys
indexes = file_dataset.partition_keys  # ['userid']

# get all partition key/value pairs; should return [{'userid': 'user1'}, {'userid': 'user2'}]
partitions = file_dataset.get_partition_key_values()

partitions = file_dataset.get_partition_key_values(['userid'])
# returns [{'userid': 'user1'}, {'userid': 'user2'}]

# filter API; this will only download data from the user1/ folder
new_file_dataset = file_dataset.filter(file_dataset['userid'] == 'user1').download()
You can also create a new partition structure for TabularDatasets with the partition_by() method.
Python
# repartition an existing TabularDataset by country
dataset = Dataset.get_by_name(ws, name='test')
new_dataset = dataset.partition_by(name='repartitioned_ds',
                                   partition_keys=['country'],
                                   target=DataPath(datastore, 'repartition'))
partition_keys = new_dataset.partition_keys  # ['country']
Explore data
After you're done wrangling your data, you can register your dataset, and then load it into your notebook for data
exploration prior to model training.
For FileDatasets, you can either mount or download your dataset, and apply the Python libraries you'd normally
use for data exploration. Learn more about mount vs download.
Python
import tempfile
mounted_path = tempfile.mkdtemp()
mount_context = dataset.mount(mounted_path)
mount_context.start()
For TabularDatasets, use the to_pandas_dataframe() method to view your data in a dataframe.
Python
# preview the first 3 rows of the TabularDataset as a pandas DataFrame
titanic_ds.take(3).to_pandas_dataframe()
Tip
Create and register a TabularDataset from an in-memory Spark dataframe or a Dask dataframe with the public preview methods register_spark_dataframe() and register_dask_dataframe(). These methods are experimental preview features, and may change at any time.
These methods upload data to your underlying storage, and as a result incur storage costs.
Register datasets
To complete the creation process, register your datasets with a workspace. Use the register() method to register
datasets with your workspace in order to share them with others and reuse them across experiments in your
workspace:
Python
titanic_ds = titanic_ds.register(workspace=workspace,
                                 name='titanic_ds',
                                 description='titanic training data')
Version datasets
You can register a new dataset under the same name by creating a new version. A dataset version is a way to
bookmark the state of your data so that you can apply a specific version of the dataset for experimentation or
future reproduction. Learn more about dataset versions.
Python
# create a new version of titanic_ds
titanic_ds = titanic_ds.register(workspace=workspace,
                                 name='titanic_ds',
                                 description='new titanic training data',
                                 create_new_version=True)
Next steps
Learn how to train with datasets.
Use automated machine learning to train with TabularDatasets .
For more dataset training examples, see the sample notebooks .
Connect to data with the Azure Machine
Learning studio
Article • 03/02/2023
In this article, learn how to access your data with the Azure Machine Learning studio .
Connect to your data in storage services on Azure with Azure Machine Learning
datastores, and then package that data for tasks in your ML workflows with Azure
Machine Learning datasets.
The following table defines and summarizes the benefits of datastores and datasets.
Datastores Securely connect to your Because your information is securely stored, you
storage service on Azure, by
storing your connection
information, like your Don't put authentication credentials or original data
subscription ID and token sources at risk.
authorization in your Key No longer need to hard code them in your
Vault associated with the scripts.
workspace
Datasets By creating a dataset, you Because datasets are lazily evaluated, and the data
create a reference to the data remains in its existing location, you
source location, along with a
copy of its metadata. With Keep a single copy of data in your storage.
datasets you can, Incur no extra storage cost
Don't risk unintentionally changing your original
Access data during model data sources.
training. Improve ML workflow performance speeds.
Share data and collaborate
with other users.
Use open-source libraries,
like pandas, for data
exploration.
To understand where datastores and datasets fit in Azure Machine Learning's overall
data access workflow, see the Securely access data article.
For a code first experience, see the following articles to use the Azure Machine Learning
Python SDK to:
Create datastores
You can create datastores from these Azure storage solutions. For unsupported storage
solutions, and to save data egress cost during ML experiments, you must move your
data to a supported Azure storage solution. Learn more about datastores.
Credential-based
Create a new datastore in a few steps with the Azure Machine Learning studio.
The following example demonstrates what the form looks like when you create an
Azure blob datastore:
The following steps describe how to create a dataset in Azure Machine Learning
studio .
2. Under Assets in the left navigation, select Data. On the Data assets tab, select Create.
3. Give your data asset a name and optional description. Then, under Type, select one
of the Dataset types, either File or Tabular.
4. You have a few options for your data source. If your data is already stored in Azure,
choose "From Azure storage". If you want to upload data from your local drive,
choose "From local files". If your data is stored at a public web location, choose
"From web files". You can also create a data asset from a SQL database, or from
Azure Open Datasets.
5. For the file selection step, select where you want your data to be stored in Azure,
and what data files you want to use.
a. Enable skip validation if your data is in a virtual network. Learn more about
virtual network isolation and privacy.
6. Follow the steps to set the data parsing settings and schema for your data asset.
The settings will be pre-populated based on file type and you can further
configure your settings prior to creating the data asset.
7. Once you reach the Review step, select Create on the last page.
You can get a vast variety of summary statistics across your data set to verify whether your data set is ML-ready. For non-numeric columns, these statistics include only basic information like min, max, and error count. For numeric columns, you can also review statistical moments and estimated quantiles.
Statistic: Description
Profile: In-line visualization based on the type inferred. For example, strings, booleans, and dates will have value counts, while decimals (numerics) have approximated histograms. This allows you to gain a quick understanding of the distribution of the data.
Type distribution: In-line value count of types within a column. Nulls are their own type, so this distribution visualization is useful for detecting odd or missing values.
Type: Inferred type of the column. Possible values include: strings, booleans, dates, and decimals.
Min: Minimum value of the column. Blank entries appear for features whose type doesn't have an inherent ordering (like booleans).
Not missing count: Number of entries in the column that aren't missing. Empty strings and errors are treated as values, so they won't contribute to the "not missing count."
Quantiles: Approximated values at each quantile to provide a sense of the distribution of the data.
Variance: Measure of how far spread out this column's data is from its average value.
Skewness: Measure of how different this column's data is from a normal distribution.
Kurtosis: Measure of how heavily tailed this column's data is compared to a normal distribution.
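The last three statistics are the standard sample moments. As a quick pure-Python illustration of the definitions (not the profiling implementation; note that some tools report excess kurtosis, which subtracts 3 from the value computed here):

```python
import math

def moments(xs):
    """Population variance, skewness, and kurtosis of a numeric sequence."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n          # spread around the mean
    std = math.sqrt(var)
    skew = sum(((x - mean) / std) ** 3 for x in xs) / n  # 0 for symmetric data
    kurt = sum(((x - mean) / std) ** 4 for x in xs) / n  # tail heaviness
    return var, skew, kurt

var, skew, kurt = moments([1, 2, 3, 4, 5])
print(var, skew, kurt)  # symmetric data, so skewness is 0
```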
Virtual network
If your data storage account is in a virtual network, extra configuration steps are
required to ensure Azure Machine Learning has access to your data. See Use Azure
Machine Learning studio in a virtual network to ensure the appropriate configuration
steps are applied when you create and register your datastore.
Access validation
Warning
Cross tenant access to storage accounts is not supported. If cross tenant access is
needed for your scenario, please reach out to the Azure Machine Learning Data
Support team alias at amldatasupport@microsoft.com for assistance with a custom
code solution.
As part of the initial datastore creation and registration process, Azure Machine
Learning automatically validates that the underlying storage service exists and the user
provided principal (username, service principal, or SAS token) has access to the specified
storage.
After datastore creation, this validation is only performed for methods that require
access to the underlying storage container, not each time datastore objects are
retrieved. For example, validation happens if you want to download files from your
datastore; but if you just want to change your default datastore, then validation doesn't
happen.
To authenticate your access to the underlying storage service, you can provide either
your account key, shared access signatures (SAS) tokens, or service principal according
to the datastore type you want to create. The storage type matrix lists the supported
authentication types that correspond to each datastore type.
You can find account key, SAS token, and service principal information on your Azure
portal .
If you plan to use an account key or SAS token for authentication, select Storage
Accounts on the left pane, and choose the storage account that you want to
register.
The Overview page provides information such as the account name, container,
and file share name.
Important
If you need to change your access keys for an Azure Storage account (account
key or SAS token), be sure to sync the new credentials with your workspace
and the datastores connected to it. Learn how to sync your updated
credentials.
If you unregister and re-register a datastore with the same name, and it fails,
the Azure Key Vault for your workspace may not have soft-delete enabled. By
default, soft-delete is enabled for the key vault instance created by your
workspace, but it may not be enabled if you used an existing key vault or have
a workspace created prior to October 2020. For information on how to enable
soft-delete, see Turn on Soft Delete for an existing key vault.
Permissions
For Azure blob container and Azure Data Lake Gen 2 storage, make sure your
authentication credentials have Storage Blob Data Reader access. Learn more about
Storage Blob Data Reader. An account SAS token defaults to no permissions.
For data read access, your authentication credentials must have a minimum of list
and read permissions for containers and objects.
For data write access, write and add permissions also are required.
Next steps
A step-by-step example of training with TabularDatasets and automated machine
learning.
Train a model.
In this article, you learn how to work with Azure Machine Learning datasets to train
machine learning models. You can use datasets in your local or remote compute target
without worrying about connection strings or data paths.
For structured data, see Consume datasets in machine learning training scripts.
Azure Machine Learning datasets provide a seamless integration with Azure Machine
Learning training functionality like ScriptRunConfig, HyperDrive, and Azure Machine
Learning pipelines.
If you aren't ready to make your data available for model training, but want to load your
data to your notebook for data exploration, see how to explore the data in your dataset.
Prerequisites
To create and train with datasets, you need:
The Azure Machine Learning SDK for Python installed (>= 1.13.0), which includes
the azureml-datasets package.
Create a TabularDataset
The following code creates an unregistered TabularDataset from a web url.
Python
from azureml.core import Dataset

web_path = 'https://dprepdata.blob.core.windows.net/demo/Titanic.csv'
titanic_ds = Dataset.Tabular.from_delimited_files(path=web_path)
TabularDataset objects provide the ability to load the data in your TabularDataset into a
pandas or Spark DataFrame so that you can work with familiar data preparation and
training libraries without having to leave your notebook.
Note
If your original data source contains NaN, empty strings, or blank values, those values are replaced with Null when you use to_pandas_dataframe() .
If you need to load the prepared data into a new dataset from an in-memory pandas
dataframe, write the data to a local file, like a parquet, and create a new dataset from
that file. Learn more about how to create datasets.
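As a minimal local sketch of that pattern (using the standard csv module and a hypothetical prepared_rows list; the upload and dataset-creation lines at the end are shown as comments because they require a workspace):

```python
import csv
import os
import tempfile

# hypothetical in-memory prepared data
prepared_rows = [
    {"PassengerId": 1, "Survived": 0},
    {"PassengerId": 2, "Survived": 1},
]

# write the prepared data to a local file
out_dir = tempfile.mkdtemp()
out_path = os.path.join(out_dir, "prepared.csv")
with open(out_path, "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["PassengerId", "Survived"])
    writer.writeheader()
    writer.writerows(prepared_rows)

# then upload it to a datastore and create a new dataset from it, e.g.:
# datastore.upload_files([out_path], target_path='prepared')
# new_ds = Dataset.Tabular.from_delimited_files(path=(datastore, 'prepared/prepared.csv'))
```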
Python
%%writefile $script_folder/train_titanic.py
import argparse
from azureml.core import Dataset, Run

parser = argparse.ArgumentParser()
parser.add_argument("--input-data", type=str)
args = parser.parse_args()

run = Run.get_context()
ws = run.experiment.workspace

# get the input dataset by ID and load it into a pandas dataframe
dataset = Dataset.get_by_id(ws, id=args.input_data)
df = dataset.to_pandas_dataframe()
To configure the training run, you specify:
A script directory for your scripts. All the files in this directory are uploaded into
the cluster nodes for execution.
The training script, train_titanic.py.
The input dataset for training, titanic_ds , as a script argument. Azure Machine
Learning will resolve this to the corresponding ID of the dataset when it's passed to
your script.
The compute target for the run.
The environment for the run.
Python
src = ScriptRunConfig(source_directory=script_folder,
                      script='train_titanic.py',
                      # pass dataset as an input with friendly name 'titanic'
                      arguments=['--input-data', titanic_ds.as_named_input('titanic')],
                      compute_target=compute_target,
                      environment=myenv)
Note
If you are using a custom Docker base image, you will need to install fuse via apt-get install -y fuse as a dependency for dataset mount to work.
For the notebook example, see How to configure a training run with data input and output.
Create a FileDataset
The following example creates an unregistered FileDataset, mnist_data from web urls.
This FileDataset is the input data for your training run.
Python
from azureml.core import Dataset

web_paths = [
    'http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz',
    'http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz',
    'http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz',
    'http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz'
]
mnist_data = Dataset.File.from_files(path=web_paths)
The following code specifies that training results should be saved as a FileDataset in the
outputdataset folder in the default blob datastore, def_blob_store .
Python
from azureml.core import Workspace
from azureml.data import OutputFileDatasetConfig

ws = Workspace.from_config()
def_blob_store = ws.get_default_datastore()
output = OutputFileDatasetConfig(destination=(def_blob_store, 'sample/outputdataset'))
Python
src = ScriptRunConfig(source_directory=script_folder,
script='dummy_train.py',
arguments=[input_data, output],
compute_target=compute_target,
environment=myenv)
Python
%%writefile $source_directory/dummy_train.py
import sys

print("*********************************************************")
print("Hello Azure Machine Learning!")

mounted_input_path = sys.argv[1]
mounted_output_path = sys.argv[2]
Mount vs download
Mounting or downloading files of any format is supported for datasets created from
Azure Blob storage, Azure Files, Azure Data Lake Storage Gen1, Azure Data Lake Storage
Gen2, Azure SQL Database, and Azure Database for PostgreSQL.
When you mount a dataset, you attach the files referenced by the dataset to a directory
(mount point) and make it available on the compute target. Mounting is supported for
Linux-based computes, including Azure Machine Learning Compute, virtual machines,
and HDInsight. If your data size exceeds the compute disk size, downloading isn't
possible. For this scenario, we recommend mounting since only the data files used by
your script are loaded at the time of processing.
When you download a dataset, all the files referenced by the dataset will be
downloaded to the compute target. Downloading is supported for all compute types. If
your script processes all files referenced by the dataset, and your compute disk can fit
your full dataset, downloading is recommended to avoid the overhead of streaming
data from storage services. For multi-node downloads, see how to avoid throttling.
Note
The download path name should not be longer than 255 alphanumeric characters
for Windows OS. For Linux OS, the download path name should not be longer than
4,096 alphanumeric characters. Also, for Linux OS the file name (the last segment
of the download path) should not be longer than 255 alphanumeric characters.
Python
import os
import tempfile

mounted_path = tempfile.mkdtemp()
# mount the FileDataset (mnist_data, created earlier) to the temporary directory
mount_context = mnist_data.mount(mounted_path)
mount_context.start()

print(os.listdir(mounted_path))
print(mounted_path)
Python
%%writefile $script_folder/train.py
from azureml.core import Dataset, Run

run = Run.get_context()
workspace = run.experiment.workspace
dataset_name = 'titanic_ds'

# get the registered dataset by name from the workspace
dataset = Dataset.get_by_name(workspace=workspace, name=dataset_name)
The following code example specifies in the run configuration which blob datastore to
use for source code transfers.
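A sketch of that setting, assuming the src ScriptRunConfig object from earlier in the article (workspaceblobstore is the name of the workspace's default blob datastore):

```python
# transfer source code via the workspace's default blob datastore
src.run_config.source_directory_data_store = "workspaceblobstore"
```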
Troubleshooting
Dataset initialization failed: Waiting for mount point to be ready has timed out:
If you don't have any outbound network security group rules and are using
azureml-sdk>=1.12.0 , update azureml-dataset-runtime and its dependencies to be
the latest for the specific minor version, or if you're using it in a run, recreate your
environment so it can have the latest patch with the fix.
If you're using azureml-sdk<1.12.0 , upgrade to the latest version.
If you have outbound NSG rules, make sure there's an outbound rule that allows all
traffic for the service tag AzureResourceMonitor .
For multi-node file downloads, all nodes may attempt to download all files in the file
dataset from the Azure Storage service, which results in a throttling error. To avoid
throttling, initially set the environment variable AZUREML_DOWNLOAD_CONCURRENCY to a value
of eight times the number of CPU cores divided by the number of nodes. Setting up a
value for this environment variable may require some experimentation, so the
aforementioned guidance is a starting point.
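That guidance can be sketched as follows (the node count is a hypothetical value for your cluster; the commented line shows where the value would be set on an azureml Environment object):

```python
import multiprocessing

num_nodes = 4  # hypothetical: the number of nodes in your compute cluster
concurrency = max(1, 8 * multiprocessing.cpu_count() // num_nodes)

# on an azureml Environment object, this would be set as:
# myenv.environment_variables["AZUREML_DOWNLOAD_CONCURRENCY"] = str(concurrency)
print(concurrency)
```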
AzureFile storage
Unable to upload project files to working directory in AzureFile because the storage is
overloaded:
If you're using file share for other workloads, such as data transfer, the
recommendation is to use blobs so that file share is free to be used for submitting
runs.
To ensure your storage access credentials are linked to the workspace and the
associated file datastore, sync your updated credentials with the workspace.
If you don't include the leading forward slash ('/'), you'll need to prefix the working
directory (for example, /mnt/batch/.../tmp/dataset) on the compute target to indicate
where you want the dataset to be mounted.
Next steps
Auto train machine learning models with TabularDatasets.
Learn how to monitor data drift and set alerts when drift is high.
An Azure Machine Learning dataset is used to create the monitor. The dataset must include a timestamp column.
You can view data drift metrics with the Python SDK or in Azure Machine Learning studio. Other metrics and insights are available through
the Azure Application Insights resource associated with the Azure Machine Learning workspace.
Important
Data drift detection for datasets is currently in public preview. The preview version is provided without a service level agreement, and
it's not recommended for production workloads. Certain features might not be supported or might have constrained capabilities. For
more information, see Supplemental Terms of Use for Microsoft Azure Previews .
Prerequisites
To create and work with dataset monitors, you need:
An Azure subscription. If you don't have an Azure subscription, create a free account before you begin. Try the free or paid version of
Azure Machine Learning today.
An Azure Machine Learning workspace.
The Azure Machine Learning SDK for Python installed, which includes the azureml-datasets package.
Structured (tabular) data with a timestamp specified in the file path, file name, or column in the data.
Causes of data drift include:
Upstream process changes, such as a sensor being replaced that changes the units of measurement from inches to centimeters.
Data quality issues, such as a broken sensor always reading 0.
Natural drift in the data, such as mean temperature changing with the seasons.
Change in relation between features, or covariate shift.
Azure Machine Learning simplifies drift detection by computing a single metric abstracting the complexity of datasets being compared.
These datasets may have hundreds of features and tens of thousands of rows. Once drift is detected, you drill down into which features are
causing the drift. You then inspect feature level metrics to debug and isolate the root cause for the drift.
This top-down approach makes it easier to monitor data than traditional rules-based
techniques. Rules-based techniques, such as allowed data ranges or allowed unique
values, can be time consuming and error prone.
In Azure Machine Learning, you use dataset monitors to detect and alert for data drift.
Dataset monitors
With a dataset monitor you can detect and alert to data drift on new data in a dataset, analyze historical data for drift, and profile new data over time.
The data drift algorithm provides an overall measure of change in data and indication of which features are responsible for further
investigation. Dataset monitors produce a number of other metrics by profiling new data in the timeseries dataset.
Custom alerting can be set up on all metrics generated by the monitor through Azure Application Insights. Dataset monitors can be used to
quickly catch data issues and reduce the time to debug the issue by identifying likely causes.
Conceptually, there are three primary scenarios for setting up dataset monitors in Azure Machine Learning.
Scenario Description
Monitor a model's serving data for drift from the training data: Results from this scenario can be interpreted as monitoring a proxy for the model's accuracy, since model accuracy degrades when the serving data drifts from the training data.
Monitor a time series dataset for drift from a previous time period: This scenario is more general, and can be used to monitor datasets involved upstream or downstream of model building. The target dataset must have a timestamp column. The baseline dataset can be any tabular dataset that has features in common with the target dataset.
Perform analysis on past data: This scenario can be used to understand historical data and inform decisions in settings for dataset monitors.
Component Description
Dataset: Drift uses Machine Learning datasets to retrieve training data and to compare data for model training. Generating a profile of the data is used to generate some of the reported metrics, such as min, max, distinct values, and distinct values count.
Azureml pipeline and compute: The drift calculation job is hosted in an azureml pipeline. The job is triggered on demand or by schedule to run on a compute configured at drift monitor creation time.
Application Insights: Drift emits metrics to the Application Insights resource belonging to the machine learning workspace.
Azure blob storage: Drift emits metrics in JSON format to Azure blob storage.
Python SDK
The Dataset class with_timestamp_columns() method defines the time stamp column for the dataset.
Python
# create the Tabular dataset with 'state' and 'date' as virtual columns
dset = Dataset.Tabular.from_parquet_files(path=dstore_paths, partition_format=partition_format)

# assign the 'date' virtual column as the timestamp column
dset = dset.with_timestamp_columns('date')
Tip
For a full example of using the timeseries trait of datasets, see the example notebook or the datasets SDK documentation.
Python SDK
See the Python SDK reference documentation on data drift for full details.
The following example shows how to create a dataset monitor by using the Python SDK.
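A sketch of monitor creation with the v1 azureml-datadrift package (the dataset names, compute target, and threshold are placeholder values):

```python
from azureml.core import Workspace, Dataset
from azureml.datadrift import DataDriftDetector

ws = Workspace.from_config()
baseline = Dataset.get_by_name(ws, 'baseline-dataset')  # hypothetical name
target = Dataset.get_by_name(ws, 'target-dataset')      # must have a timestamp column

monitor = DataDriftDetector.create_from_datasets(
    ws, 'drift-monitor', baseline, target,
    compute_target='cpu-cluster',  # existing compute for the drift job
    frequency='Week',              # how often the monitor runs
    drift_threshold=0.3,           # alert above this magnitude
    latency=24)                    # hours before new data is available to analyze
```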
Tip
For a full example of setting up a timeseries dataset and data drift detector, see our example notebook .
Start with the top-level insights into the magnitude of data drift and a highlight of features to be further investigated.
Metric Description
Data drift magnitude: A percentage of drift between the baseline and target dataset over time. Ranging from 0 to 100, 0 indicates identical datasets and 100 indicates the Azure Machine Learning data drift model can completely tell the two datasets apart. Noise in the precise percentage measured is expected due to machine learning techniques being used to generate this magnitude.
Top drifting features: Shows the features from the dataset that have drifted the most and are therefore contributing the most to the drift magnitude metric. Due to covariate shift, the underlying distribution of a feature does not necessarily need to change to have relatively high feature importance.
Threshold: Data drift magnitude beyond the set threshold will trigger alerts. This can be configured in the monitor settings.
The target dataset is also profiled over time. The statistical distance between the baseline distribution of each feature and the
target dataset's distribution of the same feature is computed over time. Conceptually, this is similar to the data drift magnitude;
however, this statistical distance is for an individual feature rather than all features. Min, max, and mean are also available.
In the Azure Machine Learning studio, click on a bar in the graph to see the feature-level details for that date. By default, you see the
baseline dataset's distribution and the most recent job's distribution of the same feature.
These metrics can also be retrieved in the Python SDK through the get_metrics() method on a DataDriftDetector object.
Feature details
Finally, scroll down to view details for each individual feature. Use the dropdowns above the chart to select the feature, and additionally
select the metric you want to view.
Metrics in the chart depend on the type of feature.
Numeric features
Metric Description
Wasserstein distance: Minimum amount of work to transform the baseline distribution into the target distribution.
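As an illustration of the idea (not the service's implementation), for two equal-length 1-D samples the Wasserstein distance reduces to the average distance between their order statistics:

```python
def wasserstein_1d(xs, ys):
    """1-D earth mover's distance between two equal-length samples:
    the average absolute difference between their sorted values."""
    assert len(xs) == len(ys), "sketch assumes equal-length samples"
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

baseline = [1.0, 2.0, 3.0, 4.0]
target = [2.0, 3.0, 4.0, 5.0]
print(wasserstein_1d(baseline, target))  # 1.0
```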
Categorical features
Metric Description
Euclidean distance: Computed for categorical columns. Euclidean distance is computed on two vectors, generated from the empirical distributions of the same categorical column in the baseline and target datasets.
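As an illustration of that metric (again, not the service's exact implementation), the empirical distribution of a categorical column can be turned into a vector of category frequencies, and the Euclidean distance taken between the two vectors:

```python
import math
from collections import Counter

def categorical_distance(a, b):
    """Euclidean distance between the empirical distributions of
    the same categorical column in two datasets."""
    ca, cb = Counter(a), Counter(b)
    categories = set(ca) | set(cb)
    return math.sqrt(sum((ca[c] / len(a) - cb[c] / len(b)) ** 2
                         for c in categories))

baseline = ['cat', 'cat', 'dog', 'dog']
target = ['cat', 'dog', 'dog', 'dog']
print(round(categorical_distance(baseline, target), 4))  # 0.3536
```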
On this chart, select a single date to compare the feature distribution between the target and this date for the displayed feature. For
numeric features, this shows two probability distributions. If the feature is categorical, a bar chart is shown.
Metrics, alerts, and events
Metrics can be queried in the Azure Application Insights resource associated with your machine learning workspace. You have access to all
features of Application Insights, including setting up custom alert rules and action groups to trigger an action such as an
email/SMS/push/voice notification or an Azure Function. Refer to the complete Application Insights documentation for details.
To get started, navigate to the Azure portal and select your workspace's Overview page. The associated Application Insights resource is
on the far right:
After identifying metrics to set up alert rules, create a new alert rule:
You can use an existing action group, or create a new one to define the action to be taken when the set conditions are met:
Troubleshooting
Limitations and known issues for data drift monitors:
The time range when analyzing historical data is limited to 31 intervals of the monitor's frequency setting.
There is a limit of 200 features, unless a feature list is not specified (that is, all features are used).
Ensure your dataset has data within the start and end date for a given monitor job.
Dataset monitors will only work on datasets that contain 50 rows or more.
Columns, or features, in the dataset are classified as categorical or numeric based on the conditions in the following table. If the
feature does not meet these conditions (for instance, a column of type string with more than 100 unique values), the feature is dropped
from our data drift algorithm, but is still profiled.

Feature type Data type Condition Quantity of null values
Categorical: string. The number of unique values in the feature is less than 100 and less than 5% of the number of rows. Null is treated as its own category.
Numerical: int, float. The values in the feature are of a numerical data type and do not meet the condition for a categorical feature. The feature is dropped if more than 15% of values are null.
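The rules in the table can be restated as a small sketch (a simplification for illustration; the service's actual classification logic is internal and may differ):

```python
def classify_feature(values, n_rows):
    """Simplified restatement of the classification rules above."""
    non_null = [v for v in values if v is not None]
    null_fraction = 1 - len(non_null) / n_rows
    unique_count = len(set(non_null))
    if non_null and all(isinstance(v, (int, float)) for v in non_null):
        # numerical: dropped when more than 15% of values are null
        return 'dropped' if null_fraction > 0.15 else 'numerical'
    if unique_count < 100 and unique_count < 0.05 * n_rows:
        # categorical: null is treated as its own category
        return 'categorical'
    return 'dropped'

print(classify_feature(['a', 'b'] * 500, 1000))   # categorical
print(classify_feature(list(range(1000)), 1000))  # numerical
```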
When you have created a data drift monitor but cannot see data on the Dataset monitors page in Azure Machine Learning studio, try
the following.
1. Check if you have selected the right date range at the top of the page.
2. On the Dataset Monitors tab, select the experiment link to check job status. This link is on the far right of the table.
3. If the job completed successfully, check the driver logs to see how many metrics have been generated or if there's any warning
messages. Find driver logs in the Output + logs tab after you click on an experiment.
If the SDK backfill() function does not generate the expected output, it may be due to an authentication issue. When you create the
compute to pass into this function, do not use Run.get_context().experiment.workspace.compute_targets . Instead, use
ServicePrincipalAuthentication such as the following to create the compute that you pass into that backfill() function:
Python
from azureml.core import Workspace
from azureml.core.authentication import ServicePrincipalAuthentication

auth = ServicePrincipalAuthentication(
       tenant_id=tenant_id,
       service_principal_id=app_id,
       service_principal_password=client_secret
       )
ws = Workspace.get("xxx", auth=auth, subscription_id="xxx", resource_group="xxx")
compute = ws.compute_targets.get("xxx")
From the Model Data Collector, it can take up to (but usually less than) 10 minutes for data to arrive in your blob storage account. In a
script or Notebook, wait 10 minutes to ensure cells below will run.
Python
import time
time.sleep(600)
Next steps
Head to the Azure Machine Learning studio or the Python notebook to set up a dataset monitor.
See how to set up data drift on models deployed to Azure Kubernetes Service.
Set up dataset drift monitors with Azure Event Grid.
Version and track Azure Machine
Learning datasets
Article • 03/02/2023
In this article, you'll learn how to version and track Azure Machine Learning datasets for
reproducibility. Dataset versioning is a way to bookmark the state of your data so that
you can apply a specific version of the dataset for future experiments.
Prerequisites
For this tutorial, you need:
Azure Machine Learning SDK for Python installed. This SDK includes the azureml-
datasets package.
Python
import azureml.core
from azureml.core import Workspace
ws = Workspace.from_config()
If the name titanic_ds isn't already registered with the workspace, the code creates a
new dataset with the name titanic_ds and sets its version to 1.
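A sketch of registering a dataset version and retrieving it later (assuming the ws workspace and titanic_ds dataset from earlier; the description text is a placeholder):

```python
from azureml.core import Dataset

# register; create_new_version=True bumps the version if the name already exists
titanic_ds = titanic_ds.register(workspace=ws,
                                 name='titanic_ds',
                                 description='titanic training data',
                                 create_new_version=True)

# retrieve a specific version, or the latest, by name
titanic_v1 = Dataset.get_by_name(ws, name='titanic_ds', version=1)
latest = Dataset.get_by_name(ws, name='titanic_ds', version='latest')
```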
Important
When you load data from a dataset, the current data content referenced by the dataset
is always loaded. If you want to make sure that each dataset version is reproducible, we
recommend that you not modify data content referenced by the dataset version. When
new data comes in, save new data files into a separate data folder and then create a new
dataset version to include data from that new folder.
The following image and sample code show the recommended way to structure your
data folders and to create dataset versions that reference those folders:
Python
# create & register weather_ds version 1 pointing to all files in the folder of week 27
datastore_path1 = [(datastore, 'Weather/week 27')]
dataset1 = Dataset.File.from_files(path=datastore_path1)
dataset1.register(workspace=workspace,
                  name='weather_ds',
                  description='weather data in week 27',
                  create_new_version=True)

# create & register weather_ds version 2 pointing to all files in the folder of week 27 and 28
datastore_path2 = [(datastore, 'Weather/week 27'), (datastore, 'Weather/week 28')]
dataset2 = Dataset.File.from_files(path=datastore_path2)
dataset2.register(workspace=workspace,
                  name='weather_ds',
                  description='weather data in week 27, 28',
                  create_new_version=True)
Python
from azureml.core.conda_dependencies import CondaDependencies
from azureml.core.runconfig import RunConfiguration

conda = CondaDependencies.create(
    pip_packages=['azureml-defaults', 'azureml-dataprep[fuse,pandas]'],
    pin_sdk_version=False)

run_config = RunConfiguration()
run_config.environment.docker.enabled = True
run_config.environment.python.conda_dependencies = conda
The following are scenarios where your data is tracked as an input dataset.
When methods like get_by_name() or get_by_id() are called in your script. For this
scenario, the name assigned to the dataset when you registered it with the
workspace is the name displayed.
The following are scenarios where your data is tracked as an output dataset.
Register a dataset in your script. For this scenario, the name assigned to the
dataset when you registered it to the workspace is the name displayed. In the
following example, training_ds is the name that would be displayed.
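A sketch of registering run output as a dataset from a script (the datastore variable, folder, and dataset name are placeholders):

```python
from azureml.data import OutputFileDatasetConfig

# write the run's outputs to the datastore and register them as 'training_ds'
output = OutputFileDatasetConfig(
    destination=(datastore, 'outputs')).register_on_complete(name='training_ds')
```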
The following code uses the get_details() method to track which input datasets were
used with the experiment run:
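For example, assuming a submitted run object is in scope, the tracked input datasets appear under the inputDatasets key of the run details:

```python
# the run's details include the input datasets tracked for the run
details = run.get_details()
input_datasets = details['inputDatasets']
print(input_datasets)
```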
You can also find the input_datasets from experiments by using the Azure Machine
Learning studio.
The following image shows where to find the input dataset of an experiment on Azure
Machine Learning studio. For this example, go to your Experiments pane and open the
Properties tab for a specific run of your experiment, keras-mnist .
Python
model = run.register_model(model_name='keras-mlp-mnist',
                           model_path=model_path,
                           datasets=[('training data', train_dataset)])
After registration, you can see the list of models registered with the dataset by using
Python or go to the studio .
The following view is from the Datasets pane under Assets. Select the dataset and then
select the Models tab for a list of the models that are registered with the dataset.
Next steps
Train with datasets
More sample dataset notebooks
Create and explore Azure Machine
Learning dataset with labels
Article • 03/02/2023
In this article, you'll learn how to export the data labels from an Azure Machine Learning
data labeling project and load them into popular formats such as a pandas dataframe
for data exploration.
Prerequisites
An Azure subscription. If you don’t have an Azure subscription, create a free
account before you begin.
The Azure Machine Learning SDK for Python, or access to Azure Machine Learning
studio .
A Machine Learning workspace. See Create workspace resources.
Access to an Azure Machine Learning data labeling project. If you don't have a
labeling project, first create one for image labeling or text labeling.
Use the Export button on the Project details page of your labeling project.
COCO
The COCO file is created in the default blob store of the Azure Machine Learning
workspace in a folder within export/coco.
Note
Once you have exported your labeled data to an Azure Machine Learning dataset,
you can use AutoML to build computer vision models trained on your labeled data.
Learn more at Set up AutoML to train computer vision models with Python
In the following code, the animal_labels dataset is the output from a labeling project
previously saved to the workspace. The exported dataset is a TabularDataset.
Python
import azureml.core
from azureml.core import Dataset, Workspace

# connect to the workspace and get the exported labeled dataset by name
ws = Workspace.from_config()
animal_labels = Dataset.get_by_name(ws, 'animal_labels')
Next steps
Learn to train image classification models in Azure
Set up AutoML to train computer vision models with Python
Data ingestion with Azure Data Factory
Article • 03/02/2023
In this article, you learn about the available options for building a data ingestion
pipeline with Azure Data Factory. This Azure Data Factory pipeline is used to ingest data
for use with Azure Machine Learning. Data Factory allows you to easily extract,
transform, and load (ETL) data. Once the data has been transformed and loaded into
storage, it can be used to train your machine learning models in Azure Machine
Learning.
Simple data transformation can be handled with native Data Factory activities and
instruments such as data flow. For more complicated scenarios, the data can be
processed with custom code, for example Python or R code.
Option Advantages Disadvantages
Data Factory + Azure Functions: Low latency, serverless compute; stateful functions; reusable functions. Only good for short running processing.
Advantages:
The data is processed on a serverless compute with a relatively low latency
Data Factory pipeline can invoke a Durable Azure Function that may implement
a sophisticated data transformation flow
The details of the data transformation are abstracted away by the Azure
Function that can be reused and invoked from other places
Disadvantages:
The Azure Functions must be created before use with ADF
Azure Functions is good only for short running data processing
Advantages:
The data is processed on Azure Batch pool, which provides large-scale parallel
and high-performance computing
Can be used to run heavy algorithms and process significant amounts of data
Disadvantages:
Azure Batch pool must be created before use with Data Factory
Over-engineering related to wrapping Python code into an executable.
Complexity of handling dependencies and input/output parameters
Advantages:
The data is transformed on the most powerful data processing Azure service,
which is backed by the Apache Spark environment
Native support of Python along with data science frameworks and libraries
including TensorFlow, PyTorch, and scikit-learn
There's no need to wrap the Python code into functions or executable modules.
The code works as is.
Disadvantages:
Azure Databricks infrastructure must be created before use with Data Factory
Can be expensive depending on Azure Databricks configuration
Spinning up compute clusters from "cold" mode takes some time, which adds
latency to the solution
Tip
Datasets support versioning, so the ML pipeline can register a new version of the
dataset that points to the most recent data from the ADF pipeline.
Once the data is accessible through a datastore or dataset, you can use it to train an ML
model. The training process might be part of the same ML pipeline that is called from
ADF. Or it might be a separate process such as experimentation in a Jupyter notebook.
Since datasets support versioning, and each job from the pipeline creates a new version,
it's easy to understand which version of the data was used to train a model.
Read data directly from storage
If you don't want to create an ML pipeline, you can access the data directly from the
storage account where your prepared data is saved with an Azure Machine Learning
datastore and dataset.
The following Python code demonstrates how to create a datastore that connects to
Azure DataLake Generation 2 storage. Learn more about datastores and where to find
service principal permissions.
Python
from azureml.core import Workspace, Datastore

ws = Workspace.from_config()

# set the ADLS Gen2 storage account alias in Azure Machine Learning
adlsgen2_datastore_name = '<ADLS gen2 storage account alias>'

adlsgen2_datastore = Datastore.register_azure_data_lake_gen2(
    workspace=ws,
    datastore_name=adlsgen2_datastore_name,
    account_name=account_name,       # ADLS Gen2 account name
    filesystem='<filesystem name>',  # ADLS Gen2 filesystem
    tenant_id=tenant_id,             # tenant id of service principal
    client_id=client_id,             # client id of service principal
    client_secret=client_secret)     # client secret of service principal
Next, create a dataset to reference the file(s) you want to use in your machine learning
task.
The following code creates a TabularDataset from a csv file, prepared-data.csv . Learn
more about dataset types and accepted file formats.
Python
from azureml.core import Workspace, Datastore, Dataset
from azureml.core.experiment import Experiment
from azureml.train.automl import AutoMLConfig

# path to the prepared file on the datastore
datastore_path = [(adlsgen2_datastore, 'prepared-data.csv')]
prepared_dataset = Dataset.Tabular.from_delimited_files(path=datastore_path)
From here, use prepared_dataset to reference your prepared data, like in your training
scripts. Learn how to Train models with datasets in Azure Machine Learning.
Next steps
Run a Databricks notebook in Azure Data Factory
Access data in Azure storage services
Train models with datasets in Azure Machine Learning.
DevOps for a data ingestion pipeline
Data wrangling with Apache Spark
pools (deprecated)
Article • 06/01/2023
Warning
The Azure Synapse Analytics integration with Azure Machine Learning available in
Python SDK v1 is deprecated. Users can continue using Synapse workspace
registered with Azure Machine Learning as a linked service. However, a new
Synapse workspace can no longer be registered with Azure Machine Learning as a
linked service. We recommend using Managed (Automatic) Synapse compute and
attached Synapse Spark pools available in CLI v2 and Python SDK v2. Please see
https://aka.ms/aml-spark for more details.
In this article, you learn how to perform data wrangling tasks interactively within a
dedicated Synapse session, powered by Azure Synapse Analytics, in a Jupyter notebook
using the Azure Machine Learning Python SDK.
If you prefer to use Azure Machine Learning pipelines, see How to use Apache Spark
(powered by Azure Synapse Analytics) in your machine learning pipeline (preview).
For guidance on how to use Azure Synapse Analytics with a Synapse workspace, see the
Azure Synapse Analytics get started series.
Prerequisites
The Azure Machine Learning Python SDK installed.
Create an Azure Machine Learning workspace.
Create Apache Spark pool using Azure portal, web tools, or Synapse Studio.
Link your Azure Machine Learning workspace and Azure Synapse Analytics
workspace with the Azure Machine Learning Python SDK or via the Azure Machine
Learning studio
To continue using the Apache Spark pool, you must indicate which compute
resource to use throughout your data wrangling tasks, with %synapse for single lines
of code and %%synapse for multiple lines. Learn more about the %synapse magic
command.
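For example, starting a Synapse session against an attached Spark pool looks like this (the compute alias is a placeholder for the name you gave the attached pool):

```python
# start a Synapse session on the attached Spark pool
%synapse start -c SynapseSparkPoolAlias
```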
After the session starts, you can check the session's metadata.
Python
%synapse meta
You can specify an Azure Machine Learning environment to use during your Apache
Spark session. Only Conda dependencies specified in the environment will take effect.
Docker image isn't supported.
The following code creates the environment, myenv , which installs azureml-core version
1.20.0 and numpy version 1.17.0 before the session begins. You can then include this
environment in your Apache Spark session start statement.
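A sketch of that environment definition (the environment name is a placeholder; the commented line shows how it might be included when starting the session):

```python
from azureml.core import Environment
from azureml.core.conda_dependencies import CondaDependencies

# pin the packages the Synapse session needs
conda_dep = CondaDependencies()
conda_dep.add_pip_package("azureml-core==1.20.0")
conda_dep.add_pip_package("numpy==1.17.0")

myenv = Environment(name="myenv")
myenv.python.conda_dependencies = conda_dep

# then include the environment in the session start statement, e.g.:
# %synapse start -c SynapseSparkPoolAlias -e myenv
```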
Important
Make sure that Allow session level packages is enabled in the linked Synapse
workspace.
There are two ways to load data from these storage services:
Directly load data from storage using its Hadoop Distributed File System (HDFS)
path.
Read in data from an existing Azure Machine Learning dataset.
To access these storage services, you need Storage Blob Data Reader permissions. If
you plan to write data back to these storage services, you need Storage Blob Data
Contributor permissions. Learn more about storage permissions and roles.
The following code demonstrates how to read data from an Azure Blob storage into a
Spark dataframe with either your shared access signature (SAS) token or access key.
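A sketch of the SAS-token variant (the account, container, path, and token values are placeholders):

```python
%%synapse
# configure the SAS token for the container, then read the file
sc._jsc.hadoopConfiguration().set(
    "fs.azure.sas.<container name>.<storage account name>.blob.core.windows.net",
    "<sas token>")
df = spark.read.option("header", "true").csv(
    "wasbs://<container name>@<storage account name>.blob.core.windows.net/<path>")
```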
The following code demonstrates how to read data in from Azure Data Lake Storage
Generation 1 (ADLS Gen 1) with your service principal credentials.
Python
%%synapse
sc._jsc.hadoopConfiguration().set("fs.adl.account.<storage account name>.oauth2.credential", "<client secret>")
sc._jsc.hadoopConfiguration().set("fs.adl.account.<storage account name>.oauth2.refresh.url",
                                  "https://login.microsoftonline.com/<tenant id>/oauth2/token")
df = spark.read.csv("adl://<storage account name>.azuredatalakestore.net/<path>")
The following code demonstrates how to read data in from Azure Data Lake Storage
Generation 2 (ADLS Gen 2) with your service principal credentials.
Python
%%synapse
df = spark.read.csv("abfss://<container name>@<storage account>.dfs.core.windows.net/<path>")
When you convert your datasets into a Spark dataframe, you can use PySpark data
exploration and preparation libraries.
Python
%%synapse
from azureml.core import Workspace, Dataset

ws = Workspace(workspace_name = workspace_name,
               subscription_id = subscription_id,
               resource_group = resource_group)

# retrieve a registered dataset and convert it to a Spark dataframe
titanic_ds = Dataset.get_by_name(ws, 'titanic_ds')
df = titanic_ds.to_spark_dataframe()
The following code expands upon the HDFS example in the previous section. It filters
the data in the Spark dataframe, df , based on the Survived column, and groups that
list by Age.
Python
%%synapse
df.filter(col('Survived') == 1).groupBy('Age').count().orderBy(desc('count')).show(10)
df.show()
In the following example, the prepared data is written back to Azure Blob storage and
overwrites the original Titanic.csv file in the training_data directory. To write back to
storage, you need Storage Blob Data Contributor permissions. Learn more about
storage permissions and roles.
Python
%%synapse
df.write.format("csv").mode("overwrite").save("wasbs://demo@dprepdata.blob.core.windows.net/training_data/Titanic.csv")
When you've completed data preparation and saved your prepared data to storage,
stop using your Apache Spark pool with the following command.
Python
%synapse stop
The following code:
Assumes you already created a datastore that connects to the storage service
where you saved your prepared data.
Gets that existing datastore, mydatastore , from the workspace, ws with the get()
method.
Creates a FileDataset, train_ds , that references the prepared data files located in
the training_data directory in mydatastore .
Creates the variable input1 , which can be used at a later time to make the data
files of the train_ds dataset available to a compute target for your training tasks.
Python
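The code listing is not shown here. A sketch matching the steps above (the names mydatastore and training_data come from the description; the input-handling call at the end is an assumption about how input1 is built, not the original listing):

```python
from azureml.core import Datastore, Dataset

# Get the existing datastore from the workspace, ws.
datastore = Datastore.get(ws, datastore_name="mydatastore")

# Reference the prepared data files in the training_data directory.
train_ds = Dataset.File.from_files(path=[(datastore, "training_data")])

# Make the dataset's files available to a compute target later on.
input1 = train_ds.as_named_input("input1").as_mount()
```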
Similarly, if you have an Azure Machine Learning pipeline, you can use the
SynapseSparkStep to specify your Synapse Spark pool as the compute target for the
data preparation step in your pipeline.
Making your data available to the Synapse Spark pool depends on your dataset type.
For a FileDataset, you can use the as_hdfs() method. When the run is submitted,
the dataset is made available to the Synapse Spark pool as a Hadoop Distributed
File System (HDFS).
For a TabularDataset, you can use the as_named_input() method.
Creates the variable input2 from the FileDataset train_ds that was created in the
previous code example.
Creates the variable output with the HDFSOutputDatasetConfig class. After
the run completes, this class allows us to save the output of the run as the
dataset, test , in the datastore, mydatastore . In the Azure Machine Learning
workspace, the test dataset is registered under the name registered_dataset .
Configures settings the run should use in order to perform on the Synapse Spark
pool.
Defines the ScriptRunConfig parameters to:
Use the dataprep.py script for the run.
Specify which data to use as input and how to make it available to the Synapse
Spark pool.
Specify where to store the output data, output .
Python
input2 = train_ds.as_hdfs()
output = HDFSOutputDatasetConfig(destination=(datastore, "test")).register_on_complete(name="registered_dataset")

run_config = RunConfiguration(framework="pyspark")
run_config.target = synapse_compute_name

run_config.spark.configuration["spark.driver.memory"] = "1g"
run_config.spark.configuration["spark.driver.cores"] = 2
run_config.spark.configuration["spark.executor.memory"] = "1g"
run_config.spark.configuration["spark.executor.cores"] = 1
run_config.spark.configuration["spark.executor.instances"] = 1

conda_dep = CondaDependencies()
conda_dep.add_pip_package("azureml-core==1.20.0")

run_config.environment.python.conda_dependencies = conda_dep
Once your ScriptRunConfig object is set up, you can submit the run.
Python
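The submission code is not shown here. A sketch of what it looks like with SDK v1, reusing run_config, input2, and output from the previous listing (the experiment name synapse-spark is a hypothetical placeholder):

```python
from azureml.core import Experiment, ScriptRunConfig

# Run dataprep.py on the Synapse Spark pool with the settings configured above.
script_run_config = ScriptRunConfig(source_directory=".",
                                    script="dataprep.py",
                                    arguments=[input2, output],
                                    run_config=run_config)

exp = Experiment(workspace=ws, name="synapse-spark")
run = exp.submit(config=script_run_config)
run.wait_for_completion(show_output=True)
```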
For more details, like the dataprep.py script used in this example, see the example
notebook.
After your data is prepared, you can use it as input for your training jobs. In the
preceding code example, registered_dataset is what you would specify as
your input data for training jobs.
Example notebooks
See the example notebooks for more concepts and demonstrations of the Azure
Synapse Analytics and Azure Machine Learning integration capabilities.
Run an interactive Spark session from a notebook in your Azure Machine Learning
workspace.
Submit an Azure Machine Learning experiment run with a Synapse Spark pool as
your compute target.
Next steps
Train a model.
Train with Azure Machine Learning dataset.
DevOps for a data ingestion pipeline
Article • 03/02/2023
The solution in this article uses three Azure services:
Azure Data Factory: Reads the raw data and orchestrates data preparation.
Azure Databricks: Runs a Python notebook that transforms the data.
Azure Pipelines: Automates the continuous integration and delivery process.
The Continuous Delivery (CD) process deploys the artifacts to the downstream
environments.
This article demonstrates how to automate the CI and CD processes with Azure
Pipelines .
Tip
We recommend storing the code in .py files rather than in the .ipynb Jupyter
notebook format. It improves code readability and enables automatic code
quality checks in the CI process.
To configure the workspace to use a source control repository, see Author with Azure
Repos Git integration.
Python Notebook CI
The CI process for the Python Notebooks gets the code from the collaboration branch
(for example, master or develop) and performs the following activities:
Code linting
Unit testing
Saving the code as an artifact
The following code snippet demonstrates the implementation of these steps in an Azure
DevOps YAML pipeline:
YAML
steps:
- script: |
    flake8 --output-file=$(Build.BinariesDirectory)/lint-testresults.xml --format junit-xml
  workingDirectory: '$(Build.SourcesDirectory)'
  displayName: 'Run flake8 (code style analysis)'
- script: |
    python -m pytest --junitxml=$(Build.BinariesDirectory)/unit-testresults.xml $(Build.SourcesDirectory)
  displayName: 'Run unit tests'
- task: PublishTestResults@2
  condition: succeededOrFailed()
  inputs:
    testResultsFiles: '$(Build.BinariesDirectory)/*-testresults.xml'
    testRunTitle: 'Linting & Unit tests'
    failTaskOnFailedTests: true
  displayName: 'Publish linting and unit test results'
- publish: $(Build.SourcesDirectory)
  artifact: di-notebooks
The pipeline uses flake8 to do the Python code linting. It runs the unit tests defined in
the source code and publishes the linting and test results so they're available in the
Azure Pipelines execution screen.
If the linting and unit testing are successful, the pipeline copies the source code to the
artifact repository to be used by the subsequent deployment steps.
1. The data engineers merge the source code from their feature branches into the
collaboration branch, for example, master or develop.
2. Someone with the granted permissions clicks the publish button to generate Azure
Resource Manager templates from the source code in the collaboration branch.
3. The workspace validates the pipelines (think of it as linting and unit testing),
generates Azure Resource Manager templates (think of it as building), and saves
the generated templates to a technical branch, adf_publish , in the same code
repository (think of it as publishing artifacts). This branch is created
automatically by the Azure Data Factory workspace.
For more information on this process, see Continuous integration and delivery in Azure
Data Factory.
It's important to make sure that the generated Azure Resource Manager templates are
environment agnostic. This means that all values that can differ between environments
are parameterized. Azure Data Factory is smart enough to expose the majority of such
values as parameters. For example, in the following template the connection properties
to an Azure Machine Learning workspace are exposed as parameters:
JSON
{
    "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentParameters.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {
        "factoryName": {
            "value": "devops-ds-adf"
        },
        "AzureMLService_servicePrincipalKey": {
            "value": ""
        },
        "AzureMLService_properties_typeProperties_subscriptionId": {
            "value": "0fe1c235-5cfa-4152-17d7-5dff45a8d4ba"
        },
        "AzureMLService_properties_typeProperties_resourceGroupName": {
            "value": "devops-ds-rg"
        },
        "AzureMLService_properties_typeProperties_servicePrincipalId": {
            "value": "6e35e589-3b22-4edb-89d0-2ab7fc08d488"
        },
        "AzureMLService_properties_typeProperties_tenant": {
            "value": "72f988bf-86f1-41af-912b-2d7cd611db47"
        }
    }
}
However, you may want to expose custom properties that the Azure Data Factory
workspace doesn't handle by default. In the scenario of this article, an Azure Data
Factory pipeline invokes a Python notebook that processes the data. The notebook
accepts a parameter with the name of an input data file.
Python
import pandas as pd
import numpy as np
data_file_name = getArgument("data_file_name")
data = pd.read_csv(data_file_name)
labels = np.array(data['target'])
...
This name is different for Dev, QA, UAT, and PROD environments. In a complex pipeline
with multiple activities, there can be several custom properties. It's good practice to
collect all those values in one place and define them as pipeline variables:
The pipeline activities can then refer to these pipeline variables when they use them:
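The screenshots from the original article aren't reproduced here. In the underlying pipeline JSON, a variable definition and a reference to it from an activity look roughly like the following fragment (the activity name and notebook path are illustrative assumptions; only the pipeline and variable names come from this article):

```json
{
  "name": "data-ingestion-pipeline",
  "properties": {
    "variables": {
      "data_file_name": {
        "type": "String",
        "defaultValue": "driver_prediction_train.csv"
      }
    },
    "activities": [
      {
        "name": "process-data",
        "type": "DatabricksNotebook",
        "typeProperties": {
          "notebookPath": "/Shared/devops-ds/data-ingestion",
          "baseParameters": {
            "data_file_name": {
              "value": "@variables('data_file_name')",
              "type": "Expression"
            }
          }
        }
      }
    ]
  }
}
```

The @variables('data_file_name') expression is how Azure Data Factory activities read a pipeline variable at run time.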
The Azure Data Factory workspace doesn't expose pipeline variables as Azure Resource
Manager template parameters by default. The workspace uses the Default
Parameterization Template to dictate which pipeline properties should be exposed as
Azure Resource Manager template parameters. To add pipeline variables to the list,
update the "Microsoft.DataFactory/factories/pipelines" section of the Default
Parameterization Template with the following snippet, and place the resulting JSON file
in the root of the source folder:
JSON
"Microsoft.DataFactory/factories/pipelines": {
    "properties": {
        "variables": {
            "*": {
                "defaultValue": "="
            }
        }
    }
}
Doing so forces the Azure Data Factory workspace to add the variables to the
parameters list when the publish button is clicked:
JSON
{
    "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentParameters.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {
        "factoryName": {
            "value": "devops-ds-adf"
        },
        ...
        "data-ingestion-pipeline_properties_variables_data_file_name_defaultValue": {
            "value": "driver_prediction_train.csv"
        }
    }
}
The values in the JSON file are default values configured in the pipeline definition.
They're expected to be overridden with the target environment values when the Azure
Resource Manager template is deployed.
The CD Azure pipeline consists of multiple stages that represent the environments. Each
stage contains deployments and jobs that perform the following steps:
The pipeline stages can be configured with approvals and gates that provide additional
control on how the deployment process evolves through the chain of environments.
YAML
- stage: 'Deploy_to_QA'
  displayName: 'Deploy to QA'
  variables:
  - group: devops-ds-qa-vg
  jobs:
  - deployment: "Deploy_to_Databricks"
    displayName: 'Deploy to Databricks'
    timeoutInMinutes: 0
    environment: qa
    strategy:
      runOnce:
        deploy:
          steps:
          - task: UsePythonVersion@0
            inputs:
              versionSpec: '3.x'
              addToPath: true
              architecture: 'x64'
            displayName: 'Use Python3'
          - task: configuredatabricks@0
            inputs:
              url: '$(DATABRICKS_URL)'
              token: '$(DATABRICKS_TOKEN)'
            displayName: 'Configure Databricks CLI'
          - task: deploynotebooks@0
            inputs:
              notebooksFolderPath: '$(Pipeline.Workspace)/di-notebooks'
              workspaceFolder: '/Shared/devops-ds'
            displayName: 'Deploy (copy) data processing notebook to the Databricks cluster'
The artifacts produced by the CI are automatically copied to the deployment agent and
are available in the $(Pipeline.Workspace) folder. In this case, the deployment task
refers to the di-notebooks artifact containing the Python notebook. This deployment
uses the Databricks Azure DevOps extension to copy the notebook files to the
Databricks workspace.
YAML
- deployment: "Deploy_to_ADF"
  displayName: 'Deploy to ADF'
  timeoutInMinutes: 0
  environment: qa
  strategy:
    runOnce:
      deploy:
        steps:
        - task: AzureResourceGroupDeployment@2
          displayName: 'Deploy ADF resources'
          inputs:
            azureSubscription: $(AZURE_RM_CONNECTION)
            resourceGroupName: $(RESOURCE_GROUP)
            location: $(LOCATION)
            csmFile: '$(Pipeline.Workspace)/adf-pipelines/ARMTemplateForFactory.json'
            csmParametersFile: '$(Pipeline.Workspace)/adf-pipelines/ARMTemplateParametersForFactory.json'
            overrideParameters: -data-ingestion-pipeline_properties_variables_data_file_name_defaultValue "$(DATA_FILE_NAME)"
The value of the data filename parameter comes from the $(DATA_FILE_NAME) variable
defined in a QA stage variable group. Similarly, all parameters defined in
ARMTemplateForFactory.json can be overridden. If they are not, then the default values
are used.
YAML
- job: "Integration_test_job"
  displayName: "Integration test job"
  dependsOn: [Deploy_to_Databricks, Deploy_to_ADF]
  pool:
    vmImage: 'ubuntu-latest'
  timeoutInMinutes: 0
  steps:
  - task: AzurePowerShell@4
    displayName: 'Execute ADF Pipeline'
    inputs:
      azureSubscription: $(AZURE_RM_CONNECTION)
      ScriptPath: '$(Build.SourcesDirectory)/adf/utils/Invoke-ADFPipeline.ps1'
      ScriptArguments: '-ResourceGroupName $(RESOURCE_GROUP) -DataFactoryName $(DATA_FACTORY_NAME) -PipelineName $(PIPELINE_NAME)'
      azurePowerShellVersion: LatestVersion
  - task: UsePythonVersion@0
    inputs:
      versionSpec: '3.x'
      addToPath: true
      architecture: 'x64'
    displayName: 'Use Python3'
  - task: configuredatabricks@0
    inputs:
      url: '$(DATABRICKS_URL)'
      token: '$(DATABRICKS_TOKEN)'
    displayName: 'Configure Databricks CLI'
  - task: executenotebook@0
    inputs:
      notebookPath: '/Shared/devops-ds/test-data-ingestion'
      existingClusterId: '$(DATABRICKS_CLUSTER_ID)'
      executionParams: '{"bin_file_name":"$(bin_FILE_NAME)"}'
    displayName: 'Test data ingestion'
  - task: waitexecution@0
    displayName: 'Wait until the testing is done'
The final task in the job checks the result of the notebook execution. If it returns an
error, it sets the status of pipeline execution to failed.
The complete CI/CD Azure pipeline consists of the following stages:
1. CI
2. Deploy to QA, containing:
Deploy to Databricks + Deploy to ADF
Integration test
It contains a number of Deploy stages equal to the number of target environments you
have. Each Deploy stage contains two deployments that run in parallel, and a job that
runs after the deployments to test the solution on the environment.
A sample implementation of the pipeline is assembled in the following yaml snippet:
YAML
variables:
- group: devops-ds-vg

stages:
- stage: 'CI'
  displayName: 'CI'
  jobs:
  - job: "CI_Job"
    displayName: "CI Job"
    pool:
      vmImage: 'ubuntu-latest'
    timeoutInMinutes: 0
    steps:
    - task: UsePythonVersion@0
      inputs:
        versionSpec: '3.x'
        addToPath: true
        architecture: 'x64'
      displayName: 'Use Python3'
    - script: pip install --upgrade flake8 flake8_formatter_junit_xml
      displayName: 'Install flake8'
    - checkout: self
    - script: |
        flake8 --output-file=$(Build.BinariesDirectory)/lint-testresults.xml --format junit-xml
      workingDirectory: '$(Build.SourcesDirectory)'
      displayName: 'Run flake8 (code style analysis)'
    - script: |
        python -m pytest --junitxml=$(Build.BinariesDirectory)/unit-testresults.xml $(Build.SourcesDirectory)
      displayName: 'Run unit tests'
    - task: PublishTestResults@2
      condition: succeededOrFailed()
      inputs:
        testResultsFiles: '$(Build.BinariesDirectory)/*-testresults.xml'
        testRunTitle: 'Linting & Unit tests'
        failTaskOnFailedTests: true
      displayName: 'Publish linting and unit test results'
    - publish: $(Build.SourcesDirectory)
      artifact: di-notebooks

- stage: 'Deploy_to_QA'
  displayName: 'Deploy to QA'
  variables:
  - group: devops-ds-qa-vg
  jobs:
  - deployment: "Deploy_to_Databricks"
    displayName: 'Deploy to Databricks'
    timeoutInMinutes: 0
    environment: qa
    strategy:
      runOnce:
        deploy:
          steps:
          - task: UsePythonVersion@0
            inputs:
              versionSpec: '3.x'
              addToPath: true
              architecture: 'x64'
            displayName: 'Use Python3'
          - task: configuredatabricks@0
            inputs:
              url: '$(DATABRICKS_URL)'
              token: '$(DATABRICKS_TOKEN)'
            displayName: 'Configure Databricks CLI'
          - task: deploynotebooks@0
            inputs:
              notebooksFolderPath: '$(Pipeline.Workspace)/di-notebooks'
              workspaceFolder: '/Shared/devops-ds'
            displayName: 'Deploy (copy) data processing notebook to the Databricks cluster'
  - deployment: "Deploy_to_ADF"
    displayName: 'Deploy to ADF'
    timeoutInMinutes: 0
    environment: qa
    strategy:
      runOnce:
        deploy:
          steps:
          - task: AzureResourceGroupDeployment@2
            displayName: 'Deploy ADF resources'
            inputs:
              azureSubscription: $(AZURE_RM_CONNECTION)
              resourceGroupName: $(RESOURCE_GROUP)
              location: $(LOCATION)
              csmFile: '$(Pipeline.Workspace)/adf-pipelines/ARMTemplateForFactory.json'
              csmParametersFile: '$(Pipeline.Workspace)/adf-pipelines/ARMTemplateParametersForFactory.json'
              overrideParameters: -data-ingestion-pipeline_properties_variables_data_file_name_defaultValue "$(DATA_FILE_NAME)"
  - job: "Integration_test_job"
    displayName: "Integration test job"
    dependsOn: [Deploy_to_Databricks, Deploy_to_ADF]
    pool:
      vmImage: 'ubuntu-latest'
    timeoutInMinutes: 0
    steps:
    - task: AzurePowerShell@4
      displayName: 'Execute ADF Pipeline'
      inputs:
        azureSubscription: $(AZURE_RM_CONNECTION)
        ScriptPath: '$(Build.SourcesDirectory)/adf/utils/Invoke-ADFPipeline.ps1'
        ScriptArguments: '-ResourceGroupName $(RESOURCE_GROUP) -DataFactoryName $(DATA_FACTORY_NAME) -PipelineName $(PIPELINE_NAME)'
        azurePowerShellVersion: LatestVersion
    - task: UsePythonVersion@0
      inputs:
        versionSpec: '3.x'
        addToPath: true
        architecture: 'x64'
      displayName: 'Use Python3'
    - task: configuredatabricks@0
      inputs:
        url: '$(DATABRICKS_URL)'
        token: '$(DATABRICKS_TOKEN)'
      displayName: 'Configure Databricks CLI'
    - task: executenotebook@0
      inputs:
        notebookPath: '/Shared/devops-ds/test-data-ingestion'
        existingClusterId: '$(DATABRICKS_CLUSTER_ID)'
        executionParams: '{"bin_file_name":"$(bin_FILE_NAME)"}'
      displayName: 'Test data ingestion'
    - task: waitexecution@0
      displayName: 'Wait until the testing is done'
Next steps
Source Control in Azure Data Factory
Continuous integration and delivery in Azure Data Factory
DevOps for Azure Databricks
Import data into Azure Machine
Learning designer
Article • 06/01/2023
In this article, you learn how to import your own data in the designer to create custom
solutions. There are two ways you can import data into the designer:
Important
If you do not see graphical elements mentioned in this document, such as buttons
in studio or designer, you may not have the right level of permissions to the
workspace. Please contact your Azure subscription administrator to verify that you
have been granted the correct level of access. For more information, see Manage
users and roles.
Register a dataset
You can register existing datasets programmatically with the SDK, or visually in Azure
Machine Learning studio.
You can also register the output for any designer component as a dataset.
1. Select the component that outputs the data you want to register.
File dataset registers the component's output folder as a file dataset. The output
folder contains a data file and meta files that the designer uses internally. Select
this option if you want to continue to use the registered dataset in the designer.
Tabular dataset registers only the component's output data file as a tabular
dataset. This format is easily consumed by other tools, for example, in Automated
Machine Learning or with the Python SDK. Select this option if you plan to use the
registered dataset outside of the designer.
Use a dataset
Your registered datasets can be found in the component palette, under Datasets. To use
a dataset, drag and drop it onto the pipeline canvas. Then, connect the output port of
the dataset to other components in the canvas.
If you register a file dataset, the output port type of the dataset is AnyDirectory. If you
register a tabular dataset, the output port type is DataFrameDirectory.
If you connect the output port of the dataset to other components in the
designer, the port types of the datasets and components need to be aligned.
Note
The designer supports dataset versioning. Specify the dataset version in the
property panel of the dataset component.
Limitations
Currently you can only visualize tabular datasets in the designer. If you register a file
dataset outside the designer, you can't visualize it in the designer canvas.
Currently the designer only supports previewing outputs that are stored in Azure
Blob storage. You can check and change your output datastore in the Output
settings under the Parameters tab in the right panel of the component.
If your data is stored in a virtual network (VNet) and you want to preview it, you need
to enable workspace managed identity for the datastore.
For detailed information on how to use the Import Data component, see the Import
Data reference page.
Note
If your dataset has too many columns, you may encounter the following error:
"Validation failed due to size limitation". To avoid this, register the dataset in the
Datasets interface.
Supported sources
This section lists the data sources supported by the designer. Data comes into the
designer from either a datastore or a tabular dataset.
Datastore sources
For a list of supported datastore sources, see Access data in Azure storage services.
Delimited files
JSON files
Parquet files
SQL queries
Data types
The designer internally recognizes the following data types:
String
Integer
Decimal
Boolean
Date
The designer uses an internal data type to pass data between components. You can
explicitly convert your data into data table format using the Convert to Dataset
component. Any component that accepts formats other than the internal format will
convert the data silently before passing it to the next component.
Data constraints
Modules in the designer are limited by the size of the compute target. For larger
datasets, you should use a larger Azure Machine Learning compute resource. For more
information on Azure Machine Learning compute, see What are compute targets in
Azure Machine Learning?
Next steps
Learn the designer fundamentals with this Tutorial: Predict automobile price with the
designer.
Set up an image labeling project and
export labels
Article • 06/14/2023
Learn how to create and run data labeling projects to label images in Azure Machine
Learning. Use machine learning (ML)-assisted data labeling or human-in-the-loop
labeling to help with the task.
You can also use the data labeling tool in Azure Machine Learning to create a text
labeling project.
Coordinate data, labels, and team members to efficiently manage labeling tasks.
Track progress and maintain the queue of incomplete labeling tasks.
Start and stop the project, and control the labeling progress.
Review and export the labeled data as an Azure Machine Learning dataset.
Important
The data images you work with in the Azure Machine Learning data labeling tool
must be available in an Azure Blob Storage datastore. If you don't have an existing
datastore, you can upload your data files to a new datastore when you create a
project.
Image data can be any file that has one of these file extensions:
.jpg
.jpeg
.png
.jpe
.jfif
.bmp
.tif
.tiff
.dcm
.dicom
Prerequisites
You use these items to set up image labeling in Azure Machine Learning:
The data that you want to label, either in local files or in Azure Blob Storage.
The set of labels that you want to apply.
The instructions for labeling.
An Azure subscription. If you don't have an Azure subscription, create a free
account before you begin.
An Azure Machine Learning workspace. See Create an Azure Machine Learning
workspace.
If your data is already in Azure Blob Storage, make sure that it's available as a datastore
before you create the labeling project.
You can't reuse the project name, even if you delete the project.
To apply only a single label to an image from a set of labels, select Image
Classification Multi-class.
To apply one or more labels to an image from a set of labels, select Image
Classification Multi-label. For example, a photo of a dog might be labeled
with both dog and daytime.
To assign a label to each object within an image and add bounding boxes,
select Object Identification (Bounding Box).
To assign a label to each object within an image and draw a polygon around
each object, select Instance Segmentation (Polygon).
Make sure that you first contact the vendor and sign a contract. For more information,
see Work with a data labeling vendor company (preview).
Note
A project can't contain more than 500,000 files. If your dataset exceeds this file
count, only the first 500,000 files are loaded.
1. Select Create.
2. For Name, enter a name for your dataset. Optionally, enter a description.
3. Ensure that Dataset type is set to File. Only file dataset types are supported for
images.
4. Select Next.
5. Select From Azure storage, and then select Next.
6. Select the datastore, and then select Next.
7. If your data is in a subfolder within Blob Storage, choose Browse to select the path.
To include all the files in the subfolders of the selected path, append /** to
the path.
To include all the data in the current container and its subfolders, append
**/*.* to the path.
8. Select Create.
9. Select the data asset you created.
1. Select Create.
2. For Name, enter a name for your dataset. Optionally, enter a description.
3. Ensure that Dataset type is set to File. Only file dataset types are supported for
images.
4. Select Next.
5. Select From local files, and then select Next.
6. (Optional) Select a datastore. You can also leave the default to upload to the
default blob store (workspaceblobstore) for your Machine Learning workspace.
7. Select Next.
8. Select Upload > Upload files or Upload > Upload folder to select the local files or
folders to upload.
9. In the browser window, find your files or folders, and then select Open.
10. Continue to select Upload until you specify all your files and folders.
11. Optionally, you can choose to select the Overwrite if already exists checkbox.
Verify the list of files and folders.
12. Select Next.
13. Confirm the details. Select Back to modify the settings or select Create to create
the dataset.
14. Finally, select the data asset you created.
When Enable incremental refresh at regular intervals is set, the dataset is checked
periodically for new files to be added to a project based on the labeling completion rate.
The check for new data stops when the project contains the maximum 500,000 files.
Select Enable incremental refresh at regular intervals when you want your project to
continually monitor for new data in the datastore.
Clear the selection if you don't want new files in the datastore to automatically be
added to your project.
Important
Don't create a new version for the dataset you want to update. If you do, the
updates won't be seen because the data labeling project is pinned to the initial
version. Instead, use Azure Storage Explorer to modify your data in the
appropriate folder in Blob Storage.
Also, don't remove data. Removing data from the dataset your project uses causes
an error in the project.
After the project is created, use the Details tab to change incremental refresh, view the
time stamp for the last refresh, and request an immediate refresh of data.
Your labelers' accuracy and speed are affected by their ability to choose among classes.
For instance, instead of spelling out the full genus and species for plants or animals, use
a field code or abbreviate the genus.
To create a flat list, select Add label category to create each label.
To create labels in different groups, select Add label category to create the top-
level labels. Then select the plus sign (+) under each top level to create the next
level of labels for that category. You can create up to six levels for any grouping.
You can select labels at any level during the tagging process. For example, the labels
Animal , Animal/Cat , Animal/Dog , Color , Color/Black , Color/White , and Color/Silver
are all available choices for a label. In a multi-label project, there's no requirement to
pick one of each category. If that is your intent, make sure to include this information in
your instructions.
What are the labels labelers will see, and how will they choose among them? Is
there a reference text to refer to?
What should they do if no label seems appropriate?
What should they do if multiple labels seem appropriate?
What confidence threshold should they apply to a label? Do you want the labeler's
best guess if they aren't certain?
What should they do with partially occluded or overlapping objects of interest?
What should they do if an object of interest is clipped by the edge of the image?
What should they do if they think they made a mistake after they submit a label?
What should they do if they discover image quality issues, including poor lighting
conditions, reflections, loss of focus, undesired background included, abnormal
camera angles, and so on?
What should they do if multiple reviewers have different opinions about applying a
label?
How is the bounding box defined for this task? Should it stay entirely on the
interior of the object or should it be on the exterior? Should it be cropped as
closely as possible, or is some clearance acceptable?
What level of care and consistency do you expect the labelers to apply in defining
bounding boxes?
What is the visual definition of each label class? Can you provide a list of normal,
edge, and counter cases for each class?
What should the labelers do if the object is tiny? Should it be labeled as an object
or should they ignore that object as background?
How should labelers handle an object that's only partially shown in the image?
How should labelers handle an object that's partially covered by another object?
How should labelers handle an object that has no clear boundary?
How should labelers handle an object that isn't the object class of interest but has
visual similarities to a relevant object type?
Note
Labelers can select the first nine labels by using number keys 1 through 9.
Important
Consensus labeling is currently in public preview.
The preview version is provided without a service level agreement, and it's not
recommended for production workloads. Certain features might not be supported
or might have constrained capabilities.
For more information, see Supplemental Terms of Use for Microsoft Azure
Previews .
To have each item sent to multiple labelers, select Enable consensus labeling (preview).
Then set values for Minimum labelers and Maximum labelers to specify how many
labelers to use. Make sure that you have as many labelers available as your maximum
number. You can't change these settings after the project has started.
If a consensus is reached from the minimum number of labelers, the item is labeled. If a
consensus isn't reached, the item is sent to more labelers. If there's no consensus after
the item goes to the maximum number of labelers, its status is Needs Review, and the
project owner is responsible for labeling the item.
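The dispatch logic just described can be sketched in plain Python. This is only an illustration of the rule, not the service's implementation, and the majority-vote definition of consensus is an assumption:

```python
from collections import Counter

def consensus_status(labels, min_labelers, max_labelers):
    """Sketch of the consensus rule: labels holds the labels submitted
    so far for one item, one per labeler (assumed majority-vote rule)."""
    if len(labels) >= min_labelers:
        top_label, votes = Counter(labels).most_common(1)[0]
        # Assumed rule: consensus means a strict majority agrees.
        if votes > len(labels) / 2:
            return ("labeled", top_label)
    if len(labels) >= max_labelers:
        # No consensus after the maximum number of labelers.
        return ("needs review", None)
    return ("send to another labeler", None)

print(consensus_status(["cat", "cat"], min_labelers=2, max_labelers=4))
```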
Note
At the start of your labeling project, the items are shuffled into a random order to
reduce potential bias. However, the trained model reflects any biases that are present in
the dataset. For example, if 80 percent of your items are of a single class, then
approximately 80 percent of the data used to train the model lands in that class.
To enable assisted labeling, select Enable ML assisted labeling and specify a GPU. If you
don't have a GPU in your workspace, a GPU cluster is created for you and added to your
workspace. The cluster is created with a minimum of zero nodes, which means it costs
nothing when not in use.
Clustering
Pre-labeling
The labeled data item count that's required to start assisted labeling isn't a fixed
number. This number can vary significantly from one labeling project to another. For
some projects, it's sometimes possible to see pre-label or cluster tasks after 300 items
have been manually labeled. ML-assisted labeling uses a technique called transfer
learning. Transfer learning uses a pre-trained model to jump-start the training process. If
the classes of your dataset resemble the classes in the pre-trained model, pre-labels
might become available after only a few hundred manually labeled items. If your dataset
significantly differs from the data that's used to pre-train the model, the process might
take more time.
When you use consensus labeling, the consensus label is used for training.
Because the final labels still rely on input from the labeler, this technology is sometimes
called human-in-the-loop labeling.
Note
ML-assisted data labeling doesn't support default storage accounts that are
secured behind a virtual network. You must use a non-default storage account for
ML-assisted data labeling. The non-default storage account can be secured behind
the virtual network.
Clustering
After you submit some labels, the classification model starts to group together similar
items. These similar images are presented to labelers on the same page to help make
manual tagging more efficient. Clustering is especially useful when a labeler views a grid
of four, six, or nine images.
After a machine learning model is trained on your manually labeled data, the model is
truncated to its last fully connected layer. Unlabeled images are then passed through
the truncated model in a process called embedding or featurization. This process
embeds each image in a high-dimensional space that the model layer defines. Other
images in the space that are nearest the image are used for clustering tasks.
The clustering phase doesn't appear for object detection models or text classification.
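The embedding-and-nearest-neighbor step can be sketched in plain Python. This is an illustrative sketch only: the real featurizer is the truncated classification model described above, and cosine similarity is an assumed distance measure.

```python
import math

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def nearest(query, embeddings, k=2):
    # Indices of the k embeddings most similar to `query`; images whose
    # embeddings land near each other are shown on the same labeling page.
    ranked = sorted(range(len(embeddings)),
                    key=lambda i: cosine(query, embeddings[i]),
                    reverse=True)
    return ranked[:k]
```

In practice the embeddings are high-dimensional; two dimensions are enough to see the grouping behavior.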
Pre-labeling
After you submit enough labels for training, either a classification model predicts tags or
an object detection model predicts bounding boxes. The labeler now sees pages that
contain predicted labels already present on each item. For object detection, predicted
boxes are also shown. The task involves reviewing these predictions and correcting any
incorrectly labeled images before page submission.
After a machine learning model is trained on your manually labeled data, the model is
evaluated on a test set of manually labeled items. The evaluation helps determine the
model's accuracy at different confidence thresholds. The evaluation process sets a
confidence threshold beyond which the model is accurate enough to show pre-labels.
The model is then evaluated against unlabeled data. Items with predictions that are
more confident than the threshold are used for pre-labeling.
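As a sketch of the threshold-setting step, the snippet below picks the lowest confidence at which precision on the evaluation set meets a target. The function names, the precision target, and the (confidence, is_correct) input format are all illustrative assumptions; the service's actual evaluation procedure is internal.

```python
def select_threshold(eval_preds, min_precision=0.9):
    # eval_preds: (confidence, prediction_was_correct) pairs from
    # evaluating the model on manually labeled test items.
    for t in sorted({conf for conf, _ in eval_preds}):
        kept = [ok for conf, ok in eval_preds if conf >= t]
        if kept and sum(kept) / len(kept) >= min_precision:
            return t
    return None  # model is never accurate enough to pre-label

def prelabel(unlabeled_preds, threshold):
    # Items whose prediction is more confident than the threshold get a
    # pre-label; the rest remain fully manual.
    return [(item, label) for item, label, conf in unlabeled_preds
            if threshold is not None and conf >= threshold]
```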
Note
This page might not automatically refresh. After a pause, manually refresh the page
to see the project's status as Created.
To pause or restart the project, on the project command bar, toggle the Running status.
You can label data only when the project is running.
Dashboard
The Dashboard tab shows the progress of the labeling task.
The progress charts show how many items have been labeled, skipped, need review, or
aren't yet complete. Hover over the chart to see the number of items in each section.
A distribution of the labels for completed tasks is shown below the chart. In some
project types, an item can have multiple labels. The total number of labels can exceed
the total number of items.
A distribution of labelers, and how many items each has labeled, is also shown.
The middle section shows a table that has a queue of unassigned tasks. When ML-
assisted labeling is off, this section shows the number of manual tasks that are awaiting
assignment.
Additionally, when ML-assisted labeling is enabled, you can scroll down to see the ML-
assisted labeling status. The Jobs sections give links for each of the machine learning
runs.
If your project uses consensus labeling, review images that have no consensus:
4. Under Labeled datapoints, select Consensus labels in need of review to show only
images for which the labelers didn't come to a consensus.
5. For each image to review, select the Consensus label dropdown to view the
conflicting labels.
6. Although you can select an individual labeler to see their labels, to update or reject
the labels, you must use the top choice, Consensus label (preview).
Details tab
View and change details of your project. On this tab, you can:
View project details and input datasets.
Set or clear the Enable incremental refresh at regular intervals option, or request
an immediate refresh.
View details of the storage container that's used to store labeled outputs in your
project.
Add labels to your project.
Edit instructions you give to your labelers.
Change settings for ML-assisted labeling and kick off a labeling task.
You can also add users and customize the permissions so that they can access labeling
but not other parts of the workspace or your labeling project. For more information, see
Add users to your data labeling project.
2. On the project command bar, toggle the status from Running to Paused to stop
labeling activity.
Start over, and remove all existing labels. Choose this option if you want to
start labeling from the beginning by using the new full set of labels.
Start over, and keep all existing labels. Choose this option to mark all data as
unlabeled, but keep the existing labels as a default tag for images that were
previously labeled.
Continue, and keep all existing labels. Choose this option to keep all data
already labeled as it is, and start using the new label for data that's not yet
labeled.
8. After you've added all new labels, toggle Paused to Running to restart the project.
Note
On-demand training is not available for projects created before December 2022. To
use this feature, create a new project.
To start a new ML-assisted training run:
Export the labels
You can export label data as:
A CSV file. Azure Machine Learning creates the CSV file in a folder inside
Labeling/export/csv.
A COCO format file. Azure Machine Learning creates the COCO file in a folder
inside Labeling/export/coco.
An Azure Machine Learning dataset with labels.
When you export a CSV or COCO file, a notification appears briefly when the file is ready
to download. You'll also find the notification in the Notification section on the top bar.
Access exported Azure Machine Learning datasets and data assets in the Data section of
Machine Learning. The data details page also provides sample code you can use to
access your labels by using Python.
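As a sketch of what that Python access can look like with the SDK v1 (azureml-core), assuming a workspace config.json is present and the exported dataset is named "my-labels" (both hypothetical):

```python
def load_labeled_dataframe(dataset_name="my-labels"):
    # Requires the azureml-core package and access to an Azure Machine
    # Learning workspace; the dataset name is a placeholder for your
    # exported labeled dataset.
    from azureml.core import Workspace, Dataset

    ws = Workspace.from_config()            # reads config.json
    ds = Dataset.get_by_name(ws, dataset_name)
    return ds.to_pandas_dataframe()         # label columns appear in the frame
```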
After you export your labeled data to an Azure Machine Learning dataset, you can use
AutoML to build computer vision models that are trained on your labeled data. Learn
more at Set up AutoML to train computer vision models by using Python.
Troubleshoot issues
Use these tips if you see any of the following issues:
Issue: Only datasets created on blob datastores can be used.
Resolution: This issue is a known limitation of the current release.

Issue: Removing data from the dataset your project uses causes an error in the project.
Resolution: Don't remove data from the version of the dataset you used in a labeling project. Create a new version of the dataset to use to remove data.

Issue: After a project is created, the project status is Initializing for an extended time.
Resolution: Manually refresh the page. Initialization should complete at roughly 20 data points per second. No automatic refresh is a known issue.

Issue: Newly labeled items aren't visible in data review.
Resolution: To load all labeled items, select the First button. The First button takes you back to the front of the list, and it loads all labeled data.

Issue: You can't assign a set of tasks to a specific labeler.
Resolution: This issue is a known limitation of the current release.

Troubleshoot object detection
Issue: If you select the Esc key when you label for object detection, a zero-size label is created and label submission fails.
Resolution: To delete the label, select the X delete icon next to the label.
Next steps
How to tag images
Set up a text labeling project and export
labels
Article • 06/14/2023
In Azure Machine Learning, learn how to create and run data labeling projects to label
text data. Specify either a single label or multiple labels to apply to each text item.
You can also use the data labeling tool in Azure Machine Learning to create an image
labeling project.
Coordinate data, labels, and team members to efficiently manage labeling tasks.
Track progress and maintain the queue of incomplete labeling tasks.
Start and stop the project, and control the labeling progress.
Review and export the labeled data as an Azure Machine Learning dataset.
Important
The text data you work with in the Azure Machine Learning data labeling tool must
be available in an Azure Blob Storage datastore. If you don't have an existing
datastore, you can upload your data files to a new datastore when you create a
project.
Prerequisites
You use these items to set up text labeling in Azure Machine Learning:
The data that you want to label, either in local files or in Azure Blob Storage.
The set of labels that you want to apply.
The instructions for labeling.
An Azure subscription. If you don't have an Azure subscription, create a free
account before you begin.
An Azure Machine Learning workspace. See Create an Azure Machine Learning
workspace.
If your data is already in Azure Blob Storage, make sure that it's available as a datastore
before you create the labeling project.
You can't reuse the project name, even if you delete the project.
To apply only a single label to each piece of text from a set of labels, select
Text Classification Multi-class.
To apply one or more labels to each piece of text from a set of labels, select
Text Classification Multi-label.
To apply labels to individual text words or to multiple text words in each
entry, select Text Named Entity Recognition.
5. Select Next to continue.
Make sure that you first contact the vendor and sign a contract. For more information,
see Work with a data labeling vendor company (preview).
Note
A project can't contain more than 500,000 files. If your dataset exceeds this file
count, only the first 500,000 files are loaded.
1. Select Create.
2. For Name, enter a name for your dataset. Optionally, enter a description.
3. Choose the Dataset type:
If you're using a .csv or .tsv file and each row contains a response, select
Tabular.
If you're using separate .txt files for each response, select File.
4. Select Next.
5. Select From Azure storage, and then select Next.
6. Select the datastore, and then select Next.
7. If your data is in a subfolder within Blob Storage, choose Browse to select the path.
To include all the files in the subfolders of the selected path, append /** to
the path.
To include all the data in the current container and its subfolders, append
**/*.* to the path.
8. Select Create.
9. Select the data asset you created.
1. Select Create.
2. For Name, enter a name for your dataset. Optionally, enter a description.
3. Choose the Dataset type:
If you're using a .csv or .tsv file and each row contains a response, select
Tabular.
If you're using separate .txt files for each response, select File.
4. Select Next.
5. Select From local files, and then select Next.
6. (Optional) Select a datastore. The default uploads to the default blob store
(workspaceblobstore) for your Machine Learning workspace.
7. Select Next.
8. Select Upload > Upload files or Upload > Upload folder to select the local files or
folders to upload.
9. Find your files or folder in the browser window, and then select Open.
10. Continue to select Upload until you specify all of your files and folders.
11. Optionally select the Overwrite if already exists checkbox. Verify the list of files
and folders.
12. Select Next.
13. Confirm the details. Select Back to modify the settings, or select Create to create
the dataset.
14. Finally, select the data asset you created.
When Enable incremental refresh at regular intervals is set, the dataset is checked
periodically for new files to be added to a project based on the labeling completion rate.
The check for new data stops when the project contains the maximum 500,000 files.
Select Enable incremental refresh at regular intervals when you want your project to
continually monitor for new data in the datastore.
Clear the selection if you don't want new files in the datastore to automatically be
added to your project.
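The refresh rule just described amounts to: add files that are new to the project, but never past the cap. A minimal sketch, noting that the real check also factors in the labeling completion rate and runs inside the service:

```python
def incremental_refresh(project_files, datastore_files, cap=500_000):
    # Add newly discovered datastore files until the project hits the cap.
    current = set(project_files)
    room = cap - len(project_files)
    new_files = [f for f in datastore_files if f not in current]
    return list(project_files) + new_files[:max(0, room)]
```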
Important
Don't create a new version for the dataset you want to update. If you do, the
updates won't be seen because the data labeling project is pinned to the initial
version. Instead, use Azure Storage Explorer to modify your data in the
appropriate folder in Blob Storage.
Also, don't remove data. Removing data from the dataset your project uses causes
an error in the project.
After the project is created, use the Details tab to change incremental refresh, view the
time stamp for the last refresh, and request an immediate refresh of data.
Note
Projects that use tabular (.csv or .tsv) dataset input can use incremental refresh. But
incremental refresh only adds new tabular files. The refresh doesn't recognize
changes to existing tabular files.
Your labelers' accuracy and speed are affected by their ability to choose among classes.
For instance, instead of spelling out the full genus and species for plants or animals, use
a field code or abbreviate the genus.
To create a flat list, select Add label category to create each label.
To create labels in different groups, select Add label category to create the top-
level labels. Then select the plus sign (+) under each top level to create the next
level of labels for that category. You can create up to six levels for any grouping.
You can select labels at any level during the tagging process. For example, the labels
Animal , Animal/Cat , Animal/Dog , Color , Color/Black , Color/White , and Color/Silver
are all available choices for a label. In a multi-label project, there's no requirement to
pick one of each category. If that is your intent, make sure to include this information in
your instructions.
What are the labels labelers will see, and how will they choose among them? Is
there a reference text to refer to?
What should they do if no label seems appropriate?
What should they do if multiple labels seem appropriate?
What confidence threshold should they apply to a label? Do you want the labeler's
best guess if they aren't certain?
What should they do with partially occluded or overlapping objects of interest?
What should they do if an object of interest is clipped by the edge of the image?
What should they do if they think they made a mistake after they submit a label?
What should they do if they discover image quality issues, including poor lighting
conditions, reflections, loss of focus, undesired background included, abnormal
camera angles, and so on?
What should they do if multiple reviewers have different opinions about applying a
label?
Note
Labelers can select the first nine labels by using number keys 1 through 9.
Important
The preview version is provided without a service level agreement, and it's not
recommended for production workloads. Certain features might not be supported
or might have constrained capabilities.
For more information, see Supplemental Terms of Use for Microsoft Azure
Previews .
To have each item sent to multiple labelers, select Enable consensus labeling (preview).
Then set values for Minimum labelers and Maximum labelers to specify how many
labelers to use. Make sure that you have as many labelers available as your maximum
number. You can't change these settings after the project has started.
If a consensus is reached from the minimum number of labelers, the item is labeled. If a
consensus isn't reached, the item is sent to more labelers. If there's no consensus after
the item goes to the maximum number of labelers, its status is Needs Review, and the
project owner is responsible for labeling the item.
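The routing just described can be sketched as follows. The simple-majority rule is an assumption made for illustration; the service's exact agreement criterion isn't documented here.

```python
from collections import Counter

def route_item(labels, min_labelers, max_labelers):
    # labels: labels collected so far for one item, one per labeler.
    if len(labels) < min_labelers:
        return "send to another labeler"
    top_label, count = Counter(labels).most_common(1)[0]
    if count > len(labels) / 2:        # assumed rule: simple majority
        return top_label               # consensus reached
    if len(labels) < max_labelers:
        return "send to another labeler"
    return "Needs Review"              # project owner resolves it
```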
To train the text DNN model that ML-assisted labeling uses, the input text per training
example is limited to approximately the first 128 words in the document. For tabular
input, all text columns are concatenated before this limit is applied. This practical limit
allows the model training to complete in a reasonable amount of time. The actual text in
a document (for file input) or set of text columns (for tabular input) can exceed 128
words. The limit pertains only to what the model internally uses during the training
process.
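The concatenate-then-truncate behavior can be sketched in a few lines. Splitting on whitespace is an assumption for illustration; the service's actual tokenization is internal.

```python
def training_text(row, text_columns, limit=128):
    # Concatenate the row's text columns, then keep roughly the first
    # `limit` words for model training.
    words = " ".join(str(row[c]) for c in text_columns).split()
    return " ".join(words[:limit])
```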
The number of labeled items that's required to start assisted labeling isn't a fixed
number. This number can vary significantly from one labeling project to another. The
variance depends on many factors, including the number of label classes and the label
distribution.
When you use consensus labeling, the consensus label is used for training.
Because the final labels still rely on input from the labeler, this technology is sometimes
called human-in-the-loop labeling.
Note
ML-assisted data labeling doesn't support default storage accounts that are
secured behind a virtual network. You must use a non-default storage account for
ML-assisted data labeling. The non-default storage account can be secured behind
the virtual network.
Pre-labeling
After you submit enough labels for training, the trained model is used to predict tags.
The labeler now sees pages that show predicted labels already present on each item.
The task then involves reviewing these predictions and correcting any mislabeled items
before page submission.
After you train the machine learning model on your manually labeled data, the model is
evaluated on a test set of manually labeled items. The evaluation helps determine the
model's accuracy at different confidence thresholds. The evaluation process sets a
confidence threshold beyond which the model is accurate enough to show pre-labels.
The model is then evaluated against unlabeled data. Items that have predictions that are
more confident than the threshold are used for pre-labeling.
Note
This page might not automatically refresh. After a pause, manually refresh the page
to see the project's status as Created.
To pause or restart the project, on the project command bar, toggle the Running status.
You can label data only when the project is running.
Dashboard
The Dashboard tab shows the labeling task progress.
The progress charts show how many items have been labeled, skipped, need review, or
aren't yet complete. Hover over the chart to see the number of items in each section.
A distribution of the labels for completed tasks is shown below the chart. In some
project types, an item can have multiple labels. The total number of labels can exceed
the total number of items.
A distribution of labelers, and how many items each has labeled, is also shown.
The middle section shows a table that has a queue of unassigned tasks. When ML-
assisted labeling is off, this section shows the number of manual tasks that are awaiting
assignment.
Data
On the Data tab, you can see your dataset and review labeled data. Scroll through the
labeled data to see the labels. If you see data that's incorrectly labeled, select it and
choose Reject to remove the labels and return the data to the unlabeled queue.
If your project uses consensus labeling, review items that have no consensus:
4. Under Labeled datapoints, select Consensus labels in need of review to show only
items for which the labelers didn't come to a consensus.
5. For each item to review, select the Consensus label dropdown to view the
conflicting labels.
6. Although you can select an individual labeler to see their labels, to update or reject
the labels, you must use the top choice, Consensus label (preview).
Details tab
View and change details of your project. On this tab, you can:
If labeling is active in Language Studio, you can't also label in Azure Machine
Learning. In that case, Language Studio is the only tab available. Select View in
Language Studio to go to the active labeling project in Language Studio. From
there, you can switch to labeling in Azure Machine Learning if you wish.
Note
Only users with the correct roles in Azure Machine Learning have the ability
to switch labeling.
Select Disconnect from Language Studio to sever the relationship with Language
Studio. Once you disconnect, the project loses its association with Language Studio
and no longer has the Language Studio tab. Disconnecting is permanent and can't be
undone; you'll no longer be able to access your labels for this project in Language
Studio. From that point onward, the labels are available only in Azure Machine
Learning.
Access for labelers
Anyone who has Contributor or Owner access to your workspace can label data in your
project.
You can also add users and customize the permissions so that they can access labeling
but not other parts of the workspace or your labeling project. For more information, see
Add users to your data labeling project.
2. On the project command bar, toggle the status from Running to Paused to stop
labeling activity.
Start over, and remove all existing labels. Choose this option if you want to
start labeling from the beginning by using the new full set of labels.
Start over, and keep all existing labels. Choose this option to mark all data as
unlabeled, but keep the existing labels as a default tag for images that were
previously labeled.
Continue, and keep all existing labels. Choose this option to keep all data
already labeled as it is, and start using the new label for data that's not yet
labeled.
8. After you've added all new labels, toggle Paused to Running to restart the project.
Note
On-demand training is not available for projects created before December 2022. To
use this feature, create a new project.
Export the labels
You can export label data as:
A CSV file. Azure Machine Learning creates the CSV file in a folder inside
Labeling/export/csv.
An Azure Machine Learning dataset with labels.
For Text Named Entity Recognition projects, you can also export label data as a
CoNLL file.
When you export a CSV or CoNLL file, a notification appears briefly when the file is
ready to download. You'll also find the notification in the Notification section on the
top bar.
Access exported Azure Machine Learning datasets and data assets in the Data section of
Machine Learning. The data details page also provides sample code you can use to
access your labels by using Python.
Troubleshoot issues
Use these tips if you see any of the following issues:
Issue: Only datasets created on blob datastores can be used.
Resolution: This issue is a known limitation of the current release.

Issue: Removing data from the dataset your project uses causes an error in the project.
Resolution: Don't remove data from the version of the dataset you used in a labeling project. Create a new version of the dataset to use to remove data.

Issue: After a project is created, the project status is Initializing for an extended time.
Resolution: Manually refresh the page. Initialization should complete at roughly 20 data points per second. No automatic refresh is a known issue.

Issue: Newly labeled items aren't visible in data review.
Resolution: To load all labeled items, select the First button. The First button takes you back to the front of the list, and it loads all labeled data.

Issue: You can't assign a set of tasks to a specific labeler.
Resolution: This issue is a known limitation of the current release.
Next steps
How to tag text
Export or delete your Machine Learning
service workspace data
Article • 04/04/2023
In Azure Machine Learning, you can export or delete your workspace data using either
the portal graphical interface or the Python SDK. This article describes both options.
Note
For information about viewing or deleting personal data, see Azure Data Subject
Requests for the GDPR. For more information about GDPR, see the GDPR section
of the Microsoft Trust Center and the GDPR section of the Service Trust
portal .
Note
This article provides steps about how to delete personal data from the device or
service and can be used to support your obligations under the GDPR. For general
information about GDPR, see the GDPR section of the Microsoft Trust Center
and the GDPR section of the Service Trust portal .
In Azure Machine Learning, personal data consists of user information in job history
documents.
To delete these resources, select them from the list and choose Delete:
Important
If the resource is configured for soft delete, the data isn't actually deleted unless
you optionally select to delete the resource permanently. For more information, see
the following articles:
Workspace soft-deletion.
Soft delete for blobs.
Soft delete in Azure Container Registry.
Azure log analytics workspace.
Azure Key Vault soft-delete.
Job history documents, which may contain personal user information, are stored in the
storage account in blob storage, in /azureml subfolders. You can download and delete
the data from the portal.
Export and delete machine learning resources
using Azure Machine Learning studio
Azure Machine Learning studio provides a unified view of your machine learning
resources - for example, notebooks, data assets, models, and jobs. Azure Machine
Learning studio emphasizes preservation of a record of your data and experiments. You
can delete computational resources such as pipelines and compute resources with the
browser. For these resources, navigate to the resource in question and choose Delete.
You can unregister data assets and archive jobs, but these operations don't delete the
data. To entirely remove the data, delete data assets and job data at the storage level.
Storage-level deletion happens in the portal, as described earlier. Azure Machine
Learning studio can delete individual jobs; deleting a job deletes that job's data.
You can download training artifacts from experiment jobs in Azure Machine Learning
studio. Choose the relevant job, choose Output + logs, and navigate to the specific
artifacts you wish to download. Choose ... and Download, or select Download all.
Python

# Download the outputs of a pipeline step run to the current directory.
# metrics_output_port and model_output_port are the run's output port
# references, obtained from the pipeline run (not shown here).
metrics_output_port.download('.', show_progress=True)
model_output_port.download('.', show_progress=True)
The following machine learning resources can be deleted using the Python SDK:
Model delete
ComputeTarget delete
WebService delete
Next steps
Learn more about Managing a workspace.
What is automated machine learning (AutoML)?
Article • 06/01/2023
For code-experienced customers, use the Azure Machine Learning Python SDK. Get
started with Tutorial: Use automated machine learning to predict taxi fares (v1).
Experiment settings
The following settings allow you to configure your automated ML experiment.
Drop columns ✓ ✓
Block algorithms ✓ ✓
Cross validation ✓ ✓
Featurization summary ✓
Model settings
These settings can be applied to the best model as a result of your automated ML
experiment.
Get guardrails ✓ ✓
Classification
Classification is a common machine learning task. Classification is a type of supervised
learning in which models learn using training data, and apply those learnings to new
data. Azure Machine Learning offers featurizations specifically for these tasks, such as
deep neural network text featurizers for classification. Learn more about featurization
(v1) options.
The main goal of classification models is to predict which categories new data will fall
into based on learnings from its training data. Common classification examples include
fraud detection, handwriting recognition, and object detection. Learn more and see an
example at Create a classification model with automated ML (v1).
Regression
Like classification, regression is a common supervised learning task. Where
classification predicts categorical output values, regression models predict numerical
output values based on independent predictors. In regression, the objective is to help
establish the relationship among those independent predictor variables by estimating
how one variable impacts the others. An example is predicting automobile price based
on features such as gas mileage and safety rating. Learn more and see an
example of regression with automated machine learning (v1).
See examples of regression and automated machine learning for predictions in these
Python notebooks: CPU Performance Prediction.
Time-series forecasting
Building forecasts is an integral part of any business, whether it's revenue, inventory,
sales, or customer demand. You can use automated ML to combine techniques and
approaches and get a recommended, high-quality time-series forecast. Learn more with
this how-to: automated machine learning for time series forecasting (v1).
See examples of regression and automated machine learning for predictions in these
Python notebooks: Sales Forecasting , Demand Forecasting , and Forecasting
GitHub's Daily Active Users .
Computer vision
Support for computer vision tasks allows you to easily generate models trained on
image data for scenarios like image classification and object detection.
Seamlessly integrate with the Azure Machine Learning data labeling capability
Use labeled data for generating image models
Optimize model performance by specifying the model algorithm and tuning the
hyperparameters.
Download or deploy the resulting model as a web service in Azure Machine
Learning.
Operationalize at scale, leveraging Azure Machine Learning MLOps and ML
Pipelines (v1) capabilities.
Authoring AutoML models for vision tasks is supported via the Azure Machine Learning
Python SDK. The resulting experimentation jobs, models, and outputs can be accessed
from the Azure Machine Learning studio UI.
Multi-class image classification: tasks where an image is classified with only a single
label from a set of classes. For example, each image is classified as either an image of
a 'cat' or a 'dog' or a 'duck'.
Multi-label image classification: tasks where an image could have one or more labels
from a set of labels. For example, an image could be labeled with both 'cat' and 'dog'.
Object detection: tasks to identify objects in an image and locate each object with a
bounding box. For example, locate all dogs and cats in an image and draw a bounding
box around each.
Instance segmentation: tasks to identify objects in an image at the pixel level, drawing
a polygon around each object in the image.
End-to-end deep neural network NLP training with the latest pre-trained BERT
models
Seamless integration with Azure Machine Learning data labeling
Use labeled data for generating NLP models
Multi-lingual support with 104 languages
Distributed training with Horovod
Using Azure Machine Learning, you can design and run your automated ML training
experiments with these steps:
2. Choose whether you want to use the Python SDK or the studio web experience:
Learn about the parity between the Python SDK and studio web experience.
For limited or no code experience, try the Azure Machine Learning studio
web experience at https://ml.azure.com
For Python developers, check out the Azure Machine Learning Python SDK
(v1)
3. Specify the source and format of the labeled training data: Numpy arrays or
Pandas dataframe
4. Configure the compute target for model training, such as your local computer,
Azure Machine Learning Computes, remote VMs, or Azure Databricks with SDK v1.
5. Configure the automated machine learning parameters that determine how many
iterations over different models, hyperparameter settings, advanced
preprocessing/featurization, and what metrics to look at when determining the
best model.
While model building is automated, you can also learn how important or relevant
features are to the generated models.
Feature availability
More features are available when you use the remote compute, as shown in the table
below.
Continue a job ✓
Forecasting ✓ ✓
Data guardrails ✓ ✓
To help confirm that such bias isn't applied to the final recommended model, automated
ML supports the use of test data to evaluate the final model that automated ML
recommends at the end of your experiment. When you provide test data as part of your
AutoML experiment configuration, this recommended model is tested by default at the
end of your experiment (preview).
Important
Testing your models with a test dataset to evaluate generated models is a preview
feature. This capability is an experimental preview feature, and may change at any
time.
Learn how to configure AutoML experiments to use test data (preview) with the SDK (v1)
or with the Azure Machine Learning studio.
You can also test any existing automated ML model (preview) (v1), including models
from child jobs, by providing your own test data or by setting aside a portion of your
training data.
Feature engineering
Feature engineering is the process of using domain knowledge of the data to create
features that help ML algorithms learn better. In Azure Machine Learning, scaling and
normalization techniques are applied to facilitate feature engineering. Collectively, these
techniques and feature engineering are referred to as featurization.
Customize featurization
Additional feature engineering techniques, such as encoding and transforms, are also
available.
Ensemble models
Automated machine learning supports ensemble models, which are enabled by default.
Ensemble learning improves machine learning results and predictive performance by
combining multiple models as opposed to using single models. The ensemble iterations
appear as the final iterations of your job. Automated machine learning uses both voting
and stacking ensemble methods for combining models:
The Caruana ensemble selection algorithm with sorted ensemble initialization is used
to decide which models to use within the ensemble. At a high level, this algorithm
initializes the ensemble with up to five models with the best individual scores, and
verifies that these models are within a 5% threshold of the best score to avoid a poor
initial ensemble. Then for each ensemble iteration, a new model is added to the existing
ensemble and the resulting score is calculated. If a new model improves the existing
ensemble score, the ensemble is updated to include the new model.
See the how-to (v1) for changing default ensemble settings in automated machine
learning.
See how to convert to ONNX format in this Jupyter notebook example . Learn which
algorithms are supported in ONNX (v1).
The ONNX runtime also supports C#, so you can use the model built automatically in
your C# apps without any need for recoding or any of the network latencies that REST
endpoints introduce. Learn more about using an AutoML ONNX model in a .NET
application with ML.NET and inferencing ONNX models with the ONNX runtime C#
API .
Next steps
There are multiple resources to get you up and running with AutoML.
Tutorials/ how-tos
Tutorials are end-to-end introductory examples of AutoML scenarios.
For a code first experience, follow the Tutorial: Train a regression model with
AutoML and Python (v1).
For a low or no-code experience, see the Tutorial: Train a classification model with
no-code AutoML in Azure Machine Learning studio.
For using AutoML to train computer vision models, see the Tutorial: Train an
object detection model with AutoML and Python (v1).
How-to articles provide additional detail into what functionality automated ML offers.
For example,
Learn how to train forecasting models with time series data (v1).
Learn how to view the generated code from your automated ML models.
7 Note
In this guide, learn how to set up an automated machine learning (AutoML) training run
with the Azure Machine Learning Python SDK. Automated ML picks an algorithm and
hyperparameters for you and generates a model ready for deployment. This guide
provides details of the various options that you can use to configure automated ML
experiments.
For an end to end example, see Tutorial: AutoML- train regression model.
If you prefer a no-code experience, you can also Set up no-code AutoML training in the
Azure Machine Learning studio.
Prerequisites
For this article, you need:
The Azure Machine Learning Python SDK installed. To install the SDK, you can
install the automl package yourself, which includes the default installation of
the SDK.
7 Note
Support for natural language processing (NLP) tasks: text classification (multi-
class and multi-label) and named entity recognition is available in public preview.
Learn more about NLP tasks in automated ML.
The following code uses the task parameter in the AutoMLConfig constructor to specify
the experiment type as classification .
Python
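A minimal sketch of such a configuration follows; the data variables and metric choice here are illustrative, and running it requires the azureml-train-automl package and an Azure ML environment.

```python
from azureml.train.automl import AutoMLConfig

# task='classification' selects the experiment type; train_data and label
# are assumed to be your training dataset and label column name.
automl_config = AutoMLConfig(task='classification',
                             training_data=train_data,
                             label_column_name=label,
                             n_cross_validations=5)
```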
For remote experiments, training data must be accessible from the remote compute.
Automated ML only accepts Azure Machine Learning TabularDatasets when working on
a remote compute.
Easily transfer data from static files or URL sources into your workspace.
Make your data available to training scripts when running on cloud compute
resources. See How to train with datasets for an example of using the Dataset
class to mount data to your remote compute target.
The following code creates a TabularDataset from a web url. See Create a TabularDataset
for code examples on how to create datasets from other sources like local files and
datastores.
Python
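The elided snippet presumably resembled the following sketch; the URL is a placeholder, and Dataset.Tabular.from_delimited_files requires the azureml-core package.

```python
from azureml.core import Dataset

# Placeholder URL; point this at the delimited file you want to ingest.
data_url = "https://example.com/data/training-data.csv"
training_data = Dataset.Tabular.from_delimited_files(path=data_url)
```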
Python
import pandas as pd
from sklearn.model_selection import train_test_split
df = pd.read_csv("your-local-file.csv")
train_data, test_data = train_test_split(df, test_size=0.1, random_state=42)
label = "label-col-name"
Larger than 20,000 rows: A train/validation data split is applied. The default is to take
10% of the initial training data set as the validation set. In turn, that validation set
is used for metrics calculation.
Smaller than 20,000 rows: A cross-validation approach is applied. The default number of
folds depends on the number of rows. If the dataset is less than 1,000 rows, 10 folds
are used. If the rows are between 1,000 and 20,000, then three folds are used.
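The rules above can be codified in a small helper for reference (illustrative only, not an SDK function):

```python
def default_validation(n_rows: int) -> str:
    """Summarize automated ML's default validation approach by dataset size."""
    if n_rows > 20000:
        # 10% of the training data is held out as a validation set.
        return "train/validation split"
    # Otherwise cross-validation; the fold count depends on the row count.
    folds = 10 if n_rows < 1000 else 3
    return f"{folds}-fold cross-validation"
```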
Tip
You can upload test data (preview) to evaluate models that automated ML
generated for you. These features are experimental preview capabilities, and may
change at any time. Learn how to:
If you prefer a no-code experience, see step 12 in Set up AutoML with the studio
UI
Large data
Automated ML supports a limited number of algorithms for training on large data that
can successfully build models for big data on small virtual machines. Automated ML
heuristics depend on properties such as data size, virtual machine memory size,
experiment timeout and featurization settings to determine if these large data
algorithms should be applied. Learn more about what models are supported in
automated ML.
For regression, Online Gradient Descent Regressor and Fast Linear Regressor
For classification, Averaged Perceptron Classifier and Linear SVM Classifier; where
the Linear SVM classifier has both large data and small data versions.
If you want to override these heuristics, apply the following settings:
Use data streaming algorithms (studio UI experiments): Block all models except the big
data algorithms you want to use.
Choose a remote ML compute cluster: If you are training with larger datasets, as in
production training that creates models needing longer training runs, remote compute
will provide much better end-to-end time performance because AutoML
parallelizes training runs across the cluster's nodes. On a remote compute, the start-up
time for the internal infrastructure will add around 1.5 minutes per child run, plus
additional minutes for the cluster infrastructure if the VMs are not yet up and
running. Azure Machine Learning Managed Compute is a managed service that
enables you to train machine learning models on clusters of Azure virtual
machines. Compute instance is also supported as a compute target.
An Azure Databricks cluster in your Azure subscription. You can find more details
in Set up an Azure Databricks cluster for automated ML. See this GitHub site for
examples of notebooks with Azure Databricks.
The following example is for a classification task. The experiment uses AUC weighted as
the primary metric and has an experiment time out set to 30 minutes and 2 cross-
validation folds.
Python
automl_classifier = AutoMLConfig(task='classification',
                                 primary_metric='AUC_weighted',
                                 experiment_timeout_minutes=30,
                                 blocked_models=['XGBoostClassifier'],
                                 training_data=train_data,
                                 label_column_name=label,
                                 n_cross_validations=2)
You can also configure forecasting tasks, which requires extra setup. See the Set up
AutoML for time-series forecasting article for more details.
Python
time_series_settings = {
    'time_column_name': time_column_name,
    'time_series_id_column_names': time_series_id_column_names,
    'forecast_horizon': n_test_periods
}
automl_config = AutoMLConfig(
task = 'forecasting',
debug_log='automl_oj_sales_errors.log',
primary_metric='normalized_root_mean_squared_error',
experiment_timeout_minutes=20,
training_data=train_data,
label_column_name=label,
n_cross_validations=5,
path=project_folder,
verbosity=logging.INFO,
**time_series_settings
)
Supported models
Automated machine learning tries different models and algorithms during the
automation and tuning process. As a user, there is no need for you to specify the
algorithm.
The three different task parameter values determine the list of algorithms, or models,
to apply. Use the allowed_models or blocked_models parameters to further modify
iterations with the available models to include or exclude. The following table
summarizes the supported models by task type.
7 Note
If you plan to export your automated ML created models to an ONNX model, only
those algorithms indicated with an * (asterisk) are able to be converted to the
ONNX format. Learn more about converting models to ONNX.
Also note, ONNX only supports classification and regression tasks at this time.
Average
SeasonalAverage
ExponentialSmoothing
Primary metric
The primary_metric parameter determines the metric to be used during model training
for optimization. The available metrics you can select are determined by the task type
you choose.
For example, accuracy may not be a suitable primary metric for datasets that are small,
have very large class skew (class imbalance), or when the expected metric
value is very close to 0.0 or 1.0. In those cases, AUC_weighted can be a better choice for
the primary metric. After automated ML completes, you can choose the winning model
based on the metric best suited to your business needs.
Absolute value treats errors at all magnitudes alike and squared errors will have a much
larger penalty for errors with larger absolute values. Depending on whether larger errors
should be punished more or not, one can choose to optimize squared error or absolute
error.
If the rank, instead of the exact value is of interest, spearman_correlation can be a better
choice as it measures the rank correlation between real values and predictions.
However, currently no primary metric for regression addresses relative difference. All of
r2_score , normalized_mean_absolute_error , and normalized_root_mean_squared_error
treat a $20k prediction error the same for a worker with a $30k salary as for a worker
making $20M, if these two data points belong to the same regression dataset or
the same time series specified by the time series identifier. In reality, predicting
only $20k off from a $20M salary is very close (a small 0.1% relative difference), whereas
$20k off from $30k is not close (a large 67% relative difference). To address the issue of
relative difference, one can train a model with the available primary metrics, and then
select the model with the best mean_absolute_percentage_error or root_mean_squared_log_error .
spearman_correlation
normalized_mean_absolute_error
Data featurization
In every automated ML experiment, your data is automatically scaled and normalized to
help certain algorithms that are sensitive to features that are on different scales. This
scaling and normalization is referred to as featurization. See Featurization in AutoML for
more detail and code examples.
7 Note
Ensemble configuration
Ensemble models are enabled by default, and appear as the final run iterations in an
AutoML run. Currently VotingEnsemble and StackEnsemble are supported.
Voting implements soft-voting, which uses weighted averages. The stacking
implementation uses a two layer implementation, where the first layer has the same
models as the voting ensemble, and the second layer model is used to find the optimal
combination of the models from the first layer.
Python
automl_classifier = AutoMLConfig(
task='classification',
primary_metric='AUC_weighted',
experiment_timeout_minutes=30,
training_data=data_train,
label_column_name=label,
n_cross_validations=5,
enable_voting_ensemble=False,
enable_stack_ensemble=False
)
To alter the default ensemble behavior, there are multiple default arguments that can be
provided as kwargs in an AutoMLConfig object.
) Important
During StackEnsemble model generation, multiple fitted models from the previous child
runs are downloaded. If you encounter this error: AutoMLEnsembleException: Could
not find any models for running ensembling , then you may need to provide more
time for the models to be downloaded. The default value of
ensemble_download_models_timeout_sec is 300 seconds for downloading these models in
parallel, and there is no maximum timeout limit. Configure this parameter with a value
higher than 300 seconds if more time is needed.
7 Note
If the timeout is reached and there are models downloaded, then the
ensembling proceeds with as many models as it has downloaded. It's not
required that all the models finish downloading within that timeout.

The following parameters only apply to StackEnsemble models:

stack_meta_learner_type: the meta-learner, a model trained on the output of the
individual first-layer models. Supported meta-learners include LightGBMRegressor
or LinearRegression , among others.

stack_meta_learner_train_percentage: specifies the proportion of the training set
(when choosing train and validation type of training) to be reserved for training
the meta-learner. Default value is 0.2 .

stack_meta_learner_kwargs: optional parameters to pass to the initializer of the
meta-learner. These parameters and parameter types mirror the parameters and
parameter types from the corresponding model constructor, and are forwarded to
the model constructor.
Python
ensemble_settings = {
    "ensemble_download_models_timeout_sec": 600,
    "stack_meta_learner_type": "LogisticRegressionCV",
    "stack_meta_learner_train_percentage": 0.3,
    "stack_meta_learner_kwargs": {
        "refit": True,
        "fit_intercept": False,
        "class_weight": "balanced",
        "multi_class": "auto",
        "n_jobs": -1
    }
}
automl_classifier = AutoMLConfig(
task='classification',
primary_metric='AUC_weighted',
experiment_timeout_minutes=30,
training_data=train_data,
label_column_name=label,
n_cross_validations=5,
**ensemble_settings
)
Exit criteria
There are a few options you can define in your AutoMLConfig to end your experiment.
No criteria: If you do not define any exit parameters, the experiment continues
until no further progress is made on your primary metric.
After a length of time: Use experiment_timeout_minutes in your settings to define how long,
in minutes, your experiment should continue to run.
A score has been reached: Use experiment_exit_score to complete the experiment after a
specified primary metric score has been reached.
Run experiment
2 Warning
If you run an experiment with the same configuration settings and primary metric
multiple times, you'll likely see variation in each experiment's final metrics score and
generated models. The algorithms automated ML employs have inherent
randomness that can cause slight variation in the models output by the experiment
and the recommended model's final metrics score, like accuracy. You'll likely also
see results with the same model name, but different hyperparameters used.
For automated ML, you create an Experiment object, which is a named object in a
Workspace used to run experiments.
Python
from azureml.core.workspace import Workspace

ws = Workspace.from_config()

# Choose a name for the experiment and specify the project folder.
experiment_name = 'Tutorial-automl'
project_folder = './sample_projects/automl-classification'
Python
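The elided submission step is typically a single call; this sketch assumes the automl_config object defined earlier in this guide and an authenticated Azure ML environment.

```python
from azureml.core.experiment import Experiment

experiment = Experiment(ws, experiment_name)
# Submit the AutoML configuration; show_output streams per-iteration results.
run = experiment.submit(automl_config, show_output=True)
```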
7 Note
Each node in the cluster acts as an individual virtual machine (VM) that can accomplish a
single training run; for automated ML this means a child run. If all the nodes are busy,
the new experiment is queued. But if there are free nodes, the new experiment will run
automated ML child runs in parallel in the available nodes/VMs.
To help manage child runs and when they can be performed, we recommend you create
a dedicated cluster per experiment, and match the number of
max_concurrent_iterations of your experiment to the number of nodes in the cluster.
This way, you use all the nodes of the cluster at the same time with the number of
concurrent child runs/iterations you want.
For definitions and examples of the performance charts and metrics provided for
each run, see Evaluate automated machine learning experiment results.
You can view the hyperparameters, the scaling and normalization techniques, and
algorithm applied to a specific automated ML run with the custom code solution,
print_model().
Tip
Automated ML also lets you view the generated model training code for AutoML-
trained models. This functionality is in public preview and can change at any time.
Python
# Alternatively, the test predictions can be retrieved via the run outputs.
test_run.download_file("predictions/predictions.csv")
predictions_df = pd.read_csv("predictions.csv")
The model test job generates the predictions.csv file that's stored in the default
datastore created with the workspace. This datastore is visible to all users with the same
subscription. Test jobs are not recommended for scenarios where any of the information
used for or created by the test job needs to remain private.
7 Note
The following code demonstrates how to test a model from any run by using
ModelProxy.test() method. In the test() method you have the option to specify if you
only want to see the predictions of the test run with the include_predictions_only
parameter.
Python
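A sketch of that pattern follows; it assumes a completed AutoML child run and a test dataset, and requires the azureml-train-automl-client package.

```python
from azureml.train.automl.model_proxy import ModelProxy

model_proxy = ModelProxy(child_run)
# test() returns (predictions, metrics); pass include_predictions_only=True
# to retrieve only the predictions.
predictions, metrics = model_proxy.test(test_dataset)
```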
Python
best_run = run.get_best_child()
# The fitted pipeline for the best run is returned by run.get_output().
_, fitted_model = run.get_output()
print(fitted_model.steps)
model_name = best_run.properties['model_name']
description = 'AutoML forecast example'
tags = None
For details on how to create a deployment configuration and deploy a registered model
to a web service, see how and where to deploy a model.
Tip
For registered models, one-click deployment is available via the Azure Machine
Learning studio . See how to deploy registered models from the studio.
Model interpretability
Model interpretability allows you to understand why your models made predictions, and
the underlying feature importance values. The SDK includes various packages for
enabling model interpretability features, both at training and inference time, for local
and deployed models.
For general information on how model explanations and feature importance can be
enabled in other areas of the SDK outside of automated machine learning, see the
concept article on interpretability .
7 Note
The ForecastTCN model is not currently supported by the Explanation Client. This
model will not return an explanation dashboard if it is returned as the best model,
and does not support on-demand explanation runs.
Next steps
Learn more about how and where to deploy a model.
Learn more about how to train a regression model with Automated machine
learning.
In this article, you learn how to train a regression model with the Azure Machine Learning Python SDK using Azure Machine Learning
automated ML. This regression model predicts NYC taxi fares.
This process accepts training data and configuration settings, and automatically iterates through combinations of different feature
normalization/standardization methods, models, and hyperparameter settings to arrive at the best model.
You'll write code using the Python SDK in this article. You'll learn the following tasks:
Prerequisites
If you don’t have an Azure subscription, create a free account before you begin. Try the free or paid version of Azure Machine Learning
today.
Complete the Quickstart: Get started with Azure Machine Learning if you don't already have an Azure Machine Learning workspace or
a compute instance.
After you complete the quickstart:
This article is also available on GitHub if you wish to run it in your own local environment. To get the required packages,
Python
To download taxi data, iteratively fetch one month at a time, and before appending it to
green_taxi_df, randomly sample 2,000 records from each month to avoid bloating the
dataframe. Then preview the data.
Python
from datetime import datetime
from dateutil.relativedelta import relativedelta
import pandas as pd
from azureml.opendatasets import NycTlcGreen  # NYC green taxi open dataset

green_taxi_df = pd.DataFrame([])
start = datetime.strptime("1/1/2015", "%m/%d/%Y")
end = datetime.strptime("1/31/2015", "%m/%d/%Y")

# Fetch one month at a time and keep a random sample of 2,000 records.
for sample_month in range(12):
    temp_df = NycTlcGreen(start + relativedelta(months=sample_month),
                          end + relativedelta(months=sample_month)).to_pandas_dataframe()
    green_taxi_df = pd.concat([green_taxi_df, temp_df.sample(2000)])

green_taxi_df.head(10)
Remove some of the columns that you won't need for training or additional feature building.
Automated machine learning automatically handles time-based features such as lpepPickupDatetime.
Python
green_taxi_df.head(5)
Cleanse data
Run the describe() function on the new dataframe to see summary statistics for each field.
Python
green_taxi_df.describe()
        vendorID  passengerCount  tripDistance  pickupLongitude  pickupLatitude  dropoffLongitude  dropoffLatitude  totalAmount
count   48000.00  48000.00        48000.00      48000.00         48000.00        48000.00          48000.00         48000.00
mean    1.78      1.37            2.87          -73.83           40.69           -73.84            40.70            14.75
std     0.41      1.04            2.93          2.76             1.52            2.61              1.44             12.08
min     1.00      0.00            0.00          -74.66           0.00            -74.66            0.00             -300.00
25%     2.00      1.00            1.06          -73.96           40.70           -73.97            40.70            7.80
50%     2.00      1.00            1.90          -73.94           40.75           -73.94            40.75            11.30
75%     2.00      1.00            3.60          -73.92           40.80           -73.91            40.79            17.80
max     2.00      9.00            97.57         0.00             41.93           0.00              41.94            450.00
From the summary statistics, you see that there are several fields that have outliers or values that will reduce model accuracy. First filter the
lat/long fields to be within the bounds of the Manhattan area. This filters out longer taxi trips or trips that are outliers with respect to their
relationship with other features.
Additionally, filter the tripDistance field to be greater than zero but less than 31 miles (the haversine distance between the two lat/long
pairs). This eliminates long outlier trips that have inconsistent trip cost.
Lastly, the totalAmount field has negative values for the taxi fares, which don't make sense in the context of our model, and the
passengerCount field has bad data, with the minimum values being zero.
Filter out these anomalies using query functions, and then remove the last few columns unnecessary for training.
Python
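The elided filtering cell can be sketched with pandas query as below. The coordinate bounds are illustrative approximations of the Manhattan area, and the column names follow the dataframe used in this tutorial.

```python
import pandas as pd

def filter_taxi_anomalies(df: pd.DataFrame) -> pd.DataFrame:
    """Drop out-of-bounds coordinates, outlier distances, zero passengers,
    and non-positive fares, then remove columns unneeded for training."""
    df = df.query("pickupLatitude >= 40.53 and pickupLatitude <= 40.88")
    df = df.query("pickupLongitude >= -74.09 and pickupLongitude <= -73.72")
    df = df.query("tripDistance > 0 and tripDistance < 31")
    df = df.query("passengerCount > 0 and totalAmount > 0")
    # The coordinates served their purpose as filters; drop them for training.
    return df.drop(columns=["pickupLongitude", "pickupLatitude"])
```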
Call describe() again on the data to ensure cleansing worked as expected. You now have a prepared and cleansed set of taxi, holiday, and
weather data to use for machine learning model training.
Python
final_df.describe()
Configure workspace
Create a workspace object from the existing workspace. A Workspace is a class that accepts your Azure subscription and resource
information. It also creates a cloud resource to monitor and track your model runs. Workspace.from_config() reads the file config.json and
loads the authentication details into an object named ws . ws is used throughout the rest of the code in this article.
Python
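The elided cell is typically just the following; it requires a config.json downloaded from your workspace.

```python
from azureml.core import Workspace

# Reads config.json from the current directory (or a parent directory)
# and authenticates to the workspace it describes.
ws = Workspace.from_config()
```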
The test_size parameter determines the percentage of data to allocate to testing. The random_state parameter sets a seed to the random
generator, so that your train-test splits are deterministic.
Python
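The split itself is one call. In this sketch, final_df is a stand-in dataframe so the example is self-contained, and the 0.2/223 values mirror the parameters described above (illustrative, not prescriptive).

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Stand-in for the cleansed taxi dataframe from the previous steps.
final_df = pd.DataFrame({"tripDistance": range(10), "totalAmount": range(10)})

# Hold out 20% of rows for testing; the fixed seed makes the split deterministic.
x_train, x_test = train_test_split(final_df, test_size=0.2, random_state=223)
```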
The purpose of this step is to have data points to test the finished model that haven't been used to train the model, in order to measure
true accuracy.
In other words, a well-trained model should be able to accurately make predictions from data it hasn't already seen. You now have data
prepared for auto-training a machine learning model.
1. Define settings for the experiment run. Attach your training data to the configuration, and modify settings that control the training
process.
2. Submit the experiment for model tuning. After submitting the experiment, the process iterates through different machine learning
algorithms and hyperparameter settings, adhering to your defined constraints. It chooses the best-fit model by optimizing an
accuracy metric.
iteration_timeout_minutes (10): Time limit in minutes for each iteration. Increase this
value for larger datasets that need more time for each iteration.
experiment_timeout_hours (0.3): Maximum amount of time in hours that all iterations
combined can take before the experiment terminates.
enable_early_stopping (True): Flag to enable early termination if the score is not
improving in the short term.
primary_metric (spearman_correlation): Metric that you want to optimize. The best-fit
model will be chosen based on this metric.
featurization (auto): By using auto, the experiment can preprocess the input data
(handling missing data, converting text to numeric, etc.).
n_cross_validations (5): Number of cross-validation splits to perform when validation
data is not specified.
Python
import logging
automl_settings = {
"iteration_timeout_minutes": 10,
"experiment_timeout_hours": 0.3,
"enable_early_stopping": True,
"primary_metric": 'spearman_correlation',
"featurization": 'auto',
"verbosity": logging.INFO,
"n_cross_validations": 5
}
Use your defined training settings as a **kwargs parameter to an AutoMLConfig object. Additionally, specify your training data and the type
of model, which is regression in this case.
Python
automl_config = AutoMLConfig(task='regression',
debug_log='automated_ml_errors.log',
training_data=x_train,
label_column_name="totalAmount",
**automl_settings)
7 Note
Automated machine learning pre-processing steps (feature normalization, handling missing data, converting text to numeric, etc.)
become part of the underlying model. When using the model for predictions, the same pre-processing steps applied during training
are applied to your input data automatically.
After starting the experiment, the output shown updates live as the experiment runs. For each iteration, you see the model type, the run
duration, and the training accuracy. The field BEST tracks the best running training score based on your metric type.
Python
Output
****************************************************************************************************
ITERATION: The iteration being evaluated.
PIPELINE: A summary description of the pipeline being evaluated.
DURATION: Time taken for the current iteration.
METRIC: The result of computing score on the fitted pipeline.
BEST: The best observed score thus far.
****************************************************************************************************
Python
Python
# fitted_model is the best run's fitted pipeline, returned by run.get_output().
y_test = x_test.pop("totalAmount")
y_predict = fitted_model.predict(x_test)
print(y_predict[:10])
Calculate the root mean squared error of the results. Convert the y_test dataframe to a list to compare to the predicted values. The
function mean_squared_error takes two arrays of values and calculates the average squared error between them. Taking the square root of
the result gives an error in the same units as the y variable, cost. It indicates roughly how far the taxi fare predictions are from the actual
fares.
Python
from math import sqrt
from sklearn.metrics import mean_squared_error

y_actual = y_test.values.flatten().tolist()
rmse = sqrt(mean_squared_error(y_actual, y_predict))
rmse
Run the following code to calculate mean absolute percent error (MAPE) by using the full y_actual and y_predict data sets. This metric
calculates an absolute difference between each predicted and actual value and sums all the differences. Then it expresses that sum as a
percent of the total of the actual values.
Python
sum_actuals = sum_errors = 0
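The truncated snippet above can be completed along the following lines; y_actual and y_predict here are small stand-in lists so the sketch is self-contained.

```python
# Stand-ins for the actual and predicted fares from the previous cells.
y_actual = [10.0, 20.0, 30.0]
y_predict = [12.0, 18.0, 33.0]

sum_actuals = sum_errors = 0
for actual_val, predict_val in zip(y_actual, y_predict):
    sum_errors += abs(actual_val - predict_val)  # total absolute error
    sum_actuals += actual_val                    # total of the actual values

# Express the summed absolute error as a fraction of the summed actuals.
mean_abs_percent_error = sum_errors / sum_actuals
print("Model MAPE:", mean_abs_percent_error)
print("Model Accuracy:", 1 - mean_abs_percent_error)
```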
Output
Model MAPE:
0.14353867606052823
Model Accuracy:
0.8564613239394718
From the two prediction accuracy metrics, you see that the model is fairly good at predicting taxi fares from the data set's features, typically
within +- $4.00, and approximately 15% error.
The traditional machine learning model development process is highly resource-intensive, and requires significant domain knowledge and
time investment to run and compare the results of dozens of models. Using automated machine learning is a great way to rapidly test
many different models for your scenario.
Clean up resources
Do not complete this section if you plan on running other Azure Machine Learning tutorials.
3. Select Stop.
You can also keep the resource group but delete a single workspace. Display the workspace properties and select Delete.
Next steps
In this automated machine learning article, you did the following tasks:
) Important
The features presented in this article are in preview. They should be considered
experimental preview features that might change at any time.
In this tutorial, you learn how to train an object detection model using Azure Machine
Learning automated ML with the Azure Machine Learning Python SDK. This object
detection model identifies whether the image contains objects, such as a can, carton,
milk bottle, or water bottle.
You'll write code using the Python SDK in this tutorial and learn the following tasks:
Prerequisites
If you don't have an Azure subscription, create a free account before you begin. Try
the free or paid version of Azure Machine Learning today.
Complete the Quickstart: Get started with Azure Machine Learning if you don't
already have an Azure Machine Learning workspace.
Download and unzip the odFridgeObjects.zip data file. The dataset is annotated
in Pascal VOC format, where each image corresponds to an xml file. Each xml file
contains information on where its corresponding image file is located and also
contains information about the bounding boxes and the object labels. In order to
use this data, you first need to convert it to the required JSONL format as seen in
the Convert the downloaded data to JSONL section of the notebook.
This tutorial is also available in the azureml-examples repository on GitHub if you wish
to run it in your own local environment. To get the required packages,
This tutorial uses the NCsv3-series (with V100 GPUs) as this type of compute target
leverages multiple GPUs to speed up training. Additionally, you can set up multiple
nodes to take advantage of parallelism when tuning hyperparameters for your model.
The following code creates a GPU compute of size Standard_NC24s_v3 with four nodes
that are attached to the workspace, ws .
2 Warning
Ensure your subscription has sufficient quota for the compute target you wish to
use.
Python
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

cluster_name = "gpu-nc24sv3"
try:
    compute_target = ComputeTarget(workspace=ws, name=cluster_name)
    print('Found existing compute target.')
except ComputeTargetException:
    print('Creating a new compute target...')
    compute_config = AmlCompute.provisioning_configuration(vm_size='Standard_NC24s_v3',
                                                           idle_seconds_before_scaledown=1800,
                                                           min_nodes=0,
                                                           max_nodes=4)
    compute_target = ComputeTarget.create(ws, cluster_name, compute_config)

# If no min_node_count is provided, the scale settings are used for the cluster.
compute_target.wait_for_completion(show_output=True, min_node_count=None,
                                   timeout_in_minutes=20)
Experiment setup
Next, create an Experiment in your workspace to track your model training runs.
Python
experiment_name = 'automl-image-object-detection'
experiment = Experiment(ws, name=experiment_name)
Python
%matplotlib inline
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import matplotlib.patches as patches
from PIL import Image as pil_image
import numpy as np
import json
import os
label_to_color_mapping = {}
for gt in ground_truth_boxes:
label = gt["label"]
if label in label_to_color_mapping:
color = label_to_color_mapping[label]
else:
        # Generate a random color. If you want to use a specific color,
        # you can use something like "red".
color = np.random.rand(3)
label_to_color_mapping[label] = color
# Display label
ax.text(topleft_x, topleft_y - 10, label, color=color, fontsize=20)
plt.show()
Using the above helper functions, for any given image, you can run the following code
to display the bounding boxes.
Python
image_file = "./odFridgeObjects/images/31.jpg"
jsonl_file = "./odFridgeObjects/train_annotations.jsonl"
plot_ground_truth_boxes_jsonl(image_file, jsonl_file)
Python
ds = ws.get_default_datastore()
ds.upload(src_dir='./odFridgeObjects', target_path='odFridgeObjects')
Once uploaded to the datastore, you can create an Azure Machine Learning dataset
from the data. Datasets package your data into a consumable object for training.
The following code creates a dataset for training. Since no validation dataset is specified,
by default 20% of your training data is used for validation.
Python
training_dataset_name = 'odFridgeObjectsTrainingDataset'
if training_dataset_name in ws.datasets:
training_dataset = ws.datasets.get(training_dataset_name)
print('Found the training dataset', training_dataset_name)
else:
    # create training dataset
training_dataset = Dataset.Tabular.from_json_lines_files(
path=ds.path('odFridgeObjects/train_annotations.jsonl'),
set_column_types={"image_url": DataType.to_stream(ds.workspace)},
)
training_dataset = training_dataset.register(workspace=ws,
name=training_dataset_name)
Visualize dataset
You can also visualize the ground truth bounding boxes for an image from this dataset.
Python
For any given image, you can run the following code to display the bounding boxes.
Python
image_file = "./odFridgeObjects/images/31.jpg"
plot_ground_truth_boxes_dataset(image_file, dataset_pd)
The following code defines the parameter space in preparation for the hyperparameter
sweep for each defined algorithm, yolov5 and fasterrcnn_resnet50_fpn . In the
parameter space, specify the range of values for learning_rate , optimizer ,
lr_scheduler , etc., for AutoML to choose from as it attempts to generate a model with
the optimal primary metric. If hyperparameter values are not specified, then default
values are used for each algorithm.
For the tuning settings, use random sampling to pick samples from this parameter space
with the RandomParameterSampling class; the GridParameterSampling and
BayesianParameterSampling classes are also available. Doing so tells automated ML to
try a total of 20 iterations with these different samples, running four iterations at a
time on our compute target, which was set up using four nodes. The more parameters the
space has, the more iterations you need to find optimal models.
The Bandit early termination policy is also used. This policy terminates poor-performing
configurations; that is, those configurations that are not within 20% slack of the best-
performing configuration, which significantly saves compute resources.
Python
parameter_space = {
'model': choice(
{
'model_name': choice('yolov5'),
'learning_rate': uniform(0.0001, 0.01),
#'model_size': choice('small', 'medium'), # model-specific
'img_size': choice(640, 704, 768), # model-specific
},
{
'model_name': choice('fasterrcnn_resnet50_fpn'),
'learning_rate': uniform(0.0001, 0.001),
#'warmup_cosine_lr_warmup_epochs': choice(0, 3),
'optimizer': choice('sgd', 'adam', 'adamw'),
'min_size': choice(600, 800), # model-specific
}
)
}
tuning_settings = {
'iterations': 20,
'max_concurrent_iterations': 4,
'hyperparameter_sampling': RandomParameterSampling(parameter_space),
'policy': BanditPolicy(evaluation_interval=2, slack_factor=0.2,
delay_evaluation=6)
}
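The slack_factor arithmetic behind the Bandit policy above can be illustrated with a small standalone sketch (plain Python, not part of the SDK; the function name is hypothetical). For a maximized metric such as mean average precision, a run falls outside the slack when it drops below best / (1 + slack_factor):

```python
def bandit_should_terminate(current_metric, best_metric, slack_factor=0.2):
    """Return True if a run falls outside the slack of the best run.

    Mirrors the slack_factor semantics of BanditPolicy for a metric
    that is maximized: the cutoff is best_metric / (1 + slack_factor).
    """
    return current_metric < best_metric / (1 + slack_factor)
```

With a best metric of 0.8 and slack_factor=0.2, the cutoff is 0.8 / 1.2 ≈ 0.667, so a run scoring 0.5 would be terminated while one scoring 0.7 would continue.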
Once the parameter space and tuning settings are defined, you can pass them into your
AutoMLImageConfig object and then submit the experiment to train an image model.
Python
primary_metric='mean_average_precision',
**tuning_settings)
automl_image_run = experiment.submit(automl_image_config)
automl_image_run.wait_for_completion(wait_post_processing=True)
Python
best_child_run = automl_image_run.get_best_child()
model_name = best_child_run.properties['model_name']
model = best_child_run.register_model(model_name = model_name,
model_path='outputs/model.pt')
1. Create an AKS compute cluster. In this example, a GPU virtual machine SKU is used
for the deployment cluster
Python
location="eastus2")
# Create the cluster
aks_target = ComputeTarget.create(workspace=ws,
name=aks_name,
provisioning_configuration=prov_config)
aks_target.wait_for_completion(show_output=True)
2. Define the inference configuration that describes how to set up the web-service
containing your model. You can use the scoring script and the environment from
the training run in your inference config.
Note
To change the model's settings, open the downloaded scoring script and
modify the model_settings variable before deploying the model.
Python
best_child_run.download_file('outputs/scoring_file_v_1_0_0.py',
output_file_path='score.py')
environment = best_child_run.get_environment()
inference_config = InferenceConfig(entry_script='score.py',
environment=environment)
Python
aks_config = AksWebservice.deploy_configuration(autoscale_enabled=True,
cpu_cores=1,
memory_gb=50,
enable_app_insights=True)
aks_service = Model.deploy(ws,
models=[model],
inference_config=inference_config,
deployment_config=aks_config,
deployment_target=aks_target,
name='automl-image-test',
overwrite=True)
aks_service.wait_for_deployment(show_output=True)
print(aks_service.state)
Test the web service
You can test the deployed web service to predict new images. For this tutorial, pass a
random image from the dataset to the scoring URI.
Python
import requests
sample_image = './test_image.jpg'
Visualize detections
Now that you have scored a test image, you can visualize the bounding boxes for this
image. To do so, be sure you have matplotlib installed.
Python
%matplotlib inline
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import matplotlib.patches as patches
from PIL import Image
import numpy as np
import json
IMAGE_SIZE = (18,12)
plt.figure(figsize=IMAGE_SIZE)
img_np=mpimg.imread(sample_image)
img = Image.fromarray(img_np.astype('uint8'),'RGB')
x, y = img.size
ax.add_patch(rect)
plt.text(topleft_x, topleft_y - 10, label, color=color, fontsize=20)
plt.show()
Clean up resources
Do not complete this section if you plan on running other Azure Machine Learning
tutorials.
If you don't plan to use the resources you created, delete them, so you don't incur any
charges.
You can also keep the resource group but delete a single workspace. Display the
workspace properties and select Delete.
Next steps
In this automated machine learning tutorial, you did the following tasks:
Note
The fridge objects dataset is available under the MIT License .
Set up AutoML to train a natural
language processing model with Python
(preview)
Article • 06/01/2023
Important
This feature is currently in public preview. This preview version is provided without
a service-level agreement, and it's not recommended for production workloads.
Certain features might not be supported or might have constrained capabilities. For
more information, see Supplemental Terms of Use for Microsoft Azure
Previews .
In this article, you learn how to train natural language processing (NLP) models with
automated ML in the Azure Machine Learning Python SDK.
Automated ML supports NLP, which allows ML professionals and data scientists to bring
their own text data and build custom models for tasks such as multi-class text
classification, multi-label text classification, and named entity recognition (NER).
You can seamlessly integrate with the Azure Machine Learning data labeling capability
to label your text data or bring your existing labeled data. Automated ML provides the
option to use distributed training on multi-GPU compute clusters for faster model
training. The resulting model can be operationalized at scale by leveraging Azure
Machine Learning's MLOps capabilities.
Prerequisites
Azure subscription. If you don't have an Azure subscription, sign up to try the free
or paid version of Azure Machine Learning today.
An Azure Machine Learning workspace with a GPU training compute. To create the
workspace, see Create workspace resources. See GPU optimized virtual machine
sizes for more details of GPU instances provided by Azure.
Warning
Support for multilingual models and the use of models with a longer max
sequence length is necessary for several NLP use cases, such as non-English
datasets and longer-range documents. As a result, these scenarios may
require higher GPU memory for model training to succeed, such as the NC_v3
series or the ND series.
Create a compute instance, which automatically installs the SDK and is pre-
configured for ML workflows. See Create and manage an Azure Machine
Learning compute instance for more information.
Install the automl package yourself, which includes the default installation of
the SDK.
Important
Multi-class text classification (task = 'text-classification'): There are multiple
possible classes and each sample can be classified as exactly one class. The task is to
predict the correct class for each sample.
Multi-label text classification (task = 'text-classification-multilabel'): There are
multiple possible classes and each sample can be assigned any number of classes. The
task is to predict all the classes for each sample.
Named Entity Recognition (NER) (task = 'text-ner'): There are multiple possible tags
for tokens in sequences. The task is to predict the tags for all the tokens for each
sequence. For example, extracting domain-specific entities from unstructured text, such
as contracts or financial documents.
Preparing data
For NLP experiments in automated ML, you can bring an Azure Machine Learning
dataset with .csv format for multi-class and multi-label classification tasks. For NER
tasks, two-column .txt files that use a space as the separator and adhere to the CoNLL
format are supported. The following sections provide additional detail for the data
format accepted for each task.
Multi-class
For multi-class classification, the dataset can contain several text columns and exactly
one label column. The following example has only one text column.
Python
text,labels
"I love watching Chicago Bulls games.","NBA"
"Tom Brady is a great player.","NFL"
"There is a game between Yankees and Orioles tonight","MLB"
"Stephen Curry made the most number of 3-Pointers","NBA"
Multi-label
For multi-label classification, the dataset columns would be the same as multi-class,
however there are special format requirements for data in the label column. The two
accepted formats and examples are in the following table.
Important
Different parsers are used to read labels for these formats. If you are using the plain
text format, use only alphabetic characters, numbers, and '_' in your labels. All other
characters are recognized as label separators.
For example, if your label is "cs.AI" , it's read as "cs" and "AI" . Whereas with the
Python list format, the label would be "['cs.AI']" , which is read as "cs.AI" .
Python
text,labels
"I love watching Chicago Bulls games.","basketball"
"The four most popular leagues are NFL, MLB, NBA and NHL","football,baseball,basketball,hockey"
"I like drinking beer.",""
Python
text,labels
"I love watching Chicago Bulls games.","['basketball']"
"The four most popular leagues are NFL, MLB, NBA and NHL","['football','baseball','basketball','hockey']"
"I like drinking beer.","[]"
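The parsing difference described above can be sketched with two small standalone helpers (plain Python; the function names are hypothetical, and the plain-text rule is approximated with a regular expression):

```python
import ast
import re


def parse_plain_labels(cell):
    # Plain-text format: anything other than letters, digits, and '_'
    # acts as a label separator, so "cs.AI" is read as ["cs", "AI"].
    return [t for t in re.split(r"[^A-Za-z0-9_]+", cell) if t]


def parse_list_labels(cell):
    # Python-list format: the cell holds a literal list, so
    # "['cs.AI']" is read back as ["cs.AI"] and "[]" as [].
    return list(ast.literal_eval(cell)) if cell else []
```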
For example,
Python
Hudson B-loc
Square I-loc
is O
a O
famous O
place O
in O
New B-loc
York I-loc
City I-loc
Stephen B-per
Curry I-per
got O
three O
championship O
rings O
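A CoNLL-formatted file like the sample above can be read into per-sample lists of (token, label) pairs with a small standalone parser (a sketch, not part of the SDK):

```python
def read_conll(text):
    """Split CoNLL-formatted text into samples of (token, label) pairs.

    Samples are separated by one empty line; each non-empty line is
    "{token} {label}" with a single space between them.
    """
    samples, current = [], []
    for line in text.splitlines():
        if not line:
            if current:
                samples.append(current)
                current = []
        else:
            token, label = line.split(" ")
            current.append((token, label))
    if current:
        samples.append(current)
    return samples
```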
Data validation
Before training, automated ML applies data validation checks on the input data to
ensure that the data can be preprocessed correctly. If any of these checks fail, the run
fails with the relevant error message. The following are the requirements to pass data
validation checks for each task.
Note
Some data validation checks are applicable to both the training and the validation
set, whereas others are applicable only to the training set. If the test dataset could
not pass the data validation, that means that automated ML couldn't capture it and
there is a possibility of model inference failure, or a decline in model performance.
Multi-class only: None
NER only:
- The file should not start with an empty line
- Each line must be an empty line, or follow the format {token} {label} , where there is
exactly one space between the token and the label and no white space after the label
- All labels must start with I- , B- , or be exactly O . Case sensitive
- Exactly one empty line between two samples
- Exactly one empty line at the end of the file
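The NER rules above can be approximated with a standalone checker (a sketch only; the service's actual validation logic may differ, and the function name is hypothetical):

```python
import re

# One token, one space, then a label that starts with I-, B-, or is exactly O.
LINE_RE = re.compile(r"^\S+ (I-\S+|B-\S+|O)$")


def validate_ner_file(text):
    """Check the CoNLL rules listed above; return a list of problems."""
    problems = []
    lines = text.split("\n")
    if lines and lines[0] == "":
        problems.append("file starts with an empty line")
    for i, line in enumerate(lines):
        if line and not LINE_RE.match(line):
            problems.append(f"line {i + 1}: bad token/label format")
    if "\n\n\n" in text:
        problems.append("more than one empty line between samples")
    if not text.endswith("\n") or text.endswith("\n\n"):
        problems.append("file must end with exactly one empty line")
    return problems
```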
Configure experiment
Automated ML's NLP capability is triggered through AutoMLConfig , which is the same
workflow for submitting automated ML experiments for classification, regression and
forecasting tasks. You would set most of the parameters as you would for those
experiments, such as task , compute_target and data inputs.
Python
automl_settings = {
"verbosity": logging.INFO,
"enable_long_range_text": True, # Set this parameter only if you want to enable the long-range text setting
}
automl_config = AutoMLConfig(
task="text-classification",
debug_log="automl_errors.log",
compute_target=compute_target,
training_data=train_dataset,
validation_data=val_dataset,
label_column_name=target_column_name,
**automl_settings
)
Language settings
As part of the NLP functionality, automated ML supports 104 languages leveraging
language specific and multilingual pre-trained text DNN models, such as the BERT family
of models. Currently, language selection defaults to English.
The following table summarizes what model is applied based on task type and
language. See the full list of supported languages and their codes.
You can specify your dataset language in your FeaturizationConfig . BERT is also used in
the featurization process of automated ML experiment training, learn more about BERT
integration and featurization in automated ML.
Python
Distributed training
You can also run your NLP experiments with distributed training on an Azure Machine
Learning compute cluster. This is handled automatically by automated ML when the
parameters max_concurrent_iterations = number_of_vms and
enable_distributed_dnn_training = True are provided in your AutoMLConfig during
experiment setup.
Python
max_concurrent_iterations = number_of_vms
enable_distributed_dnn_training = True
Doing so schedules distributed training of the NLP models and automatically scales to
every GPU on your virtual machine or cluster of virtual machines. The maximum number of
virtual machines allowed is 32. The training is scheduled on a number of virtual
machines that is a power of two.
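One plausible reading of the power-of-two scheduling rule above can be sketched as a standalone helper (this is an interpretation for illustration, not the SDK's actual scheduler; the function name is hypothetical):

```python
def scheduled_vm_count(requested_vms, max_vms=32):
    """Largest power of two <= the requested VM count, capped at max_vms.

    Sketches the described behavior: training is scheduled on a number
    of virtual machines that is a power of two, with at most 32 VMs.
    """
    n = min(requested_vms, max_vms)
    power = 1
    while power * 2 <= n:
        power *= 2
    return power
```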
Example notebooks
See the sample notebooks for detailed code examples for each NLP task.
Multi-class text classification
Multi-label text classification
Named entity recognition
Next steps
Learn more about how and where to deploy a model.
Troubleshoot automated ML experiments.
Configure training, validation, cross-
validation and test data in automated
machine learning
Article • 06/01/2023
In this article, you learn the different options for configuring training data and validation
data splits along with cross-validation settings for your automated machine learning
(automated ML) experiments.
In Azure Machine Learning, when you use automated ML to build multiple ML models,
each child run needs to validate the related model by calculating the quality metrics for
that model, such as accuracy or AUC weighted. These metrics are calculated by
comparing the predictions made with each model with real labels from past
observations in the validation data. Learn more about how metrics are calculated based
on validation type.
For a low-code or no-code experience, see Create your automated machine learning
experiments in Azure Machine Learning studio.
Prerequisites
For this article you need,
Important
Note
The default data splits and cross-validation are not supported in forecasting
scenarios.
Python
data = "https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/creditcard.csv"
dataset = Dataset.Tabular.from_delimited_files(data)
Larger than 20,000 rows: A train/validation data split is applied. The default is to
take 10% of the initial training data set as the validation set. In turn, that
validation set is used for metrics calculation.
Smaller than 20,000 rows: A cross-validation approach is applied. The default number of
folds depends on the number of rows. If the dataset is less than 1,000 rows, 10 folds
are used. If the rows are between 1,000 and 20,000, then three folds are used.
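These defaults can be expressed as a small standalone helper (a sketch of the rules above, in plain Python; how the boundary of exactly 20,000 rows is treated is not specified here, so that edge is an assumption):

```python
def default_validation(n_rows):
    """Return the default AutoML validation strategy described above."""
    if n_rows > 20000:
        return ("train_validation_split", 0.10)  # 10% held out for validation
    if n_rows >= 1000:
        return ("cross_validation", 3)           # three folds
    return ("cross_validation", 10)              # ten folds
```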
Note
The following code example explicitly defines which portion of the provided data in
dataset to use for training and validation.
Python
data = "https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/creditcard.csv"
dataset = Dataset.Tabular.from_delimited_files(data)
training_data parameter.
In your AutoMLConfig object, you can set the validation_size parameter to hold out a
portion of the training data for validation. This means that the validation set will be split
by automated ML from the initial training_data provided. This value should be between
0.0 and 1.0 non-inclusive (for example, 0.2 means 20% of the data is held out for
validation data).
Note
Python
data = "https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/creditcard.csv"
dataset = Dataset.Tabular.from_delimited_files(data)
K-fold cross-validation
To perform k-fold cross-validation, include the n_cross_validations parameter and set
it to a value. This parameter sets how many cross validations to perform, based on the
same number of folds.
Note
In the following code, five folds for cross-validation are defined. Hence, five
different trainings are run; each training uses 4/5 of the data, and each validation
uses 1/5 of the data with a different holdout fold each time.
As a result, metrics are calculated with the average of the five validation metrics.
Python
data = "https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/creditcard.csv"
dataset = Dataset.Tabular.from_delimited_files(data)
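The five-fold split described above can be sketched without the SDK as a plain index generator (this simplified version drops any remainder rows when the row count is not divisible by the fold count):

```python
def kfold_indices(n_samples, n_folds=5):
    """Yield (train_idx, val_idx) for each fold, as in 5-fold CV above."""
    indices = list(range(n_samples))
    fold_size = n_samples // n_folds
    for k in range(n_folds):
        # Each fold holds out a different contiguous slice for validation.
        val = indices[k * fold_size:(k + 1) * fold_size]
        train = indices[:k * fold_size] + indices[(k + 1) * fold_size:]
        yield train, val
```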
For Monte Carlo cross validation, automated ML sets aside the portion of the training
data specified by the validation_size parameter for validation, and then assigns the
rest of the data for training. This process is then repeated based on the value specified
in the n_cross_validations parameter; which generates new training and validation
splits, at random, each time.
Note
The following code defines seven folds for cross-validation, with 20% of the training
data used for validation. Hence, seven different trainings are run; each training uses
80% of the data, and each validation uses 20% of the data with a different holdout fold
each time.
Python
data = "https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/creditcard.csv"
dataset = Dataset.Tabular.from_delimited_files(data)
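The Monte Carlo behavior described above amounts to repeated random splits; a standalone sketch (plain Python, not the SDK's implementation) looks like this:

```python
import random


def monte_carlo_splits(n_samples, n_cross_validations=7,
                       validation_size=0.2, seed=0):
    """Yield (train_idx, val_idx) repeated-random splits, as above.

    Each repetition shuffles the rows and holds out validation_size
    of them, so the validation sets can overlap across repetitions.
    """
    rng = random.Random(seed)
    n_val = int(n_samples * validation_size)
    for _ in range(n_cross_validations):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        yield indices[n_val:], indices[:n_val]  # train, validation
```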
Note
The following code snippet contains bank marketing data with two CV split columns
'cv1' and 'cv2'.
Python
data = "https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/bankmarketing_with_cv.csv"
dataset = Dataset.Tabular.from_delimited_files(data)
When either a custom validation set or an automatically selected validation set is used,
model evaluation metrics are computed from only that validation set, not the training
data.
Important
This feature is currently in public preview. This preview version is provided without
a service-level agreement, and it's not recommended for production workloads.
Certain features might not be supported or might have constrained capabilities. For
more information, see Supplemental Terms of Use for Microsoft Azure
Previews .
You can also provide test data to evaluate the recommended model that automated ML
generates for you upon completion of the experiment. When you provide test data, it's
considered separate from training and validation, so as to not bias the results of the
test run of the recommended model. Learn more about training, validation and test data
in automated ML.
Warning
This feature is not available for the following automated ML scenarios
Test datasets must be in the form of an Azure Machine Learning TabularDataset. You can
specify a test dataset with the test_data and test_size parameters in your
AutoMLConfig object. These parameters are mutually exclusive and cannot be specified at
the same time.
With the test_data parameter, specify an existing dataset to pass into your
AutoMLConfig object.
Python
automl_config = AutoMLConfig(task='forecasting',
...
# Provide an existing test dataset
test_data=test_dataset,
...
forecasting_parameters=forecasting_parameters)
To use a train/test split instead of providing test data directly, use the test_size
parameter when creating the AutoMLConfig . This parameter must be a floating point
value between 0.0 and 1.0 exclusive, and specifies the percentage of the training dataset
that should be used for the test dataset.
Python
Note
Next steps
Prevent imbalanced data and overfitting.
Learn about the data featurization settings in Azure Machine Learning, and how to
customize those features for automated machine learning experiments.
Although many of the raw data fields can be used directly to train a model, it's often
necessary to create additional (engineered) features that provide information that better
differentiates patterns in the data. This process is called feature engineering, where the
use of domain knowledge of the data is leveraged to create features that, in turn, help
machine learning algorithms to learn better.
Prerequisites
This article assumes that you already know how to configure an automated ML
experiment.
Important
Configure featurization
In every automated machine learning experiment, automatic scaling and normalization
techniques are applied to your data by default. These techniques are types of
featurization that help certain algorithms that are sensitive to features on different
scales. You can enable more featurization, such as missing-values imputation, encoding,
and transforms.
Note
For experiments that you configure with the Python SDK, you can enable or disable the
featurization setting and further specify the featurization steps to be used for your
experiment. If you're using the Azure Machine Learning studio, see the steps to enable
featurization.
The following table shows the accepted settings for featurization in the AutoMLConfig
class:
Note
If you plan to export your AutoML-created models to an ONNX model, only the
featurization options indicated with an asterisk ("*") are supported in the ONNX
format. Learn more about converting models to ONNX.
Featurization steps and descriptions:
Drop high cardinality or no variance features*: Drop these features from training and
validation sets. Applies to features with all values missing, with the same value
across all rows, or with high cardinality (for example, hashes, IDs, or GUIDs).
Impute missing values*: For numeric features, impute with the average of values in the
column. For categorical features, impute with the most frequent value.
Generate more features*: For DateTime features: Year, Month, Day, Day of week, Day of
year, Quarter, Week of the year, Hour, Minute, Second. For forecasting tasks, these
additional DateTime features are created: ISO year, Half - half-year, Calendar month as
string, Week, Day of week as string, Day of quarter, Day of year, AM/PM (0 if hour is
before noon (12 pm), 1 otherwise), AM/PM as string, Hour of day (12-hr basis).
Transform and encode*: Transform numeric features that have few unique values into
categorical features.
Word embeddings: A text featurizer converts vectors of text tokens into sentence
vectors by using a pre-trained model. Each word's embedding vector in a document is
aggregated with the rest to produce a document feature vector.
Cluster Distance: Trains a k-means clustering model on all numeric columns. Produces k
new features (one new numeric feature per cluster) that contain the distance of each
sample to the centroid of each cluster.
SparseNormalizer: Each sample (that is, each row of the data matrix) with at least one
non-zero component is rescaled independently of other samples so that its norm (l1 or
l2) equals one.
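The SparseNormalizer's row rescaling can be sketched in plain Python for the l2 case (the l1 case would divide by the sum of absolute values instead; this is an illustration, not the transformer's actual code):

```python
import math


def l2_normalize_rows(matrix):
    """Rescale each row with a nonzero norm so its l2 norm equals one.

    Rows that are entirely zero are left unchanged, matching the
    "at least one non-zero component" condition above.
    """
    out = []
    for row in matrix:
        norm = math.sqrt(sum(x * x for x in row))
        out.append([x / norm for x in row] if norm else list(row))
    return out
```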
Data guardrails
Data guardrails help you identify potential issues with your data (for example, missing
values or class imbalance). They also help you take corrective actions for improved
results.
State: Description
Done: Changes were applied to your data. We encourage you to review the corrective
actions that AutoML took, to ensure that the changes align with the expected results.
Alerted: A data issue was detected but couldn't be remedied. We encourage you to revise
and fix the issue.
Guardrail: Status: Condition for trigger
Missing feature values imputation
- Passed: No missing feature values were detected in your training data. Learn more
about missing-value imputation.
- Done: Missing feature values were detected in your training data and were imputed.
High cardinality feature detection
- Passed: Your inputs were analyzed, and no high-cardinality features were detected.
- Done: High-cardinality features were detected in your inputs and were handled.
Validation split handling
- Done: The validation configuration was set to 'auto' and the training data contained
fewer than 20,000 rows. Each iteration of the trained model was validated by using
cross-validation. Learn more about validation data.
- Done: The validation configuration was set to 'auto' , and the training data
contained more than 20,000 rows. The input data has been split into a training dataset
and a validation dataset for validation of the model.
Class balancing detection
- Passed: Your inputs were analyzed, and all classes are balanced in your training
data. A dataset is considered to be balanced if each class has good representation in
the dataset, as measured by number and ratio of samples.
- Alerted: Imbalanced classes were detected in your inputs. To fix model bias, fix the
balancing problem. Learn more about imbalanced data.
- Done: Imbalanced classes were detected in your inputs and the sweeping logic has
determined to apply balancing.
Memory issues detection
- Passed: The selected values (horizon, lag, rolling window) were analyzed, and no
potential out-of-memory issues were detected. Learn more about time-series forecasting
configurations.
- Done: The selected values (horizon, lag, rolling window) were analyzed and will
potentially cause your experiment to run out of memory. The lag or rolling-window
configurations have been turned off.
Frequency detection
- Passed: The time series was analyzed, and all data points are aligned with the
detected frequency.
- Done: The time series was analyzed, and data points that don't align with the
detected frequency were detected. These data points were removed from the dataset.
Aggregation and short series handling
- Fixed: The data was aggregated to comply with user provided frequency.
- Fixed: Automated ML detected that some series did not contain enough data points to
train a model. To continue with training, these short series have been dropped or
padded.
Customize featurization
You can customize your featurization settings to ensure that the data and features that
are used to train your ML model result in relevant predictions.
Customization Definition
Column Override the autodetected feature type for the specified column.
purpose update
Transformer Update the parameters for the specified transformer. Currently supports
parameter Imputer (mean, most frequent, and median) and HashOneHotEncoder.
update
The drop columns functionality is deprecated as of SDK version 1.19. Drop columns
from your dataset as part of data cleansing, prior to consuming it in your
automated ML experiment.
Python
featurization_config = FeaturizationConfig()
featurization_config.blocked_transformers = ['LabelEncoder']
featurization_config.drop_columns = ['aspiration', 'stroke']
featurization_config.add_column_purpose('engine-size', 'Numeric')
featurization_config.add_column_purpose('body-style', 'CategoricalHash')
# default strategy is mean; add transformer params for 3 columns
featurization_config.add_transformer_params('Imputer', ['engine-size'],
{"strategy": "median"})
featurization_config.add_transformer_params('Imputer', ['city-mpg'],
{"strategy": "median"})
featurization_config.add_transformer_params('Imputer', ['bore'],
{"strategy": "most_frequent"})
featurization_config.add_transformer_params('HashOneHotEncoder', [],
{"number_of_bits": 3})
Featurization transparency
Every AutoML model has featurization automatically applied. Featurization includes
automated feature engineering (when "featurization": 'auto' ) and scaling and
normalization, which then impacts the selected algorithm and its hyperparameter values.
AutoML supports different methods to ensure you have visibility into what was applied
to your model.
To get this information, use the fitted_model output from your automated ML
experiment run.
Python
automl_config = AutoMLConfig(…)
automl_run = experiment.submit(automl_config …)
best_run, fitted_model = automl_run.get_output()
Note
Python
fitted_model.named_steps['timeseriestransformer'].get_engineered_feature_names()
Python
fitted_model.named_steps['timeseriestransformer'].get_featurization_summary()
Output
[{'RawFeatureName': 'A',
'TypeDetected': 'Numeric',
'Dropped': 'No',
'EngineeredFeatureCount': 2,
'Tranformations': ['MeanImputer', 'ImputationMarker']},
{'RawFeatureName': 'B',
'TypeDetected': 'Numeric',
'Dropped': 'No',
'EngineeredFeatureCount': 2,
'Tranformations': ['MeanImputer', 'ImputationMarker']},
{'RawFeatureName': 'C',
'TypeDetected': 'Numeric',
'Dropped': 'Yes',
'EngineeredFeatureCount': 0,
'Tranformations': []},
{'RawFeatureName': 'D',
'TypeDetected': 'DateTime',
'Dropped': 'No',
'EngineeredFeatureCount': 11,
'Tranformations':
['DateTime','DateTime','DateTime','DateTime','DateTime','DateTime','DateTime','DateTime','DateTime','DateTime','DateTime']}]
Output Definition
The following sample output is from running fitted_model.steps for a chosen run:
[('RobustScaler',
RobustScaler(copy=True,
quantile_range=[10, 90],
with_centering=True,
with_scaling=True)),
('LogisticRegression',
LogisticRegression(C=0.18420699693267145, class_weight='balanced',
dual=False,
fit_intercept=True,
intercept_scaling=1,
max_iter=100,
multi_class='multinomial',
n_jobs=1, penalty='l2',
random_state=None,
solver='newton-cg',
tol=0.0001,
verbose=0,
warm_start=False))
Python
This helper function returns the following output for a particular run using
LogisticRegression with RobustScalar as the specific algorithm.
RobustScaler
{'copy': True,
'quantile_range': [10, 90],
'with_centering': True,
'with_scaling': True}
LogisticRegression
{'C': 0.18420699693267145,
'class_weight': 'balanced',
'dual': False,
'fit_intercept': True,
'intercept_scaling': 1,
'max_iter': 100,
'multi_class': 'multinomial',
'n_jobs': 1,
'penalty': 'l2',
'random_state': None,
'solver': 'newton-cg',
'tol': 0.0001,
'verbose': 0,
'warm_start': False}
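A helper that produces output like the listing above might look roughly like the following sketch (the notebook's actual print_model helper is not shown in this article, so this is an assumption: it walks a scikit-learn style pipeline via fitted_model.steps and prints each step's parameters):

```python
def print_model(model, prefix=""):
    # Walk a scikit-learn style pipeline and print each step's
    # name followed by its hyperparameter dictionary.
    for name, step in model.steps:
        print(prefix + name)
        print(prefix + str(step.get_params()))
```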
Assuming you have retrieved the best run and fitted model using the same calls from
above, you can call predict_proba() directly from the fitted model, supplying an X_test
sample in the appropriate format depending on the model type.
Python
If the underlying model does not support the predict_proba() function or the format is
incorrect, a model class-specific exception will be thrown. See the
RandomForestClassifier and XGBoost reference docs for examples of how this
function is implemented for different model types.
For BERT, the model is fine-tuned and trained utilizing the user-provided labels. From
here, document embeddings are output as features alongside other features, such as
timestamp-based features like day of week.
Learn how to set up natural language processing (NLP) experiments that also use BERT
with automated ML.
1. Preprocessing and tokenization of all text columns. For example, the "StringCast"
transformer can be found in the final model's featurization summary. An example
of how to produce the model's featurization summary can be found in this
notebook .
2. Concatenate all text columns into a single text column, hence the
StringConcatTransformer in the final model.
Our implementation of BERT limits the total text length of a training sample to 128
tokens. That means that all text columns, when concatenated, should ideally be at
most 128 tokens in length. If multiple columns are present, each column should be
pruned so this condition is satisfied. Otherwise, for concatenated columns longer
than 128 tokens, BERT's tokenizer layer truncates the input to 128 tokens.
3. As part of feature sweeping, AutoML compares BERT against the baseline (bag
of words features) on a sample of the data. This comparison determines if BERT
would give accuracy improvements. If BERT performs better than the baseline,
AutoML then uses BERT for text featurization for the whole data. In that case, you
will see the PretrainedTextDNNTransformer in the final model.
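The 128-token truncation in step 2 can be sketched as follows (this uses a naive whitespace split for counting; the real BERT tokenizer uses subword tokenization, so this is only an approximation):

```python
def truncate_concatenated(columns, max_tokens=128):
    """Concatenate text columns, then truncate to the token budget.

    Token counting here is a whitespace split, standing in for
    BERT's actual tokenizer layer described above.
    """
    tokens = " ".join(columns).split()
    return tokens[:max_tokens]
```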
BERT generally runs longer than other featurizers. For better performance, we
recommend using "STANDARD_NC24r" or "STANDARD_NC24rs_V3" for their RDMA
capabilities.
AutoML will distribute BERT training across multiple nodes if they are available (up to
a maximum of eight nodes). This can be done in your AutoMLConfig object by setting the
max_concurrent_iterations parameter to higher than 1.
In the following code, the German BERT model is triggered, since the dataset language
is specified as deu , the three-letter language code for German according to the ISO
classification :
Python
featurization_config = FeaturizationConfig(dataset_language='deu')
automl_settings = {
    "experiment_timeout_minutes": 120,
    "primary_metric": 'accuracy',
    # All other settings you want to use
    "featurization": featurization_config,
}
Next steps
Learn how to set up your automated ML experiments:
For a code-first experience: Configure automated ML experiments by using the
Azure Machine Learning SDK.
For a low-code or no-code experience: Create your automated ML experiments
in the Azure Machine Learning studio.
Learn more about how to train a regression model by using automated machine
learning or how to train by using automated machine learning on a remote
resource.
View training code for an Automated
ML model
Article • 06/02/2023
In this article, you learn how to view the generated training code from any automated
machine learning trained model.
Code generation for automated ML trained models allows you to see the following
details that automated ML uses to train and build the model for a specific run.
Data preprocessing
Algorithm selection
Featurization
Hyperparameters
You can select any automated ML trained model, recommended or child run, and view
the generated Python training code that created that specific model.
Learn what featurization process and hyperparameters the model algorithm uses.
Track/version/audit trained models. Store versioned code to track what specific
training code is used with the model that's to be deployed to production.
Customize the training code by changing hyperparameters or applying your ML
and algorithms skills/experience, and retrain a new model with your customized
code.
The following diagram illustrates that you can generate the code for automated ML
experiments with all task types. First select a model. The model you select is
highlighted, then Azure Machine Learning copies the code files used to create the
model and displays them in your notebooks shared folder. From here, you can view
and customize the code as needed.
Prerequisites
An Azure Machine Learning workspace. To create the workspace, see Create
workspace resources.
All automated ML runs triggered through Azure Machine Learning Studio, SDKv2
or CLIv2 will have code generation enabled.
script.py: the model's training code that you likely want to analyze, including the
featurization steps, specific algorithm used, and hyperparameters.
To do so, navigate to the Models tab of the automated ML experiment parent run page.
After you select one of the trained models, you can select the View generated code
button. This button redirects you to the Notebooks portal extension, where you can
view, edit and run the generated code for that particular selected model.
You can also access the model's generated code from the top of a particular model's
child run page once you navigate into it.
If you're using the Python SDKv2, you can also download the "script.py" and the
"script_run_notebook.ipynb" by retrieving the best run via MLFlow & downloading the
resulting artifacts.
script.py
The script.py file contains the core logic needed to train a model with the previously
used hyperparameters. While intended to be executed in the context of an Azure
Machine Learning script run, with some modifications, the model's training code can
also be run standalone in your own on-premises environment.
The script can roughly be broken down into the following parts: data loading, data
preparation, data featurization, preprocessor/algorithm specification, and training.
Data loading
The function get_training_dataset() loads the previously used dataset. It assumes that
the script is run in an Azure Machine Learning script run under the same workspace as
the original experiment.
Python
def get_training_dataset(dataset_id):
from azureml.core.dataset import Dataset
from azureml.core.run import Run
logger.info("Running get_training_dataset")
ws = Run.get_context().experiment.workspace
dataset = Dataset.get_by_id(workspace=ws, id=dataset_id)
return dataset.to_pandas_dataframe()
Once the workspace has been retrieved, the original dataset is retrieved by its ID.
Another dataset with exactly the same structure could also be specified, by ID or
name, with get_by_id() or get_by_name(), respectively. You can find the ID later in
the script, in a section similar to the following code.
Python
if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--training_dataset_id', type=str,
                        default='xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxx',
                        help='Default training dataset id is populated from the parent run')
    args = parser.parse_args()
    main(args.training_dataset_id)
You can also opt to replace this entire function with your own data loading mechanism;
the only constraints are that the return value must be a Pandas dataframe and that the
data must have the same shape as in the original experiment.
The following example shows that in general, the dataframe from the data loading step
is passed in. The label column and sample weights, if originally specified, are extracted
and rows containing NaN are dropped from the input data.
Python
def prepare_data(dataframe):
    from azureml.training.tabular.preprocessing import data_cleaning
    logger.info("Running prepare_data")
    label_column_name = 'y'
    # ... label/sample-weight extraction and data cleaning elided ...
    return X, y, sample_weights
If you want to do any additional data preparation, it can be done in this step by adding
your custom data preparation code.
For example, possible data transformations in this function can be based on
imputers like SimpleImputer() and CatImputer() , or transformers such as
StringCastTransformer() and LabelEncoderTransformer() .
Python
def get_mapper_0(column_names):
# ... Multiple imports to package dependencies, removed for simplicity
...
definition = gen_features(
columns=column_names,
classes=[
{
'class': StringCastTransformer,
},
{
'class': CountVectorizer,
'analyzer': 'word',
'binary': True,
'decode_error': 'strict',
'dtype': numpy.uint8,
'encoding': 'utf-8',
'input': 'content',
'lowercase': True,
'max_df': 1.0,
'max_features': None,
'min_df': 1,
'ngram_range': (1, 1),
'preprocessor': None,
'stop_words': None,
'strip_accents': None,
'token_pattern': '(?u)\\b\\w\\w+\\b',
'tokenizer': wrap_in_lst,
'vocabulary': None,
},
]
)
mapper = DataFrameMapper(features=definition, input_df=True,
sparse=True)
return mapper
Be aware that if you have many columns that need to have the same
featurization/transformation applied (for example, 50 columns in several column
groups), these columns are handled by grouping based on type.
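The grouping idea can be illustrated without any Azure dependencies; one mapper then serves each group rather than one transformer block per column (a simplified sketch with hypothetical column names and types):

```python
from collections import defaultdict

def group_columns_by_type(column_types):
    """Group column names by their detected type so one mapper serves each group."""
    groups = defaultdict(list)
    for name, col_type in column_types.items():
        groups[col_type].append(name)
    return dict(groups)

# Hypothetical detected types for five columns.
groups = group_columns_by_type({"title": "text", "body": "text",
                                "price": "numeric", "qty": "numeric",
                                "region": "categorical"})
```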
In the following example, notice that each group has a unique mapper applied. This
mapper is then applied to each of the columns of that group.
Python
def generate_data_transformation_config():
from sklearn.pipeline import FeatureUnion
feature_union = FeatureUnion([
('mapper_0', get_mapper_0(column_group_1)),
('mapper_1', get_mapper_1(column_group_3)),
('mapper_2', get_mapper_2(column_group_2)),
])
return feature_union
This approach gives you more streamlined code, avoiding a separate transformer code
block for each column, which can be especially cumbersome when your dataset has tens
or hundreds of columns.
With classification and regression tasks, FeatureUnion is used for featurizers. For
time-series forecasting models, multiple time series-aware featurizers are collected
into a scikit-learn pipeline and then wrapped in the TimeSeriesTransformer . Any
user-provided featurization for time-series forecasting models happens before the
featurization provided by automated ML.
Python
def generate_preprocessor_config():
from sklearn.preprocessing import MaxAbsScaler
preproc = MaxAbsScaler(
copy=True
)
return preproc
Python
def generate_algorithm_config():
from xgboost.sklearn import XGBClassifier
algorithm = XGBClassifier(
base_score=0.5,
booster='gbtree',
colsample_bylevel=1,
colsample_bynode=1,
colsample_bytree=1,
gamma=0,
learning_rate=0.1,
max_delta_step=0,
max_depth=3,
min_child_weight=1,
missing=numpy.nan,
n_estimators=100,
n_jobs=-1,
nthread=None,
objective='binary:logistic',
random_state=0,
reg_alpha=0,
reg_lambda=1,
scale_pos_weight=1,
seed=None,
silent=None,
subsample=1,
verbosity=0,
tree_method='auto',
verbose=-10
)
return algorithm
The generated code in most cases uses open source software (OSS) packages and
classes. There are instances where intermediate wrapper classes are used to simplify
more complex code. For example, XGBoost classifier and other commonly used libraries
like LightGBM or Scikit-Learn algorithms can be applied.
Python
def build_model_pipeline():
from sklearn.pipeline import Pipeline
logger.info("Running build_model_pipeline")
pipeline = Pipeline(
steps=[
('featurization', generate_data_transformation_config()),
('preproc', generate_preprocessor_config()),
('model', generate_algorithm_config()),
]
)
return pipeline
The scikit-learn pipeline includes the featurization step, a preprocessor (if used), and
the algorithm or model. Time-series forecasting models use a different pipeline wrapper
to handle time-series data, depending on the applied algorithm. For all task types, we
use PipelineWithYTransformer in cases where the label column needs to be encoded.
Once you have the scikit-learn pipeline, all that is left is to call the fit() method
to train the model:
Python
def train_model(X, y, sample_weights=None):
    logger.info("Running train_model")
    model_pipeline = build_model_pipeline()
    model = model_pipeline.fit(X, y)
    return model
The return value from train_model() is the model fitted/trained on the input data.
The main code that runs all the previous functions is the following:
Python
def main(training_dataset_id=None):
    from azureml.core.run import Run

    # The following code is for when running this code as part of an Azure
    # Machine Learning script run.
    run = Run.get_context()
    setup_instrumentation(run)

    df = get_training_dataset(training_dataset_id)
    X, y, sample_weights = prepare_data(df)
    split_ratio = 0.1
    try:
        (X_train, y_train, sample_weights_train), (X_valid, y_valid, sample_weights_valid) = \
            split_dataset(X, y, sample_weights, split_ratio, should_stratify=True)
    except Exception:
        (X_train, y_train, sample_weights_train), (X_valid, y_valid, sample_weights_valid) = \
            split_dataset(X, y, sample_weights, split_ratio, should_stratify=False)
    # ... model training and metric computation elided ...
    print(metrics)
    for metric in metrics:
        run.log(metric, metrics[metric])
Once you have the trained model, you can use it for making predictions with the
predict() method. If your experiment is for a time series model, use the forecast()
method for predictions.
Python
y_pred = model.predict(X)
Finally, the model is serialized and saved as a .pkl file named "model.pkl":
Python
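A minimal stand-alone sketch of that serialization step, using Python's standard pickle module (a placeholder dictionary stands in for the fitted pipeline so the sketch is runnable on its own):

```python
import pickle

# Placeholder for the fitted pipeline returned by the training step.
model = {"pipeline": "fitted model placeholder"}

# Serialize the model to "model.pkl".
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)

# Loading the file back restores an equivalent object.
with open("model.pkl", "rb") as f:
    restored = pickle.load(f)
```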
Environment
Typically, the training environment for an automated ML run is automatically set by the
SDK. However, when running a custom script run like the generated code, automated
ML is no longer driving the process, so the environment must be specified for the
command job to succeed.
Code generation reuses the environment that was used in the original automated ML
experiment, if possible. Doing so guarantees that the training script run doesn't fail due
to missing dependencies, and has a side benefit of not needing a Docker image rebuild,
which saves time and compute resources.
If you make changes to script.py that require additional dependencies, or you would
like to use your own environment, you need to update the environment in the
script_run_notebook.ipynb accordingly.
The following example contains the parameters and regular dependencies needed to
run a Command Job, such as compute, environment, etc.
Python
command_job = command(
code=project_folder,
command=command_str,
environment='AutoML-Non-Prod-DNN:25',
inputs=dataset_arguments,
compute='automl-e2e-cl2',
experiment_name='build_70775722_9249eda8'
)
returned_job = ml_client.create_or_update(command_job)
print(returned_job.studio_url)  # link to navigate to the submitted run in Azure Machine Learning studio
Next steps
Learn more about how and where to deploy a model.
See how to enable interpretability features specifically within automated ML
experiments.
Set up a development environment with
Azure Databricks and AutoML in Azure
Machine Learning
Article • 06/01/2023
Azure Databricks is ideal for running large-scale intensive machine learning workflows
on the scalable Apache Spark platform in the Azure cloud. It provides a collaborative
Notebook-based environment with a CPU or GPU-based compute cluster.
Prerequisite
Azure Machine Learning workspace. To create one, use the steps in the Create
workspace resources article.
You can use Azure Databricks:
To train a model using Spark MLlib and deploy the model to ACI/AKS.
With automated machine learning capabilities using an Azure Machine Learning
SDK.
As a compute target from an Azure Machine Learning pipeline.
To use automated ML, skip to Add the Azure Machine Learning SDK with AutoML.
1. Right-click the current Workspace folder where you want to store the library. Select
Create > Library.
Tip
If you have an old SDK version, deselect it from the cluster's installed libraries and
move it to trash. Install the new SDK version and restart the cluster. If there is an
issue after the restart, detach and reattach your cluster.
2. Choose the following option (no other SDK installations are supported)
Warning
No other SDK extras can be installed. Choose only the databricks option.
3. Monitor for errors until status changes to Attached, which may take several
minutes. If this step fails:
While many sample notebooks are available, only these sample notebooks
work with Azure Databricks.
Troubleshooting
Canceling an automated machine learning run on Databricks: When you use automated
machine learning capabilities on Azure Databricks, to cancel a run and start a new
experiment run, restart your Azure Databricks cluster.
Azure Machine Learning SDK installation fails on Azure Databricks when more
packages are installed. Some packages, such as psutil , can cause conflicts. To
avoid installation errors, install packages by freezing the library version. This issue
is related to Databricks and not to the Azure Machine Learning SDK. You might
experience this issue with other libraries, too. Example:
Python
Alternatively, you can use init scripts if you keep facing install issues with Python
libraries. This approach isn't officially supported. For more information, see Cluster-
scoped init scripts.
Import error: No module named 'pandas.core.indexes': If you see this error when
you use automated machine learning:
1. Run this command to install two packages in your Azure Databricks cluster:
Bash
scikit-learn==0.19.1
pandas==0.22.0
If these steps don't solve the issue, try restarting the cluster.
Next steps
Train and deploy a model on Azure Machine Learning with the MNIST dataset.
See the Azure Machine Learning SDK for Python reference.
Set up AutoML to train a time-series
forecasting model with Python (SDKv1)
Article • 06/01/2023
In this article, you learn how to set up AutoML training for time-series forecasting
models with Azure Machine Learning automated ML in the Azure Machine Learning
Python SDK.
To do so, you:
For a low code experience, see the Tutorial: Forecast demand with automated machine
learning for a time-series forecasting example using automated ML in the Azure
Machine Learning studio .
Unlike classical time series methods, in automated ML, past time-series values are
"pivoted" to become additional dimensions for the regressor together with other
predictors. This approach incorporates multiple contextual variables and their
relationship to one another during training. Since multiple factors can influence a
forecast, this method aligns itself well with real world forecasting scenarios. For example,
when forecasting sales, interactions of historical trends, exchange rate, and price all
jointly drive the sales outcome.
Prerequisites
For this article, you need:
Important
The Python commands in this article require the latest azureml-train-automl
package version.
Install the latest azureml-train-automl package to your local
environment.
For details on the latest azureml-train-automl package, see the release
notes.
Important
When training a model for forecasting future values, ensure all the features used in
training can be used when running predictions for your intended horizon. For
example, when creating a demand forecast, including a feature for current stock
price could massively increase training accuracy. However, if you intend to forecast
with a long horizon, you may not be able to accurately predict future stock values
corresponding to future time-series points, and model accuracy could suffer.
You can specify separate training data and validation data directly in the AutoMLConfig
object. Learn more about the AutoMLConfig.
For time series forecasting, only Rolling Origin Cross Validation (ROCV) is used for
validation by default. ROCV divides the series into training and validation data using an
origin time point. Sliding the origin in time generates the cross-validation folds. This
strategy preserves the time series data integrity and eliminates the risk of data leakage.
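The fold-generation idea behind ROCV can be sketched in plain Python: each fold trains on everything before an origin and validates on the horizon that follows it, with the origin sliding by a step between folds (a conceptual illustration, not the SDK's implementation):

```python
def rolling_origin_folds(n_samples, horizon, n_folds, step):
    """Return (train_indices, validation_indices) pairs for rolling-origin CV."""
    folds = []
    # Place the last validation window at the end of the series, then walk
    # the origin backward by `step` for each earlier fold.
    for i in range(n_folds):
        origin = n_samples - horizon - i * step
        folds.append((list(range(origin)), list(range(origin, origin + horizon))))
    return list(reversed(folds))

# 10 time points, a 2-period horizon, 3 folds, origin sliding by 2.
folds = rolling_origin_folds(n_samples=10, horizon=2, n_folds=3, step=2)
```

Every validation window lies strictly after its training data, which is what preserves time-series integrity and prevents leakage.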
Pass your training and validation data as one dataset to the parameter training_data .
Set the number of cross validation folds with the parameter n_cross_validations and
set the number of periods between two consecutive cross-validation folds with
cv_step_size . You can also leave either or both parameters empty and AutoML will set
them automatically.
Python
automl_config = AutoMLConfig(task='forecasting',
training_data= training_data,
n_cross_validations="auto", # Could be
customized as an integer
cv_step_size = "auto", # Could be customized as
an integer
...
**time_series_settings)
You can also bring your own validation data, learn more in Configure data splits and
cross-validation in AutoML.
Learn more about how AutoML applies cross validation to prevent over-fitting models.
Configure experiment
The AutoMLConfig object defines the settings and data necessary for an automated
machine learning task. Configuration for a forecasting model is similar to the setup of a
standard regression model, but certain models, configuration options, and featurization
steps exist specifically for time-series data.
Supported models
Automated machine learning automatically tries different models and algorithms as part
of the model creation and tuning process. As a user, there is no need for you to specify
the algorithm. For forecasting experiments, both native time-series and deep learning
models are part of the recommendation system.
Tip
Configuration settings
Similar to a regression problem, you define standard training parameters like task type,
number of iterations, training data, and number of cross-validations. Forecasting tasks
require the time_column_name and forecast_horizon parameters to configure your
experiment. If the data includes multiple time series, such as sales data for multiple
stores or energy data across different states, automated ML automatically detects this
and sets the time_series_id_column_names parameter (preview) for you. You can also
include additional parameters to better configure your run, see the optional
configurations section for more detail on what can be included.
Important
time_column_name: Used to specify the datetime column in the input data, used for
building the time series and inferring its frequency.
forecast_horizon: Defines how many periods forward you would like to forecast. The
horizon is in units of the time series frequency; that is, units are based on the time
interval of your training data (for example, monthly or weekly) that the forecaster
should predict out.
The following code uses the ForecastingParameters class to define these parameters:
Python
forecasting_parameters = ForecastingParameters(time_column_name='day_datetime',
                                               forecast_horizon=50,
                                               freq='W')
These forecasting_parameters are then passed into your standard AutoMLConfig object
along with the forecasting task type, primary metric, exit criteria, and training data.
Python
automl_config = AutoMLConfig(task='forecasting',
primary_metric='normalized_root_mean_squared_error',
experiment_timeout_minutes=15,
enable_early_stopping=True,
training_data=train_data,
label_column_name=label,
n_cross_validations="auto", # Could be
customized as an integer
cv_step_size = "auto", # Could be customized as
an integer
enable_ensembling=False,
verbosity=logging.INFO,
forecasting_parameters=forecasting_parameters)
The amount of data required to successfully train a forecasting model with automated
ML is influenced by the forecast_horizon , n_cross_validations , and target_lags or
target_rolling_window_size values specified when you configure your AutoMLConfig .
The following formula calculates the amount of historic data that would be needed to
construct time series features.
Minimum historic data required: (2 x forecast_horizon ) + #n_cross_validations +
max(max( target_lags ), target_rolling_window_size )
An Error exception is raised for any series in the dataset that does not meet the
required amount of historic data for the relevant settings specified.
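The formula can be checked with a small helper (a sketch of the documented formula, not an SDK function):

```python
def minimum_historic_data(forecast_horizon, n_cross_validations,
                          target_lags=None, target_rolling_window_size=0):
    """Minimum rows per series needed to construct time-series features."""
    max_lag = max(target_lags) if target_lags else 0
    return (2 * forecast_horizon + n_cross_validations
            + max(max_lag, target_rolling_window_size))

# A 14-period horizon, 3 CV folds, lags up to 3, and a 7-period window.
rows = minimum_historic_data(forecast_horizon=14, n_cross_validations=3,
                             target_lags=[1, 2, 3], target_rolling_window_size=7)
```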
Featurization steps
In every automated machine learning experiment, automatic scaling and normalization
techniques are applied to your data by default. These techniques are types of
featurization that help certain algorithms that are sensitive to features on different
scales. Learn more about default featurization steps in Featurization in AutoML
However, the following steps are performed only for forecasting task types:
Detect time-series sample frequency (for example, hourly, daily, weekly) and create
new records for absent time points to make the series continuous.
Impute missing values in the target (via forward-fill) and feature columns (using
median column values)
Create features based on time series identifiers to enable fixed effects across
different series
Create time-based features to assist in learning seasonal patterns
Encode categorical variables to numeric quantities
Detect non-stationary time series and automatically difference them to mitigate the
impact of unit roots.
To view the full list of possible engineered features generated from time series data, see
TimeIndexFeaturizer Class.
Note
Customize featurization
You also have the option to customize your featurization settings to ensure that the data
and features that are used to train your ML model result in relevant predictions.
Supported customizations for forecasting tasks include:
Customization Definition
Column purpose Override the auto-detected feature type for the specified column.
update
Transformer Update the parameters for the specified transformer. Currently supports
parameter update Imputer (fill_value and median).
Note
The drop columns functionality is deprecated as of SDK version 1.19. Drop columns
from your dataset as part of data cleansing, prior to consuming it in your
automated ML experiment.
Python
featurization_config = FeaturizationConfig()
If you're using the Azure Machine Learning studio for your experiment, see how to
customize featurization in the studio.
Optional configurations
Additional optional configurations are available for forecasting tasks, such as enabling
deep learning and specifying a target rolling window aggregation. A complete list of
additional parameters is available in the ForecastingParameters SDK reference
documentation.
For highly irregular data or varying business needs, you can optionally set your
desired forecast frequency, freq , and specify the target_aggregation_function to
aggregate the target column of the time series. Using these two settings in your
AutoMLConfig object can help save time on data preparation.
Function Description
Note
DNN support for forecasting in Automated Machine Learning is in preview and not
supported for local runs or runs initiated in Databricks.
You can also apply deep learning with deep neural networks, DNNs, to improve the
scores of your model. Automated ML's deep learning allows for forecasting univariate
and multivariate time series data.
Python
automl_config = AutoMLConfig(task='forecasting',
enable_dnn=True,
...
forecasting_parameters=forecasting_parameters)
Warning
When you enable DNN for experiments created with the SDK, best model
explanations are disabled.
To enable DNN for an AutoML experiment created in the Azure Machine Learning
studio, see the task type settings in the studio UI how-to.
For example, say you want to predict energy demand. You might want to add a rolling
window feature of three days to account for thermal changes of heated spaces. In this
example, create this window by setting target_rolling_window_size=3 in the
AutoMLConfig constructor.
The table shows resulting feature engineering that occurs when window aggregation is
applied. Columns for minimum, maximum, and sum are generated on a sliding window
of three based on the defined settings. Each row has a new calculated feature; in the
case of the timestamp September 8, 2017 4:00 AM, the maximum, minimum, and sum
values are calculated using the demand values for September 8, 2017 1:00 AM - 3:00 AM.
This window of three shifts along to populate data for the remaining rows.
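The described aggregation can be sketched in plain Python: for each row, the minimum, maximum, and sum are taken over the three preceding demand values (illustrative data, not the table from the article):

```python
def rolling_window_features(values, window=3):
    """Compute min/max/sum over the `window` values preceding each point."""
    features = []
    for i in range(window, len(values)):
        past = values[i - window:i]
        features.append({"min": min(past), "max": max(past), "sum": sum(past)})
    return features

# Hourly demand values; the first feature row uses the first three values.
feats = rolling_window_features([10, 12, 11, 15, 14], window=3)
```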
View a Python code example applying the target rolling window aggregate feature .
To enable short series handling, the freq parameter must also be defined. To define an
hourly frequency, set freq='H' . View the frequency string options in the DateOffset
objects section of the pandas Time series page. To change the default behavior,
short_series_handling_configuration='auto' , update the
short_series_handling_configuration parameter in your ForecastingParameters object.
Python
forecast_parameters = ForecastingParameters(time_column_name='day_datetime',
                                            forecast_horizon=50,
                                            short_series_handling_configuration='auto',
                                            freq='H',
                                            target_lags='auto')
Setting Description
Warning
Padding may impact the accuracy of the resulting model, since we are introducing
artificial data just to get past training without failures. If many of the series are
short, then you may also see some impact in explainability results.
Machine learning models cannot inherently deal with stochastic trends or other
well-known problems associated with non-stationary time series. As a result, their
out-of-sample forecast accuracy can be poor if such trends are present.
Python
ws = Workspace.from_config()
experiment = Experiment(ws, "Tutorial-automl-forecasting")
local_run = experiment.submit(automl_config, show_output=True)
best_run, fitted_model = local_run.get_output()
For example, suppose you train a model on daily sales to predict demand up to two
weeks (14 days) into the future. If there is sufficient historic data available, you might
reserve the final several months to even a year of the data for the test set. The rolling
evaluation begins by generating a 14-day-ahead forecast for the first two weeks of the
test set. Then, the forecaster is advanced by some number of days into the test set and
you generate another 14-day-ahead forecast from the new position. The process
continues until you get to the end of the test set.
Python
In this sample, the step size for the rolling forecast is set to one, which means that
the forecaster is advanced one period (one day, in our demand prediction example) at
each iteration. The total number of forecasts returned by rolling_forecast therefore
depends on the length of the test set and this step size. For more details and
examples, see the rolling_forecast() documentation and the Forecasting away from
training data notebook .
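The relationship between test-set length, horizon, and step size can be sketched with simple arithmetic (an illustration of the counting, not the SDK's rolling_forecast method):

```python
def rolling_forecast_origins(test_length, horizon, step=1):
    """Return the forecast origins a rolling evaluation steps through."""
    return list(range(0, test_length - horizon + 1, step))

# A 30-day test set with a 14-day horizon and a step of one day.
origins = rolling_forecast_origins(test_length=30, horizon=14, step=1)
```

With these values the forecaster produces 17 overlapping 14-day forecasts; a larger step reduces that count.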
In the following example, you first replace all values in y_pred with NaN . The forecast
origin is at the end of training data in this case. However, if you replaced only the
second half of y_pred with NaN , the function would leave the numerical values in the
first half unmodified, but forecast the NaN values in the second half. The function returns
both the forecasted values and the aligned features.
Python
label_query = test_labels.copy().astype(float)
label_query.fill(np.nan)
label_fcst, data_trans = fitted_model.forecast_quantiles(
    test_dataset, label_query, forecast_destination=pd.Timestamp(2019, 1, 8))
You can calculate model metrics like root mean squared error (RMSE) or mean absolute
percentage error (MAPE) to help you estimate the model's performance. See the
Evaluate section of the Bike share demand notebook for an example.
After the overall model accuracy has been determined, the most realistic next step is to
use the model to forecast unknown future values.
Supply a data set in the same format as the test set test_dataset but with future
datetimes, and the resulting prediction set is the forecasted values for each time-series
step. Assume the last time-series records in the data set were for 12/31/2018. To
forecast demand for the next day (or as many periods as you need to forecast, <=
forecast_horizon ), create a single time series record for each store for 01/01/2019.
Output
day_datetime,store,week_of_year
01/01/2019,A,1
Repeat the necessary steps to load this future data to a dataframe and then run
best_run.forecast_quantiles(test_dataset) to predict future values.
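Building those future records can be sketched in plain Python (column names follow the example output above; dates use ISO format here for simplicity):

```python
from datetime import date, timedelta

def make_future_records(last_date, stores, periods=1):
    """Create one record per store for each future period after last_date."""
    records = []
    for p in range(1, periods + 1):
        day = last_date + timedelta(days=p)
        for store in stores:
            records.append({"day_datetime": day.isoformat(),
                            "store": store,
                            "week_of_year": day.isocalendar()[1]})
    return records

# Last observed day is 12/31/2018; forecast one day ahead for store A.
future = make_future_records(date(2018, 12, 31), stores=["A"], periods=1)
```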
Note
In-sample predictions are not supported for forecasting with automated ML when
target_lags and/or target_rolling_window_size are enabled.
Forecasting at scale
There are scenarios where a single machine learning model is insufficient and multiple
machine learning models are needed. For instance, predicting sales for each individual
store for a brand, or tailoring an experience to individual users. Building a model for
each instance can lead to improved results on many machine learning problems.
Grouping is a concept in time series forecasting that allows time series to be combined
to train an individual model per group. This approach can be particularly helpful if you
have time series which require smoothing, filling or entities in the group that can benefit
from history or trends from other entities. Many models and hierarchical time series
forecasting are solutions powered by automated machine learning for these large scale
forecasting scenarios.
Many models
The Azure Machine Learning many models solution with automated machine learning
allows users to train and manage millions of models in parallel. The many models
solution accelerator leverages Azure Machine Learning pipelines to train the model.
Specifically, a Pipeline object and ParallelRunStep are used, and they require specific
configuration parameters set through the ParallelRunConfig.
The following diagram shows the workflow for the many models solution.
The following code demonstrates the key parameters users need to set up their many
models run. See the Many Models - Automated ML notebook for a many models
forecasting example.
Python
mm_parameters = ManyModelsTrainParameters(automl_settings=automl_settings,
                                          partition_column_names=partition_column_names)
Hierarchical time series forecasting
In most applications, customers have a need to understand their forecasts at a macro
and micro level of the business; whether that be predicting sales of products at different
geographic locations, or understanding the expected workforce demand for different
organizations at a company. The ability to train a machine learning model to
intelligently forecast on hierarchy data is essential.
A hierarchical time series is a structure in which each of the unique series are arranged
into a hierarchy based on dimensions such as, geography or product type. The following
example shows data with unique attributes that form a hierarchy. Our hierarchy is
defined by: the product type such as headphones or tablets, the product category which
splits product types into accessories and devices, and the region the products are sold
in.
To further visualize this, the leaf levels of the hierarchy contain all the time series with
unique combinations of attribute values. Each higher level in the hierarchy considers one
less dimension for defining the time series and aggregates each set of child nodes from
the lower level into a parent node.
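That aggregation can be illustrated in plain Python: a parent node's series is the element-wise sum of its children's series after dropping the finer dimensions (a conceptual sketch with made-up data, not the HTS implementation):

```python
def aggregate_level(leaf_series, keep_dims):
    """Sum leaf time series into parent nodes keyed by the kept dimensions."""
    parents = {}
    for key, values in leaf_series.items():
        parent_key = tuple(key[d] for d in keep_dims)
        if parent_key not in parents:
            parents[parent_key] = list(values)
        else:
            parents[parent_key] = [a + b for a, b in
                                   zip(parents[parent_key], values)]
    return parents

# Leaf series keyed by (region, category, product).
leaves = {("west", "devices", "tablet"): [1, 2],
          ("west", "devices", "phone"): [3, 4],
          ("east", "devices", "tablet"): [5, 6]}
by_region = aggregate_level(leaves, keep_dims=[0])  # keep only region
```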
The hierarchical time series solution is built on top of the many models solution and
shares a similar configuration setup.
The following code demonstrates the key parameters to set up your hierarchical time
series forecasting runs. See the Hierarchical time series - Automated ML notebook for
an end-to-end example.
Python
model_explainability = True
engineered_explanations = False  # illustrative: set True to also explain engineered features
hierarchy = ["product_category", "product_type"]  # illustrative: ordered hierarchy column names
training_level = "product_type"  # illustrative: hierarchy level at which models are trained

hts_parameters = HTSTrainParameters(
    automl_settings=automl_settings,
    hierarchy_column_names=hierarchy,
    training_level=training_level,
    enable_engineered_explanations=engineered_explanations
)
Example notebooks
See the forecasting sample notebooks for detailed code examples of advanced
forecasting configuration including:
Next steps
Learn more about How to deploy an AutoML model to an online endpoint.
Learn about Interpretability: model explanations in automated machine learning
(preview).
Learn about how AutoML builds forecasting models.
Make predictions with ONNX on
computer vision models from AutoML
(v1)
Article • 06/02/2023
) Important
The Azure CLI commands in this article require the azure-cli-ml , or v1, extension
for Azure Machine Learning. Support for the v1 extension will end on September
30, 2025. You will be able to install and use the v1 extension until that date.
In this article, you learn how to use Open Neural Network Exchange (ONNX) to make
predictions on computer vision models generated from automated machine learning
(AutoML) in Azure Machine Learning.
ONNX is an open standard for machine learning and deep learning models. It enables
model import and export (interoperability) across the popular AI frameworks. For more
details, explore the ONNX GitHub project .
Prerequisites
Get an AutoML-trained computer vision model for any of the supported image
tasks: classification, object detection, or instance segmentation. Learn more about
AutoML support for computer vision tasks.
Install the onnxruntime package. The methods in this article have been tested
with versions 1.3.0 to 1.8.0.
Within the best child run, go to Outputs+logs > train_artifacts. Use the Download
button to manually download the following files:
labels.json: File that contains all the classes or labels in the training dataset.
model.onnx: Model in ONNX format.
Save the downloaded model files in a directory. The example in this article uses the
./automl_models directory.
The following code returns the best child run based on the relevant primary metric.
Python
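A minimal sketch of that retrieval, assuming automl_run is the completed parent AutoML run from SDK v1; get_best_child() returns the child run with the best value of the experiment's primary metric:

```python
# Sketch: select the best child run of a completed AutoML parent run (SDK v1).
def fetch_best_child(automl_run):
    """Return the child run with the best primary-metric value."""
    return automl_run.get_best_child()

# Usage (assumes an existing AutoML run object):
# best_child_run = fetch_best_child(remote_run)
```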
Download the labels.json file, which contains all the classes and labels in the training
dataset.
Python
labels_file = 'automl_models/labels.json'
best_child_run.download_file(name='train_artifacts/labels.json', output_file_path=labels_file)

onnx_model_path = 'automl_models/model.onnx'
best_child_run.download_file(name='train_artifacts/model.onnx', output_file_path=onnx_model_path)
Python
Use the following model-specific arguments to submit the script. For more details on
arguments, refer to model-specific hyperparameters; for supported object detection
model names, refer to the supported model algorithm section.
To get the argument values needed to create the batch scoring model, refer to the
scoring scripts generated under the outputs folder of the AutoML training runs. Use the
hyperparameter values available in the model settings variable inside the scoring file for
the best child run.
Multi-class image classification
For multi-class image classification, the generated ONNX model for the best child
run supports batch scoring by default. Therefore, no model-specific arguments are
needed for this task type, and you can skip to the Load the labels and ONNX model
files section.
Python
script_run_config = ScriptRunConfig(source_directory='.',
                                    script='ONNX_batch_model_generator_automl_for_images.py',
                                    arguments=arguments,
                                    compute_target=compute_target,
                                    environment=best_child_run.get_environment())
remote_run = experiment.submit(script_run_config)
remote_run.wait_for_completion(wait_post_processing=True)
Once the batch model is generated, either download it from Outputs+logs > outputs
manually, or use the following method:
Python
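As a sketch of the programmatic route (the artifact name 'outputs/model.onnx' is an assumption; check the run's outputs for the actual name):

```python
# Sketch: download the generated batch-scoring ONNX model from a run's outputs.
def download_batch_model(run, local_path="automl_models/batch_model.onnx"):
    run.download_file(name="outputs/model.onnx", output_file_path=local_path)
    return local_path
```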
After the model downloading step, you use the ONNX Runtime Python package to
perform inferencing by using the model.onnx file. For demonstration purposes, this
article uses the datasets from How to prepare image datasets for each vision task.
We've trained the models for all vision tasks with their respective datasets to
demonstrate ONNX model inference.
Load the labels and ONNX model files
The following code snippet loads labels.json, where class names are ordered. That is, if
the ONNX model predicts a label ID of 2, it corresponds to the label name at index 2
(the third entry) in the labels.json file.
Python
import json
import onnxruntime
labels_file = "automl_models/labels.json"
with open(labels_file) as f:
    classes = json.load(f)
print(classes)

try:
    session = onnxruntime.InferenceSession(onnx_model_path)
    print("ONNX model loaded...")
except Exception as e:
    print("Error loading ONNX file: ", str(e))
Python
sess_input = session.get_inputs()
sess_output = session.get_outputs()
print(f"No. of inputs : {len(sess_input)}, No. of outputs : {len(sess_output)}")
This example applies the model trained on the fridgeObjects dataset with 134
images and 4 classes/labels to explain ONNX model inference. For more
information on training an image classification task, see the multi-class image
classification notebook .
Input format
The input is a preprocessed image.
input1: an ndarray(float) with the shape (batch_size, num_channels, height, width).
For a batch size of 1, the shape is (1, 3, 224, 224), with a height and width of 224.
These numbers correspond to the values used for crop_size in the training example.
Output format
The output is an array of logits for all the classes/labels.
Preprocessing
Multi-class image classification
Perform the following preprocessing steps for the ONNX model inference:
Python
Without PyTorch
Python
import glob
import numpy as np
from PIL import Image

def preprocess(image, resize_size, crop_size_onnx):
    """Resize, center-crop, transpose, and normalize a PIL image for ONNX inference."""
    image = image.convert('RGB')
    # resize
    image = image.resize((resize_size, resize_size))
    # center crop
    left = (resize_size - crop_size_onnx) / 2
    top = (resize_size - crop_size_onnx) / 2
    right = (resize_size + crop_size_onnx) / 2
    bottom = (resize_size + crop_size_onnx) / 2
    image = image.crop((left, top, right, bottom))
    np_image = np.array(image)
    # HWC -> CHW
    np_image = np_image.transpose(2, 0, 1)  # CxHxW
    # normalize the image with ImageNet mean and standard deviation
    mean_vec = np.array([0.485, 0.456, 0.406])
    std_vec = np.array([0.229, 0.224, 0.225])
    norm_img_data = np.zeros(np_image.shape).astype('float32')
    for i in range(np_image.shape[0]):
        norm_img_data[i, :, :] = (np_image[i, :, :] / 255 - mean_vec[i]) / std_vec[i]
    # add the batch dimension expected by the model input (1, C, H, W)
    np_image = np.expand_dims(norm_img_data, axis=0)
    return np_image

image_files = glob.glob(test_images_path)
img_processed_list = []
for i in range(batch_size):
    img = Image.open(image_files[i])
    img_processed_list.append(preprocess(img, resize_size, crop_size_onnx))

if len(img_processed_list) > 1:
    img_data = np.concatenate(img_processed_list)
elif len(img_processed_list) == 1:
    img_data = img_processed_list[0]
else:
    img_data = None
import glob
import torch
import numpy as np
from PIL import Image
from torchvision import transforms

def preprocess(image, resize_size, crop_size_onnx):
    """Preprocess a PIL image with torchvision transforms, mirroring the manual steps above."""
    transform = transforms.Compose([
        transforms.Resize(resize_size),
        transforms.CenterCrop(crop_size_onnx),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])
    img_data = transform(image)
    img_data = img_data.numpy()
    # add the batch dimension expected by the model input (1, C, H, W)
    img_data = np.expand_dims(img_data, axis=0)
    return img_data

image_files = glob.glob(test_images_path)
img_processed_list = []
for i in range(batch_size):
    img = Image.open(image_files[i])
    img_processed_list.append(preprocess(img, resize_size, crop_size_onnx))

if len(img_processed_list) > 1:
    img_data = np.concatenate(img_processed_list)
elif len(img_processed_list) == 1:
    img_data = img_processed_list[0]
else:
    img_data = None
Python
def get_predictions_from_ONNX(onnx_session, img_data):
    """Run the ONNX session on a preprocessed image batch and return the scores."""
    sess_input = onnx_session.get_inputs()
    sess_output = onnx_session.get_outputs()
    print(f"No. of inputs : {len(sess_input)}, No. of outputs : {len(sess_output)}")
    # predict with ONNX Runtime
    output_names = [output.name for output in sess_output]
    scores = onnx_session.run(output_names=output_names,
                              input_feed={sess_input[0].name: img_data})
    return scores[0]

scores = get_predictions_from_ONNX(session, img_data)
Without PyTorch
Python
def softmax(x):
    e_x = np.exp(x - np.max(x, axis=1, keepdims=True))
    return e_x / np.sum(e_x, axis=1, keepdims=True)

conf_scores = softmax(scores)
class_preds = np.argmax(conf_scores, axis=1)
print("predicted classes:", ([(class_idx, classes[class_idx]) for class_idx in class_preds]))
With PyTorch
Python
conf_scores = torch.nn.functional.softmax(torch.from_numpy(scores), dim=1)
class_preds = torch.argmax(conf_scores, dim=1)
print("predicted classes:", ([(class_idx.item(), classes[class_idx]) for class_idx in class_preds]))
Visualize predictions
Multi-class image classification
Python
import matplotlib.pyplot as plt

sample_image_index = 0  # index into image_files of the image to visualize
plt.imshow(Image.open(image_files[sample_image_index]))

label = class_preds[sample_image_index]
if torch.is_tensor(label):
    label = label.item()

conf_score = conf_scores[sample_image_index]
if torch.is_tensor(conf_score):
    conf_score = np.max(conf_score.tolist())
else:
    conf_score = np.max(conf_score)

display_text = '{} ({})'.format(classes[label], round(float(conf_score), 3))
color = 'red'
plt.text(30, 30, display_text, color=color, fontsize=30)
plt.show()
Next steps
Learn more about computer vision tasks in AutoML
Troubleshoot AutoML experiments
Troubleshoot automated ML
experiments in Python
Article • 06/01/2023
In this guide, learn how to identify and resolve known issues in your automated machine
learning experiments with the Azure Machine Learning SDK.
Version dependencies
AutoML dependency upgrades to newer package versions break compatibility. After SDK
version 1.13.0, models aren't loaded in older SDKs because of incompatibility between
the older versions pinned in previous AutoML packages and the newer versions pinned
today.
If your AutoML SDK training version is greater than 1.13.0, you need pandas ==
0.25.1 and scikit-learn==0.22.1 .
Bash
pip install pandas==0.25.1

Bash
pip install scikit-learn==0.22.1
Setup
AutoML package changes since version 1.0.76 require the previous version to be
uninstalled before updating to the new version.
If you encounter this error after upgrading from an SDK version before v1.0.76 to
v1.0.76 or later, resolve the error by running pip uninstall azureml-train-automl
and then pip install azureml-train-automl . The automl_setup.cmd script does
this automatically.
automl_setup fails
Ensure that conda 64-bit version 4.4.10 or later is installed; 32-bit versions are
not supported. You can check the bit version and platform with the conda info
command. The platform should be win-64 for Windows or osx-64 for Mac. To check
the conda version, use the command conda -V . If you have a previous version
installed, you can update it by using the command conda update conda .
1. Make sure that outbound ports 53 and 80 are enabled. On an Azure virtual
machine, you can do this from the Azure portal by selecting the VM and
clicking on Networking.
2. Run the command: sudo apt-get update
3. Run the command: sudo apt-get install build-essential --fix-missing
4. Run automl_setup_linux.sh again
configuration.ipynb fails:
For local conda, first ensure that automl_setup has successfully run.
Ensure that the subscription_id is correct. Find the subscription_id in the Azure
portal by selecting All Services and then Subscriptions. The characters "<" and
">" should not be included in the subscription_id value. For example,
subscription_id = "12345678-90ab-1234-5678-1234567890abcd" has the valid
format.
Ensure Contributor or Owner access to the subscription.
Check that the region is one of the supported regions: eastus2 , eastus ,
westcentralus , southeastasia , westeurope , australiaeast , westus2 ,
southcentralus .
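That region check is easy to script; the small helper below hard-codes the region list from this article:

```python
# Regions listed in this article as supported for the configuration notebook.
SUPPORTED_REGIONS = {
    "eastus2", "eastus", "westcentralus", "southeastasia",
    "westeurope", "australiaeast", "westus2", "southcentralus",
}

def region_supported(region):
    """Return True if the given region string is in the supported list."""
    return region.strip().lower() in SUPPORTED_REGIONS
```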
workspace.from_config fails:
TensorFlow
As of version 1.5.0 of the SDK, automated machine learning does not install TensorFlow
models by default. To install TensorFlow and use it with your automated ML
experiments, install tensorflow==1.12.0 via CondaDependencies .
Python
Numpy failures
import numpy fails in Windows: Some Windows environments see an error loading
numpy with the latest Python version 3.6.8. If you see this issue, try with Python
version 3.6.7.
import numpy fails: Check the TensorFlow version in the automated ml conda
environment. Supported versions are < 1.13. Uninstall TensorFlow from the
environment if version is >= 1.13.
jwt.exceptions.DecodeError
Exact error message: jwt.exceptions.DecodeError: It is required that you pass in a
value for the "algorithms" argument when calling decode() .
For SDK versions <= 1.17.0, installation might result in an unsupported version of
PyJWT. Check that the PyJWT version in the automated ml conda environment is a
supported version, that is, PyJWT version < 2.0.0. To check, activate the environment,
enter pip freeze , and look for PyJWT ; if found, the version listed should be < 2.0.0.
To resolve the issue:
1. Consider upgrading to the latest version of the AutoML SDK: pip install -U azureml-
sdk[automl]
2. If that is not viable, uninstall PyJWT from the environment and install the right
version as follows:
a. Run pip uninstall PyJWT in the command shell and enter y for confirmation.
b. Install using pip install 'PyJWT<2.0.0' .
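The constraint can also be checked in code; a small sketch that compares only the major component of a version string:

```python
def pyjwt_version_ok(version):
    """Return True if a PyJWT version string satisfies PyJWT < 2.0.0."""
    major = int(version.split(".")[0])
    return major < 2
```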
Data access
For automated ML jobs, you need to ensure the file datastore that connects to your
AzureFile storage has the appropriate authentication credentials. Otherwise, the
following message results. Learn how to update your data access authentication
credentials.
Error message: Could not create a connection to the AzureFileService due to missing
credentials. Either an Account Key or SAS token needs to be linked the default
workspace blob store.
Data schema
When you try to create a new automated ML experiment via the Edit and submit button
in the Azure Machine Learning studio, the data schema for the new experiment must
match the schema of the data that was used in the original experiment. Otherwise, an
error message similar to the following results. Learn more about how to edit and submit
experiments from the studio UI.
Error message for vision datasets: Schema mismatch error: (an) additional column(s):
"dataType: String, dataSubtype: String, dateTime: Date, category: String,
Vision dataset should have a target column with name 'label'. Vision dataset should
have labelingProjectType tag with value as 'Object Identification (Bounding Box)'.
Databricks
See How to configure an automated ML experiment with Databricks (Azure Machine
Learning SDK v1).
If this pattern is expected in your time series, you can switch your primary metric to
normalized root mean squared error.
Failed deployment
For versions <= 1.18.0 of the SDK, the base image created for deployment may fail with
the following error: ImportError: cannot import name cached_property from werkzeug .
Ensure that the correct kernel has been selected in the Jupyter Notebook. The
kernel is displayed in the top right of the notebook page. The default is
azure_automl. The kernel is saved as part of the notebook. If you switch to a new
conda environment, you need to select the new kernel in the notebook.
For Azure Notebooks, it should be Python 3.6.
For local conda environments, it should be the conda environment name that
you specified in automl_setup.
To ensure the notebook is for the SDK version that you are using,
Check the SDK version by executing azureml.core.VERSION in a Jupyter
Notebook cell.
You can download previous version of the sample notebooks from GitHub with
these steps:
Experiment throttling
If you have over 100 automated ML experiments, this may cause new automated ML
experiments to have long run times.
Next steps
Learn more about how to train a regression model with Automated machine
learning or how to train using Automated machine learning on a remote resource.
Azure Machine Learning provides several ways to train your models, from code-first
solutions using the SDK to low-code solutions such as automated machine learning and
the visual designer. Use the following list to determine which training method is right
for you:
Azure Machine Learning SDK for Python: The Python SDK provides several ways to
train models, each with different capabilities.
Run configuration: A typical way to train models is to use a training script and job
configuration. The job configuration provides the information needed to configure the
training environment used to train your model. You can specify your training script,
compute target, and Azure Machine Learning environment in your job configuration
and run a training job.

Automated machine learning: Automated machine learning allows you to train models
without extensive data science or programming knowledge. For people with a data
science and programming background, it provides a way to save time and resources by
automating algorithm selection and hyperparameter tuning. You don't have to worry
about defining a job configuration when using automated machine learning.

Machine learning pipeline: Pipelines are not a different training method, but a way of
defining a workflow using modular, reusable steps that can include training as part of
the workflow. Machine learning pipelines support using automated machine learning
and run configuration to train models. Since pipelines are not focused specifically on
training, the reasons for using a pipeline are more varied than the other training
methods. Generally, you might use a pipeline when:
* You want to schedule unattended processes such as long running training jobs or
data preparation.
* You use multiple steps that are coordinated across heterogeneous compute
resources and storage locations.
* You use the pipeline as a reusable template for specific scenarios, such as
retraining or batch scoring.
* You want to track and version data sources, inputs, and outputs for your workflow.
* Your workflow is implemented by different teams that work on specific steps
independently. Steps can then be joined together in a pipeline to implement the
workflow.
Azure CLI: The machine learning CLI provides commands for common tasks with
Azure Machine Learning, and is often used for scripting and automating tasks. For
example, once you've created a training script or pipeline, you might use the Azure
CLI to start a training job on a schedule or when the data files used for training are
updated. For training models, it provides commands that submit training jobs. It
can submit jobs using run configurations or pipelines.
Each of these training methods can use different types of compute resources for
training. Collectively, these resources are referred to as compute targets. A compute
target can be a local machine or a cloud resource, such as an Azure Machine Learning
Compute, Azure HDInsight, or a remote virtual machine.
Python SDK
The Azure Machine Learning SDK for Python allows you to build and run machine
learning workflows with Azure Machine Learning. You can interact with the service from
an interactive Python session, Jupyter Notebooks, Visual Studio Code, or other IDE.
What is the Azure Machine Learning SDK for Python?
Install/update the SDK
Configure a development environment for Azure Machine Learning
Run configuration
A generic training job with Azure Machine Learning can be defined using the
ScriptRunConfig. The script run configuration is then used, along with your training
script(s) to train a model on a compute target.
You may start with a run configuration for your local computer, and then switch to one
for a cloud-based compute target as needed. When changing the compute target, you
only change the run configuration you use. A run also logs information about the
training job, such as the inputs, outputs, and logs.
Tip
In addition to the Python SDK, you can also use Automated ML through Azure
Machine Learning studio .
When you submit a training job, the system performs the following steps:
1. Zipping the files in your project folder, ignoring those specified in .amlignore or
.gitignore
2. Scaling up your compute cluster
3. Building or downloading the dockerfile to the compute node
a. The system calculates a hash of:
b. The system uses this hash as the key in a lookup of the workspace Azure
Container Registry (ACR)
c. If it is not found, it looks for a match in the global ACR
d. If it is not found, the system builds a new image (which will be cached and
registered with the workspace ACR)
4. Downloading your zipped project file to temporary storage on the compute node
5. Unzipping the project file
6. The compute node executing python <entry script> <arguments>
7. Saving logs, model files, and other files written to ./outputs to the storage
account associated with the workspace
8. Scaling down compute, including removing temporary storage
If you choose to train on your local machine ("configure as local run"), you do not need
to use Docker. You may use Docker locally if you choose (see the section Configure ML
pipeline for an example).
Azure CLI
The machine learning CLI is an extension for the Azure CLI. It provides cross-platform
CLI commands for working with Azure Machine Learning. Typically, you use the CLI to
automate tasks, such as training a machine learning model.
Next steps
Learn how to Configure a training run.
Configure and submit training jobs
Article • 04/04/2023
In this article, you learn how to configure and submit Azure Machine Learning jobs to
train your models. Snippets of code explain the key parts of configuration and
submission of a training script. Then use one of the example notebooks to find the full
end-to-end working examples.
When training, it is common to start on your local computer, and then later scale out to
a cloud-based cluster. With Azure Machine Learning, you can run your script on various
compute targets without having to change your training script.
All you need to do is define the environment for each compute target within a script job
configuration. Then, when you want to run your training experiment on a different
compute target, specify the job configuration for that compute.
Prerequisites
If you don't have an Azure subscription, create a free account before you begin. Try
the free or paid version of Azure Machine Learning today
The Azure Machine Learning SDK for Python (>= 1.13.0)
An Azure Machine Learning workspace, ws
A compute target, my_compute_target . Create a compute target
You submit your training experiment with a ScriptRunConfig object. This object includes
the:
Or you can:
Create an experiment
Create an experiment in your workspace. An experiment is a light-weight container that
helps to organize job submissions and keep track of code.
Python
experiment_name = 'my_experiment'
experiment = Experiment(workspace=ws, name=experiment_name)
The example code in this article assumes that you have already created a compute
target my_compute_target from the "Prerequisites" section.
7 Note
Azure Databricks is not supported as a compute target for model training. You can
use Azure Databricks for data preparation and deployment tasks.
7 Note
Create an environment
Azure Machine Learning environments are an encapsulation of the environment where
your machine learning training happens. They specify the Python packages, Docker
image, environment variables, and software settings around your training and scoring
scripts. They also specify runtimes (Python, Spark, or Docker).
You can either define your own environment, or use an Azure Machine Learning curated
environment. Curated environments are predefined environments that are available in
your workspace by default. These environments are backed by cached Docker images
which reduce the job preparation cost. See Azure Machine Learning Curated
Environments for the full list of available curated environments.
For a remote compute target, you can use one of these popular curated environments
to start with:
Python
ws = Workspace.from_config()
myenv = Environment.get(workspace=ws, name="AzureML-Minimal")
For more information and details about environments, see Create & use software
environments in Azure Machine Learning.
Python
src = ScriptRunConfig(source_directory=project_folder,
                      script='train.py',
                      compute_target=my_compute_target,
                      environment=myenv)
If you do not specify an environment, a default environment will be created for you.
If you have command-line arguments you want to pass to your training script, you can
specify them via the arguments parameter of the ScriptRunConfig constructor, e.g.
arguments=['--arg1', arg1_val, '--arg2', arg2_val] .
If you want to override the default maximum time allowed for the job, you can do so via
the max_run_duration_seconds parameter. The system will attempt to automatically
cancel the job if it takes longer than this value.
For more information and examples on running distributed Horovod, TensorFlow and
PyTorch jobs, see:
Distributed training of deep learning models on Azure
run = experiment.submit(config=src)
run.wait_for_completion(show_output=True)
) Important
When you submit the training job, a snapshot of the directory that contains your
training scripts is created and sent to the compute target. It is also stored as part of
the experiment in your workspace. If you change files and submit the job again,
only the changed files will be uploaded.
To prevent unnecessary files from being included in the snapshot, make an ignore
file ( .gitignore or .amlignore ) in the directory. Add the files and directories to
exclude to this file. For more information on the syntax to use inside this file, see
syntax and patterns for .gitignore . The .amlignore file uses the same syntax. If
both files exist, the .amlignore file is used and the .gitignore file is unused.
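For illustration, an ignore file is plain text with one pattern per line; a hypothetical .amlignore that keeps raw data and logs out of the snapshot might look like:

```
.git/
data/raw/
*.log
checkpoints/
```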
) Important
Special folders: Two folders, outputs and logs, receive special treatment by Azure
Machine Learning. During training, when you write files to folders named outputs
and logs that are relative to the root directory ( ./outputs and ./logs , respectively),
the files will automatically upload to your job history so that you have access to
them once your job is finished.
To create artifacts during training (such as model files, checkpoints, data files, or
plotted images) write these to the ./outputs folder.
Similarly, you can write any logs from your training job to the ./logs folder. To
utilize Azure Machine Learning's TensorBoard integration make sure you write
your TensorBoard logs to this folder. While your job is in progress, you will be able
to launch TensorBoard and stream these logs. Later, you will also be able to restore
the logs from any of your previous jobs.
For example, to download a file written to the outputs folder to your local machine
after your remote training job: run.download_file(name='outputs/my_output_file',
output_file_path='my_destination_path')
Notebook examples
See these notebooks for examples of configuring jobs for various training scenarios:
Learn how to run notebooks by following the article Use Jupyter notebooks to explore
this service.
Troubleshooting
AttributeError: 'RoundTripLoader' object has no attribute 'comment_handling':
This error comes from the new version (v0.17.5) of ruamel-yaml , an azureml-core
dependency, which introduces a breaking change to azureml-core . To fix this error,
uninstall ruamel-yaml by running pip uninstall ruamel-yaml and install a different
version; the supported versions are v0.15.35 to v0.17.4 (inclusive). You can do this by
running pip install "ruamel-yaml>=0.15.35,<0.17.5" .
If you are running into this issue for local jobs, check the version of PyJWT installed
in your environment where you are starting jobs. The supported versions of PyJWT
are < 2.0.0. Uninstall PyJWT from the environment if the version is >= 2.0.0. You
may check the version of PyJWT, uninstall and install the right version as follows:
If you are submitting a user-created environment with your job, consider using the
latest version of azureml-core in that environment. Versions >= 1.18.0 of azureml-
core already pin PyJWT < 2.0.0. If you need to use a version of azureml-core <
1.18.0 in the environment you submit, make sure to specify PyJWT < 2.0.0 in your
pip dependencies.
ModuleErrors (No module named): If you are running into ModuleErrors while
submitting experiments in Azure Machine Learning, the training script is expecting
a package to be installed but it isn't added. Once you provide the package name,
Azure Machine Learning installs the package in the environment used for your
training job.
If you are using Estimators to submit experiments, you can specify a package name
via the pip_packages or conda_packages parameter in the estimator, based on the
source from which you want to install the package. You can also specify a yml file
with all your dependencies using conda_dependencies_file , or list all your pip
requirements in a txt file using the pip_requirements_file parameter. If you have
your own Azure Machine Learning Environment object with which you want to
override the default image used by the estimator, you can specify that environment
via the environment parameter of the estimator constructor.
Azure Machine Learning maintained docker images and their contents can be seen
in Azure Machine Learning Containers . Framework-specific dependencies are
listed in the respective framework documentation:
Chainer
PyTorch
TensorFlow
SKLearn
7 Note
If you think a particular package is common enough to be added in Azure
Machine Learning maintained images and environments please raise a GitHub
issue in Azure Machine Learning Containers .
Metric Document is too large: Azure Machine Learning has internal limits on the
size of metric objects that can be logged at once from a training job. If you
encounter a "Metric Document is too large" error when logging a list-valued
metric, try splitting the list into smaller chunks, for example:
Python
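A sketch of the chunking approach; the 250-item chunk size is arbitrary, and run stands for the azureml-core Run object inside your training script:

```python
# Sketch: log a long list-valued metric in fixed-size chunks under the same
# name; Azure ML concatenates chunks with the same metric name server-side.
def log_list_in_chunks(run, name, values, chunk_size=250):
    for start in range(0, len(values), chunk_size):
        run.log_list(name, values[start:start + chunk_size])
```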
Internally, Azure Machine Learning concatenates the blocks with the same metric
name into a contiguous list.
Compute target takes a long time to start: The Docker images for compute
targets are loaded from Azure Container Registry (ACR). By default, Azure Machine
Learning creates an ACR that uses the basic service tier. Changing the ACR for your
workspace to standard or premium tier may reduce the time it takes to build and
load images. For more information, see Azure Container Registry service tiers.
Next steps
Tutorial: Train and deploy a model uses a managed compute target to train a
model.
See how to train models with specific ML frameworks, such as Scikit-learn,
TensorFlow, and PyTorch.
Learn how to efficiently tune hyperparameters to build better models.
Once you have a trained model, learn how and where to deploy models.
View the ScriptRunConfig class SDK reference.
Use Azure Machine Learning with Azure Virtual Networks
Train models with the Azure Machine
Learning Python SDK (v1)
Article • 06/09/2023
Learn how to attach Azure compute resources to your Azure Machine Learning
workspace with SDK v1. Then you can use these resources as training and inference
compute targets in your machine learning tasks.
In this article, learn how to set up your workspace to use these compute resources:
) Important
Items in this article marked as "preview" are currently in public preview. The
preview version is provided without a service level agreement, and it's not
recommended for production workloads. Certain features might not be supported
or might have constrained capabilities. For more information, see Supplemental
Terms of Use for Microsoft Azure Previews .
Prerequisites
An Azure Machine Learning workspace. For more information, see Create
workspace resources.
The Azure CLI extension for Machine Learning service, Azure Machine Learning
Python SDK, or the Azure Machine Learning Visual Studio Code extension.
Limitations
Do not create multiple, simultaneous attachments to the same compute from
your workspace. For example, attaching one Azure Kubernetes Service cluster to a
workspace using two different names. Each new attachment will break the previous
existing attachment(s).
If you want to reattach a compute target, for example to change TLS or other
cluster configuration setting, you must first remove the existing attachment.
Local computer
When you use your local computer for training, there is no need to create a compute
target. Just submit the training run from your local machine.
When you use your local computer for inference, you must have Docker installed. To
perform the deployment, use LocalWebservice.deploy_configuration() to define the port
that the web service will use. Then use the normal deployment process as described in
Deploy models with Azure Machine Learning.
Tip
1. Create: Azure Machine Learning cannot create a remote VM for you. Instead, you
must create the VM and then attach it to your Azure Machine Learning workspace.
For information on creating a DSVM, see Provision the Data Science Virtual
Machine for Linux (Ubuntu).
Warning
Azure Machine Learning only supports virtual machines that run Ubuntu.
When you create a VM or choose an existing VM, you must select a VM that
uses Ubuntu.
Azure Machine Learning also requires the virtual machine to have a public IP
address.
/subscriptions/<subscription_id>/resourceGroups/<resource_group>/providers/Microsoft.Compute/virtualMachines/<vm_name>
Python
from azureml.core.compute import ComputeTarget, RemoteCompute

attach_config = RemoteCompute.attach_configuration(resource_id='<resource_id>',
                                                   ssh_port=22,
                                                   username='<username>',
                                                   password='<password>')

# Attach the compute
compute = ComputeTarget.attach(ws, compute_target_name, attach_config)
compute.wait_for_completion(show_output=True)
Or you can attach the DSVM to your workspace using Azure Machine Learning
studio.
3. Configure: Create a run configuration for the DSVM compute target. Docker and
conda are used to create and configure the training environment on the DSVM.
Python
from azureml.core import Environment, ScriptRunConfig

# Create environment
myenv = Environment(name="myenv")

# Configure the run configuration with the Linux DSVM as the compute target
# and the environment defined above
src = ScriptRunConfig(source_directory=".", script="train.py",
                      compute_target=compute, environment=myenv)
Azure HDInsight
Azure HDInsight is a popular platform for big-data analytics. The platform provides
Apache Spark, which can be used to train your model.
1. Create: Azure Machine Learning cannot create an HDInsight cluster for you.
Instead, you must create the cluster and then attach it to your Azure Machine
Learning workspace. For more information, see Create a Spark Cluster in HDInsight.
Warning
When you create the cluster, you must specify an SSH user name and password.
Take note of these values, as you need them to use HDInsight as a compute target.
2. Attach: To attach an HDInsight cluster as a compute target, you must provide the
resource ID, user name, and password for the HDInsight cluster. The resource ID of
the HDInsight cluster can be constructed using the subscription ID, resource group
name, and HDInsight cluster name using the following string format:
/subscriptions/<subscription_id>/resourceGroups/<resource_group>/providers/Mic
rosoft.HDInsight/clusters/<cluster_name>
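As a quick sketch, the resource ID format above can be assembled from its components in plain Python (the subscription, resource group, and cluster names below are hypothetical placeholders):

```python
# Sketch: build an HDInsight cluster resource ID from its components.
# The values passed in are hypothetical placeholders.
def hdinsight_resource_id(subscription_id, resource_group, cluster_name):
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.HDInsight/clusters/{cluster_name}"
    )

rid = hdinsight_resource_id("0000-1111", "my-rg", "my-hdi-cluster")
print(rid)
```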
Python
from azureml.core.compute import ComputeTarget, HDInsightCompute
from azureml.exceptions import ComputeTargetException

try:
    # If you want to connect using an SSH key instead of a username/password,
    # provide the parameters private_key_file and private_key_passphrase
    attach_config = HDInsightCompute.attach_configuration(resource_id='<resource_id>',
                                                          ssh_port=22,
                                                          username='<ssh-username>',
                                                          password='<ssh-pwd>')
    hdi_compute = ComputeTarget.attach(workspace=ws,
                                       name='myhdi',
                                       attach_configuration=attach_config)
except ComputeTargetException as e:
    print("Caught = {}".format(e.message))

hdi_compute.wait_for_completion(show_output=True)
Or you can attach the HDInsight cluster to your workspace using Azure Machine
Learning studio.
Tip
If you want to remove (detach) an HDInsight cluster from the workspace, use the
HDInsightCompute.detach() method.
Azure Machine Learning does not delete the HDInsight cluster for you. You must
manually delete it using the Azure portal, CLI, or the SDK for Azure HDInsight.
Azure Batch
Azure Batch is used to run large-scale parallel and high-performance computing (HPC)
applications efficiently in the cloud. AzureBatchStep can be used in an Azure Machine
Learning Pipeline to submit jobs to an Azure Batch pool of machines.
To attach Azure Batch as a compute target, you must use the Azure Machine Learning
SDK and provide the following information:
Azure Batch compute name: A friendly name to be used for the compute within
the workspace
Azure Batch account name: The name of the Azure Batch account
Resource Group: The resource group that contains the Azure Batch account.
The following code demonstrates how to attach Azure Batch as a compute target:
Python
try:
# check if the compute is already attached
batch_compute = BatchCompute(ws, batch_compute_name)
except ComputeTargetException:
print('Attaching Batch compute...')
provisioning_config = BatchCompute.attach_configuration(
resource_group=batch_resource_group,
account_name=batch_account_name)
batch_compute = ComputeTarget.attach(
ws, batch_compute_name, provisioning_config)
batch_compute.wait_for_completion()
print("Provisioning state:{}".format(batch_compute.provisioning_state))
print("Provisioning errors:
{}".format(batch_compute.provisioning_errors))
Warning
Do not create multiple, simultaneous attachments to the same Azure Batch from
your workspace. Each new attachment will break the previous existing
attachment(s).
Azure Databricks
Azure Databricks is an Apache Spark-based environment in the Azure cloud. It can be
used as a compute target with an Azure Machine Learning pipeline.
To attach Azure Databricks as a compute target, provide the following information:
Databricks compute name: The name you want to assign to this compute resource.
Databricks workspace name: The name of the Azure Databricks workspace.
Databricks resource group: The resource group that contains the Azure Databricks workspace.
Databricks access token: The access token used to authenticate to Azure
Databricks. To generate an access token, see the Authentication document.
The following code demonstrates how to attach Azure Databricks as a compute target
with the Azure Machine Learning SDK:
Python
import os
from azureml.core.compute import ComputeTarget, DatabricksCompute
from azureml.exceptions import ComputeTargetException

databricks_compute_name = os.environ.get(
    "AML_DATABRICKS_COMPUTE_NAME", "<databricks_compute_name>")
databricks_workspace_name = os.environ.get(
    "AML_DATABRICKS_WORKSPACE", "<databricks_workspace_name>")
databricks_resource_group = os.environ.get(
    "AML_DATABRICKS_RESOURCE_GROUP", "<databricks_resource_group>")
databricks_access_token = os.environ.get(
    "AML_DATABRICKS_ACCESS_TOKEN", "<databricks_access_token>")

try:
    databricks_compute = ComputeTarget(
        workspace=ws, name=databricks_compute_name)
    print('Compute target already exists')
except ComputeTargetException:
    print('compute not found')
    print('databricks_compute_name {}'.format(databricks_compute_name))
    print('databricks_workspace_name {}'.format(databricks_workspace_name))
    print('databricks_access_token {}'.format(databricks_access_token))
    # Create the attach configuration
    attach_config = DatabricksCompute.attach_configuration(
        resource_group=databricks_resource_group,
        workspace_name=databricks_workspace_name,
        access_token=databricks_access_token)
    databricks_compute = ComputeTarget.attach(
        ws,
        databricks_compute_name,
        attach_config
    )
    databricks_compute.wait_for_completion(True)
Warning
Do not create multiple, simultaneous attachments to the same Azure Databricks
from your workspace. Each new attachment will break the previous existing
attachment(s).
Azure Data Lake Analytics
Create an Azure Data Lake Analytics account before using it. To create this resource, see the Get started with Azure Data Lake Analytics document.
To attach Data Lake Analytics as a compute target, you must use the Azure Machine
Learning SDK and provide the following information:
Compute name: The name you want to assign to this compute resource.
Resource Group: The resource group that contains the Data Lake Analytics
account.
Account name: The Data Lake Analytics account name.
The following code demonstrates how to attach Data Lake Analytics as a compute
target:
Python
import os
from azureml.core.compute import ComputeTarget, AdlaCompute
from azureml.exceptions import ComputeTargetException

adla_compute_name = os.environ.get(
    "AML_ADLA_COMPUTE_NAME", "<adla_compute_name>")
adla_resource_group = os.environ.get(
    "AML_ADLA_RESOURCE_GROUP", "<adla_resource_group>")
adla_account_name = os.environ.get(
    "AML_ADLA_ACCOUNT_NAME", "<adla_account_name>")

try:
    adla_compute = ComputeTarget(workspace=ws, name=adla_compute_name)
    print('Compute target already exists')
except ComputeTargetException:
    print('compute not found')
    print('adla_compute_name {}'.format(adla_compute_name))
    print('adla_resource_id {}'.format(adla_resource_group))
    print('adla_account_name {}'.format(adla_account_name))
    # Create attach config
    attach_config = AdlaCompute.attach_configuration(
        resource_group=adla_resource_group,
        account_name=adla_account_name)
    # Attach ADLA
    adla_compute = ComputeTarget.attach(
        ws,
        adla_compute_name,
        attach_config
    )
    adla_compute.wait_for_completion(True)
Warning
Do not create multiple, simultaneous attachments to the same ADLA from your
workspace. Each new attachment will break the previous existing attachment(s).
Tip
Azure Machine Learning pipelines can only work with data stored in the default
data store of the Data Lake Analytics account. If the data you need to work with is
in a non-default store, you can use a DataTransferStep to copy the data before
training.
Kubernetes
Azure Machine Learning provides you with the option to attach your own Kubernetes
clusters for training and inferencing. See Configure Kubernetes cluster for Azure
Machine Learning.
To detach a Kubernetes cluster from your workspace, use the following method:
Python
compute_target.detach()
Warning
Detaching a cluster does not delete the cluster. To delete an Azure Kubernetes
Service cluster, see Use the Azure CLI with AKS. To delete an Azure Arc-enabled
Kubernetes cluster, see Azure Arc quickstart.
Notebook examples
See these notebooks for examples of training with various compute targets:
how-to-use-azureml/training
tutorials/img-classification-part1-training.ipynb
Learn how to run notebooks by following the article Use Jupyter notebooks to explore
this service.
Next steps
Use the compute resource to configure and submit a training run.
Tutorial: Train and deploy a model uses a managed compute target to train a
model.
Learn how to efficiently tune hyperparameters to build better models.
Once you have a trained model, learn how and where to deploy models.
Use Azure Machine Learning with Azure Virtual Networks
Distributed GPU training guide (SDK v1)
Article • 03/02/2023
Learn more about how to use distributed GPU training code in Azure Machine Learning
(ML). This article will not teach you about distributed training. It will help you run your
existing distributed training code on Azure Machine Learning. It offers tips and examples
for each of the frameworks it supports: Message Passing Interface (MPI), PyTorch, and TensorFlow.
Prerequisites
Review these basic concepts of distributed GPU training such as data parallelism,
distributed data parallelism, and model parallelism.
Tip
If you don't know which type of parallelism to use, more than 90% of the time you
should use Distributed Data Parallelism.
MPI
Azure Machine Learning offers an MPI job to launch a given number of processes in
each node. You can adopt this approach to run distributed training using either per-
process-launcher or per-node-launcher, depending on whether process_count_per_node
is set to 1 (the default) for per-node-launcher, or equal to the number of devices/GPUs
for per-process-launcher. Azure Machine Learning constructs the full MPI launch
command ( mpirun ) behind the scenes. You can't provide your own full head-node-
launcher commands like mpirun or DeepSpeed launcher .
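The relationship between process_count_per_node and the launch style can be sketched as plain arithmetic. This is a simplification of what Azure Machine Learning does behind the scenes, for illustration only:

```python
# Sketch: per-node-launch vs per-process-launch, decided by process_count_per_node.
def mpi_launch_plan(process_count_per_node, node_count):
    # 1 process per node: your script launches the per-GPU workers itself.
    # N processes per node: MPI launches one worker per device for you.
    style = ("per-node-launcher" if process_count_per_node == 1
             else "per-process-launcher")
    total_processes = process_count_per_node * node_count
    return style, total_processes

print(mpi_launch_plan(1, node_count=2))  # ('per-node-launcher', 2)
print(mpi_launch_plan(4, node_count=2))  # ('per-process-launcher', 8)
```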
Tip
The base Docker image used by an Azure Machine Learning MPI job needs to have
an MPI library installed. Open MPI is included in all the Azure Machine Learning
GPU base images . When you use a custom Docker image, you are responsible
for making sure the image includes an MPI library. Open MPI is recommended, but
you can also use a different MPI implementation such as Intel MPI. Azure Machine
Learning also provides curated environments for popular frameworks.
1. Use an Azure Machine Learning environment with the preferred deep learning
framework and MPI. Azure Machine Learning provides curated environments for
popular frameworks.
2. Define MpiConfiguration with process_count_per_node and node_count .
process_count_per_node should be equal to the number of GPUs per node for per-
process-launch, or set to 1 (the default) for per-node-launch if the user script will
be responsible for launching the processes per node.
3. Pass the MpiConfiguration object to the distributed_job_config parameter of
ScriptRunConfig .
Python
from azureml.core import Environment, Experiment, ScriptRunConfig
from azureml.core.runconfig import MpiConfiguration

curated_env_name = 'AzureML-PyTorch-1.6-GPU'
pytorch_env = Environment.get(workspace=ws, name=curated_env_name)

distr_config = MpiConfiguration(process_count_per_node=4, node_count=2)

run_config = ScriptRunConfig(
    source_directory='./src',
    script='train.py',
    compute_target=compute_target,
    environment=pytorch_env,
    distributed_job_config=distr_config,
)

# submit the run configuration to start the job
run = Experiment(ws, "experiment_name").submit(run_config)
Horovod
Use the MPI job configuration when you use Horovod for distributed training with a deep learning framework. When you do, make sure:
The training code is instrumented correctly with Horovod before adding the Azure Machine Learning parts.
Your Azure Machine Learning environment contains Horovod and MPI. The PyTorch and TensorFlow curated GPU environments come pre-configured with Horovod and its dependencies.
You create an MpiConfiguration with your desired distribution.
Horovod example
azureml-examples: TensorFlow distributed training using Horovod
DeepSpeed
Don't use DeepSpeed's custom launcher to run distributed training with the
DeepSpeed library on Azure Machine Learning. Instead, configure an MPI job to
launch the training job with MPI .
DeepSpeed example
azureml-examples: Distributed training with DeepSpeed on CIFAR-10
PyTorch
Azure Machine Learning supports running distributed jobs using PyTorch's native
distributed training capabilities ( torch.distributed ).
To use native distributed PyTorch, initialize the process group at the beginning of your training code:
Python
torch.distributed.init_process_group(backend='nccl', init_method='env://', ...)
The most common communication backends are mpi, nccl, and gloo. For GPU-based
training, nccl is recommended for best performance and should be used whenever
possible.
init_method tells how the processes discover each other and how they initialize and
verify the process group using the communication backend. By default, if init_method is
not specified, PyTorch uses the environment variable initialization method (env://).
This is the recommended initialization method to use in your training code to
run distributed PyTorch on Azure Machine Learning. PyTorch looks for the following
environment variables for initialization:
MASTER_ADDR - IP address of the machine that will host the process with rank 0.
MASTER_PORT - A free port on the machine that will host the process with rank 0.
WORLD_SIZE - The total number of processes. Should be equal to the total number of devices (GPUs) used for distributed training.
For more information on process group initialization, see the PyTorch documentation .
Beyond these, many applications also need the following environment variables:
LOCAL_RANK - The local (relative) rank of the process within the node. The possible
values are 0 to (# of processes on the node - 1). This information is useful because
many operations, such as data preparation, should only be performed once per
node, usually on local_rank = 0.
NODE_RANK - The rank of the node for multi-node training. The possible values are 0 to (total # of nodes - 1).
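Putting these variables together, a process's global rank can be derived from its node rank and local rank. A minimal sketch, where the environment values and process counts are hypothetical stand-ins for what Azure Machine Learning would set on a 2-node, 4-GPU-per-node job:

```python
import os

# Hypothetical values standing in for what the launcher would set.
os.environ["NODE_RANK"] = "1"
os.environ["LOCAL_RANK"] = "2"
procs_per_node = 4

# Global rank = node rank * processes per node + local rank.
node_rank = int(os.environ["NODE_RANK"])
local_rank = int(os.environ["LOCAL_RANK"])
global_rank = node_rank * procs_per_node + local_rank
print(global_rank)  # 6

# Work that should run once per node is typically gated on the local rank:
if local_rank == 0:
    pass  # e.g. download the dataset to node-local storage
```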
Azure Machine Learning supports two launch options for distributed PyTorch jobs:
Per-process-launcher: The system launches all the distributed processes for you, with
all the relevant information (such as environment variables) needed to set up the process
group.
Per-node-launcher: You provide Azure Machine Learning with the utility launcher
that will get run on each node. The utility launcher will handle launching each of
the processes on a given node. Locally within each node, RANK and LOCAL_RANK are
set up by the launcher. The torch.distributed.launch utility and PyTorch Lightning
both belong in this category.
There are no fundamental differences between these launch options. The choice is
largely up to your preference or the conventions of the frameworks/libraries built on top
of vanilla PyTorch (such as Lightning or Hugging Face).
The following sections go into more detail on how to configure Azure Machine Learning
PyTorch jobs for each of the launch options.
DistributedDataParallel (per-process-launch)
You don't need to use a launcher utility like torch.distributed.launch. To run a
distributed PyTorch job:
1. Specify the training script and arguments.
2. Create a PyTorchConfiguration and specify the process_count and node_count.
The process_count corresponds to the total number of processes you want to run for
your job. process_count should typically equal # GPUs per node x # nodes. If
process_count isn't specified, Azure Machine Learning will by default launch one
process per node.
Azure Machine Learning sets the MASTER_ADDR, MASTER_PORT, WORLD_SIZE, and
NODE_RANK environment variables on each node, and sets the process-level RANK and
LOCAL_RANK environment variables.
To use this option for multi-process-per-node training, use Azure Machine Learning
Python SDK >= 1.22.0, since process_count was introduced in 1.22.0.
Python
from azureml.core import Environment, ScriptRunConfig
from azureml.core.runconfig import PyTorchConfiguration

curated_env_name = 'AzureML-PyTorch-1.6-GPU'
pytorch_env = Environment.get(workspace=ws, name=curated_env_name)

distr_config = PyTorchConfiguration(process_count=8, node_count=2)

run_config = ScriptRunConfig(
    source_directory='./src',
    script='train.py',
    arguments=['--epochs', 50],
    compute_target=compute_target,
    environment=pytorch_env,
    distributed_job_config=distr_config,
)
If your training script passes information like local rank or rank as script arguments,
you can reference the environment variable(s) directly in the arguments parameter.
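A sketch of what such an arguments list could look like. The '--local_rank' argument name is a hypothetical choice that your training script would have to parse; Azure Machine Learning expands the $LOCAL_RANK reference at runtime:

```python
# Sketch: reference the LOCAL_RANK environment variable in the script
# arguments of a ScriptRunConfig. '--local_rank' is a hypothetical name.
arguments = ['--epochs', 50, '--local_rank', '$LOCAL_RANK']
print(arguments)
```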
Using torch.distributed.launch (per-node-launch)
The following steps demonstrate how to configure a PyTorch job with a per-node-
launcher on Azure Machine Learning. The job achieves the equivalent of running the
following command:
shell
python -m torch.distributed.launch --nproc_per_node <num processes per node> --nnodes <num nodes> --node_rank $NODE_RANK --master_addr $MASTER_ADDR --master_port $MASTER_PORT --use_env <your training script> <your script arguments>
1. Provide the torch.distributed.launch command to the command parameter of
ScriptRunConfig. This command runs on each node of your training cluster.
--nproc_per_node should be less than or equal to the number of GPUs available on
each node. MASTER_ADDR, MASTER_PORT, and NODE_RANK are all set by Azure
Machine Learning, so you can just reference the environment variables in the
command. Azure Machine Learning sets MASTER_PORT to 6105, but you can pass a
different value to the --master_port argument of the torch.distributed.launch
command if you wish. (The launch utility will reset the environment variables.)
2. Create a PyTorchConfiguration and specify the node_count .
Python
from azureml.core import Environment, ScriptRunConfig
from azureml.core.runconfig import PyTorchConfiguration

curated_env_name = 'AzureML-PyTorch-1.6-GPU'
pytorch_env = Environment.get(workspace=ws, name=curated_env_name)

distr_config = PyTorchConfiguration(node_count=2)
launch_cmd = "python -m torch.distributed.launch --nproc_per_node 4 --nnodes 2 --node_rank $NODE_RANK --master_addr $MASTER_ADDR --master_port $MASTER_PORT --use_env train.py --epochs 50".split()

run_config = ScriptRunConfig(
    source_directory='./src',
    command=launch_cmd,
    compute_target=compute_target,
    environment=pytorch_env,
    distributed_job_config=distr_config,
)
Tip
Single-node multi-GPU training: If you are using the launch utility to run single-
node multi-GPU PyTorch training, you do not need to specify the
distributed_job_config parameter of ScriptRunConfig.
Python
run_config = ScriptRunConfig(
source_directory='./src',
command=launch_cmd,
compute_target=compute_target,
environment=pytorch_env,
)
PyTorch Lightning
PyTorch Lightning is a lightweight open-source library that provides a high-level
interface for PyTorch. Lightning abstracts away many of the lower-level distributed
training configurations required for vanilla PyTorch. Lightning allows you to run your
training scripts in single GPU, single-node multi-GPU, and multi-node multi-GPU
settings. Behind the scenes, it launches multiple processes for you, similar to
torch.distributed.launch.
For single-node training (including single-node multi-GPU), you can run your code on
Azure Machine Learning without needing to specify a distributed_job_config. To run an
experiment using multiple nodes with multiple GPUs, there are two options: an MPI job
configuration or a PyTorch job configuration.
For multi-node training with MPI, Lightning requires the following environment
variables to be set on each node of your training cluster:
MASTER_ADDR
MASTER_PORT
NODE_RANK
LOCAL_RANK
Manually set these environment variables that Lightning requires in the main
training scripts:
Python
import os
from argparse import ArgumentParser
from pytorch_lightning import Trainer

def set_environment_variables_for_mpi(num_nodes, gpus_per_node):
    try:
        os.environ["NODE_RANK"] = str(
            int(os.environ.get("OMPI_COMM_WORLD_RANK")) // gpus_per_node)
        # additional variables
        os.environ["MASTER_ADDRESS"] = os.environ["MASTER_ADDR"]
        os.environ["LOCAL_RANK"] = os.environ["OMPI_COMM_WORLD_LOCAL_RANK"]
        os.environ["WORLD_SIZE"] = os.environ["OMPI_COMM_WORLD_SIZE"]
    except (KeyError, TypeError):
        # fails when used with a PyTorch configuration instead of MPI
        pass

if __name__ == "__main__":
    parser = ArgumentParser()
    parser.add_argument("--num_nodes", type=int, required=True)
    parser.add_argument("--gpus_per_node", type=int, required=True)
    args = parser.parse_args()

    set_environment_variables_for_mpi(args.num_nodes, args.gpus_per_node)

    trainer = Trainer(
        num_nodes=args.num_nodes,
        gpus=args.gpus_per_node
    )
Lightning handles computing the world size from the Trainer flags --gpus and --num_nodes.
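That computation is simply the product of the two flags; a trivial sketch with hypothetical values:

```python
# Sketch: the world size Lightning derives from the Trainer flags.
gpus = 4        # --gpus (per node)
num_nodes = 2   # --num_nodes
world_size = gpus * num_nodes
print(world_size)  # 8
```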
Python
from azureml.core import ScriptRunConfig
from azureml.core.runconfig import MpiConfiguration

nnodes = 2
gpus_per_node = 4
args = ['--max_epochs', 50, '--gpus_per_node', gpus_per_node,
        '--accelerator', 'ddp', '--num_nodes', nnodes]
distr_config = MpiConfiguration(node_count=nnodes,
                                process_count_per_node=gpus_per_node)

run_config = ScriptRunConfig(
    source_directory='./src',
    script='train.py',
    arguments=args,
    compute_target=compute_target,
    environment=pytorch_env,
    distributed_job_config=distr_config,
)
Hugging Face Transformers
To run Hugging Face's example scripts, or your own custom training scripts using the
Transformers Trainer API, follow the Using torch.distributed.launch section.
Sample job configuration code to fine-tune the BERT large model on the text
classification MNLI task using the run_glue.py script on one node with 8 GPUs:
Python
from azureml.core import ScriptRunConfig
from azureml.core.runconfig import PyTorchConfiguration

distr_config = PyTorchConfiguration(node_count=1)
# launch_cmd invokes run_glue.py via torch.distributed.launch with 8 processes;
# fill in the run_glue.py arguments for your fine-tuning task
launch_cmd = "python -m torch.distributed.launch --nproc_per_node 8 run_glue.py".split()

run_config = ScriptRunConfig(
    source_directory='./src',
    command=launch_cmd,
    compute_target=compute_target,
    environment=pytorch_env,
    distributed_job_config=distr_config,
)
You can also use the per-process-launch option to run distributed training without using
torch.distributed.launch . One thing to keep in mind if using this method is that the
transformers TrainingArguments expect the local rank to be passed in as an argument
( --local_rank ). torch.distributed.launch takes care of this when --use_env=False , but
if you are using per-process-launch you'll need to explicitly pass the local rank in as an
argument to the training script --local_rank=$LOCAL_RANK as Azure Machine Learning
only sets the LOCAL_RANK environment variable.
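A sketch of how a per-process-launched entry point could forward LOCAL_RANK to the Transformers argument parser. The environment value below is a hypothetical stand-in for what Azure Machine Learning would set:

```python
import os

# Hypothetical value standing in for what Azure Machine Learning sets
# on each per-process-launched worker.
os.environ["LOCAL_RANK"] = "3"

# Build the extra CLI argument that transformers' TrainingArguments expects
# when no launcher (torch.distributed.launch) is filling it in.
extra_args = [f"--local_rank={os.environ['LOCAL_RANK']}"]
print(extra_args)  # ['--local_rank=3']
```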
TensorFlow
If you're using native distributed TensorFlow in your training code, such as TensorFlow
2.x's tf.distribute.Strategy API, you can launch the distributed job via Azure Machine
Learning using the TensorflowConfiguration .
Python
from azureml.core import Environment, ScriptRunConfig
from azureml.core.runconfig import TensorflowConfiguration

curated_env_name = 'AzureML-TensorFlow-2.3-GPU'
tf_env = Environment.get(workspace=ws, name=curated_env_name)

distr_config = TensorflowConfiguration(worker_count=2, parameter_server_count=0)

run_config = ScriptRunConfig(
    source_directory='./src',
    script='train.py',
    compute_target=compute_target,
    environment=tf_env,
    distributed_job_config=distr_config,
)
If your training script uses the parameter server strategy for distributed training, such as
for legacy TensorFlow 1.x, you'll also need to specify the number of parameter servers to
use in the job, for example, tf_config = TensorflowConfiguration(worker_count=2,
parameter_server_count=1) .
TF_CONFIG
In TensorFlow, the TF_CONFIG environment variable is required for training on multiple
machines. For TensorFlow jobs, Azure Machine Learning will configure and set the
TF_CONFIG variable appropriately for each worker before executing your training script.
You can access TF_CONFIG from your training script if you need to, via
os.environ['TF_CONFIG']. Here's an example TF_CONFIG variable:
TF_CONFIG='{
"cluster": {
"worker": ["host0:2222", "host1:2222"]
},
"task": {"type": "worker", "index": 0},
"environment": "cloud"
}'
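Since TF_CONFIG is just JSON, your training script can parse it with the standard library. A minimal sketch, mirroring the example value above (the host names are hypothetical):

```python
import json
import os

# Sketch: parse the TF_CONFIG that Azure Machine Learning sets for each worker.
# The value below mirrors the example above; the hosts are hypothetical.
os.environ["TF_CONFIG"] = json.dumps({
    "cluster": {"worker": ["host0:2222", "host1:2222"]},
    "task": {"type": "worker", "index": 0},
    "environment": "cloud",
})

tf_config = json.loads(os.environ["TF_CONFIG"])
num_workers = len(tf_config["cluster"]["worker"])   # total workers in the job
task_index = tf_config["task"]["index"]             # this worker's index
print(num_workers, task_index)  # 2 0
```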
TensorFlow example
azureml-examples: Distributed TensorFlow training with
MultiWorkerMirroredStrategy
Accelerating GPU training with InfiniBand
As you scale distributed training across more nodes, InfiniBand can be an important factor in attaining near-linear scaling. InfiniBand enables
low-latency, GPU-to-GPU communication across nodes in a cluster. InfiniBand requires
specialized hardware to operate. Certain Azure VM series, specifically the NC, ND, and
H-series, now have RDMA-capable VMs with SR-IOV and InfiniBand support. These VMs
communicate over the low latency and high-bandwidth InfiniBand network, which is
much more performant than Ethernet-based connectivity. SR-IOV for InfiniBand enables
near bare-metal performance for any MPI library (MPI is used by many distributed
training frameworks and tooling, including NVIDIA's NCCL software.) These SKUs are
intended to meet the needs of computationally intensive, GPU-accelerated machine
learning workloads. For more information, see Accelerating Distributed Training in Azure
Machine Learning with SR-IOV .
Typically, VM SKUs with an 'r' in their name contain the required InfiniBand hardware,
and those without an 'r' typically do not. ('r' is a reference to RDMA, which stands for
"remote direct memory access.") For instance, the VM SKU Standard_NC24rs_v3 is
InfiniBand-enabled, but the SKU Standard_NC24s_v3 is not. Aside from the InfiniBand
capabilities, the specs between these two SKUs are largely the same – both have 24
cores, 448 GB RAM, 4 GPUs of the same SKU, etc. Learn more about RDMA- and
InfiniBand-enabled machine SKUs.
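The 'r' naming convention can be sketched as a rough heuristic. It is a convention only, not a guarantee; always confirm a SKU's capabilities against the Azure VM size documentation:

```python
import re

# Heuristic sketch: guess whether a VM SKU is RDMA/InfiniBand-capable from the
# 'r' in its size suffix. A naming convention only, not an authoritative check.
def looks_rdma_capable(sku: str) -> bool:
    # Match names like Standard_NC24rs_v3: letters, digits, then a lowercase
    # suffix; 'r' in that suffix suggests RDMA support.
    m = re.match(r"Standard_[A-Z]+\d+(?:-\d+)?([a-z]*)", sku)
    return bool(m) and "r" in m.group(1)

print(looks_rdma_capable("Standard_NC24rs_v3"))  # True
print(looks_rdma_capable("Standard_NC24s_v3"))   # False
```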
Next steps
Deploy machine learning models to Azure
Reference architecture for distributed deep learning training in Azure
Train scikit-learn models at scale with
Azure Machine Learning (SDK v1)
Article • 06/01/2023
In this article, learn how to run your scikit-learn training scripts with Azure Machine
Learning.
The example scripts in this article are used to classify iris flowers to build a
machine learning model based on scikit-learn's iris dataset.
Whether you're training a scikit-learn machine learning model from the ground up or
you're bringing an existing model into the cloud, you can use Azure Machine Learning
to scale out open-source training jobs using elastic cloud compute resources. You can
build, deploy, version, and monitor production-grade models with Azure Machine
Learning.
Prerequisites
You can run this code in either an Azure Machine Learning compute instance, or your
own Jupyter Notebook:
Create a Jupyter Notebook server and run the code in the following sections.
Install the Azure Machine Learning SDK (>= 1.13.0).
Create a workspace configuration file.
Initialize a workspace
The Azure Machine Learning workspace is the top-level resource for the service. It
provides you with a centralized place to work with all the artifacts you create. In the
Python SDK, you can access the workspace artifacts by creating a workspace object.
Create a workspace object from the config.json file created in the prerequisites section.
Python
from azureml.core import Workspace

ws = Workspace.from_config()
Prepare scripts
In this tutorial, the training script train_iris.py is already provided for you. In practice,
you should be able to take any custom training script as is and run it with Azure
Machine Learning without having to modify your code.
Note
The provided training script shows how to log some metrics to your Azure
Machine Learning run using the Run object within the script.
The provided training script uses example data from the iris =
datasets.load_iris() function. To use and access your own data, see how to train
with datasets to make data available during training.
If you want to use a curated environment instead, you can fetch one by name with
Environment.get. Otherwise, define a conda dependencies file, conda_dependencies.yml, for your training script:
YAML
dependencies:
- python=3.7
- scikit-learn
- numpy
- pip:
- azureml-defaults
Python
from azureml.core import Environment

sklearn_env = Environment.from_conda_specification(
    name='sklearn-env', file_path='conda_dependencies.yml')
For more information on creating and using environments, see Create and use software
environments in Azure Machine Learning.
Create a ScriptRunConfig
Create a ScriptRunConfig object to specify the configuration details of your training job,
including your training script, environment to use, and the compute target to run on.
Any arguments to your training script will be passed via command line if specified in the
arguments parameter.
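On the receiving end, train_iris.py would parse those arguments. A minimal sketch of that parsing, where the argument names match the ScriptRunConfig below and the defaults are assumptions:

```python
from argparse import ArgumentParser

# Sketch: parse the --kernel and --penalty arguments passed to train_iris.py.
parser = ArgumentParser()
parser.add_argument('--kernel', type=str, default='linear',
                    help='kernel type for the SVM')
parser.add_argument('--penalty', type=float, default=1.0,
                    help='penalty (C) parameter of the SVM')

# Parse an explicit list here for illustration; in the script itself,
# parse_args() would read sys.argv.
args = parser.parse_args(['--kernel', 'linear', '--penalty', '1.0'])
print(args.kernel, args.penalty)  # linear 1.0
```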
The following code will configure a ScriptRunConfig object for submitting your job for
execution on your local machine.
Python
src = ScriptRunConfig(source_directory='.',
script='train_iris.py',
arguments=['--kernel', 'linear', '--penalty', 1.0],
environment=sklearn_env)
If you want to instead run your job on a remote cluster, you can specify the desired
compute target to the compute_target parameter of ScriptRunConfig.
Python
compute_target = ws.compute_targets['<my-cluster-name>']
src = ScriptRunConfig(source_directory='.',
script='train_iris.py',
arguments=['--kernel', 'linear', '--penalty', 1.0],
compute_target=compute_target,
environment=sklearn_env)
Submit the run:
Python
run = Experiment(ws, 'Tutorial-TrainIRIS').submit(src)
run.wait_for_completion(show_output=True)
Warning
Azure Machine Learning runs training scripts by copying the entire source directory.
If you have sensitive data that you don't want to upload, use a .ignore file or don't
include it in the source directory . Instead, access your data using an Azure Machine
Learning dataset.
As the run is executed, it goes through the following stages:
Scaling: The cluster attempts to scale up if the Batch AI cluster requires more
nodes to execute the run than are currently available.
Running: All scripts in the script folder are uploaded to the compute target, data
stores are mounted or copied, and the script is executed. Outputs from stdout
and the ./logs folder are streamed to the run history and can be used to monitor
the run.
Post-Processing: The ./outputs folder of the run is copied over to the run history.
Save and register the model
Add the following code to your training script, train_iris.py, to save the model.
Python
import joblib
joblib.dump(svm_model_linear, 'model.joblib')
Register the model to your workspace with the following code. By specifying the
parameters model_framework , model_framework_version , and resource_configuration ,
no-code model deployment becomes available. No-code model deployment allows you
to directly deploy your model as a web service from the registered model, and the
ResourceConfiguration object defines the compute resource for the web service.
Python
from azureml.core import Model
from azureml.core.resource_configuration import ResourceConfiguration

model = run.register_model(model_name='sklearn-iris',
                           model_path='outputs/model.joblib',
                           model_framework=Model.Framework.SCIKITLEARN,
                           model_framework_version='0.19.1',
                           resource_configuration=ResourceConfiguration(cpu=1, memory_in_gb=0.5))
Deployment
The model you just registered can be deployed the exact same way as any other
registered model in Azure Machine Learning. The deployment how-to contains a section
on registering models, but you can skip directly to creating a compute target for
deployment, since you already have a registered model.
Important
This feature is currently in public preview. This preview version is provided without
a service-level agreement, and it's not recommended for production workloads.
Certain features might not be supported or might have constrained capabilities. For
more information, see Supplemental Terms of Use for Microsoft Azure
Previews .
Instead of the traditional deployment route, you can also use the no-code deployment
feature (preview) for scikit-learn. No-code model deployment is supported for all built-
in scikit-learn model types. By registering your model as shown above with the
model_framework , model_framework_version , and resource_configuration parameters,
you can simply use the deploy() static function to deploy your model.
Note
These dependencies are included in the pre-built scikit-learn inference container.
YAML
- azureml-defaults
- inference-schema[numpy-support]
- scikit-learn
- numpy
The full how-to covers deployment in Azure Machine Learning in greater depth.
Next steps
In this article, you trained and registered a scikit-learn model, and learned about
deployment options. See these other articles to learn more about Azure Machine
Learning.
Train TensorFlow models at scale with Azure Machine Learning (SDK v1)
In this article, learn how to run your TensorFlow training scripts at scale using Azure
Machine Learning.
This example trains and registers a TensorFlow model to classify handwritten digits using
a deep neural network (DNN).
Whether you're developing a TensorFlow model from the ground up or you're bringing
an existing model into the cloud, you can use Azure Machine Learning to scale out
open-source training jobs to build, deploy, version, and monitor production-grade
models.
Prerequisites
Run this code on either of these environments:
You can also find a completed Jupyter Notebook version of this guide on the
GitHub samples page. The notebook includes expanded sections covering
intelligent hyperparameter tuning, model deployment, and notebook widgets.
Before you can run the code in this article to create a GPU cluster, you'll need to request
a quota increase for your workspace.
Set up the experiment
This section sets up the training experiment by loading the required Python packages,
initializing a workspace, creating the compute target, and defining the training
environment.
Import packages
First, import the necessary Python libraries.
Python
import os
import urllib
import shutil
import azureml

from azureml.core import Experiment, Workspace, Run, Dataset
from azureml.core import Environment, ScriptRunConfig
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException
Initialize a workspace
The Azure Machine Learning workspace is the top-level resource for the service. It
provides you with a centralized place to work with all the artifacts you create. In the
Python SDK, you can access the workspace artifacts by creating a workspace object.
Create a workspace object from the config.json file created in the prerequisites section.
Python
ws = Workspace.from_config()
Python
web_paths = [
'http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz',
'http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz',
'http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz',
'http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz'
]
dataset = Dataset.File.from_files(path = web_paths)
Use the register() method to register the data set to your workspace so it can be
shared with others, reused across various experiments, and referred to by name in your
training script.
Python
dataset = dataset.register(workspace=ws,
name='mnist-dataset',
description='training and test dataset',
create_new_version=True)
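Inside the training script, the mounted dataset exposes these same gzipped idx files. As a stdlib-only illustration (this helper is not part of the article's tf_mnist.py), parsing an MNIST image file might look like:

```python
import gzip
import struct

def load_idx_images(path):
    """Parse an MNIST idx3-ubyte image file (gzipped or plain) into lists of pixels."""
    opener = gzip.open if path.endswith('.gz') else open
    with opener(path, 'rb') as f:
        # Big-endian header: magic number, image count, rows, columns
        magic, count, rows, cols = struct.unpack('>IIII', f.read(16))
        if magic != 2051:
            raise ValueError('not an idx3-ubyte image file')
        # Each image is rows * cols unsigned bytes, stored consecutively
        return [list(f.read(rows * cols)) for _ in range(count)]
```

The magic number 2051 and the header layout come from the MNIST idx file format.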
Important
Before you can create a GPU cluster, you'll need to request a quota increase for
your workspace.
Python
cluster_name = "gpu-cluster"

try:
    compute_target = ComputeTarget(workspace=ws, name=cluster_name)
    print('Found existing compute target')
except ComputeTargetException:
    print('Creating a new compute target...')
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6',
                                                           max_nodes=4)

    # Create the cluster, then wait for provisioning to finish
    compute_target = ComputeTarget.create(ws, cluster_name, compute_config)
    compute_target.wait_for_completion(show_output=True, min_node_count=None,
                                       timeout_in_minutes=20)
Note
You may choose to use low-priority VMs to run some or all of your workloads. See
how to create a low-priority VM.
For more information on compute targets, see the what is a compute target article.
If you want to use a curated environment, the code will be similar to the following
example:
Python
curated_env_name = 'AzureML-tensorflow-2.7-ubuntu20.04-py38-cuda11-gpu'
tf_env = Environment.get(workspace=ws, name=curated_env_name)
To see the packages included in the curated environment, you can write out the conda
dependencies to disk:
Python
tf_env.save_to_directory(path=curated_env_name)
Make sure the curated environment includes all the dependencies required by your
training script. If not, you'll have to modify the environment to include the missing
dependencies. If the environment is modified, you'll have to give it a new name, as the
'AzureML' prefix is reserved for curated environments. If you modified the conda
dependencies YAML file, you can create a new environment from it with a new name, for
example:
Python
tf_env = Environment.from_conda_specification(
    name='my-tensorflow-2.7-ubuntu20.04-py38-cuda11-gpu',
    file_path='./conda_dependencies.yml')
If you had instead modified the curated environment object directly, you can clone that
environment with a new name:
Python
tf_env = tf_env.clone(new_name='my-AzureML-tensorflow-2.7-ubuntu20.04-py38-cuda11-gpu')
You can also create your own Azure Machine Learning environment that encapsulates
your training script's dependencies.
First, define your conda dependencies in a YAML file; in this example the file is named
conda_dependencies.yml .
YAML
channels:
- conda-forge
dependencies:
- python=3.7
- pip:
- azureml-defaults
- tensorflow-gpu==2.2.0
Create an Azure Machine Learning environment from this conda environment
specification. The environment will be packaged into a Docker container at runtime.
By default, if no base image is specified, Azure Machine Learning will use a CPU image
azureml.core.environment.DEFAULT_CPU_IMAGE as the base image. Since this example
runs training on a GPU cluster, you'll need to specify a GPU base image that has the
necessary GPU drivers and dependencies. Azure Machine Learning maintains a set of
base images published on Microsoft Container Registry (MCR) that you can use, see the
Azure/AzureML-Containers GitHub repo for more information.
Python
tf_env = Environment.from_conda_specification(name='tf-env',
                                              file_path='./conda_dependencies.yml')
Tip
Optionally, you can just capture all your dependencies directly in a custom Docker
image or Dockerfile, and create your environment from that. For more information,
see Train with custom image.
For more information on creating and using environments, see Create and use software
environments in Azure Machine Learning.
Create a ScriptRunConfig
Create a ScriptRunConfig object to specify the configuration details of your training job,
including your training script, environment to use, and the compute target to run on.
Any arguments to your training script will be passed via command line if specified in the
arguments parameter.
Python
src = ScriptRunConfig(source_directory=script_folder,
script='tf_mnist.py',
arguments=args,
compute_target=compute_target,
environment=tf_env)
Warning
Azure Machine Learning runs training scripts by copying the entire source directory.
If you have sensitive data that you don't want to upload, use a .ignore file or don't
include it in the source directory. Instead, access your data using an Azure Machine
Learning dataset.
For more information on configuring jobs with ScriptRunConfig, see Configure and
submit training runs.
Warning
If you were previously using the TensorFlow estimator to configure your TensorFlow
training jobs, please note that Estimators have been deprecated as of the 1.19.0
SDK release. With Azure Machine Learning SDK >= 1.15.0, ScriptRunConfig is the
recommended way to configure training jobs, including those using deep learning
frameworks. For common migration questions, see the Estimator to
ScriptRunConfig migration guide.
Submit a run
The Run object provides the interface to the run history while the job is running and
after it has completed.
Python
Scaling: The cluster attempts to scale up if the Batch AI cluster requires more
nodes to execute the run than are currently available.
Running: All scripts in the script folder are uploaded to the compute target, data
stores are mounted or copied, and the script is executed. Outputs from stdout
and the ./logs folder are streamed to the run history and can be used to monitor
the run.
Post-Processing: The ./outputs folder of the run is copied over to the run history.
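The submission that drives these stages follows the standard pattern (the experiment name is illustrative; `ws` and `src` come from the earlier steps):

```python
from azureml.core import Experiment

# Submit the ScriptRunConfig; wait_for_completion streams stdout and ./logs
run = Experiment(workspace=ws, name='tf-mnist-tutorial').submit(src)
run.wait_for_completion(show_output=True)
```

This requires an Azure Machine Learning workspace and the compute target created above.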
Python
model = run.register_model(model_name='tf-mnist',
model_path='outputs/model',
model_framework=Model.Framework.TENSORFLOW,
model_framework_version='2.0',
resource_configuration=ResourceConfiguration(cpu=1, memory_in_gb=0.5))
You can also download a local copy of the model by using the Run object. In the training
script tf_mnist.py , a TensorFlow saver object persists the model to a local folder (local
to the compute target). You can use the Run object to download a copy.
Python
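A sketch of the download using Run.download_files (the local folder name is arbitrary):

```python
import os

# Make a local folder and pull the persisted model files from the run history
os.makedirs('./model', exist_ok=True)
run.download_files(prefix='outputs/model', output_directory='./model')
```

This assumes the `run` object from the submitted experiment is still in scope.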
Distributed training
Azure Machine Learning also supports multi-node distributed TensorFlow jobs so that
you can scale your training workloads. You can easily run distributed TensorFlow jobs
and Azure Machine Learning will manage the orchestration for you.
Azure Machine Learning supports running distributed TensorFlow jobs with both
Horovod and TensorFlow's built-in distributed training API.
For more information about distributed training, see the Distributed GPU training guide.
Important
This feature is currently in public preview. This preview version is provided without
a service-level agreement, and it's not recommended for production workloads.
Certain features might not be supported or might have constrained capabilities. For
more information, see Supplemental Terms of Use for Microsoft Azure
Previews .
Instead of the traditional deployment route, you can also use the no-code deployment
feature (preview) for TensorFlow. By registering your model as shown above with the
model_framework , model_framework_version , and resource_configuration parameters,
you can use the deploy() static function to deploy your model.
Python
service = Model.deploy(ws, "tensorflow-web-service", [model])
The full how-to covers deployment in Azure Machine Learning in greater depth.
Next steps
In this article, you trained and registered a TensorFlow model, and learned about
options for deployment. See these other articles to learn more about Azure Machine
Learning.
In this article, learn how to run your Keras training scripts with Azure Machine Learning.
The example code in this article shows you how to train and register a Keras
classification model built using the TensorFlow backend with Azure Machine Learning. It
uses the popular MNIST dataset to classify handwritten digits using a deep neural
network (DNN) built using the Keras Python library running on top of TensorFlow .
Keras is a high-level neural network API capable of running on top of other popular DNN
frameworks to simplify development. With Azure Machine Learning, you can rapidly
scale out training jobs using elastic cloud compute resources. You can also track your
training runs, version models, deploy models, and much more.
Whether you're developing a Keras model from the ground up or you're bringing an
existing model into the cloud, Azure Machine Learning can help you build production-
ready models.
Note
If you are using the Keras API tf.keras built into TensorFlow and not the standalone
Keras package, refer instead to Train TensorFlow models.
Prerequisites
Run this code on either of these environments:
You can also find a completed Jupyter Notebook version of this guide on the
GitHub samples page. The notebook includes expanded sections covering
intelligent hyperparameter tuning, model deployment, and notebook widgets.
Before you can run the code in this article to create a GPU cluster, you'll need to request
a quota increase for your workspace.
Import packages
First, import the necessary Python libraries.
Python
import os
import azureml
from azureml.core import Experiment
from azureml.core import Environment
from azureml.core import Workspace, Run
from azureml.core import Dataset, ScriptRunConfig
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException
Initialize a workspace
The Azure Machine Learning workspace is the top-level resource for the service. It
provides you with a centralized place to work with all the artifacts you create. In the
Python SDK, you can access the workspace artifacts by creating a workspace object.
Create a workspace object from the config.json file created in the prerequisites section.
Python
ws = Workspace.from_config()
Create a file dataset
A FileDataset object references one or multiple files in your workspace datastore or
public URLs. The files can be of any format, and the class provides you with the ability to
download or mount the files to your compute. By creating a FileDataset , you create a
reference to the data source location. If you applied any transformations to the data set,
they will be stored in the data set as well. The data remains in its existing location, so no
extra storage cost is incurred. See the how-to guide on the Dataset package for more
information.
Python
web_paths = [
'http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz',
'http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz',
'http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz',
'http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz'
]
dataset = Dataset.File.from_files(path=web_paths)
You can use the register() method to register the data set to your workspace so it can
be shared with others, reused across various experiments, and referred to by name in
your training script.
Python
dataset = dataset.register(workspace=ws,
name='mnist-dataset',
description='training and test dataset',
create_new_version=True)
Important
Before you can create a GPU cluster, you'll need to request a quota increase for
your workspace.
Python
cluster_name = "gpu-cluster"

try:
    compute_target = ComputeTarget(workspace=ws, name=cluster_name)
    print('Found existing compute target')
except ComputeTargetException:
    print('Creating a new compute target...')
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6',
                                                           max_nodes=4)

    # Create the cluster, then wait for provisioning to finish
    compute_target = ComputeTarget.create(ws, cluster_name, compute_config)
    compute_target.wait_for_completion(show_output=True, min_node_count=None,
                                       timeout_in_minutes=20)
Note
You may choose to use low-priority VMs to run some or all of your workloads. See
how to create a low-priority VM.
For more information on compute targets, see the what is a compute target article.
First, define your conda dependencies in a YAML file; in this example the file is named
conda_dependencies.yml .
YAML
channels:
- conda-forge
dependencies:
- python=3.7
- pip:
- azureml-defaults
- tensorflow-gpu==2.0.0
- keras<=2.3.1
- matplotlib
Create an Azure Machine Learning environment from this conda environment
specification. The environment will be packaged into a Docker container at runtime.
By default, if no base image is specified, Azure Machine Learning will use a CPU image
azureml.core.environment.DEFAULT_CPU_IMAGE as the base image. Since this example
runs training on a GPU cluster, you will need to specify a GPU base image that has the
necessary GPU drivers and dependencies. Azure Machine Learning maintains a set of
base images published on Microsoft Container Registry (MCR) that you can use, see the
Azure/AzureML-Containers GitHub repo for more information.
Python
keras_env = Environment.from_conda_specification(name='keras-env',
                                                 file_path='conda_dependencies.yml')
For more information on creating and using environments, see Create and use software
environments in Azure Machine Learning.
Create a ScriptRunConfig
First get the data from the workspace datastore using the Dataset class.
Python
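A sketch of retrieving the dataset registered earlier and building the script arguments (the --batch-size flag is an assumption; --data-folder matches the argument described in this section):

```python
from azureml.core import Dataset

# Retrieve the FileDataset registered earlier under the name 'mnist-dataset'
dataset = Dataset.get_by_name(ws, 'mnist-dataset')

# as_mount() yields a DatasetConsumptionConfig that resolves to a mount path at run time
args = ['--data-folder', dataset.as_mount(), '--batch-size', 64]
```

This requires the workspace `ws` and the dataset registration from the earlier step.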
Create a ScriptRunConfig object to specify the configuration details of your training job,
including your training script, environment to use, and the compute target to run on.
Any arguments to your training script will be passed via command line if specified in the
arguments parameter. The DatasetConsumptionConfig for our FileDataset is passed as
an argument to the training script, for the --data-folder argument. Azure Machine
Learning will resolve this DatasetConsumptionConfig to the mount-point of the backing
datastore, which can then be accessed from the training script.
Python
src = ScriptRunConfig(source_directory=script_folder,
script='keras_mnist.py',
arguments=args,
compute_target=compute_target,
environment=keras_env)
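On the training-script side, the resolved mount path arrives as an ordinary command-line argument. A stdlib sketch of the parsing inside keras_mnist.py (flags other than --data-folder are assumptions):

```python
import argparse

def parse_args(argv=None):
    """Parse the arguments Azure Machine Learning passes on the command line."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--data-folder', dest='data_folder', type=str,
                        help='mount point of the input FileDataset')
    parser.add_argument('--batch-size', dest='batch_size', type=int, default=128)
    return parser.parse_args(argv)

# On the compute target, argv comes from sys.argv; shown here with explicit values
args = parse_args(['--data-folder', '/mnt/mnist'])
data_folder = args.data_folder  # read the dataset files from this path
```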
For more information on configuring jobs with ScriptRunConfig, see Configure and
submit training runs.
Warning
If you were previously using the TensorFlow estimator to configure your Keras
training jobs, please note that Estimators have been deprecated as of the 1.19.0
SDK release. With Azure Machine Learning SDK >= 1.15.0, ScriptRunConfig is the
recommended way to configure training jobs, including those using deep learning
frameworks. For common migration questions, see the Estimator to
ScriptRunConfig migration guide.
Python
Scaling: The cluster attempts to scale up if the Batch AI cluster requires more
nodes to execute the run than are currently available.
Running: All scripts in the script folder are uploaded to the compute target, data
stores are mounted or copied, and the script is executed. Outputs from stdout
and the ./logs folder are streamed to the run history and can be used to monitor
the run.
Post-Processing: The ./outputs folder of the run is copied over to the run history.
Python
model = run.register_model(model_name='keras-mnist',
model_path='outputs/model')
Tip
The deployment how-to contains a section on registering models, but you can skip
directly to creating a compute target for deployment, since you already have a
registered model.
You can also download a local copy of the model. This can be useful for doing additional
model validation work locally. In the training script, keras_mnist.py , a TensorFlow saver
object persists the model to a local folder (local to the compute target). You can use the
Run object to download a copy from the run history.
Python
Next steps
In this article, you trained and registered a Keras model on Azure Machine Learning. To
learn how to deploy a model, continue on to our model deployment article.
In this article, learn how to run your PyTorch training scripts at enterprise scale using
Azure Machine Learning.
The example scripts in this article are used to classify chicken and turkey images to build
a deep learning neural network (DNN) based on PyTorch's transfer learning tutorial .
Transfer learning is a technique that applies knowledge gained from solving one
problem to a different but related problem. Transfer learning shortens the training
process by requiring less data, time, and compute resources than training from scratch.
To learn more about transfer learning, see the deep learning vs machine learning article.
Whether you're training a deep learning PyTorch model from the ground up or you're
bringing an existing model into the cloud, you can use Azure Machine Learning to scale
out open-source training jobs using elastic cloud compute resources. You can build,
deploy, version, and monitor production-grade models with Azure Machine Learning.
Prerequisites
Run this code on either of these environments:
You can also find a completed Jupyter Notebook version of this guide on the
GitHub samples page. The notebook includes expanded sections covering
intelligent hyperparameter tuning, model deployment, and notebook widgets.
Before you can run the code in this article to create a GPU cluster, you'll need to request
a quota increase for your workspace.
Import packages
First, import the necessary Python libraries.
Python
import os
import shutil

from azureml.core import Workspace, Experiment, Environment, ScriptRunConfig
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException
Initialize a workspace
The Azure Machine Learning workspace is the top-level resource for the service. It
provides you with a centralized place to work with all the artifacts you create. In the
Python SDK, you can access the workspace artifacts by creating a workspace object.
Create a workspace object from the config.json file created in the prerequisites section.
Python
ws = Workspace.from_config()
Python
project_folder = './pytorch-birds'
os.makedirs(project_folder, exist_ok=True)
shutil.copy('pytorch_train.py', project_folder)
Important
Before you can create a GPU cluster, you'll need to request a quota increase for
your workspace.
Python
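Creating the GPU cluster follows the same try/except pattern used in the TensorFlow and Keras sections; a sketch with the same cluster name and sizes:

```python
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

cluster_name = "gpu-cluster"

try:
    compute_target = ComputeTarget(workspace=ws, name=cluster_name)
    print('Found existing compute target')
except ComputeTargetException:
    print('Creating a new compute target...')
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6',
                                                           max_nodes=4)
    # Create the cluster, then wait for provisioning to finish
    compute_target = ComputeTarget.create(ws, cluster_name, compute_config)
    compute_target.wait_for_completion(show_output=True, min_node_count=None,
                                       timeout_in_minutes=20)
```

This requires the workspace `ws` initialized above.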
If you instead want to create a CPU cluster, provide a different VM size to the vm_size
parameter, such as STANDARD_D2_V2.
Note
You may choose to use low-priority VMs to run some or all of your workloads. See
how to create a low-priority VM.
For more information on compute targets, see the what is a compute target article.
Azure Machine Learning provides prebuilt, curated environments if you don't want to
define your own environment. There are several CPU and GPU curated environments for
PyTorch corresponding to different versions of PyTorch.
If you want to use a curated environment, you can run the following command instead:
Python
curated_env_name = 'AzureML-PyTorch-1.6-GPU'
pytorch_env = Environment.get(workspace=ws, name=curated_env_name)
To see the packages included in the curated environment, you can write out the conda
dependencies to disk:
Python
pytorch_env.save_to_directory(path=curated_env_name)
Make sure the curated environment includes all the dependencies required by your
training script. If not, you'll have to modify the environment to include the missing
dependencies. If the environment is modified, you'll have to give it a new name, as the
'AzureML' prefix is reserved for curated environments. If you modified the conda
dependencies YAML file, you can create a new environment from it with a new name, for
example:
Python
pytorch_env = Environment.from_conda_specification(name='pytorch-1.6-gpu',
                                                   file_path='./conda_dependencies.yml')
If you had instead modified the curated environment object directly, you can clone that
environment with a new name:
Python
pytorch_env = pytorch_env.clone(new_name='pytorch-1.6-gpu')
First, define your conda dependencies in a YAML file; in this example the file is named
conda_dependencies.yml .
YAML
channels:
- conda-forge
dependencies:
- python=3.7
- pip=21.3.1
- pip:
- azureml-defaults
- torch==1.6.0
- torchvision==0.7.0
- future==0.17.1
- pillow
Python
pytorch_env = Environment.from_conda_specification(name='pytorch-1.6-gpu',
                                                   file_path='./conda_dependencies.yml')
Tip
Optionally, you can just capture all your dependencies directly in a custom Docker
image or Dockerfile, and create your environment from that. For more information,
see Train with custom image.
For more information on creating and using environments, see Create and use software
environments in Azure Machine Learning.
Create a ScriptRunConfig
Create a ScriptRunConfig object to specify the configuration details of your training job,
including your training script, environment to use, and the compute target to run on.
Any arguments to your training script will be passed via command line if specified in the
arguments parameter. The following code will configure a single-node PyTorch job.
Python
src = ScriptRunConfig(source_directory=project_folder,
script='pytorch_train.py',
arguments=['--num_epochs', 30, '--output_dir',
'./outputs'],
compute_target=compute_target,
environment=pytorch_env)
Warning
Azure Machine Learning runs training scripts by copying the entire source directory.
If you have sensitive data that you don't want to upload, use a .ignore file or don't
include it in the source directory. Instead, access your data using an Azure Machine
Learning dataset.
For more information on configuring jobs with ScriptRunConfig, see Configure and
submit training runs.
Warning
If you were previously using the PyTorch estimator to configure your PyTorch
training jobs, please note that Estimators have been deprecated as of the 1.19.0
SDK release. With Azure Machine Learning SDK >= 1.15.0, ScriptRunConfig is the
recommended way to configure training jobs, including those using deep learning
frameworks. For common migration questions, see the Estimator to
ScriptRunConfig migration guide.
Python
Scaling: The cluster attempts to scale up if the Batch AI cluster requires more
nodes to execute the run than are currently available.
Running: All scripts in the script folder are uploaded to the compute target, data
stores are mounted or copied, and the script is executed. Outputs from stdout
and the ./logs folder are streamed to the run history and can be used to monitor
the run.
Post-Processing: The ./outputs folder of the run is copied over to the run history.
Python
model = run.register_model(model_name='pytorch-birds',
model_path='outputs/model.pt')
Tip
The deployment how-to contains a section on registering models, but you can skip
directly to creating a compute target for deployment, since you already have a
registered model.
You can also download a local copy of the model by using the Run object. In the training
script pytorch_train.py , a PyTorch save object persists the model to a local folder (local
to the compute target). You can use the Run object to download a copy.
Python
Azure Machine Learning supports running distributed PyTorch jobs with both Horovod
and PyTorch's built-in DistributedDataParallel module.
For more information about distributed training, see the Distributed GPU training guide.
Export to ONNX
To optimize inference with the ONNX Runtime, convert your trained PyTorch model to
the ONNX format. Inference, or model scoring, is the phase where the deployed model
is used for prediction, most commonly on production data. For an example, see the
Exporting model from PyTorch to ONNX tutorial .
Next steps
In this article, you trained and registered a deep learning neural network using PyTorch
on Azure Machine Learning. To learn how to deploy a model, continue on to our model
deployment article.
In this article, learn how to use a custom Docker image when you're training models
with Azure Machine Learning. You'll use the example scripts in this article to classify pet
images by creating a convolutional neural network.
Azure Machine Learning provides a default Docker base image. You can also use Azure
Machine Learning environments to specify a different base image, such as one of the
maintained Azure Machine Learning base images or your own custom image. Custom
base images allow you to closely manage your dependencies and maintain tighter
control over component versions when running training jobs.
Prerequisites
Run the code on either of these environments:
Initialize a workspace
The Azure Machine Learning workspace is the top-level resource for the service. It gives
you a centralized place to work with all the artifacts that you create. In the Python SDK,
you can access the workspace artifacts by creating a Workspace object.
Create a Workspace object from the config.json file that you created as a prerequisite.
Python
ws = Workspace.from_config()
Python
fastai_env = Environment("fastai2")
The specified base image in the following code supports the fast.ai library, which allows
for distributed deep-learning capabilities. For more information, see the fast.ai Docker
Hub repository .
When you're using your custom Docker image, you might already have your Python
environment properly set up. In that case, set the user_managed_dependencies flag to
True to use your custom image's built-in Python environment. By default, Azure
Machine Learning builds a Conda environment with dependencies that you specified.
The service runs the script in that environment instead of using any Python libraries that
you installed on the base image.
Python
fastai_env.docker.base_image = "fastdotai/fastai2:latest"
fastai_env.python.user_managed_dependencies = True
To use an image from a private container registry that isn't in your workspace, specify
the registry's address and a username and password:
Python
# Set the container registry information.
fastai_env.docker.base_image_registry.address = "myregistry.azurecr.io"
fastai_env.docker.base_image_registry.username = "username"
fastai_env.docker.base_image_registry.password = "password"
It's also possible to use a custom Dockerfile. Use this approach if you need to install
non-Python packages as dependencies. Remember to set the base image to None .
Python
# Set the base image to None, because the image is defined by Dockerfile.
fastai_env.docker.base_image = None
fastai_env.docker.base_dockerfile = dockerfile
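The dockerfile variable referenced above holds the Docker build steps as a string; a minimal sketch (the base image is one of the MCR-hosted Azure Machine Learning images, and the RUN step is illustrative):

```python
# Specify Docker steps as a string; base_dockerfile accepts the file contents directly.
dockerfile = r"""
FROM mcr.microsoft.com/azureml/openmpi3.1.2-cuda10.1-cudnn7-ubuntu18.04
RUN echo "Hello from custom container!"
"""
```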
Important
Azure Machine Learning only supports Docker images that provide the following
software:
For more information about creating and managing Azure Machine Learning
environments, see Create and use software environments.
As with other Azure services, there are limits on certain resources (for example,
AmlCompute ) associated with the Azure Machine Learning service. For more information,
see the article on default limits and how to request quota increases.
Python
cluster_name = "gpu-cluster"

try:
    compute_target = ComputeTarget(workspace=ws, name=cluster_name)
    print('Found existing compute target.')
except ComputeTargetException:
    print('Creating a new compute target...')
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6',
                                                           max_nodes=4)

    # Create the cluster, then wait for provisioning to finish
    compute_target = ComputeTarget.create(ws, cluster_name, compute_config)
    compute_target.wait_for_completion(show_output=True)
Important
Create a ScriptRunConfig resource to configure your job for running on the desired
compute target.
Python
src = ScriptRunConfig(source_directory='fastai-example',
script='train.py',
compute_target=compute_target,
environment=fastai_env)
Python
run = Experiment(ws, 'Tutorial-fastai').submit(src)
run.wait_for_completion(show_output=True)
Warning
Azure Machine Learning runs training scripts by copying the entire source directory.
If you have sensitive data that you don't want to upload, use an .ignore file or don't
include it in the source directory. Instead, access your data by using a datastore.
Next steps
In this article, you trained a model by using a custom Docker image. See these other
articles to learn more about Azure Machine Learning:
Up until now, there have been multiple methods for configuring a training job in Azure
Machine Learning via the SDK, including Estimators, ScriptRunConfig, and the lower-
level RunConfiguration. To address this ambiguity and inconsistency, we are simplifying
the job configuration process in Azure Machine Learning. You should now use
ScriptRunConfig as the recommended option for configuring training jobs.
Estimators are deprecated with the 1.19.0 release of the Python SDK. You should also
generally avoid explicitly instantiating a RunConfiguration object yourself, and instead
configure your job using the ScriptRunConfig class.
Important
To migrate to ScriptRunConfig from Estimators, make sure you are using >= 1.15.0
of the Python SDK.
Azure/MachineLearningNotebooks
Azure/azureml-examples
Defining the training environment
While the various framework estimators have preconfigured environments that are
backed by Docker images, the Dockerfiles for these images are private. Therefore you
do not have a lot of transparency into what these environments contain. In addition, the
estimators take in environment-related configurations as individual parameters (such as
pip_packages , custom_docker_image ) on their respective constructors.
Python
compute_target = ws.compute_targets['my-cluster']
src = ScriptRunConfig(source_directory='.',
script='train.py',
compute_target=compute_target,
environment=pytorch_env)
Tip
If you want to specify environment variables that will get set on the process where the
training script is executed, use the Environment object:
myenv.environment_variables = {"MESSAGE": "Hello from Azure Machine Learning"}
Datasets
If you are using an Azure Machine Learning dataset for training, pass the dataset as an
argument to your script using the arguments parameter. By doing so, you will get the
data path (mounting point or download path) in your training script via arguments.
The following example configures a training job where the FileDataset, mnist_ds , will get
mounted on the remote compute.
Python
src = ScriptRunConfig(source_directory='.',
                      script='train.py',
                      arguments=['--data-folder', mnist_ds.as_mount()],  # or mnist_ds.as_download() to download
                      compute_target=compute_target,
                      environment=pytorch_env)
DataReference (old)
While we recommend using Azure Machine Learning Datasets over the old
DataReference way, if you are still using DataReferences for any reason, you must
configure your job as follows:
Python
Distributed training
If you need to configure a distributed job for training, do so by specifying the
distributed_job_config parameter in the ScriptRunConfig constructor. Pass in an
MpiConfiguration, PyTorchConfiguration, or TensorflowConfiguration object for
distributed jobs of the respective types.
The following example configures a PyTorch training job to use distributed training with
MPI/Horovod:
Python
src = ScriptRunConfig(source_directory='.',
                      script='train.py',
                      compute_target=compute_target,
                      environment=pytorch_env,
                      distributed_job_config=MpiConfiguration(node_count=2,
                                                              process_count_per_node=2))
Miscellaneous
If you need to access the underlying RunConfiguration object for a ScriptRunConfig for
any reason, you can do so as follows:
src.run_config
Next steps
Configure and submit training jobs
Use authentication credential secrets in
Azure Machine Learning training jobs
Article • 04/04/2023
In this article, you learn how to use secrets in training jobs securely. Authentication
information such as your user name and password are secrets. For example, if you
connect to an external database in order to query training data, you would need to pass
your username and password to the remote job context. Coding such values into
training scripts in cleartext is insecure as it would expose the secret.
Instead, your Azure Machine Learning workspace has an associated Azure Key Vault.
Use this Key Vault to pass secrets to remote jobs securely through a set of APIs in the
Azure Machine Learning Python SDK.
Set secrets
In Azure Machine Learning, the Keyvault class contains methods for setting secrets.
In your local Python session, first obtain a reference to your workspace Key Vault, and
then use the set_secret() method to set a secret by name and value. The set_secret
method updates the secret value if the name already exists.
Python
from azureml.core import Workspace
import os

ws = Workspace.from_config()
my_secret = os.environ.get("MY_SECRET")
keyvault = ws.get_default_keyvault()
keyvault.set_secret(name="mysecret", value=my_secret)
Do not put the secret value in your Python code, as it's insecure to store it in a file as
cleartext. Instead, obtain the secret value from an environment variable (for example,
an Azure DevOps build secret) or from interactive user input.
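As a minimal sketch of that pattern (the variable name MY_SECRET is only an example), read the value from the environment rather than hardcoding it:

```python
import os

def read_secret(name="MY_SECRET"):
    """Fetch a secret from an environment variable; never hardcode it in source.

    A CI system such as an Azure DevOps build would inject the real value
    into the job's environment before the script runs.
    """
    value = os.environ.get(name)
    if value is None:
        raise KeyError(f"{name} is not set; configure it in your CI environment")
    return value
```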
You can list secret names using the list_secrets() method. There is also a batch
version, set_secrets(), that allows you to set multiple secrets at a time.
) Important
list_secrets() only lists secrets created through the SDK. It does not list secrets created
by something other than the SDK. For example, a secret created using the Azure
portal or Azure PowerShell will not be listed.
You can use get_secret() to get a secret value from the key vault, regardless of how
it was created. So you can retrieve secrets that are not listed by list_secrets() .
Get secrets
In your local code, you can use the get_secret() method to get the secret value by name.
For jobs submitted with Experiment.submit , use the get_secret() method with the Run
class. Because a submitted run is aware of its workspace, this method bypasses the
Workspace instantiation and returns the secret value directly.
Python
from azureml.core import Run

run = Run.get_context()
secret_value = run.get_secret(name="mysecret")
There is also a batch version, get_secrets(), for accessing multiple secrets at once.
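To make the set/get/list semantics above concrete, here is a plain-Python stand-in that mimics the described Keyvault behavior (this is an illustrative sketch, not the real SDK class): set_secret overwrites an existing name, the batch methods operate on several names at once, and list_secrets only sees what was set through it.

```python
class FakeKeyvault:
    """In-memory stand-in for the Keyvault set/get/list semantics (not the real SDK)."""

    def __init__(self):
        self._secrets = {}

    def set_secret(self, name, value):
        # Overwrites the value if the name already exists.
        self._secrets[name] = value

    def set_secrets(self, secrets_batch):
        # Batch version: set several secrets at once from a dict.
        for name, value in secrets_batch.items():
            self.set_secret(name, value)

    def get_secret(self, name):
        return self._secrets[name]

    def get_secrets(self, names):
        # Batch version: fetch several secrets at once.
        return {name: self._secrets[name] for name in names}

    def list_secrets(self):
        # Only names set through this object are visible here.
        return sorted(self._secrets)
```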
Next steps
View example notebook
Learn about enterprise security with Azure Machine Learning
Use the Python interpretability package
to explain ML models & predictions
(preview)
Article • 03/14/2023
In this how-to guide, you learn to use the interpretability package of the Azure Machine
Learning Python SDK to perform the following tasks:
Explain the behavior for the entire model and individual predictions in Azure.
) Important
This feature is currently in public preview. This preview version is provided without
a service-level agreement, and it's not recommended for production workloads.
Certain features might not be supported or might have constrained capabilities. For
more information, see Supplemental Terms of Use for Microsoft Azure
Previews .
For guidance on how to enable interpretability for models trained with automated
machine learning, see Interpretability: model explanations for automated machine
learning models (preview).
Generate feature importance value on your
personal machine
The following example shows how to use the interpretability package on your personal
machine without contacting Azure services.
Bash
pip install azureml-interpret
Python
from sklearn.datasets import load_breast_cancer
from sklearn import svm
from sklearn.model_selection import train_test_split

breast_cancer_data = load_breast_cancer()
classes = breast_cancer_data.target_names.tolist()

x_train, x_test, y_train, y_test = train_test_split(breast_cancer_data.data,
                                                    breast_cancer_data.target,
                                                    test_size=0.2,
                                                    random_state=0)

clf = svm.SVC(gamma=0.001, C=100., probability=True)
model = clf.fit(x_train, y_train)
To initialize an explainer object, pass your model and some training data to
the explainer's constructor.
To make your explanations and visualizations more informative, you can
choose to pass in feature names and output class names if doing
classification.
The following code blocks show how to instantiate an explainer object with
TabularExplainer , MimicExplainer , and PFIExplainer locally.
TabularExplainer calls one of the three SHAP explainers underneath and
automatically selects the most appropriate one for your
use case, but you can call each of its three underlying explainers directly.
Python
from interpret.ext.blackbox import TabularExplainer

explainer = TabularExplainer(model,
                             x_train,
                             features=breast_cancer_data.feature_names,
                             classes=classes)
or
Python
from interpret.ext.blackbox import MimicExplainer
from interpret.ext.glassbox import LGBMExplainableModel

explainer = MimicExplainer(model,
                           x_train,
                           LGBMExplainableModel,
                           features=breast_cancer_data.feature_names,
                           classes=classes)
or
Python
from interpret.ext.blackbox import PFIExplainer

explainer = PFIExplainer(model,
                         features=breast_cancer_data.feature_names,
                         classes=classes)
Python
# you can use the training data or the test data here, but test data would allow
# you to use Explanation Exploration
global_explanation = explainer.explain_global(x_test)

# if you used the PFIExplainer in the previous step, use the next line of code instead
# global_explanation = explainer.explain_global(x_train, true_labels=y_train)

# alternatively, you can print out a dictionary that holds the top K feature names and values
global_explanation.get_feature_importance_dict()
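Conceptually, the returned dictionary ranks features by the magnitude of their importance. A hypothetical helper (not part of the SDK) that builds the same shape from parallel name/value lists:

```python
def top_k_importance(feature_names, importance_values, k=3):
    """Return the k most important features, largest magnitude first."""
    ranked = sorted(zip(feature_names, importance_values),
                    key=lambda pair: abs(pair[1]), reverse=True)
    return dict(ranked[:k])
```

For example, a negative importance ranks above a smaller positive one because ranking is by magnitude.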
7 Note
Python
# get explanations for the first five data points in the test set
local_explanation = explainer.explain_local(x_test[0:5])
Python
numeric_transformer = Pipeline(steps=[
('imputer', SimpleImputer(strategy='median')),
('scaler', StandardScaler())])
categorical_transformer = Pipeline(steps=[
('imputer', SimpleImputer(strategy='constant', fill_value='missing')),
('onehot', OneHotEncoder(handle_unknown='ignore'))])
preprocessor = ColumnTransformer(
transformers=[
('num', numeric_transformer, numeric_features),
('cat', categorical_transformer, categorical_features)])
In case you want to run the example with the list of fitted transformer tuples, use the
following code:
Python
# assume that we have created two arrays, numerical and categorical, which hold
# the numerical and categorical feature names
Bash
Python
run = Run.get_context()
client = ExplanationClient.from_run(run)
# write code to get and split your data into train and test sets here
# write code to train your model here
3. Set up an Azure Machine Learning Compute as your compute target and submit
your training run. See Create and manage Azure Machine Learning compute
clusters for instructions. You might also find the example notebooks helpful.
4. Download the explanation in your local Jupyter Notebook.
Python
client = ExplanationClient.from_run(run)
# get the model explanation data that was uploaded during the run
explanation = client.download_model_explanation()
global_importance_values = explanation.get_ranked_global_values()
global_importance_names = explanation.get_ranked_global_names()
print('global importance values: {}'.format(global_importance_values))
print('global importance names: {}'.format(global_importance_names))
Visualizations
After you download the explanations in your local Jupyter Notebook, you can use the
visualizations in the explanations dashboard to understand and interpret your model. To
load the explanations dashboard widget in your Jupyter Notebook, use the following
code:
Python
from raiwidgets import ExplanationDashboard

ExplanationDashboard(global_explanation, model, datasetX=x_test, true_y=y_test)
The visualizations support explanations on both engineered and raw features. Raw
explanations are based on the features from the original dataset and engineered
explanations are based on the features from the dataset with feature engineering
applied.
When attempting to interpret a model with respect to the original dataset, it’s
recommended to use raw explanations as each feature importance will correspond to a
column from the original dataset. One scenario where engineered explanations might
be useful is when examining the impact of individual categories from a categorical
feature. If a one-hot encoding is applied to a categorical feature, then the resulting
engineered explanations will include a different importance value per category, one per
one-hot engineered feature. This encoding can be useful when narrowing down which
part of the dataset is most informative to the model.
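To see why one-hot encoding yields one importance value per category, consider a minimal illustrative encoder (not the encoder the SDK uses): each distinct category becomes its own 0/1 column, and an explainer then assigns a separate importance value to each of those engineered columns.

```python
def one_hot(values):
    """Encode a categorical column as one 0/1 column per distinct category."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return rows, categories
```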
7 Note
Model performance
Evaluate the performance of your model by exploring the distribution of your prediction
values and the values of your model performance metrics. You can further investigate
your model by looking at a comparative analysis of its performance across different
cohorts or subgroups of your dataset. Select filters along y-value and x-value to cut
across different dimensions. View metrics such as accuracy, precision, recall, false
positive rate (FPR), and false negative rate (FNR).
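The metrics named above are simple ratios over the confusion counts. A small sketch for binary labels (1 = positive, 0 = negative), purely to make the definitions concrete:

```python
def binary_rates(y_true, y_pred):
    """Compute accuracy, false positive rate (FPR), and false negative rate (FNR)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    fpr = fp / (fp + tn) if (fp + tn) else 0.0  # fraction of negatives flagged positive
    fnr = fn / (fn + tp) if (fn + tp) else 0.0  # fraction of positives missed
    return accuracy, fpr, fnr
```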
Dataset explorer
Explore your dataset statistics by selecting different filters along the X, Y, and color axes
to slice your data along different dimensions. Create dataset cohorts above to analyze
dataset statistics with filters such as predicted outcome, dataset features and error
groups. Use the gear icon in the upper right-hand corner of the graph to change graph
types.
Aggregate feature importance
Explore the top-k important features that impact your overall model predictions (also
known as global explanation). Use the slider to show descending feature importance
values. Select up to three cohorts to see their feature importance values side by side.
Select any of the feature bars in the graph to see how values of the selected feature
impact model prediction in the dependence plot below.
Plot                                      Description
Individual feature importance             Shows the top-k important features for an individual prediction.
                                          Helps illustrate the local behavior of the underlying model on a
                                          specific data point.
What-If analysis                          Allows changes to feature values of the selected real data point,
                                          and observes the resulting changes to the prediction value by
                                          generating a hypothetical datapoint with the new feature values.
Individual Conditional Expectation (ICE)  Allows feature value changes from a minimum value to a maximum
                                          value. Helps illustrate how the data point's prediction changes
                                          when a feature changes.
7 Note
These are explanations based on many approximations and are not the "cause" of
predictions. Without strict mathematical robustness of causal inference, we do not
advise users to make real-life decisions based on the feature perturbations of the
What-If tool. This tool is primarily for understanding your model and debugging.
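The ICE idea can be sketched in a few lines: hold a data point fixed, sweep one feature across candidate values, and record the model's prediction at each value. The predict function below is a stand-in for any trained model, used here only to illustrate the mechanic:

```python
def ice_curve(predict, row, feature_index, candidate_values):
    """Vary one feature from low to high, holding the rest of the row fixed,
    and record the model's prediction at each candidate value."""
    curve = []
    for value in candidate_values:
        perturbed = list(row)
        perturbed[feature_index] = value
        curve.append(predict(perturbed))
    return curve
```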
If the dataset, global, and local explanations are available, data populates all of the tabs.
However, if only a global explanation is available, the Individual feature importance tab
will be disabled.
Follow one of these paths to access the explanations dashboard in Azure Machine
Learning studio:
1. Select Experiments in the left pane to see a list of experiments that you've
run on Azure Machine Learning.
2. Select a particular experiment to view all the runs in that experiment.
3. Select a run, and then select the Explanations tab to view the explanation
visualization dashboard.
Models pane
1. If you registered your original model by following the steps in Deploy models
with Azure Machine Learning, you can select Models in the left pane to view
it.
2. Select a model, and then the Explanations tab to view the explanations
dashboard.
Python
explainer = TabularExplainer(model,
initialization_examples=x_train,
features=dataset_feature_names,
classes=dataset_classes,
transformations=transformations)
Python
3. Configure and register an image that uses the scoring explainer model.
Python
scoring_explainer_model = run.register_model(model_name='my_scoring_explainer',
                                             model_path='my_scoring_explainer.pkl')
print(scoring_explainer_model.name, scoring_explainer_model.id,
      scoring_explainer_model.version, sep='\t')
4. As an optional step, you can retrieve the scoring explainer from the cloud and test
the explanations.
Python
Python
%%writefile score.py
import json
import numpy as np
import pandas as pd
import os
import pickle
import joblib
from sklearn.linear_model import LogisticRegression
from azureml.core.model import Model
def init():
    global original_model
    global scoring_explainer

    # retrieve the path to the model file using the model name
    # assume original model is named original_prediction_model
    original_model_path = Model.get_model_path('original_prediction_model')
    scoring_explainer_path = Model.get_model_path('my_scoring_explainer')

    original_model = joblib.load(original_model_path)
    scoring_explainer = joblib.load(scoring_explainer_path)

def run(raw_data):
    # get predictions and explanations for each data point
    data = pd.read_json(raw_data)
    # make prediction
    predictions = original_model.predict(data)
    # retrieve model explanations
    local_importance_values = scoring_explainer.explain(data)
    # you can return any data type as long as it is JSON-serializable
    return {'predictions': predictions.tolist(),
            'local_importance_values': local_importance_values}
Python
aciconfig = AciWebservice.deploy_configuration(cpu_cores=1,
                                               memory_gb=1,
                                               tags={"data": "NAME_OF_THE_DATASET",
                                                     "method": "local_explanation"},
                                               description='Get local explanations for NAME_OF_THE_PROBLEM')
Python
with open("myenv.yml","r") as f:
print(f.read())
Python
%%writefile dockerfile
RUN apt-get update && apt-get install -y g++
Python
# use the custom scoring, docker, and conda files we created above
image_config = ContainerImage.image_configuration(execution_script="score.py",
                                                  docker_file="dockerfile",
                                                  runtime="python",
                                                  conda_file="myenv.yml")
service.wait_for_deployment(show_output=True)
Python
import requests
headers = {'Content-Type':'application/json'}
7. Clean up.
Troubleshooting
Sparse data not supported: The model explanation dashboard breaks down or slows
substantially with a large number of features, so we currently don't support
sparse data formats. Additionally, general memory issues arise with large
datasets and large numbers of features.
Local explanation for data index: The explanation dashboard doesn’t support
relating local importance values to a row identifier from the original validation
dataset if that dataset is greater than 5000 datapoints as the dashboard randomly
downsamples the data. However, the dashboard shows raw dataset feature values
for each datapoint passed into the dashboard under the Individual feature
importance tab. Users can map local importances back to the original dataset
through matching the raw dataset feature values. If the validation dataset size is
less than 5000 samples, the index feature in Azure Machine Learning studio will
correspond to the index in the validation dataset.
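Matching local importances back to the original dataset by raw feature values can be sketched as follows. This illustrative helper assumes rows are unique; it is not part of the dashboard itself:

```python
def map_back(sampled_rows, original_rows):
    """For each downsampled row, find the index of the matching row in the
    original dataset by comparing raw feature values."""
    return [original_rows.index(row) for row in sampled_rows]
```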
Next steps
Techniques for model interpretability in Azure Machine Learning
In this article, you learn how to get explanations for automated machine learning
(automated ML) models in Azure Machine Learning using the Python SDK. Automated
ML helps you understand feature importance of the models that are generated.
All SDK versions after 1.0.85 set model_explainability=True by default. In SDK version
1.0.85 and earlier, users need to set model_explainability=True in the
AutoMLConfig object in order to use model interpretability.
Prerequisites
Interpretability features. Run pip install azureml-interpret to get the necessary
package.
Knowledge of building automated ML experiments. For more information on how
to use the Azure Machine Learning SDK, complete this object detection model
tutorial or see how to configure automated ML experiments.
) Important
This feature is currently in public preview. This preview version is provided without
a service-level agreement, and it's not recommended for production workloads.
Certain features might not be supported or might have constrained capabilities. For
more information, see Supplemental Terms of Use for Microsoft Azure
Previews .
7 Note
Python
client = ExplanationClient.from_run(best_run)
engineered_explanations = client.download_model_explanation(raw=False)
print(engineered_explanations.get_feature_importance_dict())
Python
client = ExplanationClient.from_run(best_run)
raw_explanations = client.download_model_explanation(raw=True)
print(raw_explanations.get_feature_importance_dict())
Python
automl_explainer_setup_obj = automl_setup_model_explanations(fitted_model,
                                                             X=X_train,
                                                             X_test=X_test,
                                                             y=y_train,
                                                             task='classification')
The MimicWrapper also takes the automl_run object where the engineered explanations
will be uploaded.
Python
from azureml.interpret import MimicWrapper

explainer = MimicWrapper(ws, automl_explainer_setup_obj.automl_estimator,
                         explainable_model=automl_explainer_setup_obj.surrogate_model,
                         init_dataset=automl_explainer_setup_obj.X_transform, run=automl_run,
                         features=automl_explainer_setup_obj.engineered_feature_names,
                         feature_maps=[automl_explainer_setup_obj.feature_map],
                         classes=automl_explainer_setup_obj.classes,
                         explainer_kwargs=automl_explainer_setup_obj.surrogate_model_params)
Python
For models trained with automated ML, you can get the best model using the
get_output() method and compute the explanations locally. You can visualize the
results with the explanation dashboard.
Python
explainer = MimicWrapper(ws, automl_explainer_setup_obj.automl_estimator,
                         explainable_model=automl_explainer_setup_obj.surrogate_model,
                         init_dataset=automl_explainer_setup_obj.X_transform, run=best_run,
                         features=automl_explainer_setup_obj.engineered_feature_names,
                         feature_maps=[automl_explainer_setup_obj.feature_map],
                         classes=automl_explainer_setup_obj.classes)

raw_explanations = explainer.explain(['local', 'global'], get_raw=True,
                                     raw_feature_names=automl_explainer_setup_obj.raw_feature_names,
                                     eval_dataset=automl_explainer_setup_obj.X_test_transform)
print(raw_explanations.get_feature_importance_dict())

from raiwidgets import ExplanationDashboard
ExplanationDashboard(raw_explanations,
                     automl_explainer_setup_obj.automl_pipeline,
                     datasetX=automl_explainer_setup_obj.X_test_raw)
Python
raw_explanations = explainer.explain(['local', 'global'], get_raw=True,
                                     raw_feature_names=automl_explainer_setup_obj.raw_feature_names,
                                     eval_dataset=automl_explainer_setup_obj.X_test_transform,
                                     raw_eval_dataset=automl_explainer_setup_obj.X_test_raw)
print(raw_explanations.get_feature_importance_dict())
Save the scoring explainer, and then register the model and the scoring explainer with
the Model Management Service. Run the following code:
Python
Python
azureml_pip_packages = [
    'azureml-interpret', 'azureml-train-automl', 'azureml-defaults'
]

from azureml.core.conda_dependencies import CondaDependencies
myenv = CondaDependencies.create(pip_packages=azureml_pip_packages)

with open("myenv.yml", "w") as f:
    f.write(myenv.serialize_to_string())

with open("myenv.yml", "r") as f:
    print(f.read())
Python
%%writefile score.py
import joblib
import pandas as pd
from azureml.core.model import Model
from azureml.train.automl.runtime.automl_explain_utilities import automl_setup_model_explanations

def init():
    global automl_model
    global scoring_explainer

    # Retrieve the path to the model file using the model name
    # Assume original model is named automl_model
    automl_model_path = Model.get_model_path('automl_model')
    scoring_explainer_path = Model.get_model_path('scoring_explainer')

    automl_model = joblib.load(automl_model_path)
    scoring_explainer = joblib.load(scoring_explainer_path)

def run(raw_data):
    data = pd.read_json(raw_data, orient='records')
    # Make prediction
    predictions = automl_model.predict(data)
    # Setup for inferencing explanations
    automl_explainer_setup_obj = automl_setup_model_explanations(automl_model,
                                                                 X_test=data,
                                                                 task='classification')
    # Retrieve model explanations for engineered explanations
    engineered_local_importance_values = scoring_explainer.explain(
        automl_explainer_setup_obj.X_test_transform)
    # Retrieve model explanations for raw explanations
    raw_local_importance_values = scoring_explainer.explain(
        automl_explainer_setup_obj.X_test_transform, get_raw=True)
    # You can return any data type as long as it is JSON-serializable
    return {'predictions': predictions.tolist(),
            'engineered_local_importance_values': engineered_local_importance_values,
            'raw_local_importance_values': raw_local_importance_values}
Python
aciconfig = AciWebservice.deploy_configuration(cpu_cores=1,
                                               memory_gb=1,
                                               tags={"data": "Bank Marketing",
                                                     "method": "local_explanation"},
                                               description='Get local explanations for Bank marketing test data')

myenv = Environment.from_conda_specification(name="myenv", file_path="myenv.yml")
inference_config = InferenceConfig(entry_script="score_local_explain.py",
                                   environment=myenv)
Python
if service.state == 'Healthy':
    # Serialize the first row of the test data into json
    X_test_json = X_test[:1].to_json(orient='records')
    print(X_test_json)

    # Call the service to get the predictions and the engineered explanations
    output = service.run(X_test_json)

    # Print the predicted value
    print(output['predictions'])
    # Print the engineered feature importances for the predicted value
    print(output['engineered_local_importance_values'])
    # Print the raw feature importances for the predicted value
    print('raw_local_importance_values:\n{}\n'.format(output['raw_local_importance_values']))
For more information on the explanation dashboard visualizations and specific plots,
please refer to the how-to doc on interpretability.
Next steps
For more information about how you can enable model explanations and feature
importance in areas other than automated ML, see more techniques for model
interpretability.
What are Azure Machine Learning
pipelines?
Article • 04/04/2023
Standardize the Machine learning operation (MLOps) practice and support scalable
team collaboration
Training efficiency and cost reduction
For example, a typical machine learning project includes the steps of data collection,
data preparation, model training, model evaluation, and model deployment. Usually,
data engineers concentrate on the data steps, data scientists spend most of their time
on model training and evaluation, and machine learning engineers focus on model
deployment and automation of the entire workflow. By using a machine learning
pipeline, each team only needs to work on building its own steps. The best way to build
steps is with an Azure Machine Learning component, a self-contained piece of code that
does one step in a machine learning pipeline. All these steps, built by different users, are
finally integrated into one workflow through the pipeline definition. The pipeline is a
collaboration tool for everyone in the project. The process of defining a pipeline and all
its steps can be standardized by each company's preferred DevOps practice. The
pipeline can be further versioned and automated. If the ML projects are described as
pipelines, then the best MLOps practice is already applied.
The first approach usually applies to a team that hasn't used pipelines before and
wants to take advantage of pipeline benefits like MLOps. In this situation, data scientists
typically have developed some machine learning models in their local environment
using their favorite tools. Machine learning engineers need to take the data scientists'
output into production. This work involves cleaning up unnecessary code from the
original notebook or Python script, changing the training input from local data to
parameterized values, splitting the training code into multiple steps as needed,
performing unit tests of each step, and finally wrapping all the steps into a pipeline.
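The refactoring just described, parameterized inputs, separately testable steps, then a wrapper, can be sketched in plain Python. The step functions and their bodies are purely illustrative stand-ins, not Azure Machine Learning components:

```python
def prepare_data(raw):
    # Step 1: a stand-in for cleaning/normalization (scale to the max value).
    return [x / max(raw) for x in raw]

def train_model(features):
    # Step 2: a stand-in for "training" that returns a trivial model (the mean).
    return sum(features) / len(features)

def evaluate_model(model, features):
    # Step 3: a stand-in evaluation metric (worst absolute deviation).
    return max(abs(x - model) for x in features)

def pipeline(raw):
    """Wrap the individually unit-testable steps into one workflow."""
    features = prepare_data(raw)
    model = train_model(features)
    return evaluate_model(model, features)
```

Because each step is a plain function with explicit inputs and outputs, each can be unit tested on its own before the whole pipeline is run.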
Once teams are familiar with pipelines and want to run more machine learning
projects with them, they'll find that the first approach is hard to scale. The second
approach is to set up a few pipeline templates, each of which tries to solve one specific
machine learning problem. The template predefines the pipeline structure, including
the number of steps, each step's inputs and outputs, and their connectivity. To start a
new machine learning project, the team first forks a template repo. The team leader
then assigns members the steps they need to work on. The data scientists and data
engineers do their regular work. When they're happy with their results, they structure
their code to fit into the predefined steps. Once the structured code is checked in, the
pipeline can be executed or automated. If there's any change, each member only needs
to work on their piece of code without touching the rest of the pipeline code.
Once a team has built a collection of machine learning pipelines and reusable
components, it can start to build new machine learning pipelines by cloning a previous
pipeline or tying existing reusable components together. At this stage, the team's
overall productivity improves significantly.
Azure Machine Learning offers different methods to build a pipeline. For users who are
familiar with DevOps practices, we recommend using the CLI. For data scientists who are
familiar with Python, we recommend writing pipelines using the Azure Machine Learning
SDK v2. For users who prefer a UI, the designer can be used to build pipelines from
registered components.
Scenario: Code & app orchestration (CI/CD)
Primary persona: App Developer / Ops
Azure offering: Azure Pipelines
OSS offering: Jenkins
Canonical pipeline: Code + Model -> App/Service
Strengths: Most open and flexible activity support, approval queues, phases with gating
Next steps
Azure Machine Learning pipelines are a powerful facility that begins delivering value in
the early development stages.
Azure Machine Learning designer is a drag-and-drop interface used to train and deploy
models in Azure Machine Learning. This article describes the tasks you can do in the
designer.
7 Note
Custom components allow you to wrap your own code as a component. They support
sharing components across workspaces and seamless authoring across the Studio, CLI
v2, and SDK v2 interfaces.
For new projects, we highly suggest you use custom components, which are
compatible with AzureML V2 and will keep receiving new updates.
This article applies to classic prebuilt components, which are not compatible with CLI v2
and SDK v2.
To get started with the designer, see Tutorial: Train a no-code regression model.
To learn about the components available in the designer, see the Algorithm and
component reference.
The designer uses your Azure Machine Learning workspace to organize shared
resources such as:
Pipelines
Data
Compute resources
Registered models
Published pipelines
Real-time endpoints
Pipeline
A pipeline consists of data assets and analytical components, which you connect.
Pipelines have many uses: you can make a pipeline that trains a single model, or one
that trains multiple models. You can create a pipeline that makes predictions in real time
or in batch, or make a pipeline that only cleans data. Pipelines let you reuse your work
and organize your projects.
Pipeline draft
As you edit a pipeline in the designer, your progress is saved as a pipeline draft. You
can edit a pipeline draft at any point by adding or removing components, configuring
compute targets, creating parameters, and so on.
When you're ready to run your pipeline draft, you submit a pipeline job.
Pipeline job
Each time you run a pipeline, the configuration of the pipeline and its results are stored
in your workspace as a pipeline job. You can go back to any pipeline job to inspect it for
troubleshooting or auditing. Clone a pipeline job to create a new pipeline draft for you
to edit.
Pipeline jobs are grouped into experiments to organize job history. You can set the
experiment for every pipeline job.
Data
A machine learning data asset makes it easy to access and work with your data. Several
sample data assets are included in the designer for you to experiment with. You can
register more data assets as you need them.
Component
A component is an algorithm that you can perform on your data. The designer has
several components ranging from data ingress functions to training, scoring, and
validation processes.
A component may have a set of parameters that you can use to configure the
component's internal algorithms. When you select a component on the canvas, the
component's parameters are displayed in the Properties pane to the right of the canvas.
You can modify the parameters in that pane to tune your model. You can set the
compute resources for individual components in the designer.
For some help navigating through the library of machine learning algorithms available,
see Algorithm & component reference overview. For help with choosing an algorithm,
see the Azure Machine Learning Algorithm Cheat Sheet.
Compute resources
Use compute resources from your workspace to run your pipeline and host your
deployed models as online endpoints or pipeline endpoints (for batch inference). The
supported compute targets are:
Compute targets are attached to your Azure Machine Learning workspace. You manage
your compute targets in your workspace in the Azure Machine Learning studio .
Deploy
To perform real-time inferencing, you must deploy a pipeline as an online endpoint. The
online endpoint creates an interface between an external application and your scoring
model. A call to an online endpoint returns prediction results to the application in real
time. To make a call to an online endpoint, you pass the API key that was created when
you deployed the endpoint. The endpoint is based on REST, a popular architecture
choice for web programming projects.
To learn how to deploy your model, see Tutorial: Deploy a machine learning model with
the designer.
Publish
You can also publish a pipeline to a pipeline endpoint. Similar to an online endpoint, a
pipeline endpoint lets you submit new pipeline jobs from external applications using
REST calls. However, you cannot send or receive data in real time using a pipeline
endpoint.
Published pipelines are flexible: they can be used to train or retrain models, perform
batch inferencing, process new data, and much more. You can publish multiple pipelines
to a single pipeline endpoint and specify which pipeline version to run.
A published pipeline runs on the compute resources you define in the pipeline draft for
each component.
Next steps
Learn the fundamentals of predictive analytics and machine learning with Tutorial:
Predict automobile price with the designer
Learn how to modify existing designer samples to adapt them to your needs.
Machine Learning Algorithm Cheat
Sheet for Azure Machine Learning
designer
Article • 05/29/2023
The Azure Machine Learning Algorithm Cheat Sheet helps you choose the right
algorithm from the designer for a predictive analytics model.
7 Note
Designer supports two types of components, classic prebuilt components (v1) and
custom components (v2). These two types of components are NOT compatible.
Custom components allow you to wrap your own code as a component. They support
sharing components across workspaces and seamless authoring across the Studio, CLI
v2, and SDK v2 interfaces.
For new projects, we highly suggest you use custom components, which are
compatible with AzureML V2 and will keep receiving new updates.
This article applies to classic prebuilt components, which are not compatible with CLI v2
and SDK v2.
Azure Machine Learning has a large library of algorithms from the classification,
recommender systems, clustering, anomaly detection, regression, and text analytics
families. Each is designed to address a different type of machine learning problem.
Download and print the Machine Learning Algorithm Cheat Sheet in tabloid size to keep
it handy and get help choosing an algorithm.
Every machine learning algorithm has its own style or inductive bias. For a specific
problem, several algorithms may be appropriate, and one algorithm may be a better fit
than others. But it's not always possible to know beforehand which one is the best fit. In
cases like these, several algorithms are listed together in the cheat sheet. An appropriate
strategy is to try one algorithm, and if the results are not yet satisfactory, try the
others.
To learn more about the algorithms in Azure Machine Learning designer, go to the
Algorithm and component reference.
Supervised learning
In supervised learning, each data point is labeled or associated with a category or value
of interest. An example of a categorical label is assigning an image as either a ‘cat’ or a
‘dog’. An example of a value label is the sale price associated with a used car. The goal
of supervised learning is to study many labeled examples like these, and then to be able
to make predictions about future data points. For example, identifying new photos with
the correct animal or assigning accurate sale prices to other used cars. This is a popular
and useful type of machine learning.
Unsupervised learning
In unsupervised learning, data points have no labels associated with them. Instead, the
goal of an unsupervised learning algorithm is to organize the data in some way or to
describe its structure. Unsupervised learning groups data into clusters, as K-means does,
or finds different ways of looking at complex data so that it appears simpler.
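As a toy illustration of the K-means idea mentioned above (1-D points, two clusters, plain Python rather than any Azure component): start with two guessed centers, assign each point to its nearest center, recompute each center as its cluster's mean, and repeat.

```python
def kmeans_1d(points, iterations=10):
    """Cluster 1-D points into two groups by alternating assignment and
    center updates, the core loop of K-means."""
    lo, hi = min(points), max(points)
    centers = [lo, hi]  # a simple initialization: the two extremes
    for _ in range(iterations):
        clusters = ([], [])
        for p in points:
            # Assign the point to its nearest center.
            nearest = 0 if abs(p - centers[0]) <= abs(p - centers[1]) else 1
            clusters[nearest].append(p)
        # Recompute each center as the mean of its cluster.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers
```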
Reinforcement learning
In reinforcement learning, the algorithm gets to choose an action in response to each
data point. It is a common approach in robotics, where the set of sensor readings at one
point in time is a data point, and the algorithm must choose the robot’s next action. It's
also a natural fit for Internet of Things applications. The learning algorithm also receives
a reward signal a short time later, indicating how good the decision was. Based on this
signal, the algorithm modifies its strategy in order to achieve the highest reward.
Next steps
See more information on How to select algorithms
Learn about studio in Azure Machine Learning and the Azure portal.
A common question is “Which machine learning algorithm should I use?” The algorithm
you select depends primarily on two different aspects of your data science scenario:
What do you want to do with your data? Specifically, what is the business question
you want to answer by learning from your past data?
What are the requirements of your data science scenario? Specifically, what
accuracy, training time, linearity, number of parameters, and number of features
does your solution support?
Note
Designer supports two types of components: classic prebuilt components (v1) and
custom components (v2). These two types of components are NOT compatible.
Custom components let you wrap your own code as a component. They support
sharing components across workspaces and seamless authoring across the studio, CLI
v2, and SDK v2 interfaces.
For new projects, we highly suggest you use custom components, which are
compatible with Azure Machine Learning v2 and will keep receiving new updates.
This article applies to classic prebuilt components, which aren't compatible with CLI
v2 and SDK v2.
Note
Download the cheat sheet here: Machine Learning Algorithm Cheat Sheet (11x17
in.)
Along with guidance in the Azure Machine Learning Algorithm Cheat Sheet, keep in
mind other requirements when choosing a machine learning algorithm for your solution.
Additional factors to consider include accuracy, training time, linearity, number of
parameters, and number of features.
The designer's comparison table rates each algorithm family (classification,
regression, and clustering) on accuracy, training time, linearity, number of
parameters, and number of features. Each of these criteria is discussed below.
Accuracy
Accuracy in machine learning measures the effectiveness of a model as the proportion
of true results to total cases. In Machine Learning designer, the Evaluate Model
component computes a set of industry-standard evaluation metrics. You can use this
component to measure the accuracy of a trained model.
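Expressed as code, that proportion is simply correct predictions over total cases. This is a hand-rolled sketch for illustration, not the Evaluate Model component itself:

```python
def accuracy(y_true, y_pred):
    """Proportion of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

y_true = ["cat", "dog", "dog", "cat", "dog"]
y_pred = ["cat", "dog", "cat", "cat", "dog"]
print(accuracy(y_true, y_pred))  # 4 of 5 correct -> 0.8
```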
Getting the most accurate answer possible isn’t always necessary. Sometimes an
approximation is adequate, depending on what you want to use it for. If that is the case,
you may be able to cut your processing time dramatically by sticking with more
approximate methods. Approximate methods also naturally tend to avoid overfitting.
There are several ways to evaluate the accuracy of a model:
Generate scores over your training data in order to evaluate the model
Generate scores on the model, but compare those scores to scores on a reserved
testing set
Compare scores for two different but related models, using the same set of data
For a complete list of metrics and approaches you can use to evaluate the accuracy of
machine learning models, see Evaluate Model component.
Training time
In supervised learning, training means using historical data to build a machine learning
model that minimizes errors. The number of minutes or hours necessary to train a
model varies a great deal between algorithms. Training time is often closely tied to
accuracy; one typically accompanies the other.
In addition, some algorithms are more sensitive to the number of data points than
others. You might choose a specific algorithm because you have a time limitation,
especially when the data set is large.
In Machine Learning designer, creating and using a machine learning model is typically
a three-step process:
1. Configure a model by choosing a particular type of algorithm, and then defining
its parameters or hyperparameters.
2. Provide a dataset that is labeled and has data compatible with the algorithm.
Connect both the data and the model to the Train Model component.
3. After training is completed, use the trained model with one of the scoring
components to make predictions on new data.
Linearity
Linearity in statistics and machine learning means that there is a linear relationship
between a variable and a constant in your dataset. For example, linear classification
algorithms assume that classes can be separated by a straight line (or its higher-
dimensional analog).
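In code, a linear classifier's "straight line" is just a weighted sum compared against a threshold. The weights and points below are made up for illustration; this is a sketch of the idea, not a designer algorithm:

```python
def linear_predict(x, w, b):
    """Classify a point by which side of the line w.x + b = 0 it falls on."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score > 0 else 0

# The line x0 + x1 - 1 = 0 separates the plane into two classes.
w, b = [1.0, 1.0], -1.0
print(linear_predict([2.0, 0.5], w, b))  # 2.0 + 0.5 - 1 > 0 -> class 1
print(linear_predict([0.2, 0.3], w, b))  # 0.2 + 0.3 - 1 < 0 -> class 0
```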
Many machine learning algorithms in Azure Machine Learning designer make use of
linearity.
Linear regression algorithms assume that data trends follow a straight line. This
assumption isn't bad for some problems, but for others it reduces accuracy. Despite
their drawbacks, linear algorithms are popular as a first strategy. They tend to be
algorithmically simple and fast to train.
Nonlinear class boundary: Relying on a linear classification algorithm would result in low
accuracy.
Data with a nonlinear trend: Using a linear regression method would generate much
larger errors than necessary.
Number of parameters
Parameters are the knobs a data scientist gets to turn when setting up an algorithm.
They are numbers that affect the algorithm’s behavior, such as error tolerance or
number of iterations, or options between variants of how the algorithm behaves. The
training time and accuracy of the algorithm can sometimes be sensitive to getting just
the right settings. Typically, algorithms with large numbers of parameters require the
most trial and error to find a good combination.
While sweeping parameter combinations is a great way to make sure you've spanned
the parameter space, the time required to train a model increases exponentially with
the number of parameters. The upside is that having many parameters typically
indicates that an algorithm has greater flexibility. It can often achieve very good
accuracy, provided you can find the right combination of parameter settings.
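The exponential growth is easy to see with an exhaustive grid over parameter settings. The parameter names and candidate values here are hypothetical:

```python
from itertools import product

# Hypothetical settings for three parameters of some algorithm.
grid = {
    "learning_rate": [0.01, 0.1, 0.3],
    "num_iterations": [100, 500, 1000],
    "error_tolerance": [1e-3, 1e-4],
}

# Every combination is a separate model to train and evaluate.
combinations = list(product(*grid.values()))
print(len(combinations))  # 3 * 3 * 2 = 18
```

Adding one more parameter with three candidate values would multiply the count again, to 54.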
Number of features
In machine learning, a feature is a quantifiable variable of the phenomenon you are
trying to analyze. For certain types of data, the number of features can be very large
compared to the number of data points. This is often the case with genetics or textual
data.
A large number of features can bog down some learning algorithms, making training
time unfeasibly long. Support vector machines are particularly well suited to scenarios
with a high number of features. For this reason, they have been used in many
applications from information retrieval to text and image classification. Support vector
machines can be used for both classification and regression tasks.
Feature selection refers to the process of applying statistical tests to inputs, given a
specified output. The goal is to determine which columns are more predictive of the
output. The Filter Based Feature Selection component in Machine Learning designer
provides multiple feature selection algorithms to choose from. The component includes
correlation methods such as Pearson correlation and chi-squared values.
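For illustration, the Pearson statistic that such correlation methods compute can be written by hand. This is a simplified sketch of the statistic itself, not the component's implementation:

```python
from math import sqrt

def pearson(xs, ys):
    """Linear correlation between a feature column and the target, in [-1, 1]."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

feature = [1.0, 2.0, 3.0, 4.0]
target = [2.0, 4.0, 6.0, 8.0]   # perfectly linear in the feature
print(round(pearson(feature, target), 6))  # 1.0
```

Features whose scores are close to +1 or -1 are strongly (linearly) predictive of the output; scores near 0 suggest little linear relationship.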
You can also use the Permutation Feature Importance component to compute a set of
feature importance scores for your dataset. You can then leverage these scores to help
you determine the best features to use in a model.
Next steps
Learn more about Azure Machine Learning designer
For descriptions of all the machine learning algorithms available in Azure Machine
Learning designer, see Machine Learning designer algorithm and component
reference
To explore the relationship between deep learning, machine learning, and AI, see
Deep Learning vs. Machine Learning
Transform data in Azure Machine
Learning designer
Article • 05/29/2023
In this article, you'll learn how to transform and save datasets in the Azure Machine
Learning designer, to prepare your own data for machine learning.
You'll use the sample Adult Census Income Binary Classification dataset to prepare two
datasets: one dataset that includes adult census information from only the United
States, and another dataset that includes census information from non-US adults.
This how-to is a prerequisite for the how to retrain designer models article. In that
article, you'll learn how to use the transformed datasets to train multiple models, with
pipeline parameters.
Important
If you do not see graphical elements mentioned in this document, such as buttons
in studio or designer, you may not have the right level of permissions to the
workspace. Please contact your Azure subscription administrator to verify that you
have been granted the correct level of access. For more information, see Manage
users and roles.
Transform a dataset
In this section, you'll learn how to import the sample dataset, and split the data into US
and non-US datasets. See how to import data for more information about how to
import your own data into the designer.
Import data
Use these steps to import the sample dataset:
4. To the left of the pipeline canvas, you'll see a palette of datasets and components.
Select Datasets. Then view the Samples section.
5. Drag and drop the Adult Census Income Binary classification dataset onto the
canvas.
6. Right-click the Adult Census Income dataset component, and select Visualize >
Dataset output.
7. Use the data preview window to explore the dataset. Take special note of the
"native-country" column values.
1. To the left of the canvas, in the component palette, expand the Data
Transformation section, and find the Split Data component.
2. Drag the Split Data component onto the canvas, and drop that component below
the dataset component.
5. To the right of the canvas in the component details pane, set Splitting mode to
Regular Expression.
The Regular expression mode tests a single column for a value. See the related
algorithm component reference page for more information on the Split Data
component.
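The behavior can be approximated in plain Python: rows whose "native-country" value matches the pattern go to one output and the rest to the other. This is a hypothetical sketch of the idea, not the component's code:

```python
import re

rows = [
    {"age": 39, "native-country": "United-States"},
    {"age": 50, "native-country": "United-States"},
    {"age": 38, "native-country": "India"},
]

pattern = re.compile(r"United-States")  # tests a single column for a value

# First output: rows where the expression matches; second output: the rest.
matched = [r for r in rows if pattern.search(r["native-country"])]
unmatched = [r for r in rows if not pattern.search(r["native-country"])]

print(len(matched), len(unmatched))  # 2 1
```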
1. To the left of the canvas in the component palette, expand the Data Input and
Output section, and find the Export Data component.
2. Drag and drop two Export Data components below the Split Data component.
3. Connect each output port of the Split Data component to a different Export Data
component.
4. Select the Export Data component connected to the left-most port of the Split
Data component.
For the Split Data component, the output port order matters. The first output port
contains the rows where the regular expression is true. In this case, the first port
contains rows for US-based income, and the second port contains rows for non-US
based income.
5. In the component details pane to the right of the canvas, set the following options:
Path: /data/us-income
Note
This article assumes that you have access to a datastore registered to the
current Azure Machine Learning workspace. See Connect to Azure storage
services for datastore setup instructions.
You can create a datastore if you don't have one now. For example purposes, this
article will save the datasets to the default blob storage account associated with
the workspace. It will save the datasets into the azureml container, in a new folder
named data .
6. Select the Export Data component connected to the right-most port of the Split
Data component.
7. To the right of the canvas in the component details pane, set the following options:
Path: /data/non-us-income
8. Verify that the Export Data component connected to the left port of the Split Data
has the Path /data/us-income .
9. Verify that the Export Data component connected to the right port has the Path
/data/non-us-income .
Experiments logically group related pipeline jobs together. If you run this pipeline
in the future, you should use the same experiment for logging and tracking
purposes.
View results
After the pipeline finishes running, you can navigate to your Azure portal blob storage
to view your results. You can also view the intermediary results of the Split Data
component to confirm that your data has been split correctly.
2. In the component details pane to the right of the canvas, select Outputs + logs.
4. Verify that the "native-country" column contains only the value "United-States".
6. Verify that the "native-country" column does not contain the value "United-States".
Clean up resources
To continue with part two of this Retrain models with Azure Machine Learning designer
how-to, skip this section.
Important
You can use the resources that you created as prerequisites for other Azure
Machine Learning tutorials and how-to articles.
Delete everything
If you don't plan to use anything that you created, delete the entire resource group so
you don't incur any charges.
1. In the Azure portal, select Resource groups on the left side of the window.
2. In the list, select the resource group that you created.
Deleting the resource group also deletes all resources that you created in the designer.
The compute target that you created here automatically autoscales to zero nodes when
it's not being used. This action is taken to minimize charges. If you want to delete the
compute target, take these steps:
You can unregister datasets from your workspace by selecting each dataset and
selecting Unregister.
To delete a dataset, go to the storage account by using the Azure portal or Azure
Storage Explorer and manually delete those assets.
Next steps
In this article, you learned how to transform a dataset, and save it to a registered
datastore.
Continue to the next part of this how-to series with Retrain models with Azure Machine
Learning designer, to use your transformed datasets and pipeline parameters to train
machine learning models.
Use pipeline parameters in the designer
to build versatile pipelines
Article • 05/29/2023
Use pipeline parameters to build flexible pipelines in the designer. Pipeline parameters
let you dynamically set values at runtime to encapsulate pipeline logic and reuse assets.
Pipeline parameters are especially useful when resubmitting a pipeline job, retraining
models, or performing batch predictions.
Prerequisites
An Azure Machine Learning workspace. See Create workspace resources.
Important
If you do not see graphical elements mentioned in this document, such as buttons
in studio or designer, you may not have the right level of permissions to the
workspace. Please contact your Azure subscription administrator to verify that you
have been granted the correct level of access. For more information, see Manage
users and roles.
Note
Pipeline parameters only support basic data types like int , float , and string .
In this example, you create a pipeline parameter that defines how a pipeline fills in
missing data using the Clean missing data component.
1. Next to the name of your pipeline draft, select the gear icon to open the Settings
panel.
After you create a pipeline parameter, you must attach it to the component parameter
that you want to dynamically set.
2. In the component detail pane, mouseover the parameter you want to specify.
6. Select Save
You can now specify new values for this parameter anytime you submit this pipeline.
The following example duplicates the Clean Missing Data component. For each Clean
Missing Data component, attach Replacement value to the pipeline parameter replace-
missing-value:
2. In the component detail pane, to the right of the canvas, set the Cleaning mode to
"Custom substitution value".
You can detach a component parameter from a pipeline parameter by selecting the
ellipses (...) next to the component parameter, and then selecting Detach from
pipeline parameter.
Update and delete pipeline parameters
In this section, you learn how to update and delete pipeline parameters.
This view shows you which components the pipeline parameter is attached to.
1. Go to the pipeline detail page. In the Pipeline job overview window, you can check
the current pipeline parameters and values.
2. Select Resubmit.
Published endpoints are especially useful for retraining and batch prediction scenarios.
For more information, see How to retrain models in the designer or Run batch
predictions in the designer.
Next steps
In this article, you learned how to create pipeline parameters in the designer. Next, see
how you can use pipeline parameters to retrain models or perform batch predictions.
You can also learn how to use pipelines programmatically with the SDK v1.
Use pipeline parameters to retrain
models in the designer
Article • 05/29/2023
In this how-to article, you learn how to use Azure Machine Learning designer to retrain a
machine learning model using pipeline parameters. You will use published pipelines to
automate your workflow and set parameters to train your model on new data. Pipeline
parameters let you re-use existing pipelines for different jobs.
Prerequisites
An Azure Machine Learning workspace
Complete part 1 of this how-to series, Transform data in the designer
Important
If you do not see graphical elements mentioned in this document, such as buttons
in studio or designer, you may not have the right level of permissions to the
workspace. Please contact your Azure subscription administrator to verify that you
have been granted the correct level of access. For more information, see Manage
users and roles.
This article also assumes that you have some knowledge of building pipelines in the
designer. For a guided introduction, complete the tutorial.
Sample pipeline
The pipeline used in this article is an altered version of a sample pipeline Income
prediction in the designer homepage. The pipeline uses the Import Data component
instead of the sample dataset to show you how to train models using your own data.
Create a pipeline parameter
Pipeline parameters are used to build versatile pipelines which can be resubmitted later
with varying parameter values. Some common scenarios are updating datasets or some
hyper-parameters for retraining. Create pipeline parameters to dynamically set variables
at runtime.
For this example, you will change the training data path from a fixed value to a
parameter, so that you can retrain your model on different data. You can also add other
component parameters as pipeline parameters according to your use case.
Note
This example uses the Import Data component to access data in a registered
datastore. However, you can follow similar steps if you use alternative data
access patterns.
2. In the component detail pane, to the right of the canvas, select your data source.
3. Enter the path to your data. You can also select Browse path to browse your file
tree.
4. Mouseover the Path field, and select the ellipses (...) that appear above the field.
7. Select Save.
Note
You can also detach a component parameter from pipeline parameter in the
component detail pane, similar to adding pipeline parameters.
You can inspect and edit your pipeline parameters by selecting the Settings
gear icon next to the title of your pipeline draft.
After detaching, you can delete the pipeline parameter in the Settings
pane.
You can also add a pipeline parameter in the Settings pane, and then
apply it on some component parameter.
3. Select Publish.
To make a REST call, you need an OAuth 2.0 bearer-type authentication header. For
information about setting up authentication to your workspace and making a
parameterized REST call, see Build an Azure Machine Learning pipeline for batch scoring.
Next steps
In this article, you learned how to create a parameterized training pipeline endpoint
using the designer.
For a complete walkthrough of how you can deploy a model to make predictions, see
the designer tutorial to train and deploy a regression model.
For how to publish and submit a job to pipeline endpoint using the SDK v1, see this
article.
Run batch predictions using Azure
Machine Learning designer
Article • 05/29/2023
In this article, you learn how to use the designer to create a batch prediction pipeline.
Batch prediction lets you continuously score large datasets on-demand using a web
service that can be triggered from any HTTP library.
To learn how to set up batch scoring services using the SDK, see the accompanying
tutorial on pipeline batch scoring.
Prerequisites
This how-to assumes you already have a training pipeline. For a guided introduction to
the designer, complete part one of the designer tutorial.
Important
If you do not see graphical elements mentioned in this document, such as buttons
in studio or designer, you may not have the right level of permissions to the
workspace. Please contact your Azure subscription administrator to verify that you
have been granted the correct level of access. For more information, see Manage
users and roles.
2. Select the training pipeline that trains the model you want to use to make
prediction.
3. Submit the pipeline.
You'll see a submission list on the left of the canvas. You can select the job detail link
to go to the job detail page, and after the training pipeline job completes, you can
create a batch inference pipeline.
1. In job detail page, above the canvas, select the dropdown Create inference
pipeline. Select Batch inference pipeline.
Note
It will create a batch inference pipeline draft for you. The batch inference pipeline
draft uses the trained model as the MD- node and the transformation as the TD- node
from the training pipeline job.
You can also modify this inference pipeline draft to better handle your input data
for batch inference.
In this section, you create a dataset parameter to specify a different dataset to make
predictions on.
2. A pane will appear to the right of the canvas. At the bottom of the pane, select Set
as pipeline parameter.
3. Submit the batch inference pipeline and go to job detail page by selecting the job
link in the left pane.
2. In the dialog that appears, expand the drop-down for PipelineEndpoint, and select
New PipelineEndpoint.
Near the bottom of the dialog, you can see the parameter you configured with a
default value of the dataset ID used during training.
4. Select Publish.
Consume an endpoint
Now, you have a published pipeline with a dataset parameter. The pipeline will use the
trained model created in the training pipeline to score the dataset you provide as a
parameter.
This screen shows all pipelines published under this endpoint.
The pipeline details page shows you a detailed job history and connection string
information for your pipeline.
You can find the REST endpoint of a pipeline endpoint in the job overview panel. By
calling the endpoint, you're consuming its default published pipeline.
You can also consume a published pipeline in the Published pipelines page. Select a
published pipeline and you can find the REST endpoint of it in the Published pipeline
overview panel to the right of the graph.
To make a REST call, you'll need an OAuth 2.0 bearer-type authentication header. See
the following tutorial section for more detail on setting up authentication to your
workspace and making a parameterized REST call.
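As a sketch using only the standard library, such a call prepares a JSON body and a bearer-token header; the endpoint URL, token, and parameter names below are placeholders you would replace with values from your own workspace:

```python
import json
from urllib import request

def build_pipeline_request(endpoint_url, bearer_token, pipeline_parameters):
    """Prepare an authenticated POST to a published pipeline endpoint."""
    body = json.dumps({
        "ExperimentName": "batch_scoring",          # experiment that logs the job
        "ParameterAssignments": pipeline_parameters,
    }).encode("utf-8")
    headers = {
        "Authorization": "Bearer " + bearer_token,  # OAuth 2.0 bearer token
        "Content-Type": "application/json",
    }
    return request.Request(endpoint_url, data=body, headers=headers, method="POST")

req = build_pipeline_request(
    "https://example.com/pipelines/score",          # placeholder REST endpoint
    "<token>",
    {"dataset-parameter": "<new-dataset-id>"},
)
# request.urlopen(req) would submit the job; it isn't executed here.
```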
Versioning endpoints
The designer assigns a version to each subsequent pipeline that you publish to an
endpoint. You can specify the pipeline version that you want to execute as a parameter
in your REST call. If you don't specify a version number, the designer will use the default
pipeline.
When you publish a pipeline, you can choose to make it the new default pipeline for
that endpoint.
You can also set a new default pipeline in the Published pipelines tab of your endpoint.
Update pipeline endpoint
If you make some modifications in your training pipeline, you may want to update the
newly trained model to the pipeline endpoint.
1. After your modified training pipeline completes successfully, go to the job detail
page.
3. Find the previous batch inference pipeline draft, or you can just Clone the
published pipeline into a new draft.
4. Replace the MD- node in the inference pipeline draft with the model registered in
the step above.
5. Update the data transformation node TD- in the same way as the trained model.
6. Then you can submit the inference pipeline with the updated model and
transformation, and publish again.
Next steps
Follow the designer tutorial to train and deploy a regression model.
For how to publish and run a published pipeline using the SDK v1, see the How to
deploy pipelines article.
Run Python code in Azure Machine
Learning designer
Article • 05/29/2023
In this article, you'll learn how to use the Execute Python Script component to add
custom logic to the Azure Machine Learning designer. In this how-to, you use the
Pandas library to do simple feature engineering.
You can use the built-in code editor to quickly add simple Python logic. To add more
complex code, or to upload additional Python libraries, you should use the zip file
method.
The default execution environment uses the Anaconda distribution of Python. See the
Execute Python Script component reference page for a complete list of pre-installed
packages.
Important
If you do not see graphical elements mentioned in this document, such as buttons
in studio or designer, you may not have the right level of permissions to the
workspace. Please contact your Azure subscription administrator to verify that you
have been granted the correct level of access. For more information, see Manage
users and roles.
Execute Python written in the designer
2. Connect the output port of the dataset to the top-left input port of the Execute
Python Script component. The designer exposes the input as a parameter to the
entry point script.
3. Carefully note the specific input port you use. The designer assigns the left input
port to the variable dataset1 , and the middle input port to dataset2 .
Input components are optional, since you can generate or import data directly in the
Execute Python Script component.
In this example, you use pandas to combine two of the automobile dataset columns,
Price and Horsepower, to create a new column, Dollars per horsepower. This column
represents how much you pay for each unit of horsepower, which could be a useful
data point for deciding whether a specific car is a good deal for its price.
2. In the pane that appears to the right of the canvas, select the Python script text
box.
3. Copy and paste the following code into the text box:
Python
import pandas as pd
The entry point script must contain the function azureml_main . The function has
two parameters that map to the two input ports for the Execute Python Script
component.
The return value must be a pandas DataFrame. You can return at most two
DataFrames as component outputs.
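Putting those requirements together, a script that creates the Dollars per horsepower column might look like the following sketch; the column names assume the automobile dataset described above, so verify them against your dataset's actual schema:

```python
import pandas as pd

def azureml_main(dataframe1=None, dataframe2=None):
    """Entry point: the designer passes the two input ports as DataFrames."""
    # Combine two columns into a new engineered feature.
    dataframe1["Dollars per horsepower"] = dataframe1["Price"] / dataframe1["Horsepower"]
    return dataframe1,   # up to two DataFrames may be returned

# Local smoke test with a tiny stand-in for the automobile dataset.
sample = pd.DataFrame({"Price": [15000.0, 30000.0], "Horsepower": [100.0, 200.0]})
result, = azureml_main(sample)
print(result["Dollars per horsepower"].tolist())  # [150.0, 150.0]
```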
Next steps
Learn how to import your own data in Azure Machine Learning designer.
Create and run machine learning
pipelines with Azure Machine Learning
SDK
Article • 06/01/2023
In this article, you learn how to create and run machine learning pipelines by using the
Azure Machine Learning SDK. Use ML pipelines to create a workflow that stitches
together various ML phases. Then, publish that pipeline for later access or sharing with
others. Track ML pipelines to see how your model is performing in the real world and to
detect data drift. ML pipelines are ideal for batch scoring scenarios, using various
computes, reusing steps instead of rerunning them, and sharing ML workflows with
others.
This article isn't a tutorial. For guidance on creating your first pipeline, see Tutorial: Build
an Azure Machine Learning pipeline for batch scoring or Use automated ML in an Azure
Machine Learning pipeline in Python.
While you can use a different kind of pipeline called an Azure Pipeline for CI/CD
automation of ML tasks, that type of pipeline isn't stored in your workspace. Compare
these different pipelines.
The ML pipelines you create are visible to the members of your Azure Machine Learning
workspace.
ML pipelines execute on compute targets (see What are compute targets in Azure
Machine Learning). Pipelines can read and write data to and from supported Azure
Storage locations.
If you don't have an Azure subscription, create a free account before you begin. Try the
free or paid version of Azure Machine Learning .
Prerequisites
An Azure Machine Learning workspace. Create workspace resources.
Python
import azureml.core
from azureml.core import Workspace, Datastore
ws = Workspace.from_config()
Next, set up the resources a pipeline needs:
Set up a datastore used to access the data needed in the pipeline steps.
Configure a Dataset object to point to persistent data that lives in, or is accessible
in, a datastore. Configure an OutputFileDatasetConfig object for temporary data
passed between pipeline steps.
Set up the compute targets on which your pipeline steps will run.
Set up a datastore
A datastore stores the data for the pipeline to access. Each workspace has a default
datastore. You can register more datastores.
When you create your workspace, Azure Files and Azure Blob storage are attached to
the workspace. A default datastore is registered to connect to the Azure Blob storage. To
learn more, see Deciding when to use Azure Files, Azure Blobs, or Azure Disks.
Python
# Default datastore
def_data_store = ws.get_default_datastore()
Steps generally consume data and produce output data. A step can create data such as
a model, a directory with model and dependent files, or temporary data. This data is
then available for other steps later in the pipeline. To learn more about connecting your
pipeline to your data, see the articles How to Access Data and How to Register Datasets.
A Dataset can be a FileDataset (which references one or more files) or a
TabularDataset that's created from one or more files with delimited columns of data.
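A sketch of creating both kinds of dataset, assuming the files already exist on the default datastore ( def_data_store from the earlier snippet); the paths here are hypothetical:

```python
from azureml.core import Dataset

# FileDataset: references one or more files on a datastore.
my_dataset = Dataset.File.from_files([(def_data_store, "dataprep/input/")])

# TabularDataset: created from delimited files with columnar data.
iris_tabular_dataset = Dataset.Tabular.from_delimited_files(
    [(def_data_store, "iris.csv")])
```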
Tip
Only upload files relevant to the job at hand. Any change in files within the data
directory will be seen as a reason to rerun the step the next time the pipeline is run,
even if reuse is specified.
Python
from azureml.core.compute import AmlCompute, ComputeTarget

compute_name = "aml-compute"
vm_size = "STANDARD_NC6"
if compute_name in ws.compute_targets:
    compute_target = ws.compute_targets[compute_name]
    if compute_target and type(compute_target) is AmlCompute:
        print('Found compute target: ' + compute_name)
else:
    print('Creating a new compute target...')
    provisioning_config = AmlCompute.provisioning_configuration(
        vm_size=vm_size,  # STANDARD_NC6 is GPU-enabled
        min_nodes=0,
        max_nodes=4)

    # create the compute target
    compute_target = ComputeTarget.create(ws, compute_name, provisioning_config)

    # Can poll for a minimum number of nodes and for a specific timeout.
    # If no min node count is provided, it will use the scale settings for the cluster.
    compute_target.wait_for_completion(
        show_output=True, min_node_count=None, timeout_in_minutes=20)

    # For a more detailed view of current cluster status, use the 'status' property.
    print(compute_target.status.serialize())
Python
from azureml.core import Environment
from azureml.core.runconfig import RunConfiguration

aml_run_config = RunConfiguration()
# `compute_target` as defined in the "Azure Machine Learning compute" section above
aml_run_config.target = compute_target

USE_CURATED_ENV = True
if USE_CURATED_ENV:
    curated_environment = Environment.get(
        workspace=ws, name="AzureML-sklearn-0.24-ubuntu18.04-py37-cpu")
    aml_run_config.environment = curated_environment
else:
    aml_run_config.environment.python.user_managed_dependencies = False
The code above shows two options for handling dependencies. As presented, with
USE_CURATED_ENV = True , the configuration is based on a curated environment. Curated
environments are "prebaked" with common inter-dependent libraries and can be faster
to bring online. Curated environments have prebuilt Docker images in the Microsoft
Container Registry . For more information, see Azure Machine Learning curated
environments.
The path taken if you change USE_CURATED_ENV to False shows the pattern for explicitly
setting your dependencies. In that scenario, a new custom Docker image will be created
and registered in an Azure Container Registry within your resource group (see
Introduction to private Docker container registries in Azure). Building and registering
this image can take quite a few minutes.
A typical initial pipeline step runs your data preparation code from a subdirectory (in
this example, "prepare.py" in the directory "./dataprep.src" ). As part of the pipeline
creation process, this directory is zipped and uploaded to the compute_target , and the
step runs the script specified as the value for script_name .
The arguments values specify the inputs and outputs of the step. In the example above,
the baseline data is the my_dataset dataset. The corresponding data will be downloaded
to the compute resource, since the code specifies it as as_download() . The script
prepare.py does whatever data-transformation tasks are appropriate to the task at hand.
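Based on that description, the data preparation step might look like the following sketch. Here output_data1 , the argument names, and the destination path are assumptions, while my_dataset , compute_target , aml_run_config , and def_data_store come from the earlier snippets:

```python
from azureml.data import OutputFileDatasetConfig
from azureml.pipeline.steps import PythonScriptStep

# Temporary output that later steps can consume; the destination path is illustrative.
output_data1 = OutputFileDatasetConfig(destination=(def_data_store, "prepped/"))

data_prep_step = PythonScriptStep(
    script_name="prepare.py",
    source_directory="./dataprep.src",
    arguments=["--input", my_dataset.as_named_input("raw_data").as_download(),
               "--output", output_data1],
    compute_target=compute_target,
    runconfig=aml_run_config,
    allow_reuse=True,
)
```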
It's possible to create a pipeline with a single step, but almost always you'll choose to
split your overall process into several steps. For instance, you might have steps for data
preparation, training, model comparison, and deployment. After the data_prep_step
specified above, the next step might be training:
Python
from azureml.pipeline.steps import PythonScriptStep

train_source_dir = "./train_src"
train_entry_point = "train.py"

train_step = PythonScriptStep(
    script_name=train_entry_point,
    source_directory=train_source_dir,
    arguments=["--prepped_data", output_data1.as_input(),
               "--training_results", training_results],
    compute_target=compute_target,
    runconfig=aml_run_config,
    allow_reuse=True
)
The above code is similar to the code in the data preparation step. The training code is
in a directory separate from that of the data preparation code. The
OutputFileDatasetConfig output of the data preparation step, output_data1 , is used as
the input to the training step. A new OutputFileDatasetConfig object, training_results ,
is created to hold the results for a later comparison or deployment step.
For other code examples, see how to build a two step ML pipeline and how to write
data back to datastores upon run completion .
After you define your steps, you build the pipeline by using some or all of those steps.
Note
No file or data is uploaded to Azure Machine Learning when you define the steps
or build the pipeline. The files are uploaded when you call Experiment.submit().
Python
Use a dataset
Datasets created from Azure Blob storage, Azure Files, Azure Data Lake Storage Gen1,
Azure Data Lake Storage Gen2, Azure SQL Database, and Azure Database for
PostgreSQL can be used as input to any pipeline step. You can write output to a
DataTransferStep or a DatabricksStep, or, if you want to write data to a specific
datastore, use OutputFileDatasetConfig.
Important
Python
dataset_consuming_step = PythonScriptStep(
script_name="iris_train.py",
inputs=[iris_tabular_dataset.as_named_input("iris_data")],
compute_target=compute_target,
source_directory=project_folder
)
You then retrieve the dataset in your pipeline by using the Run.input_datasets dictionary.
Python
# iris_train.py
from azureml.core import Run, Dataset
run_context = Run.get_context()
iris_dataset = run_context.input_datasets['iris_data']
dataframe = iris_dataset.to_pandas_dataframe()
Python
# Within a PythonScriptStep
ws = Run.get_context().experiment.workspace
For more detail, including alternate ways to pass and access data, see Moving data into
and between ML pipeline steps (Python).
Turn off the default reuse of the step run output by setting allow_reuse=False
during step definition. Reuse is key when using pipelines in a collaborative
environment since eliminating unnecessary runs offers agility. However, you can
opt out of reuse.
Force output regeneration for all steps in a run with pipeline_run =
exp.submit(pipeline, regenerate_outputs=True)
By default, allow_reuse for steps is enabled, and the source_directory specified in the
step definition is hashed. So, if the script for a given step ( script_name , inputs, and
parameters) remains the same, and nothing else in the source_directory has changed,
the output of a previous step run is reused. The job isn't submitted to the compute, and
the results from the previous run are immediately available to the next step instead.
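A rough sketch of this content-based reuse check is below. The directory_fingerprint helper is hypothetical; the actual hashing Azure ML performs is internal and more involved, but the principle is the same: identical content means the cached output can be reused.

```python
import hashlib
import os
import tempfile

def directory_fingerprint(path):
    """Hash file names and contents under `path` -- a simplified stand-in
    for the snapshot hash used to decide whether a step run can be reused."""
    digest = hashlib.sha256()
    for root, _, files in sorted(os.walk(path)):
        for name in sorted(files):
            full = os.path.join(root, name)
            digest.update(os.path.relpath(full, path).encode())
            with open(full, "rb") as f:
                digest.update(f.read())
    return digest.hexdigest()

src = tempfile.mkdtemp()
with open(os.path.join(src, "train.py"), "w") as f:
    f.write("print('v1')\n")

before = directory_fingerprint(src)
# An unchanged directory hashes identically, so the step could be reused...
assert directory_fingerprint(src) == before

# ...while editing the script changes the hash and would force a rerun.
with open(os.path.join(src, "train.py"), "w") as f:
    f.write("print('v2')\n")
assert directory_fingerprint(src) != before
```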
Python
Note
If the names of the data inputs change, the step will rerun, even if the underlying
data does not change. You must explicitly set the name field of input data
( data.as_input(name=...) ). If you don't explicitly set this value, the name field is
set to a random GUID and the step's results won't be reused.
Important
To prevent unnecessary files from being included in the snapshot, make an ignore
file ( .gitignore or .amlignore ) in the directory. Add the files and directories to
exclude to this file. For more information on the syntax to use inside this file, see
syntax and patterns for .gitignore . The .amlignore file uses the same syntax. If
both files exist, the .amlignore file is used and the .gitignore file is unused.
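The effect of such an ignore file can be sketched with the stdlib fnmatch module. This is a simplification: real .gitignore / .amlignore syntax also supports negation, anchoring, and directory-only patterns, and the patterns and file names below are hypothetical.

```python
import fnmatch

# Hypothetical patterns as they might appear in an .amlignore file.
ignore_patterns = ["*.log", "data/*", "__pycache__/*"]

def included_in_snapshot(path, patterns=ignore_patterns):
    """Return True if `path` survives the ignore filter (simplified)."""
    return not any(fnmatch.fnmatch(path, p) for p in patterns)

files = ["train.py", "debug.log", "data/raw.csv", "utils/helpers.py"]
snapshot = [f for f in files if included_in_snapshot(f)]
print(snapshot)  # ['train.py', 'utils/helpers.py']
```

Only the files that pass the filter end up in the uploaded snapshot, which keeps uploads fast and avoids leaking local artifacts.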
Python
Downloads the project snapshot to the compute target from the Blob storage
associated with the workspace.
Downloads the Docker image for each step to the compute target from the
container registry.
If mounting isn't supported, or if the user specified access as as_upload() , the data
is instead copied to the compute target.
Runs the step in the compute target specified in the step definition.
Creates artifacts, such as logs, stdout and stderr, metrics, and output specified by
the step. These artifacts are then uploaded and kept in the user's default datastore.
Python
pipeline_param = PipelineParameter(name="pipeline_arg",
                                   default_value="default_val")
train_step = PythonScriptStep(script_name="train.py",
                              arguments=["--param1", pipeline_param],
                              compute_target=compute_target,
                              source_directory=project_folder)
Python
aml_run_config = RunConfiguration()
aml_run_config.environment.name = 'MyEnvironment'
aml_run_config.environment.version = '1.0'
Next steps
To share your pipeline with colleagues or customers, see Publish machine learning
pipelines
Use these Jupyter notebooks on GitHub to explore machine learning pipelines
further
See the SDK reference help for the azureml-pipelines-core package and the
azureml-pipelines-steps package
See the how-to for tips on debugging and troubleshooting pipelines
Learn how to run notebooks by following the article Use Jupyter notebooks to
explore this service.
Moving data into and between ML
pipeline steps (Python)
Article • 03/02/2023
This article provides code for importing, transforming, and moving data between steps
in an Azure Machine Learning pipeline. For an overview of how data works in Azure
Machine Learning, see Access data in Azure storage services. For the benefits and
structure of Azure Machine Learning pipelines, see What are Azure Machine Learning
pipelines?
Prerequisites
You'll need:
The Azure Machine Learning SDK for Python, or access to Azure Machine Learning
studio .
Either create an Azure Machine Learning workspace or use an existing one via the
Python SDK. Import the Workspace and Datastore class, and load your subscription
information from the file config.json using the function from_config() . This
function looks for the JSON file in the current directory by default, but you can also
specify a path parameter to point to the file using
from_config(path="your/file/path") .
Python
import azureml.core
from azureml.core import Workspace, Datastore
ws = Workspace.from_config()
Some pre-existing data. This article briefly shows the use of an Azure blob
container.
There are many ways to create and register Dataset objects. Tabular datasets are for
delimited data available in one or more files. File datasets are for binary data (such as
images) or for data that you'll parse. The simplest programmatic ways to create Dataset
objects are to use existing blobs in workspace storage or public URLs:
Python
from azureml.core import Dataset
from azureml.data.datapath import DataPath

datastore_path = [
    DataPath(datastore, 'animals/dog/1.jpg'),
    DataPath(datastore, 'animals/dog/2.jpg'),
    DataPath(datastore, 'animals/cat/*.jpg')
]
cats_dogs_dataset = Dataset.File.from_files(path=datastore_path)
For more options on creating datasets with different options and from different sources,
registering them and reviewing them in the Azure Machine Learning UI, understanding
how data size interacts with compute capacity, and versioning them, see Create Azure
Machine Learning datasets.
Once you've created a named input, you can choose its access mode: as_mount() or
as_download() . If your script processes all the files in your dataset and the disk on your
compute resource is large enough for the dataset, the download access mode is the
better choice. The download access mode will avoid the overhead of streaming the data
at runtime. If your script accesses a subset of the dataset or it's too large for your
compute, use the mount access mode. For more information, read Mount vs. Download.
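That decision can be sketched as a small helper. The threshold here is a hypothetical heuristic, and Azure ML doesn't make this choice for you; it only illustrates the trade-off described above.

```python
import shutil

def choose_access_mode(dataset_size_bytes, workdir="."):
    """Pick as_download() when the dataset fits on local disk with headroom;
    otherwise fall back to as_mount() streaming (simplified heuristic)."""
    free = shutil.disk_usage(workdir).free
    return "download" if dataset_size_bytes < free * 0.5 else "mount"

print(choose_access_mode(10 * 1024**2))  # a small dataset: download
print(choose_access_mode(10 * 1024**5))  # petabyte-scale data: mount
```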
The following snippet shows the common pattern of combining these steps within the
PythonScriptStep constructor:
Python
train_step = PythonScriptStep(
name="train_data",
script_name="train.py",
compute_target=cluster,
inputs=[iris_dataset.as_named_input('iris').as_mount()]
)
Note
You would need to replace the values for all these arguments (that is, "train_data" ,
"train.py" , cluster , and iris_dataset ) with your own data. The above snippet
just shows the form of the call and is not part of a Microsoft sample.
You can also use methods such as random_split() and take_sample() to create multiple
inputs or reduce the amount of data passed to your pipeline step:
Python
seed = 42 # PRNG seed
smaller_dataset = iris_dataset.take_sample(0.1, seed=seed) # 10%
train, test = smaller_dataset.random_split(percentage=0.8, seed=seed)
train_step = PythonScriptStep(
name="train_data",
script_name="train.py",
compute_target=cluster,
inputs=[train.as_named_input('train').as_download(),
test.as_named_input('test').as_download()]
)
Python
# In pipeline script
parser = argparse.ArgumentParser()
parser.add_argument('--training-folder', type=str, dest='train_folder',
help='training data folder mounting point')
args = parser.parse_args()
training_data_folder = args.train_folder
testing_data_folder = Run.get_context().input_datasets['test']
Python
run = Run.get_context()
ws = run.experiment.workspace
ds = Dataset.get_by_name(workspace=ws, name='mnist_opendataset')
Note
The preceding snippets show the form of the calls and are not part of a Microsoft
sample. You must replace the various arguments with values from your own project.
adlsgen2. It supports both mount mode and upload mode. In mount mode, files written
to the mounted directory are permanently stored when the file is closed. In upload
mode, files written to the output directory are uploaded at the end of the job. If the job
fails or is canceled, the output directory will not be uploaded.
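Upload mode's "copy only on success" behavior can be sketched with a staging directory. This is a simplified stand-in for what the service does, with hypothetical step functions; the point is that a failed step leaves the destination untouched.

```python
import os
import shutil
import tempfile

def run_step_with_upload(step_fn, destination):
    """Write outputs to a staging dir; copy to `destination` only if the
    step succeeds, mimicking as_upload() semantics (simplified)."""
    staging = tempfile.mkdtemp()
    try:
        step_fn(staging)
    except Exception:
        shutil.rmtree(staging)
        raise  # a failed job uploads nothing to the destination
    for name in os.listdir(staging):
        shutil.copy(os.path.join(staging, name), destination)
    shutil.rmtree(staging)

dest = tempfile.mkdtemp()

def good_step(out_dir):
    with open(os.path.join(out_dir, "result.txt"), "w") as f:
        f.write("ok")

run_step_with_upload(good_step, dest)
print(os.listdir(dest))  # ['result.txt']
```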
Python
dataprep_step = PythonScriptStep(
name="prep_data",
script_name="dataprep.py",
compute_target=cluster,
arguments=[input_dataset.as_named_input('raw_data').as_mount(),
dataprep_output]
)
Note
Python
parser = argparse.ArgumentParser()
parser.add_argument('--output_path', dest='output_path', required=True)
args = parser.parse_args()
written to the ADLS Gen 2 datastore, my_adlsgen2 in upload access mode. Learn
more about how to set up role permissions in order to write data back to ADLS
Gen 2 datastores.
After step1 completes and the output is written to the destination indicated by
step1_output_data , then step2 is ready to use step1_output_data as an input.
Python
# get adls gen 2 datastore already registered with the workspace
datastore = workspace.datastores['my_adlsgen2']
step1_output_data = OutputFileDatasetConfig(name="processed_data",
destination=(datastore, "mypath/{run-id}/{output-name}")).as_upload()
step1 = PythonScriptStep(
name="generate_data",
script_name="step1.py",
runconfig = aml_run_config,
arguments = ["--output_path", step1_output_data]
)
step2 = PythonScriptStep(
    name="read_pipeline_data",
    script_name="step2.py",
    compute_target=compute,
    runconfig=aml_run_config,
    arguments=["--pd", step1_output_data.as_input()]
)
Python
step1_output_ds = step1_output_data.register_on_complete(
name='processed_data',
description = 'files from step1'
)
Only delete intermediate data 30 days after the last change date of the data.
Deleting the data earlier could cause a pipeline run to fail, because the pipeline
assumes the intermediate data exists within the 30-day period for reuse.
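A conservative cleanup sketch that honors that 30-day window (the file path here is a hypothetical stand-in for an intermediate-data blob):

```python
import os
import tempfile
import time

RETENTION_SECONDS = 30 * 24 * 3600  # the 30-day reuse window

def safe_to_delete(path, now=None):
    """True only when the intermediate file hasn't changed in 30 days,
    so deleting it can't break reuse of recent pipeline runs."""
    now = time.time() if now is None else now
    return (now - os.path.getmtime(path)) > RETENTION_SECONDS

intermediate = tempfile.NamedTemporaryFile(delete=False)
intermediate.close()
print(safe_to_delete(intermediate.name))  # just created: False
# Pretend 31 days have passed:
print(safe_to_delete(intermediate.name, now=time.time() + 31 * 24 * 3600))  # True
```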
Python
For more information, see Plan and manage costs for Azure Machine Learning.
Next steps
Create an Azure machine learning dataset
Create and run machine learning pipelines with Azure Machine Learning SDK
Use automated ML in an Azure Machine
Learning pipeline in Python
Article • 06/01/2023
Prerequisites
An Azure subscription. If you don't have an Azure subscription, create a free
account before you begin. Try the free or paid version of Azure Machine
Learning today.
There are several subclasses of PipelineStep . In addition to the AutoMLStep , this article
will show a PythonScriptStep for data preparation and another for registering the
model.
The preferred way to initially move data into an ML pipeline is with Dataset objects. To
move data between steps and possible save data output from runs, the preferred way is
with OutputFileDatasetConfig and OutputTabularDatasetConfig objects. To be used with
AutoMLStep , the PipelineData object must be transformed into a
PipelineOutputTabularDataset object. For more information, see Input and output data
from ML pipelines.
A Pipeline runs in an Experiment . The pipeline Run has, for each step, a child StepRun .
The outputs of the automated ML StepRun are the training metrics and highest-
performing model.
To make things concrete, this article creates a simple pipeline for a classification task.
The task is predicting Titanic survival, but we won't be discussing the data or task except
in passing.
Get started
Python
ws = Workspace.from_config()
if 'titanic_ds' not in ws.datasets.keys():
# create a TabularDataset from Titanic training data
web_paths = ['https://dprepdata.blob.core.windows.net/demo/Titanic.csv',
'https://dprepdata.blob.core.windows.net/demo/Titanic2.csv']
titanic_ds = Dataset.Tabular.from_delimited_files(path=web_paths)
titanic_ds.register(workspace = ws,
name = 'titanic_ds',
description = 'Titanic baseline data',
create_new_version = True)
Python
datastore = ws.get_default_datastore()
compute_name = 'cpu-cluster'
if compute_name not in ws.compute_targets:
    print('creating a new compute target...')
    provisioning_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',
                                                                min_nodes=0,
                                                                max_nodes=1)
    compute_target = ComputeTarget.create(ws, compute_name, provisioning_config)
    compute_target.wait_for_completion(
        show_output=True, min_node_count=None, timeout_in_minutes=20)
compute_target = ws.compute_targets[compute_name]
Note
You may choose to use low-priority VMs to run some or all of your workloads. See
how to create a low-priority VM.
The intermediate data between the data preparation and the automated ML step can be
stored in the workspace's default datastore, so we don't need to do more than call
get_default_datastore() on the Workspace object.
After that, the code checks if the Azure Machine Learning compute target 'cpu-
cluster' already exists. If not, we specify that we want a small CPU-based compute
target. If you plan to use automated ML's deep learning features (for instance, text
featurization with DNN support) you should choose a compute with strong GPU
support, as described in GPU optimized virtual machine sizes.
The code blocks until the target is provisioned and then prints some details of the just-
created compute target. Finally, the named compute target is retrieved from the
workspace and assigned to compute_target .
Python
aml_run_config = RunConfiguration()
# Use just-specified compute target ("cpu-cluster")
aml_run_config.target = compute_target
Python
%%writefile dataprep.py
from azureml.core import Run

import argparse
import os

import numpy as np
import pandas as pd

RANDOM_SEED = 42

def prepare_age(df):
    # Fill in missing Age values from distribution of present Age values
    mean = df["Age"].mean()
    std = df["Age"].std()
    is_null = df["Age"].isnull().sum()
    # compute enough (== is_null) random integers between mean - std and mean + std
    rand_age = np.random.randint(mean - std, mean + std, size=is_null)
    # fill NaN values in Age column with the random values generated
    age_slice = df["Age"].copy()
    age_slice[np.isnan(age_slice)] = rand_age
    df["Age"] = age_slice
    df["Age"] = df["Age"].astype(int)
    return df
def prepare_fare(df):
df['Fare'].fillna(0, inplace=True)
df['Fare_Group'] = pd.qcut(df['Fare'],5,labels=False)
df.drop(['Fare'], axis=1, inplace=True)
return df
def prepare_genders(df):
genders = {"male": 0, "female": 1, "unknown": 2}
df['Sex'] = df['Sex'].map(genders)
df['Sex'].fillna(2, inplace=True)
df['Sex'] = df['Sex'].astype(int)
return df
def prepare_embarked(df):
df['Embarked'].replace('', 'U', inplace=True)
df['Embarked'].fillna('U', inplace=True)
ports = {"S": 0, "C": 1, "Q": 2, "U": 3}
df['Embarked'] = df['Embarked'].map(ports)
return df
parser = argparse.ArgumentParser()
parser.add_argument('--output_path', dest='output_path', required=True)
args = parser.parse_args()
titanic_ds = Run.get_context().input_datasets['titanic_ds']
df = titanic_ds.to_pandas_dataframe().drop(['PassengerId', 'Name', 'Ticket',
'Cabin'], axis=1)
df = prepare_embarked(prepare_genders(prepare_fare(prepare_age(df))))
df.to_csv(os.path.join(args.output_path,"prepped_data.csv"))
The above code snippet is a complete, but minimal, example of data preparation for the
Titanic data. The snippet starts with a Jupyter "magic command" to output the code to a
file. If you aren't using a Jupyter notebook, remove that line and create the file manually.
The various prepare_ functions in the above snippet modify the relevant column in the
input dataset. These functions work on the data once it has been changed into a Pandas
DataFrame object. In each case, missing data is either filled with representative random
data or categorical data indicating "Unknown." Text-based categorical data is mapped to
integers. No-longer-needed columns are overwritten or dropped.
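The same fill-and-map pattern can be shown without pandas for a single categorical column. The column values below are illustrative only; the logic mirrors prepare_genders() above.

```python
# A hypothetical 'Sex' column with a missing value, as in the Titanic data.
sex_column = ["male", "female", None, "female", "male"]
genders = {"male": 0, "female": 1, "unknown": 2}

# Map text categories to integers; anything missing or unrecognized becomes
# the 'unknown' code, mirroring prepare_genders() above.
encoded = [genders.get(value, genders["unknown"]) for value in sex_column]
print(encoded)  # [0, 1, 2, 1, 0]
```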
After the code defines the data preparation functions, the code parses the input
argument, which is the path to which we want to write our data. (These values will be
determined by OutputFileDatasetConfig objects that will be discussed in the next step.)
The code retrieves the registered 'titanic_ds' Dataset , converts it to a Pandas
DataFrame , and calls the various data preparation functions.
Since the output_path is a directory, the call to to_csv() specifies the filename
prepped_data.csv .
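That "the output path is a directory, so name the file yourself" pattern can be sketched with the stdlib csv module. The temporary directory below stands in for the --output_path value that Azure ML supplies at run time.

```python
import csv
import os
import tempfile

# output_path would be supplied by Azure ML via --output_path; a temporary
# directory is used here as a stand-in.
output_path = tempfile.mkdtemp()

rows = [{"Age": 22, "Sex": 0}, {"Age": 38, "Sex": 1}]
target = os.path.join(output_path, "prepped_data.csv")
with open(target, "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["Age", "Sex"])
    writer.writeheader()
    writer.writerows(rows)

print(os.listdir(output_path))  # ['prepped_data.csv']
```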
Python
dataprep_step = PythonScriptStep(
name="dataprep",
script_name="dataprep.py",
compute_target=compute_target,
runconfig=aml_run_config,
arguments=["--output_path", prepped_data_path],
inputs=[titanic_ds.as_named_input('titanic_ds')],
allow_reuse=True
)
prepped_data_path object.
Python
# type(prepped_data) == OutputTabularDatasetConfig
prepped_data = prepped_data_path.read_delimited_files()
Python
metrics_data = PipelineData(name='metrics_data',
datastore=datastore,
pipeline_output_name='metrics_output',
training_output=TrainingOutput(type='Metrics'))
model_data = PipelineData(name='best_model_data',
datastore=datastore,
pipeline_output_name='model_output',
training_output=TrainingOutput(type='Model'))
The snippet above creates the two PipelineData objects for the metrics and model
output. Each is named, assigned to the default datastore retrieved earlier, and
associated with the particular type of TrainingOutput from the AutoMLStep . Because we
assign pipeline_output_name on these PipelineData objects, their values will be
available not just from the individual pipeline step, but from the pipeline as a whole, as
will be discussed below in the section "Examine pipeline results."
Python
train_step = AutoMLStep(name='AutoML_Classification',
automl_config=automl_config,
passthru_automl_config=False,
outputs=[metrics_data,model_data],
enable_default_model_output=False,
enable_default_metrics_output=False,
allow_reuse=True)
The snippet shows an idiom commonly used with AutoMLConfig . Arguments that are
more fluid (hyperparameter-ish) are specified in a separate dictionary while the values
less likely to change are specified directly in the AutoMLConfig constructor. In this case,
the automl_settings specify a brief run: the run will stop after only 2 iterations or 15
minutes, whichever comes first.
The automl_settings dictionary is passed to the AutoMLConfig constructor as kwargs.
The other parameters aren't complex:
task is set to classification for this example. Other valid values are regression
and forecasting
path and debug_log describe the path to the project and a local file to which
The AutoMLStep itself takes the AutoMLConfig and has, as outputs, the PipelineData
objects created to hold the metrics and model data.
Important
data preparation steps, you can set validation_data to its own Dataset .
You might occasionally see X used for data features and y for data labels. This
technique is deprecated; use training_data for input instead.
there's another way to get the workspace from within a running ML pipeline. The
Run.get_context() retrieves the active Run . This run object provides access to many
important objects, including the Workspace used here.
Python
%%writefile register_model.py
from azureml.core.model import Model
from azureml.core.run import Run, _OfflineRun
from azureml.core import Workspace
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--model_name", required=True)
parser.add_argument("--model_path", required=True)
args = parser.parse_args()

print(f"model_name : {args.model_name}")
print(f"model_path: {args.model_path}")

run = Run.get_context()
ws = Workspace.from_config() if type(run) == _OfflineRun else run.experiment.workspace

model = Model.register(workspace=ws,
                       model_path=args.model_path,
                       model_name=args.model_name)
Warning
If you are using the Azure Machine Learning SDK v1, and your workspace is
configured for network isolation (VNet), you may receive an error when running
this step. For more information, see HyperdriveStep and AutoMLStep fail with
network isolation.
The model-registering PythonScriptStep uses a PipelineParameter for one of its
arguments. Pipeline parameters are arguments to pipelines that can be easily set at run-
submission time. Once declared, they're passed as normal arguments.
Python
# The model name with which to register the trained model in the workspace.
model_name = PipelineParameter("model_name",
default_value="TitanicSurvivalInitial")
register_step = PythonScriptStep(script_name="register_model.py",
name="register_model",
allow_reuse=False,
arguments=["--model_name",
model_name, "--model_path", model_data],
inputs=[model_data],
compute_target=compute_target,
runconfig=aml_run_config)
Python
The code above combines the data preparation, automated ML, and model-registering
steps into a Pipeline object. It then creates an Experiment object. The Experiment
constructor will retrieve the named experiment if it exists or create it if necessary. It
submits the Pipeline to the Experiment , creating a Run object that will asynchronously
run the pipeline. The wait_for_completion() function blocks until the run completes.
Examine pipeline results
Once the run completes, you can retrieve PipelineData objects that have been
assigned a pipeline_output_name . You can download the results and load them for
further processing.
Python
metrics_output_port = run.get_pipeline_output('metrics_output')
model_output_port = run.get_pipeline_output('model_output')
metrics_output_port.download('.', show_progress=True)
model_output_port.download('.', show_progress=True)
Downloaded files are written to the subdirectory azureml/{run.id}/ . The metrics file is
JSON-formatted and can be converted into a Pandas dataframe for examination.
For local processing, you may need to install relevant packages, such as Pandas, Pickle,
the Azure Machine Learning SDK, and so forth. For this example, it's likely that the best
model found by automated ML will depend on XGBoost.
Bash
Python
import pandas as pd
import json
metrics_filename = metrics_output._path_on_datastore
# metrics_filename = path to downloaded file
with open(metrics_filename) as f:
metrics_output_result = f.read()
deserialized_metrics_output = json.loads(metrics_output_result)
df = pd.DataFrame(deserialized_metrics_output)
df
The code snippet above shows the metrics file being loaded from its location on the
Azure datastore. You can also load it from the downloaded file, as shown in the
comment. Once you've deserialized it and converted it to a Pandas DataFrame, you can
see detailed metrics for each of the iterations of the automated ML step.
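Once deserialized, picking out the best iteration is ordinary dictionary work. The metrics below are made up and the schema is only loosely modeled on the structure described above, so treat this as a sketch rather than the file's exact format.

```python
import json

# Hypothetical metrics JSON, keyed by metric name and then iteration number.
metrics_json = json.dumps({
    "AUC_weighted": {"0": 0.81, "1": 0.87, "2": 0.84},
    "accuracy": {"0": 0.78, "1": 0.85, "2": 0.80},
})

metrics = json.loads(metrics_json)
# Find the iteration with the highest AUC_weighted score.
best_iteration = max(metrics["AUC_weighted"], key=metrics["AUC_weighted"].get)
print(best_iteration)  # '1'
```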
The model file can be deserialized into a Model object that you can use for inferencing,
further metrics analysis, and so forth.
Python
import pickle
model_filename = model_output._path_on_datastore
# model_filename = path to downloaded file
For more information on loading and working with existing models, see Use an existing
model with Azure Machine Learning.
The workspace contains a complete record of all your experiments and runs. You can
either use the portal to find and download the outputs of experiments or use code. To
access the records from a historic run, use Azure Machine Learning to find the ID of the
run in which you are interested. With that ID, you can choose the specific run by way of
the Workspace and Experiment .
Python
You would have to change the strings in the above code to the specifics of your
historical run. The snippet above assumes that you've assigned ws to the relevant
Workspace with the normal from_config() . The experiment of interest is directly
retrieved and then the code finds the Run of interest by matching the run.id value.
Once you have a Run object, you can download the metrics and model.
Python
metrics.get_port_data_reference().download('.')
model.get_port_data_reference().download('.')
Each Run object contains StepRun objects that contain information about the individual
pipeline step run. The run is searched for the StepRun object for the AutoMLStep . The
metrics and model are retrieved using their default names, which are available even if
you don't pass PipelineData objects to the outputs parameter of the AutoMLStep .
Finally, the actual metrics and model are downloaded to your local machine, as was
discussed in the "Examine pipeline results" section above.
Next Steps
Run this Jupyter notebook showing a complete example of automated ML in a
pipeline that uses regression to predict taxi fares
Create automated ML experiments without writing code
Explore a variety of Jupyter notebooks demonstrating automated ML
Read about integrating your pipeline in to End-to-end MLOps or investigate the
MLOps GitHub repository
Publish and track machine learning
pipelines
Article • 06/01/2023
This article will show you how to share a machine learning pipeline with your colleagues
or customers.
Machine learning pipelines are reusable workflows for machine learning tasks. One
benefit of pipelines is increased collaboration. You can also version pipelines, allowing
customers to use the current model while you're working on a new version.
Prerequisites
Create an Azure Machine Learning workspace to hold all your pipeline resources
Create and run a machine learning pipeline, such as by following Tutorial: Build an
Azure Machine Learning pipeline for batch scoring. For other options, see Create
and run machine learning pipelines with Azure Machine Learning SDK
Publish a pipeline
Once you have a pipeline up and running, you can publish a pipeline so that it runs with
different inputs. For the REST endpoint of an already published pipeline to accept
parameters, you must configure your pipeline to use PipelineParameter objects for the
arguments that will vary.
Python
pipeline_param = PipelineParameter(
name="pipeline_arg",
default_value=10)
2. Add this PipelineParameter object as a parameter to any of the steps in the
pipeline as follows:
Python
compareStep = PythonScriptStep(
    script_name="compare.py",
    arguments=["--comp_data1", comp_data1, "--comp_data2", comp_data2,
               "--output_data", out_data3, "--param1", pipeline_param],
    inputs=[comp_data1, comp_data2],
    outputs=[out_data3],
    compute_target=compute_target,
    source_directory=project_folder)
Python
published_pipeline1 = pipeline_run1.publish_pipeline(
name="My_Published_Pipeline",
description="My Published Pipeline Description",
version="1.0")
4. After you publish your pipeline, you can check it in the UI. The pipeline ID is the
unique identifier of the published pipeline.
If you are using Azure role-based access control (Azure RBAC) to manage access to
your pipeline, set the permissions for your pipeline scenario (training or scoring).
To invoke the run of the preceding pipeline, you need an Azure Active Directory
authentication header token. Getting such a token is described in the
AzureCliAuthentication class reference and in the Authentication in Azure Machine
Learning notebook.
Python
response = requests.post(published_pipeline1.endpoint,
headers=aad_token,
json={"ExperimentName": "My_Pipeline",
"ParameterAssignments": {"pipeline_arg":
20}})
The json argument to the POST request must contain, for the ParameterAssignments
key, a dictionary containing the pipeline parameters and their values. In addition, the
json argument may contain the following keys:
Key Description
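The request body is plain JSON, so it can be assembled with the stdlib before posting. The endpoint and token in the commented line are placeholders from the snippet above; the Description key is one of the optional keys mentioned here.

```python
import json

# Placeholder values; in practice these come from the published pipeline
# object and your Azure AD authentication flow.
payload = {
    "ExperimentName": "My_Pipeline",
    "ParameterAssignments": {"pipeline_arg": 20},
    "Description": "Run submitted over REST",
}

body = json.dumps(payload)
print(body)
# requests.post(published_pipeline1.endpoint, headers=aad_token, data=body)
```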
[DataContract]
public class SubmitPipelineRunRequest
{
[DataMember]
public string ExperimentName { get; set; }
[DataMember]
public string Description { get; set; }
[DataMember(IsRequired = false)]
public IDictionary<string, string> ParameterAssignments { get; set; }
}
The use of flatMap shortens and clarifies the code, but note that getRequestBody()
swallows exceptions.
Java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Optional;
// JSON library
import com.google.gson.Gson;
HttpRequest tokenAuthenticationRequest =
tokenAuthenticationRequest(tenantId, clientId, clientSecret,
resourceManagerUrl);
Optional<String> authBody = getRequestBody(client,
tokenAuthenticationRequest);
Optional<String> authKey = authBody.flatMap(body ->
    Optional.of(gson.fromJson(body, AuthenticationBody.class).access_token));
Optional<HttpRequest> scoringRequest = authKey.flatMap(key ->
Optional.of(scoringRequest(key, scoringUri, dataToBeScored)));
Optional<String> scoringResult = scoringRequest.flatMap(req ->
getRequestBody(client, req));
// ... etc (`scoringResult.orElse()`) ...
String bodyString =
String.format("grant_type=client_credentials&%s&%s&%s", clientIdParam,
resourceParam, clientSecretParam);
class AuthenticationBody {
String access_token;
String token_type;
int expires_in;
String scope;
String refresh_token;
String id_token;
AuthenticationBody() {}
}
Python
tabular_dataset = Dataset.Tabular.from_delimited_files(
    'https://dprepdata.blob.core.windows.net/demo/Titanic.csv')
tabular_pipeline_param = PipelineParameter(name="tabular_ds_param",
                                           default_value=tabular_dataset)
tabular_ds_consumption = DatasetConsumptionConfig("tabular_dataset",
                                                  tabular_pipeline_param)
Python
input_tabular_ds = Run.get_context().input_datasets['tabular_dataset']
dataframe = input_tabular_ds.to_pandas_dataframe()
# ... etc ...
Notice that the ML script accesses the value specified for the
DatasetConsumptionConfig ( tabular_dataset ) and not the value of the
PipelineParameter ( tabular_ds_param ).
train_step = PythonScriptStep(
name="train_step",
script_name="train_with_dataset.py",
arguments=["--param1", tabular_ds_consumption],
inputs=[tabular_ds_consumption],
compute_target=compute_target,
source_directory=source_directory)
Python
tabular_ds1 = Dataset.Tabular.from_delimited_files('path_to_training_dataset')
tabular_ds2 = Dataset.Tabular.from_delimited_files('path_to_inference_dataset')
ds1_id = tabular_ds1.id
ds2_id = tabular_ds2.id
response = requests.post(rest_endpoint,
                         headers=aad_token,
                         json={
                             "ExperimentName": "MyRestPipeline",
                             "DataSetDefinitionValueAssignments": {
                                 "tabular_ds_param": {
                                     "SavedDataSetReference": {"Id": ds1_id}  # or ds2_id
                                 }}})
Python
published_pipeline = PublishedPipeline.get(workspace=ws,
id="My_Published_Pipeline_id")
pipeline_endpoint = PipelineEndpoint.publish(workspace=ws,
name="PipelineEndpointTest",
pipeline=published_pipeline,
description="Test description Notebook")
Python
pipeline_endpoint_by_name = PipelineEndpoint.get(workspace=ws,
name="PipelineEndpointTest")
run_id = pipeline_endpoint_by_name.submit("PipelineEndpointExperiment")
print(run_id)
Python
run_id = pipeline_endpoint_by_name.submit("PipelineEndpointExperiment",
pipeline_version="0")
print(run_id)
Python
rest_endpoint = pipeline_endpoint_by_name.endpoint
response = requests.post(rest_endpoint,
headers=aad_token,
json={"ExperimentName":
"PipelineEndpointExperiment",
"RunSource": "API",
"ParameterAssignments": {"1": "united",
"2":"city"}})
5. Select a specific pipeline to run, consume, or review results of previous runs of the
pipeline endpoint.
Python
# Get the pipeline by using its ID from Azure Machine Learning studio
p = PublishedPipeline.get(ws, id="068f4885-7088-424b-8ce2-eeb9ba5381a6")
p.disable()
You can enable it again with p.enable() . For more information, see PublishedPipeline
class reference.
Next steps
Use these Jupyter notebooks on GitHub to explore machine learning pipelines
further.
See the SDK reference help for the azureml-pipelines-core package and the
azureml-pipelines-steps package.
See the how-to for tips on debugging and troubleshooting pipelines.
How to use Apache Spark (powered by
Azure Synapse Analytics) in your
machine learning pipeline (deprecated)
Article • 06/01/2023
2 Warning
The Azure Synapse Analytics integration with Azure Machine Learning available in
Python SDK v1 is deprecated. Users can continue using Synapse workspace
registered with Azure Machine Learning as a linked service. However, a new
Synapse workspace can no longer be registered with Azure Machine Learning as a
linked service. We recommend using Managed (Automatic) Synapse compute and
attached Synapse Spark pools available in CLI v2 and Python SDK v2. Please see
https://aka.ms/aml-spark for more details.
In this article, you'll learn how to use Apache Spark pools powered by Azure Synapse
Analytics as the compute target for a data preparation step in an Azure Machine
Learning pipeline. You'll learn how a single pipeline can use compute resources suited
for the specific step, such as data preparation or training. You'll see how data is
prepared for the Spark step and how it's passed to the next step.
Prerequisites
Create an Azure Machine Learning workspace to hold all your pipeline resources.
Create an Azure Synapse Analytics workspace and Apache Spark pool (see
Quickstart: Create a serverless Apache Spark pool using Synapse Studio).
Once your Azure Machine Learning workspace and your Azure Synapse Analytics
workspaces are linked, you can attach an Apache Spark pool via
Azure Resource Manager (ARM) template (see this Example ARM template ).
You can use the command line to follow the ARM template, add the linked
service, and attach the Apache Spark pool with the following code:
Azure CLI
) Important
To link to the Azure Synapse Analytics workspace successfully, you must have the
Owner role in the Azure Synapse Analytics workspace resource. Check your access
in the Azure portal.
The linked service will get a system-assigned managed identity (SAI) when you
create it. You must assign this linked service SAI the "Synapse Apache Spark
administrator" role from Synapse Studio so that it can submit the Spark job (see
How to manage Synapse RBAC role assignments in Synapse Studio).
You must also give the user of the Azure Machine Learning workspace the
"Contributor" role in the Azure portal.
Python
from azureml.core import Workspace, LinkedService, SynapseWorkspaceLinkedServiceConfiguration

ws = Workspace.from_config()
Python
from azureml.core.compute import SynapseCompute, ComputeTarget

attach_config = SynapseCompute.attach_configuration(
    linked_service=linked_service,
    type="SynapseSpark",
    pool_name="spark01")  # This name comes from your Synapse workspace

synapse_compute = ComputeTarget.attach(
    workspace=ws,
    name='link1-spark01',
    attach_configuration=attach_config)

synapse_compute.wait_for_completion()
The first step is to configure the SynapseCompute . The linked_service argument is the
LinkedService object you created or retrieved in the previous step. The type argument
must be SynapseSpark . The pool_name argument in
SynapseCompute.attach_configuration() must match that of an existing pool in your
Azure Synapse Analytics workspace. For more information on creating an Apache Spark
pool in the Azure Synapse Analytics workspace, see Quickstart: Create a serverless
Apache Spark pool using Synapse Studio. The type of attach_config is
ComputeTargetAttachConfiguration .
step powered by a compute target better suited for training. The sample notebook uses
the Titanic survival database to demonstrate data input and output; it doesn't actually
clean the data or make a predictive model. Since there's no real training in this sample,
the training step uses an inexpensive, CPU-based compute resource.
Python
datastore = ws.get_default_datastore()
file_name = 'Titanic.csv'
titanic_tabular_dataset = Dataset.Tabular.from_delimited_files(path=[(datastore, file_name)])
step1_input1 = titanic_tabular_dataset.as_named_input("tabular_input")
The above code assumes that the file Titanic.csv is in blob storage. The code shows
how to read the file as a TabularDataset and as a FileDataset . This code is for
demonstration purposes only, as it would be confusing to duplicate inputs or to
interpret a single data source as both a table-containing resource and just as a file.
) Important
When a step completes, you may choose to store output data using code similar to:
Python
In this case, the data would be stored in the datastore in a file called test and would
be available within the machine learning workspace as a Dataset with the name
registered_dataset .
In addition to data, a pipeline step may have per-step Python dependencies. Individual
SynapseSparkStep objects can specify their precise Azure Synapse Apache Spark
configuration, as well. This is shown in the following code, which specifies that the
azureml-core package version must be at least 1.20.0 .
Python
env = Environment(name="myenv")
env.python.conda_dependencies.add_pip_package("azureml-core>=1.20.0")
The above code specifies a single step in the Azure Machine Learning pipeline. This
step's environment specifies a specific azureml-core version and could add other conda
or pip dependencies as necessary.
The SynapseSparkStep will zip and upload from the local computer the subdirectory
./code . That directory will be recreated on the compute server and the step will run the
file dataprep.py from that directory. The inputs and outputs of that step are the
step1_input1 , step1_input2 , and step1_output objects previously discussed. The easiest
way to access those values within the dataprep.py script is to associate them with
named arguments .
The next set of arguments to the SynapseSparkStep constructor control Apache Spark.
The compute_target is the 'link1-spark01' that we attached as a compute target
previously. The other parameters specify the memory and cores we'd like to use.
Python
import os
import sys
import azureml.core
from pyspark.sql import SparkSession
from azureml.core import Run, Dataset
print(azureml.core.VERSION)
print(os.environ)
import argparse
parser = argparse.ArgumentParser()
parser.add_argument("--tabular_input")
parser.add_argument("--file_input")
parser.add_argument("--output_dir")
args = parser.parse_args()
# sdf is the Spark dataframe created earlier from the input dataset
sdf.coalesce(1).write\
    .option("header", "true")\
    .mode("append")\
    .csv(args.output_dir)
This "data preparation" script doesn't do any real data transformation, but it illustrates
how to retrieve data, convert it to a Spark dataframe, and do some basic Apache
Spark manipulation. You can find the output in Azure Machine Learning studio by
opening the child job, choosing the Outputs + logs tab, and opening the
logs/azureml/driver/stdout file, as shown in the following figure.
Python
cpu_cluster_name = "cpucluster"

if cpu_cluster_name in ws.compute_targets:
    cpu_cluster = ComputeTarget(workspace=ws, name=cpu_cluster_name)
    print('Found existing cluster, use it.')
else:
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',
                                                           max_nodes=1)
    cpu_cluster = ComputeTarget.create(ws, cpu_cluster_name, compute_config)
    print('Allocating new CPU compute cluster')
    cpu_cluster.wait_for_completion(show_output=True)
step2_input = step1_output.as_input("step2_input").as_download()
step_2 = PythonScriptStep(script_name="train.py",
arguments=[step2_input],
inputs=[step2_input],
compute_target=cpu_cluster_name,
source_directory="./code",
allow_reuse=False)
The code above creates the new compute resource if necessary. Then, the step1_output
result is converted to input for the training step. The as_download() option means that
the data will be moved onto the compute resource, resulting in faster access. If the data
were too large to fit on the local compute hard drive, you would use the
as_mount() option to stream the data via the FUSE filesystem. The compute_target of
this second step is 'cpucluster' , not the 'link1-spark01' resource you used in the data
preparation step. This step uses a simple program train.py instead of the dataprep.py
you used in the previous step. You can see the details of train.py in the sample
notebook.
Once you've defined all of your steps, you can create and run your pipeline.
Python
The above code creates a pipeline consisting of the data preparation step on Apache
Spark pools powered by Azure Synapse Analytics ( step_1 ) and the training step
( step_2 ). Azure calculates the execution graph by examining the data dependencies
between the steps. In this case, there's only a straightforward dependency that
step2_input necessarily requires step1_output .
The call to pipeline.submit creates, if necessary, an Experiment called synapse-pipeline
and asynchronously begins a Job within it. Individual steps within the pipeline are run as
Child Jobs of this main job and can be monitored and reviewed in the Experiments page
of Studio.
Next steps
Publish and track machine learning pipelines
Monitor Azure Machine Learning
Use automated ML in an Azure Machine Learning pipeline in Python
Trigger machine learning pipelines
Article • 03/02/2023
In this article, you'll learn how to programmatically schedule a pipeline to run on Azure.
You can create a schedule based on elapsed time or on file-system changes. Time-based
schedules can be used to take care of routine tasks, such as monitoring for data drift.
Change-based schedules can be used to react to irregular or unpredictable changes,
such as new data being uploaded or old data being edited. After learning how to create
schedules, you'll learn how to retrieve and deactivate them. Finally, you'll learn how to
use other Azure services, Azure Logic App and Azure Data Factory, to run pipelines. An
Azure Logic App allows for more complex triggering logic or behavior. Azure Data
Factory pipelines allow you to call a machine learning pipeline as part of a larger data
orchestration pipeline.
Prerequisites
An Azure subscription. If you don’t have an Azure subscription, create a free
account .
A Python environment in which the Azure Machine Learning SDK for Python is
installed. For more information, see Create and manage reusable environments for
training and deployment with Azure Machine Learning.
A Machine Learning workspace with a published pipeline. You can use the one
built in Create and run machine learning pipelines with Azure Machine Learning
SDK.
Python
import azureml.core
from azureml.core import Workspace
from azureml.pipeline.core import Pipeline, PublishedPipeline
from azureml.core.experiment import Experiment
ws = Workspace.from_config()
experiments = Experiment.list(ws)
for experiment in experiments:
print(experiment.name)
published_pipelines = PublishedPipeline.list(ws)
for published_pipeline in published_pipelines:
print(f"{published_pipeline.name},'{published_pipeline.id}'")
experiment_name = "MyExperiment"
pipeline_id = "aaaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"
Create a schedule
To run a pipeline on a recurring basis, you'll create a schedule. A Schedule associates a
pipeline, an experiment, and a trigger. The trigger can either be a ScheduleRecurrence
that describes the wait between jobs or a Datastore path that specifies a directory to
watch for changes. In either case, you'll need the pipeline identifier and the name of the
experiment in which to create the schedule.
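To make the interval semantics of a time-based trigger concrete, here's a plain-Python sketch (illustration only, not the ScheduleRecurrence API) that computes the next few run times for a fixed 24-hour recurrence:

```python
from datetime import datetime, timedelta

def next_runs(start, frequency_hours, count):
    """Illustrative only: return the next `count` run times for a
    fixed-interval recurrence, the idea behind a time-based trigger."""
    return [start + timedelta(hours=frequency_hours * i)
            for i in range(1, count + 1)]

# Three daily runs after a hypothetical start date
runs = next_runs(datetime(2023, 1, 1, 0, 0), 24, 3)
```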
At the top of your Python file, import the Schedule and ScheduleRecurrence classes:
Python
Python
7 Note
To create a file-reactive Schedule , you must set the datastore parameter in the call to
Schedule.create. To monitor a folder, set the path_on_datastore argument.
If the pipeline was constructed with a DataPath PipelineParameter, you can set that
variable to the name of the changed file by setting the data_path_parameter_name
argument.
Python
In this page you can see summary information about all the pipelines in the Workspace:
names, descriptions, status, and so forth. Drill in by selecting your pipeline. On the
resulting page, there are more details about your pipeline and you may drill down into
individual jobs.
Python
If the pipeline is scheduled, you must cancel the schedule first. Retrieve the schedule's
identifier from the portal or by running:
Python
ss = Schedule.list(ws)
for s in ss:
print(s)
Python
stop_by_schedule_id(ws, schedule_id)
If you then run Schedule.list(ws) again, you should get an empty list.
To use an Azure Logic App to trigger a Machine Learning pipeline, you'll need the REST
endpoint for a published Machine Learning pipeline. Create and publish your pipeline.
Then find the REST endpoint of your PublishedPipeline by using the pipeline ID:
Python
Once your Logic App has been provisioned, use these steps to configure a trigger for
your pipeline:
1. Create a system-assigned managed identity to give the app access to your Azure
Machine Learning Workspace.
2. Navigate to the Logic App Designer view and select the Blank Logic App template.
3. In the Designer, search for blob. Select the When a blob is added or modified
(properties only) trigger and add this trigger to your Logic App.
4. Fill in the connection info for the Blob storage account you wish to monitor for
blob additions or modifications. Select the Container to monitor.
Choose the Interval and Frequency to poll for updates that work for you.
7 Note
This trigger will monitor the selected Container but won't monitor subfolders.
5. Add an HTTP action that will run when a new or modified blob is detected. Select
+ New Step, then search for and select the HTTP action.
Setting Value
URI the endpoint to the published pipeline that you found as a Prerequisite
JSON
{
"DataPathAssignments": {
"input_datapath": {
"DataStoreName": "<datastore-name>",
"RelativePath": "@{triggerBody()?['Name']}"
}
},
"ExperimentName": "MyRestPipeline",
"ParameterAssignments": {
"input_string": "sample_string3"
},
"RunSource": "SDK"
}
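To see what the Logic App sends once @{triggerBody()?['Name']} is resolved, the sketch below builds the same body in Python with a hypothetical blob name; the datastore-name placeholder is kept as-is:

```python
import json

# Hypothetical blob name that the Logic App trigger would supply
blob_name = "new-data-2023-06-01.csv"

body = {
    "DataPathAssignments": {
        "input_datapath": {
            "DataStoreName": "<datastore-name>",
            "RelativePath": blob_name,  # stands in for @{triggerBody()?['Name']}
        }
    },
    "ExperimentName": "MyRestPipeline",
    "ParameterAssignments": {"input_string": "sample_string3"},
    "RunSource": "SDK",
}

serialized = json.dumps(body)
```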
) Important
If you are using Azure role-based access control (Azure RBAC) to manage access to
your pipeline, set the permissions for your pipeline scenario (training or scoring).
In this tutorial, you learn how to convert Jupyter notebooks into Python scripts to make
them friendly for testing and automation, using the MLOpsPython code template and Azure
Machine Learning. Typically, this process is used to take experimentation / training code
from a Jupyter notebook and convert it into Python scripts. Those scripts can then be
used for testing and CI/CD automation in your production environment.
A machine learning project requires experimentation where hypotheses are tested with
agile tools like Jupyter Notebook using real datasets. Once the model is ready for
production, the model code should be placed in a production code repository. In some
cases, the model code must be converted to Python scripts to be placed in the
production code repository. This tutorial covers a recommended approach on how to
export experimentation code to Python scripts.
Prerequisites
Generate the MLOpsPython template and use the experimentation/Diabetes
Ridge Regression Training.ipynb and experimentation/Diabetes Ridge Regression
y and the cell calling features.describe are just for data exploration and can be removed.
Python
sample_data = load_diabetes()
df = pd.DataFrame(
    data=sample_data.data,
    columns=sample_data.feature_names)
df['Y'] = sample_data.target

X = df.drop('Y', axis=1).values
y = df['Y'].values

args = {
    "alpha": 0.5
}
reg_model = Ridge(**args)
reg_model.fit(data["train"]["X"], data["train"]["y"])

preds = reg_model.predict(data["test"]["X"])
mse = mean_squared_error(preds, data["test"]["y"])
metrics = {"mse": mse}
print(metrics)

model_name = "sklearn_regression_model.pkl"
joblib.dump(value=reg_model, filename=model_name)
Refactor code into functions
Second, the Jupyter code needs to be refactored into functions. Refactoring code into
functions makes unit testing easier and makes the code more maintainable. In this
section, you'll refactor:
1. Create a function called split_data to split the data frame into test and train data.
The function should take the dataframe df as a parameter, and return a dictionary
containing the keys train and test .
Move the code under the Split Data into Training and Validation Sets heading into
the split_data function and modify it to return the data object.
2. Create a function called train_model , which takes the parameters data and args
and returns a trained model.
Move the code under the heading Training Model on Training Set into the
train_model function and modify it to return the reg_model object. Remove the
args dictionary, the values will come from the args parameter.
model.
Move the code under the Validate Model on Validation Set heading into the
get_model_metrics function and modify it to return the metrics object.
Python
# Split the dataframe into test and train data
def split_data(df):
    X = df.drop('Y', axis=1).values
    y = df['Y'].values
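The snippet above breaks off after extracting X and y. Here is a sketch of how the complete function might look, assuming scikit-learn's train_test_split and a 20% test fraction (both assumptions for illustration); it returns the train / test dictionary described in step 1:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

def split_data(df):
    """Split a dataframe into the {"train": ..., "test": ...} dictionary
    described above. The 20% test fraction is an assumption."""
    X = df.drop('Y', axis=1).values
    y = df['Y'].values
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0)
    return {"train": {"X": X_train, "y": y_train},
            "test": {"X": X_test, "y": y_test}}

# Tiny demonstration dataframe with one feature column and a target Y
demo = pd.DataFrame({"a": [1, 2, 3, 4, 5], "Y": [1, 0, 1, 0, 1]})
data = split_data(demo)
```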
1. Create a new function called main , which takes no parameters and returns nothing.
2. Move the code under the "Load Data" heading into the main function.
3. Add invocations for the newly written functions into the main function:
Python
Python
4. Move the code under the "Save Model" heading into the main function.
Python
def main():
    # Load Data
    sample_data = load_diabetes()
    df = pd.DataFrame(
        data=sample_data.data,
        columns=sample_data.feature_names)
    df['Y'] = sample_data.target

    # (calls to split_data, train_model, and get_model_metrics go here)

    # Save Model
    model_name = "sklearn_regression_model.pkl"
    joblib.dump(value=reg, filename=model_name)
At this stage, there should be no code remaining in the notebook that isn't in a function,
other than import statements in the first cell.
Python
main()
def main():
    # Load Data
    sample_data = load_diabetes()
    df = pd.DataFrame(
        data=sample_data.data,
        columns=sample_data.feature_names)
    df['Y'] = sample_data.target

    # (calls to split_data, train_model, and get_model_metrics go here)

    # Save Model
    model_name = "sklearn_regression_model.pkl"
    joblib.dump(value=reg, filename=model_name)

main()
1. Create a new function called init , which takes no parameters and return nothing.
2. Copy the code under the "Load Model" heading into the init function.
Python
def init():
    model_path = Model.get_model_path(
        model_name="sklearn_regression_model.pkl")
    model = joblib.load(model_path)
Once the init function has been created, replace all the code under the heading "Load
Model" with a single call to init as follows:
Python
init()
1. Create a new function called run , which takes raw_data and request_headers as
parameters and returns a dictionary of results as follows:
Python
{"result": result.tolist()}
2. Copy the code under the "Prepare Data" and "Score Data" headings into the run
function.
The run function should look like the following code (Remember to remove the
statements that set the variables raw_data and request_headers , which will be
used later when the run function is called):
Python
Once the run function has been created, replace all the code under the "Prepare Data"
and "Score Data" headings with the following code:
Python
raw_data = '{"data":[[1,2,3,4,5,6,7,8,9,10],[10,9,8,7,6,5,4,3,2,1]]}'
request_header = {}
prediction = run(raw_data, request_header)
print("Test result: ", prediction)
The previous code sets variables raw_data and request_header , calls the run function
with raw_data and request_header , and prints the predictions.
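To check the run function's contract without a registered model, you can substitute a stub for the joblib-loaded model. The stub below is an assumption for illustration only; it simply sums each input row:

```python
import json
import numpy

class _StubModel:
    """Stands in for the joblib-loaded model; predicts the row sum."""
    def predict(self, data):
        return numpy.asarray(data).sum(axis=1)

model = _StubModel()

def run(raw_data, request_headers):
    data = json.loads(raw_data)["data"]
    data = numpy.array(data)
    result = model.predict(data)
    # Return a JSON-serializable dictionary, as described above
    return {"result": result.tolist()}

prediction = run('{"data":[[1,2,3],[3,2,1]]}', {})
```

Because the return value is a plain dictionary of Python types, it can be asserted against directly in a unit test.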
Python
import json
import numpy
from azureml.core.model import Model
import joblib
def init():
    model_path = Model.get_model_path(
        model_name="sklearn_regression_model.pkl")
    model = joblib.load(model_path)

init()
test_row = '{"data":[[1,2,3,4,5,6,7,8,9,10],[10,9,8,7,6,5,4,3,2,1]]}'
request_header = {}
prediction = run(test_row, request_header)
print("Test result: ", prediction)
Once the notebook has been converted to train.py , remove any unwanted comments.
Replace the call to main() at the end of the file with a conditional invocation like the
following code:
Python
if __name__ == '__main__':
    main()
def main():
    # Load Data
    sample_data = load_diabetes()
    df = pd.DataFrame(
        data=sample_data.data,
        columns=sample_data.feature_names)
    df['Y'] = sample_data.target

    # (calls to split_data, train_model, and get_model_metrics go here)

    # Save Model
    model_name = "sklearn_regression_model.pkl"
    joblib.dump(value=reg, filename=model_name)

if __name__ == '__main__':
    main()
train.py can now be invoked from a terminal by running python train.py . The
functions from train.py can also be called from other files.
Once the notebook has been converted to score.py , remove any unwanted comments.
Your score.py file should look like the following code:
Python
import json
import numpy
from azureml.core.model import Model
import joblib

def init():
    model_path = Model.get_model_path(
        model_name="sklearn_regression_model.pkl")
    model = joblib.load(model_path)

def run(raw_data, request_headers):
    data = json.loads(raw_data)["data"]
    data = numpy.array(data)
    result = model.predict(data)

    return {"result": result.tolist()}

init()
test_row = '{"data":[[1,2,3,4,5,6,7,8,9,10],[10,9,8,7,6,5,4,3,2,1]]}'
request_header = {}
prediction = run(test_row, request_header)
print("Test result: ", prediction)
The model variable needs to be global so that it's visible throughout the script. Add the
following statement at the beginning of the init function:
Python
global model
After adding the previous statement, the init function should look like the following
code:
Python
def init():
    global model
train.py contains multiple functions, but we'll only create a single unit test for the
train_model function using the Pytest framework in this tutorial. Pytest isn't the only
Python unit testing framework, but it's one of the most commonly used. For more
information, visit Pytest .
A unit test usually contains three main actions:
The unit test will call train_model with some hard-coded data and arguments, and
validate that train_model acted as expected by using the resulting trained model to
make a prediction and comparing that prediction to an expected value.
Python
import numpy as np
from code.training.train import train_model

def test_train_model():
    # Arrange
    X_train = np.array([1, 2, 3, 4, 5, 6]).reshape(-1, 1)
    y_train = np.array([10, 9, 8, 8, 6, 5])
    data = {"train": {"X": X_train, "y": y_train}}

    # Act
    reg_model = train_model(data, {"alpha": 1.2})

    # Assert
    preds = reg_model.predict([[1], [2]])
    np.testing.assert_almost_equal(preds, [9.93939393939394, 9.03030303030303])
Next steps
Now that you understand how to convert from an experiment to production code, see
the following links for more information and next steps:
MLOpsPython : Build a CI/CD pipeline to train, evaluate and deploy your own
model using Azure Pipelines and Azure Machine Learning
Monitor Azure Machine Learning experiment jobs and metrics
Monitor and collect data from ML web service endpoints
Deploy machine learning models to
Azure
Article • 03/02/2023
Learn how to deploy your machine learning or deep learning model as a web service in
the Azure cloud.
7 Note
For more information on the concepts involved in the machine learning deployment
workflow, see Manage, deploy, and monitor models with Azure Machine Learning.
Prerequisites
Azure CLI
) Important
The Azure CLI commands in this article require the azure-cli-ml , or v1,
extension for Azure Machine Learning. Support for the v1 extension will end on
September 30, 2025. You will be able to install and use the v1 extension until
that date.
To see the workspaces that you have access to, use the following commands:
Azure CLI
az login
az account set -s <subscription>
az ml workspace list --resource-group=<resource-group>
Resources representing the specific model that you want deployed (for example: a
pytorch model file).
Code that you will be running in the service that executes the model on a given
input.
Azure Machine Learning allows you to separate the deployment into two
components, so that you can keep the same code, but merely update the model. We
define the mechanism by which you upload a model separately from your code as
"registering the model".
When you register a model, we upload the model to the cloud (in your workspace's
default storage account) and then mount it to the same compute where your webservice
is running.
) Important
You should use only models that you create or obtain from a trusted source. You
should treat serialized models as code, because security vulnerabilities have been
discovered in a number of popular formats. Also, models might be intentionally
trained with malicious intent to provide biased or inaccurate output.
Azure CLI
The following commands download a model and then register it with your Azure
Machine Learning workspace:
Azure CLI
The --asset-path parameter refers to the cloud location of the model. In this
example, the path of a single file is used. To include multiple files in the model
registration, set --asset-path to the path of a folder that contains the files.
7 Note
You can also register a model from a local file via the Workspace UI portal.
Currently, there are two options to upload a local model file in the UI:
Note that only models registered via the From local files (based on framework)
option (which are known as v1 models) can be deployed as webservices using
SDKv1/CLIv1.
The two things you need to accomplish in your entry script are:
Loading your model (using a function called init() )
Running your model on input data (using a function called run() )
For your initial deployment, use a dummy entry script that prints the data it receives.
Python
import json

def init():
    print("This is init")

def run(data):
    test = json.loads(data)
    print(f"received data {test}")
    return f"test is {test}"
Save this file as echo_score.py inside of a directory called source_dir . This dummy
script returns the data you send to it, so it doesn't use the model. But it is useful for
testing that the scoring script is running.
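Because the dummy script uses only the standard library, you can smoke-test it locally before any deployment. The snippet below repeats the script's two functions and exercises them with a sample payload:

```python
import json

# Local copies of the dummy entry script's functions, exercised
# without any deployment (illustration only)
def init():
    print("This is init")

def run(data):
    test = json.loads(data)
    print(f"received data {test}")
    return f"test is {test}"

init()
reply = run('{"hello": "world"}')
```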
The inference configuration below specifies that the machine learning deployment will
use the file echo_score.py in the ./source_dir directory to process incoming requests
and that it will use the Docker image with the Python packages specified in the
project_environment environment.
You can use any Azure Machine Learning inference curated environments as the base
Docker image when creating your project environment. We will install the required
dependencies on top and store the resulting Docker image into the repository that is
associated with your workspace.
7 Note
Azure Machine Learning inference source directory upload does not respect
.gitignore or .amlignore .
Azure CLI
JSON
{
    "entryScript": "echo_score.py",
    "sourceDirectory": "./source_dir",
    "environment": {
        "docker": {
            "arguments": [],
            "baseDockerfile": null,
            "baseImage": "mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04",
            "enabled": false,
            "sharedVolumes": true,
            "shmSize": null
        },
        "environmentVariables": {
            "EXAMPLE_ENV_VAR": "EXAMPLE_VALUE"
        },
        "name": "my-deploy-env",
        "python": {
            "baseCondaEnvironment": null,
            "condaDependencies": {
                "channels": [],
                "dependencies": [
                    "python=3.6.2",
                    {
                        "pip": [
                            "azureml-defaults"
                        ]
                    }
                ],
                "name": "project_environment"
            },
            "condaDependenciesFile": null,
            "interpreterPath": "python",
            "userManagedDependencies": false
        },
        "version": "1"
    }
}
The options available for a deployment configuration differ depending on the compute
target you choose. In a local deployment, all you can specify is which port your
webservice will be served on.
Azure CLI
computeType   NA     The compute target. For local targets, the value must be local .
port          port   The local port on which to expose the service's HTTP endpoint.
This JSON is an example deployment configuration for use with the CLI:
JSON
{
"computeType": "local",
"port": 32267
}
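You can also generate this file programmatically rather than writing it by hand. The sketch below writes the same configuration with the json module (the file path is illustrative):

```python
import json
import os
import tempfile

# The local deployment configuration shown above
deploy_config = {"computeType": "local", "port": 32267}

# Write it to a file that a deploy command could consume (path is illustrative)
config_path = os.path.join(tempfile.gettempdir(), "deploymentconfig.json")
with open(config_path, "w") as f:
    json.dump(deploy_config, f, indent=4)

# Read it back to confirm the round trip
with open(config_path) as f:
    loaded = json.load(f)
```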
Azure CLI
Azure CLI
Azure CLI
Azure CLI
curl -v http://localhost:32267
curl -v -X POST -H "content-type:application/json" \
-d '{"query": "What color is the fox", "context": "The quick brown
fox jumped over the lazy dog."}' \
http://localhost:32267/score
Python
import json
import numpy as np
import os
import onnxruntime
from nltk import word_tokenize
import nltk

def init():
    nltk.download("punkt")
    global sess
    sess = onnxruntime.InferenceSession(
        os.path.join(os.getenv("AZUREML_MODEL_DIR"), "model.onnx")
    )

def run(request):
    print(request)
    text = json.loads(request)
    qw, qc = preprocess(text["query"])
    cw, cc = preprocess(text["context"])

    # Run inference
    test = sess.run(
        None,
        {"query_word": qw, "query_char": qc,
         "context_word": cw, "context_char": cc},
    )
    start = np.asscalar(test[0])
    end = np.asscalar(test[1])
    ans = [w for w in cw[start : end + 1].reshape(-1)]
    print(ans)
    return ans

def preprocess(word):
    tokens = word_tokenize(word)

    # split into lower-case word tokens, in numpy array with shape of (seq, 1)
    words = np.asarray([w.lower() for w in tokens]).reshape(-1, 1)

    # split words into chars, in numpy array with shape of (seq, 1, 1, 16)
    chars = [[c for c in t][:16] for t in tokens]
    chars = [cs + [""] * (16 - len(cs)) for cs in chars]
    chars = np.asarray(chars).reshape(-1, 1, 1, 16)
    return words, chars
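The shape contract of preprocess can be verified without nltk by swapping in a naive whitespace tokenizer (an assumption for illustration only; the real script should keep word_tokenize):

```python
import numpy as np

def preprocess_simple(sentence, max_chars=16):
    """Same shape contract as preprocess() above, but with a naive
    whitespace tokenizer instead of nltk.word_tokenize (illustration only)."""
    tokens = sentence.split()
    # lower-case word tokens, shape (seq, 1)
    words = np.asarray([w.lower() for w in tokens]).reshape(-1, 1)
    # characters per token, padded to max_chars, shape (seq, 1, 1, max_chars)
    chars = [[c for c in t][:max_chars] for t in tokens]
    chars = [cs + [""] * (max_chars - len(cs)) for cs in chars]
    chars = np.asarray(chars).reshape(-1, 1, 1, max_chars)
    return words, chars

words, chars = preprocess_simple("What color is the fox")
```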
Notice the use of the AZUREML_MODEL_DIR environment variable to locate your registered
model. Now that you've added some pip packages, you also need to update the inference
configuration to include them.
Azure CLI
JSON
{
    "entryScript": "score.py",
    "sourceDirectory": "./source_dir",
    "environment": {
        "docker": {
            "arguments": [],
            "baseDockerfile": null,
            "baseImage": "mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04",
            "enabled": false,
            "sharedVolumes": true,
            "shmSize": null
        },
        "environmentVariables": {
            "EXAMPLE_ENV_VAR": "EXAMPLE_VALUE"
        },
        "name": "my-deploy-env",
        "python": {
            "baseCondaEnvironment": null,
            "condaDependencies": {
                "channels": [],
                "dependencies": [
                    "python=3.6.2",
                    {
                        "pip": [
                            "azureml-defaults",
                            "nltk",
                            "numpy",
                            "onnxruntime"
                        ]
                    }
                ],
                "name": "project_environment"
            },
            "condaDependenciesFile": null,
            "interpreterPath": "python",
            "userManagedDependencies": false
        },
        "version": "2"
    }
}
Azure CLI
APPLIES TO: Azure CLI ml extension v1
Replace bidaf_onnx:1 with the name of your model and its version number.
Azure CLI
Azure CLI
Azure CLI
7 Note
When choosing a cluster SKU, first scale up and then scale out. Start with a machine
that has 150% of the RAM your model requires, profile the result and find a
machine that has the performance you need. Once you've learned that, increase the
number of machines to fit your need for concurrent inference.
7 Note
Container instances require the SDK or CLI v1 and are suitable only for small
models less than 1 GB in size.
Deploy to cloud
Once you've confirmed your service works locally and chosen a remote compute target,
you are ready to deploy to the cloud.
Change your deploy configuration to correspond to the compute target you've chosen,
in this case Azure Container Instances:
Azure CLI
{
    "computeType": "aci",
    "containerResourceRequirements": {
        "cpu": 0.5,
        "memoryInGB": 1.0
    },
    "authEnabled": true,
    "sslEnabled": false,
    "appInsightsEnabled": false
}
Azure CLI
Replace bidaf_onnx:1 with the name of your model and its version number.
Azure CLI
Azure CLI
Python
import requests
import json
from azureml.core import Webservice
Python
print(service.get_logs())
See the article on client applications to consume web services for more example clients
in other languages.
Failed The service has failed to deploy due to an error or crash. Yes
Tip
When deploying, Docker images for compute targets are built and loaded from
Azure Container Registry (ACR). By default, Azure Machine Learning creates an ACR
that uses the basic service tier. Changing the ACR for your workspace to standard
or premium tier may reduce the time it takes to build and deploy images to your
compute targets. For more information, see Azure Container Registry service tiers.
7 Note
If you are deploying a model to Azure Kubernetes Service (AKS), we advise that you
enable Azure Monitor for that cluster. This will help you understand overall cluster
health and resource usage. You might also find the following resources useful:
Delete resources
Azure CLI
Python
stream = os.popen(
    'az ml model list --model-name=bidaf_onnx --latest --query "[0].id" -o tsv'
)
MODEL_ID = stream.read()[0:-1]
MODEL_ID
Azure CLI
To delete a registered model from your workspace, use az ml model delete <model id> .
Next steps
Troubleshoot a failed deployment
Update web service
One click deployment for automated ML runs in the Azure Machine Learning
studio
Use TLS to secure a web service through Azure Machine Learning
Monitor your Azure Machine Learning models with Application Insights
Create event alerts and triggers for model deployments
Deploy models trained with Azure
Machine Learning on your local
machines
Article • 05/23/2023
This article describes how to use your local computer as a target for training or
deploying models created in Azure Machine Learning. Azure Machine Learning is flexible
enough to work with most Python machine learning frameworks. Machine learning
solutions generally have complex dependencies that can be difficult to duplicate. This
article will show you how to balance total control with ease of use.
Prerequisites
An Azure Machine Learning workspace. For more information, see Create
workspace resources.
A model and an environment. If you don't have a trained model, you can use the
model and dependency files provided in this tutorial.
The Azure Machine Learning SDK for Python.
A conda manager, like Anaconda or Miniconda, if you want to mirror Azure
Machine Learning package dependencies.
Docker, if you want to use a containerized version of the Azure Machine Learning
environment.
) Important
GPU base images can't be used for local deployment, unless the local deployment
is on an Azure Machine Learning compute instance. GPU base images are
supported only on Microsoft Azure Services such as Azure Machine Learning
compute clusters and instances, Azure Container Instance (ACI), Azure VMs, or
Azure Kubernetes Service (AKS).
JSON
{
"data": <model-specific-data-structure>
}
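For instance, you can build a request body of this shape with the standard json module. The 784-element batch below is an illustrative assumption based on the MNIST-style model used later in this article:

```python
import json

# Build a payload of the shape {"data": <model-specific-data-structure>}.
# Here the structure is a batch containing one flattened 28x28 image
# (784 values), matching the MNIST-style example later in this article.
body = json.dumps({"data": [[0.0] * 784]})

# A scoring script can recover the batch with json.loads(raw_data)['data'].
parsed = json.loads(body)
print(len(parsed["data"][0]))  # → 784
```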
The object you return from the run() method must implement toJSON() -> string .
The following example demonstrates how to load a registered scikit-learn model and
score it by using NumPy data. This example is based on the model and dependencies of
this tutorial.
Python
import json
import numpy as np
import os
import pickle
import joblib
def init():
    global model
    # AZUREML_MODEL_DIR is an environment variable created during deployment.
    # It's the path to the model folder (./azureml-models/$MODEL_NAME/$VERSION).
    # For multiple models, it points to the folder containing all deployed
    # models (./azureml-models).
    model_path = os.path.join(os.getenv('AZUREML_MODEL_DIR'),
                              'sklearn_mnist_model.pkl')
    model = joblib.load(model_path)

def run(raw_data):
    data = np.array(json.loads(raw_data)['data'])
    # Make prediction.
    y_hat = model.predict(data)
    # You can return any data type as long as it's JSON-serializable.
    return y_hat.tolist()
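A script like this can be sanity-checked without any Azure resources by calling run() against a stub model. The StubModel class below is a hypothetical stand-in, not part of the SDK; it exists only to exercise the JSON contract:

```python
import json

class StubModel:
    """Hypothetical stand-in for the loaded model: predict() returns the
    index of the largest value in each row, like a classifier would."""
    def predict(self, data):
        return [max(range(len(row)), key=row.__getitem__) for row in data]

model = StubModel()

def run(raw_data):
    # Same contract as the scoring script above.
    data = json.loads(raw_data)['data']
    y_hat = model.predict(data)
    return list(y_hat)

payload = json.dumps({"data": [[0.1, 0.9, 0.0], [0.8, 0.1, 0.1]]})
print(run(payload))  # → [1, 0]
```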
For more advanced examples, including automatic Swagger schema generation and
scoring binary data (for example, images), see Advanced entry script authoring.
Python
from azureml.core import Workspace
from azureml.core.model import Model
from azureml.core.webservice import LocalWebservice

# inference_config is assumed to have been created in an earlier step
# (an InferenceConfig pairing your entry script with an environment).
ws = Workspace.from_config()
model = Model(ws, 'sklearn_mnist')
deployment_config = LocalWebservice.deploy_configuration(port=6789)
local_service = Model.deploy(workspace=ws,
                             name='sklearn-mnist-local',
                             models=[model],
                             inference_config=inference_config,
                             deployment_config=deployment_config)

local_service.wait_for_deployment(show_output=True)
print(f"Scoring URI is : {local_service.scoring_uri}")
The call to Model.deploy() can take a few minutes. After you've initially deployed the
web service, it's more efficient to use the update() method rather than starting from
scratch. See Update a deployed web service.
Python
import requests

# A 28x28 handwritten-digit image, flattened into a string of normalized
# pixel values (one image row per source line below).
normalized_pixel_values = "[\
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, \
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, \
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, \
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.5, 0.5, 0.7, 1.0, 1.0, 0.6, 0.4, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, \
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.7, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.9, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, \
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.7, 1.0, 1.0, 1.0, 0.8, 0.6, 0.7, 1.0, 1.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, \
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.2, 1.0, 1.0, 0.8, 0.1, 0.0, 0.0, 0.0, 0.8, 1.0, 0.5, 0.0, 0.0, 0.0, 0.0, \
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.3, 1.0, 0.8, 0.1, 0.0, 0.0, 0.0, 0.5, 1.0, 1.0, 0.3, 0.0, 0.0, 0.0, \
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.1, 0.1, 0.0, 0.0, 0.0, 0.0, 0.8, 1.0, 1.0, 0.3, 0.0, 0.0, \
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.5, 1.0, 1.0, 0.8, 0.0, 0.0, \
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.3, 1.0, 1.0, 0.9, 0.2, 0.0, \
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.5, 1.0, 1.0, 0.6, 0.0, \
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.7, 1.0, 1.0, 0.6, 0.0, \
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.1, 0.9, 1.0, 0.9, 0.1, \
0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.8, 1.0, 1.0, 0.6, \
0.6, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.3, 1.0, 1.0, 0.7, \
0.7, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.1, 0.8, 1.0, 1.0, \
1.0, 0.6, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.5, 1.0, 1.0, \
1.0, 0.7, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, \
1.0, 1.0, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, \
1.0, 1.0, 1.0, 0.2, 0.1, 0.1, 0.1, 0.1, 0.0, 0.0, 0.0, 0.1, 0.1, 0.1, 0.6, 0.6, 0.6, 0.6, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, \
0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.7, 0.6, 0.7, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, \
0.0, 0.0, 0.7, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.7, 0.5, 0.5, 0.2, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, \
0.0, 0.0, 0.0, 0.0, 0.5, 0.5, 0.5, 0.5, 0.7, 1.0, 1.0, 1.0, 0.6, 0.5, 0.5, 0.2, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, \
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, \
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, \
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, \
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]"

# URL for the local web service.
scoring_uri = "http://localhost:6789/score"

# The payload wrapper and headers below are reconstructed to match the run()
# function shown earlier, which reads json.loads(raw_data)['data'].
input_data = '{"data": [' + normalized_pixel_values + ']}'
headers = {'Content-Type': 'application/json'}

resp = requests.post(scoring_uri, input_data, headers=headers)
print("prediction:", resp.text)
From the portal, by selecting the Models tab, selecting the desired model, and on
the Details page, selecting Download.
From the command line, by using az ml model download . (See model download.)
By using the Python SDK Model.download() method. (See Model class.)
An Azure model may be in whatever form your framework uses but is generally one or
more serialized Python objects, packaged as a Python pickle file ( .pkl extension). The
contents of the pickle file depend on the machine learning library or technique used to
train the model. For example, if you're using the model from the tutorial, you might load
the model with:
Python
import pickle

# The file name below is taken from the tutorial's scoring script earlier
# in this article; substitute the path of your downloaded model.
model = pickle.load(open('sklearn_mnist_model.pkl', 'rb'))
Dependencies are always tricky to get right, especially with machine learning, where
there can often be a dizzying web of specific version requirements. You can re-create an
Azure Machine Learning environment on your local machine either as a complete conda
environment or as a Docker image by using the build_local() method of the
Environment class:
Python
from azureml.core import Workspace, Environment

ws = Workspace.from_config()
myenv = Environment.get(workspace=ws, name="tutorial-env", version="1")
myenv.build_local(workspace=ws, useDocker=False) #Creates conda environment.
If you set the build_local() useDocker argument to True , the function will create a
Docker image rather than a conda environment. If you want more control, you can use
the save_to_directory() method of Environment , which writes conda_dependencies.yml
and azureml_environment.json definition files that you can fine-tune and use as the
basis for extension.
The Environment class has many other methods for synchronizing environments across
your compute hardware, your Azure workspace, and Docker images. For more
information, see Environment class.
After you download the model and resolve its dependencies, there are no Azure-defined
restrictions on how you perform scoring, fine-tune the model, use transfer learning, and
so forth.
To be used with the Azure Machine Learning Python SDK, a model must be stored as a
serialized Python object in pickle format (a .pkl file). It must also implement a
predict(data) method that returns a JSON-serializable object. For example, you might
store a locally trained scikit-learn diabetes model with:
Python
import joblib
joblib.dump(sk_model, "sklearn_regression_model.pkl")
To make the model available in Azure, you can then use the register() method of the
Model class:
Python
from azureml.core import Workspace
from azureml.core.model import Model

ws = Workspace.from_config()
model = Model.register(model_path="sklearn_regression_model.pkl",
                       model_name="sklearn_regression_model",
                       tags={'area': "diabetes", 'type': "regression"},
                       description="Ridge regression model to predict diabetes",
                       workspace=ws)
You can then find your newly registered model on the Azure Machine Learning Model
tab:
For more information on uploading and updating models and environments, see
Register model and deploy locally with advanced usages .
Next steps
For information on using VS Code with Azure Machine Learning, see Launch Visual
Studio Code remotely connected to a compute instance (preview)
For more information on managing environments, see Create & use software
environments in Azure Machine Learning.
To learn about accessing data from your datastore, see Connect to storage services
on Azure.
Deploy a model locally
Article • 04/20/2023
Learn how to use Azure Machine Learning to deploy a model as a web service on your
Azure Machine Learning compute instance. Use compute instances if one of the
following conditions is true:
Tip
7 Note
Prerequisites
An Azure Machine Learning workspace with a compute instance running. For more
information, see Create resources to get started.
1. From Azure Machine Learning studio , select "Notebooks", and then select how-
to-use-azureml/deployment/deploy-to-local/register-model-deploy-local.ipynb
under "Sample notebooks". Clone this notebook to your user folder.
2. Find the notebook cloned in step 1, and choose or create a compute instance to
run the notebook.
3. The notebook displays the URL and port that the service is running on. For
example, https://localhost:6789 . You can also run the cell containing
print('Local service port: {}'.format(local_service.port)) to display the port.
To call the service from a remote client, use the public URL of the service
running on the compute instance. The public URL can be determined using the
following formula:
For example,
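Assuming the URL pattern visible in the code sample below (https://{vm_name}-{port}.{region}.instances.azureml.net/score), a hypothetical helper makes the formula concrete:

```python
# Hypothetical helper illustrating the public-URL pattern; the domain and
# field order are taken from the compute-instance URL in the sample below.
def public_service_url(vm_name: str, port: int, region: str) -> str:
    return f"https://{vm_name}-{port}.{region}.instances.azureml.net/score"

print(public_service_url("vm-name", 6789, "northcentralus"))
# → https://vm-name-6789.northcentralus.instances.azureml.net/score
```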
7 Note
The code below authenticates you using AAD, and returns a header that can then be used to authenticate to the
service on the compute instance. For more information, see Set up authentication
for Azure Machine Learning resources and workflows.
Python
import requests
import json
from azureml.core.authentication import InteractiveLoginAuthentication

# Get an authentication header for the current user.
interactive_auth = InteractiveLoginAuthentication()
headers = interactive_auth.get_authentication_header()

# Replace with the URL for your compute instance, as determined from the previous section.
service_url = "https://vm-name-6789.northcentralus.notebooks.azureml.net/score"
# For a compute instance, the URL would be https://vm-name-6789.northcentralus.instances.azureml.net/score

# test_sample is the JSON payload prepared earlier (not shown here).
resp = requests.post(service_url, test_sample, headers=headers)
print("prediction:", resp.text)
Next steps
How to deploy a model using a custom Docker image
Deployment troubleshooting
Use TLS to secure a web service through Azure Machine Learning
Consume an ML model deployed as a web service
Monitor your Azure Machine Learning models with Application Insights
Collect data for models in production
Deploy a deep learning model for
inference with GPU
Article • 06/01/2023
This article teaches you how to use Azure Machine Learning to deploy a GPU-enabled
model as a web service. The information in this article is based on deploying a model on
Azure Kubernetes Service (AKS). The AKS cluster provides a GPU resource that is used by
the model for inference.
Inference, or model scoring, is the phase where the deployed model is used to make
predictions. Using GPUs instead of CPUs offers performance advantages on highly
parallelizable computation.
7 Note
) Important
When using the Azure Machine Learning SDK v1, GPU inference is only supported
on Azure Kubernetes Service. When using the Azure Machine Learning SDK v2 or
CLI v2, you can use an online endpoint for GPU inference. For more information,
see Deploy and score a machine learning model with an online endpoint.
For inference using a machine learning pipeline, GPUs are only supported on Azure
Machine Learning Compute. For more information on using ML pipelines, see
Tutorial: Build an Azure Machine Learning pipeline for batch scoring.
Tip
Although the code snippets in this article use a TensorFlow model, you can apply
the information to any machine learning framework that supports GPUs.
7 Note
The information in this article builds on the information in the How to deploy to
Azure Kubernetes Service article. Where that article generally covers deployment
to AKS, this article covers GPU specific deployment.
Prerequisites
An Azure Machine Learning workspace. For more information, see Create an Azure
Machine Learning workspace.
To create and register the TensorFlow model used in this document, see How to
Train a TensorFlow Model.
) Important
This code snippet expects the workspace configuration to be saved in the current
directory or its parent. For more information on creating a workspace, see Create
workspace resources. For more information on saving the configuration to file, see
Create a workspace configuration file.
Python
The following code demonstrates how to create a new AKS cluster for your workspace:
Python
aks_target.wait_for_completion(show_output=True)
) Important
Azure will bill you as long as the AKS cluster exists. Make sure to delete your AKS
cluster when you're done with it.
For more information on using AKS with Azure Machine Learning, see How to deploy to
Azure Kubernetes Service.
The entry script is specific to your model. For example, the script must know the
framework to use with your model, data formats, etc.
Python
import json
import numpy as np
import os
import tensorflow as tf
def init():
    global X, output, sess
    tf.reset_default_graph()
    model_root = os.getenv('AZUREML_MODEL_DIR')
    # The name of the folder in which to look for TensorFlow model files.
    tf_model_folder = 'model'
    saver = tf.train.import_meta_graph(
        os.path.join(model_root, tf_model_folder, 'mnist-tf.model.meta'))
    X = tf.get_default_graph().get_tensor_by_name("network/X:0")
    output = tf.get_default_graph().get_tensor_by_name("network/output/MatMul:0")
    sess = tf.Session()
    saver.restore(sess, os.path.join(model_root, tf_model_folder, 'mnist-tf.model'))

def run(raw_data):
    data = np.array(json.loads(raw_data)['data'])
    # Make prediction.
    out = output.eval(session=sess, feed_dict={X: data})
    y_hat = np.argmax(out, axis=1)
    return y_hat.tolist()
This file is named score.py . For more information on entry scripts, see How and where
to deploy.
YAML
name: project_environment
dependencies:
  # The Python interpreter version.
  # Currently Azure Machine Learning only supports 3.5.2 and later.
  - python=3.7
  - pip:
      # You must list azureml-defaults as a pip dependency.
      - azureml-defaults>=1.0.45
  - numpy
  - tensorflow-gpu=1.12
channels:
  - conda-forge
) Important
Because AKS does not allow pods to share GPUs, you can have only as many replicas
of a GPU-enabled web service as there are GPUs in the cluster.
The deployment configuration defines the Azure Kubernetes Service environment used
to run the web service:
Python
from azureml.core.webservice import AksWebservice

gpu_aks_config = AksWebservice.deploy_configuration(autoscale_enabled=False,
                                                    num_replicas=3,
                                                    cpu_cores=2,
                                                    memory_gb=4)
Python
from azureml.core.model import InferenceConfig
from azureml.core.environment import Environment, DEFAULT_GPU_IMAGE

myenv = Environment.from_conda_specification(name="myenv", file_path="myenv.yml")
myenv.docker.base_image = DEFAULT_GPU_IMAGE
inference_config = InferenceConfig(entry_script="score.py", environment=myenv)
For more information on environments, see Create and manage environments for
training and deployment. For more information, see the reference documentation for
InferenceConfig.
Python
from azureml.core.model import Model

# A deployment call like the following, using the configuration objects
# created above, is assumed; the service name is an illustrative placeholder.
aks_service = Model.deploy(workspace=ws,
                           name='gpu-aks-service',
                           models=[model],
                           inference_config=inference_config,
                           deployment_config=gpu_aks_config,
                           deployment_target=aks_target)
aks_service.wait_for_deployment(show_output=True)
print(aks_service.state)
Python
import requests

api_key = aks_service.get_keys()[0]
headers = {'Content-Type': 'application/json',
           'Authorization': ('Bearer ' + api_key)}
resp = requests.post(aks_service.scoring_uri, input_data, headers=headers)
For more information on creating a client application, see Create client to consume
deployed web service.
) Important
Azure bills you based on how long the AKS cluster is deployed. Make sure to clean
it up after you are done with it.
Python
aks_service.delete()
aks_target.delete()
Next steps
Deploy model on FPGA
Deploy model with ONNX
Train TensorFlow DNN Models
Deploy a model for use with Cognitive
Search
Article • 04/04/2023
This article teaches you how to use Azure Machine Learning to deploy a model for use
with Azure Cognitive Search.
Azure Machine Learning can deploy a trained model as a web service. The web service is
then embedded in a Cognitive Search skill, which becomes part of the processing
pipeline.
) Important
For information on how to configure Cognitive Search to use the deployed model,
see the Build and deploy a custom skill with Azure Machine Learning tutorial.
When deploying a model for use with Azure Cognitive Search, the deployment must
meet the following requirements:
Prerequisites
An Azure Machine Learning workspace. For more information, see Create
workspace resources.
A registered model.
) Important
This code snippet expects the workspace configuration to be saved in the current
directory or its parent. For more information, see Create and manage Azure
Machine Learning workspaces. For more information on saving the configuration
to file, see Create a workspace configuration file.
Python
from azureml.core import Workspace

try:
    # Load the workspace configuration from locally cached info.
    ws = Workspace.from_config()
    print(ws.name, ws.location, ws.resource_group, sep='\t')
    print('Library configuration succeeded')
except Exception:
    print('Workspace not found')
When you deploy a model from Azure Machine Learning to Azure Kubernetes Service,
the model and all the assets needed to host it as a web service are packaged into a
Docker container. This container is then deployed onto the cluster.
The following code demonstrates how to create a new Azure Kubernetes Service (AKS)
cluster for your workspace:
Tip
You can also attach an existing Azure Kubernetes Service to your Azure Machine
Learning workspace. For more information, see How to deploy models to Azure
Kubernetes Service.
) Important
Notice that the code uses the enable_ssl() method to enable transport layer
security (TLS) for the cluster. This is required when you plan on using the deployed
model from Cognitive Search.
Python
from azureml.core.compute import AksCompute, ComputeTarget

cluster_name = 'amlskills'
# Try to use an existing compute target by that name.
# If one doesn't exist, create one. The provisioning settings below are
# illustrative assumptions; note enable_ssl(), which is required when using
# the deployed model from Cognitive Search.
try:
    aks_target = ComputeTarget(workspace=ws, name=cluster_name)
    print('Found existing cluster.')
except Exception:
    prov_config = AksCompute.provisioning_configuration(vm_size="Standard_D3_v2")
    prov_config.enable_ssl(leaf_domain_label="contoso")
    aks_target = ComputeTarget.create(workspace=ws,
                                      name=cluster_name,
                                      provisioning_configuration=prov_config)
    aks_target.wait_for_completion(show_output=True)
) Important
Azure will bill you as long as the AKS cluster exists. Make sure to delete your AKS
cluster when you're done with it.
For more information on using AKS with Azure Machine Learning, see How to deploy to
Azure Kubernetes Service.
Tip
The entry script is specific to your model. For example, the script must know the
framework to use with your model, data formats, etc.
) Important
When you plan on using the deployed model from Azure Cognitive Search you
must use the inference_schema package to enable schema generation for the
deployment. This package provides decorators that allow you to define the input
and output data format for the web service that performs inference using the
model.
Python
from azureml.core.model import Model
from spacy.cli.download import download as spacy_download
# SentimentInference comes from the nlp-architect package listed in the
# environment below; the exact import path is an assumption.
from nlp_architect.models.absa.inference.inference import SentimentInference

def init():
    """
    Set up the ABSA model for inference.
    """
    global SentInference
    spacy_download('en')
    aspect_lex = Model.get_model_path('hotel_aspect_lex')
    opinion_lex = Model.get_model_path('hotel_opinion_lex')
    SentInference = SentimentInference(aspect_lex, opinion_lex)
For more information on entry scripts, see How and where to deploy.
Python
from azureml.core import Environment
from azureml.core.conda_dependencies import CondaDependencies

conda = None
pip = ["azureml-defaults", "azureml-monitoring",
       "git+https://github.com/NervanaSystems/nlp-architect.git@absa",
       "nlp-architect", "inference-schema",
       "spacy==2.0.18"]

# Build the dependency object from the pip list above.
conda_deps = CondaDependencies.create(conda_packages=conda, pip_packages=pip)

myenv = Environment(name='myenv')
myenv.python.conda_dependencies = conda_deps
For more information on environments, see Create and manage environments for
training and deployment.
Tip
If you aren't sure about the memory, CPU, or GPU needs of your deployment, you
can use profiling to learn these. For more information, see How and where to
deploy a model.
Python
from azureml.core.model import Model
from azureml.core.webservice import Webservice
from azureml.core.image import ContainerImage
from azureml.core.webservice import AksWebservice, Webservice
aks_config = AksWebservice.deploy_configuration(autoscale_enabled=True,
                                                autoscale_min_replicas=1,
                                                autoscale_max_replicas=3,
                                                autoscale_refresh_seconds=10,
                                                autoscale_target_utilization=70,
                                                auth_enabled=True,
                                                cpu_cores=1,
                                                memory_gb=2,
                                                scoring_timeout_ms=5000,
                                                replica_max_concurrent_requests=2,
                                                max_request_wait_time=5000)
Python
aks_service = Model.deploy(workspace=ws,
                           name=service_name,
                           models=[c_aspect_lex, c_opinion_lex],
                           inference_config=inf_config,
                           deployment_config=aks_config,
                           deployment_target=aks_target,
                           overwrite=True)
aks_service.wait_for_deployment(show_output=True)
print(aks_service.state)
Python
import requests
import json

# Test data
input_data = '{"raw_data": {"text": "This is a nice place for a relaxing evening out with friends. The owners seem pretty nice, too. I have been there a few times including last night. Recommend."}}'

# Since authentication was enabled for the deployment, set the authorization header.
# The keys are retrieved from the deployed service.
primary, secondary = aks_service.get_keys()
headers = {'Content-Type': 'application/json', 'Authorization': ('Bearer ' + primary)}

# Send the request and display the results
resp = requests.post(aks_service.scoring_uri, input_data, headers=headers)
print(resp.text)
The result returned from the service is similar to the following JSON:
JSON
) Important
Azure bills you based on how long the AKS cluster is deployed. Make sure to clean
it up after you are done with it.
Python
aks_service.delete()
aks_target.delete()
Next steps
Build and deploy a custom skill with Azure Machine Learning
Deploy ML models to field-
programmable gate arrays (FPGAs) with
Azure Machine Learning
Article • 03/02/2023
In this article, you learn about FPGAs and how to deploy your ML models to an Azure
FPGA using the hardware-accelerated models Python package from Azure Machine
Learning.
FPGAs make it possible to achieve low latency for real-time inference (or model scoring)
requests. Asynchronous requests (batching) aren't needed. Batching can cause latency,
because more data needs to be processed. Implementations of neural processing units
don't require batching; therefore the latency can be many times lower, compared to
CPU and GPU processors.
You can reconfigure FPGAs for different types of machine learning models. This flexibility
makes it easier to accelerate the applications based on the most optimal numerical
precision and memory model being used. Because FPGAs are reconfigurable, you can
stay current with the requirements of rapidly changing AI algorithms.
Processor Abbreviation Description
FPGAs on Azure are based on Intel's FPGA devices, which data scientists and developers
use to accelerate real-time AI calculations. This FPGA-enabled architecture offers
performance, flexibility, and scale, and is available on Azure.
Azure FPGAs are integrated with Azure Machine Learning. Azure can parallelize pre-
trained DNN across FPGAs to scale out your service. The DNNs can be pre-trained, as a
deep featurizer for transfer learning, or fine-tuned with updated weights.
To optimize latency and throughput, your client sending data to the FPGA model should
be in one of the regions above (the one you deployed the model to).
The PBS Family of Azure VMs contains Intel Arria 10 FPGAs. It will show as "Standard
PBS Family vCPUs" when you check your Azure quota allocation. The PB6 VM has six
vCPUs and one FPGA. PB6 VM is automatically provisioned by Azure Machine Learning
during model deployment to an FPGA. It's only used with Azure Machine Learning, and
it can't run arbitrary bitstreams. For example, you won't be able to flash the FPGA with
bitstreams to do encryption, encoding, etc.
In this example, you create a TensorFlow graph to preprocess the input image, make it a
featurizer using ResNet 50 on an FPGA, and then run the features through a classifier
trained on the ImageNet data set. Then, the model is deployed to an AKS cluster.
Prerequisites
An Azure subscription. If you don't have one, create a pay-as-you-go account
(free Azure accounts aren't eligible for FPGA quota).
An Azure Machine Learning workspace and the Azure Machine Learning SDK for
Python installed, as described in Create a workspace.
FPGA quota. Submit a request for quota , or run this CLI command to check
quota:
Azure CLI
Make sure you have at least 6 vCPUs under the CurrentValue returned.
Python
import os
import tensorflow as tf
from azureml.core import Workspace

ws = Workspace.from_config()
print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep='\n')
2. Preprocess image. The input to the web service is a JPEG image. The first step is to
decode the JPEG image and preprocess it. The JPEG images are treated as strings,
and the results are tensors that will be the input to the ResNet 50 model.
Python
import azureml.accel.models.utils as utils

in_images = tf.placeholder(tf.string)
image_tensors = utils.preprocess_array(in_images)
print(image_tensors.shape)
3. Load featurizer. Initialize the model and download a TensorFlow checkpoint of the
quantized version of ResNet50 to be used as a featurizer. Replace
"QuantizedResnet50" in the code snippet to import other deep neural networks:
QuantizedResnet152
QuantizedVgg16
Densenet121
Python

from azureml.accel.models import QuantizedResnet50

# Sketch of the featurizer-loading step; the save path and is_frozen flag
# are assumptions based on the azureml-accel-models package.
save_path = os.path.expanduser('~/models')
model_graph = QuantizedResnet50(save_path, is_frozen=True)
feature_tensor = model_graph.import_graph_def(image_tensors)
4. Add a classifier. This classifier was trained on the ImageNet data set.
Python
classifier_output = model_graph.get_default_classifier(feature_tensor)
print(classifier_output)
5. Save the model. Now that the preprocessor, ResNet 50 featurizer, and the classifier
have been loaded, save the graph and associated variables as a model.
Python
model_name = "resnet50"
model_save_path = os.path.join(save_path, model_name)
print("Saving model in {}".format(model_save_path))
6. Save input and output tensors as you will use them for model conversion and
inference requests.
Python
input_tensors = in_images.name
output_tensors = classifier_output.name
print(input_tensors)
print(output_tensors)
The following models are listed with their classifier output tensors for inference if
you used the default classifier.
Resnet50, QuantizedResnet50
Python
output_tensors = "classifier_1/resnet_v1_50/predictions/Softmax:0"
Resnet152, QuantizedResnet152
Python
output_tensors = "classifier/resnet_v1_152/predictions/Softmax:0"
Densenet121, QuantizedDensenet121
Python
output_tensors = "classifier/densenet121/predictions/Softmax:0"
Vgg16, QuantizedVgg16
Python
output_tensors = "classifier/vgg_16/fc8/squeezed:0"
SsdVgg, QuantizedSsdVgg
Python
output_tensors = ['ssd_300_vgg/block4_box/Reshape_1:0',
'ssd_300_vgg/block7_box/Reshape_1:0',
'ssd_300_vgg/block8_box/Reshape_1:0',
'ssd_300_vgg/block9_box/Reshape_1:0',
'ssd_300_vgg/block10_box/Reshape_1:0',
'ssd_300_vgg/block11_box/Reshape_1:0',
'ssd_300_vgg/block4_box/Reshape:0',
'ssd_300_vgg/block7_box/Reshape:0',
'ssd_300_vgg/block8_box/Reshape:0',
'ssd_300_vgg/block9_box/Reshape:0',
'ssd_300_vgg/block10_box/Reshape:0',
'ssd_300_vgg/block11_box/Reshape:0']
Convert the model to the Open Neural Network
Exchange format (ONNX)
Before you can deploy to FPGAs, convert the model to the ONNX format.
1. Register the model by using the SDK with the ZIP file in Azure Blob storage.
Adding tags and other metadata about the model helps you keep track of your
trained models.
Python
registered_model = Model.register(workspace=ws,
                                  model_path=model_save_path,
                                  model_name=model_name)
If you've already registered a model and want to load it, you may retrieve it.
Python

# Retrieve an already-registered model by name (sketch).
registered_model = Model(ws, name=model_name)
2. Convert the TensorFlow graph to the ONNX format. You must provide the names
of the input and output tensors, so your client can use them when you consume
the web service.
Python
from azureml.accel import AccelOnnxConverter

convert_request = AccelOnnxConverter.convert_tf_model(
    ws, registered_model, input_tensors, output_tensors)

# Wait for the conversion to finish and retrieve the converted model
# (attribute name assumed from the azureml-accel-models package).
convert_request.wait_for_completion(show_output=False)
converted_model = convert_request.result
Python
from azureml.core.image import Image
from azureml.accel import AccelContainerImage

image_config = AccelContainerImage.image_configuration()
# Image name must be lowercase
image_name = "{}-image".format(model_name)

image = Image.create(name=image_name,
                     models=[converted_model],
                     image_config=image_config,
                     workspace=ws)
image.wait_for_creation(show_output=False)
List the images by tag and get the detailed logs for any debugging.
Python
for i in Image.list(workspace=ws):
    print('{}(v.{} [{}]) stored at {} with build log {}'.format(
        i.name, i.version, i.creation_state, i.image_location,
        i.image_build_log_uri))
1. To deploy your model as a high-scale production web service, use AKS. You can
create a new one using the Azure Machine Learning SDK, CLI, or Azure Machine
Learning studio .
Python
from azureml.core.compute import AksCompute, ComputeTarget

# Provision the cluster with the FPGA-enabled PB6 VM size described above
# (the size name and agent count here are illustrative assumptions).
prov_config = AksCompute.provisioning_configuration(vm_size="Standard_PB6s",
                                                    agent_count=1)

aks_name = 'my-aks-cluster'
# Create the cluster
aks_target = ComputeTarget.create(workspace=ws,
                                  name=aks_name,
                                  provisioning_configuration=prov_config)
The AKS deployment may take around 15 minutes. Check to see if the deployment
succeeded.
Python
aks_target.wait_for_completion(show_output=True)
print(aks_target.provisioning_state)
print(aks_target.provisioning_errors)
Python
from azureml.core.webservice import Webservice, AksWebservice

# A default deployment configuration is assumed here; adjust as needed.
aks_config = AksWebservice.deploy_configuration()

aks_service_name = 'my-aks-service'
aks_service = Webservice.deploy_from_image(workspace=ws,
                                           name=aks_service_name,
                                           image=image,
                                           deployment_config=aks_config,
                                           deployment_target=aks_target)
aks_service.wait_for_deployment(show_output=True)
Deploy to a local edge server
All Azure Stack Edge devices contain an FPGA for running the model. Only one
model can be running on the FPGA at one time. To run a different model, just deploy a
new container. Instructions and sample code can be found in this Azure Sample .
The Docker image supports gRPC and the TensorFlow Serving "predict" API.
Python
# Using the grpc client in Azure Machine Learning Accelerated Models SDK package
from azureml.accel import PredictionClient

address = aks_service.scoring_uri
ssl_enabled = address.startswith("https")
address = address[address.find('/')+2:].strip('/')
port = 443 if ssl_enabled else 80

# Initialize the client; the keyword arguments are assumed from the
# azureml-accel-models package.
client = PredictionClient(address=address,
                          port=port,
                          use_ssl=ssl_enabled,
                          service_name=aks_service.name)
Since this classifier was trained on the ImageNet data set, map the classes to human-
readable labels.
Python
import requests
classes_entries = requests.get(
    "https://raw.githubusercontent.com/Lasagne/Recipes/master/examples/resnet50/imagenet_classes.txt"
).text.splitlines()

# Score image with input and output tensor names
results = client.score_file(path="./snowleopardgaze.jpg",
                            input_name=input_tensors,
                            outputs=output_tensors)
Clean up resources
To avoid unnecessary costs, clean up your resources in this order: web service, then
image, and then the model.
Python
aks_service.delete()
aks_target.delete()
image.delete()
registered_model.delete()
converted_model.delete()
Next steps
Learn how to secure your web services.
Learn about FPGA and Azure Machine Learning pricing and costs .
Important
The Azure CLI commands in this article require the azure-cli-ml , or v1, extension
for Azure Machine Learning. Support for the v1 extension will end on September
30, 2025. You will be able to install and use the v1 extension until that date.
Discrete hyperparameters
Discrete hyperparameters are specified as a choice among discrete values. choice can
be:
Python
{
    "batch_size": choice(16, 32, 64, 128),
    "number_of_hidden_layers": choice(range(1,5))
}
In this case, batch_size takes one of the values [16, 32, 64, 128], and
number_of_hidden_layers takes one of the values [1, 2, 3, 4].
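In plain Python terms (hypothetical stand-ins for the choice() expressions above, not the azureml API), the two sets of candidate values are:

```python
# Plain-Python equivalents of the choice() expressions above (illustrative only)
search_space = {
    "batch_size": [16, 32, 64, 128],
    "number_of_hidden_layers": list(range(1, 5)),  # range(1, 5) covers 1..4
}

print(search_space["number_of_hidden_layers"])  # [1, 2, 3, 4]
```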
Continuous hyperparameters
Continuous hyperparameters are specified as a distribution over a continuous range
of values:
uniform(low, high) - Returns a value uniformly distributed between low and high
Python
{
    "learning_rate": normal(10, 3),
    "keep_probability": uniform(0.05, 0.1)
}
This code defines a search space with two parameters, learning_rate and
keep_probability . learning_rate has a normal distribution with a mean of 10 and a
standard deviation of 3, and keep_probability has a uniform distribution between
0.05 and 0.1.
Random sampling
In random sampling, hyperparameter values are randomly selected from the defined
search space.
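As a rough illustration of the idea (plain Python, not the HyperDrive RandomParameterSampling API), each trial draws one value per hyperparameter from its distribution:

```python
import random

# Hypothetical search space: callables mimicking uniform(low, high) and choice(...)
search_space = {
    "learning_rate": lambda: random.uniform(0.0005, 0.005),
    "batch_size": lambda: random.choice([16, 32, 64, 128]),
}

# One randomly sampled trial configuration
sample = {name: draw() for name, draw in search_space.items()}
print(sample)
```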
Grid sampling
Grid sampling supports discrete hyperparameters. Use grid sampling if you can budget
to exhaustively search over the search space. Supports early termination of low-
performance runs.
Grid sampling does a simple grid search over all possible values. Grid sampling can only
be used with choice hyperparameters. For example, the following space has six samples:
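The documented space itself isn't reproduced here, but a hypothetical two-parameter space with six combinations shows how grid sampling enumerates every choice:

```python
from itertools import product

# Hypothetical space: 3 x 2 = 6 grid points
space = {
    "num_hidden_layers": [1, 2, 3],
    "batch_size": [16, 32],
}

# Grid sampling tries every combination exactly once
grid = [dict(zip(space, combo)) for combo in product(*space.values())]
print(len(grid))  # 6
```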
Bayesian sampling
Bayesian sampling is based on the Bayesian optimization algorithm. It picks samples
based on how previous samples did, so that new samples improve the primary metric.
The number of concurrent runs has an impact on the effectiveness of the tuning
process. A smaller number of concurrent runs may lead to better sampling convergence,
since the smaller degree of parallelism increases the number of runs that benefit from
previously completed runs.
Bayesian sampling only supports choice , uniform , and quniform distributions over the
search space.
primary_metric_name : The name of the primary metric needs to exactly match the
name of the metric logged by the training script.
Python
primary_metric_name="accuracy",
primary_metric_goal=PrimaryMetricGoal.MAXIMIZE
Log the primary metric in your training script with the following sample snippet:
Python
from azureml.core.run import Run
run_logger = Run.get_context()
run_logger.log("accuracy", float(val_accuracy))
The training script calculates the val_accuracy and logs it as the primary metric
"accuracy". Each time the metric is logged, it's received by the hyperparameter tuning
service. It's up to you to determine the frequency of reporting.
For more information on logging values in model training runs, see Enable logging in
Azure Machine Learning training runs.
You can configure the following parameters that control when a policy is applied:
evaluation_interval : the frequency of applying the policy. Each time the training
script logs the primary metric counts as one interval.
Bandit policy
Bandit policy is based on slack factor/slack amount and evaluation interval. Bandit ends
runs when the primary metric isn't within the specified slack factor/slack amount of the
most successful run.
Note
Bayesian sampling does not support early termination. When using Bayesian
sampling, set early_termination_policy = None .
slack_factor : the amount of slack allowed with respect to the best performing
training run, specified as a ratio.
slack_amount : the amount of slack specified as an absolute amount, instead of a
ratio.
For example, consider a Bandit policy applied at interval 10. Assume that the best
performing run at interval 10 reported a primary metric of 0.8 with a goal to
maximize the primary metric. If the policy specifies a slack_factor of 0.2, any
training runs whose best metric at interval 10 is less than 0.66
(0.8/(1+ slack_factor )) will be terminated.
delay_evaluation : delays the first policy evaluation for a specified number of
intervals
In this example, the early termination policy is applied at every interval when metrics are
reported, starting at evaluation interval 5. Any run whose best metric is less than
1/(1+0.1), or roughly 91%, of the best performing run will be terminated.
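The arithmetic in the two examples above can be checked directly. This is a plain-Python sketch of the cutoff, metric/(1 + slack_factor) when maximizing, not the BanditPolicy implementation:

```python
def bandit_cutoff(best_metric, slack_factor):
    # Smallest metric a run may report (goal: maximize) without being ended
    return best_metric / (1 + slack_factor)

# First example: best run reports 0.8 with slack_factor 0.2
print(round(bandit_cutoff(0.8, 0.2), 2))  # 0.67 (the text rounds down to 0.66)

# Second example: cutoff as a fraction of the best run with slack_factor 0.1
print(round(1 / (1 + 0.1), 2))  # 0.91, i.e. about 91%
```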
In this example, the early termination policy is applied at every interval starting at
evaluation interval 5. A run is stopped at interval 5 if its best primary metric is worse
than the median of the running averages over intervals 1:5 across all training runs.
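One plausible reading of that comparison, sketched in plain Python with made-up numbers (not the MedianStoppingPolicy implementation):

```python
from statistics import median

def should_stop(run_best_metric, running_averages_of_all_runs):
    # Stop (goal: maximize) if this run's best metric so far is worse than
    # the median of the running averages reported across all training runs
    return run_best_metric < median(running_averages_of_all_runs)

# Hypothetical numbers: four runs' running averages at interval 5
print(should_stop(0.70, [0.65, 0.72, 0.80, 0.78]))  # True (median is 0.75)
```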
delay_evaluation : delays the first policy evaluation for a specified number of
intervals
exclude_finished_jobs : specifies whether to exclude finished jobs when applying
the policy
In this example, the early termination policy is applied at every interval starting at
evaluation interval 5. A run terminates at interval 5 if its performance at interval 5 is in
the lowest 20% of performance of all runs at interval 5, excluding finished jobs
when applying the policy.
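The "lowest 20%" cutoff can be sketched in plain Python with made-up metric values (not the TruncationSelectionPolicy implementation):

```python
def truncation_cutoff(metrics, truncation_percentage):
    # Metric value at or below which a run falls in the lowest X% (goal: maximize)
    k = max(1, int(len(metrics) * truncation_percentage / 100))
    return sorted(metrics)[k - 1]

# Ten hypothetical runs' metrics at interval 5; 20% -> the two weakest are ended
metrics_at_interval_5 = [0.9, 0.6, 0.85, 0.72, 0.95, 0.55, 0.8, 0.88, 0.7, 0.93]
print(truncation_cutoff(metrics_at_interval_5, 20))  # 0.6 -> 0.55 and 0.6 terminate
```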
If you want all training runs to run to completion, don't use a termination policy:
Python
policy=None
max_total_runs : the maximum number of training runs; must be between 1
and 1000.
max_duration_minutes : (optional) the maximum duration, in minutes, of the
hyperparameter tuning experiment; runs after this duration are canceled.
Additionally, specify the maximum number of training runs to run concurrently during
your hyperparameter tuning search.
Note
The number of concurrent runs is gated on the resources available in the specified
compute target. Ensure that the compute target has the available resources for the
desired concurrency.
Python
max_total_runs=20,
max_concurrent_runs=4
The ScriptRunConfig specifies the training script that will run with the sampled
hyperparameters. It defines the resources per job (single or multi-node) and the
compute target to use.
Python
param_sampling = RandomParameterSampling( {
        'learning_rate': uniform(0.0005, 0.005),
        'momentum': uniform(0.9, 0.99)
    }
)

early_termination_policy = BanditPolicy(slack_factor=0.15,
                                        evaluation_interval=1,
                                        delay_evaluation=10)

hd_config = HyperDriveConfig(run_config=script_run_config,
                             hyperparameter_sampling=param_sampling,
                             policy=early_termination_policy,
                             primary_metric_name="accuracy",
                             primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
                             max_total_runs=100,
                             max_concurrent_runs=4)
The above code snippet is taken from the sample notebook Train,
hyperparameter tune, and deploy with PyTorch . In this sample, the learning_rate and
momentum parameters will be tuned. Early stopping of runs is determined by a
BanditPolicy , which stops a run whose primary metric falls outside the slack_factor
of the best performing run (see the BanditPolicy class reference).
The following code from the sample shows how the being-tuned values are received,
parsed, and passed to the training script's fine_tune_model function:
Python
# from pytorch_train.py
import argparse, os, torch

def main():
    print("Torch version:", torch.__version__)
    # Parse the hyperparameter values passed in by HyperDrive
    parser = argparse.ArgumentParser()
    parser.add_argument('--num_epochs', type=int, default=25)
    parser.add_argument('--learning_rate', type=float, default=0.001)
    parser.add_argument('--momentum', type=float, default=0.9)
    parser.add_argument('--output_dir', type=str, default='./outputs')
    args = parser.parse_args()

    data_dir = download_data()
    print("data directory is: " + data_dir)
    model = fine_tune_model(args.num_epochs, data_dir,
                            args.learning_rate, args.momentum)
    os.makedirs(args.output_dir, exist_ok=True)
    torch.save(model, os.path.join(args.output_dir, 'model.pt'))
Important
Every hyperparameter run restarts the training from scratch, including rebuilding
the model and all the data loaders. You can minimize this cost by using an Azure
Machine Learning pipeline or manual process to do as much data preparation as
possible prior to your training runs.
Bayesian sampling: Trials from the previous run are used as prior knowledge to
pick new samples, and to improve the primary metric.
Random sampling or grid sampling: Early termination uses knowledge from
previous runs to determine poorly performing runs.
Specify the list of parent runs you want to warm start from.
Python
If a hyperparameter tuning experiment is canceled, you can resume training runs from
the last checkpoint. However, your training script must handle checkpoint logic.
The training run must use the same hyperparameter configuration and mount the
outputs folders. The training script must accept the resume-from argument, which
contains the checkpoint or model files from which to resume the training run. You can
resume individual training runs using the following snippet:
Python
You can configure your hyperparameter tuning experiment to warm start from a
previous experiment or resume individual training runs using the optional parameters
resume_from and resume_child_runs in the config:
Python
hd_config = HyperDriveConfig(run_config=script_run_config,
                             hyperparameter_sampling=param_sampling,
                             policy=early_termination_policy,
                             resume_from=warmstart_parents_to_resume_from,
                             resume_child_runs=child_runs_to_resume,
                             primary_metric_name="accuracy",
                             primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
                             max_total_runs=100,
                             max_concurrent_runs=4)
Studio
You can visualize all of your hyperparameter tuning runs in the Azure Machine Learning
studio . For more information on how to view an experiment in the portal, see View
run records in the studio.
Metrics chart: This visualization tracks the metrics logged for each hyperdrive child
run over the duration of hyperparameter tuning. Each line represents a child run,
and each point measures the primary metric value at that iteration of runtime.
2-Dimensional Scatter Chart: This visualization shows the correlation between any
two individual hyperparameters along with their associated primary metric value.
3-Dimensional Scatter Chart: This visualization is the same as 2D but allows for
three hyperparameter dimensions of correlation with the primary metric value. You
can also click and drag to reorient the chart to view different correlations in 3D
space.
Notebook widget
Use the Notebook widget to visualize the progress of your training runs. The following
snippet visualizes all your hyperparameter tuning runs in one place in a Jupyter
notebook:
Python
This code displays a table with details about the training runs for each of the
hyperparameter configurations.
You can also visualize the performance of each of the runs as training progresses.
Python
best_run = hyperdrive_run.get_best_run_by_primary_metric()
best_run_metrics = best_run.get_metrics()
parameter_values = best_run.get_details()['runDefinition']['arguments']
Sample notebook
Refer to train-hyperparameter-* notebooks in this folder:
how-to-use-azureml/ml-frameworks
Learn how to run notebooks by following the article Use Jupyter notebooks to explore
this service.
Next steps
Track an experiment
Deploy a trained model
Deploy a model to an Azure Kubernetes
Service cluster with v1
Article • 03/15/2023
Important
This article shows how to use the CLI and SDK v1 to deploy a model. For the
recommended approach for v2, see Deploy and score a machine learning model
by using an online endpoint.
Learn how to use Azure Machine Learning to deploy a model as a web service on Azure
Kubernetes Service (AKS). Azure Kubernetes Service is good for high-scale production
deployments. Use Azure Kubernetes Service if you need one or more of the following
capabilities:
When deploying to Azure Kubernetes Service, you deploy to an AKS cluster that is
connected to your workspace. For information on connecting an AKS cluster to your
workspace, see Create and attach an Azure Kubernetes Service cluster.
Important
We recommend that you debug locally before deploying to the web service. For
more information, see Debug Locally.
You can also refer to Azure Machine Learning - Deploy to Local Notebook
Prerequisites
An Azure Machine Learning workspace. For more information, see Create an Azure
Machine Learning workspace.
The Azure CLI extension (v1) for Machine Learning service, Azure Machine Learning
Python SDK, or the Azure Machine Learning Visual Studio Code extension.
Important
Some of the Azure CLI commands in this article use the azure-cli-ml , or v1,
extension for Azure Machine Learning. Support for the v1 extension will end
on September 30, 2025. You will be able to install and use the v1 extension
until that date.
The Python code snippets in this article assume that the following variables are set:
ws - Set to your workspace.
model - Set to your registered model.
For more information on setting these variables, see How and where to deploy
models.
Kubernetes orchestrates containerized applications using entities such as Pods and
ReplicaSets . You can learn about Kubernetes from docs and videos at What is
Kubernetes? .
In Azure Machine Learning, "deployment" is used in the more general sense of making
available and cleaning up your project resources. The steps that Azure Machine Learning
considers part of deployment are:
1. Zipping the files in your project folder, ignoring those specified in .amlignore or
.gitignore
2. Scaling up your compute cluster (Relates to Kubernetes)
3. Building or downloading the dockerfile to the compute node (Relates to
Kubernetes)
a. The system calculates a hash of the base image, any custom Docker steps, and
the Python conda definition
b. The system uses this hash as the key in a lookup of the workspace Azure
Container Registry (ACR)
c. If it isn't found, it looks for a match in the global ACR
d. If it isn't found, the system builds a new image (which will be cached and
pushed to the workspace ACR)
4. Downloading your zipped project file to temporary storage on the compute node
5. Unzipping the project file
6. The compute node executing python <entry script> <arguments>
7. Saving logs, model files, and other files written to ./outputs to the storage
account associated with the workspace
8. Scaling down compute, including removing temporary storage (Relates to
Kubernetes)
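Steps 3a through 3d describe a content-addressed image cache. A toy sketch of that lookup order (all names here are hypothetical; the real hash inputs and registries are internal to the service):

```python
import hashlib

workspace_acr = {}   # stand-in for the workspace Azure Container Registry
global_acr = {}      # stand-in for the global ACR

def image_key(*build_inputs):
    # Hash of the inputs that define the image (step 3a)
    return hashlib.sha256("\n".join(build_inputs).encode()).hexdigest()

def get_or_build_image(key):
    if key in workspace_acr:          # step 3b: workspace ACR lookup
        return workspace_acr[key]
    if key in global_acr:             # step 3c: global ACR lookup
        return global_acr[key]
    image = f"image-{key[:8]}"        # step 3d: build a new image...
    workspace_acr[key] = image        # ...and cache it in the workspace ACR
    return image

key = image_key("base-image", "docker-steps", "conda-definition")
first, second = get_or_build_image(key), get_or_build_image(key)
print(first == second)  # True: the second call is a cache hit
```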
Azure Machine Learning router
The front-end component (azureml-fe) that routes incoming inference requests to
deployed services automatically scales as needed. Scaling of azureml-fe is based on the
AKS cluster purpose and size (number of nodes). The cluster purpose and nodes are
configured when you create or attach an AKS cluster. There's one azureml-fe service per
cluster, which may be running on multiple pods.
Important
When using a cluster configured as dev-test, the self-scaler is disabled. Even for
FastProd/DenseProd clusters, the self-scaler is only enabled when telemetry shows
that it's needed.
Azureml-fe scales both up (vertically) to use more cores, and out (horizontally) to use
more pods. When making the decision to scale up, the time that it takes to route
incoming inference requests is used. If this time exceeds the threshold, a scale-up
occurs. If the time to route incoming requests continues to exceed the threshold, a
scale-out occurs.
When scaling down and in, CPU usage is used. If the CPU usage threshold is met, the
front end will first be scaled down. If the CPU usage drops to the scale-in threshold, a
scale-in operation happens. Scaling up and out will only occur if there are enough
cluster resources available.
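The two signals described above can be summarized as a small decision function. This is a hypothetical sketch with made-up thresholds; the real thresholds are internal to azureml-fe:

```python
def fe_scale_decision(routing_time_ms, cpu_utilization,
                      routing_threshold_ms=50.0, cpu_scale_threshold=0.30):
    # High request-routing latency drives scale up, then scale out
    if routing_time_ms > routing_threshold_ms:
        return "scale up/out"
    # Low CPU utilization drives scale down, then scale in
    if cpu_utilization < cpu_scale_threshold:
        return "scale down/in"
    return "steady"

print(fe_scale_decision(80.0, 0.5))   # scale up/out
print(fe_scale_decision(10.0, 0.1))   # scale down/in
```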
Kubenet networking - The network resources are typically created and configured
as the AKS cluster is deployed.
Azure Container Networking Interface (CNI) networking - The AKS cluster is
connected to an existing virtual network resource and configurations.
For Kubenet networking, the network is created and configured properly for Azure
Machine Learning service. For the CNI networking, you need to understand the
connectivity requirements and ensure DNS resolution and outbound connectivity for
AKS inferencing. For example, you may be using a firewall to block network traffic.
The following diagram shows the connectivity requirements for AKS inferencing. Black
arrows represent actual communication, and blue arrows represent the domain names.
You may need to add entries for these hosts to your firewall or to your custom DNS
server.
For general AKS connectivity requirements, see Control egress traffic for cluster nodes in
Azure Kubernetes Service.
For accessing Azure Machine Learning services behind a firewall, see How to access
azureml behind firewall.
Right after azureml-fe is deployed, it will attempt to start, which requires it to:
At model deployment time, for a successful model deployment, the AKS node should
be able to:
After the model is deployed and the service starts, azureml-fe automatically discovers
it using the AKS API and becomes ready to route requests to it. It must be able to
communicate with the model pods.
Note
If the deployed model requires any connectivity (for example, querying an external
database or other REST service, or downloading a blob), then both DNS resolution
and outbound communication for these services should be enabled.
Deploy to AKS
To deploy a model to Azure Kubernetes Service, create a deployment configuration that
describes the compute resources needed. For example, number of cores and memory.
You also need an inference configuration, which describes the environment needed to
host the model and web service. For more information on creating the inference
configuration, see How and where to deploy models.
Python
from azureml.core.compute import AksCompute
from azureml.core.model import Model
from azureml.core.webservice import AksWebservice

aks_target = AksCompute(ws,"myaks")
# If deploying to a cluster configured for dev/test, ensure that it was created
# with enough cores and memory to handle this deployment configuration. Note
# that memory is also used by things such as dependencies and AML components.
deployment_config = AksWebservice.deploy_configuration(cpu_cores = 1,
                                                       memory_gb = 1)
service = Model.deploy(ws, "myservice", [model], inference_config,
                       deployment_config, aks_target)

service.wait_for_deployment(show_output = True)
print(service.state)
print(service.get_logs())
For more information on the classes, methods, and parameters used in this
example, see the following reference documents:
AksCompute
AksWebservice.deploy_configuration
Model.deploy
Webservice.wait_for_deployment
Autoscaling
APPLIES TO: Python SDK azureml v1
The component that handles autoscaling for Azure Machine Learning model
deployments is azureml-fe, which is a smart request router. Since all inference requests
go through it, it has the necessary data to automatically scale the deployed model(s).
Important
Azureml-fe does not scale the number of nodes in an AKS cluster, because this
could lead to unexpected cost increases. Instead, it scales the number of replicas
for the model within the physical cluster boundaries. If you need to scale the
number of nodes within the cluster, you can manually scale the cluster or
configure the AKS cluster autoscaler.
Python
aks_config = AksWebservice.deploy_configuration(autoscale_enabled=True,
                                                autoscale_target_utilization=30,
                                                autoscale_min_replicas=1,
                                                autoscale_max_replicas=4)
Decisions to scale up or down are based on utilization of the current container replicas.
The number of replicas that are busy (processing a request), divided by the total number
of current replicas, is the current utilization. If this number exceeds
autoscale_target_utilization , more replicas are created. If it's lower, replicas
are reduced. By default, the target utilization is 70%.
Decisions to add replicas are eager and fast (around 1 second). Decisions to remove
replicas are conservative (around 1 minute).
You can calculate the required replicas by using the following code:
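The documented snippet isn't reproduced here; a back-of-the-envelope version of the calculation, with assumed variable names, is:

```python
from math import ceil

target_rps = 20              # requests per second the service must sustain
request_latency_s = 0.085    # average seconds to process one request
target_utilization = 0.70    # default autoscale_target_utilization (70%)

# Concurrent requests in flight at steady state, padded so the replicas
# run at the target utilization rather than at 100%
concurrent_requests = target_rps * request_latency_s
replicas = ceil(concurrent_requests / target_utilization)
print(replicas)  # 3
```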
Python
deployment_config = AksWebservice.deploy_configuration(cpu_cores=1,
                                                       memory_gb=1,
                                                       auth_enabled=False)
For information on authenticating from a client application, see Consume an Azure
Machine Learning model deployed as a web service.
Python
Important
If you need to regenerate a key, use service.regen_key .
Python
deployment_config = AksWebservice.deploy_configuration(cpu_cores=1,
                                                       memory_gb=1,
                                                       token_auth_enabled=True)
If token authentication is enabled, you can use the get_token method to retrieve a JWT
token and that token's expiration time:
Python
Important
You will need to request a new token after the token's refresh_by time.
Microsoft strongly recommends that you create your Azure Machine Learning
workspace in the same region as your Azure Kubernetes Service cluster. To
authenticate with a token, the web service makes a call to the region in which
your Azure Machine Learning workspace is created. If your workspace's region is
unavailable, you will not be able to fetch a token for your web service, even if
your cluster is in a different region than your workspace. This effectively makes
token-based authentication unavailable until your workspace's region is
available again. In addition, the greater the distance between your cluster's region
and your workspace's region, the longer it takes to fetch a token.
To retrieve a token, you must use the Azure Machine Learning SDK or the az ml
service get-access-token command.
Vulnerability scanning
Microsoft Defender for Cloud provides unified security management and advanced
threat protection across hybrid cloud workloads. You should allow Microsoft Defender
for Cloud to scan your resources and follow its recommendations. For more, see Azure
Kubernetes Services integration with Defender for Cloud.
Next steps
Use Azure RBAC for Kubernetes authorization
Secure inferencing environment with Azure Virtual Network
How to deploy a model using a custom Docker image
Deployment troubleshooting
Update web service
Use TLS to secure a web service through Azure Machine Learning
Consume an ML model deployed as a web service
Monitor your Azure Machine Learning models with Application Insights
Collect data for models in production
Deploy a model to Azure Container
Instances with CLI (v1)
Article • 03/02/2023
Important
This article shows how to use the CLI and SDK v1 to deploy a model. For the
recommended approach for v2, see Deploy and score a machine learning model
by using an online endpoint.
Learn how to use Azure Machine Learning to deploy a model as a web service on Azure
Container Instances (ACI). Use Azure Container Instances if you:
For information on quota and region availability for ACI, see Quotas and region
availability for Azure Container Instances article.
Important
We recommend that you debug locally before deploying to the web service. For
more information, see Debug Locally.
You can also refer to Azure Machine Learning - Deploy to Local Notebook
Prerequisites
An Azure Machine Learning workspace. For more information, see Create an Azure
Machine Learning workspace.
The Azure CLI extension (v1) for Machine Learning service, Azure Machine Learning
Python SDK, or the Azure Machine Learning Visual Studio Code extension.
Important
Some of the Azure CLI commands in this article use the azure-cli-ml , or v1,
extension for Azure Machine Learning. Support for the v1 extension will end
on September 30, 2025. You will be able to install and use the v1 extension
until that date.
The Python code snippets in this article assume that the following variables are set:
ws - Set to your workspace.
model - Set to your registered model.
For more information on setting these variables, see How and where to deploy
models.
Limitations
When your Azure Machine Learning workspace is configured with a private endpoint,
deploying to Azure Container Instances in a VNet is not supported. Instead, consider
using a Managed online endpoint with network isolation.
Deploy to ACI
To deploy a model to Azure Container Instances, create a deployment configuration
that describes the compute resources needed. For example, number of cores and
memory. You also need an inference configuration, which describes the environment
needed to host the model and web service. For more information on creating the
inference configuration, see How and where to deploy models.
Note
ACI is suitable only for small models that are under 1 GB in size.
We recommend using single-node AKS to dev-test larger models.
The number of models to be deployed is limited to 1,000 models per
deployment (per container).
Python
from azureml.core.model import Model
from azureml.core.webservice import AciWebservice

deployment_config = AciWebservice.deploy_configuration(cpu_cores = 1,
                                                       memory_gb = 1)
service = Model.deploy(ws, "aciservice", [model], inference_config,
                       deployment_config)
service.wait_for_deployment(show_output = True)
print(service.state)
For more information on the classes, methods, and parameters used in this example, see
the following reference documents:
AciWebservice.deploy_configuration
Model.deploy
Webservice.wait_for_deployment
To deploy using the CLI, use the following command. Replace mymodel:1 with the name
and version of the registered model. Replace myservice with the name to give this
service:
Azure CLI
The following JSON is an example deployment configuration for use with the CLI:
JSON
{
    "computeType": "aci",
    "containerResourceRequirements":
    {
        "cpu": 0.5,
        "memoryInGB": 1.0
    },
    "authEnabled": true,
    "sslEnabled": false,
    "appInsightsEnabled": false
}
Using VS Code
See how to manage resources in VS Code.
Important
You don't need to create an ACI container to test in advance. ACI containers are
created as needed.
Important
We append a hashed workspace ID to all underlying ACI resources that are created,
so all ACI names from the same workspace have the same suffix. The Azure Machine
Learning service name is still the same customer-provided "service_name",
and the user-facing Azure Machine Learning SDK APIs don't need any change.
We don't give any guarantees on the names of the underlying resources being
created.
Next steps
How to deploy a model using a custom Docker image
Deployment troubleshooting
Update the web service
Use TLS to secure a web service through Azure Machine Learning
Consume an ML model deployed as a web service
Monitor your Azure Machine Learning models with Application Insights
Collect data for models in production
Deploy MLflow models as Azure web
services
Article • 06/01/2023
In this article, learn how to deploy your MLflow model as an Azure web service, so you
can leverage and apply Azure Machine Learning's model management and data drift
detection capabilities to your production models. See MLflow and Azure Machine
Learning for additional MLflow and Azure Machine Learning functionality integrations.
Azure Container Instances (ACI), which is a suitable choice for a quick dev-test
deployment.
Azure Kubernetes Service (AKS), which is recommended for scalable production
deployments.
Tip
The information in this document is primarily for data scientists and developers
who want to deploy their MLflow model to an Azure Machine Learning web service
endpoint. If you are an administrator interested in monitoring resource usage and
events from Azure Machine Learning, such as quotas, completed training runs, or
completed model deployments, see Monitoring Azure Machine Learning.
The following diagram demonstrates that with the MLflow deploy API and Azure
Machine Learning, you can deploy models created with popular frameworks, like
PyTorch, Tensorflow, scikit-learn, etc., as Azure web services and manage them in your
workspace.
Prerequisites
A machine learning model. If you don't have a trained model, find the notebook
example that best fits your compute scenario in this repo and follow its
instructions.
Set up the MLflow Tracking URI to connect to Azure Machine Learning.
Install the azureml-mlflow package.
This package automatically brings in azureml-core , part of the Azure Machine
Learning Python SDK, which provides the connectivity for MLflow to access your
workspace.
See which access permissions you need to perform your MLflow operations with
your workspace.
In order to deploy to ACI, you don't need to define any deployment configuration;
the service defaults to an ACI deployment when a config is not provided. Then,
register and deploy the model in one step with MLflow's deploy method for Azure
Machine Learning.
Python
# define the model path and the name is the service name
# the model gets registered automatically and a name is autogenerated
# using the "name" parameter below
client.create_deployment(name="mlflow-test-aci",
                         model_uri='runs:/{}/{}'.format(run.id, model_path))
For your deployment config json file, each of the deployment config parameters needs
to be defined in the form of a dictionary. The following is an example. Learn more about
what your deployment configuration json file can contain.
{"computeType": "aci",
"containerResourceRequirements": {"cpu": 1, "memoryInGB": 1},
"location": "eastus2"
}
client.create_deployment(model_uri='runs:/{}/{}'.format(run.id, model_path),
config=test_config,
name="mlflow-test-aci")
To deploy to AKS, first create an AKS cluster using the ComputeTarget.create()
method. It may take 20-25 minutes to create a new cluster.
Python
from azureml.core.compute import AksCompute, ComputeTarget

aks_name = 'aks-mlflow'
aks_target = ComputeTarget.create(workspace=ws, name=aks_name,
                                  provisioning_configuration=AksCompute.provisioning_configuration())
aks_target.wait_for_completion(show_output = True)
print(aks_target.provisioning_state)
print(aks_target.provisioning_errors)
Then, register and deploy the model in one step with MLflow's deployment client .
Python
# define the model path and the name is the service name
# the model gets registered automatically and a name is autogenerated
# using the "name" parameter below
client.create_deployment(model_uri='runs:/{}/{}'.format(run.id, model_path),
                         config=test_config,
                         name="mlflow-test-aci")
Clean up resources
If you don't plan to use your deployed web service, use service.delete() to delete it
from your notebook. For more information, see the documentation for
WebService.delete().
Example notebooks
The MLflow with Azure Machine Learning notebooks demonstrate and expand upon
concepts presented in this article.
Next steps
Manage your models.
Monitor your production models for data drift.
Track Azure Databricks runs with MLflow.
Update a deployed web service (v1)
Article • 03/02/2023
In this article, you learn how to update a web service that was deployed with Azure
Machine Learning.
Prerequisites
This article assumes you have already deployed a web service with Azure Machine
Learning. If you need to learn how to deploy a web service, follow these steps.
The code snippets in this article assume that the ws variable has already been
initialized to your workspace by using the Workspace() constructor or loading a
saved configuration with Workspace.from_config(). The following snippet
demonstrates how to use the constructor:
Python
Important
Some of the Azure CLI commands in this article use the azure-cli-ml , or v1,
extension for Azure Machine Learning. Support for the v1 extension will end on
September 30, 2025. You will be able to install and use the v1 extension until that
date.
Important
When you create a new version of a model, you must manually update each service
that you want to use it.
You cannot use the SDK to update a web service published from the Azure
Machine Learning designer.
Important
Azure Kubernetes Service uses Blobfuse FlexVolume driver for the versions
<=1.16 and Blob CSI driver for the versions >=1.17.
Note
When an operation is already in progress, any new operation on that same web
service responds with a 409 conflict error. For example, if a create or update web
service operation is in progress and you trigger a new delete operation, an error
is thrown.
The following code shows how to use the SDK to update the model, environment, and
entry script for a web service:
Python
from azureml.core import Environment
from azureml.core.webservice import Webservice
from azureml.core.model import Model, InferenceConfig

service_name = 'myservice'
# Retrieve existing service.
service = Webservice(name=service_name, workspace=ws)

# Update the service with a new model, entry script, and environment
# (env is an azureml Environment object you've defined).
inference_config = InferenceConfig(entry_script='score.py', environment=env)
service.update(models=[model], inference_config=inference_config)
service.wait_for_deployment(show_output=True)
You can also update a web service by using the ML CLI. The following example
demonstrates registering a new model and then updating a web service to use the new
model:
Azure CLI
Tip
In this example, a JSON document is used to pass the model information from the
registration command into the update command.
To update the service to use a new entry script or environment, create an inference
configuration file and specify it with the ic parameter.
Next steps
Troubleshoot a failed deployment
Create client applications to consume web services
How to deploy a model using a custom Docker image
Use TLS to secure a web service through Azure Machine Learning
Monitor your Azure Machine Learning models with Application Insights
Collect data for models in production
Create event alerts and triggers for model deployments
Profile your model to determine resource utilization
Article • 03/02/2023
This article shows how to profile a machine learning model to determine how much
CPU and memory you'll need to allocate for the model when deploying it as a web
service.
) Important
This article applies to CLI v1 and SDK v1. This profiling technique is not available for
v2 of either CLI or SDK.
) Important
Some of the Azure CLI commands in this article use the azure-cli-ml , or v1,
extension for Azure Machine Learning. Support for the v1 extension will end on
September 30, 2025. You will be able to install and use the v1 extension until that
date.
Prerequisites
This article assumes you have trained and registered a model with Azure Machine
Learning. See the sample tutorial for an example of training and registering a scikit-learn
model with Azure Machine Learning.
Limitations
Profiling will not work when the Azure Container Registry (ACR) for your workspace
is behind a virtual network.
Run the profiler
Once you have registered your model and prepared the other components necessary
for its deployment, you can determine the CPU and memory the deployed service will
need. Profiling tests the service that runs your model and returns information such as
the CPU usage, memory usage, and response latency. It also provides a
recommendation for the CPU and memory based on resource usage.
To profile your model, you need:
A registered model.
An inference configuration based on your entry script and inference environment
definition.
A single-column tabular dataset, where each row contains a string representing
sample request data.
) Important
At this point, we only support profiling of services that expect their request data to
be a string, for example: string-serialized JSON, text, string-serialized image, and so
on. The content of each row of the dataset (string) is put into the body of the HTTP
request and sent to the service encapsulating the model for scoring.
Below is an example of how you can construct an input dataset to profile a service that
expects its incoming request data to contain serialized JSON. In this case, we created a
dataset based on 100 instances of the same request data content. In real-world
scenarios, we suggest that you use larger datasets containing various inputs, especially
if your model's resource usage or behavior is input dependent.
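As a sketch of how such a file might be produced (the payload below is a placeholder; match your model's input schema), the 100-row text file referred to as file_name in the next snippet can be written like this:

```python
import json

# Placeholder request payload; replace with data matching your model's schema.
instance = json.dumps({"data": [[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]]})

# Write 100 identical serialized-JSON rows, one request per line.
file_name = "sample_request_data.txt"
with open(file_name, "w") as f:
    f.write("\n".join([instance] * 100))
```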
Python
import json
from azureml.core import Datastore
from azureml.core.dataset import Dataset
from azureml.data import dataset_type_definitions
# Upload the txt file created above to the datastore and create a dataset from it.
data_store = Datastore.get_default(ws)
data_store.upload_files(['./' + file_name], target_path='sample_request_data')
datastore_path = [(data_store, 'sample_request_data' + '/' + file_name)]
sample_request_data = Dataset.Tabular.from_delimited_files(
    datastore_path,
    separator='\n',
    infer_column_types=True,
    header=dataset_type_definitions.PromoteHeadersBehavior.NO_HEADERS)
sample_request_data = sample_request_data.register(workspace=ws,
                                                   name='sample_request_data',
                                                   create_new_version=True)
Once you have the dataset containing sample request data ready, create an inference
configuration. Inference configuration is based on the score.py and the environment
definition. The following example demonstrates how to create the inference
configuration and run profiling:
Python
# 'model-profile' is a placeholder profile name; score.py and myenv come from earlier steps.
inference_config = InferenceConfig(entry_script='score.py', environment=myenv)
profile = Model.profile(ws, 'model-profile', [model], inference_config,
                        input_dataset=sample_request_data)
profile.wait_for_completion(True)
# see the result
details = profile.get_details()
Next steps
Troubleshoot a failed deployment
Deploy to Azure Kubernetes Service
Create client applications to consume web services
Update web service
How to deploy a model using a custom Docker image
Use TLS to secure a web service through Azure Machine Learning
Monitor your Azure Machine Learning models with Application Insights
Collect data for models in production
Create event alerts and triggers for model deployments
Troubleshooting remote model
deployment
Article • 03/02/2023
Learn how to troubleshoot and solve, or work around, common errors you may
encounter when deploying a model to Azure Container Instances (ACI) and Azure
Kubernetes Service (AKS) using Azure Machine Learning.
7 Note
If you're deploying a model to Azure Kubernetes Service (AKS), we advise that you
enable Azure Monitor for that cluster. This helps you understand overall cluster
health and resource usage.
Prerequisites
An Azure subscription. Try the free or paid version of Azure Machine Learning .
) Important
Some of the Azure CLI commands in this article use the azure-cli-ml , or v1,
extension for Azure Machine Learning. Support for the v1 extension will end
on September 30, 2025. You will be able to install and use the v1 extension
until that date.
The main difference when using a local deployment is that the container image is built
on your local machine, which is why you need to have Docker installed for a local
deployment.
Understanding these high-level steps should help you understand where errors are
happening.
Azure CLI
az ml service get-logs --verbose --workspace-name <my workspace name> --name <service name>
Debug locally
If you have problems when deploying a model to ACI or AKS, deploy it as a local web
service. Using a local web service makes it easier to troubleshoot problems. To
troubleshoot a deployment locally, see the local troubleshooting article.
Bash
curl -p 127.0.0.1:5001/score
7 Note
Learn more in the frequently asked questions about the Azure Machine Learning
inference HTTP server.
Container can't be scheduled
When deploying a service to an Azure Kubernetes Service compute target, Azure
Machine Learning will attempt to schedule the service with the requested amount of
resources. If there are no nodes available in the cluster with the appropriate amount of
resources after 5 minutes, the deployment will fail. The failure message is Couldn't
Schedule because the kubernetes cluster didn't have available resources after
trying for 00:05:00 . You can address this error by either adding more nodes, changing
the SKU of your nodes, or changing the resource requirements of your service.
The error message will typically indicate which resource you need more of - for instance,
if you see an error message indicating 0/3 nodes are available: 3 Insufficient
nvidia.com/gpu that means that the service requires GPUs and there are three nodes in
the cluster that don't have available GPUs. This could be addressed by adding more
nodes if you're using a GPU SKU, switching to a GPU enabled SKU if you aren't or
changing your environment to not require GPUs.
The most common failure for azureml-fe-aci is that the provided SSL certificate or key
is invalid.
Python
This example prints the local path (relative to /var/azureml-app ) in the container where
your scoring script is expecting to find the model file or folder. Then you can verify if the
file or folder is indeed where it's expected to be.
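As a rough illustration of that check (the directory and file name below are placeholders, not values from your deployment), you can simulate the lookup locally:

```python
import os

# Simulate the container's AZUREML_MODEL_DIR; inside a real deployment this
# variable is set for you and points under /var/azureml-app.
os.environ.setdefault("AZUREML_MODEL_DIR", "azureml-models/mymodel/1")

# The entry script typically joins the variable with the model file name.
model_path = os.path.join(os.getenv("AZUREML_MODEL_DIR"),
                          "sklearn_regression_model.pkl")
print(model_path)
```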
Setting the logging level to DEBUG may cause additional information to be logged,
which may be useful in identifying the failure.
Python
def run(input_data):
    try:
        data = json.loads(input_data)['data']
        data = np.array(data)
        result = model.predict(data)
        return json.dumps({"result": result.tolist()})
    except Exception as e:
        result = str(e)
        # return error message back to the client
        return json.dumps({"error": result})
Note: Returning error messages from the run(input_data) call should be done for
debugging purpose only. For security reasons, you shouldn't return error messages this
way in a production environment.
HTTP status code 502
A 502 status code indicates that the service has thrown an exception or crashed in the
run() method of the score.py file. Use the information in this article to debug the file.
Decisions to scale up or down are based on utilization of the current container replicas.
The number of replicas that are busy (processing a request) divided by the total number
of current replicas is the current utilization. If this number exceeds
autoscale_target_utilization , more replicas are created. If it's lower, replicas
are reduced. Decisions to add replicas are eager and fast (around 1 second). Decisions
to remove replicas are conservative (around 1 minute). By default, the autoscaling target
utilization is set to 70%, which means that the service can handle spikes in requests per
second (RPS) of up to 30%.
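The scaling rule described above can be sketched as a simple function (a minimal illustration, not the autoscaler's actual implementation):

```python
def scale_out_needed(busy_replicas, total_replicas, target_utilization=0.70):
    """Return True when current utilization exceeds the autoscale target."""
    return busy_replicas / total_replicas > target_utilization

# 8 of 10 replicas busy -> 80% utilization, above the 70% default target.
print(scale_out_needed(8, 10))  # True
```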
There are two things that can help prevent 503 status codes:
Tip
Change the utilization level at which autoscaling creates new replicas. You can
adjust the utilization target by setting the autoscale_target_utilization to a lower
value.
) Important
This change does not cause replicas to be created faster. Instead, they are
created at a lower utilization threshold. Instead of waiting until the service is
70% utilized, changing the value to 30% causes replicas to be created when
30% utilization occurs.
If the web service is already using the current max replicas and you're still seeing
503 status codes, increase the autoscale_max_replicas value to increase the
maximum number of replicas.
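A minimal sketch of such a configuration, assuming azureml-core (SDK v1) is installed; the replica counts and the threshold are illustrative values:

```python
from azureml.core.webservice import AksWebservice

# Lower the utilization threshold at which new replicas are created,
# and raise the maximum replica count.
aks_config = AksWebservice.deploy_configuration(
    autoscale_enabled=True,
    autoscale_target_utilization=30,
    autoscale_min_replicas=1,
    autoscale_max_replicas=10)
```

Pass this configuration as deployment_config to Model.deploy, or apply the same settings to an existing service with service.update.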
7 Note
If you receive request spikes larger than the new minimum replicas can
handle, you may receive 503s again. For example, as traffic to your service
increases, you may need to increase the minimum replicas.
Error: Image building failure when deploying web service
Resolution: Add "pynacl==1.2.1" as a pip dependency to the Conda file for the image configuration
Advanced debugging
You may need to interactively debug the Python code contained in your model
deployment, for example, if the entry script is failing and the reason can't be
determined by extra logging. By using Visual Studio Code and debugpy, you can
attach to the code running inside the Docker container.
Next steps
Learn more about deployment:
How to deploy and where
How to run and debug experiments locally
Troubleshooting with a local model
deployment
Article • 03/02/2023
Prerequisites
An Azure subscription. Try the free or paid version of Azure Machine Learning .
Option A (Recommended) - Debug locally on Azure Machine Learning Compute
Instance
An Azure Machine Learning Workspace with compute instance running
Option B - Debug locally on your compute
The Azure Machine Learning SDK.
The Azure CLI.
The CLI extension for Azure Machine Learning.
Have a working Docker installation on your local system.
To verify your Docker installation, use the command docker run hello-world
from a terminal or command prompt. For information on installing Docker, or
troubleshooting Docker errors, see the Docker Documentation .
Option C - Enable local debugging with Azure Machine Learning inference HTTP
server.
The Azure Machine Learning inference HTTP server (preview) is a Python
package that allows you to easily validate your entry script ( score.py ) in a local
development environment. If there's a problem with the scoring script, the
server will return an error. It will also return the location where the error
occurred.
The server can also be used when creating validation gates in a continuous
integration and deployment pipeline. For example, start the server with the
candidate script and run the test suite against the local endpoint.
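A sketch of that workflow, assuming the azureml-inference-server-http preview package and a score.py in the current directory:

```shell
# Install the local inference server and start it with the candidate entry script.
pip install azureml-inference-server-http
azmlinfsrv --entry_script score.py
```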
Bash
curl -p 127.0.0.1:5001/score
7 Note
Learn more in the frequently asked questions about the Azure Machine Learning
inference HTTP server.
Debug locally
You can find a sample local deployment notebook in the
MachineLearningNotebooks repo to explore a runnable example.
2 Warning
Local web service deployments are not supported for production scenarios.
If you are defining your own conda specification YAML, list azureml-defaults version >=
1.0.45 as a pip dependency. This package is needed to host the model as a web service.
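A sketch of such a conda specification (package versions here are illustrative):

```yaml
name: project_environment
dependencies:
  - python=3.8
  - pip:
      - azureml-defaults>=1.0.45
      - scikit-learn
```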
At this point, you can work with the service as normal. The following code demonstrates
sending data to the service:
Python
import json
test_sample = json.dumps({'data': [
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
[10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
]})
prediction = service.run(input_data=test_sample)
print(prediction)
For more information on customizing your Python environment, see Create and manage
environments for training and deployment.
The following code reloads the service, and then sends data to it. The data is scored using the updated score.py file:
) Important
The reload method is only available for local deployments. For information on
updating a deployment to another compute target, see how to update your
webservice.
Python
service.reload()
print(service.run(input_data=test_sample))
7 Note
The script is reloaded from the location specified by the InferenceConfig object
used by the service.
Python
# If you already have the service object handy:
print(service.get_logs())

# If you only know the name of the service (note there might be multiple
# services with the same name but different version numbers):
print(ws.webservices['mysvc'].get_logs())
If you see the line Booting worker with pid: <pid> occurring multiple times in the logs,
it means there isn't enough memory to start the worker. You can address the error by
increasing the value of memory_gb in deployment_config .
Next steps
Learn more about deployment:
Azure Machine Learning allows you to deploy your trained machine learning models as
web services. In this article, learn how to configure authentication for these
deployments.
The model deployments created by Azure Machine Learning can be configured to use
one of two authentication methods: key-based or token-based.
7 Note
Key-based authentication
Web-services deployed on Azure Kubernetes Service (AKS) have key-based auth enabled
by default.
Azure Container Instances (ACI) deployed services have key-based auth disabled by
default, but you can enable it by setting auth_enabled=True when creating the ACI web-
service. The following code is an example of creating an ACI deployment configuration
with key-based auth enabled.
Python
from azureml.core.webservice import AciWebservice

aci_config = AciWebservice.deploy_configuration(cpu_cores=1,
                                                memory_gb=1,
                                                auth_enabled=True)
Then you can use the custom ACI configuration in deployment using the Model class.
Python
inference_config = InferenceConfig(entry_script="score.py", environment=myenv)

aci_service = Model.deploy(workspace=ws,
                           name="aci_service_sample",
                           models=[model],
                           inference_config=inference_config,
                           deployment_config=aci_config)
aci_service.wait_for_deployment(True)
To fetch the auth keys, use aci_service.get_keys() . To regenerate a key, use the
regen_key() function and pass either Primary or Secondary.
Python
aci_service.regen_key("Primary")
# or
aci_service.regen_key("Secondary")
Token-based authentication
When you enable token authentication for a web service, users must present an Azure
Machine Learning JSON Web Token (JWT) to the web service to access it. The token
expires after a specified time frame and must be refreshed to continue making calls.
To control token authentication, use the token_auth_enabled parameter when you create
or update a deployment:
Python
from azureml.core.webservice import AksWebservice
from azureml.core.model import Model, InferenceConfig

aks_service_name = 'aks-service-1'
# Key auth must be disabled when token auth is enabled on AKS.
aks_config = AksWebservice.deploy_configuration(token_auth_enabled=True,
                                                auth_enabled=False)
aks_service = Model.deploy(ws, aks_service_name, [model], inference_config,
                           aks_config, aks_target)
aks_service.wait_for_deployment(show_output = True)
If token authentication is enabled, you can use the get_token method to retrieve a JSON
Web Token (JWT) and that token's expiration time:
Tip
If you use a service principal to get the token, and want it to have the minimum
required access to retrieve a token, assign it to the reader role for the workspace.
Python
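A minimal sketch, assuming aks_service is the deployed AksWebservice from the snippet above:

```python
# get_token returns the JWT and the time after which it must be refreshed.
token, refresh_by = aks_service.get_token()
print(token)
```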
) Important
You'll need to request a new token after the token's refresh_by time. If you need to
refresh tokens outside of the Python SDK, one option is to use the REST API with
service-principal authentication to periodically make the service.get_token() call,
as discussed previously.
We strongly recommend that you create your Azure Machine Learning workspace
in the same region as your Azure Kubernetes Service cluster.
To authenticate with a token, the web service will make a call to the region in which
your Azure Machine Learning workspace is created. If your workspace region is
unavailable, you won't be able to fetch a token for your web service, even if your
cluster is in a different region from your workspace. The result is that Azure AD
Authentication is unavailable until your workspace region is available again.
Also, the greater the distance between your cluster's region and your workspace
region, the longer it will take to fetch a token.
Next steps
For more information on authenticating to a deployed model, see Create a client for a
model deployed as a web service.
Consume an Azure Machine Learning
model deployed as a web service
Article • 03/02/2023
Deploying an Azure Machine Learning model as a web service creates a REST API
endpoint. You can send data to this endpoint and receive the prediction returned by the
model. In this document, learn how to create clients for the web service by using C#, Go,
Java, and Python.
You create a web service when you deploy a model to your local environment, Azure
Container Instances, Azure Kubernetes Service, or field-programmable gate arrays
(FPGA). You retrieve the URI used to access the web service by using the Azure Machine
Learning SDK. If authentication is enabled, you can also use the SDK to get the
authentication keys or tokens.
The general workflow for creating a client that uses a machine learning web service is:
Tip
The examples in this document are manually created without the use of OpenAPI
(Swagger) specifications. If you've enabled an OpenAPI specification for your
deployment, you can use tools such as swagger-codegen to create client
libraries for your service.
) Important
Some of the Azure CLI commands in this article use the azure-cli-ml , or v1,
extension for Azure Machine Learning. Support for the v1 extension will end on
September 30, 2025. You will be able to install and use the v1 extension until that
date.
7 Note
Use the Azure Machine Learning SDK to get the web service information. This is a
Python SDK. You can use any language to create a client for the service.
The azureml.core.Webservice class provides the information you need to create a client.
The following Webservice properties are useful for creating a client application:
scoring_uri : The REST API endpoint of the web service.
swagger_uri : The URI of the OpenAPI (Swagger) specification. This URI is available if you
enabled automatic schema generation. For more information, see Deploy models
with Azure Machine Learning.
There are several ways to retrieve this information for deployed web services:
Python SDK
Python
You can use Webservice.list to retrieve a list of deployed web services for
models in your workspace. You can add filters to narrow the list of information
returned. For more information about what can be filtered on, see the
Webservice.list reference documentation.
Python
services = Webservice.list(ws)
print(services[0].scoring_uri)
print(services[0].swagger_uri)
If you know the name of the deployed service, you can create a new instance
of Webservice , and provide the workspace and service name as parameters.
The new object contains information about the deployed service.
Python
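A minimal sketch of that lookup, assuming azureml-core (SDK v1) and a workspace config file; 'myservice' is a placeholder name:

```python
from azureml.core import Workspace
from azureml.core.webservice import Webservice

ws = Workspace.from_config()
service = Webservice(workspace=ws, name='myservice')
print(service.scoring_uri)
print(service.swagger_uri)
```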
Tip
The IP address will be different for your deployment. Each AKS cluster will have its
own IP address that is shared by deployments to that cluster.
) Important
Web services deployed by Azure Machine Learning only support TLS version 1.2.
When creating a client application, make sure that it supports this version.
For more information, see Use TLS to secure a web service through Azure Machine
Learning.
Authentication for services
Azure Machine Learning provides two ways to control access to your web services.
When sending a request to a service that is secured with a key or token, use the
Authorization header to pass the key or token. The key or token must be formatted as
Bearer <key-or-token> , where <key-or-token> is your key or token value.
The primary difference between keys and tokens is that keys are static and can be
regenerated manually, and tokens need to be refreshed upon expiration. Key-based
auth is supported for Azure Container Instance and Azure Kubernetes Service deployed
web-services, and token-based auth is only available for Azure Kubernetes Service
deployments. For more information on configuring authentication, see Configure
authentication for models deployed as web services.
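The header formatting described above can be sketched as follows ('abc123' is a placeholder key or token):

```python
key_or_token = "abc123"  # placeholder; use service.get_keys() or get_token()
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer " + key_or_token,
}
```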
If authentication is enabled, you can use the get_keys method to retrieve a primary and
secondary authentication key:
Python
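A minimal sketch, assuming service is the deployed Webservice object:

```python
# get_keys returns the primary and secondary keys as a tuple.
primary_key, secondary_key = service.get_keys()
```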
) Important
If you need to regenerate a key, use service.regen_key.
If token authentication is enabled, you can use the get_token method to retrieve a
bearer token and that token's expiration time:
Python
If you have the Azure CLI and the machine learning extension, you can use the following
command to get a token:
Azure CLI
) Important
Currently the only way to retrieve the token is by using the Azure Machine Learning
SDK or the Azure CLI machine learning extension.
You'll need to request a new token after the token's refresh_by time.
Request data
The REST API expects the body of the request to be a JSON document with the
following structure:
JSON
{
"data":
[
<model-specific-data-structure>
]
}
) Important
The structure of the data needs to match what the scoring script and model in the
service expect. The scoring script might modify the data before passing it to the
model.
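As a sketch, the body for a model that takes rows of ten numeric features might be built like this (the feature values are placeholders):

```python
import json

body = json.dumps({"data": [
    [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    [10, 9, 8, 7, 6, 5, 4, 3, 2, 1],
]})
```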
Binary data
For information on how to enable support for binary data in your service, see Binary
data.
Tip
Enabling support for binary data happens in the score.py file used by the deployed
model. From the client, use the HTTP functionality of your programming language.
For example, the following snippet sends the contents of a JPG file to a web service:
Python
import requests

# Load image data
data = open('example.jpg', 'rb').read()

# Post raw data to scoring URI
res = requests.post(url='<scoring-uri>', data=data,
                    headers={'Content-Type': 'application/octet-stream'})
C#
using System;
using System.Collections.Generic;
using System.IO;
using System.Net.Http;
using System.Net.Http.Headers;
using Newtonsoft.Json;
namespace MLWebServiceClient
{
// The data structure expected by the service
internal class InputData
{
[JsonProperty("data")]
// The service used by this example expects an array containing
// one or more arrays of doubles
internal double[,] data;
}
class Program
{
static void Main(string[] args)
{
// Set the scoring URI and authentication key or token
string scoringUri = "<your web service URI>";
string authKey = "<your key or token>";
Console.WriteLine(response.Content.ReadAsStringAsync().Result);
}
catch (Exception e)
{
Console.Out.WriteLine(e.Message);
}
}
}
}
JSON
[217.67978776218715, 224.78937091757172]
Go
package main
import (
"bytes"
"encoding/json"
"fmt"
"io/ioutil"
"net/http"
)
// The web service input can accept multiple sets of values for scoring
type InputData struct {
Data []Features `json:"data,omitempty"`
}
func main() {
// Create the input data from example data
jsonData := InputData{
Data: exampleData,
}
// Create JSON from it and create the body for the HTTP request
jsonValue, _ := json.Marshal(jsonData)
body := bytes.NewBuffer(jsonValue)
JSON
[217.67978776218715, 224.78937091757172]
Java
import java.io.IOException;
import org.apache.http.client.fluent.*;
import org.apache.http.entity.ContentType;
import org.json.simple.JSONArray;
import org.json.simple.JSONObject;
JSON
[217.67978776218715, 224.78937091757172]
Python
import requests
import json
JSON
[217.67978776218715, 224.78937091757172]
JSON
{
"swagger": "2.0",
"info": {
"title": "myservice",
"description": "API specification for Azure Machine Learning
myservice",
"version": "1.0"
},
"schemes": [
"https"
],
"consumes": [
"application/json"
],
"produces": [
"application/json"
],
"securityDefinitions": {
"Bearer": {
"type": "apiKey",
"name": "Authorization",
"in": "header",
"description": "For example: Bearer abc123"
}
},
"paths": {
"/": {
"get": {
"operationId": "ServiceHealthCheck",
"description": "Simple health check endpoint to ensure the
service is up at any given point.",
"responses": {
"200": {
"description": "If service is up and running, this
response will be returned with the content 'Healthy'",
"schema": {
"type": "string"
},
"examples": {
"application/json": "Healthy"
}
},
"default": {
"description": "The service failed to execute due to
an error.",
"schema": {
"$ref": "#/definitions/ErrorResponse"
}
}
}
}
},
"/score": {
"post": {
"operationId": "RunMLService",
"description": "Run web service's model and get the
prediction output",
"security": [
{
"Bearer": []
}
],
"parameters": [
{
"name": "serviceInputPayload",
"in": "body",
"description": "The input payload for executing the
real-time machine learning service.",
"schema": {
"$ref": "#/definitions/ServiceInput"
}
}
],
"responses": {
"200": {
"description": "The service processed the input
correctly and provided a result prediction, if applicable.",
"schema": {
"$ref": "#/definitions/ServiceOutput"
}
},
"default": {
"description": "The service failed to execute due to
an error.",
"schema": {
"$ref": "#/definitions/ErrorResponse"
}
}
}
}
}
},
"definitions": {
"ServiceInput": {
"type": "object",
"properties": {
"data": {
"type": "array",
"items": {
"type": "array",
"items": {
"type": "integer",
"format": "int64"
}
}
}
},
"example": {
"data": [
[ 10, 9, 8, 7, 6, 5, 4, 3, 2, 1 ]
]
}
},
"ServiceOutput": {
"type": "array",
"items": {
"type": "number",
"format": "double"
},
"example": [
3726.995
]
},
"ErrorResponse": {
"type": "object",
"properties": {
"status_code": {
"type": "integer",
"format": "int32"
},
"message": {
"type": "string"
}
}
}
}
}
For a utility that can create client libraries from the specification, see swagger-codegen.
Tip
You can retrieve the schema JSON document after you deploy the service. Use the
swagger_uri property from the deployed web service (for example,
service.swagger_uri ) to get the URI to the local web service's Swagger file.
To generate a web service that's supported for consumption in Power BI, the schema
must support the format that's required by Power BI. Learn how to create a Power BI-
supported schema.
Once the web service is deployed, it's consumable from Power BI dataflows. Learn how
to consume an Azure Machine Learning web service from Power BI.
Next steps
To view a reference architecture for real-time scoring of Python and deep learning
models, go to the Azure architecture center.
Advanced entry script authoring
Article • 03/02/2023
This article shows how to write entry scripts for specialized use cases.
Prerequisites
This article assumes you already have a trained machine learning model that you intend
to deploy with Azure Machine Learning. To learn more about model deployment, see
How to deploy and where.
2 Warning
You must not use sensitive or private data for sample input or output. The Swagger
page for AML-hosted inferencing exposes the sample data.
The following data types are supported for the sample input and output:
pandas
numpy
pyspark
Standard Python object
Define the input and output sample formats in the input_sample and output_sample
variables, which represent the request and response formats for the web service. Use
these samples in the input and output function decorators on the run() function. The
following scikit-learn example uses schema generation.
Python
import os
import json
import pickle
import numpy as np
import pandas as pd
import azureml.train.automl
from sklearn.externals import joblib
from sklearn.linear_model import Ridge

def init():
    global model
    # Replace filename if needed.
    model_path = os.path.join(os.getenv('AZUREML_MODEL_DIR'),
                              'sklearn_regression_model.pkl')
    # Deserialize the model file back into a sklearn model.
    model = joblib.load(model_path)

@input_schema('Inputs', sample_input)
# 'Inputs' is case sensitive
@input_schema('GlobalParameters', sample_global_parameters)
# this is optional, 'GlobalParameters' is case sensitive
@output_schema(outputs)
def run(Inputs, GlobalParameters):
    data = Inputs['data']
    result = model.predict(data)
    return result.tolist()
Tip
The return value from the script can be any Python object that is serializable to
JSON. For example, if your model returns a Pandas dataframe that contains multiple
columns, you might use an output decorator similar to the following code:
Python
from azureml.contrib.services.aml_request import AMLRequest, rawhttp
from azureml.contrib.services.aml_response import AMLResponse
from PIL import Image

def init():
    print("This is init()")

@rawhttp
def run(request):
    print("This is run()")
    if request.method == 'GET':
        # For this example, just return the URL for GETs.
        respBody = str.encode(request.full_path)
        return AMLResponse(respBody, 200)
    elif request.method == 'POST':
        file_bytes = request.files["image"]
        image = Image.open(file_bytes).convert('RGB')
        # For a real-world solution, you would load the data from reqBody
        # and send it to the model. Then return the response.
        return AMLResponse("", 200)
7 Note
The status code is passed through azureml-fe and then sent to the client.
azureml-fe only rewrites a 500 returned from the model side to a 502, so
the client receives a 502.
But if azureml-fe itself returns a 500, the client still receives a 500.
The AMLRequest class only allows you to access the raw posted data in score.py;
there's no client-side component. From a client, you post data as normal. For example,
the following Python code reads an image file and posts the data:
Python
import requests

uri = service.scoring_uri
image_path = 'test.jpg'
files = {'image': open(image_path, 'rb').read()}
response = requests.post(uri, files=files)
print(response.json())
To configure your model deployment to support CORS, use the AMLResponse class in
your entry script. This class allows you to set the headers on the response object.
The following example sets the Access-Control-Allow-Origin header for the response
from the entry script:
Python
from azureml.contrib.services.aml_request import AMLRequest, rawhttp
from azureml.contrib.services.aml_response import AMLResponse

def init():
    print("This is init()")

@rawhttp
def run(request):
    print("This is run()")
    print("Request: [{0}]".format(request))
    if request.method == 'GET':
        # For this example, just return the URL for GET.
        # For a real-world solution, you would load the data from URL params
        # or headers and send it to the model. Then return the response.
        respBody = str.encode(request.full_path)
        resp = AMLResponse(respBody, 200)
        resp.headers["Allow"] = "OPTIONS, GET, POST"
        resp.headers["Access-Control-Allow-Methods"] = "OPTIONS, GET, POST"
        resp.headers['Access-Control-Allow-Origin'] = "http://www.example.com"
        resp.headers['Access-Control-Allow-Headers'] = "*"
        return resp
    elif request.method == 'POST':
        reqBody = request.get_data(False)
        # For a real-world solution, you would load the data from reqBody
        # and send it to the model. Then return the response.
        resp = AMLResponse(reqBody, 200)
        resp.headers["Allow"] = "OPTIONS, GET, POST"
        resp.headers["Access-Control-Allow-Methods"] = "OPTIONS, GET, POST"
        resp.headers['Access-Control-Allow-Origin'] = "http://www.example.com"
        resp.headers['Access-Control-Allow-Headers'] = "*"
        return resp
    elif request.method == 'OPTIONS':
        resp = AMLResponse("", 200)
        resp.headers["Allow"] = "OPTIONS, GET, POST"
        resp.headers["Access-Control-Allow-Methods"] = "OPTIONS, GET, POST"
        resp.headers['Access-Control-Allow-Origin'] = "http://www.example.com"
        resp.headers['Access-Control-Allow-Headers'] = "*"
        return resp
    else:
        return AMLResponse("bad request", 400)
) Important
If you need to test this in your local development environment, you can install the
components by using the following command:
shell
pip install azureml-contrib-services
2 Warning
Azure Machine Learning routes only POST and GET requests to the containers
running the scoring service. This can cause errors because browsers use OPTIONS
requests to pre-flight CORS requests.
AZUREML_MODEL_DIR : An environment variable containing the path to the model
location.
Model.get_model_path : An API that returns the path to the model file, using the
registered model name.
Multiple models: The path to the folder containing all models. Models are located by
name and version in this folder ( $MODEL_NAME/$VERSION )
To get the path to a model file in your entry script, combine the environment variable
with the file path you're looking for.
Python
my_first_model : Contains one file ( my_first_model.pkl ) and one version ( 1 ).
my_second_model : Contains one file ( my_second_model.pkl ) and there are two
versions, 1 and 2 .
When the service was deployed, both models are provided in the deploy operation:
Python
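A minimal sketch of that deploy call, assuming azureml-core (SDK v1) and that inference_config and deployment_config were created earlier:

```python
from azureml.core.model import Model

first_model = Model(ws, name='my_first_model', version=1)
second_model = Model(ws, name='my_second_model', version=2)
service = Model.deploy(ws, 'myservice',
                       [first_model, second_model],
                       inference_config, deployment_config)
```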
In the Docker image that hosts the service, the AZUREML_MODEL_DIR environment variable
contains the directory where the models are located. In this directory, each model is
located in a directory path of MODEL_NAME/VERSION , where MODEL_NAME is the name of the
registered model and VERSION is the version of the model. The files that make up the
registered model are stored in these directories.
Python
import os

# Example when the model is a file, and the deployment contains multiple models
first_model_name = 'my_first_model'
first_model_version = '1'
first_model_path = os.path.join(os.getenv('AZUREML_MODEL_DIR'),
                                first_model_name, first_model_version,
                                'my_first_model.pkl')

second_model_name = 'my_second_model'
second_model_version = '2'
second_model_path = os.path.join(os.getenv('AZUREML_MODEL_DIR'),
                                 second_model_name, second_model_version,
                                 'my_second_model.pkl')
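The MODEL_NAME/VERSION layout can also be wrapped in a small helper. The function below is an illustrative sketch, not part of the SDK; the default directory is only a stand-in for the value the deployment sets:

```python
import os

def resolve_model_path(model_name, version, filename, model_dir=None):
    # Build <AZUREML_MODEL_DIR>/<MODEL_NAME>/<VERSION>/<filename>, the layout
    # described above. model_dir overrides the environment variable for testing.
    model_dir = model_dir or os.getenv('AZUREML_MODEL_DIR', '.')
    return os.path.join(model_dir, model_name, str(version), filename)

print(resolve_model_path('my_first_model', 1, 'my_first_model.pkl',
                         model_dir='/var/azureml-app/azureml-models'))
```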
get_model_path
When you register a model, you provide a model name that's used for managing the
model in the registry. You use this name with the Model.get_model_path() method to
retrieve the path of the model file or files on the local file system. If you register a folder
or a collection of files, this API returns the path of the directory that contains those files.
When you register a model, you give it a name. The name corresponds to where the
model is placed, either locally or during service deployment.
Framework-specific examples
More entry script examples for specific machine learning use cases can be found below:
PyTorch
TensorFlow
Keras
AutoML
ONNX
Next steps
Troubleshoot a failed deployment
Deploy to Azure Kubernetes Service
Create client applications to consume web services
Update web service
How to deploy a model using a custom Docker image
Use TLS to secure a web service through Azure Machine Learning
Monitor your Azure Machine Learning models with Application Insights
Collect data for models in production
Create event alerts and triggers for model deployments
Python package extensibility for
prebuilt Docker images (preview)
Article • 03/02/2023
The prebuilt Docker images for model inference contain packages for popular machine
learning frameworks. There are two methods that can be used to add Python packages
without rebuilding the Docker image:
Dynamic installation: Consider this method for rapid prototyping. When the image
starts, packages are restored using the requirements.txt file. This method increases
the startup time of the image, and you must wait longer before the deployment can
handle requests.
Pre-installed Python packages: Use this approach for production deployments. Since
the directory containing the packages is mounted to the image, it can be used even
when your deployments don't have public internet access. For example, when deployed
into a secured Azure Virtual Network.
Important
Using Python package extensibility for prebuilt Docker images with Azure Machine
Learning is currently in preview. Preview functionality is provided "as-is", with no
guarantee of support or service level agreement. For more information, see the
Supplemental terms of use for Microsoft Azure previews .
Prerequisites
An Azure Machine Learning workspace. For a tutorial on creating a workspace, see
Get started with Azure Machine Learning.
Familiarity with using Azure Machine Learning environments.
Familiarity with Where and how to deploy models with Azure Machine Learning.
Dynamic installation
This approach uses a requirements file to automatically restore Python packages
when the image starts up.
To extend your prebuilt docker container image through a requirements.txt, follow these
steps:
Once deployed, the packages will automatically be restored for your score script.
Tip
Even while prototyping, we recommend that you pin each package version in
requirements.txt . For example, use scipy == 1.2.3 instead of just scipy or even
scipy > 1.2.3 . If you don't pin an exact version and scipy releases a new version,
this can break your scoring script and cause failures during deployment and
scaling.
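For instance, a pinned requirements.txt might look like the following; the package names and versions here are purely illustrative:

```text
scipy==1.2.3
numpy==1.21.6
scikit-learn==0.24.2
```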
Python
myenv = Environment(name="my_azureml_env")
myenv.docker.enabled = True
myenv.docker.base_image = <MCR-path>
myenv.python.user_managed_dependencies = True
myenv.environment_variables = {
"AZUREML_EXTRA_REQUIREMENTS_TXT": "requirements.txt"
}
To extend your prebuilt docker container image through pre-installed Python packages,
follow these steps:
Important
You must use packages compatible with Python 3.7. All current images are pinned
to Python 3.7.
Python
from azureml.core import Environment

myenv = Environment(name='my_azureml_env')
myenv.docker.enabled = True
myenv.docker.base_image = "<MCR-path>"
myenv.python.user_managed_dependencies = True
myenv.environment_variables = {
    "AZUREML_EXTRA_PYTHON_LIB_PATH": "myenv/lib/python3.7/site-packages"
}
Common problems
The mounting solution will only work when your myenv site packages directory contains
all of your dependencies. If your local environment is using dependencies installed in a
different location, they won't be available in the image.
Limitations
Model.package()
The Model.package() method lets you create a model package in the form of a
Docker image or Dockerfile build context. Using Model.package() with prebuilt
inference Docker images triggers an intermediate image build that changes the
non-root user to the root user.
Python
myenv.environment_variables = {
    "AZUREML_EXTRA_REQUIREMENTS_TXT": "name of your pip requirements file goes here"
}
Start prototyping with the requirements.txt approach. After some iteration, when
you're confident about which packages (and versions) you need for a successful
model deployment, switch to the Mounting Solution.
Here's a detailed comparison.
Debugging: With the requirements file (dynamic installation), the server is easy to
set up and debug, since dependencies are clearly listed. With the mounting solution,
an unclean virtual environment could cause problems when debugging the server. For
example, it may not be clear whether errors come from the environment or from user
code.
Locally, verify the environment variables are set properly. Next, verify the paths
that are specified are spelled properly and exist. Check if you have set your source
directory correctly in the inference config constructor.
Yes. If you want to use a different version of a Python package that's already
installed in an inference image, our extensibility solution respects your version.
Make sure there are no conflicts between the two versions.
Best Practices
Refer to the Load registered model docs. When you register a model directory,
don't include your scoring script, your mounted dependencies directory, or
requirements.txt within that directory.
For more information on how to load a registered or local model, see Where and
how to deploy.
Bug Fixes
2021-07-26
AZUREML_EXTRA_REQUIREMENTS_TXT and AZUREML_EXTRA_PYTHON_LIB_PATH are now
always relative to the directory of the score script. For example, if both the
requirements.txt and the score script are in my_folder, then
AZUREML_EXTRA_REQUIREMENTS_TXT must be set to requirements.txt , no longer to
my_folder/requirements.txt .
Next steps
To learn more about deploying a model, see How to deploy a model.
In some cases, the prebuilt Docker images for model inference and extensibility
solutions for Azure Machine Learning may not meet your inference service needs.
In this case, you can use a Dockerfile to create a new image, using one of the prebuilt
images as the starting point. By extending from an existing prebuilt Docker image, you
can use the Azure Machine Learning network stack and libraries without creating an
image from scratch.
Using a Dockerfile allows for full customization of the image before deployment. It
allows you to have maximum control over what dependencies or environment variables,
among other things, are set in the container.
The main tradeoff for this approach is that an extra image build will take place during
deployment, which slows down the deployment process. If you can use the Python
package extensibility method, deployment will be faster.
Prerequisites
An Azure Machine Learning workspace. For a tutorial on creating a workspace, see
Create resources to get started.
Warning
The Azure Container Registry for your workspace is created the first time you
train or deploy a model using the workspace. If you've created a new
workspace, but not trained or created a model, no Azure Container Registry
will exist for the workspace.
Dockerfile
FROM mcr.microsoft.com/azureml/<image_name>:<tag>
Then put the above Dockerfile into the directory with all the necessary files and run the
following command to build the image:
Bash
Tip
More details about docker build can be found in the Docker
documentation .
If the docker build command isn't available locally, use the Azure Container Registry
(ACR) for your Azure Machine Learning workspace to build the Docker image in the
cloud. For more information, see Tutorial: Build and deploy container images with Azure
Container Registry.
Important
Microsoft recommends that you first validate that your Dockerfile works locally
before trying to create a custom base image via Azure Container Registry.
You can also install additional pip packages from a Dockerfile. The following example
demonstrates using pip install :
Dockerfile
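A minimal sketch of such a Dockerfile; the image path placeholders come from the base-image example above, and the package name and version are illustrative:

```dockerfile
FROM mcr.microsoft.com/azureml/<image_name>:<tag>

# Install an extra pinned package on top of the prebuilt inference image.
RUN pip install scipy==1.2.3
```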
AZUREML_ENTRY_SCRIPT : The entry script of your code. This file contains the init()
and run() methods.
Dockerfile
# Code
COPY <local_code_directory> /var/azureml-app
ENV AZUREML_ENTRY_SCRIPT=<entryscript_file_name>
# Model
COPY <model_directory> /var/azureml-app/azureml-models
ENV AZUREML_MODEL_DIR=/var/azureml-app/azureml-models
Example Dockerfile
The following example demonstrates installing apt packages, setting environment
variables, and including code and models as part of the Dockerfile:
Dockerfile
FROM mcr.microsoft.com/azureml/pytorch-1.6-ubuntu18.04-py37-cpu-inference:latest

USER root:root

# Code
COPY code /var/azureml-app
ENV AZUREML_ENTRY_SCRIPT=score.py

# Model
COPY model /var/azureml-app/azureml-models
ENV AZUREML_MODEL_DIR=/var/azureml-app/azureml-models
Next steps
To use a Dockerfile with the Azure Machine Learning Python SDK, see the following
documents:
Learn how to troubleshoot problems you may see when using prebuilt docker images
for inference with Azure Machine Learning.
Important
Using Python package extensibility for prebuilt Docker images with Azure
Machine Learning is currently in preview. Preview functionality is provided "as-is",
with no guarantee of support or service level agreement. For more information, see
the Supplemental terms of use for Microsoft Azure previews .
You'll need to run the container locally using one of the commands shown below,
replacing <MCR-path> with an image path. For a list of the images and paths, see
Prebuilt Docker images for inference.
Bash
Bash
docker run -it -v $(pwd):/var/azureml-app -e AZUREML_EXTRA_REQUIREMENTS_TXT="requirements.txt" <mcr-path>
Bash
/var/runit
/var/log
/var/lib/nginx
/run
/opt/miniconda
/var/azureml-app
If the ENTRYPOINT has been changed in the newly built image, then the HTTP server
and related components need to be loaded by runsvdir /var/runit .
Next steps
Add Python packages to prebuilt images.
Use a prebuilt package as a base for a new Dockerfile.
Collect data from models in production
Article • 06/01/2023
This article shows how to collect data from an Azure Machine Learning model deployed
on an Azure Kubernetes Service (AKS) cluster. The collected data is then stored in Azure
Blob storage.
Limitations
The model data collection feature can only work with the Ubuntu 18.04 image.
Important
As of 03/10/2023, the Ubuntu 18.04 image is now deprecated. Support for Ubuntu
18.04 images will be dropped starting January 2023 when it reaches EOL on April
30, 2023.
The MDC feature is incompatible with any image other than Ubuntu 18.04, which is no
longer available after the Ubuntu 18.04 image is deprecated.
openmpi3.1.2-ubuntu18.04 release-notes
data science virtual machine release notes
Note
The data collection feature is currently in preview; preview features are not
recommended for production workloads.
What is collected and where it goes
The following data can be collected:
Model input data from web services deployed in an AKS cluster. Voice audio,
images, and video are not collected.
Note
Preaggregation and precalculations on this data are not currently part of the
collection service.
The output is saved in Blob storage. Because the data is added to Blob storage, you can
choose your favorite tool to run the analysis.
The path to the output data in the blob follows this syntax:
/modeldata/<subscriptionid>/<resourcegroup>/<workspace>/<webservice>/<model>
/<version>/<designation>/<year>/<month>/<day>/data.csv
# example: /modeldata/1a2b3c4d-5e6f-7g8h-9i10-
j11k12l13m14/myresourcegrp/myWorkspace/aks-w-
collv9/best_model/10/inputs/2018/12/31/data.csv
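The same layout can be produced programmatically. The helper below is an illustrative sketch for building the blob path, not part of the SDK; the argument values mirror the example path:

```python
from datetime import date

def model_data_path(subscription_id, resource_group, workspace, webservice,
                    model, version, designation, day):
    # Mirror the /modeldata/... folder layout that the collector writes to.
    return ("/modeldata/{}/{}/{}/{}/{}/{}/{}/{:%Y/%m/%d}/data.csv"
            .format(subscription_id, resource_group, workspace, webservice,
                    model, version, designation, day))

print(model_data_path("1a2b3c4d-5e6f-7g8h-9i10-j11k12l13m14", "myresourcegrp",
                      "myWorkspace", "aks-w-collv9", "best_model", 10,
                      "inputs", date(2018, 12, 31)))
```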
Note
In versions of the Azure Machine Learning SDK for Python earlier than version
0.1.0a16, the designation argument is named identifier . If you developed your
code with an earlier version, you need to update it accordingly.
Prerequisites
If you don't have an Azure subscription, create a free account before you begin.
You need an AKS cluster. For information on how to create one and deploy to it,
see Deploy machine learning models to Azure.
Set up your environment and install the Azure Machine Learning Monitoring SDK.
Use a docker image based on Ubuntu 18.04, which is shipped with libssl 1.0.0 ,
the essential dependency of modeldatacollector. You can refer to prebuilt images.
Python
Python
The Identifier parameter is later used for building the folder structure in your blob.
You can use it to differentiate raw data from processed data.
data = np.array(data)
result = model.predict(data)
inputs_dc.collect(data)  # this call is saving our input data into Azure Blob
prediction_dc.collect(result)  # this call is saving our prediction data into Azure Blob
5. Data collection is not automatically set to true when you deploy a service in AKS.
Update your configuration file, as in the following example:
Python
aks_config = AksWebservice.deploy_configuration(collect_model_data=True)
You can also enable Application Insights for service monitoring by changing this
configuration:
Python
aks_config = AksWebservice.deploy_configuration(collect_model_data=True,
                                                enable_app_insights=True)
6. To create a new image and deploy the machine learning model, see Deploy
machine learning models to Azure.
Python
env = Environment('webserviceenv')
env.python.conda_dependencies = CondaDependencies.create(
    conda_packages=['numpy'],
    pip_packages=['azureml-defaults', 'azureml-monitoring',
                  'inference-schema[numpy-support]'])
Python
## replace <service_name> with the name of the web service
<service_name>.update(collect_model_data=False)
3. Select Storage.
4. Follow the path to the blob's output data with this syntax:
/modeldata/<subscriptionid>/<resourcegroup>/<workspace>/<webservice>/<model>/<version>/<designation>/<year>/<month>/<day>/data.csv

# example: /modeldata/1a2b3c4d-5e6f-7g8h-9i10-j11k12l13m14/myresourcegrp/myWorkspace/aks-w-collv9/best_model/10/inputs/2018/12/31/data.csv
3. Add your storage account name and enter your storage key. You can find this
information by selecting Settings > Access keys in your blob.
5. In the query editor, click under the Name column and add your storage account.
6. Enter your model path into the filter. If you want to look only into files from a
specific year or month, just expand the filter path. For example, to look only into
March data, use this filter path:
/modeldata/<subscriptionid>/<resourcegroupname>/<workspacename>/<webservicename>/<modelname>/<modelversion>/<designation>/<year>/3
7. Filter the data that is relevant to you based on Name values. If you stored
predictions and inputs, you need to create a query for each.
8. Select the downward double arrows next to the Content column heading to
combine the files.
11. If you added inputs and predictions, your tables are automatically ordered by
RequestId values.
4. Select Create New Table and select Other Data Sources > Azure Blob Storage >
Create Table in Notebook.
5. Update the location of your data. Here is an example:
file_location = "wasbs://mycontainer@storageaccountname.blob.core.windows.net/modeldata/1a2b3c4d-5e6f-7g8h-9i10-j11k12l13m14/myresourcegrp/myWorkspace/aks-w-collv9/best_model/10/inputs/2018/*/*/data.csv"
file_type = "csv"
6. Follow the steps on the template to view and analyze your data.
Next steps
Detect data drift on the data you have collected.
Monitor and collect data from ML web
service endpoints
Article • 06/01/2023
In this article, you learn how to collect data from models deployed to web service
endpoints in Azure Kubernetes Service (AKS) or Azure Container Instances (ACI). Use
Azure Application Insights to collect the following data from an endpoint:
Output data
Responses
Request rates, response times, and failure rates
Dependency rates, response times, and failure rates
Exceptions
Learn how to run notebooks by following the article Use Jupyter notebooks to explore
this service.
Important
The information in this article relies on the Azure Application Insights instance that
was created with your workspace. If you deleted this Application Insights instance,
there is no way to re-create it other than deleting and recreating the workspace.
Tip
If you are using online endpoints, use the information in the Monitor online
endpoints article instead.
Prerequisites
An Azure subscription - try the free or paid version of Azure Machine Learning .
An Azure Machine Learning workspace, a local directory that contains your scripts,
and the Azure Machine Learning SDK for Python installed. To learn more, see How
to configure a development environment.
A trained machine learning model. To learn more, see the Train image classification
model tutorial.
1. Identify the service in your workspace. The value for ws is the name of your
workspace.
Python
Python
aks_service.update(enable_app_insights=True)
Important
Azure Application Insights only logs payloads of up to 64 KB. If this limit is
reached, you may see errors such as out of memory, or no information may be logged.
If the data you want to log is larger than 64 KB, you should instead store it in
Blob storage using the information in Collect data for models in production.
For more complex situations, like model tracking within an AKS deployment, we
recommend using a third-party library like OpenCensus .
To log custom traces, follow the standard deployment process for AKS or ACI in the How
to deploy and where document. Then, use the following steps:
1. Update the scoring file by adding print statements to send data to Application
Insights during inference. For more complex information, such as the request data
and the response, use a JSON structure.
The following example score.py file logs when the model was initialized, input and
output during inference, and the time any errors occur.
Python
import pickle
import json
import numpy
from sklearn.externals import joblib
from sklearn.linear_model import Ridge
from azureml.core.model import Model
import time


def init():
    global model
    # Print statement for appinsights custom traces:
    print("model initialized" + time.strftime("%H:%M:%S"))
2. Update the service configuration, and make sure to enable Application Insights.
Python
config = Webservice.deploy_configuration(enable_app_insights=True)
3. Build an image and deploy it on AKS or ACI. For more information, see How to
deploy and where.
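The print-based pattern from step 1 can be extended to log structured inference data as JSON. The sketch below is illustrative; the field names are not required by Application Insights:

```python
import json
import time

def log_inference(input_data, prediction):
    # Each print becomes one STDOUT trace entry in Application Insights;
    # printing a JSON string keeps request and response data queryable together.
    print(json.dumps({
        "timestamp": time.strftime("%H:%M:%S"),
        "input": input_data,
        "prediction": prediction,
    }))

log_inference([1.0, 2.0], 0.85)
```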
Python
3. Select +Deploy.
Python
ws = Workspace.from_config()
If you have multiple tenants, you may need to add the following authentication code
before ws = Workspace.from_config() :
Python
from azureml.core.authentication import InteractiveLoginAuthentication

interactive_auth = InteractiveLoginAuthentication(
    tenant_id="the tenant_id in which your workspace resides")
2. Select Endpoints.
5. In Application Insights, from the Overview tab or the Monitoring section, select
Logs.
6. To view information logged from the score.py file, look at the traces table. The
following query searches for logs where the input value was logged:
Kusto
traces
| where customDimensions contains "input"
| limit 10
For more information on how to use Azure Application Insights, see What is Application
Insights?.
Important
Azure Application Insights only logs payloads of up to 64 KB. If this limit is
reached, you may see errors such as out of memory, or no information may be logged.
To log web service request information, add print statements to your score.py file. Each
print statement results in one entry in the Application Insights trace table under the
message STDOUT . Application Insights stores the print statement outputs in
customDimensions and in the Contents trace table. Printing JSON strings produces a
hierarchical data structure in the trace output.
Important
Azure Application Insights only supports exports to blob storage. For more
information on the limits of this implementation, see Export telemetry from App
Insights.
Use Application Insights' continuous export to export data to a blob storage account
where you can define retention settings. Application Insights exports the data in JSON
format.
Next steps
In this article, you learned how to enable logging and view logs for web service
endpoints. Try these articles for next steps:
MLOps: Manage, deploy, and monitor models with Azure Machine Learning to
learn more about leveraging data collected from models in production. Such data
can help to continually improve your machine learning process.
Use the studio to deploy models trained
in the designer
Article • 06/01/2023
In this article, you learn how to deploy a designer model as an online (real-time)
endpoint in Azure Machine Learning studio.
Once registered or downloaded, you can use designer trained models just like any other
model. Exported models can be deployed in use cases such as internet of things (IoT)
and local deployments.
You can also deploy models directly in the designer to skip model registration and file
download steps. This can be useful for rapid deployment. For more information, see
Deploy a model with the designer.
Models trained in the designer can also be deployed through the SDK or command-line
interface (CLI). For more information, see Deploy your existing model with Azure
Machine Learning.
Prerequisites
An Azure Machine Learning workspace
After registering your model, you can find it in the Models asset page in the studio.
Conda dependencies file - specifies which pip and conda packages your
webservice depends on. The designer automatically creates a conda_env.yaml file
when the Train Model component completes.
You can download these two files in the right pane of the Train Model component:
Alternatively, you can download the files from the Models asset page after registering
your model:
The score.py file provides nearly the same functionality as the Score Model
components. However, some components like Score SVD Recommender, Score
Wide and Deep Recommender, and Score Vowpal Wabbit Model have parameters
for different scoring modes. You can also change those parameters in the entry
script.
For more information on setting parameters in the score.py file, see the section,
Configure the entry script.
Tip
In Advanced setting, you can set CPU/memory capacity and other parameters for
deployment. These settings are important for certain models such as PyTorch
models, which consume a considerable amount of memory (about 4 GB).
Note
The designer also generates a sample data JSON file for testing. You can download
_samples.json in the trained_model_outputs folder.
Python
import json
from pathlib import Path
from azureml.core.workspace import Workspace
from azureml.core.webservice import Webservice
service_name = 'YOUR_SERVICE_NAME'
ws = Workspace.get(
name='WORKSPACE_NAME',
subscription_id='SUBSCRIPTION_ID',
resource_group='RESOURCEGROUP_NAME'
)
service = Webservice(ws, service_name)
sample_file_path = '_samples.json'
Python
import base64
import json
from copy import deepcopy
from pathlib import Path
from azureml.studio.core.io.image_directory import (IMG_EXTS, image_from_file,
                                                    image_to_bytes)
from azureml.studio.core.io.transformation_directory import ImageTransformationDirectory
# image path
image_path = Path('YOUR_IMAGE_FILE_PATH')
In this section, you learn how to update these parameters in the entry script file too.
The following example updates the default behavior for a trained Wide & Deep
recommender model. By default, the score.py file tells the web service to predict
ratings between users and items.
You can modify the entry script file to make item recommendations, and return
recommended items by changing the recommender_prediction_kind parameter.
Python
import os
import json
from pathlib import Path
from collections import defaultdict
from azureml.studio.core.io.model_directory import ModelDirectory
from azureml.designer.modules.recommendation.dnn.wide_and_deep.score. \
    score_wide_and_deep_recommender import ScoreWideAndDeepRecommenderModule
from azureml.designer.serving.dagengine.utils import decode_nan
from azureml.designer.serving.dagengine.converter import create_dfd_from_dict

model_path = os.path.join(os.getenv('AZUREML_MODEL_DIR'), 'trained_model_outputs')
schema_file_path = Path(model_path) / '_schema.json'
with open(schema_file_path) as fp:
    schema_data = json.load(fp)


def init():
    global model
    model = ModelDirectory.load(load_from_dir=model_path)


def run(data):
    data = json.loads(data)
    input_entry = defaultdict(list)
    for row in data:
        for key, val in row.items():
            input_entry[key].append(decode_nan(val))
    # Build a DataFrameDirectory from the request payload.
    data_frame_directory = create_dfd_from_dict(input_entry, schema_data)
    # The parameter names can be inferred from Score Wide and Deep Recommender
    # component parameters: convert the letters to lower case and replace
    # whitespaces with underscores.
    score_params = dict(
        trained_wide_and_deep_recommendation_model=model,
        dataset_to_score=data_frame_directory,
        training_data=None,
        user_features=None,
        item_features=None,
        ################### Note #################
        # Set 'Recommender prediction kind' parameter to enable the item
        # recommendation model
        recommender_prediction_kind='Item Recommendation',
        recommended_item_selection='From All Items',
        maximum_number_of_items_to_recommend_to_a_user=5,
        whether_to_return_the_predicted_ratings_of_the_items_along_with_the_labels='True')
    result_dfd, = ScoreWideAndDeepRecommenderModule().run(**score_params)
    result_df = result_dfd.data
    return json.dumps(result_df.to_dict("list"))
For Wide & Deep recommender and Vowpal Wabbit models, you can configure the
scoring mode parameter using the following methods:
For the SVD recommender trained model, the parameter names and values may be less
obvious, and you can look up the tables below to decide how to set parameters.
The following code shows you how to set parameters for an SVD recommender, which
uses all six parameters to recommend rated items with predicted ratings attached.
Python
score_params = dict(
    learner=model,
    test_data=DataTable.from_dfd(data_frame_directory),
    training_data=None,
    # RecommenderPredictionKind has 2 members, 'RatingPrediction' and
    # 'ItemRecommendation'. You can specify the prediction_kind parameter
    # with one of them.
    prediction_kind=RecommenderPredictionKind.ItemRecommendation,
    # RecommendedItemSelection has 3 members, 'FromAllItems', 'FromRatedItems',
    # 'FromUnratedItems'. You can specify the recommended_item_selection
    # parameter with one of them.
    recommended_item_selection=RecommendedItemSelection.FromRatedItems,
    min_recommendation_pool_size=1,
    max_recommended_item_count=3,
    return_ratings=True,
)
Next steps
Train a model in the designer
Deploy models with Azure Machine Learning SDK
Troubleshoot a failed deployment
Deploy to Azure Kubernetes Service
Create client applications to consume web services
Update web service
How to package a registered model with
Docker
Article • 03/02/2023
This article shows how to package a registered Azure Machine Learning model with
Docker.
Prerequisites
This article assumes you have already trained and registered a model in your machine
learning workspace. To learn how to train and register a scikit-learn model, follow this
tutorial.
Package models
In some cases, you might want to create a Docker image without deploying the model.
Or you might want to download the image and run it on a local Docker installation. You
might even want to download the files used to build the image, inspect them, modify
them, and build the image manually.
Model packaging enables you to do these things. It packages all the assets needed to
host a model as a web service and allows you to download either a fully built Docker
image or the files needed to build one. There are two ways to use model packaging:
Download a packaged model: Download a Docker image that contains the model and
other files needed to host it as a web service.
Generate a Dockerfile: Download the Dockerfile, model, entry script, and other assets
needed to build a Docker image. You can then inspect the files or make changes before
you build the image locally.
Tip
Creating a package is similar to deploying a model. You use a registered model and
an inference configuration.
Important
To download a fully built image or build an image locally, you need to have
Docker installed in your development environment.
Python
After you create a package, you can use package.pull() to pull the image to your local
Docker environment. The output of this command will display the name of the image.
For example:
myworkspacef78fd10.azurecr.io/package:20190822181338 .
After you download the model, use the docker images command to list the local images:
text
To start a local container based on this image, use the following command to start a
named container from the shell or command line. Replace the <imageid> value with the
image ID returned by the docker images command.
Bash
This command starts the latest version of the image named myimage . It maps local port
6789 to the port in the container on which the web service is listening (5001). It also
assigns the name mycontainer to the container, which makes the container easier to
stop. After the container is started, you can submit requests to
http://localhost:6789/score .
Generate a Dockerfile and dependencies
The following example shows how to download the Dockerfile, model, and other assets
needed to build an image locally. The generate_dockerfile=True parameter indicates
that you want the files, not a fully built image.
Python
This code downloads the files needed to build the image to the imagefiles directory.
The Dockerfile included in the saved files references a base image stored in an Azure
container registry. When you build the image on your local Docker installation, you need
to use the address, user name, and password to authenticate to the registry. Use the
following steps to build the image by using a local Docker installation:
Bash
2. To build the image, use the following command. Replace <imagefiles> with the
path of the directory where package.save() saved the files.
Bash
To verify that the image is built, use the docker images command. You should see the
myimage image in the list:
text
To start a new container based on this image, use the following command:
Bash
This command starts the latest version of the image named myimage . It maps local port
6789 to the port in the container on which the web service is listening (5001). It also
assigns the name mycontainer to the container, which makes the container easier to
stop. After the container is started, you can submit requests to
http://localhost:6789/score .
Python
import requests
import json
For example clients in other programming languages, see Consume models deployed as
web services.
Bash
Next steps
Troubleshoot a failed deployment
Deploy to Azure Kubernetes Service
Create client applications to consume web services
Update web service
How to deploy a model using a custom Docker image
Use TLS to secure a web service through Azure Machine Learning
Monitor your Azure Machine Learning models with Application Insights
Collect data for models in production
Create event alerts and triggers for model deployments
MLOps: Model management,
deployment, lineage, and monitoring
with Azure Machine Learning v1
Article • 05/31/2023
In this article, learn how to apply Machine Learning Operations (MLOps) practices in
Azure Machine Learning for the purpose of managing the lifecycle of your models.
Applying MLOps practices can improve the quality and consistency of your machine
learning solutions.
) Important
Items in this article marked as "preview" are currently in public preview. The
preview version is provided without a service level agreement, and it's not
recommended for production workloads. Certain features might not be supported
or might have constrained capabilities. For more information, see Supplemental
Terms of Use for Microsoft Azure Previews .
What is MLOps?
Machine Learning Operations (MLOps) is based on DevOps principles and practices
that increase the efficiency of workflows, such as continuous integration,
continuous delivery, and continuous deployment. MLOps applies these principles to
the machine learning process, with the goal of:
If you use the Designer to create your ML pipelines, you can at any time select the "..." at
the top-right of the Designer page and then select Clone. Cloning your pipeline lets
you iterate on your pipeline design without losing your old versions.
Environments describe the pip and Conda dependencies for your projects, and can be
used for both training and deployment of models. For more information, see What are
Azure Machine Learning environments.
Register, package, and deploy models from
anywhere
Tip
A registered model is a logical container for one or more files that make up your
model. For example, if you have a model that is stored in multiple files, you can
register them as a single model in your Azure Machine Learning workspace. After
registration, you can then download or deploy the registered model and receive all
the files that were registered.
Registered models are identified by name and version. Each time you register a model
with the same name as an existing one, the registry increments the version. Additional
metadata tags can be provided during registration. These tags are then used when
searching for a model. Azure Machine Learning supports any model that can be loaded
using Python 3.5.2 or higher.
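The name, version, and tag semantics described above can be illustrated with a small self-contained sketch. This is an illustration of the registry behavior only, not the Azure Machine Learning SDK:

```python
class ToyModelRegistry:
    """Illustrates registered-model semantics: name, auto-incremented version, tag search."""

    def __init__(self):
        self._models = {}  # name -> list of registered versions

    def register(self, name, files, tags=None):
        # Re-registering an existing name increments the version
        versions = self._models.setdefault(name, [])
        entry = {"name": name, "version": len(versions) + 1,
                 "files": list(files), "tags": dict(tags or {})}
        versions.append(entry)
        return entry

    def find_by_tag(self, key, value):
        # Mirrors searching for models by tag (TagName=TagValue)
        return [e for versions in self._models.values() for e in versions
                if e["tags"].get(key) == value]

registry = ToyModelRegistry()
registry.register("mymodel", ["model.pkl"], tags={"area": "diabetes"})
v2 = registry.register("mymodel", ["model.pkl", "scaler.pkl"], tags={"area": "diabetes"})
print(v2["version"])  # 2: registering the same name again increments the version
```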
Tip
You can also register models trained outside Azure Machine Learning.
You can't delete a registered model that is being used in an active deployment. For
more information, see the register model section of Deploy models.
) Important
When using the Filter by Tags option on the Models page of Azure Machine Learning
studio, use TagName=TagValue (without a space) instead of TagName : TagValue.
If you run into problems with the deployment, you can deploy on your local
development environment for troubleshooting and debugging.
For more information on ONNX with Azure Machine Learning, see the Create and
accelerate ML models article.
Use models
Trained machine learning models are deployed as web services in the cloud or locally.
Deployments use CPU, GPU, or field-programmable gate arrays (FPGA) for inferencing.
You can also use models from Power BI.
When using a model as a web service, you provide the following items:
The model(s) that are used to score data submitted to the service/device.
An entry script. This script accepts requests, uses the model(s) to score the data,
and returns a response.
An Azure Machine Learning environment that describes the pip and Conda
dependencies required by the model(s) and entry script.
Any additional assets such as text, data, etc. that are required by the model(s) and
entry script.
You also provide the configuration of the target deployment platform. For example, the
VM family type, available memory, and number of cores when deploying to Azure
Kubernetes Service.
When the image is created, components required by Azure Machine Learning are also
added. For example, assets needed to run the web service.
Batch scoring
Batch scoring is supported through ML pipelines. For more information, see Batch
predictions on big data.
You can use your models in web services with the following compute targets:
To deploy the model as a web service, you must provide the following items:
Analytics
Microsoft Power BI supports using machine learning models for data analytics. For more
information, see Azure Machine Learning integration in Power BI (preview).
Tip
While some information on models and datasets is automatically captured, you can
add additional information by using tags. When looking for registered models and
datasets in your workspace, you can use tags as a filter.
This information helps you understand how your model is being used. The collected
input data may also be useful in training future versions of the model.
There's no universal answer to "How do I know if I should retrain?", but the Azure
Machine Learning event and monitoring tools previously discussed are good starting
points for automation. Once you've decided to retrain, you should:
A theme of the above steps is that your retraining should be automated, not ad hoc.
Azure Machine Learning pipelines are a good answer for creating workflows relating to
data preparation, training, validation, and deployment. Read Retrain models with Azure
Machine Learning designer to see how pipelines and the Azure Machine Learning
designer fit into a retraining scenario.
The Azure Machine Learning extension makes it easier to work with Azure Pipelines. It
provides the following enhancements to Azure Pipelines:
For more information on using Azure Pipelines with Azure Machine Learning, see the
following links:
You can also use Azure Data Factory to create a data ingestion pipeline that prepares
data for use with training. For more information, see Data ingestion pipeline.
Next steps
Learn more by reading and exploring the following resources:
Standardize the Machine learning operation (MLOps) practice and support scalable
team collaboration
Training efficiency and cost reduction
For example, a typical machine learning project includes the steps of data collection,
data preparation, model training, model evaluation, and model deployment. Usually,
data engineers concentrate on the data steps, data scientists spend most of their time
on model training and evaluation, and machine learning engineers focus on model
deployment and automation of the entire workflow. By leveraging a machine learning
pipeline, each team only needs to work on building its own steps. The best way of
building steps is to use an Azure Machine Learning component, a self-contained piece
of code that does one step in a machine learning pipeline. All these steps built by
different users are finally integrated into one workflow through the pipeline definition.

The pipeline is a collaboration tool for everyone in the project. The process of defining
a pipeline and all its steps can be standardized by each company's preferred DevOps
practice. The pipeline can be further versioned and automated. If the ML projects are
described as pipelines, then the best MLOps practice is already applied.
The first approach usually applies to a team that hasn't used pipelines before and
wants to take advantage of pipeline benefits like MLOps. In this situation, data
scientists have typically developed some machine learning models in their local
environment using their favorite tools. Machine learning engineers then need to take
the data scientists' output into production. The work involves cleaning up unnecessary
code from the original notebook or Python code, changing the training input from local
data to parameterized values, splitting the training code into multiple steps as needed,
performing unit tests of each step, and finally wrapping all steps into a pipeline.
Once teams get familiar with pipelines and want to do more machine learning
projects using pipelines, they'll find the first approach is hard to scale. The second
approach is to set up a few pipeline templates, each solving one specific machine
learning problem. A template predefines the pipeline structure, including the number of
steps, each step's inputs and outputs, and their connectivity. To start a new machine
learning project, the team first forks one template repo. The team leader then assigns
each member the step they need to work on. The data scientists and data engineers do
their regular work. When they're happy with their results, they structure their code to fit
into the predefined steps. Once the structured code is checked in, the pipeline can be
executed or automated. If there's any change, each member only needs to work on their
piece of code without touching the rest of the pipeline code.
Once a team has built a collection of machine learning pipelines and reusable
components, it can start to build new machine learning pipelines by cloning
previous pipelines or tying existing reusable components together. At this stage, the
team's overall productivity improves significantly.
Azure Machine Learning offers different methods to build a pipeline. For users who are
familiar with DevOps practices, we recommend using the CLI. For data scientists who are
familiar with Python, we recommend writing pipelines using the Azure Machine Learning
SDK v2. For users who prefer a UI, the designer can be used to build pipelines from
registered components.
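As a sketch of the CLI method, a pipeline defined in YAML can be submitted with the v2 ml extension. The file and resource names here are placeholders:

```shell
# Submit a pipeline job defined in pipeline.yml (v2 CLI, ml extension)
az ml job create --file pipeline.yml \
    --resource-group myresourcegroup \
    --workspace-name myworkspace
```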
Code & app orchestration (CI/CD) | App Developer / Ops | Azure Pipelines | Jenkins | Code + Model -> App/Service | Most open and flexible activity support, approval queues, phases with gating
Next steps
Azure Machine Learning pipelines are a powerful facility that begins delivering value in
the early development stages.
In this article, you learn how to set up event-driven applications, processes, or CI/CD workflows based on Azure Machine Learning events,
such as failure notification emails or ML pipeline runs, when certain conditions are detected by Azure Event Grid.
Azure Machine Learning manages the entire lifecycle of the machine learning process, including model training, model deployment, and
monitoring. You can use Event Grid to react to Azure Machine Learning events, such as the completion of training runs, the registration and
deployment of models, and the detection of data drift, by using modern serverless architectures. You can then subscribe and consume
events such as run status changed, run completion, model registration, model deployment, and data drift detection within a workspace.
) Important
This feature is currently in public preview. This preview version is provided without a service-level agreement, and it's not
recommended for production workloads. Certain features might not be supported or might have constrained capabilities. For more
information, see Supplemental Terms of Use for Microsoft Azure Previews .
Prerequisites
To use Event Grid, you need contributor or owner access to the Azure Machine Learning workspace you will create events for.
For more information on event sources and event handlers, see What is Event Grid?
Event types for Azure Machine Learning
Azure Machine Learning provides events at various points in the machine learning lifecycle:
Microsoft.MachineLearningServices.ModelDeployed Raised when a deployment of inference service with one or more models is completed
Microsoft.MachineLearningServices.DatasetDriftDetected Raised when a data drift detection job for two datasets is completed
When setting up your events, you can apply filters to only trigger on specific event data. In the example below, for run status changed
events, you can filter by run types. The event only triggers when the criteria are met. Refer to the Azure Machine Learning event grid schema
to learn about event data you can filter by.
Subscriptions for Azure Machine Learning events are protected by Azure role-based access control (Azure RBAC). Only contributor or owner
of a workspace can create, update, and delete event subscriptions. Filters can be applied to event subscriptions either during the creation of
the event subscription or at a later time.
2. Select the Events entry from the left navigation area, and then select + Event subscription.
3. Select the filters tab and scroll down to Advanced filters. For the Key and Value, provide the property types you want to filter by. Here
you can see the event will only trigger when the run type is a pipeline run or pipeline step run.
Filter by event type: An event subscription can specify one or more Azure Machine Learning event types.
Filter by event subject: Azure Event Grid supports subject filters based on begins with and ends with matches, so that events with a
matching subject are delivered to the subscriber. Different machine learning events have different subject formats.
Advanced filtering: Azure Event Grid also supports advanced filtering based on published event schema. Azure Machine Learning
event schema details can be found in Azure Event Grid event schema for Azure Machine Learning. Some sample advanced filterings
you can perform include:
To learn more about how to apply filters, see Filter events for Event Grid.
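The subject filtering described above supports only prefix and suffix matches; the behavior can be sketched as follows (the sample subject value is hypothetical):

```python
def subject_matches(subject, begins_with=None, ends_with=None):
    """Event Grid subject filters: 'begins with' and 'ends with' matches only."""
    if begins_with is not None and not subject.startswith(begins_with):
        return False
    if ends_with is not None and not subject.endswith(ends_with):
        return False
    return True

# A ModelRegistered event whose subject names the model
print(subject_matches("models/mymodelname:1", begins_with="models/mymodelname"))  # True
```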
" As multiple subscriptions can be configured to route events to the same event handler, it is important not to assume events are from a
particular source, but to check the topic of the message to ensure that it comes from the machine learning workspace you are
expecting.
" Similarly, check that the eventType is one you are prepared to process, and do not assume that all events you receive will be the types
you expect.
" As messages can arrive out of order and after some delay, use the etag fields to understand if your information about objects is still
up-to-date. Also, use the sequencer fields to understand the order of events on any particular object.
" Ignore fields you don't understand. This practice will help keep you resilient to new features that might be added in the future.
" Failed or cancelled Azure Machine Learning operations will not trigger an event. For example, if a model deployment fails
Microsoft.MachineLearningServices.ModelDeployed won't be triggered. Consider such failure mode when design your applications.
You can always use Azure Machine Learning SDK, CLI or portal to check the status of an operation and understand the detailed failure
reasons.
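The defensive checks above can be sketched as a small handler guard. The field names follow the Event Grid event schema; the workspace resource ID and the set of handled event types are placeholders:

```python
HANDLED_TYPES = {
    "Microsoft.MachineLearningServices.ModelRegistered",
    "Microsoft.MachineLearningServices.RunCompleted",
}

def accept_event(event, expected_workspace_id):
    """Return True only for events we're prepared to process from the expected workspace."""
    # Don't assume the source: check the topic against the workspace we expect
    if not event.get("topic", "").lower().endswith(expected_workspace_id.lower()):
        return False
    # Don't assume the type: only handle event types we know how to process
    if event.get("eventType") not in HANDLED_TYPES:
        return False
    return True

workspace_id = ("/subscriptions/0000/resourceGroups/rg/providers/"
                "Microsoft.MachineLearningServices/workspaces/myworkspace")
event = {"topic": workspace_id,
         "eventType": "Microsoft.MachineLearningServices.RunCompleted"}
print(accept_event(event, workspace_id))  # True
```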
Azure Event Grid allows customers to build de-coupled message handlers, which can be triggered by Azure Machine Learning events. Some
notable examples of message handlers are:
Azure Functions
Azure Logic Apps
Azure Event Hubs
Azure Data Factory Pipeline
Generic webhooks, which may be hosted on the Azure platform or elsewhere
2. From the left bar, select Events and then select Event Subscriptions.
3. Select the event type to consume. For example, the following screenshot has selected Model registered, Model deployed, Run
completed, and Dataset drift detected:
4. Select the endpoint to publish the event to. In the following screenshot, Event hub is the selected endpoint:
Once you have confirmed your selection, click Create. After configuration, these events will be pushed to your endpoint.
To install the Event Grid extension, use the following command from the CLI:
Azure CLI
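The install command itself is not preserved here; assuming the extension is named eventgrid, it is likely:

```shell
# Install the Event Grid CLI extension (extension name assumed)
az extension add --name eventgrid
```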
The following example demonstrates how to select an Azure subscription and create a new event subscription for Azure Machine
Learning:
Azure CLI
# Subscribe to the machine learning workspace. This example uses EventHub as a destination.
az eventgrid event-subscription create --name {eventGridFilterName} \
--source-resource-id
/subscriptions/{subId}/resourceGroups/{RG}/providers/Microsoft.MachineLearningServices/workspaces/{wsName} \
--endpoint-type eventhub \
--endpoint /subscriptions/{SubID}/resourceGroups/TestRG/providers/Microsoft.EventHub/namespaces/n1/eventhubs/EH1 \
--included-event-types Microsoft.MachineLearningServices.ModelRegistered \
--subject-begins-with "models/mymodelname"
Examples
Example: Send email alerts
Use Azure Logic Apps to configure emails for all your events. Customize with conditions and specify recipients to enable collaboration and
awareness across teams working together.
1. In the Azure portal, go to your Azure Machine Learning workspace and select the events tab from the left bar. From here, select Logic
apps.
2. Sign into the Logic App UI and select Machine Learning service as the topic type.
3. Select which event(s) to be notified for. For example, the following screenshot shows RunCompleted selected.
4. Next, add a step to consume this event and search for email. There are several different mail accounts you can use to receive events.
You can also configure conditions on when to send an email alert.
5. Select Send an email and fill in the parameters. In the subject, you can include the Event Type and Topic to help filter events. You can
also include a link to the workspace page for runs in the message body.
6. To save this action, select Save As on the left corner of the page. From the right bar that appears, confirm creation of this action.
) Important
This example relies on a feature (data drift) that is only available when using Azure Machine Learning SDK v1 or Azure CLI extension v1
for Azure Machine Learning. For more information, see What is Azure Machine Learning CLI & SDK v2.
Models go stale over time and may no longer remain useful in the context in which they run. One way to tell whether it's time to retrain
a model is to detect data drift.
This example shows how to use Event Grid with an Azure Logic App to trigger retraining. The example triggers an Azure Data Factory
pipeline when data drift occurs between a model's training and serving datasets.
In this example, a simple Data Factory pipeline is used to copy files into a blob store and run a published Machine Learning pipeline. For
more information on this scenario, see how to set up a Machine Learning step in Azure Data Factory
1. Start with creating the logic app. Go to the Azure portal , search for Logic Apps, and select create.
2. Fill in the requested information. To simplify the experience, use the same subscription and resource group as your Azure Data Factory
Pipeline and Azure Machine Learning workspace.
3. Once you have created the logic app, select When an Event Grid resource event occurs.
4. Login and fill in the details for the event. Set the Resource Name to the workspace name. Set the Event Type to
DatasetDriftDetected.
5. Add a new step, and search for Azure Data Factory. Select Create a pipeline run.
6. Login and specify the published Azure Data Factory pipeline to run.
7. Save and create the logic app using the save button on the top left of the page. To view your app, go to your workspace in the Azure
portal and click on Events.
Now the data factory pipeline is triggered when drift occurs. View details on your data drift run and machine learning pipeline in Azure
Machine Learning studio .
Next steps
Learn more about Event Grid and give Azure Machine Learning events a try:
In this article, learn how to create and connect to a secure Azure Machine Learning
workspace. The steps in this article use an Azure Virtual Network to create a security
boundary around resources used by Azure Machine Learning.
Tip
Azure Machine Learning also provides managed virtual networks (preview). With a
managed virtual network, Azure Machine Learning handles the job of network
isolation for your workspace and managed computes. You can also add private
endpoints for resources needed by the workspace, such as Azure Storage Account.
For more information, see Workspace managed network isolation.
Tip
The Azure Batch Service listed on the diagram is a back-end service required by the
compute clusters and compute instances.
Prerequisites
Familiarity with Azure Virtual Networks and IP networking. If you aren't familiar, try
the Fundamentals of computer networking module.
While most of the steps in this article use the Azure portal or the Azure Machine
Learning studio, some steps use the Azure CLI extension for Machine Learning v2.
1. In the Azure portal , select the portal menu in the upper left corner. From the
menu, select + Create a resource and then enter Virtual Network in the search
field. Select the Virtual Network entry, and then select Create.
2. From the Basics tab, select the Azure subscription to use for this resource and
then select or create a new resource group. Under Instance details, enter a
friendly name for your virtual network and select the region to create it in.
3. Select Security. Select Enable Azure Bastion. Azure Bastion provides a secure
way to access the VM jump box you'll create inside the VNet in a later step. Use
the following values for the remaining fields:
Tip
While you can use a single subnet for all Azure Machine Learning resources,
the steps in this article show how to create two subnets to separate the
training & scoring resources.
The workspace and other dependency services will go into the training
subnet. They can still be used by resources in other subnets, such as the
scoring subnet.
a. Look at the default IPv4 address space value. In the screenshot, the value is
172.16.0.0/16. The value may be different for you. While you can use a different
value, the rest of the steps in this tutorial are based on the 172.16.0.0/16 value.
) Important
We don't recommend using the 172.17.0.0/16 IP address range for your
VNet. This is the default subnet range used by the Docker bridge network.
Other ranges may also conflict, depending on what you want to connect to
the virtual network. For example, a conflict occurs if you plan to connect
your on-premises network to the VNet and your on-premises network also
uses the 172.16.0.0/16 range. Ultimately, it's up to you to plan your network
infrastructure.
Name: Training
Starting address: 172.16.0.0
Subnet size: /24 (256 addresses)
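The address planning above can be verified with Python's ipaddress module; a quick check using the values from this tutorial:

```python
import ipaddress

vnet = ipaddress.ip_network("172.16.0.0/16")           # the VNet address space
training = ipaddress.ip_network("172.16.0.0/24")       # the Training subnet
docker_bridge = ipaddress.ip_network("172.17.0.0/16")  # default Docker bridge range to avoid

print(training.subnet_of(vnet))      # True: the Training subnet fits inside the VNet
print(vnet.overlaps(docker_bridge))  # False: no conflict with the Docker bridge network
```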
d. To create a subnet for compute resources used to score your models, select +
Add subnet again, and set the name and address range:
2. From the Basics tab, select the subscription, resource group, and region you
previously used for the virtual network. Enter a unique Storage account name, and
set Redundancy to Locally-redundant storage (LRS).
3. From the Networking tab, select Private endpoint and then select + Add private
endpoint.
4. On the Create private endpoint form, use the following values:
5. Select Review + create. Verify that the information is correct, and then select
Create.
7 Note
While you created a private endpoint for Blob storage in the previous steps,
you must also create one for File storage.
8. On the Create a private endpoint form, use the same subscription, resource
group, and Region that you've used for previous resources. Enter a unique Name.
9. Select Next : Resource, and then set Target sub-resource to file.
10. Select Next : Configuration, and then use the following values:
Tip
If you plan to use a batch endpoint or an Azure Machine Learning pipeline that
uses a ParallelRunStep, you must also configure private endpoints that target the
queue and table sub-resources. ParallelRunStep uses queue and table under the
hood for task scheduling and dispatching.
2. From the Basics tab, select the subscription, resource group, and region you
previously used for the virtual network. Enter a unique Key vault name. Leave the
other fields at the default value.
3. From the Networking tab, select Private endpoint and then select + Add.
4. On the Create private endpoint form, use the following values:
2. From the Basics tab, select the subscription, resource group, and location you
previously used for the virtual network. Enter a unique Registry name and set the
SKU to Premium.
3. From the Networking tab, select Private endpoint and then select + Add.
5. Select Review + create. Verify that the information is correct, and then select
Create.
Create a workspace
1. In the Azure portal , select the portal menu in the upper left corner. From the
menu, select + Create a resource and then enter Machine Learning. Select the
Machine Learning entry, and then select Create.
2. From the Basics tab, select the subscription, resource group, and Region you
previously used for the virtual network. Use the following values for the other
fields:
5. From the Networking tab, in the Workspace outbound access section, select Use
my own virtual network.
6. Select Review + create. Verify that the information is correct, and then select
Create.
8. From the Settings section on the left, select Private endpoint connections and
then select the link in the Private endpoint column:
9. Once the private endpoint information appears, select DNS configuration from the
left of the page. Save the IP address and fully qualified domain name (FQDN)
information on this page, as it will be used later.
) Important
There are still some configuration steps needed before you can fully use the
workspace. However, these require you to connect to the workspace.
Enable studio
Azure Machine Learning studio is a web-based application that lets you easily manage
your workspace. However, it needs some extra configuration before it can be used with
resources secured inside a VNet. Use the following steps to enable studio:
1. When using an Azure Storage Account that has a private endpoint, add the service
principal for the workspace as a Reader for the storage private endpoint(s). From
the Azure portal, select your storage account and then select Networking. Next,
select Private endpoint connections.
2. For each private endpoint listed, use the following steps:
f. On the Review + assign tab, select Review + assign to assign the role.
7 Note
For more information on securing Azure Monitor and Application Insights, see the
following links:
1. In the Azure portal , select your Azure Machine Learning workspace. From
Overview, select the Application Insights link.
2. In the Properties for Application Insights, check the WORKSPACE entry to see if it
contains a value. If it doesn't, select Migrate to Workspace-based, select the
Subscription and Log Analytics Workspace to use, then select Apply.
3. In the Azure portal, select Home, and then search for Private link. Select the Azure
Monitor Private Link Scope result and then select Create.
4. From the Basics tab, select the same Subscription, Resource Group, and Resource
group region as your Azure Machine Learning workspace. Enter a Name for the
instance, and then select Review + Create. To create the instance, select Create.
5. Once the Azure Monitor Private Link Scope instance has been created, select the
instance in the Azure portal. From the Configure section, select Azure Monitor
Resources and then select + Add.
6. From Select a scope, use the filters to select the Application Insights instance for
your Azure Machine Learning workspace. Select Apply to add the instance.
7. From the Configure section, select Private Endpoint connections and then select
+ Private Endpoint.
8. Select the same Subscription, Resource Group, and Region that contains your
VNet. Select Next: Resource.
11. After the private endpoint has been created, return to the Azure Monitor Private
Link Scope resource in the portal. From the Configure section, select Access
modes. Select Private only for Ingestion access mode and Query access mode,
then select Save.
Method Description
Azure VPN gateway: Connects on-premises networks to the VNet over a private connection.
Connection is made over the public internet.
ExpressRoute: Connects on-premises networks into the cloud over a private connection.
Connection is made using a connectivity provider.
) Important
When using a VPN gateway or ExpressRoute, you will need to plan how name
resolution works between your on-premises resources and those in the VNet. For
more information, see Use a custom DNS server.
Tip
1. In the Azure portal , select the portal menu in the upper left corner. From the
menu, select + Create a resource and then enter Virtual Machine. Select the
Virtual Machine entry, and then select Create.
2. From the Basics tab, select the subscription, resource group, and Region you
previously used for the virtual network. Provide values for the following fields:
Tip
If Windows 11 Enterprise isn't in the list for image selection, use See all
images. Find the Windows 11 entry from Microsoft, and use the Select
drop-down to select the enterprise image.
2. From the top of the page, select Connect and then Bastion.
3. Select Use Bastion, and then provide your authentication information for the
virtual machine. A connection is then established in your browser.
1. From an Azure Bastion connection to the jump box, open the Microsoft Edge
browser on the remote desktop.
3. From the Welcome to studio! screen, select the Machine Learning workspace you
created earlier and then select Get started.
Tip
5. From the Virtual Machine dialog, select Next to accept the default virtual machine
configuration.
6. From the Configure Settings dialog, enter cpu-cluster as the Compute name. Set
the Subnet to Training and then select Create to create the cluster.
Tip
8. From the Virtual Machine dialog, enter a unique Computer name and select Next:
Advanced Settings.
9. From the Advanced Settings dialog, set the Subnet to Training, and then select
Create.
Tip
When you create a compute cluster or compute instance, Azure Machine Learning
dynamically adds a Network Security Group (NSG). This NSG contains the following
rules, which are specific to compute cluster and compute instance:
Allow inbound TCP traffic on ports 29876-29877 from the
BatchNodeManagement service tag.
For more information on creating a compute cluster and compute instance, including how
to do so with Python and the CLI, see the following articles:
When Azure Container Registry is behind the virtual network, Azure Machine Learning
can't use it to directly build Docker images (used for training and deployment). Instead,
configure the workspace to use the compute cluster you created earlier. Use the
following steps to create a compute cluster and configure the workspace to use it to
build images:
2. From the Cloud Shell, use the following command to install the 2.0 CLI for Azure
Machine Learning:
Azure CLI
az extension add -n ml
3. To update the workspace to use the compute cluster to build Docker images, use
the following command. Replace myresourcegroup with your resource group,
myworkspace with your workspace, and mycomputecluster with the compute cluster to use:
Azure CLI
az ml workspace update \
-n myworkspace \
-g myresourcegroup \
-i mycomputecluster
7 Note
You can use the same compute cluster to train models and build Docker
images for the workspace.
) Important
The steps in this article put Azure Container Registry behind the VNet. In this
configuration, you cannot deploy a model to Azure Container Instances inside the
VNet. We do not recommend using Azure Container Instances with Azure Machine
Learning in a virtual network. For more information, see Secure the inference
environment (SDK/CLI v1).
At this point, you can use the studio to interactively work with notebooks on the
compute instance and run training jobs on the compute cluster. For a tutorial on using
the compute instance and compute cluster, see Tutorial: Azure Machine Learning in a
day.
2 Warning
While it's running (started), the compute instance and jump box continue to
accrue charges against your subscription. To avoid excess cost, stop them when
they're not in use.
The compute cluster dynamically scales between the minimum and maximum node
count set when you created it. If you accepted the defaults, the minimum is 0, which
effectively turns off the cluster when not in use.
You can also configure the jump box to automatically shut down at a specific time. To do
so, select Auto-shutdown, Enable, set a time, and then select Save.
Clean up resources
If you plan to continue using the secured workspace and other resources, skip this
section.
To delete all resources created in this tutorial, use the following steps:
2. From the list, select the resource group that you created in this tutorial.
Next steps
Now that you've created a secure workspace, learn how to deploy a model.
Tutorial: How to create a secure
workspace by using template
Article • 06/05/2023
In this tutorial, you learn how to use Microsoft Bicep and HashiCorp Terraform
templates to create the following Azure resources:
Azure Virtual Network. The following resources are secured behind this VNet:
Azure Machine Learning workspace
Azure Machine Learning compute instance
Azure Machine Learning compute cluster
Azure Storage Account
Azure Key Vault
Azure Application Insights
Azure Container Registry
Azure Bastion host
Azure Machine Learning Virtual Machine (Data Science Virtual Machine)
The Bicep template also creates an Azure Kubernetes Service cluster, and a
separate resource group for it.
Tip
Azure Machine Learning also provides managed virtual networks (preview). With a
managed virtual network, Azure Machine Learning handles the job of network
isolation for your workspace and managed computes. You can also add private
endpoints for resources needed by the workspace, such as Azure Storage Account.
For more information, see Workspace managed network isolation.
Prerequisites
Before using the steps in this article, you must have an Azure subscription. If you don't
have an Azure subscription, create a free account .
You must also have either a Bash or Azure PowerShell command line.
Tip
When reading this article, use the tabs in each section to select whether to view
information on using Bicep or Terraform templates.
Bicep
The Bicep template is made up of the main.bicep and the .bicep files in the
modules subdirectory. The following table describes what each file is responsible
for:
File Description
nsg.bicep Defines the network security group rules for the VNet.
machinelearningnetworking.bicep Defines the private endpoints and DNS zones for the
Azure Machine Learning workspace.
Important
The example templates may not always use the latest API version for Azure
Machine Learning. Before using the template, we recommend modifying it to
use the latest API versions. For information on the latest API versions for Azure
Machine Learning, see the Azure Machine Learning REST API.
Each Azure service has its own set of API versions. For information on the API
for a specific service, check the service information in the Azure REST API
reference.
To update the API version, find the
Microsoft.MachineLearningServices/<resource> entry for the resource type and
update it to the latest version. The following example is an entry for the Azure
Machine Learning workspace that uses an API version of 2022-05-01 :
Bicep
resource machineLearning
'Microsoft.MachineLearningServices/workspaces@2022-05-01' = {
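As a sketch of that update, the following hypothetical helper bumps the API version on workspace resource declarations in a Bicep file. The regex assumes date-style versions without a -preview suffix; adjust it if your template uses preview versions:

```python
import re

def update_api_version(bicep_source: str, new_version: str) -> str:
    # Replace the version after the '@' on Azure ML workspace resource types.
    pattern = r"(Microsoft\.MachineLearningServices/workspaces@)\d{4}-\d{2}-\d{2}"
    return re.sub(pattern, r"\g<1>" + new_version, bicep_source)

snippet = ("resource machineLearning "
           "'Microsoft.MachineLearningServices/workspaces@2022-05-01' = {")
print(update_api_version(snippet, "2023-04-01"))
```

The replacement version string ("2023-04-01" here) is an example; check the Azure Machine Learning REST API reference for the actual latest version before updating.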
Important
The DSVM and Azure Bastion are used as an easy way to connect to the secured
workspace for this tutorial. In a production environment, we recommend using an
Azure VPN gateway or Azure ExpressRoute to access the resources inside the
VNet directly from your on-premises network.
To run the Bicep template, use the following commands from the machine-learning-
end-to-end-secure directory, where the main.bicep file is located:
1. To create a new Azure resource group, use the following command. Replace
exampleRG with your resource group name, and eastus with the Azure region
you want to use:
Azure CLI
az group create --name exampleRG --location eastus
2. To run the template, use the following command. Replace the prefix with a
unique prefix. The prefix will be used when creating Azure resources that are
required for Azure Machine Learning. Replace the securepassword with a
secure password for the jump box. The password is for the login account for
the jump box ( azureadmin in the examples below):
Azure CLI
1. From the Azure portal , select the Azure Resource Group you used with the
template. Then, select the Data Science Virtual Machine that was created by the
template. If you have trouble finding it, use the filters section to filter the Type to
virtual machine.
2. From the Overview section of the Virtual Machine, select Connect, and then select
Bastion from the dropdown.
3. When prompted, provide the username and password you specified when
configuring the template and then select Connect.
Important
The first time you connect to the DSVM desktop, a PowerShell window opens
and begins running a script. Allow this to complete before continuing with the
next step.
4. From the DSVM desktop, start Microsoft Edge and enter https://ml.azure.com as
the address. Sign in to your Azure subscription, and then select the workspace
created by the template. The studio for your workspace is displayed.
Troubleshooting
Error: Windows computer name cannot be more than 15
characters long, be entirely numeric, or contain the
following characters
This error can occur when the name for the DSVM jump box is greater than 15
characters or includes one of the following characters: ~ ! @ # $ % ^ & * ( ) = + _ [ ]
{ } \ | ; : . ' " , < > / ?.
When using the Bicep template, the jump box name is generated programmatically
using the prefix value provided to the template. To make sure the name does not exceed
15 characters or contain any invalid characters, use a prefix that is 5 characters or less
and do not use any of the following characters in the prefix: ~ ! @ # $ % ^ & * ( ) = +
_ [ ] { } \ | ; : . ' " , < > / ?.
When using the Terraform template, the jump box name is passed using the dsvm_name
parameter. To avoid this error, use a name that is not greater than 15 characters and
does not use any of the following characters as part of the name: ~ ! @ # $ % ^ & * ( )
= + _ [ ] { } \ | ; : . ' " , < > / ?.
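The naming rules above (at most 15 characters, not entirely numeric, none of the listed characters) can be expressed as a small validation sketch. This is illustrative only; the function name is hypothetical:

```python
# The characters the error message lists as invalid in a computer name:
INVALID_CHARS = set("~!@#$%^&*()=+_[]{}\\|;:.'\",<>/?")

def is_valid_jumpbox_name(name: str) -> bool:
    # Reject names over 15 characters or that are entirely numeric.
    if len(name) > 15 or name.isdigit():
        return False
    # Reject names containing any of the invalid characters.
    return not any(ch in INVALID_CHARS for ch in name)

print(is_valid_jumpbox_name("mydsvm-jumpbox"))       # True  (14 chars, all valid)
print(is_valid_jumpbox_name("my_jumpbox"))           # False (contains "_")
print(is_valid_jumpbox_name("averylongjumpboxname")) # False (over 15 chars)
```

Running a check like this against your prefix or dsvm_name value before deploying can save a failed template run.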
Next steps
Important
The Data Science Virtual Machine (DSVM) and any compute instance resources bill
you for every hour that they are running. To avoid excess charges, you should stop
these resources when they are not in use. For more information, see the following
articles:
When your Azure Machine Learning workspace and associated resources are secured in
an Azure Virtual Network, it changes the network traffic between resources. Without a
virtual network, network traffic flows over the public internet or within an Azure data
center. Once a virtual network (VNet) is introduced, you may also want to harden
network security, for example by blocking inbound and outbound communication
between the VNet and the public internet. However, Azure Machine Learning requires
access to some resources on the public internet. For example, Azure Resource
Manager is used for deployments and management operations.
This article lists the required traffic to/from the public internet. It also explains how
network traffic flows between your client development environment and a secured
Azure Machine Learning workspace in the following scenarios:
Using Azure Machine Learning studio, SDK, CLI, or REST API to work with:
Compute instances and clusters
Azure Kubernetes Service
Docker images managed by Azure Machine Learning
Tip
If a scenario or task is not listed here, it should work the same with or without a
secured workspace.
Assumptions
This article assumes the following configuration:
Important
Azure Machine Learning uses multiple storage accounts. Each stores different data,
and has a different purpose:
Your storage: The Azure Storage Account(s) in your Azure subscription are
used to store your data and artifacts such as models, training data, training
logs, and Python scripts. For example, the default storage account for your
workspace is in your subscription. The Azure Machine Learning compute
instance and compute clusters access file and blob data in this storage over
ports 445 (SMB) and 443 (HTTPS).
Note
The information in this section is specific to using the workspace from the Azure
Machine Learning studio. If you use the Azure Machine Learning SDK, REST API, CLI,
or Visual Studio Code, the information in this section does not apply to you.
When accessing your workspace from studio, the network traffic flows are as follows:
Data profiling depends on the Azure Machine Learning managed service being able to
access the default Azure Storage Account for your workspace. The managed service
doesn't exist in your VNet, so can't directly access the storage account in the VNet.
Instead, the workspace uses a service principal to access storage.
Tip
You can provide a service principal when creating the workspace. If you do not, one
is created for you and will have the same name as your workspace.
To allow access to the storage account, configure the storage account to allow a
resource instance for your workspace or select the Allow Azure services on the trusted
services list to access this storage account. This setting allows the managed service to
access storage through the Azure data center network.
Next, grant the workspace's service principal the Reader role on the private
endpoint of the storage account. This role is used to verify the workspace and storage
subnet information. If they're the same, access is allowed. Finally, the service principal
also requires Blob data contributor access to the storage account.
For more information, see the Azure Storage Account section of How to secure a
workspace in a virtual network.
When you create a compute instance or compute cluster, the following resources are
also created in your VNet:
A Network Security Group with required inbound rules. These rules allow
inbound access from the Azure Machine Learning service (TCP on port 44224) and the
Azure Batch service (TCP on ports 29876-29877).
Important
If you use a firewall to block internet access into the VNet, you must configure
the firewall to allow this traffic. For example, with Azure Firewall you can
create user-defined routes. For more information, see Configure inbound and
outbound network traffic.
Also allow outbound access to the following service tags. For each tag, replace region
with the Azure region of your compute instance/cluster:
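The inbound rules above, plus the region substitution for service tags, can be sketched as data. The inbound ports are from the article; the BatchNodeManagement tag name below is an illustrative example of the region-substitution pattern, since the article's full tag list isn't reproduced here:

```python
# Inbound rules created with the compute NSG, per the article:
inbound_rules = {
    "AzureMachineLearning": ("TCP", [44224]),
    "AzureBatch": ("TCP", [29876, 29877]),
}

def regional_tag(tag: str, region: str) -> str:
    # A service tag scoped to a region is written as "<tag>.<region>".
    return f"{tag}.{region}"

print(regional_tag("BatchNodeManagement", "eastus"))  # BatchNodeManagement.eastus
```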
Data access from your compute instance or cluster goes through the private endpoint of
the Storage Account for your VNet.
If you use Visual Studio Code on a compute instance, you must allow other outbound
traffic. For more information, see Configure inbound and outbound network traffic.
Scenario: Use Azure Kubernetes Service
For information on the outbound configuration required for Azure Kubernetes Service,
see the connectivity requirements section of How to secure inference.
Note
The Azure Kubernetes Service load balancer is not the same as the load balancer
created by Azure Machine Learning. If you want to host your model as a secured
application, only available on the VNet, use the internal load balancer created by
Azure Machine Learning. If you want to allow public access, use the public load
balancer created by Azure Machine Learning.
If you provide your own Docker images, such as on an Azure Container Registry that you
provide, you don't need the outbound communication with MCR or
viennaglobal.azurecr.io .
Tip
If your Azure Container Registry is secured in the VNet, it cannot be used by Azure
Machine Learning to build Docker images. Instead, you must designate an Azure
Machine Learning compute cluster to build images. For more information, see How
to secure a workspace in a virtual network.
Next steps
Now that you've learned how network traffic flows in a secured configuration, learn
more about securing Azure Machine Learning in a virtual network by reading the Virtual
network isolation and privacy overview article.
For information on best practices, see the Azure Machine Learning best practices for
enterprise security article.
Azure security baseline for Azure
Machine Learning
Article • 11/14/2022
This security baseline applies guidance from the Azure Security Benchmark version 2.0
to Microsoft Azure Machine Learning. The Azure Security Benchmark provides
recommendations on how you can secure your cloud solutions on Azure. The content is
grouped by the security controls defined by the Azure Security Benchmark and the
related guidance applicable to Azure Machine Learning.
You can monitor this security baseline and its recommendations using Microsoft
Defender for Cloud. Azure Policy definitions will be listed in the Regulatory Compliance
section of the Microsoft Defender for Cloud dashboard.
When a section has relevant Azure Policy Definitions, they are listed in this baseline to
help you measure compliance to the Azure Security Benchmark controls and
recommendations. Some recommendations may require a paid Microsoft Defender plan
to enable certain security scenarios.
Note
Controls not applicable to Azure Machine Learning, and those for which the global
guidance is recommended verbatim, have been excluded. To see how Azure
Machine Learning completely maps to the Azure Security Benchmark, see the full
Azure Machine Learning security baseline mapping file .
Network Security
For more information, see the Azure Security Benchmark: Network Security.
Responsibility: Customer
ExpressRoute connections don't go over the public internet, and offer more reliability,
faster speeds, and lower latencies than typical internet connections. For point-to-site
and site-to-site VPN, you can connect on-premises devices or networks to a virtual
network. Use any combination of these VPN options and Azure ExpressRoute.
To connect two or more Azure virtual networks, use virtual network peering. Traffic
between peered virtual networks is private and remains on the Azure backbone network.
Responsibility: Customer
Microsoft Defender for Cloud monitoring: The Azure Security Benchmark is the default
policy initiative for Microsoft Defender for Cloud and is the foundation for Microsoft
Defender for Cloud's recommendations. The Azure Policy definitions related to this
control are enabled automatically by Microsoft Defender for Cloud. Alerts related to this
control may require a Microsoft Defender plan for the related services.
Azure Policy definition: Azure Machine Learning workspaces should use private link
Description: Azure Private Link lets you connect your virtual network to Azure services without a public IP address at the source or destination. The Private Link platform handles the connectivity between the consumer and services over the Azure backbone network. By mapping private endpoints to Azure Machine Learning workspaces, data leakage risks are reduced. Learn more about private links at: https://docs.microsoft.com/azure/machine-learning/how-to-configure-private-link.
Effect(s): Audit, Deny, Disabled
Version: 1.1.0
NS-3: Establish private network access to Azure services
Guidance: Use Azure Private Link to enable private access to Azure Machine Learning
from virtual networks without crossing the internet. Private access adds a defense-in-
depth measure to Azure authentication and traffic security.
Responsibility: Customer
Microsoft Defender for Cloud monitoring: The Azure Security Benchmark is the default
policy initiative for Microsoft Defender for Cloud and is the foundation for Microsoft
Defender for Cloud's recommendations. The Azure Policy definitions related to this
control are enabled automatically by Microsoft Defender for Cloud. Alerts related to this
control may require a Microsoft Defender plan for the related services.
Azure Policy definition: Azure Machine Learning workspaces should use private link
Description: Azure Private Link lets you connect your virtual network to Azure services without a public IP address at the source or destination. The Private Link platform handles the connectivity between the consumer and services over the Azure backbone network. By mapping private endpoints to Azure Machine Learning workspaces, data leakage risks are reduced. Learn more about private links at: https://docs.microsoft.com/azure/machine-learning/how-to-configure-private-link.
Effect(s): Audit, Deny, Disabled
Version: 1.1.0
Use Web Application Firewall (WAF) capabilities in Azure Application Gateway, Azure
Front Door, and Azure Content Delivery Network (CDN). These capabilities protect your
applications that run on Azure Machine Learning from application-layer attacks.
Responsibility: Customer
Responsibility: Customer
Follow the best practices for DNS security to mitigate against common attacks like:
Dangling DNS
DNS amplification attacks
DNS poisoning and spoofing
When you use Azure DNS as your DNS service, make sure to protect DNS zones and
records from accidental or malicious changes by using Azure Role-Based Access Control
(RBAC) and resource locks.
Responsibility: Customer
Identity Management
For more information, see the Azure Security Benchmark: Identity Management.
Azure Storage
Platform-as-a-service (PaaS)
Securing Azure AD should be a high priority for your organization's cloud security
practice. Azure AD provides an identity secure score to help you compare your identity
security posture to Microsoft's best practice recommendations. Use the score to gauge
how closely your configuration matches best practice recommendations, and to make
improvements in your security posture.
Note: Azure AD supports external identities that allow users without Microsoft accounts
to sign in to their applications and resources.
Responsibility: Customer
You can use Azure Key Vault with Azure-managed identities, so a runtime environment
like Azure Functions can get credentials from the key vault.
Responsibility: Customer
Connect all your users, applications, and devices to Azure AD. Azure AD offers seamless,
secure access, and greater visibility and control.
Responsibility: Customer
For GitHub, you can use the native secret scanning feature to identify credentials or
other secrets in code.
Responsibility: Customer
Privileged Access
For more information, see the Azure Security Benchmark: Privileged Access.
The Privileged Role Administrator can manage role assignments in Azure AD and
Azure AD Privileged Identity Management (PIM). This role can manage all aspects
of PIM and administrative units.
Limit the number of highly privileged accounts or roles, and protect these accounts at
an elevated level. Highly privileged users can directly or indirectly read and modify all
your Azure resources.
You can enable just-in-time (JIT) privileged access to Azure resources and Azure AD
using Azure AD PIM. JIT grants temporary permissions to perform privileged tasks only
when users need it. PIM can also generate security alerts for suspicious or unsafe activity
in your Azure AD organization.
Azure Machine Learning comes with three default roles when a new workspace is
created. Create standard operating procedures for using owner accounts.
Responsibility: Customer
Azure AD reporting can provide logs to help discover stale accounts. You can also create
access review report workflows in Azure AD PIM to ease the review process.
You can configure Azure AD PIM to alert you when there are too many administrator
accounts. PIM can identify administrator accounts that are stale or improperly
configured.
Azure Machine Learning offers built-in roles for data scientist and UX service users.
Note: Some Azure services support local users and roles that aren't managed through
Azure AD. Manage these users separately.
Responsibility: Customer
Use Azure AD, Microsoft Defender ATP, or Microsoft Intune to deploy a secure and
managed user workstation for administrative tasks. You can centrally manage secured
workstations to enforce a security configuration that includes:
Strong authentication
Responsibility: Customer
Limit the privileges you assign to resources through Azure RBAC to what the roles
require. This practice complements the just-in-time (JIT) approach of Azure AD PIM.
Review roles and assignments periodically.
Use built-in roles to allocate permissions, and only create custom roles when required.
Azure Machine Learning offers built-in roles for data scientist and UX service users.
Data Protection
For more information, see the Azure Security Benchmark: Data Protection.
Use Azure Information Protection (AIP) and its associated scanning tool for sensitive
information in Office documents. You can use AIP on Azure, Office 365, on-premises, or
in other locations.
You can use Azure SQL Information Protection to help classify and label information
stored in Azure SQL Database.
Responsibility: Customer
For consistency, align all types of access control with your enterprise segmentation
strategy. Inform your enterprise segmentation strategy by the location of sensitive or
business-critical data and systems.
Azure Storage ATP and Azure SQL ATP can alert on anomalous information transfers that
might indicate unauthorized transfers of sensitive information.
If necessary for DLP compliance, you can use a host-based DLP solution to enforce
detection and prevention controls and prevent data exfiltration.
Responsibility: Customer
This requirement is optional for traffic on private networks, but is critical for traffic on
external and public networks. For HTTP traffic, make sure any clients that connect to
your Azure resources can use TLS v1.2 or greater.
For remote management, use secure shell (SSH) for Linux or remote desktop protocol
(RDP) and TLS for Windows. Don't use an unencrypted protocol. Disable weak ciphers
and obsolete SSL, TLS, and SSH versions and protocols.
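For example, a Python client can enforce the TLS 1.2 floor described above. This is a minimal sketch using the standard library's ssl module; create_default_context already disables the obsolete SSLv2/SSLv3 protocols:

```python
import ssl

# Build a client context that refuses anything older than TLS 1.2.
context = ssl.create_default_context()
context.minimum_version = ssl.TLSVersion.TLSv1_2

print(context.minimum_version.name)  # TLSv1_2
```

Pass a context like this to your HTTP client or socket wrapper so that connections to endpoints that only offer older protocol versions fail at the handshake.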
Azure provides encryption for data at rest by default. For highly sensitive data, you can
implement extra encryption at rest on Azure resources where available. Azure manages
your encryption keys by default, but certain Azure services provide options for
customer-managed keys.
Responsibility: Customer
Microsoft Defender for Cloud monitoring: The Azure Security Benchmark is the default
policy initiative for Microsoft Defender for Cloud and is the foundation for Microsoft
Defender for Cloud's recommendations. The Azure Policy definitions related to this
control are enabled automatically by Microsoft Defender for Cloud. Alerts related to this
control may require a Microsoft Defender plan for the related services.
Monitoring for security risks could be the responsibility of a central security team or a
local team, depending on how you structure responsibilities. Always aggregate security
insights and risks centrally within an organization.
You can apply Security Reader permissions broadly to an entire tenant's Root
Management Group, or scope permissions to specific management groups or
subscriptions.
Note: Visibility into workloads and services might require more permissions.
Responsibility: Customer
Use Azure Virtual Machine Inventory to automate collecting information about software
on virtual machines (VMs). Software Name, Version, Publisher, and Refresh Time are
available from the Azure portal. To access install dates and other information, enable
guest-level diagnostics and import the Windows Event Logs into a Log Analytics
workspace.
Use Microsoft Defender for Cloud Adaptive Application Controls to specify which file
types a rule applies to.
How to create queries with Azure Resource Graph Explorer
Responsibility: Customer
Responsibility: Customer
Responsibility: Customer
Responsibility: Customer
Audit logs - Traceability through logs for all changes made by various Azure AD
features. Audit logs include changes made to any resource within Azure AD.
Changes include adding or removing users, apps, groups, roles, and policies.
Risky sign-ins - An indicator for sign-in attempts by someone who might not be
the legitimate owner of a user account.
Users flagged for risk - An indicator for a user account that might have been
compromised.
Microsoft Defender for Cloud can also alert you about certain suspicious activities like
an excessive number of failed authentication attempts. Deprecated accounts in the
subscription can also trigger alerts.
In addition to basic security hygiene monitoring, Microsoft Defender for Cloud's Threat
Protection module can collect more in-depth security alerts from:
Individual Azure compute resources like VMs, containers, and app service
This capability gives you visibility into account anomalies in individual resources.
Responsibility: Customer
Make sure to collect DNS query logs to help correlate other network data. You can
implement a third-party DNS logging solution from Azure Marketplace as your
organization needs.
Gather insights about your DNS infrastructure with the DNS Analytics solution
Responsibility: Customer
LT-4: Enable logging for Azure resources
Guidance: Activity logs are available automatically. The logs contain all PUT, POST, and
DELETE, but not GET, operations for your Azure Machine Learning resources. You can
use activity logs to find errors when troubleshooting, or to monitor how users in your
organization modified resources.
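As an illustration, here is how a set of operation records would be filtered the way the activity log is: only write operations are kept. The record shape below is a simplified assumption, not the actual activity log schema:

```python
# Hypothetical, simplified operation records against a workspace:
records = [
    {"operation": "PUT", "resource": "workspaces/my-ws"},
    {"operation": "GET", "resource": "workspaces/my-ws"},
    {"operation": "DELETE", "resource": "workspaces/my-ws/computes/cpu-cluster"},
]

# The activity log records PUT, POST, and DELETE operations, but not GET:
logged = [r for r in records if r["operation"] in {"PUT", "POST", "DELETE"}]
print([r["operation"] for r in logged])  # ['PUT', 'DELETE']
```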
Enable Azure resource logs for Azure Machine Learning. You can use Microsoft Defender
for Cloud and Azure Policy to enable resource logs and log data collecting. These logs
can be critical for investigating security incidents and doing forensic exercises.
Azure Machine Learning also produces security audit logs for the local administrator
accounts. Enable these local admin audit logs. Configure the logs to be sent to a central
Log Analytics workspace or storage account for long-term retention and auditing.
Responsibility: Customer
Make sure to integrate Azure activity logs into your central logging.
Ingest logs via Azure Monitor to aggregate security data that endpoint devices, network
resources, and other security systems generate. In Azure Monitor, use Log Analytics
workspaces to query and do analytics.
For applications that run on Azure Machine Learning, forward all security-related logs to
your SIEM for centralized management.
Responsibility: Customer
In Azure Monitor, you can set your Log Analytics workspace retention period according
to your organization's compliance regulations. Use Azure Storage, Azure Data Lake, or
Log Analytics workspace accounts for long-term and archival storage.
Responsibility: Customer
Responsibility: Microsoft
Azure Blueprints
Responsibility: Customer
Responsibility: Customer
Responsibility: Customer
Microsoft VM templates combined with Azure Automation State Configuration can help
meet and maintain security requirements.
Microsoft manages and maintains the VM images they publish on Azure Marketplace.
Microsoft Defender for Cloud can scan vulnerabilities in container images and
continuously monitor your Docker container configurations against CIS Docker
Benchmarks. You can use the Microsoft Defender for Cloud Recommendations page to
view recommendations and remediate issues.
Responsibility: Customer
PV-5: Securely store custom operating system and
container images
Guidance: Azure Machine Learning lets customers manage container images. Use Azure
RBAC to ensure that only authorized users can access your custom images. Use an Azure
Shared Image Gallery to share your images to different users, service principals, or Azure
AD groups in your organization. Store container images in Azure Container Registry, and
use RBAC to ensure that only authorized users have access.
Responsibility: Customer
Export scan results at consistent intervals as needed. Compare the results with previous
scans to verify that vulnerabilities have been remediated. When using vulnerability
management recommendations suggested by Microsoft Defender for Cloud, you can
pivot into the selected solution's portal to view historical scan data.
Azure Machine Learning can use a third-party solution for performing vulnerability
assessments on network devices and web applications. When conducting remote scans,
don't use a single, perpetual, administrative account. Consider implementing JIT
provisioning methodology for the scan account. Protect and monitor credentials for the
scan account, and use the account only for vulnerability scanning.
Responsibility: Customer
PV-7: Rapidly and automatically remediate software
vulnerabilities
Guidance: Azure Machine Learning uses open-source software as part of the Azure
Machine Learning service.
For third-party software, use a third-party patch management solution or System Center
Updates Publisher for Configuration Manager.
Responsibility: Customer
Follow the Microsoft Cloud Penetration Testing Rules of Engagement to ensure your
penetration tests don't violate Microsoft policies. Use Microsoft's Red Teaming strategy
and execution. Do live site penetration testing against Microsoft-managed cloud
infrastructure, services, and applications.
Responsibility: Shared
Endpoint Security
For more information, see the Azure Security Benchmark: Endpoint Security.
Responsibility: Customer
Microsoft Antimalware for Azure Cloud Services is the default antimalware solution
for Windows VMs.
Use Microsoft Defender for Cloud threat detection for data services to detect
malware uploaded to Azure Storage accounts.
How to configure Microsoft Antimalware for Cloud Services and Virtual Machines
Responsibility: Customer
Follow recommendations in Microsoft Defender for Cloud "Compute & Apps" to ensure
all VMs and containers are up to date with the latest signatures.
For Windows, Microsoft Antimalware automatically installs the latest signatures and
engine updates by default. For Linux, use a third-party antimalware solution.
How to deploy Microsoft Antimalware for Azure Cloud Services and Virtual
Machines
Responsibility: Customer
Enable Azure Backup. Configure the backup sources, like Azure VMs, SQL Server, HANA
databases, or file shares. Configure the frequency and retention period you want.
For higher redundancy, enable geo-redundant storage options to replicate backup data
to a secondary region and recover using cross-region restore.
Failover for business continuity and disaster recovery of Azure Machine Learning
Responsibility: Customer
Use RBAC in Azure Backup, Azure Key Vault, and other resources to protect backups and
customer-managed keys. You can also enable advanced security features to require MFA
before backups can be altered or deleted.
Responsibility: Customer
Responsibility: Customer
Responsibility: Customer
Next steps
See the Azure Security Benchmark V2 overview
Learn more about Azure security baselines
Data encryption with Azure Machine
Learning
Article • 04/04/2023
Azure Machine Learning relies on a variety of Azure data storage services and compute
resources when training models and performing inference. In this article, learn about
the data encryption for each service, both at rest and in transit.
Important
Encryption at rest
Azure Machine Learning end-to-end projects integrate with services like Azure Blob
Storage, Azure Cosmos DB, and Azure SQL Database. This article describes the
encryption methods of such services.
For information on how to use your own keys for data stored in Azure Blob storage, see
Azure Storage encryption with customer-managed keys in Azure Key Vault.
Training data is typically also stored in Azure Blob storage so that it's accessible to
training compute targets. This storage isn't managed by Azure Machine Learning but
mounted to compute targets as a remote file system.
If you need to rotate or revoke your key, you can do so at any time. When rotating a
key, the storage account will start using the new key (latest version) to encrypt data at
rest. When revoking (disabling) a key, the storage account takes care of failing requests.
It usually takes an hour for the rotation or revocation to be effective.
For information on regenerating the access keys, see Regenerate storage access keys.
Note
On Feb 29, 2024 Azure Data Lake Storage Gen1 will be retired. For more
information, see the official announcement . If you use Azure Data Lake Storage
Gen1, make sure to migrate to Azure Data Lake Storage Gen2 prior to that date. To
learn how, see Migrate Azure Data Lake Storage from Gen1 to Gen2 by using the
Azure portal.
Unless you already have an Azure Data Lake Storage Gen1 account, you cannot
create new ones.
Azure Data Lake Storage Gen2
Azure Data Lake Storage Gen2 is built on top of Azure Blob Storage and is
designed for enterprise big data analytics. ADLS Gen2 can be used as a datastore for
Azure Machine Learning. As with Azure Blob Storage, data at rest is encrypted with
Microsoft-managed keys.
For information on how to use your own keys for data stored in Azure Data Lake
Storage, see Azure Storage encryption with customer-managed keys in Azure Key Vault.
Azure SQL Database
Transparent Data Encryption (TDE) protects Azure SQL Database against the
threat of malicious offline activity by encrypting data at rest. By default, TDE is enabled
for all newly deployed SQL databases with Microsoft-managed keys.
For information on how to use customer managed keys for transparent data encryption,
see Azure SQL Database Transparent Data Encryption .
Azure Database for PostgreSQL
By default, Azure Database for PostgreSQL uses Azure Storage encryption to encrypt
data at rest with Microsoft-managed keys. It's similar to Transparent Data Encryption
(TDE) in other databases, such as SQL Server.
For information on how to use customer managed keys for transparent data encryption,
see Azure Database for PostgreSQL Single server data encryption with a customer-
managed key.
Azure Database for MySQL
Azure Database for MySQL is a relational database service in the Microsoft cloud based
on the MySQL Community Edition database engine. The Azure Database for MySQL
service uses the FIPS 140-2 validated cryptographic module for storage encryption of
data at rest.
To encrypt data using customer managed keys, see Azure Database for MySQL data
encryption with a customer-managed key .
Azure Cosmos DB
Azure Machine Learning stores metadata in an Azure Cosmos DB instance. This instance
is associated with a Microsoft subscription managed by Azure Machine Learning. All the
data stored in Azure Cosmos DB is encrypted at rest with Microsoft-managed keys.
When using your own (customer-managed) keys to encrypt the Azure Cosmos DB
instance, a Microsoft managed Azure Cosmos DB instance is created in your
subscription. This instance is created in a Microsoft-managed resource group, which is
different than the resource group for your workspace. For more information, see
Customer-managed keys.
Azure Container Registry
To use customer-managed keys to encrypt your Azure Container Registry, you need to
create your own ACR and attach it while provisioning the workspace. You can't encrypt
the default instance that's created during workspace provisioning.
Important
Azure Machine Learning requires the admin account be enabled on your Azure
Container Registry. By default, this setting is disabled when you create a container
registry. For information on enabling the admin account, see Admin account.
Once an Azure Container Registry has been created for a workspace, do not delete
it. Doing so will break your Azure Machine Learning workspace.
For an example of creating a workspace using an existing Azure Container Registry, see
the following articles:
Important
Deployments to ACI rely on the Azure Machine Learning Python SDK and CLI v1.
You may encrypt a deployed Azure Container Instance (ACI) resource using customer-
managed keys. The customer-managed key used for ACI can be stored in the Azure Key
Vault for your workspace. For information on generating a key, see Encrypt data with a
customer-managed key.
To use the key when deploying a model to Azure Container Instance, create a new
deployment configuration using AciWebservice.deploy_configuration() . Provide the key
information using the following parameters:
cmk_vault_base_url : The URL of the key vault that contains the key.
cmk_key_name : The name of the key.
For more information on creating and using a deployment configuration, see the
following articles:
AciWebservice.deploy_configuration() reference
For more information on using a customer-managed key with ACI, see Encrypt
deployment data.
Azure Kubernetes Service
You may encrypt a deployed Azure Kubernetes Service resource using customer-
managed keys at any time. For more information, see Bring your own keys with Azure
Kubernetes Service.
This process allows you to encrypt both the Data and the OS Disk of the deployed
virtual machines in the Kubernetes cluster.
Important
This process only works with AKS K8s version 1.17 or higher. Azure Machine
Learning added support for AKS 1.17 on Jan 13, 2020.
Each virtual machine also has a local temporary disk for OS operations. If you want, you
can use the disk to stage training data. If the workspace was created with the
hbi_workspace parameter set to TRUE , the temporary disk is encrypted. This
environment is short-lived (only for the duration of your job), and encryption support is
limited to system-managed keys.
Managed online endpoints and batch endpoints use machine learning compute in the
backend, and follow the same encryption mechanism.
Compute instance
The OS disk for a compute instance is encrypted with Microsoft-managed keys in Azure
Machine Learning storage accounts. If the workspace was created with the
hbi_workspace parameter set to TRUE , the local OS and temporary disks on the
compute instance are encrypted with Microsoft-managed keys. Customer-managed key
encryption isn't supported for OS and temporary disks.
Azure Data Factory
For information on how to use customer-managed keys for encryption, see Encrypt
Azure Data Factory with customer managed keys .
Azure Databricks
Azure Databricks can be used in Azure Machine Learning pipelines. By default, the
Databricks File System (DBFS) used by Azure Databricks is encrypted using a Microsoft-
managed key. To configure Azure Databricks to use customer-managed keys, see
Configure customer-managed keys on default (root) DBFS.
Microsoft-generated data
When using services such as Automated Machine Learning, Microsoft may generate
transient, pre-processed data for training multiple models. This data is stored in a
datastore in your workspace, which allows you to enforce access controls and
encryption appropriately.
You may also want to encrypt diagnostic information logged from your deployed
endpoint into your Azure Application Insights instance.
Encryption in transit
Azure Machine Learning uses TLS to secure internal communication between various
Azure Machine Learning microservices. All Azure Storage access also occurs over a
secure channel.
To secure external calls made to the scoring endpoint, Azure Machine Learning uses TLS.
For more information, see Use TLS to secure a web service through Azure Machine
Learning.
Microsoft also recommends not storing sensitive information (such as account key
secrets) in environment variables. Environment variables are logged, encrypted, and
stored by Microsoft. Similarly, when naming your jobs, avoid including sensitive
information such as user names or secret project names. This information may appear
in telemetry logs accessible to Microsoft Support engineers.
You may opt out of diagnostic data collection by setting the hbi_workspace parameter
to TRUE while provisioning the workspace. This functionality is supported when using
the Azure Machine Learning Python SDK, the Azure CLI, REST APIs, or Azure Resource
Manager templates.
SSH passwords and keys to compute targets like Azure HDInsight and VMs are stored in
a separate key vault that's associated with the Microsoft subscription. Azure Machine
Learning doesn't store any passwords or keys provided by users. Instead, it generates,
authorizes, and stores its own SSH keys to connect to VMs and HDInsight to run the
experiments.
Each workspace has an associated system-assigned managed identity that has the same
name as the workspace. This managed identity has access to all keys, secrets, and
certificates in the key vault.
Next steps
Connect to Azure storage
Get data from a datastore
Connect to data
Train with datasets
Customer-managed keys
Use Managed identities with Azure
Machine Learning CLI v1
Article • 04/04/2023
Managed identities allow you to configure your workspace with the minimum required
permissions to access resources. Use managed identities to:
Configure and use ACR for your Azure Machine Learning workspace without
having to enable admin user access to ACR.
Access a private ACR external to your workspace, to pull base images for training
or inference.
Create workspace with user-assigned managed identity to access associated
resources.
Prerequisites
An Azure Machine Learning workspace. For more information, see Create
workspace resources.
Important
Some of the Azure CLI commands in this article use the azure-cli-ml , or v1,
extension for Azure Machine Learning. Support for the v1 extension will end
on September 30, 2025. You will be able to install and use the v1 extension
until that date.
To assign roles, the login for your Azure subscription must have the Managed
Identity Operator role, or other role that grants the required actions (such as
Owner).
You must be familiar with creating and working with Managed Identities.
Important
When using Azure Machine Learning for inference on Azure Container Instance
(ACI), admin user access on ACR is required. Do not disable it if you plan on
deploying models to ACI for inference.
When you create ACR without enabling admin user access, managed identities are used
to access the ACR to build and pull Docker images.
You can bring your own ACR with admin user disabled when you create the workspace.
Alternatively, let Azure Machine Learning create workspace ACR and disable admin user
afterwards.
Tip
To get the value for the --container-registry parameter, use the az acr show
command to show information for your ACR. The id field contains the resource ID
for your ACR.
Azure CLI
Azure CLI
2. Perform an action that requires ACR. For example, the tutorial on training a model.
Azure CLI
This command returns a value similar to the following text. You only want the last
portion of the text, which is the ACR instance name:
Output
Azure CLI
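As an aside, the ACR instance name is simply the last path segment of the resource ID returned by the command. A minimal Python sketch shows the extraction (the resource ID below is hypothetical, not one of your real resources):

```python
def acr_name_from_resource_id(resource_id: str) -> str:
    # The ACR instance name is the final path segment of the resource ID.
    return resource_id.rstrip("/").split("/")[-1]

# Hypothetical resource ID of the form returned by `az acr show`.
resource_id = (
    "/subscriptions/<sub-id>/resourceGroups/my-rg/providers/"
    "Microsoft.ContainerRegistry/registries/myworkspaceacr"
)
print(acr_name_from_resource_id(resource_id))  # myworkspaceacr
```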
Python SDK
Note
If you create compute first, before workspace ACR has been created, you have to
assign the ACRPull role manually.
To use a custom base image internal to your enterprise, you can use managed identities
to access your private ACR. There are two use cases:
Azure CLI
Optionally, you can update the compute cluster to assign a user-assigned managed
identity:
Azure CLI
To allow the compute cluster to pull the base images, grant the managed identity the
ACRPull role on the private ACR:
Azure CLI
Python
Important
To ensure that the base image is pulled directly to the compute resource, set
user_managed_dependencies = True and do not specify a Dockerfile. Otherwise
Azure Machine Learning service will attempt to build a new Docker image and fail,
because only the compute cluster has access to pull the base image from ACR.
In this scenario, Azure Machine Learning service builds the training or inference
environment on top of a base image you supply from a private ACR. Because the image
build task happens on the workspace ACR using ACR Tasks, you must perform more
steps to allow access.
1. Create user-assigned managed identity and grant the identity ACRPull access to
the private ACR.
Azure CLI
az ml workspace show -w <workspace name> -g <resource group> --query
identityPrincipalId
Azure CLI
3. Specify the external ACR and client ID of the user-assigned managed identity in
workspace connections by using Workspace.set_connection method:
Python
workspace.set_connection(
name="privateAcr",
category="ACR",
target = "<acr url>",
authType = "RegistryConnection",
value={"ResourceId": "<user-assigned managed identity resource
id>", "ClientId": "<user-assigned managed identity client ID>"})
4. Once the configuration is complete, you can use the base images from private ACR
when building environments for training or inference. The following code snippet
demonstrates how to specify the base image ACR and image name in an
environment definition:
Python
env = Environment(name="my-env")
env.docker.base_image = "<acr url>/my-repo/my-image:latest"
Optionally, you can specify the managed identity resource URL and client ID in the
environment definition itself by using RegistryIdentity. If you use registry identity
explicitly, it overrides any workspace connections specified earlier:
Python
identity = RegistryIdentity()
identity.resource_id= "<user-assigned managed identity resource ID>"
identity.client_id="<user-assigned managed identity client ID>"
env.docker.base_image_registry.registry_identity=identity
env.docker.base_image = "my-acr.azurecr.io/my-repo/my-image:latest"
Note
If you bring your own AKS cluster, the cluster must have service principal enabled
instead of managed identity.
Important
When creating a workspace with a user-assigned managed identity, you must create
the associated resources yourself, and grant the managed identity roles on those
resources. Use the role assignment ARM template to make the assignments.
Use the Azure CLI or Python SDK to create the workspace. When using the CLI, specify
the ID using the --primary-user-assigned-identity parameter. When using the SDK, use
primary_user_assigned_identity . The following are examples of using the Azure CLI and
Python SDK:
Azure CLI
Azure CLI
Python
Python
ws = Workspace.create(name="workspace name",
subscription_id="subscription id",
resource_group="resource group name",
primary_user_assigned_identity="managed identity ARM ID")
You can also use an ARM template to create a workspace with user-assigned
managed identity.
For a workspace with customer-managed keys for encryption, you can pass in a user-
assigned managed identity to authenticate from storage to Key Vault. Use argument
user-assigned-identity-for-cmk-encryption (CLI) or
user_assigned_identity_for_cmk_encryption (SDK) to pass in the managed identity. This
managed identity can be the same as or different from the workspace's primary user-
assigned managed identity.
Next steps
Learn more about enterprise security in Azure Machine Learning
Learn about identity-based data access
Learn about managed identities on compute cluster.
Set up authentication for Azure Machine
Learning resources and workflows using
SDK v1
Article • 06/01/2023
Interactive: You use your account in Azure Active Directory to either directly
authenticate, or to get a token that is used for authentication. Interactive
authentication is used during experimentation and iterative development.
Interactive authentication enables you to control access to resources (such as a
web service) on a per-user basis.
Service principal: You create a service principal account in Azure Active Directory,
and use it to authenticate or get a token. A service principal is used when you need
an automated process to authenticate to the service without requiring user
interaction. For example, a continuous integration and deployment script that
trains and tests a model every time the training code changes.
Azure CLI session: You use an active Azure CLI session to authenticate. Azure CLI
authentication is used during experimentation and iterative development, or when
you need an automated process to authenticate to the service using a pre-
authenticated session. You can log in to Azure via the Azure CLI on your local
workstation, without storing credentials in Python code or prompting the user to
authenticate. Similarly, you can reuse the same scripts as part of continuous
integration and deployment pipelines, while authenticating the Azure CLI with a
service principal identity.
Managed identity: When using the Azure Machine Learning SDK on an Azure
Virtual Machine, you can use a managed identity for Azure. This workflow allows
the VM to connect to the workspace using the managed identity, without storing
credentials in Python code or prompting the user to authenticate. Azure Machine
Learning compute clusters and compute instances can also be configured to use a
managed identity to access the workspace when training models.
Regardless of the authentication workflow used, Azure role-based access control (Azure
RBAC) is used to scope the level of access (authorization) allowed to the resources. For
example, an admin or automation process might have access to create a compute
instance, but not use it, while a data scientist could use it, but not delete or create it. For
more information, see Manage access to Azure Machine Learning workspace.
Azure AD Conditional Access can be used to further control or restrict access to the
workspace for each authentication workflow. For example, an admin can allow
workspace access from managed devices only.
Prerequisites
Create an Azure Machine Learning workspace.
Configure your development environment to install the Azure Machine Learning
SDK, or use an Azure Machine Learning compute instance with the SDK already
installed.
For more on Azure AD, see What is Azure Active Directory authentication.
Once you've created the Azure AD accounts, see Manage access to Azure Machine
Learning workspace for information on granting them access to the workspace and
other operations in Azure Machine Learning.
Important
When using a service principal, grant it the minimum access required for the task
it is used for. For example, you would not grant a service principal owner or
contributor access if all it is used for is reading the access token for a web
deployment.
The reason for granting the least access is that a service principal uses a password
to authenticate, and the password may be stored as part of an automation script. If
the password is leaked, having the minimum access required for a specific task
minimizes the malicious use of the SP.
The easiest way to create an SP and grant access to your workspace is by using the
Azure CLI. To create a service principal and grant it access to your workspace, use the
following steps:
Note
Azure CLI
az login
If the CLI can open your default browser, it will do so and load a sign-in page.
Otherwise, you need to open a browser and follow the instructions on the
command line. The instructions involve browsing to https://aka.ms/devicelogin
and entering an authorization code.
If you have multiple Azure subscriptions, you can use the az account set -s
<subscription name or ID> command to set the subscription.
Azure CLI
JSON
{
"clientId": "your-client-id",
"clientSecret": "your-client-secret",
"subscriptionId": "your-sub-id",
"tenantId": "your-tenant-id",
"activeDirectoryEndpointUrl": "https://login.microsoftonline.com",
"resourceManagerEndpointUrl": "https://management.azure.com",
"activeDirectoryGraphResourceId": "https://graph.windows.net",
"sqlManagementEndpointUrl":
"https://management.core.windows.net:5555",
"galleryEndpointUrl": "https://gallery.azure.com/",
"managementEndpointUrl": "https://management.core.windows.net"
}
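The fields of this JSON map onto the SDK's service principal authentication parameters used later in this article. As a sketch (all values are placeholders), you might load the saved output and pick out the values you need:

```python
import json

# Hypothetical saved output from `az ad sp create-for-rbac` (placeholder values).
sp_output = """{
    "clientId": "your-client-id",
    "clientSecret": "your-client-secret",
    "subscriptionId": "your-sub-id",
    "tenantId": "your-tenant-id"
}"""

creds = json.loads(sp_output)
# tenantId -> tenant_id, clientId -> service_principal_id, and
# clientSecret -> service_principal_password when constructing
# ServicePrincipalAuthentication.
print(creds["tenantId"], creds["clientId"])
```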
3. Retrieve the details for the service principal by using the clientId value returned
in the previous step:
Azure CLI
The following JSON is a simplified example of the output from the command. Take
note of the objectId field, as you'll need its value for the next step.
JSON
{
"accountEnabled": "True",
"addIns": [],
"appDisplayName": "ml-auth",
...
...
...
"objectId": "your-sp-object-id",
"objectType": "ServicePrincipal"
}
4. To grant access to the workspace and other resources used by Azure Machine
Learning, use the information in the following articles:
Important
Owner access allows the service principal to do virtually any operation in your
workspace. It is used in this document to demonstrate how to grant access; in
a production environment Microsoft recommends granting the service
principal the minimum access needed to perform the role you intend it for.
For information on creating a custom role with the access needed for your
scenario, see Manage access to Azure Machine Learning workspace.
Important
Managed identity is only supported when using the Azure Machine Learning SDK
from an Azure Virtual Machine or with an Azure Machine Learning compute cluster
or compute instance.
2. From the Azure portal , select your workspace and then select Access Control
(IAM).
3. Select Add, Add Role Assignment to open the Add role assignment page.
4. Assign the following role. For detailed steps, see Assign Azure roles using the
Azure portal.
Important
Interactive authentication uses your browser, and requires cookies (including 3rd
party cookies). If you have disabled cookies, you may receive an error such as "we
couldn't sign you in." This error may also occur if you have enabled Azure AD
Multi-Factor Authentication.
Most examples in the documentation and samples use interactive authentication. For
example, when using the SDK there are two function calls that will automatically prompt
you with a UI-based authentication flow:
Python
Python
ws = Workspace(subscription_id="your-sub-id",
resource_group="your-resource-group-id",
workspace_name="your-workspace-name"
)
Tip
If you have access to multiple tenants, you may need to import the class and
explicitly define which tenant you are targeting. Calling the constructor for
InteractiveLoginAuthentication also prompts you to log in, similar to the calls
above.
Python
When using the Azure CLI, the az login command is used to authenticate the CLI
session. For more information, see Get started with Azure CLI.
Tip
If you are using the SDK from an environment where you have previously
authenticated interactively using the Azure CLI, you can use the
AzureCliAuthentication class to authenticate to the workspace using the
credentials from the CLI session:
Python
To authenticate with a service principal, use the ServicePrincipalAuthentication class
constructor. Use the values you obtained when creating the service principal as the
parameters. The tenant_id parameter maps to tenantId from above,
service_principal_id maps to clientId , and service_principal_password maps to
clientSecret .
Python
sp = ServicePrincipalAuthentication(tenant_id="your-tenant-id", # tenantID
service_principal_id="your-client-id", #
clientId
service_principal_password="your-client-
secret") # clientSecret
The sp variable now holds an authentication object that you use directly in the SDK. In
general, it's a good idea to store the IDs and secrets used above in environment
variables, as shown in the following code. Storing them in environment variables
prevents the information from being accidentally checked into a GitHub repo.
Python
import os
sp = ServicePrincipalAuthentication(tenant_id=os.environ['AML_TENANT_ID'],
service_principal_id=os.environ['AML_PRINCIPAL_ID'],
service_principal_password=os.environ['AML_PRINCIPAL_PASS'])
For automated workflows that run in Python and use the SDK primarily, you can use this
object as-is in most cases for your authentication. The following code authenticates to
your workspace using the auth object you created.
Python
Python
msi_auth = MsiAuthentication()
ws = Workspace(subscription_id="your-sub-id",
resource_group="your-resource-group-id",
workspace_name="your-workspace-name",
auth=msi_auth
)
Next steps
How to use secrets in training.
How to configure authentication for models deployed as a web service.
Consume an Azure Machine Learning model deployed as a web service.
Manage access to an Azure Machine Learning workspace
Article • 06/12/2023
In this article, you learn how to manage access (authorization) to an Azure Machine Learning workspace. Azure role-based access control
(Azure RBAC) is used to manage access to Azure resources, such as the ability to create new resources or use existing ones. Users in your
Azure Active Directory (Azure AD) are assigned specific roles, which grant access to resources. Azure provides both built-in roles and the
ability to create custom roles.
Tip
While this article focuses on Azure Machine Learning, individual services that Azure Machine Learning relies on provide their own
RBAC settings. For example, using the information in this article, you can configure who can submit scoring requests to a model
deployed as a web service on Azure Kubernetes Service. But Azure Kubernetes Service provides its own set of Azure roles. For service
specific RBAC information that may be useful with Azure Machine Learning, see the following links:
Warning
Applying some roles may limit UI functionality in Azure Machine Learning studio for other users. For example, if a user's role does not
have the ability to create a compute instance, the option to create a compute instance will not be available in studio. This behavior is
expected, and prevents the user from attempting operations that would return an access denied error.
Default roles
Azure Machine Learning workspaces have five built-in roles that are available by default. When adding users to a workspace, they can be
assigned one of the built-in roles described below.

AzureML Data Scientist: Can perform all actions within an Azure Machine Learning workspace, except for creating or deleting compute
resources and modifying the workspace itself.

AzureML Compute Operator: Can create, manage, and access compute resources within a workspace.

Reader: Read-only actions in the workspace. Readers can list and view assets, including datastore credentials, in a workspace. Readers can't
create or update these assets.

Contributor: View, create, edit, or delete (where applicable) assets in a workspace. For example, contributors can create an experiment, create or
attach a compute cluster, submit a run, and deploy a web service.

Owner: Full access to the workspace, including the ability to view, create, edit, or delete (where applicable) assets in a workspace. Additionally,
you can change role assignments.

In addition, Azure Machine Learning registries have an AzureML Registry User role that can be assigned to a registry resource to grant data
scientists user-level permissions. For administrator-level permissions to create or delete registries, use the Contributor or Owner role.

AzureML Registry User: Can get registries, and read, write, and delete assets within them. Cannot create new registry resources or delete them.

You can combine the roles to grant different levels of access. For example, you can grant a workspace user both the AzureML Data Scientist
and AzureML Compute Operator roles to permit the user to perform experiments while creating computes in a self-service manner.
Important
Role access can be scoped to multiple levels in Azure. For example, someone with owner access to a workspace may not have owner
access to the resource group that contains the workspace. For more information, see How Azure RBAC works.
Manage workspace access
If you're an owner of a workspace, you can add and remove roles for the workspace. You can also assign roles to users. Use the following
links to discover how to manage access:
Azure portal UI
PowerShell
Azure CLI
REST API
Azure Resource Manager templates
Team or project leaders can manage user access to the workspace as security group owners, without needing the Owner role on the
workspace resource directly.
You can organize, manage, and revoke users' permissions on the workspace and other resources as a group, without having to manage
permissions on a user-by-user basis.
Using Azure AD groups helps you avoid reaching the subscription limit on role assignments.
Note
You must be an owner of the resource at that level to create custom roles within that resource.
To create a custom role, first construct a role definition JSON file that specifies the permission and scope for the role. The following
example defines a custom role named "Data Scientist Custom" scoped at a specific workspace level:
data_scientist_custom_role.json :
JSON
{
"Name": "Data Scientist Custom",
"IsCustom": true,
"Description": "Can run experiment but can't create or delete compute.",
"Actions": ["*"],
"NotActions": [
"Microsoft.MachineLearningServices/workspaces/*/delete",
"Microsoft.MachineLearningServices/workspaces/write",
"Microsoft.MachineLearningServices/workspaces/computes/*/write",
"Microsoft.MachineLearningServices/workspaces/computes/*/delete",
"Microsoft.Authorization/*/write"
],
"AssignableScopes": [
"/subscriptions/<subscription_id>/resourceGroups/<resource_group_name>/providers/Microsoft.MachineLearningServices/workspac
es/<workspace_name>"
]
}
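Before deploying the role, you can sanity-check the definition locally. The sketch below is plain Python (not an Azure API call) and uses a simplified model of RBAC evaluation: it verifies that the NotActions patterns deny the compute and workspace write operations even though Actions grants the "*" wildcard.

```python
import fnmatch
import json

# The role definition from data_scientist_custom_role.json, inlined for the check.
role = json.loads("""{
    "Name": "Data Scientist Custom",
    "IsCustom": true,
    "Actions": ["*"],
    "NotActions": [
        "Microsoft.MachineLearningServices/workspaces/*/delete",
        "Microsoft.MachineLearningServices/workspaces/write",
        "Microsoft.MachineLearningServices/workspaces/computes/*/write",
        "Microsoft.MachineLearningServices/workspaces/computes/*/delete",
        "Microsoft.Authorization/*/write"
    ]
}""")

def is_denied(action: str) -> bool:
    # An action is effectively denied when any NotActions pattern matches it,
    # even though Actions contains the "*" wildcard. (Simplified model only;
    # real Azure RBAC evaluation also considers scope and assignments.)
    return any(fnmatch.fnmatchcase(action, pat) for pat in role["NotActions"])

print(is_denied("Microsoft.MachineLearningServices/workspaces/computes/cpu1/write"))  # True
print(is_denied("Microsoft.MachineLearningServices/workspaces/experiments/runs/write"))  # False
```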
Tip
You can change the AssignableScopes field to set the scope of this custom role at the subscription level, the resource group level, or a
specific workspace level. The above custom role is just an example; see some suggested custom roles for the Azure Machine Learning
service.
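The AssignableScopes strings follow Azure's resource ID hierarchy. A small helper (the names passed in below are placeholders, not real resources) shows how the three scope levels nest:

```python
def subscription_scope(subscription_id: str) -> str:
    # Subscription-level scope.
    return f"/subscriptions/{subscription_id}"

def resource_group_scope(subscription_id: str, resource_group: str) -> str:
    # Resource-group-level scope nests under the subscription.
    return f"{subscription_scope(subscription_id)}/resourceGroups/{resource_group}"

def workspace_scope(subscription_id: str, resource_group: str, workspace: str) -> str:
    # Workspace-level scope, matching the AssignableScopes format shown above.
    return (f"{resource_group_scope(subscription_id, resource_group)}"
            f"/providers/Microsoft.MachineLearningServices/workspaces/{workspace}")

print(workspace_scope("<subscription_id>", "<resource_group_name>", "<workspace_name>"))
```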
This custom role can do everything in the workspace except for the following actions:
To deploy this custom role, use the following Azure CLI command:
Azure CLI
After deployment, this role becomes available in the specified workspace. Now you can add and assign this role in the Azure portal.
Azure CLI
Azure CLI
To view the role definition for a specific custom role, use the following Azure CLI command. The <role-name> should be in the same format
returned by the command above:
Azure CLI
Azure CLI
You need to have permissions on the entire scope of your new role definition. For example, if this new role has a scope across three
subscriptions, you need to have permissions on all three subscriptions.
Note
Role updates can take 15 minutes to an hour to apply across all role assignments in that scope.
Use Azure Resource Manager templates for repeatability
If you anticipate that you'll need to recreate complex role assignments, an Azure Resource Manager template can be a significant help. The
machine-learning-dependencies-role-assignment template shows how role assignments can be specified in source code for reuse.
Common scenarios
The following table is a summary of Azure Machine Learning activities and the permissions required to perform them at the least scope. For
example, if an activity can be performed with a workspace scope (Column 4), then all higher scopes with that permission also work
automatically. Note that for certain activities the permissions differ between V1 and V2 APIs.
Important
All paths in this table that start with / are relative paths to Microsoft.MachineLearningServices/ :
Create new workspace ¹
Subscription-level scope: Not required. Resource group-level scope: Owner or contributor. Workspace-level scope: N/A (becomes Owner or
inherits a higher scope role after creation).

Create new compute cluster
Subscription-level scope: Not required. Resource group-level scope: Not required. Workspace-level scope: Owner, contributor, or custom role
allowing: /workspaces/computes/write

Create new compute instance
Subscription-level scope: Not required. Resource group-level scope: Not required. Workspace-level scope: Owner, contributor, or custom role
allowing: /workspaces/computes/write

Submitting any type of run (V1)
Subscription-level scope: Not required. Resource group-level scope: Not required. Workspace-level scope: Owner, contributor, or custom role
allowing: "/workspaces/*/read", "/workspaces/environments/write", "/workspaces/experiments/runs/write",
"/workspaces/metadata/artifacts/write", "/workspaces/metadata/snapshots/write", "/workspaces/environments/build/action",
"/workspaces/experiments/runs/submit/action", "/workspaces/environments/readSecrets/action"

Submitting any type of run (V2)
Subscription-level scope: Not required. Resource group-level scope: Not required. Workspace-level scope: Owner, contributor, or custom role
allowing: "/workspaces/*/read", "/workspaces/environments/write", "/workspaces/jobs/*", "/workspaces/metadata/artifacts/write",
"/workspaces/metadata/codes/*/write", "/workspaces/environments/build/action", "/workspaces/environments/readSecrets/action"

Deploying a registered model on an AKS/ACI resource
Subscription-level scope: Not required. Resource group-level scope: Not required. Workspace-level scope: Owner, contributor, or custom role
allowing: "/workspaces/services/aks/write", "/workspaces/services/aci/write"

Scoring against a deployed AKS endpoint
Subscription-level scope: Not required. Resource group-level scope: Not required. Workspace-level scope: Owner, contributor, or custom role
allowing: "/workspaces/services/aks/score/action", "/workspaces/services/aks/listkeys/action" (when you are not using Azure Active
Directory auth) OR "/workspaces/read" (when you are using token auth)

Accessing storage using interactive notebooks
Subscription-level scope: Not required. Resource group-level scope: Not required. Workspace-level scope: Owner, contributor, or custom role
allowing: "/workspaces/computes/read", "/workspaces/notebooks/samples/read", "/workspaces/notebooks/storage/*",
"/workspaces/listStorageAccountKeys/action", "/workspaces/listNotebookAccessToken/read"

Create new custom role
Subscription-level scope: Owner, contributor, or custom role allowing Microsoft.Authorization/roleDefinitions/write . Resource group-level
scope: Not required. Workspace-level scope: Owner, contributor, or custom role allowing: /workspaces/computes/write
1: If you receive a failure when trying to create a workspace for the first time, make sure that your role allows
Microsoft.MachineLearningServices/register/action . This action allows you to register the Azure Machine Learning resource provider with
your Azure subscription.
2: When attaching an AKS cluster, you also need to have the Azure Kubernetes Service Cluster Admin Role on the cluster.
You can make custom roles compatible with both V1 and V2 APIs by including both actions, or using wildcards that include both actions,
for example Microsoft.MachineLearningServices/workspaces/datasets/*/read.
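For instance, an Actions fragment (illustrative only) using that wildcard, so a single entry covers dataset reads under both API surfaces:

```json
{
  "Actions": [
    "Microsoft.MachineLearningServices/workspaces/datasets/*/read"
  ]
}
```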
Within the key vault, the user or service principal must have create, get, delete, and purge access to the key through a key vault access
policy. For more information, see Azure Key Vault security.
MLflow operations
To perform MLflow operations with your Azure Machine Learning workspace, include the following scopes in your custom role:

| MLflow operation | Scope |
| --- | --- |
| Get a registered model by name, fetch a list of all registered models in the registry, search for registered models, get the latest-version models for each requested stage, get a registered model's version, search model versions, get the URI where a model version's artifacts are stored, search for runs by experiment IDs | Microsoft.MachineLearningServices/workspaces/models/*/read |
| Create a new registered model, update a registered model's name/description, rename an existing registered model, create a new version of the model, update a model version's description, transition a registered model to one of the stages | Microsoft.MachineLearningServices/workspaces/models/*/write |
| Delete a registered model along with all its versions, delete specific versions of a registered model | Microsoft.MachineLearningServices/workspaces/models/*/delete |
Data scientist
Allows a data scientist to perform all operations inside a workspace except:
Creation of compute
Deploying models to a production AKS cluster
Deploying a pipeline endpoint in production
data_scientist_custom_role.json :
JSON
{
"Name": "Data Scientist Custom",
"IsCustom": true,
"Description": "Can run experiment but can't create or delete compute or deploy production endpoints.",
"Actions": [
"Microsoft.MachineLearningServices/workspaces/*/read",
"Microsoft.MachineLearningServices/workspaces/*/action",
"Microsoft.MachineLearningServices/workspaces/*/delete",
"Microsoft.MachineLearningServices/workspaces/*/write"
],
"NotActions": [
"Microsoft.MachineLearningServices/workspaces/delete",
"Microsoft.MachineLearningServices/workspaces/write",
"Microsoft.MachineLearningServices/workspaces/computes/*/write",
"Microsoft.MachineLearningServices/workspaces/computes/*/delete",
"Microsoft.Authorization/*",
"Microsoft.MachineLearningServices/workspaces/computes/listKeys/action",
"Microsoft.MachineLearningServices/workspaces/listKeys/action",
"Microsoft.MachineLearningServices/workspaces/services/aks/write",
"Microsoft.MachineLearningServices/workspaces/services/aks/delete",
"Microsoft.MachineLearningServices/workspaces/endpoints/pipelines/write"
],
"AssignableScopes": [
"/subscriptions/<subscription_id>"
]
}
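To sanity-check a role definition like the one above, you can approximate Azure's Actions/NotActions evaluation with shell-style pattern matching. This is a rough sketch, not the real RBAC engine (it ignores scopes, data actions, and deny assignments), and the lists below are an abridged copy of the role's entries:

```python
from fnmatch import fnmatch

# Actions/NotActions abridged from data_scientist_custom_role.json above.
role = {
    "Actions": [
        "Microsoft.MachineLearningServices/workspaces/*/read",
        "Microsoft.MachineLearningServices/workspaces/*/write",
    ],
    "NotActions": [
        "Microsoft.MachineLearningServices/workspaces/computes/*/write",
        "Microsoft.MachineLearningServices/workspaces/services/aks/write",
    ],
}

def allowed(role, operation):
    # An operation is permitted when some Actions entry matches it and
    # no NotActions entry matches it. fnmatch approximates RBAC's '*'.
    return (any(fnmatch(operation, p) for p in role["Actions"])
            and not any(fnmatch(operation, p) for p in role["NotActions"]))

# Submitting runs is allowed; creating compute is not.
print(allowed(role, "Microsoft.MachineLearningServices/workspaces/experiments/runs/write"))  # True
print(allowed(role, "Microsoft.MachineLearningServices/workspaces/computes/cpu1/write"))     # False
```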
data_scientist_restricted_custom_role.json :
JSON
{
"Name": "Data Scientist Restricted Custom",
"IsCustom": true,
"Description": "Can run experiment but can't create or delete compute or deploy production endpoints",
"Actions": [
"Microsoft.MachineLearningServices/workspaces/*/read",
"Microsoft.MachineLearningServices/workspaces/computes/start/action",
"Microsoft.MachineLearningServices/workspaces/computes/stop/action",
"Microsoft.MachineLearningServices/workspaces/computes/restart/action",
"Microsoft.MachineLearningServices/workspaces/computes/applicationaccess/action",
"Microsoft.MachineLearningServices/workspaces/notebooks/storage/write",
"Microsoft.MachineLearningServices/workspaces/notebooks/storage/delete",
"Microsoft.MachineLearningServices/workspaces/experiments/runs/write",
"Microsoft.MachineLearningServices/workspaces/experiments/write",
"Microsoft.MachineLearningServices/workspaces/experiments/runs/submit/action",
"Microsoft.MachineLearningServices/workspaces/pipelinedrafts/write",
"Microsoft.MachineLearningServices/workspaces/metadata/snapshots/write",
"Microsoft.MachineLearningServices/workspaces/metadata/artifacts/write",
"Microsoft.MachineLearningServices/workspaces/environments/write",
"Microsoft.MachineLearningServices/workspaces/models/*/write",
"Microsoft.MachineLearningServices/workspaces/modules/write",
"Microsoft.MachineLearningServices/workspaces/components/*/write",
"Microsoft.MachineLearningServices/workspaces/datasets/*/write",
"Microsoft.MachineLearningServices/workspaces/datasets/*/delete",
"Microsoft.MachineLearningServices/workspaces/computes/listNodes/action",
"Microsoft.MachineLearningServices/workspaces/environments/build/action"
],
"NotActions": [
"Microsoft.MachineLearningServices/workspaces/computes/write",
"Microsoft.MachineLearningServices/workspaces/write",
"Microsoft.MachineLearningServices/workspaces/computes/delete",
"Microsoft.MachineLearningServices/workspaces/delete",
"Microsoft.MachineLearningServices/workspaces/computes/listKeys/action",
"Microsoft.MachineLearningServices/workspaces/listKeys/action",
"Microsoft.Authorization/*",
"Microsoft.MachineLearningServices/workspaces/datasets/registered/profile/read",
"Microsoft.MachineLearningServices/workspaces/datasets/registered/preview/read",
"Microsoft.MachineLearningServices/workspaces/datasets/unregistered/profile/read",
"Microsoft.MachineLearningServices/workspaces/datasets/unregistered/preview/read",
"Microsoft.MachineLearningServices/workspaces/datasets/registered/schema/read",
"Microsoft.MachineLearningServices/workspaces/datasets/unregistered/schema/read",
"Microsoft.MachineLearningServices/workspaces/datastores/write",
"Microsoft.MachineLearningServices/workspaces/datastores/delete"
],
"AssignableScopes": [
"/subscriptions/<subscription_id>"
]
}
MLflow data scientist
Allows a data scientist to perform all MLflow operations supported by Azure Machine Learning, except:
Creation of compute
Deploying models to a production AKS cluster
Deploying a pipeline endpoint in production
mlflow_data_scientist_custom_role.json :
JSON
{
"Name": "MLFlow Data Scientist Custom",
"IsCustom": true,
"Description": "Can perform azureml mlflow integrated functionalities that includes mlflow tracking, projects, model registry",
"Actions": [
"Microsoft.MachineLearningServices/workspaces/experiments/*",
"Microsoft.MachineLearningServices/workspaces/jobs/*",
"Microsoft.MachineLearningServices/workspaces/models/*"
],
"NotActions": [
"Microsoft.MachineLearningServices/workspaces/delete",
"Microsoft.MachineLearningServices/workspaces/write",
"Microsoft.MachineLearningServices/workspaces/computes/*/write",
"Microsoft.MachineLearningServices/workspaces/computes/*/delete",
"Microsoft.Authorization/*",
"Microsoft.MachineLearningServices/workspaces/computes/listKeys/action",
"Microsoft.MachineLearningServices/workspaces/listKeys/action",
"Microsoft.MachineLearningServices/workspaces/services/aks/write",
"Microsoft.MachineLearningServices/workspaces/services/aks/delete",
"Microsoft.MachineLearningServices/workspaces/endpoints/pipelines/write"
],
"AssignableScopes": [
"/subscriptions/<subscription_id>"
]
}
MLOps
Allows you to assign a role to a service principal and use that to automate your MLOps pipelines. For example, to submit runs against an
already published pipeline:
mlops_custom_role.json :
JSON
{
"Name": "MLOps Custom",
"IsCustom": true,
"Description": "Can run pipelines against a published pipeline endpoint",
"Actions": [
"Microsoft.MachineLearningServices/workspaces/read",
"Microsoft.MachineLearningServices/workspaces/endpoints/pipelines/read",
"Microsoft.MachineLearningServices/workspaces/metadata/artifacts/read",
"Microsoft.MachineLearningServices/workspaces/metadata/snapshots/read",
"Microsoft.MachineLearningServices/workspaces/environments/read",
"Microsoft.MachineLearningServices/workspaces/metadata/secrets/read",
"Microsoft.MachineLearningServices/workspaces/modules/read",
"Microsoft.MachineLearningServices/workspaces/components/read",
"Microsoft.MachineLearningServices/workspaces/datasets/*/read",
"Microsoft.MachineLearningServices/workspaces/datastores/read",
"Microsoft.MachineLearningServices/workspaces/environments/write",
"Microsoft.MachineLearningServices/workspaces/experiments/runs/read",
"Microsoft.MachineLearningServices/workspaces/experiments/runs/write",
"Microsoft.MachineLearningServices/workspaces/experiments/runs/submit/action",
"Microsoft.MachineLearningServices/workspaces/experiments/jobs/read",
"Microsoft.MachineLearningServices/workspaces/experiments/jobs/write",
"Microsoft.MachineLearningServices/workspaces/metadata/artifacts/write",
"Microsoft.MachineLearningServices/workspaces/metadata/snapshots/write",
"Microsoft.MachineLearningServices/workspaces/metadata/codes/*/write",
"Microsoft.MachineLearningServices/workspaces/environments/build/action"
],
"NotActions": [
"Microsoft.MachineLearningServices/workspaces/computes/write",
"Microsoft.MachineLearningServices/workspaces/write",
"Microsoft.MachineLearningServices/workspaces/computes/delete",
"Microsoft.MachineLearningServices/workspaces/delete",
"Microsoft.MachineLearningServices/workspaces/computes/listKeys/action",
"Microsoft.MachineLearningServices/workspaces/listKeys/action",
"Microsoft.Authorization/*"
],
"AssignableScopes": [
"/subscriptions/<subscription_id>"
]
}
Workspace Admin
Allows you to perform all operations within the scope of a workspace, except:
Creating a new workspace
Assigning subscription or workspace level quotas
The workspace admin also cannot create a new role. It can only assign existing built-in or custom roles within the scope of their workspace:
workspace_admin_custom_role.json :
JSON
{
"Name": "Workspace Admin Custom",
"IsCustom": true,
"Description": "Can perform all operations except quota management and upgrades",
"Actions": [
"Microsoft.MachineLearningServices/workspaces/*/read",
"Microsoft.MachineLearningServices/workspaces/*/action",
"Microsoft.MachineLearningServices/workspaces/*/write",
"Microsoft.MachineLearningServices/workspaces/*/delete",
"Microsoft.Authorization/roleAssignments/*"
],
"NotActions": [
"Microsoft.MachineLearningServices/workspaces/write"
],
"AssignableScopes": [
"/subscriptions/<subscription_id>"
]
}
Data labeling
Data labeler
Allows you to define a role scoped only to labeling data:
labeler_custom_role.json :
JSON
{
"Name": "Labeler Custom",
"IsCustom": true,
"Description": "Can label data for Labeling",
"Actions": [
"Microsoft.MachineLearningServices/workspaces/read",
"Microsoft.MachineLearningServices/workspaces/labeling/projects/read",
"Microsoft.MachineLearningServices/workspaces/labeling/projects/summary/read",
"Microsoft.MachineLearningServices/workspaces/labeling/labels/read",
"Microsoft.MachineLearningServices/workspaces/labeling/labels/write"
],
"NotActions": [
],
"AssignableScopes": [
"/subscriptions/<subscription_id>"
]
}
Troubleshooting
Here are a few things to be aware of while you use Azure role-based access control (Azure RBAC):
When you create a resource in Azure, such as a workspace, you're not directly the owner of the resource. Your role is inherited from the highest-scope role that you're authorized against in that subscription. For example, if you're a Network Administrator and have permission to create a Machine Learning workspace, you're assigned the Network Administrator role for that workspace, not the Owner role.
To perform quota operations in a workspace, you need subscription level permissions. This means setting either subscription level
quota or workspace level quota for your managed compute resources can only happen if you have write permissions at the
subscription scope.
When two role assignments to the same Azure Active Directory user have conflicting Actions/NotActions sections, operations listed in NotActions in one role might still be allowed if they're listed as Actions in another role. To learn more about how Azure parses role assignments, read How Azure RBAC determines if a user has access to a resource.
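The additive behavior described above can be sketched as follows: within one role, NotActions subtract from Actions, but across multiple role assignments the grants are unioned, so another role's Actions can still permit the operation. The two roles below are hypothetical, and the matching is a simplification of the real RBAC evaluation:

```python
from fnmatch import fnmatch

def role_allows(role, operation):
    """A role grants an operation if it matches an entry in Actions and is
    not excluded by an entry in NotActions of the SAME role."""
    granted = any(fnmatch(operation, p) for p in role["Actions"])
    excluded = any(fnmatch(operation, p) for p in role["NotActions"])
    return granted and not excluded

def user_allows(roles, operation):
    # Across multiple role assignments, permissions are additive:
    # NotActions in one role do not veto Actions granted by another.
    return any(role_allows(r, operation) for r in roles)

# Hypothetical role pair, used only to illustrate the evaluation order.
data_scientist = {
    "Actions": ["Microsoft.MachineLearningServices/workspaces/*/write"],
    "NotActions": ["Microsoft.MachineLearningServices/workspaces/computes/*/write"],
}
compute_operator = {
    "Actions": ["Microsoft.MachineLearningServices/workspaces/computes/*/write"],
    "NotActions": [],
}

op = "Microsoft.MachineLearningServices/workspaces/computes/cpu1/write"
print(role_allows(data_scientist, op))                      # False: excluded within that role
print(user_allows([data_scientist, compute_operator], op))  # True: the second role grants it
```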
To deploy resources into a virtual network or subnet, your user account must have permissions to the following actions in Azure role-
based access control (Azure RBAC):
"Microsoft.Network/*/read" on the virtual network resource. This permission isn't needed for Azure Resource Manager (ARM)
template deployments.
"Microsoft.Network/virtualNetworks/join/action" on the virtual network resource.
"Microsoft.Network/virtualNetworks/subnets/join/action" on the subnet resource.
For more information on Azure RBAC with networking, see the Networking built-in roles
It can sometimes take up to 1 hour for your new role assignments to take effect over cached permissions across the stack.
Next steps
Enterprise security overview
Virtual network isolation and privacy overview
Tutorial: Train and deploy a model
Resource provider operations
Secure Azure Machine Learning
workspace resources using virtual
networks (VNets)
Article • 04/04/2023
Secure Azure Machine Learning workspace resources and compute environments using
Azure Virtual Networks (VNets). This article uses an example scenario to show you how
to configure a complete virtual network.
Tip
Azure Machine Learning also provides managed virtual networks (preview). With a
managed virtual network, Azure Machine Learning handles the job of network
isolation for your workspace and managed computes. You can also add private
endpoints for resources needed by the workspace, such as Azure Storage Account.
For more information, see Workspace managed network isolation.
This article is part of a series on securing an Azure Machine Learning workflow. See the
other articles in this series:
For a tutorial on creating a secure workspace, see Tutorial: Create a secure workspace or
Tutorial: Create a secure workspace using a template.
Prerequisites
This article assumes that you have familiarity with the following topics:
Example scenario
In this section, you learn how a common network scenario is set up to secure Azure
Machine Learning communication with private IP addresses.
The following table compares how services access different parts of an Azure Machine
Learning network with and without a VNet:
Workspace - Create a private endpoint for your workspace. The private endpoint
connects the workspace to the vnet through several private IP addresses.
Public access - You can optionally enable public access for a secured workspace.
Associated resources - Use service endpoints or private endpoints to connect to
workspace resources like Azure Storage and Azure Key Vault. For Azure Container
Registry, use a private endpoint.
Service endpoints provide the identity of your virtual network to the Azure
service. Once you enable service endpoints in your virtual network, you can add
a virtual network rule to secure the Azure service resources to your virtual
network. Service endpoints use public IP addresses.
Private endpoints are network interfaces that securely connect you to a service
powered by Azure Private Link. Private endpoint uses a private IP address from
your VNet, effectively bringing the service into your VNet.
Training compute access - Access training compute targets like Azure Machine
Learning Compute Instance and Azure Machine Learning Compute Clusters with
public or private IP addresses.
Inference compute access - Access Azure Kubernetes Services (AKS) compute
clusters with private IP addresses.
The next sections show you how to secure the network scenario described above. To
secure your network, you must:
) Important
If you want to access the workspace over the public internet while keeping all the
associated resources secured in a virtual network, use the following steps:
1. Create an Azure Virtual Network that will contain the resources used by the
workspace.
3. Add the following services to the virtual network by using either a service
endpoint or a private endpoint. Also allow trusted Microsoft services to access
these services:
| Service | Endpoint | Firewall setting |
| --- | --- | --- |
| Azure Key Vault | Service endpoint or private endpoint | Allow trusted Microsoft services to bypass this firewall |
| Azure Storage Account | Service and private endpoint, or private endpoint | Grant access to trusted Azure services |
4. In properties for the Azure Storage Account(s) for your workspace, add your client
IP address to the allowed list in firewall settings. For more information, see
Configure firewalls and virtual networks.
1. Create an Azure Virtual Network that will contain the workspace and other
resources. Then create a Private Link-enabled workspace to enable communication
between your VNet and workspace.
2. Add the following services to the virtual network by using either a service
endpoint or a private endpoint. Also allow trusted Microsoft services to access
these services:
| Service | Endpoint | Firewall setting |
| --- | --- | --- |
| Azure Key Vault | Service endpoint or private endpoint | Allow trusted Microsoft services to bypass this firewall |
| Azure Container Registry | Private endpoint | Allow trusted services |
| Azure Storage Account | Service and private endpoint, or private endpoint | Grant access from Azure resource instances or grant access to trusted Azure services |
For detailed instructions on how to complete these steps, see Secure an Azure Machine
Learning workspace.
Limitations
Securing your workspace and associated resources within a virtual network has the
following limitations:
All resources must be behind the same VNet. However, subnets within the same
VNet are allowed.
Secure the training environment
In this section, you learn how to secure the training environment in Azure Machine
Learning. You also learn how Azure Machine Learning completes a training job to
understand how the network configurations work together.
1. Create an Azure Machine Learning compute instance and compute cluster in the
virtual network to run the training job.
2. If your compute cluster or compute instance uses a public IP address, you must
allow inbound communication so that management services can submit jobs to
your compute resources.
Tip
1. The client uploads training scripts and training data to storage accounts that are
secured with a service or private endpoint.
2. The client submits a training job to the Azure Machine Learning workspace
through the private endpoint.
3. Azure Batch service receives the job from the workspace. It then submits the
training job to the compute environment through the public load balancer for the
compute resource.
4. The compute resource receives the job and begins training. The compute resource
uses information stored in key vault to access storage accounts to download
training files and upload output.
Limitations
Azure Compute Instance and Azure Compute Clusters must be in the same VNet,
region, and subscription as the workspace and its associated resources.
Default AKS clusters have a control plane with public IP addresses. You can add a
default AKS cluster to your VNet during the deployment or attach a cluster after it's
created.
Private AKS clusters have a control plane, which can only be accessed through private
IPs. Private AKS clusters must be attached after the cluster is created.
For detailed instructions on how to add default and private clusters, see Secure an
inferencing environment.
Regardless of whether you use a default AKS cluster or a private AKS cluster, if your
AKS cluster is behind a VNet, your workspace and its associated resources (storage,
key vault, and ACR) must have private endpoints or service endpoints in the same
VNet as the AKS cluster.
The following network diagram shows a secured Azure Machine Learning workspace
with a private AKS cluster attached to the virtual network.
After securing the workspace with a private endpoint, use the following steps to enable
clients to develop remotely using either the SDK or Azure Machine Learning studio:
To enable full studio functionality, see Use Azure Machine Learning studio in a virtual
network.
Limitations
ML-assisted data labeling doesn't support a default storage account behind a virtual
network. Instead, use a storage account other than the default for ML assisted data
labeling.
Tip
As long as it is not the default storage account, the account used by data labeling
can be secured behind the virtual network.
For more information on firewall settings, see Use workspace behind a Firewall.
Custom DNS
If you need to use a custom DNS solution for your virtual network, you must add host
records for your workspace.
For more information on the required domain names and IP addresses, see how to use a
workspace with a custom DNS server.
Microsoft Sentinel
Microsoft Sentinel is a security solution that can integrate with Azure Machine
Learning, for example by using Jupyter notebooks provided through Azure Machine
Learning. For more information, see Use Jupyter notebooks to hunt for security threats.
Public access
Microsoft Sentinel can automatically create a workspace for you if you're OK with a
public endpoint. In this configuration, the security operations center (SOC) analysts and
system administrators connect to notebooks in your workspace through Sentinel.
For information on this process, see Create an Azure Machine Learning workspace from
Microsoft Sentinel
Private endpoint
If you want to secure your workspace and associated resources in a VNet, you must
create the Azure Machine Learning workspace first. You must also create a virtual
machine 'jump box' in the same VNet as your workspace, and enable Azure Bastion
connectivity to it. Similar to the public configuration, SOC analysts and administrators
can connect using Microsoft Sentinel, but some operations must be performed using
Azure Bastion to connect to the VM.
For more information on this configuration, see Create an Azure Machine Learning
workspace from Microsoft Sentinel
Next steps
This article is part of a series on securing an Azure Machine Learning workflow. See the
other articles in this series:
In this article, you learn how to secure an Azure Machine Learning workspace and its
associated resources in an Azure Virtual Network.
Tip
Azure Machine Learning also provides managed virtual networks (preview). With a
managed virtual network, Azure Machine Learning handles the job of network
isolation for your workspace and managed computes. You can also add private
endpoints for resources needed by the workspace, such as Azure Storage Account.
For more information, see Workspace managed network isolation.
This article is part of a series on securing an Azure Machine Learning workflow. See the
other articles in this series:
For a tutorial on creating a secure workspace, see Tutorial: Create a secure workspace or
Tutorial: Create a secure workspace using a template.
In this article, you learn how to enable the following workspace resources in a virtual
network:
Read the Azure Machine Learning best practices for enterprise security article to
learn about best practices.
An existing virtual network and subnet to use with your compute resources.
) Important
To deploy resources into a virtual network or subnet, your user account must have
permissions to the following actions in Azure role-based access control (Azure
RBAC):
"Microsoft.Network/*/read" on the virtual network resource. This permission
isn't needed for Azure Resource Manager (ARM) template deployments.
"Microsoft.Network/virtualNetworks/join/action" on the virtual network
resource.
"Microsoft.Network/virtualNetworks/subnets/join/action" on the subnet
resource.
For more information on Azure RBAC with networking, see the Networking built-in
roles
If your Azure Container Registry uses a private endpoint, it must be in the same
virtual network as the storage account and compute targets used for training or
inference. If it uses a service endpoint, it must be in the same virtual network and
subnet as the storage account and compute targets.
Your Azure Machine Learning workspace must contain an Azure Machine Learning
compute cluster.
Limitations
) Important
The compute cluster used to build Docker images needs to be able to access the
package repositories that are used to train and deploy your models. You may need
to add network security rules that allow access to public repos, use private Python
packages, or use custom Docker images that already include the packages.
2 Warning
Azure Monitor supports using Azure Private Link to connect to a VNet. However,
you must use the open Private Link mode in Azure Monitor. For more information,
see Private Link access modes: Private only vs. Open.
Tip
The required tab lists the required inbound and outbound configuration. The
situational tab lists optional inbound and outbound configurations required by
specific configurations you may want to enable.
Required
Tip
If you need the IP addresses instead of service tags, use one of the following
options:
You may also need to allow outbound traffic to Visual Studio Code and non-Microsoft
sites for the installation of packages required by your machine learning project. The
following table lists commonly used repositories for machine learning:
When using the Azure Machine Learning VS Code extension, the remote
compute instance requires access to public repositories to install the
packages required by the extension. If the compute instance requires a proxy to
access these public repositories or the internet, you will need to set and export the
HTTP_PROXY and HTTPS_PROXY environment variables in the ~/.bashrc file of the
compute instance.
When using Azure Kubernetes Service (AKS) with Azure Machine Learning, allow the
following traffic to the AKS VNet:
For information on using a firewall solution, see Use a firewall with Azure Machine
Learning.
For more information on configuring a private endpoint for your workspace, see How to
configure a private endpoint.
2 Warning
Securing a workspace with private endpoints does not ensure end-to-end security
by itself. You must follow the steps in the rest of this article, and the VNet series, to
secure individual components of your solution. For example, if you use a private
endpoint for the workspace, but your Azure Storage Account is not behind the
VNet, traffic between the workspace and storage does not use the VNet for
security.
Secure Azure storage accounts
Azure Machine Learning supports storage accounts configured to use either a private
endpoint or service endpoint.
Private endpoint
2. Use the information in Use private endpoints for Azure Storage to add private
endpoints for the following storage resources:
Blob
File
Queue - Only needed if you plan to use ParallelRunStep in an Azure
Machine Learning pipeline.
Table - Only needed if you plan to use ParallelRunStep in an Azure
Machine Learning pipeline.
Tip
3. After creating the private endpoints for the storage resources, select the
Firewalls and virtual networks tab under Networking for the storage account.
your workspace using Instance name. For more information, see Trusted
access based on system-assigned managed identity.
Tip
Alternatively, you can select Allow Azure services on the trusted services
list to access this storage account to more broadly allow access from
trusted services. For more information, see Configure Azure Storage
firewalls and virtual networks.
When using a private endpoint, you can also disable public access. For more
information, see disallow public read access.
Azure Key Vault can be configured to use either a private endpoint or service endpoint.
To use Azure Machine Learning experimentation capabilities with Azure Key Vault
behind a virtual network, use the following steps:
Tip
Regardless of whether you use a private endpoint or service endpoint, the key vault
must be in the same network as the private endpoint of the workspace.
Private endpoint
For information on using a private endpoint with Azure Key Vault, see Integrate Key
Vault with Azure Private Link.
Tip
If you did not use an existing Azure Container Registry when creating the
workspace, one may not exist. By default, the workspace will not create an ACR
instance until it needs one. To force the creation of one, train or deploy a model
using your workspace before using the steps in this section.
Azure Container Registry can be configured to use a private endpoint. Use the following
steps to configure your workspace to use ACR when it is in the virtual network:
1. Find the name of the Azure Container Registry for your workspace, using one of
the following methods:
Azure CLI
If you've installed the Machine Learning extension v1 for Azure CLI, you can
use the az ml workspace show command to show the workspace information.
Azure CLI
az ml workspace show -w myworkspace -g myresourcegroup --query containerRegistry
2. Limit access to your virtual network using the steps in Connect privately to an
Azure Container Registry. When adding the virtual network, select the virtual
network and subnet for your Azure Machine Learning resources.
3. Configure the ACR for the workspace to Allow access by trusted services.
4. Create an Azure Machine Learning compute cluster. This cluster is used to build
Docker images when ACR is behind a VNet. For more information, see Create a
compute cluster.
5. Use one of the following methods to configure the workspace to build Docker
images using the compute cluster.
) Important
The following limitations apply when using a compute cluster for image
builds:
Azure CLI
You can use the az ml workspace update command to set a build compute.
The command is the same for both the v1 and v2 Azure CLI extensions for
machine learning. In the following command, replace myworkspace with your
workspace name, myresourcegroup with the resource group that contains the
workspace, and mycomputecluster with the compute cluster name:
Azure CLI
az ml workspace update --name myworkspace --resource-group myresourcegroup --image-build-compute mycomputecluster
Tip
When ACR is behind a VNet, you can also disable public access to it.
PostgreSql - skip validation required: Yes
7 Note
Azure Data Lake Store Gen1 and Azure Data Lake Store Gen2 skip validation by
default, so you don't have to do anything.
The following code sample creates a new Azure Blob datastore and sets
skip_validation=True .
Python
blob_datastore = Datastore.register_azure_blob_container(workspace=ws,
                                                         datastore_name=blob_datastore_name,
                                                         container_name=container_name,
                                                         account_name=account_name,
                                                         account_key=account_key,
                                                         skip_validation=True)
Use datasets
The syntax to skip dataset validation is similar for the following dataset types:
Delimited file
JSON
Parquet
SQL
File
The following code creates a new JSON dataset and sets validate=False .
Python
json_ds = Dataset.Tabular.from_json_lines_files(path=datastore_paths,
validate=False)
1. Upgrade the Application Insights instance for your workspace. For steps on how to
upgrade, see Migrate to workspace-based Application Insights resources.
Tip
2. Create an Azure Monitor Private Link Scope and add the Application Insights
instance from step 1 to the scope. For steps on how to do this, see Configure your
Azure Monitor private link.
Azure VPN gateway - Connects on-premises networks to the VNet over a private
connection. Connection is made over the public internet. There are two types of
VPN gateways that you might use:
Point-to-site: Each client computer uses a VPN client to connect to the VNet.
Site-to-site: A VPN device connects the VNet to your on-premises network.
Azure Bastion - In this scenario, you create an Azure Virtual Machine (sometimes
called a jump box) inside the VNet. You then connect to the VM using Azure
Bastion. Bastion allows you to connect to the VM using either an RDP or SSH
session from your local web browser. You then use the jump box as your
development environment. Since it is inside the VNet, it can directly access the
workspace. For an example of using a jump box, see Tutorial: Create a secure
workspace.
) Important
When using a VPN gateway or ExpressRoute, you will need to plan how name
resolution works between your on-premises resources and those in the VNet. For
more information, see Use a custom DNS server.
If you have problems connecting to the workspace, see Troubleshoot secure workspace
connectivity.
Workspace diagnostics
You can run diagnostics on your workspace from Azure Machine Learning studio or the
Python SDK. After diagnostics run, a list of any detected problems is returned. This list
includes links to possible solutions. For more information, see How to use workspace
diagnostics.
Next steps
This article is part of a series on securing an Azure Machine Learning workflow. See the
other articles in this series:
In this article, you learn how to secure training environments with a virtual network in
Azure Machine Learning using the Python SDK v1.
Azure Machine Learning compute instance and compute cluster can be used to securely
train models in a virtual network. When planning your environment, you can configure
the compute instance/cluster with or without a public IP address. The general
differences between the two are:
No public IP: Reduces costs as it doesn't have the same networking resource
requirements. Improves security by removing the requirement for inbound traffic
from the internet. However, there are additional configuration changes required to
enable outbound access to required resources (Azure Active Directory, Azure
Resource Manager, etc.).
Public IP: Works by default, but costs more due to additional Azure networking
resources. Requires inbound communication from the Azure Machine Learning
service over the public internet.
The following summarizes the outbound traffic behavior for the two configurations:
With public IP: By default, the compute can access the public internet with no
restrictions. You can restrict what it accesses using a network security group
(NSG) or firewall.
Without public IP: By default, the compute can't access the internet. If it can still
send outbound traffic to the internet, it's because of Azure default outbound
access and an NSG that allows outbound traffic to the internet. We don't
recommend relying on the default outbound access. If you need outbound access
to the internet, we recommend using a Virtual Network NAT gateway or a firewall
to route outbound traffic to the required resources.
You can also use Azure Databricks or HDInsight to train models in a virtual network.
Tip
Azure Machine Learning also provides managed virtual networks (preview). With a
managed virtual network, Azure Machine Learning handles the job of network
isolation for your workspace and managed computes. You can also add private
endpoints for resources needed by the workspace, such as Azure Storage Account.
For more information, see Workspace managed network isolation.
Note
For information on using the Azure Machine Learning studio and the Python SDK
v2, see Secure training environment (v2).
In this article you learn how to secure the following training compute resources in a
virtual network:
Important
Items in this article marked as "preview" are currently in public preview. The
preview version is provided without a service level agreement, and it's not
recommended for production workloads. Certain features might not be supported
or might have constrained capabilities. For more information, see Supplemental
Terms of Use for Microsoft Azure Previews .
Prerequisites
Read the Network security overview article to understand common virtual network
scenarios and overall virtual network architecture.
An existing virtual network and subnet to use with your compute resources. This
VNet must be in the same subscription as your Azure Machine Learning
workspace.
We recommend putting the storage accounts used by your workspace and
training jobs in the same Azure region that you plan to use for your compute
instances and clusters. If they aren't in the same Azure region, you may incur
data transfer costs and increased network latency.
Make sure that WebSocket communication is allowed to
*.instances.azureml.net and *.instances.azureml.ms in your VNet.
An existing subnet in the virtual network. This subnet is used when creating
compute instances and clusters.
Make sure that the subnet isn't delegated to other Azure services.
Make sure that the subnet contains enough free IP addresses. Each compute
instance requires one IP address. Each node within a compute cluster requires
one IP address.
If you have your own DNS server, we recommend using DNS forwarding to resolve
the fully qualified domain names (FQDN) of compute instances and clusters. For
more information, see Use a custom DNS with Azure Machine Learning.
To deploy resources into a virtual network or subnet, your user account must have
permissions to the following actions in Azure role-based access control (Azure
RBAC):
"Microsoft.Network/*/read" on the virtual network resource. This permission
isn't needed for Azure Resource Manager (ARM) template deployments.
"Microsoft.Network/virtualNetworks/join/action" on the virtual network
resource.
"Microsoft.Network/virtualNetworks/subnets/join/action" on the subnet
resource.
For more information on Azure RBAC with networking, see the Networking built-in
roles
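As a quick check of the subnet-sizing prerequisite above (one IP address per compute instance and per cluster node), you can compute a subnet's capacity with Python's standard ipaddress module. This is an illustrative sketch, not part of the Azure Machine Learning SDK; it assumes Azure's documented behavior of reserving five IP addresses in every subnet:

```python
import ipaddress

def usable_ips(subnet_cidr: str) -> int:
    """Number of addresses available for compute nodes in an Azure subnet.

    Azure reserves 5 addresses per subnet (network, broadcast, gateway,
    and two for Azure DNS), so those can't be assigned to nodes.
    """
    net = ipaddress.ip_network(subnet_cidr)
    return max(net.num_addresses - 5, 0)

def subnet_can_fit(subnet_cidr: str, compute_instances: int, cluster_max_nodes: int) -> bool:
    """Each compute instance and each cluster node consumes one IP address."""
    return compute_instances + cluster_max_nodes <= usable_ips(subnet_cidr)

print(usable_ips("10.0.1.0/24"))             # 251
print(subnet_can_fit("10.0.1.0/28", 2, 10))  # False: only 11 usable addresses
```

Remember to size the subnet for the cluster's maximum node count, not its current size, since scale-up will fail if no free addresses remain.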
Limitations
Azure Machine Learning compute cluster/instance
Compute clusters can be created in a different region and VNet than your
workspace. However, this functionality is only available using the SDK v2, CLI v2, or
studio. For more information, see the v2 version of secure training environments.
Port 445 must be open for private network communications between your
compute instances and the default storage account during training. For example, if
your computes are in one VNet and the storage account is in another, don't block
port 445 to the storage account VNet.
Azure Databricks
The virtual network must be in the same subscription and region as the Azure
Machine Learning workspace.
If the Azure Storage Account(s) for the workspace are also secured in a virtual
network, they must be in the same virtual network as the Azure Databricks cluster.
In addition to the databricks-private and databricks-public subnets used by Azure
Databricks, the default subnet created for the virtual network is also required.
Azure Databricks doesn't use a private endpoint to communicate with the virtual
network.
For more information on using Azure Databricks in a virtual network, see Deploy Azure
Databricks in your Azure Virtual Network.
Important
If you have been using compute instances or compute clusters configured for no
public IP without opting-in to the preview, you will need to delete and recreate
them after January 20, 2023 (when the feature is generally available).
If you were previously using the preview of no public IP, you may also need to
modify what traffic you allow inbound and outbound, as the requirements have
changed for general availability:
Outbound requirements - Two additional outbound service tags, which are only
used for the management of compute instances and clusters. The destinations of
these service tags are owned by Microsoft:
AzureMachineLearning service tag on UDP port 5831.
The following configurations are in addition to those listed in the Prerequisites section,
and are specific to creating a compute instances/clusters configured for no public IP:
You must use a workspace private endpoint for the compute resource to
communicate with Azure Machine Learning services from the VNet. For more
information, see Configure a private endpoint for Azure Machine Learning
workspace.
In your VNet, allow outbound traffic to the following service tags or fully qualified
domain names (FQDN):
Important
The outbound access to Storage.<region> could potentially be used to
exfiltrate data from your workspace. By using a Service Endpoint Policy, you
can mitigate this vulnerability. For more information, see the Azure Machine
Learning data exfiltration prevention article.
Create either a firewall and outbound rules, or a NAT gateway and network
security groups, to allow outbound traffic. Since the compute has no public IP address, it
can't communicate with resources on the public internet without this
configuration. For example, it wouldn't be able to communicate with Azure Active
Directory or Azure Resource Manager. Installing Python packages from public
sources would also require this configuration.
For more information on the outbound traffic that is used by Azure Machine
Learning, see the following articles:
Configure inbound and outbound network traffic.
Azure's outbound connectivity methods.
Use the following information to create a compute instance or cluster with no public IP
address:
To create a compute instance or compute cluster with no public IP, use the Azure
Machine Learning studio UI to create the resource:
1. Sign in to the Azure Machine Learning studio , and then select your subscription
and workspace.
2. Select + New from the navigation bar of the compute instance or compute cluster page.
3. Configure the VM size and configuration you need, then select Next.
4. From the Advanced Settings, select Enable virtual network, select your virtual network
and subnet, and finally select the No Public IP option under the VNet/subnet
section.
Tip
You can also use the Azure Machine Learning SDK v2 or Azure CLI extension for ML
v2. For information on creating a compute instance or cluster with no public IP, see
the v2 version of Secure an Azure Machine Learning training environment article.
Important
If you have another NSG at the subnet level, the rules in the subnet level
NSG mustn't conflict with the rules in the automatically created NSG.
To learn how the NSGs filter your network traffic, see How network
security groups filter network traffic.
For compute clusters, these resources are deleted every time the cluster scales
down to 0 nodes and created when scaling up.
For a compute instance, these resources are kept until the instance is deleted.
Stopping the instance doesn't remove the resources.
Important
In your VNet, allow inbound TCP traffic on port 44224 from the
AzureMachineLearning service tag.
Important
The compute instance/cluster is dynamically assigned an IP address when it is
created. Since the address is not known before creation, and inbound access
is required as part of the creation process, you cannot statically assign it on
your firewall. Instead, if you are using a firewall with the VNet you must create
a user-defined route to allow this inbound traffic.
Compute instance
Python

import datetime
import time

from azureml.core.compute import ComputeInstance

# Name must be unique within the Azure region; a timestamp keeps it unique
compute_name = "ci" + datetime.datetime.now().strftime("%Y%m%d%H%M%S")

# Configure the instance to use an existing virtual network and subnet
compute_config = ComputeInstance.provisioning_configuration(
    vm_size="STANDARD_DS3_V2",
    vnet_resourcegroup_name="<vnet-resource-group>",
    vnet_name="<vnet-name>",
    subnet_name="<subnet-name>",
)

# ws is an existing Workspace object
instance = ComputeInstance.create(ws, compute_name, compute_config)
instance.wait_for_completion(show_output=True)
When the creation process finishes, you train your model. For more information, see
Select and use a compute target for training.
Note
You may choose to use low-priority VMs to run some or all of your workloads. See
how to create a low-priority VM.
Azure Databricks
The virtual network must be in the same subscription and region as the Azure
Machine Learning workspace.
If the Azure Storage Account(s) for the workspace are also secured in a virtual
network, they must be in the same virtual network as the Azure Databricks cluster.
In addition to the databricks-private and databricks-public subnets used by Azure
Databricks, the default subnet created for the virtual network is also required.
Azure Databricks doesn't use a private endpoint to communicate with the virtual
network.
For specific information on using Azure Databricks with a virtual network, see Deploy
Azure Databricks in your Azure Virtual Network.
Important
Azure Machine Learning requires both inbound and outbound access to the public
internet. The following tables provide an overview of the required access and what
purpose it serves. For service tags that end in .region , replace region with the Azure
region that contains your workspace. For example, Storage.westus :
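If you script your firewall or NSG rules, the region substitution can be done mechanically. This is a minimal sketch; the helper function and the tag list are illustrative, not an Azure API:

```python
def expand_service_tag(tag: str, region: str) -> str:
    """Replace a trailing '.region' placeholder with a concrete Azure region."""
    if tag.endswith(".region"):
        return tag[: -len("region")] + region
    return tag

tags = ["AzureMachineLearning", "Storage.region", "Keyvault.region"]
print([expand_service_tag(t, "westus") for t in tags])
# ['AzureMachineLearning', 'Storage.westus', 'Keyvault.westus']
```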
Tip
The required tab lists the required inbound and outbound configuration. The
situational tab lists optional inbound and outbound configurations required by
specific configurations you may want to enable.
Required
Tip
If you need the IP addresses instead of service tags, use one of the following
options:
You may also need to allow outbound traffic to Visual Studio Code and non-Microsoft
sites for the installation of packages required by your machine learning project. The
following table lists commonly used repositories for machine learning:
*.anaconda.com
Note
When using the Azure Machine Learning VS Code extension, the remote
compute instance requires access to public repositories to install the
packages required by the extension. If the compute instance requires a proxy to
access these public repositories or the internet, set and export the
HTTP_PROXY and HTTPS_PROXY environment variables in the ~/.bashrc file of the
compute instance.
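For example, you might append lines like the following to the compute instance's ~/.bashrc (the proxy address shown is hypothetical; substitute your own proxy's address and port):

```shell
# Hypothetical proxy endpoint; replace with your proxy's address and port
export HTTP_PROXY=http://proxy.example.com:3128
export HTTPS_PROXY=http://proxy.example.com:3128
```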
When using Azure Kubernetes Service (AKS) with Azure Machine Learning, allow the
following traffic to the AKS VNet:
For information on using a firewall solution, see Use a firewall with Azure Machine
Learning.
Next steps
This article is part of a series on securing an Azure Machine Learning workflow. See the
other articles in this series:
In this article, you learn how to secure inferencing environments with a virtual network
in Azure Machine Learning. This article is specific to the SDK/CLI v1 deployment
workflow of deploying a model as a web service.
Tip
This article is part of a series on securing an Azure Machine Learning workflow. See
the other articles in this series:
In this article you learn how to secure the following inferencing resources in a virtual
network:
Prerequisites
Read the Network security overview article to understand common virtual network
scenarios and overall virtual network architecture.
An existing virtual network and subnet to use with your compute resources.
To deploy resources into a virtual network or subnet, your user account must have
permissions to the following actions in Azure role-based access control (Azure
RBAC):
"Microsoft.Network/*/read" on the virtual network resource. This permission
isn't needed for Azure Resource Manager (ARM) template deployments.
"Microsoft.Network/virtualNetworks/join/action" on the virtual network
resource.
"Microsoft.Network/virtualNetworks/subnets/join/action" on the subnet
resource.
For more information on Azure RBAC with networking, see the Networking built-in
roles
Important
Some of the Azure CLI commands in this article use the azure-cli-ml , or v1,
extension for Azure Machine Learning. Support for the v1 extension will end on
September 30, 2025. You will be able to install and use the v1 extension until that
date.
Limitations
Important
To use an AKS cluster in a virtual network, first follow the prerequisites in Configure
advanced networking in Azure Kubernetes Service (AKS).
To add AKS in a virtual network to your workspace, use the following steps:
1. Sign in to Azure Machine Learning studio , and then select your subscription and
workspace.
2. Select Compute on the left, Inference clusters from the center, and then select +
New.
3. From the Create inference cluster dialog, select Create new and the VM size to
use for the cluster. Finally, select Next.
4. From the Configure Settings section, enter a Compute name, select the Cluster
Purpose, Number of nodes, and then select Advanced to display the network
settings. In the Configure virtual network area, set the following values:
Tip
In the Kubernetes Service address range field, enter the Kubernetes service
address range. This address range uses a Classless Inter-Domain Routing
(CIDR) notation IP range to define the IP addresses that are available for the
cluster. It must not overlap with any subnet IP ranges (for example,
10.0.0.0/16).
In the Kubernetes DNS service IP address field, enter the Kubernetes DNS
service IP address. This IP address is assigned to the Kubernetes DNS service.
It must be within the Kubernetes service address range (for example,
10.0.0.10).
In the Docker bridge address field, enter the Docker bridge address. This IP
address is assigned to Docker Bridge. It must not be in any subnet IP ranges,
or the Kubernetes service address range (for example, 172.18.0.1/16).
5. When you deploy a model as a web service to AKS, a scoring endpoint is created
to handle inferencing requests. Make sure that the network security group (NSG)
that controls the virtual network has an inbound security rule enabled for the IP
address of the scoring endpoint if you want to call it from outside the virtual
network.
To find the IP address of the scoring endpoint, look at the scoring URI for the
deployed service. For information on viewing the scoring URI, see Consume a
model deployed as a web service.
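The address-range rules in the steps above can be validated before you create the cluster. The following is an illustrative sketch using Python's standard ipaddress module (not an Azure Machine Learning API); the helper name is made up:

```python
import ipaddress

def validate_aks_ranges(subnet_cidr, service_cidr, dns_service_ip, docker_bridge_cidr):
    """Check the AKS networking constraints described in the steps above."""
    subnet = ipaddress.ip_network(subnet_cidr)
    service = ipaddress.ip_network(service_cidr)
    # strict=False accepts a host address like 172.18.0.1/16
    bridge = ipaddress.ip_network(docker_bridge_cidr, strict=False)
    dns_ip = ipaddress.ip_address(dns_service_ip)

    errors = []
    if service.overlaps(subnet):
        errors.append("Kubernetes service address range overlaps the subnet")
    if dns_ip not in service:
        errors.append("Kubernetes DNS service IP is outside the service address range")
    if bridge.overlaps(subnet) or bridge.overlaps(service):
        errors.append("Docker bridge range overlaps the subnet or service range")
    return errors

# The example values from the steps above pass all three checks
print(validate_aks_ranges("192.168.0.0/24", "10.0.0.0/16", "10.0.0.10", "172.18.0.1/16"))  # []
```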
Important
Keep the default outbound rules for the NSG. For more information, see the
default security rules in Security groups.
Important
The IP address shown in the image for the scoring endpoint will be different
for your deployments. While the same IP is shared by all deployments to one
AKS cluster, each AKS cluster will have a different IP address.
You can also use the Azure Machine Learning SDK to add Azure Kubernetes Service in a
virtual network. If you already have an AKS cluster in a virtual network, attach it to the
workspace as described in How to deploy to AKS. The following code creates a new AKS
instance in the default subnet of a virtual network named mynetwork :
Python
When the creation process is completed, you can run inference, or model scoring, on an
AKS cluster behind a virtual network. For more information, see How to deploy to AKS.
For more information on using Role-Based Access Control with Kubernetes, see Use
Azure RBAC for Kubernetes authorization.
Important
If you create or attach an AKS cluster by providing a virtual network you previously
created, you must grant the service principal (SP) or managed identity for your AKS
cluster the Network Contributor role to the resource group that contains the virtual
network.
1. To find the service principal or managed identity ID for AKS, use the following
Azure CLI commands. Replace <aks-cluster-name> with the name of the cluster.
Replace <resource-group-name> with the name of the resource group that contains
the AKS cluster:
Azure CLI
If this command returns a value of msi , use the following command to identify the
principal ID for the managed identity:
Azure CLI
3. To add the service principal or managed identity as a network contributor, use the
following command. Replace <SP-or-managed-identity> with the ID returned for
the service principal or managed identity. Replace <resource-group-id> with the ID
returned for the resource group that contains the virtual network:
Azure CLI
For more information on using the internal load balancer with AKS, see Use internal load
balancer with Azure Kubernetes Service.
Private AKS cluster: This approach uses Azure Private Link to secure
communications with the cluster for deployment/management operations.
Internal AKS load balancer: This approach configures the endpoint for your
deployments to AKS to use a private IP within the virtual network.
After you create the private AKS cluster, attach the cluster to the virtual network to use
with Azure Machine Learning.
A private load balancer is enabled by configuring AKS to use an internal load balancer.
Important
You cannot enable private IP when creating the Azure Kubernetes Service cluster in
Azure Machine Learning studio. You can create one with an internal load balancer
when using the Python SDK or Azure CLI extension for machine learning.
The following examples demonstrate how to create a new AKS cluster with a private
IP/internal load balancer using the SDK and CLI:
Python SDK
Python
import azureml.core
from azureml.core.compute import AksCompute, ComputeTarget

# ws is an existing Workspace object; use the 'default' subnet of the VNet
subnet_name = "default"

# Verify that the cluster doesn't already exist
try:
    aks_target = AksCompute(workspace=ws, name="myaks")
    print("Found existing aks cluster")
except Exception:
    print("Creating new aks cluster")
    prov_config = AksCompute.provisioning_configuration(
        load_balancer_type="InternalLoadBalancer")

    # Set info for existing virtual network to create the cluster in
    prov_config.vnet_resourcegroup_name = "myvnetresourcegroup"
    prov_config.vnet_name = "myvnetname"
    prov_config.service_cidr = "10.0.0.0/16"
    prov_config.dns_service_ip = "10.0.0.10"
    prov_config.subnet_name = subnet_name
    prov_config.load_balancer_subnet = subnet_name
    prov_config.docker_bridge_cidr = "172.17.0.1/16"

    # Create the compute target
    aks_target = ComputeTarget.create(
        workspace=ws, name="myaks", provisioning_configuration=prov_config)
    aks_target.wait_for_completion(show_output=True)
When attaching an existing cluster to your workspace, use the load_balancer_type and
load_balancer_subnet parameters of AksCompute.attach_configuration() to configure
the internal load balancer.
Next steps
This article is part of a series on securing an Azure Machine Learning workflow. See the
other articles in this series:
In this article, you learn how to use Azure Machine Learning studio in a virtual network.
The studio includes features like AutoML, the designer, and data labeling.
Some of the studio's features are disabled by default in a virtual network. To re-enable
these features, you must enable managed identity for storage accounts you intend to
use in the studio.
The studio supports reading data from the following datastore types in a virtual
network:
Tip
This article is part of a series on securing an Azure Machine Learning workflow. See
the other articles in this series:
Prerequisites
Read the Network security overview to understand common virtual network
scenarios and architecture.
Limitations
To resolve this issue, you can use a public workspace to run a sample pipeline and learn
how to use the designer, and then replace the sample dataset with your own
dataset in the workspace that's within the virtual network.
Datastore: Azure Storage Account
Use the following steps to enable access to data stored in Azure Blob and File storage:
Tip
The first step is not required for the default storage account for the workspace. All
other steps are required for any storage account behind the VNet and used by the
workspace, including the default storage account.
1. If the storage account is the default storage for your workspace, skip this step. If
it isn't the default, grant the workspace managed identity the 'Storage Blob
Data Reader' role for the Azure storage account so that it can read data from blob
storage.
For more information, see the Blob Data Reader built-in role.
2. Grant the workspace managed identity the 'Reader' role for storage private
endpoints. If your storage service uses a private endpoint, grant the workspace's
managed identity Reader access to the private endpoint. The workspace's
managed identity in Azure AD has the same name as your Azure Machine Learning
workspace.
Tip
Your storage account may have multiple private endpoints. For example, one
storage account may have separate private endpoint for blob, file, and dfs
(Azure Data Lake Storage Gen2). Add the managed identity to all these
endpoints.
3. Enable managed identity authentication for default storage accounts. Each Azure
Machine Learning workspace has two default storage accounts, a default blob
storage account and a default file store account, which are defined when you
create your workspace. You can also set new defaults in the Datastore
management page.
The following table describes why managed identity authentication is used for
your workspace default storage accounts.
Workspace default blob storage: Stores model assets from the designer. Enable
managed identity authentication on this storage account to deploy models in the
designer. If managed identity authentication is disabled, the user's identity is used
to access data stored in the blob.
You can visualize and run a designer pipeline if it uses a non-default datastore
that has been configured to use managed identity. However, if you try to
deploy a trained model without managed identity enabled on the default
datastore, deployment will fail regardless of any other datastores in use.
c. In the datastore settings, select Yes for Use workspace managed identity for
data preview and profiling in Azure Machine Learning studio.
d. In the Networking settings for the Azure Storage Account, add the
Microsoft.MachineLearningService/workspaces Resource type, and set the
Instance name to the workspace.
These steps add the workspace's managed identity as a Reader to the new storage
service using Azure RBAC. Reader access allows the workspace to view the
resource, but not make changes.
To use Azure RBAC, follow the steps in the Datastore: Azure Storage Account section of
this article. Data Lake Storage Gen2 is based on Azure Storage, so the same steps apply
when using Azure RBAC.
To use ACLs, the workspace's managed identity can be assigned access just like any
other security principal. For more information, see Access control lists on files and
directories.
After you create a SQL contained user, grant permissions to it by using the GRANT T-
SQL command.
Make sure that you have access to the intermediate storage accounts in your virtual
network. Otherwise, the pipeline will fail.
For example, if you are using network security groups (NSG) to restrict outbound traffic,
add a rule to a service tag destination of AzureFrontDoor.Frontend.
Firewall settings
Some storage services, such as Azure Storage Account, have firewall settings that apply
to the public endpoint for that specific service instance. Usually, this setting allows you
to allow or disallow access from specific IP addresses on the public internet. This isn't
supported when using Azure Machine Learning studio. It is supported when using the
Azure Machine Learning SDK or CLI.
Tip
Azure Machine Learning studio is supported when using the Azure Firewall service.
For more information, see Use your workspace behind a firewall.
Next steps
This article is part of a series on securing an Azure Machine Learning workflow. See the
other articles in this series:
In this document, you learn how to configure a private endpoint for your Azure Machine
Learning workspace. For information on creating a virtual network for Azure Machine
Learning, see Virtual network isolation and privacy overview.
Azure Private Link enables you to connect to your workspace using a private endpoint.
The private endpoint is a set of private IP addresses within your virtual network. You can
then limit access to your workspace to only occur over the private IP addresses. A
private endpoint helps reduce the risk of data exfiltration. To learn more about private
endpoints, see the Azure Private Link article.
Warning
Securing a workspace with private endpoints does not ensure end-to-end security
by itself. You must secure all of the individual components of your solution. For
example, if you use a private endpoint for the workspace, but your Azure Storage
Account is not behind the VNet, traffic between the workspace and storage does
not use the VNet for security.
For more information on securing resources used by Azure Machine Learning, see
the following articles:
Prerequisites
You must have an existing virtual network to create the private endpoint in.
Disable network policies for private endpoints before adding the private endpoint.
Limitations
If you enable public access for a workspace secured with private endpoint and use
Azure Machine Learning studio over the public internet, some features such as the
designer may fail to access your data. This problem happens when the data is
stored on a service that is secured behind the VNet. For example, an Azure Storage
Account.
You may encounter problems trying to access the private endpoint for your
workspace if you're using Mozilla Firefox. This problem may be related to DNS over
HTTPS in Mozilla Firefox. We recommend using Microsoft Edge or Google Chrome
as a workaround.
When using a workspace with multiple private endpoints, one of the private
endpoints must be in the same VNet as the following dependency services:
Azure Storage Account that provides the default storage for the workspace
Azure Key Vault for the workspace
Azure Container Registry for the workspace.
For example, one VNet ('services' VNet) would contain a private endpoint for the
dependency services and the workspace. This configuration allows the workspace
to communicate with the services. Another VNet ('clients') might only contain a
private endpoint for the workspace, and be used only for communication between
client development machines and the workspace.
Create a workspace that uses a private
endpoint
Use one of the following methods to create a workspace with a private endpoint. Each
of these methods requires an existing virtual network:
Tip
If you'd like to create a workspace, private endpoint, and virtual network at the
same time, see Use an Azure Resource Manager template to create a workspace
for Azure Machine Learning.
Python SDK
The Azure Machine Learning Python SDK provides the PrivateEndpointConfig class,
which can be used with Workspace.create() to create a workspace with a private
endpoint. This class requires an existing virtual network.
Python

from azureml.core import Workspace
from azureml.core.private_endpoint import PrivateEndPointConfig

pe = PrivateEndPointConfig(name='myprivateendpoint', vnet_name='myvnet',
                           vnet_subnet_name='default')
ws = Workspace.create(name='myworkspace',
                      subscription_id='<my-subscription-id>',
                      resource_group='myresourcegroup',
                      location='eastus2',
                      private_endpoint_config=pe,
                      private_endpoint_auto_approval=True,
                      show_output=True)
Warning
If you have any existing compute targets associated with this workspace, and they
are not behind the same virtual network that the private endpoint is created in, they
will not work.
Python
pe = PrivateEndPointConfig(name='myprivateendpoint', vnet_name='myvnet',
vnet_subnet_name='default')
ws = Workspace.from_config()
ws.add_private_endpoint(private_endpoint_config=pe,
private_endpoint_auto_approval=True, show_output=True)
For more information on the classes and methods used in this example, see
PrivateEndpointConfig and Workspace.add_private_endpoint.
Warning
Python SDK
To remove a private endpoint, use Workspace.delete_private_endpoint_connection.
The following example demonstrates how to remove a private endpoint:
Python
ws = Workspace.from_config()

# Get the name of the first private endpoint connection
_, _, connection_name = ws.get_details()['privateEndpointConnections'][0]['id'].rpartition('/')

ws.delete_private_endpoint_connection(private_endpoint_connection_name=connection_name)
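The rpartition('/') call in the example above splits the connection's full Azure resource ID at the last slash, keeping the final segment as the connection name. A quick illustration with a made-up resource ID:

```python
# Hypothetical resource ID, for illustration only
resource_id = (
    "/subscriptions/0000/resourceGroups/myresourcegroup/providers/"
    "Microsoft.MachineLearningServices/workspaces/myworkspace/"
    "privateEndpointConnections/myconnection"
)
_, _, connection_name = resource_id.rpartition('/')
print(connection_name)  # myconnection
```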
Important
Enabling public access doesn't remove any private endpoints that exist. All
communications between components behind the VNet that the private
endpoint(s) connect to are still secured. It enables public access only to the
workspace, in addition to the private access through any private endpoints.
Warning
When connecting over the public endpoint while the workspace uses a private
endpoint to communicate with other resources:
Some features of studio will fail to access your data. This problem happens
when the data is stored on a service that is secured behind the VNet. For
example, an Azure Storage Account.
Using Jupyter, JupyterLab, RStudio, or Posit Workbench (formerly RStudio
Workbench) on a compute instance, including running notebooks, is not
supported.
To enable public access, use the following steps:
Tip
Python SDK
Python
ws = Workspace.from_config()
ws.update(allow_public_access_when_behind_vnet=True)
Azure VPN gateway - Connects on-premises networks to the VNet over a private
connection. Connection is made over the public internet. There are two types of
VPN gateways that you might use:
Point-to-site: Each client computer uses a VPN client to connect to the VNet.
Site-to-site: A VPN device connects the VNet to your on-premises network.
Important
When using a VPN gateway or ExpressRoute, you will need to plan how name
resolution works between your on-premises resources and those in the VNet. For
more information, see Use a custom DNS server.
If you have problems connecting to the workspace, see Troubleshoot secure workspace
connectivity.
Other Azure services in a separate VNet. For example, Azure Synapse and Azure
Data Factory can use a Microsoft managed virtual network. In either case, a private
endpoint for the workspace can be added to the managed VNet used by those
services. For more information on using a managed virtual network with these
services, see the following articles:
Synapse managed private endpoints.
Azure Data Factory managed virtual network.
Important
Each VNet that contains a private endpoint for the workspace must also be able to
access the Azure Storage Account, Azure Key Vault, and Azure Container Registry
used by the workspace. For example, you might create a private endpoint for the
services in each VNet.
Adding multiple private endpoints uses the same steps as described in the Add a private
endpoint to a workspace section.
Note
These steps assume that you have an existing workspace, Azure Storage Account,
Azure Key Vault, and Azure Container Registry. Each of these services has a private
endpoint in an existing VNet.
1. Create another VNet for the clients. This VNet might contain Azure Virtual
Machines that act as your clients, or it may contain a VPN Gateway used by on-
premises clients to connect to the VNet.
2. Add a new private endpoint for the Azure Storage Account, Azure Key Vault, and
Azure Container Registry used by your workspace. These private endpoints should
exist in the client VNet.
3. If you have another storage account that is used by your workspace, add a new
private endpoint for that storage. The private endpoint should exist in the client
VNet and have private DNS zone integration enabled.
4. Add a new private endpoint to your workspace. This private endpoint should exist
in the client VNet and have private DNS zone integration enabled.
5. Use the steps in the Use studio in a virtual network article to enable studio to
access the storage account(s).
The following diagram illustrates this configuration. The Workload VNet contains
computes created by the workspace for training & deployment. The Client VNet
contains clients or client ExpressRoute/VPN connections. Both VNets contain private
endpoints for the workspace, Azure Storage Account, Azure Key Vault, and Azure
Container Registry.
Scenario: Isolated Azure Kubernetes Service
If you want to create an isolated Azure Kubernetes Service used by the workspace, use
the following steps:
7 Note
These steps assume that you have an existing workspace, Azure Storage Account,
Azure Key Vault, and Azure Container Registry. Each of these services has a private
endpoint in an existing VNet.
1. Create an Azure Kubernetes Service instance. During creation, AKS creates a VNet
that contains the AKS cluster.
2. Add a new private endpoint for the Azure Storage Account, Azure Key Vault, and
Azure Container Registry used by your workspace. These private endpoints should
exist in the client VNet.
3. If you have other storage that is used by your workspace, add a new private
endpoint for that storage. The private endpoint should exist in the client VNet and
have private DNS zone integration enabled.
4. Add a new private endpoint to your workspace. This private endpoint should exist
in the client VNet and have private DNS zone integration enabled.
5. Attach the AKS cluster to the Azure Machine Learning workspace. For more
information, see Create and attach an Azure Kubernetes Service cluster.
Next steps
For more information on securing your Azure Machine Learning workspace, see
the Virtual network isolation and privacy overview article.
If you plan on using a custom DNS solution in your virtual network, see how to use
a workspace with a custom DNS server.
This article shows you how to secure a web service that's deployed through Azure
Machine Learning.
You use HTTPS to restrict access to web services and secure the data that clients
submit. HTTPS helps secure communications between a client and a web service by
encrypting communications between the two. Encryption uses Transport Layer Security
(TLS) . TLS is sometimes still referred to as Secure Sockets Layer (SSL), which was the
predecessor of TLS.
Tip
The Azure Machine Learning SDK uses the term "SSL" for properties that are related
to secure communications. This doesn't mean that your web service doesn't use
TLS. SSL is just a more commonly recognized term.
Specifically, web services deployed through Azure Machine Learning support TLS
version 1.2 for AKS and ACI. For ACI deployments, if you're on an older TLS
version, we recommend redeploying to get the latest TLS version.
TLS version 1.3 for Azure Machine Learning AKS inference is unsupported.
TLS and SSL both rely on digital certificates, which help with encryption and identity
verification. For more information on how digital certificates work, see the Wikipedia
topic Public key infrastructure .
2 Warning
If you don't use HTTPS for your web service, data that's sent to and from the
service might be visible to others on the internet.
HTTPS also enables the client to verify the authenticity of the server that it's
connecting to. This feature protects clients against man-in-the-middle attacks.
) Important
If you're deploying to Azure Kubernetes Service (AKS), you can purchase your own
certificate or use a certificate that's provided by Microsoft. If you use a certificate
from Microsoft, you don't need to get a domain name or TLS/SSL certificate. For
more information, see the Enable TLS and deploy section of this article.
There are slight differences when you secure web services across deployment targets.
) Important
Some of the Azure CLI commands in this article use the azure-cli-ml , or v1,
extension for Azure Machine Learning. Support for the v1 extension will end on
September 30, 2025. You will be able to install and use the v1 extension until that
date.
When you request a certificate, you must provide the FQDN of the address that you plan
to use for the web service (for example, www.contoso.com). The address that's stamped
into the certificate and the address that the clients use are compared to verify the
identity of the web service. If those addresses don't match, the client gets an error
message.
Tip
If the certificate authority can't provide the certificate and key as PEM-encoded
files, you can use a utility such as OpenSSL to change the format.
2 Warning
Use self-signed certificates only for development. Don't use them in production
environments. Self-signed certificates can cause problems in your client
applications. For more information, see the documentation for the network libraries
that your client application uses.
For ACI deployment, you can enable TLS termination at model deployment time with a
deployment configuration object.
7 Note
The information in this section also applies when you deploy a secure web service
for the designer. If you aren't familiar with using the Python SDK, see What is the
Azure Machine Learning SDK for Python?.
When you create or attach an AKS cluster in an Azure Machine Learning workspace,
you can enable TLS termination with the AksCompute.provisioning_configuration() and
AksCompute.attach_configuration() configuration objects. Both methods return a
configuration object that has an enable_ssl method, which you can use to enable
TLS.
You can enable TLS with either a Microsoft certificate or a custom certificate
purchased from a certificate authority (CA).
When you use a certificate from Microsoft, you must use the leaf_domain_label
parameter. This parameter generates the DNS name for the service. For example, a
value of "contoso" creates a domain name of "contoso<six-random-characters>.
<azureregion>.cloudapp.azure.com", where <azureregion> is the region that
contains the service. Optionally, you can use the overwrite_existing_domain
parameter to overwrite the existing leaf_domain_label. The following example
demonstrates how to create a configuration that enables TLS with a Microsoft
certificate:
Python
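The SDK call itself requires the azureml-core package, so it isn't reproduced here. As an illustration of the naming pattern only, the following pure-Python sketch builds the kind of DNS name described above; the six-character suffix and the exact generation rule are assumptions for illustration, not the SDK's actual implementation:

```python
import random
import string

def expected_dns_name(leaf_domain_label: str, azure_region: str) -> str:
    """Illustrate the pattern '<label><six-random-characters>.<region>.cloudapp.azure.com'.

    Azure generates the real value when TLS is enabled with a Microsoft
    certificate; the random suffix here is an assumption for illustration.
    """
    suffix = "".join(random.choices(string.ascii_lowercase + string.digits, k=6))
    return f"{leaf_domain_label}{suffix}.{azure_region}.cloudapp.azure.com"

name = expected_dns_name("contoso", "eastus")
```

A leaf_domain_label of "contoso" in the eastus region yields a name of the form contoso&lt;six-random-characters&gt;.eastus.cloudapp.azure.com.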
) Important
When you use a certificate from Microsoft, you don't need to purchase your
own certificate or domain name.
When you use a custom certificate that you purchased, you use the
ssl_cert_pem_file, ssl_key_pem_file, and ssl_cname parameters. The PEM file with
pass phrase protection is not supported. The following example demonstrates how
to use .pem files to create a configuration that uses a TLS/SSL certificate that you
purchased:
Python
# Requires the azureml-core (SDK v1) package.
from azureml.core.webservice import AciWebservice

aci_config = AciWebservice.deploy_configuration(
    ssl_enabled=True, ssl_cert_pem_file="cert.pem",
    ssl_key_pem_file="key.pem", ssl_cname="www.contoso.com")
) Important
When you use a certificate from Microsoft for AKS deployment, you don't need to
manually update the DNS value for the cluster. The value should be set
automatically.
You can follow these steps to update the DNS record for your custom domain name:
1. Get the scoring endpoint IP address from the scoring endpoint URI, which is usually in
the format http://104.214.29.152:80/api/v1/service/<service-name>/score . In this
example, the IP address is 104.214.29.152.
2. Use the tools from your domain name registrar to update the DNS record for your
domain name. The record maps the FQDN (for example, www.contoso.com) to the
IP address. The record must point to the IP address of the scoring endpoint.
Tip
Microsoft is not responsible for updating the DNS record for your custom DNS
name or certificate. You must update it with your domain name registrar.
3. After the DNS record update, you can validate DNS resolution by using the
nslookup custom-domain-name command. If the DNS record is correctly updated,
the custom domain name points to the IP address of the scoring endpoint.
There can be a delay of minutes or hours before clients can resolve the domain
name, depending on the registrar and the "time to live" (TTL) that's configured for
the domain name.
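The IP extraction in step 1 can also be done programmatically. A minimal sketch using only the standard library; the URI is the example from this article, and my-service is a placeholder service name:

```python
from urllib.parse import urlparse

def scoring_endpoint_ip(scoring_uri: str) -> str:
    """Extract the host portion (here, an IP address) from a scoring URI."""
    return urlparse(scoring_uri).hostname

ip = scoring_endpoint_ip("http://104.214.29.152:80/api/v1/service/my-service/score")
# ip is "104.214.29.152"
```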
For more information on DNS resolution with Azure Machine Learning, see How to use
your workspace with a custom DNS server.
1. Use an SslConfiguration and AksUpdateConfiguration to update the AKS cluster
with the new certificate.
2. Use either the SDK or CLI to update the service with the new certificate.
Disable TLS
To disable TLS for a model deployed to Azure Kubernetes Service, create an
SslConfiguration with status="Disabled" , then perform an update:
Python
# Disable TLS
ssl_configuration = SslConfiguration(status="Disabled")
update_config = AksUpdateConfiguration(ssl_configuration)
aks_target.update(update_config)
Next steps
Learn how to:
When using an Azure Machine Learning workspace with a private endpoint, there are
several ways to handle DNS name resolution. By default, Azure automatically handles
name resolution for your workspace and private endpoint. If you instead use your own
custom DNS server, you must manually create DNS entries or use conditional
forwarders for the workspace.
) Important
This article covers how to find the fully qualified domain names (FQDN) and IP
addresses for these entries if you would like to manually register DNS records in
your DNS solution. Additionally this article provides architecture recommendations
for how to configure your custom DNS solution to automatically resolve FQDNs to
the correct IP addresses. This article does NOT provide information on configuring
the DNS records for these items. Consult the documentation for your DNS software
for information on how to add records.
Tip
This article is part of a series on securing an Azure Machine Learning workflow. See
the other articles in this series:
Prerequisites
An Azure Virtual Network that uses your own DNS server.
An Azure Machine Learning workspace with a private endpoint. For more
information, see Create an Azure Machine Learning workspace.
Introduction
There are two common architectures to use automated DNS server integration with
Azure Machine Learning:
While your architecture may differ from these examples, you can use them as a
reference point. Both example architectures provide troubleshooting steps that can help
you identify components that may be misconfigured.
Another option is to modify the hosts file on the client that is connecting to the Azure
Virtual Network (VNet) that contains your workspace. For more information, see the
Host file section.
The workspace uses the following fully qualified domain names (FQDNs), by Azure cloud:
Azure Public regions:
<per-workspace globally-unique identifier>.workspace.<region the workspace was
created in>.api.azureml.ms
<per-workspace globally-unique identifier>.workspace.<region the workspace was
created in>.cert.api.azureml.ms
<compute instance name>.<region the workspace was created
in>.instances.azureml.ms
ml-<workspace-name, truncated>-<region>-<per-workspace globally-unique
identifier>.<region>.notebooks.azure.net
<managed online endpoint name>.<region>.inference.ml.azure.com - Used by
managed online endpoints
Azure China regions:
<per-workspace globally-unique identifier>.workspace.<region the workspace was
created in>.api.ml.azure.cn
<per-workspace globally-unique identifier>.workspace.<region the workspace was
created in>.cert.api.ml.azure.cn
<compute instance name>.<region the workspace was created
in>.instances.azureml.cn
ml-<workspace-name, truncated>-<region>-<per-workspace globally-unique
identifier>.<region>.notebooks.chinacloudapi.cn
<managed online endpoint name>.<region>.inference.ml.azure.cn - Used by
managed online endpoints
Azure US Government regions:
<per-workspace globally-unique identifier>.workspace.<region the workspace was
created in>.api.ml.azure.us
<per-workspace globally-unique identifier>.workspace.<region the workspace was
created in>.cert.api.ml.azure.us
<compute instance name>.<region the workspace was created
in>.instances.azureml.us
ml-<workspace-name, truncated>-<region>-<per-workspace globally-unique
identifier>.<region>.notebooks.usgovcloudapi.net
<managed online endpoint name>.<region>.inference.ml.azure.us - Used by
managed online endpoints
The fully qualified domain names resolve to the following Canonical Names (CNAMEs),
called the workspace Private Link FQDNs:
Azure Public regions:
<per-workspace globally-unique identifier>.workspace.<region the workspace was
created in>.privatelink.api.azureml.ms
ml-<workspace-name, truncated>-<region>-<per-workspace globally-unique
identifier>.<region>.privatelink.notebooks.azure.net
<managed online endpoint name>.<per-workspace globally-unique
identifier>.inference.<region>.privatelink.api.azureml.ms - Used by managed
online endpoints
Azure US Government regions:
<per-workspace globally-unique identifier>.workspace.<region the workspace was
created in>.privatelink.api.ml.azure.us
ml-<workspace-name, truncated>-<region>-<per-workspace globally-unique
identifier>.<region>.privatelink.notebooks.usgovcloudapi.net
The FQDNs resolve to the IP addresses of the Azure Machine Learning workspace in that
region. However, resolution of the workspace Private Link FQDNs can be overridden by
using a custom DNS server hosted in the virtual network. For an example of this
architecture, see the custom DNS server hosted in a VNet example.
7 Note
Managed online endpoints share the workspace private endpoint. If you are
manually adding DNS records to the private DNS zone
privatelink.api.azureml.ms , an A record with wildcard *.<per-workspace globally-
unique identifier>.inference.<region>.privatelink.api.azureml.ms should be
added to route all endpoints under the workspace to the private endpoint.
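To see whether a given endpoint FQDN would be covered by such a wildcard A record, a small sketch; the GUID and names below are placeholders, not real values:

```python
def covered_by_wildcard(fqdn: str, wildcard_record: str) -> bool:
    """Check whether an FQDN falls under a wildcard DNS record like '*.zone'."""
    if not wildcard_record.startswith("*."):
        raise ValueError("expected a wildcard record starting with '*.'")
    # '*' covers the leading label; compare the remaining suffix.
    return fqdn.lower().endswith(wildcard_record[1:].lower())

wildcard = "*.1111aaaa-0000-0000-0000-000000000000.inference.eastus.privatelink.api.azureml.ms"
endpoint = "my-endpoint.1111aaaa-0000-0000-0000-000000000000.inference.eastus.privatelink.api.azureml.ms"
```

Every endpoint name under the workspace's inference subdomain matches the single wildcard record, which is why one A record suffices.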
The following list contains the fully qualified domain names (FQDNs) used by your
workspace if it is in the Azure Public Cloud:
<workspace-GUID>.workspace.<region>.cert.api.azureml.ms
<workspace-GUID>.workspace.<region>.api.azureml.ms
ml-<workspace-name, truncated>-<region>-<workspace-guid>.<region>.notebooks.azure.net
7 Note
The workspace name for this FQDN may be truncated. Truncation is done to
keep ml-<workspace-name, truncated>-<region>-<workspace-guid> at 63
characters or less.
<instance-name>.<region>.instances.azureml.ms
7 Note
Compute instances can be accessed only from within the virtual network.
The IP address for this FQDN is not the IP of the compute instance. Instead,
use the private IP address of the workspace private endpoint (the IP of the
*.api.azureml.ms entries.)
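The 63-character rule above can be sketched as follows. The exact truncation algorithm Azure uses is an assumption; this only illustrates the length constraint on the DNS label:

```python
def notebooks_label(workspace_name: str, region: str, workspace_guid: str) -> str:
    """Build 'ml-<name>-<region>-<guid>', truncating the workspace name so the
    whole DNS label stays at 63 characters or less."""
    fixed = f"-{region}-{workspace_guid}"
    budget = max(0, 63 - len("ml-") - len(fixed))
    return "ml-" + workspace_name[:budget] + fixed

label = notebooks_label(
    "a-very-long-workspace-name-that-needs-truncating",
    "eastus",
    "fb7e20a0-8891-458b-b969-55ddb3382f51",
)
```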
Azure China
<workspace-GUID>.workspace.<region>.cert.api.ml.azure.cn
<workspace-GUID>.workspace.<region>.api.ml.azure.cn
ml-<workspace-name, truncated>-<region>-<workspace-guid>.<region>.notebooks.chinacloudapi.cn
7 Note
The workspace name for this FQDN may be truncated. Truncation is done to
keep ml-<workspace-name, truncated>-<region>-<workspace-guid> at 63
characters or less.
<instance-name>.<region>.instances.azureml.cn
The IP address for this FQDN is not the IP of the compute instance. Instead, use
the private IP address of the workspace private endpoint (the IP of the
*.api.ml.azure.cn entries.)
Azure US Government
The following FQDNs are for Azure US Government regions:
<workspace-GUID>.workspace.<region>.cert.api.ml.azure.us
<workspace-GUID>.workspace.<region>.api.ml.azure.us
ml-<workspace-name, truncated>-<region>-<workspace-guid>.<region>.notebooks.usgovcloudapi.net
7 Note
The workspace name for this FQDN may be truncated. Truncation is done to
keep ml-<workspace-name, truncated>-<region>-<workspace-guid> at 63
characters or less.
<instance-name>.<region>.instances.azureml.us
The IP address for this FQDN is not the IP of the compute instance. Instead,
use the private IP address of the workspace private endpoint (the IP of the
*.api.ml.azure.us entries.)
<managed online endpoint name>.<region>.inference.ml.azure.us - Used by
managed online endpoints
7 Note
The fully qualified domain names and IP addresses will be different based on your
configuration. For example, the GUID value in the domain name will be specific to
your workspace.
Azure CLI
1. To get the ID of the private endpoint network interface, use the following
command:
Azure CLI
2. To get the IP address and FQDN information, use the following command.
Replace <resource-id> with the ID from the previous step:
Azure CLI
JSON
[
  {
    "FQDNs": [
      "fb7e20a0-8891-458b-b969-55ddb3382f51.workspace.eastus.api.azureml.ms",
      "fb7e20a0-8891-458b-b969-55ddb3382f51.workspace.eastus.cert.api.azureml.ms"
    ],
    "IPAddress": "10.1.0.5"
  },
  {
    "FQDNs": [
      "ml-myworkspace-eastus-fb7e20a0-8891-458b-b969-55ddb3382f51.eastus.notebooks.azure.net"
    ],
    "IPAddress": "10.1.0.6"
  },
  {
    "FQDNs": [
      "*.eastus.inference.ml.azure.com"
    ],
    "IPAddress": "10.1.0.7"
  }
]
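Output of this shape can be flattened into a simple FQDN-to-IP lookup for building DNS records. A minimal sketch, using the example values shown in this article:

```python
import json

# Example ipConfiguration output for the private endpoint NIC,
# as shown earlier in this article.
nic_output = """
[
  {"FQDNs": ["fb7e20a0-8891-458b-b969-55ddb3382f51.workspace.eastus.api.azureml.ms",
             "fb7e20a0-8891-458b-b969-55ddb3382f51.workspace.eastus.cert.api.azureml.ms"],
   "IPAddress": "10.1.0.5"},
  {"FQDNs": ["ml-myworkspace-eastus-fb7e20a0-8891-458b-b969-55ddb3382f51.eastus.notebooks.azure.net"],
   "IPAddress": "10.1.0.6"},
  {"FQDNs": ["*.eastus.inference.ml.azure.com"],
   "IPAddress": "10.1.0.7"}
]
"""

# Map each FQDN to the private IP it should resolve to.
records = {
    fqdn: entry["IPAddress"]
    for entry in json.loads(nic_output)
    for fqdn in entry["FQDNs"]
}
```

Each key-value pair in the resulting dictionary corresponds to one DNS record to create in your custom DNS solution.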
The information returned from all methods is the same; a list of the FQDN and private IP
address for the resources. The following example is from the Azure Public Cloud:
FQDN | IP Address
fb7e20a0-8891-458b-b969-55ddb3382f51.workspace.eastus.api.azureml.ms | 10.1.0.5
fb7e20a0-8891-458b-b969-55ddb3382f51.workspace.eastus.cert.api.azureml.ms | 10.1.0.5
ml-myworkspace-eastus-fb7e20a0-8891-458b-b969-55ddb3382f51.eastus.notebooks.azure.net | 10.1.0.6
*.eastus.inference.ml.azure.com | 10.1.0.7
The following table shows example IPs from Azure China regions:
FQDN | IP Address
52882c08-ead2-44aa-af65-08a75cf094bd.workspace.chinaeast2.api.ml.azure.cn | 10.1.0.5
52882c08-ead2-44aa-af65-08a75cf094bd.workspace.chinaeast2.cert.api.ml.azure.cn | 10.1.0.5
ml-mype-pltest-chinaeast2-52882c08-ead2-44aa-af65-08a75cf094bd.chinaeast2.notebooks.chinacloudapi.cn | 10.1.0.6
*.chinaeast2.inference.ml.azure.cn | 10.1.0.7
The following table shows example IPs from Azure US Government regions:
FQDN | IP Address
52882c08-ead2-44aa-af65-08a75cf094bd.workspace.usgovvirginia.api.ml.azure.us | 10.1.0.5
52882c08-ead2-44aa-af65-08a75cf094bd.workspace.usgovvirginia.cert.api.ml.azure.us | 10.1.0.5
ml-mype-plt-usgovvirginia-52882c08-ead2-44aa-af65-08a75cf094bd.usgovvirginia.notebooks.usgovcloudapi.net | 10.1.0.6
*.usgovvirginia.inference.ml.azure.us | 10.1.0.7
7 Note
Managed online endpoints share the workspace private endpoint. If you are
manually adding DNS records to the private DNS zone
privatelink.api.azureml.ms , an A record with wildcard *.<per-workspace globally-
unique identifier>.inference.<region>.privatelink.api.azureml.ms should be
added to route all endpoints under the workspace to the private endpoint.
1. Create Private DNS Zone and link to DNS Server Virtual Network:
The first step in ensuring a Custom DNS solution works with your Azure Machine
Learning workspace is to create two Private DNS Zones rooted at the following
domains:
Azure Public regions:
privatelink.api.azureml.ms
privatelink.notebooks.azure.net
Azure China regions:
privatelink.api.ml.azure.cn
privatelink.notebooks.chinacloudapi.cn
Azure US Government regions:
privatelink.api.ml.azure.us
privatelink.notebooks.usgovcloudapi.net
7 Note
Managed online endpoints share the workspace private endpoint. If you are
manually adding DNS records to the private DNS zone
privatelink.api.azureml.ms , an A record with wildcard *.<per-workspace
globally-unique identifier>.inference.<region>.privatelink.api.azureml.ms
should be added to route all endpoints under the workspace to the private
endpoint.
Following creation of the Private DNS Zone, it needs to be linked to the DNS
Server Virtual Network: the Virtual Network that contains the DNS Server.
A Private DNS Zone overrides name resolution for all names within the scope of
the root of the zone. This override applies to all Virtual Networks the Private DNS
Zone is linked to. For example, if a Private DNS Zone rooted at
privatelink.api.azureml.ms is linked to Virtual Network foo, all resources in
Virtual Network foo that attempt to resolve
bar.workspace.westus2.privatelink.api.azureml.ms will receive any record that is
listed in the privatelink.api.azureml.ms zone.
However, records listed in Private DNS Zones are only returned to devices
resolving domains using the default Azure DNS Virtual Server IP address. So the
custom DNS Server will resolve domains for devices spread throughout your
network topology. But the custom DNS Server will need to resolve Azure Machine
Learning-related domains against the Azure DNS Virtual Server IP address.
2. Create private endpoint with private DNS integration targeting Private DNS
Zone linked to DNS Server Virtual Network:
The next step is to create a Private Endpoint to the Azure Machine Learning
workspace. The private endpoint targets both Private DNS Zones created in step 1.
This ensures all communication with the workspace is done via the Private
Endpoint in the Azure Machine Learning Virtual Network.
) Important
The private endpoint must have Private DNS integration enabled for this
example to function correctly.
Next, create a conditional forwarder to the Azure DNS Virtual Server. The
conditional forwarder ensures that the DNS server always queries the Azure DNS
Virtual Server IP address for FQDNs related to your workspace. This means that the
DNS Server will return the corresponding record from the Private DNS Zone.
The zones to conditionally forward are listed below. The Azure DNS Virtual Server
IP address is 168.63.129.16:
Azure Public regions:
api.azureml.ms
notebooks.azure.net
instances.azureml.ms
aznbcontent.net
inference.ml.azure.com - Used by managed online endpoints
Azure China regions:
api.ml.azure.cn
notebooks.chinacloudapi.cn
instances.azureml.cn
aznbcontent.net
inference.ml.azure.cn - Used by managed online endpoints
Azure US Government regions:
api.ml.azure.us
notebooks.usgovcloudapi.net
instances.azureml.us
aznbcontent.net
inference.ml.azure.us - Used by managed online endpoints
) Important
Configuration steps for the DNS Server are not included here, as there are
many DNS solutions available that can be used as a custom DNS Server. Refer
to the documentation for your DNS solution for how to appropriately
configure conditional forwarding.
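As a language-neutral illustration of the decision a conditional forwarder makes (not the configuration of any specific DNS product), consider this sketch; the zone list is the Azure Public set from above, and the default resolver address is a placeholder:

```python
AZURE_DNS = "168.63.129.16"  # Azure DNS Virtual Server IP
FORWARD_ZONES = [
    "api.azureml.ms",
    "notebooks.azure.net",
    "instances.azureml.ms",
    "aznbcontent.net",
]

def forward_target(query_name: str, default_resolver: str) -> str:
    """Conditional forwarding decision: queries under an Azure Machine Learning
    zone go to the Azure DNS Virtual Server; everything else uses the
    default resolver."""
    name = query_name.rstrip(".").lower()
    if any(name == z or name.endswith("." + z) for z in FORWARD_ZONES):
        return AZURE_DNS
    return default_resolver
```

A workspace FQDN such as &lt;guid&gt;.workspace.eastus.api.azureml.ms is routed to 168.63.129.16, while unrelated names fall through to the normal resolver.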
At this point, all setup is done. Now any client that uses DNS Server for name
resolution and has a route to the Azure Machine Learning Private Endpoint can
proceed to access the workspace. The client will first start by querying DNS Server
for the address of the following FQDNs:
Azure Public regions:
<per-workspace globally-unique identifier>.workspace.<region>.api.azureml.ms
ml-<workspace-name, truncated>-<region>-<per-workspace globally-unique
identifier>.<region>.notebooks.azure.net
<managed online endpoint name>.<region>.inference.ml.azure.com - Used by
managed online endpoints
Azure China regions:
<per-workspace globally-unique identifier>.workspace.<region>.api.ml.azure.cn
ml-<workspace-name, truncated>-<region>-<per-workspace globally-unique
identifier>.<region>.notebooks.chinacloudapi.cn
<managed online endpoint name>.<region>.inference.ml.azure.cn - Used by
managed online endpoints
Azure US Government regions:
<per-workspace globally-unique identifier>.workspace.<region>.api.ml.azure.us
ml-<workspace-name, truncated>-<region>-<per-workspace globally-unique
identifier>.<region>.notebooks.usgovcloudapi.net
<managed online endpoint name>.<region>.inference.ml.azure.us - Used by
managed online endpoints
The DNS Server will resolve the FQDNs from step 4 from Azure DNS. Azure DNS
will respond with one of the domains listed in step 1.
6. DNS Server recursively resolves workspace domain CNAME record from Azure
DNS:
DNS Server will proceed to recursively resolve the CNAME received in step 5.
Because there was a conditional forwarder setup in step 3, DNS Server will send
the request to the Azure DNS Virtual Server IP address for resolution.
The corresponding records stored in the Private DNS Zones will be returned to
DNS Server, which will mean Azure DNS Virtual Server returns the IP addresses of
the Private Endpoint.
Ultimately the Custom DNS Server now returns the IP addresses of the Private
Endpoint to the client from step 4. This ensures that all traffic to the Azure Machine
Learning workspace is via the Private Endpoint.
Troubleshooting
If you cannot access the workspace from a virtual machine or jobs fail on compute
resources in the virtual network, use the following steps to identify the cause:
Navigate to the Private Endpoint to the Azure Machine Learning workspace. The
workspace FQDNs will be listed on the "Overview" tab.
Proceed to access a compute resource in the Azure Virtual Network topology. This
will likely require accessing a Virtual Machine in a Virtual Network that is peered
with the Hub Virtual Network.
Open a command prompt, shell, or PowerShell. Then for each of the workspace
FQDNs, run the following command:
The result of each nslookup should return one of the two private IP addresses on
the Private Endpoint to the Azure Machine Learning workspace. If it does not, then
there is something misconfigured in the custom DNS solution.
Possible causes:
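One quick sanity check on the address that nslookup returns: a correctly configured private endpoint should resolve to a private address, like the 10.1.0.x examples earlier in this article. A small sketch; the public address below is just an illustrative value:

```python
import ipaddress

def is_private_endpoint_address(addr: str) -> bool:
    """True when the address is private (RFC 1918 and similar ranges),
    which is what resolution through the private endpoint should yield."""
    return ipaddress.ip_address(addr).is_private

is_private_endpoint_address("10.1.0.5")     # expected for a private endpoint
is_private_endpoint_address("52.167.20.1")  # a public IP suggests DNS is
                                            # bypassing the private endpoint
```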
1. Create Private DNS Zone and link to DNS Server Virtual Network:
The first step in ensuring a Custom DNS solution works with your Azure Machine
Learning workspace is to create two Private DNS Zones rooted at the following
domains:
Azure Public regions:
privatelink.api.azureml.ms
privatelink.notebooks.azure.net
Azure China regions:
privatelink.api.ml.azure.cn
privatelink.notebooks.chinacloudapi.cn
Azure US Government regions:
privatelink.api.ml.azure.us
privatelink.notebooks.usgovcloudapi.net
7 Note
Managed online endpoints share the workspace private endpoint. If you are
manually adding DNS records to the private DNS zone
privatelink.api.azureml.ms , an A record with wildcard *.<per-workspace
globally-unique identifier>.inference.<region>.privatelink.api.azureml.ms
should be added to route all endpoints under the workspace to the private
endpoint.
Following creation of the Private DNS Zone, it needs to be linked to the DNS
Server VNet – the Virtual Network that contains the DNS Server.
7 Note
The DNS Server in the virtual network is separate from the On-premises DNS
Server.
A Private DNS Zone overrides name resolution for all names within the scope of
the root of the zone. This override applies to all Virtual Networks the Private DNS
Zone is linked to. For example, if a Private DNS Zone rooted at
privatelink.api.azureml.ms is linked to Virtual Network foo, all resources in
Virtual Network foo that attempt to resolve
bar.workspace.westus2.privatelink.api.azureml.ms will receive any record that is
listed in the privatelink.api.azureml.ms zone.
However, records listed in Private DNS Zones are only returned to devices
resolving domains using the default Azure DNS Virtual Server IP address. The
Azure DNS Virtual Server IP address is only valid within the context of a Virtual
Network. When using an on-premises DNS server, it is not able to query the Azure
DNS Virtual Server IP address to retrieve records.
2. Create private endpoint with private DNS integration targeting Private DNS
Zone linked to DNS Server Virtual Network:
The next step is to create a Private Endpoint to the Azure Machine Learning
workspace. The private endpoint targets both Private DNS Zones created in step 1.
This ensures all communication with the workspace is done via the Private
Endpoint in the Azure Machine Learning Virtual Network.
) Important
The private endpoint must have Private DNS integration enabled for this
example to function correctly.
Next, create a conditional forwarder to the Azure DNS Virtual Server. The
conditional forwarder ensures that the DNS server always queries the Azure DNS
Virtual Server IP address for FQDNs related to your workspace. This means that the
DNS Server will return the corresponding record from the Private DNS Zone.
The zones to conditionally forward are listed below. The Azure DNS Virtual Server
IP address is 168.63.129.16.
Azure Public regions:
api.azureml.ms
notebooks.azure.net
instances.azureml.ms
aznbcontent.net
inference.ml.azure.com - Used by managed online endpoints
Azure China regions:
api.ml.azure.cn
notebooks.chinacloudapi.cn
instances.azureml.cn
aznbcontent.net
inference.ml.azure.cn - Used by managed online endpoints
Azure US Government regions:
api.ml.azure.us
notebooks.usgovcloudapi.net
instances.azureml.us
aznbcontent.net
inference.ml.azure.us - Used by managed online endpoints
) Important
Configuration steps for the DNS Server are not included here, as there are
many DNS solutions available that can be used as a custom DNS Server. Refer
to the documentation for your DNS solution for how to appropriately
configure conditional forwarding.
Next, create a conditional forwarder to the DNS Server in the DNS Server Virtual
Network. This forwarder is for the zones listed in step 1. This is similar to step 3,
but, instead of forwarding to the Azure DNS Virtual Server IP address, the On-
premises DNS Server will be targeting the IP address of the DNS Server. As the On-
premises DNS Server is not in Azure, it is not able to directly resolve records in
Private DNS Zones. In this case the DNS Server proxies requests from the On-
premises DNS Server to the Azure DNS Virtual Server IP. This allows the On-
premises DNS Server to retrieve records in the Private DNS Zones linked to the
DNS Server Virtual Network.
The zones to conditionally forward are listed below. The IP addresses to forward to
are the IP addresses of your DNS Servers:
Azure Public regions:
api.azureml.ms
notebooks.azure.net
instances.azureml.ms
Azure China regions:
api.ml.azure.cn
notebooks.chinacloudapi.cn
instances.azureml.cn
Azure US Government regions:
api.ml.azure.us
notebooks.usgovcloudapi.net
instances.azureml.us
) Important
Configuration steps for the DNS Server are not included here, as there are
many DNS solutions available that can be used as a custom DNS Server. Refer
to the documentation for your DNS solution for how to appropriately
configure conditional forwarding.
At this point, all setup is done. Any client that uses on-premises DNS Server for
name resolution, and has a route to the Azure Machine Learning Private Endpoint,
can proceed to access the workspace.
The client will first start by querying On-premises DNS Server for the address of
the following FQDNs:
Azure Public regions:
<per-workspace globally-unique identifier>.workspace.<region>.api.azureml.ms
ml-<workspace-name, truncated>-<region>-<per-workspace globally-unique
identifier>.<region>.notebooks.azure.net
<managed online endpoint name>.<region>.inference.ml.azure.com - Used by
managed online endpoints
Azure US Government regions:
<per-workspace globally-unique identifier>.workspace.<region>.api.ml.azure.us
ml-<workspace-name, truncated>-<region>-<per-workspace globally-unique
identifier>.<region>.notebooks.usgovcloudapi.net
<managed online endpoint name>.<region>.inference.ml.azure.us - Used by
managed online endpoints
The on-premises DNS Server will resolve the FQDNs from step 5 from the DNS
Server. Because there is a conditional forwarder (step 4), the on-premises DNS
Server will send the request to the DNS Server for resolution.
The DNS server will resolve the FQDNs from step 5 from the Azure DNS. Azure
DNS will respond with one of the domains listed in step 1.
On-premises DNS Server will proceed to recursively resolve the CNAME received in
step 7. Because there was a conditional forwarder setup in step 4, On-premises
DNS Server will send the request to DNS Server for resolution.
9. DNS Server recursively resolves workspace domain CNAME record from Azure
DNS:
DNS Server will proceed to recursively resolve the CNAME received in step 7.
Because there was a conditional forwarder setup in step 3, DNS Server will send
the request to the Azure DNS Virtual Server IP address for resolution.
The corresponding records stored in the Private DNS Zones will be returned to
DNS Server, which will mean the Azure DNS Virtual Server returns the IP addresses
of the Private Endpoint.
11. On-premises DNS Server resolves workspace domain name to private endpoint
address:
The query from On-premises DNS Server to DNS Server in step 8 ultimately returns
the IP addresses associated with the Private Endpoint to the Azure Machine
Learning workspace. These IP addresses are returned to the original client, which
will now communicate with the Azure Machine Learning workspace over the
Private Endpoint configured in step 1.
) Important
If a VPN gateway is used in this setup along with custom DNS server IPs on
the VNet, the Azure DNS IP (168.63.129.16) must also be added to the list
to maintain uninterrupted communication.
) Important
The hosts file only overrides name resolution for the local computer. If you want to
use a hosts file with multiple computers, you must modify it individually on each
computer.
Linux /etc/hosts
macOS /etc/hosts
Windows %SystemRoot%\System32\drivers\etc\hosts
Tip
The name of the file is hosts with no extension. When editing the file, use
administrator access. For example, on Linux or macOS you might use sudo vi . On
Windows, run notepad as an administrator.
The following is an example of hosts file entries for Azure Machine Learning:
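A hedged illustration only, reusing the example FQDNs and private IPs from earlier in this article; your GUID, region, and addresses will differ, and my-endpoint is a placeholder. Because hosts files don't support wildcards, each managed online endpoint needs its own entry:

```
10.1.0.5    fb7e20a0-8891-458b-b969-55ddb3382f51.workspace.eastus.api.azureml.ms
10.1.0.5    fb7e20a0-8891-458b-b969-55ddb3382f51.workspace.eastus.cert.api.azureml.ms
10.1.0.6    ml-myworkspace-eastus-fb7e20a0-8891-458b-b969-55ddb3382f51.eastus.notebooks.azure.net
10.1.0.7    my-endpoint.eastus.inference.ml.azure.com
```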
7 Note
For more information on the services and DNS resolution, see Azure Private Endpoint
DNS configuration.
Troubleshooting
If, after running through the preceding steps, you're unable to access the workspace
from a virtual machine, or jobs fail on compute resources in the Virtual Network
containing the Private Endpoint to the Azure Machine Learning workspace, use the
following steps to identify the cause:
Navigate to the Private Endpoint to the Azure Machine Learning workspace. The
workspace FQDNs will be listed on the "Overview" tab.
Proceed to access a compute resource in the Azure Virtual Network topology. This
will likely require accessing a Virtual Machine in a Virtual Network that is peered
with the Hub Virtual Network.
Open a command prompt, shell, or PowerShell. Then for each of the workspace
FQDNs, run the following command:
The result of each nslookup should yield one of the two private IP addresses on
the Private Endpoint to the Azure Machine Learning workspace. If it does not, then
there is something misconfigured in the custom DNS solution.
Possible causes:
Next steps
This article is part of a series on securing an Azure Machine Learning workflow. See the
other articles in this series:
For information on integrating Private Endpoints into your DNS configuration, see Azure
Private Endpoint DNS configuration.
Configure inbound and outbound
network traffic
Article • 04/14/2023
Azure Machine Learning requires access to servers and services on the public internet.
When implementing network isolation, you need to understand what access is required
and how to enable it.
7 Note
Azure service tags: A service tag is an easy way to specify the IP ranges used by an
Azure service. For example, the AzureMachineLearning tag represents the IP
addresses used by the Azure Machine Learning service.
) Important
Azure service tags are only supported by some Azure services. For a list of
service tags supported with network security groups and Azure Firewall, see
the Virtual network service tags article.
If you are using a non-Azure solution such as a third-party firewall, download a
list of Azure IP Ranges and Service Tags . Extract the file and search for the
service tag within the file. The IP addresses may change periodically.
Region: Some service tags allow you to specify an Azure region. This limits access
to the service IP addresses in a specific region, usually the one that your service is
in. In this article, when you see <region> , substitute your Azure region. For
example, BatchNodeManagement.<region> would be BatchNodeManagement.westus if
your Azure Machine Learning workspace is in the US West region.
Azure Batch: Azure Machine Learning compute clusters and compute instances
rely on a back-end Azure Batch instance. This back-end service is hosted in a
Microsoft subscription.
Ports: The following ports are used in this article. If a port range isn't listed in this
table, it's specific to the service and may not have any published information on
what it's used for:
Port 445: SMB traffic used to access file shares in Azure File storage.
Port 18881: Used to connect to the language server to enable IntelliSense for
notebooks on a compute instance.
Protocol: Unless noted otherwise, all network traffic mentioned in this article uses
TCP.
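Where TCP connectivity on one of these ports is in doubt, a quick standard-library check can confirm reachability from a given machine. This is a generic sketch, not part of the article's official guidance, and the host shown is only an example:

```python
import socket

def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers DNS failures, refusals, and timeouts
        return False

# Example: is HTTPS (443) reachable from this machine?
print(port_open("login.microsoftonline.com", 443))
```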
Basic configuration
This configuration makes the following assumptions:
You're using Docker images provided by a container registry that you provide, and
won't be using images provided by Microsoft.
You're using a private Python package repository, and won't be accessing public
package repositories such as pypi.org , *.anaconda.com , or *.anaconda.org .
The private endpoints can communicate directly with each other within the VNet.
For example, all services have a private endpoint in the same VNet:
Azure Machine Learning workspace
Azure Storage Account (blob, file, table, queue)
Inbound traffic
A network security group (NSG) is created by default for this traffic. For more
information, see Default security rules.
Outbound traffic
Storage.<region> (port 443): Access data stored in the Azure Storage Account for
compute cluster and compute instance. This outbound can be used to exfiltrate
data. For more information, see Data exfiltration protection.
AzureFrontDoor.FrontEnd (port 443; not needed in Azure China): Global entry point
for Azure Machine Learning studio. Store images and environments for AutoML.
) Important
Azure Virtual Network NAT with a public IP: For more information on using
Virtual Network NAT, see the Virtual Network NAT documentation.
User-defined route and firewall: Create a user-defined route in the subnet
that contains the compute. The Next hop for the route should reference the
private IP address of the firewall, with an address prefix of 0.0.0.0/0.
For more information, see the Default outbound access in Azure article.
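As a quick illustration of why an address prefix of 0.0.0.0/0 catches all traffic, the following standard-library check (with arbitrary example addresses) shows that every IPv4 address falls inside that prefix:

```python
import ipaddress

# 0.0.0.0/0 matches every IPv4 address, so a route with this prefix
# becomes the default: any traffic not matched by a more specific
# route is sent to the next hop (here, the firewall's private IP).
default_route = ipaddress.ip_network("0.0.0.0/0")
compute_subnet = ipaddress.ip_network("10.0.2.0/24")  # example subnet

for addr in ("10.0.2.4", "52.168.1.1"):
    ip = ipaddress.ip_address(addr)
    print(addr, ip in default_route, ip in compute_subnet)
```

Because Azure routing uses longest-prefix match, the 0.0.0.0/0 route only applies when no more specific route matches.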
MicrosoftContainerRegistry.<region> and AzureFrontDoor.FirstParty (port 443):
Allows use of Docker images that Microsoft provides for training and inference.
Also sets up the Azure Machine Learning router for Azure Kubernetes Service.
To allow installation of Python packages for training and deployment, allow outbound
traffic to the following host names:
7 Note
This is not a complete list of the hosts required for all Python resources on the
internet, only the most commonly used. For example, if you need access to a
GitHub repository or other host, you must identify and add the required hosts for
that scenario.
pypi.org Used to list dependencies from the default index, if any, and the index isn't
overwritten by user settings. If the index is overwritten, you must also allow
*.pythonhosted.org .
Name: AllowRStudioInstall
Source Type: IP Address
Source IP Addresses: The IP address range of the subnet where you will create the
compute instance. For example, 172.16.0.0/24 .
Destination Type: FQDN
Target FQDN: ghcr.io , pkg-containers.githubusercontent.com
Protocol: Https:443
7 Note
If you need access to a GitHub repository or other host, you must identify and add
the required hosts for that scenario.
) Important
A compute instance or compute cluster without a public IP does not need inbound
traffic from Azure Batch management and Azure Machine Learning services.
However, if you have multiple computes and some of them use a public IP address,
you will need to allow this traffic.
When using Azure Machine Learning compute instance or compute cluster (with a
public IP address), allow inbound traffic from the Azure Machine Learning service. A
compute instance or compute cluster with no public IP (preview) doesn't require this
inbound communication. A Network Security Group allowing this traffic is dynamically
created for you; however, you may also need to create user-defined routes (UDR) if you
have a firewall. When creating a UDR for this traffic, you can use either IP addresses or
service tags to route the traffic.
IP Address routes
For the Azure Machine Learning service, you must add the IP addresses of both the
primary and secondary regions. To find the secondary region, see Cross-region
replication in Azure. For example, if your Azure Machine Learning service is in East
US 2, the secondary region is Central US.
To get a list of IP addresses of the Azure Machine Learning service, download the
Azure IP Ranges and Service Tags and search the file for AzureMachineLearning.
<region> , where <region> is your Azure region.
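Searching the downloaded file can be scripted. A minimal sketch, assuming the published file's top-level layout (a "values" array of tags, each with "name" and "properties.addressPrefixes"); the file name and tag shown are example values:

```python
import json

def service_tag_prefixes(path: str, tag_name: str) -> list:
    """Return the addressPrefixes listed for a named service tag."""
    with open(path) as f:
        doc = json.load(f)
    for tag in doc.get("values", []):
        if tag.get("name", "").lower() == tag_name.lower():
            return tag["properties"]["addressPrefixes"]
    return []

# Hypothetical usage once the file is downloaded:
# service_tag_prefixes("ServiceTags_Public.json", "AzureMachineLearning.westus2")
```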
) Important
When creating the UDR, set the Next hop type to Internet. This means the inbound
communication from Azure skips your firewall to access the load balancers with
public IPs of the compute instance and compute cluster. The UDR is required because
compute instances and compute clusters get random public IPs at creation, so you
cannot know the public IPs before creation in order to register them on your
firewall and allow the inbound traffic from Azure to those specific IPs. The
following image shows an example IP address-based UDR in the Azure portal:
For information on configuring UDR, see Route network traffic with a routing table.
For more information on the hbi_workspace flag, see the data encryption article.
For Kubernetes with Azure Arc connection, configure the Azure Arc network
requirements needed by Azure Arc agents.
For AKS cluster without Azure Arc connection, configure the AKS extension
network requirements.
In addition to the above requirements, the following outbound URLs are also required
for Azure Machine Learning:
7 Note
Replace <your workspace ID> with your workspace ID. The ID can be found in
the Azure portal - your Machine Learning resource page - Properties -
Workspace ID.
Replace <your storage account> with the storage account name.
Replace <your ACR name> with the name of the Azure Container Registry for
your workspace.
Replace <region> with the region of your workspace.
In-cluster communication requirements
To install the Azure Machine Learning extension on Kubernetes compute, all Azure
Machine Learning related components are deployed in the azureml namespace. The
following in-cluster communication is needed to ensure the ML workloads work well in
the AKS cluster.
If the cluster is used for real-time inferencing, the azureml-fe-xxx pods should be
able to communicate with the deployed model pods on port 5001 in other namespaces.
The azureml-fe-xxx pods should open ports 11001, 12001, 12101, 12201, 20000, 8000, 8001,
7 Note
This is not a complete list of the hosts required for all Visual Studio Code resources
on the internet, only the most commonly used. For example, if you need access to a
GitHub repository or other host, you must identify and add the required hosts for
that scenario.
If not configured correctly, the firewall can cause problems using your workspace. There
are various host names that are used by the Azure Machine Learning workspace.
The following sections list the hosts that are required for Azure Machine Learning.
Dependencies API
You can also use the Azure Machine Learning REST API to get a list of hosts and ports
that you must allow outbound traffic to. To use this API, use the following steps:
1. Get an authentication token and your subscription ID:
Azure CLI
TOKEN=$(az account get-access-token --query accessToken -o tsv)
SUBSCRIPTION=$(az account show --query id -o tsv)
2. Call the API. In the following command, replace the following values:
Replace <region> with the Azure region your workspace is in. For example,
westus2 .
Azure CLI
The result of the API call is a JSON document. The following snippet is an excerpt of this
document:
JSON
{
"value": [
{
"properties": {
"category": "Azure Active Directory",
"endpoints": [
{
"domainName": "login.microsoftonline.com",
"endpointDetails": [
{
"port": 80
},
{
"port": 443
}
]
}
]
}
},
{
"properties": {
"category": "Azure portal",
"endpoints": [
{
"domainName": "management.azure.com",
"endpointDetails": [
{
"port": 443
}
]
}
]
}
},
...
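To turn the response into firewall rules, it can help to flatten the nested structure into (category, host, port) tuples. A standard-library-only sketch over the excerpt above:

```python
import json

# Parse the excerpt of the Dependencies API response shown above.
doc = json.loads("""
{"value": [{"properties": {"category": "Azure Active Directory",
  "endpoints": [{"domainName": "login.microsoftonline.com",
    "endpointDetails": [{"port": 80}, {"port": 443}]}]}}]}
""")

# Flatten into (category, host, port) tuples, one per required rule.
rules = [
    (item["properties"]["category"], ep["domainName"], d["port"])
    for item in doc["value"]
    for ep in item["properties"]["endpoints"]
    for d in ep["endpointDetails"]
]
print(rules)
# → [('Azure Active Directory', 'login.microsoftonline.com', 80),
#    ('Azure Active Directory', 'login.microsoftonline.com', 443)]
```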
Microsoft hosts
The hosts in the following tables are owned by Microsoft, and provide services required
for the proper functioning of your workspace. The tables list hosts for the Azure public,
Azure Government, and Azure China 21Vianet regions.
) Important
Azure Machine Learning uses Azure Storage Accounts in your subscription and in
Microsoft-managed subscriptions. Where applicable, the following terms are used
to differentiate between them in this section:
Azure public
) Important
In the following table, replace <storage> with the name of the default storage
account for your Azure Machine Learning workspace. Replace <region> with the
region of your workspace.
Azure public
AutoML NLP and Vision are currently supported only in Azure public regions.
Tip
The host for Azure Key Vault is only needed if your workspace was created
with the hbi_workspace flag enabled.
Ports 8787 and 18881 for compute instance are only needed when your
Azure Machine Learning workspace has a private endpoint.
In the following table, replace <storage> with the name of the default storage
account for your Azure Machine Learning workspace.
In the following table, replace <region> with the Azure region that contains
your Azure Machine Learning workspace.
Websocket communication must be allowed to the compute instance. If you
block websocket traffic, Jupyter notebooks won't work correctly.
Azure public
Tip
Azure Container Registry is required for any custom Docker image. This
includes small modifications (such as additional packages) to base images
provided by Microsoft. It is also required by the internal training job
submission process of Azure Machine Learning.
Microsoft Container Registry is only needed if you plan on using the default
Docker images provided by Microsoft, and enabling user-managed
dependencies.
If you plan on using federated identity, follow the Best practices for securing
Active Directory Federation Services article.
Also, use the information in the compute with public IP section to add IP addresses for
BatchNodeManagement and AzureMachineLearning .
For information on restricting access to models deployed to AKS, see Restrict egress
traffic in Azure Kubernetes Service.
If you haven't secured Azure Monitor for the workspace, you must allow outbound
traffic to the following hosts:
7 Note
The information logged to these hosts is also used by Microsoft Support to
diagnose any problems you run into with your workspace.
dc.applicationinsights.azure.com
dc.applicationinsights.microsoft.com
dc.services.visualstudio.com
*.in.applicationinsights.azure.com
For a list of IP addresses for these hosts, see IP addresses used by Azure Monitor.
Next steps
This article is part of a series on securing an Azure Machine Learning workflow. See the
other articles in this series:
For more information on configuring Azure Firewall, see Tutorial: Deploy and configure
Azure Firewall using the Azure portal.
Azure Machine Learning data
exfiltration prevention
Article • 05/23/2023
Azure Machine Learning has several inbound and outbound dependencies. Some of
these dependencies can expose a data exfiltration risk by malicious agents within your
organization. This document explains how to minimize data exfiltration risk by limiting
inbound and outbound requirements.
Inbound: If your compute instance or cluster uses a public IP address, there is an
inbound requirement on the AzureMachineLearning service tag (port 44224). You can
control this inbound traffic by using a network security group (NSG) and service
tags. It's difficult to disguise Azure service IPs, so there's low data exfiltration
risk. You can also configure the compute to not use a public IP, which removes the
inbound requirements.
automlresources-prod.azureedge.net
Tip
The information in this article is primarily about using an Azure Virtual Network.
Azure Machine Learning can also use a managed virtual network (preview). With a
managed virtual network, Azure Machine Learning handles the job of network
isolation for your workspace and managed computes.
Prerequisites
An Azure subscription
An Azure Virtual Network (VNet)
An Azure Machine Learning workspace with a private endpoint that connects to
the VNet.
The storage account used by the workspace must also connect to the VNet
using a private endpoint.
You need to recreate your compute instance or scale down your compute cluster to
zero nodes.
Not required if you have joined the preview.
Not required if you have a new compute instance or compute cluster created
after December 2022.
Service: Microsoft.Storage
Scope: Select the scope as Single account to limit the network traffic to
one storage account.
Subscription: The Azure subscription that contains the storage account.
Resource group: The resource group that contains the storage account.
Resource: The default storage account of your workspace.
7 Note
The Azure CLI and Azure PowerShell do not provide support for adding an
alias to the policy.
) Important
If your compute instance and compute cluster need access to additional storage
accounts, your service endpoint policy should include the additional storage
accounts in the resources section. Note that it is not required if you use Storage
private endpoints. Service endpoint policy and private endpoint are independent.
2. Allow inbound and outbound network traffic
Inbound
) Important
The following information modifies the guidance provided in the How to secure
training environment article.
When using Azure Machine Learning compute instance with a public IP address, allow
inbound traffic from Azure Batch management (service tag BatchNodeManagement.
<region> ). A compute instance with no public IP doesn't require this inbound
communication.
Outbound
) Important
Service tag/NSG
Allow outbound traffic to the following service tags. Replace <region> with the
Azure region that contains your compute cluster or instance:
7 Note
For the storage outbound, a Service Endpoint Policy will be applied in a later
step to limit outbound traffic.
For more information, see How to secure training environments and Configure inbound
and outbound network traffic.
4. Curated environments
When using Azure Machine Learning curated environments, make sure to use the latest
environment version. The container registry for the environment must also be
mcr.microsoft.com . To check the container registry, use the following steps:
1. From Azure Machine Learning studio , select your workspace and then select
Environments.
2. Verify that the Azure container registry begins with a value of mcr.microsoft.com .
) Important
Service tag/NSG
Allow outbound traffic over TCP port 443 to the following service tags.
Replace <region> with the Azure region that contains your compute cluster or
instance.
MicrosoftContainerRegistry.<region>
AzureFrontDoor.FirstParty
Next steps
For more information, see the following articles:
Both Azure Machine Learning and Azure Databricks can be secured by using a VNet to
restrict incoming and outgoing network communication. When both services are
configured to use a VNet, you can use a private endpoint to allow Azure Machine
Learning to attach Azure Databricks as a compute resource.
The information in this article assumes that your Azure Machine Learning workspace
and Azure Databricks are configured for two separate Azure Virtual Networks. To enable
communication between the two services, Azure Private Link is used. A private endpoint
for each service is created in the VNet for the other service. A private endpoint for Azure
Machine Learning is added to communicate with the VNet used by Azure Databricks. A
private endpoint for Azure Databricks is added to communicate with the VNet used by
Azure Machine Learning.
Azure Databricks
Computes Spark
Prerequisites
An Azure Machine Learning workspace that is configured for network isolation.
) Important
Azure Databricks requires two subnets (sometimes called the private and
public subnet). Both of these subnets are delegated, and cannot be used by
the Azure Machine Learning workspace when creating a private endpoint. We
recommend adding a third subnet to the VNet used by Azure Databricks and
using this subnet for the private endpoint.
The VNets used by Azure Machine Learning and Azure Databricks must use a
different set of IP address ranges.
Limitations
Scenarios where the Azure Machine Learning control plane needs to communicate with
the Azure Databricks control plane are not supported. Currently, the only scenario we
have identified where this is a problem is when using the DatabricksStep in a machine
learning pipeline. To work around this limitation, allow public access to your workspace.
This can be done either by using a workspace that isn't configured with a private link
or by using a workspace with a private link that is configured to allow public access.
1. From the Azure portal , select your Azure Machine Learning workspace.
2. From the sidebar, select Networking, Private endpoint connections, and then +
Private endpoint.
3. From the Create a private endpoint form, enter a name for the new private
endpoint. Adjust the other values as needed by your scenario.
4. Select Next until you arrive at the Virtual Network tab. Select the Virtual network
that is used by Azure Databricks, and the Subnet to connect to using the private
endpoint.
5. Select Next until you can select Create to create the resource.
2. From the sidebar, select Networking, Private endpoint connections, and then +
Private endpoint.
3. From the Create a private endpoint form, enter a name for the new private
endpoint. Adjust the other values as needed by your scenario.
4. Select Next until you arrive at the Virtual Network tab. Select the Virtual network
that is used by Azure Machine Learning, and the Subnet to connect to using the
private endpoint.
Compute name: The name of the compute you're adding. This value can be
different than the name of your Azure Databricks workspace.
Subscription: The subscription that contains the Azure Databricks workspace.
Databricks workspace: The Azure Databricks workspace that you're attaching.
Databricks access token: For information on generating a token, see Azure
Databricks personal access tokens.
To maximize your uptime, plan ahead to maintain business continuity and prepare for
disaster recovery with Azure Machine Learning.
Microsoft strives to ensure that Azure services are always available. However, unplanned
service outages may occur. We recommend having a disaster recovery plan in place for
handling regional service outages. In this article, you'll learn how to:
) Important
Azure Machine Learning itself does not provide automatic failover or disaster
recovery. Backup and restore of workspace metadata such as run history is
unavailable.
Other data stores: Azure Machine Learning can mount other data stores such as
Azure Storage, Azure Data Lake Storage, and Azure SQL Database for training data.
These data stores are provisioned within your subscription. You're responsible for
configuring their high-availability settings.
The following table shows which Azure services are managed by Microsoft and which are
managed by you. It also indicates the services that are highly available by default.
Associated resources
Compute resources
The rest of this article describes the actions you need to take to make each of these
services highly available.
Regional availability: Use regions that are close to your users. To check regional
availability for Azure Machine Learning, see Azure products by region .
Azure paired regions: Paired regions coordinate platform updates and prioritize
recovery efforts where needed. For more information, see Azure paired regions.
Service availability: Decide whether the resources used by your solution should be
hot/hot, hot/warm, or hot/cold.
Hot/hot: Both regions are active at the same time, with one region ready to
begin use immediately.
Hot/warm: Primary region active, secondary region has critical resources (for
example, deployed models) ready to start. Non-critical resources would need to
be manually deployed in the secondary region.
Hot/cold: Primary region active, secondary region has Azure Machine Learning
and other resources deployed, along with needed data. Resources such as
models, model deployments, or pipelines would need to be manually deployed.
Tip
Depending on your business requirements, you may decide to treat different Azure
Machine Learning resources differently. For example, you may want to use hot/hot
for deployed models (inference), and hot/cold for experiments (training).
Azure Machine Learning builds on top of other services. Some services can be
configured to replicate to other regions. Others you must manually create in multiple
regions. The following table provides a list of services, who is responsible for replication,
and an overview of the configuration:
Machine Learning compute (managed by you): Create the compute resources in the
selected regions. For compute resources that can dynamically scale, make sure that
both regions provide sufficient compute quota for your needs.
Key Vault (managed by Microsoft): Use the same Key Vault instance with the Azure
Machine Learning workspace and resources in both regions. Key Vault automatically
fails over to a secondary region. For more information, see Azure Key Vault
availability and redundancy.
Storage Account (managed by you): Azure Machine Learning does not support default
storage-account failover using geo-redundant storage (GRS), geo-zone-redundant
storage (GZRS), read-access geo-redundant storage (RA-GRS), or read-access
geo-zone-redundant storage (RA-GZRS). Create a separate storage account for the
default storage of each workspace. Create separate storage accounts or services
for other data storage. For more information, see Azure Storage redundancy.
Application Insights (managed by you): Create Application Insights for the
workspace in both regions. To adjust the data-retention period and details, see
Data collection, retention, and storage in Application Insights.
To enable fast recovery and restart in the secondary region, we recommend the
following development practices:
Compute resources
Azure Kubernetes Service: See Best practices for business continuity and disaster
recovery in Azure Kubernetes Service (AKS) and Create an Azure Kubernetes
Service (AKS) cluster that uses availability zones. If the AKS cluster was created by
using the Azure Machine Learning Studio, SDK, or CLI, cross-region high availability
is not supported.
Azure Databricks: See Regional disaster recovery for Azure Databricks clusters.
Container Instances: An orchestrator is responsible for failover. See Azure
Container Instances and container orchestrators.
HDInsight: See High availability services supported by Azure HDInsight.
Data services
Azure Blob container / Azure Files / Data Lake Storage Gen2: See Azure Storage
redundancy.
Data Lake Storage Gen1: See High availability and disaster recovery guidance for
Data Lake Storage Gen1.
SQL Database: See High availability for Azure SQL Database and SQL Managed
Instance.
Azure Database for PostgreSQL: See High availability concepts in Azure Database
for PostgreSQL - Single Server.
Azure Database for MySQL: See Understand business continuity in Azure Database
for MySQL.
Azure Databricks File System: See Regional disaster recovery for Azure Databricks
clusters.
Tip
Attach the same storage instances as datastores to the primary and secondary
workspaces.
Make use of geo-replication for data storage accounts and maximize your uptime.
7 Note
Backup and restore of workspace metadata such as run history, models, and
environments is unavailable. Specifying assets and configurations as code using
YAML specs will help you re-create assets across workspaces in case of a disaster.
Jobs in Azure Machine Learning are defined by a job specification. This specification
includes dependencies on input artifacts that are managed on a workspace-instance
level, including environments, datasets, and compute. For multi-region job submission
and deployments, we recommend the following practices:
7 Note
Initiate a failover
7 Note
Any jobs that are running when a service outage occurs will not automatically
transition to the secondary workspace. It is also unlikely that the jobs will resume
and finish successfully in the primary workspace once the outage is resolved.
Instead, these jobs must be resubmitted, either in the secondary workspace or in
the primary (once the outage is resolved).
The following artifacts can be exported and imported between workspaces by using the
Azure CLI extension for machine learning:
Azure Machine Learning pipelines (code-generated): export with az ml pipeline get
--path {PATH} ; import with az ml pipeline create --name {NAME} -y {PATH}
Tip
Recovery options
Workspace deletion
If you accidentally deleted your workspace, you might be able to recover it. For recovery
steps, see Recover workspace data after accidental deletion with soft delete (Preview).
Even if your workspace cannot be recovered, you may still be able to retrieve your
notebooks from the workspace-associated Azure storage resource by following these
steps:
In the Azure portal, navigate to the storage account that was linked to the
deleted Azure Machine Learning workspace.
In the Data storage section on the left, click on File shares.
Your notebooks are located on the file share with the name that contains your
workspace ID.
Next steps
To learn about repeatable infrastructure deployments with Azure Machine Learning, use
an Azure Resource Manager template.
Regenerate storage account access keys
Article • 06/02/2023
Learn how to change the access keys for Azure Storage accounts used by Azure Machine
Learning. Azure Machine Learning can use storage accounts to store data or trained
models.
For security purposes, you may need to change the access keys for an Azure Storage
account. When you regenerate the access key, Azure Machine Learning must be
updated to use the new key. Azure Machine Learning may be using the storage account
for both model storage and as a datastore.
) Important
Credentials registered with datastores are saved in your Azure Key Vault associated
with the workspace. If you have soft-delete enabled for your Key Vault, this article
provides instructions for updating credentials. If you unregister the datastore and
try to re-register it under the same name, this action will fail. See Turn on Soft
Delete for an existing key vault for how to enable soft delete in this scenario.
Prerequisites
An Azure Machine Learning workspace. For more information, see the Create
workspace resources article.
7 Note
The code snippets in this document were tested with version 1.0.83 of the Python
SDK.
) Important
Update the workspace using the Azure CLI, and the datastores using Python, at the
same time. Updating only one or the other is not sufficient, and may cause errors
until both are updated.
To discover the storage accounts that are used by your datastores, use the following
code:
Python

import azureml.core
from azureml.core import Workspace, Datastore

ws = Workspace.from_config()

default_ds = ws.get_default_datastore()
print("Default datastore: " + default_ds.name + ", storage account name: " +
      default_ds.account_name + ", container name: " + default_ds.container_name)

datastores = ws.datastores
for name, ds in datastores.items():
    if ds.datastore_type == "AzureBlob":
        print("Blob store - datastore name: " + name + ", storage account name: " +
              ds.account_name + ", container name: " + ds.container_name)
    if ds.datastore_type == "AzureFile":
        print("File share - datastore name: " + name + ", storage account name: " +
              ds.account_name + ", container name: " + ds.container_name)
This code looks for any registered datastores that use Azure Storage and lists the
following information:
Datastore name: The name of the datastore that the storage account is registered
under.
Storage account name: The name of the Azure Storage account.
Container: The container in the storage account that is used by this registration.
It also indicates whether the datastore is for an Azure Blob or an Azure File share, as
there are different methods to re-register each type of datastore.
If an entry exists for the storage account that you plan on regenerating access keys for,
save the datastore name, storage account name, and container name.
) Important
Perform all steps, updating both the workspace using the CLI, and datastores using
Python. Updating only one or the other may cause errors until both are updated.
1. Regenerate the key. For information on regenerating an access key, see Manage
storage account access keys. Save the new key.
2. The Azure Machine Learning workspace will automatically synchronize the new key
and begin using it after an hour. To force the workspace to sync to the new key
immediately, use the following steps:
a. Sign in to the Azure subscription that contains your workspace by using the
following Azure CLI command:
Azure CLI
az login
Tip
After logging in, you see a list of subscriptions associated with your Azure
account. The subscription information with isDefault: true is the currently
activated subscription for Azure CLI commands. This subscription must be
the same one that contains your Azure Machine Learning workspace. You
can find the subscription ID from the Azure portal by visiting the
overview page for your workspace. You can also use the SDK to get the
subscription ID from the workspace object. For example,
Workspace.from_config().subscription_id .
To select another subscription, use the az account set -s <subscription
name or ID> command and specify the subscription name or ID to switch
to. For more information about subscription selection, see Use multiple
Azure Subscriptions.
b. To update the workspace to use the new key, use the following command.
Replace myworkspace with your Azure Machine Learning workspace name, and
replace myresourcegroup with the name of the Azure resource group that
contains the workspace.
Azure CLI
This command automatically syncs the new keys for the Azure storage account
used by the workspace.
3. You can re-register datastore(s) that use the storage account via the SDK or the
Azure Machine Learning studio .
a. To re-register datastores via the Python SDK, use the values from the What
needs to be updated section and the key from step 1 with the following code.
Python
v. Use your new access key from step 1 to populate the form and click Save.
If you are updating credentials for your default datastore, complete this step
and repeat step 2b to resync your new key with the default datastore of the
workspace.
Next steps
For more information on registering datastores, see the Datastore class reference.
Manage Azure Machine Learning
workspaces using Azure CLI extension v1
Article • 05/30/2023
) Important
The Azure CLI commands in this article require the azure-cli-ml , or v1, extension
for Azure Machine Learning. Support for the v1 extension will end on September
30, 2025. You will be able to install and use the v1 extension until that date.
In this article, you learn how to create and manage Azure Machine Learning workspaces
using the Azure CLI. The Azure CLI provides commands for managing Azure resources
and is designed to get you working quickly with Azure, with an emphasis on
automation. The machine learning extension for the CLI provides commands for working
with Azure Machine Learning resources.
Prerequisites
An Azure subscription. If you don't have one, try the free or paid version of Azure
Machine Learning .
To use the CLI commands in this document from your local environment, you
need the Azure CLI.
If you use the Azure Cloud Shell , the CLI is accessed through the browser and
lives in the cloud.
Limitations
When creating a new workspace, you can either automatically create services
needed by the workspace or use existing services. If you want to use existing
services from a different Azure subscription than the workspace, you must
register the Azure Machine Learning namespace in the subscription that contains
those services. For example, if you create a workspace in subscription A that uses a
storage account from subscription B, the Azure Machine Learning namespace must
be registered in subscription B before you can use the storage account with the
workspace.
Tip
An Azure Application Insights instance is created when you create the workspace.
You can delete the Application Insights instance after cluster creation if you want.
Deleting it limits the information gathered from the workspace, and may make it
more difficult to troubleshoot problems. If you delete the Application Insights
instance created by the workspace, you cannot re-create it without deleting and
recreating the workspace.
For more information on using this Application Insights instance, see Monitor and
collect data from Machine Learning web service endpoints.
With the Azure Machine Learning CLI extension v1 ( azure-cli-ml ), only some of the
commands communicate with the Azure Resource Manager. Specifically, commands that
create, update, delete, list, or show Azure resources. Operations such as submitting a
training job communicate directly with the Azure Machine Learning workspace. If your
workspace is secured with a private endpoint, that is enough to secure commands
provided by the azure-cli-ml extension.
Connect the CLI to your Azure subscription
) Important
If you are using the Azure Cloud Shell, you can skip this section. The Cloud Shell
automatically authenticates you using the account you signed in to your Azure
subscription with.
There are several ways that you can authenticate to your Azure subscription from the
CLI. The simplest is to authenticate interactively using a browser. To authenticate
interactively, open a command line or terminal and use the following command:
Azure CLI
az login
If the CLI can open your default browser, it will do so and load a sign-in page.
Otherwise, you need to open a browser and follow the instructions on the command
line. The instructions involve browsing to https://aka.ms/devicelogin and entering an
authorization code.
Tip
After logging in, you see a list of subscriptions associated with your Azure account.
The subscription information with isDefault: true is the currently activated
subscription for Azure CLI commands. This subscription must be the same one that
contains your Azure Machine Learning workspace. You can find the subscription ID
from the Azure portal by visiting the overview page for your workspace. You can
also use the SDK to get the subscription ID from the workspace object. For
example, Workspace.from_config().subscription_id .
7 Note
You should select a region where Azure Machine Learning is available. For
information, see Products available by region .
Azure CLI
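A resource group creation command consistent with the JSON response below, with placeholder name and location values:

```shell
# Create the resource group that will contain the workspace
# (both values are placeholders)
az group create --name <resource-group-name> --location <location>
```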
The response from this command is similar to the following JSON. You can use the
output values to locate the created resources or parse them as input to subsequent CLI
steps for automation.
JSON
{
"id": "/subscriptions/<subscription-
GUID>/resourceGroups/<resourcegroupname>",
"location": "<location>",
"managedBy": null,
"name": "<resource-group-name>",
"properties": {
"provisioningState": "Succeeded"
},
"tags": null,
"type": null
}
Create a workspace
When you deploy an Azure Machine Learning workspace, various other services are
required as dependent associated resources. When you use the CLI to create the
workspace, the CLI can either create new associated resources on your behalf or
attach existing resources that you provide.
) Important
When attaching your own storage account, make sure that it meets the following
criteria:
When attaching Azure container registry, you must have the admin account
enabled before it can be used with an Azure Machine Learning workspace.
To create a new workspace where the services are automatically created, use the
following command:
Azure CLI
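A representative v1 create command, with placeholder workspace and resource group names:

```shell
# Create a workspace; dependent resources (storage account, key vault,
# Application Insights, container registry) are created automatically
az ml workspace create -w <workspace-name> -g <resource-group-name>
```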
) Important
When attaching existing resources, you don't have to specify all of them; you can
specify one or more. For example, you can specify an existing storage account and
the workspace will create the other resources.
The output of the workspace creation command is similar to the following JSON. You
can use the output values to locate the created resources or parse them as input to
subsequent CLI steps.
JSON
{
"applicationInsights": "/subscriptions/<service-
GUID>/resourcegroups/<resource-group-
name>/providers/microsoft.insights/components/<application-insight-name>",
"containerRegistry": "/subscriptions/<service-
GUID>/resourcegroups/<resource-group-
name>/providers/microsoft.containerregistry/registries/<acr-name>",
"creationTime": "2019-08-30T20:24:19.6984254+00:00",
"description": "",
"friendlyName": "<workspace-name>",
"id": "/subscriptions/<service-GUID>/resourceGroups/<resource-group-
name>/providers/Microsoft.MachineLearningServices/workspaces/<workspace-
name>",
"identityPrincipalId": "<GUID>",
"identityTenantId": "<GUID>",
"identityType": "SystemAssigned",
"keyVault": "/subscriptions/<service-GUID>/resourcegroups/<resource-group-
name>/providers/microsoft.keyvault/vaults/<key-vault-name>",
"location": "<location>",
"name": "<workspace-name>",
"resourceGroup": "<resource-group-name>",
"storageAccount": "/subscriptions/<service-GUID>/resourcegroups/<resource-
group-name>/providers/microsoft.storage/storageaccounts/<storage-account-
name>",
"type": "Microsoft.MachineLearningServices/workspaces",
"workspaceid": "<GUID>"
}
Advanced configurations
If you want to restrict workspace access to a virtual network, you can use the following
parameters as part of the az ml workspace create command or use the az ml workspace
private-endpoint commands.
Azure CLI
--pe-subnet-name : The name of the subnet to create the private endpoint in. The
default value is default .
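A sketch of a create command using these parameters; the --pe-* flag set is assumed from the v1 extension, and all values are placeholders:

```shell
# Create a workspace behind a private endpoint (v1 CLI flags; values are placeholders)
az ml workspace create -w <workspace-name> -g <resource-group-name> \
    --pe-name <private-endpoint-name> \
    --pe-vnet-name <vnet-name> \
    --pe-subnet-name default \
    --pe-resource-group <resource-group-name>
```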
For more information on how to use these commands, see the CLI reference pages.
To learn more about the resources that are created when you bring your own key for
encryption, see Data encryption with Azure Machine Learning.
Use the --cmk-keyvault parameter to specify the Azure Key Vault that contains the key,
and --resource-cmk-uri to specify the resource ID and URI of the key within the vault.
To limit the data that Microsoft collects on your workspace, you can additionally specify
the --hbi-workspace parameter.
Azure CLI
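A sketch combining the parameters described above; all values are placeholders:

```shell
# Create a workspace with a customer-managed key and limited data collection
az ml workspace create -w <workspace-name> -g <resource-group-name> \
    --cmk-keyvault <key-vault-resource-id> \
    --resource-cmk-uri <key-uri> \
    --hbi-workspace
```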
7 Note
Authorize the Machine Learning App (in Identity and Access Management) with
contributor permissions on your subscription to manage the data encryption
additional resources.
) Important
Selecting high business impact can only be done when creating a workspace. You
cannot change this setting after workspace creation.
For more information on customer-managed keys and high business impact workspace,
see Enterprise security for Azure Machine Learning.
Azure CLI
Update a workspace
To update a workspace, use the following command:
Azure CLI
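For example, to update the workspace description (placeholder values):

```shell
# Update the workspace description
az ml workspace update -w <workspace-name> -g <resource-group-name> \
    --description "<new description>"
```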
Azure CLI
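To sync the workspace after regenerating storage account keys (placeholder values):

```shell
# Sync the workspace with the regenerated storage account keys
az ml workspace sync-keys -w <workspace-name> -g <resource-group-name>
```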
For more information on changing keys, see Regenerate storage access keys.
Delete a workspace
To delete a workspace after it's no longer needed, use the following command:
Azure CLI
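A representative v1 delete command, with placeholder values:

```shell
# Delete the workspace (soft delete by default)
az ml workspace delete -w <workspace-name> -g <resource-group-name>
```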
) Important
Deleting a workspace does not delete the Application Insights instance, storage
account, key vault, or container registry used by the workspace.
You can also delete the resource group, which deletes the workspace and all other Azure
resources in the resource group. To delete the resource group, use the following
command:
Azure CLI
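For example, with a placeholder resource group name:

```shell
# Delete the resource group and all Azure resources it contains
az group delete --name <resource-group-name>
```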
Tip
The default behavior for Azure Machine Learning is to soft delete the workspace.
This means that the workspace is not immediately deleted, but instead is marked
for deletion. For more information, see Soft delete.
Troubleshooting
When creating resources, you may receive an error stating that the subscription is not
registered to use a resource provider namespace. Most resource providers are
automatically registered, but not all. If you receive this message, you need to register
the provider mentioned.
The following table contains a list of the resource providers required by Azure Machine
Learning:
If you plan on using a customer-managed key with Azure Machine Learning, then the
following service providers must be registered:
For information on registering resource providers, see Resolve errors for resource
provider registration.
2 Warning
Once an Azure Container Registry has been created for a workspace, do not delete
it. Doing so will break your Azure Machine Learning workspace.
Next steps
For more information on the Azure CLI extension for machine learning, see the az ml
(v1) documentation.
To check for problems with your workspace, see How to use workspace diagnostics.
To learn how to move a workspace to a new Azure subscription, see How to move a
workspace.
For information on how to keep your Azure Machine Learning up to date with the latest
security updates, see Vulnerability management.
Recover workspace data while soft
deleted
Article • 06/16/2023
The soft delete feature for Azure Machine Learning workspace provides a data
protection capability that enables you to attempt recovery of workspace data after
accidental deletion. Soft delete introduces a two-step approach in deleting a workspace.
When a workspace is deleted, it's first soft deleted. While in soft-deleted state, you can
choose to recover or permanently delete a workspace and its data during a data
retention period.
Data / configuration    Soft deleted    Hard deleted
Run History             ✓
Models                  ✓
Data                    ✓
Environments            ✓
Components              ✓
Notebooks               ✓
Pipelines               ✓
Designer pipelines      ✓
AutoML jobs             ✓
Datastores              ✓
Role assignments        ✓*
Internal cache          ✓
Compute instance                        ✓
Compute clusters                        ✓
Inference endpoints                     ✓
After soft deletion, the service keeps necessary data and metadata during the recovery
retention period. When the retention period expires, or in case you permanently delete
a workspace, data and metadata will be actively deleted.
During the retention period, soft deleted workspaces can be recovered or permanently
deleted. Any other operations on the workspace, like submitting a training job, will fail.
) Important
You can't reuse the name of a workspace that has been soft deleted until the
retention period has passed or the workspace is permanently deleted. Once the
retention period elapses, a soft deleted workspace automatically gets permanently
deleted.
Deleting a workspace
The default deletion behavior when deleting a workspace is soft delete. Optionally, you
may override the soft delete behavior by permanently deleting your workspace.
Permanently deleting a workspace ensures workspace data is immediately deleted. Use
this option to meet related compliance requirements, or whenever you require a
workspace name to be reused immediately after deletion. This may be useful in dev/test
scenarios where you want to create and later delete a workspace.
When deleting a workspace from the Azure portal, check Delete the workspace
permanently. You can permanently delete only one workspace at a time, not as a
batch operation.
Tip
The v1 SDK and CLI don't provide functionality to override the default soft-delete
behavior. To override the default behavior from the SDK or CLI, use the v2 versions.
For more information, see the CLI & SDK v2 article or the v2 version of this article.
2. From the top of the page, select Recently deleted to view workspaces that were
soft-deleted and are still within the retention period.
3. From the recently deleted workspaces view, you can recover or permanently delete
a workspace.
Recovery of a workspace may not always be possible. Azure Machine Learning stores
workspace metadata on other Azure resources associated with the workspace. In the
event these dependent Azure resources were deleted, it may prevent the workspace
from being recovered or correctly restored. Dependencies of the Azure Machine
Learning workspace must be recovered first, before recovering a deleted workspace. The
following table outlines recovery options for each dependency of the Azure Machine
Learning workspace.
Azure Container Registry: not a hard requirement for workspace recovery; Azure
Machine Learning can regenerate images for custom environments.
Azure Application Insights: first, recover your Log Analytics workspace, then recreate
an Application Insights instance with the original name.
Billing implications
In general, when a workspace is in soft deleted state, there are only two operations
possible: 'permanently delete' and 'recover'. All other operations will fail. Therefore, even
though the workspace exists, no compute operations can be performed and hence no
usage will occur. When a workspace is soft deleted, any cost-incurring resources
including compute clusters are hard deleted.
) Important
When the retention period expires, or in case you permanently delete a workspace, data
and metadata will be actively deleted. You could choose to permanently delete a
workspace at the time of deletion.
For more information, see the Export or delete workspace data article.
Next steps
Create and manage a workspace
Export or delete workspace data
How to use workspace diagnostics
Article • 04/04/2023
Azure Machine Learning provides a diagnostic API that can be used to identify problems
with your workspace. Errors returned in the diagnostics report include information on
how to resolve the problem.
You can use the workspace diagnostics from the Azure Machine Learning studio or
Python SDK.
Prerequisites
An Azure Machine Learning workspace. If you don't have one, see Create a
workspace.
The Azure Machine Learning SDK v1 for Python.
Python
from azureml.core import Workspace
ws = Workspace.from_config()
diag_param = {
"value": {
}
}
resp = ws.diagnose_workspace(diag_param)
print(resp)
The response is a JSON document that contains information on any problems detected
with the workspace. The following JSON is an example response:
JSON
{
"value": {
"user_defined_route_results": [],
"network_security_rule_results": [],
"resource_lock_results": [],
"dns_resolution_results": [{
"code": "CustomDnsInUse",
"level": "Warning",
"message": "It is detected VNet '/subscriptions/<subscription-
id>/resourceGroups/<resource-group-
name>/providers/Microsoft.Network/virtualNetworks/<virtual-network-name>' of
private endpoint '/subscriptions/<subscription-
id>/resourceGroups/larrygroup0916/providers/Microsoft.Network/privateEndpoin
ts/<workspace-private-endpoint>' is not using Azure default DNS. You need to
configure your DNS server and check
https://learn.microsoft.com/azure/machine-learning/how-to-custom-dns to make
sure the custom DNS is set up correctly."
}],
"storage_account_results": [],
"key_vault_results": [],
"container_registry_results": [],
"application_insights_results": [],
"other_results": []
}
}
Next steps
How to manage workspaces in portal or SDK
Create and manage an Azure Machine
Learning compute instance with CLI v1
Article • 04/04/2023
Learn how to create and manage a compute instance in your Azure Machine Learning
workspace with CLI v1.
Compute instances can run jobs securely in a virtual network environment, without
requiring enterprises to open up SSH ports. The job executes in a containerized
environment and packages your model dependencies in a Docker container.
7 Note
This article covers only how to do these tasks using CLI v1. For more recent ways to
manage a compute instance, see Create an Azure Machine Learning compute
cluster.
Prerequisites
An Azure Machine Learning workspace. For more information, see Create an Azure
Machine Learning workspace.
The Azure CLI extension for Machine Learning service (v1) or Azure Machine
Learning Python SDK (v1).
) Important
Some of the Azure CLI commands in this article use the azure-cli-ml , or v1,
extension for Azure Machine Learning. Support for the v1 extension will end
on September 30, 2025. You will be able to install and use the v1 extension
until that date.
Create
) Important
Items marked (preview) below are currently in public preview. The preview version
is provided without a service level agreement, and it's not recommended for
production workloads. Certain features might not be supported or might have
constrained capabilities. For more information, see Supplemental Terms of Use for
Microsoft Azure Previews .
Creating a compute instance is a one-time process for your workspace. You can reuse
the compute as a development workstation or as a compute target for training. You can
have multiple compute instances attached to your workspace.
The dedicated cores per region per VM family quota and total regional quota, which
apply to compute instance creation, are unified and shared with the Azure Machine
Learning training compute cluster quota. Stopping the compute instance doesn't
release quota, to ensure you'll be able to restart the compute instance. It isn't possible
to change the virtual machine size of a compute instance once it's created.
Python SDK
Python
import datetime
import time
from azureml.core.compute import ComputeTarget, ComputeInstance
from azureml.core.compute_target import ComputeTargetException
For more information on the classes, methods, and parameters used in this
example, see the following reference documents:
ComputeInstance class
ComputeTarget.create
ComputeInstance.wait_for_completion
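Based on the imports and the reference documents listed above, the creation step can be sketched as follows; the instance name and VM size below are placeholder assumptions:

```python
import datetime
import time

from azureml.core import Workspace
from azureml.core.compute import ComputeTarget, ComputeInstance
from azureml.core.compute_target import ComputeTargetException

ws = Workspace.from_config()

# Placeholder name; compute instance names must be unique in the region
compute_name = "<instance-name>"

try:
    # Reuse an existing instance if one with this name exists
    instance = ComputeInstance(workspace=ws, name=compute_name)
    print("Found existing instance, use it.")
except ComputeTargetException:
    compute_config = ComputeInstance.provisioning_configuration(
        vm_size="STANDARD_DS3_V2",  # assumption: any supported VM size
        ssh_public_access=False,
    )
    instance = ComputeTarget.create(ws, compute_name, compute_config)
    instance.wait_for_completion(show_output=True)
```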
Manage
Start, stop, restart, and delete a compute instance. A compute instance doesn't
automatically scale down, so make sure to stop the resource to prevent ongoing
charges. Stopping a compute instance deallocates it. Then start it again when you need
it. While stopping the compute instance stops the billing for compute hours, you'll still
be billed for disk, public IP, and standard load balancer.
Tip
The compute instance has a 120 GB OS disk. If you run out of disk space, use the
terminal to clear at least 1-2 GB before you stop or restart the compute instance.
Please do not stop the compute instance by issuing sudo shutdown from the
terminal. The temp disk size on a compute instance depends on the VM size chosen
and is mounted on /mnt.
Python SDK
Get status
Python
Stop
Python
Start
Python
Restart
Python
Delete
Python
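The five management snippets above can be sketched together; the instance name is a placeholder, and the wait_for_completion/show_output flags are assumptions:

```python
from azureml.core import Workspace
from azureml.core.compute import ComputeInstance

ws = Workspace.from_config()
instance = ComputeInstance(workspace=ws, name="<instance-name>")  # placeholder

print(instance.get_status())                                  # Get status
instance.stop(wait_for_completion=True, show_output=True)     # Stop (deallocates)
instance.start(wait_for_completion=True, show_output=True)    # Start
instance.restart(wait_for_completion=True, show_output=True)  # Restart
instance.delete(wait_for_completion=True, show_output=True)   # Delete
```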
Microsoft.MachineLearningServices/workspaces/computes/read
Microsoft.MachineLearningServices/workspaces/computes/write
Microsoft.MachineLearningServices/workspaces/computes/delete
Microsoft.MachineLearningServices/workspaces/computes/start/action
Microsoft.MachineLearningServices/workspaces/computes/stop/action
Microsoft.MachineLearningServices/workspaces/computes/restart/action
Microsoft.MachineLearningServices/workspaces/computes/updateSchedules/action
To create a compute instance, you'll need permissions for the following actions:
Microsoft.MachineLearningServices/workspaces/computes/write
Microsoft.MachineLearningServices/workspaces/checkComputeNameAvailability/action
Next steps
Access the compute instance terminal
Create and manage files
Update the compute instance to the latest VM image
Submit a training run
Create an Azure Machine Learning
compute cluster with CLI v1
Article • 07/03/2023
Learn how to create and manage a compute cluster in your Azure Machine Learning
workspace.
You can use Azure Machine Learning compute cluster to distribute a training or batch
inference process across a cluster of CPU or GPU compute nodes in the cloud. For more
information on the VM sizes that include GPUs, see GPU-optimized virtual machine
sizes.
Prerequisites
An Azure Machine Learning workspace. For more information, see Create an Azure
Machine Learning workspace.
The Azure CLI extension for Machine Learning service (v1), Azure Machine Learning
Python SDK, or the Azure Machine Learning Visual Studio Code extension.
) Important
Some of the Azure CLI commands in this article use the azure-cli-ml , or v1,
extension for Azure Machine Learning. Support for the v1 extension will end
on September 30, 2025. You will be able to install and use the v1 extension
until that date.
Python
ws = Workspace.from_config()
Compute clusters can run jobs securely in a virtual network environment, without
requiring enterprises to open up SSH ports. The job executes in a containerized
environment and packages your model dependencies in a Docker container.
Limitations
Compute clusters can be created in a different region and VNet than your
workspace. However, this functionality is only available using the SDK v2, CLI v2, or
studio. For more information, see the v2 version of secure training environments.
We currently support only creation (and not updating) of clusters through ARM
templates. For updating compute, we recommend using the SDK, Azure CLI or UX
for now.
Azure Machine Learning Compute has default limits, such as the number of cores
that can be allocated. For more information, see Manage and request quotas for
Azure resources.
Azure allows you to place locks on resources, so that they cannot be deleted or are
read only. Do not apply resource locks to the resource group that contains your
workspace. Applying a lock to the resource group that contains your workspace
will prevent scaling operations for Azure Machine Learning compute clusters. For
more information on locking resources, see Lock resources to prevent unexpected
changes.
Tip
Clusters can generally scale up to 100 nodes as long as you have enough quota for
the number of cores required. By default, clusters are set up with inter-node
communication enabled between the nodes of the cluster, to support MPI jobs for
example. However, you can scale your clusters to thousands of nodes by raising a
support ticket and requesting to allow-list your subscription, workspace, or a
specific cluster for disabling inter-node communication.
Create
Time estimate: Approximately 5 minutes.
Azure Machine Learning Compute can be reused across runs. The compute can be
shared with other users in the workspace and is retained between runs, automatically
scaling nodes up or down based on the number of runs submitted, and the max_nodes
set on your cluster. The min_nodes setting controls the minimum nodes available.
The dedicated cores per region per VM family quota and total regional quota, which
apply to compute cluster creation, are unified and shared with the Azure Machine
Learning training compute instance quota.
) Important
To avoid charges when no jobs are running, set the minimum nodes to 0. This
setting allows Azure Machine Learning to de-allocate the nodes when they aren't in
use. Any value larger than 0 will keep that number of nodes running, even if they
are not in use.
The compute autoscales down to zero nodes when it isn't used. Dedicated VMs are
created to run your jobs as needed.
Python SDK
Python
cpu_cluster.wait_for_completion(show_output=True)
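The provisioning step that precedes wait_for_completion can be sketched as follows; the cluster name, VM size, and node counts are placeholder assumptions:

```python
from azureml.core import Workspace
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

ws = Workspace.from_config()
cpu_cluster_name = "cpu-cluster"  # placeholder name

try:
    # Reuse the cluster if it already exists
    cpu_cluster = ComputeTarget(workspace=ws, name=cpu_cluster_name)
    print("Found existing cluster, use it.")
except ComputeTargetException:
    compute_config = AmlCompute.provisioning_configuration(
        vm_size="STANDARD_D2_V2",
        min_nodes=0,   # scale to zero when idle to avoid charges
        max_nodes=4,
    )
    cpu_cluster = ComputeTarget.create(ws, cpu_cluster_name, compute_config)

cpu_cluster.wait_for_completion(show_output=True)
```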
You can also configure several advanced properties when you create Azure Machine
Learning Compute. The properties allow you to create a persistent cluster of fixed
size, or within an existing Azure Virtual Network in your subscription. See the
AmlCompute class for details.
2 Warning
Python SDK
Python
compute_config =
AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',
vm_priority='lowpriority',
max_nodes=4)
Python SDK
Python
# configure cluster with a system-assigned managed identity
compute_config =
AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',
max_nodes=5,
identity_type="SystemAssigned",
)
cpu_cluster_name = "cpu-cluster"
cpu_cluster = ComputeTarget.create(ws, cpu_cluster_name,
compute_config)
Python
# configure cluster with a user-assigned managed identity
compute_config =
AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',
max_nodes=5,
identity_type="UserAssigned",
identity_id=
['/subscriptions/<subscription_id>/resourcegroups/<resource_group>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/<user_assigned_identity>'])
cpu_cluster_name = "cpu-cluster"
cpu_cluster = ComputeTarget.create(ws, cpu_cluster_name, compute_config)
7 Note
1. The system uses an identity to set up the user's storage mounts, container registry,
and datastores.
2. The user applies an identity to access resources from within the code for a
submitted run
In this case, provide the client_id corresponding to the managed identity you
want to use to retrieve a credential.
Alternatively, get the user-assigned identity's client ID through the
DEFAULT_IDENTITY_CLIENT_ID environment variable.
For example, to retrieve a token for a datastore with the default-managed identity:
Python
import os
from azure.identity import ManagedIdentityCredential

# Read the client ID of the default-assigned managed identity
client_id = os.environ.get('DEFAULT_IDENTITY_CLIENT_ID')
credential = ManagedIdentityCredential(client_id=client_id)
token = credential.get_token('https://storage.azure.com/')
Troubleshooting
There is a chance that some users who created their Azure Machine Learning workspace
from the Azure portal before the GA release might not be able to create AmlCompute in
that workspace. You can either raise a support request against the service or create a
new workspace through the portal or the SDK to unblock yourself immediately.
Stuck at resizing
If your Azure Machine Learning compute cluster appears stuck at resizing (0 -> 0) for
the node state, this may be caused by Azure resource locks.
Azure allows you to place locks on resources, so that they cannot be deleted or are read
only. Locking a resource can lead to unexpected results. Some operations that don't
seem to modify the resource actually require actions that are blocked by the lock.
With Azure Machine Learning, applying a delete lock to the resource group for your
workspace will prevent scaling operations for Azure ML compute clusters. To work
around this problem we recommend removing the lock from resource group and
instead applying it to individual items in the group.
) Important
These resources are used to communicate with, and perform operations such as scaling
on, the compute cluster. Removing the resource lock from these resources should allow
autoscaling for your compute clusters.
For more information on resource locking, see Lock resources to prevent unexpected
changes.
Next steps
Use your compute cluster to:
) Important
This article shows how to use the CLI and SDK v1 to create or attach an Azure
Kubernetes Service cluster, which is considered as legacy feature now. To attach
Azure Kubernetes Service cluster using the recommended approach for v2, see
Introduction to Kubernetes compute target in v2.
Azure Machine Learning can deploy trained machine learning models to Azure
Kubernetes Service. However, you must first either create an Azure Kubernetes Service
(AKS) cluster from your Azure Machine Learning workspace, or attach an existing AKS
cluster. This article provides information on both creating and attaching a cluster.
Prerequisites
An Azure Machine Learning workspace. For more information, see Create an Azure
Machine Learning workspace.
The Azure CLI extension (v1) for Machine Learning service, Azure Machine Learning
Python SDK, or the Azure Machine Learning Visual Studio Code extension.
) Important
Some of the Azure CLI commands in this article use the azure-cli-ml , or v1,
extension for Azure Machine Learning. Support for the v1 extension will end
on September 30, 2025. You will be able to install and use the v1 extension
until that date.
Limitations
An AKS cluster can be created or attached only as a single compute target in an Azure
Machine Learning workspace. Multiple attachments of one AKS cluster are not supported.
If you have an Azure Policy that restricts the creation of Public IP addresses, then
AKS cluster creation will fail. AKS requires a Public IP for egress traffic. The egress
traffic article also provides guidance to lock down egress traffic from the cluster
through the Public IP, except for a few fully qualified domain names. There are two
ways to enable a Public IP:
The cluster can use the Public IP created by default with the BLB or SLB, or
The cluster can be created without a Public IP and then a Public IP is configured
with a firewall with a user-defined route. For more information, see Customize
cluster egress with a user-defined route.
The Azure Machine Learning control plane does not talk to this Public IP. It talks to
the AKS control plane for deployments.
To attach an AKS cluster, the service principal/user performing the operation must
be assigned the Owner or contributor Azure role-based access control (Azure
RBAC) role on the Azure resource group that contains the cluster. The service
principal/user must also be assigned Azure Kubernetes Service Cluster Admin Role
on the cluster.
If you attach an AKS cluster, which has an Authorized IP range enabled to access
the API server, enable the Azure Machine Learning control plane IP ranges for the
AKS cluster. The Azure Machine Learning control plane is deployed across paired
regions and deploys inference pods on the AKS cluster. Without access to the API
server, the inference pods cannot be deployed. Use the IP ranges for both the
paired regions when enabling the IP ranges in an AKS cluster.
Authorized IP ranges only works with Standard Load Balancer.
If you want to use a private AKS cluster (using Azure Private Link), you must create
the cluster first, and then attach it to the workspace. For more information, see
Create a private Azure Kubernetes Service cluster.
Using a public fully qualified domain name (FQDN) with a private AKS cluster is not
supported with Azure Machine Learning.
The compute name for the AKS cluster MUST be unique within your Azure
Machine Learning workspace. It can include letters, digits and dashes. It must start
with a letter, end with a letter or digit, and be between 3 and 24 characters in
length.
If you want to deploy models to GPU nodes or FPGA nodes (or any specific SKU),
then you must create a cluster with the specific SKU. There is no support for
creating a secondary node pool in an existing cluster and deploying models in the
secondary node pool.
When creating or attaching a cluster, you can select whether to create the cluster
for dev-test or production. If you want to create an AKS cluster for development,
validation, and testing instead of production, set the cluster purpose to dev-test.
If you do not specify the cluster purpose, a production cluster is created.
) Important
A dev-test cluster is not suitable for production level traffic and may increase
inference times. Dev/test clusters also do not guarantee fault tolerance.
When creating or attaching a cluster, if the cluster will be used for production,
then it must contain at least 3 nodes. For a dev-test cluster, it must contain at least
1 node.
The Azure Machine Learning SDK does not support scaling an AKS cluster.
To scale the nodes in the cluster, use the UI for your AKS cluster in the Azure
Machine Learning studio. You can only change the node count, not the VM size of
the cluster. For more information on scaling the nodes in an AKS cluster, see the
following articles:
Manually scale the node count in an AKS cluster
Set up cluster autoscaler in AKS
Do not directly update the cluster by using a YAML configuration. While Azure
Kubernetes Services supports updates via YAML configuration, Azure Machine
Learning deployments will override your changes. The only two YAML fields that
will not be overwritten are request limits for cpu and memory.
Creating an AKS cluster using the Azure Machine Learning studio UI, SDK, or CLI
extension is not idempotent. Attempting to create the resource again will result in
an error that a cluster with the same name already exists.
Using an Azure Resource Manager template and the
Microsoft.MachineLearningServices/workspaces/computes resource to create an
AKS cluster is also not idempotent. If you attempt to use the template again to
update an already existing resource, you will receive the same error.
When creating an Azure Kubernetes Service cluster using one of the following methods,
you do not have a choice in the version of the cluster that is created:
Azure Machine Learning studio, or the Azure Machine Learning section of the
Azure portal.
Machine Learning extension for Azure CLI.
Azure Machine Learning SDK.
These methods of creating an AKS cluster use the default version of the cluster. The
default version changes over time as new Kubernetes versions become available.
When attaching an existing AKS cluster, we support all currently supported AKS
versions.
Important
Azure Kubernetes Service uses the Blobfuse FlexVolume driver for versions
<=1.16 and the Blob CSI driver for versions >=1.17. Therefore, it is important to
re-deploy or update the web service after a cluster upgrade so that the correct
blob mounting method is used for the cluster version.
Note
There may be edge cases where you have an older cluster that is no longer
supported. In this case, the attach operation will return an error and list the
currently supported versions.
You can attach preview versions. Preview functionality is provided without a service
level agreement, and it's not recommended for production workloads. Certain
features might not be supported or might have constrained capabilities. Support
for using preview versions may be limited. For more information, see Supplemental
Terms of Use for Microsoft Azure Previews .
Azure CLI
text
KubernetesVersion Upgrades
------------------- ----------------------------------------
1.18.6(preview) None available
1.18.4(preview) 1.18.6(preview)
1.17.9 1.18.4(preview), 1.18.6(preview)
1.17.7 1.17.9, 1.18.4(preview), 1.18.6(preview)
1.16.13 1.17.7, 1.17.9
1.16.10 1.16.13, 1.17.7, 1.17.9
1.15.12 1.16.10, 1.16.13
1.15.11 1.15.12, 1.16.10, 1.16.13
To find the default version that is used when creating a cluster through Azure Machine
Learning, you can use the --query parameter to select the default version:
Azure CLI
text
Result
--------
1.16.13
If you'd like to programmatically check the available versions, use the Container
Service Client - List Orchestrators REST API. To find the available versions, look at the
entries where orchestratorType is Kubernetes . The associated orchestrationVersion
entries contain the available versions that can be attached to your workspace.
To find the default version that is used when creating a cluster through Azure Machine
Learning, find the entry where orchestratorType is Kubernetes and default is true . The
associated orchestratorVersion value is the default version. The following JSON snippet
shows an example entry:
JSON
...
{
"orchestratorType": "Kubernetes",
"orchestratorVersion": "1.16.13",
"default": true,
"upgrades": [
{
"orchestratorType": "",
"orchestratorVersion": "1.17.7",
"isPreview": false
}
]
},
...
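As a sketch, the selection logic described above can be expressed in Python. The response fragment below is a made-up example shaped like the List Orchestrators output; only the fields shown in the JSON snippet above are assumed.

```python
import json

# Sample response fragment, shaped like the List Orchestrators API output above.
response = json.loads("""
{
  "orchestrators": [
    {"orchestratorType": "Kubernetes", "orchestratorVersion": "1.16.13", "default": true},
    {"orchestratorType": "Kubernetes", "orchestratorVersion": "1.17.7"},
    {"orchestratorType": "DCOS", "orchestratorVersion": "1.11"}
  ]
}
""")

def kubernetes_versions(payload):
    """Return (default_version, all_versions) for Kubernetes orchestrators."""
    entries = [e for e in payload["orchestrators"]
               if e["orchestratorType"] == "Kubernetes"]
    default = next((e["orchestratorVersion"] for e in entries if e.get("default")), None)
    return default, [e["orchestratorVersion"] for e in entries]

default, available = kubernetes_versions(response)
print(default)     # 1.16.13
print(available)   # ['1.16.13', '1.17.7']
```

The same filter works on the real API response once it's parsed into a dictionary.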
Creating or attaching an AKS cluster is a one-time process for your workspace. You can
reuse this cluster for multiple deployments. If you delete the cluster or the resource
group that contains it, you must create a new cluster the next time you need to deploy.
You can have multiple AKS clusters attached to your workspace.
The following example demonstrates how to create a new AKS cluster using the SDK
and CLI:
Python SDK
APPLIES TO: Python SDK azureml v1
Python
aks_name = 'myaks'
# Create the cluster
aks_target = ComputeTarget.create(workspace = ws,
                                  name = aks_name,
                                  provisioning_configuration = prov_config)
For more information on the classes, methods, and parameters used in this
example, see the following reference documents:
AksCompute.ClusterPurpose
AksCompute.provisioning_configuration
ComputeTarget.create
ComputeTarget.wait_for_completion
If you already have an AKS cluster in your Azure subscription, you can use it with your
workspace.
Tip
The existing AKS cluster can be in an Azure region other than that of your Azure
Machine Learning workspace.
Warning
Do not create multiple, simultaneous attachments to the same AKS cluster. For
example, do not attach one AKS cluster to a workspace using two different names, or
attach one AKS cluster to different workspaces. Each new attachment breaks the
previous attachment(s) and causes unpredictable errors.
If you want to re-attach an AKS cluster, for example to change TLS or other cluster
configuration setting, you must first remove the existing attachment by using
AksCompute.detach().
For more information on creating an AKS cluster using the Azure CLI or portal, see the
following articles:
The following example demonstrates how to attach an existing AKS cluster to your
workspace:
Python SDK
Python
# Attach the cluster to your workspace. If the cluster has less than 12
# virtual CPUs, use the following instead:
# attach_config = AksCompute.attach_configuration(resource_group = resource_group,
#                                                 cluster_name = cluster_name,
#                                                 cluster_purpose = AksCompute.ClusterPurpose.DEV_TEST)
attach_config = AksCompute.attach_configuration(resource_group = resource_group,
                                                cluster_name = cluster_name)
aks_target = ComputeTarget.attach(ws, 'myaks', attach_config)
For more information on the classes, methods, and parameters used in this
example, see the following reference documents:
AksCompute.attach_configuration()
AksCompute.ClusterPurpose
AksCompute.attach
The following example shows how to enable TLS termination with automatic TLS
certificate generation and configuration, using a Microsoft certificate under the hood.
Python
The following example shows how to enable TLS termination with a custom certificate
and a custom domain name. With a custom domain and certificate, you must update your
DNS record to point to the IP address of the scoring endpoint. For more information,
see Update your DNS.
Python
# Enable TLS termination with custom certificate and custom domain when creating an AKS cluster
provisioning_config.enable_ssl(ssl_cert_pem_file="cert.pem",
                               ssl_key_pem_file="key.pem",
                               ssl_cname="www.contoso.com")

# Enable TLS termination with custom certificate and custom domain when attaching an AKS cluster
attach_config.enable_ssl(ssl_cert_pem_file="cert.pem",
                         ssl_key_pem_file="key.pem",
                         ssl_cname="www.contoso.com")
Note
For more information about how to secure model deployment on an AKS cluster, see
Use TLS to secure a web service through Azure Machine Learning.
To create an AKS cluster that uses an Internal Load Balancer, use the
load_balancer_type and load_balancer_subnet parameters:
Python
# When you create an AKS cluster, you can specify an Internal Load Balancer
# to be created with the provisioning_config object
provisioning_config = AksCompute.provisioning_configuration(load_balancer_type = 'InternalLoadBalancer')
Important
If your AKS cluster is configured with an Internal Load Balancer, using a Microsoft-
provided certificate is not supported, and you must use a custom certificate to
enable TLS.
Note
For more information about how to secure the inferencing environment, see
Secure an Azure Machine Learning inferencing environment.
Warning
Using the Azure Machine Learning studio, SDK, or the Azure CLI extension for
machine learning to detach an AKS cluster does not delete the AKS cluster. To
delete the cluster, see Use the Azure CLI with AKS.
Python SDK
Python
aks_target.detach()
Troubleshooting
You can apply these updates by detaching the cluster from the Azure Machine Learning
workspace and reattaching the cluster to the workspace.
Python
Before you can re-attach the cluster to your workspace, you first need to delete any
azureml-fe related resources. If there is no active service in the cluster, you can delete
these resources with the following code.
shell
If TLS is enabled in the cluster, you will need to supply the TLS/SSL certificate and
private key when reattaching the cluster.
Python
attach_config = AksCompute.attach_configuration(resource_group=resourceGroup,
                                                cluster_name=kubernetesClusterName)

# If SSL is enabled.
attach_config.enable_ssl(
    ssl_cert_pem_file="cert.pem",
    ssl_key_pem_file="key.pem",
    ssl_cname=sslCname)

attach_config.validate_configuration()

compute_target = ComputeTarget.attach(workspace=ws,
                                      name=args.clusterWorkspaceName,
                                      attach_configuration=attach_config)
compute_target.wait_for_completion(show_output=True)
If you no longer have the TLS/SSL certificate and private key, or you are using a
certificate generated by Azure Machine Learning, you can retrieve the files prior to
detaching the cluster by connecting to the cluster using kubectl and retrieving the
secret azuremlfessl .
Bash
Note
Kubernetes stores the secrets in Base64-encoded format. You need to Base64-decode
the cert.pem and key.pem components of the secrets before providing them to
attach_config.enable_ssl .
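The decoding step can be sketched with Python's standard library. The secret values below are made-up placeholders standing in for the cert.pem and key.pem entries of the azuremlfessl secret, not real certificate material.

```python
import base64

# Kubernetes returns secret data Base64-encoded (e.g. from
# `kubectl get secret/azuremlfessl -o yaml`). These values are placeholders.
secret_data = {
    "cert.pem": base64.b64encode(b"-----BEGIN CERTIFICATE-----").decode(),
    "key.pem": base64.b64encode(b"-----BEGIN PRIVATE KEY-----").decode(),
}

# Decode each entry back to its original PEM text.
decoded = {name: base64.b64decode(value).decode()
           for name, value in secret_data.items()}

# Write the decoded files so they can be passed to attach_config.enable_ssl
for name, contents in decoded.items():
    with open(name, "w") as f:
        f.write(contents)

print(decoded["cert.pem"])  # -----BEGIN CERTIFICATE-----
```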
Webservice failures
Many webservice failures in AKS can be debugged by connecting to the cluster using
kubectl . You can get the kubeconfig.json for an AKS cluster by running
Azure CLI
shell
To resolve this problem, create/attach the cluster by using the load_balancer_type and
load_balancer_subnet parameters. For more information, see Internal Load Balancer
(private IP).
Next steps
Use Azure RBAC for Kubernetes authorization
How and where to deploy a model
Link Azure Synapse Analytics and Azure
Machine Learning workspaces and
attach Apache Spark pools(deprecated)
Article • 03/02/2023
Warning
The Azure Synapse Analytics integration with Azure Machine Learning available in
Python SDK v1 is deprecated. Users can continue using a Synapse workspace
registered with Azure Machine Learning as a linked service. However, a new
Synapse workspace can no longer be registered with Azure Machine Learning as a
linked service. We recommend using Managed (Automatic) Synapse compute and
attached Synapse Spark pools, available in CLI v2 and Python SDK v2. For more
details, see https://aka.ms/aml-spark .
In this article, you learn how to create a linked service that links your Azure Synapse
Analytics workspace and Azure Machine Learning workspace.
With your Azure Machine Learning workspace linked to your Azure Synapse
workspace, you can attach an Apache Spark pool, powered by Azure Synapse Analytics,
as a dedicated compute for data wrangling at scale, or conduct model training, all from
the same Python notebook.
You can link your ML workspace and Synapse workspace via the Python SDK or the
Azure Machine Learning studio.
You can also link workspaces and attach a Synapse Spark pool with a single Azure
Resource Manager (ARM) template .
Prerequisites
Create an Azure Machine Learning workspace.
Create an Apache Spark pool using the Azure portal, web tools, or Synapse Studio
Important
To link to the Synapse workspace successfully, you must be granted the Owner role
of the Synapse workspace. Check your access in the Azure portal .
If you are not an Owner and are only a Contributor to the Synapse workspace, you
can only use existing linked services. See how to Retrieve and use an existing
linked service.
Link your machine learning workspace, ws, with your Azure Synapse workspace.
Register your Synapse workspace with Azure Machine Learning as a linked service.
Python
import datetime
from azureml.core import Workspace, LinkedService, SynapseWorkspaceLinkedServiceConfiguration

# Link configuration
synapse_link_config = SynapseWorkspaceLinkedServiceConfiguration(
    subscription_id=ws.subscription_id,
    resource_group='your resource group',
    name='mySynapseWorkspaceName')
Important
Python
LinkedService.list(ws)
Python
linked_service.unregister()
Field Description
Name Provide a name for your linked service. This name is what will be used to
reference this particular linked service.
Subscription name Select the name of the subscription that's associated with your
machine learning workspace.
6. Select Next to open the Review form and check your selections.
Retrieving and using an existing linked service requires User or Contributor permissions
to the Azure Synapse Analytics workspace.
This example retrieves an existing linked service, synapselink1 , from the workspace, ws ,
with the get() method.
Python
Python
synapse_compute.wait_for_completion()
Python
Next steps
How to data wrangle with Azure Synapse (preview).
How to use Apache Spark in your machine learning pipeline with Azure Synapse
(preview)
Train a model.
How to securely integrate Azure Synapse and Azure Machine Learning workspaces.
Troubleshoot "cannot import name
'SerializationError'"
Article • 06/19/2023
When using Azure Machine Learning, you may receive one of the following errors:
This error may occur when using an Azure Machine Learning environment. For example,
when submitting a training job or using AutoML.
Cause
This problem is caused by a bug in the Azure Machine Learning SDK version 1.42.0.
Resolution
Update the affected environment to use SDK version 1.42.0.post1 or greater. For a local
development environment or compute instance, use the following command:
Bash
For more information on updating an Azure Machine Learning environment (for training
or deployment), see the following articles:
To verify the version of your installed SDK, use the following command:
Bash
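The same verification can also be sketched from Python with the standard library. The parse_version helper below is a simplified illustration; a production check should use packaging.version, and the lookup is guarded because azureml-core may not be installed in every environment.

```python
from importlib.metadata import version, PackageNotFoundError

def parse_version(v):
    """Simplified version parser so that 1.42.0.post1 compares above 1.42.0.
    Illustrative only; prefer packaging.version for real checks."""
    parts = []
    for piece in v.split("."):
        digits = "".join(ch for ch in piece if ch.isdigit())
        parts.append(int(digits) if digits else 0)
    return tuple(parts)

MINIMUM = "1.42.0.post1"

try:
    installed = version("azureml-core")
    status = "OK" if parse_version(installed) >= parse_version(MINIMUM) else "needs upgrade"
    print(f"azureml-core {installed}: {status}")
except PackageNotFoundError:
    print("azureml-core is not installed in this environment")
```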
Next steps
For more information on updating an Azure Machine Learning environment (for training
or deployment), see the following articles:
When using Azure Machine Learning, you may receive the following error:
Cause
This problem is caused by breaking changes introduced in protobuf 4.0.0. For more
information, see https://developers.google.com/protocol-buffers/docs/news/2022-05-06#python-updates .
Resolution
For a local development environment or compute instance, install the Azure Machine
Learning SDK version 1.42.0.post1 or greater.
Bash
For more information on updating an Azure Machine Learning environment (for training
or deployment), see the following articles:
Bash
Tip
If you can't upgrade your Azure Machine Learning SDK installation, you can pin the
protobuf version in your environment to 3.20.1 . The following example is a
conda.yml file that demonstrates how to pin the version:
yml
name: model-env
channels:
  - conda-forge
dependencies:
  - python=3.8
  - numpy=1.21.2
  - pip=21.2.4
  - scikit-learn=0.24.2
  - scipy=1.7.1
  - pandas>=1.1,<1.2
  - pip:
    - inference-schema[numpy-support]==1.3.0
    - xlrd==2.0.1
    - mlflow==1.26.0
    - azureml-mlflow==1.41.0
    - protobuf==3.20.1
Next steps
For more information on the breaking changes in protobuf 4.0.0, see
https://developers.google.com/protocol-buffers/docs/news/2022-05-06#python-updates .
For more information on updating an Azure Machine Learning environment (for training
or deployment), see the following articles:
In this article, learn how to troubleshoot common problems you may encounter with
environment image builds and learn about AzureML environment vulnerabilities.
We are actively seeking your feedback! If you navigated to this page via your
Environment Definition or Build Failure Analysis logs, we'd like to know if the feature was
helpful to you, or if you'd like to report a failure scenario that isn't yet covered by our
analysis. You can also leave feedback on this documentation. Leave your thoughts
here.
Types of environments
Environments fall under three categories: curated, user-managed, and system-managed.
User-managed environments have two subtypes. For the first subtype, BYOC (bring your
own container), you bring an existing Docker image to Azure Machine Learning. For the
second subtype, Docker build context based environments, Azure Machine Learning
materializes the image from the context that you provide.
When you want conda to manage the Python environment for you, use a system-
managed environment. Azure Machine Learning creates a new isolated conda
environment by materializing your conda specification on top of a base Docker image.
By default, Azure Machine Learning adds common features to the derived image. Any
Python packages present in the base image aren't available in the isolated conda
environment.
Azure Machine Learning builds environment definitions into Docker images. It also
caches the images in the Azure Container Registry associated with your Azure Machine
Learning Workspace so they can be reused in subsequent training jobs and service
endpoint deployments. Multiple environments with the same definition may result in the
same cached image.
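To illustrate why identical definitions can share one cached image, caching can be thought of as keying on a content hash of the normalized definition. This is a conceptual sketch only, not Azure Machine Learning's actual caching algorithm.

```python
import hashlib
import json

def definition_digest(env_definition):
    """Hash a normalized environment definition; identical definitions
    produce identical digests, so they can share one cached image."""
    canonical = json.dumps(env_definition, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()

env_a = {"name": "env1", "dependencies": ["python=3.8", "numpy=1.24.1"]}
env_b = {"dependencies": ["python=3.8", "numpy=1.24.1"], "name": "env2"}

# Key ordering doesn't matter after normalization; only content does.
# Drop the differing names to compare dependency content alone.
same_content = (definition_digest({**env_a, "name": None})
                == definition_digest({**env_b, "name": None}))
print(same_content)  # True
```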
Reduce your number of dependencies - use the minimal set of the dependencies
for each scenario.
Compartmentalize your environment so you can scope and fix issues in one place.
Understand flagged vulnerabilities and their relevance to your scenario.
To automate this process based on triggers from Microsoft Defender, see Automate
responses to Microsoft Defender for Cloud triggers.
Vulnerabilities vs Reproducibility
Reproducibility is one of the foundations of software development. When you're
developing production code, a repeated operation must guarantee the same result.
Mitigating vulnerabilities can disrupt reproducibility by changing dependencies.
Curated Environments
Curated environments are pre-created environments that Azure Machine Learning
manages and are available by default in every Azure Machine Learning workspace
provisioned. New versions are released by Azure Machine Learning to address
vulnerabilities. Whether you use the latest image may be a tradeoff between
reproducibility and vulnerability management.
Curated Environments contain collections of Python packages and settings to help you
get started with various machine learning frameworks. You're meant to use them as is.
These pre-created environments also allow for faster deployment time.
User-managed Environments
In user-managed environments, you're responsible for setting up your environment and
installing every package that your training script needs on the compute target and for
model deployment. These types of environments have two subtypes:
BYOC (bring your own container): the user provides a Docker image to Azure
Machine Learning
Docker build context: Azure Machine Learning materializes the image from the
user provided content
System-managed Environments
You use system-managed environments when you want conda to manage the Python
environment for you. Azure Machine Learning creates a new isolated conda
environment by materializing your conda specification on top of a base Docker image.
While Azure Machine Learning patches base images with each release, whether you use
the latest image may be a tradeoff between reproducibility and vulnerability
management. So, it's your responsibility to choose the environment version used for
your jobs or model deployments while using system-managed environments.
If the latest version of your base image doesn't resolve your vulnerabilities, you can
address base image vulnerabilities by installing the versions recommended by a
vulnerability scan:
If you're using a conda environment, update the reference in the conda dependencies
file.
In some cases, Python packages will be automatically installed during conda's setup of
your environment on top of a base Docker image. Mitigation steps for those are the
same as those for user-introduced packages. Conda installs necessary dependencies for
every environment it materializes. Packages like cryptography, setuptools, wheel, etc. will
be automatically installed from conda's default channels. There's a known issue where
the default anaconda channel can lack the latest package versions, so it's recommended
to prioritize the community-maintained conda-forge channel. Otherwise, explicitly
specify packages and versions, even if you don't reference them in the code you plan to
execute on that environment.
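Following that recommendation, a conda specification that puts conda-forge first and explicitly pins common build-time packages might look like this. The package versions shown are illustrative assumptions, not recommendations from this article.

```yaml
name: project_environment
dependencies:
  - python=3.8
  - cryptography=39.0.0
  - setuptools=65.6.3
  - wheel=0.38.4
  - pip=22.3.1
  - pip:
    - azureml-defaults
channels:
  - conda-forge
  - anaconda
```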
Cache issues
Associated to your Azure Machine Learning workspace is an Azure Container Registry
instance that's a cache for container images. Any image materialized is pushed to the
container registry and used if you trigger experimentation or deployment for the
corresponding environment. Azure Machine Learning doesn't delete images from your
container registry, and it's your responsibility to evaluate which images you need to
maintain over time.
Potential causes:
Troubleshooting steps
Update your environment name to exclude the reserved prefix you're currently using
Resources
Troubleshooting steps
Docker issues
APPLIES TO: Azure CLI ml extension v1
To create a new environment, you must use one of the following approaches (see
DockerSection ):
Base image
Provide base image name, repository from which to pull it, and credentials if
needed
Provide a conda specification
Base Dockerfile
Provide a Dockerfile
Provide a conda specification
Docker build context
Provide the location of the build context (URL)
The build context must contain at least a Dockerfile, but may contain other files
as well
This issue can happen when your environment definition is missing a DockerSection .
This section configures settings related to the final Docker image built from your
environment specification.
Potential causes:
Troubleshooting steps
Python
myenv.docker.base_dockerfile = dockerfile
Resources
DockerSection
You have more than one of these Docker options specified in your environment
definition
base_image
base_dockerfile
build_context
See DockerSection
Troubleshooting steps
Choose which Docker option you'd like to use to build your environment. Then set all
other specified options to None.
Python
# Having both base dockerfile and base image set will cause failure.
# Delete the one you won't use.
myenv.docker.base_image = None
base_image
base_dockerfile
build_context
See DockerSection
Troubleshooting steps
Choose which Docker option you'd like to use to build your environment, then populate
that option in your environment definition.
Python
Resources
Environment class v1
Troubleshooting steps
Add the missing username or password to your environment definition to fix the issue
Python
myEnv.docker.base_image_registry.username = "username"
Python
Resources
You've specified more than one set of credentials for your base image registry
Troubleshooting steps
If you're using workspace connections, view the connections you have set, and delete
whichever one(s) you don't want to use
Python
Python
myEnv.docker.base_image_registry.registry_identity = None
Note
Resources
Troubleshooting steps
Python
Resources
You've specified Docker attributes in your environment definition that are now
deprecated
The following are deprecated properties:
enabled
arguments
shared_volumes
gpu_support
Azure Machine Learning now automatically detects and uses NVIDIA Docker
extension when available
shm_size
Troubleshooting steps
Resources
Troubleshooting steps
Resources
You didn't provide the path of your build context directory in your environment
definition
Troubleshooting steps
Resources
Potential causes:
Your Dockerfile isn't at the root of your build context directory and/or is named
something other than 'Dockerfile,' and you didn't provide its path
Troubleshooting steps
Resources
Potential causes:
You specified a Docker build context, along with at least one of the following
properties in your environment definition:
Environment variables
Conda dependencies
R
Spark
Troubleshooting steps
If you're using a Docker build context and want to specify conda dependencies,
your conda specification should reside in your build context directory
Resources
Troubleshooting steps
Git
You can provide git URLs to Azure Machine Learning, but you can't use them to
build images yet. Use a storage account until builds have Git support
Storage account
See this storage account overview
See how to create a storage account
Resources
Invalid location
Potential causes:
Troubleshooting steps
For scenarios in which you're storing your Docker build context in a storage account
https://<storage-account>.blob.core.windows.net/<container>/<path>
Resources
azureml/base
azureml/base-gpu
azureml/base-lite
azureml/intelmpi2018.3-cuda10.0-cudnn7-ubuntu16.04
azureml/intelmpi2018.3-cuda9.0-cudnn7-ubuntu16.04
azureml/intelmpi2018.3-ubuntu16.04
azureml/o16n-base/python-slim
azureml/openmpi3.1.2-cuda10.0-cudnn7-ubuntu16.04
azureml/openmpi3.1.2-ubuntu16.04
azureml/openmpi3.1.2-cuda10.0-cudnn7-ubuntu18.04
azureml/openmpi3.1.2-cuda10.1-cudnn7-ubuntu18.04
azureml/openmpi3.1.2-cuda10.2-cudnn7-ubuntu18.04
azureml/openmpi3.1.2-cuda10.2-cudnn8-ubuntu18.04
azureml/openmpi3.1.2-ubuntu18.04
azureml/openmpi4.1.0-cuda11.0.3-cudnn8-ubuntu18.04
azureml/openmpi4.1.0-cuda11.1-cudnn8-ubuntu18.04
Troubleshooting steps
No tag or digest
Potential causes:
You didn't include a version tag or a digest on your specified base image
Without one of these specifiers, the environment isn't reproducible
Troubleshooting steps
Version tag
Digest
See image with immutable identifier
Troubleshooting steps
Troubleshooting steps
Troubleshooting steps
Python
myenv = Environment(name="myenv")
conda_dep = CondaDependencies()
conda_dep.add_conda_package("python==3.8")
myenv.python.conda_dependencies = conda_dep
If you're using a YAML for your conda specification, include Python as a dependency
YAML
name: project_environment
dependencies:
  - python=3.8
  - pip:
    - azureml-defaults
channels:
  - anaconda
Resources
You've specified more than one Python version in your environment definition
Troubleshooting steps
Choose which Python version you want to use, and remove all other versions
Python
myenv.python.conda_dependencies.remove_conda_package("python=3.6")
If you're using a YAML for your conda specification, include only one Python version as a
dependency
Resources
CondaDependencies Class v1
Troubleshooting steps
Specify a python version that hasn't reached and isn't nearing its end-of-life
Troubleshooting steps
Specify a python version that hasn't reached and isn't nearing its end-of-life
Troubleshooting steps
Python
myenv.python.conda_dependencies.add_conda_package("python=3.8")
name: project_environment
dependencies:
  - python=3.8
  - pip:
    - azureml-defaults
channels:
  - anaconda
Resources
Conda issues
Troubleshooting steps
If you don't want Azure Machine Learning to create a Python environment for you based
on conda_dependencies, set user_managed_dependencies to True
Python
env.python.user_managed_dependencies = True
You're responsible for ensuring that all necessary packages are available in the
Python environment in which you choose to run the script
If you want Azure Machine Learning to create a Python environment for you based on a
conda specification, you must populate conda_dependencies in your environment
definition
Python
env = Environment(name="env")
conda_dep = CondaDependencies()
conda_dep.add_conda_package("python==3.8")
env.python.conda_dependencies = conda_dep
Resources
Troubleshooting steps
JSON
"condaDependencies": {
"channels": [
"anaconda",
"conda-forge"
],
"dependencies": [
"python=3.8",
{
"pip": [
"azureml-defaults"
]
}
],
"name": "project_environment"
}
You can also specify conda dependencies using the add_conda_package method
Python
env = Environment(name="env")
conda_dep = CondaDependencies()
conda_dep.add_conda_package("python==3.8")
env.python.conda_dependencies = conda_dep
Resources
Troubleshooting steps
Python
env = Environment(name="env")
conda_dep = CondaDependencies()
conda_dep.add_channel("conda-forge")
env.python.conda_dependencies = conda_dep
If you're using a YAML for your conda specification, include the conda channel(s) you'd
like to use
YAML
name: project_environment
dependencies:
  - python=3.8
  - pip:
    - azureml-defaults
channels:
  - anaconda
  - conda-forge
Resources
Troubleshooting steps
Remove your base conda environment, and specify all packages needed for your
environment in the conda_dependencies section of your environment definition
Python
Resources
Unpinned dependencies
Potential causes:
You didn't specify versions for certain packages in your conda specification
Troubleshooting steps
If you don't specify a dependency version, the conda package resolver may choose a
different version of the package on subsequent builds of the same environment. This
breaks reproducibility of the environment and can lead to unexpected errors.
Python
conda_dep = CondaDependencies()
conda_dep.add_conda_package("numpy==1.24.1")
If you're using a YAML for your conda specification, specify versions for your
dependencies
YAML
name: project_environment
dependencies:
  - python=3.8
  - pip:
    - numpy==1.24.1
channels:
  - anaconda
  - conda-forge
Resources
Pip issues
Troubleshooting steps
For reproducibility, you should specify and pin pip as a dependency in your conda
specification.
Python
env.python.conda_dependencies.add_conda_package("pip==22.3.1")
If you're using a YAML for your conda specification, specify pip as a dependency
YAML
name: project_environment
dependencies:
  - python=3.8
  - pip=22.3.1
  - pip:
    - numpy==1.24.1
channels:
  - anaconda
  - conda-forge
Resources
Troubleshooting steps
If you don't specify a pip version, a different version may be used on subsequent builds
of the same environment. This behavior can cause reproducibility issues and other
unexpected errors if different versions of pip resolve your packages differently.
Python
env.python.conda_dependencies.add_conda_package("pip==22.3.1")
If you're using a YAML for your conda specification, specify a version for pip
YAML
name: project_environment
dependencies:
  - python=3.8
  - pip=22.3.1
  - pip:
    - numpy==1.24.1
channels:
  - anaconda
  - conda-forge
Resources
See conda package pinning
R section is deprecated
Potential causes:
Troubleshooting steps
The Azure Machine Learning SDK for R was deprecated at the end of 2021 to make way
for an improved R training and deployment experience using the Azure CLI v2
Python
env.r = None
See the samples repository to get started training R models using the Azure CLI v2
Troubleshooting steps
Ensure that you're specifying your environment name correctly, along with the correct
version
path-to-resource:version-number
You should specify the 'latest' version of your environment in a different way
path-to-resource@latest
ACR issues
ACR unreachable
This issue can happen when there's a failure in accessing a workspace's associated Azure
Container Registry (ACR) resource.
Potential causes:
Troubleshooting steps
Python
Update the workspace image build compute property using Azure CLI:
Note
Resources
Potential causes:
Troubleshooting steps
Ensure Dockerfile is formatted correctly and is encoded in UTF-8
Resources
Dockerfile format
Potential causes:
Troubleshooting steps
For a registry my-registry.io and image test/image with tag 3.2, a valid image
path would be my-registry.io/test/image:3.2
See registry path documentation
Configure the container registry by using the service endpoint (public access) from
the portal and retry
After you put the container registry behind a virtual network, run the Azure
Resource Manager template so the workspace can communicate with the
container registry instance
If the image you're trying to reference doesn't exist in the container registry you
specified
Check that you've used the correct tag and that you've set
user_managed_dependencies to True . Setting user_managed_dependencies to
True disables conda and uses the user's installed packages
If you haven't provided credentials for a private registry you're trying to pull from, or the
provided credentials are incorrect
Resources
Workspace connections v1
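As a hypothetical helper (not part of any Azure SDK or CLI), the registry/repository:tag format from the image path example above can be split and validated like this:

```python
import re

# Illustrative pattern for a fully qualified image path:
# <registry host>/<repository path>:<tag>
IMAGE_RE = re.compile(r"^(?P<registry>[^/]+)/(?P<repository>[\w./-]+):(?P<tag>[\w.-]+)$")

def split_image_path(path):
    """Return (registry, repository, tag) or raise ValueError."""
    match = IMAGE_RE.match(path)
    if not match:
        raise ValueError(f"not a fully qualified image path: {path!r}")
    return match.group("registry"), match.group("repository"), match.group("tag")

print(split_image_path("my-registry.io/test/image:3.2"))
# ('my-registry.io', 'test/image', '3.2')
```

A quick check like this can catch a missing tag or registry host before a build fails with an unhelpful pull error.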
I/O Error
This issue can happen when a Docker image pull fails due to a network issue.
Potential causes:
Troubleshooting steps
See configure inbound and outbound network traffic to learn how to use Azure
Firewall for your workspace and resources behind a VNet
Assess your workspace set-up. Are you using a virtual network, or are any of the
resources you're trying to access during your image build behind a virtual network?
Ensure that you've followed the steps in this article on securing a workspace with
virtual networks
Azure Machine Learning requires both inbound and outbound access to the public
internet. If there's a problem with your virtual network setup, there might be an
issue with accessing certain repositories required during your image build
Try rebuilding your image. If the timeout was due to a network issue, the problem
might be transient, and a rebuild could fix the problem
Bad spec
This issue can happen when a package listed in your conda specification is invalid or
when you've executed a conda command incorrectly.
Potential causes:
Troubleshooting steps
Conda spec errors can happen if you use the conda create command incorrectly
Read the documentation and ensure that you're using valid options and syntax
There's known confusion regarding conda env create versus conda create . You
can read more about conda's response and other users' known solutions here
To ensure a successful build, ensure that you're using proper syntax and valid package
specification in your conda yaml
See package match specifications and how to create a conda file manually
Communications error
This issue can happen when there's a failure in communicating with the entity from
which you wish to download packages listed in your conda specification.
Potential causes:
Troubleshooting steps
Ensure that the conda channels/repositories you're using in your conda specification are
correct
Check that they exist and that you've spelled them correctly
Try to rebuild the image; there's a chance that the failure is transient, and a rebuild
might fix the issue
Check to make sure that the packages listed in your conda specification exist in the
channels/repositories you specified
Compile error
This issue can happen when there's a failure building a package required for the conda
environment due to a compiler error.
Potential causes:
Troubleshooting steps
Ensure that you've spelled all listed packages correctly and that you've pinned versions
correctly
Resources
Missing command
This issue can happen when a command isn't recognized during an image build or in
the specified Python package requirement.
Potential causes:
Troubleshooting steps
Resources
Conda timeout
This issue can happen when conda package resolution takes too long to complete.
Potential causes:
There's a large number of packages listed in your conda specification and
unnecessary packages are included
You haven't pinned your dependencies (you included tensorflow instead of
tensorflow=2.8)
You've listed packages for which there's no solution (you included package X=1.3
and Y=2.8, but X's version is incompatible with Y's version)
Troubleshooting steps
Remove any packages from your conda specification that are unnecessary
Pin your packages; environment resolution is faster
If you're still having issues, review this article for an in-depth look at understanding
and improving conda's performance
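Before rebuilding, it can help to scan the dependencies list of your conda specification for unpinned packages. The helper below is an illustrative sketch, not part of any Azure tooling:

```python
def unpinned(dependencies):
    """Return the dependency specs that don't pin a version.

    Pinned specs such as "tensorflow=2.8" let conda resolve the
    environment faster than bare names such as "tensorflow".
    """
    return [d for d in dependencies if isinstance(d, str) and "=" not in d]


deps = ["python=3.8", "tensorflow", "numpy=1.23"]
print(unpinned(deps))  # ['tensorflow']
```

Any spec the helper flags is a candidate for pinning or removal before you retry the build.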
Out of memory
This issue can happen when conda package resolution fails due to available memory
being exhausted.
Potential causes:
Troubleshooting steps
Remove any packages from your conda specification that are unnecessary
Pin your packages; environment resolution is faster
If you're still having issues, review this article for an in-depth look at understanding
and improving conda's performance
Potential causes:
You listed the package's name or version incorrectly in your conda specification
The package exists in a conda channel that you didn't list in your conda
specification
Troubleshooting steps
Ensure that you've spelled the package correctly and that the specified version
exists
Ensure that the package exists on the channel you're targeting
Ensure that you've listed the channel/repository in your conda specification so the
package can be pulled correctly during package resolution
YAML
name: my_environment
channels:
  - conda-forge
  - anaconda
dependencies:
  - python=3.8
  - tensorflow=2.8
Resources
Managing channels
Potential causes:
Troubleshooting steps
Ensure that you've spelled the module correctly and that it exists
Check to make sure that the module is compatible with the Python version you've
specified in your conda specification
If you haven't listed a specific Python version in your conda specification, make
sure to list a specific version that's compatible with your module; otherwise, a
default may be used that isn't compatible
Pin a Python version that's compatible with the pip module you're using:
YAML
name: my_environment
channels:
  - conda-forge
  - anaconda
dependencies:
  - python=3.8
  - pip:
    - dataclasses
No matching distribution
This issue can happen when there's no package found that matches the version you
specified.
Potential causes:
Troubleshooting steps
Ensure that you've spelled the package correctly and that it exists
Ensure that the version you specified for the package exists
Ensure that you've specified the channel from which the package will be installed.
If you don't specify a channel, defaults are used and those defaults may or may not
have the package you're looking for
YAML
name: my_environment
channels:
  - conda-forge
  - anaconda
dependencies:
  - python=3.8
  - tensorflow=2.8
Resources
Managing channels
pypi
Potential causes:
Troubleshooting steps
Ensure that you have a working MPI installation (preference for MPI-3 support and for
MPI built with shared/dynamic libraries)
mpi4py requires Python 2.5 or 3.5+, but Python 3.7+ is
recommended
See mpi4py installation
Resources
mpi4py installation
Potential causes:
You've listed a package that requires authentication, but you haven't provided
credentials
During the image build, pip tried to prompt you to authenticate, which failed the
build because you can't provide interactive authentication during a build
Troubleshooting steps
Python
from azureml.core import Workspace

ws = Workspace.from_config()
ws.set_connection("connection1", "PythonFeed", "<URL>", "Basic",
                  "{'Username': '<username>', 'Password': '<password>'}")
Resources
Forbidden blob
This issue can happen when an attempt to access a blob in a storage account is rejected.
Potential causes:
The authorization method you're using to access the storage account is invalid
You're attempting to authorize via shared access signature (SAS), but the SAS
token is expired or invalid
Troubleshooting steps
Read the following to understand how to authorize access to blob data in the Azure
portal
Read the following to understand how to authorize access to data in Azure storage
Read the following if you're interested in using SAS to access Azure storage resources
Horovod build
This issue can happen when the conda environment fails to be created or updated
because horovod failed to build.
Potential causes:
Troubleshooting steps
Many issues could cause a horovod failure, and there's a comprehensive list of them in
horovod's documentation
Resources
horovod installation
Potential causes:
Troubleshooting steps
Ensure that you have a conda installation step in your Dockerfile before trying to
execute any conda commands
Review this list of conda installers to determine what you need for your scenario
If you've tried installing conda and are experiencing this issue, ensure that you've added
conda to your path
Resources
Troubleshooting steps
Use a different version of the package that's compatible with your specified Python
version
Alternatively, use a different version of Python that's compatible with the package
you've specified
If you're changing your Python version, use a version that's supported and that
isn't nearing its end-of-life soon
See Python end-of-life dates
Resources
Troubleshooting steps
Potential causes:
Your conda YAML file contains characters that aren't compatible with UTF-8.
Troubleshooting steps
Review your Build log for more information on your image build failure
Leave feedback for the Azure Machine Learning team to analyze the error you're
experiencing
Potential causes:
Troubleshooting steps
Read the following and determine if an existing pip problem caused your failure
Invalid operator
This issue can happen when pip fails to install a Python package due to an invalid
operator found in the requirement.
Potential causes:
Troubleshooting steps
Ensure that you've spelled the package correctly and that the specified version
exists
Ensure that your package version specifier is formatted correctly and that you're
using valid comparison operators. See Version specifiers
Replace the invalid operator with the operator recommended in the error message
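The operators pip accepts are defined by the version-specifier grammar. The stand-alone check below is a rough illustration, not pip's actual parser; it catches the common mistake of writing a conda-style single `=` instead of `==`:

```python
import re

# Accept only the comparison operators pip recognizes in version
# specifiers (PEP 440): ==, !=, <=, >=, <, >, ~=, ===.
_SPEC = re.compile(r"^[A-Za-z0-9._-]+(\s*(==|!=|<=|>=|~=|===|<|>)\s*[\w.*+!-]+)?$")


def is_valid_requirement(req):
    """Rough check that a requirement line uses a valid operator."""
    return bool(_SPEC.match(req.strip()))


print(is_valid_requirement("tensorflow==2.8"))  # True
print(is_valid_requirement("tensorflow=2.8"))   # False: '=' isn't a pip operator
```

A real requirements file supports more syntax (extras, environment markers), so treat this only as a quick sanity check.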
No matching distribution
This issue can happen when there's no package found that matches the version you
specified.
Potential causes:
Troubleshooting steps
Ensure that you've spelled the package correctly and that it exists
Ensure that the version you specified for the package exists
Run pip install --upgrade pip and then run the original command again
Ensure the pip you're using can install packages for the desired Python version.
See Should I use pip or pip3?
Resources
Running Pip
pypi
Installing Python Modules
Potential causes:
Troubleshooting steps
Ensure that you've spelled the filename correctly and that it exists
Ensure that you're following the format for wheel filenames
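Wheel filenames follow the `{distribution}-{version}(-{build})?-{python}-{abi}-{platform}.whl` pattern from the wheel specification. The sketch below (the example filename is illustrative) shows how the fields line up, which can help you spot a malformed name:

```python
def parse_wheel_name(filename):
    """Split a wheel filename into the fields defined by the wheel spec."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel filename")
    parts = filename[:-len(".whl")].split("-")
    if len(parts) not in (5, 6):  # 6 fields when an optional build tag is present.
        raise ValueError("unexpected number of fields")
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {"distribution": parts[0], "version": parts[1],
            "python": python_tag, "abi": abi_tag, "platform": platform_tag}


print(parse_wheel_name("azureml_core-1.48.0-py3-none-any.whl"))
```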
Make issues
Potential causes:
Troubleshooting steps
Ensure that you've spelled the makefile correctly
Ensure that the makefile exists in the current directory
If you have a custom makefile, specify it using make -f custommakefile
Specify targets in the makefile or in the command line
Configure your build and generate a makefile
Ensure that you've formatted your makefile correctly and that you've used tabs for
indentation
Resources
GNU Make
Copy issues
Potential causes:
Troubleshooting steps
Ensure that the source file exists in the Docker build context
Ensure that the source and destination paths exist and are spelled correctly
Ensure that the source file isn't listed in the .dockerignore of the current and
parent directories
Remove any trailing comments from the same line as the COPY command
Resources
Docker COPY
Docker Build Context
Apt-Get Issues
Potential causes:
Troubleshooting steps
Resources
A transient issue has occurred with the ACR associated with the workspace
A container registry behind a virtual network is using a private endpoint in an
unsupported region
Troubleshooting steps
Retry the environment build if you suspect the failure is a transient issue with the
workspace's Azure Container Registry (ACR)
Configure the container registry by using the service endpoint (public access) from
the portal and retry
After you put the container registry behind a virtual network, run the Azure
Resource Manager template so the workspace can communicate with the
container registry instance
If you aren't using a virtual network, or if you've configured it correctly, test that your
credentials are correct for your ACR by attempting a simple local build
Get credentials for your workspace ACR from the Azure portal
Log in to your ACR using docker login <myregistry.azurecr.io> -u "username" -p "password"
For an image "helloworld", test pushing to your ACR by running docker push
helloworld
See Quickstart: Build and run a container image using Azure Container Registry
Tasks
Troubleshooting steps
Resources
Dockerfile reference
Potential causes:
You haven't installed the command via your Dockerfile before you try to execute
the command
You haven't included the command in your path, or you haven't added it to your
path
Troubleshooting steps
Ensure that you have an installation step for the command in your Dockerfile before
trying to execute the command
Azure Machine Learning isn't authorized to store your build logs in your storage
account
A transient error occurred while saving your build logs
A system error occurred before an image build was triggered
Troubleshooting steps
Potential causes:
Troubleshooting steps
Ensure that the base image is spelled and formatted correctly
Ensure that the base image you're using exists in the registry you specified
Resources
In this article, you learn how to troubleshoot when you get errors running a machine
learning pipeline in the Azure Machine Learning SDK and Azure Machine Learning
designer.
Troubleshooting tips
The following table contains common problems during pipeline development, with
potential solutions.
Problem: Unable to pass data to PipelineData directory
Possible solution: Ensure you have created a directory in the script that corresponds to
where your pipeline expects the step output data. In most cases, an input argument will
define the output directory, and then you create the directory explicitly. Use
os.makedirs(args.output_dir, exist_ok=True) to create the output directory. See the
tutorial for a scoring script example that shows this design pattern.

Problem: Dependency bugs
Possible solution: If you see dependency errors in your remote pipeline that did not
occur when locally testing, confirm your remote environment dependencies and versions
match those in your test environment. (See Environment building, caching, and reuse.)

Problem: Ambiguous errors with compute targets
Possible solution: Try deleting and re-creating compute targets. Re-creating compute
targets is quick and can solve some transient issues.

Problem: Pipeline not reusing steps
Possible solution: Step reuse is enabled by default, but ensure you haven't disabled it
in a pipeline step. If reuse is disabled, the allow_reuse parameter in the step will be
set to False.

Problem: Pipeline is rerunning unnecessarily
Possible solution: To ensure that steps only rerun when their underlying data or scripts
change, decouple your source-code directories for each step. If you use the same source
directory for multiple steps, you may experience unnecessary reruns. Use the
source_directory parameter on a pipeline step object to point to your isolated directory
for that step, and ensure you aren't using the same source_directory path for multiple
steps.

Problem: Step slowing down over training epochs or other looping behavior
Possible solution: Try switching any file writes, including logging, from as_mount() to
as_upload(). The mount mode uses a remote virtualized filesystem and uploads the entire
file each time it is appended to.

Problem: Compute target takes a long time to start
Possible solution: Docker images for compute targets are loaded from Azure Container
Registry (ACR). By default, Azure Machine Learning creates an ACR that uses the basic
service tier. Changing the ACR for your workspace to standard or premium tier may reduce
the time it takes to build and load images. For more information, see Azure Container
Registry service tiers.
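The os.makedirs pattern from the first problem above can be sketched as a minimal step-script helper (the argument name and output file name are illustrative):

```python
import argparse
import os


def write_step_output(argv):
    """Parse the output-directory argument a pipeline passes to the step,
    create the directory explicitly, then write a result file into it."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--output_dir", dest="output_dir", required=True)
    args = parser.parse_args(argv)

    # Safe even if the directory already exists.
    os.makedirs(args.output_dir, exist_ok=True)

    path = os.path.join(args.output_dir, "results.txt")
    with open(path, "w") as f:
        f.write("scored\n")
    return path
```

In a real step script you'd call `write_step_output(sys.argv[1:])` (or inline the same logic) so the directory exists before any write.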
Authentication errors
If you perform a management operation on a compute target from a remote job, you
will receive one of the following errors:
JSON
{"code":"Unauthorized","statusCode":401,"message":"Unauthorized","details":
[{"code":"InvalidOrExpiredToken","message":"The request token was either
invalid or expired. Please try again with a valid token."}]}
JSON
{"error":{"code":"AuthenticationFailed","message":"Authentication failed."}}
For example, you will receive an error if you try to create or attach a compute target
from an ML Pipeline that is submitted for remote execution.
Troubleshooting ParallelRunStep
The script for a ParallelRunStep must contain two functions:
init() : Use this function for any costly or common preparation for later inference.
For example, use it to load the model into a global object. This function will be
called only once at the beginning of the process.
run(mini_batch) : The function will run for each mini_batch instance.
mini_batch : ParallelRunStep will invoke the run method and pass either a list or
pandas DataFrame as an argument to the method.
Python
%%writefile digit_identification.py
# Snippets from a sample script.
# Refer to the accompanying digit_identification.py
# (https://github.com/Azure/MachineLearningNotebooks/tree/master/how-to-use-
azureml/machine-learning-pipelines/parallel-run)
# for the implementation script.
import os
import numpy as np
import tensorflow as tf
from PIL import Image
from azureml.core import Model
def init():
    global g_tf_sess
    # The full sample loads the registered model and creates a TensorFlow
    # session in g_tf_sess here; that code is omitted from this snippet.

def run(mini_batch):
    print(f'run method start: {__file__}, run({mini_batch})')
    resultList = []
    in_tensor = g_tf_sess.graph.get_tensor_by_name("network/X:0")
    output = g_tf_sess.graph.get_tensor_by_name("network/output/MatMul:0")
    # The full sample scores each item in mini_batch and appends to resultList.
    return resultList
If you have another file or folder in the same directory as your inference script, you can
reference it by finding the current working directory.
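A minimal sketch of that pattern (the helper name is illustrative; in your entry script you'd pass `__file__`):

```python
import os


def sibling_path(script_file, name):
    """Return the path of a file that sits next to the given script, so
    lookups don't depend on the process working directory on the compute."""
    return os.path.join(os.path.dirname(os.path.abspath(script_file)), name)


# In the entry script: data_file = sibling_path(__file__, "<file_name>")
```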
entry_script : A user script as a local file path that will be run in parallel on
multiple nodes. If source_directory is present, use a relative path. Otherwise, use
any path that's accessible on the machine.
mini_batch_size : The size of the mini-batch passed to a single run() call.
(optional; the default value is 10 files for FileDataset and 1MB for
TabularDataset .)
For FileDataset , it's the number of files with a minimum value of 1 . You can
combine multiple files into one mini-batch.
For TabularDataset , it's the size of data. Example values are 1024 , 1024KB , 10MB ,
and 1GB . The recommended value is 1MB . The mini-batch from TabularDataset
will never cross file boundaries. For example, if you have .csv files with various
sizes, the smallest file is 100 KB and the largest is 10 MB. If you set
mini_batch_size = 1MB , then files with a size smaller than 1 MB will be treated
as one mini-batch. Files with a size larger than 1 MB will be split into multiple
mini-batches.
error_threshold : The number of record failures for TabularDataset and file failures
for FileDataset that should be ignored during processing. If the error count for
the entire input goes above this value, the job will be aborted. The error threshold
is for the entire input and not for individual mini-batch sent to the run() method.
The range is [-1, int.max] . The -1 part indicates ignoring all failures during
processing.
output_action : One of the following values indicates how the output will be
organized:
summary_only : The user script will store the output. ParallelRunStep will use the
output only for the error threshold calculation.
append_row : For all inputs, only one file will be created in the output folder to
target (optional).
compute_target : Only AmlCompute is supported.
node_count : The number of compute nodes to be used for running the user script.
resubmit a pipeline run, you can fine-tune the parameter values. In this example, you
use PipelineParameter for mini_batch_size and process_count_per_node, and you change
these values when you resubmit a run later.
name : The name of the step, with the following naming restrictions: unique, 3-32
need to be partitioned.
output : An OutputFileDatasetConfig object that corresponds to the output
directory.
arguments : A list of arguments passed to the user script. Use unknown_args to
retrieve them in your entry script (optional).
allow_reuse : Whether the step should reuse previous results when run with the
Python
parallelrun_step = ParallelRunStep(
name="predict-digits-mnist",
parallel_run_config=parallel_run_config,
inputs=[input_mnist_ds_consumption],
output=output_dir,
allow_reuse=True
)
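Two of the numeric parameters above are easy to misread. The helpers below sketch the documented semantics of mini_batch_size for a TabularDataset (a mini-batch never crosses file boundaries) and of error_threshold (-1 means ignore all failures); they're illustrations, not SDK code:

```python
import math


def tabular_mini_batch_count(file_size_bytes, mini_batch_size_bytes):
    """Files smaller than mini_batch_size form one mini-batch each;
    larger files are split into multiple mini-batches."""
    if file_size_bytes <= mini_batch_size_bytes:
        return 1
    return math.ceil(file_size_bytes / mini_batch_size_bytes)


def job_aborts(failed_count, error_threshold):
    """The job aborts when failures across the entire input exceed the
    threshold; the threshold isn't applied per mini-batch."""
    if error_threshold == -1:
        return False  # -1: ignore all failures during processing.
    return failed_count > error_threshold


MB = 1024 * 1024
print(tabular_mini_batch_count(100 * 1024, 1 * MB))  # 1: small file, one mini-batch
print(tabular_mini_batch_count(10 * MB, 1 * MB))     # 10: split into 10 mini-batches
print(job_aborts(5, -1))                             # False
```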
Debugging techniques
There are three major techniques for debugging pipelines:
Pipelines themselves cannot be run locally, but running the scripts in isolation on your
local machine allows you to debug faster because you don't have to wait for the
compute and environment build process. Some development work is required to do
this:
If your data is in a cloud datastore, you will need to download data and make it
available to your script. Using a small sample of your data is a good way to cut
down on runtime and quickly get feedback on script behavior
If you are attempting to simulate an intermediate pipeline step, you may need to
manually build the object types that the particular script is expecting from the
prior step
You will also need to define your own environment, and replicate the
dependencies defined in your remote compute environment
Once you have a script setup to run on your local environment, it is much easier to do
debugging tasks like:
Tip
Once you can verify that your script is running as expected, a good next step is
running the script in a single-step pipeline before attempting to run it in a pipeline
with multiple steps.
values during remote execution, similar to how you would debug JavaScript code.
Logging options and behavior
The table below provides information for different debug options for pipelines. It isn't
an exhaustive list, as other options exist besides just the Azure Machine Learning,
Python, and OpenCensus ones shown here.
OpenCensus
Azure Monitor
Exporters
Python logging
cookbook
Python
import logging

from azureml.core import Run

run = Run.get_context()
3. Expand the right pane, and select the 70_driver_log.txt to view the file in browser.
You can also download logs locally.
Get logs from pipeline runs
You can also find the log files for specific runs in the pipeline run detail page, which can
be found in either the Pipelines or Experiments section of the studio.
4. Expand the right pane to view the std_log.txt file in browser, or select the file to
download the logs locally.
Important
To update a pipeline from the pipeline run details page, you must clone the
pipeline run to a new pipeline draft. A pipeline run is a snapshot of the pipeline. It's
similar to a log file, and cannot be altered.
Application Insights
For more information on using the OpenCensus Python library in this manner, see this
guide: Debug and troubleshoot machine learning pipelines in Application Insights
Your Azure Machine Learning workspace is configured for network isolation (VNet).
Your pipeline attempts to register the model generated by the previous step. For
example, in the following code the inputs parameter is the saved_model from a
HyperdriveStep:
Python
register_model_step = PythonScriptStep(script_name='register_model.py',
name="register_model_step01",
inputs=[saved_model],
compute_target=cpu_cluster,
arguments=["--saved-model",
saved_model],
allow_reuse=True,
runconfig=rcfg)
Workaround
Important
This behavior does not occur when using Azure Machine Learning SDK v2.
To work around this error, use the Run class to get the model created from the
HyperdriveStep or AutoMLStep. The following is an example script that gets the output
model from a HyperdriveStep:
Python
%%writefile $script_folder/model_download9.py
import argparse

from azureml.core import Run
from azureml.core.experiment import Experiment
from azureml.pipeline.core import PipelineRun
from azureml.pipeline.steps import HyperDriveStepRun
from azureml.train.hyperdrive import HyperDriveRun

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument(
        '--hd_step_name',
        type=str, dest='hd_step_name',
        help='The name of the step that runs Hyperdrive training within this pipeline')
    args = parser.parse_args()
    current_run = Run.get_context()
    pipeline_run = PipelineRun(current_run.experiment, current_run.parent.id)
    hd_step_run = HyperDriveStepRun(pipeline_run.find_step_run(args.hd_step_name)[0])
    hd_best_run = hd_step_run.get_best_run_by_primary_metric()
    print(hd_best_run)
    hd_best_run.download_file("outputs/model/saved_model.pb", "saved_model.pb")
rcfg = RunConfiguration(conda_dependencies=conda_dep)
model_download_step = PythonScriptStep(
name="Download Model 9",
script_name="model_download9.py",
arguments=["--hd_step_name", hd_step_name],
compute_target=compute_target,
source_directory=script_folder,
allow_reuse=False,
runconfig=rcfg
)
Next steps
For a complete tutorial using ParallelRunStep , see Tutorial: Build an Azure
Machine Learning pipeline for batch scoring.
See the SDK reference for help with the azureml-pipelines-core package and the
azureml-pipelines-steps package.
The OpenCensus Python library can be used to route logs to Application Insights from
your scripts. Aggregating logs from pipeline runs in one place allows you to build
queries and diagnose issues. Using Application Insights will allow you to track logs over
time and compare pipeline logs across runs.
Having your logs in one place will provide a history of exceptions and error messages.
Since Application Insights integrates with Azure Alerts, you can also create alerts based
on Application Insights queries.
Prerequisites
Follow the steps to create an Azure Machine Learning workspace and create your
first pipeline
Configure your development environment to install the Azure Machine Learning
SDK.
Install the OpenCensus Azure Monitor Exporter package locally:
Python

pip install opencensus-ext-azure
Getting Started
This section is an introduction specific to using OpenCensus from an Azure Machine
Learning pipeline. For a detailed tutorial, see OpenCensus Azure Monitor Exporters
Create a file called sample_step.py . Import the AzureLogHandler class to route logs to
Application Insights. You'll also need to import the Python Logging library.
Python

from opencensus.ext.azure.log_exporter import AzureLogHandler
import logging

logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)
logger.addHandler(logging.StreamHandler())
# Route records to Application Insights. AzureLogHandler reads the connection
# string from the APPLICATIONINSIGHTS_CONNECTION_STRING environment variable,
# or you can pass connection_string="..." explicitly.
logger.addHandler(AzureLogHandler())
To add these fields, Custom Dimensions can be added to provide context to a log
message. One example is when someone wants to view logs across multiple steps in the
same pipeline run.
Field and reasoning/example:

parent_run_id: Can query logs for ones with the same parent_run_id to see logs over
time for all steps, instead of having to dive into each individual step.

step_id: Can query logs for ones with the same step_id to see where an issue occurred
with a narrow scope to just the individual step.

step_name: Can query logs to see step performance over time. Also helps to find a
step_id for recent runs without diving into the portal UI.

experiment_name: Can query across logs to see experiment performance over time. Also
helps find a parent_run_id or step_id for recent runs without diving into the portal UI.

run_url: Can provide a link directly back to the run for investigation.
These fields may require extra code instrumentation, and aren't provided by the run
context.
Field and reasoning/example:

build_url/build_version: If using CI/CD to deploy, this field can correlate logs to the
code version that provided the step and pipeline logic. This link can further help to
diagnose issues, or identify models with specific traits (log/metric values).

run_type: Can differentiate between different model types, or training versus scoring
runs.
Python

run = Run.get_context(allow_offline=False)

custom_dimensions = {
    "parent_run_id": run.parent.id,
    "step_id": run.id,
    "step_name": run.name,
    "experiment_name": run.experiment.name,
    "run_url": run.parent.get_portal_url(),
    "run_type": "training"
}

# Attach the dimensions to each record via the logging `extra` keyword so they
# appear as customDimensions in Application Insights.
logger.info("training step started", extra={"custom_dimensions": custom_dimensions})
The result in Application Insights will show the log message and level, file path, and
code line number. It will also show any custom dimensions included. In this image, the
customDimensions dictionary shows the key/value pairs from the previous code sample.
Next Steps
Once you have logs in your Application Insights instance, they can be used to set Azure
Monitor alerts based on query results.
You can also add results from queries to an Azure Dashboard for more insights.
Troubleshooting the ParallelRunStep
Article • 03/02/2023
In this article, you learn how to troubleshoot when you get errors using the
ParallelRunStep class from the Azure Machine Learning SDK.
init() : Use this function for any costly or common preparation for later
processing. For example, use it to load the model into a global object. This
function will be called only once at the beginning of the process.
Note
mini_batch : ParallelRunStep will invoke run method and pass either a list or
pandas DataFrame as an argument to the method. Each entry in mini_batch will
be a file path if input is a FileDataset or a pandas DataFrame if input is a
TabularDataset .
response : run() method should return a pandas DataFrame or an array. For
Note
Python
%%writefile digit_identification.py
# Snippets from a sample script.
# Refer to the accompanying digit_identification.py
# (https://github.com/Azure/MachineLearningNotebooks/tree/master/how-to-use-
azureml/machine-learning-pipelines/parallel-run)
# for the implementation script.
import os
import numpy as np
import tensorflow as tf
from PIL import Image
from azureml.core import Model
def init():
    global g_tf_sess
    # The full sample loads the registered model and creates a TensorFlow
    # session in g_tf_sess here; that code is omitted from this snippet.

def run(mini_batch):
    print(f'run method start: {__file__}, run({mini_batch})')
    resultList = []
    in_tensor = g_tf_sess.graph.get_tensor_by_name("network/X:0")
    output = g_tf_sess.graph.get_tensor_by_name("network/output/MatMul:0")
    # The full sample scores each item in mini_batch and appends to resultList.
    return resultList
If you have another file or folder in the same directory as your inference script, you can
reference it by finding the current working directory. If you want to import your
packages, you can also append your package folder to sys.path .
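A minimal sketch of appending a package folder to sys.path (the helper and folder names are illustrative; in your entry script you'd pass `__file__`):

```python
import os
import sys


def add_package_folder(script_file, folder_name):
    """Append a folder that sits next to the entry script to sys.path so its
    modules can be imported, without adding duplicate entries."""
    packages_dir = os.path.join(
        os.path.dirname(os.path.abspath(script_file)), folder_name)
    if packages_dir not in sys.path:
        sys.path.append(packages_dir)
    return packages_dir


# In the entry script: add_package_folder(__file__, "<your_package_folder>")
```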
Azure Machine Learning pipeline. You use it to wrap your script and configure necessary
parameters, including all of the following entries:
entry_script : A user script as a local file path that will be run in parallel on
(optional; the default value is 10 files for FileDataset and 1MB for
TabularDataset .)
For FileDataset , it's the number of files with a minimum value of 1 . You can
combine multiple files into one mini-batch.
For TabularDataset , it's the size of data. Example values are 1024 , 1024KB , 10MB ,
and 1GB . The recommended value is 1MB . The mini-batch from TabularDataset
will never cross file boundaries. For example, if you have .csv files with various
sizes, the smallest file is 100 KB and the largest is 10 MB. If you set
mini_batch_size = 1MB , then files with a size smaller than 1 MB will be treated
as one mini-batch. Files with a size larger than 1 MB will be split into multiple
mini-batches.
Note
error_threshold : The number of record failures for TabularDataset and file failures
for FileDataset that should be ignored during processing. If the error count for
the entire input goes above this value, the job will be aborted. The error threshold
is for the entire input and not for individual mini-batch sent to the run() method.
The range is [-1, int.max] . The -1 part indicates ignoring all failures during
processing.
output_action : One of the following values indicates how the output will be
organized:
summary_only : The user script will store the output. ParallelRunStep will use the
output only for the error threshold calculation.
append_row : For all inputs, only one file will be created in the output folder to
source_directory : Paths to folders that contain all files to execute on the compute
target (optional).
node_count : The number of compute nodes to be used for running the user script.
process will assign a unique index to CUDA_VISIBLE_DEVICES . If a worker process stops for
any reason, the next started worker process will use the released GPU index.
If the total number of GPU devices is less than process_count_per_node , worker
processes are assigned GPU indexes until all of them have been used.
For more information, see CUDA Pro Tip: Control GPU Visibility with
CUDA_VISIBLE_DEVICES .
name : The name of the step, with the following naming restrictions: unique, 3-32
need to be partitioned.
output : An OutputFileDatasetConfig object that represents the directory path at
which the output data will be stored.
arguments : A list of arguments passed to the user script. Use unknown_args to
retrieve them in your entry script (optional).
allow_reuse : Whether the step should reuse previous results when run with the
Python
parallelrun_step = ParallelRunStep(
name="predict-digits-mnist",
parallel_run_config=parallel_run_config,
inputs=[input_mnist_ds_consumption],
output=output_dir,
allow_reuse=True
)
For example, the log file 70_driver_log.txt contains information from the controller
that launches the ParallelRunStep code.
Because of the distributed nature of ParallelRunStep jobs, there are logs from several
different sources. However, two consolidated files are created that provide high-level
information:
the orchestrator) view of the running job. Includes task creation, progress
monitoring, the run result.
Logs generated from the entry script using the EntryScript helper and print statements
will be found in the following files:
~/logs/user/error.txt : This file will try to summarize the errors in your script.
When you need a full understanding of how each node executed the score script, look
at the individual process logs for each node. The process logs can be found in the
sys/node folder, grouped by worker nodes:
about each mini-batch as it's picked up or completed by a worker. For each mini-
batch, this file includes:
The IP address and the PID of the worker process.
The total number of items, successfully processed items count, and failed item
count.
The start time, duration, process time and run method time.
You can also view the results of periodical checks of the resource usage for each node.
The log files and setup files are in this folder:
processes_%M : The file ends with the minute of the checking time.
Python
from azureml_user.parallel_run import EntryScript

def init():
    """Init once in a worker process."""
    entry_script = EntryScript()
    logger = entry_script.logger
    logger.info("This will show up in files under logs/user on the Azure portal.")

def run(mini_batch):
    """Call once for a mini batch. Accept and return the list back."""
    # This class is a singleton and returns the same instance as the one in init().
    entry_script = EntryScript()
    logger = entry_script.logger
    logger.info(f"{__file__}: {mini_batch}.")
    ...
    return mini_batch
The logging level defaults to INFO . By default, levels below INFO , such as DEBUG ,
won't show up.
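The logger returned by the EntryScript helper behaves like a standard Python logging.Logger (an assumption in this sketch, with a stand-in logger name), so you can raise its level in init() when you need DEBUG records; ParallelRunConfig also accepts a logging_level parameter that controls the job-wide level:

```python
import logging

# Stand-in for the logger obtained from EntryScript().logger inside the entry script.
logger = logging.getLogger("entry_script_demo")

# Raise the level so that DEBUG records are no longer filtered out.
logger.setLevel(logging.DEBUG)

assert logger.isEnabledFor(logging.DEBUG)
```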
Python
from pathlib import Path
from azureml_user.parallel_run import EntryScript

def init():
    """Init once in a worker process."""
    entry_script = EntryScript()
    log_dir = Path(entry_script.log_dir)  # logs/user/entry_script_log/<node_id>/.
    log_dir.mkdir(parents=True, exist_ok=True)  # Create the folder if not existing.
If you want to use Popen() , you should redirect stdout/stderr to files, like:
Python
from pathlib import Path
from subprocess import Popen

from azureml_user.parallel_run import EntryScript

def init():
    """Show how to redirect stdout/stderr to files in logs/user/entry_script_log/<node_id>/."""
    entry_script = EntryScript()
    proc_name = entry_script.agent_name  # The process name in pattern "processNNN".
    log_dir = Path(entry_script.log_dir)  # logs/user/entry_script_log/<node_id>/.
    log_dir.mkdir(parents=True, exist_ok=True)  # Create the folder if not existing.
    stdout_file = str(log_dir / f"{proc_name}_demo_stdout.txt")
    stderr_file = str(log_dir / f"{proc_name}_demo_stderr.txt")
    proc = Popen(
        ["..."],
        stdout=open(stdout_file, "w"),
        stderr=open(stderr_file, "w"),
        # ...
    )
Note
A worker process runs "system" code and the entry script code in the same process,
so output written directly to stderr also appears in
logs/sys/node/<node_id>/processNNN.stderr.txt .
How do I write a file to the output directory,
and then view it in the portal?
You can get the output directory from the EntryScript class and write to it. To view the
written files, in the step Run view in the Azure Machine Learning portal, select the
Outputs + logs tab. Select the Data outputs link, and then complete the steps that are
described in the dialog.
Python
def run(mini_batch):
    entry_script = EntryScript()
    output_dir = Path(entry_script.output_dir)
    (output_dir / res1).write...
    (output_dir / res2).write...
Construct a Dataset containing the reference data, specify a local mount path, and
register it with your workspace. Pass it to the side_inputs parameter of your
ParallelRunStep . Additionally, you can add its path in the arguments section to easily
access its mounted path from your script.
Python
local_path = "/tmp/{}".format(str(uuid.uuid4()))
label_config = label_ds.as_named_input("labels_input").as_mount(local_path)
batch_score_step = ParallelRunStep(
name=parallel_step_name,
inputs=[input_images.as_named_input("input_images")],
output=output_dir,
arguments=["--labels_dir", label_config],
side_inputs=[label_config],
parallel_run_config=parallel_run_config,
)
After that you can access it in your inference script (for example, in your init() method)
as follows:
Python
parser = argparse.ArgumentParser()
parser.add_argument('--labels_dir', dest="labels_dir", required=True)
args, _ = parser.parse_known_args()
labels_path = args.labels_dir
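Once parsed, the mounted side-input directory behaves like any local folder. As an illustration (the helper name and the idea of caching in init() are hypothetical, not part of the original sample), you could index the mounted files once:

```python
import os

def list_label_files(labels_path):
    """Return the full paths of the files under the mounted side-input directory, sorted by name."""
    return sorted(
        os.path.join(labels_path, name) for name in os.listdir(labels_path)
    )
```

Calling list_label_files(labels_path) once in init() and caching the result avoids repeating the directory walk in every run(mini_batch) call.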
Python
from azureml.core import Workspace
from azureml.core.authentication import ServicePrincipalAuthentication

service_principal = ServicePrincipalAuthentication(
    tenant_id="***",
    service_principal_id="***",
    service_principal_password="***")
ws = Workspace(
subscription_id="***",
resource_group="***",
workspace_name="***",
auth=service_principal
)
Completed : All mini-batches have been processed and, for append_row mode, the
output has been generated.
Failed : The error_threshold in Parameters for ParallelRunConfig is exceeded, or
a system error occurred during the job.
1. If, after a node starts, init on all agents keeps failing, we stop trying after 3 *
process_count_per_node failures.
2. If, after the job starts, init on all agents of all nodes keeps failing, we stop trying if
the job runs more than 2 minutes and there are 2 * node_count *
process_count_per_node failures.
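To make the two thresholds concrete, here is the arithmetic for a hypothetical cluster of 2 nodes with 4 worker processes per node:

```python
node_count = 2              # hypothetical cluster size
process_count_per_node = 4  # hypothetical worker processes per node

# Rule 1: per-node limit that applies after a node starts.
per_node_failure_limit = 3 * process_count_per_node
# Rule 2: job-wide limit once the job has run for more than 2 minutes.
job_failure_limit = 2 * node_count * process_count_per_node

print(per_node_failure_limit, job_failure_limit)  # → 12 16
```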
Next steps
See these Jupyter notebooks demonstrating Azure Machine Learning pipelines
See the SDK reference for help with the azureml-pipeline-steps package.
Follow the advanced tutorial on using pipelines with ParallelRunStep. The tutorial
shows how to pass another file as a side input.
Interactive debugging with Visual
Studio Code
Article • 04/04/2023
Learn how to interactively debug Azure Machine Learning experiments, pipelines, and
deployments using Visual Studio Code (VS Code) and debugpy .
Prerequisites
Azure Machine Learning VS Code extension (preview). For more information, see
Set up Azure Machine Learning VS Code extension.
Important
The Azure Machine Learning VS Code extension uses the CLI (v2) by default.
The instructions in this guide use the 1.0 CLI. To switch to the 1.0 CLI, set the
azureML.CLI Compatibility Mode setting in Visual Studio Code to 1.0 . For
more information on modifying your settings in Visual Studio Code, see the
user and workspace settings documentation .
Important
Verify the following before you continue:
Docker is running.
The azureML.CLI Compatibility Mode setting in Visual Studio Code is set to
1.0 , as specified in the prerequisites.
2. Expand the subscription node containing your workspace. If you don't already have
one, you can create an Azure Machine Learning workspace using the extension.
4. Right-click the Experiments node and select Create experiment. When the prompt
appears, provide a name for your experiment.
5. Expand the Experiments node, right-click the experiment you want to run and
select Run Experiment.
7. First time use on Windows only. When prompted to allow File Share, select Yes.
When you enable file share, it allows Docker to mount the directory containing
your script to the container. Additionally, it also allows Docker to store the logs
and outputs from your run in a temporary directory on your system.
8. Select Yes to debug your experiment. Otherwise, select No. Selecting No runs
your experiment locally without attaching to the debugger.
9. Select Create new Run Configuration to create your run configuration. The run
configuration defines the script you want to run, dependencies, and datasets used.
Alternatively, if you already have one, select it from the dropdown.
a. Choose your environment. You can choose from any of the Azure Machine
Learning curated environments or create your own.
b. Provide the name of the script you want to run. The path is relative to the
directory opened in VS Code.
c. Choose whether you want to use an Azure Machine Learning dataset or not. You
can create Azure Machine Learning datasets using the extension.
d. Debugpy is required in order to attach the debugger to the container running
your experiment. To add debugpy as a dependency, select Add Debugpy.
Otherwise, select Skip. Not adding debugpy as a dependency runs your
experiment without attaching to the debugger.
e. A configuration file containing your run configuration settings opens in the
editor. If you're satisfied with the settings, select Submit experiment.
Alternatively, you can open the command palette (View > Command Palette) from
the menu bar and enter the AzureML: Submit experiment command into the text
box.
10. Once your experiment is submitted, a Docker image containing your script and the
configurations specified in your run configuration is created.
When the Docker image build process begins, the contents of the
60_control_log.txt file stream to the output console in VS Code.
Note
The first time your Docker image is created can take several minutes.
11. Once your image is built, a prompt appears to start the debugger. Set your
breakpoints in your script and select Start debugger when you're ready to start
debugging. Doing so attaches the VS Code debugger to the container running
your experiment. Alternatively, in the Azure Machine Learning extension, hover
over the node for your current run and select the play icon to start the debugger.
Important
You cannot have multiple debug sessions for a single experiment. You can
however debug two or more experiments using multiple VS Code instances.
At this point, you should be able to step-through and debug your code using VS Code.
If at any point you want to cancel your run, right-click your run node and select Cancel
run.
Similar to remote experiment runs, you can expand your run node to inspect the logs
and outputs.
Tip
Docker images that use the same dependencies defined in your environment are
reused between runs. However, if you run an experiment using a new or different
environment, a new image is created. Since these images are saved to your local
storage, it's recommended to remove old or unused Docker images. To remove
images from your system, use the Docker CLI or the VS Code Docker
extension .
Prerequisites
An Azure Machine Learning workspace that is configured to use an Azure Virtual
Network.
An Azure Machine Learning pipeline that uses Python scripts as part of the
pipeline steps. For example, a PythonScriptStep.
An Azure Machine Learning Compute cluster, which is in the virtual network and is
used by the pipeline for training.
A development environment that is in the virtual network. The development
environment might be one of the following:
An Azure Virtual Machine in the virtual network
A compute instance or Notebook VM in the virtual network
A client machine that has private network connectivity to the virtual network,
either by VPN or via ExpressRoute.
For more information on using an Azure Virtual Network with Azure Machine Learning,
see Virtual network isolation and privacy overview.
Tip
Although you can work with Azure Machine Learning resources that are not behind
a virtual network, using a virtual network is recommended.
How it works
Your ML pipeline steps run Python scripts. These scripts are modified to perform the
following actions:
1. Log the IP address of the host that they're running on. You use the IP address to
connect the debugger to the script.
2. Start the debugpy debug component, and wait for a debugger to connect.
3. From your development environment, you monitor the logs created by the training
process to find the IP address where the script is running.
5. You attach the debugger and interactively step through the script.
Python
import argparse
import os
import debugpy
import socket
from azureml.core import Run
2. Add the following arguments. These arguments allow you to enable the debugger
as needed, and set the timeout for attaching the debugger:
Python
parser.add_argument('--remote_debug', action='store_true')
parser.add_argument('--remote_debug_connection_timeout', type=int,
default=300,
help=f'Defines how much time the Azure Machine
Learning compute target '
f'will await a connection from a debugger client
(VSCODE).')
parser.add_argument('--remote_debug_client_ip', type=str,
help=f'Defines IP Address of VS Code client')
parser.add_argument('--remote_debug_port', type=int,
default=5678,
help=f'Defines Port of VS Code client')
3. Add the following statements. These statements load the current run context so
that you can log the IP address of the node that the code is running on:
Python
global run
run = Run.get_context()
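With the run context available, the script can record the address of the node it runs on; a sketch of the lookup (the run.log call is commented out because it only works on the compute target, and the metric name ip_address is illustrative):

```python
import socket

def current_ip() -> str:
    """Best-effort lookup of the IP address of the host running this script."""
    try:
        return socket.gethostbyname(socket.gethostname())
    except socket.gaierror:
        return "127.0.0.1"  # fallback when the hostname does not resolve

# On the compute target you could then record it in the run, for example:
#   run.log("ip_address", current_ip())
print(f"ip_address: {current_ip()}")
```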
4. Add an if statement that starts debugpy and waits for a debugger to attach. If no
debugger attaches before the timeout, the script continues as normal. Make sure
to replace the HOST and PORT values in the listen function with your own.
Python
if args.remote_debug:
    print(f'Timeout for debug connection: {args.remote_debug_connection_timeout}')
    # Log the IP and port
    try:
        ip = args.remote_debug_client_ip
    except:
        print("Need to supply IP address for VS Code client")
    print(f'ip_address: {ip}')
    debugpy.listen(address=(ip, args.remote_debug_port))
    # Wait for the timeout for debugger to attach
    debugpy.wait_for_client()
    print(f'Debugger attached = {debugpy.is_client_connected()}')
The following Python example shows a simple train.py file that enables debugging:
Python
import argparse
import os
import debugpy
import socket
from azureml.core import Run
print("In train.py")
print("As a data scientist, this is where I use my training code.")
parser = argparse.ArgumentParser("train")
# Get run object, so we can find and log the IP of the host instance
global run
run = Run.get_context()
args = parser.parse_args()
Configure ML pipeline
To provide the Python packages needed to start debugpy and get the run context,
create an environment and set pip_packages=['debugpy', 'azureml-sdk==<SDK-
VERSION>'] . Change the SDK version to match the one you're using. The following code
snippet demonstrates how to create an environment:
Python
from azureml.core.runconfig import RunConfiguration
from azureml.core.conda_dependencies import CondaDependencies

run_config = RunConfiguration()
# enable Docker
run_config.environment.docker.enabled = True
run_config.environment.python.conda_dependencies = CondaDependencies.create(
    pip_packages=['debugpy', 'azureml-sdk==<SDK-VERSION>'])
In the Configure Python scripts section, new arguments were added to the scripts used
by your ML pipeline steps. The following code snippet demonstrates how to use these
arguments to enable debugging for the component and set a timeout. It also
demonstrates how to use the environment created earlier by setting
runconfig=run_config :
When the pipeline runs, each step creates a child run. If debugging is enabled, the
modified script logs the timeout and ip_address entries in the 70_driver_log.txt
file for the child run.
Tip
You can also find the IP address from the run logs for the child run for this pipeline
step. For more information on viewing this information, see Monitor Azure
Machine Learning experiment runs and metrics.
a. From VS Code, select the Debug menu and then select Open configurations. A
file named launch.json opens.
b. In the launch.json file, find the line that contains "configurations": [ , and
insert the following text after it. Change the "host": "<IP-ADDRESS>" entry to
the IP address returned in your logs from the previous section. Change the
"localRoot": "${workspaceFolder}/code/step" entry to a local directory that
JSON
{
"name": "Azure Machine Learning Compute: remote debug",
"type": "python",
"request": "attach",
"port": 5678,
"host": "<IP-ADDRESS>",
"redirectOutput": true,
"pathMappings": [
{
"localRoot": "${workspaceFolder}/code/step1",
"remoteRoot": "."
}
]
}
Tip
The best practice, especially for pipelines, is to keep the resources for
scripts in separate directories so that code is relevant only for each of the
steps. In this example, the localRoot example value references
/code/step1 .
If you are debugging multiple scripts, in different directories, create a
separate configuration section for each script.
2. Set breakpoints where you want the script to stop once you've attached.
3. While the child process is running the script, and the Timeout for debug
connection message is displayed in the logs, use the F5 key or select Debug. When
prompted, select the Azure Machine Learning Compute: remote debug configuration.
At this point, VS Code connects to debugpy on the compute node and stops at the
breakpoint you set previously. You can now step through the code as it runs, view
variables, etc.
Note
If the log displays an entry stating Debugger attached = False , then the
timeout has expired and the script continued without the debugger. Submit
the pipeline again and connect the debugger after the Timeout for debug
connection message, and before the timeout expires.
Tip
Save time and catch bugs early by debugging managed online endpoints and
deployments locally. For more information, see Debug managed online endpoints
locally in Visual Studio Code (preview).
Important
This method of debugging does not work when using Model.deploy() and
LocalWebservice.deploy_configuration to deploy a model locally. Instead, you must
create an image using the Model.package() method.
Local web service deployments require a working Docker installation on your local
system. For more information on using Docker, see the Docker Documentation . When
working with compute instances, Docker is already installed.
1. To install debugpy in your local VS Code environment, use the following command:
Bash
python -m pip install --upgrade debugpy
For more information on using debugpy with VS Code, see Remote Debugging .
2. To configure VS Code to communicate with the Docker image, create a new debug
configuration:
a. From VS Code, select the Debug menu in the Run extension and then select
Open configurations. A file named launch.json opens.
b. In the launch.json file, find the "configurations" item (the line that contains
"configurations": [ ), and insert the following text after it.
JSON
{
"name": "Azure Machine Learning Deployment: Docker Debug",
"type": "python",
"request": "attach",
"connect": {
"port": 5678,
"host": "0.0.0.0",
},
"pathMappings": [
{
"localRoot": "${workspaceFolder}",
"remoteRoot": "/var/azureml-app"
}
]
}
JSON
{
// Use IntelliSense to learn about possible attributes.
// Hover to view descriptions of existing attributes.
    // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
"version": "0.2.0",
"configurations": [
{
"name": "Python: Current File",
"type": "python",
"request": "launch",
"program": "${file}",
"console": "integratedTerminal"
},
{
"name": "Azure Machine Learning Deployment: Docker Debug",
"type": "python",
"request": "attach",
"connect": {
"port": 5678,
"host": "0.0.0.0"
},
"pathMappings": [
{
"localRoot": "${workspaceFolder}",
"remoteRoot": "/var/azureml-app"
}
]
}
]
}
Python
with open("myenv.yml","w") as f:
f.write(myenv.serialize_to_string())
2. To start debugpy and wait for a connection when the service starts, add the
following to the top of your score.py file:
Python
import debugpy

# Allows other computers to attach to debugpy on this IP address and port.
debugpy.listen(('0.0.0.0', 5678))
# Block until a debugger attaches; the service then continues as normal.
debugpy.wait_for_client()
print("Debugger attached...")
3. Create an image based on the environment definition and pull the image to the
local registry.
Python
myenv = Environment.from_conda_specification(name="env", file_path="myenv.yml")
myenv.docker.base_image = None
myenv.docker.base_dockerfile = "FROM mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04:20210615.v1"
inference_config = InferenceConfig(entry_script="score.py", environment=myenv)
package = Model.package(ws, [model], inference_config)
package.wait_for_creation(show_output=True)  # Or show_output=False to hide the Docker build logs.
package.pull()
Once the image has been created and downloaded (this process can take more
than 10 minutes), the image path (which includes the repository, name, and tag; in
this case, the tag is also the image's digest) is displayed in a message similar to the
following:
text
4. To make it easier to work with the image locally, you can use the following
command to add a tag for this image. Replace myimagepath in the following
command with the location value from the previous step.
Bash
For the rest of the steps, you can refer to the local image as debug:1 instead of the
full image path value.
Tip
If you set a timeout for the debugpy connection in the score.py file, you must
connect VS Code to the debug session before the timeout expires. Start VS Code,
open the local copy of score.py , set a breakpoint, and have it ready to go before
using the steps in this section.
1. To start a Docker container using the image, use the following command:
Bash
This command attaches your local score.py to the one in the container.
Therefore, any changes made in the editor are automatically reflected in the
container.
2. For a better experience, you can go into the container with a new VS Code
interface. Select the Docker extension from the VS Code side bar and find the
local container you created; in this documentation, it's debug:1 . Right-click this
container and select Attach Visual Studio Code. A new VS Code interface opens
automatically, showing the inside of the created container.
3. Run the following command in a shell inside the container:
Bash
runsvdir /var/runit
Then you can see the following output in the shell inside your container:
4. To attach VS Code to debugpy inside the container, open VS Code, and use the F5
key or select Debug. When prompted, select the Azure Machine Learning
Deployment: Docker Debug configuration. You can also select the Run extension
icon from the side bar, select the Azure Machine Learning Deployment: Docker
Debug entry from the Debug dropdown menu, and then use the green arrow to
attach the debugger.
After you select the green arrow and attach the debugger, in the container VS
Code interface you can see some new information:
Also, in your main VS Code interface, you can see the following:
Now the local score.py attached to the container has stopped at the breakpoints you
set: VS Code is connected to debugpy inside the Docker container, and the container is
stopped at the breakpoint you set previously. You can now step through the code as it
runs, view variables, and so on.
For more information on using VS Code to debug Python, see Debug your Python
code .
Next steps
Now that you've set up VS Code Remote, you can use a compute instance as remote
compute from VS Code to interactively debug your code.
The Azure Machine Learning Notebooks repository includes Azure Machine Learning
Python SDK (v1) samples. These Jupyter notebooks are designed to help you explore the
SDK and serve as models for your own machine learning projects. In this repository,
you'll find tutorial notebooks in the tutorials folder and feature-specific notebooks in
the how-to-use-azureml folder.
This article shows you how to access the repositories from the following environments:
1. Use the instructions at Azure Machine Learning SDK to install the Azure Machine
Learning SDK (v1) for Python.
Bash
5. Start the notebook server from the directory containing your clone.
Bash
jupyter notebook
These instructions install the base SDK packages necessary for the quickstart and tutorial
notebooks. Other sample notebooks may require you to install extra components. For
more information, see Install the Azure Machine Learning SDK for Python.
3. From the directory where you added the configuration file, clone the Machine
Learning Notebooks repository .
Bash
4. Start the notebook server from the directory, which now contains the clone and
the config file.
Bash
jupyter notebook
Next steps
Explore the MachineLearningNotebooks repository to discover what Azure Machine
Learning can do.
For more GitHub sample projects and examples, see these repos:
Microsoft/MLOps
Microsoft/MLOpsPython
Example pipelines & datasets for Azure
Machine Learning designer
Article • 05/29/2023
Use the built-in examples in Azure Machine Learning designer to quickly get started
building your own machine learning pipelines. The Azure Machine Learning designer
GitHub repository contains detailed documentation to help you understand some
common machine learning scenarios.
Prerequisites
An Azure subscription. If you don't have an Azure subscription, create a free
account
An Azure Machine Learning workspace
Important
If you do not see graphical elements mentioned in this document, such as buttons
in studio or designer, you may not have the right level of permissions to the
workspace. Please contact your Azure subscription administrator to verify that you
have been granted the correct level of access. For more information, see Manage
users and roles.
1. Sign in to ml.azure.com , and select the workspace you want to work with.
2. Select Designer.
a. In the Settings pane to the right of the canvas, select Select compute target.
b. In the dialog that appears, select an existing compute target or create a new
one. Select Save.
Depending on the sample pipeline and compute settings, jobs may take some time
to complete. The default compute settings have a minimum node size of 0, which
means that the designer must allocate resources after the cluster has been idle.
Repeated pipeline jobs take less time because the compute resources are already allocated.
Additionally, the designer uses cached results for each component to further
improve efficiency.
5. After the pipeline finishes running, you can review the pipeline and view the output
for each component to learn more. Use the following steps to view component
outputs:
a. Right-click the component in the canvas whose output you'd like to see.
b. Select Visualize.
Use the samples as starting points for some of the most common machine
learning scenarios.
Regression
Explore these built-in regression samples.
Regression - Automobile Price Prediction (Advanced): Predict car prices using decision
forest and boosted decision tree regressors. Compare models to find the best algorithm.
Classification
Explore these built-in classification samples. You can learn more about the samples by
opening the samples and viewing the component comments in the designer.
Binary Classification with Feature Selection - Income Prediction: Predict income as high
or low, using a two-class boosted decision tree. Use Pearson correlation to select features.
Binary Classification with custom Python script - Credit Risk Prediction: Classify credit
applications as high or low risk. Use the Execute Python Script component to weight your data.
Binary Classification - Customer Relationship Prediction: Predict customer churn using
two-class boosted decision trees. Use SMOTE to sample biased data.
Text Classification - Wikipedia SP 500 Dataset: Classify company types from Wikipedia
articles with multiclass logistic regression.
Computer vision
Explore these built-in computer vision samples. You can learn more about the samples
by opening the samples and viewing the component comments in the designer.
Image Classification using DenseNet: Use computer vision components to build an
image classification model based on PyTorch DenseNet.
Recommender
Explore these built-in recommender samples. You can learn more about the samples by
opening the samples and viewing the component comments in the designer.
Wide & Deep based Recommendation - Restaurant Rating Prediction: Build a restaurant
recommender engine from restaurant/user features and ratings.
Utility
Learn more about the samples that demonstrate machine learning utilities and features.
You can learn more about the samples by opening the samples and viewing the
component comments in the designer.
Binary Classification using Vowpal Wabbit Model - Adult Income Prediction: Vowpal
Wabbit is a machine learning system which pushes the frontier of machine learning with
techniques such as online, hashing, allreduce, reductions, learning2search, active, and
interactive learning. This sample shows how to use the Vowpal Wabbit model to build a
binary classification model.
Use custom R script - Flight Delay Prediction: Use a customized R script to predict if a
scheduled passenger flight will be delayed by more than 15 minutes.
Cross Validation for Binary Classification - Adult Income Prediction: Use cross validation
to build a binary classifier for adult income.
Permutation Feature Importance: Use permutation feature importance to compute
importance scores for the test dataset.
Tune Parameters for Binary Classification - Adult Income Prediction: Use Tune Model
Hyperparameters to find optimal hyperparameters to build a binary classifier.
Datasets
When you create a new pipeline in Azure Machine Learning designer, a number of
sample datasets are included by default. These sample datasets are used by the sample
pipelines in the designer homepage.
The sample datasets are available under Datasets-Samples category. You can find this in
the component palette to the left of the canvas in the designer. You can use any of
these datasets in your own pipeline by dragging it to the canvas.
Adult Census Income Binary Classification dataset: A subset of the 1994 Census
database, using working adults over the age of 16 with an adjusted income index of > 100.
Usage: Classify people using demographics to predict whether a person earns over 50K a year.
Related Research: Kohavi, R., Becker, B., (1996). UCI Machine Learning Repository.
Irvine, CA: University of California, School of Information and Computer Science.

Automobile price data (Raw): Information about automobiles by make and model,
including the price, features such as the number of cylinders and MPG, as well as an
insurance risk score.
The risk score is initially associated with auto price. It is then adjusted for actual risk in a
process known to actuaries as symboling. A value of +3 indicates that the auto is risky,
and a value of -3 that it is probably safe.
Usage: Predict the risk score by features, using regression or multivariate classification.
Related Research: Schlimmer, J.C. (1987). UCI Machine Learning Repository. Irvine, CA:
University of California, School of Information and Computer Science.

CRM Appetency Labels Shared: Labels from the KDD Cup 2009 customer relationship
prediction challenge (orange_small_train_appetency.labels).

CRM Churn Labels Shared: Labels from the KDD Cup 2009 customer relationship
prediction challenge (orange_small_train_churn.labels).

CRM Dataset Shared: This data comes from the KDD Cup 2009 customer relationship
prediction challenge (orange_small_train.data.zip).
The dataset contains 50K customers from the French Telecom company Orange. Each
customer has 230 anonymized features, 190 of which are numeric and 40 are
categorical. The features are very sparse.

CRM Upselling Labels Shared: Labels from the KDD Cup 2009 customer relationship
prediction challenge (orange_large_train_upselling.labels).

Flight Delays Data: Passenger flight on-time performance data taken from the TranStats
data collection of the U.S. Department of Transportation (On-Time).
The dataset covers the time period April-October 2013. Before uploading to the
designer, the dataset was processed as follows:
- The dataset was filtered to cover only the 70 busiest airports in the continental US
- Canceled flights were labeled as delayed by more than 15 minutes
- Diverted flights were filtered out
- The following columns were selected: Year, Month, DayofMonth, DayOfWeek, Carrier,
OriginAirportID, DestAirportID, CRSDepTime, DepDelay, DepDel15, CRSArrTime,
ArrDelay, ArrDel15, Canceled
IMDB Movie Titles: The dataset contains information about movies that were rated in
Twitter tweets: IMDB movie ID, movie name, genre, and production year. There are 17K
movies in the dataset. The dataset was introduced in the paper "S. Dooms, T. De
Pessemier and L. Martens. MovieTweetings: a Movie Rating Dataset Collected From
Twitter. Workshop on Crowdsourcing and Human Computation for Recommender
Systems, CrowdRec at RecSys 2013."

Movie Ratings: The dataset is an extended version of the Movie Tweetings dataset. The
dataset has 170K ratings for movies, extracted from well-structured tweets on Twitter.
Each instance represents a tweet and is a tuple: user ID, IMDB movie ID, rating,
timestamp, number of favorites for this tweet, and number of retweets of this tweet.
The dataset was made available by A. Said, S. Dooms, B. Loni and D. Tikk for
Recommender Systems Challenge 2014.
Weather Dataset: Hourly land-based weather observations from NOAA (merged data
from 201304 to 201310).
The weather data covers observations made from airport weather stations, covering the
time period April-October 2013. Before uploading to the designer, the dataset was
processed as follows:
- Weather station IDs were mapped to corresponding airport IDs
- Weather stations not associated with the 70 busiest airports were filtered out
- The Date column was split into separate Year, Month, and Day columns
- The following columns were selected: AirportID, Year, Month, Day, Time, TimeZone,
SkyCondition, Visibility, WeatherType, DryBulbFarenheit, DryBulbCelsius,
WetBulbFarenheit, WetBulbCelsius, DewPointFarenheit, DewPointCelsius,
RelativeHumidity, WindSpeed, WindDirection, ValueForWindCharacter, StationPressure,
PressureTendency, PressureChange, SeaLevelPressure, RecordType, HourlyPrecip, Altimeter
Restaurant Feature Data: A set of metadata about restaurants and their features, such as
food type, dining style, and location.
Usage: Use this dataset, in combination with the other two restaurant datasets, to train
and test a recommender system.
Related Research: Bache, K. and Lichman, M. (2013). UCI Machine Learning Repository.
Irvine, CA: University of California, School of Information and Computer Science.
Clean up resources
Important
You can use the resources that you created as prerequisites for other Azure
Machine Learning tutorials and how-to articles.
Delete everything
If you don't plan to use anything that you created, delete the entire resource group so
you don't incur any charges.
1. In the Azure portal, select Resource groups on the left side of the window.
2. In the list, select the resource group that you created.
Deleting the resource group also deletes all resources that you created in the designer.
The compute target that you created here automatically scales to zero nodes when
it's not being used. This action minimizes charges. If you want to delete the
compute target, take these steps:
You can unregister datasets from your workspace by selecting each dataset and
selecting Unregister.
To delete a dataset, go to the storage account by using the Azure portal or Azure
Storage Explorer and manually delete those assets.
Next steps
Learn the fundamentals of predictive analytics and machine learning with Tutorial:
Predict automobile price with the designer
What is the Azure Machine Learning SDK for Python?
Article • 11/23/2021
Data scientists and AI developers use the Azure Machine Learning SDK for Python to
build and run machine learning workflows with the Azure Machine Learning service. You
can interact with the service in any Python environment, including Jupyter Notebooks,
Visual Studio Code, or your favorite Python IDE.
- Explore, prepare, and manage the lifecycle of your datasets used in machine learning experiments.
- Manage cloud resources for monitoring, logging, and organizing your machine learning experiments.
- Train models either locally or by using cloud resources, including GPU-accelerated model training.
- Use automated machine learning, which accepts configuration parameters and training data. It automatically iterates through algorithms and hyperparameter settings to find the best model for running predictions.
- Deploy web services to convert your trained models into RESTful services that can be consumed in any application.
The following sections are overviews of some of the most important classes in the SDK,
and common design patterns for using them. To get the SDK, see the installation guide.
Stable vs experimental
The Azure Machine Learning SDK for Python provides both stable and experimental
features in the same SDK.
Stable: These features are recommended for most use cases and production
environments. They are updated less frequently than experimental features.
Experimental (developmental features): These features are newly developed capabilities
and updates that may not be ready or fully tested for production usage. While the
features are typically functional, they can include some breaking changes. Experimental
features are used to iron out SDK breaking bugs, and will only receive updates for
the duration of the testing period. Experimental features are also referred to
as features that are in preview.
Experimental features are labelled by a note section in the SDK reference and denoted
by text such as (preview) throughout Azure Machine Learning documentation.
Workspace
Namespace: azureml.core.workspace.Workspace
The Workspace class is a foundational resource in the cloud that you use to experiment,
train, and deploy machine learning models. It ties your Azure subscription and resource
group to an easily consumed object.
View all parameters of the Workspace.create method to reuse existing instances
(Storage, Key Vault, Application Insights, and Azure Container Registry (ACR)), as well
as to modify additional settings such as private endpoint configuration and compute target.
Import the class and create a new workspace by using the following code. Set
create_resource_group to False if you have a previously existing Azure resource group
that you want to use for the workspace. Some functions might prompt for Azure
authentication credentials.
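A minimal sketch of that pattern (the subscription ID, resource group, and region below are placeholder values):

```python
from azureml.core import Workspace

# Create a workspace; all identifiers below are placeholder values.
ws = Workspace.create(name="myworkspace",
                      subscription_id="<azure-subscription-id>",
                      resource_group="myresourcegroup",
                      create_resource_group=True,
                      location="eastus2")
```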
Python
ws.write_config(path="./file-path", file_name="ws_config.json")
Python
Alternatively, use the static get() method to load an existing workspace without using
configuration files.
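For example, a sketch of loading a workspace by name (subscription ID and resource group are placeholders):

```python
from azureml.core import Workspace

# Load an existing workspace without a configuration file.
ws = Workspace.get(name="myworkspace",
                   subscription_id="<azure-subscription-id>",
                   resource_group="myresourcegroup")
```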
Experiment
Namespace: azureml.core.experiment.Experiment
The Experiment class is another foundational cloud resource that represents a collection
of trials (individual model runs). The following code fetches an Experiment object from
within Workspace by name, or it creates a new Experiment object if the name doesn't
exist.
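A sketch of that fetch-or-create call, assuming `ws` is the Workspace object from the previous section (the experiment name is an arbitrary example):

```python
from azureml.core.experiment import Experiment

# Returns the existing experiment with this name, or creates it.
exp = Experiment(workspace=ws, name="test-experiment")
```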
Python
list_experiments = Experiment.list(ws)
Use the get_runs function to retrieve a list of Run objects (trials) from Experiment . The
following code retrieves the runs and prints each run ID.
Python
list_runs = experiment.get_runs()
for run in list_runs:
print(run.id)
There are two ways to execute an experiment trial. If you're interactively experimenting
in a Jupyter notebook, use the start_logging function. If you're submitting an
experiment from a standard Python environment, use the submit function. Both
functions return a Run object. The experiment variable represents an Experiment object
in the following code examples.
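A sketch of both patterns, assuming `experiment` is an Experiment object and `script_run_config` is a ScriptRunConfig like the one built later in this article:

```python
# Interactive trial in a notebook: log directly to the returned Run.
run = experiment.start_logging()
run.log("accuracy", 0.95)
run.complete()

# Scripted trial: submit a run configuration and get a Run back.
run = experiment.submit(config=script_run_config)
```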
Run
Namespace: azureml.core.run.Run
A run represents a single trial of an experiment. Run is the object that you use to
monitor the asynchronous execution of a trial, store the output of the trial, analyze
results, and access generated artifacts. You use Run inside your experimentation code to
log metrics and artifacts to the Run History service.
Create a Run object by submitting an Experiment object with a run configuration object.
Use the tags parameter to attach custom categories and labels to your runs. You can
easily find and retrieve them later from Experiment .
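A sketch of tagging at submission time and filtering on that tag later (the tag key and value are arbitrary examples; `script_run_config` is assumed from later in this article):

```python
# Attach a custom tag when submitting the run.
run = experiment.submit(config=script_run_config,
                        tags={"prod": "phase-1-model-tests"})

# Later, retrieve only the runs that carry the tag.
tagged_runs = experiment.get_runs(tags={"prod": "phase-1-model-tests"})
```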
Use the get_details function to retrieve the detailed output for the run.
Python
run_details = run.get_details()
The returned dictionary includes:
- Run ID
- Status
- Start and end time
- Compute target (local versus cloud)
- Dependencies and versions used in the run
- Training-specific data (differs depending on model type)
For more examples of how to configure and monitor runs, see the how-to.
Model
Namespace: azureml.core.model.Model
The Model class is used for working with cloud representations of machine learning
models. Methods help you transfer models between local development environments
and the Workspace object in the cloud.
You can use model registration to store and version your models in the Azure cloud, in
your workspace. Registered models are identified by name and version. Each time you
register a model with the same name as an existing one, the registry increments the
version. Azure Machine Learning supports any model that can be loaded through
Python 3, not just Azure Machine Learning models.
The following example shows how to build a simple local classification model with
scikit-learn , register the model in Workspace , and download the model from the
cloud.
Create a simple classifier, clf , to predict customer churn based on their age. Then
dump the model to a .pkl file in the same directory.
Python
import numpy as np
import joblib
from sklearn import svm

# customer ages
X_train = np.array([50, 17, 35, 23, 28, 40, 31, 29, 19, 62])
X_train = X_train.reshape(-1, 1)
# churn y/n
y_train = ["yes", "no", "no", "no", "yes", "yes", "yes", "no", "no", "yes"]
clf = svm.SVC(gamma=0.001, C=100.)
clf.fit(X_train, y_train)
joblib.dump(value=clf, filename="churn-model.pkl")
Use the register function to register the model in your workspace. Specify the local
model path and the model name. Registering the same name more than once will create
a new version.
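A sketch of registering the churn model created above (the model name is an arbitrary example):

```python
from azureml.core.model import Model

# Register the local .pkl file as a named, versioned model in ws.
model = Model.register(workspace=ws,
                       model_path="churn-model.pkl",
                       model_name="churn-model-test")
```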
Now that the model is registered in your workspace, it's easy to manage, download, and
organize your models. To retrieve a model (for example, in another environment) object
from Workspace , use the class constructor and specify the model name and any optional
parameters. Then, use the download function to download the model, including the
cloud folder structure.
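A sketch of retrieving and downloading the registered model:

```python
import os
from azureml.core.model import Model

# Retrieve the registered model by name, then download it locally.
model = Model(workspace=ws, name="churn-model-test")
model.download(target_dir=os.getcwd())
```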
Python
model.delete()
ComputeTarget, RunConfiguration, and ScriptRunConfig
Namespace: azureml.core.compute.ComputeTarget
Namespace: azureml.core.runconfig.RunConfiguration
Namespace: azureml.core.script_run_config.ScriptRunConfig
The ComputeTarget class is the abstract parent class for creating and managing compute
targets. A compute target represents a variety of resources where you can train your
machine learning models. A compute target can be either a local machine or a cloud
resource, such as Azure Machine Learning Compute, Azure HDInsight, or a remote
virtual machine.
Use compute targets to take advantage of powerful virtual machines for model training,
and set up either persistent compute targets or temporary runtime-invoked targets. For
a comprehensive guide on setting up and managing compute targets, see the how-to.
The following code shows a simple example of setting up an AmlCompute (child class of
ComputeTarget ) target. This target creates a runtime remote compute resource in your
Workspace object. The resource scales automatically when a job is submitted. It's deleted
automatically when the run finishes.
Reuse the simple scikit-learn churn model and build it into its own file, train.py , in
the current directory. At the end of the file, create a new directory called outputs . This
step creates a directory in the cloud (your workspace) to store your trained model that
joblib.dump() serialized.
Python
# train.py
import os
import numpy as np
import joblib
from sklearn import svm

# customer ages
X_train = np.array([50, 17, 35, 23, 28, 40, 31, 29, 19, 62])
X_train = X_train.reshape(-1, 1)
# churn y/n
y_train = ["yes", "no", "no", "no", "yes", "yes", "yes", "no", "no", "yes"]
clf = svm.SVC(gamma=0.001, C=100.)
clf.fit(X_train, y_train)
os.makedirs("outputs", exist_ok=True)
joblib.dump(value=clf, filename="outputs/churn-model.pkl")
Next you create the compute target by instantiating a RunConfiguration object and
setting the type and size. This example uses the smallest resource size (1 CPU core, 3.5
GB of memory). The list_vms variable contains a list of supported virtual machines and
their sizes.
Python
list_vms = AmlCompute.supported_vmsizes(workspace=ws)

compute_config = RunConfiguration()
compute_config.target = "amlcompute"
compute_config.amlcompute.vm_size = "STANDARD_D1_V2"
Create dependencies for the remote compute resource's Python environment by using
the CondaDependencies class. The train.py file is using scikit-learn and numpy , which
need to be installed in the environment. You can also specify versions of dependencies.
Use the dependencies object to set the environment in compute_config .
Python
dependencies = CondaDependencies()
dependencies.add_pip_package("scikit-learn")
dependencies.add_pip_package("numpy==1.15.4")
compute_config.environment.python.conda_dependencies = dependencies
Now you're ready to submit the experiment. Use the ScriptRunConfig class to attach the
compute target configuration, and to specify the path/file to the training script
train.py . Submit the experiment by specifying the config parameter of the submit()
function. Call wait_for_completion on the resulting run to see asynchronous run output
as the environment is initialized and the model is trained.
Warning
The " , $ , ; , and \ characters are escaped by the back end, as they are
considered reserved characters for separating bash commands.
The ( , ) , % , ! , ^ , < , > , & , and | characters are escaped for local runs on
Windows.
Python
script_run_config = ScriptRunConfig(source_directory=os.getcwd(),
script="train.py", run_config=compute_config)
experiment = Experiment(workspace=ws, name="compute_target_test")
run = experiment.submit(config=script_run_config)
run.wait_for_completion(show_output=True)
After the run finishes, the trained model file churn-model.pkl is available in your
workspace.
Environment
Namespace: azureml.core.environment
The following code imports the Environment class from the SDK and instantiates an
environment object.
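A minimal sketch (the environment name is an arbitrary example):

```python
from azureml.core.environment import Environment

# Instantiate an environment object.
myenv = Environment(name="myenv")
```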
Add packages to an environment by using Conda, pip, or private wheel files. Specify
each package dependency by using the CondaDependency class to add it to the
environment's PythonSection .
The following example adds to the environment. It adds version 1.17.0 of numpy . It also
adds the pillow package to the environment, myenv . The example uses the
add_conda_package() method and the add_pip_package() method, respectively.
Python
myenv = Environment(name="myenv")
conda_dep = CondaDependencies()
conda_dep.add_conda_package("numpy==1.17.0")
conda_dep.add_pip_package("pillow")
myenv.python.conda_dependencies = conda_dep
To submit a training run, you need to combine your environment, compute target, and
your training Python script into a run configuration. This configuration is a wrapper
object that's used for submitting runs.
When you submit a training run, the building of a new environment can take several
minutes. The duration depends on the size of the required dependencies. The
environments are cached by the service. So as long as the environment definition
remains unchanged, you incur the full setup time only once.
The following example shows where you would use ScriptRunConfig as your wrapper
object.
Python
runconfig = ScriptRunConfig(source_directory=".", script="train.py",
                            compute_target=compute_target, environment=myenv)
# Submit run
run = exp.submit(runconfig)
If you don't specify an environment in your run configuration before you submit the run,
then a default environment is created for you.
See the Model deploy section to use environments to deploy a web service.
Pipeline, PythonScriptStep
Namespace: azureml.pipeline.core.pipeline.Pipeline
Namespace: azureml.pipeline.steps.python_script_step.PythonScriptStep
Python
train_step = PythonScriptStep(
script_name="train.py",
arguments=["--input", blob_input_data, "--output", output_data1],
inputs=[blob_input_data],
outputs=[output_data1],
compute_target=compute_target,
source_directory=project_folder
)
After at least one step has been created, steps can be linked together and published as
a simple automated pipeline.
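A sketch of chaining the step above into a pipeline and submitting it, assuming `ws`, `experiment`, and `train_step` from earlier sections:

```python
from azureml.pipeline.core import Pipeline

# Build a pipeline from the step(s) and submit it as an experiment run.
pipeline = Pipeline(workspace=ws, steps=[train_step])
pipeline_run = experiment.submit(pipeline)
```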
For more information about Azure Machine Learning Pipelines, and in particular how
they are different from other types of pipelines, see this article.
AutoMLConfig
Namespace: azureml.train.automl.automlconfig.AutoMLConfig
Use the AutoMLConfig class to configure parameters for automated machine learning
training. Automated machine learning iterates over many combinations of machine
learning algorithms and hyperparameter settings. It then finds the best-fit model based
on your chosen accuracy metric. The configuration lets you specify settings such as the task type, training data, number of iterations, iteration timeout, and the primary metric.
Note
Use the automl extra in your installation to use automated machine learning.
Python
automl_config = AutoMLConfig(task="classification",
X=your_training_features,
y=your_training_labels,
iterations=30,
iteration_timeout_minutes=5,
primary_metric="AUC_weighted",
n_cross_validations=5
)
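A sketch of submitting the configuration above, assuming `ws` is your Workspace object (the experiment name is an arbitrary example):

```python
from azureml.core.experiment import Experiment

# Submit the AutoML configuration; output streams per-iteration accuracy.
experiment = Experiment(workspace=ws, name="automl_test_experiment")
run = experiment.submit(config=automl_config, show_output=True)
```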
After you submit the experiment, output shows the training accuracy for each iteration
as it finishes. After the run is finished, an AutoMLRun object (which extends the Run class)
is returned. Get the best-fit model by using the get_output() function to return a Model
object.
Python
best_model = run.get_output()
y_predict = best_model.predict(X_test)
Model deploy
Namespace: azureml.core.model.InferenceConfig
Namespace: azureml.core.webservice.webservice.Webservice
The InferenceConfig class is for configuration settings that describe the environment
needed to host the model and web service.
Webservice is the abstract parent class for creating and deploying web services for your
models. For a detailed guide on preparing for model deployment and deploying web
services, see this how-to.
You can use environments when you deploy your model as a web service. Environments
enable a reproducible, connected workflow where you can deploy your model using the
same libraries in both your training compute and your inference compute. Internally,
environments are implemented as Docker images. You can use either images provided
by Microsoft, or use your own custom Docker images. If you were previously using the
ContainerImage class for your deployment, see the DockerSection class for
accomplishing a similar workflow with environments.
To deploy a web service, combine the environment, inference compute, scoring script,
and registered model in your deployment object, deploy().
The following example assumes you already completed a training run using the
environment, myenv , and want to deploy that model to Azure Container Instances.
Python
from azureml.core.model import InferenceConfig, Model
from azureml.core.webservice import AciWebservice

# Define the model, inference, & deployment configuration and web service
# name and location to deploy; the entry script name is a placeholder
inference_config = InferenceConfig(entry_script="score.py", environment=myenv)
deployment_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1)
service = Model.deploy(workspace=ws,
                       name="my_web_service",
                       models=[model],
                       inference_config=inference_config,
                       deployment_config=deployment_config)
This example creates an Azure Container Instances web service, which is best for small-
scale testing and quick deployments. To deploy your model as a production-scale web
service, use Azure Kubernetes Service (AKS). For more information, see AksCompute
class.
Dataset
Namespace: azureml.core.dataset.Dataset
Namespace: azureml.data.file_dataset.FileDataset
Namespace: azureml.data.tabular_dataset.TabularDataset
The Dataset class is a foundational resource for exploring and managing data within
Azure Machine Learning. You can explore your data with summary statistics, and save
the Dataset to your AML workspace to get versioning and reproducibility capabilities.
Datasets are easily consumed by models during training. For detailed usage examples,
see the how-to guide.
The following example shows how to create a TabularDataset pointing to a single path
in a datastore.
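A sketch, assuming the workspace's default blob datastore and a hypothetical CSV path:

```python
from azureml.core import Dataset, Datastore

# Reference a delimited file in a datastore and wrap it as a TabularDataset.
datastore = Datastore.get(ws, datastore_name="workspaceblobstore")
dataset = Dataset.Tabular.from_delimited_files(path=(datastore, "weather/2018/11.csv"))
```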
The following example shows how to create a FileDataset referencing multiple file
URLs.
Python
url_paths = [
'http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz',
'http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz',
'http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz',
'http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz'
]
dataset = Dataset.File.from_files(path=url_paths)
Next steps
Try these next steps to learn how to use the Azure Machine Learning SDK for Python:
Follow the tutorial to learn how to build, train, and deploy a model in Python.
Look up classes and modules in the reference documentation on this site by using
the table of contents on the left.
Machine Learning REST API reference
Article • 05/09/2023
The Azure Machine Learning REST APIs allow you to develop clients that use REST calls
to work with the service.
Batch Endpoints: Provides operations for managing batch endpoints.
Batch Deployments: Provides operations for managing batch deployments.
See Also
Learn more about this service:
Machine Learning
Get-AzMLServiceQuota: Gets the currently assigned Workspace Quotas based on VMFamily.
Azure Machine Learning Curated Environments
This article lists the curated environments with latest framework versions in Azure
Machine Learning. Curated environments are provided by Azure Machine Learning and
are available in your workspace by default. The curated environments rely on cached
Docker images that use the latest version of the Azure Machine Learning SDK. Using a
curated environment can reduce the run preparation cost and allow for faster
deployment time. Use these environments to quickly get started with various machine
learning frameworks.
Note
Use the Python SDK, CLI, or Azure Machine Learning studio to get the full list of
environments and their dependencies. For more information, see the environments
article.
Important
To view more information about curated environment packages and versions, visit
the Environments tab in the Azure Machine Learning studio.
Curated environments
PyTorch
Name: AzureML-pytorch-1.10-ubuntu18.04-py38-cuda11-gpu
Description: An environment for deep learning with PyTorch containing the Azure
Machine Learning Python SDK and other Python packages.
GPU: Cuda11
OS: Ubuntu18.04
PyTorch: 1.10
AzureML-pytorch-1.9-ubuntu18.04-py37-cuda11-gpu
AzureML-pytorch-1.8-ubuntu18.04-py37-cuda11-gpu
AzureML-pytorch-1.7-ubuntu18.04-py37-cuda11-gpu
LightGBM
Name: AzureML-lightgbm-3.2-ubuntu18.04-py37-cpu
Description: An environment for machine learning with Scikit-learn, LightGBM, XGBoost,
Dask containing the Azure Machine Learning Python SDK and other packages.
OS: Ubuntu18.04
Dask: 2021.6
LightGBM: 3.2
Scikit-learn: 0.24
XGBoost: 1.4
Sklearn
Name: AzureML-sklearn-1.0-ubuntu20.04-py38-cpu
Description: An environment for tasks such as regression, clustering, and classification
with Scikit-learn. Contains the Azure Machine Learning Python SDK and other Python
packages.
OS: Ubuntu20.04
Scikit-learn: 1.0
AzureML-sklearn-0.24-ubuntu18.04-py37-cpu
TensorFlow
Name: AzureML-tensorflow-2.4-ubuntu18.04-py37-cuda11-gpu
Description: An environment for deep learning with TensorFlow containing the Azure
Machine Learning Python SDK and other Python packages.
GPU: Cuda11
Horovod: 2.4.1
OS: Ubuntu18.04
TensorFlow: 2.4
Automated ML (AutoML)
Azure Machine Learning pipeline training workflows that use AutoML automatically
select a curated environment based on the compute type and whether DNN is enabled.
AutoML provides the following curated environments:
- AzureML-AutoML (compute: CPU; DNN enabled: No)
- AzureML-AutoML-GPU (compute: GPU; DNN enabled: No)
For more information on AutoML and Azure Machine Learning pipelines, see use
automated ML in an Azure Machine Learning pipeline in Python SDK v1.
Support
Version updates for supported environments, including the base images they reference,
are released every quarter to address vulnerabilities. Based on usage, some
environments may be deprecated (hidden from the product but usable) to support more
common machine learning scenarios.
az ml
Reference
Note
This reference is part of the ml extension for the Azure CLI (version 2.15.0 or
higher). The extension will automatically install the first time you run an az ml
command. Learn more about extensions.
Manage Azure Machine Learning resources with the Azure CLI ML extension v2.
Commands
az ml batch-deployment: Manage Azure ML batch deployments.
az ml batch-deployment list-jobs: List the batch scoring jobs for a batch deployment.
az ml batch-endpoint list-jobs: List the batch scoring jobs for a batch endpoint.
az ml batch-endpoint show: Show details for an endpoint.
az ml compute list-nodes: List node details for a compute target. The only supported
compute type for this command is AML compute.
az ml data list-materialization-status: Show status of list of data import
materialization jobs that create versions of a data asset.
az ml data show: Shows details for a data asset in a workspace/registry. If you are
using a registry, replace --workspace-name my-workspace with the
--registry-name <registry-name> option.
az ml job connect-ssh: Set up ssh connection and sends the request to the SSH service
running inside user's container through Tundra.
az ml schedule delete: Delete a schedule. The previous triggered jobs will NOT be
deleted.
az ml workspace list-keys: List workspace keys for dependent resources such as Azure
Storage, Azure Container Registry, and Azure Application Insights.
az ml workspace outbound-rule: Manage outbound rules for the managed network of an
Azure ML workspace.
az ml workspace outbound-rule list: List all the managed network outbound rules for a
workspace.
az ml workspace outbound-rule remove: Remove an outbound rule from the managed network
for a workspace.
az ml workspace outbound-rule show: Show details for a managed network outbound rule
for a workspace.
Note
The YAML syntax detailed in this document is based on the JSON schema for the v1
version of the ML CLI extension. This syntax is guaranteed only to work with the ML
CLI v1 extension. Switch to the v2 (current version) for the syntax for ML CLI v2.
Important
The Azure CLI commands in this article require the azure-cli-ml , or v1, extension
for Azure Machine Learning. Support for the v1 extension will end on September
30, 2025. You will be able to install and use the v1 extension until that date.
Define your machine learning pipelines in YAML. When using the machine learning
extension for the Azure CLI v1, many of the pipeline-related commands expect a YAML
file that defines the pipeline.
The following table lists what is and is not currently supported when defining a pipeline
in YAML for use with CLI v1:
- PythonScriptStep: Yes
- ParallelRunStep: Yes
- AdlaStep: Yes
- AzureBatchStep: Yes
- DatabricksStep: Yes
- DataTransferStep: Yes
- AutoMLStep: No
- HyperDriveStep: No
- ModuleStep: Yes
- MPIStep: No
- EstimatorStep: No
Pipeline definition
A pipeline definition uses the following keys, which correspond to the Pipelines class:
- data_reference: Defines how and where data should be made available in a run.
- default_compute: Default compute target where all steps in the pipeline run.
Parameters
The parameters section uses the following keys, which correspond to the
PipelineParameter class:
- type: The value type of the parameter. Valid types are string , int , float , bool , or datapath .
Each parameter is named. For example, the following YAML snippet defines three
parameters named NumIterationsParameter , DataPathParameter , and
NodeCountParameter :
YAML
pipeline:
name: SamplePipelineFromYaml
parameters:
NumIterationsParameter:
type: int
default: 40
DataPathParameter:
type: datapath
default:
datastore: workspaceblobstore
path_on_datastore: sample2.txt
NodeCountParameter:
type: int
default: 4
Data reference
The data_references section uses the following keys, which correspond to the
DataReference:
- path_on_datastore: The relative path in the backing storage for the data reference.
Each data reference is contained in a key. For example, the following YAML snippet
defines a data reference stored in the key named employee_data :
YAML
pipeline:
name: SamplePipelineFromYaml
parameters:
PipelineParam1:
type: int
default: 3
data_references:
employee_data:
datastore: adftestadla
path_on_datastore: "adla_sample/sample_input.csv"
Steps
Steps define a computational environment, along with the files to run on the
environment. To define the type of a step, use the type key:
- AdlaStep: Runs a U-SQL script with Azure Data Lake Analytics. Corresponds to the AdlaStep class.
- AzureBatchStep: Runs jobs using Azure Batch. Corresponds to the AzureBatchStep class.
- ParallelRunStep: Runs a Python script to process large amounts of data asynchronously and in parallel. Corresponds to the ParallelRunStep class.
ADLA step
- compute: The Azure Data Lake compute target to use for this step.
- allow_reuse: Determines whether the step should reuse previous results when run again with the same settings.
The following example contains an ADLA Step definition:
YAML
pipeline:
name: SamplePipelineFromYaml
parameters:
PipelineParam1:
type: int
default: 3
data_references:
employee_data:
datastore: adftestadla
path_on_datastore: "adla_sample/sample_input.csv"
default_compute: adlacomp
steps:
Step1:
runconfig: "D:\\Yaml\\default_runconfig.yml"
parameters:
NUM_ITERATIONS_2:
source: PipelineParam1
NUM_ITERATIONS_1: 7
type: "AdlaStep"
name: "MyAdlaStep"
script_name: "sample_script.usql"
source_directory: "D:\\scripts\\Adla"
inputs:
employee_data:
source: employee_data
outputs:
OutputData:
destination: Output4
datastore: adftestadla
bind_mode: mount
- compute: The Azure Batch compute target to use for this step.
- delete_batch_job_after_finish: Boolean flag to indicate whether to delete the job from the Batch account after it's finished.
- delete_batch_pool_after_finish: Boolean flag to indicate whether to delete the pool after the job finishes.
- is_positive_exit_code_failure: Boolean flag to indicate if the job fails if the task exits with a positive code.
YAML
pipeline:
name: SamplePipelineFromYaml
parameters:
PipelineParam1:
type: int
default: 3
data_references:
input:
datastore: workspaceblobstore
path_on_datastore: "input.txt"
default_compute: testbatch
steps:
Step1:
runconfig: "D:\\Yaml\\default_runconfig.yml"
parameters:
NUM_ITERATIONS_2:
source: PipelineParam1
NUM_ITERATIONS_1: 7
type: "AzureBatchStep"
name: "MyAzureBatchStep"
pool_id: "MyPoolName"
create_pool: true
executable: "azurebatch.cmd"
source_directory: "D:\\scripts\\AureBatch"
allow_reuse: false
inputs:
input:
source: input
outputs:
output:
destination: output
datastore: workspaceblobstore
Databricks step
- compute: The Azure Databricks compute target to use for this step.
- num_workers: The static number of workers for the Databricks run cluster.
- runconfig: The path to a .runconfig file. This file is a YAML representation of the RunConfiguration class. For more information on the structure of this file, see runconfigschema.json .
- allow_reuse: Determines whether the step should reuse previous results when run again with the same settings.
YAML
pipeline:
name: SamplePipelineFromYaml
parameters:
PipelineParam1:
type: int
default: 3
data_references:
adls_test_data:
datastore: adftestadla
path_on_datastore: "testdata"
blob_test_data:
datastore: workspaceblobstore
path_on_datastore: "dbtest"
default_compute: mydatabricks
steps:
Step1:
runconfig: "D:\\Yaml\\default_runconfig.yml"
parameters:
NUM_ITERATIONS_2:
source: PipelineParam1
NUM_ITERATIONS_1: 7
type: "DatabricksStep"
name: "MyDatabrickStep"
run_name: "DatabricksRun"
python_script_name: "train-db-local.py"
source_directory: "D:\\scripts\\Databricks"
num_workers: 1
allow_reuse: true
inputs:
blob_test_data:
source: blob_test_data
outputs:
OutputData:
destination: Output4
datastore: workspaceblobstore
bind_mode: mount
- compute: The Azure Data Factory compute target to use for this step.
- allow_reuse: Determines whether the step should reuse previous results when run again with the same settings.
YAML
pipeline:
name: SamplePipelineFromYaml
parameters:
PipelineParam1:
type: int
default: 3
data_references:
adls_test_data:
datastore: adftestadla
path_on_datastore: "testdata"
blob_test_data:
datastore: workspaceblobstore
path_on_datastore: "testdata"
default_compute: adftest
steps:
Step1:
runconfig: "D:\\Yaml\\default_runconfig.yml"
parameters:
NUM_ITERATIONS_2:
source: PipelineParam1
NUM_ITERATIONS_1: 7
type: "DataTransferStep"
name: "MyDataTransferStep"
adla_compute_name: adftest
source_data_reference:
adls_test_data:
source: adls_test_data
destination_data_reference:
blob_test_data:
source: blob_test_data
- runconfig: The path to a .runconfig file. This file is a YAML representation of the RunConfiguration class. For more information on the structure of this file, see runconfig.json .
- allow_reuse: Determines whether the step should reuse previous results when run again with the same settings.
YAML
pipeline:
    name: SamplePipelineFromYaml
    parameters:
        PipelineParam1:
            type: int
            default: 3
    data_references:
        DataReference1:
            datastore: workspaceblobstore
            path_on_datastore: testfolder/sample.txt
    default_compute: cpu-cluster
    steps:
        Step1:
            runconfig: "D:\\Yaml\\default_runconfig.yml"
            parameters:
                NUM_ITERATIONS_2:
                    source: PipelineParam1
                NUM_ITERATIONS_1: 7
            type: "PythonScriptStep"
            name: "MyPythonScriptStep"
            script_name: "train.py"
            allow_reuse: True
            source_directory: "D:\\scripts\\PythonScript"
            inputs:
                InputData:
                    source: DataReference1
            outputs:
                OutputData:
                    destination: Output4
                    datastore: workspaceblobstore
                    bind_mode: mount
allow_reuse: Determines whether the step should reuse previous results when run again with the same settings.
The following example contains a Parallel run step:
YAML
pipeline:
    description: SamplePipelineFromYaml
    default_compute: cpu-cluster
    data_references:
        MyMinistInput:
            dataset_name: mnist_sample_data
    parameters:
        PipelineParamTimeout:
            type: int
            default: 600
    steps:
        Step1:
            parallel_run_config: "yaml/parallel_run_config.yml"
            type: "ParallelRunStep"
            name: "parallel-run-step-1"
            allow_reuse: True
            arguments:
            - "--progress_update_timeout"
            - parameter:timeout_parameter
            - "--side_input"
            - side_input:SideInputData
            parameters:
                timeout_parameter:
                    source: PipelineParamTimeout
            inputs:
                InputData:
                    source: MyMinistInput
            side_inputs:
                SideInputData:
                    source: Output4
                    bind_mode: mount
            outputs:
                OutputDataStep2:
                    destination: Output5
                    datastore: workspaceblobstore
                    bind_mode: mount
steps: Sequence of one or more PipelineStep definitions. Note that the destination keys of one step's outputs become the source keys to the inputs of the next step.
YAML
pipeline:
    name: SamplePipelineFromYAML
    description: Sample multistep YAML pipeline
    data_references:
        TitanicDS:
            dataset_name: 'titanic_ds'
            bind_mode: download
    default_compute: cpu-cluster
    steps:
        Dataprep:
            type: "PythonScriptStep"
            name: "DataPrep Step"
            compute: cpu-cluster
            runconfig: ".\\default_runconfig.yml"
            script_name: "prep.py"
            arguments:
            - '--train_path'
            - output:train_path
            - '--test_path'
            - output:test_path
            allow_reuse: True
            inputs:
                titanic_ds:
                    source: TitanicDS
                    bind_mode: download
            outputs:
                train_path:
                    destination: train_csv
                    datastore: workspaceblobstore
                test_path:
                    destination: test_csv
        Training:
            type: "PythonScriptStep"
            name: "Training Step"
            compute: cpu-cluster
            runconfig: ".\\default_runconfig.yml"
            script_name: "train.py"
            arguments:
            - "--train_path"
            - input:train_path
            - "--test_path"
            - input:test_path
            inputs:
                train_path:
                    source: train_csv
                    bind_mode: download
                test_path:
                    source: test_csv
                    bind_mode: download
Schedules
A pipeline schedule can be either datastore-triggered or recurring based on a time interval. The following keys are used to define a schedule:
data_path_parameter_name: The name of the data path pipeline parameter to set with the changed blob path. Only supported for datastore schedules.
YAML
Schedule:
    description: "Test create with datastore"
    recurrence: ~
    pipeline_parameters: {}
    wait_for_provisioning: True
    wait_timeout: 3600
    datastore_name: "workspaceblobstore"
    polling_interval: 5
    data_path_parameter_name: "input_data"
    continue_on_step_failure: None
    path_on_datastore: "file/path"
When defining a recurring schedule, use the following keys under recurrence:
frequency: How often the schedule recurs. Valid values are "Minute", "Hour", "Day", "Week", or "Month".
interval: How often the schedule fires. The integer value is the number of time units to wait until the schedule fires again.
start_time: The start time for the schedule. The string format of the value is YYYY-MM-DDThh:mm:ss. If no start time is provided, the first workload is run instantly and future workloads are run based on the schedule. If the start time is in the past, the first workload is run at the next calculated run time.
time_zone: The time zone for the start time. If no time zone is provided, UTC is used.
hours: If frequency is "Day" or "Week", you can specify one or more integers from 0 to 23, separated by commas, as the hours of the day when the pipeline should run. Only time_of_day or hours and minutes can be used.
minutes: If frequency is "Day" or "Week", you can specify one or more integers from 0 to 59, separated by commas, as the minutes of the hour when the pipeline should run. Only time_of_day or hours and minutes can be used.
time_of_day: If frequency is "Day" or "Week", you can specify a time of day for the schedule to run. The string format of the value is hh:mm. Only time_of_day or hours and minutes can be used.
week_days: If frequency is "Week", you can specify one or more days, separated by commas, when the schedule should run. Valid values are "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", and "Sunday".
YAML
Schedule:
    description: "Test create with recurrence"
    recurrence:
        frequency: Week # Can be "Minute", "Hour", "Day", "Week", or "Month".
        interval: 1 # how often the schedule fires
        start_time: 2019-06-07T10:50:00
        time_zone: UTC
        hours:
        - 1
        minutes:
        - 0
        time_of_day: null
        week_days:
        - Friday
    pipeline_parameters:
        'a': 1
    wait_for_provisioning: True
    wait_timeout: 3600
    datastore_name: ~
    polling_interval: ~
    data_path_parameter_name: ~
    continue_on_step_failure: None
    path_on_datastore: ~
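The recurrence constraints above can be checked before submitting a schedule. The following is a minimal illustrative sketch, not part of any SDK; the validate_recurrence helper and its error messages are assumptions made up for this example.

```python
# Illustrative validator for the recurrence keys documented above.
# validate_recurrence is a hypothetical helper, not an Azure ML API.
VALID_FREQUENCIES = {"Minute", "Hour", "Day", "Week", "Month"}
VALID_WEEK_DAYS = {"Monday", "Tuesday", "Wednesday", "Thursday",
                   "Friday", "Saturday", "Sunday"}

def validate_recurrence(rec):
    """Return a list of problems found in a recurrence definition."""
    errors = []
    if rec.get("frequency") not in VALID_FREQUENCIES:
        errors.append("frequency must be Minute, Hour, Day, Week, or Month")
    if not isinstance(rec.get("interval"), int) or rec["interval"] < 1:
        errors.append("interval must be a positive integer")
    # time_of_day is mutually exclusive with hours/minutes.
    if rec.get("time_of_day") and (rec.get("hours") or rec.get("minutes")):
        errors.append("use either time_of_day or hours/minutes, not both")
    if any(h < 0 or h > 23 for h in rec.get("hours", [])):
        errors.append("hours must be integers from 0 to 23")
    if any(m < 0 or m > 59 for m in rec.get("minutes", [])):
        errors.append("minutes must be integers from 0 to 59")
    if rec.get("week_days") and rec.get("frequency") != "Week":
        errors.append("week_days requires frequency: Week")
    if set(rec.get("week_days", [])) - VALID_WEEK_DAYS:
        errors.append("invalid week day name")
    return errors

# Mirrors the recurrence in the YAML sample above.
print(validate_recurrence(
    {"frequency": "Week", "interval": 1, "hours": [1],
     "minutes": [0], "week_days": ["Friday"]}))  # prints: []
```

A definition that violates several rules at once returns one message per problem, which is convenient for surfacing all issues in one pass.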
Next steps
Learn how to use the CLI extension for Azure Machine Learning.
Hyperparameters for computer vision tasks in automated machine learning (v1)
Article • 06/02/2023
Learn which hyperparameters are available specifically for computer vision tasks in
automated ML experiments.
With support for computer vision tasks, you can control the model algorithm and sweep
hyperparameters. These model algorithms and hyperparameters are passed in as the
parameter space for the sweep. While many of the hyperparameters exposed are
model-agnostic, there are instances where hyperparameters are model-specific or task-
specific.
Model-specific hyperparameters
This table summarizes hyperparameters specific to the yolov5 algorithm. Note: for both the image size and the model size, the training run may get into CUDA OOM if the value is too big.
mask_pixel_score_threshold: Score cutoff for considering a pixel as part of the mask of an object. Default: 0.5
image_type: Type of image to export mask as (options are jpg, png, bmp). Default: JPG
Model agnostic hyperparameters
The following table describes the hyperparameters that are model agnostic.
training_batch_size: Training batch size. Default: object detection 2 (except yolov5: 16); instance segmentation 2. Note: The defaults are the largest batch size that can be used on 12 GiB GPU memory.
validation_batch_size: Validation batch size. Default: object detection 1 (except yolov5: 16); instance segmentation 1. Note: The defaults are the largest batch size that can be used on 12 GiB GPU memory.
learning_rate: Initial learning rate. Default: multi-label 0.035 (except vit-variants: vits16r224 0.025, vitb16r224 0.025, vitl16r224 0.002); object detection 0.005 (except yolov5: 0.01); instance segmentation 0.005.
layers_to_freeze: How many layers to freeze for your model. For instance, passing 2 as value for seresnext means freezing layer0 and layer1, referring to the supported model layer info below. Must be a positive integer. Default: no default.
valid_resize_size: Image size to which to resize before cropping for the validation dataset. Must be a positive integer. Notes: seresnext doesn't take an arbitrary size; training run may get into CUDA OOM if the size is too big. Default: 256
valid_crop_size: Image crop size that's input to your neural network for the validation dataset. Must be a positive integer. Notes: seresnext doesn't take an arbitrary size; ViT-variants should have the same valid_crop_size and train_crop_size; training run may get into CUDA OOM if the size is too big. Default: 224
train_crop_size: Image crop size that's input to your neural network for the train dataset. Must be a positive integer. Notes: seresnext doesn't take an arbitrary size; ViT-variants should have the same valid_crop_size and train_crop_size; training run may get into CUDA OOM if the size is too big. Default: 224
These parameters are not supported with the yolov5 algorithm. See the model-specific hyperparameters section for yolov5 supported hyperparameters.
tile_predictions_nms_thresh: The IOU threshold to use to perform NMS while merging predictions from tiles and image. Used in validation/inference. Must be a float in the range of [0, 1]. Default: 0.25
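The NMS merge step compares predictions by intersection over union (IOU). As a rough sketch of the comparison this threshold applies to (not the actual AutoML implementation), using the normalized topX/topY/bottomX/bottomY box format from the output schema:

```python
# Sketch of the IOU computation an NMS merge step applies; box format
# matches the normalized topX/topY/bottomX/bottomY output schema.
def iou(a, b):
    # Width and height of the intersection rectangle (0 if disjoint).
    ix = max(0.0, min(a["bottomX"], b["bottomX"]) - max(a["topX"], b["topX"]))
    iy = max(0.0, min(a["bottomY"], b["bottomY"]) - max(a["topY"], b["topY"]))
    inter = ix * iy
    area_a = (a["bottomX"] - a["topX"]) * (a["bottomY"] - a["topY"])
    area_b = (b["bottomX"] - b["topX"]) * (b["bottomY"] - b["topY"])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

box1 = {"topX": 0.2, "topY": 0.2, "bottomX": 0.6, "bottomY": 0.6}
box2 = {"topX": 0.4, "topY": 0.4, "bottomX": 0.8, "bottomY": 0.8}
print(iou(box1, box2))  # pairs with IOU above the threshold get merged
```

Two predictions of the same object from adjacent tiles typically have a high IOU, so raising tile_predictions_nms_thresh keeps more overlapping boxes, while lowering it merges more aggressively.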
Next steps
Learn how to Set up AutoML to train computer vision models with Python
(preview).
Tutorial: Train an object detection model (preview) with AutoML and Python.
Data schemas to train computer vision models with automated machine learning (v1)
Article • 06/02/2023
) Important
The Azure CLI commands in this article require the azure-cli-ml , or v1, extension
for Azure Machine Learning. Support for the v1 extension will end on September
30, 2025. You will be able to install and use the v1 extension until that date.
) Important
This feature is currently in public preview. This preview version is provided without
a service-level agreement. Certain features might not be supported or might have
constrained capabilities. For more information, see Supplemental Terms of Use for
Microsoft Azure Previews .
Learn how to format your JSONL files for data consumption in automated ML
experiments for computer vision tasks during training and inference.
JSON
{
   "image_url":"AmlDatastore://data_directory/../Image_name.image_format",
   "image_details":{
      "format":"image_format",
      "width":"image_width",
      "height":"image_height"
   },
   "label":"class_name"
}
format: Image type (all the Image formats available in the Pillow library are supported). Optional, String from {"jpg", "jpeg", "png", "jpe", "jfif", "bmp", "tif", "tiff"}.
JSON
{
   "image_url":"AmlDatastore://data_directory/../Image_name.image_format",
   "image_details":{
      "format":"image_format",
      "width":"image_width",
      "height":"image_height"
   },
   "label":[
      "class_name_1",
      "class_name_2",
      "class_name_3",
      "...",
      "class_name_n"
   ]
}
format: Image type (all the Image formats available in the Pillow library are supported). Optional, String from {"jpg", "jpeg", "png", "jpe", "jfif", "bmp", "tif", "tiff"}.
JSON
{
   "image_url":"AmlDatastore://data_directory/../Image_name.image_format",
   "image_details":{
      "format":"image_format",
      "width":"image_width",
      "height":"image_height"
   },
   "label":[
      {
         "label":"class_name_1",
         "topX":"xmin/width",
         "topY":"ymin/height",
         "bottomX":"xmax/width",
         "bottomY":"ymax/height",
         "isCrowd":"isCrowd"
      },
      {
         "label":"class_name_2",
         "topX":"xmin/width",
         "topY":"ymin/height",
         "bottomX":"xmax/width",
         "bottomY":"ymax/height",
         "isCrowd":"isCrowd"
      },
      "..."
   ]
}
Here,
format: Image type (all the Image formats available in the Pillow library are supported; for YOLO, only the image formats allowed by opencv are supported). Optional, String from {"jpg", "jpeg", "png", "jpe", "jfif", "bmp", "tif", "tiff"}.
label (outer key): List of bounding boxes, where each box is a dictionary of label, topX, topY, bottomX, bottomY, and isCrowd, with the box's top-left and bottom-right coordinates. Required, List of dictionaries. Example: [{"label": "cat", "topX": 0.260, "topY": 0.406, "bottomX": 0.735, "bottomY": 0.701, "isCrowd": 0}]
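As the xmin/width-style placeholders indicate, topX/topY/bottomX/bottomY are pixel coordinates divided by image width and height. A minimal sketch of that conversion (the image size and pixel box are made-up values):

```python
# Convert a pixel-space bounding box into the normalized
# topX/topY/bottomX/bottomY form used by the schema above.
def normalize_box(xmin, ymin, xmax, ymax, width, height):
    return {
        "topX": xmin / width,
        "topY": ymin / height,
        "bottomX": xmax / width,
        "bottomY": ymax / height,
    }

# Example: a 640x480 image with a box from (128, 96) to (320, 240).
box = normalize_box(128, 96, 320, 240, 640, 480)
print(box)  # {'topX': 0.2, 'topY': 0.2, 'bottomX': 0.5, 'bottomY': 0.5}
```

Normalizing keeps annotations valid if images are later resized, since the coordinates stay in the [0, 1] range regardless of resolution.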
Instance segmentation
For instance segmentation, automated ML only supports polygons as input and output, not masks.
JSON
{
   "image_url":"AmlDatastore://data_directory/../Image_name.image_format",
   "image_details":{
      "format":"image_format",
      "width":"image_width",
      "height":"image_height"
   },
   "label":[
      {
         "label":"class_name",
         "isCrowd":"isCrowd",
         "polygon":[["x1", "y1", "x2", "y2", "x3", "y3", "...", "xn", "yn"]]
      }
   ]
}
label (outer key): List of masks, where each mask is a dictionary of label, isCrowd, and polygon coordinates. Required, List of dictionaries. Example: [{"label": "can", "isCrowd": 0, "polygon": [[0.577, 0.689, 0.562, 0.681, 0.559, 0.686]]}]
polygon: Polygon coordinates for the object. Required, List of lists for multiple segments of the same instance. Float values in the range [0,1]. Example: [[0.577, 0.689, 0.567, 0.689, 0.559, 0.686]]
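The polygon is a flat [x1, y1, x2, y2, ...] list of normalized coordinates. A sketch of producing one from pixel-space vertices (the sample points and image size are made-up values):

```python
# Flatten pixel-space polygon vertices into the normalized
# [x1, y1, x2, y2, ...] list used by the schema above.
def normalize_polygon(points, width, height):
    flat = []
    for x, y in points:
        flat.extend([x / width, y / height])
    # List of lists: one inner list per segment of the instance.
    return [flat]

points = [(369, 331), (360, 327), (358, 329)]  # made-up vertices
print(normalize_polygon(points, 640, 480))
```

An instance split into multiple segments (for example, partially occluded) gets one inner list per segment, which is why the value is a list of lists.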
Input format
The following is the input format needed to generate predictions on any task using a task-specific model endpoint. After you deploy the model, you can use the following code snippet to get predictions for all tasks.
Python
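The original snippet was not preserved here. As a hedged sketch only, the scoring URI, key, and raw-bytes request body below are assumptions made up for illustration; check your deployed endpoint's Swagger definition for the actual contract.

```python
import urllib.request

# Placeholders: substitute your deployed endpoint's values. The assumption
# that the endpoint accepts raw image bytes is illustrative only.
scoring_uri = "http://example.invalid/score"  # hypothetical
api_key = "<key>"                             # hypothetical

def build_request(image_bytes):
    """Build a scoring request carrying raw image bytes in the body."""
    return urllib.request.Request(
        scoring_uri,
        data=image_bytes,
        headers={
            "Content-Type": "application/octet-stream",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

# In practice: image_bytes = open("sample_image.jpg", "rb").read(),
# then urllib.request.urlopen(build_request(image_bytes)).
req = build_request(b"\x89PNG...")  # dummy bytes for illustration
print(req.get_header("Content-type"))  # application/octet-stream
```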
Output format
Predictions made on model endpoints follow a different structure depending on the task type. This section explores the output data formats for multi-class and multi-label image classification, object detection, and instance segmentation tasks.
Image classification
The endpoint for image classification returns all the labels in the dataset and their probability scores for the input image, in the following format.
JSON
{
   "filename":"/tmp/tmppjr4et28",
   "probs":[
      2.098e-06,
      4.783e-08,
      0.999,
      8.637e-06
   ],
   "labels":[
      "can",
      "carton",
      "milk_bottle",
      "water_bottle"
   ]
}
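For multi-class output like the above, the predicted class is simply the label with the highest probability. A small parsing sketch, with the response dict mirroring the sample:

```python
# Pick the top label from a multi-class classification response.
response = {
    "filename": "/tmp/tmppjr4et28",
    "probs": [2.098e-06, 4.783e-08, 0.999, 8.637e-06],
    "labels": ["can", "carton", "milk_bottle", "water_bottle"],
}

# probs and labels are parallel lists; zip pairs them up and max
# compares on the probability first.
score, label = max(zip(response["probs"], response["labels"]))
print(label, score)  # milk_bottle 0.999
```

For the multi-label response below, you would instead keep every label whose probability exceeds a chosen threshold rather than taking a single maximum.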
JSON
{
   "filename":"/tmp/tmpsdzxlmlm",
   "probs":[
      0.997,
      0.960,
      0.982,
      0.025
   ],
   "labels":[
      "can",
      "carton",
      "milk_bottle",
      "water_bottle"
   ]
}
Object detection
The object detection model returns multiple boxes with their scaled top-left and bottom-right coordinates, along with the box label and confidence score.
JSON
{
   "filename":"/tmp/tmpdkg2wkdy",
   "boxes":[
      {
         "box":{
            "topX":0.224,
            "topY":0.285,
            "bottomX":0.399,
            "bottomY":0.620
         },
         "label":"milk_bottle",
         "score":0.937
      },
      {
         "box":{
            "topX":0.664,
            "topY":0.484,
            "bottomX":0.959,
            "bottomY":0.812
         },
         "label":"can",
         "score":0.891
      },
      {
         "box":{
            "topX":0.423,
            "topY":0.253,
            "bottomX":0.632,
            "bottomY":0.725
         },
         "label":"water_bottle",
         "score":0.876
      }
   ]
}
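Since the coordinates are scaled to [0, 1], multiply them by the image dimensions to recover pixel boxes. A sketch that also filters by confidence (the 640x480 image size and the 0.5 threshold are made-up example values):

```python
# Convert normalized detection boxes back to pixel coordinates,
# keeping only confident detections. Image size is a made-up example.
def to_pixel_boxes(detections, width, height, score_threshold=0.5):
    results = []
    for det in detections["boxes"]:
        if det["score"] < score_threshold:
            continue  # drop low-confidence detections
        b = det["box"]
        results.append({
            "label": det["label"],
            "score": det["score"],
            "xmin": b["topX"] * width,
            "ymin": b["topY"] * height,
            "xmax": b["bottomX"] * width,
            "ymax": b["bottomY"] * height,
        })
    return results

# First detection from the sample response above.
sample = {"boxes": [{"box": {"topX": 0.224, "topY": 0.285,
                             "bottomX": 0.399, "bottomY": 0.620},
                     "label": "milk_bottle", "score": 0.937}]}
for r in to_pixel_boxes(sample, 640, 480):
    print(r["label"], round(r["xmin"]), round(r["ymin"]),
          round(r["xmax"]), round(r["ymax"]))
```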
Instance segmentation
In instance segmentation, the output consists of multiple boxes with their scaled top-left and bottom-right coordinates, labels, confidence scores, and polygons (not masks). The polygon values here are in the same format as discussed in the schema section.
JSON
{
   "filename":"/tmp/tmpi8604s0h",
   "boxes":[
      {
         "box":{
            "topX":0.679,
            "topY":0.491,
            "bottomX":0.926,
            "bottomY":0.810
         },
         "label":"can",
         "score":0.992,
         "polygon":[
            [
               0.82, 0.811, 0.771, 0.810, 0.758, 0.805, 0.741, 0.797, 0.735,
               0.791, 0.718, 0.785, 0.715, 0.778, 0.706, 0.775, 0.696, 0.758,
               0.695, 0.717, 0.698, 0.567, 0.705, 0.552, 0.706, 0.540, 0.725,
               0.520, 0.735, 0.505, 0.745, 0.502, 0.755, 0.493
            ]
         ]
      },
      {
         "box":{
            "topX":0.220,
            "topY":0.298,
            "bottomX":0.397,
            "bottomY":0.601
         },
         "label":"milk_bottle",
         "score":0.989,
         "polygon":[
            [
               0.365, 0.602, 0.273, 0.602, 0.26, 0.595, 0.263, 0.588, 0.251,
               0.546, 0.248, 0.501, 0.25, 0.485, 0.246, 0.478, 0.245, 0.463,
               0.233, 0.442, 0.231, 0.43, 0.226, 0.423, 0.226, 0.408, 0.234,
               0.385, 0.241, 0.371, 0.238, 0.345, 0.234, 0.335, 0.233, 0.325,
               0.24, 0.305, 0.586, 0.38, 0.592, 0.375, 0.598, 0.365
            ]
         ]
      },
      {
         "box":{
            "topX":0.433,
            "topY":0.280,
            "bottomX":0.621,
            "bottomY":0.679
         },
         "label":"water_bottle",
         "score":0.988,
         "polygon":[
            [
               0.576, 0.680, 0.501, 0.680, 0.475, 0.675, 0.460, 0.625,
               0.445, 0.630, 0.443, 0.572, 0.440, 0.560, 0.435, 0.515,
               0.431, 0.501, 0.431, 0.433, 0.433, 0.426, 0.445, 0.417,
               0.456, 0.407, 0.465, 0.381, 0.468, 0.327, 0.471, 0.318
            ]
         ]
      }
   ]
}
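The polygon values are normalized the same way as the boxes, so they too can be mapped back to pixel space. A sketch (the 640x480 image size is again a made-up example):

```python
# Convert a normalized flat polygon [x1, y1, x2, y2, ...] from the
# response into pixel-space (x, y) vertex pairs.
def polygon_to_pixels(flat, width, height):
    xs = flat[0::2]  # even indices are x coordinates
    ys = flat[1::2]  # odd indices are y coordinates
    return [(x * width, y * height) for x, y in zip(xs, ys)]

# First two vertices of the "can" mask in the sample response above.
flat = [0.82, 0.811, 0.771, 0.810]
print(polygon_to_pixels(flat, 640, 480))
```

The resulting vertex list can be fed to a drawing routine (for example PIL's ImageDraw.polygon) to visualize the mask on the original image.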
7 Note
The images used in this article are from the Fridge Objects dataset, copyright ©
Microsoft Corporation and available at computervision-
recipes/01_training_introduction.ipynb under the MIT License .
Next steps
Learn how to Prepare data for training computer vision models with automated
ML.
Set up computer vision tasks in AutoML
Tutorial: Train an object detection model (preview) with AutoML and Python.
Migrate to Azure Machine Learning
from ML Studio (classic)
Article • 03/14/2023
) Important
Support for Machine Learning Studio (classic) will end on 31 August 2024. We
recommend you transition to Azure Machine Learning by that date.
Beginning 1 December 2021, you will not be able to create new Machine Learning
Studio (classic) resources. Through 31 August 2024, you can continue to use the
existing Machine Learning Studio (classic) resources.
ML Studio (classic) documentation is being retired and may not be updated in the
future.
Learn how to migrate from Studio (classic) to Azure Machine Learning. Azure Machine
Learning provides a modernized data science platform that combines no-code and
code-first approaches.
This is a guide for a basic "lift and shift" migration. If you want to optimize an existing
machine learning workflow, or modernize a machine learning platform, see the Azure
Machine Learning adoption framework for additional resources including digital
survey tools, worksheets, and planning templates.
7 Note
Role-Based Access Control (RBAC): ML Studio (classic) offers only the contributor and owner roles; Azure Machine Learning offers flexible role definitions and RBAC control.
3. Verify that your critical Studio (classic) modules are supported in Azure Machine
Learning designer. For more information, see the Studio (classic) and designer
component-mapping table below.
Please work with your Cloud Solution Architect to define your strategy.
See the Azure Machine Learning Adoption Framework for planning resources
including a planning doc template.
7 Note
The guidance above is built on Azure Machine Learning v1 concepts and features. Azure Machine Learning has CLI v2 and Python SDK v2. We suggest rebuilding your ML Studio (classic) models using v2 instead of v1. Start with Azure Machine Learning v2.
Data input and output. Studio (classic): Enter Data Manually, Export Data, Import Data, Load Trained Model, Unpack Zipped Datasets. Azure Machine Learning designer: Enter Data Manually, Export Data, Import Data.
For more information on how to use individual designer components, see the designer
component reference.
Datasets
In Studio (classic), datasets were saved in your workspace and could only be used by
Studio (classic).
In Azure Machine Learning, datasets are registered to the workspace and can be used
across all of Azure Machine Learning. For more information on the benefits of Azure
Machine Learning datasets, see Secure data access.
Pipeline
In Studio (classic), experiments contained the processing logic for your work. You
created experiments with drag-and-drop modules.
In Azure Machine Learning, pipelines contain the processing logic for your work. You
can create pipelines with either drag-and-drop modules or by writing code.
Web service endpoint
Studio (classic) used the REQUEST/RESPONSE API for real-time prediction and the BATCH EXECUTION API for batch prediction or retraining.
Azure Machine Learning uses real-time endpoints (managed endpoints) for real-time
prediction and pipeline endpoints for batch prediction or retraining.
Next steps
In this article, you learned the high-level requirements for migrating to Azure Machine
Learning. For detailed steps, see the other articles in the Studio (classic) migration series:
1. Migration overview.
2. Migrate dataset.
3. Rebuild a Studio (classic) training pipeline.
4. Rebuild a Studio (classic) web service.
5. Integrate an Azure Machine Learning web service with client apps.
6. Migrate Execute R Script.
See the Azure Machine Learning Adoption Framework for additional migration
resources.
Migrate a Studio (classic) dataset to
Azure Machine Learning
Article • 03/14/2023
) Important
Support for Machine Learning Studio (classic) will end on 31 August 2024. We
recommend you transition to Azure Machine Learning by that date.
Beginning 1 December 2021, you will not be able to create new Machine Learning
Studio (classic) resources (workspace and web service plan). Through 31 August
2024, you can continue to use the existing Machine Learning Studio (classic)
experiments and web services.
ML Studio (classic) documentation is being retired and may not be updated in the
future.
In this article, you learn how to migrate a Studio (classic) dataset to Azure Machine
Learning. For more information on migrating from Studio (classic), see the migration
overview article.
You have three options to migrate a dataset to Azure Machine Learning. Read each
section to determine which option is best for your scenario.
In Studio (classic): Option 1: Download the dataset from Studio (classic) and upload it to Azure Machine Learning. Option 3: Use the Import Data module to get data from a cloud source.
7 Note
Azure Machine Learning also supports code-first workflows for creating and
managing datasets.
Prerequisites
An Azure account with an active subscription. Create an account for free .
An Azure Machine Learning workspace. Create workspace resources.
A Studio (classic) dataset to migrate.
You can download the following Studio (classic) dataset types directly.
For the following data types, you must use the Convert to CSV module to download
datasets.
3. Drag and drop the dataset you want to download onto the canvas.
5. Connect the Convert to CSV input port to the output port of your dataset.
2. Under Assets in the left navigation, select Data. On the Data assets tab, select Create.
3. Give your data asset a name and optional description. Then, select the Tabular
option under Type, in the Dataset types section of the dropdown.
7 Note
You can also upload ZIP files as data assets. To upload a ZIP file, select File for
Type, in the Dataset types section of the dropdown.
4. For data source, select the "From local files" option to upload your dataset.
5. For file selection, first choose where you want your data to be stored in Azure. You
will be selecting an Azure Machine Learning datastore. For more information on
datastores, see Connect to storage services. Next, upload the dataset you
downloaded earlier.
6. Follow the steps to set the data parsing settings and schema for your data asset.
7. Once you reach the Review step, select Create on the last page.
Register an Azure Machine Learning dataset: Ingest data from local and online data sources (Blob, ADLS Gen1, ADLS Gen2, File share, SQL DB). Creates a reference to the data source, which is lazily evaluated at runtime. Use this option if you repeatedly access this dataset and want to enable advanced data features like data versioning and monitoring.
Import Data module: Ingest data from online data sources (Blob, ADLS Gen1, ADLS Gen2, File share, SQL DB).
7 Note
Studio (classic) users should note that the following cloud sources are not natively
supported in Azure Machine Learning:
Hive Query
Azure Table
Azure Cosmos DB
On-premises SQL Database
We recommend that users migrate their data to a supported storage service using Azure Data Factory.
1. Create a datastore, which links the cloud storage service to your Azure Machine
Learning workspace.
2. Register a dataset. If you are migrating a Studio (classic) dataset, select the Tabular
dataset setting.
After you register a dataset in Azure Machine Learning, you can use it in designer:
1. Create a datastore, which links the cloud storage service to your Azure Machine
Learning workspace.
After you create the datastore, you can use the Import Data module in the designer to
ingest data from it:
Next steps
In this article, you learned how to migrate a Studio (classic) dataset to Azure Machine
Learning. The next step is to rebuild a Studio (classic) training pipeline.
1. Migration overview.
2. Migrate datasets.
3. Rebuild a Studio (classic) training pipeline.
4. Rebuild a Studio (classic) web service.
5. Integrate an Azure Machine Learning web service with client apps.
6. Migrate Execute R Script.
Rebuild a Studio (classic) experiment in
Azure Machine Learning
Article • 05/29/2023
) Important
Support for Machine Learning Studio (classic) will end on 31 August 2024. We
recommend you transition to Azure Machine Learning by that date.
Beginning 1 December 2021, you will not be able to create new Machine Learning
Studio (classic) resources (workspace and web service plan). Through 31 August
2024, you can continue to use the existing Machine Learning Studio (classic)
experiments and web services.
ML Studio (classic) documentation is being retired and may not be updated in the
future.
In this article, you learn how to rebuild an ML Studio (classic) experiment in Azure
Machine Learning. For more information on migrating from Studio (classic), see the
migration overview article.
For more information on building pipelines with the SDK, see What are Azure Machine
Learning pipelines.
Prerequisites
An Azure account with an active subscription. Create an account for free .
An Azure Machine Learning workspace. Create workspace resources.
A Studio (classic) experiment to migrate.
Upload your dataset to Azure Machine Learning.
Rebuild the pipeline
After you migrate your dataset to Azure Machine Learning, you're ready to recreate your
experiment.
In Azure Machine Learning, the visual graph is called a pipeline draft. In this section, you
recreate your classic experiment as a pipeline draft.
2. In the left navigation pane, select Designer > Easy-to-use prebuilt modules
) Important
If your experiment uses the Execute R Script module, you need to perform
additional steps to migrate your experiment. For more information, see
Migrate R Script modules.
4. Adjust parameters.
Select each module and adjust the parameters in the module settings panel to the
right. Use the parameters to recreate the functionality of your Studio (classic)
experiment. For more information on each module, see the module reference.
A pipeline job executes on a compute target attached to your workspace. You can set a
default compute target for the entire pipeline, or you can specify compute targets on a
per-module basis.
Once you submit a job from a pipeline draft, it turns into a pipeline job. Each pipeline
job is recorded and logged in Azure Machine Learning.
Now that your compute target is set, you can submit a pipeline job:
Experiments organize similar pipeline jobs together. If you run a pipeline multiple
times, you can select the same experiment for successive jobs. This is useful for
logging and tracking.
The first job may take up to 20 minutes. Since the default compute settings have a
minimum node size of 0, the designer must allocate resources after being idle.
Successive jobs take less time, since the nodes are already allocated. To speed up
the running time, you can create a compute resource with a minimum node size of
1 or greater.
After the job finishes, you can check the results of each module:
) Important
To give a trained model a meaningful name, you can register the output of the Train Model component as a file dataset. Give it the name you want, for example linear-regression-model.
You can find the trained model in the "Dataset" category in the component list, or search for it by name. Then connect the trained model to a Score Model component to use it for prediction.
Next steps
In this article, you learned how to rebuild a Studio (classic) experiment in Azure Machine
Learning. The next step is to rebuild web services in Azure Machine Learning.
1. Migration overview.
2. Migrate dataset.
3. Rebuild a Studio (classic) training pipeline.
4. Rebuild a Studio (classic) web service.
5. Integrate an Azure Machine Learning web service with client apps.
6. Migrate Execute R Script.
Rebuild a Studio (classic) web service in
Azure Machine Learning
Article • 03/14/2023
) Important
Support for Machine Learning Studio (classic) will end on 31 August 2024. We
recommend you transition to Azure Machine Learning by that date.
Beginning 1 December 2021, you will not be able to create new Machine Learning
Studio (classic) resources (workspace and web service plan). Through 31 August
2024, you can continue to use the existing Machine Learning Studio (classic)
experiments and web services.
ML Studio (classic) documentation is being retired and may not be updated in the
future.
In this article, you learn how to rebuild an ML Studio (classic) web service as an endpoint
in Azure Machine Learning.
Use Azure Machine Learning pipeline endpoints to make predictions, retrain models, or
run any generic pipeline. The REST endpoint lets you run pipelines from any platform.
This article is part of the Studio (classic) to Azure Machine Learning migration series. For
more information on migrating to Azure Machine Learning, see the migration overview
article.
7 Note
This migration series focuses on the drag-and-drop designer. For more information
on deploying models programmatically, see Deploy machine learning models in
Azure.
Prerequisites
An Azure account with an active subscription. Create an account for free .
An Azure Machine Learning workspace. Create workspace resources.
An Azure Machine Learning training pipeline. For more information, see Rebuild a
Studio (classic) experiment in Azure Machine Learning.
There are multiple ways to deploy a model in Azure Machine Learning. One of the
simplest ways is to use the designer to automate the deployment process. Use the
following steps to deploy a model as a real-time endpoint:
2. After the job completes, at the top of the canvas, select Create inference pipeline
> Real-time inference pipeline.
The designer converts the training pipeline into a real-time inference pipeline. A
similar conversion also occurs in Studio (classic).
In the designer, the conversion step also registers the trained model to your Azure
Machine Learning workspace.
3. Select Submit to run the real-time inference pipeline, and verify that it runs
successfully.
The following table describes your deployment compute options in the designer:
Pipeline endpoints replace Studio (classic) batch execution endpoints and retraining
web services.
2. After the job completes, at the top of the canvas, select Create inference pipeline
> Batch inference pipeline.
The designer converts the training pipeline into a batch inference pipeline. A
similar conversion also occurs in Studio (classic).
In the designer, this step also registers the trained model to your Azure Machine
Learning workspace.
3. Select Submit to run the batch inference pipeline and verify that it successfully
completes.
A new pipeline endpoint creates a new REST endpoint for your pipeline.
If you select an existing pipeline endpoint, you don't overwrite the existing
pipeline. Instead, Azure Machine Learning versions each pipeline in the endpoint.
You can specify which version to run in your REST call. You must also set a default
pipeline if the REST call doesn't specify a version.
To reuse your pipeline endpoint for retraining, you must create a pipeline parameter for
your input dataset. This lets you dynamically set your training dataset, so that you can
retrain your model.
5. Select Publish.
Call your pipeline endpoint from the studio
After you create your batch inference or retraining pipeline endpoint, you can call your
endpoint directly from your browser.
3. Select Submit.
You can specify any pipeline parameters after you select Submit.
Next steps
In this article, you learned how to rebuild a Studio (classic) web service in Azure Machine
Learning. The next step is to integrate your web service with client apps.
1. Migration overview.
2. Migrate dataset.
3. Rebuild a Studio (classic) training pipeline.
4. Rebuild a Studio (classic) web service.
5. Integrate an Azure Machine Learning web service with client apps.
6. Migrate Execute R Script.
Consume pipeline endpoints from client
applications
Article • 03/14/2023
) Important
Support for Machine Learning Studio (classic) will end on 31 August 2024. We
recommend you transition to Azure Machine Learning by that date.
Beginning 1 December 2021, you will not be able to create new Machine Learning
Studio (classic) resources (workspace and web service plan). Through 31 August
2024, you can continue to use the existing Machine Learning Studio (classic)
experiments and web services.
ML Studio (classic) documentation is being retired and may not be updated in the
future.
In this article, you learn how to integrate client applications with Azure Machine
Learning endpoints.
This article is part of the ML Studio (classic) to Azure Machine Learning migration series.
For more information on migrating to Azure Machine Learning, see the migration
overview article.
Prerequisites
An Azure account with an active subscription. Create an account for free .
An Azure Machine Learning workspace. Create workspace resources.
An Azure Machine Learning real-time endpoint or pipeline endpoint.
Note
You can also find the Swagger specification for your endpoint in the Details tab.
Use the Swagger definition to understand your endpoint schema. For more
information on Swagger definitions, see the Swagger official documentation.
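As an illustrative sketch, the JSON body for a REST submission to a pipeline endpoint can be built as follows. The field names (ExperimentName, ParameterAssignments) follow the common pipeline-endpoint submission schema, but verify them against your endpoint's Swagger definition; the endpoint URL, token, and parameter names below are placeholders.

```python
import json

# Build the submission body; the parameter names are placeholders for your
# pipeline's own parameters (for example, a retraining input dataset).
payload = {
    "ExperimentName": "retrain-model",
    "ParameterAssignments": {"input_dataset": "my-training-data"},
}
body = json.dumps(payload)

# Sending the request (URL and bearer token are placeholders) would look like:
#   import urllib.request
#   req = urllib.request.Request(
#       "<your-pipeline-endpoint-url>", data=body.encode(),
#       headers={"Authorization": "Bearer <token>",
#                "Content-Type": "application/json"})
#   urllib.request.urlopen(req)
print(body)
```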
Next steps
In this article, you learned how to find schema and sample code for your pipeline
endpoints. For more information on authenticating to an endpoint, see Authenticate to
an online endpoint.
See the rest of the articles in the Azure Machine Learning migration series:
Migration overview.
Migrate dataset.
Rebuild a Studio (classic) training pipeline.
Rebuild a Studio (classic) web service.
Migrate Execute R Script.
Migrate Execute R Script modules in
Studio (classic)
Article • 03/14/2023
) Important
Support for Machine Learning Studio (classic) will end on 31 August 2024. We
recommend you transition to Azure Machine Learning by that date.
Beginning 1 December 2021, you will not be able to create new Machine Learning
Studio (classic) resources (workspace and web service plan). Through 31 August
2024, you can continue to use the existing Machine Learning Studio (classic)
experiments and web services.
ML Studio (classic) documentation is being retired and may not be updated in the
future.
In this article, you learn how to rebuild a Studio (classic) Execute R Script module in
Azure Machine Learning.
For more information on migrating from Studio (classic), see the migration overview
article.
Execute R Script
Azure Machine Learning designer now runs on Linux. Studio (classic) runs on Windows.
Due to the platform change, you must adjust your Execute R Script during migration,
otherwise the pipeline will fail.
To migrate an Execute R Script module from Studio (classic), you must replace the
maml.mapInputPort and maml.mapOutputPort interfaces with standard functions.
Internet accessible: No in Studio (classic); Yes in the designer.
dataset1 <- maml.mapInputPort(1)
dataset2 <- maml.mapInputPort(2)
# Sample operation
data.set = rbind(dataset1, dataset2);
plot(data.set);
maml.mapOutputPort("data.set");
Here are the updated contents in the designer. Notice that the maml.mapInputPort and
maml.mapOutputPort have been replaced with the standard function interface
azureml_main .
azureml_main <- function(dataset1, dataset2){
    # Sample operation
    data.set = rbind(dataset1, dataset2);
    return(list(dataset1=data.set))
}
For more information, see the designer Execute R Script module reference.
This is an improvement over Studio (classic). Since Studio (classic) runs in a sandbox
environment with no internet access, you had to upload scripts in a zip bundle to install
more packages.
Use the following code to install CRAN packages in the designer's Execute R Script
module:
if(!require(zoo)) {
    install.packages("zoo", repos = "http://cran.us.r-project.org")
}
library(zoo)
Next steps
In this article, you learned how to migrate Execute R Script modules to Azure Machine
Learning.
1. Migration overview.
2. Migrate dataset.
3. Rebuild a Studio (classic) training pipeline.
4. Rebuild a Studio (classic) web service.
5. Integrate a Machine Learning web service with client apps.
6. Migrate Execute R Script modules.
Upgrade to v2
Article • 04/04/2023
Azure Machine Learning's v2 REST APIs, Azure CLI extension, and Python SDK introduce
consistency and a set of new features to accelerate the production machine learning
lifecycle. This article provides an overview of upgrading to v2 with recommendations to
help you decide on v1, v2, or both.
Prerequisites
General familiarity with Azure Machine Learning and the v1 Python SDK.
Understand what is v2?
Managed Inferencing
Reusable components in pipelines
Improved scheduling of pipelines
Responsible AI dashboard
Registry of assets
A new v2 project can reuse existing resources like workspaces and compute and existing
assets like models and environments created using v1.
You should then ensure the features you need in v2 meet your organization's
requirements, such as being generally available.
) Important
Optionally, if you want to upgrade specific parts of your existing code to v2, please refer
to the comparison links provided in the details of each resource or asset in the rest of
this document.
API: Notes
REST: Fewest dependencies and overhead. Use for building applications on Azure Machine Learning as a platform, directly in programming languages without an SDK provided, or per personal preference.
CLI: Recommended for automation with CI/CD or per personal preference. Allows quick iteration with YAML files and straightforward separation between Azure Machine Learning and ML model code.
Workspace
Workspaces don't need to be upgraded with v2. You can use the same workspace,
regardless of whether you're using v1 or v2.
If you create workspaces using automation, do consider upgrading the code for creating
a workspace to v2. Typically Azure resources are managed via Azure Resource Manager
(and Bicep) or similar resource provisioning tools. Alternatively, you can use the CLI (v2)
and YAML files.
For a comparison of SDK v1 and v2 code, see Workspace management in SDK v1 and
SDK v2.
Datastore
Object storage datastore types created with v1 are fully available for use in v2. Database
datastores are not supported; export to object storage (usually Azure Blob) is the
recommended migration path.
For a comparison of SDK v1 and v2 code, see Datastore management in SDK v1 and SDK
v2.
Compute
Compute of type AmlCompute and ComputeInstance are fully available for use in v2.
For a comparison of SDK v1 and v2 code, see Compute management in SDK v1 and SDK
v2.
Local: ACI. Quick test of model deployment locally; not for production.
Azure Kubernetes Service (AKS): ACI, AKS. Manage your own AKS cluster(s) for model deployment, giving flexibility and granular control at the cost of IT overhead.
Azure Arc Kubernetes: N/A. Manage your own Kubernetes cluster(s) in other clouds or on-premises, giving flexibility and granular control at the cost of IT overhead.
For a comparison of SDK v1 and v2 code, see Upgrade deployment endpoints to SDK v2.
For migration steps from your existing ACI web services to managed online endpoints,
see our upgrade guide article and blog .
Another type of job is pipeline, which defines child jobs that may have input/output
relationships, forming a directed acyclic graph (DAG).
Run a script
Local runs
Hyperparameter tuning
Parallel Run
Pipelines
AutoML
Designer
You can use designer to build pipelines using your own v2 custom components and the
new prebuilt components from registry. In this situation, you can use v1 or v2 data
assets in your pipeline.
You can continue to use designer to build pipelines using classic prebuilt components
and v1 dataset types (tabular, file). You cannot use existing designer classic prebuilt
components with v2 data asset.
You cannot build a pipeline using both existing designer classic prebuilt components
and v2 custom components.
Note that forward compatibility is not provided: you cannot use v2 data assets in v1.
For more about handling data in v2, see Read and write data in a job.
For a comparison of SDK v1 and v2 code, see Data assets in SDK v1 and v2.
Model
Models created from v1 can be used in v2.
For a comparison of SDK v1 and v2 code, see Model management in SDK v1 and SDK v2
Environment
Environments created from v1 can be used in v2. In v2, environments have new features
like creation from a local Docker context.
Managing secrets
The management of Key Vault secrets differs significantly in v2 compared to v1. The v1
set_secret and get_secret SDK methods are not available in v2; instead, use the Key
Vault client libraries to access secrets directly.
For details about Key Vault, see Use authentication credential secrets in Azure Machine
Learning training jobs.
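A minimal sketch of the v2 pattern follows, assuming the azure-identity and azure-keyvault-secrets packages are installed; the vault and secret names are placeholders.

```python
# Build the vault URI from the Key Vault name (placeholder name).
def vault_uri(vault_name: str) -> str:
    return f"https://{vault_name}.vault.azure.net"

# With the Key Vault client libraries installed, reading a secret directly
# (instead of the removed v1 get_secret) would look like:
#   from azure.identity import DefaultAzureCredential
#   from azure.keyvault.secrets import SecretClient
#   client = SecretClient(vault_url=vault_uri("my-vault"),
#                         credential=DefaultAzureCredential())
#   value = client.get_secret("my-secret").value
print(vault_uri("my-vault"))
```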
Azure setup
Azure recommends Azure Resource Manager templates (often via Bicep for ease of use)
to create resources. The same approach works well for creating Azure Machine Learning
resources.
If your team is only using Azure Machine Learning, you may consider provisioning the
workspace and any other resources via YAML files and CLI instead.
Prototyping models
We recommend v2 for prototyping models. You may consider using the CLI for an
interactive use of Azure Machine Learning, while your model training code is Python or
any other programming language. Alternatively, you may adopt a full-stack approach
with Python solely using the Azure Machine Learning SDK or a mixed approach with the
Azure Machine Learning Python SDK and YAML files.
With v2, you should separate your machine learning code from the control plane code.
This separation allows for easier iteration and allows for easier transition between local
and cloud. We also recommend using MLflow for tracking and model logging. See the
MLflow concept article for details.
Kubernetes deployments are supported in v2 through AKS or Azure Arc, enabling Azure
cloud and on-premises deployments managed by your organization.
Machine learning operations (MLOps)
An MLOps workflow typically involves CI/CD through an external tool. A CLI is typically
used in CI/CD, though you can alternatively invoke Python or use REST directly.
You can obtain a YAML representation of any entity with the CLI via az ml <entity> show
--output yaml . Note that this output will have system-generated properties, which can
be ignored or deleted.
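For example, a small sketch of cleaning such output before committing it to source control; the property names below are assumptions for illustration, not an exhaustive list, and the entity is modeled as an already-parsed dict:

```python
# System-generated property names to drop (illustrative, not exhaustive).
SYSTEM_KEYS = {"id", "creation_context", "provisioning_state"}

def strip_system_properties(entity: dict) -> dict:
    """Return a copy of the entity dict without system-generated keys."""
    return {k: v for k, v in entity.items() if k not in SYSTEM_KEYS}

# A parsed `az ml compute show --output yaml` result might look like this:
shown = {
    "name": "cpu-cluster",
    "type": "amlcompute",
    "id": "/subscriptions/.../computes/cpu-cluster",  # system-generated
    "provisioning_state": "Succeeded",                # system-generated
}
clean = strip_system_properties(shown)
print(clean)  # → {'name': 'cpu-cluster', 'type': 'amlcompute'}
```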
Next steps
Get started with the CLI (v2)
Get started with the Python SDK (v2)
Upgrade workspace management to
SDK v2
Article • 04/04/2023
Create a workspace
SDK v1
Python
from azureml.core import Workspace

ws = Workspace.create(
    name='my_workspace',
    location='eastus',
    subscription_id='<SUBSCRIPTION_ID>',
    resource_group='<RESOURCE_GROUP>',
)
SDK v2
Python
ml_client.workspaces.begin_create(ws)
Python
ws = Workspace.create(
    name='my_workspace',
    location='eastus',
    subscription_id='<SUBSCRIPTION_ID>',
    resource_group='<RESOURCE_GROUP>',
)

ple = PrivateEndPointConfig(
    name='my_private_link_endpoint',
    vnet_name='<VNET_NAME>',
    vnet_subnet_name='<VNET_SUBNET_NAME>',
    vnet_subscription_id='<SUBSCRIPTION_ID>',
    vnet_resource_group='<RESOURCE_GROUP>',
)

ws.add_private_endpoint(ple, private_endpoint_auto_approval=True)
SDK v2
Python
ws = Workspace(
    name="private_link_endpoint_workspace",
    location="eastus",
    display_name="Private Link endpoint workspace",
    description="When using private link, you must set the image_build_compute property to a cluster name to use for Docker image environment building. You can also specify whether the workspace should be accessible over the internet.",
    image_build_compute="cpu-compute",
    public_network_access="Disabled",
    tags=dict(purpose="demonstration"),
)

ml_client.workspaces.begin_create(ws)
SDK v1
Python
ws = Workspace.from_config()
ws.get_details()
SDK v2
Python
ws = MLClient.from_config(
DefaultAzureCredential()
)
Method/API in SDK v1 (use links to ref docs) Method/API in SDK v2 (use links to ref docs)
Related documents
For more information, see:
What is a workspace?
Upgrade compute management to v2
Article • 07/06/2023
Python
import datetime
import time

from azureml.core.compute import ComputeInstance

ci_basic_name = "basic-ci" + datetime.datetime.now().strftime("%Y%m%d%H%M")
compute_config = ComputeInstance.provisioning_configuration(
    vm_size='STANDARD_DS3_V2'
)
instance = ComputeInstance.create(ws, ci_basic_name, compute_config)
instance.wait_for_completion(show_output=True)
SDK v2
Python
ci_basic_name = "basic-ci" + datetime.datetime.now().strftime("%Y%m%d%H%M")
ci_basic = ComputeInstance(
    name=ci_basic_name,
    size="STANDARD_DS3_v2",
    idle_time_before_shutdown_minutes="30",
)
ml_client.begin_create_or_update(ci_basic)
Create compute cluster
SDK v1
Python
SDK v2
Python
Method/API in SDK v1 (use links to ref docs) Method/API in SDK v2 (use links to ref docs)
Next steps
Create an Azure Machine Learning compute instance
Create an Azure Machine Learning compute cluster
Upgrade datastore management to SDK
v2
Article • 04/04/2023
Azure Machine Learning datastores securely keep the connection information for your
data storage on Azure, so you don't have to code it in your scripts. The v2 datastore
concept remains mostly unchanged compared with v1. The difference is that SQL-like data
sources are no longer supported via Azure Machine Learning datastores; they're instead
supported via the Azure Machine Learning data import and export functionality.
Python
blob_datastore = Datastore.register_azure_blob_container(
    workspace=ws,
    datastore_name=blob_datastore_name,
    container_name=container_name,
    account_name=account_name,
    account_key=account_key)
SDK v2
Python
store = AzureBlobDatastore(
    name="blob-protocol-example",
    description="Datastore pointing to a blob container using wasbs protocol.",
    account_name="mytestblobstore",
    container_name="data-container",
    protocol="wasbs",
    credentials={
        "account_key": "XXXxxxXXXxXXXXxxXXXXXxXXXXXxXxxXxXXXxXXXxXXxxxXXxxXXXxXxXXXxxXxxXXXXxxxxxXXxxxxxxXXXxXXX"
    },
)
ml_client.create_or_update(store)
Python
blob_datastore = Datastore.register_azure_blob_container(
    workspace=ws,
    datastore_name=blob_datastore_name,
    container_name=container_name,
    sas_token=sas_token)
SDK v2
Python
ml_client = MLClient.from_config()

store = AzureBlobDatastore(
    name="blob-sas-example",
    description="Datastore pointing to a blob container using SAS token.",
    account_name="mytestblobstore",
    container_name="data-container",
    credentials=SasTokenCredentials(
        sas_token="?xx=XXXX-XX-XX&xx=xxxx&xxx=xxx&xx=xxxxxxxxxxx&xx=XXXX-XX-XXXXX:XX:XXX&xx=XXXX-XX-XXXXX:XX:XXX&xxx=xxxxx&xxx=XXxXXXxxxxxXXXXXXXxXxxxXXXXXxxXXXXXxXXXXxXXXxXXxXX"
    ),
)
ml_client.create_or_update(store)
Python
blob_datastore = Datastore.register_azure_blob_container(
    workspace=ws,
    datastore_name='credentialless_blob',
    container_name='my_container_name',
    account_name='my_account_name')
SDK v2
Python
ml_client = MLClient.from_config()

store = AzureBlobDatastore(
    name="credentialless_blob",
    description="",
    account_name="my_account_name",
    container_name="my_container_name"
)
ml_client.create_or_update(store)
Get datastores from your workspace
SDK v1
Python
SDK v2
Python
ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id=subscription_id,
    resource_group_name=resource_group,
)
azureml_blob_datastore -> azureml_blob_datastore
azureml_data_lake_gen1_datastore -> azureml_data_lake_gen1_datastore
azureml_data_lake_gen2_datastore -> azureml_data_lake_gen2_datastore
Next steps
For more information, see:
Create datastores
Read and write data in a job
V2 datastore operations
Upgrade data management to SDK v2
Article • 04/04/2023
Python
# create a FileDataset from image and label files behind public web urls
web_paths = [
    'https://azureopendatastorage.blob.core.windows.net/mnist/train-images-idx3-ubyte.gz',
    'https://azureopendatastorage.blob.core.windows.net/mnist/train-labels-idx1-ubyte.gz',
]
mnist_ds = Dataset.File.from_files(path=web_paths)
SDK v2
my_path = '<path>'
my_data = Data(
path=my_path,
type=AssetTypes.URI_FOLDER,
description="<description>",
name="<name>",
version='<version>'
)
ml_client.data.create_or_update(my_data)
Python
my_data = Data(
path=my_path,
type=AssetTypes.URI_FILE,
description="<description>",
name="<name>",
version="<version>"
)
ml_client.data.create_or_update(my_data)
Python
weather_ds = Dataset.Tabular.from_delimited_files(path=datastore_paths)
YAML
type: mltable
paths:
- pattern: ./*.txt
transformations:
- read_delimited:
delimiter: ,
encoding: ascii
header: all_files_same_headers
Python
my_path = '<path>'
my_data = Data(
path=my_path,
type=AssetTypes.MLTABLE,
description="<description>",
name="<name>",
version='<version>'
)
ml_client.data.create_or_update(my_data)
Python
src = ScriptRunConfig(source_directory=script_folder,
                      script='train_titanic.py',
                      # pass dataset as an input with friendly name 'titanic'
                      arguments=['--input-data', titanic_ds.as_named_input('titanic')],
                      compute_target=compute_target,
                      environment=myenv)
SDK v2
Python
my_job_inputs = {
"raw_data": Input(type=AssetTypes.URI_FOLDER, path="<path>")
}
my_job_outputs = {
"prep_data": Output(type=AssetTypes.URI_FOLDER, path="<path>")
}
job = command(
    code="./src",  # local path where the code is stored
    command="python process_data.py --raw_data ${{inputs.raw_data}} --prep_data ${{outputs.prep_data}}",
    inputs=my_job_inputs,
    outputs=my_job_outputs,
    environment="<environment_name>:<version>",
    compute="cpu-cluster",
)
Next steps
For more information, see the documentation here:
Create model
SDK v1
Python
import urllib.request
from azureml.core.model import Model

# Register model
model = Model.register(ws, model_name="local-file-example",
                       model_path="mlflow-model/model.pkl")
SDK v2
Python
file_model = Model(
path="mlflow-model/model.pkl",
type=AssetTypes.CUSTOM_MODEL,
name="local-file-example",
description="Model created from local file."
)
ml_client.models.create_or_update(file_model)
Python
model = run.register_model(model_name='run-model-example',
model_path='outputs/model/')
print(model.name, model.id, model.version, sep='\t')
SDK v2
Python
run_model = Model(
path="azureml://jobs/$RUN_ID/outputs/artifacts/paths/model/",
name="run-model-example",
description="Model created from run.",
type=AssetTypes.CUSTOM_MODEL
)
ml_client.models.create_or_update(run_model)
Model.register -> ml_client.models.create_or_update
run.register_model -> ml_client.models.create_or_update
Model.deploy -> ml_client.begin_create_or_update(blue_deployment)
Next steps
For more information, see the documentation here:
Create a model in v1
Deploy a model in v1
Create a model in v2
Deploy a model in v2
Upgrade script run to SDK v2
Article • 04/04/2023
A job has a type. Most jobs are command jobs that run a command, like python main.py.
What runs in a job is agnostic to any programming language, so you can run bash
scripts, invoke Python interpreters, run a bunch of curl commands, or anything else.
To upgrade, you'll need to change your code for submitting jobs to SDK v2. What you
run within the job doesn't need to be upgraded to SDK v2. However, it's recommended
to remove any code specific to Azure Machine Learning from your model training
scripts. This separation allows for an easier transition between local and cloud and is
considered best practice for mature MLOps. In practice, this means removing azureml.*
lines of code. Model logging and tracking code should be replaced with MLflow. For
more details, see how to use MLflow in v2.
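For example, a training script stripped of Azure Machine Learning specifics might take its inputs as plain command-line arguments, so the same file runs locally or in a cloud job; the argument names and defaults here are illustrative.

```python
import argparse

def parse_args(argv=None):
    """Parse training inputs; no azureml.* imports are needed."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--input-data", required=True)
    parser.add_argument("--learning-rate", type=float, default=0.1)
    return parser.parse_args(argv)

# Locally you would call parse_args() with no arguments; a cloud job passes
# the same flags on the command line it runs.
args = parse_args(["--input-data", "./data", "--learning-rate", "0.01"])
print(args.input_data, args.learning_rate)  # → ./data 0.01
```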
Python
run = experiment.submit(config)
aml_url = run.get_portal_url()
print(aml_url)
SDK v2
Python
returned_job = ml_client.jobs.create_or_update(command_job)
returned_job
experiment.submit -> MLClient.jobs.create_or_update
ScriptRunConfig() -> command()
Next steps
For more information, see:
V1 - Experiment
V2 - Command Job
Train models with the Azure Machine Learning Python SDK v2
Upgrade local runs to SDK v2
Article • 04/04/2023
Local runs are similar in both V1 and V2. Use the "local" string when setting the
compute target in either version.
Python
run = experiment.submit(config)
aml_url = run.get_portal_url()
print(aml_url)
SDK v2
Python
returned_job = ml_client.jobs.create_or_update(command_job)
returned_job
experiment.submit -> MLClient.jobs.create_or_update
Next steps
Train models with Azure Machine Learning
Upgrade AutoML to SDK v2
Article • 04/04/2023
In SDK v1, AutoML was primarily configured and run using the AutoMLConfig class. In
SDK v2, this class has been converted to an AutoML job. Although there are some
differences in the configuration options, by and large, naming and functionality have
been preserved in v2.
Python
# Imports
import azureml.core
from azureml.core.experiment import Experiment
from azureml.core.workspace import Workspace
from azureml.core.dataset import Dataset
from azureml.train.automl import AutoMLConfig
from azureml.train.automl.run import AutoMLRun
# Submit run
remote_run = experiment.submit(automl_config, show_output=False)
azureml_url = remote_run.get_portal_url()
print(azureml_url)
SDK v2: The following is a sample AutoML classification task. For the entire code,
check out our examples repo.
Python
# Imports
from azure.ai.ml import automl, Input, MLClient
Method/API in SDK v1 (use links to ref docs) Method/API in SDK v2 (use links to ref docs)
Next steps
For more information, see:
A job has a type. Most jobs are command jobs that run a command, like python main.py.
What runs in a job is agnostic to any programming language, so you can run bash
scripts, invoke Python interpreters, run a bunch of curl commands, or anything else.
A sweep job is another type of job, which defines sweep settings and can be initiated by
calling the sweep method of command.
To upgrade, you'll need to change your code for defining and submitting your
hyperparameter tuning experiment to SDK v2. What you run within the job doesn't need
to be upgraded to SDK v2. However, it's recommended to remove any code specific to
Azure Machine Learning from your model training scripts. This separation allows for an
easier transition between local and cloud and is considered best practice for mature
MLOps. In practice, this means removing azureml.* lines of code. Model logging and
tracking code should be replaced with MLflow. For more information, see how to use
MLflow in v2.
Python
hyperdrive_config = HyperDriveConfig(
    run_config=src,
    hyperparameter_sampling=param_sampling,
    policy=early_termination_policy,
    primary_metric_name='Accuracy',
    primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
    max_total_runs=20,
    max_concurrent_runs=4)

# Specify your experiment details
experiment = Experiment(workspace, experiment_name)

hyperdrive_run = experiment.submit(hyperdrive_config)
SDK v2
Python
command_job = command(
    code="./src",
    command="python main.py --iris-csv ${{inputs.iris_csv}}",
    environment="<environment_name>:<version>",
    inputs={
        "iris_csv": Input(
            type="uri_file",
            path="https://azuremlexamples.blob.core.windows.net/datasets/iris.csv",
        ),
        # define the search space for your hyperparameters
        "learning_rate": Uniform(min_value=0.01, max_value=0.9),
        "boosting": Choice(values=["gbdt", "dart"]),
    },
    compute="cpu-cluster",
)
Python
early_termination_policy = BanditPolicy(evaluation_interval=2, slack_factor=0.1)

hd_config = HyperDriveConfig(run_config=src,
                             hyperparameter_sampling=ps,
                             policy=early_termination_policy,
                             primary_metric_name='validation_acc',
                             primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
                             max_total_runs=4,
                             max_concurrent_runs=4)
metrics_output_name = 'metrics_output'
metrics_data = PipelineData(name='metrics_data',
                            datastore=datastore,
                            pipeline_output_name=metrics_output_name,
                            training_output=TrainingOutput("Metrics"))

model_output_name = 'model_output'
saved_model = PipelineData(name='saved_model',
                           datastore=datastore,
                           pipeline_output_name=model_output_name,
                           training_output=TrainingOutput("Model", model_file="outputs/model/saved_model.pb"))

# Create HyperDriveStep
hd_step_name = 'hd_step01'
hd_step = HyperDriveStep(
    name=hd_step_name,
    hyperdrive_config=hd_config,
    inputs=[data_folder],
    outputs=[metrics_data, saved_model])

rcfg = RunConfiguration(conda_dependencies=conda_dep)

register_model_step = PythonScriptStep(script_name='register_model.py',
                                       name="register_model_step01",
                                       inputs=[saved_model],
                                       compute_target=cpu_cluster,
                                       arguments=["--saved-model", saved_model],
                                       allow_reuse=True,
                                       runconfig=rcfg)

register_model_step.run_after(hd_step)
register_model_step.run_after(hd_step)
SDK v2
Python
train_component_func = load_component(path="./train.yml")
score_component_func = load_component(path="./predict.yml")
# define a pipeline
@pipeline()
def pipeline_with_hyperparameter_sweep():
    """Tune hyperparameters using sample components."""
    train_model = train_component_func(
        data=Input(
            type="uri_file",
            path="wasbs://datasets@azuremlexamples.blob.core.windows.net/iris.csv",
        ),
        c_value=Uniform(min_value=0.5, max_value=0.9),
        kernel=Choice(["rbf", "linear", "poly"]),
        coef0=Uniform(min_value=0.1, max_value=1),
        degree=3,
        gamma="scale",
        shrinking=False,
        probability=False,
        tol=0.001,
        cache_size=1024,
        verbose=False,
        max_iter=-1,
        decision_function_shape="ovr",
        break_ties=False,
        random_state=42,
    )

    sweep_step = train_model.sweep(
        primary_metric="training_f1_score",
        goal="minimize",
        sampling_algorithm="random",
        compute="cpu-cluster",
    )
    sweep_step.set_limits(max_total_trials=20, max_concurrent_trials=10, timeout=7200)

    score_data = score_component_func(
        model=sweep_step.outputs.model_output,
        test_data=sweep_step.outputs.test_data,
    )

pipeline_job = pipeline_with_hyperparameter_sweep()
HyperDriveRunConfig() -> SweepJob()
Next steps
For more information, see:
In SDK v2, the "parallel run step" is consolidated into the job concept as parallel job. A parallel job keeps the same
goal: to accelerate job execution by distributing repeated tasks across powerful multi-node
compute clusters. On top of the parallel run step, the v2 parallel job provides extra benefits:
Flexible interface, which allows the user to define multiple custom inputs and outputs for the parallel
job. You can connect them with other steps to consume or manage their content in your entry script.
Simplified input schema, which replaces Dataset as input by using the v2 data asset concept. You can
easily use your local files or a blob directory URI as inputs to a parallel job.
More powerful features, under development in v2 parallel job only. For example: resume a
failed or canceled parallel job to continue processing the failed or unprocessed mini-batches,
reusing the successful results to save duplicate effort.
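The mini-batch idea can be illustrated in plain Python (no Azure ML dependency; the file names and batch size are invented for the example): the parallel job partitions the input files into mini-batches of at most mini_batch_size items and fans them out to workers.

```python
def to_mini_batches(files, mini_batch_size):
    """Split a list of input files into mini-batches for parallel processing."""
    return [files[i:i + mini_batch_size]
            for i in range(0, len(files), mini_batch_size)]

files = [f"img_{i}.png" for i in range(12)]
batches = to_mini_batches(files, mini_batch_size=5)
print([len(b) for b in batches])  # → [5, 5, 2]
```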
To upgrade your current SDK v1 parallel run step to v2, you'll need to replace
ParallelRunConfig and ParallelRunStep with the v2 parallel job.
Note: the user entry script is compatible between the v1 parallel run step and the v2
parallel job, so you can keep using the same entry_script.py when you upgrade your
parallel run job.
This article gives a comparison of scenario(s) in SDK v1 and SDK v2. In the following examples, we'll build a
parallel job to predict input data in a pipeline job. You'll see how to build a parallel job, and how to use it
in a pipeline job, for both SDK v1 and SDK v2.
Prerequisites
Prepare your SDK v2 environment: Install the Azure Machine Learning SDK v2 for Python
Understand the basis of SDK v2 pipeline: How to create Azure Machine Learning pipeline with Python
SDK v2
Python
parallel_run_config = ParallelRunConfig(
    source_directory=scripts_folder,
    entry_script=script_file,
    mini_batch_size=PipelineParameter(name="batch_size_param", default_value="5"),
    error_threshold=10,
    output_action="append_row",
    append_row_file_name="mnist_outputs.txt",
    environment=batch_env,
    compute_target=compute_target,
    process_count_per_node=PipelineParameter(name="process_count_param", default_value=2),
    node_count=2
)
SDK v2
Python
@pipeline()
def parallel_in_pipeline(pipeline_job_data_path, pipeline_score_model):
    prepare_file_tabular_data = prepare_data(input_data=pipeline_job_data_path)
    # output of file & tabular data should be type MLTable
    prepare_file_tabular_data.outputs.file_output_data.type = AssetTypes.MLTABLE
    prepare_file_tabular_data.outputs.tabular_output_data.type = AssetTypes.MLTABLE

    batch_inference_with_file_data = file_batch_inference(
        job_data_path=prepare_file_tabular_data.outputs.file_output_data
    )
    # use eval_mount mode to handle file data
    batch_inference_with_file_data.inputs.job_data_path.mode = (
        InputOutputModes.EVAL_MOUNT
    )
    batch_inference_with_file_data.outputs.job_output_path.type = AssetTypes.MLTABLE

    batch_inference_with_tabular_data = tabular_batch_inference(
        job_data_path=prepare_file_tabular_data.outputs.tabular_output_data,
        score_model=pipeline_score_model,
    )
    # use direct mode to handle tabular data
    batch_inference_with_tabular_data.inputs.job_data_path.mode = (
        InputOutputModes.DIRECT
    )

    return {
        "pipeline_job_out_file": batch_inference_with_file_data.outputs.job_output_path,
        "pipeline_job_out_tabular": batch_inference_with_tabular_data.outputs.job_output_path,
    }

pipeline_job_data_path = Input(
    path="./dataset/", type=AssetTypes.MLTABLE, mode=InputOutputModes.RO_MOUNT
)
pipeline_score_model = Input(
    path="./model/", type=AssetTypes.URI_FOLDER, mode=InputOutputModes.DOWNLOAD
)
# create a pipeline
pipeline_job = parallel_in_pipeline(
    pipeline_job_data_path=pipeline_job_data_path,
    pipeline_score_model=pipeline_score_model,
)
pipeline_job.outputs.pipeline_job_out_tabular.type = AssetTypes.URI_FILE
azureml.pipeline.steps.parallelrunconfig -> azure.ai.ml.parallel
azureml.pipeline.steps.parallelrunstep -> azure.ai.ml.parallel
OutputDatasetConfig -> Output
Next steps
For more information, see the documentation here:
A job has a type. Most jobs are command jobs that run a command, like python main.py.
What runs in a job is agnostic to any programming language, so you can run bash
scripts, invoke Python interpreters, run a bunch of curl commands, or anything else.
A pipeline is another type of job, which defines child jobs that may have input/output
relationships, forming a directed acyclic graph (DAG).
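As a plain-Python illustration (no Azure ML dependency; the job names mirror the train, score, and evaluate steps used in the examples below), the input/output relationships between child jobs determine a dependency-respecting execution order:

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each child job maps to the set of jobs whose outputs it consumes.
dependencies = {
    "train": set(),
    "score": {"train"},   # score consumes train's model output
    "eval": {"score"},    # eval consumes score's scoring result
}
order = list(TopologicalSorter(dependencies).static_order())
print(order)  # → ['train', 'score', 'eval']
```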
To upgrade, you'll need to change your code for defining and submitting the pipelines
to SDK v2. What you run within the child job doesn't need to be upgraded to SDK v2.
However, it's recommended to remove any code specific to Azure Machine Learning
from your model training scripts. This separation allows for an easier transition between
local and cloud and is considered best practice for mature MLOps. In practice, this
means removing azureml.* lines of code. Model logging and tracking code should be
replaced with MLflow. For more information, see how to use MLflow in v2.
This article gives a comparison of scenario(s) in SDK v1 and SDK v2. In the following
examples, we'll build three steps (train, score and evaluate) into a dummy pipeline job.
This demonstrates how to build pipeline jobs using SDK v1 and SDK v2, and how to
consume data and transfer data between steps.
Run a pipeline
SDK v1
Python
# load workspace
workspace = Workspace.from_config()
print(
"Workspace name: " + workspace.name,
"Azure region: " + workspace.location,
"Subscription id: " + workspace.subscription_id,
"Resource group: " + workspace.resource_group,
sep="\n",
)
# create an ML experiment
experiment = Experiment(workspace=workspace,
name="train_score_eval_pipeline")
# create a directory
script_folder = "./src"
# create compute
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException
aml_compute.wait_for_completion(show_output=True)
score_output = OutputFileDatasetConfig('score_output')
score_step = PythonScriptStep(
    name="score step",
    script_name="score.py",
    arguments=['--model_input', model_output.as_input('model_input'),
               '--test_data', input_ds.as_named_input('test_data').as_mount(),
               '--score_output', score_output],
    source_directory=script_folder,
    compute_target=aml_compute,
    allow_reuse=True,
)

eval_output = OutputFileDatasetConfig('eval_output')
eval_step = PythonScriptStep(
    name="eval step",
    script_name="eval.py",
    arguments=['--scoring_result', score_output.as_input('scoring_result'),
               '--eval_output', eval_output],
    source_directory=script_folder,
    compute_target=aml_compute,
    allow_reuse=True,
)
# built pipeline
from azureml.pipeline.core import Pipeline
Python
cluster_name = "cpu-cluster"
# define a pipeline with component
@pipeline(default_compute=cluster_name)
def pipeline_with_python_function_components(input_data, test_data, learning_rate):
    """E2E dummy train-score-eval pipeline with components defined via Python function components"""
    score_with_sample_data = score_data(
        model_input=train_with_sample_data.outputs.model_output,
        test_data=test_data,
    )
    eval_with_sample_data = eval_model(
        scoring_result=score_with_sample_data.outputs.score_output
    )
    # Return: pipeline outputs
    return {
        "eval_output": eval_with_sample_data.outputs.eval_output,
        "model_output": train_with_sample_data.outputs.model_output,
    }
pipeline_job = pipeline_with_python_function_components(
input_data=Input(
path="wasbs://demo@dprepdata.blob.core.windows.net/Titanic.csv",
type="uri_file"
),
test_data=Input(
path="wasbs://demo@dprepdata.blob.core.windows.net/Titanic.csv",
type="uri_file"
),
learning_rate=0.1,
)
azureml.pipeline.core.Pipeline -> azure.ai.ml.dsl.pipeline
OutputDatasetConfig -> Output
Published pipelines
Once you have a pipeline up and running, you can publish it so that it runs with
different inputs. This was known as published pipelines. Batch endpoints propose a
similar, more powerful way to handle multiple assets running under a durable API,
which is why the published pipelines functionality has been moved to pipeline
component deployments in batch endpoints (preview).

Batch endpoints decouple the interface (endpoint) from the actual implementation
(deployment) and let the user decide which deployment serves the default
implementation of the endpoint. Pipeline component deployments in batch endpoints
(preview) let users deploy pipeline components instead of pipelines, which makes
better use of reusable assets for organizations looking to streamline their MLOps
practice.
See Upgrade pipeline endpoints to SDK v2 for specific guidance about how to migrate
to batch endpoints.
Related documents
For more information, see the documentation here:
steps in SDK v1
Create and run machine learning pipelines using components with the Azure
Machine Learning SDK v2
Build a simple ML pipeline for image classification (SDK v1)
OutputDatasetConfig
mldesigner
Upgrade deployment endpoints to SDK
v2
Article • 07/07/2023
With SDK/CLI v1, you can deploy models on ACI or AKS as web services. Your existing
v1 model deployments and web services will continue to function, but using SDK/CLI
v1 to deploy models on ACI or AKS as web services is now considered legacy. For new
model deployments, we recommend upgrading to v2.
v2 offers several deployment options, such as managed online endpoints and
Kubernetes online endpoints (including Azure Kubernetes Service and Arc-enabled
Kubernetes), while v1 offers Azure Container Instances (ACI) and Azure Kubernetes
Service (AKS) web services. In this article, we focus on comparing deployment to ACI
web services (v1) and managed online endpoints (v2).
Python
# configure an environment
from azureml.core import Environment
env = Environment(name='myenv')
python_packages = ['nltk', 'numpy', 'onnxruntime']
for package in python_packages:
env.python.conda_dependencies.add_pip_package(package)
# configure an inference configuration with a scoring script
from azureml.core.model import InferenceConfig
inference_config = InferenceConfig(
    environment=env,
    source_directory="./source_dir",
    entry_script="./score.py",
)
Python
For more information on registering models, see Register a model from a local file.
SDK v2
Python
# configure an environment
from azure.ai.ml.entities import Environment
env = Environment(
    conda_file="../model-1/environment/conda.yml",
    image="mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04:20210727.v1",
)
Python
import datetime
from azure.ai.ml.entities import ManagedOnlineEndpoint
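The snippet above stops at the imports. As a hedged sketch of the usual next step
(the endpoint name and auth mode here are assumptions), define and create the
endpoint:

```python
# Sketch only: assumes `ml_client` is an authenticated azure.ai.ml.MLClient.
# Build a unique, timestamped endpoint name and create the endpoint.
online_endpoint_name = "endpoint-" + datetime.datetime.now().strftime("%m%d%H%M%f")
endpoint = ManagedOnlineEndpoint(
    name=online_endpoint_name,
    auth_mode="key",
)
ml_client.begin_create_or_update(endpoint)
```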
Python
# define a deployment
blue_deployment = ManagedOnlineDeployment(
    name="blue",
    endpoint_name=online_endpoint_name,
    model=model,
    environment=env,
    code_configuration=code_config,
    instance_type="Standard_F2s_v2",
    instance_count=1,
)
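Creating the deployment and routing traffic to it isn't shown above; a hedged
sketch, assuming the ml_client and endpoint objects from the previous steps:

```python
# Sketch only: assumes `ml_client` and `endpoint` exist as in the earlier steps.
# Create the deployment, then route all endpoint traffic to it.
ml_client.online_deployments.begin_create_or_update(blue_deployment).result()
endpoint.traffic = {"blue": 100}
ml_client.begin_create_or_update(endpoint).result()
```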
For more information on concepts for endpoints and deployments, see What are online
endpoints?
Submit a request
SDK v1
Python
import json

data = {
    "query": "What color is the fox",
    "context": "The quick brown fox jumped over the lazy dog.",
}
data = json.dumps(data)
predictions = service.run(input_data=data)
print(predictions)
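In v2, the request payload is read from a file rather than passed inline. As an
illustration, the v1 payload above can be written to a JSON file for use with
request_file (the exact schema your endpoint expects depends on your scoring
script):

```python
import json

# Write the same payload used in the v1 example to a file so it can be
# passed to invoke(..., request_file=...) in v2. The schema an endpoint
# expects depends on its scoring script; this one mirrors the v1 example.
payload = {
    "query": "What color is the fox",
    "context": "The quick brown fox jumped over the lazy dog.",
}
with open("sample-request.json", "w") as f:
    json.dump(payload, f)

with open("sample-request.json") as f:
    loaded = json.load(f)
```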
SDK v2
Python
# test the endpoint (the request will route to blue deployment as set above)
ml_client.online_endpoints.invoke(
    endpoint_name=online_endpoint_name,
    request_file="../model-1/sample-request.json",
)
Delete resources
SDK v1
Python
service.delete()
SDK v2
Python
ml_client.online_endpoints.begin_delete(name=online_endpoint_name)
Functionality in SDK v1 → SDK v2:
Webservice.run → ml_client.online_endpoints.invoke
Webservice.delete → ml_client.online_endpoints.begin_delete
Related documents
For more information, see
v2 docs:
v1 docs:
Once you have a pipeline up and running, you can publish it so that it runs with
different inputs. This capability was known as published pipelines.
Batch endpoints decouple the interface (endpoint) from the actual implementation
(deployment) and let the user decide which deployment serves the default
implementation of the endpoint. Pipeline component deployments in batch endpoints
let users deploy pipeline components instead of pipelines, which makes better use of
reusable assets for organizations looking to streamline their MLOps practice.
To learn how to create your first pipeline component deployment, see How to deploy
pipelines in Batch Endpoints.
Publish a pipeline
Compare how publishing a pipeline has changed from v1 to v2:
SDK v1
Python
endpoint_name = "PipelineEndpointTest"
pipeline_endpoint = PipelineEndpoint.publish(
    workspace=ws,
    name=endpoint_name,
    pipeline=pipeline,
    description="A hello world endpoint for component deployments",
)
SDK v1
Python
pipeline_endpoint = PipelineEndpoint.get(workspace=ws, name="PipelineEndpointTest")
run_id = pipeline_endpoint.submit("PipelineEndpointExperiment")
SDK v1
Python
run_id = pipeline_endpoint.submit(endpoint_name, pipeline_version="0")
SDK v1
Python
all_pipelines = PublishedPipeline.get_all(ws)
SDK v1
Python
pipeline_endpoint = PipelineEndpoint.get(workspace=ws, name=endpoint_name)
rest_endpoint = pipeline_endpoint.endpoint
response = requests.post(
    rest_endpoint,
    headers=aad_token,
    json={
        "ExperimentName": "PipelineEndpointExperiment",
        "RunSource": "API",
        "ParameterAssignments": {"argument1": "united", "argument2": 45},
    },
)
Next steps
How to deploy pipelines in Batch Endpoints
How to operationalize a training routine in batch endpoints
How to operationalize a scoring routine in batch endpoints
Upgrade steps for Azure Container
Instances web services to managed
online endpoints
Article • 04/04/2023
You can deploy directly to the new compute target with your previous models and
environments, or use the scripts we provide to export the current services and then
deploy to the new compute without affecting your existing services. If you regularly
create and delete Azure Container Instances (ACI) web services, we strongly
recommend deploying directly rather than using the scripts.
) Important
The scoring URL changes after the upgrade. For example, the scoring URL for an
ACI web service looks like http://aaaaaa-bbbbb-1111.westus.azurecontainer.io/score ,
while the scoring URI for a managed online endpoint looks like
https://endpoint-name.westus.inference.ml.azure.com/score .
Auth mode
Anonymous authentication (no auth) isn't supported for managed online endpoints. If
you use the upgrade scripts, they convert the service to key auth and reuse the
original keys. Token-based auth is also supported.
TLS
For ACI services secured with HTTPS, you no longer need to provide your own
certificates; all managed online endpoints are protected by TLS.
Custom DNS name isn't supported.
Resource requirements
ContainerResourceRequirements isn't supported; instead, choose the proper SKU for
your inferencing workload. The upgrade tool maps the CPU/memory requirements to a
corresponding SKU. If you choose to redeploy manually through CLI/SDK v2, we also
suggest the corresponding SKU for your new deployment.
"(" means greater than and "]" means less than or equal to. For example, “(0, 1]” means
“greater than 0 and less than or equal to 1”.
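As a small illustrative helper (not part of the upgrade tool), the interval
notation can be checked programmatically:

```python
def in_interval(value, low, high, low_inclusive=False, high_inclusive=True):
    """Check a value against an interval written like "(0, 1]":
    "(" excludes the lower bound, "]" includes the upper bound."""
    above_low = value >= low if low_inclusive else value > low
    below_high = value <= high if high_inclusive else value < high
    return above_low and below_high

# "(0, 1]" means greater than 0 and less than or equal to 1:
print(in_interval(1.0, 0, 1))  # True: 1 is included by "]"
print(in_interval(0.0, 0, 1))  # False: 0 is excluded by "("
```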
) Important
When upgrading from ACI, there will be some changes in how you'll be charged.
See our blog for a rough cost comparison to help you choose the right VM SKUs
for your workload.
Network isolation
For private workspace and VNet scenarios, see Use network isolation with managed
online endpoints.
) Important
Because there are many settings for your workspace and VNet, we strongly suggest
that you redeploy through the Azure CLI extension v2 for machine learning instead of
the script tool.
Not supported
EncryptionProperties for ACI container isn't supported.
ACI web services deployed through deploy_from_model and deploy_from_image
aren't supported by the upgrade tool. Redeploy manually through CLI/SDK V2.
Upgrade steps
7 Note
The upgrade script is a sample script and is provided without a service level
agreement (SLA).
Tip
The new endpoint created by the scripts will be created under the same workspace.
1. Use a bash shell to run the scripts. For example, a terminal session on Linux or the
Windows Subsystem for Linux (WSL).
5. Edit the following values in the migrate-service.sh file. Replace the values with
ones that apply to your configuration.
<NEW_ENDPOINT_NAME> - The name of the new endpoint that will be created.
We recommend that the new endpoint name is different from the previous
service name. Otherwise, the original service won't be displayed if you check
your endpoints on the portal.
<NEW_DEPLOYMENT_NAME> - The name of the deployment to the new endpoint.
6. Run the bash script. For example, ./migrate-service.sh . It will take about 5-10
minutes to finish the new deployment.
Tip
If you receive an error that the script is not executable, or an editor opens
when you try to run the script, use the following command to mark the script
as executable:
Bash
chmod +x migrate-service.sh
7. After the deployment completes successfully, you can verify the endpoint with
the az ml online-endpoint invoke command.
Contact us
If you have any questions or feedback on the upgrade script, contact us at
moeonboard@microsoft.com.
Next steps
What are Azure Machine Learning endpoints?
Deploy and score a model with an online endpoint
Anaconda licensing for Microsoft
products and services
Article • 04/24/2023
Microsoft is licensed to include packages from Anaconda and make these available to
our customers. Pre-installed packages that are embedded in our products and services
that you license from us may be used under the terms of the applicable Microsoft
license agreement or terms of service.
Learn about the machine learning products and technologies from Microsoft. Compare
options to help you choose how to most effectively build, deploy, and manage your
machine learning solutions.
Azure Machine Learning - Managed platform for machine learning - Use a pretrained
model. Or, train, deploy, and manage models on Azure using Python and CLI.
Azure SQL Managed Instance Machine Learning Services - In-database machine
learning for SQL - Train and deploy models inside Azure SQL Managed Instance.
Machine learning in Azure Synapse Analytics - Analytics service with machine
learning - Train and deploy models inside Azure Synapse Analytics.
Machine learning and AI with ONNX in Azure SQL Edge - Machine learning in SQL on
IoT Edge - Train and deploy models inside Azure SQL Edge.
Azure Databricks - Apache Spark-based analytics platform - Build and deploy models
and data workflows using integrations with open-source machine learning libraries
and the MLFlow platform.
SQL Server Machine Learning Services - In-database machine learning for SQL - Train
and deploy models inside SQL Server.
Machine Learning Services on SQL Server Big Data Clusters - Machine learning in Big
Data Clusters - Train and deploy models on SQL Server Big Data Clusters.
Azure Data Science Virtual Machine - Virtual machine with pre-installed data science
tools - Develop machine learning solutions in a pre-configured environment.
Machine Learning extension for Azure Data Studio - Open-source and cross-platform
machine learning extension for Azure Data Studio - Manage packages, import machine
learning models, make predictions, and create notebooks to run experiments for your
SQL databases.
Item - Description
Supported languages - Python, R
Key benefits - Code first (SDK) and studio & drag-and-drop designer web interface
authoring options.
Vision: Object detection, face recognition, OCR, etc. See Computer Vision, Face,
Form Recognizer.
Speech: Speech-to-text, text-to-speech, speaker recognition, etc. See Speech
Service.
Language: Translation, Sentiment analysis, key phrase extraction, language
understanding, etc. See Translator, Text Analytics, Language Understanding, QnA
Maker
Decision: Anomaly detection, content moderation, reinforcement learning. See
Anomaly Detector, Content Moderator, Personalizer.
Use Cognitive Services to develop apps across devices and platforms. The APIs keep
improving, and are easy to set up.
Item - Description
Supported languages - Various options depending on the service. Standard ones are
C#, Java, JavaScript, and Python.
Key benefits - Build intelligent applications using pre-trained models available
through REST API and SDK. Variety of models for natural communication methods with
vision, speech, language, and decision. No machine learning or data science
expertise required.
Use SQL machine learning when you need built-in AI and predictive analytics on
relational data in SQL.
Item - Description
Considerations - Assumes a SQL database as the data tier for your application.
Use the Data Science VM when you need to run or host your jobs on a single node. Or if
you need to remotely scale up your processing on a single machine.
Item - Description
Key benefits - Reduced time to install, manage, and troubleshoot data science tools
and frameworks. The latest versions of all commonly used tools and frameworks are
included. Virtual machine options include highly scalable images with GPU
capabilities for intensive data modeling.
Considerations - Running a virtual machine incurs Azure charges, so you must be
careful to have it running only when required.
Azure Databricks
Azure Databricks is an Apache Spark-based analytics platform optimized for the
Microsoft Azure cloud services platform. Databricks is integrated with Azure to provide
one-click setup, streamlined workflows, and an interactive workspace that enables
collaboration between data scientists, data engineers, and business analysts. Use
Python, R, Scala, and SQL code in web-based notebooks to query, visualize, and model
data.
Use Databricks when you want to collaborate on building machine learning solutions on
Apache Spark.
ML.NET
ML.NET is an open-source, and cross-platform machine learning framework. With
ML.NET, you can build custom machine learning solutions and integrate them into your
.NET applications. ML.NET offers varying levels of interoperability with popular
frameworks like TensorFlow and ONNX for training and scoring machine learning and
deep learning models. For resource-intensive tasks like training image classification
models, you can take advantage of Azure to train your models in the cloud.
Use ML.NET when you want to integrate machine learning solutions into your .NET
applications. Choose between the API for a code-first experience and Model Builder or
the CLI for a low-code experience.
Item - Description
Languages supported - C#, F#
Windows ML
Windows ML inference engine allows you to use trained machine learning models in
your applications, evaluating trained models locally on Windows 10 devices.
Use Windows ML when you want to use trained machine learning models within your
Windows applications.
MMLSpark
Microsoft ML for Apache Spark (MMLSpark) is an open-source library that expands
the distributed computing framework Apache Spark . MMLSpark adds many deep
learning and data science tools to the Spark ecosystem, including seamless integration
of Spark Machine Learning pipelines with Microsoft Cognitive Toolkit (CNTK),
LightGBM , LIME (Model Interpretability) , and OpenCV . You can use these tools to
create powerful predictive models on any Spark cluster, such as Azure Databricks or
Cosmic Spark.
MMLSpark also brings new networking capabilities to the Spark ecosystem. With the
HTTP on Spark project, users can embed any web service into their SparkML models.
Additionally, MMLSpark provides easy-to-use tools for orchestrating Azure Cognitive
Services at scale. For production-grade deployment, the Spark Serving project enables
high throughput, submillisecond latency web services, backed by your Spark cluster.
Contributors
This article is maintained by Microsoft. It was originally written by the following
contributors.
Principal author:
Next steps
To learn about all the Artificial Intelligence (AI) development products available
from Microsoft, see Microsoft AI platform .
For training in developing AI and Machine Learning solutions with Microsoft, see
Microsoft Learn training.
Related resources
Azure Machine Learning decision guide for optimal tool selection
Choose a Microsoft cognitive services technology
Artificial intelligence (AI) architecture design
How Azure Machine Learning works: resources and assets
Machine learning at scale
Azure Machine Learning Python SDK
release notes
Article • 06/26/2023
In this article, learn about Azure Machine Learning Python SDK releases. For the full SDK
reference content, visit the Azure Machine Learning's main SDK for Python reference
page.
RSS feed: Get notified when this page is updated by copying and pasting the
following URL into your feed reader:
https://learn.microsoft.com/api/search/rss?search=%22Azure+machine+learning+release+notes%22&locale=en-us
2023-06-26
2023-05-20
2023-04-10
2023-03-01
2023-02-13
2022-12-05
2022-10-25
2022-09-26
2022-08-29
2022-08-01
Azure Machine Learning SDK for Python v1.44.0
azureml-automl-dnn-nlp
Weighted accuracy and Matthews correlation coefficient (MCC) will no longer
be displayed in the calculated metrics for NLP multilabel classification.
azureml-automl-dnn-vision
Raise user error when invalid annotation format is provided
azureml-cli-common
Updated the v1 CLI description
azureml-contrib-automl-dnn-forecasting
Fixed the "Failed to calculate TCN metrics" error raised by TCNForecaster
when different time series in the validation dataset have different lengths.
Added auto timeseries ID detection for DNN forecasting models like
TCNForecaster.
Fixed a bug with the Forecast TCN model where validation data could be
corrupted in some circumstances when the user provided the validation set.
azureml-core
Allow setting a timeout_seconds parameter when downloading artifacts from a
Run
Warning message added: Azure Machine Learning CLI v1 is being retired on
2025-09-. We recommend that users adopt CLI v2.
Fix submission to non-AmlComputes throwing exceptions.
Added docker context support for environments
azureml-interpret
Increase numpy version for AutoML packages
azureml-pipeline-core
Fix regenerate_outputs=True not taking effect when submitting a pipeline.
azureml-train-automl-runtime
Increase numpy version for AutoML packages
Enable code generation for vision and nlp
Original columns on which grains are created are added as part of
predictions.csv
2022-07-21
2022-06-27
azureml-automl-dnn-nlp
Remove duplicate labels column from multi-label predictions
azureml-contrib-automl-pipeline-steps
Many Models now provides the capability to generate prediction output in CSV
format as well.
Many Models predictions now include column names in the output file when
using the CSV file format.
azureml-core
ADAL authentication is now deprecated and all authentication classes now use
MSAL authentication. Install azure-cli>=2.30.0 to utilize MSAL based
authentication when using AzureCliAuthentication class.
Added a fix to force environment registration when calling
Environment.build(workspace) . The fix resolves confusion where the latest
built environment was used instead of the requested one when an environment
is cloned or inherited from another instance.
SDK warning message to restart Compute Instance before May 31, 2022, if it
was created before September 19, 2021
azureml-interpret
Updated azureml-interpret package to interpret-community 0.26.*
In the azureml-interpret package, add ability to get raw and engineered feature
names from scoring explainer. Also, add example to the scoring notebook to
get feature names from the scoring explainer and add documentation about
raw and engineered feature names.
azureml-mlflow
azureml-core has been removed as a dependency of azureml-mlflow. MLflow
projects and local deployments require azureml-core, which must be installed
separately.
Adding support for creating endpoints and deploying to them via the MLflow
client plugin.
azureml-responsibleai
Updated azureml-responsibleai package and environment images to latest
responsibleai and raiwidgets 0.19.0 release
azureml-train-automl-client
OutputDatasetConfig is now supported as the input of the MM/HTS pipeline
builder. The mappings are: 1) OutputTabularDatasetConfig -> treated as an
unpartitioned tabular dataset. 2) OutputFileDatasetConfig -> treated as a
file dataset.
azureml-train-automl-runtime
Added data validation that requires the number of minority class samples in the
dataset to be at least as large as the number of CV folds requested.
Automatic cross-validation parameter configuration is now available for AutoML
forecasting tasks. Users can now specify "auto" for n_cross_validations and
cv_step_size or leave them empty, and AutoML provides those configurations
based on your data. However, this feature currently isn't supported when TCN
is enabled.
Forecasting parameters in Many Models and Hierarchical Time Series can now
be passed via an object rather than as individual parameters in a dictionary.
Enabled forecasting model endpoints with quantiles support to be consumed in
Power BI.
Updated AutoML scipy dependency upper bound to 1.5.3 from 1.5.2
2022-04-25
azureml-automl-dnn-nlp
Turning on long range text feature by default.
azureml-automl-dnn-vision
Changing the ObjectAnnotation Class type from object to "data object".
azureml-core
This release updates the Keyvault class used by customers to enable them to
provide the keyvault content type when creating a secret using the SDK. This
release also updates the SDK to include a new function that enables customers
to retrieve the value of the content type from a specific secret.
azureml-interpret
updated azureml-interpret package to interpret-community 0.25.0
azureml-pipeline-core
Run details are no longer printed when pipeline_run.wait_for_completion is
called with show_output=False .
azureml-train-automl-runtime
Fixes a bug that would cause code generation to fail when the azureml-contrib-
automl-dnn-forecasting package is present in the training environment.
Fix error when using a test dataset without a label column with AutoML Model
Testing.
2022-03-28
2022-02-28
2022-01-24
2021-12-13
Azure Machine Learning SDK for Python v1.37.0
Breaking changes
azureml-core
Starting in version 1.37.0, Azure Machine Learning SDK uses MSAL as the
underlying authentication library. MSAL uses Azure Active Directory (Azure
AD) v2.0 authentication flow to provide more functionality and increases
security for token cache. For more information, see Overview of the Microsoft
Authentication Library (MSAL).
Update AML SDK dependencies to the latest version of Azure Resource
Management Client Library for Python (azure-mgmt-resource>=15.0.0,
<20.0.0) & adopt track2 SDK.
Starting in version 1.37.0, azure-ml-cli extension should be compatible with
the latest version of Azure CLI >=2.30.0.
When using Azure CLI in a pipeline, like as Azure DevOps, ensure all
tasks/stages are using versions of Azure CLI above v2.30.0 for MSAL-based
Azure CLI. Azure CLI 2.30.0 is not backward compatible with prior versions
and throws an error when using incompatible versions. To use Azure CLI
credentials with Azure Machine Learning SDK, Azure CLI should be installed
as pip package.
2021-11-08
2021-10-11
requested to only include predictions, then the output format looks like this
(forecasting is the same as regression):
Classification => [predicted values] [probabilities]
Regression => [predicted values]
else (default):
Classification => [original test data labels] [predicted values] [probabilities] [features]
Regression => [original test data labels] [predicted values] [features]
The [predicted values] column name = [label column name] + "_predicted" .
2021-09-07
azureml-sdk
Replaced the dependency on the deprecated package (azureml-train) inside
azureml-sdk.
Add azureml-responsibleai to azureml-sdk extras
azureml-train-automl-client
Expose the test_data and test_size parameters in AutoMLConfig . These
parameters can be used to automatically start a test run after the model
training phase has been completed. The test run computes predictions using
the best model and generates metrics given these predictions.
2021-08-24
2021-08-02
2021-07-06
2021-06-21
2021-06-07
Azure Machine Learning SDK for Python v1.30.0
Bug fixes and improvements
azureml-core
Pin dependency ruamel-yaml to < 0.17.5 as a breaking change was released
in 0.17.5.
aml_k8s_config property is being replaced with namespace ,
2021-05-25
2021-05-10
2021-04-19
2021-04-05
2021-03-31
2021-03-08
2021-02-28
2021-02-16
2021-02-09
2021-01-31
2021-01-11
2020-12-31
2020-12-07
Deprecated the use of Mpi as a valid input type for Estimator classes in favor
of using MpiConfiguration with ScriptRunConfig.
Deprecated the use of Nccl and Gloo as a valid type of input for Estimator
classes in favor of using PyTorchConfiguration with ScriptRunConfig.
2020-11-30
2020-11-09
2020-11-05
2020-10-26
2020-10-12
azureml-core
Pin major versions of direct dependencies of azureml-core
AKSWebservice and AKSEndpoints now support pod-level CPU and Memory
resource limits. More information on Kubernetes Resources and Limits
Updated run.log_table to allow individual rows to be logged.
Added static method Run.get(workspace, run_id) to retrieve a run only
using a workspace
Added instance method Workspace.get_run(run_id) to retrieve a run within
the workspace
Introducing command property in run configuration, which enables users to
submit command instead of script & arguments.
azureml-interpret
fixed explanation client is_raw flag behavior in azureml-interpret
azureml-sdk
azureml-sdk officially supports Python 3.8.
azureml-train-core
Adding TensorFlow 2.3 curated environment
Introducing command property in run configuration, which enables users to
submit command instead of script & arguments.
azureml-widgets
Redesigned interface for script run widget.
2020-09-28
The supported encodings are 'utf8', 'iso88591', 'latin1', 'ascii', 'utf16', 'utf32',
'utf8bom' and 'windows1252'.
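As an illustrative aside (the mapping below is an assumption, not part of the
SDK), these names correspond to standard Python codecs:

```python
import codecs

# Hypothetical mapping from the encoding names listed above to Python
# codec names -- illustrative only, not taken from the azureml SDK.
ENCODING_ALIASES = {
    "utf8": "utf-8",
    "iso88591": "iso8859-1",
    "latin1": "latin-1",
    "ascii": "ascii",
    "utf16": "utf-16",
    "utf32": "utf-32",
    "utf8bom": "utf-8-sig",  # UTF-8 with a byte-order mark
    "windows1252": "cp1252",
}

# Every alias resolves to a real codec; lookup raises LookupError otherwise.
resolved = {name: codecs.lookup(codec).name for name, codec in ENCODING_ALIASES.items()}
```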
Bug fix when environment object isn't passed to ScriptRunConfig constructor.
Updated Run.cancel() to allow cancel of a local run from another machine.
azureml-dataprep
Fixed dataset mount timeout issues.
azureml-explain-model
fix pypi package descriptions for azureml-interpret, azureml-explain-model,
azureml-contrib-interpret and azureml-tensorboard
azureml-interpret
visualization dashboard removed from azureml-contrib-interpret package,
explanation client moved to azureml-interpret package and deprecated in
azureml-contrib-interpret package and notebooks updated to reflect
improved API
azureml-interpret package updated to depend on interpret-community
0.15.0
fix pypi package descriptions for azureml-interpret, azureml-explain-model,
azureml-contrib-interpret and azureml-tensorboard
azureml-pipeline-core
Fixed pipeline issue with OutputFileDatasetConfig where the system may
stop responding when register_on_complete is called with the name
parameter set to a pre-existing dataset name.
azureml-pipeline-steps
Removed stale databricks notebooks.
azureml-tensorboard
fix pypi package descriptions for azureml-interpret, azureml-explain-model,
azureml-contrib-interpret and azureml-tensorboard
azureml-train-automl-runtime
visualization dashboard removed from azureml-contrib-interpret package,
explanation client moved to azureml-interpret package and deprecated in
azureml-contrib-interpret package and notebooks updated to reflect
improved API
azureml-widgets
visualization dashboard removed from azureml-contrib-interpret package,
explanation client moved to azureml-interpret package and deprecated in
azureml-contrib-interpret package and notebooks updated to reflect
improved API
2020-09-21
azure-cli-ml
Grid Profiling has been removed from the SDK and is no longer supported.
azureml-accel-models
azureml-accel-models package now supports TensorFlow 2.x
azureml-automl-core
Added error handling in get_output for cases when local versions of
pandas/sklearn don't match the ones used during training
azureml-automl-runtime
Fixed a bug where AutoArima iterations would fail with a PredictionException
and the message: "Silent failure occurred during prediction."
azureml-cli-common
Grid Profiling has been removed from the SDK and is no longer supported.
azureml-contrib-server
Update description of the package for pypi overview page.
azureml-core
Grid Profiling removed from the SDK and is no longer supported.
Reduce number of error messages when workspace retrieval fails.
Don't show warning when fetching metadata fails
New Kusto Step and Kusto Compute Target.
Update document for sku parameter. Remove sku in workspace update
functionality in CLI and SDK.
Update description of the package for pypi overview page.
Updated documentation for Azure Machine Learning Environments.
Expose service managed resources settings for AML workspace in SDK.
azureml-dataprep
Enable execute permission on files for Dataset mount.
azureml-mlflow
Updated Azure Machine Learning MLflow documentation and notebook
samples
New support for MLflow projects with Azure Machine Learning backend
MLflow model registry support
Added Azure RBAC support for AzureML-MLflow operations
azureml-pipeline-core
Improved the documentation of the PipelineOutputFileDataset.parse_*
methods.
New Kusto Step and Kusto Compute Target.
Provided a Swaggerurl property for the pipeline-endpoint entity, through
which users can see the schema definition for a published pipeline endpoint.
azureml-pipeline-steps
New Kusto Step and Kusto Compute Target.
azureml-telemetry
Update description of the package for pypi overview page.
azureml-train
Update description of the package for pypi overview page.
azureml-train-automl-client
Added error handling in get_output for cases when local versions of
pandas/sklearn don't match the ones used during training
azureml-train-core
Update description of the package for pypi overview page.
2020-08-31
2020-08-17
2020-08-03
2020-07-20
2020-07-06
Azure Machine Learning SDK for Python v1.9.0
Bug fixes and improvements
azureml-automl-core
Replaced get_model_path() with AZUREML_MODEL_DIR environment variable
in AutoML autogenerated scoring script. Also added telemetry to track
failures during init().
Removed the ability to specify enable_cache as part of AutoMLConfig
Fixed a bug where runs may fail with service errors during specific forecasting
runs
Improved error handling around specific models during get_output
Fixed call to fitted_model.fit(X, y) for classification with y transformer
Enabled customized forward fill imputer for forecasting tasks
A new ForecastingParameters class is used instead of forecasting parameters
in a dict format
Improved target lag autodetection
Added limited availability of multi-noded, multi-gpu distributed featurization
with BERT
azureml-automl-runtime
Prophet now does additive seasonality modeling instead of multiplicative.
Fixed an issue where short grains with frequencies different from those of
the long grains resulted in failed runs.
azureml-contrib-automl-dnn-vision
Collect system/gpu stats and log averages for training and scoring
azureml-contrib-mir
Added support for enable-app-insights flag in ManagedInferencing
azureml-core
Added a validate parameter to these APIs, allowing validation to be skipped
when the data source isn't accessible from the current compute:
TabularDataset.time_before(end_time, include_boundary=True,
validate=True)
TabularDataset.time_after(start_time, include_boundary=True,
validate=True)
TabularDataset.time_recent(time_delta, include_boundary=True,
validate=True)
TabularDataset.time_between(start_time, end_time,
include_boundary=True, validate=True)
Added framework filtering support for model list, and added NCD AutoML
sample in notebook back
For Datastore.register_azure_blob_container and
Datastore.register_azure_file_share (only options that support SAS token), we
have updated the doc strings for the sas_token field to include minimum
permissions requirements for typical read and write scenarios.
Deprecating _with_auth param in ws.get_mlflow_tracking_uri()
azureml-mlflow
Add support for deploying local file:// models with AzureML-MLflow
Deprecating _with_auth param in ws.get_mlflow_tracking_uri()
azureml-opendatasets
Recently published Covid-19 tracking datasets are now available with the
SDK
azureml-pipeline-core
Log a warning when "azureml-defaults" is not included as a pip
dependency.
Improve Note rendering.
Added support for quoted line breaks when parsing delimited files to
PipelineOutputFileDataset.
The PipelineDataset class is deprecated. For more information, see
https://aka.ms/dataset-deprecation . Learn how to use dataset with
pipeline, see https://aka.ms/pipeline-with-dataset .
azureml-pipeline-steps
Doc updates to azureml-pipeline-steps.
Added support in ParallelRunConfig's load_yaml() for users to define
Environments inline with the rest of the config or in a separate file
azureml-train-automl-client
Removed the ability to specify enable_cache as part of AutoMLConfig
azureml-train-automl-runtime
Added limited availability of multi-noded, multi-gpu distributed featurization
with BERT.
Added error handling for incompatible packages in ADB based automated
machine learning runs.
azureml-widgets
Doc updates to azureml-widgets.
2020-06-22
2020-06-08
2020-05-26
azureml-automl-runtime
AutoML Forecasting now supports forecasting beyond the prespecified
max-horizon without retraining the model. When the forecast
destination is farther into the future than the specified maximum horizon, the
forecast() function still makes point predictions out to the later date using a
recursive operation mode. For an illustration of the new feature, see the
"Forecasting farther than the maximum horizon" section of the
"forecasting-forecast-function" notebook.
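The recursive operation mode can be illustrated with a toy autoregressive sketch. This is only a sketch of the idea described above, not the AutoML implementation; `naive_model` is an invented stand-in for a fitted forecaster limited to `max_horizon` predictions per call:

```python
def naive_model(context, horizon):
    # Toy model: predict the mean of the last 3 observations, repeated.
    recent = context[-3:]
    level = sum(recent) / len(recent)
    return [level] * horizon

def forecast_beyond_horizon(model, history, max_horizon, steps):
    """Recursive operation mode: when `steps` exceeds the model's
    max_horizon, append each batch of predictions to the context and
    call the model again until enough points are produced."""
    context = list(history)
    out = []
    while len(out) < steps:
        h = min(max_horizon, steps - len(out))
        preds = model(context, h)
        out.extend(preds)
        context.extend(preds)  # predictions become context for the next call
    return out

# Forecast 5 steps with a model capped at a 2-step horizon.
preds = forecast_beyond_horizon(naive_model, [1.0, 2.0, 3.0], max_horizon=2, steps=5)
```

The trade-off is that errors compound: later points are predicted from earlier predictions rather than observed data.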
azureml-pipeline-steps
ParallelRunStep is now released and is part of azureml-pipeline-steps
package. Existing ParallelRunStep in azureml-contrib-pipeline-steps package
is deprecated. Changes from public preview version:
Added run_max_try optional configurable parameter to control max call
to run method for any given batch, default value is 3.
No PipelineParameters are autogenerated anymore. Following
configurable values can be set as PipelineParameter explicitly.
mini_batch_size
node_count
process_count_per_node
logging_level
run_invocation_timeout
run_max_try
Default value for process_count_per_node is changed to 1. Users should
tune this value for better performance; best practice is to set it to the
number of GPUs or CPUs the node has.
ParallelRunStep does not inject any packages; users need to include the
azureml-core and azureml-dataprep[pandas, fuse] packages in the
environment definition. If a custom Docker image is used with
user_managed_dependencies, then conda must be installed on the
image.
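The run_max_try behavior amounts to a bounded retry around each mini-batch's run() call. The following is only an illustrative sketch of the documented semantics; `process_mini_batch` and `flaky_run` are hypothetical names, not part of the ParallelRunStep API:

```python
def process_mini_batch(run_fn, mini_batch, run_max_try=3):
    """Call run_fn on a mini-batch, retrying up to run_max_try times
    before giving up on the batch (mirrors the documented default of 3)."""
    last_err = None
    for _attempt in range(run_max_try):
        try:
            return run_fn(mini_batch)
        except Exception as err:
            last_err = err
    raise RuntimeError(f"mini-batch failed after {run_max_try} tries") from last_err

# A run() implementation that fails twice, then succeeds:
calls = {"n": 0}
def flaky_run(batch):
    calls["n"] += 1
    if calls["n"] < 3:
        raise ValueError("transient failure")
    return [x * 2 for x in batch]

result = process_mini_batch(flaky_run, [1, 2, 3])
```

With the default of 3 tries, two transient failures still yield a successful batch; a third failure would surface as a batch error.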
Breaking changes
azureml-pipeline-steps
Deprecated the use of azureml.dprep.Dataflow as a valid type of input for
AutoMLConfig
azureml-train-automl-client
Deprecated the use of azureml.dprep.Dataflow as a valid type of input for
AutoMLConfig
2020-05-11
azureml-interpret
Reduced explanation path length limits to reduce likelihood of going over
Windows limit
Bugfix for sparse explanations created with the mimic explainer using a linear
surrogate model.
azureml-opendatasets
Fixed an issue where MNIST's columns were parsed as string when they should be int.
azureml-pipeline-core
Allowing the option to regenerate_outputs when using a module that is
embedded in a ModuleStep.
azureml-train-automl-client
Deprecated TensorFlow models for AutoML.
Fixed an issue with users allow-listing unsupported algorithms in local mode
Doc fixes to AutoMLConfig.
Enforcing datatype checks on cv_split_indices input in AutoMLConfig.
Fixed issue with AutoML runs failing in show_output
azureml-train-automl-runtime
Fixing a bug in Ensemble iterations that was preventing model download
timeout from kicking in successfully.
azureml-train-core
Fix typo in azureml.train.dnn.Nccl class.
Supporting PyTorch version 1.5 in the PyTorch Estimator
Fix the issue that framework image can't be fetched in Azure Government
region when using training framework estimators
2020-05-04
New Notebook Experience
You can now create, edit, and share machine learning notebooks and files directly inside
the studio web experience of Azure Machine Learning. You can use all the classes and
methods available in Azure Machine Learning Python SDK from inside these notebooks.
To get started, visit the Run Jupyter Notebooks in your workspace article.
Azure Machine Learning Studio Notebooks: First in-class authoring for notebook files and
support for all operations available in the Azure Machine Learning Python SDK.
2020-04-27
Breaking changes
AmlCompute clusters supported a Preview feature around run-based creation,
that we are planning on deprecating in two weeks. You can continue to create
persistent compute targets as always by using the Amlcompute class, but the
specific approach of specifying the identifier "amlcompute" as the compute
target in run config will not be supported soon.
2020-04-13
2020-03-23
2020-03-11
Breaking changes
Semantic Versioning 2.0.0
Starting with version 1.1 Azure Machine Learning Python SDK adopts
Semantic Versioning 2.0.0 . All subsequent versions follow new numbering
scheme and semantic versioning contract.
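Under Semantic Versioning 2.0.0, precedence follows the major.minor.patch triple. A minimal comparison sketch (pre-release precedence is ignored here for brevity; full SemVer ranks pre-releases below the corresponding release):

```python
import re

def semver_key(version):
    """Extract the MAJOR.MINOR.PATCH triple for precedence comparison.
    Suffixes such as the 'rc0' in '1.1.2rc0' are ignored in this sketch."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if not match:
        raise ValueError(f"not a semantic version: {version!r}")
    return tuple(int(part) for part in match.groups())

# Under the new numbering scheme, 1.0.83 predates 1.1.0.
assert semver_key("1.0.83") < semver_key("1.1.0")
```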
azureml-pipeline-steps
Moved the AutoMLStep to the azureml-pipeline-steps package. Deprecated
the AutoMLStep within azureml-train-automl-runtime.
Added documentation example for dataset as PythonScriptStep input
azureml-tensorboard
Updated azureml-tensorboard to support TensorFlow 2.0
Show correct port number when using a custom TensorBoard port on a
Compute Instance
azureml-train-automl-client
Fixed an issue where certain packages may be installed at incorrect versions
on remote runs.
fixed FeaturizationConfig overriding issue that filters custom featurization
config.
azureml-train-automl-runtime
Fixed the issue with frequency detection in the remote runs
Moved the AutoMLStep to the azureml-pipeline-steps package. Deprecated
the AutoMLStep within azureml-train-automl-runtime.
azureml-train-core
Supporting PyTorch version 1.4 in the PyTorch Estimator
2020-03-02
Azure Machine Learning SDK for Python v1.1.2rc0 (Pre-
release)
Bug fixes and improvements
azureml-automl-core
Enabled the Batch mode inference (taking multiple rows once) for AutoML
ONNX models
Improved the detection of frequency on the data sets, lacking data or
containing irregular data points
Added the ability to remove data points not complying with the dominant
frequency.
azureml-automl-runtime
Fixed the error thrown when a grain that was not present in the
training set appeared in the test set
Removed the y_query requirement during scoring on forecasting service
azureml-contrib-mir
Adds functionality in the MirWebservice class to retrieve the Access Token
azureml-core
Deployed Azure Machine Learning Webservices now defaults to INFO
logging. This can be controlled by setting the AZUREML_LOG_LEVEL
environment variable in the deployed service.
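The effect of that environment variable can be sketched with the standard logging module. The variable name comes from the note above; the logger name and helper are illustrative, not the deployed service's actual code:

```python
import logging
import os

def configure_service_logging(default="INFO"):
    """Read the log level from AZUREML_LOG_LEVEL (falling back to INFO)
    and apply it to a service logger -- an illustrative sketch."""
    level_name = os.environ.get("AZUREML_LOG_LEVEL", default).upper()
    level = getattr(logging, level_name, logging.INFO)
    logger = logging.getLogger("scoring_service")
    logger.setLevel(level)
    return logger

# Setting the variable before startup raises the threshold to WARNING.
os.environ["AZUREML_LOG_LEVEL"] = "WARNING"
logger = configure_service_logging()
```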
Fix iterating on Dataset.get_all to return all datasets registered with the
workspace.
Improve error message when invalid type is passed to path argument of
dataset creation APIs.
The Python SDK now uses the discovery service to call the 'api' endpoint instead
of 'pipelines'. Swapped to the new routes in all SDK calls
Changes routing of calls to the ModelManagementService to a new unified
structure
Made workspace update method publicly available.
Added image_build_compute parameter in workspace update method to
allow user updating the compute for image build
Added deprecation messages to the old profiling workflow. Fixed profiling
cpu and memory limits
azureml-interpret
update azureml-interpret to interpret-community 0.6.*
azureml-mlflow
Add support for sovereign clouds to azureml.mlflow
azureml-pipeline-steps
Moved the AutoMLStep to the azureml-pipeline-steps package. Deprecated
the AutoMLStep within azureml-train-automl-runtime.
azureml-train-automl-client
Fixed an issue where certain packages may be installed at incorrect versions
on remote runs.
azureml-train-automl-runtime
Fixed the issue with frequency detection in the remote runs
Moved the AutoMLStep to the azureml-pipeline-steps package. Deprecated
the AutoMLStep within azureml-train-automl-runtime.
azureml-train-core
Moved the AutoMLStep to the azureml-pipeline-steps package. Deprecated
the AutoMLStep within azureml-train-automl-runtime.
2020-02-18
2020-02-04
2020-01-21
azureml-core
Get the current core usage and quota limitation for AmlCompute resources
in a given workspace and subscription
azureml-contrib-pipeline-steps
Enable user to pass tabular dataset as intermediate result from previous step
to parallelrunstep
2020-01-06
Azure Machine Learning SDK for Python v1.0.83
New features
Dataset: Add two options on_error and out_of_range_datetime for
to_pandas_dataframe to fail when data has error values instead of filling them
with None .
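The difference between the default fill-with-None behavior and failing fast can be sketched in plain Python. This illustrates the semantics described above; `to_values` is a hypothetical helper, not the `to_pandas_dataframe` implementation:

```python
def to_values(raw, on_error="null"):
    """Convert raw strings to floats; error values become None by
    default, or raise immediately when on_error='fail'."""
    out = []
    for v in raw:
        try:
            out.append(float(v))
        except ValueError:
            if on_error == "fail":
                raise ValueError(f"error value encountered: {v!r}")
            out.append(None)
    return out

values = to_values(["1.5", "oops", "3"])  # [1.5, None, 3.0]
```

Failing fast surfaces bad records at conversion time instead of propagating silent `None` values downstream.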
Workspace: Added the hbi_workspace flag for workspaces with sensitive data
that enables further encryption and disables advanced diagnostics on
workspaces. We also added support for bringing your own keys for the
associated Azure Cosmos DB instance, by specifying the cmk_keyvault and
resource_cmk_uri parameters when creating a workspace, which creates an
Azure Cosmos DB instance in your subscription while provisioning your
workspace. To learn more, see the Azure Cosmos DB section of data encryption
article.
2019-12-23
2019-12-09
2019-11-25
2019-11-11
dataset.
When calling to_pandas_dataframe on a labeled dataset with the download
option, you can now specify whether to overwrite existing files or not.
When calling keep_columns or drop_columns that results in a time series,
label, or image column being dropped, the corresponding capabilities are
dropped for the dataset as well.
Fixed issues with PyTorch loader when calling dataset.to_torchvision() .
2019-11-04
Web experience
The collaborative workspace landing page at https://ml.azure.com has been enhanced
and rebranded as the Azure Machine Learning studio.
From the studio, you can train, test, deploy, and manage Azure Machine Learning assets
such as datasets, pipelines, models, endpoints, and more.
Automated machine learning (preview): No-code experience for automating machine
learning model development.
R SDK
Data scientists and AI developers use the Azure Machine Learning SDK for R to build
and run machine learning workflows with Azure Machine Learning.
The Azure Machine Learning SDK for R uses the reticulate package to bind to the
Python SDK. By binding directly to Python, the SDK for R allows you access to core
objects and methods implemented in the Python SDK from any R environment you
choose.
Manage cloud resources for monitoring, logging, and organizing your machine
learning experiments.
Train models using cloud resources, including GPU-accelerated model training.
Deploy your models as webservices on Azure Container Instances (ACI) and Azure
Kubernetes Service (AKS).
2019-10-31
Preview features
We are releasing preview support for disk encryption of your local SSD in
Azure Machine Learning Compute. Raise a technical support ticket to get
your subscription allow listed to use this feature.
Public Preview of Azure Machine Learning Batch Inference. Azure Machine
Learning Batch Inference targets large inference jobs that are not time-
sensitive. Batch Inference provides cost-effective inference compute scaling,
with unparalleled throughput for asynchronous applications. It is optimized
for high-throughput, fire-and-forget inference over large collections of data.
azureml-contrib-dataset
Enabled functionalities for labeled dataset
Python
import azureml.core
from azureml.core import Workspace, Datastore, Dataset
import azureml.contrib.dataset
from azureml.contrib.dataset import FileHandlingOption, LabeledDatasetTask
2019-10-21
Visual interface (preview)
The Azure Machine Learning visual interface (preview) has been overhauled to run
on Azure Machine Learning pipelines. Pipelines (previously known as experiments)
authored in the visual interface are now fully integrated with the core Azure
Machine Learning experience.
Unified management experience with SDK assets
Versioning and tracking for visual interface models, pipelines, and endpoints
Redesigned UI
Added batch inference deployment
Added Azure Kubernetes Service (AKS) support for inference compute targets
New Python-step pipeline authoring workflow
New landing page for visual authoring tools
New modules
Apply math operation
Apply SQL transformation
Clip values
Summarize data
Import from SQL Database
2019-10-14
Learning Pipeline
Performance of FileDataset.mount has been improved for folders with a
large number of files
Being able to consume FileDataset and TabularDataset as inputs to
PythonScriptStep, EstimatorStep, and HyperDriveStep in the Azure Machine
Learning Pipeline.
Added URL to known error recommendations in run details.
Fixed a bug in run.get_metrics where requests would fail if a run had too
many children
Added support for authentication on Arcadia cluster.
Creating an Experiment object gets or creates the experiment in the Azure
Machine Learning workspace for run history tracking. The experiment ID and
archived time are populated in the Experiment object on creation. Example:

Python

experiment = Experiment(workspace, "New Experiment")
experiment_id = experiment.id

archive() and reactivate() are functions that can be called on an
experiment to hide and restore the experiment from being shown in the UX
or returned by default in a call to list experiments. If a new experiment is
created with the same name as an archived experiment, you can rename the
archived experiment when reactivating by passing a new name. There can
only be one active experiment with a given name. Example:

Python

experiment1 = Experiment(workspace, "Active Experiment")
experiment1.archive()
# Create new active experiment with the same name as the archived.
experiment2 = Experiment(workspace, "Active Experiment")
experiment1.reactivate(new_name="Previous Active Experiment")

The static method list() on Experiment can take a name filter and ViewType
filter. ViewType values are "ACTIVE_ONLY", "ARCHIVED_ONLY" and "ALL". Example:

Python

archived_experiments = Experiment.list(workspace, view_type="ARCHIVED_ONLY")
all_first_experiments = Experiment.list(workspace, name="First Experiment", view_type="ALL")
Support using environment for model deployment, and service update
azureml-datadrift
The show attribute of the DataDriftDetector class doesn't support the optional
argument 'with_details' anymore. The show attribute only presents the data drift
coefficient and data drift contribution of feature columns.
DataDriftDetector attribute 'get_output' behavior changes:
Input parameters start_time and end_time are optional instead of mandatory;
Passing a specific start_time and/or end_time together with a specific
run_id in the same invocation results in a ValueError, because they are
mutually exclusive;
When a specific start_time and/or end_time is passed, only results of
scheduled runs are returned;
Parameter 'daily_latest_only' is deprecated.
Support retrieving Dataset-based Data Drift outputs.
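The documented mutual exclusion amounts to a parameter check. The helper below is a hypothetical sketch of that rule, not the DataDriftDetector source:

```python
def get_output(start_time=None, end_time=None, run_id=None):
    """Validate get_output-style arguments: a run_id cannot be combined
    with a start_time/end_time window (they are mutually exclusive)."""
    if run_id is not None and (start_time is not None or end_time is not None):
        raise ValueError(
            "specify either run_id or a start_time/end_time window, not both"
        )
    # ... fetch results for scheduled runs in the window, or the given run
    return {"run_id": run_id, "window": (start_time, end_time)}

result = get_output(start_time="2019-10-01", end_time="2019-10-14")
```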
azureml-explain-model
Renames AzureML-explain-model package to AzureML-interpret, keeping
the old package for backwards compatibility for now
Fixed an automl bug where raw explanations were set to the classification task
instead of regression by default on download from ExplanationClient
Add support for ScoringExplainer to be created directly using MimicWrapper
azureml-pipeline-core
Improved performance for large Pipeline creation
azureml-train-core
Added TensorFlow 2.0 support in TensorFlow Estimator
azureml-train-automl
2019-10-08
2019-09-30
azureml-train-automl
azureml-train-automl
Added the ONNX conversion support for the ADB and HDI
Preview features
azureml-train-automl
azureml-train-automl
Supported BERT and BiLSTM as text featurizer (preview only)
Supported featurization customization for column purpose and transformer
parameters (preview only)
Supported raw explanations when user enables model explanation during
training (preview only)
Added Prophet for timeseries forecasting as a trainable pipeline (preview
only)
azureml-contrib-datadrift
Packages relocated from azureml-contrib-datadrift to azureml-datadrift; the
contrib package will be removed in a future release
2019-09-16
azureml-train-core
Added Nccl and Gloo support in PyTorch estimator
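A common PyTorch convention, which this sketch illustrates (it is not the estimator's code, and `pick_backend` is an invented helper), is to pick NCCL for GPU runs and Gloo as the CPU-friendly fallback:

```python
def pick_backend(cuda_available):
    """Choose a PyTorch distributed communication backend:
    NCCL is GPU-only; Gloo also works on CPU."""
    return "nccl" if cuda_available else "gloo"

backend = pick_backend(True)  # "nccl" on a GPU cluster
```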
2019-09-09
New web experience (preview) for Azure Machine
Learning workspaces
The new web experience enables data scientists and data engineers to complete their
end-to-end machine learning lifecycle from prepping and visualizing data to training
and deploying models in a single location.
Key features:
Using this new Azure Machine Learning interface, you can now:
At the time of this release, the following browsers are supported: Chrome, Firefox,
Safari, and Microsoft Edge Preview.
Known issues:
1. Refresh your browser if you see "Something went wrong! Error loading chunk files"
when deployment is in progress.
2. Can't delete or rename files in Notebooks and Files. During Public Preview, you can
use the Jupyter UI or Terminal in Notebook VM to perform file update operations.
Because it is a mounted network file system, all changes you make on the Notebook
VM are immediately reflected in the Notebook Workspace.
2019-09-03
azureml-automl-core
AutoArima is now a suggestable pipeline for preview only.
Improved error reporting for forecasting.
Improved the logging by using custom exceptions instead of generic in the
forecasting tasks.
Removed the check that max_concurrent_iterations must be less than the total
number of iterations.
AutoML models now return AutoMLExceptions
This release improves the execution performance of automated machine
learning local runs.
azureml-core
Introduce Dataset.get_all(workspace), which returns a dictionary of
TabularDataset and FileDataset objects keyed by their registration name.
Python
workspace = Workspace.from_config()
all_datasets = Dataset.get_all(workspace)
mydata = all_datasets['my-data']
specified format.
Automatically log into the base image registry when saving a Dockerfile
generated by Model.package().
azureml-explain-model
Added feature_maps parameter to the new MimicWrapper, allowing users to
get raw feature explanations.
Dataset uploads are now off by default for explanation upload, and can be
re-enabled with upload_datasets=True
Added "is_law" filtering parameters to explanation list and download
functions.
Adds method get_raw_explanation(feature_maps) to both global and local
explanation objects.
Added version check to lightgbm with printed warning if below supported
version
Optimized memory usage when batching explanations
AutoML models now return AutoMLExceptions
azureml-pipeline-core
Added support to create, update, and use PipelineDrafts - can be used to
maintain mutable pipeline definitions and use them interactively to run
azureml-train-automl
Created feature to install specific versions of gpu-capable pytorch v1.1.0,
cuda toolkit 9.0, pytorch-transformers, which is required to enable BERT/
XLNet in the remote Python runtime environment.
azureml-train-core
Early failure of some hyperparameter space definition errors directly in the
sdk instead of server side.
2019-08-19
2019-08-05
2019-07-23
2019-07-09
Visual Interface
Preview features
Added "Execute R script" module in visual interface.
Preview features
HyperDriveConfig can now accept pipeline object as a parameter to support
hyperparameter tuning using a pipeline.
pip_requirements_file respectively.
azureml-opendatasets
Significantly improved NoaaIsdWeather enrich performance in the non-SPARK
version.
2019-04-26
Stacked ensembles
Azure Databricks
Enabled time series forecasting and model explainability/interpretability
capability
You can now cancel and resume (continue) automated ML experiments
Added support for multicore processing
MLOps
Local deployment & debugging for scoring containers
You can now deploy an ML model locally and iterate quickly on your scoring file
and dependencies to ensure they behave as expected.
The PipelineEndpoint was introduced to add a new version of a published pipeline while
maintaining the same endpoint.
2019-04-15
Azure portal
You can now resubmit an existing Script run on an existing remote compute
cluster.
You can now run a published pipeline with new parameters on the Pipelines tab.
Run details now supports a new Snapshot file viewer. You can view a snapshot of
the directory when you submitted a specific run. You can also download the
notebook that was submitted to start the run.
You can now cancel parent runs from the Azure portal.
2019-04-08
2019-03-25
2019-03-11
2019-02-27
2019-02-25
Azure portal
New features
New drag-and-drop table editor experience for reports. Users can drag a column
from the well to the table area, where a preview of the table is displayed.
The columns can be rearranged.
New Logs file viewer
Links to experiment runs, compute, models, images, and deployments from the
activities tab
Next steps
Read the overview for Azure Machine Learning.