4. Packaging models with multiple assets: deploying a HuggingFace NLP model for
classification.
https://santiagof.medium.com/effortless-models-deployment-with-mlflow-2b1b443ff157 1/15
2/22/23, 3:44 PM Effortless models deployment with MLFlow | by Facundo Santiago | Medium
Most of the time, people are familiar with MLFlow’s capabilities for tracking
experiments, logging parameters, metrics, and artifacts from a given run. It also
comes with a nice user interface to compare and evaluate models.
However, one fact about MLFlow that is usually overlooked, and that I find really appealing, is the introduction of a self-contained specification for models supporting any modern ML framework: the MLModel specification. The good thing about this specification is that it doesn't just let you save the model in a standard, open-source format; it also lets you deploy it on containers without much effort.
Here, we create a new experiment called cats-vs-dogs and we log (persist) the resulting model using the function log_model .

import mlflow

mlflow.set_experiment("cats-vs-dogs")

with mlflow.start_run():
    (...)
    mlflow.fastai.log_model(learn, "classifier",
                            registered_model_name="cats_vs_dogs",
                            signature=signature)
MLFlow is framework agnostic, meaning that log_model works with multiple machine learning frameworks: for instance, mlflow.sklearn , mlflow.pytorch , mlflow.tensorflow and mlflow.xgboost all expose their own log_model function.
Models are logged inside the MLFlow Tracking Server for the particular run.
Finally, since we indicated registered_model_name="cats_vs_dogs" , the model will
also be registered in the model registry under the given name. Models are saved using
the MLModel specification (details in the next section).
artifact_path: classifier
flavors:
  fastai:
    data: model.fastai
    fastai_version: 2.4.1
  python_function:
    data: model.fastai
    env: conda.yaml
    loader_module: mlflow.fastai
    python_version: 3.8.12
model_uuid: e694c68eba484299976b06ab9058f636
run_id: e13da8ac-b1e6-45d4-a9b2-6a0a5cfac537
signature:
  inputs: '[{"type": "tensor", "tensor-spec": {"dtype": "uint8", "shape": [-1, -1, -1, 3]}}]'
  outputs: '[{"type": "tensor", "tensor-spec": {"dtype": "float32", "shape": [-1, 2]}}]'
utc_time_created: '2022-02-24 21:56:17.489543'
artifact_path is the path where the model is persisted inside the run. In this
case, the path is classifier , meaning that if you query the run's artifacts, you
will see a path called classifier that contains this given model. It's just a name;
you don't have to worry much about it.
flavors are the different ways the model can be invoked. This is a very powerful
concept because the convention allows MLFlow to understand any model
created with the ML frameworks it supports. In this particular case, 2 flavors
are available, fastai and python_function . fastai is the flavor that allows
loading the model using the FastAI API (we will address the python_function flavor
later in the post, as it deserves its own section). Notice that the fastai flavor
indicates some other fields, including data , the location where the model is placed,
and fastai_version , the version of the fastai library used. Those are
specific to fastai.
run_id is the ID of the run (or experiment run, trial, and so on) that generated
this model.
signature indicates the expected inputs for this model and the expected outputs.
Note that here all the flavors share the same signature (this may look different
from TensorFlow Serving if you are familiar with it). In this particular case,
the model receives tensors with shape (-1, -1, -1, 3) , meaning a batch of color
images (3 channels) of variable height and width. The output is a batch of tensors
of shape (-1, 2) , holding the two class probabilities, as this is a binary
classification case. The first -1 on both inputs and outputs corresponds to the
batch size, which is variable too. See the next section for a better understanding
of signatures.
Signatures
Model signatures are an important part of the model specification, as they serve as a
data contract between the model and the server running our models. They also matter
because any data parsing is done automatically by MLFlow, so type checking (for
instance, data type conversions) will happen when data is submitted to your model.

You indicate the signature by passing signature=... when you log your model. There
are two ways to produce it. One is to use the infer_signature() method, which will
try to infer the signature from the given inputs and outputs of the model. The
following is an extract from the original notebook sample:
where, [sample_img_arr] is a sample input image created to feed the model. Since the
model takes batches of data, it is surrounded by [] to make it a batch of one element.
Important: This method has some drawbacks. For instance, it will capture the image
size of the sample image provided. In our case, images can be any size, so this won't work.
The other way is to indicate the signature explicitly. I'd rather use this one
because inputs and outputs are something you should really be aware of. If they
don't fit your expectations, an error will occur, which is good since it means
that you are missing something.
Let's see this particular case. The model takes batches of images and returns
the class probabilities of each class (in our case, either cat or dog). This means
a single input is a tensor of shape (IMAGE_HEIGHT, IMAGE_WIDTH, CHANNELS) . Since the
model expects batches of images, the right input shape is (BATCH_SIZE,
IMAGE_HEIGHT, IMAGE_WIDTH, CHANNELS) :
import numpy as np
from mlflow.models.signature import ModelSignature
from mlflow.types.schema import Schema, TensorSpec

input_schema = Schema([
    TensorSpec(np.dtype(np.uint8), (-1, -1, -1, 3)),
])
output_schema = Schema([
    TensorSpec(np.dtype(np.float32), (-1, 2)),
])
signature = ModelSignature(inputs=input_schema,
                           outputs=output_schema)
You will note that the input signature contains a lot of -1s. Our model can handle variable
image sizes, since it has a transformation that resizes the image in the process (the
Resize(224) transformation in the source code). Hence, we can indicate a variable input
size using -1 in height and width . The same applies to the batch size, since we can handle
variable batch sizes too.
Note also the numeric formats. Inputs are uint8 , meaning the inputs are
RGB values in the range 0–255, while outputs are float32 , meaning
the outputs are probabilities between 0 and 1.
The MLModel folder also contains a conda.yaml file. This conda file indicates
how to construct a conda environment with all the required dependencies to
run the model. That means that if we create an environment using this file, we
should be able to run the given model without issues.

This is paramount to ensure reproducibility, and that we are able to run the models in
production in the same way they were constructed by data scientists.
channels:
  - conda-forge
dependencies:
  - python=3.8.12
  - pip
  - pip:
      - mlflow
      - defusedxml==0.7.1
      - fastai==2.4.1
      - google-cloud-storage==1.41.0
      - ipython==8.0.1
name: mlflow-env
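Recreating that environment locally is a sketch of two conda commands (assuming the file above is saved as conda.yaml):

```
conda env create -f conda.yaml
conda activate mlflow-env
```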
How does MLFlow know which dependencies to include in the conda
environment?
One of the parameters of the log_model function is conda_env . Here you have the
chance to indicate what your environment should look like to run the given model.
If not indicated, MLFlow will try to infer the requirements based on the flavor you
are using. Check the MLFlow documentation to see how you can indicate this conda
environment definition if needed.
The python_function flavor gives you two things:

A way to create custom models or custom ways to load a model. If this is your
case, check the next post in this series: Effortless models deployment with MLFlow.

A generic way to specify how to load any model, regardless of its flavor.

That last one is the reason why I preferred to leave this flavor to the end.
Having a generic way to load and execute a model is extremely appealing for
serving!
Here's the deal: when you load a model using its own flavor, the object
loaded into memory is the specific implementation of the model in that
framework. For instance, if we load a sequential TensorFlow model in Keras using
the Keras flavor, the object returned will be an instance of
tensorflow.keras.Sequential . For FastAI, it would be an instance of
fastai.learner . Some of these objects have a method model.predict() , others
model.predict_proba() , others may be invoked as a function like model(inputs) , you
name it.
learner = mlflow.fastai.load_model("models:/cats_vs_dogs/1")
learner.get_preds(...)
Did you notice we are calling get_preds() ? This is a method of a fastai.learner object.
load_model loads the model as it would be loaded using the library itself.
This introduces a problem with serving, because your serving code will have a
different shape depending on the framework of the model you are serving! It would
be better to have a consistent way to load and invoke a model, regardless of its
flavor, right? Well, that's the job of the pyfunc flavor. This flavor loads into
memory a wrapper of the original model, exposing just one function: model.predict() .
You will see later that this is the reason why, when deploying a model using
MLFlow, the flavor used for serving is python_function (or pyfunc ): it
provides a consistent way to invoke the model regardless of its flavor.
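To see why a single entry point matters, here is a purely illustrative sketch of the idea (these classes are made up for the example; this is not MLFlow's actual implementation):

```python
class PyFuncWrapper:
    """Wraps a framework-specific model behind a single predict() method."""

    def __init__(self, raw_model, invoke):
        self._raw_model = raw_model
        self._invoke = invoke  # the framework-specific way of calling the model

    def predict(self, inputs):
        return self._invoke(self._raw_model, inputs)


# Two fake models with different native APIs:
class FastaiLikeLearner:
    def get_preds(self, x):
        return [[0.2, 0.8]]

class KerasLikeModel:
    def __call__(self, x):
        return [[0.9, 0.1]]

# The serving code only ever calls predict(), regardless of the framework:
model_a = PyFuncWrapper(FastaiLikeLearner(), lambda m, x: m.get_preds(x))
model_b = PyFuncWrapper(KerasLikeModel(), lambda m, x: m(x))
```

Both wrapped models now answer to the same predict() call, which is exactly the uniformity a generic inference server needs.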
Serving does two things: it downloads your model from the MLFlow server (remember
that the environment variable MLFLOW_TRACKING_URI identifies where your tracking
server is), and it starts a GUNICORN server that serves your model using the
python_function flavor.
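The command that kicks this off can be sketched like this (the model URI and port follow this post's example; the tracking server URL is illustrative, so adjust both to your setup):

```
export MLFLOW_TRACKING_URI="http://localhost:8080"
mlflow models serve -m "models:/cats_vs_dogs/1" -p 5000
```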
Easy, right? All the complexity of creating the inference server has gone
away! We can now test our model by submitting a request.
The resulting server will be listening on port 5000 on the path /invocations .
Inputs can be submitted in several formats, such as format=pandas-records for
pandas payloads.
In our case, the model receives tensors, so we need to send the request in the TF
Serving format. A sample request will look like this:

import json
(...)
What is this code doing? The last line transforms a single image tensor of size
(IMAGE_HEIGHT, IMAGE_WIDTH, CHANNELS) into a batch of 1 item of size (1, IMAGE_HEIGHT,
IMAGE_WIDTH, CHANNELS) . Remember that our model receives batches of images, not single
ones. We need to comply with the signature.
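A sketch of that step with NumPy (the image here is a dummy array standing in for a real picture; any height and width would do, and the variable names are illustrative):

```python
import json

import numpy as np

# Dummy 224x224 RGB image standing in for a real photo of a cat or a dog
sample_img_arr = np.zeros((224, 224, 3), dtype=np.uint8)

# Add the batch dimension: (224, 224, 3) -> (1, 224, 224, 3)
batch = np.expand_dims(sample_img_arr, axis=0)

# TF Serving style request body for the /invocations endpoint
payload = json.dumps({"inputs": batch.tolist()})
```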
Check the original notebook for more details about how this is actually done.
Finally, we can invoke the model like this from a terminal (note that this is bash; if you
are running Windows, you can make the invocation using Postman).
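Assuming the server from before is running locally and the request body was saved to a file (the file name is illustrative), the call could look like this:

```
curl http://localhost:5000/invocations \
  -H "Content-Type: application/json" \
  -d @payload.json
```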
Deployment targets include managed platforms such as Azure Machine Learning and
AWS SageMaker. You have an example of how to do this on Azure Machine Learning in the
sample notebook I shared before.
You can also package your model into a container image and then deploy it wherever
you want. The command to achieve that is pretty simple:
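A sketch of the command, assuming the registered model from earlier (check mlflow models build-docker --help for the exact flags in your MLFlow version):

```
mlflow models build-docker -m "models:/cats_vs_dogs/1" -n cats_vs_dogs_classifier
```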
This will create a Docker image called cats_vs_dogs_classifier that you can port to
your deployment target of preference. Now it starts to look appealing, doesn't it? All
those deployment options for free.
Conclusion
The idea of this post was to show how, just by adopting an open-source model
specification format inside of MLFlow, we gain a lot of deployment options
for free, avoid vendor lock-in, and unlock the potential of our models. In the
next post in this series, I will show you how to take this to the next level with
custom inference routines, since so far your model outputs predictions the way the
framework of choice does by default. The MLModel format has ways to change that.