4. Packaging models with multiple assets: deploying a HuggingFace NLP model for
classification.
https://santiagof.medium.com/effortless-models-deployment-with-mlflow-2b1b443ff157 1/15
2/22/23, 3:44 PM Effortless models deployment with MLFlow | by Facundo Santiago | Medium
Most of the time, people are familiar with MLFlow’s capabilities for tracking
experiments, logging parameters, metrics, and artifacts from a given run. It also
comes with a nice user interface to compare and evaluate models.
However, one fact about MLFlow that is usually overlooked, and that I find really appealing, is the introduction of a self-contained specification for models supporting any modern ML framework: the MLModel specification. The good thing about this specification is that it doesn't just let you save the model in a standard, open-source format; it also lets you deploy it on containers without much effort.
Here, we create a new experiment called cats-vs-dogs and we log (persist) the resulting model using the function log_model .

import mlflow

mlflow.set_experiment("cats-vs-dogs")

with mlflow.start_run():
    (...)
    mlflow.fastai.log_model(learn, "classifier",
                            registered_model_name="cats_vs_dogs",
                            signature=signature)
MLFlow is framework agnostic, meaning that log_model works with multiple machine learning frameworks: for instance, mlflow.sklearn , mlflow.pytorch , mlflow.tensorflow and mlflow.xgboost all expose their own log_model function.
Models are logged inside the MLFlow Tracking Server for the particular run.
Finally, since we indicated registered_model_name="cats_vs_dogs" , the model will
also be registered in the model registry under the given name. Models are saved using
the MLModel specification (details in the next section).
artifact_path: classifier
flavors:
  fastai:
    data: model.fastai
    fastai_version: 2.4.1
  python_function:
    data: model.fastai
    env: conda.yaml
    loader_module: mlflow.fastai
    python_version: 3.8.12
model_uuid: e694c68eba484299976b06ab9058f636
run_id: e13da8ac-b1e6-45d4-a9b2-6a0a5cfac537
signature:
  inputs: '[{"type": "tensor", "tensor-spec": {"dtype": "uint8", "shape": [-1, -1, -1, 3]}}]'
  outputs: '[{"type": "tensor", "tensor-spec": {"dtype": "float32", "shape": [-1, 2]}}]'
utc_time_created: '2022-02-24 21:56:17.489543'
artifact_path is the path where the model is persisted inside the run. In this
case, the path is classifier , meaning that if you query the run's artifacts, you
will see a path called classifier that contains this given model. It's just a name;
you don't have to worry much about it.
flavors are the different ways the model can be invoked. This is a very powerful
concept because the convention allows MLFlow to understand any model
created with the ML frameworks it supports. In this particular case, 2 flavors
are available, fastai and python_function . fastai is the flavor that allows
loading the model using the FastAI API (we will address the python_function flavor
later in the post, as it deserves its own section). Notice that the fastai flavor
indicates some other fields, including data , the location where the model is placed,
and fastai_version , the version of the fastai library used. Those are
specific to fastai.
run_id is the ID of the run (or experiment run, trial, and so on) that generated
this model.
signature indicates the expected inputs for this model and the expected outputs.
Note that here all the flavors share the same signature (this may look different
from TensorFlow Serving if you are familiar with it). In this particular case,
the model receives tensors with shape (-1, -1, -1, 3) , meaning a batch of color
images (3 channels) of variable height and width. The output is a batch of tensors
of shape (-1, 2) , holding the two class probabilities, as this is a binary
classification case. The first -1 on both inputs and outputs corresponds to the
batch size, which is variable too. See the next section for a better understanding
of signatures.
Signatures
Model signatures are an important part of the model specification, as they serve as a
data contract between the model and the server running our models. They also matter
because any data parsing is done automatically by MLFlow, so type checking (for
instance, data type conversions) will happen when data is submitted to your model.

You indicate the signature by passing signature=... when you log your model. There
are two ways to produce it. One is to use the infer_signature() method, which will
try to infer the signature from the given inputs and outputs of the model. The
following is an extract from the original notebook sample:
where, [sample_img_arr] is a sample input image created to feed the model. Since the
model takes batches of data, it is surrounded by [] to make it a batch of one element.
Important: This method has some drawbacks. For instance, it will capture the image
size of the sample image provided. In our case, images can be any size, so this won't work.
The other way is to indicate the signature explicitly. I'd rather use this one
because inputs and outputs are something you should really be aware of. If they
don't fit your expectations, an error will occur, which is good since it means
that you are missing something.
Let's see this particular case. The model takes batches of images and returns
the class probabilities of each class (in our case, either cat or dog). This means
a single input is a tensor of shape (IMAGE_HEIGHT, IMAGE_WIDTH, CHANNELS) . Since the
model expects batches of images, the right input shape is (BATCH_SIZE,
IMAGE_HEIGHT, IMAGE_WIDTH, CHANNELS) :
import numpy as np
from mlflow.models.signature import ModelSignature
from mlflow.types.schema import Schema, TensorSpec

input_schema = Schema([
    TensorSpec(np.dtype(np.uint8), (-1, -1, -1, 3)),
])
output_schema = Schema([
    TensorSpec(np.dtype(np.float32), (-1, 2)),
])
signature = ModelSignature(inputs=input_schema,
                           outputs=output_schema)
You will note that the input signature contains a lot of -1s. Our model can handle variable
image sizes, since it has a transformation that resizes the image in the process (the
Resize(224) transformation in the source code). Hence, we can indicate a variable input
size using -1 in height and width . The same applies to the batch size, since we can handle
variable batch sizes too.
Note also the numeric formats. Inputs are uint8 , meaning the inputs are
RGB values in the range 0–255, while outputs are float32 , meaning
the outputs are probabilities between 0 and 1.
The MLModel folder also contains a conda.yaml file. This conda file indicates
how to construct a conda environment with all the required dependencies to
run the model. That means that if we create an environment using this file, we
should be able to run the given model without issues.

This is paramount to ensure reproducibility, and that we are able to run the models in
production in the same way they were constructed by data scientists.
channels:
  - conda-forge
dependencies:
  - python=3.8.12
  - pip
  - pip:
      - mlflow
      - defusedxml==0.7.1
      - fastai==2.4.1
      - google-cloud-storage==1.41.0
      - ipython==8.0.1
name: mlflow-env
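Recreating that environment locally is a sketch of two conda commands (assuming the file above is saved as conda.yaml):

```
conda env create -f conda.yaml
conda activate mlflow-env
```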
How does MLFlow know which dependencies to include in the conda
environment?
One of the parameters of the log_model function is conda_env . Here you have the
chance to indicate what your environment should look like to run the given model.
If not indicated, MLFlow will try to infer the requirements based on the flavor you
are using. Check the MLFlow documentation to see how you can indicate this conda
environment definition if needed.
The python_function flavor gives you two things:

A way to create custom models or custom ways to load a model. If this is your
case, check the next post in this series: Effortless models deployment with MLFlow.

A generic way to specify how to load any model, regardless of its flavor.

That last one is the reason why I preferred to leave this flavor to the end.
Having a generic way to load and execute a model is extremely appealing for
serving!
Here's the deal: when you load a model using its own flavor, the object
loaded into memory is the specific implementation of the model in that
framework. For instance, if we load a sequential TensorFlow model in Keras using
the Keras flavor, the object returned will be an instance of
tensorflow.keras.Sequential . For FastAI, it would be an instance of
fastai.learner . Some of these objects have a method model.predict() , others
model.predict_proba() , others may be invoked as a function like model(inputs) , you
name it.
learner = mlflow.fastai.load_model("models:/cats_vs_dogs/1")
learner.get_preds(...)
Did you notice we are calling get_preds() ? This is a method of a fastai.learner object.
load_model loads the model as it would be loaded using the library itself.
This introduces a problem with serving, because your serving code will have a
different shape depending on the framework of the model you are serving! It would
be better to have a consistent way to load and invoke a model, regardless of its
flavor, right? Well, that's the job of the pyfunc flavor. This flavor loads into
memory a wrapper of the original model, exposing just one function: model.predict() .
You will see later that this is the reason why, when deploying a model using
MLFlow, the flavor used for serving is python_function (or pyfunc ): it
provides a consistent way to invoke the model regardless of its flavor.
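To see why a single entry point matters, here is a purely illustrative sketch of the idea (these classes are made up for the example; this is not MLFlow's actual implementation):

```python
class PyFuncWrapper:
    """Wraps a framework-specific model behind a single predict() method."""

    def __init__(self, raw_model, invoke):
        self._raw_model = raw_model
        self._invoke = invoke  # the framework-specific way of calling the model

    def predict(self, inputs):
        return self._invoke(self._raw_model, inputs)


# Two fake models with different native APIs:
class FastaiLikeLearner:
    def get_preds(self, x):
        return [[0.2, 0.8]]

class KerasLikeModel:
    def __call__(self, x):
        return [[0.9, 0.1]]

# The serving code only ever calls predict(), regardless of the framework:
model_a = PyFuncWrapper(FastaiLikeLearner(), lambda m, x: m.get_preds(x))
model_b = PyFuncWrapper(KerasLikeModel(), lambda m, x: m(x))
```

Both wrapped models now answer to the same predict() call, which is exactly the uniformity a generic inference server needs.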
Serving does two things: it downloads your model from the MLFlow server (remember
that the environment variable MLFLOW_TRACKING_URI identifies where your tracking
server is), and it starts a GUNICORN server that serves your model using the
python_function flavor.
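The command that kicks this off can be sketched like this (the model URI and port follow this post's example; the tracking server URL is illustrative, so adjust both to your setup):

```
export MLFLOW_TRACKING_URI="http://localhost:8080"
mlflow models serve -m "models:/cats_vs_dogs/1" -p 5000
```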
Easy, right? All the complexity of creating the inference server has gone
away! We can now test our model by submitting a request.
The resulting server will be listening on port 5000 on the path /invocations .
Inputs can be submitted in several formats, such as format=pandas-records for
pandas payloads.
In our case, the model receives tensors, so we need to send the request in the TF
Serving format. A sample request will look like this:

import json
(...)
What is this code doing? The last line transforms a single image tensor of size
(IMAGE_HEIGHT, IMAGE_WIDTH, CHANNELS) into a batch of 1 item of size (1, IMAGE_HEIGHT,
IMAGE_WIDTH, CHANNELS) . Remember that our model receives batches of images, not single
ones. We need to comply with the signature.
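A sketch of that step with NumPy (the image here is a dummy array standing in for a real picture; any height and width would do, and the variable names are illustrative):

```python
import json

import numpy as np

# Dummy 224x224 RGB image standing in for a real photo of a cat or a dog
sample_img_arr = np.zeros((224, 224, 3), dtype=np.uint8)

# Add the batch dimension: (224, 224, 3) -> (1, 224, 224, 3)
batch = np.expand_dims(sample_img_arr, axis=0)

# TF Serving style request body for the /invocations endpoint
payload = json.dumps({"inputs": batch.tolist()})
```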
Check the original notebook for more details about how this is actually done.
Finally, we can invoke the model like this from a terminal (note that this is bash; if you
are running Windows, you can make the invocation using Postman).
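Assuming the server from before is running locally and the request body was saved to a file (the file name is illustrative), the call could look like this:

```
curl http://localhost:5000/invocations \
  -H "Content-Type: application/json" \
  -d @payload.json
```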
Deployment targets include managed platforms such as Azure Machine Learning and
AWS SageMaker. You have an example of how to do this on Azure Machine Learning in the
sample notebook I shared before.
You can also package your model into a container image and then deploy it wherever
you want. The command to achieve that is pretty simple:
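A sketch of the command, assuming the registered model from earlier (check mlflow models build-docker --help for the exact flags in your MLFlow version):

```
mlflow models build-docker -m "models:/cats_vs_dogs/1" -n cats_vs_dogs_classifier
```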
This will create a Docker image called cats_vs_dogs_classifier that you can port to
your deployment target of preference. Now it starts to look appealing, doesn't it? All
those deployment options for free.
Conclusion
The idea of this post was to show how, just by adopting an open-source model
specification format inside of MLFlow, we gain a lot of deployment options
for free, avoid vendor lock-in, and unlock the potential of our models. In the
next post in this series, I will show you how to take this to the next level with
custom inference routines, since so far your model outputs predictions the way the
framework of choice does by default. The MLModel format has ways to change that.