Photo by Fotis Fotopoulos on Unsplash
https://towardsdatascience.com/8-code-snippets-to-quickly-get-started-with-mlflow-tracking-63064d99d3ff 1/24
2/2/23, 11:54 AM 8 Code Snippets To Quickly Get Started With MLflow Tracking | by Ahmed Besbes | Towards Data Science
I use it to keep track of machine learning experiments, push and version models to
a registry, and easily collaborate with colleagues on the same projects.
After using this tool intensively for more than a year, I came to know its ins and
outs. As a recap of that experience, this post consolidates 8 useful code snippets
I regularly use.
Feel free to skip the ones you know and go over those you’re less familiar with.
The one we’ll be interested in today is called MLflow Tracking: broadly speaking,
you can view it as a Git repository for models and machine learning projects.
It allows you to track parameters, metrics, and files (also called artifacts) in a
central location, namely, a remote server.
MLflow Tracking is organized into experiments and each experiment is split into
runs (and that’s all you need to know for the rest of this post).
With that in mind, let’s now move to the code snippets that will hopefully get you
productive.
PS*: Before running the following code snippets, you should create an MLflow experiment
and start a UI server. If you don’t know how to do that, you can check my previous post.
After creating an experiment on MLflow, logging data would probably be your first
interaction with this tool.
To log some parameters and metrics, you’ll first need to start a run, and inside its
context, call the log_param and log_metric methods.
These two methods take a key and a value as first and second arguments.
Here’s an example:
import mlflow

experiment_id = "some_experiment_id"

with mlflow.start_run(experiment_id=experiment_id) as run:
    # Parameters: one key and one value per call
    mlflow.log_param("lr", 0.01)
    mlflow.log_param("dropout", 0.25)
    mlflow.log_param("optimizer", "Adam")
    mlflow.log_param("n_layers", 5)

    # Metrics: numeric values only
    mlflow.log_metric("precision", 0.76)
    mlflow.log_metric("recall", 0.92)
    mlflow.log_metric("f1", 0.83)
    mlflow.log_metric("coverage", 0.76)
Alternatively, you can also use the log_params and log_metrics methods. In that
case, you’ll have to pass a dictionary of parameters or metrics.
👉 The run object declared after the with statement allows you to access the
information of the current run.
>>> run.to_dictionary()

{'info': {'artifact_uri': "XXXX",
          'end_time': None,
          'experiment_id': 'XXXX',
          'lifecycle_stage': 'active',
          'run_id': '5aa1f947312a44c68c844bc4034497d7',
          'run_uuid': '5aa1f947312a44c68c844bc4034497d7',
          'start_time': 1644579211050,
          'status': 'RUNNING',
          'user_id': ''},
 'data': {'metrics': {},
          'params': {},
          'tags': {'mlflow.source.name': "XXXX",
                   'mlflow.source.type': 'LOCAL',
                   'mlflow.user': "XXXX"}}}
Be careful ⚠️
Use metrics when it makes sense for your data or if you want to sort your runs based
on their values. For example, if you want to sort the runs by a decreasing number of
layers, n_layers should be a metric and not a parameter.
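One likely reason, sketched here in plain Python: MLflow stores parameter values as strings, and strings are ordered character by character rather than numerically. The layer counts below are made-up values, and the sketch assumes that sorting on a parameter compares those stored strings.

```python
# Hypothetical n_layers values: as strings (how params are stored) and as numbers (metrics).
n_layers_as_params = ["5", "12", "20", "3"]
n_layers_as_metrics = [5.0, 12.0, 20.0, 3.0]

# Descending sort on strings compares character by character.
sorted_params = sorted(n_layers_as_params, reverse=True)

# Descending sort on numbers gives the order you actually want.
sorted_metrics = sorted(n_layers_as_metrics, reverse=True)

print(sorted_params)   # ['5', '3', '20', '12']
print(sorted_metrics)  # [20.0, 12.0, 5.0, 3.0]
```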
Set the nested argument to True when creating the nested run.
Here’s what happens visually on the MLflow UI: runs that have nested runs inside
them can be collapsed.
👉 It’s worth noting that the two runs — the parent and the nested one — have two
different run ids.
data : this encapsulates a RunData object that contains the metrics, params, and
tags
>>> run.data.metrics
{'coverage': 0.76, 'f1': 0.83, 'precision': 0.76, 'recall': 0.92}
>>> run.data.params
{'dropout': '0.25', 'lr': '0.01', 'n_layers': '5', 'optimizer': 'Adam'}
info : this encapsulates a RunInfo object that contains additional run metadata
such as start and end time, run_id, experiment id, and artifact URI.
>>> run.info.run_id
'5aa1f947312a44c68c844bc4034497d7'
>>> run.info.experiment_id
'418327'
Let’s first create 50 fake runs that have random values of metrics and parameters.
import random

def generate_random_params():
    lr = random.random()
    dropout = random.random()
    optimizer = random.choice(["sgd", "adam", "adamw", "rmsprop"])
    n_layers = random.randint(1, 20)

    return {
        "lr": lr,
        "dropout": dropout,
        "optimizer": optimizer,
        "n_layers": n_layers,
    }

def generate_random_metrics():
    precision = random.random()
    recall = random.random()
    # F1 is the harmonic mean of precision and recall
    f1 = (2 * precision * recall) / (precision + recall)
    coverage = random.random()

    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "coverage": coverage,
    }


# Create 50 runs, each with its own random parameters and metrics
for _ in range(50):
    params = generate_random_params()
    metrics = generate_random_metrics()

    with mlflow.start_run(experiment_id=experiment_id):
        mlflow.log_params(params)
        mlflow.log_metrics(metrics)
If you want to search these runs based on a specific filter, you can directly do it
from the interface.
For example, if you want to filter the runs that use SGD as an optimizer, you’ll have
to enter this query in the search box:
params.optimizer="sgd"
If you want to do this programmatically, you’ll have to use the search_runs method
and pass the search query to the filter_string argument.
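from mlflow.tracking import MlflowClient

client = MlflowClient()

# experiment_ids expects a list of experiment ids
runs_with_sgd = client.search_runs(
    experiment_ids=[experiment_id],
    filter_string="params.optimizer='sgd'",
)

>>> len(runs_with_sgd)
15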
filtered_runs = client.search_runs(
    experiment_ids=[experiment_id],
    filter_string="metrics.precision > 0.5 and metrics.coverage > 0.6 and params.optimizer='adam
)
5 — Upload artifacts
Besides parameters and metrics, a run may also contain artifacts such as CSV files,
binary objects, HTML pages, images, etc.
To upload an artifact, use log_artifact. As its first argument, this method takes the
artifact path on the local filesystem.
Once an artifact is logged, you can click on the run from the UI and check if the files
have correctly been uploaded.
If we want to provide a destination folder to write the artifact to, we can set it in the
artifact_path argument.
If we check the UI again, we’ll see that the stats_comparison.csv file is now inside
the stats folder.
We can also use the log_artifacts method to upload the full content of a directory.
with mlflow.start_run(experiment_id=experiment_id):
    params = generate_random_params()
    metrics = generate_random_metrics()

    mlflow.log_params(params)
    mlflow.log_metrics(metrics)

    # Upload everything inside ./images/ to the run's images/ folder
    mlflow.log_artifacts("./images/", artifact_path="images/")
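→ In summary: log_artifact uploads a single file, log_artifacts uploads the full
content of a directory, and in both cases artifact_path sets the destination folder.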
6 — Download artifacts
As expected, downloading the run’s artifacts is also possible: you can do it by calling
the download_artifacts method.
Let's use the previous run in which we logged the images as artifacts and let’s create
a downloads folder locally to download them.
run_id = "78a0e1927ac5473eb79125ed7d6ebee6"

client.download_artifacts(run_id=run_id, path=".", dst_path="./downloads/")
path is the relative source path to the desired artifact in the MLflow tracking
server.
If we set path to "." and dst_path to "downloads", everything that’s been logged in the
run will be downloaded to the downloads folder.
But you can also set the path argument to any desired artifact path of the run. You
don’t have to download everything.
Let’s generate some random metrics and use them to override the previous ones. The
syntax is the usual one; the only difference is that you set the run_id argument
instead of experiment_id.
run_id = "f0a285ab628245a79f417ab0706b9a99"

# Passing run_id reopens an existing run instead of creating a new one
with mlflow.start_run(run_id=run_id):
    random_metrics = generate_random_metrics()
    mlflow.log_metrics(random_metrics)
If we check the same run again on the UI, we’ll see that metrics have been updated.
run_id = "f0a285ab628245a79f417ab0706b9a99"

with mlflow.start_run(run_id=run_id):
    random_params = generate_random_params()
    mlflow.log_params(random_params)
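Updating parameters the same way doesn’t work, though: MLflow treats a logged
parameter as immutable and raises an exception when you log a different value for
an existing key. I admit that updating the parameters of a run would be useful in
many situations. If you come to know a hacky way to do it, please let me know in
the comments.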
This is very useful when your inference pipeline is not standard. This may happen
when your model needs to include external artifacts while predicting, or when it
needs to send multiple outputs or perform post-processing of some sort.
Using a custom model can be valuable when you also need to integrate some
business logic in the prediction pipeline: this is where you get creative.
This model will use a trained random forest as an artifact. It will customize the
inference by adding some data validation and it will return multiple outputs in a
dictionary.
This is just an example but it should give you some inspiration to develop more
complex pipelines.
To start with, let’s train a random forest model. For the sake of simplicity, we’re
going to use the Iris dataset.
Once the model is trained and evaluated with cross-validation, we’ll save it using
the joblib library.
import os

import joblib
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

iris_data = load_iris()

features = iris_data["data"]
targets = iris_data["target"]

random_forest = RandomForestClassifier()

# Evaluate with 5-fold cross-validation
scores = cross_val_score(
    estimator=random_forest,
    X=features,
    y=targets,
    cv=5,
)

np.mean(scores)
# 0.96

# Fit on the full dataset before saving
random_forest.fit(features, targets)

# joblib.dump does not create missing folders
os.makedirs("./models", exist_ok=True)
joblib.dump(random_forest, "./models/random_forest.joblib")
We prepare a dictionary that lists all the artifacts that will be uploaded with the
model.
This dictionary will reference the local path for each artifact and MLflow will be
later responsible for uploading it.
artifacts = {
    "random_forest": "./models/random_forest.joblib",
}
2. predict receives the context and the inputs arguments. The inputs argument is
a dictionary.
This is where you get creative and customize the inference logic.
In the example below, the predict method extracts the input features, validates
their shapes, and passes them to the pre-loaded model artifact.
It then extracts multiple predictions such as probability scores and predicted labels,
and packages everything in a dictionary with a success message.
import numpy as np  # used in predict, so it has to be imported at module level


class AwesomeModel(mlflow.pyfunc.PythonModel):
    def load_context(self, context):
        import joblib

        # Load the random forest that was uploaded with the model
        self.random_forest = joblib.load(context.artifacts["random_forest"])
        self.target_names = ["setosa", "versicolor", "virginica"]

    def predict(self, context, inputs):
        features = inputs["features"]

        # A flat list is treated as a single sample
        if isinstance(features, list):
            features = np.array(features).reshape(1, -1)

        # Arrays must be 2D with the 4 Iris features
        elif isinstance(features, np.ndarray):
            if (features.ndim != 2) or (features.shape[1] != 4):
                return {
                    "message": "The number of features is incorrect",
                    "outputs": None,
                    "prediction_labels": None,
                    "prediction_probas": None,
                }

        predictions = self.random_forest.predict(features)
        # Map each predicted class index to its name
        prediction_labels = [
            self.target_names[prediction] for prediction in predictions
        ]

        prediction_probas = self.random_forest.predict_proba(features)

        return {
            "message": "success",
            "outputs": predictions,
            "prediction_labels": prediction_labels,
            "prediction_probas": prediction_probas,
        }
awesome_model = AwesomeModel()

with mlflow.start_run(experiment_id=experiment_id):
    mlflow.pyfunc.log_model(
        artifact_path="awesome_model",
        python_model=awesome_model,
        artifacts=artifacts,
    )
Note how MLflow automatically added an artifacts folder that contains the
pretrained random forest model.
Now given the run id, you can load the model locally and perform inference.
Everything is packaged into this model.
Cool right?
Resources:
Here’s a list of material that you can go through to learn more about MLflow:
https://towardsdatascience.com/5-tips-for-mlflow-experiment-tracking-c70ae117b03f
https://towardsdatascience.com/how-to-use-mlflow-on-aws-to-better-track-machine-learning-experiments-bbcb8acded65
https://www.mlflow.org/docs/latest/tutorials-and-examples/tutorial.html
If you know other MLflow tricks, please let me know about them in the comments.