
The Machine Learning Lifecycle in 2021

How do you actually complete a machine learning project and
what are some tools that can help each step of the way?
by Eric Hofesmann | Towards Data Science

Photo by Tolga Ulkan on Unsplash

Everyone and their mother is getting into machine learning (ML) in this day and age. It seems that every company that is collecting data is trying to figure out some way to use AI and ML to analyze their business and provide automated solutions.
The machine learning market is expected to reach $117 billion by 2027 — Fortune Business Insights

This influx of popularity in ML is leading to a lot of newcomers without a formal background entering the space. While it's great that more people are getting excited and learning about this field, it needs to be clear that getting an ML project into a production setting is not an easy task.

Image from the 2020 State of Enterprise ML by Algorithmia, based on 750 businesses
55% of businesses working on ML models have
yet to get them into production — Algorithmia

Many people seem to be under the assumption that an ML project is fairly straightforward if you have the data and computing resources necessary to train a model. They could not be more wrong. This assumption tends to lead to significant time and monetary costs without a model ever being deployed.

Naive assumption of the ML lifecycle (Image by author)

In this article, we'll discuss what the lifecycle of an ML project actually looks like and some tools to help tackle it.

The Machine Learning Lifecycle

In reality, machine learning projects are not straightforward; they are a cycle of iterating between improving the data, the model, and the evaluation, and that cycle is never really finished. This cycle is crucial in developing an ML model because it focuses on using model results and evaluation to refine your dataset. A high-quality dataset is the most surefire way to train a high-quality model. The speed at which you iterate through this cycle is what determines your costs; luckily, there are tools that can help speed it up without sacrificing quality.

A realistic example of ML lifecycle (Image by author)

Much like any system, even a deployed ML model requires monitoring, maintenance, and updates. You can't just deploy an ML model and forget about it, expecting it to work as well in the real world as it did on your test set for the rest of time. ML models deployed in production environments will need updates as you find biases in the model, add new sources of data, require additional functionality, and so on. This brings you right back into the data, model, and evaluation cycle.

As of 2021, deep learning has been prominent for over a decade now
and helped bring ML front and center in the market. The ML industry
has undergone a boom with countless products being developed to aid
in the creation of ML models. Every step of the ML lifecycle has some
tool that you can use to expedite the process and not end up as one of
the companies with an ML project that never sees the light of day.

The next sections will deep dive into each phase of the ML lifecycle and
highlight popular tools.

Phase 1: Data
Data in the ML lifecycle (Image by author)

While the end goal is a high-quality model, the lifeblood of training a good model lies in the amount and, more importantly, the quality of the data being passed into it.

The primary data related steps in the ML lifecycle are:

Data Collection — Collect as much raw data as possible, regardless of quality. In the end, only a small subset of it will be annotated anyway, which is where most of the cost comes from. It is useful to have a lot of data available to add as needed when problems arise with model performance. A short sketch of pulling in a public dataset follows the link below.

• List of public datasets
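
For illustration, here is a minimal sketch of pulling in a public dataset as a starting point for data collection. It uses torchvision's CIFAR-10 purely as a stand-in; the dataset, path, and library choice are assumptions, so swap in whatever public source is closest to your task.

```python
# Hypothetical example: grab a public dataset as cheap raw data.
# CIFAR-10 via torchvision is just a stand-in for your own data source.
from torchvision import datasets

raw_train = datasets.CIFAR10(root="./data", train=True, download=True)
print(f"{len(raw_train)} raw images available before any annotation work")
```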


Define your annotation schema — This is one of the most
important parts of the data phase of the lifecycle, and it often gets
overlooked. A poorly constructed annotation schema will result in
ambiguous classes and edge cases that make it much more difficult to
train a model.

For example, the performance of object detection models depends heavily on attributes like size, localization, orientation, and truncation. So including attributes like object size, density, and occlusion during annotation can provide the critical metadata needed to create high-quality training datasets that models can learn from. The sketch after the tool list below shows one way to inspect these properties before locking in a schema.

• Matplotlib, Plotly — Plot properties of your data

• Tableau — Analytics platform to better understand your data
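
As a concrete example of inspecting these properties, here is a hedged sketch of plotting the object-size distribution of a raw label dump with Matplotlib. The `annotations` list and its `label`/`bbox_area` fields are hypothetical placeholders for however your raw labels are stored.

```python
# Hypothetical annotation records; replace with your own raw labels.
import matplotlib.pyplot as plt

annotations = [
    {"label": "car", "bbox_area": 0.12},
    {"label": "wheel", "bbox_area": 0.01},
    {"label": "car", "bbox_area": 0.34},
    # ... the rest of your raw annotations
]

# Histogram of relative object sizes, useful when deciding whether the
# schema should capture attributes like "small", "occluded", or "truncated"
areas = [a["bbox_area"] for a in annotations]
plt.hist(areas, bins=50)
plt.xlabel("Relative bounding box area")
plt.ylabel("Count")
plt.title("Object size distribution")
plt.show()
```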

Data Annotation — Annotation is a tedious process of performing the same task over and over for hours at a time, which is why annotation services are a booming business. The result is that annotators will likely make numerous mistakes. While most annotation firms guarantee a maximum error percentage (e.g. 2% max error), a larger problem is a poorly defined annotation schema that leads annotators to label samples differently. This is harder for the QA team of an annotation firm to spot and is something that you need to check yourself; a small sketch after the tool list below shows one way to spot-check annotator agreement.

• Scale, Labelbox, Prodigy — Popular annotation services


• Mechanical Turk — Crowdsourced annotation

• CVAT — DIY computer vision annotation

• Doccano — NLP specific annotation tool

• Centaur Labs — Medical data labeling service
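
One way to run that check yourself is to compare samples that were labeled by more than one annotator. The sketch below is a minimal, hypothetical version of that idea; the `labels_a`/`labels_b` dicts stand in for exports from your annotation tool.

```python
# Hypothetical double-labeled samples from two annotators (sample_id -> class)
labels_a = {"img_001": "car", "img_002": "truck", "img_003": "car"}
labels_b = {"img_001": "car", "img_002": "van", "img_003": "car"}

# Disagreement rate on the overlap; a high rate usually points to an
# ambiguous annotation schema rather than careless annotators
shared = sorted(set(labels_a) & set(labels_b))
disagreements = [s for s in shared if labels_a[s] != labels_b[s]]
print(f"{len(disagreements)}/{len(shared)} double-labeled samples disagree: {disagreements}")
```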

Improve dataset and annotations — You will likely spend the majority of your time here when trying to improve model performance. If your model is learning but not performing well, the culprit is almost always a training dataset containing biases and mistakes that create a performance ceiling for your model. Improving your model generally involves things like hard sample mining (adding new training data similar to other samples the model failed on), rebalancing your dataset based on biases your model has learned, and updating your annotations and schema to add new labels and refine existing ones. A simple hard-sample-mining sketch follows the tool list below.

• DAGsHub — Dataset versioning

• FiftyOne — Visualize datasets and find mistakes
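
As a rough illustration of hard sample mining, the sketch below flags samples where the model's confidence is low so that similar raw data can be pulled in and annotated. The `predictions` list and the 0.6 threshold are assumptions for illustration.

```python
# Hypothetical (sample_id, top_confidence) pairs from a model run over raw data
predictions = [("img_001", 0.97), ("img_002", 0.41), ("img_003", 0.55)]

CONFIDENCE_THRESHOLD = 0.6  # assumed cutoff; tune for your model
hard_samples = [sample for sample, conf in predictions if conf < CONFIDENCE_THRESHOLD]
print("Candidates for annotation / similar-data mining:", hard_samples)
```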

Phase 2: Model
Models in the ML lifecycle (Image by author)

Even though the output of this process is a model, you will ideally
spend the least amount of time in this loop.

In industry, more time is spent on datasets than on models. Credit to Andrej Karpathy (source, original talk)
Explore existing pretrained models — The goal here is to reuse as many available resources as possible to give yourself the best head start toward a production model. Transfer learning is a core tenet of deep learning in this day and age. You will likely not be creating a model from scratch, but instead fine-tuning an existing model that was pretrained on a related task. For example, if you want to create a mask detection model, you will likely download a pretrained face detection model from GitHub, since that is a more popular topic with more prior work. A sketch of this kind of fine-tuning setup follows the tool list below.

• FiftyOne model zoo — Download and run models in one line of code

• TensorFlow Hub — Repository of trained ML models

• modelzoo.co — Pretrained deep learning models for various tasks and libraries
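
To make the fine-tuning idea concrete, here is a minimal sketch of loading a pretrained torchvision backbone and swapping its classification head for your own classes, assuming a recent torchvision release; the two-class setup is an illustrative assumption.

```python
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 2  # e.g., mask / no-mask; an assumption for illustration

# Load an ImageNet-pretrained backbone and replace the classification head
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)

# Optionally freeze the backbone at first and train only the new head
for name, param in model.named_parameters():
    if not name.startswith("fc"):
        param.requires_grad = False
```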

Construct training loop — Your data will likely differ in some way from what was used to pretrain the model. For image datasets, things like input resolution and object sizes need to be taken into account when setting up the training pipeline for your model. You will also need to modify the output structure of the model to match the classes and structure of your labels. PyTorch Lightning provides an easy way to scale up model training with limited code; a minimal sketch follows the tool list below.

• Scikit Learn — Build and visualize classic ML systems


• PyTorch, PyTorch Lightning, TensorFlow, TRAX — Popular
deep learning Python libraries

• Sagemaker — Build and train ML systems in the Sagemaker IDE
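
Below is a hedged sketch of what that training loop might look like in PyTorch Lightning; the optimizer, learning rate, and the commented-out `train_dataloader` are assumptions you would replace with your own setup.

```python
import pytorch_lightning as pl
import torch
import torch.nn.functional as F

class LitClassifier(pl.LightningModule):
    """Minimal Lightning wrapper around a (possibly pretrained) classifier."""

    def __init__(self, model, lr=1e-4):
        super().__init__()
        self.model = model
        self.lr = lr

    def training_step(self, batch, batch_idx):
        images, labels = batch
        loss = F.cross_entropy(self.model(images), labels)
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.model.parameters(), lr=self.lr)

# Hypothetical usage with a fine-tuned backbone and your own dataloader:
# trainer = pl.Trainer(max_epochs=10)
# trainer.fit(LitClassifier(model), train_dataloader)
```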

Experiment Tracking — This entire cycle will likely require multiple iterations. You will end up training a lot of different models, so meticulously tracking each version of a model along with the hyperparameters and data it was trained on will go a long way toward keeping things organized. A minimal tracking sketch follows the tool list below.

• Tensorboard, Weights & Biases, MLFlow — Visualize and track model hyperparameters
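
A minimal tracking sketch with MLflow is shown below; the run name, parameters, and metric values are illustrative assumptions, not outputs from a real experiment.

```python
import mlflow

# Log the knobs and results of one training run so it can be compared later
with mlflow.start_run(run_name="resnet50-finetune-v1"):
    mlflow.log_param("learning_rate", 1e-4)
    mlflow.log_param("train_dataset_version", "v2.3")
    mlflow.log_metric("val_mAP", 0.412)  # placeholder value
```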

Side Note: Even if you think your task is completely unique, there are some pretraining techniques worth considering. I would recommend looking into ways to pretrain your model in unsupervised or semi-supervised ways, still using only a small subset of your total raw data for fine-tuning. Depending on your task, you could also look into synthetic data for pretraining. The goal is simply to get a model that has learned a good representation of your data, so that your fine-tuning dataset only needs to train a few layers' worth of model parameters.

Phase 3: Evaluation
Evaluation in the ML lifecycle (Image by author)

Once you've managed to get a model that has learned your training data, it's time to dig in and see how well it performs on new data.

The key steps for evaluating an ML model:

Visualize model outputs — As soon as you have a trained model, you need to immediately run it on a few samples and look at the output. This is the best way to find out whether there are any bugs in your training/evaluation pipeline before running evaluation on your entire test set. It will also show whether there are any glaring errors, like two of your classes being mislabeled. A short custom-visualization sketch follows the tool list below.
• OpenCV, Numpy, Matplotlib — Write custom visualization
scripts

• FiftyOne — Visualize outputs of computer vision tasks on images and videos
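
Here is a short sketch of the custom-script route with OpenCV: draw a detector's predicted boxes on an image and save the result. The image path and the `boxes` list are hypothetical stand-ins for your own samples and model output.

```python
import cv2

image = cv2.imread("sample.jpg")  # hypothetical path to a test image
boxes = [(50, 40, 200, 180, "wheel", 0.91)]  # hypothetical (x1, y1, x2, y2, label, conf)

# Draw each predicted box and its label/confidence on the image
for x1, y1, x2, y2, label, conf in boxes:
    cv2.rectangle(image, (x1, y1), (x2, y2), (0, 255, 0), 2)
    cv2.putText(image, f"{label} {conf:.2f}", (x1, y1 - 5),
                cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)

cv2.imwrite("sample_with_predictions.jpg", image)
```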

Choose the right metric — Coming up with one or a few metrics can help in comparing the overall performance of models. In order to make sure you pick the best models for your task, you should develop metrics in line with your end goal. You should also update metrics as you find other important qualities you want to track. For example, if you want to start tracking how well your object detection model performs on small objects, use mAP on objects with a bounding box < 0.05 as one of your metrics; a sketch of such a size-filtered metric follows the tool list below.

While these aggregate dataset-wide metrics are useful for comparing the performance of multiple models, they rarely help in understanding how to improve the performance of a model.

• Scikit Learn — Provides common metrics

• Python, Numpy — Develop custom metrics
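
As a sketch of such a task-specific metric, the snippet below restricts evaluation to small objects before computing your detection metric of choice. It assumes normalized box coordinates and treats the 0.05 threshold as a fraction of image area; both are assumptions for illustration.

```python
SMALL_AREA_THRESHOLD = 0.05  # assumed: fraction of image area

def box_area(box):
    # box is (x1, y1, x2, y2) with coordinates normalized to [0, 1]
    x1, y1, x2, y2 = box
    return (x2 - x1) * (y2 - y1)

# Hypothetical ground truth; predictions would be filtered the same way
ground_truth = [
    {"box": (0.10, 0.10, 0.15, 0.14), "label": "wheel"},
    {"box": (0.20, 0.30, 0.80, 0.90), "label": "car"},
]
small_gt = [g for g in ground_truth if box_area(g["box"]) < SMALL_AREA_THRESHOLD]

# ...then run your usual mAP computation on `small_gt` and the matching
# predictions to track small-object performance as its own metric
print(f"{len(small_gt)} of {len(ground_truth)} ground truth objects are small")
```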

Look at failure cases — Everything your model does is based on the data it was trained on. So, assuming it is able to learn something, if it is performing more poorly than you would expect, you need to take a look at the data. It can be useful to look at cases where your model is doing well, but it is vital to look at the false positives and false negatives, where your model predicted something incorrectly. After looking through enough of these samples, you will start to see patterns of failure in your model.

For example, the image below shows a sample from the Open Images dataset in which a false positive is flagged on the back wheel. This false positive turns out to have been a missing annotation. Verifying all wheel annotations in the dataset and fixing other similar mistakes can help improve your model's performance on wheels. A sketch of surfacing such high-confidence false positives follows the tool list below.

Image credit to Tyler Ganter (source)


• FiftyOne, Aquarium, Scale Nucleus — Dataset debugging to
find mistakes
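
For reference, here is a hedged sketch of surfacing high-confidence false positives with FiftyOne. It uses the zoo's "quickstart" dataset because it already ships with "ground_truth" and "predictions" fields; with your own data you would load your dataset and field names instead.

```python
import fiftyone as fo
import fiftyone.zoo as foz
from fiftyone import ViewField as F

# Small sample dataset that already contains ground truth and predictions
dataset = foz.load_zoo_dataset("quickstart")

# Match predictions to ground truth and tag each detection as tp/fp/fn
dataset.evaluate_detections("predictions", gt_field="ground_truth", eval_key="eval")

# High-confidence false positives are often missing or incorrect annotations
hard_fps = dataset.filter_labels(
    "predictions", (F("eval") == "fp") & (F("confidence") > 0.8)
)
session = fo.launch_app(hard_fps)
```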

Formulate solutions — Identifying failure cases is the first step in figuring out ways to improve your model's performance. In most cases, it goes back to adding training data similar to the samples your model failed on, but it can also include things like changing pre- or post-processing steps in your pipeline or fixing annotations. No matter what the solution is, you can only fix the problems with your model by understanding where it fails.

Phase 4: Production
Deploying a model (Image by author)

Finally! You've got a model that performs well on your evaluation metrics with no major errors on various edge cases.

Now you’ll need to:

Monitor model — Test your deployment to ensure that your model is still performing as expected on test data with respect to your evaluation metrics and things like inference speed. A minimal latency-check sketch follows the tool list below.
• Pachyderm, Algorithmia, Datarobot, Kubeflow, MLFlow —
Deploy and monitor models and pipelines

• Amazon Web Services, Google AutoML, Microsoft Azure — Cloud-based solutions for ML models
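
One small piece of that monitoring, sketched below, is checking inference latency against a budget; the wrapper, the model call, and the 25 ms budget are assumptions for illustration rather than part of any particular deployment stack.

```python
import time

LATENCY_BUDGET_S = 0.025  # assumed 25 ms budget per batch

def timed_inference(model, batch):
    """Run the model and warn when a request blows the latency budget."""
    start = time.perf_counter()
    outputs = model(batch)
    latency = time.perf_counter() - start
    if latency > LATENCY_BUDGET_S:
        print(f"WARNING: inference took {latency * 1000:.1f} ms "
              f"(budget {LATENCY_BUDGET_S * 1000:.0f} ms)")
    return outputs
```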

Evaluate new data — Using a model in production means you will frequently pass brand new data through the model that it has never been tested on. It's important to perform evaluation and dig into specific samples to see how your model performs on any new data it encounters.

Continue understanding your model — Some errors and biases in your model can be deep-seated and take a long time to uncover. You need to continuously test and probe your model for various edge cases and trends that could cause problems if they were discovered by clients instead.

Expand capabilities — Even if everything is working perfectly, it's possible that the model isn't increasing profits as much as you had hoped. From adding new classes and developing new data streams to making the model more efficient, there are countless ways to expand the capabilities of your current model to make it even better. Any time you want to improve your system, you will need to restart the ML lifecycle to update your data, update your model, and evaluate it all again to make sure your new features work as expected.

FiftyOne

The above is pretty general and unbiased, but I want to tell you a little
bit more about the tool I’ve been working on.

Lots of tools exist for various portions of the ML lifecycle. However, there is a pretty glaring lack of tools that help with some of the key points I've stressed in this post. Things like visualizing complex data (like images or video) and their labels, or writing queries to find specific cases where your model performs poorly, are generally done through manual scripting.

I have been working at Voxel51 developing FiftyOne, an open-source data visualization tool designed to help debug datasets and models and fill this void. FiftyOne lets you visualize your image and video datasets and model predictions in a GUI, either locally or remotely. It also provides powerful capabilities to evaluate models and write advanced queries for any aspect of your dataset or model output.

FiftyOne can run in notebooks, so try it out in your browser with this Colab Notebook. Alternatively, you can easily install it with pip:
pip install fiftyone
Sample from object detection model and dataset in FiftyOne (Image by
author)
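
Once installed, a minimal sketch of pointing FiftyOne at a directory of your own images and opening the app looks roughly like this; the directory path is a placeholder.

```python
import fiftyone as fo

# Build a dataset from a local folder of images (path is a placeholder)
dataset = fo.Dataset.from_dir(
    dataset_dir="/path/to/images",
    dataset_type=fo.types.ImageDirectory,
)

# Launch the app to browse samples, labels, and (later) model predictions
session = fo.launch_app(dataset)
```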

Summary

Only a fraction of the companies that try to incorporate machine learning (ML) into their business manage to actually deploy a model to production. The lifecycle of an ML model is not straightforward; it requires continuous iteration between data and annotation improvements, model and training pipeline construction, and sample-level evaluation. If you know what you're in for, this cycle can eventually lead to a production-ready model, but that model will also need to be maintained and updated over time. Luckily, there are countless tools developed to aid in every step of this process.
