The Machine Learning Lifecycle in 2021
How do you actually complete a machine learning project and
what are some tools that can help each step of the way?
The Machine Learning Lifecycle in 2021 | by Eric Hofesmann | Towards Data Science
As of 2021, deep learning has been prominent for over a decade and has
helped bring ML front and center in the market. The ML industry has
boomed, with countless products developed to aid in the creation of ML
models. Every step of the ML lifecycle has tools that you can use to
speed up the process and avoid becoming one of the companies whose ML
project never sees the light of day.
The next sections dive into each phase of the ML lifecycle and
highlight popular tools.
Phase 1: Data
Data in the ML lifecycle (Image by author)
Phase 2: Model
Models in the ML lifecycle (Image by author)
Even though the output of this process is a model, you will ideally
spend the least amount of time in this loop.
Construct training loop — Your data will likely differ in some way
from what was used to pretrain the model. For image datasets, things
like input resolution and object sizes need to be taken into account
when setting up the training pipeline for your model. You will also
need to modify the output structure of the model to match the classes
and structure of your labels. PyTorch Lightning provides an easy way to
scale up model training with limited code.
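To make the output-structure step concrete, here is a minimal sketch in plain PyTorch (not Lightning). Everything named in it is an assumption for illustration: the tiny `backbone` stands in for a real pretrained network, `NUM_CLASSES` is a made-up class count, and the random tensors stand in for your images and labels. It swaps the original classification head for one that matches your labels, freezes the pretrained layers, and runs a single training step.

```python
import torch
from torch import nn

NUM_CLASSES = 5  # hypothetical: number of classes in your own labels

# Stand-in for a pretrained network: a feature extractor plus a 1000-way head.
backbone = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),  # pooling to 1x1 tolerates different input resolutions
    nn.Flatten(),
)
model = nn.Sequential(backbone, nn.Linear(8, 1000))

# Freeze the pretrained layers so only the new head is trained.
for param in backbone.parameters():
    param.requires_grad = False

# Replace the head so the output matches the structure of your labels.
model[1] = nn.Linear(8, NUM_CLASSES)

optimizer = torch.optim.SGD(
    [p for p in model.parameters() if p.requires_grad], lr=0.01
)
loss_fn = nn.CrossEntropyLoss()


def train_step(images, labels):
    """Run one optimization step and return the loss as a float."""
    model.train()
    optimizer.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()


# Random data standing in for a batch of 64x64 RGB images and their labels.
images = torch.randn(4, 3, 64, 64)
labels = torch.randint(0, NUM_CLASSES, (4,))
loss = train_step(images, labels)
```

In a real project the backbone would come from a model zoo and the loop would iterate over a data loader, but the head swap and the frozen parameters are the parts that change from task to task.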
Side Note: Even if you think your task is completely unique, there are
some pre-training techniques to consider. I would recommend looking
into ways to pretrain your model in an unsupervised or semi-supervised
fashion, while still using only a small subset of your total raw data
for fine-tuning. Depending on your task, you could also look into
synthetic data to pretrain your model. The goal is to get a model that
has learned a good representation of your data, so that your
fine-tuning dataset is only needed to train a few layers' worth of
model parameters.
Phase 3: Evaluation
Evaluation in the ML lifecycle (Image by author)
Once you have managed to get a model that has learned your training
data, it’s time to dig in and see how well it performs on new data.
Choose the right metric — Coming up with one or a few metrics can
help in comparing the overall performance of models. In order to make
sure you pick the best models for your task, you should develop metrics
in line with your end goal. You should also update metrics as you find
other important qualities you want to track. For example, if you want
to start tracking how well your object detection model performs on
small objects, add mAP restricted to objects whose bounding box is
smaller than a threshold such as 0.05 as one of your metrics.
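As a rough illustration of such a metric, the sketch below computes average precision for a single class, then repeats it on small objects only. Several things here are my assumptions rather than anything fixed by the text above: boxes are `(x, y, width, height)` in coordinates relative to the image, "small" means a relative area below 0.05, AP is the simple non-interpolated variant, and both annotations and predictions are filtered by size (benchmark implementations such as COCO handle out-of-range boxes more carefully). mAP would be the mean of this value over all classes.

```python
def box_area(box):
    """Area of an (x, y, w, h) box in relative image coordinates."""
    _, _, w, h = box
    return w * h


def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def average_precision(gt_boxes, predictions, iou_thresh=0.5):
    """Non-interpolated AP for one class.

    gt_boxes: dict mapping image id -> list of boxes
    predictions: list of (image id, confidence, box)
    """
    total_gt = sum(len(boxes) for boxes in gt_boxes.values())
    if total_gt == 0:
        return 0.0
    matched = {img: [False] * len(boxes) for img, boxes in gt_boxes.items()}
    true_positives, precision_sum = 0, 0.0
    ranked = sorted(predictions, key=lambda p: -p[1])
    for rank, (img, _conf, box) in enumerate(ranked, start=1):
        # Greedily match the prediction to the best unmatched annotation.
        best_iou, best_idx = 0.0, -1
        for i, gt in enumerate(gt_boxes.get(img, [])):
            overlap = iou(box, gt)
            if not matched[img][i] and overlap > best_iou:
                best_iou, best_idx = overlap, i
        if best_idx >= 0 and best_iou >= iou_thresh:
            matched[img][best_idx] = True
            true_positives += 1
            precision_sum += true_positives / rank  # precision at this recall point
    return precision_sum / total_gt


def small_object_ap(gt_boxes, predictions, max_area=0.05):
    """AP restricted to boxes whose relative area is below max_area."""
    small_gt = {
        img: [b for b in boxes if box_area(b) < max_area]
        for img, boxes in gt_boxes.items()
    }
    small_preds = [p for p in predictions if box_area(p[2]) < max_area]
    return average_precision(small_gt, small_preds)


# Made-up annotations and predictions for two images.
gt = {
    "img1": [(0.1, 0.1, 0.1, 0.1), (0.5, 0.5, 0.4, 0.4)],
    "img2": [(0.2, 0.2, 0.2, 0.2)],
}
preds = [
    ("img1", 0.9, (0.1, 0.1, 0.1, 0.1)),
    ("img1", 0.8, (0.5, 0.5, 0.4, 0.4)),
    ("img2", 0.7, (0.6, 0.6, 0.1, 0.1)),  # matches no annotation
    ("img2", 0.6, (0.2, 0.2, 0.2, 0.2)),
]
overall_ap = average_precision(gt, preds)
small_ap = small_object_ap(gt, preds)
```

On this toy data the small-object score comes out lower than the overall score, because the one large object is detected correctly and no longer pads the average. That is exactly the kind of gap a dedicated metric is meant to surface.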
For example, the image below shows a sample from the Open Images
dataset in which the back wheel is reported as a false positive. This
false positive turns out to be a missing annotation. Verifying all
wheel annotations in the dataset and fixing other similar mistakes can
help improve your model’s performance on wheels.
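One way to hunt for this kind of mistake at scale is to list confident predictions that overlap no annotation and review them by hand. The sketch below is a minimal version of that idea, not any particular tool's API. The function name `suspicious_false_positives`, the thresholds, the `(x, y, width, height)` box format, and the sample data are all assumptions made up for illustration.

```python
def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def suspicious_false_positives(gt_boxes, predictions, iou_thresh=0.5, min_conf=0.8):
    """Return confident predictions that overlap no annotation.

    gt_boxes: dict mapping image id -> list of annotated boxes
    predictions: list of (image id, confidence, box)
    The result is sorted by confidence, most confident first, so the
    likeliest missing annotations are reviewed first.
    """
    flagged = []
    for img, conf, box in predictions:
        if conf < min_conf:
            continue
        best_overlap = max(
            (iou(box, gt) for gt in gt_boxes.get(img, [])), default=0.0
        )
        if best_overlap < iou_thresh:
            flagged.append((img, conf, box))
    return sorted(flagged, key=lambda p: -p[1])


# Made-up example: only the front wheel was annotated.
wheel_annotations = {"car.jpg": [(0.10, 0.60, 0.15, 0.20)]}
wheel_predictions = [
    ("car.jpg", 0.95, (0.10, 0.60, 0.15, 0.20)),  # front wheel, annotated
    ("car.jpg", 0.92, (0.70, 0.60, 0.15, 0.20)),  # back wheel, not annotated
    ("car.jpg", 0.30, (0.40, 0.10, 0.10, 0.10)),  # low confidence, ignored
]
to_review = suspicious_false_positives(wheel_annotations, wheel_predictions)
```

Here `to_review` contains only the back-wheel prediction. A person still has to decide whether each flagged box is a genuine model error or a gap in the labels.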
Phase 4: Production
Deploying a model (Image by author)
The above is pretty general and unbiased, but I want to tell you a little
bit more about the tool I’ve been working on.
Summary