Serverless ML in Python
Predict surf height at Lahinch Beach
Beyond Notebooks: Don’t just train models, build “Prediction Services”
[Figure: feature pipelines run once or twice a day and write features to Hopsworks (HOPSWORKS.AI); a training pipeline trains models from those features; batch prediction pipelines run twice a day and publish predictions to a GitHub Pages UI.]
https://github.com/jimdowling/cjsurf
Alternatives to GitHub Actions for Serverless Python
Surf's up?
We built a system called CJSurf to predict surf at Lahinch
[Figure: swell direction and height at Lahinch]
Wave height at the point is four times the wave height at the beach
Swell Predictions by NOAA Buoys with height, period, direction
https://polar.ncep.noaa.gov/waves/WEB/gfswave.latest_run/plots/gfswave.62108.bull
Accurate Surf Height Observations by Lahinch Surf Shop
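[Figure: CJSurf architecture. Recoverable labels:
● Sources: Lahinch surf observations, NOAA swell predictions, MySQL
● surf-report-features.ipynb and swell-features.ipynb insert DataFrames into the Hopsworks Feature Store
● train-model.ipynb reads DataFrames from the Feature Store and adds the model to the Hopsworks Model Registry
● batch-predict-surf.ipynb downloads the model and publishes latest_lahinch.png to GitHub Pages]
https://github.com/jimdowling/cjsurf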
Feature Store
Feature Engineering with Pandas/Spark/SQL/Flink
[Figure: DataFrames are written to the Hopsworks Feature Store and read back as DataFrames/files. Feature engineering steps shown: SQL, aggregations, dimensionality reductions, validations, normalization, one-hot encoding.]
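As a small, pandas-free sketch of two of these steps, assuming swell readings arrive as hypothetical (day, height, direction) tuples: a daily max-height aggregation and a one-hot encoding of swell direction into four compass sectors. The schema and the sector boundaries are illustrative assumptions, not taken from CJSurf.

```python
from typing import Dict, List, Tuple

SECTORS = ["N", "E", "S", "W"]  # illustrative 90-degree sectors

def one_hot_direction(degrees: float) -> List[int]:
    """One-hot encode a compass bearing into 4 sectors centred on N, E, S, W."""
    idx = int(((degrees % 360) + 45) // 90) % 4
    return [1 if i == idx else 0 for i in range(len(SECTORS))]

def daily_max_height(readings: List[Tuple[str, float, float]]) -> Dict[str, float]:
    """Aggregate (day, height, direction) readings to the max height per day."""
    out: Dict[str, float] = {}
    for day, height, _direction in readings:
        out[day] = max(out.get(day, height), height)
    return out

readings = [("2022-09-01", 1.0, 280.0),
            ("2022-09-01", 2.5, 290.0),
            ("2022-09-02", 0.5, 350.0)]
print(daily_max_height(readings))
print(one_hot_direction(280.0))  # westerly swell -> [0, 0, 0, 1]
```

Aggregations like this are model-independent and can be stored once in a feature group; encodings such as normalization are often better applied later, per model.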
Feature Store: write to Feature Groups, read from Feature Views
[Figure: features are written to Feature Groups and read from Feature Views; real-time models read feature vectors through the Online API.]
Predicting when a swell "hits_at" Lahinch
[Figure: a swell measured at Time=0 reaches Lahinch at Time=?; the swell window for Lahinch]
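The slides do not say how "hits_at" is computed. One physically motivated sketch estimates the arrival time from the swell period T using the deep-water group velocity c_g = gT/(4π). It assumes deep water along the whole path and a known buoy-to-beach distance; the `hits_at` helper and the 300 km distance are hypothetical, not values from CJSurf.

```python
import math
from datetime import datetime, timedelta, timezone

G = 9.81  # gravitational acceleration, m/s^2

def group_velocity(period_s: float) -> float:
    """Deep-water group velocity c_g = g*T / (4*pi), in m/s."""
    return G * period_s / (4 * math.pi)

def hits_at(measured_at: datetime, period_s: float, distance_km: float) -> datetime:
    """Estimate when a swell measured at `measured_at` reaches the beach.

    Assumes deep water along the path; `distance_km` (buoy to beach) is a
    hypothetical input supplied by the caller.
    """
    travel_s = distance_km * 1000 / group_velocity(period_s)
    return measured_at + timedelta(seconds=travel_s)

# Usage with illustrative numbers: a 12 s swell measured 300 km offshore
t0 = datetime(2022, 9, 1, 6, 0, tzinfo=timezone.utc)
print(hits_at(t0, period_s=12.0, distance_km=300.0))
```

Because c_g grows with the period, longer-period swells arrive before shorter-period swells measured at the same time and place.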
[Figure: each DataFrame is validated with Great Expectations before it is inserted into a Feature Group; a failed validation (❌) triggers a Hopsworks alert]
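As a plain-Python stand-in for this flow (not the Great Expectations or Hopsworks API), the sketch checks a batch of surf-report rows against simple expectations and only passes them to the insert function if every check succeeds; otherwise it calls an alert function. The `wave_height` column name, the 15 m bound and both callback names are illustrative assumptions.

```python
from typing import Callable, Dict, List

def validate_surf_rows(rows: List[Dict[str, float]],
                       max_height_m: float = 15.0) -> List[str]:
    """Return failed-expectation messages (an empty list means the batch is valid)."""
    failures = []
    for i, row in enumerate(rows):
        h = row.get("wave_height")
        if h is None:
            failures.append(f"row {i}: wave_height is missing")
        elif not 0.0 <= h <= max_height_m:
            failures.append(f"row {i}: wave_height={h} outside [0, {max_height_m}]")
    return failures

def insert_if_valid(rows: List[Dict[str, float]],
                    insert_fn: Callable[[List[Dict[str, float]]], None],
                    alert_fn: Callable[[List[str]], None]) -> bool:
    """Insert rows only if validation passes; otherwise raise an alert instead."""
    failures = validate_surf_rows(rows)
    if failures:
        alert_fn(failures)  # bad data never reaches the feature group
        return False
    insert_fn(rows)  # in a real pipeline: the feature group insert
    return True

inserted, alerts = [], []
ok = insert_if_valid([{"wave_height": 1.2}, {"wave_height": -3.0}],
                     inserted.extend, alerts.append)
print(ok, alerts)
```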
Creating Training Data from Feature Groups
Python DSL for Point-in-Time JOINs, transpiled into SQL
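import hopsworks
fs = hopsworks.login().get_feature_store()
# Feature group names and versions are illustrative assumptions
lahinch = fs.get_feature_group("lahinch", version=1)
swells = fs.get_feature_group("swells", version=1)
# Join swell features onto each Lahinch surf-height observation
query = (lahinch.select(['wave_height'])
         .join(swells.select(['height', 'period', 'direction'])))
fv = fs.create_feature_view(name='lahinch_surf',
                            description="Lahinch surf height features",
                            version=1,
                            labels=["wave_height"],
                            query=query)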
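To make "point-in-time" concrete: each label row (a surf-height observation) should be joined to the most recent swell reading at or before the observation time, never a later one, so no future data leaks into training. The sketch shows only those semantics in plain Python, ignoring the join key (e.g. `beach_id`) for brevity; it is not how Hopsworks implements the join.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

def point_in_time_join(labels: List[Tuple[int, float]],
                       swells: List[Tuple[int, dict]]
                       ) -> List[Tuple[int, float, Optional[dict]]]:
    """Join each (ts, wave_height) label to the latest swell row with ts <= label ts.

    `swells` must be sorted by timestamp; labels with no earlier swell get None.
    """
    swell_ts = [ts for ts, _ in swells]
    joined = []
    for ts, wave_height in labels:
        idx = bisect_right(swell_ts, ts) - 1  # last swell at or before ts
        features = swells[idx][1] if idx >= 0 else None
        joined.append((ts, wave_height, features))
    return joined

swells = [(100, {"height": 2.0}), (200, {"height": 3.5}), (300, {"height": 1.0})]
labels = [(50, 0.3), (250, 0.9), (300, 0.4)]
for row in point_in_time_join(labels, swells):
    print(row)
# (50, 0.3, None)
# (250, 0.9, {'height': 3.5})
# (300, 0.4, {'height': 1.0})
```

Note that the label at time 250 gets the reading from time 200, not the later one from time 300.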
Avoid Training/Serving Skew with Online Models
Maximize Feature Reuse: Transformations after Feature Groups
Normalizing numerical features often improves model performance
Training Pipeline
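# `fs` is the feature store handle used above
standard_scaler = fs.get_transformation_function(name="standard_scaler")
# Standardize each numerical swell feature
transformation_functions = {
    "height": standard_scaler,
    "period": standard_scaler,
    "direction": standard_scaler,
}
fv = fs.create_feature_view(name='lahinch_surf',
                            # …
                            transformation_functions=transformation_functions)
# Splits are returned in scikit-learn order: X_train, X_test, y_train, y_test
X_train, X_test, y_train, y_test = fv.train_test_split(test_size=0.1)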
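Online Inference Pipeline
keys = {"beach_id": 1}
# Latest precomputed features for beach 1, with the feature view's
# transformation functions applied
feature_vector = fv.get_feature_vector(keys)
Transformation functions (UDFs) are applied consistently in training and serving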
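Why attach the scaler to the feature view? The same statistics must be used in training and serving. This minimal plain-Python sketch (not the Hopsworks implementation; `StandardScaler` here is a hypothetical local class) fits a standard scaler once on training data and reuses the stored mean and standard deviation for an online value. Refitting on serving-time data, shown at the end, gives a different value for the same input: one source of training/serving skew.

```python
import statistics
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class StandardScaler:
    mean: float
    std: float

    @classmethod
    def fit(cls, values: List[float]) -> "StandardScaler":
        # Statistics come from training data only and are stored for serving
        return cls(statistics.fmean(values), statistics.pstdev(values))

    def transform(self, x: float) -> float:
        return (x - self.mean) / self.std if self.std else 0.0

train_heights = [1.0, 2.0, 3.0, 4.0]
scaler = StandardScaler.fit(train_heights)  # once, in the training pipeline
X_train = [scaler.transform(h) for h in train_heights]

online_height = 2.5                          # from an online feature vector
x_online = scaler.transform(online_height)   # same mean/std as in training

# Anti-pattern: refitting at serving time changes the transformed value
skewed = StandardScaler.fit([online_height, 3.5]).transform(online_height)
print(x_online, skewed)
```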
Lesson Learned: Refactor Monolithic ML Pipelines into Production ML Services
Beyond Notebooks and Monolithic ML Pipelines
● A feature pipeline to create features from new live data or to backfill features
from historical data
● A training pipeline that can be run when a new model is needed
● An inference pipeline (either batch or online) that takes features from the feature
store, and if the model is online, combines them with online features.
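The split above can be sketched as three independent programs that communicate only through shared storage. In this toy version, all names are hypothetical: in-memory dicts stand in for the feature store and model registry, and the "model" is a single least-squares coefficient relating swell height to surf height. In CJSurf these roles are played by Hopsworks and separate notebooks.

```python
from typing import Dict, List

# Hypothetical stand-ins for the feature store and model registry
FEATURE_STORE: Dict[str, List[dict]] = {"surf": []}
MODEL_REGISTRY: Dict[str, float] = {}

def feature_pipeline(new_rows: List[dict]) -> None:
    """Runs on a schedule: append newly computed feature rows."""
    FEATURE_STORE["surf"].extend(new_rows)

def training_pipeline() -> None:
    """Runs when a new model is needed: fit wave_height ~ k * swell_height
    by least squares through the origin, then register k."""
    rows = FEATURE_STORE["surf"]
    sxy = sum(r["swell_height"] * r["wave_height"] for r in rows)
    sxx = sum(r["swell_height"] ** 2 for r in rows)
    MODEL_REGISTRY["surf_model"] = sxy / sxx

def inference_pipeline(swell_heights: List[float]) -> List[float]:
    """Batch inference: load the registered model and predict."""
    k = MODEL_REGISTRY["surf_model"]
    return [k * x for x in swell_heights]

feature_pipeline([{"swell_height": 2.0, "wave_height": 1.0},
                  {"swell_height": 4.0, "wave_height": 2.0}])
training_pipeline()
print(inference_pipeline([3.0]))  # prints [1.5]
```

Because each function reads and writes only the shared stores, each can be scheduled, rerun or replaced independently.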
[Figure: the training pipeline can run on a schedule or on demand.]
● Some features are pre-computed and retrieved from the feature store
(typically those that require history and context information)
● Some features are computed on-demand (at run-time) with
application-supplied data (and possibly also history/context)
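A minimal sketch of combining the two kinds of features at request time follows. A dict lookup stands in for the online feature store; the `beach_id` key matches the earlier example, while the feature values and the on-demand `hours_until_session` feature are hypothetical.

```python
from datetime import datetime, timezone
from typing import Dict, List

# Stand-in for precomputed features served by the online feature store
ONLINE_FEATURES: Dict[int, Dict[str, float]] = {
    1: {"height": 2.4, "period": 12.0, "direction": 280.0},
}

def build_feature_vector(beach_id: int, session_start: datetime,
                         now: datetime) -> List[float]:
    """Join precomputed features with an on-demand feature from the request."""
    pre = ONLINE_FEATURES[beach_id]  # precomputed: needs history/context
    # On-demand: only computable at request time, from application-supplied data
    hours_until_session = (session_start - now).total_seconds() / 3600
    return [pre["height"], pre["period"], pre["direction"], hours_until_session]

now = datetime(2022, 9, 1, 6, 0, tzinfo=timezone.utc)
start = datetime(2022, 9, 1, 9, 30, tzinfo=timezone.utc)
print(build_feature_vector(1, start, now))  # [2.4, 12.0, 280.0, 3.5]
```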
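[Figure: Operational ML service. A feature pipeline reads stream and batch sources (and backfills from historical data) and writes precomputed features to Hopsworks. A training pipeline reads training data from Hopsworks and produces a model for model serving. An application or service sends a request; the model combines precomputed features from Hopsworks with on-demand features and returns a prediction. Pipelines run on a schedule or on demand.]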
Case Study: Iris Flowers as a Batch Prediction Service
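[Figure: iris-feature-pipeline.ipynb backfills features from iris.csv and adds new synthetic data, writing feature DataFrames to Hopsworks; training data read from Hopsworks is used to train iris_model, which is registered as a new model; iris-batch-inference-pipeline.ipynb uses the model to make batch predictions for a GitHub Pages UI.]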
https://github.com/featurestoreorg/serverless-ml-course/tree/main/src/01-module
SERVERLESS ML
www.serverless-ml.org
September 2022
Serverless ML Flywheels with Hopsworks
https://app.hopsworks.ai
Compliance
Governance
Efficiency
At Scale
Open & modular
www.hopsworks.ai