You are on page 1of 35

1

serveRless
computing for R

EARL 2019 – London


by Thomas Laber
2

Agenda

01 02 03
The Problem Serverless Architecture

Building a scalable What does this A solution


and flexible pipeline buzzword actually architecture for Azure
to deploy R models mean?
3

Agenda

01 02 03
The Problem Serverless Architecture

Building a scalable What does this A solution


and flexible pipeline buzzword actually architecture for Azure
to deploy R models mean?
4

The Problem
How can we build a
cost-effective data
science pipeline that
allows data scientists
using R to easily put
their models into
production, that
scales well and is
easy to maintain?

Rare picture of the fabled „eierlegende Wollmilch“


What we want
5

a serverless data science architecture

get training data docker pull


Batch Training

Pool 1 Pool 2 … Pool n

store models

Model storage

Project 1 Project 2 … Project n

get scoring data


read models

write results Auto-Scaling

Trigger REST
Batch Scoring Realtime Scoring
6

Agenda

01 02 03
The Problem Serverless Architecture

Building a scalable What does this A solution


and flexible pipeline buzzword actually architecture for Azure
to deploy R models mean?
7

The Solution
Just like wireless
internet has wires
somewhere, serverless
architectures still have
servers somewhere.

What ‘serverless’ really


means is that, as a
developer you don’t
have to think about
those servers. You just
focus on code.
(AWS) Components of a serverless architecture serverless.com
Why serverless?
8

The promise: Focus on coding, not maintenance

NO ADMINISTRATION SCALE ON DEMAND PAY-PER-USE FASTER TURNAROUND

No server provisioning and Scaling is automatic and Billing is based on actual Spinning up new
maintenance is necessary. part of the service. compute resources used. environments is quick and
Hardware and OS are No compute used, no costs. allows for faster
abstracted away. experimentation.
The Evolution of the Cloud
9

Cloud provider versus customer roles for managing cloud services

Entreprise IT Infrastructure as a Service Platform as a Service Functions as a Service


legacy IT Virtual Machines Containers Functions

Customer Managed
Applications Applications Applications Applications
Scalability Scalability Scalability Scalability
Security Security Security Security
Customer Managed

OS OS OS OS

Provider Managed
Virtualization Virtualization Virtualization Virtualization

Provider Managed
Provider Managed
Servers Servers Servers Servers
Storage Storage Storage Storage
Networking Networking Networking Networking
Data Centers Data Centers Data Centers Data Centers
What now?
10

VMs vs containers vs functions

Virtual Machines Containers Functions


Unit of Scale machine application function

Abstraction hardware operation system language runtime

Packaging image container file code

Configure machine, storage, network, OS servers, applications, scaling run code when needed

Execution multi-threaded, multi-task multi-threaded, single-task single-threaded, single-task

Runtime hours to months minutes to days microseconds to seconds

Unit of Cost per VM per hour per container per hour per memory/second per request

Amazon EC2 Fargate Lambda

Azure Azure VM Container Instances Azure Functions

Google Google Compute Engine Google Kubernetes Cloud Functions


Three ways to interact with Cloud Providers
11

Terminology

Browser (GUI)

Command Line Interface (CLI)


Azure: ARM-Templates
AWS: CloudFormation

REST calls
excellent tool: Postman
Three ways to interact with Cloud Providers
12

Browser

Browser
(GUI)
Three ways to interact with Cloud Providers
13

Terminology

Command Line Interface (CLI)


Azure: ARM-Templates
AWS: CloudFormation

IaC
(Infrastructure
as Code)
Three ways to interact with Cloud Providers
14

Terminology

REST calls
excellent tool: Postman
Cost Comparison
15

Serverless can be cheap, but depends on work load

Big
lock-in
potential!
Cost Comparison
16

Serverless can be cheap, but depends on work load

You create a Linux container group with a 1 vCPU, 1 GB configuration once daily during a month (30
days). The duration of each container group is 5 minutes (300 seconds).

Memory duration
Number of container groups memory duration (seconds) GB price per GB-s number of days Total

1 300 seconds 1 € 0.0000014 per GB-s 30 days € 0.013

vCPU duration
Number of container groups vCPU duration (seconds) vCPU price per vCPU-s number of days Total

1 300 seconds 1 € 0.0000129 per vCPU-s 30 days € 0.116

TOTAL = € 0.013 + € 0.116 = € 0.13


Cost Comparison
17

Serverless can be cheap, but depends on work load

You create a Linux container group with a 1 vCPU, 2 GB configuration 50 times daily during a month (30
days). The container group duration is 150 seconds.

Memory duration
Number of container groups memory duration (seconds) GB price per GB-s number of days Total

50 150 seconds 2 € 0.0000014 per GB-s 30 days € 0.63

vCPU duration
Number of container groups vCPU duration (seconds) vCPU price per vCPU-s number of days Total

50 150 seconds 1 € 0.0000129 per vCPU-s 30 days € 2.903

TOTAL = € 0.63 + € 2.903 = € 3.53


Cost Comparison
18

Serverless can be cheap, but depends on work load

You create a Linux container group with a 4 vCPU, 8 GB configuration 2 times daily during a month (30
days). The container group duration is 1 hour (= 3600 seconds).

Memory duration
Number of container groups memory duration (seconds) GB price per GB-s number of days Total

2 3600 seconds 8 € 0.0000014 per GB-s 30 days € 2.419

vCPU duration
Number of container groups vCPU duration (seconds) vCPU price per vCPU-s number of days Total

2 3600 seconds 4 € 0.0000129 per vCPU-s 30 days € 11.146

TOTAL = € 2.42 + € 11.146 = € 13.56


19

Agenda

01 02 03
The Problem Serverless Architecture

Building a scalable What does this A solution


and flexible pipeline buzzword actually architecture for Azure
to deploy R models mean?
Two Use Cases
20

Model training and scoring have different architecture requirements

TRAINING SCORING

• Usually long running tasks • Mostly short running tasks


• Resource intensive • Resource usage low
• Mostly in batch mode • Either adhoc or on schedule

OUR FOCUS TODAY


Serverless Options
21

We primarily looked at the following options:

AWS Lambda Azure Functions Azure Container


Instances
Requirements
22

Many ways to realize serverless scoring architecture with different pros and cons

Must support at least


Loading from blobs
time/http trigger

Trigger Deploy Resources Load model Serve Model Score

! Custom runtime Optionally serving


support necessary scores with HTTP
Function as a Service
23

AWS Lambda

additional layers R/
└──library/
├── package 1/
├── package 2/ runtime.zip
├── package …/
└── package n/

Philipp Schirmer
R/
├──bin/
├──lib/
├──library/ runtime.zip
└──…
bootstrap
Compiled packages base layer runtime.r

can be a headache…

A function can use up to 5 layers at a time. The total unzipped size of the function
and all layers can't exceed the unzipped deployment package size limit of 250MB.
Function as a Service
24

Azure Functions

Neal Fultz
C# Java …
function code
modern open source high
performance RPC framework
language worker process

Protocol Buffers
.NET .NET Core
host Dirk Eddelbuettel

host process Google's language-neutral,


platform-neutral, extensible
mechanism for serializing
functions V2 runtime
structured data
Why Azure Container?
25

Container give us maximum flexibility regarding runtime and reduce vendor lock-in

PROS CONS

More setup involved compared


Supports arbitrary runtimes
to FaaS such as AWS Lambda

No problems with compiled libraries Higher startup times compared


to FaaS depending on Image
Lots of supported triggers in
combination with logic apps

Low vendor lock-in

Pay-as-you-go
Docker Basics
26

Terminology

docker build docker run

Dockerfile Docker Image Docker Container

FROM rocker/rstudio:3.5.3

RUN mkdir -p /usr/src/app


WORKDIR /usr/src/app

COPY * ./TravisR/
WORKDIR /usr/src/app/TravisR
Azure Container + Logic App
27

Our setup currently looks like this

01 Write Code
01
User Rstudio or the IDE of your choice to write some arbitrary
R code, ideally as a package

02 Create Dockerfile 02 Dockerfile


Use serverless to create a Dockerfile
or do it yourself

03 git Repo 03
Push your package and the dockerfile to github
Azure Container + Logic App
28

Our setup currently looks like this

04 Create Container Registry 04 ACR 05 ACR Task


Create a resource group and more importantly an Azure
Container Registry (ACR)

05 Create ACR Task


Create a registry task which builds a docker image
based on your github repo 06 07

ACI
06 Create Container Instance
Create a Azure Container Instance (ACI) pulling the
docker image from the ACR

07 Manage it
Use a Logic App or single REST calls to start and stop it
Logic App
30

A serverless workflow orchestration tool with GUI for prototyping

LOGIC APP DESIGNER LOGIC APP TEMPLATING


Scoring Workflow
31

Template workflow for a wide range of scenarios

Specify trigger type (time, HTTP, email,


etc)

Create container resources based on


spec (cpu + RAM + nodes)

Check if container group is spawned


successfully

Delete container after work is done


serveRless Package
32

We want to build a package to help automate this setup

IDEA STATUS

Build prototype to test setup


Build an R package that allows R users to
deploy their code in a serverless setup Build Rstudio Addin

R Package serveRless

Many thanks to Hong Ooi for his awesome work supporting R in Azure!
And of course Christoph Bodner und Florian Schwendinger who are not here today.
33

Short Demo
34

Questions?
35

Thank you
for your attention!

Feel free to reach :

linkedin.com/in/thomas-laber
data-science-austria.at
data-analytics.netlify.com
What now?
36

VMs vs containers vs functions

Virtual Machines Containers Functions


Unit of Scale machine application function

Abstraction hardware operation system language runtime

Packaging image container file code

Configure machine, storage, network, OS servers, applications, scaling run code when needed

Execution multi-threaded, multi-task multi-threaded, single-task single-threaded, single-task

Runtime hours to months minutes to days microseconds to seconds

Unit of Cost per VM per hour per container per hour per memory/second per request

Amazon EC2 Fargate Lambda

Azure Azure VM Container Instances Azure Functions

Google Google Compute Engine Google Kubernetes Cloud Functions

You might also like