
Building a Python web service with Ray

Philipp Moritz
September 30, 2020
What this talk is about

● Show design patterns for building Python web services with Ray, using Ray tasks and actors
● Show how we are building Anyscale as a Ray application
● Show how to address practical challenges like type checking, testing, tracing, monitoring and deployment
Requirements for a web service

● Needs to be available 24/7
● Needs to be scalable according to user demand
● Needs to integrate external Python libraries and frameworks, e.g. web serving frameworks or machine learning libraries
Traditional Python web service architecture

[Diagram: three tiers]
Web logic: Flask server, aiohttp server, FastAPI server
Business logic: Service 1, Service 2, Service 3; Redis, Celery, Redis Queue, Multiprocessing
Data: Database, Blob store

Challenges: Programming, scaling, monitoring, tracing, fault tolerance
Ray web service architecture

[Diagram: three tiers]
Web logic: Flask server, aiohttp server, FastAPI server
Business logic: Service 1, Service 2, Service 3, each made up of Ray tasks and actors
Data: Database, Blob store

Advantages of Ray:
● Unified programming model
● Automatic scheduling, resource management
● Autoscaling
● Built-in facilities for monitoring
● Great support for ML
Reminder: The Anyscale Platform

1. Laptop experience with the power of a cluster


2. Serverless experience without serverless limitations
3. Real-time collaboration
Architecture of Anyscale

[Diagram: three tiers]
Web logic: FastAPI server
Business logic: Service 1, Service 2 (Ray tasks and actors)
Data: Database
Scaling up with Ray tasks

[Diagram]
Web logic: FastAPI server
Business logic: Sessions (one Ray task per incoming request), Websockets (Ray actor)
Data: Database (Session 1, Session 2)

Requests:
/api/v2/session/1/execute (three concurrent requests, to Session 1)
/api/v2/session/2/execute (to Session 2)
Managing state with Ray actors

[Diagram]
Web logic: FastAPI server
Business logic: Sessions (Ray actor and task), Notifications (Ray actor), linked by "Update" messages
Request: /api/v2/session/start (to a Session)
Writing an API server with FastAPI

● Makes it easy to define a REST API
● Schema validation
● Typing

    @router.get("/{command_id}/execution_logs")
    async def get_execution_logs(
        command_id: int, ...) -> Response[LogOutput]:
Ray asyncio support

Ray object references are awaitable!

    async def get_execution_logs(session_record, session_command_id, log_params):
        # .remote() returns an object reference, which can be awaited
        log = await session_tasks_service.get_execution_log.remote(
            session_record["id"],
            session_command_id,
            log_params
        )

Ray actors can also be async:

    @ray.remote
    class WebSocketActor:
        def __init__(self) -> None:
            self.sio = socketio.AsyncServer()

        async def emit(self, message_name: str,
                       data: Dict[str, Any]) -> None:
            await self.sio.emit(message_name, data)
Typing

[Diagram] Web logic: FastAPI server. Business logic: Service 1, Service 2 (Ray tasks and actors).

Frontend (TypeScript):

    executeCommand({
      sessionId,
      options: {
        command: input
      }
    })

Web logic (Python):

    async def execute_command(
        session_id: int, options: Options):
        command_record = db.create_command(
            session_id, options)
        execute_command.remote(
            command_record)

Business logic (Python):

    @ray.remote
    def execute_command(
        command: CommandRecord):
        runner = AnyscaleSessionRunner()
        runner.execute_command(command)
Testing

● Unit testing: Use Ray local mode, ray.init(local_mode=True). Everything runs in a single process, so interfaces can be mocked out
● Integration testing: Use a Ray instance running on the laptop/CI server, testing web logic, business logic and database
● End-to-end testing: Test full functionality in a staging environment
● Stress testing: Test scalability limits of the system
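The "mock out interfaces" idea from the unit testing point can be sketched with the standard library alone. The sketch below does not start Ray; it only shows the mocking pattern, which carries over to local mode because tasks then run in the same process as the test. `Database` and `execute_command` are hypothetical names.

```python
from unittest import mock


class Database:
    """Hypothetical interface to the real database."""

    def create_command(self, session_id: int, command: str) -> dict:
        raise RuntimeError("no database available in unit tests")


def execute_command(db: Database, session_id: int, command: str) -> str:
    # Business logic under test: store the command, then report its id.
    record = db.create_command(session_id, command)
    return f"queued command {record['id']}"


def test_execute_command() -> None:
    # create_autospec keeps the method signatures of the real class,
    # so a call with the wrong arguments fails the test.
    fake_db = mock.create_autospec(Database, instance=True)
    fake_db.create_command.return_value = {"id": 7}

    assert execute_command(fake_db, 1, "ls") == "queued command 7"
    fake_db.create_command.assert_called_once_with(1, "ls")


if __name__ == "__main__":
    test_execute_command()
    print("ok")
```

On a multi-process Ray cluster a mock created in the test process would not be visible inside worker processes, which is why a single-process mode is convenient for this kind of test.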
Metrics and Monitoring

Use Ray's built-in metrics API:

    from ray.experimental import metrics

    self.create_cluster_stats = metrics.Histogram(
        "Anyscale_create_cluster", "Num of seconds took to create cluster",
        "second",
        [float(i) for i in range(10, 300, 10)],
        ["step"],
    )
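To make the boundary list above concrete, the following standard-library sketch shows how such boundaries split observations into buckets. It does not use Ray's metrics API. It assumes each bucket includes its upper boundary; the exact edge handling depends on the metrics backend.

```python
from bisect import bisect_left
from collections import Counter

# Same boundaries as in the Histogram above: 10.0, 20.0, ..., 290.0
BOUNDARIES = [float(i) for i in range(10, 300, 10)]


def bucket_index(value: float) -> int:
    """Return the bucket for an observation.

    Bucket 0 holds values up to 10.0, bucket i holds values in
    (BOUNDARIES[i - 1], BOUNDARIES[i]], and the last bucket holds
    everything above 290.0. Upper-inclusive edges are an assumption.
    """
    return bisect_left(BOUNDARIES, value)


def histogram(durations):
    """Count how many observations fall into each bucket."""
    return Counter(bucket_index(d) for d in durations)


if __name__ == "__main__":
    print(histogram([5.0, 45.0, 47.5, 1000.0]))
```

With 29 boundaries there are 30 buckets, so cluster creation times are resolved in 10-second steps up to 290 seconds.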
Tracing

We use OpenTelemetry for tracing

● It can generate detailed traces automatically for a number of Python libraries, including database clients and web frameworks
● Full automatic tracing for Ray tasks and actors
● Can also add custom traces

requirements.txt:

    opentelemetry-api
    opentelemetry-sdk
    opentelemetry-ext-asgi
    opentelemetry-ext-asyncpg
    opentelemetry-ext-botocore
    opentelemetry-instrumentation-starlette


Deployment

The cloud environment for our web service is set up with Terraform, to make the setup easily reproducible for

● Development
● Staging
● Production

The web service is deployed on Docker and Kubernetes, which integrate well with Ray
Summary

● We showed how the Python web serving ecosystem integrates with Ray
● We showed how Ray makes it easy to scale up your web services and manage their state
● We showed how to type, test, monitor and deploy your web service with Ray
Thanks to the Team @ Anyscale
