
Async I/O for Python 3

(PyCon 2013 keynote) Guido van Rossum guido@python.org

This all started on python-ideas...


When someone proposed to fix asyncore.py
http://mail.python.org/pipermail/python-ideas/2012-September/016185.html
Subject: asyncore: included batteries don't fit
Date: September 22, 2012
By October 6 it was a centithread
On October 12 I started several new threads
On December 12 I first posted PEP 3156

Take a deep breath

What is async I/O?


Do something else while waiting for I/O
It's an old idea (as old as computers)
With lots of approaches
threads, callbacks, events...

I'll come back to this later

Why async I/O?


I/O is slow compared to other work
the CPU is not needed to do I/O

Keep a UI responsive
avoid beach ball while loading a url

Want to do several/many I/O things at once


some complex client apps
typical server apps

Why not use threads?


(Actually you may, if they work for you!)
OS threads are relatively expensive
Max # open sockets >> max # threads
Preemptive scheduling causes races
"solved" with locks

Async I/O without threads


select(), poll(), etc.
asyncore :-(
write your own
frameworks, e.g. Twisted, Tornado, zeroMQ
Wrap C libraries, e.g. libevent, libev, libuv
Stackless, gevent, eventlet
(Some overlap)

Downsides
Too many choices
Nobody likes callbacks
APIs not always easy
Standard library doesn't cooperate

So, about gevent...


Scary implementation details
x86 CPython specific stack-copying code

Monkey-patching
"patch-and-pray"

Don't know when it switches tasks

might not switch often enough
might switch unexpectedly

So what to do?

No, really!

Let's standardize the event loop


At the bottom of all of these is an event loop
(that is, all except OS threads)

Event loop multiplexes I/O
Various other features also common

Why is the event loop special?


Serializes event handling
handle only one event at a time

There should be only one


otherwise it's not serializing events

Each framework has its own event loop API


even though the functionality has much overlap

What functionality is needed?


start, stop running the loop
variant: always running

schedule a callback DT seconds in the future (DT may be 0)


also: repeated timer callback

set a callback for when a file descriptor is ready


variant: call when I/O done
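To make those needs concrete, here is a toy event loop in Python. It is a minimal sketch, not Tulip's code: the ToyLoop class and its method names are hypothetical (chosen to echo the list above), and it leans on the stdlib selectors module (Python 3.4+) for "call when the fd is ready" and on a heap of deadlines for "call DT seconds from now". It handles only one event at a time, which is the serializing property described earlier.

```python
import heapq
import itertools
import selectors
import socket
import time


class ToyLoop:
    """Hypothetical minimal event loop: timers plus read-readiness callbacks."""

    def __init__(self):
        self._selector = selectors.DefaultSelector()
        self._timers = []              # heap of (when, seq, callback, args)
        self._seq = itertools.count()  # tie-breaker so callbacks are never compared
        self._stopping = False

    def call_later(self, delay, callback, *args):
        # DT may be 0: the callback then runs on the next loop iteration.
        when = time.monotonic() + delay
        heapq.heappush(self._timers, (when, next(self._seq), callback, args))

    def call_soon(self, callback, *args):
        self.call_later(0, callback, *args)

    def add_reader(self, fileobj, callback, *args):
        # fileobj may be an int fd or an object with a fileno() method.
        self._selector.register(fileobj, selectors.EVENT_READ, (callback, args))

    def remove_reader(self, fileobj):
        self._selector.unregister(fileobj)

    def stop(self):
        self._stopping = True

    def close(self):
        self._selector.close()

    def run_forever(self):
        self._stopping = False
        while not self._stopping:
            # Wait in select() until the next timer is due or I/O is ready.
            timeout = None
            if self._timers:
                timeout = max(0, self._timers[0][0] - time.monotonic())
            for key, _mask in self._selector.select(timeout):
                callback, args = key.data
                callback(*args)
            # Run every timer whose deadline has passed, one at a time.
            now = time.monotonic()
            while self._timers and self._timers[0][0] <= now:
                _, _, callback, args = heapq.heappop(self._timers)
                callback(*args)


loop = ToyLoop()
events = []
rsock, wsock = socket.socketpair()

def on_readable():
    events.append(("read", rsock.recv(100)))
    loop.remove_reader(rsock)
    loop.stop()

loop.add_reader(rsock, on_readable)
loop.call_soon(events.append, ("soon",))
loop.call_later(0.05, wsock.send, b"ping")   # "I/O" arrives 50 ms later
loop.run_forever()
loop.close()
rsock.close()
wsock.close()
print(events)  # [('soon',), ('read', b'ping')]
```

Real loops add much more (cancellation, writers, error handling), but the core is this select-then-run-timers cycle.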

Interop
Most frameworks don't interoperate
There's a small cottage industry adapting the event loop from framework X to be usable with framework Y
Tornado now maintains a Twisted adapter
There's also a zeroMQ adapter for Tornado
I hear there's a gevent fork of Tornado
etc.

Enter PEP 3156 and Tulip

I know this is madness


Why can't we all just use Tornado?
Let's just import Twisted into the stdlib
Standardizing gevent solves all its problems
no more monkey-patching
greenlets in the language

Or maybe use Stackless Python?
Why reinvent the wheel?


libevent/ev/uv is the industry standard

Again: PEP 3156 and Tulip


I like to write clean code from scratch
I also like to learn from others
I really like clean interfaces
PEP 3156 and Tulip satisfy all my cravings

What is PEP 3156? What is Tulip?


PEP 3156:
standard event loop interface slated for Python 3.4

Tulip:
experimental prototype (currently)
reference implementation (eventually)
additional functionality (maybe)
works with Python 3.3 (always)

PEP 3156 is not just an event loop


It's also an interface to change the event loop implementation (to another conforming one)
this is the path to framework interop (even gevent!)

It also proposes a new way of writing callbacks


(that doesn't actually use callbacks)

But first, the event loop


Influenced by both Twisted and Tornado
Reviewed by (some) other stakeholders
The PEP is not in ideal state yet
I am going to sprint Mon-Tue on PEP and Tulip

Event loop method groups


starting/stopping the loop
basic callbacks
I/O callbacks
thread interactions
socket I/O operations
higher-level network operations

Starting/stopping the event loop


run()  # runs until nothing more to do
run_forever()
run_once([timeout])
run_until_complete(future, [timeout])
stop()

May change these around a bit

Basic callbacks
call_soon(callback, *args)
call_later(delay, callback, *args)
call_repeatedly(interval, callback, *args)
call_soon_threadsafe(callback, *args)

All return a Handler instance which can be used to cancel the callback
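For comparison, this is roughly how the basic callbacks look in today's asyncio, the stdlib module that PEP 3156 became. Some details differ from this slide: the returned object is an asyncio.Handle (or TimerHandle) rather than a Handler, and call_repeatedly() is not part of asyncio. The sketch assumes Python 3.7 or later.

```python
import asyncio

def main():
    loop = asyncio.new_event_loop()
    log = []
    try:
        loop.call_soon(log.append, "soon")              # next loop iteration
        handle = loop.call_later(0.01, log.append, "cancelled")
        handle.cancel()                                 # the handle cancels the callback
        loop.call_later(0.02, log.append, "later")
        loop.call_later(0.05, loop.stop)                # stop() ends run_forever()
        loop.run_forever()
    finally:
        loop.close()
    return log

log = main()
print(log)  # ['soon', 'later']
```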

I/O callbacks
add_reader(fd, callback, *args) -> Handler
remove_reader(fd)
add_writer(fd, callback, *args) -> Handler
remove_writer(fd)

Not all fd types are always acceptable
fd may be an object with a fileno() method
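A small sketch with today's asyncio add_reader(), assuming a selector-based loop (the Windows default, ProactorEventLoop, does not support add_reader()). A socketpair() stands in for real network input, and the socket object itself is passed, relying on its fileno() method.

```python
import asyncio
import socket

def read_with_add_reader():
    loop = asyncio.SelectorEventLoop()
    rsock, wsock = socket.socketpair()
    rsock.setblocking(False)
    received = []

    def on_readable():
        # Called by the loop once rsock has data; we read, then unregister.
        received.append(rsock.recv(100))
        loop.remove_reader(rsock)
        loop.stop()

    try:
        loop.add_reader(rsock, on_readable)   # object with fileno() is accepted
        wsock.send(b"hello")
        loop.run_forever()
    finally:
        rsock.close()
        wsock.close()
        loop.close()
    return received

print(read_with_add_reader())  # [b'hello']
```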

UNIX signals
add_signal_handler(sig, callback, *args) -> Handler
remove_signal_handler(sig)
Raises RuntimeError if signals are unsupported

Thread interactions
wrap_future(future) -> Future
run_in_executor(executor, callback, *args) -> Future
Used to run code in another thread
sometimes there is no alternative e.g. getaddrinfo(), database connections

Threads may use call_soon_threadsafe()
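A sketch of these thread interactions using today's asyncio (Python 3.7+). In asyncio, wrap_future() ended up as the module-level function asyncio.wrap_future(). blocking_double() is a hypothetical stand-in for something like a database call.

```python
import asyncio
import concurrent.futures
import threading
import time

def blocking_double(x):
    # Hypothetical blocking work (e.g. a database query).
    time.sleep(0.01)
    return x * 2

async def main():
    loop = asyncio.get_running_loop()

    # run_in_executor(): None selects the loop's default thread pool.
    doubled = await loop.run_in_executor(None, blocking_double, 21)

    # A plain thread must not touch the loop directly; it hands the
    # result over with call_soon_threadsafe().
    fut = loop.create_future()
    worker = threading.Thread(
        target=lambda: loop.call_soon_threadsafe(fut.set_result, "from thread"))
    worker.start()
    msg = await fut
    worker.join()

    # wrap_future() makes a concurrent.futures.Future awaitable.
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        ten = await asyncio.wrap_future(pool.submit(blocking_double, 5))

    return doubled, msg, ten

result = asyncio.run(main())
print(result)  # (42, 'from thread', 10)
```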

Socket I/O operations


sock_recv(sock, nbytes) -> Future
sock_sendall(sock, data) -> Future
sock_accept(sock) -> Future
sock_connect(sock, address) -> Future

Only transports should use these
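For illustration only (the point here is that application code should not need these), this sketch drives the four sock_* operations in today's asyncio over a loopback TCP connection. The sockets are made non-blocking, which these methods expect.

```python
import asyncio
import socket

async def roundtrip():
    loop = asyncio.get_running_loop()
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
    listener.listen()
    listener.setblocking(False)
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.setblocking(False)
    try:
        # Start accepting, then connect; both complete on the same loop.
        accept = asyncio.ensure_future(loop.sock_accept(listener))
        await loop.sock_connect(client, listener.getsockname())
        conn, _addr = await accept
        with conn:
            await loop.sock_sendall(client, b"xyzzy")
            return await loop.sock_recv(conn, 100)
    finally:
        client.close()
        listener.close()

result = asyncio.run(roundtrip())
print(result)  # b'xyzzy'
```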

High-level network operations


getaddrinfo(host, port, ...) -> Future
getnameinfo(address, [flags]) -> Future
create_connection(factory, host, port, ...) -> Future
start_serving(factory, host, port, ...) -> Future

Use these in your high-level code
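A sketch of getaddrinfo() with today's asyncio. A numeric host keeps the example free of DNS; for real host names the default loop runs socket.getaddrinfo() in a thread pool, which is the "no alternative" case mentioned under thread interactions.

```python
import asyncio
import socket

async def resolve():
    loop = asyncio.get_running_loop()
    infos = await loop.getaddrinfo("127.0.0.1", 8080,
                                   family=socket.AF_INET,
                                   type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr).
    return [info[4] for info in infos]

result = asyncio.run(resolve())
print(result)
```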

Um, Futures?
Like PEP 3148 Futures (new in Python 3.2):
from concurrent.futures import Future
f.set_result(x), f.set_exception(e)
f.result(), f.exception()
f.add_done_callback(func)
wait(fs, [timeout, [flags]]) -> (done, not_done)
as_completed(fs, [timeout]) -> <iterator>

However, adapted for use with coroutines
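For reference, here is the PEP 3148 API in its original, thread-blocking form, using an executor. square() is just a placeholder workload.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed, wait

def square(n):
    return n * n

callback_log = []
with ThreadPoolExecutor(max_workers=4) as pool:
    fs = [pool.submit(square, n) for n in range(5)]
    # Runs when fs[0] finishes (immediately if it already has).
    fs[0].add_done_callback(lambda f: callback_log.append(f.result()))
    done, not_done = wait(fs, timeout=5)        # blocks this thread
    results = sorted(f.result() for f in as_completed(fs))
# Leaving the with-block waits for the worker threads to finish.

print(results)       # [0, 1, 4, 9, 16]
print(callback_log)  # [0]
```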

Um, coroutines?
Whoops, let me get back to that later

What's a Future?
Abstraction for a value to be produced later
Also known as Promises (check Wikipedia)
Per Wikipedia, these are explicit futures

API:
result() blocks until the result is ready
an exception is a "result" too: it will be raised!
exception() blocks and checks for exceptions
done callbacks are called when the result/exception is ready
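The same API shown with hand-made concurrent.futures Futures (normally only executors and tests call set_result() or set_exception()). A timer thread plays the producer, so result() genuinely blocks until the value arrives.

```python
import threading
from concurrent.futures import Future

callbacks_run = []
f = Future()
f.add_done_callback(lambda fut: callbacks_run.append(fut.result()))

producer = threading.Timer(0.05, f.set_result, args=[42])
producer.start()
value = f.result(timeout=5)   # blocks until the other thread sets the result
producer.join()               # make sure the done callback has finished too

g = Future()
g.set_exception(ValueError("bad input"))
err = g.exception()           # returns the exception without raising it
try:
    g.result()                # ...but result() raises it
except ValueError as exc:
    raised = exc

print(value, callbacks_run, raised is err)  # 42 [42] True
```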

Futures and coroutines


Not the concurrent.futures.Future class!
Nor exactly the same API
Where PEP 3148 "blocks", we must use...

Drum roll, please

PEP 380: yield-from


    # Tulip-era sketch; loop, host and port come from surrounding code
    @coroutine
    def getresp():
        s = socket()
        yield from loop.sock_connect(s, (host, port))
        yield from loop.sock_sendall(s, b'xyzzy')
        data = yield from loop.sock_recv(s, 100)
        return data

Yes, you can now return from a generator!
Please, do not write real code like this! :-)

I cannot possibly do this justice


The best way to think about this is that yield-from is magic that "blocks" your current task but does not block your application
It's almost best to pretend it isn't there when you squint (but things don't work without it)

PS. @coroutine / yield-from are very close to async / await in C#
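One way to see why yield-from "blocks" only the current task is a toy round-robin scheduler over plain generators. This is a simplification, not how Tulip works: here a bare yield means "let someone else run", whereas Tulip's coroutines yield Futures and are resumed when those Futures are done. pause(), fetch(), main_task() and run() are hypothetical names.

```python
from collections import deque

def pause():
    # Hypothetical stand-in for "wait for I/O": give control back once.
    yield

def fetch(name, steps, log):
    for i in range(steps):
        log.append(f"{name} step {i}")
        yield from pause()          # suspends this task, not the program
    return f"{name} done"           # PEP 380: a generator may return a value

def main_task(name, steps, log, results):
    value = yield from fetch(name, steps, log)   # receives fetch()'s return value
    results.append(value)

def run(tasks):
    """Round-robin scheduler: resume each generator until it finishes."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)              # runs until the next yield anywhere in the chain
        except StopIteration:
            continue                # this task is finished
        ready.append(task)

log, results = [], []
run([main_task("a", 2, log, results), main_task("b", 2, log, results)])
print(log)      # ['a step 0', 'b step 0', 'a step 1', 'b step 1']
print(results)  # ['a done', 'b done']
```

Note that main_task() never sees the individual yields; yield-from passes them straight through to the scheduler and hands back only the final return value.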

How to think about Futures


Most of the time you can forget they are there
Just pretend that:
data = yield from <function_returning_future>

is equivalent to:
data = <equivalent_blocking_function>

...and keep calm and carry on
Also forget about result(), exception(), and done-callbacks
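In today's Python the same idea is spelled async def / await (PEP 492, Python 3.5), and the generator-based @asyncio.coroutine decorator was removed in Python 3.11, so this sketch uses await. asyncio.sleep() stands in for a function returning a Future; each fetch reads like blocking code, yet the two wait concurrently.

```python
import asyncio

async def fetch(name, delay):
    # Blocking code would call time.sleep(delay) and stall everything;
    # await suspends only this coroutine.
    await asyncio.sleep(delay)
    return f"{name} data"

async def main():
    # gather() runs both fetches concurrently and collects their results.
    return await asyncio.gather(fetch("a", 0.1), fetch("b", 0.1))

results = asyncio.run(main())
print(results)  # ['a data', 'b data']
```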

Error handling
Futures can raise exceptions too
Just put a try/except around the yield-from:
    try:
        yield from loop.sock_connect(s, (h, p))
    except OSError:
        <error handling code>
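The same try/except pattern, sketched with today's asyncio and await. To get a reliable failure it binds a port without calling listen(), so on typical Unix systems the connection attempt is refused; other platforms may behave differently. The blocker socket is a hypothetical test fixture.

```python
import asyncio
import socket

async def try_connect():
    loop = asyncio.get_running_loop()
    # Hypothetical fixture: a bound but non-listening port refuses connections.
    blocker = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    blocker.bind(("127.0.0.1", 0))
    h, p = blocker.getsockname()
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setblocking(False)
    try:
        await loop.sock_connect(s, (h, p))
        return "connected"
    except OSError as exc:          # ConnectionRefusedError is an OSError subclass
        return f"error handled: {type(exc).__name__}"
    finally:
        s.close()
        blocker.close()

outcome = asyncio.run(try_connect())
print(outcome)
```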

Coroutines
Yield-from must be used inside a generator
Use the @coroutine decorator to indicate that you're using yield-from to pass Futures
Coroutines are driven by yield-from
Without yield-from a coroutine doesn't run
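The "doesn't run unless driven" rule carries over to today's async def coroutines: calling the function only creates a coroutine object, and its body runs once an event loop drives it.

```python
import asyncio

ran = []

async def greet():
    ran.append("hello")
    return "greeted"

coro = greet()              # just a coroutine object; the body has not run
before = list(ran)          # []
result = asyncio.run(coro)  # the event loop drives it to completion
after = list(ran)           # ['hello']
print(before, after, result)
```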

What if you want an autonomous task?

Tasks
Tasks run as long as the event loop runs
A Task is a coroutine wrapped in a Future
Two ways to create Tasks:
@task decorator (instead of @coroutine)
f = Task(some_coroutine())

The Task makes sure the coroutine runs
Task is a subclass of Future
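In today's asyncio the usual spelling is asyncio.create_task() (Python 3.7+). This sketch shows the task running on its own while main() does other work, and checks that a Task is a Future; ticker() is a hypothetical example coroutine.

```python
import asyncio

async def ticker(log, n):
    for i in range(n):
        log.append(f"tick {i}")
        await asyncio.sleep(0.01)
    return n

async def main():
    log = []
    task = asyncio.create_task(ticker(log, 3))   # scheduled to run on its own
    assert isinstance(task, asyncio.Future)      # Task is a subclass of Future
    await asyncio.sleep(0)                       # let the task take its first step
    log.append("main keeps going")
    count = await task                           # wait for the task's result
    return log, count

log, count = asyncio.run(main())
print(log)    # ['tick 0', 'main keeps going', 'tick 1', 'tick 2']
print(count)  # 3
```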

Back to higher-level network ops


Consider: loop.create_connection(factory, host, port)
This "blocks" (in the yield-from sense) until a TCP connection is made
It returns a Future that is done when the connection is ready
The factory is a protocol class
or a factory function returning a protocol instance

Future's result is a (transport, protocol) tuple
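A sketch of the factory and (transport, protocol) pattern with today's asyncio (Python 3.7+). Here create_server() plays the role of the slide's start_serving(); EchoServer and Client are hypothetical protocols, and the client's factory is a lambda to show that any callable returning a protocol instance works.

```python
import asyncio

class EchoServer(asyncio.Protocol):
    def connection_made(self, transport):
        self.transport = transport

    def data_received(self, data):
        self.transport.write(data)          # application logic: echo back

class Client(asyncio.Protocol):
    def __init__(self, message, on_reply):
        self.message = message
        self.on_reply = on_reply            # Future completed with the reply

    def connection_made(self, transport):
        transport.write(self.message)

    def data_received(self, data):
        if not self.on_reply.done():
            self.on_reply.set_result(data)

async def main():
    loop = asyncio.get_running_loop()
    server = await loop.create_server(EchoServer, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    reply = loop.create_future()
    transport, protocol = await loop.create_connection(
        lambda: Client(b"hello", reply), "127.0.0.1", port)
    data = await reply
    transport.close()
    server.close()
    await server.wait_closed()
    return data, isinstance(protocol, Client)

result = asyncio.run(main())
print(result)  # (b'hello', True)
```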

Wait; transports and protocols?!


PEP 3153 (async I/O) explains why transports and protocols are the right abstraction
transport: provides two byte streams
e.g. TCP or SSL or pipes

protocol: implements application logic


e.g. SMTP or FTP or IRC

Only this abstraction level supports both ready- (select) and done-callbacks (IOCP)

Below the event loop


Lowest level factored out
selector classes: uniform API to select, poll, etc.
will be stdlib classes in their own right
also an IOCP "proactor" (not the same API)

Not part of the PEP (uncontroversial)
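The selector classes did land as the selectors module in Python 3.4. A minimal sketch of its uniform API, with a socketpair() standing in for network traffic:

```python
import selectors
import socket

sel = selectors.DefaultSelector()   # best mechanism available on this platform
a, b = socket.socketpair()
sel.register(b, selectors.EVENT_READ, data="b is readable")

a.send(b"ping")
events = sel.select(timeout=1)
ready = [key.data for key, mask in events if mask & selectors.EVENT_READ]
msg = b.recv(100)

sel.unregister(b)
sel.close()
a.close()
b.close()
print(ready, msg)  # ['b is readable'] b'ping'
```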

There's a lot more...

But I'm out of time :-(


StreamReader class: like a file whose methods return Futures (e.g. readline())
Datagram protocol (under development)
Various types of locks (experimental)
Exemplary HTTP client and server protocols
(may base client on Requests, HTTP for humans)

Subprocess support (mostly TBD)
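The StreamReader idea survives in today's asyncio as the streams API. A sketch assuming Python 3.7+ (for wait_closed()); handle() is a hypothetical line server that upper-cases what it reads.

```python
import asyncio

async def handle(reader, writer):
    line = await reader.readline()          # like a file, but you await it
    writer.write(line.upper())
    await writer.drain()
    writer.close()
    await writer.wait_closed()

async def main():
    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(b"hello tulip\n")
    await writer.drain()
    reply = await reader.readline()
    writer.close()
    await writer.wait_closed()
    server.close()
    await server.wait_closed()
    return reply

reply = asyncio.run(main())
print(reply)  # b'HELLO TULIP\n'
```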

More about interop...


Write code against the standard event loop API
May use yield-from, don't have to
Will interop with other code written like that
Will also work with an adapted event loop
e.g. Twisted reactor
Code using the legacy event loop API will also work
Ideally most of Twisted will work with any standard event loop

Using Futures w/o yield-from


You can use Futures without yield-from!
Just use add_done_callback() and set_result()
This is how Twisted can adapt the event loop
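A sketch of Futures without any coroutine in today's asyncio: a timer callback sets the result and done-callbacks react to it. This is the purely callback-driven style a framework like Twisted could build on.

```python
import asyncio

def main():
    loop = asyncio.new_event_loop()
    seen = []
    try:
        fut = loop.create_future()
        # Plain callbacks only: no coroutine, no await, no yield-from.
        fut.add_done_callback(lambda f: seen.append(f.result()))
        fut.add_done_callback(lambda f: loop.stop())
        loop.call_later(0.01, fut.set_result, "value")
        loop.run_forever()
    finally:
        loop.close()
    return seen

seen = main()
print(seen)  # ['value']
```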

When can I have it?


Tulip works but is in flux and undocumented
PEP 3156 still to be reviewed thoroughly
Push to be ready for Python 3.4 (Feb 2014)
3.4.0 beta 1 cutoff date Nov 23, 2013

Tulip (3rd party) will work with vanilla 3.3
Will keep Tulip around for a few releases
PS. The stdlib version won't be named "tulip"

And the rest of the stdlib?


We'll start thinking about that in earnest once 3.4 is out of the door
We may eventually have to deprecate urllib, socketserver, etc.
Or emulate them on top of PEP 3156
But that will take years

What about older Python versions?


Sorry, you're out of luck :-(
yield-from is only available in 3.3
Much of Tulip depends on yield-from
even the parts that just use Futures

Consider this a carrot for porting to 3.3 :-)
However, someone could implement a PEP-conforming event loop in Python 2.7
just use yield instead of yield-from

Acknowledgments
Greg Ewing for PEP 380 (yield-from)
Glyph and SF Twisted folks for meetings
Richard Oudkerk for the IOCP proactor work
Nikolay Kim for much of the code and tests
Charles-François Natali for the Selectors
Eli Bendersky, Geert Jansen, Saúl Ibarra Corretgé, Steve Dower, Dino Viehland, Ben Darnell, Laurens van Houtven, Giampaolo Rodolà, and everyone on python-ideas...

Oh yeah, I'm sprinting


Will be here Monday - Tuesday
