Making 100 Million Requests With Python Aiohttp

Setup
Server
We have a server program “server”:
(Note it differs from PaweÅ‚’s version because I am using an older version of aiohttp which has
fewer convenient features.)
#!/usr/bin/env python3.5
from aiohttp import web

import asyncio
import random
async def handle(request):

await asyncio.sleep(random.randint(0, 3))
return web.Response(text="Hello, World!")
async def init():

app = web.Application()
app.router.add_route('GET', '/{name}', handle)
return await loop.create_server(
app.make_handler(), '127.0.0.1', 8080)
loop = asyncio.get_event_loop()
loop.run_until_complete(init())
loop.run_forever()
This just responds “Hello, World!” to every request it receives, but after an artificial delay of 0-3
seconds.
Synchronous client
As a baseline, we have a synchronous client “client-sync”:
import requests
import sys
url = "http://localhost:8080/{}"
for i in range(int(sys.argv[1])):
requests.get(url.format(i)).text
This waits for each request to complete before making the next one. Like the other clients below,
it takes the number of requests to make as a command-line argument.
Async client using semaphores

Copied mostly verbatim from Making 1 million requests with python-aiohttp we have an async
client “client-async-sem” that uses a semaphore to restrict the number of requests that are in
progress at any time to 1000:
from aiohttp import ClientSession

import asyncio
import sys
limit = 1000
async def fetch(url, session):

async with session.get(url) as response:
return await response.read()
async def bound_fetch(sem, url, session):

# Getter function with semaphore.
async with sem:
await fetch(url, session)
async def run(session, r):

tasks = []
# create instance of Semaphore
sem = asyncio.Semaphore(limit)
for i in range(r):
# pass Semaphore and session to every GET request
task = asyncio.ensure_future(bound_fetch(sem, url.format(i), session))
tasks.append(task)
responses = asyncio.gather(*tasks)
await responses
with ClientSession() as session:
loop.run_until_complete(asyncio.ensure_future(run(session,
int(sys.argv[1]))))
Async client using limited_as_completed
The new client I am presenting here uses limited_as_completed from the previous post. This
means it can make a generator that provides the futures to wait for as they are needed, instead of
making them all at the beginning.
It is called “client-async-as-completed”:
from aiohttp import ClientSession

import asyncio
from itertools import islice
import sys
def limited_as_completed(coros, limit):

futures = [
asyncio.ensure_future(c)
for c in islice(coros, 0, limit)
]
async def first_to_finish():
while True:
await asyncio.sleep(0)
for f in futures:
if f.done():
futures.remove(f)
try:
newf = next(coros)
futures.append(
asyncio.ensure_future(newf))
except StopIteration as e:
pass
return f.result()
while len(futures) > 0:
yield first_to_finish()
async def fetch(url, session):

async with session.get(url) as response:
return await response.read()
limit = 1000
async def print_when_done(tasks):

for res in limited_as_completed(tasks, limit):
await res
r = int(sys.argv[1])
with ClientSession() as session:
coros = (fetch(url.format(i), session) for i in range(r))
loop.run_until_complete(print_when_done(coros))
loop.close()
Again, this limits the number of requests to 1000.
Test setup
Finally, we have a test runner script called “timed”:
#!/usr/bin/env bash
./server &
sleep 1 # Wait for server to start
/usr/bin/time --format "Memory usage: %MKB\tTime: %e seconds" "$@"
# %e Elapsed real (wall clock) time used by the process, in seconds.

# %M Maximum resident set size of the process in Kilobytes.
kill %1
This runs each process, ensuring the server is restarted each time it runs, and prints out how long
it took to run, and how much memory it used.
Results
When making only 10 requests, the async clients worked faster because they launched all the
requests simultaneously and only had to wait for the longest one (3 seconds). The memory usage
of all three clients was fine:
$ ./timed ./client-sync 10
Memory usage: 20548KB Time: 15.16 seconds
$ ./timed ./client-async-sem 10
$ ./timed ./client-async-as-completed 10
When making 100 requests, the synchronous client was very slow, but all three clients worked
eventually:
$ ./timed ./client-sync 100

At this point let’s agree that life is too short to wait for the synchronous client.
When making 10000 requests, both async clients worked quite quickly, and both had increased
memory usage, but the semaphore-based one used almost twice as much memory as the
limited_as_completed version:

For 1 million requests, the semaphore-based client took 25 minutes on my (32GB RAM)
machine. It only used about 10% of my CPU, and it used a lot of memory (over 3GB):

Note: PaweÅ‚’s version only took 9 minutes on his laptop and used all his CPU, so I wonder
whether I have made a mistake somewhere, or whether my version of Python (3.5.2) is not as
good as a later one.
The limited_as_completed version ran in a similar amount of time but used 100% of my CPU,
and used a much smaller amount of memory (162MB):
Now let’s try 100 million requests. The semaphore-based version lasted 10 hours before it was
killed by Linux’s OOM Killer, but it didn’t manage to make any requests in this time, because it
creates all its futures before it starts making requests:

Command terminated by signal 9
I left the limited_as_completed version over the weekend and it managed to succeed
eventually:

So its memory usage was still very bounded, and it managed to do about 665 requests/second
over an extended period, which is almost identical to the throughput of the previous cases.
Conclusion
Making a million requests is usually enough, but when we really need to do a lot of work while
keeping our memory usage bounded, it looks like an approach like limited_as_completed is a
good way to go. I also think it’s slightly easier to understand.
*****
Series: asyncio basics, large numbers in parallel, parallel HTTP requests, adding to stdlib
Update: slides of a talk I gave at the London Python Meetup on this: Talk slides: Making 100
million HTTP requests with Python aiohttp.
Update: see how Cristian Garcia improved on this code here: Making an Unlimited Number of
Requests with Python aiohttp + pypeln.
I’ve been working on how to make a very large number of HTTP requests using Python’s
asyncio and aiohttp.
PaweÅ‚ Miech’s post Making 1 million requests with python-aiohttp taught me how to think
about this, and got us a long way, with 1 million requests running in a reasonable time, but I need
to go further.
PaweÅ‚’s approach limits the number of requests that are in progress, but it uses an unbounded
amount of memory to hold the futures that it wants to execute.
See also: 2 excellent related posts by Quentin Pradet: How do you rate limit calls with asyncio?,
How do you limit memory usage with asyncio?.
We can avoid using unbounded memory by using the limited_as_completed function I outined
in my previous post.

Making 100 Million Requests With Python Aiohttp

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Making 100 Million Requests With Python Aiohttp

Uploaded by

Copyright:

Available Formats

Setup

We have a server program “server”:

from aiohttp import web

async def handle(request):

async def init():

As a baseline, we have a synchronous client “client-sync”:

Async client using semaphores

from aiohttp import ClientSession

async def fetch(url, session):

async def bound_fetch(sem, url, session):

async def run(session, r):

Async client using limited_as_completed

from aiohttp import ClientSession

def limited_as_completed(coros, limit):

async def fetch(url, session):

async def print_when_done(tasks):

Again, this limits the number of requests to 1000.

Finally, we have a test runner script called “timed”:

/usr/bin/time --format "Memory usage: %MKB\tTime: %e seconds" "$@"

# %e Elapsed real (wall clock) time used by the process, in seconds.

$ ./timed ./client-sync 100

$ ./timed ./client-async-sem 10000

$ ./timed ./client-async-sem 1000000

$ ./timed ./client-async-sem 100000000

$ ./timed ./client-async-as-completed 100000000

You might also like