Parallelism in One Line
Python has a terrible rep when it comes to its parallel processing capabilities. Ignoring the standard
arguments about its threads and the GIL (which are mostly valid), the real problem I see with
parallelism in Python isn't a technical one, but a pedagogical one. The common tutorials surrounding
Threading and Multiprocessing in Python, while generally excellent, are pretty “heavy.” They start with the intense stuff and stop before they get to the really good, day-to-day useful parts.
Traditional Example
A quick survey of the top DDG results for “Python threading tutorial” shows that just about every
single one of them gives the same Class + Queue based example.
# Example.py
'''
Standard Producer/Consumer Threading Pattern
'''
import time
import threading
import Queue


class Consumer(threading.Thread):
    def __init__(self, queue):
        threading.Thread.__init__(self)
        self._queue = queue

    def run(self):
        while True:
            # queue.get() blocks the current thread until
            # an item is retrieved.
            msg = self._queue.get()
            # Checks if the current message is
            # the "Poison Pill"
            if isinstance(msg, str) and msg == 'quit':
                # If so, exit the loop
                break
            # "Process" (here: just print) the queue item
            print "I'm a thread, and I received %s!!" % msg
        # Always be friendly!
        print 'Bye byes!'


def Producer():
    # Queue is used to share items between
    # the threads.
    queue = Queue.Queue()
    # Create a worker and start it up
    worker = Consumer(queue)
    worker.start()
    # Produce work for about five seconds
    start_time = time.time()
    while time.time() - start_time < 5:
        queue.put('something at %s' % time.time())
        time.sleep(1)
    # The "Poison Pill" shuts the worker down
    queue.put('quit')
    # Wait for the thread to close down
    worker.join()


if __name__ == '__main__':
    Producer()
Now, I don’t want to give the impression that I think the Producer / Consumer way of handling
threading/multiprocessing is wrong — because it’s definitely not. In fact it is perfect for many kinds of
problems. However, what I do think is that it’s not the most useful for day-to-day scripting.
For one, you need a boilerplate class in order to do anything useful. Secondly, you need to maintain a Queue through which you can pipe objects, and to top it all off, you need methods on both ends of the pipe in order to do the actual work (likely involving another queue if you want to communicate two ways or store results).
From here, the next thing you'd likely do is make a pool of those worker classes in order to start squeezing some speed out of your Python. Below is a variation of the example code given in the excellent IBM tutorial on threading. It's a very common scenario in which you spread the task of retrieving web pages across multiple threads.
# Example2.py
'''
A more realistic thread pool example
'''
import time
import threading
import Queue
import urllib2


class Consumer(threading.Thread):
    def __init__(self, queue):
        threading.Thread.__init__(self)
        self._queue = queue

    def run(self):
        while True:
            content = self._queue.get()
            if isinstance(content, str) and content == 'quit':
                break
            response = urllib2.urlopen(content)
        print 'Bye byes!'


def Producer():
    urls = [
        'http://www.python.org', 'http://www.yahoo.com',
        'http://www.scala.org', 'http://www.google.com'
        # etc..
    ]
    queue = Queue.Queue()
    worker_threads = build_worker_pool(queue, 4)
    start_time = time.time()

    # Add the urls to process
    for url in urls:
        queue.put(url)
    # Add the "Poison Pill" for each worker
    for worker in worker_threads:
        queue.put('quit')
    # Wait for all of the threads to finish
    for worker in worker_threads:
        worker.join()

    print 'Done! Time taken: {}'.format(time.time() - start_time)


def build_worker_pool(queue, size):
    workers = []
    for _ in range(size):
        worker = Consumer(queue)
        worker.start()
        workers.append(worker)
    return workers


if __name__ == '__main__':
    Producer()
Works like a charm, but look at all that code! Now we've got setup methods, lists of threads to keep track of, and worst of all, if you're anywhere near as deadlock-prone as I am, a bunch of join statements to issue. And it only gets more complex from here!

What's been accomplished so far? A whole lotta nothin'. Just about everything in the above code is pure plumbing. It's boilerplate-y, it's error prone (hell, I even forgot to call task_done() on the queue object while writing this), and it's a lot of work for little payoff.
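For reference, a minimal sketch of that task_done() bookkeeping (my own illustration, not from the tutorials above; a single daemon worker whose “processing” is just a print):

import threading
import Queue

def worker(q):
    while True:
        item = q.get()
        try:
            print 'processing %s' % item
        finally:
            # Pair every get() with a task_done() so that
            # q.join() knows when all items are finished.
            q.task_done()

queue = Queue.Queue()
for i in range(5):
    queue.put(i)

t = threading.Thread(target=worker, args=(queue,))
t.setDaemon(True)  # don't block interpreter exit
t.start()

# Blocks until task_done() has been called once per put()
queue.join()

Luckily, there's a much better way.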
Introducing: Map
Map is a cool little function, and the key to easily injecting parallelism into your Python code. For
those unfamiliar, map is something lifted from functional languages like Lisp. It is a function which
maps another function over a sequence, e.g.

results = map(urllib2.urlopen, urls)

This applies the function urlopen to each item in the passed-in sequence and stores all of the results in a list. It is more or less equivalent to
results = []
for url in urls:
results.append(urllib2.urlopen(url))
Map handles the iteration over the sequence for us, applies the function, and stores all of the results
in a handy list at the end.
Why does this matter? Because with the right libraries, map makes running things in parallel
completely trivial!
Parallel versions of the map function are provided by two libraries: multiprocessing, and its little-known but equally fantastic step-child: multiprocessing.dummy.
Digression: What's that? Never heard of the threading clone of the multiprocessing library called dummy? I hadn't either until very recently. It has all of ONE sentence devoted to it in the multiprocessing documentation page. And that sentence pretty much boils down to “Oh yeah, and this thing exists.” It's tragically undersold, I tell you!
Dummy is an exact clone of the multiprocessing module. The only difference is that, whereas
multiprocessing works with processes, the dummy module uses threads (which come with all the
usual Python limitations). So anything that applies to one, applies to the other. It makes it extremely
easy to hop back and forth between the two. Which is especially great for exploratory programming
when you’re not quite sure if some framework call is IO or CPU bound.
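To make that hopping concrete, here's a tiny sketch (slow_square is a made-up stand-in for real work): the only thing that changes between the threaded and process-based versions is the import.

# Swap which import is active to switch between threads and processes;
# the Pool API is identical in both modules.
from multiprocessing.dummy import Pool  # threads
# from multiprocessing import Pool      # processes

def slow_square(x):
    # Stand-in for some IO- or CPU-bound work
    return x * x

if __name__ == '__main__':
    pool = Pool(4)
    print pool.map(slow_square, range(10))
    pool.close()
    pool.join()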
Getting Started
To access the parallel versions of the map functions, the first thing you need to do is import the modules that contain them:

from multiprocessing import Pool
from multiprocessing.dummy import Pool as ThreadPool

and then instantiate their pool objects:

pool = ThreadPool()
This single statement handles everything we did in the seven-line build_worker_pool function from example2.py. Namely, it creates a bunch of available workers, starts them up so that they're ready to do some work, and stores all of them in a variable so that they're easily accessed.
The pool objects take a few parameters, but for now, the only one worth noting is the first one: processes. This sets the number of workers in the pool. If you leave it blank, it will default to the number of cores in your machine.
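As a quick sketch of what that means in practice (cpu_count comes straight from the standard library):

import multiprocessing
from multiprocessing.dummy import Pool as ThreadPool

# What ThreadPool() will default to on this machine:
print multiprocessing.cpu_count()

# Explicitly choosing the worker count instead:
pool = ThreadPool(4)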
In the general case, if you’re using the multiprocessing pool for CPU bound tasks, more cores equals
more speed (I say that with a lot of caveats). However, when threading and dealing with network
bound stuff, things seem to vary wildly, so it’s a good idea to experiment with the exact size of the
pool.
If you run too many threads, you'll waste more time switching between them than doing useful work,
so it’s always good to play around a little bit until you find the sweet spot for the task at hand.
So, now with the pool objects created, and simple parallelism at our fingertips, let’s rewrite the url
opener from example2.py!
import urllib2
from multiprocessing.dummy import Pool as ThreadPool
urls = [
'http://www.python.org',
'http://www.python.org/about/',
'http://www.onlamp.com/pub/a/python/2003/04/17/metaclasses.html',
'http://www.python.org/doc/',
'http://www.python.org/download/',
'http://www.python.org/getit/',
'http://www.python.org/community/',
'https://wiki.python.org/moin/',
'http://planet.python.org/',
'https://wiki.python.org/moin/LocalUserGroups',
'http://www.python.org/psf/',
'http://docs.python.org/devguide/',
'http://www.python.org/community/awards/'
# etc..
]

# Make the Pool of workers
pool = ThreadPool(4)
# Open the urls in their own threads
# and return the results
results = pool.map(urllib2.urlopen, urls)
# Close the pool and wait for the work to finish
pool.close()
pool.join()
Look at that! The code that actually does work is all of 4 lines, 3 of which are simple bookkeeping ones. The map call handles everything our previous 40-line example did with ease! For funzies, I timed both approaches as well as different pool sizes.
# Single-threaded version:
results = []
for url in urls:
    result = urllib2.urlopen(url)
    results.append(result)

# 4-worker pool:
pool = ThreadPool(4)
results = pool.map(urllib2.urlopen, urls)

# 8-worker pool:
pool = ThreadPool(8)
results = pool.map(urllib2.urlopen, urls)

# 13-worker pool:
pool = ThreadPool(13)
results = pool.map(urllib2.urlopen, urls)
Pretty awesome! And it also shows why it's good to play around a bit with the pool size. On my machine, any pool size greater than 9 quickly led to diminishing returns.
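If you want to find that sweet spot on your own machine, one rough way (my sketch, reusing the urls list and imports from the example above) is to just time pool.map over a few candidate sizes:

import time

for size in [1, 2, 4, 8, 13]:
    pool = ThreadPool(size)
    start = time.time()
    results = pool.map(urllib2.urlopen, urls)
    pool.close()
    pool.join()
    print '%2d workers: %.2f seconds' % (size, time.time() - start)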
Let’s now do something CPU bound! A pretty common task for me at work is manipulating massive
image folders. One of those transformations is creating thumbnails. It is ripe for being run in parallel.
import os
from PIL import Image

SIZE = (75, 75)
SAVE_DIRECTORY = 'thumbs'

def get_image_paths(folder):
    # Lazily yields the path of every jpeg in the folder
    return (os.path.join(folder, f)
            for f in os.listdir(folder)
            if 'jpeg' in f)

def create_thumbnail(filename):
    im = Image.open(filename)
    im.thumbnail(SIZE, Image.ANTIALIAS)
    base, fname = os.path.split(filename)
    save_path = os.path.join(base, SAVE_DIRECTORY, fname)
    im.save(save_path)

if __name__ == '__main__':
    folder = os.path.abspath(
        '11_18_2013_R000_IQM_Big_Sur_Mon__e10d1958e7b766c3e840')
    os.mkdir(os.path.join(folder, SAVE_DIRECTORY))

    images = get_image_paths(folder)

    for image in images:
        create_thumbnail(image)
A little hacked together for example purposes, but in essence: a folder is passed into the program, and from that it grabs all of the images in the folder, then creates the thumbnails and saves them to their own directory. To parallelize it, all we have to change is the final loop:
import os
from multiprocessing import Pool
from PIL import Image

SIZE = (75, 75)
SAVE_DIRECTORY = 'thumbs'

def get_image_paths(folder):
    return (os.path.join(folder, f)
            for f in os.listdir(folder)
            if 'jpeg' in f)

def create_thumbnail(filename):
    im = Image.open(filename)
    im.thumbnail(SIZE, Image.ANTIALIAS)
    base, fname = os.path.split(filename)
    save_path = os.path.join(base, SAVE_DIRECTORY, fname)
    im.save(save_path)

if __name__ == '__main__':
    folder = os.path.abspath(
        '11_18_2013_R000_IQM_Big_Sur_Mon__e10d1958e7b766c3e840')
    os.mkdir(os.path.join(folder, SAVE_DIRECTORY))

    images = get_image_paths(folder)

    pool = Pool()
    pool.map(create_thumbnail, images)
    pool.close()
    pool.join()
5.6 seconds!
That's a pretty massive speedup for only changing a few lines of code. The production version of this is even faster, splitting CPU and IO tasks into their own respective processes and threads, which is usually a recipe for deadlocked code. However, due to the explicit nature of map, and the lack of manual thread management, it feels remarkably easy to mix and match the two in a way that is clean, reliable, and easy to debug.
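To sketch what that mixing might look like (this is my own illustration, not the production code described above; download and byte_count are made-up stand-ins for the IO- and CPU-bound steps):

import urllib2
from multiprocessing import Pool
from multiprocessing.dummy import Pool as ThreadPool

def download(url):
    # IO bound: threads spend their time waiting on the network
    return urllib2.urlopen(url).read()

def byte_count(page):
    # Stand-in for CPU-bound work (e.g. resizing an image)
    return len(page)

if __name__ == '__main__':
    urls = ['http://www.python.org', 'http://www.python.org/about/']

    thread_pool = ThreadPool(4)  # threads for the network IO
    process_pool = Pool()        # processes for the CPU work

    pages = thread_pool.map(download, urls)
    sizes = process_pool.map(byte_count, pages)
    print sizes

    thread_pool.close()
    thread_pool.join()
    process_pool.close()
    process_pool.join()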