Parallelism in One Line
Python has a terrible rep when it comes to its parallel processing capabilities. Ignoring the standard
arguments about its threads and the GIL (which are mostly valid), the real problem I see with
parallelism in Python isn't a technical one, but a pedagogical one. The common tutorials surrounding
Threading and Multiprocessing in Python, while generally excellent, are pretty “heavy.” They start with the intense stuff and stop before they get to the really good, day-to-day useful parts.
Traditional Example
A quick survey of the top DDG results for “Python threading tutorial” shows that just about every
single one of them gives the same Class + Queue based example.
# Example.py
'''
Standard Producer/Consumer Threading Pattern
'''
import time
import threading
import Queue


class Consumer(threading.Thread):
    def __init__(self, queue):
        threading.Thread.__init__(self)
        self._queue = queue

    def run(self):
        while True:
            # queue.get() blocks the current thread until
            # an item is retrieved.
            msg = self._queue.get()
            # Checks if the current message is
            # the "Poison Pill"
            if isinstance(msg, str) and msg == 'quit':
                # If so, exit the loop
                break
            # "Process" (here: just print) the queue item
            print "I'm a thread, and I received %s!!" % msg
        # Always be friendly!
        print 'Bye byes!'


def Producer():
    # Queue is used to share items between
    # the threads.
    queue = Queue.Queue()
    # Create a worker and start it up
    worker = Consumer(queue)
    worker.start()
    # Produce work for about five seconds
    start_time = time.time()
    while time.time() - start_time < 5:
        queue.put('something at %s' % time.time())
        time.sleep(1)
    # The "Poison Pill" shuts the worker down
    queue.put('quit')
    # Wait for the thread to close down
    worker.join()


if __name__ == '__main__':
    Producer()
Now, I don’t want to give the impression that I think the Producer / Consumer way of handling
threading/multiprocessing is wrong — because it’s definitely not. In fact it is perfect for many kinds of
problems. However, what I do think is that it’s not the most useful for day-to-day scripting.
For one, you need a boilerplate class in order to do anything useful. Secondly, you need to maintain a Queue through which you can pipe objects, and to top it all off, you need methods on both ends of the pipe in order to do the actual work (likely involving another queue if you want to communicate two ways or store results).
From here, the next thing you'd likely do is make a pool of those worker classes in order to start squeezing some speed out of your Python. Below is a variation of the example code given in the excellent IBM tutorial on threading. It's a very common scenario in which you spread the task of retrieving web pages across multiple threads.
# Example2.py
'''
A more realistic thread pool example
'''
import time
import threading
import Queue
import urllib2


class Consumer(threading.Thread):
    def __init__(self, queue):
        threading.Thread.__init__(self)
        self._queue = queue

    def run(self):
        while True:
            content = self._queue.get()
            if isinstance(content, str) and content == 'quit':
                break
            response = urllib2.urlopen(content)
        print 'Bye byes!'


def Producer():
    urls = [
        'http://www.python.org', 'http://www.yahoo.com',
        'http://www.scala.org', 'http://www.google.com'
        # etc..
    ]
    queue = Queue.Queue()
    worker_threads = build_worker_pool(queue, 4)
    start_time = time.time()

    # Add the urls to process
    for url in urls:
        queue.put(url)
    # Add the "Poison Pill" for each worker
    for worker in worker_threads:
        queue.put('quit')
    # Wait for all of the threads to finish
    for worker in worker_threads:
        worker.join()

    print 'Done! Time taken: {}'.format(time.time() - start_time)


def build_worker_pool(queue, size):
    workers = []
    for _ in range(size):
        worker = Consumer(queue)
        worker.start()
        workers.append(worker)
    return workers


if __name__ == '__main__':
    Producer()
Works like a charm, but look at all that code! Now we've got setup methods, lists of threads to keep track of, and worst of all, if you're anywhere near as deadlock-prone as I am, a bunch of join statements to issue. And it only gets more complex from here!

What's been accomplished so far? A whole lotta nothin'. Just about everything in the above code is pure plumbing. It's boilerplate-y, it's error prone (hell, I even forgot to call task_done() on the queue object while writing this), and it's a lot of work for little payoff.
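For reference, a minimal sketch of that task_done() bookkeeping (my own illustration, not from the tutorials above; a single daemon worker whose “processing” is just a print):

import threading
import Queue

def worker(q):
    while True:
        item = q.get()
        try:
            print 'processing %s' % item
        finally:
            # Pair every get() with a task_done() so that
            # q.join() knows when all items are finished.
            q.task_done()

queue = Queue.Queue()
for i in range(5):
    queue.put(i)

t = threading.Thread(target=worker, args=(queue,))
t.setDaemon(True)  # don't block interpreter exit
t.start()

# Blocks until task_done() has been called once per put()
queue.join()

Luckily, there's a much better way.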
Introducing: Map
Map is a cool little function, and the key to easily injecting parallelism into your Python code. For
those unfamiliar, map is something lifted from functional languages like Lisp. It is a function which
maps another function over a sequence, e.g.

results = map(urllib2.urlopen, urls)

This applies the function urlopen to each item in the passed-in sequence and stores all of the results in a list. It is more or less equivalent to
results = []
for url in urls:
results.append(urllib2.urlopen(url))
Map handles the iteration over the sequence for us, applies the function, and stores all of the results
in a handy list at the end.
Why does this matter? Because with the right libraries, map makes running things in parallel
completely trivial!
Parallel versions of the map function are provided by two libraries: multiprocessing, and its little-known but equally fantastic step-child: multiprocessing.dummy.
Digression: What's that? Never heard of the threading clone of the multiprocessing library called dummy? I hadn't either until very recently. It has all of ONE sentence devoted to it in the multiprocessing documentation page. And that sentence pretty much boils down to “Oh yeah, and this thing exists.” It's tragically undersold, I tell you!
Dummy is an exact clone of the multiprocessing module. The only difference is that, whereas
multiprocessing works with processes, the dummy module uses threads (which come with all the
usual Python limitations). So anything that applies to one, applies to the other. It makes it extremely
easy to hop back and forth between the two. Which is especially great for exploratory programming
when you’re not quite sure if some framework call is IO or CPU bound.
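To make that hopping concrete, here's a tiny sketch (slow_square is a made-up stand-in for real work): the only thing that changes between the threaded and process-based versions is the import.

# Swap which import is active to switch between threads and processes;
# the Pool API is identical in both modules.
from multiprocessing.dummy import Pool  # threads
# from multiprocessing import Pool      # processes

def slow_square(x):
    # Stand-in for some IO- or CPU-bound work
    return x * x

if __name__ == '__main__':
    pool = Pool(4)
    print pool.map(slow_square, range(10))
    pool.close()
    pool.join()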
Getting Started
To access the parallel versions of the map functions, the first thing you need to do is import the modules that contain them:

from multiprocessing import Pool
from multiprocessing.dummy import Pool as ThreadPool

and then instantiate their pool objects:

pool = ThreadPool()
This single statement handles everything we did in the seven-line build_worker_pool function from example2.py. Namely, it creates a bunch of available workers, starts them up so that they're ready to do some work, and stores all of them in a variable so that they're easily accessed.
The pool objects take a few parameters, but for now, the only one worth noting is the first one: processes. This sets the number of workers in the pool. If you leave it blank, it will default to the number of cores in your machine.
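As a quick sketch of what that means in practice (cpu_count comes straight from the standard library):

import multiprocessing
from multiprocessing.dummy import Pool as ThreadPool

# What ThreadPool() will default to on this machine:
print multiprocessing.cpu_count()

# Explicitly choosing the worker count instead:
pool = ThreadPool(4)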
In the general case, if you’re using the multiprocessing pool for CPU bound tasks, more cores equals
more speed (I say that with a lot of caveats). However, when threading and dealing with network
bound stuff, things seem to vary wildly, so it’s a good idea to experiment with the exact size of the
pool.
If you run too many threads, you'll waste more time switching between them than doing useful work,
so it’s always good to play around a little bit until you find the sweet spot for the task at hand.
So, now with the pool objects created, and simple parallelism at our fingertips, let’s rewrite the url
opener from example2.py!
import urllib2
from multiprocessing.dummy import Pool as ThreadPool
urls = [
'http://www.python.org',
'http://www.python.org/about/',
'http://www.onlamp.com/pub/a/python/2003/04/17/metaclasses.html',
'http://www.python.org/doc/',
'http://www.python.org/download/',
'http://www.python.org/getit/',
'http://www.python.org/community/',
'https://wiki.python.org/moin/',
'http://planet.python.org/',
'https://wiki.python.org/moin/LocalUserGroups',
'http://www.python.org/psf/',
'http://docs.python.org/devguide/',
'http://www.python.org/community/awards/'
# etc..
]

# Make the Pool of workers
pool = ThreadPool(4)
# Open the urls in their own threads
# and return the results
results = pool.map(urllib2.urlopen, urls)
# Close the pool and wait for the work to finish
pool.close()
pool.join()
Look at that! The code that actually does work is all of 4 lines, 3 of which are simple bookkeeping ones. The map call handles everything our previous 40-line example did with ease! For funzies, I timed both approaches as well as different pool sizes.
# Single-threaded version:
results = []
for url in urls:
    result = urllib2.urlopen(url)
    results.append(result)

# 4-worker pool:
pool = ThreadPool(4)
results = pool.map(urllib2.urlopen, urls)

# 8-worker pool:
pool = ThreadPool(8)
results = pool.map(urllib2.urlopen, urls)

# 13-worker pool:
pool = ThreadPool(13)
results = pool.map(urllib2.urlopen, urls)
Pretty awesome! And it also shows why it's good to play around a bit with the pool size. On my machine, any pool size greater than 9 quickly led to diminishing returns.
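If you want to find that sweet spot on your own machine, one rough way (my sketch, reusing the urls list and imports from the example above) is to just time pool.map over a few candidate sizes:

import time

for size in [1, 2, 4, 8, 13]:
    pool = ThreadPool(size)
    start = time.time()
    results = pool.map(urllib2.urlopen, urls)
    pool.close()
    pool.join()
    print '%2d workers: %.2f seconds' % (size, time.time() - start)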
Let’s now do something CPU bound! A pretty common task for me at work is manipulating massive
image folders. One of those transformations is creating thumbnails. It is ripe for being run in parallel.
import os
from PIL import Image

SIZE = (75, 75)
SAVE_DIRECTORY = 'thumbs'

def get_image_paths(folder):
    # Lazily yields the path of every jpeg in the folder
    return (os.path.join(folder, f)
            for f in os.listdir(folder)
            if 'jpeg' in f)

def create_thumbnail(filename):
    im = Image.open(filename)
    im.thumbnail(SIZE, Image.ANTIALIAS)
    base, fname = os.path.split(filename)
    save_path = os.path.join(base, SAVE_DIRECTORY, fname)
    im.save(save_path)

if __name__ == '__main__':
    folder = os.path.abspath(
        '11_18_2013_R000_IQM_Big_Sur_Mon__e10d1958e7b766c3e840')
    os.mkdir(os.path.join(folder, SAVE_DIRECTORY))

    images = get_image_paths(folder)

    for image in images:
        create_thumbnail(image)
A little hacked together for example purposes, but in essence: a folder is passed into the program, and from that it grabs all of the images in the folder, then creates the thumbnails and saves them to their own directory. To parallelize it, all we have to change is the final loop:
import os
from multiprocessing import Pool
from PIL import Image

SIZE = (75, 75)
SAVE_DIRECTORY = 'thumbs'

def get_image_paths(folder):
    return (os.path.join(folder, f)
            for f in os.listdir(folder)
            if 'jpeg' in f)

def create_thumbnail(filename):
    im = Image.open(filename)
    im.thumbnail(SIZE, Image.ANTIALIAS)
    base, fname = os.path.split(filename)
    save_path = os.path.join(base, SAVE_DIRECTORY, fname)
    im.save(save_path)

if __name__ == '__main__':
    folder = os.path.abspath(
        '11_18_2013_R000_IQM_Big_Sur_Mon__e10d1958e7b766c3e840')
    os.mkdir(os.path.join(folder, SAVE_DIRECTORY))

    images = get_image_paths(folder)

    pool = Pool()
    pool.map(create_thumbnail, images)
    pool.close()
    pool.join()
5.6 seconds!
That's a pretty massive speedup for only changing a few lines of code. The production version of this is even faster, splitting CPU and IO tasks into their own respective processes and threads, which is usually a recipe for deadlocked code. However, due to the explicit nature of map, and the lack of manual thread management, it feels remarkably easy to mix and match the two in a way that is clean, reliable, and easy to debug.
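To sketch what that mixing might look like (this is my own illustration, not the production code described above; download and byte_count are made-up stand-ins for the IO- and CPU-bound steps):

import urllib2
from multiprocessing import Pool
from multiprocessing.dummy import Pool as ThreadPool

def download(url):
    # IO bound: threads spend their time waiting on the network
    return urllib2.urlopen(url).read()

def byte_count(page):
    # Stand-in for CPU-bound work (e.g. resizing an image)
    return len(page)

if __name__ == '__main__':
    urls = ['http://www.python.org', 'http://www.python.org/about/']

    thread_pool = ThreadPool(4)  # threads for the network IO
    process_pool = Pool()        # processes for the CPU work

    pages = thread_pool.map(download, urls)
    sizes = process_pool.map(byte_count, pages)
    print sizes

    thread_pool.close()
    thread_pool.join()
    process_pool.close()
    process_pool.join()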