
A Quick Primer on Python Concurrency

The threading module in Python allows us to spin up native operating system
threads to execute multiple tasks concurrently.
by Tim Ojo · Oct. 30, 17 · Big Data Zone · Tutorial

Python is often thought of as a single-threaded language, but there are several
avenues for executing tasks concurrently.

The threading module allows us to spin up native operating system threads to
execute multiple tasks concurrently. The threading API has methods for creating
thread objects and then using those objects to start the underlying threads and
join on them.

```python
import threading

# define the function to execute in a thread
def do_some_work(val):
    print("doing some work in thread")
    print("echo: {}".format(val))

val = "text"
# create a thread object, passing in the target function and optional args in the constructor
t = threading.Thread(target=do_some_work, args=(val,))
# start the thread
t.start()
# block execution of the main thread until thread t is completed
t.join()
```
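The example above starts a single thread. To run several tasks at once, a common pattern is to create one thread per task, start them all, and then join each one. This is a minimal sketch; the `fetch` function and the `results` dictionary are hypothetical stand-ins for real I/O-bound work.

```python
import threading
import time

results = {}  # hypothetical shared dict; each thread writes to a different key

def fetch(task_id):
    # hypothetical stand-in for an I/O-bound task such as a network request
    time.sleep(0.1)
    results[task_id] = "result {}".format(task_id)

threads = [threading.Thread(target=fetch, args=(i,)) for i in range(5)]
for t in threads:
    t.start()  # all five threads are now running concurrently
for t in threads:
    t.join()   # wait for every thread to finish

print(sorted(results))  # [0, 1, 2, 3, 4]
```

Because `time.sleep()` releases the GIL while waiting, the five sleeps overlap and the whole batch takes roughly 0.1 s rather than 0.5 s.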
The threading module also provides several synchronization and inter-thread
communication mechanisms for when threads need to communicate and coordinate with
each other, or for when multiple threads are mutating the same area of memory.
Locks and queues are the most common of these mechanisms. Queues live in the
separate queue module, and the threading module also provides RLocks, semaphores,
conditions, events, and barriers.

```python
import threading

lock = threading.Lock()
### assume that the code below runs in multiple threads ###
lock.acquire()  # acquire the lock, preventing other threads from doing so
try:
    pass  # access the shared resource here
finally:
    lock.release()  # release the lock so that other blocked threads can now run
```
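A `threading.Lock` also works as a context manager, which acquires it on entry and releases it on exit, even if an exception is raised. The sketch below uses that form to protect a hypothetical shared `counter` that several threads increment; without the lock, the read-modify-write in `counter += 1` is not atomic and updates can be lost.

```python
import threading

counter = 0  # hypothetical shared resource
lock = threading.Lock()

def increment(times):
    global counter
    for _ in range(times):
        with lock:  # same effect as acquire() followed by release() in a finally block
            counter += 1

threads = [threading.Thread(target=increment, args=(10000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000
```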
```python
from queue import Queue

queue = Queue()
### assume the code below runs in a separate thread t1 ###
def producer(queue):
    item = make_an_item()  # make_an_item() is a placeholder for your own code
    queue.put(item)

### assume the code below runs in a separate thread t2 ###
def consumer(queue):
    item = queue.get()  # gets an item put in the queue by another thread; blocks until one is available
    queue.task_done()  # marks the last item retrieved as done
```
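The producer and consumer fragments above leave out how the threads are started and how the consumer knows when to stop. One common convention, sketched here as an assumption rather than as part of the `queue` API, is for the producer to put a sentinel value (here `None`) after its last item; `SENTINEL` and `consumed` are hypothetical names.

```python
import threading
from queue import Queue

SENTINEL = None  # hypothetical marker telling the consumer to stop
consumed = []

def producer(queue):
    for item in range(5):
        queue.put(item)
    queue.put(SENTINEL)  # signal that no more items are coming

def consumer(queue):
    while True:
        item = queue.get()  # blocks until the producer puts something
        try:
            if item is SENTINEL:
                break
            consumed.append(item * 10)  # "process" the item
        finally:
            queue.task_done()  # runs for every get(), including the sentinel

queue = Queue()
t1 = threading.Thread(target=producer, args=(queue,))
t2 = threading.Thread(target=consumer, args=(queue,))
t1.start()
t2.start()
t1.join()
t2.join()

print(consumed)  # [0, 10, 20, 30, 40]
```

With a single consumer the queue's FIFO order is preserved; with several consumers the producer would need to put one sentinel per consumer.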
However, CPython, the reference implementation of Python, has a global interpreter
lock (GIL), which makes the interpreter easier to implement and faster for
single-threaded programs. Because the GIL allows only one thread to execute Python
bytecode at a time, threading is not suitable for CPU-bound tasks (tasks in which
most of the time is spent performing a computation instead of waiting on I/O). For
those, we have the multiprocessing package instead. The multiprocessing package
uses processes instead of threads as the actors of parallel execution, and its API
mimics the threading API as closely as possible to reduce the dissonance between
the two and make switching easier.

```python
import hashlib
import multiprocessing

# define the function to execute in a new process
def generate_hash(text):
    return hashlib.sha384(text).hexdigest()

text = b"some long text here..."  # hashlib requires bytes, not str

if __name__ == '__main__':
    # create a process object, passing in the target function and optional args in the constructor
    p = multiprocessing.Process(target=generate_hash, args=(text,))
    # start and join the process
    p.start()
    p.join()
```
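Note that `Process` runs its target but discards the return value, so the hash computed above never reaches the parent process. When results are needed, `multiprocessing.Pool` is one option. The sketch below assumes a hypothetical list of inputs and hashes them across three worker processes.

```python
import hashlib
import multiprocessing

def generate_hash(text):
    return hashlib.sha384(text).hexdigest()

texts = [b"first text", b"second text", b"third text"]  # hypothetical inputs

if __name__ == '__main__':
    # map() splits the inputs across the worker processes and
    # returns the results in the same order as the inputs
    with multiprocessing.Pool(processes=3) as pool:
        hashes = pool.map(generate_hash, texts)
    for h in hashes:
        print(h)
```

The `if __name__ == '__main__':` guard matters here as well: on platforms that start workers by re-importing the main module, leaving it out would make each worker try to create its own pool.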
One of the major areas where there is a difference between the threading and
multiprocessing APIs is in the implementation of shared state. Threads
automatically share memory with each other, but processes don't. So, special
accommodations must be made to allow processes to communicate and share state.
Processes can either allocate and use OS-shared memory areas or they can
communicate with a server process which maintains shared data.
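The two options just described correspond to two parts of the `multiprocessing` API: `Value` and `Array` allocate OS-shared memory, while a `Manager` starts a server process that holds Python objects and hands out proxies to them. The sketch below shows one of each; the `add_one` and `record` helpers are hypothetical.

```python
import multiprocessing

def add_one(counter):
    # counter lives in shared memory, so every process sees the same value;
    # get_lock() returns the lock that a synchronized Value carries
    with counter.get_lock():
        counter.value += 1

def record(shared_dict, key):
    # shared_dict is a proxy; the update is sent to the manager's server process
    shared_dict[key] = key * key

if __name__ == '__main__':
    # 1) shared memory: a C int ('i') that all processes can access
    counter = multiprocessing.Value('i', 0)
    procs = [multiprocessing.Process(target=add_one, args=(counter,)) for _ in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print(counter.value)  # 4

    # 2) server process: the Manager owns the dict, children use proxies
    with multiprocessing.Manager() as manager:
        squares = manager.dict()
        procs = [multiprocessing.Process(target=record, args=(squares, k)) for k in range(4)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        result = dict(squares)  # copy out before the manager shuts down
    print(sorted(result.items()))  # [(0, 0), (1, 1), (2, 4), (3, 9)]
```

Shared memory is faster but limited to simple C types; a manager can hold arbitrary Python objects at the cost of a round trip to the server process for every access.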

The concurrent.futures module provides a layer of abstraction over both concurrency
mechanisms (threads and processes).

It also introduced futures to Python. In Python, a future represents a pending
result, and it also allows us to manage the execution of the computation that
produces the result. Future API methods include result(), cancel(), and
add_done_callback(fn):

```python
import hashlib
from concurrent.futures import ProcessPoolExecutor

# define the function to execute in a new process
def generate_hash(text):
    return hashlib.sha384(text).hexdigest()

text = b"some long text here..."

if __name__ == '__main__':
    executor = ProcessPoolExecutor()  # can be replaced with `ThreadPoolExecutor()`
    # submit a job to the pool; submit() immediately returns a future object
    future_result = executor.submit(generate_hash, text)
```
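To see the three `Future` methods named earlier working together, here is a sketch. It uses a `ThreadPoolExecutor` so that it runs without the `__main__` guard that process pools need on some platforms; the `on_done` callback and the sample inputs are hypothetical.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

def generate_hash(text):
    return hashlib.sha384(text).hexdigest()

completed = []  # filled in by the callback below

def on_done(future):
    # add_done_callback() calls this with the finished future as its only argument
    completed.append(future.result())

with ThreadPoolExecutor(max_workers=2) as executor:  # waits for all work on exit
    futures = [executor.submit(generate_hash, t) for t in (b"a", b"b", b"c")]
    for f in futures:
        f.add_done_callback(on_done)

    # cancel() only succeeds if the call has not started running yet
    late = executor.submit(generate_hash, b"d")
    was_cancelled = late.cancel()

    # result() blocks until each value is ready
    hashes = [f.result() for f in futures]

print(was_cancelled)  # True or False, depending on timing
print(len(hashes), len(completed))  # 3 3
```

Callbacks may run in a worker thread or, if the future has already finished when `add_done_callback()` is called, immediately in the calling thread, so they should be quick and thread-safe.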
Finally, the most recent addition to the Python concurrency family is the asyncio
module. asyncio brings single-threaded asynchronous programming to Python. It
provides an event loop that runs specialized functions called coroutines. A
coroutine has the ability to pause itself and yield control back to the event loop
when it needs to wait for IO or some other long-running task. The event loop can
then go on and execute other coroutines and resume the prior coroutine when an
event occurs that indicates that the IO or long-running task is complete. As a
result, we have multiple tasks running on the same thread and yielding to one
another instead of blocking.

```python
import asyncio

# a coroutine function, as denoted by the async keyword
async def delayed_hello():
    print("Hello ")
    # the coroutine will pause here and yield control back to the event loop
    await asyncio.sleep(1)
    print("World!")

# create an event loop (on Python 3.7+, asyncio.run(delayed_hello()) does these three steps for you)
loop = asyncio.new_event_loop()
# pass the coroutine to the event loop for execution
loop.run_until_complete(delayed_hello())
loop.close()
```
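The example above runs a single coroutine, so nothing else uses the event loop while it sleeps. The sketch below, which assumes Python 3.7+ for `asyncio.run()`, uses `asyncio.gather()` to schedule three coroutines on the same thread; `delayed_echo` is a hypothetical coroutine.

```python
import asyncio
import time

async def delayed_echo(name, delay):
    # hypothetical coroutine: waits without blocking the thread, then returns
    await asyncio.sleep(delay)
    return name

async def main():
    start = time.perf_counter()
    # gather() runs the coroutines concurrently on one event loop and
    # returns their results in the order they were passed in
    results = await asyncio.gather(
        delayed_echo("a", 0.5),
        delayed_echo("b", 0.5),
        delayed_echo("c", 0.5),
    )
    return results, time.perf_counter() - start

results, elapsed = asyncio.run(main())
print(results)  # ['a', 'b', 'c']
print(round(elapsed, 1))  # roughly 0.5 s, not 1.5 s
```

While one coroutine is paused in `await asyncio.sleep()`, the event loop resumes the others, which is why the three waits overlap even though everything runs on a single thread.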
There are several resources that provide an in-depth look into Python concurrency,
such as the Python Module of the Week blog and the Python Parallel Programming
Cookbook. If you are a Pluralsight user, you can also check out my Pluralsight
course, Python Concurrency: Getting Started.
