
6/20/23, 3:30 PM Multi-tasking in Python: Speed up your program 10x by executing things simultaneously | by Mike Huls | Towards Data

| Towards Data Science

Multi-tasking in Python: Speed up your program 10x by executing things simultaneously
Step-by-step guide to apply threads and processes to speed up your code

Mike Huls


Published in Towards Data Science
7 min read · Nov 18, 2021


An army of workers to help us execute faster (image by Brian McGowan on Unsplash)

https://towardsdatascience.com/multi-tasking-in-python-speed-up-your-program-10x-by-executing-things-simultaneously-4b4fc7ee71e 1/18

This article focuses on speeding up your program by making it do multiple things at
the same time. We don’t have to idle while our program waits for an API response,
for example; we can do something else in that same time! We’ll also get into how to
apply more CPUs to speed up calculation times. At the end of this article you’ll:

understand the different ways of multi-tasking

know when to apply which technique

be able to speed up your own code by using the code examples

Before we begin I’d strongly suggest checking out the article below. It explains how
Python works under the hood and why it isn’t as fast as other languages. It also
reveals why Python isn’t multi-threaded to begin with. You’ll have a better
understanding of the problem we’re trying to solve in this article. Let’s code!

Why Python is so slow and how to speed it up


Take a look under the hood to see where Python’s bottlenecks lie
towardsdatascience.com

Threads and processes


Python can multi-task in two ways: threading and multiprocessing. On the surface
they appear very alike but are fundamentally different. In the parts below we’ll
examine both by using two simple metaphors. Our goal is to get an understanding of
the differences between threads and processes so that we know when to use which.

Threading is like making breakfast


Let’s make some breakfast: we’ll need a boiled egg, some toast and a cup of coffee so
we have 4 tasks:

1. toast bread

2. boil water

3. boil egg

4. switch on the coffee maker



This is what we’re trying to make in as little time as possible (image by Eiliv-Sonas Aceron on Unsplash)

How would you go about this? One way is to perform each task sequentially: first
toast bread, then boil water and an egg, and then switch on the coffee maker.
Although this process is pretty understandable, in the end it just leaves us with some
cold toast, a cold egg and a hot cup of coffee. Alternatively we could perform some
tasks simultaneously: we’ll switch on the coffee maker and toaster and boil some
water at the same time.

Let’s simulate this with some code.


import time

def toast_bread():
    print("toasting bread..")
    time.sleep(8)  # simulate waiting for the toaster
    print("bread toasted")

def make_some_coffee():
    print("turned on coffee maker..")
    time.sleep(4)  # simulate waiting for the coffee maker
    print("Poured a nice cup of coffee")

def boil_water_and_egg():
    print("boiling water..")
    time.sleep(5.5)  # simulate waiting for the water to boil
    print("water boiled")


We’ll run this code sequentially (one after the other) like this:

toast_bread()
boil_water_and_egg()
make_some_coffee()

Making breakfast sequentially will take around 17.5 seconds. This involves a lot of
waiting! Let’s multitask using threads:

from threading import Thread

# one thread per breakfast task
threadlist = []
threadlist.append(Thread(target=toast_bread))
threadlist.append(Thread(target=boil_water_and_egg))
threadlist.append(Thread(target=make_some_coffee))

# start all threads first..
for t in threadlist:
    t.start()

# ..then wait for each of them to finish
for t in threadlist:
    t.join()


The code is pretty straightforward: we create a thread for each task and append all
of them to a list. Then we start each thread in the list and wait for all of them to
finish (this is what t.join() does). Making breakfast concurrently takes around 8 seconds!
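
To check the two timings yourself, the sequential and threaded runs can be wrapped in a timer. The sketch below is only an illustration: the SCALE factor and the run_sequentially / run_threaded helpers are assumptions added here (they are not part of the original gists), and the sleep times are shrunk so the demo finishes in a fraction of a second.

```python
import time
from threading import Thread

SCALE = 0.01  # assumption: shrink the article's sleep times so the demo finishes quickly


def toast_bread():
    time.sleep(8 * SCALE)


def boil_water_and_egg():
    time.sleep(5.5 * SCALE)


def make_some_coffee():
    time.sleep(4 * SCALE)


TASKS = [toast_bread, boil_water_and_egg, make_some_coffee]


def run_sequentially():
    """Run the tasks one after the other and return the elapsed seconds."""
    start = time.perf_counter()
    for task in TASKS:
        task()
    return time.perf_counter() - start


def run_threaded():
    """Run each task in its own thread and return the elapsed seconds."""
    start = time.perf_counter()
    threads = [Thread(target=task) for task in TASKS]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait until every thread has finished
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"sequential: {run_sequentially():.3f} s")
    print(f"threaded:   {run_threaded():.3f} s")
```

The sequential time should be close to the sum of all sleeps, while the threaded time should be close to the longest single sleep (the toast).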



Our threaded way of making breakfast (image by Author)

The main take-away is that if there’s a lot of waiting involved (typical in I/O tasks like
downloading data, API requests, writing files..) we can use threads to multi-task.
Later on in this article we’ll examine why threads are the best option for I/O-tasks.

Multiprocessing is like doing your homework


When we apply the same breakfast-making principle to doing our homework we
run into a problem: doing math homework is a task that needs constant attention;
we can’t start it and wait for it to finish! In order to do multiple subjects of
homework at the same time we would need to clone ourselves, right?
Multiprocessing does exactly that.


Time to focus and start processing (image by Annie Spratt on Unsplash)

Let’s first translate our homework example to some code. We’ll simulate doing our
homework with some CPU-intensive processing; adding up all numbers from 0 to
100 million:

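def do_math_homework():
    """Simulate CPU-heavy work: add up all numbers from 0 to 100 million."""
    res = 0
    for i in range(100_000_000):
        res += i
    return res

# The other subjects do the same work. They take no arguments because
# they are started below without any.
def do_physics_homework():
    """same as do_math_homework()"""
    return do_math_homework()

def do_chemistry_homework():
    """same as do_math_homework()"""
    return do_math_homework()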


These are CPU-intensive functions; we have to perform a lot of calculations. We’ll
first execute these functions sequentially, then threaded (same code as in the
previous part) and then using the code below to spawn processes. Notice that the
code looks a lot like the threading code from the previous part.

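from multiprocessing import Process

# The guard keeps child processes from re-running this block when they
# import the module (required on platforms that spawn new interpreters).
if __name__ == "__main__":
    processlist = []
    processlist.append(Process(target=do_math_homework))
    processlist.append(Process(target=do_physics_homework))
    processlist.append(Process(target=do_chemistry_homework))

    # start all processes, then wait for each of them to finish
    for p in processlist:
        p.start()

    for p in processlist:
        p.join()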


The benchmarks:

sequentially: 14.22 seconds

using threads: 13.89 seconds

using processes: 6.00 seconds

You’ll see that using processes speeds up execution by quite a lot! The reason for this
is that we can use more CPUs.

In this example we have to do 100 million * 3 = 300 million calculations. Threading
this doesn’t speed it up because it’s still one CPU that has to perform all 300 million
calculations. When using processes, however, we spawn a brand-new instance of
Python on a different CPU. In other words, we’ll use 3 CPUs that perform 100 million
calculations each!
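
A comparison like the benchmark above can be reproduced with a small timing script. This is a sketch under assumptions, not the code behind the article's numbers: the names add_up, timed and timed_sequential are made up for this example, and N is much smaller than 100 million so it runs in seconds. Absolute timings depend on your machine and number of cores.

```python
import time
from multiprocessing import Process
from threading import Thread

N = 5_000_000  # assumption: far below the article's 100 million, to keep the demo short


def add_up(n=N):
    """CPU-bound stand-in for one homework subject: sum the integers below n."""
    res = 0
    for i in range(n):
        res += i
    return res


def timed_sequential(count=3):
    """Run the task `count` times in a row and return the elapsed seconds."""
    start = time.perf_counter()
    for _ in range(count):
        add_up()
    return time.perf_counter() - start


def timed(worker_cls, count=3):
    """Run the task in `count` workers (Thread or Process) and return the elapsed seconds."""
    workers = [worker_cls(target=add_up) for _ in range(count)]
    start = time.perf_counter()
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return time.perf_counter() - start


if __name__ == "__main__":  # guard needed so child processes can import this file safely
    print(f"sequentially:    {timed_sequential():.2f} s")
    print(f"using threads:   {timed(Thread):.2f} s")
    print(f"using processes: {timed(Process):.2f} s")
```

With a small N the cost of starting processes can eat most of the gain; raising N makes the difference between threads and processes easier to see.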

Summary
Threads are for making-breakfast-like tasks: they involve a lot of waiting, so one
‘person’ (or CPU) can do several things simultaneously. Processes are for ‘thinking tasks’:
they need you to be there to do the heavy work. Multiprocessing is like creating a
clone of yourself so it can do other things while you are working on your task.


Threading and multiprocessing under the hood

Now that we have a clearer understanding of how threads and processes work, let’s
talk a bit about the differences between the two.

Let’s find out how they run (image by Erik Mclean on Unsplash)

The GIL — why threads are more suitable for I/O


As discussed, threads are suitable for I/O tasks whereas processes are suitable for
CPU-heavy tasks. The reason for this is Python’s infamous GIL: the Global
Interpreter Lock. This lock ensures that only one thread executes Python code at a
time, blocking all other threads that do not hold the lock. Many I/O operations
release the GIL while they wait, which is what makes threading worthwhile. Check
out this article to understand why Python applies the GIL.

In the homework example threading makes no sense because the task involved is
not an I/O task. Because of the GIL only one thread can execute at any moment, so
threading offers no speed-up. When multiprocessing we create a fresh instance of
Python which has its own GIL. This way processes run in parallel, speeding up the
execution of our program significantly.
Processes can’t share resources


The main difference between the two is that processes can’t share resources while
threads can. This is because multiprocessing works with multiple CPUs whereas
threading is just one CPU going back and forth between multiple threads.

You can think of threading as a single CPU that first executes a few lines of code in
thread1, then executes some lines in thread2, then moves on to thread3. Then it
executes the next lines in thread1, then thread2 etc. Threading executes multiple
tasks concurrently: one worker that switches between tasks. For the user it looks as
though things happen simultaneously, but this isn’t technically so.
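
This back-and-forth can be made visible by letting a few threads write to a shared list. The snippet is a minimal sketch for illustration only: the worker function, the log list and the short sleep are assumptions added here, and the exact order of the entries can differ from run to run.

```python
import time
from threading import Thread

log = []  # shared between threads: every thread appends to the same list


def worker(name, steps=3):
    for step in range(steps):
        log.append(f"{name} step {step}")
        time.sleep(0.01)  # while this thread waits, another thread gets a turn


threads = [Thread(target=worker, args=(f"thread{i}",)) for i in (1, 2, 3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# entries from the three threads typically appear interleaved, not grouped per thread
for entry in log:
    print(entry)
```

Note that all three threads write to the same list object, which is exactly the kind of sharing that processes do not get for free.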

When you spawn a new process, a whole new instance of Python is created and
allocated to a different CPU. This is the reason why two processes cannot share a
common resource. Processes run in parallel: there are multiple workers that work
on multiple tasks simultaneously.
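
The effect of this separation can be sketched with a module-level list that a thread and a process both try to modify. The names visited, leave_mark and run_worker are hypothetical and only serve this illustration.

```python
from multiprocessing import Process
from threading import Thread

visited = []  # module-level list that each worker tries to modify


def leave_mark(label):
    visited.append(label)


def run_worker(worker_cls, label):
    """Run leave_mark in a Thread or Process and return what the parent sees afterwards."""
    worker = worker_cls(target=leave_mark, args=(label,))
    worker.start()
    worker.join()
    return list(visited)


if __name__ == "__main__":  # guard needed so child processes can import this file safely
    print("after thread: ", run_worker(Thread, "thread"))
    print("after process:", run_worker(Process, "process"))
```

The thread's change should show up in the parent's list, while the process appends to its own copy of the list, so the parent never sees the "process" entry. To get data back from a process you need an explicit channel, such as a multiprocessing.Queue.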

Overhead
Processes take a little more time to spawn. This is the reason the homework
example is not three times faster but slightly less; first we’ll have to spawn the
processes before we can benefit from the parallelism.
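
In the benchmark above the speed-up is 14.22 / 6.00 ≈ 2.4x rather than 3x. To get a feel for the start-up cost alone, you can time workers that do nothing. This is a rough sketch with made-up helper names (do_nothing, startup_cost); the numbers vary a lot between operating systems and machines.

```python
import time
from multiprocessing import Process
from threading import Thread


def do_nothing():
    """Empty task, so the measured time is almost entirely start-up cost."""


def startup_cost(worker_cls, repeats=5):
    """Average seconds needed to start and join one worker of the given class."""
    start = time.perf_counter()
    for _ in range(repeats):
        worker = worker_cls(target=do_nothing)
        worker.start()
        worker.join()
    return (time.perf_counter() - start) / repeats


if __name__ == "__main__":  # guard needed so child processes can import this file safely
    print(f"thread:  {startup_cost(Thread) * 1000:.2f} ms")
    print(f"process: {startup_cost(Process) * 1000:.2f} ms")
```

If the process figure is noticeably higher than the thread figure, that gap is the overhead you pay before the parallel work even begins.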

Even more speed


Multi-tasking can solve a lot of speed-issues in Python but sometimes it’s just not
enough. Check out this or this article that show you how to compile a small part of
your code for a 100x speed increase.

Cython for absolute beginners: 30x faster code in two simple steps
Easy Python code compilation for blazingly fast applications
towardsdatascience.com

Conclusion


Threading and multiprocessing can be used to speed up the execution of your code
in many, many cases. In this article we’ve explored what threads and processes are,
how they work and when to use which. Don’t forget to check out this article on how
to apply pools and for benchmarks!

If you have suggestions/clarifications please comment so I can improve this article.


In the meantime, check out my other articles on all kinds of programming-related
topics like these:

Why Python is slow and how to speed it up

Advanced multi-tasking in Python: applying and benchmarking threadpools and
processpools

Write your own C extension to speed up Python x100

Getting started with Cython: how to perform >1.7 billion calculations per second
in Python

Create a fast auto-documented, maintainable and easy-to-use Python API in 5
lines of code with FastAPI

Create and publish your own Python package

Create Your Custom, private Python Package That You Can PIP Install From Your
Git Repository

Virtual environments for absolute beginners — what is it and how to create one
(+ examples)

Dramatically improve your database insert speed with a simple upgrade

Happy coding!

— Mike

P.S: like what I’m doing? Follow me!

Join Medium with my referral link - Mike Huls


As a Medium member, a portion of your membership fee goes to
writers you read, and you get full access to every story…
mikehuls.medium.com

Written by Mike Huls


1K Followers · Writer for Towards Data Science

I'm a full-stack developer with a passion for programming, technology and traveling. — mikehuls.com —
https://mikehuls.medium.com/membership
