
Python Multiprocessing

CS307: Systems Practicum Lecture-2


Prof. Varun Dutt, IIT Mandi 9th Feb 2023
Eliminating the Impact of the Global Interpreter Lock (GIL)
While working with concurrent applications, Python has a limitation called the GIL (Global
Interpreter Lock). The GIL never allows us to utilize multiple CPU cores with threads, so we can say that there is no
true thread-level parallelism in Python. The GIL is a mutex (mutual exclusion lock) that makes the interpreter thread-safe. In other
words, the GIL prevents multiple threads from executing Python bytecode in parallel. The lock can be
held by only one thread at a time, and a thread must acquire the lock before it can execute.

With the use of multiprocessing, we can effectively bypass the limitation caused by the GIL −

● By using multiprocessing, we utilize the capability of multiple processes, and hence we
utilize multiple instances of the GIL.
● As a result, our programs are not restricted to executing the bytecode of only one thread at
any one time.
Starting Processes in Python
The following three methods can be used to start a process in Python within the multiprocessing module −

● Fork
● Spawn
● Forkserver

Creating a process with Fork


The fork command is a standard command found in UNIX. It is used to create new processes, called child
processes. A child process runs concurrently with the process that created it, called the parent process. Child
processes are also initially identical to their parent processes and inherit all of the resources available to the parent.
The following system calls are used while creating a process with fork −

● fork() − It is a system call generally implemented in the kernel. It is used to create a copy of the
calling process.
● getpid() − This system call returns the process ID(PID) of the calling process.
Example
The following Python script example will help you understand how to create a new child process and get
the PIDs of child and parent processes −
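The script itself isn't reproduced in this transcript; a minimal sketch using the Unix-only fork() and getpid() calls might look like this:

```python
import os

def child_task():
    # This code runs only in the child process created by fork()
    print(f'Child PID: {os.getpid()}, parent PID: {os.getppid()}')

# fork() returns 0 in the child and the child's PID in the parent (Unix-only)
pid = os.fork()
if pid == 0:
    child_task()
    os._exit(0)          # leave the child without running the parent's code below
else:
    print(f'Parent PID: {os.getpid()}, created child PID: {pid}')
    os.waitpid(pid, 0)   # reap the child so it does not linger as a zombie
```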
Creating a Process with Spawn
Spawn means to start something new. Hence, spawning a process means the creation of a new process by a parent process.
The parent process continues its execution asynchronously or waits until the child process ends its execution. Follow these
steps for spawning a process −

● Import the multiprocessing module.
● Create the Process object.
● Start the process activity by calling the start() method.
● Wait until the process has finished its work and exited by calling the join() method.
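The steps above can be sketched as follows; the explicit 'spawn' context is an assumption used here to illustrate the start method (on Windows and macOS it is already the default):

```python
import multiprocessing

def greet(name):
    # Runs in a freshly spawned interpreter rather than a copy of the parent
    print(f'hello from {name}')

if __name__ == '__main__':
    ctx = multiprocessing.get_context('spawn')              # request the spawn start method
    p = ctx.Process(target=greet, args=('spawned child',))  # create the Process object
    p.start()                                               # start the process activity
    p.join()                                                # wait until the child finishes
```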
Concurrency and parallelism
Concurrency means that two or more calculations happen within the same time frame. Parallelism
means that two or more calculations happen at the same moment. Parallelism is therefore a specific
case of concurrency. It requires multiple CPU units or cores.

True parallelism in Python is achieved by creating multiple processes, each having a Python
interpreter with its own separate GIL.

Python has three modules for concurrency: multiprocessing, threading, and asyncio. When the tasks
are CPU intensive, we should consider the multiprocessing module. When the tasks are I/O bound
and require lots of connections, the asyncio module is recommended. For other types of tasks and
when libraries cannot cooperate with asyncio, the threading module can be considered.
Embarrassingly parallel
The term embarrassingly parallel is used to describe a problem or workload that can be easily run
in parallel. It is important to realize that not all workloads can be divided into subtasks and run
in parallel; for instance, those that need lots of communication among subtasks.

Examples of perfectly parallel computations include:

● Monte Carlo analysis
● numerical integration
● rendering of computer graphics
● brute-force searches in cryptography
● genetic algorithms

Another situation where parallel computation can be applied is when we run several
different computations; that is, we don't divide one problem into subtasks. For instance, we
could run calculations of π using different algorithms in parallel.
Process
The Process object represents an activity that is run in a separate process. The
multiprocessing.Process class has equivalents of all the methods of threading.Thread. The Process
constructor should always be called with keyword arguments.

The target argument of the constructor is the callable object to be invoked by the run method. The
name argument is the process name. The start method starts the process's activity. The join method blocks
until the process whose join method is called terminates; if the timeout option is provided, it blocks
at most timeout seconds. The is_alive method returns a boolean value indicating whether the
process is alive. The terminate method terminates the process.

The __main__ guard


The Python multiprocessing style guide recommends placing the multiprocessing code inside the
__name__ == '__main__' idiom. This is due to the way processes are created on Windows; the
guard prevents an endless loop of process creation.
Simple process example
The following is a simple program that uses multiprocessing.
We create a new process and pass a value to it.

The function prints the passed parameter.

A new process is created. The target option provides the callable that is run in the new process. The
args provides the data to be passed. The multiprocessing code is placed inside the main guard. The
process is started with the start method.

The code is placed inside the __name__ == '__main__' idiom.


Python multiprocessing join
The join method blocks the execution of the main process until the process whose join method is
called terminates. Without the join method, the main process won't wait until the process gets
terminated.
The example calls the join on the newly created process.

$ ./joining.py
starting main
starting fun
finishing fun
finishing main

The finishing main message is printed after the child process has finished.

$ ./joining.py
starting main
finishing main
starting fun
finishing fun

When we comment out the join method, the main process finishes before the child process.

It is important to call the join methods after the start methods.


If we call the join methods incorrectly, then we in fact run the processes sequentially. (The incorrect
way is commented out.)
Python multiprocessing is_alive
The is_alive method determines if the process is running.

When we wait for the child process to finish with the join method, the process is already dead
when we check it. If we comment out the join, the process is still alive.
Python multiprocessing Process Id
The os.getpid returns the current process Id, while the os.getppid returns the parent's process Id.
Naming processes

With the name property of the Process, we can give the worker a specific name. Otherwise, the module creates
its own name.
Subclassing Process

When we subclass the Process, we override the run method.


Daemon Processes in Python
The Python multiprocessing module allows us to have daemon processes through its daemon flag. Daemon processes, i.e. processes
running in the background, follow a concept similar to daemon threads. To execute a process in the
background, we need to set its daemon flag to true. A daemon process keeps running only as long as the main process is
executing; it is terminated automatically when the main program finishes or is killed.

Example
Here, we use the same example as used for daemon threads. The only difference is the change of module from
threading to multiprocessing and setting the daemon flag to true. However, there is a change in the output, as shown
below −

The output is different from the one generated by daemon threads: the daemon process produces no
output, because it ends automatically as soon as the main program ends, which
avoids the persistence of running processes.
Terminating processes in Python
We can kill or terminate a process immediately by using the terminate() method. In the example below, we use this method to
terminate a child process, created from an ordinary function, before the child completes
its execution.

Example Output
import multiprocessing My Process has terminated, terminating main thread
import time Terminating Child Process
def Child_process(): Child Process successfully terminated
print ('Starting function')
time.sleep(5)
print ('Finished function')
P = multiprocessing.Process(target = Child_process)
P.start()
print("My Process has terminated, terminating main thread")
print("Terminating Child Process")
P.terminate()
print("Child Process successfully terminated")

The output shows that the main program runs to completion and terminates the child process, created
with the help of the Child_process() function, before the child finishes its execution. This implies that the child process has been terminated
successfully.
Python multiprocessing Pool

The management of the worker processes can be simplified with the Pool object. It controls a pool of
worker processes to which jobs can be submitted. The pool's map method chops the given iterable
into a number of chunks which it submits to the process pool as separate tasks. The pool's map is a
parallel equivalent of the built-in map function. The map call blocks the main execution until all
computations finish.

The Pool can take the number of processes as a parameter. It is a value with which we can
experiment. If we do not provide any value, then the number returned by os.cpu_count is used.
Multiple arguments

To pass multiple arguments to a worker function, we can use the starmap method. The elements of the iterable
are expected to be iterables that are unpacked as arguments.
Multiple functions

The following example shows how to run multiple functions in a pool.

We have three functions, which are run independently in a pool. We use functools.partial to
prepare the functions and their parameters before they are executed.
Separate memory in a Process
In multiprocessing, each worker has its own memory. The memory is not shared like in threading.
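This can be demonstrated with a sketch: a child's modification of a module-level list is not visible to the parent.

```python
import multiprocessing

data = []   # lives in each process's own memory

def worker():
    data.append(1)                 # modifies only the child's copy
    print('child sees:', data)     # [1]

if __name__ == '__main__':
    p = multiprocessing.Process(target=worker)
    p.start()
    p.join()
    print('parent sees:', data)    # [] -- the parent's list is unchanged
```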
Sharing states between processes
Data can be stored in a shared memory using Value or Array.

Note: It is best to avoid sharing data between processes. Message passing is preferred.

Each process must acquire a lock for itself.


Message Passing with queues
● Message passing is the preferred way of communication among processes. It avoids having
to use synchronization primitives such as locks, which are difficult to use and error-prone in complex situations.
● To pass messages, we can utilize a pipe for a connection between two processes. A queue allows multiple
producers and consumers.
Calculating π with Monte Carlo method
Monte Carlo methods are a broad class of computational algorithms that rely on repeated random
sampling to obtain numerical results. The underlying concept is to use randomness to solve
problems that might be deterministic in principle.

The following formula is used to calculate the approximation of π:

π ≈ 4 * M / N

where M is the number of generated points that fall inside the circle inscribed in the square and N is the total number of generated points.

While this method of π calculation is interesting and perfect for school examples, it is not very
accurate. There are far better algorithms to get π.
In the example, we calculate the approximation of the π value using one hundred million generated
random points.

It took 44.78 seconds to calculate the approximation of π.
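The serial version might look like the following sketch, scaled down to one million points so it runs quickly (the slide used 100_000_000):

```python
import random

def calc_pi(n):
    # Count random points in the unit square that fall inside the quarter circle
    count = 0
    for _ in range(n):
        x, y = random.random(), random.random()
        if x * x + y * y <= 1:
            count += 1
    return 4 * count / n   # pi ~ 4 * M / N

pi_est = calc_pi(1_000_000)
print(pi_est)   # roughly 3.14
```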
Calculating π with Monte Carlo method
Now we divide the whole task of π computation into subtasks. We find out the number of cores and divide the random sampling into that many subtasks. Each
task computes its random values independently.

Instead of calculating 100_000_000 in one go, each subtask will calculate a portion of it.

The partial calculations are passed to the count variable and the sum is then used in the final formula.
When running the example in parallel with four cores, the calculations took 29.46 seconds.
References

● https://zetcode.com/python/multiprocessing/
● https://www.tutorialspoint.com/concurrency_in_python/concurrency_in_python_multiprocessing.htm
