
UNIT - III

Modules and Packages

Modular programming refers to the process of breaking a large, unwieldy programming task into separate, smaller, more manageable subtasks or modules. Individual modules can then be cobbled together like building blocks to create a larger application.

There are several advantages to modularizing code in a large application:

 Simplicity: Rather than focusing on the entire problem at hand, a module typically focuses on one relatively small portion of the problem. If you’re working on a single module, you’ll have a smaller problem domain to wrap your head around. This makes development easier and less error-prone.
 Maintainability: Modules are typically designed so that they enforce logical
boundaries between different problem domains. If modules are written in a
way that minimizes interdependency, there is decreased likelihood that
modifications to a single module will have an impact on other parts of the
program. (You may even be able to make changes to a module without
having any knowledge of the application outside that module.) This makes it
more viable for a team of many programmers to work collaboratively on a
large application.
 Reusability: Functionality defined in a single module can be easily reused
(through an appropriately defined interface) by other parts of the
application. This eliminates the need to duplicate code.
 Scoping: Modules typically define a separate namespace, which helps avoid
collisions between identifiers in different areas of a program. (One of the
tenets in the Zen of Python is Namespaces are one honking great idea—let’s
do more of those!)

Functions, modules and packages are all constructs in Python that promote code modularization.

Python Modules: Overview


There are actually three different ways to define a module in Python:

1. A module can be written in Python itself.
2. A module can be written in C and loaded dynamically at run-time, like the re (regular expression) module.
3. A built-in module is intrinsically contained in the interpreter, like the itertools module.

A module’s contents are accessed the same way in all three cases: with the import statement.
Here, the focus will mostly be on modules that are written in Python. The cool thing about
modules written in Python is that they are exceedingly straightforward to build. All you need
to do is create a file that contains legitimate Python code and then give the file a name with
a .py extension. That’s it! No special syntax or voodoo is necessary.

For example, suppose you have created a file called mod.py containing the following:

mod.py

s = "If Comrade Napoleon says it, it must be right."


a = [100, 200, 300]

def foo(arg):
print(f'arg = {arg}')

class Foo:
pass
Several objects are defined in mod.py:

 s (a string)
 a (a list)
 foo() (a function)
 Foo (a class)

Assuming mod.py is in an appropriate location, which you will learn more about shortly, these
objects can be accessed by importing the module as follows:

>>>
>>> import mod
>>> print(mod.s)
If Comrade Napoleon says it, it must be right.
>>> mod.a
[100, 200, 300]
>>> mod.foo(['quux', 'corge', 'grault'])
arg = ['quux', 'corge', 'grault']
>>> x = mod.Foo()
>>> x
<mod.Foo object at 0x03C181F0>

The Module Search Path


Continuing with the above example, let’s take a look at what happens when Python executes
the statement:
import mod
When the interpreter executes the above import statement, it searches for mod.py in a list of
directories assembled from the following sources:

 The directory from which the input script was run or the current directory if the
interpreter is being run interactively
 The list of directories contained in the PYTHONPATH environment variable, if it is
set. (The format for PYTHONPATH is OS-dependent but should mimic
the PATH environment variable.)
 An installation-dependent list of directories configured at the time Python is installed

The resulting search path is accessible in the Python variable sys.path, which is obtained from
a module named sys:

>>>
>>> import sys
>>> sys.path
['', 'C:\\Users\\john\\Documents\\Python\\doc', 'C:\\Python36\\Lib\\idlelib',
'C:\\Python36\\python36.zip', 'C:\\Python36\\DLLs', 'C:\\Python36\\lib',
'C:\\Python36', 'C:\\Python36\\lib\\site-packages']

Note: The exact contents of sys.path are installation-dependent. The above will almost certainly look slightly different on your computer.

Thus, to ensure your module is found, you need to do one of the following:

 Put mod.py in the directory where the input script is located or the current directory,
if interactive
 Modify the PYTHONPATH environment variable to contain the directory
where mod.py is located before starting the interpreter
o Or: Put mod.py in one of the directories already contained in
the PYTHONPATH variable
 Put mod.py in one of the installation-dependent directories, which you may or may not
have write-access to, depending on the OS

There is actually one additional option: you can put the module file in any directory of your
choice and then modify sys.path at run-time so that it contains that directory. For example, in
this case, you could put mod.py in directory C:\Users\john and then issue the following
statements:

>>>
>>> sys.path.append(r'C:\Users\john')
>>> sys.path
['', 'C:\\Users\\john\\Documents\\Python\\doc', 'C:\\Python36\\Lib\\idlelib',
'C:\\Python36\\python36.zip', 'C:\\Python36\\DLLs', 'C:\\Python36\\lib',
'C:\\Python36', 'C:\\Python36\\lib\\site-packages', 'C:\\Users\\john']
>>> import mod
Once a module has been imported, you can determine the location where it was found with
the module’s __file__ attribute:

>>>
>>> import mod
>>> mod.__file__
'C:\\Users\\john\\mod.py'

>>> import re
>>> re.__file__
'C:\\Python36\\lib\\re.py'

The directory portion of __file__ should be one of the directories in sys.path.
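
You can check this directly. With the Python 3.6 layout shown above, where re.py lives in the top-level lib directory, the following holds (in newer Python versions re is a package, so the exact result may differ):

>>> import os, re, sys
>>> os.path.dirname(re.__file__) in sys.path
True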

The import Statement
Module contents are made available to the caller with the import statement.
The import statement takes many different forms, shown below.

import <module_name>
The simplest form is the one already shown above:

import <module_name>
Note that this does not make the module contents directly accessible to the caller. Each
module has its own private symbol table, which serves as the global symbol table for all
objects defined in the module. Thus, a module creates a separate namespace, as already
noted.

The statement import <module_name> only places <module_name> in the caller’s symbol table. The objects that are defined in the module remain in the module’s private symbol table.

From the caller, objects in the module are only accessible when prefixed
with <module_name> via dot notation, as illustrated below.

After the following import statement, mod is placed into the local symbol table. Thus, mod has
meaning in the caller’s local context:

>>>
>>> import mod
>>> mod
<module 'mod' from 'C:\\Users\\john\\Documents\\Python\\doc\\mod.py'>
But s and foo remain in the module’s private symbol table and are not meaningful in the local
context:

>>>
>>> s
NameError: name 's' is not defined
>>> foo('quux')
NameError: name 'foo' is not defined
To be accessed in the local context, names of objects defined in the module must be prefixed
by mod:

>>>
>>> mod.s
'If Comrade Napoleon says it, it must be right.'
>>> mod.foo('quux')
arg = quux
Several comma-separated modules may be specified in a single import statement:

import <module_name>[, <module_name> ...]
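
For example, the standard-library modules math and sys can be brought in with one statement:

>>> import math, sys
>>> math.pi
3.141592653589793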

from <module_name> import <name(s)>


An alternate form of the import statement allows individual objects from the module to be
imported directly into the caller’s symbol table:

from <module_name> import <name(s)>


Following execution of the above statement, <name(s)> can be referenced in the caller’s
environment without the <module_name> prefix:

>>>
>>> from mod import s, foo
>>> s
'If Comrade Napoleon says it, it must be right.'
>>> foo('quux')
arg = quux

>>> from mod import Foo
>>> x = Foo()
>>> x
<mod.Foo object at 0x02E3AD50>
Because this form of import places the object names directly into the caller’s symbol table,
any objects that already exist with the same name will be overwritten:

>>>
>>> a = ['foo', 'bar', 'baz']
>>> a
['foo', 'bar', 'baz']

>>> from mod import a
>>> a
[100, 200, 300]
It is even possible to indiscriminately import everything from a module at one fell swoop:

from <module_name> import *


This will place the names of all objects from <module_name> into the local symbol table, with
the exception of any that begin with the underscore (_) character.

For example:

>>>
>>> from mod import *
>>> s
'If Comrade Napoleon says it, it must be right.'
>>> a
[100, 200, 300]
>>> foo
<function foo at 0x03B449C0>
>>> Foo
<class 'mod.Foo'>

This isn’t necessarily recommended in large-scale production code. It’s a bit dangerous
because you are entering names into the local symbol table en masse. Unless you know them
all well and can be confident there won’t be a conflict, you have a decent chance of
overwriting an existing name inadvertently. However, this syntax is quite handy when you
are just mucking around with the interactive interpreter, for testing or discovery purposes,
because it quickly gives you access to everything a module has to offer without a lot of
typing.

from <module_name> import <name> as <alt_name>


It is also possible to import individual objects but enter them into the local symbol table with
alternate names:
from <module_name> import <name> as <alt_name>[, <name> as <alt_name> …]
This makes it possible to place names directly into the local symbol table but avoid conflicts
with previously existing names:

>>>
>>> s = 'foo'
>>> a = ['foo', 'bar', 'baz']

>>> from mod import s as string, a as alist
>>> s
'foo'
>>> string
'If Comrade Napoleon says it, it must be right.'
>>> a
['foo', 'bar', 'baz']
>>> alist
[100, 200, 300]

import <module_name> as <alt_name>


You can also import an entire module under an alternate name:

import <module_name> as <alt_name>


 
>>>
>>> import mod as my_module
>>> my_module.a
[100, 200, 300]
>>> my_module.foo('qux')
arg = qux

Module contents can be imported from within a function definition. In that case,
the import does not occur until the function is called:

>>>
>>> def bar():
...     from mod import foo
...     foo('corge')
...

>>> bar()
arg = corge

However, Python 3 does not allow the indiscriminate import * syntax from within a function:

>>>
>>> def bar():
...     from mod import *
...
SyntaxError: import * only allowed at module level
Lastly, a try statement with an except ImportError clause can be used to guard against
unsuccessful import attempts:

>>>
>>> try:
...     # Non-existent module
...     import baz
... except ImportError:
...     print('Module not found')
...

Module not found


 
>>>
>>> try:
...     # Existing module, but non-existent object
...     from mod import baz
... except ImportError:
...     print('Object not found in module')
...

Object not found in module

The dir() Function
The built-in function dir() returns a list of defined names in a namespace. Without arguments,
it produces an alphabetically sorted list of names in the current local symbol table:

>>>
>>> dir()
['__annotations__', '__builtins__', '__doc__', '__loader__', '__name__',
'__package__', '__spec__']
>>> qux = [1, 2, 3, 4, 5]
>>> dir()
['__annotations__', '__builtins__', '__doc__', '__loader__', '__name__',
'__package__', '__spec__', 'qux']

>>> class Bar():
...     pass
...
>>> x = Bar()
>>> dir()
['Bar', '__annotations__', '__builtins__', '__doc__', '__loader__', '__name__',
'__package__', '__spec__', 'qux', 'x']

Note how the first call to dir() above lists several names that are automatically defined and
already in the namespace when the interpreter starts. As new names are defined (qux, Bar, x),
they appear on subsequent invocations of dir().

This can be useful for identifying what exactly has been added to the namespace by an import
statement:

>>>
>>> dir()
['__annotations__', '__builtins__', '__doc__', '__loader__', '__name__',
'__package__', '__spec__']

>>> import mod
>>> dir()
['__annotations__', '__builtins__', '__doc__', '__loader__', '__name__',
'__package__', '__spec__', 'mod']
>>> mod.s
'If Comrade Napoleon says it, it must be right.'
>>> mod.foo([1, 2, 3])
arg = [1, 2, 3]

>>> from mod import a, Foo
>>> dir()
['Foo', '__annotations__', '__builtins__', '__doc__', '__loader__', '__name__',
'__package__', '__spec__', 'a', 'mod']
>>> a
[100, 200, 300]
>>> x = Foo()
>>> x
<mod.Foo object at 0x002EAD50>

>>> from mod import s as string
>>> dir()
['Foo', '__annotations__', '__builtins__', '__doc__', '__loader__', '__name__',
'__package__', '__spec__', 'a', 'mod', 'string', 'x']
>>> string
'If Comrade Napoleon says it, it must be right.'

When given an argument that is the name of a module, dir() lists the names defined in the
module:

>>>
>>> import mod
>>> dir(mod)
['Foo', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__',
'__name__', '__package__', '__spec__', 'a', 'foo', 's']
 
>>>
>>> dir()
['__annotations__', '__builtins__', '__doc__', '__loader__', '__name__',
'__package__', '__spec__']
>>> from mod import *
>>> dir()
['Foo', '__annotations__', '__builtins__', '__doc__', '__loader__', '__name__',
'__package__', '__spec__', 'a', 'foo', 's']

Executing a Module as a Script


Any .py file that contains a module is essentially also a Python script, and there isn’t any
reason it can’t be executed like one.

Here again is mod.py as it was defined above:

mod.py

s = "If Comrade Napoleon says it, it must be right."


a = [100, 200, 300]

def foo(arg):
print(f'arg = {arg}')

class Foo:
pass
This can be run as a script:

C:\Users\john\Documents>python mod.py
C:\Users\john\Documents>
There are no errors, so it apparently worked. Granted, it’s not very interesting. As it is
written, it only defines objects. It doesn’t do anything with them, and it doesn’t generate any
output.

Let’s modify the above Python module so it does generate some output when run as a script:

mod.py

s = "If Comrade Napoleon says it, it must be right."


a = [100, 200, 300]

def foo(arg):
print(f'arg = {arg}')

class Foo:
pass

print(s)
print(a)
foo('quux')
x = Foo()
print(x)
Now it should be a little more interesting:

C:\Users\john\Documents>python mod.py
If Comrade Napoleon says it, it must be right.
[100, 200, 300]
arg = quux
<__main__.Foo object at 0x02F101D0>
Unfortunately, now it also generates output when imported as a module:
>>>
>>> import mod
If Comrade Napoleon says it, it must be right.
[100, 200, 300]
arg = quux
<mod.Foo object at 0x0169AD50>
This is probably not what you want. It isn’t usual for a module to generate output when it is
imported.

Wouldn’t it be nice if you could distinguish between when the file is loaded as a module and
when it is run as a standalone script?

Ask and ye shall receive.

When a .py file is imported as a module, Python sets the special dunder variable __name__ to the name of the module. However, if a file is run as a standalone script, __name__ is (creatively) set to the string '__main__'. Using this fact, you can discern which is the case at run-time and alter behavior accordingly:

mod.py

s = "If Comrade Napoleon says it, it must be right."


a = [100, 200, 300]

def foo(arg):
print(f'arg = {arg}')

class Foo:
pass

if (__name__ == '__main__'):
print('Executing as standalone script')
print(s)
print(a)
foo('quux')
x = Foo()
print(x)
Now, if you run as a script, you get output:

C:\Users\john\Documents>python mod.py
Executing as standalone script
If Comrade Napoleon says it, it must be right.
[100, 200, 300]
arg = quux
<__main__.Foo object at 0x03450690>
But if you import as a module, you don’t:

>>>
>>> import mod
>>> mod.foo('grault')
arg = grault
Modules are often designed with the capability to run as a standalone script for purposes of
testing the functionality that is contained within the module. This is referred to as unit
testing. For example, suppose you have created a module fact.py containing
a factorial function, as follows:

fact.py

def fact(n):
    return 1 if n == 1 else n * fact(n-1)

if (__name__ == '__main__'):
    import sys
    if len(sys.argv) > 1:
        print(fact(int(sys.argv[1])))

The file can be treated as a module, and the fact() function imported:

>>>
>>> from fact import fact
>>> fact(6)
720
But it can also be run as a standalone by passing an integer argument on the command-line
for testing:

C:\Users\john\Documents>python fact.py 6
720

Reloading a Module
For reasons of efficiency, a module is only loaded once per interpreter session. That is fine
for function and class definitions, which typically make up the bulk of a module’s contents.
But a module can contain executable statements as well, usually for initialization. Be aware
that these statements will only be executed the first time a module is imported.

Consider the following file mod.py:


mod.py

a = [100, 200, 300]
print('a =', a)
 
>>>
>>> import mod
a = [100, 200, 300]
>>> import mod
>>> import mod

>>> mod.a
[100, 200, 300]
The print() statement is not executed on subsequent imports. (For that matter, neither is the
assignment statement, but as the final display of the value of mod.a shows, that doesn’t
matter. Once the assignment is made, it sticks.)

If you make a change to a module and need to reload it, you need to either restart the
interpreter or use a function called reload() from module importlib:

>>>
>>> import mod
a = [100, 200, 300]

>>> import mod

>>> import importlib


>>> importlib.reload(mod)
a = [100, 200, 300]
<module 'mod' from 'C:\\Users\\john\\Documents\\Python\\doc\\mod.py'>

Python Packages
Suppose you have developed a very large application that includes many modules. As the
number of modules grows, it becomes difficult to keep track of them all if they are dumped
into one location. This is particularly so if they have similar names or functionality. You
might wish for a means of grouping and organizing them.

Packages allow for a hierarchical structuring of the module namespace using dot notation.


In the same way that modules help avoid collisions between global variable
names, packages help avoid collisions between module names.

Creating a package is quite straightforward, since it makes use of the operating system’s inherent hierarchical file structure. Consider the following arrangement:

pkg/
    mod1.py
    mod2.py

Here, there is a directory named pkg that contains two modules, mod1.py and mod2.py. The contents of the modules are:

mod1.py

def foo():
    print('[mod1] foo()')

class Foo:
    pass

mod2.py

def bar():
    print('[mod2] bar()')

class Bar:
    pass

Given this structure, if the pkg directory resides in a location where it can be found (in one of
the directories contained in sys.path), you can refer to the two modules with dot
notation (pkg.mod1, pkg.mod2) and import them with the syntax you are already familiar
with:

import <module_name>[, <module_name> ...]


 
>>>
>>> import pkg.mod1, pkg.mod2
>>> pkg.mod1.foo()
[mod1] foo()
>>> x = pkg.mod2.Bar()
>>> x
<pkg.mod2.Bar object at 0x033F7290>
 
from <module_name> import <name(s)>
 
>>>
>>> from pkg.mod1 import foo
>>> foo()
[mod1] foo()
 
from <module_name> import <name> as <alt_name>
 
>>>
>>> from pkg.mod2 import Bar as Qux
>>> x = Qux()
>>> x
<pkg.mod2.Bar object at 0x036DFFD0>
You can import modules with these statements as well:

from <package_name> import <module_name>[, <module_name> ...]


from <package_name> import <module_name> as <alt_name>
 
>>>
>>> from pkg import mod1
>>> mod1.foo()
[mod1] foo()

>>> from pkg import mod2 as quux
>>> quux.bar()
[mod2] bar()
You can technically import the package as well:

>>>
>>> import pkg
>>> pkg
<module 'pkg' (namespace)>
But this is of little avail. Though this is, strictly speaking, a syntactically correct Python
statement, it doesn’t do much of anything useful. In particular, it does not place any of the
modules in pkg into the local namespace:

>>>
>>> pkg.mod1
Traceback (most recent call last):
  File "<pyshell#34>", line 1, in <module>
    pkg.mod1
AttributeError: module 'pkg' has no attribute 'mod1'
>>> pkg.mod1.foo()
Traceback (most recent call last):
  File "<pyshell#35>", line 1, in <module>
    pkg.mod1.foo()
AttributeError: module 'pkg' has no attribute 'mod1'
>>> pkg.mod2.Bar()
Traceback (most recent call last):
  File "<pyshell#36>", line 1, in <module>
    pkg.mod2.Bar()
AttributeError: module 'pkg' has no attribute 'mod2'
To actually import the modules or their contents, you need to use one of the forms shown
above.

Package Initialization
If a file named __init__.py is present in a package directory, it is invoked when the package or
a module in the package is imported. This can be used for execution of package initialization
code, such as initialization of package-level data.

For example, consider the following __init__.py file:

__init__.py

print(f'Invoking __init__.py for {__name__}')

A = ['quux', 'corge', 'grault']

Let’s add this file to the pkg directory from the above example:

pkg/
    __init__.py
    mod1.py
    mod2.py

Now when the package is imported, the global list A is initialized:

>>>
>>> import pkg
Invoking __init__.py for pkg
>>> pkg.A
['quux', 'corge', 'grault']
A module in the package can access the global variable by importing it in turn:

mod1.py
def foo():
    from pkg import A
    print('[mod1] foo() / A = ', A)

class Foo:
    pass
 
>>>
>>> from pkg import mod1
Invoking __init__.py for pkg
>>> mod1.foo()
[mod1] foo() / A = ['quux', 'corge', 'grault']
__init__.py can also be used to effect automatic importing of modules from a package. For
example, earlier you saw that the statement import pkg only places the name pkg in the caller’s
local symbol table and doesn’t import any modules. But if __init__.py in the pkg directory
contains the following:

__init__.py

print(f'Invoking __init__.py for {__name__}')

import pkg.mod1, pkg.mod2

then when you execute import pkg, modules mod1 and mod2 are imported automatically:

>>>
>>> import pkg
Invoking __init__.py for pkg
>>> pkg.mod1.foo()
[mod1] foo()
>>> pkg.mod2.bar()
[mod2] bar()
Note: Much of the Python documentation states that an __init__.py file must be present in the
package directory when creating a package. This was once true. It used to be that the very
presence of __init__.py signified to Python that a package was being defined. The file could
contain initialization code or even be empty, but it had to be present.

Starting with Python 3.3, Implicit Namespace Packages were introduced. These allow for the
creation of a package without any __init__.py file. Of course, it can still be present if package
initialization is needed. But it is no longer required.
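
As a quick illustration, the following hypothetical layout (assuming its parent directory is on the module search path) imports cleanly on Python 3.3+ even though no __init__.py exists anywhere:

ns_pkg/
    mod_a.py

>>> import ns_pkg.mod_a
>>> ns_pkg.mod_a
<module 'ns_pkg.mod_a' from '...\\ns_pkg\\mod_a.py'>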

Importing * From a Package
For the purposes of the following discussion, the previously defined package is expanded to contain some additional modules:

pkg/
    mod1.py
    mod2.py
    mod3.py
    mod4.py

There are now four modules defined in the pkg directory. Their contents are as shown below:

mod1.py

def foo():
    print('[mod1] foo()')

class Foo:
    pass

mod2.py

def bar():
    print('[mod2] bar()')

class Bar:
    pass

mod3.py

def baz():
    print('[mod3] baz()')

class Baz:
    pass

mod4.py

def qux():
    print('[mod4] qux()')

class Qux:
    pass

(Imaginative, aren’t they?)
You have already seen that when import * is used for a module, all objects from the module
are imported into the local symbol table, except those whose names begin with an underscore,
as always:

>>>
>>> dir()
['__annotations__', '__builtins__', '__doc__', '__loader__', '__name__',
'__package__', '__spec__']

>>> from pkg.mod3 import *

>>> dir()
['Baz', '__annotations__', '__builtins__', '__doc__', '__loader__', '__name__',
'__package__', '__spec__', 'baz']
>>> baz()
[mod3] baz()
>>> Baz
<class 'pkg.mod3.Baz'>
The analogous statement for a package is this:

from <package_name> import *


What does that do?

>>>
>>> dir()
['__annotations__', '__builtins__', '__doc__', '__loader__', '__name__',
'__package__', '__spec__']

>>> from pkg import *
>>> dir()
['__annotations__', '__builtins__', '__doc__', '__loader__', '__name__',
'__package__', '__spec__']
Hmph. Not much. You might have expected (assuming you had any expectations at all) that
Python would dive down into the package directory, find all the modules it could, and import
them all. But as you can see, by default that is not what happens.

Instead, Python follows this convention: if the __init__.py file in the package directory contains a list named __all__, it is taken to be a list of modules that should be imported when the statement from <package_name> import * is encountered.

For the present example, suppose you create an __init__.py in the pkg directory like this:

pkg/__init__.py
__all__ = [
    'mod1',
    'mod2',
    'mod3',
    'mod4'
]

Now from pkg import * imports all four modules:

>>>
>>> dir()
['__annotations__', '__builtins__', '__doc__', '__loader__', '__name__',
'__package__', '__spec__']

>>> from pkg import *
>>> dir()
['__annotations__', '__builtins__', '__doc__', '__loader__', '__name__',
'__package__', '__spec__', 'mod1', 'mod2', 'mod3', 'mod4']
>>> mod2.bar()
[mod2] bar()
>>> mod4.Qux
<class 'pkg.mod4.Qux'>
Using import * still isn’t considered terrific form, any more for packages than for modules.
But this facility at least gives the creator of the package some control over what happens
when import * is specified. (In fact, it provides the capability to disallow it entirely, simply by
declining to define __all__ at all. As you have seen, the default behavior for packages is to
import nothing.)

By the way, __all__ can be defined in a module as well and serves the same purpose: to
control what is imported with import *. For example, modify mod1.py as follows:

pkg/mod1.py

__all__ = ['foo']

def foo():
    print('[mod1] foo()')

class Foo:
    pass

Now an import * statement from pkg.mod1 will only import what is contained in __all__:

>>>
>>> dir()
['__annotations__', '__builtins__', '__doc__', '__loader__', '__name__',
'__package__', '__spec__']

>>> from pkg.mod1 import *
>>> dir()
['__annotations__', '__builtins__', '__doc__', '__loader__', '__name__',
'__package__', '__spec__', 'foo']

>>> foo()
[mod1] foo()
>>> Foo
Traceback (most recent call last):
  File "<pyshell#37>", line 1, in <module>
    Foo
NameError: name 'Foo' is not defined
foo() (the function) is now defined in the local namespace, but Foo (the class) is not, because
the latter is not in __all__.

In summary, __all__ is used by both packages and modules to control what is imported when import * is specified. But the default behavior differs:

 For a package, when __all__ is not defined, import * does not import anything.
 For a module, when __all__ is not defined, import * imports everything (except—you guessed it—names starting with an underscore).

Subpackages
Packages can contain nested subpackages to arbitrary depth. For example, let’s make one more modification to the example package directory as follows:

pkg/
    sub_pkg1/
        mod1.py
        mod2.py
    sub_pkg2/
        mod3.py
        mod4.py

The four modules (mod1.py, mod2.py, mod3.py and mod4.py) are defined as previously. But now, instead of being lumped together into the pkg directory, they are split out into two subpackage directories, sub_pkg1 and sub_pkg2.

Importing still works the same as shown previously. Syntax is similar, but additional dot
notation is used to separate package name from subpackage name:

>>>
>>> import pkg.sub_pkg1.mod1
>>> pkg.sub_pkg1.mod1.foo()
[mod1] foo()

>>> from pkg.sub_pkg1 import mod2
>>> mod2.bar()
[mod2] bar()

>>> from pkg.sub_pkg2.mod3 import baz
>>> baz()
[mod3] baz()

>>> from pkg.sub_pkg2.mod4 import qux as grault
>>> grault()
[mod4] qux()
In addition, a module in one subpackage can reference objects in a sibling subpackage (in
the event that the sibling contains some functionality that you need). For example, suppose
you want to import and execute function foo() (defined in module mod1) from within
module mod3. You can either use an absolute import:

pkg/sub_pkg2/mod3.py

def baz():
    print('[mod3] baz()')

class Baz:
    pass

from pkg.sub_pkg1.mod1 import foo
foo()
 
>>>
>>> from pkg.sub_pkg2 import mod3
[mod1] foo()
>>> mod3.foo()
[mod1] foo()
Or you can use a relative import, where .. refers to the package one level up. From
within mod3.py, which is in subpackage sub_pkg2,

 .. evaluates to the parent package (pkg), and
 ..sub_pkg1 evaluates to subpackage sub_pkg1 of the parent package.

pkg/sub_pkg2/mod3.py

def baz():
    print('[mod3] baz()')

class Baz:
    pass

from .. import sub_pkg1
print(sub_pkg1)

from ..sub_pkg1.mod1 import foo
foo()
 
>>>
>>> from pkg.sub_pkg2 import mod3
<module 'pkg.sub_pkg1' (namespace)>
[mod1] foo()

Conclusion
In this tutorial, you covered the following topics:

 How to create a Python module
 Locations where the Python interpreter searches for a module
 How to obtain access to the objects defined in a module with the import statement
 How to create a module that is executable as a standalone script
 How to organize modules into packages and subpackages
 How to control package initialization

Object Oriented Programming


Object-oriented programming (OOP) is a method of structuring a program by bundling
related properties and behaviors into individual objects. In this tutorial, you’ll learn the basics
of object-oriented programming in Python.
Conceptually, objects are like the components of a system. Think of a program as a factory
assembly line of sorts. At each step of the assembly line a system component processes some
material, ultimately transforming raw material into a finished product.

An object contains data, like the raw or pre-processed materials at each step on an assembly
line, and behaviour, like the action each assembly line component performs.

What Is Object-Oriented Programming in Python?


Object-oriented programming is a programming paradigm that provides a means of
structuring programs so that properties and behaviors are bundled into individual objects.

For instance, an object could represent a person with properties like a name, age, and address
and behaviors such as walking, talking, breathing, and running. Or it could represent an email
with properties like a recipient list, subject, and body and behaviors like adding attachments
and sending.

Put another way, object-oriented programming is an approach for modeling concrete, real-
world things, like cars, as well as relations between things, like companies and employees,
students and teachers, and so on. OOP models real-world entities as software objects that
have some data associated with them and can perform certain functions.

Another common programming paradigm is procedural programming, which structures a program like a recipe in that it provides a set of steps, in the form of functions and code blocks, that flow sequentially in order to complete a task.

The key takeaway is that objects are at the center of object-oriented programming in Python,
not only representing the data, as in procedural programming, but in the overall structure of
the program as well.

Define a Class in Python


Primitive data structures—like numbers, strings, and lists—are designed to represent simple
pieces of information, such as the cost of an apple, the name of a poem, or your favourite
colours, respectively. What if you want to represent something more complex?

For example, let’s say you want to track employees in an organization. You need to store
some basic information about each employee, such as their name, age, position, and the year
they started working.

One way to do this is to represent each employee as a list:

kirk = ["James Kirk", 34, "Captain", 2265]


spock = ["Spock", 35, "Science Officer", 2254]
mccoy = ["Leonard McCoy", "Chief Medical Officer", 2266]
There are a number of issues with this approach.

First, it can make larger code files more difficult to manage. If you reference kirk[0] several
lines away from where the kirk list is declared, will you remember that the element with
index 0 is the employee’s name?
Second, it can introduce errors if not every employee has the same number of elements in the
list. In the mccoy list above, the age is missing, so mccoy[1] will return "Chief Medical
Officer" instead of Dr. McCoy’s age.

A great way to make this type of code more manageable and more maintainable is to use
classes.

Classes vs Instances
Classes are used to create user-defined data structures. Classes define functions called
methods, which identify the behaviors and actions that an object created from the class can
perform with its data.

In this tutorial, you’ll create a Dog class that stores some information about the
characteristics and behaviours that an individual dog can have.

A class is a blueprint for how something should be defined. It doesn’t actually contain any
data. The Dog class specifies that a name and an age are necessary for defining a dog, but it
doesn’t contain the name or age of any specific dog.

While the class is the blueprint, an instance is an object that is built from a class and contains
real data. An instance of the Dog class is not a blueprint anymore. It’s an actual dog with a
name, like Miles, who’s four years old.

Put another way, a class is like a form or questionnaire. An instance is like a form that has
been filled out with information. Just like many people can fill out the same form with their
own unique information, many instances can be created from a single class.

How to Define a Class


All class definitions start with the class keyword, which is followed by the name of the class
and a colon. Any code that is indented below the class definition is considered part of the
class’s body.

Here’s an example of a Dog class:

class Dog:
    pass

The body of the Dog class consists of a single statement: the pass keyword. pass is often used
as a placeholder indicating where code will eventually go. It allows you to run this code
without Python throwing an error.

Note: Python class names are written in CapitalizedWords notation by convention. For
example, a class for a specific breed of dog like the Jack Russell Terrier would be written as
JackRussellTerrier.

The Dog class isn’t very interesting right now, so let’s spruce it up a bit by defining some
properties that all Dog objects should have. There are a number of properties that we can
choose from, including name, age, coat color, and breed. To keep things simple, we’ll just
use name and age.
The properties that all Dog objects must have are defined in a method called .__init__().
Every time a new Dog object is created, .__init__() sets the initial state of the object by
assigning the values of the object’s properties. That is, .__init__() initializes each new
instance of the class.

You can give .__init__() any number of parameters, but the first parameter will always be a
variable called self. When a new class instance is created, the instance is automatically passed
to the self parameter in .__init__() so that new attributes can be defined on the object.

Let’s update the Dog class with an .__init__() method that creates .name and .age attributes:

class Dog:
    def __init__(self, name, age):
        self.name = name
        self.age = age

Notice that the .__init__() method’s signature is indented four spaces. The body of the
method is indented by eight spaces. This indentation is vitally important. It tells Python that
the .__init__() method belongs to the Dog class.

In the body of .__init__(), there are two statements using the self variable:

self.name = name creates an attribute called name and assigns to it the value of the name
parameter.

self.age = age creates an attribute called age and assigns to it the value of the age parameter.
Attributes created in .__init__() are called instance attributes. An instance attribute’s value is
specific to a particular instance of the class. All Dog objects have a name and an age, but the
values for the name and age attributes will vary depending on the Dog instance.

On the other hand, class attributes are attributes that have the same value for all class
instances. You can define a class attribute by assigning a value to a variable name outside
of .__init__().

For example, the following Dog class has a class attribute called species with the value
"Canis familiaris":

class Dog:
    # Class attribute
    species = "Canis familiaris"

    def __init__(self, name, age):
        self.name = name
        self.age = age

Class attributes are defined directly beneath the first line of the class name and are indented
by four spaces. They must always be assigned an initial value. When an instance of the class
is created, class attributes are automatically created and assigned to their initial values.
Use class attributes to define properties that should have the same value for every class
instance. Use instance attributes for properties that vary from one instance to another.

Now that we have a Dog class, let’s create some dogs!

Instantiate an Object in Python


Open IDLE’s interactive window and type the following:

>>> class Dog:
...     pass
...
This creates a new Dog class with no attributes or methods.

Creating a new object from a class is called instantiating an object. You can instantiate a new
Dog object by typing the name of the class, followed by opening and closing parentheses:

>>> Dog()
<__main__.Dog object at 0x106702d30>
You now have a new Dog object at 0x106702d30. This funny-looking string of letters and
numbers is a memory address that indicates where the Dog object is stored in your
computer’s memory. Note that the address you see on your screen will be different.
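
As an aside, in CPython this address is the same value the built-in id() returns for the object. This is an implementation detail shown purely for illustration, and the numbers on your machine will differ:

>>> d = Dog()
>>> hex(id(d))
'0x106702d30'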

Now instantiate a second Dog object:

>>> Dog()
<__main__.Dog object at 0x0004ccc90>
The new Dog instance is located at a different memory address. That’s because it’s an
entirely new instance and is completely unique from the first Dog object that you instantiated.

To see this another way, type the following:

>>> a = Dog()
>>> b = Dog()
>>> a == b
False

In this code, you create two new Dog objects and assign them to the variables a and b. When
you compare a and b using the == operator, the result is False. Even though a and b are both
instances of the Dog class, they represent two distinct objects in memory.

Class and Instance Attributes


Now create a new Dog class with a class attribute called .species and two instance attributes
called .name and .age:

>>> class Dog:
...     species = "Canis familiaris"
...     def __init__(self, name, age):
...         self.name = name
...         self.age = age
...
To instantiate objects of this Dog class, you need to provide values for the name and age. If
you don’t, then Python raises a TypeError:
>>> Dog()
Traceback (most recent call last):
  File "<pyshell#6>", line 1, in <module>
    Dog()
TypeError: __init__() missing 2 required positional arguments: 'name' and 'age'

To pass arguments to the name and age parameters, put values into the parentheses after the
class name:

>>> buddy = Dog("Buddy", 9)


>>> miles = Dog("Miles", 4)

This creates two new Dog instances—one for a nine-year-old dog named Buddy and one for
a four-year-old dog named Miles.

The Dog class’s .__init__() method has three parameters, so why are only two arguments
passed to it in the example?

When you instantiate a Dog object, Python creates a new instance and passes it to the first
parameter of .__init__(). This essentially removes the self parameter, so you only need to
worry about the name and age parameters.

After you create the Dog instances, you can access their instance attributes using dot
notation:

>>> buddy.name
'Buddy'
>>> buddy.age
9

>>> miles.name
'Miles'
>>> miles.age
4
You can access class attributes the same way:

>>> buddy.species
'Canis familiaris'

One of the biggest advantages of using classes to organize data is that instances are
guaranteed to have the attributes you expect. All Dog instances have .species, .name, and .age
attributes, so you can use those attributes with confidence knowing that they will always
return a value.

Although the attributes are guaranteed to exist, their values can be changed dynamically:

>>> buddy.age = 10
>>> buddy.age
10
>>> miles.species = "Felis silvestris"
>>> miles.species
'Felis silvestris'

In this example, you change the .age attribute of the buddy object to 10. Then you change
the .species attribute of the miles object to "Felis silvestris", which is a species of cat. That
makes Miles a pretty strange dog, but it is valid Python!

The key takeaway here is that custom objects are mutable by default. An object is mutable if
it can be altered dynamically. For example, lists and dictionaries are mutable, but strings and
tuples are immutable.
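
A quick interactive comparison makes the difference concrete:

>>> nums = [1, 2, 3]
>>> nums[0] = 99     # lists are mutable
>>> nums
[99, 2, 3]
>>> name = "Miles"
>>> name[0] = "N"    # strings are not
Traceback (most recent call last):
  ...
TypeError: 'str' object does not support item assignment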

Instance Methods
Instance methods are functions that are defined inside a class and can only be called from an
instance of that class. Just like .__init__(), an instance method’s first parameter is always self.

Open a new editor window in IDLE and type in the following Dog class:

class Dog:
    species = "Canis familiaris"

    def __init__(self, name, age):
        self.name = name
        self.age = age

    # Instance method
    def description(self):
        return f"{self.name} is {self.age} years old"

    # Another instance method
    def speak(self, sound):
        return f"{self.name} says {sound}"

This Dog class has two instance methods:

.description() returns a string displaying the name and age of the dog.
.speak() has one parameter called sound and returns a string containing the dog’s name and
the sound the dog makes.
Save the modified Dog class to a file called dog.py and press F5 to run the program. Then
open the interactive window and type the following to see your instance methods in action:

>>> miles = Dog("Miles", 4)

>>> miles.description()
'Miles is 4 years old'

>>> miles.speak("Woof Woof")


'Miles says Woof Woof'

>>> miles.speak("Bow Wow")


'Miles says Bow Wow'
In the above Dog class, .description() returns a string containing information about the Dog
instance miles. When writing your own classes, it’s a good idea to have a method that returns
a string containing useful information about an instance of the class. However, .description()
isn’t the most Pythonic way of doing this.

When you create a list object, you can use print() to display a string that looks like the list:

>>> names = ["Fletcher", "David", "Dan"]


>>> print(names)
['Fletcher', 'David', 'Dan']
Let’s see what happens when you print() the miles object:

>>> print(miles)
<__main__.Dog object at 0x00aeff70>

When you print(miles), you get a cryptic looking message telling you that miles is a Dog
object at the memory address 0x00aeff70. This message isn’t very helpful. You can change
what gets printed by defining a special instance method called .__str__().

In the editor window, change the name of the Dog class’s .description() method to .__str__():

class Dog:
    # Leave other parts of Dog class as-is

    # Replace .description() with __str__()
    def __str__(self):
        return f"{self.name} is {self.age} years old"

Save the file and press F5. Now, when you print(miles), you get a much friendlier output:

>>> miles = Dog("Miles", 4)


>>> print(miles)
'Miles is 4 years old'

Methods like .__init__() and .__str__() are called dunder methods because they begin and
end with double underscores. There are many dunder methods that you can use to customize
classes in Python. Although too advanced a topic for a beginning Python book, understanding
dunder methods is an important part of mastering object-oriented programming in Python.
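
As a small taste (this sketch is not part of the tutorial's dog.py), defining the dunder method .__eq__() changes what the == operator means for a class. Recall that two Dog instances compared as unequal earlier; with .__eq__() defined, dogs carrying the same data compare equal:

class Dog:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    # Compare by value instead of by memory address
    def __eq__(self, other):
        if not isinstance(other, Dog):
            return NotImplemented
        return (self.name, self.age) == (other.name, other.age)

>>> Dog("Miles", 4) == Dog("Miles", 4)
True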

In the next section, you’ll see how to take your knowledge one step further and create classes
from other classes.

Inherit From Other Classes in Python


Inheritance is the process by which one class takes on the attributes and methods of another.
Newly formed classes are called child classes, and the classes that child classes are derived
from are called parent classes.

Child classes can override or extend the attributes and methods of parent classes. In other
words, child classes inherit all of the parent’s attributes and methods but can also specify
attributes and methods that are unique to themselves.
Although the analogy isn’t perfect, you can think of object inheritance sort of like genetic
inheritance.

You may have inherited your hair color from your mother. It’s an attribute you were born
with. Let’s say you decide to color your hair purple. Assuming your mother doesn’t have
purple hair, you’ve just overridden the hair color attribute that you inherited from your mom.

You also inherit, in a sense, your language from your parents. If your parents speak English,
then you’ll also speak English. Now imagine you decide to learn a second language, like
German. In this case you’ve extended your attributes because you’ve added an attribute that
your parents don’t have.

Dog Park Example


Pretend for a moment that you’re at a dog park. There are many dogs of different breeds at
the park, all engaging in various dog behaviors.

Suppose now that you want to model the dog park with Python classes. The Dog class that
you wrote in the previous section can distinguish dogs by name and age but not by breed.

You could modify the Dog class in the editor window by adding a .breed attribute:

class Dog:
    species = "Canis familiaris"

    def __init__(self, name, age, breed):
        self.name = name
        self.age = age
        self.breed = breed

The instance methods defined earlier are omitted here because they aren’t important for this
discussion.

Press F5 to save the file. Now you can model the dog park by instantiating a bunch of
different dogs in the interactive window:

>>> miles = Dog("Miles", 4, "Jack Russell Terrier")


>>> buddy = Dog("Buddy", 9, "Dachshund")
>>> jack = Dog("Jack", 3, "Bulldog")
>>> jim = Dog("Jim", 5, "Bulldog")
Each breed of dog has slightly different behaviors. For example, bulldogs have a low bark
that sounds like woof, but dachshunds have a higher-pitched bark that sounds more like yap.

Using just the Dog class, you must supply a string for the sound argument of .speak() every
time you call it on a Dog instance:

>>> buddy.speak("Yap")
'Buddy says Yap'

>>> jim.speak("Woof")
'Jim says Woof'
>>> jack.speak("Woof")
'Jack says Woof'
Passing a string to every call to .speak() is repetitive and inconvenient. Moreover, the string
representing the sound that each Dog instance makes should be determined by its .breed
attribute, but here you have to manually pass the correct string to .speak() every time it’s
called.

You can simplify the experience of working with the Dog class by creating a child class for
each breed of dog. This allows you to extend the functionality that each child class inherits,
including specifying a default argument for .speak().

Parent Classes vs Child Classes


Let’s create a child class for each of the three breeds mentioned above: Jack Russell Terrier,
Dachshund, and Bulldog.

For reference, here’s the full definition of the Dog class:

class Dog:
    species = "Canis familiaris"

    def __init__(self, name, age):
        self.name = name
        self.age = age

    def __str__(self):
        return f"{self.name} is {self.age} years old"

    def speak(self, sound):
        return f"{self.name} says {sound}"

Remember, to create a child class, you create a new class with its own name and then put the
name of the parent class in parentheses. Add the following to the dog.py file to create three
new child classes of the Dog class:

class JackRussellTerrier(Dog):
    pass

class Dachshund(Dog):
    pass

class Bulldog(Dog):
    pass

Press F5 to save and run the file. With the child classes defined, you can now instantiate
some dogs of specific breeds in the interactive window:

>>> miles = JackRussellTerrier("Miles", 4)


>>> buddy = Dachshund("Buddy", 9)
>>> jack = Bulldog("Jack", 3)
>>> jim = Bulldog("Jim", 5)
Instances of child classes inherit all of the attributes and methods of the parent class:

>>> miles.species
'Canis familiaris'

>>> buddy.name
'Buddy'

>>> print(jack)
Jack is 3 years old

>>> jim.speak("Woof")
'Jim says Woof'

To determine which class a given object belongs to, you can use the built-in type():

>>> type(miles)
<class '__main__.JackRussellTerrier'>

What if you want to determine if miles is also an instance of the Dog class? You can do this
with the built-in isinstance():

>>> isinstance(miles, Dog)
True

Notice that isinstance() takes two arguments, an object and a class. In the example above,
isinstance() checks if miles is an instance of the Dog class and returns True.

The miles, buddy, jack, and jim objects are all Dog instances, but miles is not a Bulldog
instance, and jack is not a Dachshund instance:

>>> isinstance(miles, Bulldog)
False

>>> isinstance(jack, Dachshund)
False

More generally, all objects created from a child class are instances of the parent class,
although they may not be instances of other child classes.

Now that you’ve created child classes for some different breeds of dogs, let’s give each breed
its own sound.

Extend the Functionality of a Parent Class


Since different breeds of dogs have slightly different barks, you want to provide a default
value for the sound argument of their respective .speak() methods. To do this, you need to
override .speak() in the class definition for each breed.

To override a method defined on the parent class, you define a method with the same name
on the child class. Here’s what that looks like for the JackRussellTerrier class:
class JackRussellTerrier(Dog):
    def speak(self, sound="Arf"):
        return f"{self.name} says {sound}"

Now .speak() is defined on the JackRussellTerrier class with the default argument for sound
set to "Arf".

Update dog.py with the new JackRussellTerrier class and press F5 to save and run the file.
You can now call .speak() on a JackRussellTerrier instance without passing an argument to
sound:

>>> miles = JackRussellTerrier("Miles", 4)


>>> miles.speak()
'Miles says Arf'
Sometimes dogs make different barks, so if Miles gets angry and growls, you can still
call .speak() with a different sound:

>>> miles.speak("Grrr")
'Miles says Grrr'
One thing to keep in mind about class inheritance is that changes to the parent class
automatically propagate to child classes. This occurs as long as the attribute or method being
changed isn’t overridden in the child class.

For example, in the editor window, change the string returned by .speak() in the Dog class:

class Dog:
    # Leave other attributes and methods as they are

    # Change the string returned by .speak()
    def speak(self, sound):
        return f"{self.name} barks: {sound}"

Save the file and press F5. Now, when you create a new Bulldog instance named jim,
jim.speak() returns the new string:

>>> jim = Bulldog("Jim", 5)


>>> jim.speak("Woof")
'Jim barks: Woof'

However, calling .speak() on a JackRussellTerrier instance won’t show the new style of
output:

>>> miles = JackRussellTerrier("Miles", 4)


>>> miles.speak()
'Miles says Arf'

Sometimes it makes sense to completely override a method from a parent class. But in this
instance, we don’t want the JackRussellTerrier class to lose any changes that might be made
to the formatting of the output string of Dog.speak().
To do this, you still need to define a .speak() method on the child JackRussellTerrier class.
But instead of explicitly defining the output string, you need to call the Dog class’s .speak()
inside of the child class’s .speak() using the same arguments that you passed to
JackRussellTerrier.speak().

You can access the parent class from inside a method of a child class by using super():

class JackRussellTerrier(Dog):
    def speak(self, sound="Arf"):
        return super().speak(sound)

When you call super().speak(sound) inside JackRussellTerrier, Python searches the parent
class, Dog, for a .speak() method and calls it with the variable sound.

Update dog.py with the new JackRussellTerrier class. Save the file and press F5 so you can
test it in the interactive window:

>>> miles = JackRussellTerrier("Miles", 4)


>>> miles.speak()
'Miles barks: Arf'
Now when you call miles.speak(), you’ll see output reflecting the new formatting in the Dog
class.

Note: In the above examples, the class hierarchy is very straightforward. The
JackRussellTerrier class has a single parent class, Dog. In real-world examples, the class
hierarchy can get quite complicated.

super() does much more than just search the parent class for a method or an attribute. It
traverses the entire class hierarchy for a matching method or attribute. If you aren’t careful,
super() can have surprising results.
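
You can inspect that hierarchy yourself through the class's .__mro__ attribute (the method resolution order), which lists the classes super() walks through:

>>> JackRussellTerrier.__mro__
(<class '__main__.JackRussellTerrier'>, <class '__main__.Dog'>, <class 'object'>)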
UNIT – IV

Exception and Error Handling in Python


Error handling increases the robustness of your code, which guards against potential failures
that would cause your program to exit in an uncontrolled fashion.

Introduction
Before we get into why exception handling is essential and types of built-in exceptions that
Python supports, it is necessary to understand that there is a subtle difference between
an error and an exception.

Errors cannot be handled, while Python exceptions can be handled at run time. An error can be a syntax (parsing) error, while there are many types of exceptions that can occur during execution and are not unconditionally fatal. An Error might indicate critical problems that a reasonable application should not try to catch, while an Exception might indicate conditions that an application should try to catch. Errors are a form of unchecked exception and are irrecoverable, like an OutOfMemoryError, which a programmer should not try to handle.

Exception handling makes your code more robust and helps prevent potential failures that would cause your program to stop in an uncontrolled manner. Imagine that you have written code which is deployed in production and it still terminates due to an exception; your client would not appreciate that, so it's better to handle the particular exception beforehand and avoid the chaos.

Errors can be of various types:

 Syntax Error
 Out of Memory Error

 Recursion Error

 Exceptions

Let's see them one by one.

Syntax Error
Syntax errors, often called parsing errors, are predominantly caused when the parser detects a syntactic issue in your code.

Let's take an example to understand it.


a = 8
b = 10
c = a b

  File "<ipython-input-8-3b3ffcedf995>", line 3
    c = a b
          ^
SyntaxError: invalid syntax

The above arrow indicates where the parser ran into an error while parsing the code. The
token preceding the arrow causes the failure. To rectify such fundamental errors, Python will
do most of your job since it will print for you the file name and the line number at which the
error occurred.

Out of Memory Error


Memory errors are mostly dependent on your system's RAM and are related to the heap. If you have large objects or many referenced objects in memory, then you will see an OutOfMemoryError. It can be caused due to various reasons:

 Using a 32-bit Python Architecture (Maximum Memory Allocation given is very low,
between 2GB - 4GB).
 Loading a very large data file

 Running a Machine Learning/Deep Learning model and many more.

You can handle the memory error with the help of exception handling; there is a fallback exception for when the interpreter entirely runs out of memory and must immediately stop the current execution. In these rare instances, Python raises a MemoryError, allowing the script to catch itself, break out of the error, and recover.

However, since Python adopts the memory management architecture of the C language (the malloc() function), it is not certain that all processes of the script will recover; in some cases, a MemoryError will result in an unrecoverable crash. Hence, it is neither good practice nor advisable to use exception handling for such an error.
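
That said, here is a minimal sketch of catching a MemoryError anyway (the allocation size is arbitrary and chosen only because it should exceed available memory on most machines):

try:
    data = [0] * (10 ** 12)   # deliberately enormous allocation
except MemoryError:
    print('Allocation failed; falling back to a smaller size')
    data = [0] * (10 ** 6)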

Recursion Error
It is related to the stack and occurs when you call functions. As the name suggests, a recursion error transpires when too many nested method calls are executed (for example, with infinite recursion), exceeding the limit imposed by the size of the stack.

All your local variables and method-call data will be placed on the stack. For each method call, one stack frame will be created, and local as well as method-call data will be placed inside that stack frame. Once the method execution is completed, the stack frame will be removed.

To reproduce this error, let's define a function recursion() that keeps calling itself in an infinite loop. You will see a RecursionError (a stack overflow) because a stack frame is populated with method data for every call but is never freed.
def recursion():
    return recursion()

recursion()
---------------------------------------------------------------------------

RecursionError Traceback (most recent call last)


<ipython-input-3-c6e0f7eb0cde> in <module>
----> 1 recursion()

<ipython-input-2-5395140f7f05> in recursion()
1 def recursion():
----> 2 return recursion()

... last 1 frames repeated, from the frame below ...

<ipython-input-2-5395140f7f05> in recursion()
1 def recursion():
----> 2 return recursion()

RecursionError: maximum recursion depth exceeded
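You can catch a RecursionError like any other exception. The sketch below also reads the
interpreter's frame limit via sys.getrecursionlimit() (1000 by default in CPython):

import sys

def recursion():
    return recursion()

try:
    recursion()
except RecursionError:
    # sys.getrecursionlimit() reports the current maximum recursion depth
    print("Caught RecursionError at depth limit:", sys.getrecursionlimit())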

Indentation Error
An indentation error is similar in spirit to a syntax error and falls under it; however, it
is specific to indentation-related issues in the script.

So let's take a quick example to understand an indentation error.


for i in range(10):
print('Hello world')
File "<ipython-input-6-628f419d2da8>", line 2
print('Hello world')
^
IndentationError: expected an indented block

Exceptions
Even if the syntax of a statement or expression is correct, it may still cause an error when
executed. Python exceptions are errors that are detected during execution and are not
unconditionally fatal: you will soon learn how to handle them in Python programs. An
exception object is created when a Python script raises an exception. If the script doesn't
explicitly handle the exception, the program is forced to terminate abruptly.

Programs that do not handle exceptions end with error messages, as shown here:

Type Error
a = 2
b = 'DataCamp'
a + b
---------------------------------------------------------------------------

TypeError Traceback (most recent call last)

<ipython-input-7-86a706a0ffdf> in <module>
      1 a = 2
      2 b = 'DataCamp'
----> 3 a + b

TypeError: unsupported operand type(s) for +: 'int' and 'str'

Zero Division Error


100 / 0
---------------------------------------------------------------------------

ZeroDivisionError Traceback (most recent call last)

<ipython-input-43-e9e866a10e2a> in <module>
----> 1 100 / 0

ZeroDivisionError: division by zero

There are various types of Python exceptions, and the type is printed as part of the message:
the types in the above two examples are TypeError and ZeroDivisionError. The string printed
as the exception type is the name of the corresponding built-in exception.

The remaining part of the error line provides the details of what caused the error, based on
the type of exception.

Let's now look at Python's built-in exceptions.

Built-in Exceptions

Before you start learning the built-in exceptions, let's quickly revise the four main
components of exception handling:

 Try: Runs the code block in which you expect an error to occur.

 Except: Defines the type of exception you expect from the try block (built-in or
custom).

 Else: If there isn't any exception, this block of code is executed; consider it the path
taken when the code you expected to fail actually succeeds.

 Finally: Irrespective of whether there is an exception or not, this block of code is
always executed. The skeleton below shows all four components together.
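Here is a minimal skeleton of all four components in one place (the division is just a
stand-in for code that might fail):

try:
    result = 10 / 2              # code that might raise an exception
except ZeroDivisionError:
    print("Division by zero!")   # runs only if that exception occurs
else:
    print("Result:", result)     # runs only when no exception occurred
finally:
    print("Done.")               # always runs, exception or not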
In the following sections of the tutorial, you will learn about the common types of
exceptions and how to handle them with the help of exception handling.

Keyboard Interrupt Error


The KeyboardInterrupt exception is raised when you try to stop a running program by pressing
Ctrl+C in a command line or by interrupting the kernel in a Jupyter Notebook. Sometimes you
might not intend to interrupt a program, but it happens by mistake, in which case exception
handling can help you avoid such issues.

In the below example, if you run the cell and interrupt the kernel, the program will raise a
KeyboardInterrupt exception:

inp = input()

Let's now handle the KeyboardInterrupt exception.
try:
    inp = input()
    print('Press Ctrl+C or Interrupt the Kernel:')
except KeyboardInterrupt:
    print('Caught KeyboardInterrupt')
else:
    print('No exception occurred')
Caught KeyboardInterrupt

Standard Error
Let's learn about some of the standard errors that commonly occur while programming.

Arithmetic Error

 Zero Division Error
 OverFlow Error
 Floating Point Error

All of the above exceptions fall under the ArithmeticError base class and are raised for
errors in arithmetic operations.

Zero Division
When the divisor (the second argument of the division), or the denominator, is zero, a
ZeroDivisionError is raised.

try:
    a = 100 / 0
    print(a)
except ZeroDivisionError:
    print("Zero Division Exception Raised.")
else:
    print("Success, no error!")
Zero Division Exception Raised.

OverFlow Error
An OverflowError is raised when the result of an arithmetic operation is out of range. In
Python 3, plain integers have arbitrary precision, so this error is mostly seen in
floating-point operations, such as math.exp below.

try:
    import math
    print(math.exp(1000))
except OverflowError:
    print("OverFlow Exception Raised.")
else:
    print("Success, no error!")
OverFlow Exception Raised.

Assertion Error
When an assert statement fails, an AssertionError is raised.

Let's take an example to understand the assertion error. Say you have two variables, a and b,
which you need to compare. To check whether a and b are equal, you apply the assert keyword
before the comparison, which raises an AssertionError when the expression evaluates to false.

try:
    a = 100
    b = "DataCamp"
    assert a == b
except AssertionError:
    print("Assertion Exception Raised.")
else:
    print("Success, no error!")
Assertion Exception Raised.

Attribute Error
When a non-existent attribute is referenced, or an attribute reference or assignment fails,
an AttributeError is raised.

In the below example, you can observe that the Attributes class object has no attribute with
the name attribute.

class Attributes(object):
    a = 2
    print(a)

try:
    obj = Attributes()
    print(obj.attribute)
except AttributeError:
    print("Attribute Exception Raised.")
2
Attribute Exception Raised.

Import Error
An ImportError is raised when you try to import a module that does not exist (or cannot be
loaded) on the module search path, or when you make a typo in the module's name. Since
Python 3.6, a missing module raises ModuleNotFoundError, a subclass of ImportError, as shown
below.
import nibabel
---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)

<ipython-input-6-9e567e3ae964> in <module>
----> 1 import nibabel

ModuleNotFoundError: No module named 'nibabel'
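As a minimal sketch (assuming, as above, that nibabel is not installed in your environment),
the failure can be handled like any other exception:

try:
    import nibabel  # assumed to be absent from this environment
except ImportError:
    print("Import Exception Raised.")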

Lookup Error
LookupError acts as a base class for the exceptions that occur when a key or index used on a
mapping (such as a dictionary) or a sequence (such as a list) is invalid or does not exist.

The two types of exceptions raised are:

 IndexError
 KeyError

Key Error
If a key you are trying to access is not found in the dictionary, a key error exception is raised.
try:
    a = {1: 'a', 2: 'b', 3: 'c'}
    print(a[4])
except LookupError:
    print("Key Error Exception Raised.")
else:
    print("Success, no error!")
Key Error Exception Raised.

Index Error
When you try to access an index of a sequence (such as a list) that does not exist or is out
of range, an IndexError is raised.

try:
    a = ['a', 'b', 'c']
    print(a[4])
except LookupError:
    print("Index Error Exception Raised, list index out of range")
else:
    print("Success, no error!")
Index Error Exception Raised, list index out of range
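Both examples above catch the base class LookupError; you can equally well catch the specific
subclasses, as in this short sketch:

a = {1: 'a', 2: 'b', 3: 'c'}
try:
    print(a[4])
except KeyError:
    print("KeyError caught")    # the specific subclass of LookupError

b = ['a', 'b', 'c']
try:
    print(b[4])
except IndexError:
    print("IndexError caught")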

Memory Error
As discussed earlier, a MemoryError is raised when an operation cannot get enough memory to
proceed.

Name Error
A NameError is raised when a local or global name is not found.

In the below example, the ans variable is not defined, so you get a name error.

try:
    print(ans)
except NameError:
    print("NameError: name 'ans' is not defined")
else:
    print("Success, no error!")
NameError: name 'ans' is not defined

Runtime Error

Not Implemented Error

RuntimeError acts as a base class for NotImplementedError. Abstract methods in a user-defined
base class should raise NotImplementedError; derived classes are expected to override the
method and provide an implementation.

class BaseClass(object):
    """Defines the interface"""
    def __init__(self):
        super(BaseClass, self).__init__()
    def do_something(self):
        """The interface, not implemented"""
        raise NotImplementedError(self.__class__.__name__ + '.do_something')

class SubClass(BaseClass):
    """Implements the interface"""
    def do_something(self):
        """Really does something"""
        print(self.__class__.__name__ + ' doing something!')

SubClass().do_something()
BaseClass().do_something()
SubClass doing something!

---------------------------------------------------------------------------

NotImplementedError Traceback (most recent call last)

<ipython-input-1-57792b6bc7e4> in <module>
14
15 SubClass().do_something()
---> 16 BaseClass().do_something()

<ipython-input-1-57792b6bc7e4> in do_something(self)
5 def do_something(self):
6 """The interface, not implemented"""
----> 7 raise NotImplementedError(self.__class__.__name__ + '.do_something')
8
9 class SubClass(BaseClass):

NotImplementedError: BaseClass.do_something

Type Error
A TypeError exception is raised when two different or unrelated types of operands or objects
are combined.

In the below example, an integer and a string are added, which results in a type error.

try:
    a = 5
    b = "DataCamp"
    c = a + b
except TypeError:
    print('TypeError Exception Raised')
else:
    print('Success, no error!')
TypeError Exception Raised

Value Error
A ValueError is raised when a built-in operation or function receives an argument that has
the correct type but an invalid value.

In the below example, the built-in function float receives an argument that is a sequence of
characters, which is an invalid value for type float.

try:
    print(float('DataCamp'))
except ValueError:
    print('ValueError: could not convert string to float: \'DataCamp\'')
else:
    print('Success, no error!')
ValueError: could not convert string to float: 'DataCamp'

Python Custom Exceptions

As studied in the previous section, Python has many built-in exceptions that you can use in
your program. Still, sometimes you may need to create custom exceptions with custom messages
to serve your purpose.

You can achieve this by creating a new class derived from the pre-defined Exception class in
Python.

class UnAcceptedValueError(Exception):
    def __init__(self, data):
        self.data = data
    def __str__(self):
        return repr(self.data)

Total_Marks = int(input("Enter Total Marks Scored: "))
try:
    Num_of_Sections = int(input("Enter Num of Sections: "))
    if Num_of_Sections < 1:
        raise UnAcceptedValueError("Number of Sections can't be less than 1")
except UnAcceptedValueError as e:
    print("Received error:", e.data)
Enter Total Marks Scored: 10
Enter Num of Sections: 0
Received error: Number of Sections can't be less than 1

In the above example, you can observe that if you enter anything less than 1, the custom
exception is raised and handled. Many standard modules define their own exceptions to report
errors that may occur in the functions they define.

Demerits of Python Exception Handling

Making use of Python exception handling has a side effect as well: programs that use
try-except blocks to handle exceptions will run slightly slower, and the size of your code
will increase.

Below is an example where the timeit module of Python is used to check the execution time of
two different statements. In stmt1, try-except is used to handle ZeroDivisionError, while in
stmt2, an if statement is used as a normal check condition. Then, you execute these
statements 10000 times with the variable a = 0. The point to note here is that the execution
times of the two statements differ. You will find that stmt1, which handles the exception,
takes slightly longer than stmt2, which just checks the value and does nothing if the
condition is not met.

Hence, you should limit the use of Python exception handling to rare cases only. For example,
when you are not sure whether the input will be an integer or a float for arithmetic
calculations, or when you are not sure about the existence of a file while trying to open it.

import timeit

setup = "a=0"
stmt1 = '''\
try:
    b = 10 / a
except ZeroDivisionError:
    pass'''
stmt2 = '''\
if a != 0:
    b = 10 / a'''
print("time=", timeit.timeit(stmt1, setup, number=10000))
print("time=", timeit.timeit(stmt2, setup, number=10000))
time= 0.003897680000136461
time= 0.0002797570000439009

PYTHON FUNCTIONS - MAP, FILTER, AND FUNCTOOLS.REDUCE

map, filter, and reduce

Python provides several functions that enable a functional approach to programming. These
functions are all convenience features, in that they could be written in Python fairly
easily.

Functional programming is all about expressions; we may say that functional programming is
expression-oriented programming. The expression-oriented functions Python provides are:
1. map(aFunction, aSequence)
2. filter(aFunction, aSequence)
3. reduce(aFunction, aSequence)
4. lambda
5. list comprehension

map
One of the common things we do with lists and other sequences is applying an operation to
each item and collecting the results.
For example, updating all the items in a list can be done easily with a for loop:

>>> items = [1, 2, 3, 4, 5]
>>> squared = []
>>> for x in items:
...     squared.append(x ** 2)
...
>>> squared
[1, 4, 9, 16, 25]
>>>

Since this is such a common operation, Python has a built-in feature that does most of the
work for us. The map(aFunction, aSequence) function applies a passed-in function to each item
in an iterable object. In Python 3, map returns an iterator over the results, so we wrap the
call in list() to collect them:

>>> items = [1, 2, 3, 4, 5]
>>> def sqr(x): return x ** 2

>>> list(map(sqr, items))
[1, 4, 9, 16, 25]
>>>

We passed in a user-defined function applied to each item in the list. map calls sqr on each
list item and collects all the return values into a new list.

Because map expects a function to be passed in, it also happens to be one of the places
where lambda routinely appears:

>>> list(map((lambda x: x ** 2), items))
[1, 4, 9, 16, 25]
>>>

In the short example above, the lambda function squares each item in the items list.

As shown earlier, map is defined like this:

map(aFunction, aSequence)

While we still use lambda as aFunction, we can have a list of functions as aSequence:

def square(x):
    return x ** 2

def cube(x):
    return x ** 3

funcs = [square, cube]

for r in range(5):
    value = list(map(lambda x: x(r), funcs))
    print(value)
Output:

[0, 0]
[1, 1]
[4, 8]
[9, 27]
[16, 64]

Because using map is equivalent to a for loop, with a little extra code we can always write a
general mapping utility:

>>> def mymap(aFunc, aSeq):
...     result = []
...     for x in aSeq:
...         result.append(aFunc(x))
...     return result

>>> list(map(sqr, [1, 2, 3]))
[1, 4, 9]
>>> mymap(sqr, [1, 2, 3])
[1, 4, 9]
>>>

Since it's a built-in, map is always available and always works the same way. It also has a
performance benefit, because it is usually faster than a manually coded for loop. On top of
that, map can be used in more advanced ways. For example, given multiple sequence arguments,
it sends items taken from the sequences in parallel as distinct arguments to the function:

>>> pow(3,5)
243
>>> pow(2,10)
1024
>>> pow(3,11)
177147
>>> pow(4,12)
16777216
>>>
>>> list(map(pow, [2, 3, 4], [10, 11, 12]))
[1024, 177147, 16777216]
>>>

As in the example above, with multiple sequences, map() expects an N-argument function for N
sequences. In the example, the pow function takes two arguments on each call.
Here is another example of map() doing element-wise addition with two lists:

x = [1, 2, 3]
y = [4, 5, 6]

from operator import add

print(list(map(add, x, y)))  # output: [5, 7, 9]

The map call is similar to a list comprehension expression. But map applies a function call
to each item instead of an arbitrary expression, so it is a somewhat less general tool. In
some cases, however, map may be faster to run than a list comprehension, such as when mapping
a built-in function, and it requires less coding.
In Python 2, if the function is None, the identity function is assumed; if there are multiple
arguments, map() returns a list consisting of tuples containing the corresponding items from
all iterables (a kind of transpose operation). The iterable arguments may be a sequence or
any iterable object, and the result is always a list:

>>> m = [1,2,3]
>>> n = [1,4,9]
>>> new_tuple = map(None, m, n)
>>> new_tuple
[(1, 1), (2, 4), (3, 9)]

In Python 3, map(None, ...) is no longer supported; we may want to use itertools.zip_longest
instead:

>>> m = [1, 2, 3]
>>> n = [1, 4, 9]
>>> from itertools import zip_longest
>>> for i, j in zip_longest(m, n):
...     print(i, j)
...
1 1
2 4
3 9

The zip_longest() call makes an iterator that aggregates elements from the two iterables
(m and n).
We can also do typecasting using map. In the following example, we construct a 4x3 matrix
from user input:

arr = []
for _ in range(4):
    arr.append(list(map(int, input().rstrip().split())))
print(arr)

With an input from the user:

1 2 3
4 5 6
7 8 9
10 11 12

We get a 4x3 integer array:

[[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]

filter and reduce

As the name suggests, filter extracts each element in the sequence for which the function
returns True. The reduce function is a little less obvious in its intent: it reduces a list
to a single value by combining elements via a supplied function. The map function is the
simplest one among the Python built-ins used for functional programming.

These tools apply functions to sequences and other iterables: filter selects items based on a
test function, while reduce applies a function to pairs of an item and a running result.

Because they return iterables, range and filter both require list calls to display all their
results in Python 3.

As an example, the following filter call picks out items in a sequence that are less than zero:

>>> list(range(-5,5))
[-5, -4, -3, -2, -1, 0, 1, 2, 3, 4]
>>>
>>> list( filter((lambda x: x < 0), range(-5,5)))
[-5, -4, -3, -2, -1]
>>>

Items in the sequence or iterable for which the function returns true are added to the
result list. Like map, this function is roughly equivalent to a for loop, but it is built-in
and fast:

>>>
>>> result = []
>>> for x in range(-5, 5):
...     if x < 0:
...         result.append(x)

>>> result
[-5, -4, -3, -2, -1]
>>>

Here is another use case for filter(): finding the intersection of two lists:

a = [1, 2, 3, 5, 7, 9]
b = [2, 3, 5, 6, 7, 8]
print(list(filter(lambda x: x in a, b)))  # prints [2, 3, 5, 7]

Note that we can do the same with a list comprehension:

a = [1, 2, 3, 5, 7, 9]
b = [2, 3, 5, 6, 7, 8]
print([x for x in a if x in b])  # prints [2, 3, 5, 7]

In Python 3, reduce lives in the functools module. It is more complex: it accepts an iterable
to process, but it is not an iterable itself, and it returns a single result:

>>>
>>> from functools import reduce
>>> reduce( (lambda x, y: x * y), [1, 2, 3, 4] )
24
>>> reduce( (lambda x, y: x / y), [1, 2, 3, 4] )
0.041666666666666664
>>>

At each step, reduce passes the current product or division, along with the next
item from the list, to the passed-in lambda function. By default, the first item in
the sequence initializes the starting value.
Here's the for loop version of the first of these calls, with the multiplication
hardcoded inside the loop:

>>> L = [1, 2, 3, 4]
>>> result = L[0]
>>> for x in L[1:]:
...     result = result * x

>>> result
24
>>>

Let's make our own version of reduce.

>>> def myreduce(fnc, seq):
...     tally = seq[0]
...     for next in seq[1:]:
...         tally = fnc(tally, next)
...     return tally

>>> myreduce((lambda x, y: x * y), [1, 2, 3, 4])
24
>>> myreduce((lambda x, y: x / y), [1, 2, 3, 4])
0.041666666666666664
>>>

We can concatenate a list of strings to make a sentence, using Dijkstra's famous quote on
bugs:

>>> import functools
>>> L = ['Testing ', 'shows ', 'the ', 'presence', ', ', 'not ', 'the ', 'absence ', 'of ', 'bugs']
>>> functools.reduce((lambda x, y: x + y), L)
'Testing shows the presence, not the absence of bugs'
>>>

We can get the same result by using join:

>>> ''.join(L)
'Testing shows the presence, not the absence of bugs'

We can also use the operator module to produce the same result:

>>> import functools, operator
>>> functools.reduce(operator.add, L)
'Testing shows the presence, not the absence of bugs'
>>>

The built-in reduce also allows an optional third argument, placed before the items in the
sequence, which serves as the initial value and as the default result when the sequence is
empty.
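A quick sketch of the initializer: with an empty sequence the initializer itself is returned,
otherwise it seeds the running result:

>>> from functools import reduce
>>> reduce((lambda x, y: x + y), [], 100)
100
>>> reduce((lambda x, y: x + y), [1, 2, 3, 4], 100)
110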

UNIT – V
Getting Started with Python Multithreading
Let us start by creating a Python module named download.py. This file will contain all the
functions necessary to fetch the list of images and download them. We will split these
functionalities into three separate functions:

get_links
download_link
setup_download_dir
The third function, setup_download_dir, will be used to create a download destination
directory if it doesn't already exist.
Imgur's API requires HTTP requests to bear the Authorization header with the client ID. You
can find this client ID in the dashboard of the application that you have registered on
Imgur. The responses are JSON encoded, and we can use Python's standard json library to
decode them. Downloading the image is an even simpler task, as all you have to do is fetch
the image by its URL and write it to a file.

This is what the script looks like:

import json
import logging
import os
from pathlib import Path
from urllib.request import urlopen, Request

logger = logging.getLogger(__name__)

types = {'image/jpeg', 'image/png'}


def get_links(client_id):
    headers = {'Authorization': 'Client-ID {}'.format(client_id)}
    req = Request('https://api.imgur.com/3/gallery/random/random/', headers=headers, method='GET')
    with urlopen(req) as resp:
        data = json.loads(resp.read().decode('utf-8'))
    return [item['link'] for item in data['data'] if 'type' in item and item['type'] in types]


def download_link(directory, link):
    download_path = directory / os.path.basename(link)
    with urlopen(link) as image, download_path.open('wb') as f:
        f.write(image.read())
    logger.info('Downloaded %s', link)


def setup_download_dir():
    download_dir = Path('images')
    if not download_dir.exists():
        download_dir.mkdir()
    return download_dir
Next, we will need to write a module that uses these functions to download the images one by
one. We will name it single.py. It will contain the main function of our first, naive version
of the Imgur image downloader. The module will retrieve the Imgur client ID from the
environment variable IMGUR_CLIENT_ID. It will invoke setup_download_dir to create the
download destination directory. Finally, it will fetch a list of images using the get_links
function, filter out all GIF and album URLs, and then use download_link to download and save
each of those images to disk. Here is what single.py looks like:

import logging
import os
from time import time

from download import setup_download_dir, get_links, download_link

logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)


def main():
    ts = time()
    client_id = os.getenv('IMGUR_CLIENT_ID')
    if not client_id:
        raise Exception("Couldn't find IMGUR_CLIENT_ID environment variable!")
    download_dir = setup_download_dir()
    links = get_links(client_id)
    for link in links:
        download_link(download_dir, link)
    logging.info('Took %s seconds', time() - ts)


if __name__ == '__main__':
    main()

Concurrency and Parallelism in Python: Threading Example


Threading is one of the most well-known approaches to attaining Python concurrency and
parallelism. Threading is a feature usually provided by the operating system. Threads are
lighter than processes, and share the same memory space.

Python multithreading memory model


In this Python threading example, we will write a new module to replace single.py. This
module will create a pool of eight threads, making a total of nine threads including the main
thread. I chose eight worker threads because my computer has eight CPU cores and one
worker thread per core seemed a good number for how many threads to run at once. In
practice, this number is chosen much more carefully based on other factors, such as other
applications and services running on the same machine.

This is almost the same as the previous one, with the exception that we now have a new class,
DownloadWorker, which is a descendant of the Python Thread class. The run method has been
overridden to run an infinite loop. On every iteration, it calls self.queue.get() to fetch a
URL from a thread-safe queue. It blocks until there is an item in the queue for the worker to
process. Once the worker receives an item from the queue, it calls the same download_link
method that was used in the previous script to download the image to the images directory.
After the download is finished, the worker signals the queue that the task is done. This is
very important, because the Queue keeps track of how many tasks were enqueued. The call to
queue.join() would block the main thread forever if the workers did not signal that they
completed a task.

import logging
import os
from queue import Queue
from threading import Thread
from time import time

from download import setup_download_dir, get_links, download_link

logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')

logger = logging.getLogger(__name__)


class DownloadWorker(Thread):

    def __init__(self, queue):
        Thread.__init__(self)
        self.queue = queue

    def run(self):
        while True:
            # Get the work from the queue and expand the tuple
            directory, link = self.queue.get()
            try:
                download_link(directory, link)
            finally:
                self.queue.task_done()


def main():
    ts = time()
    client_id = os.getenv('IMGUR_CLIENT_ID')
    if not client_id:
        raise Exception("Couldn't find IMGUR_CLIENT_ID environment variable!")
    download_dir = setup_download_dir()
    links = get_links(client_id)
    # Create a queue to communicate with the worker threads
    queue = Queue()
    # Create 8 worker threads
    for x in range(8):
        worker = DownloadWorker(queue)
        # Setting daemon to True will let the main thread exit even though the workers are blocking
        worker.daemon = True
        worker.start()
    # Put the tasks into the queue as a tuple
    for link in links:
        logger.info('Queueing {}'.format(link))
        queue.put((download_dir, link))
    # Causes the main thread to wait for the queue to finish processing all the tasks
    queue.join()
    logging.info('Took %s', time() - ts)

if __name__ == '__main__':
    main()
Running this Python threading example script on the same machine used earlier results in a
download time of 4.1 seconds! That’s 4.7 times faster than the previous example. While this
is much faster, it is worth mentioning that only one thread was executing at a time throughout
this process due to the GIL. Therefore, this code is concurrent but not parallel. The reason it
is still faster is that this is an IO-bound task. The processor is hardly breaking a sweat
while downloading these images, and the majority of the time is spent waiting for the
network. This is why Python multithreading can provide a large speed increase. The
processor can switch between the threads whenever one of them is ready to do some work.
Using the threading module in Python or any other interpreted language with a GIL can
actually result in reduced performance. If your code is performing a CPU bound task, such as
decompressing gzip files, using the threading module will result in a slower execution time.
For CPU bound tasks and truly parallel execution, we can use the multiprocessing module.

While the de facto reference Python implementation, CPython, has a GIL, this is not true of
all Python implementations. For example, IronPython, a Python implementation using the .NET
framework, does not have a GIL, and neither does Jython, the Java-based implementation.



Concurrency and Parallelism in Python Example 2: Spawning Multiple Processes
The multiprocessing module is easier to drop in than the threading module, as we don't need
to add a class like in the Python threading example. The only changes we need to make are in
the main function.

Python multiprocessing tutorial: Modules


To use multiple processes, we create a multiprocessing Pool. With the map method it provides,
we will pass the list of URLs to the pool, which in turn will spawn new worker processes
(four in the code below) and use each one to download the images in parallel. This is true
parallelism, but it comes with a cost. The entire memory of the script is copied into each
subprocess that is spawned. In this simple example, it isn't a big deal, but it can easily
become serious overhead for non-trivial programs.

import logging
import os
from functools import partial
from multiprocessing.pool import Pool
from time import time

from download import setup_download_dir, get_links, download_link

logging.basicConfig(level=logging.DEBUG, format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')
logging.getLogger('requests').setLevel(logging.CRITICAL)
logger = logging.getLogger(__name__)


def main():
    ts = time()
    client_id = os.getenv('IMGUR_CLIENT_ID')
    if not client_id:
        raise Exception("Couldn't find IMGUR_CLIENT_ID environment variable!")
    download_dir = setup_download_dir()
    links = get_links(client_id)
    download = partial(download_link, download_dir)
    with Pool(4) as p:
        p.map(download, links)
    logging.info('Took %s seconds', time() - ts)

if __name__ == '__main__':
    main()
Concurrency and Parallelism in Python Example 3: Distributing to Multiple Workers
While the threading and multiprocessing modules are great for scripts that are running on
your personal computer, what should you do if you want the work to be done on a different
machine, or you need to scale up to more than the CPU on one machine can handle? A great
use case for this is long-running back-end tasks for web applications. If you have some long-
running tasks, you don’t want to spin up a bunch of sub-processes or threads on the same
machine that need to be running the rest of your application code. This will degrade the
performance of your application for all of your users. What would be great is to be able to run
these jobs on another machine, or many other machines.

A great Python library for this task is RQ, a very simple yet powerful library. You first
enqueue a function and its arguments using the library. This pickles the function call
representation, which is then appended to a Redis list. Enqueueing the job is the first step,
but it will not do anything yet. We also need at least one worker to listen on that job
queue.
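As a sketch under assumptions (a Redis server running locally, the rq package installed, and
reusing download_link from download.py; the image URL is a placeholder), enqueueing a job
might look like this:

from redis import Redis
from rq import Queue

from download import download_link, setup_download_dir

# Connect to the local Redis instance and create a job queue on top of it
q = Queue(connection=Redis(host='localhost', port=6379))
download_dir = setup_download_dir()
# enqueue pickles the function call and appends it to a Redis list;
# a separate `rq worker` process must be running to execute the job
q.enqueue(download_link, download_dir, 'http://example.com/image.jpg')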

Python Multithreading vs. Multiprocessing

If your code is IO bound, both multiprocessing and multithreading in Python will work for
you. Multiprocessing is easier to drop in than threading but has a higher memory overhead. If
your code is CPU bound, multiprocessing is most likely going to be the better choice,
especially if the target machine has multiple cores or CPUs. For web applications, and when
you need to scale the work across multiple machines, RQ is going to be better for you.


Update
Python concurrent.futures
Something new since Python 3.2 that wasn’t touched upon in the original article is the
concurrent.futures package. This package provides yet another way to use concurrency and
parallelism with Python.

In the original article, I mentioned that Python’s multiprocessing module would be easier to
drop into existing code than the threading module. This was because the Python 3 threading
module required subclassing the Thread class and also creating a Queue for the threads to
monitor for work.

Using a concurrent.futures.ThreadPoolExecutor makes the Python threading example code almost
identical to the multiprocessing module.

import logging
import os
from concurrent.futures import ThreadPoolExecutor
from functools import partial
from time import time

from download import setup_download_dir, get_links, download_link

logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')

logger = logging.getLogger(__name__)


def main():
    client_id = os.getenv('IMGUR_CLIENT_ID')
    if not client_id:
        raise Exception("Couldn't find IMGUR_CLIENT_ID environment variable!")
    download_dir = setup_download_dir()
    links = get_links(client_id)

    # By placing the executor inside a with block, the executor's shutdown
    # method will be called, cleaning up threads.
    #
    # By default, the executor sets the number of workers to 5 times the
    # number of CPUs.
    with ThreadPoolExecutor() as executor:

        # Create a new partially applied function that stores the directory
        # argument.
        #
        # This allows the download_link function that normally takes two
        # arguments to work with the map function that expects a function of a
        # single argument.
        fn = partial(download_link, download_dir)

        # Executes fn concurrently using threads on the links iterable. The
        # timeout is for the entire process, not a single call, so downloading
        # all images must complete within 30 seconds.
        executor.map(fn, links, timeout=30)

if __name__ == '__main__':
    main()
Now that we have all these images downloaded with our Python ThreadPoolExecutor, we can use
them to test a CPU-bound task. We can create thumbnail versions of all the images first in a
single-threaded, single-process script, and then test a multiprocessing-based solution.

We are going to use the Pillow library to handle the resizing of the images.

Here is our initial script.

import logging
from pathlib import Path
from time import time

from PIL import Image

logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')

logger = logging.getLogger(__name__)


def create_thumbnail(size, path):
    """
    Creates a thumbnail of an image with the same name as image but with
    _thumbnail appended before the extension. E.g.:

    >>> create_thumbnail((128, 128), 'image.jpg')

    A new thumbnail image is created with the name image_thumbnail.jpg

    :param size: A tuple of the width and height of the image
    :param path: The path to the image file
    :return: None
    """
    image = Image.open(path)
    image.thumbnail(size)
    path = Path(path)
    name = path.stem + '_thumbnail' + path.suffix
    thumbnail_path = path.with_name(name)
    image.save(thumbnail_path)


def main():
    ts = time()
    for image_path in Path('images').iterdir():
        create_thumbnail((128, 128), image_path)
    logging.info('Took %s', time() - ts)

if __name__ == '__main__':
    main()
This script iterates over the paths in the images folder and for each path it runs the
create_thumbnail function. This function uses Pillow to open the image, create a thumbnail,
and save the new, smaller image with the same name as the original but with _thumbnail
appended to the name.

Running this script on 160 images totaling 36 million takes 2.32 seconds. Let's see if we can
speed this up using a ProcessPoolExecutor.

import logging
from pathlib import Path
from time import time
from functools import partial

from concurrent.futures import ProcessPoolExecutor

from PIL import Image

logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')

logger = logging.getLogger(__name__)


def create_thumbnail(size, path):
    """
    Creates a thumbnail of an image with the same name as image but with
    _thumbnail appended before the extension. E.g.:

    >>> create_thumbnail((128, 128), 'image.jpg')

    A new thumbnail image is created with the name image_thumbnail.jpg

    :param size: A tuple of the width and height of the image
    :param path: The path to the image file
    :return: None
    """
    path = Path(path)
    name = path.stem + '_thumbnail' + path.suffix
    thumbnail_path = path.with_name(name)
    image = Image.open(path)
    image.thumbnail(size)
    image.save(thumbnail_path)


def main():
    ts = time()
    # Partially apply the create_thumbnail method, setting the size to 128x128
    # and returning a function of a single argument.
    thumbnail_128 = partial(create_thumbnail, (128, 128))
    # Create the executor in a with block so shutdown is called when the block
    # is exited.
    with ProcessPoolExecutor() as executor:
        executor.map(thumbnail_128, Path('images').iterdir())
    logging.info('Took %s', time() - ts)

if __name__ == '__main__':
    main()
The create_thumbnail method is identical to the last script. The main difference is the
creation of a ProcessPoolExecutor. The executor’s map method is used to create the
thumbnails in parallel. By default, the ProcessPoolExecutor creates one subprocess per CPU.
Running this script on the same 160 images took 1.05 seconds—2.2 times faster!

Async/Await (Python 3.5+ only)


One of the most requested items in the comments on the original article was for an example
using Python 3's asyncio module. Compared to the other examples, this involves some syntax
and concepts that may be new to most people. An unfortunate additional layer of complexity is
caused by Python's built-in urllib module not being asynchronous. We will need to use an
async HTTP library to get the full benefits of asyncio. For this, we'll use aiohttp.

Let’s jump right into the code and a more detailed explanation will follow.

import asyncio
import logging
import os
from time import time

import aiohttp

from download import setup_download_dir, get_links

logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)


async def async_download_link(session, directory, link):
    """
    Async version of the download_link method we've been using in the other examples.
    :param session: aiohttp ClientSession
    :param directory: directory to save downloads
    :param link: the url of the link to download
    :return:
    """
    download_path = directory / os.path.basename(link)
    async with session.get(link) as response:
        with download_path.open('wb') as f:
            while True:
                # await pauses execution until the 1024 (or fewer) bytes are read from the stream
                chunk = await response.content.read(1024)
                if not chunk:
                    # We are done reading the file, break out of the while loop
                    break
                f.write(chunk)
    logger.info('Downloaded %s', link)


# Main is now a coroutine
async def main():
    client_id = os.getenv('IMGUR_CLIENT_ID')
    if not client_id:
        raise Exception("Couldn't find IMGUR_CLIENT_ID environment variable!")
    download_dir = setup_download_dir()
    # We use a session to take advantage of tcp keep-alive
    # Set a 3 second read and connect timeout. Default is 5 minutes
    async with aiohttp.ClientSession(conn_timeout=3, read_timeout=3) as session:
        tasks = [async_download_link(session, download_dir, l) for l in get_links(client_id)]
        # gather aggregates all the tasks and schedules them in the event loop
        await asyncio.gather(*tasks, return_exceptions=True)

if __name__ == '__main__':
    ts = time()
    # Create the asyncio event loop
    loop = asyncio.get_event_loop()
    try:
        loop.run_until_complete(main())
    finally:
        # Shutdown the loop even if there is an exception
        loop.close()
    logger.info('Took %s seconds to complete', time() - ts)
There is quite a bit to unpack here. Let's start with the main entry point of the program.
The first new thing we do with the asyncio module is to obtain the event loop. The event loop
handles all of the asynchronous code. Then, the loop is run until complete and passed the
main function. There is a piece of new syntax in the definition of main: async def. You'll
also notice await and async with.

The async/await syntax was introduced in PEP 492. The async def syntax marks a function as a
coroutine. Internally, coroutines are based on Python generators, but aren't exactly the same
thing. Calling a coroutine function returns a coroutine object, similar to how calling a
generator function returns a generator object. Once you have a coroutine, you obtain its
result with the await expression. When a coroutine calls await, execution of the coroutine is
suspended until the awaitable completes. This suspension allows other work to be completed
while the coroutine is suspended "awaiting" some result. In general, this result will be some
kind of I/O, like a database request or, in our case, an HTTP request.
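As a stripped-down sketch of these mechanics (independent of the downloader, with
asyncio.sleep standing in for real I/O):

import asyncio

async def add(a, b):
    await asyncio.sleep(0.1)  # simulate I/O; suspends this coroutine
    return a + b

async def demo():
    result = await add(2, 3)  # demo() is suspended until add() completes
    print(result)             # prints 5

loop = asyncio.get_event_loop()
loop.run_until_complete(demo())
loop.close()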

The download_link function had to be changed pretty significantly. Previously, we were
relying on urllib to do the brunt of the work of reading the image for us. Now, to allow our
method to work properly with the async programming paradigm, we've introduced a while loop
that reads chunks of the image at a time and suspends execution while waiting for the I/O to
complete. This allows the event loop to loop through downloading the different images, as
each one has new data available during the download.

There Should Be One—Preferably Only One—Obvious Way to Do It


While the Zen of Python tells us there should be one obvious way to do something, there are
many ways in Python to introduce concurrency into our programs. The best method to choose is
going to depend on your specific use case. The asynchronous paradigm scales better to
high-concurrency workloads (like a webserver) compared to threading or multiprocessing, but
it requires your code (and dependencies) to be async in order to fully benefit.

Hopefully the Python threading examples in this article—and update—will point you in the
right direction so you have an idea of where to look in the Python standard library if you need
to introduce concurrency into your programs.
