A module’s contents are accessed the same way in all three cases: with the import statement.
Here, the focus will mostly be on modules that are written in Python. The cool thing about
modules written in Python is that they are exceedingly straightforward to build. All you need
to do is create a file that contains legitimate Python code and then give the file a name with
a .py extension. That’s it! No special syntax or voodoo is necessary.
For example, suppose you have created a file called mod.py containing the following:
mod.py

s = 'If Comrade Napoleon says it, it must be right.'
a = [100, 200, 300]

def foo(arg):
    print(f'arg = {arg}')

class Foo:
    pass
Several objects are defined in mod.py:
s (a string)
a (a list)
foo() (a function)
Foo (a class)
Assuming mod.py is in an appropriate location, which you will learn more about shortly, these
objects can be accessed by importing the module as follows:
>>>
>>> import mod
>>> print(mod.s)
If Comrade Napoleon says it, it must be right.
>>> mod.a
[100, 200, 300]
>>> mod.foo(['quux', 'corge', 'grault'])
arg = ['quux', 'corge', 'grault']
>>> x = mod.Foo()
>>> x
<mod.Foo object at 0x03C181F0>
The Module Search Path
When the interpreter executes the import statement above, it searches for mod.py in a list of directories assembled from the following sources:
The directory from which the input script was run, or the current directory if the interpreter is being run interactively
The list of directories contained in the PYTHONPATH environment variable, if it is
set. (The format for PYTHONPATH is OS-dependent but should mimic
the PATH environment variable.)
An installation-dependent list of directories configured at the time Python is installed
The resulting search path is accessible in the Python variable sys.path, which is obtained from
a module named sys:
>>>
>>> import sys
>>> sys.path
['', 'C:\\Users\\john\\Documents\\Python\\doc', 'C:\\Python36\\Lib\\idlelib',
'C:\\Python36\\python36.zip', 'C:\\Python36\\DLLs', 'C:\\Python36\\lib',
'C:\\Python36', 'C:\\Python36\\lib\\site-packages']
Thus, to ensure your module is found, you need to do one of the following:
Put mod.py in the directory where the input script is located, or in the current directory if you are running interactively
Modify the PYTHONPATH environment variable to contain the directory where mod.py is located before starting the interpreter, or put mod.py in one of the directories already listed in PYTHONPATH
Put mod.py in one of the installation-dependent directories, which you may or may not have write access to, depending on the OS
There is actually one additional option: you can put the module file in any directory of your
choice and then modify sys.path at run-time so that it contains that directory. For example, in
this case, you could put mod.py in directory C:\Users\john and then issue the following
statements:
>>>
>>> sys.path.append(r'C:\Users\john')
>>> sys.path
['', 'C:\\Users\\john\\Documents\\Python\\doc', 'C:\\Python36\\Lib\\idlelib',
'C:\\Python36\\python36.zip', 'C:\\Python36\\DLLs', 'C:\\Python36\\lib',
'C:\\Python36', 'C:\\Python36\\lib\\site-packages', 'C:\\Users\\john']
>>> import mod
Once a module has been imported, you can determine the location where it was found with
the module’s __file__ attribute:
>>>
>>> import mod
>>> mod.__file__
'C:\\Users\\john\\mod.py'
>>> import re
>>> re.__file__
'C:\\Python36\\lib\\re.py'
The import Statement
Module contents are made available to the caller with the import statement.
The import statement takes many different forms, shown below.
import <module_name>
The simplest form is the one already shown above:
import <module_name>
Note that this does not make the module contents directly accessible to the caller. Each
module has its own private symbol table, which serves as the global symbol table for all
objects defined in the module. Thus, a module creates a separate namespace, as already
noted.
From the caller, objects in the module are only accessible when prefixed
with <module_name> via dot notation, as illustrated below.
After the following import statement, mod is placed into the local symbol table. Thus, mod has
meaning in the caller’s local context:
>>>
>>> import mod
>>> mod
<module 'mod' from 'C:\\Users\\john\\Documents\\Python\\doc\\mod.py'>
But s and foo remain in the module’s private symbol table and are not meaningful in the local
context:
>>>
>>> s
NameError: name 's' is not defined
>>> foo('quux')
NameError: name 'foo' is not defined
To be accessed in the local context, names of objects defined in the module must be prefixed
by mod:
>>>
>>> mod.s
'If Comrade Napoleon says it, it must be right.'
>>> mod.foo('quux')
arg = quux
An alternate form of the import statement allows individual objects from the module to be imported directly into the caller's symbol table, and several comma-separated names may be given at once:
>>>
>>> from mod import s, foo
>>> s
'If Comrade Napoleon says it, it must be right.'
>>> foo('quux')
arg = quux
Because this form of import places object names directly into the caller's symbol table, any object that already exists with the same name will be overwritten. For example, suppose a list named a has already been defined:
>>>
>>> a = ['foo', 'bar', 'baz']
>>> a
['foo', 'bar', 'baz']
It is even possible to indiscriminately import everything from a module in one fell swoop with from mod import *, which overwrites the existing a:
>>>
>>> from mod import *
>>> s
'If Comrade Napoleon says it, it must be right.'
>>> a
[100, 200, 300]
>>> foo
<function foo at 0x03B449C0>
>>> Foo
<class 'mod.Foo'>
This isn’t necessarily recommended in large-scale production code. It’s a bit dangerous
because you are entering names into the local symbol table en masse. Unless you know them
all well and can be confident there won’t be a conflict, you have a decent chance of
overwriting an existing name inadvertently. However, this syntax is quite handy when you
are just mucking around with the interactive interpreter, for testing or discovery purposes,
because it quickly gives you access to everything a module has to offer without a lot of
typing.
You can also import individual objects under alternate names to avoid clobbering an existing name:
>>>
>>> s = 'foo'
>>> from mod import s as string
>>> string
'If Comrade Napoleon says it, it must be right.'
>>> s
'foo'
Module contents can be imported from within a function definition. In that case,
the import does not occur until the function is called:
>>>
>>> def bar():
... from mod import foo
... foo('corge')
...
>>> bar()
arg = corge
However, Python 3 does not allow the indiscriminate import * syntax from within a function:
>>>
>>> def bar():
... from mod import *
...
SyntaxError: import * only allowed at module level
Lastly, a try statement with an except ImportError clause can be used to guard against
unsuccessful import attempts:
>>>
>>> try:
...     # Non-existent module
...     import baz
... except ImportError:
...     print('Module not found')
...
Module not found
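Building on this pattern, a common use is falling back to an alternative when an optional module is missing. In the sketch below, the name fast_json is a hypothetical optional dependency; json is the standard-library fallback:

```python
try:
    import fast_json as json  # hypothetical optional dependency
except ImportError:
    import json  # fall back to the standard library

# The rest of the code can use the same name either way.
print(json.dumps({'status': 'ok'}))
```

Because the except ImportError clause binds the fallback module to the same name, downstream code does not need to know which module was actually imported.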
The dir() Function
The built-in function dir() returns a list of defined names in a namespace. Without arguments,
it produces an alphabetically sorted list of names in the current local symbol table:
>>>
>>> dir()
['__annotations__', '__builtins__', '__doc__', '__loader__', '__name__',
'__package__', '__spec__']
>>> qux = [1, 2, 3, 4, 5]
>>> dir()
['__annotations__', '__builtins__', '__doc__', '__loader__', '__name__',
'__package__', '__spec__', 'qux']
Note how the first call to dir() above lists several names that are automatically defined and already in the namespace when the interpreter starts. As new names are defined (such as qux), they appear on subsequent invocations of dir().
This can be useful for identifying what exactly has been added to the namespace by an import
statement:
>>>
>>> dir()
['__annotations__', '__builtins__', '__doc__', '__loader__', '__name__',
'__package__', '__spec__']
>>> import mod
>>> dir()
['__annotations__', '__builtins__', '__doc__', '__loader__', '__name__',
'__package__', '__spec__', 'mod']
When given an argument that is the name of a module, dir() lists the names defined in the
module:
>>>
>>> import mod
>>> dir(mod)
['Foo', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__',
'__name__', '__package__', '__spec__', 'a', 'foo', 's']
>>>
>>> dir()
['__annotations__', '__builtins__', '__doc__', '__loader__', '__name__',
'__package__', '__spec__']
>>> from mod import *
>>> dir()
['Foo', '__annotations__', '__builtins__', '__doc__', '__loader__', '__name__',
'__package__', '__spec__', 'a', 'foo', 's']
Executing a Module as a Script
Any .py file that contains a module is essentially also a Python script. Recall the example module from earlier:
mod.py

s = 'If Comrade Napoleon says it, it must be right.'
a = [100, 200, 300]

def foo(arg):
    print(f'arg = {arg}')

class Foo:
    pass
This can be run as a script:
C:\Users\john\Documents>python mod.py
C:\Users\john\Documents>
There are no errors, so it apparently worked. Granted, it’s not very interesting. As it is
written, it only defines objects. It doesn’t do anything with them, and it doesn’t generate any
output.
Let’s modify the above Python module so it does generate some output when run as a script:
mod.py

s = 'If Comrade Napoleon says it, it must be right.'
a = [100, 200, 300]

def foo(arg):
    print(f'arg = {arg}')

class Foo:
    pass

print(s)
print(a)
foo('quux')
x = Foo()
print(x)
Now it should be a little more interesting:
C:\Users\john\Documents>python mod.py
If Comrade Napoleon says it, it must be right.
[100, 200, 300]
arg = quux
<__main__.Foo object at 0x02F101D0>
Unfortunately, now it also generates output when imported as a module:
>>>
>>> import mod
If Comrade Napoleon says it, it must be right.
[100, 200, 300]
arg = quux
<mod.Foo object at 0x0169AD50>
This is probably not what you want. It isn’t usual for a module to generate output when it is
imported.
Wouldn’t it be nice if you could distinguish between when the file is loaded as a module and
when it is run as a standalone script?
You can, with the __name__ dunder variable: when the file is run as a standalone script, __name__ is set to the string '__main__'; when it is imported as a module, __name__ is set to the module's name instead.
mod.py

s = 'If Comrade Napoleon says it, it must be right.'
a = [100, 200, 300]

def foo(arg):
    print(f'arg = {arg}')

class Foo:
    pass

if __name__ == '__main__':
    print('Executing as standalone script')
    print(s)
    print(a)
    foo('quux')
    x = Foo()
    print(x)
Now, if you run as a script, you get output:
C:\Users\john\Documents>python mod.py
Executing as standalone script
If Comrade Napoleon says it, it must be right.
[100, 200, 300]
arg = quux
<__main__.Foo object at 0x03450690>
But if you import as a module, you don’t:
>>>
>>> import mod
>>> mod.foo('grault')
arg = grault
Modules are often designed with the capability to run as a standalone script for purposes of
testing the functionality that is contained within the module. This is referred to as unit
testing. For example, suppose you have created a module fact.py containing
a factorial function, as follows:
fact.py

def fact(n):
    # Base case covers n <= 1 so that fact(0) doesn't recurse forever
    return 1 if n <= 1 else n * fact(n - 1)

if __name__ == '__main__':
    import sys
    if len(sys.argv) > 1:
        print(fact(int(sys.argv[1])))
The file can be treated as a module, and the fact() function imported:
>>>
>>> from fact import fact
>>> fact(6)
720
But it can also be run as a standalone by passing an integer argument on the command-line
for testing:
C:\Users\john\Documents>python fact.py 6
720
Reloading a Module
For reasons of efficiency, a module is only loaded once per interpreter session. That is fine
for function and class definitions, which typically make up the bulk of a module’s contents.
But a module can contain executable statements as well, usually for initialization. Be aware that these statements will be executed only the first time a module is imported. For example, suppose mod.py contains an initialization statement that prints a message:

mod.py

a = [100, 200, 300]
print('a =', a)

>>>
>>> import mod
a = [100, 200, 300]
>>> import mod
>>> mod.a
[100, 200, 300]

The print() statement is not executed on subsequent imports. (For that matter, neither is the assignment statement, but as the final display of the value of mod.a shows, that doesn't matter. Once the assignment is made, it sticks.)
If you make a change to a module and need to reload it, you need to either restart the interpreter or use the reload() function from the importlib module:

>>>
>>> import importlib
>>> importlib.reload(mod)
a = [100, 200, 300]
<module 'mod' from 'C:\\Users\\john\\Documents\\Python\\doc\\mod.py'>
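If you want to experiment with this behavior without editing files by hand, here is a self-contained sketch. It writes a throwaway module (named tmpmod purely for the demonstration) to a temporary directory, imports it twice, and reloads it:

```python
import importlib
import os
import sys
import tempfile

# Write a throwaway module containing an initialization statement.
tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, 'tmpmod.py'), 'w') as f:
    f.write("a = [100, 200, 300]\nprint('a =', a)\n")
sys.path.append(tmpdir)

import tmpmod             # prints: a = [100, 200, 300]
import tmpmod             # no output: the module is cached
importlib.reload(tmpmod)  # prints again: a = [100, 200, 300]
```

The second import produces no output because Python serves the cached module; only reload() re-executes the module body.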
Python Packages
Suppose you have developed a very large application that includes many modules. As the
number of modules grows, it becomes difficult to keep track of them all if they are dumped
into one location. This is particularly so if they have similar names or functionality. You
might wish for a means of grouping and organizing them.
Creating a package is quite straightforward, since it makes use of the operating system's inherent hierarchical file structure. Consider the following arrangement:

pkg/
    mod1.py
    mod2.py

Here, there is a directory named pkg that contains two modules, mod1.py and mod2.py. The contents of the modules are:
mod1.py

def foo():
    print('[mod1] foo()')

class Foo:
    pass

mod2.py

def bar():
    print('[mod2] bar()')

class Bar:
    pass
Given this structure, if the pkg directory resides in a location where it can be found (in one of
the directories contained in sys.path), you can refer to the two modules with dot
notation (pkg.mod1, pkg.mod2) and import them with the syntax you are already familiar
with:
>>>
>>> import pkg
>>> pkg
<module 'pkg' (namespace)>
But this is of little avail. Though this is, strictly speaking, a syntactically correct Python
statement, it doesn’t do much of anything useful. In particular, it does not place any of the
modules in pkg into the local namespace:
>>>
>>> pkg.mod1
Traceback (most recent call last):
File "<pyshell#34>", line 1, in <module>
pkg.mod1
AttributeError: module 'pkg' has no attribute 'mod1'
>>> pkg.mod1.foo()
Traceback (most recent call last):
File "<pyshell#35>", line 1, in <module>
pkg.mod1.foo()
AttributeError: module 'pkg' has no attribute 'mod1'
>>> pkg.mod2.Bar()
Traceback (most recent call last):
File "<pyshell#36>", line 1, in <module>
pkg.mod2.Bar()
AttributeError: module 'pkg' has no attribute 'mod2'
To actually import the modules or their contents, you need to use one of the forms shown
above.
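To make this concrete, the following self-contained sketch builds the pkg directory in a temporary location (the setup code is only scaffolding for the demonstration) and then uses the import forms described above:

```python
import os
import sys
import tempfile

# Build pkg/ with mod1.py and mod2.py in a temporary directory.
tmpdir = tempfile.mkdtemp()
pkg_dir = os.path.join(tmpdir, 'pkg')
os.mkdir(pkg_dir)
with open(os.path.join(pkg_dir, 'mod1.py'), 'w') as f:
    f.write("def foo():\n    print('[mod1] foo()')\n")
with open(os.path.join(pkg_dir, 'mod2.py'), 'w') as f:
    f.write("class Bar:\n    pass\n")
sys.path.append(tmpdir)

import pkg.mod1           # import the submodule explicitly
pkg.mod1.foo()            # prints: [mod1] foo()

from pkg.mod2 import Bar  # import a name from a submodule
print(Bar)                # prints: <class 'pkg.mod2.Bar'>
```

Either form works; what fails is expecting a bare import pkg to make pkg.mod1 or pkg.mod2 available on its own.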
Package Initialization
If a file named __init__.py is present in a package directory, it is invoked when the package or
a module in the package is imported. This can be used for execution of package initialization
code, such as initialization of package-level data.
For example, suppose __init__.py for the pkg directory contains the following:

__init__.py

print(f'Invoking __init__.py for {__name__}')
A = ['quux', 'corge', 'grault']

>>>
>>> import pkg
Invoking __init__.py for pkg
>>> pkg.A
['quux', 'corge', 'grault']
A module in the package can access the global variable by importing it in turn:
mod1.py

def foo():
    from pkg import A
    print('[mod1] foo() / A = ', A)

class Foo:
    pass
>>>
>>> from pkg import mod1
Invoking __init__.py for pkg
>>> mod1.foo()
[mod1] foo() / A = ['quux', 'corge', 'grault']
__init__.py can also be used to effect automatic importing of modules from a package. For
example, earlier you saw that the statement import pkg only places the name pkg in the caller’s
local symbol table and doesn’t import any modules. But if __init__.py in the pkg directory
contains the following:
__init__.py

print(f'Invoking __init__.py for {__name__}')
import pkg.mod1, pkg.mod2

then import pkg makes the modules automatically accessible:
>>>
>>> import pkg
Invoking __init__.py for pkg
>>> pkg.mod1.foo()
[mod1] foo()
>>> pkg.mod2.bar()
[mod2] bar()
Note: Much of the Python documentation states that an __init__.py file must be present in the
package directory when creating a package. This was once true. It used to be that the very
presence of __init__.py signified to Python that a package was being defined. The file could
contain initialization code or even be empty, but it had to be present.
Starting with Python 3.3, Implicit Namespace Packages were introduced. These allow for the
creation of a package without any __init__.py file. Of course, it can still be present if package
initialization is needed. But it is no longer required.
Importing * From a Package
For the purposes of the following discussion, the previously defined package is expanded to contain some additional modules:

pkg/
    mod1.py
    mod2.py
    mod3.py
    mod4.py

There are now four modules defined in the pkg directory. Their contents are as shown below:
mod1.py

def foo():
    print('[mod1] foo()')

class Foo:
    pass

mod2.py

def bar():
    print('[mod2] bar()')

class Bar:
    pass

mod3.py

def baz():
    print('[mod3] baz()')

class Baz:
    pass

mod4.py

def qux():
    print('[mod4] qux()')

class Qux:
    pass
(Imaginative, aren’t they?)
You have already seen that when import * is used for a module, all objects from the module
are imported into the local symbol table, except those whose names begin with an underscore,
as always:
>>>
>>> dir()
['__annotations__', '__builtins__', '__doc__', '__loader__', '__name__',
'__package__', '__spec__']
>>> from pkg.mod3 import *
>>> dir()
['Baz', '__annotations__', '__builtins__', '__doc__', '__loader__', '__name__',
'__package__', '__spec__', 'baz']
>>> baz()
[mod3] baz()
>>> Baz
<class 'pkg.mod3.Baz'>
The analogous statement for a package is from pkg import *. What does that do?
>>>
>>> dir()
['__annotations__', '__builtins__', '__doc__', '__loader__', '__name__',
'__package__', '__spec__']
>>> from pkg import *
>>> dir()
['__annotations__', '__builtins__', '__doc__', '__loader__', '__name__',
'__package__', '__spec__']
Hmm, not much. Python follows this convention: if the __init__.py file in the package directory contains a list named __all__, it is taken to be the list of modules that should be imported when the statement from <package_name> import * is encountered.
For the present example, suppose you create an __init__.py in the pkg directory like this:
pkg/__init__.py
__all__ = [
    'mod1',
    'mod2',
    'mod3',
    'mod4'
]
Now from pkg import * imports all four modules:
>>>
>>> dir()
['__annotations__', '__builtins__', '__doc__', '__loader__', '__name__',
'__package__', '__spec__']
>>> from pkg import *
>>> dir()
['__annotations__', '__builtins__', '__doc__', '__loader__', '__name__',
'__package__', '__spec__', 'mod1', 'mod2', 'mod3', 'mod4']
By the way, __all__ can be defined in a module as well and serves the same purpose: to
control what is imported with import *. For example, modify mod1.py as follows:
pkg/mod1.py
__all__ = ['foo']

def foo():
    print('[mod1] foo()')

class Foo:
    pass
Now an import * statement from pkg.mod1 will only import what is contained in __all__:
>>>
>>> dir()
['__annotations__', '__builtins__', '__doc__', '__loader__', '__name__',
'__package__', '__spec__']
>>> from pkg.mod1 import *
>>> dir()
['__annotations__', '__builtins__', '__doc__', '__loader__', '__name__',
'__package__', '__spec__', 'foo']
>>> foo()
[mod1] foo()
>>> Foo
Traceback (most recent call last):
File "<pyshell#37>", line 1, in <module>
Foo
NameError: name 'Foo' is not defined
foo() (the function) is now defined in the local namespace, but Foo (the class) is not, because
the latter is not in __all__.
Subpackages
Packages can contain nested subpackages to arbitrary depth. For example, let's make one more modification to the example package directory as follows:

pkg/
    sub_pkg1/
        mod1.py
        mod2.py
    sub_pkg2/
        mod3.py
        mod4.py
The four modules (mod1.py, mod2.py, mod3.py and mod4.py) are defined as previously. But
now, instead of being lumped together into the pkg directory, they are split out into
two subpackage directories, sub_pkg1 and sub_pkg2.
Importing still works the same as shown previously. Syntax is similar, but additional dot
notation is used to separate package name from subpackage name:
>>>
>>> import pkg.sub_pkg1.mod1
>>> pkg.sub_pkg1.mod1.foo()
[mod1] foo()
pkg/sub_pkg2/mod3.py

def baz():
    print('[mod3] baz()')

class Baz:
    pass
Conclusion
In this tutorial, you covered the following topics:
How to create a Python module
The locations where the Python interpreter searches for a module
How to obtain access to the objects defined in a module with the import statement
How to create a module that can also be executed as a standalone script
How to organize modules into packages and subpackages, and how __init__.py and __all__ affect them
Object-Oriented Programming
Object-oriented programming (OOP) is a method of structuring a program by bundling related properties and behaviors into individual objects. An object contains data, like the raw or pre-processed materials at each step on an assembly line, and behavior, like the action each assembly line component performs.
For instance, an object could represent a person with properties like a name, age, and address
and behaviors such as walking, talking, breathing, and running. Or it could represent an email
with properties like a recipient list, subject, and body and behaviors like adding attachments
and sending.
Put another way, object-oriented programming is an approach for modeling concrete, real-
world things, like cars, as well as relations between things, like companies and employees,
students and teachers, and so on. OOP models real-world entities as software objects that
have some data associated with them and can perform certain functions.
The key takeaway is that objects are at the center of object-oriented programming in Python: they not only represent the data, as in procedural programming, but also determine the overall structure of the program.
For example, let’s say you want to track employees in an organization. You need to store
some basic information about each employee, such as their name, age, position, and the year
they started working.
One way to do this is to represent each employee as a list:

kirk = ["James Kirk", 34, "Captain", 2265]
spock = ["Spock", 35, "Science Officer", 2254]
mccoy = ["Leonard McCoy", "Chief Medical Officer", 2266]

There are a number of issues with this approach.
First, it can make larger code files more difficult to manage. If you reference kirk[0] several lines away from where the kirk list is declared, will you remember that the element with index 0 is the employee's name?
Second, it can introduce errors if not every employee has the same number of elements in the
list. In the mccoy list above, the age is missing, so mccoy[1] will return "Chief Medical
Officer" instead of Dr. McCoy’s age.
A great way to make this type of code more manageable and more maintainable is to use
classes.
Classes vs Instances
Classes are used to create user-defined data structures. Classes define functions called
methods, which identify the behaviors and actions that an object created from the class can
perform with its data.
In this tutorial, you'll create a Dog class that stores some information about the characteristics and behaviors that an individual dog can have.
A class is a blueprint for how something should be defined. It doesn’t actually contain any
data. The Dog class specifies that a name and an age are necessary for defining a dog, but it
doesn’t contain the name or age of any specific dog.
While the class is the blueprint, an instance is an object that is built from a class and contains
real data. An instance of the Dog class is not a blueprint anymore. It’s an actual dog with a
name, like Miles, who’s four years old.
Put another way, a class is like a form or questionnaire. An instance is like a form that has
been filled out with information. Just like many people can fill out the same form with their
own unique information, many instances can be created from a single class.
All class definitions start with the class keyword, followed by the name of the class and a colon:

class Dog:
    pass
The body of the Dog class consists of a single statement: the pass keyword. pass is often used
as a placeholder indicating where code will eventually go. It allows you to run this code
without Python throwing an error.
Note: Python class names are written in CapitalizedWords notation by convention. For
example, a class for a specific breed of dog like the Jack Russell Terrier would be written as
JackRussellTerrier.
The Dog class isn’t very interesting right now, so let’s spruce it up a bit by defining some
properties that all Dog objects should have. There are a number of properties that we can
choose from, including name, age, coat color, and breed. To keep things simple, we’ll just
use name and age.
The properties that all Dog objects must have are defined in a method called .__init__().
Every time a new Dog object is created, .__init__() sets the initial state of the object by
assigning the values of the object’s properties. That is, .__init__() initializes each new
instance of the class.
You can give .__init__() any number of parameters, but the first parameter will always be a
variable called self. When a new class instance is created, the instance is automatically passed
to the self parameter in .__init__() so that new attributes can be defined on the object.
Let’s update the Dog class with an .__init__() method that creates .name and .age attributes:
class Dog:
    def __init__(self, name, age):
        self.name = name
        self.age = age
Notice that the .__init__() method’s signature is indented four spaces. The body of the
method is indented by eight spaces. This indentation is vitally important. It tells Python that
the .__init__() method belongs to the Dog class.
In the body of .__init__(), there are two statements using the self variable:
self.name = name creates an attribute called name and assigns to it the value of the name
parameter.
self.age = age creates an attribute called age and assigns to it the value of the age parameter.
Attributes created in .__init__() are called instance attributes. An instance attribute’s value is
specific to a particular instance of the class. All Dog objects have a name and an age, but the
values for the name and age attributes will vary depending on the Dog instance.
On the other hand, class attributes are attributes that have the same value for all class
instances. You can define a class attribute by assigning a value to a variable name outside
of .__init__().
For example, the following Dog class has a class attribute called species with the value
"Canis familiaris":
class Dog:
    # Class attribute
    species = "Canis familiaris"

    def __init__(self, name, age):
        self.name = name
        self.age = age
Class attributes are defined directly beneath the first line of the class name and are indented
by four spaces. They must always be assigned an initial value. When an instance of the class
is created, class attributes are automatically created and assigned to their initial values.
Use class attributes to define properties that should have the same value for every class
instance. Use instance attributes for properties that vary from one instance to another.
Creating a new object from a class is called instantiating an object. You can instantiate a new Dog object by typing the name of the class, followed by opening and closing parentheses. (To try this with no arguments, use the bare-bones version of the class, class Dog: pass, since the .__init__() method defined above requires name and age.)
>>> Dog()
<__main__.Dog object at 0x106702d30>
You now have a new Dog object at 0x106702d30. This funny-looking string of letters and
numbers is a memory address that indicates where the Dog object is stored in your
computer’s memory. Note that the address you see on your screen will be different.
>>> Dog()
<__main__.Dog object at 0x0004ccc90>
The new Dog instance is located at a different memory address. That’s because it’s an
entirely new instance and is completely unique from the first Dog object that you instantiated.
>>> a = Dog()
>>> b = Dog()
>>> a == b
False
In this code, you create two new Dog objects and assign them to the variables a and b. When
you compare a and b using the == operator, the result is False. Even though a and b are both
instances of the Dog class, they represent two distinct objects in memory.
To pass arguments to the name and age parameters, use the full Dog class defined above (with the .__init__() method) and put values into the parentheses after the class name:
>>> buddy = Dog("Buddy", 9)
>>> miles = Dog("Miles", 4)
This creates two new Dog instances—one for a nine-year-old dog named Buddy and one for
a four-year-old dog named Miles.
The Dog class’s .__init__() method has three parameters, so why are only two arguments
passed to it in the example?
When you instantiate a Dog object, Python creates a new instance and passes it to the first
parameter of .__init__(). This essentially removes the self parameter, so you only need to
worry about the name and age parameters.
After you create the Dog instances, you can access their instance attributes using dot
notation:
>>> buddy.name
'Buddy'
>>> buddy.age
9
>>> miles.name
'Miles'
>>> miles.age
4
You can access class attributes the same way:
>>> buddy.species
'Canis familiaris'
One of the biggest advantages of using classes to organize data is that instances are
guaranteed to have the attributes you expect. All Dog instances have .species, .name, and .age
attributes, so you can use those attributes with confidence knowing that they will always
return a value.
Although the attributes are guaranteed to exist, their values can be changed dynamically:
>>> buddy.age = 10
>>> buddy.age
10
>>> miles.species = "Felis silvestris"
>>> miles.species
'Felis silvestris'
In this example, you change the .age attribute of the buddy object to 10. Then you change
the .species attribute of the miles object to "Felis silvestris", which is a species of cat. That
makes Miles a pretty strange dog, but it is valid Python!
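To see this behavior end to end, here is a minimal sketch using the Dog class from above. Assigning through an instance shadows the class attribute for that one instance only:

```python
class Dog:
    # Class attribute shared by all instances
    species = "Canis familiaris"

    def __init__(self, name, age):
        self.name = name
        self.age = age

buddy = Dog("Buddy", 9)
miles = Dog("Miles", 4)

# Assigning through an instance creates an instance-level attribute
# that shadows the class attribute for that instance only.
miles.species = "Felis silvestris"
print(miles.species)  # Felis silvestris
print(buddy.species)  # Canis familiaris
print(Dog.species)    # Canis familiaris
```

buddy and the class itself still see the original value; only miles carries its own instance-level species attribute.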
The key takeaway here is that custom objects are mutable by default. An object is mutable if
it can be altered dynamically. For example, lists and dictionaries are mutable, but strings and
tuples are immutable.
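The same contrast holds for the built-in types mentioned above. A quick sketch, in which changing an immutable object raises a TypeError:

```python
# Lists are mutable: they can be changed in place.
nums = [1, 2, 3]
nums[0] = 99
print(nums)  # [99, 2, 3]

# Tuples are immutable: item assignment is an error.
point = (1, 2, 3)
try:
    point[0] = 99
except TypeError as e:
    print('TypeError:', e)
```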
Instance Methods
Instance methods are functions that are defined inside a class and can only be called from an
instance of that class. Just like .__init__(), an instance method’s first parameter is always self.
Open a new editor window in IDLE and type in the following Dog class:
class Dog:
    species = "Canis familiaris"

    def __init__(self, name, age):
        self.name = name
        self.age = age

    # Instance method
    def description(self):
        return f"{self.name} is {self.age} years old"

    # Another instance method
    def speak(self, sound):
        return f"{self.name} says {sound}"

This Dog class has two instance methods:
.description() returns a string displaying the name and age of the dog.
.speak() has one parameter called sound and returns a string containing the dog's name and the sound the dog makes.
Save the modified Dog class to a file called dog.py and press F5 to run the program. Then open the interactive window and type the following to see your instance methods in action:
>>> miles = Dog("Miles", 4)
>>> miles.description()
'Miles is 4 years old'
>>> miles.speak("Woof Woof")
'Miles says Woof Woof'
When you create a list object, you can use print() to display a string that looks like the list:
>>> names = ["Miles", "Buddy", "Jack"]
>>> print(names)
['Miles', 'Buddy', 'Jack']
Let's see what happens when you print the miles object:
>>> print(miles)
<__main__.Dog object at 0x00aeff70>
When you print(miles), you get a cryptic looking message telling you that miles is a Dog
object at the memory address 0x00aeff70. This message isn’t very helpful. You can change
what gets printed by defining a special instance method called .__str__().
In the editor window, change the name of the Dog class's .description() method to .__str__():

class Dog:
    # Leave other parts of Dog class as-is

    # Replace .description() with __str__()
    def __str__(self):
        return f"{self.name} is {self.age} years old"

Save the file and press F5. Now, when you print(miles), you get a much friendlier output:
>>> miles = Dog("Miles", 4)
>>> print(miles)
Miles is 4 years old
Methods like .__init__() and .__str__() are called dunder methods because they begin and
end with double underscores. There are many dunder methods that you can use to customize
classes in Python. Although too advanced a topic for a beginning Python book, understanding
dunder methods is an important part of mastering object-oriented programming in Python.
In the next section, you’ll see how to take your knowledge one step further and create classes
from other classes.
Inheritance
Inheritance is the process by which one class takes on the attributes and methods of another. Newly formed classes are called child classes, and the classes that child classes are derived from are called parent classes.
Child classes can override or extend the attributes and methods of parent classes. In other words, child classes inherit all of the parent's attributes and methods but can also specify attributes and methods that are unique to themselves.
Although the analogy isn’t perfect, you can think of object inheritance sort of like genetic
inheritance.
You may have inherited your hair color from your mother. It’s an attribute you were born
with. Let’s say you decide to color your hair purple. Assuming your mother doesn’t have
purple hair, you’ve just overridden the hair color attribute that you inherited from your mom.
You also inherit, in a sense, your language from your parents. If your parents speak English,
then you’ll also speak English. Now imagine you decide to learn a second language, like
German. In this case you’ve extended your attributes because you’ve added an attribute that
your parents don’t have.
Suppose now that you want to model the dog park with Python classes. The Dog class that
you wrote in the previous section can distinguish dogs by name and age but not by breed.
You could modify the Dog class in the editor window by adding a .breed attribute (keeping the .speak() method described earlier):

class Dog:
    species = "Canis familiaris"

    def __init__(self, name, age, breed):
        self.name = name
        self.age = age
        self.breed = breed

    def speak(self, sound):
        return f"{self.name} says {sound}"
Press F5 to save the file. Now you can model the dog park by instantiating a bunch of different dogs in the interactive window:
>>> miles = Dog("Miles", 4, "Jack Russell Terrier")
>>> buddy = Dog("Buddy", 9, "Dachshund")
>>> jack = Dog("Jack", 3, "Bulldog")
>>> jim = Dog("Jim", 5, "Bulldog")
Using just the Dog class, you must supply a string for the sound argument of .speak() every
time you call it on a Dog instance:
>>> buddy.speak("Yap")
'Buddy says Yap'
>>> jim.speak("Woof")
'Jim says Woof'
>>> jack.speak("Woof")
'Jack says Woof'
Passing a string to every call to .speak() is repetitive and inconvenient. Moreover, the string
representing the sound that each Dog instance makes should be determined by its .breed
attribute, but here you have to manually pass the correct string to .speak() every time it’s
called.
You can simplify the experience of working with the Dog class by creating a child class for
each breed of dog. This allows you to extend the functionality that each child class inherits,
including specifying a default argument for .speak().
To keep the example focused, start again from a version of the Dog class without the .breed attribute:

class Dog:
    species = "Canis familiaris"

    def __init__(self, name, age):
        self.name = name
        self.age = age

    def __str__(self):
        return f"{self.name} is {self.age} years old"

    def speak(self, sound):
        return f"{self.name} says {sound}"
Remember, to create a child class, you create a new class with its own name and then put the name of the parent class in parentheses. Add the following to the dog.py file to create three new child classes of the Dog class:
class JackRussellTerrier(Dog):
    pass

class Dachshund(Dog):
    pass

class Bulldog(Dog):
    pass
Press F5 to save and run the file. With the child classes defined, you can now instantiate some dogs of specific breeds in the interactive window:
>>> miles = JackRussellTerrier("Miles", 4)
>>> buddy = Dachshund("Buddy", 9)
>>> jack = Bulldog("Jack", 3)
>>> jim = Bulldog("Jim", 5)
Instances of child classes inherit all of the attributes and methods of the parent class:
>>> miles.species
'Canis familiaris'
>>> buddy.name
'Buddy'
>>> print(jack)
Jack is 3 years old
>>> jim.speak("Woof")
'Jim says Woof'
To determine which class a given object belongs to, you can use the built-in type():
>>> type(miles)
<class '__main__.JackRussellTerrier'>
What if you want to determine if miles is also an instance of the Dog class? You can do this with the built-in isinstance():
>>> isinstance(miles, Dog)
True
Notice that isinstance() takes two arguments, an object and a class. In the example above, isinstance() checks if miles is an instance of the Dog class and returns True.
The miles, buddy, jack, and jim objects are all Dog instances, but miles is not a Bulldog instance, and jack is not a Dachshund instance:
>>> isinstance(miles, Bulldog)
False
>>> isinstance(jack, Dachshund)
False
More generally, all objects created from a child class are instances of the parent class,
although they may not be instances of other child classes.
Now that you’ve created child classes for some different breeds of dogs, let’s give each breed
its own sound.
To override a method defined on the parent class, you define a method with the same name
on the child class. Here’s what that looks like for the JackRussellTerrier class:
class JackRussellTerrier(Dog):
    def speak(self, sound="Arf"):
        return f"{self.name} says {sound}"
Now .speak() is defined on the JackRussellTerrier class with the default argument for sound
set to "Arf".
Update dog.py with the new JackRussellTerrier class and press F5 to save and run the file.
You can now call .speak() on a JackRussellTerrier instance without passing an argument to
sound:
>>> miles.speak()
'Miles says Arf'
You can still pass a different sound explicitly:
>>> miles.speak("Grrr")
'Miles says Grrr'
One thing to keep in mind about class inheritance is that changes to the parent class
automatically propagate to child classes. This occurs as long as the attribute or method being
changed isn’t overridden in the child class.
For example, in the editor window, change the string returned by .speak() in the Dog class:
class Dog:
    # Leave other attributes and methods as they are

    # Change the string returned by .speak()
    def speak(self, sound):
        return f"{self.name} barks: {sound}"
However, calling .speak() on a JackRussellTerrier instance won't show the new style of output:
Sometimes it makes sense to completely override a method from a parent class. But in this
instance, we don’t want the JackRussellTerrier class to lose any changes that might be made
to the formatting of the output string of Dog.speak().
To do this, you still need to define a .speak() method on the child JackRussellTerrier class.
But instead of explicitly defining the output string, you need to call the Dog class’s .speak()
inside of the child class’s .speak() using the same arguments that you passed to
JackRussellTerrier.speak().
You can access the parent class from inside a method of a child class by using super():
class JackRussellTerrier(Dog):
    def speak(self, sound="Arf"):
        return super().speak(sound)
When you call super().speak(sound) inside JackRussellTerrier, Python searches the parent
class, Dog, for a .speak() method and calls it with the variable sound.
Update dog.py with the new JackRussellTerrier class. Save the file and press F5 so you can test it in the interactive window.
Note: In the above examples, the class hierarchy is very straightforward. The
JackRussellTerrier class has a single parent class, Dog. In real-world examples, the class
hierarchy can get quite complicated.
super() does much more than just search the parent class for a method or an attribute. It
traverses the entire class hierarchy for a matching method or attribute. If you aren’t careful,
super() can have surprising results.
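The traversal behavior mentioned above can be seen with a small multiple-inheritance sketch (the classes here are illustrative, not from the dog example): each super() call follows the method resolution order (MRO) of the instance, not simply the textual parent:

```python
class A:
    def hello(self):
        return "A"

class B(A):
    def hello(self):
        # super() here resolves to the NEXT class in the instance's MRO,
        # which for a D instance is C, not A
        return "B -> " + super().hello()

class C(A):
    def hello(self):
        return "C -> " + super().hello()

class D(B, C):
    def hello(self):
        return "D -> " + super().hello()

print(D().hello())                          # D -> B -> C -> A
print([cls.__name__ for cls in D.__mro__])  # ['D', 'B', 'C', 'A', 'object']
```

This is why super() can surprise you: B.hello() ends up calling C.hello() when the instance is a D.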
UNIT – IV
Introduction
Before we get into why exception handling is essential and the types of built-in exceptions that Python supports, it is necessary to understand that there is a subtle difference between an error and an exception.
Errors cannot be handled, while exceptions can be handled at run time. An error can be a syntax (parsing) error, while there are many types of exceptions that can occur during execution and are not unconditionally fatal. An error might indicate critical problems that a reasonable application should not try to catch, while an exception might indicate conditions that an application should try to catch. Errors are a form of unchecked, irrecoverable condition, like running out of memory, which a programmer should not try to handle.
Exception handling makes your code more robust and helps prevent potential failures that would cause your program to stop in an uncontrolled manner. Imagine you have written code that is deployed in production and it still terminates due to an exception; your client would not appreciate that, so it's better to handle the particular exception beforehand and avoid the chaos.
Syntax Error
Out of Memory Error
Recursion Error
Exceptions
Syntax Error
Syntax errors, often called parsing errors, are predominantly caused when the parser detects a syntactic issue in your code. Consider this example:
a = 8
b = 10
c = a b
  File "<ipython-input>", line 3
    c = a b
          ^
SyntaxError: invalid syntax
The arrow indicates where the parser ran into the error. The token preceding the arrow causes the failure. To rectify such fundamental errors, Python does most of the job for you by printing the file name and the line number at which the error occurred.
Memory Error
A memory error occurs when a program runs out of memory. Common causes include:
Using a 32-bit Python architecture (the maximum memory allocation is very low, between 2 GB and 4 GB)
Loading a very large data file
You can handle the memory error with the help of exception handling, a fallback for when the interpreter entirely runs out of memory and must immediately stop the current execution. In these rare instances, Python raises a MemoryError, allowing the script to catch itself, break out of the error, and recover.
However, since Python adopts the memory management architecture of the C language (the malloc() function), it is not certain that all processes of the script will recover; in some cases, a MemoryError will result in an unrecoverable crash. Hence, it is neither good practice nor advisable to use exception handling for such an error.
Recursion Error
This error is related to the stack and occurs when you call functions. As the name suggests, a recursion error transpires when too many nested method calls are executed (for example, with infinite recursion), exceeding the size of the stack.
All your local variables and method-call data are placed on the stack. For each method call, one stack frame is created, and local and method-call data are placed inside that stack frame. Once the method finishes executing, its stack frame is removed.
To reproduce this error, let's define a recursive function that keeps calling itself in an infinite loop. You will see a RecursionError because a stack frame is populated with method data for every call but never freed.
def recursion():
    return recursion()

recursion()
---------------------------------------------------------------------------
RecursionError                            Traceback (most recent call last)
<ipython-input-2-5395140f7f05> in recursion()
      1 def recursion():
----> 2     return recursion()
<ipython-input-2-5395140f7f05> in recursion()
      1 def recursion():
----> 2     return recursion()
...
RecursionError: maximum recursion depth exceeded
Indentation Error
An indentation error is similar in spirit to a syntax error and falls under it; however, it is specific to indentation-related issues in the script.
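A minimal way to observe an IndentationError without crashing the script is to hand the parser a string via the built-in compile():

```python
# The loop body below is not indented, so the parser rejects it
bad_code = "for i in range(3):\nprint(i)"

try:
    compile(bad_code, "<example>", "exec")
except IndentationError as err:
    caught = type(err).__name__
    print(caught)                        # IndentationError
    # IndentationError is itself a subclass of SyntaxError
    print(isinstance(err, SyntaxError))  # True
```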
Exceptions
Even if the syntax of a statement or expression is correct, it may still cause an error when
executed. Python exceptions are errors that are detected during execution and are not
unconditionally fatal: you will soon learn in the tutorial how to handle them in Python
programs. An exception object is created when a Python script raises an exception. If the script doesn't explicitly handle the exception, the program will be forced to terminate abruptly.
Programs that do not handle exceptions result in error messages, as shown here:
Zero Division Error
100 / 0
---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
<ipython-input-43-e9e866a10e2a> in <module>
----> 1 100 / 0
ZeroDivisionError: division by zero
Type Error
a = 2
b = 'DataCamp'
a + b
TypeError: unsupported operand type(s) for +: 'int' and 'str'
There are various types of Python exceptions, and the type is printed as part of the message: the types in the above two examples are ZeroDivisionError and TypeError. In both cases, the string printed as the exception type is the name of a built-in Python exception.
The remaining part of the error line provides details of what caused the error, based on the type of exception.
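The exception type mentioned above is also available programmatically; a small sketch (the describe helper is illustrative, not a standard function):

```python
def describe(thunk):
    # Run a zero-argument callable and report the exception type and message
    try:
        return thunk()
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"

print(describe(lambda: 100 / 0))         # ZeroDivisionError: division by zero
print(describe(lambda: 2 + 'DataCamp'))  # TypeError: unsupported operand type(s) ...
```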
Built-in Exceptions
Before you start learning the built-in exceptions, let's quickly revise the four main components of exception handling:
Try: It will run the code block in which you expect an error to occur.
Except: Here, you will define the type of exception you expect in the try block (built-in or custom).
Else: If there isn't any exception, then this block of code will be executed (consider this as a remedy or a fallback option if you expect a part of your script to produce an exception).
Finally: Irrespective of whether an exception occurs or not, this block of code is always executed (commonly used to release external resources).
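The try, except, and else blocks (together with finally) can be combined in a single statement; a minimal sketch:

```python
def safe_divide(a, b):
    try:
        result = a / b                  # code that may raise
    except ZeroDivisionError:
        result = None                   # handle the expected exception
    else:
        print("division succeeded")     # runs only when no exception occurred
    finally:
        print("cleanup always runs")    # runs in every case
    return result

print(safe_divide(10, 2))   # 5.0
print(safe_divide(10, 0))   # None
```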
In the below example, if you run the cell and interrupt the kernel, the program will raise a KeyboardInterrupt exception. Let's now handle the KeyboardInterrupt exception:
try:
    inp = input()
    print('Press Ctrl+C or Interrupt the Kernel:')
except KeyboardInterrupt:
    print('Caught KeyboardInterrupt')
else:
    print('No exception occurred')
Caught KeyboardInterrupt
Standard Error
Let's learn about some of the standard errors that could usually occur while programming.
Arithmetic Error
ZeroDivisionError
OverflowError
FloatingPointError
All of the above exceptions fall under the Arithmetic base class and are raised for errors in arithmetic operations, as discussed here.
Zero Division
When the divisor (the second argument of the division, i.e., the denominator) is zero, a zero division error is raised.
try:
    a = 100 / 0
    print(a)
except ZeroDivisionError:
    print("Zero Division Exception Raised.")
else:
    print("Success, no error!")
Zero Division Exception Raised.
OverFlow Error
The Overflow Error is raised when the result of an arithmetic operation is too large to be represented. In Python 3, plain integers have unlimited precision, so this is typically raised by floating-point operations, as in the example below.
try:
    import math
    print(math.exp(1000))
except OverflowError:
    print("OverFlow Exception Raised.")
else:
    print("Success, no error!")
OverFlow Exception Raised.
Assertion Error
When an assert statement fails, an Assertion Error is raised.
Let's take an example to understand the assertion error. Say you have two variables, a and b, which you need to compare. To check whether a and b are equal, you apply the assert keyword before the comparison, which raises an Assertion exception when the expression returns false.
try:
    a = 100
    b = "DataCamp"
    assert a == b
except AssertionError:
    print("Assertion Exception Raised.")
else:
    print("Success, no error!")
Assertion Exception Raised.
Attribute Error
When a non-existent attribute is referenced and the attribute reference or assignment fails, an attribute error is raised.
In the below example, you can observe that the Attributes class object has no attribute with
the name attribute.
class Attributes(object):
    a = 2
    print(a)

try:
    obj = Attributes()
    print(obj.attribute)
except AttributeError:
    print("Attribute Exception Raised.")
2
Attribute Exception Raised.
Import Error
ImportError is raised when you try to import a module that does not exist (unable to load) in
its standard path or even when you make a typo in the module's name.
import nibabel
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-6-9e567e3ae964> in <module>
----> 1 import nibabel
ModuleNotFoundError: No module named 'nibabel'
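Since Python 3.6, a missing module raises ModuleNotFoundError, which is a subclass of ImportError, so except ImportError catches both; a quick sketch (the module name is illustrative and assumed not to be installed):

```python
try:
    import a_module_that_does_not_exist  # illustrative name
except ImportError as err:
    error_name = type(err).__name__
    print(error_name)                    # ModuleNotFoundError
    print(isinstance(err, ImportError))  # True
```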
Lookup Error
LookupError acts as a base class for the exceptions that occur when a key or index used on a mapping or sequence is invalid or does not exist. The two main lookup exceptions are:
IndexError
KeyError
Key Error
If a key you are trying to access is not found in the dictionary, a key error exception is raised.
try:
    a = {1: 'a', 2: 'b', 3: 'c'}
    print(a[4])
except LookupError:
    print("Key Error Exception Raised.")
else:
    print("Success, no error!")
Key Error Exception Raised.
Index Error
When you try to access an index of a list that does not exist or is out of range, an index error is raised.
try:
    a = ['a', 'b', 'c']
    print(a[4])
except LookupError:
    print("Index Error Exception Raised, list index out of range")
else:
    print("Success, no error!")
Index Error Exception Raised, list index out of range
Memory Error
As discussed earlier, Memory Error is raised when an operation does not get enough memory
to process further.
Name Error
Name Error is raised when a local or global name is not found.
In the below example, ans variable is not defined. Hence, you will get a name error.
try:
    print(ans)
except NameError:
    print("NameError: name 'ans' is not defined")
else:
    print("Success, no error!")
NameError: name 'ans' is not defined
Runtime Error
A runtime error is raised when the detected error doesn't fall into any of the other categories. One of its subclasses is NotImplementedError, which is raised when an abstract method that should be implemented in a derived class is invoked on the base class:
class BaseClass(object):
    """Defines the interface"""
    def __init__(self):
        super(BaseClass, self).__init__()
    def do_something(self):
        """The interface, not implemented"""
        raise NotImplementedError(self.__class__.__name__ + '.do_something')

class SubClass(BaseClass):
    """Implements the interface"""
    def do_something(self):
        """Really does something"""
        print(self.__class__.__name__ + ' doing something!')

SubClass().do_something()
BaseClass().do_something()
SubClass doing something!
---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
<ipython-input-1-57792b6bc7e4> in <module>
     14
     15 SubClass().do_something()
---> 16 BaseClass().do_something()
<ipython-input-1-57792b6bc7e4> in do_something(self)
      5     def do_something(self):
      6         """The interface, not implemented"""
----> 7         raise NotImplementedError(self.__class__.__name__ + '.do_something')
      8
      9 class SubClass(BaseClass):
NotImplementedError: BaseClass.do_something
Type Error
A Type Error Exception is raised when an operation is applied to two different or unrelated types of operands or objects.
In the below example, an integer and a string are added, which results in a type error.
try:
    a = 5
    b = "DataCamp"
    c = a + b
except TypeError:
    print('TypeError Exception Raised')
else:
    print('Success, no error!')
TypeError Exception Raised
Value Error
Value error is raised when a built-in operation or function receives an argument that has the correct type but an invalid value.
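A classic example is int() receiving a string (an acceptable argument type) whose contents are not numeric (an invalid value):

```python
try:
    number = int("DataCamp")  # str is fine for int(), but this value can't be parsed
except ValueError:
    outcome = "Value Error Exception Raised."
else:
    outcome = "Success, no error!"
print(outcome)
```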
As studied in the previous section of the tutorial, Python has many built-in exceptions that
you can use in your program. Still, sometimes, you may need to create custom exceptions
with custom messages to serve your purpose.
You can achieve this by creating a new class, which will be derived from the pre-defined
Exception class in Python.
class UnAcceptedValueError(Exception):
    def __init__(self, data):
        self.data = data
    def __str__(self):
        return repr(self.data)

Total_Marks = int(input("Enter Total Marks Scored: "))
try:
    Num_of_Sections = int(input("Enter Num of Sections: "))
    if Num_of_Sections < 1:
        raise UnAcceptedValueError("Number of Sections can't be less than 1")
except UnAcceptedValueError as e:
    print("Received error:", e.data)
Enter Total Marks Scored: 10
Enter Num of Sections: 0
Received error: Number of Sections can't be less than 1
In the above example, as you observed, if you enter anything less than 1, a custom exception is raised and handled. Many standard modules define their own exceptions to report errors that may occur in functions they define.
Below is an example where the timeit module of Python is being used to check the execution
time of 2 different statements. In stmt1, try-except is used to handle ZeroDivisionError,
while in stmt2, an if statement is used as a normal check condition. Then, you execute these statements 10,000 times with the variable a=0. The point to note here is that the execution time of the two statements differs: you will find that stmt1, which handles the exception, takes noticeably longer than stmt2, which just checks the value and does nothing if the condition is not met.
Hence, you should limit the use of Python exception handling and use it for rare cases only.
For example, when you are not sure whether the input will be an integer or a float for
arithmetic calculations or not sure about the existence of a file while trying to open it.
import timeit
setup = "a=0"
stmt1 = '''\
try:
    b = 10 / a
except ZeroDivisionError:
    pass'''
stmt2 = '''\
if a != 0:
    b = 10 / a'''
print("time=", timeit.timeit(stmt1, setup, number=10000))
print("time=", timeit.timeit(stmt2, setup, number=10000))
time= 0.003897680000136461
time= 0.0002797570000439009
map
One of the common things we do with lists and other sequences is applying an operation to each item and collecting the results.
For example, updating all the items in a list can be done easily with a for loop:
>>> items = [1, 2, 3, 4, 5]
>>> updated = []
>>> for x in items:
...     updated.append(x ** 2)
>>> updated
[1, 4, 9, 16, 25]
Since this is such a common operation, we have a built-in feature that does most of the work for us. The map(aFunction, aSequence) function applies a passed-in function to each item in an iterable object and returns an iterator over all the function call results (in Python 3; in Python 2 it returned a list):
>>> squared = list(map(lambda x: x ** 2, items))
>>> squared
[1, 4, 9, 16, 25]
In the short example above, the lambda function squares each item in the items list.
def square(x):
    return x ** 2

def cube(x):
    return x ** 3

>>> for s, c in zip(map(square, range(5)), map(cube, range(5))):
...     print([s, c])
[0, 0]
[1, 1]
[4, 8]
[9, 27]
[16, 64]
Because using map is equivalent to a for loop, with a little extra code we can always write a general mapping utility:
def mymap(func, seq):
    result = []
    for x in seq:
        result.append(func(x))
    return result
Since it's a built-in, map is always available and always works the same way. It also has a performance benefit because it is usually faster than a manually coded for loop. On top of that, map can be used in more advanced ways. For example, given multiple sequence arguments, it sends items taken from the sequences in parallel as distinct arguments to the function:
>>> pow(3,5)
243
>>> pow(2,10)
1024
>>> pow(3,11)
177147
>>> pow(4,12)
16777216
>>>
>>> list(map(pow, [2, 3, 4], [10, 11, 12]))
[1024, 177147, 16777216]
>>>
In Python 2, passing None as the function paired up items from the sequences, but in Python 3 this raises a TypeError; use zip() instead:
>>> m = [1, 2, 3]
>>> n = [1, 4, 9]
>>> list(zip(m, n))
[(1, 1), (2, 4), (3, 9)]
arr = []
for _ in range(4):
    arr.append(list(map(int, input().rstrip().split())))
print(arr)
Given the four input lines
1 2 3
4 5 6
7 8 9
10 11 12
this prints [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]].
These tools apply functions to sequences and other iterables: filter selects the items for which a test function returns true, while reduce applies a function to pairs of an item and a running result.
Because they return iterables, range and filter both require list calls to display all their results
in Python 3.0.
As an example, the following filter call picks out items in a sequence that are less than zero:
>>> list(range(-5,5))
[-5, -4, -3, -2, -1, 0, 1, 2, 3, 4]
>>>
>>> list( filter((lambda x: x < 0), range(-5,5)))
[-5, -4, -3, -2, -1]
>>>
Items in the sequence or iterable for which the function returns true are added to the result list. Like map, this function is roughly equivalent to a for loop, but it is built-in and fast:
>>>
>>> result = []
>>> for x in range(-5, 5):
...     if x < 0:
...         result.append(x)
>>> result
[-5, -4, -3, -2, -1]
>>>
a = [1, 2, 3, 5, 7, 9]
b = [2, 3, 5, 6, 7, 8]
print(list(filter(lambda x: x in a, b)))  # prints out [2, 3, 5, 7]

a = [1, 2, 3, 5, 7, 9]
b = [2, 3, 5, 6, 7, 8]
print([x for x in a if x in b])  # prints out [2, 3, 5, 7]
>>>
>>> from functools import reduce
>>> reduce( (lambda x, y: x * y), [1, 2, 3, 4] )
24
>>> reduce( (lambda x, y: x / y), [1, 2, 3, 4] )
0.041666666666666664
>>>
At each step, reduce passes the current product or division, along with the next item from the list, to the passed-in lambda function. By default, the first item in the sequence initializes the starting value.
Here's the for loop version of the first of these calls, with the multiplication
hardcoded inside the loop:
>>> L = [1, 2, 3, 4]
>>> result = L[0]
>>> for x in L[1:]:
...     result = result * x
>>> result
24
>>>
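reduce() also accepts an optional third argument that seeds the running result instead of the first item, which additionally makes it safe on empty sequences:

```python
from functools import reduce

# The running total starts at 10, then folds in 1, 2, and 3
total = reduce(lambda x, y: x + y, [1, 2, 3], 10)
print(total)        # 16

# Without an initializer this would raise TypeError on an empty list
empty_total = reduce(lambda x, y: x + y, [], 0)
print(empty_total)  # 0
```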
>>> import functools
>>> L = ['Testing ', 'shows ', 'the ', 'presence', ', ','not ', 'the ', 'absence ', 'of ', 'bugs']
>>> functools.reduce( (lambda x,y:x+y), L)
'Testing shows the presence, not the absence of bugs'
>>>
>>> ''.join(L)
'Testing shows the presence, not the absence of bugs'
UNIT – V
Getting Started with Python Multithreading
Let us start by creating a Python module, named download.py. This file will contain all the
functions necessary to fetch the list of images and download them. We will split these
functionalities into three separate functions:
get_links
download_link
setup_download_dir
The third function, setup_download_dir, will be used to create a download destination
directory if it doesn’t already exist.
Imgur's API requires HTTP requests to bear the Authorization header with the client ID. You can find this client ID in the dashboard of the application that you have registered on Imgur. The response will be JSON encoded, and we can use Python's standard JSON library to decode it. Downloading the image is an even simpler task, as all you have to do is fetch the image by its URL and write it to a file.
import json
import logging
import os
from pathlib import Path
from urllib.request import urlopen, Request

logger = logging.getLogger(__name__)

types = {'image/jpeg', 'image/png'}

def get_links(client_id):
    headers = {'Authorization': 'Client-ID {}'.format(client_id)}
    req = Request('https://api.imgur.com/3/gallery/random/random/', headers=headers, method='GET')
    with urlopen(req) as resp:
        data = json.loads(resp.read().decode('utf-8'))
    return [item['link'] for item in data['data'] if 'type' in item and item['type'] in types]

def setup_download_dir():
    download_dir = Path('images')
    if not download_dir.exists():
        download_dir.mkdir()
    return download_dir
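The download_link helper named above is not shown in the listing; a minimal sketch consistent with how the later scripts call it (naming the local file after the last component of the URL is an assumption):

```python
import logging
import os
from urllib.request import urlopen

logger = logging.getLogger(__name__)

def download_link(directory, link):
    # directory is a pathlib.Path; derive the local filename from the URL
    download_path = directory / os.path.basename(link)
    with urlopen(link) as image, download_path.open('wb') as f:
        f.write(image.read())
    logger.info('Downloaded %s', link)
```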
Next, we will need to write a module that will use these functions to download the images,
one by one. We will name this single.py. This will contain the main function of our first,
naive version of the Imgur image downloader. The module will retrieve the Imgur client ID
in the environment variable IMGUR_CLIENT_ID. It will invoke the setup_download_dir to
create the download destination directory. Finally, it will fetch a list of images using the
get_links function, filter out all GIF and album URLs, and then use download_link to
download and save each of those images to the disk. Here is what single.py looks like:
import logging
import os
from time import time

from download import setup_download_dir, get_links, download_link

def main():
    ts = time()
    client_id = os.getenv('IMGUR_CLIENT_ID')
    if not client_id:
        raise Exception("Couldn't find IMGUR_CLIENT_ID environment variable!")
    download_dir = setup_download_dir()
    links = get_links(client_id)
    for link in links:
        download_link(download_dir, link)
    logging.info('Took %s seconds', time() - ts)

if __name__ == '__main__':
    main()
This is almost the same as the previous one, with the exception that we now have a new class,
DownloadWorker, which is a descendent of the Python Thread class. The run method has
been overridden, which runs an infinite loop. On every iteration, it calls self.queue.get() to fetch a URL from a thread-safe queue. It blocks until there is an item in the queue for
the worker to process. Once the worker receives an item from the queue, it then calls the
same download_link method that was used in the previous script to download the image to
the images directory. After the download is finished, the worker signals the queue that the
task is done. This is very important, because the Queue keeps track of how many tasks were
enqueued. The call to queue.join() would block the main thread forever if the workers did not
signal that they completed a task.
import logging
import os
from queue import Queue
from threading import Thread
from time import time

from download import setup_download_dir, get_links, download_link

logger = logging.getLogger(__name__)

class DownloadWorker(Thread):
    def __init__(self, queue):
        Thread.__init__(self)
        self.queue = queue

    def run(self):
        while True:
            # Get the work from the queue and expand the tuple
            directory, link = self.queue.get()
            try:
                download_link(directory, link)
            finally:
                self.queue.task_done()
def main():
    ts = time()
    client_id = os.getenv('IMGUR_CLIENT_ID')
    if not client_id:
        raise Exception("Couldn't find IMGUR_CLIENT_ID environment variable!")
    download_dir = setup_download_dir()
    links = get_links(client_id)
    # Create a queue to communicate with the worker threads
    queue = Queue()
    # Create 8 worker threads
    for x in range(8):
        worker = DownloadWorker(queue)
        # Setting daemon to True will let the main thread exit even though the workers are blocking
        worker.daemon = True
        worker.start()
    # Put the tasks into the queue as a tuple
    for link in links:
        logger.info('Queueing {}'.format(link))
        queue.put((download_dir, link))
    # Causes the main thread to wait for the queue to finish processing all the tasks
    queue.join()
    logging.info('Took %s', time() - ts)

if __name__ == '__main__':
    main()
Running this Python threading example script on the same machine used earlier results in a
download time of 4.1 seconds! That’s 4.7 times faster than the previous example. While this
is much faster, it is worth mentioning that only one thread was executing at a time throughout
this process due to the GIL. Therefore, this code is concurrent but not parallel. The reason it
is still faster is because this is an IO bound task. The processor is hardly breaking a sweat
while downloading these images, and the majority of the time is spent waiting for the
network. This is why Python multithreading can provide a large speed increase. The
processor can switch between the threads whenever one of them is ready to do some work.
Using the threading module in Python or any other interpreted language with a GIL can
actually result in reduced performance. If your code is performing a CPU bound task, such as
decompressing gzip files, using the threading module will result in a slower execution time.
For CPU bound tasks and truly parallel execution, we can use the multiprocessing module.
While the de facto reference Python implementation, CPython, has a GIL, this is not true of
all Python implementations. For example, IronPython, a Python implementation using
the .NET framework, does not have a GIL, and neither does Jython, the Java-based
implementation. You can find a list of working Python implementations here.
import logging
import os
from functools import partial
from multiprocessing.pool import Pool
from time import time
if __name__ == '__main__':
    main()
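The body of this script's main function does not appear above; following the pattern of single.py, it would hand the links to a process pool with Pool.map. The pattern itself can be sketched self-contained (save_item is a stand-in for download_link):

```python
from functools import partial
from multiprocessing.pool import Pool

def save_item(directory, item):
    # Stand-in for download_link: a two-argument function whose first
    # argument is fixed ahead of time with functools.partial
    return f"{directory}/{item}"

def run():
    items = ["a.png", "b.png", "c.png"]
    worker = partial(save_item, "images")  # fix the directory argument
    with Pool(4) as p:                     # spread the items over 4 processes
        return p.map(worker, items)

if __name__ == '__main__':
    print(run())  # ['images/a.png', 'images/b.png', 'images/c.png']
```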
Concurrency and Parallelism in Python Example 3: Distributing to Multiple Workers
While the threading and multiprocessing modules are great for scripts that are running on
your personal computer, what should you do if you want the work to be done on a different
machine, or you need to scale up to more than the CPU on one machine can handle? A great
use case for this is long-running back-end tasks for web applications. If you have some long-
running tasks, you don’t want to spin up a bunch of sub-processes or threads on the same
machine that need to be running the rest of your application code. This will degrade the
performance of your application for all of your users. What would be great is to be able to run
these jobs on another machine, or many other machines.
A great Python library for this task is RQ, a very simple yet powerful library. You first
enqueue a function and its arguments using the library. This pickles the function call
representation, which is then appended to a Redis list. Enqueueing the job is the first step, but
will not do anything yet. We also need at least one worker to listen on that job queue.
Update
Python concurrent.futures
Something new since Python 3.2 that wasn’t touched upon in the original article is the
concurrent.futures package. This package provides yet another way to use concurrency and
parallelism with Python.
In the original article, I mentioned that Python’s multiprocessing module would be easier to
drop into existing code than the threading module. This was because the Python 3 threading
module required subclassing the Thread class and also creating a Queue for the threads to
monitor for work.
import logging
import os
from concurrent.futures import ThreadPoolExecutor
from functools import partial
from time import time
logger = logging.getLogger(__name__)
def main():
    client_id = os.getenv('IMGUR_CLIENT_ID')
    if not client_id:
        raise Exception("Couldn't find IMGUR_CLIENT_ID environment variable!")
    download_dir = setup_download_dir()
    links = get_links(client_id)
    # By placing the executor inside a with block, the executor's shutdown
    # method will be called, cleaning up threads.
    #
    # By default, the executor sets the number of workers to 5 times the
    # number of CPUs.
    with ThreadPoolExecutor() as executor:
        fn = partial(download_link, download_dir)
        executor.map(fn, links, timeout=30)

if __name__ == '__main__':
    main()
Now that we have all these images downloaded with our Python ThreadPoolExecutor, we can
use them to test a CPU-bound task. We can create thumbnail versions of all the images in
both a single-threaded, single-process script and then test a multiprocessing-based solution.
We are going to use the Pillow library to handle the resizing of the images.
import logging
from pathlib import Path
from time import time

from PIL import Image

logger = logging.getLogger(__name__)

def create_thumbnail(size, path):
    image = Image.open(path)
    image.thumbnail(size)
    path = Path(path)
    name = path.stem + '_thumbnail' + path.suffix
    image.save(path.with_name(name))

def main():
    ts = time()
    for image_path in Path('images').iterdir():
        create_thumbnail((128, 128), image_path)
    logging.info('Took %s', time() - ts)

if __name__ == '__main__':
    main()
This script iterates over the paths in the images folder and for each path it runs the
create_thumbnail function. This function uses Pillow to open the image, create a thumbnail,
and save the new, smaller image with the same name as the original but with _thumbnail
appended to the name.
Running this script on 160 images totaling 36 million takes 2.32 seconds. Let's see if we can speed this up using a ProcessPoolExecutor.
import logging
from pathlib import Path
from time import time
from functools import partial

from concurrent.futures import ProcessPoolExecutor

logger = logging.getLogger(__name__)

def main():
    ts = time()
    # Partially apply the create_thumbnail method, setting the size to 128x128
    # and returning a function of a single argument.
    thumbnail_128 = partial(create_thumbnail, (128, 128))
    # Create the executor in a with block so shutdown is called when the block
    # is exited.
    with ProcessPoolExecutor() as executor:
        executor.map(thumbnail_128, Path('images').iterdir())
    logging.info('Took %s', time() - ts)

if __name__ == '__main__':
    main()
The create_thumbnail method is identical to the last script. The main difference is the
creation of a ProcessPoolExecutor. The executor’s map method is used to create the
thumbnails in parallel. By default, the ProcessPoolExecutor creates one subprocess per CPU.
Running this script on the same 160 images took 1.05 seconds—2.2 times faster!
Let’s jump right into the code and a more detailed explanation will follow.
import asyncio
import logging
import os
from time import time

import aiohttp

if __name__ == '__main__':
    ts = time()
    # Create the asyncio event loop
    loop = asyncio.get_event_loop()
    try:
        loop.run_until_complete(main())
    finally:
        # Shutdown the loop even if there is an exception
        loop.close()
    logger.info('Took %s seconds to complete', time() - ts)
There is quite a bit to unpack here. Let’s start with the main entry point of the program. The
first new thing we do with the asyncio module is to obtain the event loop. The event loop
handles all of the asynchronous code. Then, the loop is run until complete and passed the
main function. There is a piece of new syntax in the definition of main: async def. You'll also notice await and async with.
The async/await syntax was introduced in PEP 492. The async def syntax marks a function as
a coroutine. Internally, coroutines are based on Python generators, but aren’t exactly the same
thing. Coroutines return a coroutine object similar to how generators return a generator
object. Once you have a coroutine, you obtain its results with the await expression. When a
coroutine calls await, execution of the coroutine is suspended until the awaitable completes.
This suspension allows other work to be completed while the coroutine is suspended
“awaiting” some result. In general, this result will be some kind of I/O like a database request
or in our case an HTTP request.
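The suspension behavior described above can be demonstrated with a self-contained sketch that uses asyncio.sleep in place of real HTTP requests (asyncio.run, available since Python 3.7, replaces the manual loop management shown earlier):

```python
import asyncio
import time

async def fetch(name, delay):
    # Awaiting suspends this coroutine so the others can run in the meantime
    await asyncio.sleep(delay)   # stands in for I/O such as an HTTP request
    return f"{name} done"

async def main():
    # gather() schedules all three coroutines concurrently and awaits the results
    return await asyncio.gather(
        fetch("a", 0.1),
        fetch("b", 0.1),
        fetch("c", 0.1),
    )

start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start
print(results)  # ['a done', 'b done', 'c done']
# elapsed is close to 0.1 s rather than 0.3 s because the waits overlap
```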
Hopefully the Python threading examples in this article—and update—will point you in the
right direction so you have an idea of where to look in the Python standard library if you need
to introduce concurrency into your programs.