You are on page 1of 80

PY

TRAINING MATERIALS FOR IT PROFESSIONALS

Python 3.8
Advanced
O
d
te
bi
C
i
oh
Pr
N
n
tio
bu
IO
tri
is
AT
D
or
n
tio
uc
U d
AL
ro
ep
R
ed
EV
riz

ho
ut
na
U
Advanced Python
EV

3.8
U
na

AL
ut
ho
riz

U
ed
R

AT
ep

with examples and


ro

hands-on exercises
d
uc

IO
tio
n
or

N
D
is
tri

C
bu
tio

O
WEBUCATOR
n
Pr

PY
oh
bii
te
d
Copyright © 2020 by Webucator. All rights reserved.

No part of this manual may be reproduced or used in any manner without written permission of the
EV
copyright owner.

Version: PYT238.1.1.1
U

Class Files
na

AL
ut

Download the class files used in this manual at


ho

https://www.webucator.com/class-files/index.cfm?versionId=4765.
riz

Errata
U
ed

Corrections to errors in the manual can be found at https://www.webucator.com/books/errata.cfm.


R

AT
ep
ro
d uc

IO
tio
n
or

N
D
is
tri

C
bu
tio

O
n
Pr

PY
oh
i bi
te
d
Table of Contents
LESSON 1. Advanced Python Concepts...........................................................................................1
Lambda Functions....................................................................................................................1
EV
Advanced List Comprehensions................................................................................................2
Exercise 1: Rolling Five Dice...........................................................................................8
Collections Module.................................................................................................................10
Exercise 2: Creating a defaultdict.................................................................................17
U

Counters.................................................................................................................................21
na

AL
Exercise 3: Creating a Counter.....................................................................................26
ut

Mapping and Filtering............................................................................................................28


ho

Mutable and Immutable Built-in Objects...............................................................................33


riz

Sorting....................................................................................................................................34
U
ed

Exercise 4: Converting list.sort() to sorted(iterable).....................................................38


Sorting Sequences of Sequences............................................................................................41
R

AT
Creating a Dictionary from Two Sequences............................................................................44
ep

Unpacking Sequences in Function Calls.................................................................................45


ro

Exercise 5: Converting a String to a datetime.date Object............................................47


Modules and Packages...........................................................................................................48
d uc

LESSON 2. Regular Expressions.....................................................................................................53


IO
tio

Regular Expression Tester.......................................................................................................53


Regular Expression Syntax......................................................................................................55
n

Python’s Handling of Regular Expressions..............................................................................60


or

N
Exercise 6: Green Glass Door.......................................................................................68
D
is
tri

C
bu
tio

O
n
Pr

PY
oh
i bi
te
d

Table of Contents | i
LESSON 3. Working with Data.....................................................................................................73
Virtual Environment...............................................................................................................73
Relational Databases..............................................................................................................74
Passing Parameters.................................................................................................................82
SQLite.....................................................................................................................................85
EV
Exercise 7: Querying a SQLite Database.......................................................................89
SQLite Database in Memory...................................................................................................90
Exercise 8: Inserting File Data into a Database.............................................................93
U

Drivers for Other Databases...................................................................................................97


na

CSV.........................................................................................................................................97
AL
Exercise 9: Finding Data in a CSV File.........................................................................102
ut

Creating a New CSV File........................................................................................................104


ho

Exercise 10: Creating a CSV with DictWriter...............................................................109


riz

Getting Data from the Web..................................................................................................110


U
ed

Exercise 11: HTML Scraping.......................................................................................120


XML.......................................................................................................................................124
R

AT
JSON.....................................................................................................................................125
ep

Exercise 12: JSON Home Runs....................................................................................131


ro

LESSON 4. Testing and Debugging.............................................................................................133


d

Testing for Performance.......................................................................................................133


uc

IO
Exercise 13: Comparing Times to Execute..................................................................143
tio

The unittest Module.............................................................................................................145


n

Exercise 14: Fixing Functions.....................................................................................153


or

Special unittest.TestCase Methods.......................................................................................154


N
D
is
tri

C
bu
tio

O
n
Pr

PY
oh
i bi
te
d

ii | Table of Contents
LESSON 5. Classes and Objects...................................................................................................157
Attributes..............................................................................................................................157
Behaviors..............................................................................................................................158
Classes vs. Objects................................................................................................................159
Attributes and Methods.......................................................................................................161
EV
Exercise 15: Adding a roll() Method to Die.................................................................170
Private Attributes.................................................................................................................173
Properties.............................................................................................................................175
U

Exercise 16: Properties..............................................................................................180


na

Objects that Track their Own History...................................................................................182


AL
Documenting Classes............................................................................................................184
ut

Exercise 17: Documenting the Die Class.....................................................................191


ho

Inheritance...........................................................................................................................192
riz

Exercise 18: Extending the Die Class..........................................................................195


U
ed

Extending a Class Method....................................................................................................198


Exercise 19: Extending the roll() Method...................................................................202
R

AT
Static Methods.....................................................................................................................204
ep

Class Attributes and Methods..............................................................................................206


ro

Abstract Classes and Methods.............................................................................................212


d

Understanding Decorators...................................................................................................219
uc

IO
tio
n
or

N
D
is
tri

C
bu
tio

O
n
Pr

PY
oh
i bi
te
d

Table of Contents | iii


PY
O
d
te
bi
C
i
oh
Pr
N
n
tio
bu
IO
tri
is
AT
D
or
n
tio
uc
U d
AL
ro
ep
R
ed
EV
riz
ho
ut
na
U
LESSON 1
Advanced Python Concepts
EV
Topics Covered

Lambda functions.
U
na

AL
Advanced list comprehensions.
ut

The collections module.


ho

Mapping and filtering.


riz

U
ed

Sorting sequences.
R

AT
Unpacking sequences in function calls.
ep

Modules and packages.


ro
d uc

IO
Packing the basket was not quite such pleasant work as unpacking the basket.
tio

It never is.
n
or

N
– The Wind in the Willows, Kenneth Grahame
D
is

Introduction
tri

C
bu

In this lesson, you will learn about some Python functionality and techniques that are commonly used
tio

but require a solid foundation in Python to understand.


O
n
Pr

PY
oh

Lambda Functions
i bi

Lambda functions are anonymous functions that are generally used to complete a small task, after
te

which they are no longer needed. The syntax for creating a lambda function is:
d

LESSON 1 | 1
lambda arguments: expression

Lambda functions are almost always used within other functions, but for demonstration purposes, we
could assign a lambda function to a variable, like this:
EV
f = lambda n: n**2
U

We could then call f like this:


na

AL
ut

f(5) # Returns 25
ho

f(2) # Returns 4
riz

U
Try it at the Python terminal:
ed
R

AT
>>> f = lambda n: n**2
ep

>>> f(5)
25
ro

>>> f(2)
d

4
uc

IO
tio

We will revisit lambda functions throughout this lesson.


n
or

N
Advanced List Comprehensions
D
is
tri

Quick Review of Basic List Comprehensions


C
bu

Before we get into advanced list comprehensions, let’s do a quick review. The basic syntax for a list
tio

O
comprehension is:
n
Pr

my_list = [f(x) for x in iterable if condition]


PY
oh
i bi

In the preceding code f(x) could be any of the following:


te
d

1. Just a variable name (e.g., x).


2. An operation (e.g., x**2).

2 | Advanced Python Concepts


3. A function call (e.g., len(x) or square(x)).

Here is an example from earlier, in which we create a list by filtering another list:

Demo 1: advanced-python-concepts/Demos/sublist_from_list.py
EV
1. def main():
2. words = ['Woodstock', 'Gary', 'Tucker', 'Gopher', 'Spike', 'Ed',
3. 'Faline', 'Willy', 'Rex', 'Rhino', 'Roo', 'Littlefoot',
4. 'Bagheera', 'Remy', 'Pongo', 'Kaa', 'Rudolph', 'Banzai',
U

5. 'Courage', 'Nemo', 'Nala', 'Alvin', 'Sebastian', 'Iago']


na

AL
6. three_letter_words = [w for w in words if len(w) == 3]
7. print(three_letter_words)
ut

8.
ho

9. main()
riz

U
ed

Code Explanation
R

AT
This will return:
ep
ro

['Rex', 'Roo', 'Kaa']


d uc

IO
tio

And here is a new example, in which we map all the elements in one list to another using a function:
n

Demo 2: advanced-python-concepts/Demos/list_comp_mapping.py
or

N
D

1. def get_inits(name):
is

2. # Create list from first letter of each name part


3. inits = [name_part[0] for name_part in name.split()]
tri

C
4. # Join inits list on "." and append "." to end
bu

5. return '.'.join(inits) + '.'


tio

6.
O
7. def main():
n

8. people = ['George Washington', 'John Adams',


Pr

9. 'Thomas Jefferson', 'John Quincy Adams']


PY
oh

10.
11. # Create list by mapping person elements to get_inits()
i bi

12. inits = [get_inits(person) for person in people]


13. print(inits)
te

14.
d

15. main()

LESSON 1 | 3
Code Explanation

This will return:

['G.W.', 'J.A.', 'T.J.', 'J.Q.A.']


EV
Now, on to the more advanced uses of list comprehension.
U
na

Multiple for Loops


AL
ut

Assume that you need to create a list of tuples showing the possible permutations of rolling two six-sided
ho

dice. When dealing with permutations, order matters, so (1, 2) and (2, 1) are not the same. First, let’s
riz

look at how we would do this without a list comprehension, using a nested for loop:
U
ed

Demo 3: advanced-python-concepts/Demos/dice_rolls.py
R

AT
1. def main():
ep

2. dice_rolls = []
ro

3. for a in range(1, 7):


4. for b in range(1, 7):
d uc

5. roll = (a, b)
IO
6. dice_rolls.append(roll)
tio

7.
n

8. print(dice_rolls)
9.
or

N
10. main()
D
is
tri

Code Explanation
C
bu

This will return:


tio

O
n
Pr

PY
oh
i bi
te
d

4 | Advanced Python Concepts


[(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6),
(2, 1), (2, 2), (2, 3), (2, 4), (2, 5), (2, 6),
(3, 1), (3, 2), (3, 3), (3, 4), (3, 5), (3, 6),
(4, 1), (4, 2), (4, 3), (4, 4), (4, 5), (4, 6),
(5, 1), (5, 2), (5, 3), (5, 4), (5, 5), (5, 6),
EV
(6, 1), (6, 2), (6, 3), (6, 4), (6, 5), (6, 6)]

List comprehensions can include multiple for loops with each subsequent loop nested within the
U

previous loop. This provides an easy way to create something similar to a two-dimensional array or a
na

AL
matrix:
ut
ho

Demo 4: advanced-python-concepts/Demos/dice_rolls_list_comp.py
riz

1. def main():
U
2. dice_rolls = [
ed

3. (a, b)
4. for a in range(1, 7)
R

AT
5. for b in range(1, 7)
ep

6. ]
ro

7.
d

8. print(dice_rolls)
uc

9.
IO
10. main()
tio
n
or

Code Explanation
N
D

This code will create the same list of tuples containing all the possible permutations of two dice rolls.
is
tri

C
bu

Notice that the list of permutations contains what game players would consider duplicates. For example,
(1, 2) and (2, 1) are considered the same in dice. We can remove these pseudo-duplicates by starting
tio

O
the second for loop with the current value of a in the first for loop. Let’s do this first without a list
n

comprehension:
Pr

PY
oh
bii
te
d

LESSON 1 | 5
Demo 5: advanced-python-concepts/Demos/dice_combos.py
1. def main():
2. dice_rolls = []
3. for a in range(1, 7):
4. for b in range(a, 7):
EV
5. roll = (a, b)
6. dice_rolls.append(roll)
7.
8. print(dice_rolls)
U

9.
na

10. main()
AL
ut
ho

Code Explanation
riz

U
The first time through the outer loop, the inner loop from 1 to 7 (not including 7), the second time
ed

through, it will loop from 2 to 7, then 3 to 7, and so on…


R

AT
ep

The dice_rolls list will now contain the different possible rolls (from a dice rolling point of view):
ro
d

[(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6),
uc

(2, 2), (2, 3), (2, 4), (2, 5), (2, 6),
IO
(3, 3), (3, 4), (3, 5), (3, 6),
tio

(4, 4), (4, 5), (4, 6),


n

(5, 5), (5, 6),


or

(6, 6)]
N
D
is

Where we previously showed permutations, in which order matters, we are now showing combinations,
tri

in which order does not matter. The following two tuples represent different permutations, but the
C
bu

same combination: (1, 2) and (2, 1).


tio

O
n

Now, let’s see how we can do the same thing with a list comprehension:
Pr

PY
oh
i bi
te
d

6 | Advanced Python Concepts


Demo 6: advanced-python-concepts/Demos/dice_combos_list_comp.py
1. def main():
2. dice_rolls = [
3. (a, b)
4. for a in range(1, 7)
EV
5. for b in range(a, 7)
6. ]
7.
8. print(dice_rolls)
U

9.
na

10. main()
AL
ut
ho

Code Explanation
riz

U
This code will create the same list of tuples containing all the possible combinations of two dice rolls.
ed
R

AT
ep
ro
d uc

IO
tio
n
or

N
D
is
tri

C
bu
tio

O
n
Pr

PY
oh
i bi
te
d

LESSON 1 | 7
Exercise 1: Rolling Five Dice
10 to 15 minutes
EV
There is no limit to the number of for loops in a list comprehension, so we can use this same technique
to get the possibilities for more than two dice.

1. Create a new file in advanced-python-concepts/Exercises named list_comprehen


U

sions.py.
na

AL
2. Write two separate list comprehensions:
ut
ho

A. The first should create five-item tuples for all unique permutations from rolling five
identical six-sided dice. Remember, when looking for permutations, order matters.
riz

U
B. The second should create five-item tuples for all unique combinations from rolling
ed

five identical six-sided dice. Remember, when looking for combinations, order doesn’t
matter.
R

AT
ep

3. Print the length of each list.


ro
d uc

IO
tio
n
or

N
D
is
tri

C
bu
tio

O
n
Pr

PY
oh
i bi
te
d

8 | Advanced Python Concepts


PY

LESSON 1 | 9
O
d
te
bi
C
i
oh
Pr
N
n
tio
bu
IO
tri
is
AT
D
or
n
tio
uc
U d
AL
ro
ep
R
ed
EV
riz
ho
ut
na
U
Solution: advanced-python-concepts/Solutions/list_comprehensions.py
1. # Get unique permutations:
2. dice_rolls_p = [(a, b, c, d, e)
3. for a in range(1, 7)
4. for b in range(1, 7)
EV
5. for c in range(1, 7)
6. for d in range(1, 7)
7. for e in range(1, 7)]
8.
U

9. print('Number of permutations:', len(dice_rolls_p))


na

10.
AL
11. # Get unique combinations:
ut

12. dice_rolls_c = [(a, b, c, d, e)


ho

13. for a in range(1,7 )


14. for b in range(a, 7)
riz

U
15. for c in range(b, 7)
ed

16. for d in range(c, 7)


17. for e in range(d, 7)]
R

AT
18.
ep

19. print('Number of combinations:', len(dice_rolls_c))


ro
d

Code Explanation
uc

IO
tio

This file will output:


n
or

Number of permutations: 7776


N
Number of combinations: 252
D
is
tri

C
bu

Collections Module
tio

O
n

The collections module includes specialized containers (objects that hold data) that provide more
Pr

specific functionality than Python’s built-in containers (list, tuple, dict, and set). Some of the
PY
oh

more useful containers are named tuples (created with the namedtuple() function), defaultdict,
i

and Counter.
bi
te
d

Named Tuples
Imagine you are creating a game in which you need to set and get the position of a target. You could
do this with a regular tuple like this:

10 | Advanced Python Concepts


# Set target position:
target_pos = (100, 200)

# Get x value of target position


target_pos[0] # 100
EV
But someone reading your code might not understand what target_pos[0] refers to.
U

A named tuple allows you to reference target_pos.x, which is more meaningful and helpful. Here
na

AL
is a simplified signature for creating namedtuple objects:
ut
ho

namedtuple(typename, field_names)
riz

U
1. typename – The value passed in for typename will be the name of a new tuple subclass. It is
ed

standard for the name of the new subclass to begin with a capital letter. We have not yet
R

AT
covered classes and subclasses yet. For now, it is enough to know that the new tuple subclass
ep

created by namedtuple() will inherit all the properties of a tuple, and also make it possible
ro

to refer to elements of the tuple by name.


d

2. field_names – The value for field_names can either be a whitespace-delimited string (e.g.,
uc

IO
'x y'), a comma-delimited string (e.g., 'x, y'), or a sequence of strings (e.g., ['x', 'y']).
tio
n

Demo 7: advanced-python-concepts/Demos/namedtuple.py
or

N
1. from collections import namedtuple
D

2.
3. Point = namedtuple('Point', 'x, y')
is

4.
tri

C
5. # Set target position:
bu

6. target_pos = Point(100, 200)


tio

7.
O
8. # Get x value of target position
n

9. print(target_pos.x)
Pr

PY
oh

Code Explanation
i bi
te

As the preceding code shows, the namedtuple() function allows you to give a name to the elements
d

at different positions in a tuple and then refer to them by that name.

LESSON 1 | 11
Default Dictionaries (defaultdict)
With regular dictionaries, trying to modify a key that doesn’t exist will cause an exception. For example,
the following code will result in a KeyError:
EV
foo = {}
foo['bar'] += 1
U

A defaultdict is like a regular dictionary except that, when you look up a key that doesn’t exist, it
na

AL
creates the key and assigns it the value returned by a function specified when creating it.
ut
ho

To illustrate how a defaultdict can be useful, let’s see how we would create a regular dictionary that
shows the number of different ways each number (2 through 12) can be rolled when rolling two dice,
riz

U
like this:
ed
R

{
AT
ep

2: 1,
3: 2,
ro

4: 3,
d

5: 4,
uc

IO
6: 5,
tio

7: 6,
8: 5,
n

9: 4,
or

N
10: 3,
D

11: 2,
is

12: 1
tri

}
C
bu

There is only one way to roll a 2: (1, 1).


tio

O
n

There are two ways to roll a 3: (1, 2) and (2, 1).


Pr

There are five ways to roll a 6: (1, 5), (2, 4), (3, 3), (4, 2), and (5, 1).
PY
oh

1. First, create the list of possibilities as we did earlier:


i bi
te

dice_rolls = [
d

(a, b)
for a in range(1, 7)
for b in range(1, 7)
]

12 | Advanced Python Concepts


2. Next, create an empty dictionary, roll_counts, and then loop through the dice_rolls list
checking for the existence of a key that is the sum of the dice roll. For example, on the first
iteration, we find (1, 1), which when added together, gives us 2. Since roll_counts does
not have a key 2, we need to add that key and set its value to 1. The same is true for when we
find (1, 2), which adds up to 3. But later when we find (2, 1), which also adds up to 3,
EV
we don’t need to recreate the key. Instead, we increment the existing key’s value by 1. The
code looks like this:
U

roll_counts = {}
na

for roll in dice_rolls:


AL
if sum(roll) in roll_counts:
ut

roll_counts[sum(roll)] += 1
ho

else:
riz

roll_counts[sum(roll)] = 1
U
ed

This method works fine and gives us the following roll_counts dictionary that we looked at earlier:
R

AT
ep

{
ro

2: 1,
d

3: 2,
uc

IO
4: 3,
tio

5: 4,
6: 5,
n

7: 6,
or

N
8: 5,
D

9: 4,
is

10: 3,
tri

11: 2,
C
bu

12: 1
}
tio

O
n

An alternative to using conditionals to make sure the key exists is to just go ahead and try to increment
Pr

the value of each potential key we find and then, if we get a KeyError, assign 1 for that key, like this:
PY
oh
i bi
te
d

LESSON 1 | 13
roll_counts = {}
for roll in dice_rolls:
try:
roll_counts[sum(roll)] += 1
except KeyError:
EV
roll_counts[sum(roll)] = 1

This also works and produced the same dictionary.


U
na

But with a defaultdict, we don’t need the if-else block or the try-except block. The code looks
AL
like this:
ut
ho

from collections import defaultdict


riz

U
ed

roll_counts = defaultdict(int)
for roll in dice_rolls:
R

AT
roll_counts[sum(roll)] += 1
ep
ro

The result is a defaultdict object that can be treated just like a normal dictionary:
d
uc

IO
defaultdict(<class 'int'>, {
tio

2: 1,
n

3: 2,
4: 3,
or

N
5: 4,
D

6: 5,
is

7: 6,
tri

8: 5,
C
bu

9: 4,
10: 3,
tio

O
11: 2,
n

12: 1
Pr

})
PY
oh

Here are the three methods again:


i bi
te
d

14 | Advanced Python Concepts


Demo 8: advanced-python-concepts/Demos/dict_if_else.py
-------Lines 1 through 4 Omitted-------
5. roll_counts = {}
6. for roll in dice_rolls:
7. if sum(roll) in roll_counts:
EV
8. roll_counts[sum(roll)] += 1
9. else:
10. roll_counts[sum(roll)] = 1
U
na

AL
Demo 9: advanced-python-concepts/Demos/dict_try_except.py
ut

-------Lines 1 through 4 Omitted-------


ho

5. roll_counts = {}
riz

6. for roll in dice_rolls:


U
7. try:
ed

8. roll_counts[sum(roll)] += 1
9. except KeyError:
R

AT
10. roll_counts[sum(roll)] = 1
ep
ro

Demo 10: advanced-python-concepts/Demos/defaultdict.py


d uc

IO
1. from collections import defaultdict
tio

-------Lines 2 through 6 Omitted-------


7. roll_counts = defaultdict(int)
n

8. for roll in dice_rolls:


or

N
9. roll_counts[sum(roll)] += 1
D
is
tri

Notice in the preceding code that we passed int to defaultdict():


C
bu
tio

roll_counts = defaultdict(int)
O
n
Pr

Remember, when you try to look up a key that doesn’t exist in a defaultdict, it creates the key and
PY
oh

assigns it the value returned by a function you specified when creating it. In this case, that function is
int().
i bi
te

When passing the function to defaultdict(), you do not include parentheses, because you are not
d

calling the function at the time you pass it to defaultdict(). Rather, you are specifying that you
want to use this function to give you default values for new keys. By passing int, we are stating that

LESSON 1 | 15
we want new keys to have a default value of whatever int() returns when no argument is passed to
it. That value is 0:

>>> int()
0
EV
You can create default dictionaries with any number of functions, both built-in and user-defined:
U

defaultdict with Built-in Functions


na

AL
a = defaultdict(list) # Default key value will be []
ut

b = defaultdict(str) # Default key value will be ''


ho
riz

defaultdict with lambda Function


U
ed

c = defaultdict(lambda: 5) # Default key value will be 5


c['a'] += 1 # c['a'] will contain 6
R

AT
ep

defaultdict with User-defined Function


ro

def foo():
d
uc

return 'bar'
IO
tio

d = defaultdict(foo) # Default key value will be 'bar'


n

d['a'] = d['a'].upper() # d['a'] will contain 'BAR'


or

N
D
is
tri

C
bu
tio

O
n
Pr

PY
oh
i bi
te
d

16 | Advanced Python Concepts


Exercise 2: Creating a defaultdict
15 to 20 minutes
EV
In this exercise, you will organize the 1927 New York Yankees by position by creating a default dictionary
that looks like this:

defaultdict(<class 'list'>,
U

{
na

AL
'OF': ['Earle Combs', 'Cedric Durst', 'Bob Meusel',
ut

'Ben Paschal', 'Babe Ruth'],


ho

'C': ['Benny Bengough', 'Pat Collins', 'Johnny Grabowski'],


'2B': ['Tony Lazzeri', 'Ray Morehart'],
riz

'SS': ['Mark Koenig'],


U
ed

'3B': ['Joe Dugan', 'Mike Gazella', 'Julie Wera'],


'P': ['Walter Beall', 'Joe Giard', 'Waite Hoyt',
R

AT
'Wilcy Moore', 'Herb Pennock', 'George Pipgras',
ep

'Dutch Ruether', 'Bob Shawkey', 'Urban Shocker',


'Myles Thomas'],
ro

'1B': ['Lou Gehrig']


d

})
uc

IO
tio

You will start with this list of dictionaries:


n
or

N
D
is
tri

C
bu
tio

O
n
Pr

PY
oh
i bi
te
d

LESSON 1 | 17
yankees_1927 = [
{'position': 'P', 'name': 'Walter Beall'},
{'position': 'C', 'name': 'Benny Bengough'},
{'position': 'C', 'name': 'Pat Collins'},
{'position': 'OF', 'name': 'Earle Combs'},
EV
{'position': '3B', 'name': 'Joe Dugan'},
{'position': 'OF', 'name': 'Cedric Durst'},
{'position': '3B', 'name': 'Mike Gazella'},
{'position': '1B', 'name': 'Lou Gehrig'},
U

{'position': 'P', 'name': 'Joe Giard'},


na

AL
{'position': 'C', 'name': 'Johnny Grabowski'},
ut

{'position': 'P', 'name': 'Waite Hoyt'},


{'position': 'SS', 'name': 'Mark Koenig'},
ho

{'position': '2B', 'name': 'Tony Lazzeri'},


riz

{'position': 'OF', 'name': 'Bob Meusel'},


U
ed

{'position': 'P', 'name': 'Wilcy Moore'},


{'position': '2B', 'name': 'Ray Morehart'},
R

{'position': 'OF', 'name': 'Ben Paschal'},


AT
ep

{'position': 'P', 'name': 'Herb Pennock'},


{'position': 'P', 'name': 'George Pipgras'},
ro

{'position': 'P', 'name': 'Dutch Ruether'},


d

{'position': 'OF', 'name': 'Babe Ruth'},


uc

IO
{'position': 'P', 'name': 'Bob Shawkey'},
tio

{'position': 'P', 'name': 'Urban Shocker'},


{'position': 'P', 'name': 'Myles Thomas'},
n

{'position': '3B', 'name': 'Julie Wera'}


or

N
]
D
is

1. Open advanced-python-concepts/Exercises/defaultdict.py in your editor.


tri

C
bu

2. Write code so that the script creates the defaultdict above from the given list.
tio

3. Output the pitchers stored in your new defaultdict.


O
n
Pr

PY
oh
i bi
te
d

18 | Advanced Python Concepts


PY

LESSON 1 | 19
O
d
te
bi
C
i
oh
Pr
N
n
tio
bu
IO
tri
is
AT
D
or
n
tio
uc
U d
AL
ro
ep
R
ed
EV
riz
ho
ut
na
U
Solution: advanced-python-concepts/Solutions/defaultdict.py
1. from collections import defaultdict
2.
3. yankees_1927 = [
4. {'position': 'P', 'name': 'Walter Beall'},
EV
5. {'position': 'C', 'name': 'Benny Bengough'},
6. {'position': 'C', 'name': 'Pat Collins'},
7. {'position': 'OF', 'name': 'Earle Combs'},
8. {'position': '3B', 'name': 'Joe Dugan'},
U

9. {'position': 'OF', 'name': 'Cedric Durst'},


na

10. {'position': '3B', 'name': 'Mike Gazella'},


AL
11. {'position': '1B', 'name': 'Lou Gehrig'},
ut

12. {'position': 'P', 'name': 'Joe Giard'},


ho

13. {'position': 'C', 'name': 'Johnny Grabowski'},


14. {'position': 'P', 'name': 'Waite Hoyt'},
riz

U
15. {'position': 'SS', 'name': 'Mark Koenig'},
ed

16. {'position': '2B', 'name': 'Tony Lazzeri'},


17. {'position': 'OF', 'name': 'Bob Meusel'},
R

AT
18. {'position': 'P', 'name': 'Wilcy Moore'},
ep

19. {'position': '2B', 'name': 'Ray Morehart'},


20. {'position': 'OF', 'name': 'Ben Paschal'},
ro

21. {'position': 'P', 'name': 'Herb Pennock'},


d

22. {'position': 'P', 'name': 'George Pipgras'},


uc

IO
23. {'position': 'P', 'name': 'Dutch Ruether'},
tio

24. {'position': 'OF', 'name': 'Babe Ruth'},


25. {'position': 'P', 'name': 'Bob Shawkey'},
n

26. {'position': 'P', 'name': 'Urban Shocker'},


or

27. {'position': 'P', 'name': 'Myles Thomas'},


N
28. {'position': '3B', 'name': 'Julie Wera'}
D

29. ]
is

30.
tri

31. # Each value will be a list of players, so we pass list to defaultdict


C
bu

32. positions = defaultdict(list)


33.
tio

O
34. # Loop through list of yankees appending player names to their position keys
n

35. for player in yankees_1927:


Pr

36. positions[player['position']].append(player['name'])
PY
37.
oh

38. print(positions['P'])
bii
te

Code Explanation
d

This will output the list of pitchers:

20 | Advanced Python Concepts


[
'Walter Beall',
'Joe Giard',
'Waite Hoyt',
'Wilcy Moore',
EV
'Herb Pennock',
'George Pipgras',
'Dutch Ruether',
'Bob Shawkey',
U

'Urban Shocker',
na

AL
'Myles Thomas'
ut

]
ho

Add the following line of code to the for loop to watch as the players get added:
riz

U
ed

print(player['position'], positions[player['position']])
R

AT
ep

The beginning of the output will look like this:


ro
d

PS …\advanced-python-concepts\Solutions> python defaultdict.py


uc

IO
P ['Walter Beall']
tio

C ['Benny Bengough']
C ['Benny Bengough', 'Pat Collins']
n

OF ['Earle Combs']
or

N
3B ['Joe Dugan']
D

OF ['Earle Combs', 'Cedric Durst']


is

3B ['Joe Dugan', 'Mike Gazella']


tri

1B ['Lou Gehrig']
C
bu

P ['Walter Beall', 'Joe Giard']


C ['Benny Bengough', 'Pat Collins', 'Johnny Grabowski']
tio


O
n
Pr

PY
oh

Counters
i bi
te

Consider again the defaultdict object we created to get the number of different ways each number
d

could be rolled when rolling two dice. This type of task is very common. You might have a collection
of plants and want to get a count of the number of each species or the number of plants by color. The

LESSON 1 | 21
objects that hold these counts are called counters, and the collections module includes a special
Counter() class for creating them.

Although there are different ways of creating counters, they are most often created with an iterable,
like this:
EV
from collections import Counter
c = Counter(['green', 'blue', 'blue', 'red', 'yellow', 'green', 'blue'])
U
na

AL
This will create the following counter:
ut
ho

Counter({
'blue': 3,
riz

U
'green': 2,
ed

'red': 1,
'yellow': 1
R

AT
})
ep
ro

To create a counter from the dice_rolls list we used earlier, we need to first create a list of sums
d

from it, like this:


uc

IO
tio

roll_sums = [sum(roll) for roll in dice_rolls]


n
or

N
roll_sums will contain the following list:
D
is

[
tri

C
2, 3, 4, 5, 6, 7,
bu

3, 4, 5, 6, 7, 8,
tio

4, 5, 6, 7, 8, 9,
O
5, 6, 7, 8, 9, 10,
n

6, 7, 8, 9, 10, 11,
Pr

7, 8, 9, 10, 11, 12
PY
oh

]
i bi

We then create the counter like this:


te
d

c = Counter(roll_sums)

That creates a counter that is very similar to the defaultdict we saw earlier:

22 | Advanced Python Concepts


Counter({
7: 6,
6: 5,
8: 5,
5: 4,
EV
9: 4,
4: 3,
10: 3,
3: 2,
U

11: 2,
na

AL
2: 1,
ut

12: 1
})
ho
riz

U
The code in the following file creates and outputs the colors and dice rolls counters:
ed

Demo 11: advanced-python-concepts/Demos/counter.py


R

AT
ep

1. from collections import Counter


2. c = Counter(['green','blue','blue','red','yellow','green','blue'])
ro

3. print('Colors Counter:', c, sep='\n', end='\n\n')


d

4.
uc

IO
5. dice_rolls = [(a,b)
tio

6. for a in range(1,7)
7. for b in range(1,7)]
n

8.
or

9. roll_sums = [sum(roll) for roll in dice_rolls]


N
10. c = Counter(roll_sums)
D

11. print('Dice Roll Counter:', c, sep='\n')


is
tri

C
bu

Run this file. It will output:


tio

O
n

Colors Counter:
Pr

Counter({'blue': 3, 'green': 2, 'red': 1, 'yellow': 1})


PY
oh

Dice Roll Counter:


i bi

Counter({7: 6, 6: 5, 8: 5, 5: 4, 9: 4, 4: 3, 10: 3, 3: 2, 11: 2, 2: 1, 12: 1})


te
d

LESSON 1 | 23
Updating Counters

Counter is a subclass of dict. We will learn more about subclasses later, but for now all you need to
understand is that a subclass generally has access to all of its superclass’s methods and data. So, Counter
supports all the standard dict instance methods. The update() method behaves differently though.
EV
In standard dict objects, update() replaces key values with those of the passed-in dictionary:

Updating with a Dictionary


U

grades = {'English':97, 'Math':93, 'Art':74, 'Music':86}


na

AL
grades.update({'Math':97, 'Gym':93})
ut
ho

The grades dictionary will now contain:


riz

U
ed

{
'English': 97,
R

AT
'Math': 97, # 97 replaces 93
ep

'Art': 74,
'Music': 86,
ro

'Gym': 93 # Key is added with value of 93


d

}
uc

IO
tio

In Counter objects, update() adds the values of a passed-in iterable or another Counter object to its
n

own values:
or

N
D

Updating a Counter with a List


is

>>> c = Counter(['green', 'blue', 'blue', 'red', 'yellow', 'green', 'blue'])


tri

C
>>> c # Before update:
bu

Counter({'blue': 3, 'green': 2, 'red': 1, 'yellow': 1})


tio

>>> c.update(['red', 'yellow', 'yellow', 'purple'])


O
>>> c # After update:
n

Counter({'blue': 3, 'yellow': 3, 'green': 2, 'red': 2, 'purple': 1})


Pr

PY
oh
i bi
te
d

24 | Advanced Python Concepts


Updating a Counter with a Counter
>>> c = Counter(['green', 'blue', 'blue', 'red', 'yellow', 'green', 'blue'])
>>> d = Counter(['green', 'violet'])
>>> c.update(d)
>>> c
EV
Counter({'green': 3, 'blue': 3, 'red': 1, 'yellow': 1, 'violet': 1})

Counters also have a corresponding subtract() method. It works just like update() but subtracts
U

rather than adds the passed-in iterable counts:


na

AL
ut

Subtracting with a Counter


ho

>>> c = Counter(['green', 'blue', 'blue', 'red', 'yellow', 'green', 'blue'])


riz

>>> c # Before subtraction:


U
Counter({'blue': 3, 'green': 2, 'red': 1, 'yellow': 1})
ed

>>> c.subtract(['red', 'yellow', 'yellow', 'purple'])


>>> c # After subtraction:
R

AT
Counter({'blue': 3, 'green': 2, 'red': 0, 'yellow': -1, 'purple': -1})
ep
ro

Notice that the value for the 'yellow' and 'purple' keys are negative, which is a little odd. We will learn
d

how to create a non-negative counter in a later lesson. (see page 200)


uc

IO
tio

The most_common([n]) Method


n
or

N
Counters include a most_common([n]) method that returns the n most common elements and their
D

counts, sorted from most to least common. If n is not passed in, all elements are returned.
is
tri

C
>>> c = Counter(['green', 'blue', 'blue', 'red', 'yellow', 'green', 'blue'])
bu

>>> c.most_common()
tio

[('blue', 3), ('green', 2), ('red', 1), ('yellow', 1)]


O
>>> c.most_common(2)
n

[('blue', 3), ('green', 2)]


Pr

PY
oh
i bi
te
d

LESSON 1 | 25
Exercise 3: Creating a Counter
10 to 15 minutes
EV
In this exercise, you will create a counter that holds the most common words used and the number of
times they show up in the U.S. Declaration of Independence.

1. Create a new Python script in advanced-python-concepts/Exercises named counter.py.


U

2. Write code that:


na

AL
A. Reads the Declaration_of_Independence.txt file in the same folder.
ut
ho

B. Creates a list of all the words that have at least six characters.
riz

Use split() to split the text into words. This will split the text on
U
ed

whitespace. This isn’t quite correct as it doesn’t account for punctuation,


but for now, it’s good enough.
R

AT
Use upper() to convert the words to uppercase.
ep
ro

C. Creates a Counter from the word list.


d

D. Outputs the most common ten words and their counts. The result should look like
uc

IO
this:
tio
n

[
('PEOPLE', 13),
or

N
('STATES', 7),
D

('SHOULD', 5),
is

('INDEPENDENT', 5),
tri

('AGAINST', 5),
C
bu

('GOVERNMENT,', 4),
('ASSENT', 4),
tio

('OTHERS', 4),
O
n

('POLITICAL', 3),
Pr

('POWERS', 3)
]
PY
oh
i bi
te
d

26 | Advanced Python Concepts


PY

LESSON 1 | 27
O
d
te
bi
C
i
oh
Pr
N
n
tio
bu
IO
tri
is
AT
D
or
n
tio
uc
U d
AL
ro
ep
R
ed
EV
riz
ho
ut
na
U
Solution: advanced-python-concepts/Solutions/counter.py
1. from collections import Counter
2.
3. with open('Declaration_of_Independence.txt') as f:
4. doi = f.read()
EV
5.
6. word_list = [word for word in doi.upper().split() if len(word) > 5]
7.
8. c = Counter(word_list)
U

9. print(c.most_common(10))
na

AL
ut
ho

Mapping and Filtering


riz

U
ed

map(function, iterable, …)
R

AT
The built-in map() function is used to sequentially pass all the values of an iterable (or multiple iterables)
ep

to a function and return an iterator containing the returned values. It can be used as an alternative to
ro

list comprehensions. First, consider the following code sample that does not use map():
d uc

IO
Demo 12: advanced-python-concepts/Demos/without_map.py
tio

1. def multiply(x, y):


n

2. return x * y
or

3.
N
4. def main():
D

5. nums1 = range(0, 10)


is

6. nums2 = range(10, 0, -1)


tri

7.
C
bu

8. multiples = []
9. for i in range(len(nums1)):
tio

10. multiple = multiply(nums1[i], nums2[i])


O
n

11. multiples.append(multiple)
Pr

12.
13. for multiple in multiples:
PY
oh

14. print(multiple)
i

15.
bi

16. main()
te
d

28 | Advanced Python Concepts


Code Explanation

This code creates an iterator (a list) by multiplying 0 by 10, 1 by 9, 2 by 8,… 9 by 1, and 10 by 0. It


then loops through the iterator printing each result. It will output:
EV
0
9
16
21
U

24
na

AL
25
ut

24
ho

21
16
riz

9
U
ed
R

AT
The following code sample does the same thing using map():
ep

Demo 13: advanced-python-concepts/Demos/with_map.py


ro
d

1. def multiply(x, y):


uc

IO
2. return x * y
tio

3.
4. def main():
n

5. nums1 = range(0, 10)


or

6. nums2 = range(10, 0, -1)


N
7.
D

8. multiples = map(multiply, nums1, nums2)


is

9.
tri

10. for multiple in multiples:


C
bu

11. print(multiple)
12.
tio

13. main()
O
n
Pr

PY
Code Explanation
oh
i bi

We could also include the map() function right in the for loop:
te
d

LESSON 1 | 29
for multiple in map(multiply, nums1, nums2):
print(multiple)

Note that you can accomplish the same thing with a list comprehension:
EV
multiples = [multiply(nums1[i], nums2[i]) for i in range(len(nums1))]
U

One possible advantage of using map() in combination with multiple sequences is that it will not error
na

AL
if the sequences are different lengths. It will stop mapping when it reaches the end of the shortest
ut

sequence. In some cases, this might also be a disadvantage as it might hide a bug in the code. Also, this
ho

“feature” can be reproduced easily enough using the min() function:


riz

U
ed

[multiply(nums1[i], nums2[i]) for i in range(min(len(nums1), len(nums2)))]


R

AT
ep

filter(function, iterable)
ro

The built-in filter() function is used to sequentially pass all the values of a single iterable to a
duc

function and return an iterator containing the values for which the function returns True. As with
IO
map(), filter() can be used as an alternative to list comprehensions. First, consider the following
tio

code sample that does not use filter():


n
or

N
Demo 14: advanced-python-concepts/Demos/without_filter.py
D

1. def is_odd(num):
is

2. return num % 2
tri

C
3.
bu

4. def main():
tio

5. nums = range(0, 10)


O
6.
n

7. odd_nums = []
Pr

8. for num in nums:


PY
9. if is_odd(num):
oh

10. odd_nums.append(num)
i

11.
bi

12. for num in odd_nums:


te

13. print(num)
d

14.
15. main()

30 | Advanced Python Concepts


Code Explanation

This code passes a range of numbers one by one to the is_odd() function to create an iterator (a list)
of odd numbers. It then loops through the iterator printing each result. It will output:
EV
1
3
5
7
U

9
na

AL
ut
ho

The following code sample does the same thing using filter():
riz

Demo 15: advanced-python-concepts/Demos/with_filter.py


U
ed

1. def is_odd(num):
R

2. return num % 2
AT
3.
ep

4. def main():
ro

5. nums = range(0, 10)


d

6.
uc

7. odd_nums = filter(is_odd, nums)


IO
8.
tio

9. for num in odd_nums:


n

10. print(num)
or

11.
N
12. main()
D
is
tri

Code Explanation
C
bu

As with map(), we can include the filter() function right in the for loop:
tio

O
n

for num in filter(is_odd, nums):


Pr

print(num)
PY
oh
i bi

Again, you can accomplish the same thing with a list comprehension:
te
d

odd_nums = [num for num in nums if is_odd(num)]

LESSON 1 | 31
Using Lambda Functions with map() and filter()
The map() and filter() functions are both often used with lambda functions, like this:

>>> nums1 = range(0, 10)


EV
>>> nums2 = range(10, 0, -1)
>>> for multiple in map(lambda n: nums1[n] * nums2[n], range(10)):
... print(multiple)
U

...
na

0
AL
9
ut

16
ho

21
24
riz

U
25
ed

24
21
R

AT
16
ep

9
ro

>>> for num in filter(lambda n: n % 2 == 1, range(10)):


d uc

... print(num)
IO
...
tio

1
n

3
or

5
N
7
D

9
is
tri

C
bu

Let's just *keep* lambda


tio

O
Some programmers, including Guido van Rossum, the creator of Python, dislike lambda,
n

filter() and map(). These programmers feel that list comprehension can generally be used
Pr

instead. However, other programmers love these functions and Guido eventually gave up the
PY
oh

fight to remove them from Python. In February, 2006, he wrote:1


bii

You’ll have to decide for yourself whether or not to use them.


te
d

1. https://mail.python.org/pipermail/python-dev/2006-February/060415.html

32 | Advanced Python Concepts


Mutable and Immutable Built-in Objects
The difference between mutable objects, such as lists and dictionaries, and immutable objects, such as
strings, integers, and tuples, may seem pretty straightforward: mutable objects can be changed; immutable
objects cannot. But it helps to have a deeper understanding of how this can affect your code.
EV
Consider the following code:
U

>>> v1 = 'A'
na

>>> v2 = 'A'
AL
>>> v1 is v2
ut

True
ho

>>> list1 = ['A']


>>> list2 = ['A']
riz

U
>>> list1 is list2
ed

False
R

AT
ep

Immutable objects cannot be modified in place. Every time you “change” a string, you are actually
creating a new string:
ro
d uc

>>> name = 'Nat'


IO
>>> id(name)
tio

2613698710320
n

>>> name += 'haniel'


or

>>> id(name)
N
2613698710512
D
is
tri

Notice the ids are different. It is impossible to modify an immutable object in place.
C
bu

Lists, on the other hand, are mutable and can be modified in place. For example:
tio

O
n
Pr

PY
oh
i bi
te
d

LESSON 1 | 33
>>> v1 = [1, 2]
>>> v2 = v1
>>> id(v1) == id(v2)
True # Both variables point to the same list
>>> v1, v2
EV
([1, 2], [1, 2])
>>> v2 += [3]
>>> v1, v2
([1, 2, 3], [1, 2, 3])
U

>>> id(v1) == id(v2)


na

AL
True # Both variables still point to the same list
ut
ho

Notice that with lists, v2 changes when we change v1. Both are pointing at the same list object, which
riz

is mutable. So, when we modify the v2 list, we see the change in v1, because it points to the same
U
object.
ed

Be careful though. If you use the assignment operator, you will overwrite the old list and create a new
R

AT
ep

list object:
ro

>>> v1 = [1, 2]
d
uc

>>> v2 = v1
IO
>>> v1, v2
tio

([1, 2], [1, 2])


n

>>> v1 = v1 + [3]
or

>>> v1, v2
N
([1, 2, 3], [1, 2])
D
is
tri

Assigning v1 explicitly with the assignment operator, rather than appending a value via v1.append(3),
C
bu

results in a new object.


tio

O
Sorting
n
Pr

PY
oh

Sorting Lists in Place


i bi

Python lists have a sort() method that sorts the list in place:
te
d

34 | Advanced Python Concepts


colors = ['red', 'blue', 'green', 'orange']
colors.sort()

The colors list will now contain:


EV
['blue', 'green', 'orange', 'red']
U

The sort() method can take two keyword arguments: key and reverse.
na

AL
ut

reverse
ho

The reverse argument is a boolean:


riz

U
ed

colors = ['red', 'blue', 'green', 'orange']


R

colors.sort(reverse=True)
AT
ep
ro

The colors list will now contain:


d uc

IO
['red', 'orange', 'green', 'blue']
tio
n

key
or

N
D

The key argument takes a function to be called on each list item and performs the sort based on the
is

result. For example, the following code will sort by word length:
tri

C
bu

colors = ['red', 'blue', 'green', 'orange']


tio

colors.sort(key=len)
O
n
Pr

The colors list will now contain:


PY
oh
i

['red', 'blue', 'green', 'orange']


bi
te
d

And the following code will sort by last name:

LESSON 1 | 35
def get_lastname(name):
return name.split()[-1]

people = ['George Washington', 'John Adams',


'Thomas Jefferson', 'John Quincy Adams']
EV
people.sort(key=get_lastname)

The people list will now contain:


U
na

AL
[
ut

'John Adams',
ho

'John Quincy Adams',


'Thomas Jefferson',
riz

'George Washington'
U
ed

]
R

AT
Note that John Quincy Adams shows up after John Adams in the result only because he shows up after
ep

him in the initial list. Our code as it stands does not take into account middle or first names.
ro
d

Using Lambda Functions with key


uc

IO
tio

If you don’t want to create a new named function just to perform the sort, you can use a lambda
n

function. For example, the following code would do the same thing as the code above without the need
or

for the get_lastname() function:


N
D
is

people = ['George Washington', 'John Adams',,


tri

'Thomas Jefferson', 'John Quincy Adams']


C
people.sort(key=lambda name: name.split()[-1])
bu
tio

O
n

Combining key and reverse


Pr

The key and reverse arguments can be combined. For example, the following code will sort by word
PY
oh

length in descending order:


i bi
te

colors = ['red', 'blue', 'green', 'orange']


d

colors.sort(key=len, reverse=True)

The colors list will now contain:

36 | Advanced Python Concepts


['orange', 'green', 'blue', 'red']

The sorted() Function


EV
The built-in sorted() function requires an iterable as its first argument and can take key and reverse
as keyword arguments. It works just like the list’s sort() method except that:
U

1. It does not modify the iterable in place. Rather, it returns a new sorted list.
na

AL
2. It can take any iterable, not just a list (but it always returns a list).
ut
ho
riz

U
ed
R

AT
ep
ro
d uc

IO
tio
n
or

N
D
is
tri

C
bu
tio

O
n
Pr

PY
oh
i bi
te
d

LESSON 1 | 37
Exercise 4: Converting list.sort() to sorted(iterable)
15 to 25 minutes
EV
In this exercise, you will convert all the examples of sort() we saw earlier to use sorted() instead.

1. Open advanced-python-concepts/Exercises/sorting.py in your editor.


U

2. The code in the first example has already been converted to use sorted().
na

AL
3. Convert all other code examples in the script.
ut
ho

Exercise Code: advanced-python-concepts/Exercises/sorting.py


riz

1. # Simple sort() method


U
2. colors = ['red', 'blue', 'green', 'orange']
ed

3. # colors.sort()
4. new_colors = sorted(colors) # This one has been done for you
R

AT
5. print(new_colors)
ep

6.
ro

7. # The reverse argument:


8. colors.sort(reverse=True)
d uc

9. print(colors)
IO
10.
tio

11. # The key argument:


12. colors.sort(key=len)
n

13. print(colors)
or

N
14.
D

15. # The key argument with named function:


is

16. def get_lastname(name):


tri

17. return name.split()[-1]


C
18.
bu

19. people = ['George Washington', 'John Adams',


tio

20. 'Thomas Jefferson', 'John Quincy Adams']


O
21. people.sort(key=get_lastname)
n

22. print(people)
Pr

23.
PY
oh

24. # The key argument with lambda:


25. people.sort(key=lambda name: name.split()[-1])
i bi

26. print(people)
te

27.
28. # Combing key and reverse
d

29. colors.sort(key=len, reverse=True)


30. print(colors)

38 | Advanced Python Concepts


PY

LESSON 1 | 39
O
d
te
bi
C
i
oh
Pr
N
n
tio
bu
IO
tri
is
AT
D
or
n
tio
uc
U d
AL
ro
ep
R
ed
EV
riz
ho
ut
na
U
Solution: advanced-python-concepts/Solutions/sorting.py
1. # Simple sort() method
2. colors = ['red', 'blue', 'green', 'orange']
3. # colors.sort()
4. new_colors = sorted(colors) # This one has been done for you
EV
5. print(new_colors)
6.
7. # The reverse argument:
8. # colors.sort(reverse=True)
U

9. # print(colors)
na

10. new_colors = sorted(colors, reverse=True)


AL
11. print(new_colors)
ut

12.
ho

13. # The key argument:


14. # colors.sort(key=len)
riz

U
15. # print(colors)
ed

16. new_colors = sorted(colors, key=len)


17. print(new_colors)
R

AT
18.
ep

19. # The key argument with named function:


20. def get_lastname(name):
ro

21. return name.split()[-1]


d

22.
uc

IO
23. people = ['George Washington', 'John Adams',
tio

24. 'Thomas Jefferson', 'John Quincy Adams']


25. # people.sort(key=get_lastname)
n

26. # print(people)
or

27. new_people = sorted(people, key=get_lastname)


N
28. print(new_people)
D

29.
is

30. # The key argument with lambda function:


tri

31. people = ['George Washington', 'John Adams',


C
bu

32. 'Thomas Jefferson', 'John Quincy Adams']


33. # people.sort(key=lambda name: name.split()[-1])
tio

O
34. # print(people)
n

35. new_people = sorted(people, key=lambda name: name.split()[-1])


Pr

36. print(new_people)
PY
37.
oh

38. # Combining key and reverse


i

39. # colors.sort(key=len, reverse=True)


bi

40. # print(colors)
te

41. new_colors = sorted(colors, key=len, reverse=True)


d

42. print(new_colors)

40 | Advanced Python Concepts


Sorting Sequences of Sequences
When you sort a sequence of sequences, Python first sorts by the first element of each sequence, then
by the second element, and so on. For example:
EV
ww2_leaders = [
('Charles', 'de Gaulle'),
('Winston', 'Churchill'),
U

('Teddy', 'Roosevelt'), # Not a WW2 leader, but helps make point


na

('Franklin', 'Roosevelt'),
AL
('Joseph', 'Stalin'),
ut

('Adolph', 'Hitler'),
ho

('Benito', 'Mussolini'),
('Hideki', 'Tojo')
riz

U
]
ed

ww2_leaders.sort()
R

AT
ep

The ww2_leaders list will be sorted by first name and then by last name. It will now contain:
ro
d uc

[
IO
('Adolph', 'Hitler'),
tio

('Benito', 'Mussolini'),
n

('Charles', 'de Gaulle'),


or

('Franklin', 'Roosevelt'),
N
('Hideki', 'Tojo'),
D

('Joseph', 'Stalin'),
is

('Teddy', 'Roosevelt'),
tri

C
('Winston', 'Churchill')
bu

]
tio

O
n

To change the order of the sort, use a lambda function:


Pr

PY
oh

ww2_leaders.sort(key=lambda leader: (leader[1], leader[0]))


i bi

The ww2_leaders list will now be sorted by last name and then by first name. It will now contain:
te
d

LESSON 1 | 41
[
('Winston', 'Churchill'),
('Adolph', 'Hitler'),
('Benito', 'Mussolini'),
('Franklin', 'Roosevelt'),
EV
('Teddy', 'Roosevelt'),
('Joseph', 'Stalin'),
('Hideki', 'Tojo'),
('Charles', 'de Gaulle')
U

]
na

AL
ut

It may seem strange that “de Gaulle” comes after “Tojo,” but that is correct. Lowercase letters come
ho

after uppercase letters in sorting. To change the result, you can use the lower() function:
riz

U
ed

ww2_leaders.sort(key=lambda leader: (leader[1].lower(), leader[0]))


R

AT
ww2_leaders will now contain:
ep
ro

[
d uc

('Winston', 'Churchill'),
IO
('Charles', 'de Gaulle'),
tio

('Adolph', 'Hitler'),
n

('Benito', 'Mussolini'),
or

('Franklin', 'Roosevelt'),
N
('Teddy', 'Roosevelt'),
D

('Joseph', 'Stalin'),
is

('Hideki', 'Tojo')
tri

]
C
bu
tio

The preceding code can also be found in advanced-python-concepts/Demos/sequence_of_se


O
n

quences.py.
Pr

PY
oh

Sorting Sequences of Dictionaries


i bi

You may often find data stored as lists of dictionaries:


te
d

42 | Advanced Python Concepts


from datetime import date
ww2_leaders = []
ww2_leaders.append(
{'fname':'Winston', 'lname':'Churchill', 'dob':date(1889,4,20)}
)
EV
ww2_leaders.append(
{'fname':'Charles', 'lname':'de Gaulle', 'dob':date(1883,7,29)}
)
ww2_leaders.append(
U

{'fname':'Adolph', 'lname':'Hitler', 'dob':date(1890,11,22)}


na

AL
)
ut

ww2_leaders.append(
{'fname':'Benito', 'lname':'Mussolini', 'dob':date(1882,1,30)}
ho

)
riz

ww2_leaders.append(
U
ed

{'fname':'Franklin', 'lname':'Roosevelt', 'dob':date(1884,12,30)}


)
R

ww2_leaders.append(
AT
ep

{'fname':'Joseph', 'lname':'Stalin', 'dob':date(1878,12,18)}


)
ro

ww2_leaders.append(
d

{'fname':'Hideki', 'lname':'Tojo', 'dob':date(1874,11,30)}


uc

IO
)
tio
n

This data can be sorted using a lambda function similar to how we sorted lists of tuples:
or

N
D

ww2_leaders.sort(key=lambda leader: leader['dob'])


is
tri

You can use this same technique to sort by a tuple:


C
bu
tio

ww2_leaders.sort(key=lambda leader: (leader['lname'], leader['fname']))


O
n
Pr

The preceding code can also be found in advanced-python-concepts/Demos/sequence_of_dic


PY
oh

tionaries.py.
i bi

itemgetter()
te
d

While the method shown above works fine, the operator module provides an itemgetter() method
that performs this same task a bit faster. It works like this:

LESSON 1 | 43
Demo 16: advanced-python-concepts/Demos/sorting_with_itemgetter.py
1. from datetime import date
2. from operator import itemgetter
3.
4. def main():
EV
-------Lines 5 through 27 Omitted-------
28. ww2_leaders.sort(key=itemgetter('dob'))
29. print('First born:', ww2_leaders[0]['fname'])
30.
U

31. ww2_leaders.sort(key=itemgetter('lname', 'fname'))


na

AL
32. print('First in Encyclopedia:', ww2_leaders[0]['fname'])
ut

33.
34. main()
ho
riz

U
ed

Creating a Dictionary from Two Sequences


R

AT
ep

Follow these steps to make a dictionary from two lists using the first list for keys and the second list
for values:
ro
d

1. Use the built-in zip() function to make a list of two-element tuples from the two lists:
uc

IO
tio

>>> courses = ['English', 'Math', 'Art', 'Music']


>>> grades = [96, 99, 88, 94]
n

>>> z = zip(courses, grades)


or

N
D

This will result in a zip object.


is

2. Pass the zip object to the dict() constructor:


tri

C
bu

>>> course_grades = dict(z)


tio

>>> course_grades
O
{'English': 96, 'Math': 99, 'Art': 88, 'Music': 94}
n
Pr

PY
oh

You can do the above in one step, like this:


i bi
te

course_grades = dict(zip(courses, grades))


d

This works with any type of sequence. For example, you could create a dictionary mapping letters to
numbers like this:

44 | Advanced Python Concepts


>>> letter_mapping = dict(zip('abcdef', range(6)))
>>> letter_mapping
{'a': 0, 'b': 1, 'c': 2, 'd': 3, 'e': 4, 'f': 5}
EV
Unpacking Sequences in Function Calls
Sometimes you’ll have a sequence that contains the exact arguments a function needs. To illustrate,
U

consider the following function:


na

AL
ut

import math
def distance_from_origin(a, b):
ho

return math.sqrt(a**2 + b**2)


riz

U
ed

The function expects two arguments, a and b, which are the x, y coordinates of a point. It uses the
Pythagorean theorem to determine the distance the point is from the origin.
R

AT
ep

We can call the function like this:


ro
d

c = distance_from_origin(3, 4)
uc

IO
tio

But it would be nice to be able to call the function like this too:
n
or

N
point = (3, 4)
D

c = distance_from_origin(point)
is
tri

C
However, that will cause an error because the function expects two arguments and we’re only passing
bu

in one.
tio

O
One solution would be to pass the individual elements of our point:
n
Pr

PY
point = (3, 4)
oh

c = distance_from_origin(point[0], point[1])
i bi
te

But Python provides an even easier solution. We can use an asterisk in the function call to unpack the
d

sequence into separate elements:

LESSON 1 | 45
point = (3, 4)
c = distance_from_origin(*point)

When you pass a sequence preceded by an asterisk into a function, the sequence gets unpacked, meaning
that the function receives the individual elements rather than the sequence itself.
EV
The preceding code can also be found in advanced-python-concepts/Demos/unpacking_func
tion_arguments.py.
U
na

AL
ut
ho
riz

U
ed
R

AT
ep
ro
d
uc

IO
tio
n
or

N
D
is
tri

C
bu
tio

O
n
Pr

PY
oh
i bi
te
d

46 | Advanced Python Concepts


Exercise 5: Converting a String to a datetime.date Object
10 to 20 minutes
EV
In this exercise, you will convert a string representing a date to a datetime.date object. This is the
starting code:

Exercise Code: advanced-python-concepts/Exercises/converting_date_string_to_datetime.py


U
na

1. import datetime
AL
2.
ut

3. def str_to_date(str_date):
ho

4. # Write function
5. pass
riz

U
6.
ed

7. str_date = input('Input date as YYYY-MM-DD: ')


8. date = str_to_date(str_date)
R

AT
9. print(date)
ep
ro

1. Open advanced-python-concepts/Exercises/converting_date_string_to_date
d uc

time.py in your editor.


IO
tio

2. The imported datetime module includes a date() method that can create a date object
n

from three passed-in parameters: year, month, and day. For example:
or

N
datetime.date(1776, 7, 4)
D
is
tri

3. Write the code for the str_to_date() function so that it…


C
bu

A. Splits the passed-in string into a list of date parts. Each part should be an integer.
tio

B. Returns a date object created by passing the unpacked list of date parts to
O
n

datetime.date().
Pr

PY
oh
i bi
te
d

LESSON 1 | 47
Solution: advanced-python-concepts/Solutions/converting_date_string_to_datetime.py
1. import datetime
2.
3. def str_to_date(str_date):
4. date_parts = [int(i) for i in str_date.split("-")]
EV
5. return datetime.date(*date_parts)
6.
7. str_date = input('Input date as YYYY-MM-DD: ')
8. date = str_to_date(str_date)
U

9. print(date)
na

AL
ut
ho

Modules and Packages


riz

U
You have worked with different Python modules (e.g., random and math) and packages (e.g.,
ed

collections). In general, it’s not all that important to know whether a library you want to use is a
R

AT
module or a package, but there is a difference, and when you’re creating your own, it’s important to
ep

understand that difference.


ro
d

Modules
uc

IO
tio

A module is a single file. It can be made up of any number of functions and classes. You can import
the whole module using:
n
or

N
import module_name
D
is
tri

Or you can import specific functions or classes from the module using:
C
bu
tio

from module_name import class_or_function_name


O
n
Pr

For example, if you want to use the random() function from the random module, you can do so by
PY
importing the whole module or by importing just the random() function:
oh
i bi
te
d

48 | Advanced Python Concepts


>>> import random
>>> random.random()
0.8843194837647139
>>> from random import random
>>> random()
EV
0.251050616400771

As shown above, when you import the whole module, you must prefix the module’s functions with
U

the module name.


na

AL
Every .py file is a module. When you build a module with the intention of making it available to other
ut

modules for importing, it is common to include a _test() function that runs tests when the module
ho

is run directly. For example, if you run random.py, which is in the Lib directory of your Python home,
riz

the output will look something like this:


U
ed

2000 times random


R

AT
0.003 sec, avg 0.500716, stddev 0.285239, min 0.000495333, max 0.99917
ep
ro

2000 times normalvariate


d

0.004 sec, avg 0.0061499, stddev 0.971102, min -2.86188, max 3.02266
uc

IO
2000 times lognormvariate
tio

0.004 sec, avg 1.64752, stddev 2.12612, min 0.0310675, max 28.5174
n


or

N
D

Where is My Python Home?


is
tri

C
To find your Python home, run the following code:
bu
tio

Demo 17: advanced-python-concepts/Demos/find_python_home.py


O
n

1. import sys
Pr

2. import os
PY
3.
oh

4. python_home = os.path.dirname(sys.executable)
i bi

5. print(python_home)
te
d

Open random.py in an editor and you will see it ends with this code:

LESSON 1 | 49
if __name__ == '__main__':
_test()

The __name__ variable of any module that is imported holds that module’s name. For example, if you
EV
import random and then print random.__name__, it will output “random”. However, if you open
random.py, add a line that reads print(__name__), and run it, it will print “__main__”. So, the if
condition in the code above just checks to see if the file has been imported. If it hasn’t (i.e., if it’s
running directly), then it will call the _test() function.
U
na

AL
If you do not want to write tests, you could include code like this:
ut
ho

if __name__ == '__main__':
riz

print('This module is for importing, not for running directly.')


U
ed
R

Packages
AT
ep

A package is a group of files (and possibly subfolders) stored in a directory that includes a file named
ro

__init__.py. The __init__.py file does not need to contain any code. Some libraries’ __init__.py files
d

have a simple comment like this:


uc

IO
tio

# Dummy file to make this a package.


n
or

N
However, you can include code in the __init__.py file that will initialize the package. You can also (but
D

do not have to) set a global __all__ variable, which should contain a list of files to be imported when
is

a file imports your package using from package_name import *. If you do not set the __all__
tri

C
variable, then that form of import will not be allowed, which may be just fine.
bu
tio

Search Path for Modules and Packages


O
n
Pr

The Python interpreter must locate the imported modules. When import is used within a script, the
PY
oh

interpreter searches for the imported module in the following places sequentially:
i bi

1. The current directory (same directory as script doing the importing).


te

2. The library of standard modules.


d

50 | Advanced Python Concepts


3. The paths defined in sys.path.2

As you see, the steps involved in creating modules and packages for import are relatively straightforward.
However, designing useful and easy-to-use modules and packages takes a lot of planning and thought.
EV
Conclusion
In this lesson, you have learned several advanced techniques with sequences. You have also learned to
U

do mapping and filtering, and to use lambda functions. Finally, you have learned how modules and
na

packages are created.


AL
ut
ho
riz

U
ed
R

AT
ep
ro
d uc

IO
tio
n
or

N
D
is
tri

C
bu
tio

O
n
Pr

PY
oh
i bi
te
d

2. sys.path contains a list of strings specifying the search path for modules. The list is os-dependent. To see your list, import sys,
and then output sys.path.

LESSON 1 | 51
PY
O
d
te
bi
C
i
oh
Pr
N
n
tio
bu
IO
tri
is
AT
D
or
n
tio
uc
U d
AL
ro
ep
R
ed

52 | Advanced Python Concepts


EV
riz
ho
ut
na
U
LESSON 2
Regular Expressions
EV
Topics Covered

Understanding regular expressions.


U
na

Python’s re module.
AL
ut
ho

Tom’s whole class were of a pattern--restless, noisy, and troublesome. When they came
riz

to recite their lessons, not one of them knew his verses perfectly, but had to be prompted
U
all along.
ed

– The Adventures of Tom Sawyer, Mark Twain


R

AT
ep
ro

Introduction
d uc

IO
Regular expressions are used to do pattern matching in many programming languages, including Java,
tio

PHP, JavaScript, C, C++, and Perl. We will provide a brief introduction to regular expressions and
then we’ll show you how to work with them in Python.
n
or

N
D

Regular Expression Tester


is
tri

C
We will use the online regular expression testing tool at https://pythex.org to demonstrate and
bu

test our regular expressions. To see how it works, open the page in your browser:
tio

O
n
Pr

PY
oh
i bi
te
d

LESSON 2 | 53
EV
U
na

AL
ut
ho
riz

U
ed
R

AT
ep
ro
d

As shown in the screenshot:


uc

IO
tio

1. Enter “rose” in the Your regular expression field.


n

2. Enter “A rose is a rose is a rose.” in the Your test string field.


or

3. Notice the Match result. The parts of the string that match your pattern will be highlighted.
N
D

Usually, you will want to have the MULTILINE option selected so that each line will be tested
is

individually.
tri

C
bu

In the Your test string field, you can test multiple strings:
tio

O
n
Pr

PY
oh
i
bi
te
d

54 | Regular Expressions
EV
U
na

AL
ut
ho
riz

U
ed
R

AT
ep
ro
d uc

IO
tio

These examples just find occurrences of a substring (e.g., “rose”) in a string (e.g, “A rose is a rose is a
n

rose.”). But the power of regular expressions is in pattern matching. The best way to get a feel for them
is to try them out. So, let’s do that.
or

N
D

Regular Expression Syntax


is
tri

C
bu

Here we’ll show the different symbols used in regular expressions. You should use https://pythex.org
to test the patterns we show.
tio

O
n

Start and End ( ^ $ )


Pr

PY
oh

A caret ( ^ ) at the beginning of a regular expression indicates that the string being searched must start
i

with this pattern.


bi
te

The pattern ^dog can be found in “dogfish”, but not in “bulldog” or “boondoggle”.
d

A dollar sign ( $ ) at the end of a regular expression indicates that the string being searched must end
with this pattern.

LESSON 2 | 55
The pattern dog$ can be found in “bulldog”, but not in “dogfish” or “boondoggle”.

Word Boundaries (\b \B)

Backslash-b ( \b ) denotes a word boundary. It matches a location at the beginning or end of a word.
EV
A word is a sequence of numbers, letters, and underscores. Any other character is considered a word
boundary.

The pattern dog\b matches the first but not the second occurence of “dog” in the phrase
U
na

“The bulldog bit the dogfish.”


AL
ut

In the phrase “The dogfish bit the bulldog.”, it only matches the second occurrence of “dog”,
ho

because a period is a word boundary.


riz

Backslash-B ( \B ) is the opposite of backslash-b ( \b ). It matches a location that is not a word boundary.
U
ed

The pattern dog\B matches the second but not the first occurence of “dog” in the phrase
R

AT
“The bulldog bit the dogfish.”
ep

But in the phrase “The dogfish bit the bulldog.”, it only matches the first occurrence of “dog”.
ro
d

Number of Occurrences ( ? + * {} )
uc

IO
tio

The following symbols affect the number of occurrences of the preceding character: ?, +, *, and {}.
n
or

A question mark ( ? ) indicates that the preceding character should appear zero or one times in the
N
pattern.
D
is

The pattern go?ad can be found in “goad” and “gad”, but not in “gooad”. Only zero or one
tri

C
“o” is allowed before the “a”.
bu
tio

A plus sign ( + ) indicates that the preceding character should appear one or more times in the pattern.
O
n

The pattern go+ad can be found in “goad”, “gooad” and “goooad”, but not in “gad”.
Pr

PY
oh

An asterisk ( * ) indicates that the preceding character should appear zero or more times in the pattern.
i bi

The pattern go*ad can be found in “gad”, “goad”, “gooad” and “goooad”.
te
d

Curly braces with one parameter ( {n} ) indicate that the preceding character should appear exactly n
times in the pattern.

56 | Regular Expressions
The pattern fo{3}d can be found in “foood” , but not in “food” or “fooood”.

Curly braces with two parameters ( {n1,n2} ) indicate that the preceding character should appear
between n1 and n2 times in the pattern.
EV
The pattern fo{2,4}d can be found in “food”, “foood” and “fooood”, but not in “fod” or
“foooood”.

Curly braces with one parameter and an empty second parameter ( {n,} ) indicate that the preceding
U

character should appear at least n times in the pattern.


na

AL
ut

The pattern fo{2,}d can be found in “food” and “foooood”, but not in “fod”.
ho
riz

Common Characters ( . \d \D \w \W \s \S )
U
ed

A period ( . ) represents any character except a newline.


R

AT
ep

The pattern fo.d can be found in “food”, “foad”, “fo9d”, and “fo d”.
ro

Backslash-d ( \d ) represents any digit. It is the equivalent of [0-9] (to be discussed soon).
d uc

IO
The pattern fo\dd can be found in “fo1d”, “fo4d” and “fo0d”, but not in “food” or “fodd”.
tio
n

Backslash-D ( \D ) represents any character except a digit. It is the equivalent of [^0-9] (to be discussed
or

soon).
N
D

The pattern fo\Dd can be found in “good” and “gold”, but not in “go4d”.
is
tri

Backslash-w ( \w ) represents any word character (letters, digits, and the underscore (_) ).
C
bu
tio

The pattern fo\wd can be found in “food”, “fo_d” and “fo4d”, but not in “fo*d”.
O
n

Backslash-W ( \W ) represents any character except a word character.


Pr

PY
oh

The pattern fo\Wd can be found in “fo*d”, “fo@d” and “fo.d”, but not in “food”.
i bi

Backslash-s ( \s ) represents any whitespace character (e.g., space, tab, newline).


te
d

The pattern fo\sd can be found in “fo d”, but not in “food”.

Backslash-S ( \S ) represents any character except a whitespace character.

LESSON 2 | 57
The pattern fo\Sd can be found in “fo*d”, “food” and “fo4d”, but not in “fo d”.

Character Classes ( [] )

Square brackets ( [] ) are used to create a character class (or character set), which specifies a set of
EV
characters to match.

The pattern f[aeiou]d can be found in “fad” and “fed”, but not in “food”, “fyd” or “fd”
U

[aeiou] matches an “a”, an “e”, an “i”, an “o”, or a “u”


na

AL
ut

The pattern f[aeiou]{2}d can be found in “faed” and “feod”, but not in “fod”, “fold” or
ho

“fd”.
riz

The pattern [A-Za-z]+ can be found twice in “Webucator, Inc.”, but not in “13066”.
U
ed

[A-Za-z] matches any lowercase or uppercase letter.


R

[A-Z] matches
AT
any uppercase letter.
ep

[a-z] matches any lowercase letter.


ro
d

The pattern [1-9]+ can be found twice in “13066”, but not in “Webucator, Inc.”
uc

IO
tio

Negation ( ^ )
n
or

When used as the first character within a character class, the caret ( ^ ) is used for negation. It matches
N
any characters not in the set.
D
is

The pattern f[^aeiou]d can be found in “fqd” and “f4d”, but not in “fad” or “fed”.
tri

C
bu

Groups ( () )
tio

O
n

Parentheses ( () ) are used to capture subpatterns and store them as groups, which can be retrieved
Pr

later.
PY
oh

The pattern f(oo)?d can be found in “food” and “fd”, but not in “fod”.
i bi

The whole group can show up zero or one time.


te
d

Alternatives ( | )

The pipe ( | ) is used to create optional patterns.

58 | Regular Expressions
The pattern ^web|or$ can be found in “website”, “educator”, and twice in “webucator”, but
not in “cobweb” or “orphan”.

Escape Character ( \ )
EV
The backslash ( \ ) is used to escape special characters.

The pattern fo\.d can be found in “fo.d”, but not in “food” or “fo4d”.
U
na

AL
Backreferences
ut
ho

Backreferences are special wildcards that refer back to a group within a pattern. They can be used to
make sure that two subpatterns match. The first group in a pattern is referenced as \1, the second
riz

U
group is referenced as \2, and so on.
ed

For example, the pattern ([bmpr])o\1 matches “bobcat”, “thermometer”, “popped”, and “prorate”.
R

AT
ep

A more practical example has to do with matching the delimiter in social security numbers. Examine
ro

the following regular expression:


d uc

IO
^\d{3}([\- ]?)\d{2}([\- ]?)\d{4}$
tio
n

Within the caret (^) and dollar sign ($), which are used to specify the beginning and end of the pattern,
or

N
there are three sequences of digits, optionally separated by a hyphen or a space. This pattern will be
D

matched in all of the following strings (and more):


is
tri

123-45-6789
C
bu

123 45 6789
tio

123456789
O
n

123-45 6789
Pr

123 45-6789
PY
oh

123-456789
i bi
te

The last three strings are not ideal, but they do match the pattern. Backreferences can be used to make
sure that the second delimiter matches the first delimiter. The regular expression would look like this:
d

LESSON 2 | 59
^\d{3}([\- ]?)\d{2}\1\d{4}$

The \1 refers back to the first subpattern. Only the first three strings listed above match this regular
expression.
EV
Python’s Handling of Regular Expressions
U

In Python, you use the re module to access the regular expression engine. Here is a very simple
na

AL
illustration. Imagine you’re looking for the pattern “r[aeiou]se” in the string “A rose is a rose is a rose.”
ut

1. Import the re module:


ho
riz

import re
U
ed

2. Compile the pattern:


R

AT
ep

p = re.compile('r[aeiou]se')
ro
d

3. Search the string for a match:


uc

IO
tio

result = p.search('A rose is a rose is a rose.')


n
or

4. Print the result:


N
D

print(result)
is
tri

C
bu

This will print the following, showing that the result is a match object and that it found the match
tio

“rose” starting at index 2 and ending at index 6:


O
n
Pr

<_sre.SRE_Match object; span=(2, 6), match='rose'>


PY
oh

Compiling a regular expression pattern into an object is a good idea if you’re going to reuse the expression
i bi

throughout the program, but if you’re just using it once or twice, you can use the module-level search()
te

method, like this:


d

60 | Regular Expressions
>>> result = re.search('r[aeiou]se', 'A rose is a rose is a rose.')
>>> result
<re.Match object; span=(2, 6), match='rose'>
EV
Raw String Notation

Python uses the backslash character ( \ ) to escape special characters. For example \n is a newline
character. A call to print('a\nb\nc') will print the letters a, b, and c each on its own line:
U
na

AL
ut

>>> print('a\nb\nc')
a
ho

b
riz

c
U
ed

If you actually want to print a backslash followed by an “n”, you need to escape the backslash with
R

AT
another backslash, like this: print('a\\nb\\nc'). That will print the literal string “a\nb\nc”:
ep
ro

>>> print('a\\nb\\nc')
d

a\nb\nc
uc

IO
tio

Python provides another way of doing this. Instead of escaping all the backslashes, you can use rawstring
n

notation by placing the letter “r” before the beginning of the string, like this: print(r'a\nb\nc'):
or

N
D

>>> print(r'a\nb\nc')
is

a\nb\nc
tri

C
bu

While this may not come in very handy in most areas of programming, it is very helpful when writing
tio

regular expression patterns. That is because the regular expression syntax also uses the backslash for
O
n

special characters. If you don’t use raw string notation, you may find your patterns filled with backslashes.
Pr

The takeaway: Always use raw string notation for your patterns.
PY
oh
i bi
te
d

LESSON 2 | 61
Regular Expression Object Methods
1. p.search(string) – Finds the first substring that matches the pattern. Returns a Match
object or None.
EV
>>> p = re.compile(r'\W')
>>> p.search('andré@example.com')
<re.Match object; span=(5, 6), match='@'>
U

This finds the first non-word character.


na

AL
2. p.match(string) – Like search(), but the match must be found at the beginning of the
ut

string. Returns a Match object or None.


ho
riz

>>> p = re.compile(r'\W')
U
>>> p.match('andré@example.com') # Returns None
ed

>>> p.match('@example.com')
R

<re.Match object; span=(0, 1), match='@'>


AT
ep

This matches the first character if it is a non-word character. The first example returns None
ro

as the passed-in string begins with “a”.


d uc

3. p.fullmatch(string) – Like search(), but the whole string must match. Returns a Match
IO
tio

object or None.
n

>>> p = re.compile(r'[\w\.]+@example.com')
or

N
>>> p.match('andré@example.com')
D

<re.Match object; span=(0, 17), match='andré@example.com'>


is
tri

This matches a string made up of word characters and periods followed by “@example.com”.
C
bu

4. p.findall(string) – Finds all non-overlapping matches. Returns a list of strings.


tio

O
>>> p = re.compile(r'\W')
n

>>> p.findall('andré@example.com')
Pr

['@', '.']
PY
oh

This returns a list of all matches of non-word characters.


i bi
te
d

62 | Regular Expressions
5. p.split(string, maxsplit=0) – Splits the string on pattern matches. If maxsplit is
nonzero, limits splits to maxsplit. Returns a list of strings.

>>> p = re.compile(r'\W')
>>> p.split('andré@example.com')
EV
['andré', 'example', 'com']

This splits the string on non-word characters.


U

6. p.sub(repl, string, count=0) – Replaces all non-overlapping matches in string with


na

repl. If count is nonzero, limits replacements to count. More details on sub() under Using
AL
sub() with a Function
ut

(see page 64). Returns a string.


ho

All the methods that search a string for a pattern (search(), match(), fullmatch(), and findall())
riz

include start and end parameters that indicate what positions in the string to start and end the search.
U
ed

Groups
R

AT
ep

As discussed earlier, parentheses in regular expression patterns are used to capture groups. You can
ro

access these groups individually using a match object’s group() method or all at once using its groups()
d

method.
uc

IO
tio

The group() method takes an integer3 as an argument and returns a string:


n

match.group(0) returns the whole match.


or

N
match.group(1) returns the first group found.
D
is

match.group(2) returns the second group found.


tri

C
And so on...
bu
tio

You can also get multiple groups at the same time returned as a tuple of strings by passing in more
O
n

than one argument (e.g., match.group(1, 2)).


Pr

When nested parentheses are used in the pattern, the outer group is returned before the inner group.
PY
oh

Here is an example that illustrates that:


i bi
te
d

3. Groups can also be named through a Python extension to regular expressions. For more information, see
https://docs.python.org/3/howto/regex.html#non-capturing-and-named-groups.

LESSON 2 | 63
>>> import re
>>> p = re.compile(r'(\w+)@(\w+\.(\w+))')
>>> match = p.match('andre@example.com')
>>> email = match.group(0)
>>> handle = match.group(1)
EV
>>> domain = match.group(2)
>>> domain_type = match.group(3)
>>> print(email, handle, domain, domain_type, sep='\n')
andre@example.com
U

andre
na

AL
example.com
ut

com
ho

Notice that “example.com” is group 2 and “com”, which is nested within “example.com” is group 3.
riz

U
ed

And you can use the groups() method to get them all at once:
R

AT
ep

>>> print(match.groups())
('andre', 'example.com', 'com')
ro
d
uc

IO
Flags
tio

The compile() method takes an optional second argument: flags. The flags are constants that can
n
or

be used individually or combined with a pipe (|).


N
D
is

re.compile(pattern, re.FLAG1|re.FLAG2)
tri

C
bu

The two most useful flags are (shortcut versions in parentheses):


tio

O
1. re.IGNORECASE (re.I) – Makes the pattern case insensitive.
n
Pr

2. re.MULTILINE (re.M) – Makes ^ and $ consider each line as an independent string.


PY
oh

Using sub() with a Function


i bi
te

The sub() method can either replace each match with a string or with the return value of a specified
d

function. The function receives the match as an argument and must return a string that will replace
the matched pattern. Here is an example:

64 | Regular Expressions
Demo 18: regular-expressions/Demos/clean_cusses.py
1. import re
2. import random
3.
4. def clean_cuss(match):
EV
5. # Get the whole match
6. cuss = match.group(0)
7. # Generate a random list of characters the length of cuss
8. chars = [random.choice('!@#$%^&*') for letter in cuss]
U

9. # Return the list joined into a string


na

10. return ''.join(chars)


AL
11.
ut

12. def main():


ho

13. pattern = r'\b[a-z]*(stupid|stinky|darn|shucks|crud|slob)[a-z]*\b'


14. p = re.compile(pattern, re.IGNORECASE|re.MULTILINE)
riz

U
15. s = """Shucks! What a cruddy day I\'ve had.
ed

16. I spent the whole darn day with my slobbiest


17. friend darning his STINKY socks."""
R

AT
18. result = p.sub(clean_cuss, s)
ep

19. print(result)
20.
ro

21. main()
d uc

IO
tio

Code Explanation
n

Reading regular expressions is tricky. You have to think like a computer and parse it part by part:
or

N
D

Word boundary:
is
tri

\b[a-z]*(stupid|stinky|darn|shucks|crud|slob)[a-z]*\b
C
bu

0 or more lowercase letters:


tio

O
\b[a-z]*(stupid|stinky|darn|shucks|crud|slob)[a-z]*\b
n
Pr

PY
Any one of the words delimited by the pipes (|):
oh

\b[a-z]*(stupid|stinky|darn|shucks|crud|slob)[a-z]*\b
i bi
te

0 or more lowercase letters:


d

\b[a-z]*(stupid|stinky|darn|shucks|crud|slob)[a-z]*\b

LESSON 2 | 65
Word boundary:
\b[a-z]*(stupid|stinky|darn|shucks|crud|slob)[a-z]*\b

Notice that we compile the pattern using the re.IGNORECASE and re.MULTILINE flags:
EV
p = re.compile(pattern, re.IGNORECASE|re.MULTILINE)
U

The following screenshot shows the regular expression matches in pythex.org:


na

AL
ut
ho
riz

U
ed
R

AT
ep
ro
d
uc

IO
tio
n
or

N
D
is
tri

C
bu

Run the file at the terminal to see that it replaces all those matches with a random string of characters:
tio

O
PS …\regular-expressions\Demos> python clean_cusses.py
n

@$@$#!! What a !&!#%* day I've had.


Pr

I spent the whole &*$* day with my @$@!%$^*^


PY
oh

friend ^&#*&#& his ^!@*#@ socks.


i bi
te

In an earlier lesson (see page 26), we split the text of the U.S. Declaration of Independence on spaces to
d

create a counter showing which words were used the most often. The resulting list looked like this:

66 | Regular Expressions
[('PEOPLE', 13), ('STATES', 7), ('SHOULD', 5), ('INDEPENDENT', 5),
('AGAINST', 5), ('GOVERNMENT,', 4), ('ASSENT', 4),
('OTHERS', 4), ('POLITICAL', 3), ('POWERS', 3)]

In the following demo, we use a regular expression to split on any character that is not a capital letter:
EV
Demo 19: regular-expressions/Demos/counter_re.py
1. import re
U

2. from collections import Counter


na

AL
3.
ut

4. with open('Declaration_of_Independence.txt') as f:
5. doi = f.read().upper()
ho

6.
riz

7. word_list = [word for word in re.split('[^A-Z]', doi) if len(word) > 5]


U
8.
ed

9. c = Counter(word_list)
10. print(c.most_common(10))
R

AT
ep
ro

Code Explanation
d uc

Because we use upper() to convert the whole text to uppercase, we can split on [^A-Z]. If we didn’t
IO
tio

know that there were only uppercase letters, we would have used [^A-Za-z] instead.
n

The new results are:


or

N
D

[('PEOPLE', 17), ('STATES', 12), ('GOVERNMENT', 7), ('POWERS', 6),


is

('BRITAIN', 6), ('SHOULD', 5), ('COLONIES', 5), ('INDEPENDENT', 5),


tri

('AGAINST', 5), ('MANKIND', 4)]


C
bu
tio

O
n
Pr

PY
oh
i bi
te
d

LESSON 2 | 67
Exercise 6: Green Glass Door
20 to 30 minutes
EV
In this exercise, you will modify a function so that it uses a regular expression. But first, a little riddle:

The following items can pass through the green glass door:
U

1. puddles
na

AL
2. mommies
ut

3. aardvarks
ho

4. balloons
riz

U
ed

The following items cannot pass through the green glass door:
R

AT
1. ponds
ep

2. moms
ro

3. anteaters
duc

4. kites
IO
tio

Knowing that, which of the following can pass through the green glass door?
n
or

1. bananas
N
D

2. apples
is

3. pears
tri

C
4. grapes
bu

5. cherries
tio

O
n

Did you figure it out? The two that can pass are apples and cherries. Any word with a double letter
Pr

can pass through the green glass door.


PY
oh

Now, take a look at the following code:


i bi
te
d

68 | Regular Expressions
Exercise Code: regular-expressions/Exercises/green_glass_door.py
1. def green_glass_door(word):
2. prev_letter = ''
3. for letter in word:
4. letter = letter.upper()
EV
5. if letter == prev_letter:
6. return True
7. prev_letter = letter
8. return False
U

9.
na

10. fruits = ['banana', 'apple', 'pear', 'grape', 'cherry',


AL
11. 'persimmons', 'orange', 'passion fruit']
ut

12.
ho

13. for fruit in fruits:


14. if green_glass_door(fruit):
riz

U
15. print(f'YES! {fruit} can pass through the green glass door.')
ed

16. else:
17. print(f'NO! {fruit} cannot pass through the green glass door.')
R

AT
ep
ro

Study the code, paying particular attention to the green_glass_door() function. Your job is to
d

rewrite that function to use a regular expression. Don’t forget to import re.
uc

IO
tio
n
or

N
D
is
tri

C
bu
tio

O
n
Pr

PY
oh
i
bi
te
d

LESSON 2 | 69
Solution: regular-expressions/Solutions/green_glass_door.py
1. import re
2.
3. def green_glass_door(word):
4. pattern = re.compile(r'(.)\1')
EV
5. return pattern.search(word)
6.
7. fruits = ['banana', 'apple', 'pear', 'grape', 'cherry',
8. 'persimmons', 'orange', 'passion fruit']
U

9.
na

10. for fruit in fruits:


AL
11. if green_glass_door(fruit):
ut

12. print(f'YES! {fruit} can pass through the green glass door.')
ho

13. else:
14. print(f'NO! {fruit} cannot pass through the green glass door.')
riz

U
ed

Code Explanation
R

AT
ep

The first part of the pattern matches any character. It uses parentheses to create a group:
ro
d

pattern = re.compile(r'(.)\1')
uc

IO
tio

The second part of the pattern uses a backreference to match the first group: that is, the character
n

matched by (.):
or

N
D

pattern = re.compile(r'(.)\1')
is
tri

C
The function then returns the result of searching the string for that pattern:
bu
tio

return pattern.search(word)
O
n
Pr

That will either return a Match object, which evaluates to True, or it will return None, which evaluates
PY
oh

to False.
i bi
te
d

70 | Regular Expressions
Conclusion
In this lesson, you have learned how to work with regular expressions in Python. To learn more about
regular expressions, see Python’s Regular Expression HOWTO
(https://docs.python.org/3/howto/regex.html).
EV
U
na

AL
ut
ho
riz

U
ed
R

AT
ep
ro
d uc

IO
tio
n
or

N
D
is
tri

C
bu
tio

O
n
Pr

PY
oh
i bi
te
d

LESSON 2 | 71
PY
O
d
te
bi
C
i
oh
Pr
N
n
tio
bu
IO
tri
is
AT
D
or
n
tio
uc
U d
AL
ro
ep
R
ed
EV
riz

72 | Regular Expressions
ho
ut
na
U
EVU
na

AL
ut
ho
riz

U
ed
R

AT
ep
ro
ucd

IO
7400 E. Orchard Road, Suite 1450 N
tio

Greenwood Village, Colorado 80111


Ph: 303-302-5280
n

www.ITCourseware.com
or

N
D
is
tri

C
bu
tio

O
n
Pr

PY
oh
bii
te
d

1-38-00319-000-04-28-20

You might also like