
Data Structures and Algorithms

Theory and Practice

Version 1.0
December 6, 2019

Haitham El-Ghareeb

Faculty of Computers and Information Sciences


Mansoura University, Egypt

2nd Year General


Preface

Welcome to the Data Structures and Algorithms: Theory and Practice course. To begin,
let's take a small journey to familiarize ourselves with the course specifications, objectives,
and contents, based on our beloved faculty's specifications.

Course Meta Data


Course Info. Summary
• Course Code: CS211P
• Course Title: Data Structures and Algorithms
• Core / Elective: Core
• Credits:
  - Theory: 2
  - Project: 0
  - Lab: 3
  - TOT: 3

Course Description
This course introduces
• the fundamental concepts of data structures
• the algorithms that proceed from them
• file system fundamentals
• the development of skills in the design and implementation of complex software systems

Course Syllabus
• Secondary Storage Devices
  - Stacks
  - Queues
  - Lists
  - Double Ended Queues
• Sequences
  - Ranked Sequences
  - Positional Sequences
  - General Sequences
• Trees
  - Binary Trees
  - Data Structures for Representing Trees
• Priority Queues
  - Priority Queue as a Sequence
  - Heaps
• Dictionaries
  - Binary Search Trees
  - AVL Trees
  - Hash Tables
• Sets, Sorting, Selection
  - Sets
  - Merge Sort
  - Quick Sort
  - Radix Sort
  - Complexity of Sorting
  - Selection
• Graphs
  - Data Structures for Graphs
  - Graph Traversal
  - Directed Graphs
• Strings
  - Brute-Force String Pattern Matching
  - Regular Expression Pattern Matching
  - Tries
• Record Storage and File Organizations
  - Ordered and Unordered Files
• Hashing and extendible hashing
• Index structures for files
  - B-Trees
  - B+-Trees

Course Resources
• Telegram Channel: https://t.me/DSA1920
• Github: https://www.github.com/helghareeb/DSA20
• Google Classroom: https://classroom.google.com
  - Invite students or give them the class code:
  - Remember: You have to use your University email
• Contact: h.elghareeb@yahoo.com
• Demo: Join Google Classroom https://www.youtube.com/watch?v=9hmfs-binhM
• Demo: Anaconda Python Installation (Windows) https://youtu.be/ejBttg7GWsw

Book Contents
This book begins with Part I, the part you are reading right now.
Chapter 1, on page 3, discusses the different methods used to compare programming
languages, and how we should compare them in order to choose among them. The comparison
criteria are many and not always clear, yet eventually we have to choose.
The book concludes with the Resources section (page 131), which provides links to
important resources that broaden the concepts presented in this book. A very important
section (Glossary) is presented at the end of the book, with important definitions of
basic concepts (either from Data Structures and Algorithms or generally from Computers
and Information Sciences) that you must be familiar with, and ready to answer questions
about when asked.

Suggested Lecture Schedule


During this semester, we plan to go through the following schedule. May God give
us the strength, power, and wisdom to go as planned, finish all contents, and make use of the
following topics.

• Welcome! - 1 Lecture
  - Course Mechanics
  - Programming Languages are Not the Same
  - Python Review
• Abstract Data Types - 1 Lecture
• Algorithm Analysis - 1 Lecture
• Arrays, Sets, Maps - 1 Lecture
• Searching and Sorting - 1 Lecture
• Advanced Sorting - 1 Lecture
• Linked Structures - 1 Lecture
• Recursion - 1 Lecture
• Hash Tables - 1 Lecture
• Tree - 1 Lecture
• Graph - 1 Lecture
• Advanced Algorithm Techniques - 1 Lecture
• Problem Solving - 1 Lecture
• Data Structures in Real Examples - 1 Lecture
• What Next? - 1 Lecture

That is 15 lectures in total.

Github Repository
Besides the CIS Faculty Learning Management System (LMS), the course repository is available at
https://www.github.com/helghareeb/DSA20
This shall be your main source of course information. If you visit it regularly, at least
once a week, you will notice a few things:

• Content is updated regularly, with new items and content added weekly
• There, you will find folders arranged by weeks and lectures, and mapped linearly
• Lecture contents are (usually) four folders¹
  - Assignment: Each lecture shall include a weekly assignment. Assignments are
    really important for learning, practicing, and understanding
  - Demo: Code samples illustrated in the lecture. Some of them already appear
    in the book; however, we present them again here as solutions / projects so
    you can compile/interpret and run them immediately
  - Lab: Lab activities, resources, and (depending on the week) solutions for the
    lecture week's contents
  - Lecture: Lecture slides, which are useful guidelines for studying <what
    content> and <where: in the book and other useful external resources>

Finally, we are doing our best to make this course as interesting and useful as we can. We
will surely face challenges, but there is always hope. We hope you will enjoy this course as
much as we enjoyed preparing it, and that it will become a checkpoint in your career.
Regards, Dr.Haitham December 6, 2019

¹ Note that Github does not display empty folders; however, they are there for all lectures.
Contents

Preface iii

I Introduction 1
1 Programming Languages are Not the Same 3
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1.1 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1.2 This is Not... . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1.3 Prerequisites . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1.4 Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2 Let's Agree on.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2.1 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2.2 Thoughts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3 Programming Languages Comparison . . . . . . . . . . . . . . . . . . . . . . 6
1.3.1 Why We Need To Compare ? . . . . . . . . . . . . . . . . . . . . . . . 6
1.3.2 How Do We Compare ? . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.4 Academic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.5 Programming Paradigms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.5.1 Imperative Programming . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.5.2 Structured Programming . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.5.3 Procedural Programming . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.5.4 Functional Programming . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.5.5 Event-Driven Programming . . . . . . . . . . . . . . . . . . . . . . . . 10
1.5.6 Object Oriented Programming . . . . . . . . . . . . . . . . . . . . . . 10
1.5.7 Declarative Programming . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.5.8 Reactive Programming . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.5.9 Others . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.6 General Characteristics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.6.1 Compiled vs. Interpreted . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.6.2 Standardized Programming Languages . . . . . . . . . . . . . . . . . . 12
1.6.3 Garbage Collection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.6.4 Type System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.6.5 Object Oriented Programming Features Support . . . . . . . . . . . . 14
1.6.6 Functional Programming Features Support . . . . . . . . . . . . . . . 15
1.6.7 Multithreading / Concurrency . . . . . . . . . . . . . . . . . . . . . . 15
1.6.8 Pointer Arithmetic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.6.9 Design by Contract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.6.10 Regular Expression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16


1.6.11 Language Integration . . . . . . . . . . . . . . . . . . . . . . . . . . . 16


1.6.12 Built-In Security . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.7 Market Share / Adoption / Penetration . . . . . . . . . . . . . . . . . . . . . 16
1.8 Final Thoughts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.8.1 Programming Languages Philosophy . . . . . . . . . . . . . . . . . . . 16
1.8.2 What Experts Think ? . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
1.8.3 What Shall I Do ? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

2 Variables, expressions and statements 19


2.1 Assignment statements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.2 Variable names . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.3 Expressions and statements . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.4 Script mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.5 Order of operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.6 String operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.7 Comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.8 Debugging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.9 Glossary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

3 Lists 25
3.1 A list is a sequence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.2 Lists are mutable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.3 Traversing a list . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.4 List operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.5 List slices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.6 List methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.7 Map, filter and reduce . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.8 Deleting elements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.9 Lists and strings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.10 Objects and values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.11 Aliasing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.12 List arguments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.13 Debugging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.14 Glossary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

4 Tuples 37
4.1 Tuples are immutable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
4.2 Tuple assignment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.3 Tuples as return values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.4 Variable-length argument tuples . . . . . . . . . . . . . . . . . . . . . . . . . 40
4.5 Lists and tuples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
4.6 Dictionaries and tuples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
4.7 Sequences of sequences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
4.8 Debugging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
4.9 Glossary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

5 Functions 47
5.1 Function calls . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
5.2 Math functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
5.3 Composition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
5.4 Adding new functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
5.5 Definitions and uses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

5.6 Flow of execution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51


5.7 Parameters and arguments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
5.8 Variables and parameters are local . . . . . . . . . . . . . . . . . . . . . . . . 52
5.9 Stack diagrams . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
5.10 Fruitful functions and void functions . . . . . . . . . . . . . . . . . . . . . . . 53
5.11 Why functions? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
5.12 Debugging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
5.13 Glossary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55

6 Classes and objects 57


6.1 Programmer-defined types . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
6.2 Attributes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
6.3 Rectangles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
6.4 Instances as return values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
6.5 Objects are mutable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
6.6 Copying . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
6.7 Debugging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
6.8 Glossary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63

7 Magic Methods 65
7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
7.2 __new__() method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
7.3 __str__() method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
7.4 __add__() method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
7.5 __ge__() method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
7.6 Important Magic Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69

8 Python Testing 71
8.1 Testing Your Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
8.2 Automated vs. Manual Testing . . . . . . . . . . . . . . . . . . . . . . . . . . 71
8.3 Unit Tests vs. Integration Tests . . . . . . . . . . . . . . . . . . . . . . . . . . 71
8.4 Choosing a Test Runner . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
8.4.1 unittest . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72

9 Numpy 73
9.1 Arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
9.2 Array Indexing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
9.2.1 Slicing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
9.2.2 Integer Array Indexing . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
9.2.3 Boolean Array Indexing . . . . . . . . . . . . . . . . . . . . . . . . . . 76
9.3 Data Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
9.4 Array Math . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77

10 Abstract Data Types 81


10.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
10.2 Abstractions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
10.2.1 Procedural Abstraction . . . . . . . . . . . . . . . . . . . . . . . . . . 82
10.2.2 Data Abstraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
10.3 Abstract Data Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
10.4 Data Structures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
10.4.1 ADT and Data Structures . . . . . . . . . . . . . . . . . . . . . . . . . 83
10.5 General Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83

10.5.1 Collection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
10.5.2 Container . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
10.5.3 Sequence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
10.5.4 Sorted Sequence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
10.5.5 List vs. Python list . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
10.6 Python and ADT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
10.6.1 Step 01: Specify ADT . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
10.6.2 02: Using the ADT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
10.6.3 Preconditions and Postconditions . . . . . . . . . . . . . . . . . . . . . 85
10.7 Bags . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
10.7.1 Bag Abstract Data Type . . . . . . . . . . . . . . . . . . . . . . . . . 85
10.7.2 Bag Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
10.7.3 Bag Usage Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
10.7.4 Why a Bag ADT? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
10.7.5 Selecting a Data Structure . . . . . . . . . . . . . . . . . . . . . . . . . 87
10.8 Chose the Data Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
10.9 List-Based Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
10.9.1 Some Implementation Details . . . . . . . . . . . . . . . . . . . . . . . 89

11 Arrays 91
11.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
11.2 The Array Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
11.2.1 Arrays vs. Python lists . . . . . . . . . . . . . . . . . . . . . . . . . . 91
11.2.2 When to use Arrays? . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
11.3 Array Abstract Data Type . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
11.3.1 Array ADT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
11.3.2 Creation and Usage of Array ADT . . . . . . . . . . . . . . . . . . . . 93
11.3.3 Implementing the Array . . . . . . . . . . . . . . . . . . . . . . . . . . 94
11.4 Array 2D . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
11.4.1 Implementing Array 2D . . . . . . . . . . . . . . . . . . . . . . . . . . 94
11.5 Game of Life . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
11.5.1 Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
11.5.2 Game of Life - Core - Full Code . . . . . . . . . . . . . . . . . . . . . 96
11.5.3 Game of Life - GUI - Full Code . . . . . . . . . . . . . . . . . . . . . . 99

12 Algorithm Analysis 101


12.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
12.1.1 How do we measure - Example . . . . . . . . . . . . . . . . . . . . . . 101
12.2 Complexity Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
12.3 Asymptotic Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
12.3.1 Does Asymptotic Analysis always work? . . . . . . . . . . . . . . . . . 104
12.3.2 Three Cases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
12.4 Big-O Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
12.4.1 Constant Time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
12.4.2 Logarithmic Time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107

13 Linked List 111


13.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
13.2 Creation of Linked List . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
13.3 Traversing a Linked List . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
13.4 Insertion in a Linked List . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112

13.4.1 Inserting at the Beginning of the Linked List . . . . . . . . . . . . . . 112


13.4.2 Inserting at the End of the Linked List . . . . . . . . . . . . . . . . . . 113
13.4.3 Inserting in between two Data Nodes . . . . . . . . . . . . . . . . . . . 114
13.4.4 Removing an Item from a Linked List . . . . . . . . . . . . . . . . . . 115

14 Queue 119
14.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
14.2 Queue Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
14.3 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
14.4 Implementation using list . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
14.5 Implementation using collections.deque . . . . . . . . . . . . . . . . . . . . . . 121
14.6 Implementation using queue.Queue . . . . . . . . . . . . . . . . . . . . . . . . 122

15 Stack 125
15.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
15.2 Implementing a Python Stack . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
15.3 Using list to Create a Python Stack . . . . . . . . . . . . . . . . . . . . . . . 127
15.4 Using collections.deque . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
15.4.1 Why Have deque and list? . . . . . . . . . . . . . . . . . . . . . . . . . 129
15.5 Which Implementation to Use? . . . . . . . . . . . . . . . . . . . . . . . . . . 130

Resources 131
15.6 Book Resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
15.7 Errata . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
15.8 Github Repository . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
15.9 Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
Part I

Introduction

Chapter 1

Programming Languages are Not the Same

1.1 Introduction
1.1.1 Objectives
• Which Programming Language!
• How do we compare PLs?
• What are the criteria?
• How do those criteria relate to me?
• What do technology leaders think about this question?
• What is the "accurate" question?

1.1.2 This is Not...


• Object Oriented Programming Course
• Functional Programming Course
• (Certain) Programming Language Course
• Even a Course!

1.1.3 Prerequisites
• Familiarity with Programming Concepts (Preferred)
• Familiarity with one or more Programming Languages (Preferred)


1.1.4 Contents
1. Let's Agree on

2. Programming Languages Comparison

(a) Why and How Do we / academics compare

3. Programming Paradigms

4. General Characteristics

5. Market Share / Adoption / Penetration

6. Final Thoughts

1.2 Let's Agree on..


1.2.1 Definitions
From Wikipedia¹
• A programming language is a formal computer language designed to communicate
  instructions to a machine, particularly a computer.
• Programming languages can be used to create programs to control the behavior
  of a machine or to express algorithms.

Open Source
From Wikipedia²
• Computer software with its source code made available with a license in which
  the copyright holder provides the rights to study, change, and distribute the
  software to anyone and for any purpose.

Technical Standard - What
From Wikipedia³
• It is usually a formal document that establishes uniform engineering or technical
  criteria, methods, processes and practices related to technical systems.
• In contrast, a custom, convention, company product, corporate standard, and
  so forth that becomes generally accepted and dominant is often called a de facto
  standard.

1 https://en.wikipedia.org/wiki/ProgrammingLanguage
2 https://en.wikipedia.org/wiki/OpenSourceSoftware
3 https://en.wikipedia.org/wiki/Technical_standard

Technical Standard - Who


From Wikipedia⁴
• A technical standard may be developed privately or unilaterally.
• Standards can also be developed by groups such as trade unions and trade associations.
• Standards organizations often have more diverse input and usually develop voluntary
  standards.
• The standardization process may be by edict or may involve the formal consensus
  of technical experts.

1.2.2 Thoughts
Theory vs. Product
• Who leads: Academia vs. Standards vs. Industry?
• Who sticks with standards?
• Who wins?

How Many Programming Languages?
https://www.quora.com/How-many-programming-languages-are-there-in-the-world-of-software
• Hundreds
• New ones every year
• The same company supports different languages!

Timeline of Programming Languages
https://en.wikipedia.org/wiki/Timeline_of_programming_languages

The Big List of 256 Programming Languages
https://dzone.com/articles/big-list-256-programming

Why so Many Programming Languages?
• https://cs.stackexchange.com/questions/451/why-are-there-so-many-programming-languages
• https://www.reddit.com/r/explainlikeimfive/comments/1jk4jo/eli5_why_are_there_so_many_programming_languages/
4 https://en.wikipedia.org/wiki/Technical_standard

1.3 Programming Languages Comparison


1.3.1 Why We Need To Compare?
• Eventually, we have to choose!
• We need to choose only one... for the task
• We don't have enough time to learn them all!

1.3.2 How Do We Compare?

Comparison Criteria
• Many
• No standards!
• Even in academia!!
• Industry benchmarks!!!

Comparison Criteria - for us


1. Academic

2. Programming Paradigms

3. General Characteristics

4. Market Share / Adoption / Penetration

1.4 Academic
How Do Researchers Compare ?
https://scholar.google.com/

1.5 Programming Paradigms


Programming Paradigm
From Wikipedia⁵
• A way to classify programming languages based on their features.
• Languages can be classified into multiple paradigms.

Comparison of Programming Paradigms


https://en.wikipedia.org/wiki/Comparison_of_programming_paradigms
5 https://en.wikipedia.org/wiki/ProgrammingParadigm

Programming Paradigms for Programmers


What Every Programmer Should Know
http://hiperc.buffalostate.edu/courses/ACM612-F15/uploads/ACM612/VanRoy-Programming.pdf

Examples of Programming Paradigms


1. Imperative

2. Structured

3. Procedural

4. Functional

5. Event-Driven

6. Object-Oriented

7. Declarative

8. Reactive

9. Others

1.5.1 Imperative Programming


From Wikipedia⁶
• A programming paradigm that uses statements that change a program's state.
• An imperative program consists of commands for the computer to perform.

State (Computer Science)
From Wikipedia⁷
• A program is described as stateful if it is designed to remember preceding events or
  user interactions; the remembered information is called the state of the system.
• The set of states a system can occupy is known as its state space.
• In a discrete system, the state space is countable and often finite.
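
The two ideas above (statements that change state, and a program that remembers preceding events) can be sketched in a few lines of Python. This is only an illustration: the price list and the ClickCounter class are hypothetical names invented for this example.

```python
# Imperative style: each statement below changes the program's state.
total = 0                       # the state is a single variable
for price in [10, 20, 30]:      # hypothetical data
    total = total + price       # an assignment statement updates the state


class ClickCounter:
    """A stateful object: it remembers how many clicks it has seen."""

    def __init__(self):
        self.clicks = 0         # the remembered information is the state

    def click(self):
        self.clicks += 1        # each event changes the state
        return self.clicks


counter = ClickCounter()
first = counter.click()         # 1
second = counter.click()        # 2: same call, different result, because state changed
print(total, first, second)
```

Calling click() twice gives two different answers because the object carries state from one call to the next.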

Simple Statements in Python


https://docs.python.org/3.7/reference/simple_stmts.html
6 https://en.wikipedia.org/wiki/ImperativeProgramming
7 https://en.wikipedia.org/wiki/State

Expression (Computer Science - I)


From Wikipedia⁸
• An expression is a combination of one or more explicit values, constants, variables,
  operators, and functions that the programming language interprets and computes to
  produce ("to return", in a stateful environment) another value.
• For example, 2 + 3 is an arithmetic and programming expression which evaluates to 5.

Expression (Computer Science - II)
• A variable is an expression because it denotes a value in memory, so y+6 is an
  expression.
• An example of a relational expression is 4 ≠ 4, which evaluates to false.
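
As a quick check, the three expressions mentioned above can be evaluated directly in Python. The value given to y is an assumption made only for this sketch.

```python
y = 4                    # hypothetical value, chosen only for this example

arithmetic = 2 + 3       # arithmetic expression
with_variable = y + 6    # a variable is itself an expression, so y + 6 is one too
relational = (4 != 4)    # relational expression; Python writes "not equal" as !=

print(arithmetic, with_variable, relational)   # 5 10 False
```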

Statement vs. Expression - I


From Quora⁹
• A statement is a complete line of code that performs some action
• An expression is any section of the code that evaluates to a value
• Statements can only be combined vertically by writing one after another, or with
  block constructs.
• Expressions can be combined horizontally into larger expressions using operators

Statement vs. Expression - II
• Every expression can be used as a statement (whose effect is to evaluate the
  expression and ignore the resulting value)
• Most statements cannot be used as expressions
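
One way to see the difference in Python is to ask the parser. The sketch below uses the standard ast module: ast.parse(source, mode="eval") accepts only a single expression and raises SyntaxError otherwise. The helper name is_expression is invented for this example.

```python
import ast


def is_expression(source):
    """Return True if `source` parses as a single Python expression."""
    try:
        ast.parse(source, mode="eval")   # "eval" mode accepts expressions only
        return True
    except SyntaxError:
        return False


checks = {
    "2 + 3": is_expression("2 + 3"),              # an expression
    "x = 2 + 3": is_expression("x = 2 + 3"),      # an assignment statement
    "import math": is_expression("import math"),  # an import statement
}
print(checks)

2 + 3   # an expression used as a statement: it is evaluated and the value is ignored
```

The first entry is accepted as an expression; the assignment and the import are statements and are rejected.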

Imperative Programming Languages - Examples


From Stack Overflow¹⁰
• Most mainstream languages, including object-oriented programming (OOP)
  languages such as C#, Visual Basic, C++, and Java, were designed to primarily
  support imperative (procedural) programming.

8 https://en.wikipedia.org/wiki/Expression
9 https://www.quora.com/Whats-the-difference-between-a-statement-and-an-expression-in-Python
10 https://stackoverflow.com/questions/17826380/what-is-difference-between-functional-and-imperative-programming-languages

1.5.2 Structured Programming


From Wikipedia¹¹
• A programming paradigm aimed at improving the clarity, quality, and development time
  of a computer program
• It makes extensive use of subroutines, block structures, and for and while loops, in
  contrast to using simple tests and jumps such as the goto statement
• It emerged in the late 1950s with the appearance of the ALGOL 58 and ALGOL 60
  programming languages
• C, C++, Java, Python are Structured Programming Languages¹²
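
A minimal sketch of the structured style in Python, which has no goto statement at all: the loop and the test are written as nested blocks inside a subroutine. The function name first_negative is made up for this illustration.

```python
def first_negative(numbers):
    """Return the index of the first negative number, or -1 if there is none."""
    index = 0
    while index < len(numbers):      # a loop block instead of a jump back to a label
        if numbers[index] < 0:       # an if block instead of a conditional jump
            return index
        index += 1
    return -1


print(first_negative([3, 5, -2, 7]))   # 2
print(first_negative([1, 2, 3]))       # -1
```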

1.5.3 Procedural Programming


From Wikipedia¹³
• Derived from structured programming
• Based upon the concept of the procedure call.
• Procedures, also known as routines, subroutines, or functions, simply contain a series
  of computational steps to be carried out.
• Any given procedure might be called at any point during a program's execution,
  including by other procedures or itself.
• Examples: Pascal, C, C++, Ada, Lisp, PHP, Python, and Go
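
The points above can be sketched as a small Python program built from procedures, where one procedure calls others and one calls itself. All names and the score data are hypothetical, chosen only for this example.

```python
def read_scores():
    """A procedure that supplies the data (hard-coded here for simplicity)."""
    return [70, 80, 90]


def average(values):
    """A procedure containing a short series of computational steps."""
    return sum(values) / len(values)


def countdown(n):
    """A procedure that calls itself."""
    if n == 0:
        return []
    return [n] + countdown(n - 1)


def main():
    """A procedure that calls other procedures."""
    scores = read_scores()
    return average(scores)


print(main())         # 80.0
print(countdown(3))   # [3, 2, 1]
```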

1.5.4 Functional Programming


From Wikipedia¹⁴
• Treats computation as the evaluation of mathematical functions and avoids
  changing state and mutable data.
• Programming is done with expressions or declarations instead of statements.
• In functional code, the output value of a function depends only on the arguments that
  are passed to the function, so calling a function f twice with the same value for an
  argument x will produce the same result f(x) each time.

Functional Programming - II
• This is in contrast to procedures depending on a local or global state, which may
  produce different results at different times when called with the same arguments
  but a different program state.
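
The contrast can be sketched in Python with two small functions. The names scale, scale_with_global, and current_factor are invented for this illustration.

```python
def scale(x, factor):
    """Pure: the result depends only on the arguments."""
    return x * factor


current_factor = 2               # global program state


def scale_with_global(x):
    """Not pure: the result also depends on the global state."""
    return x * current_factor


pure_1 = scale(5, 2)
pure_2 = scale(5, 2)             # same arguments, same result, every time

impure_1 = scale_with_global(5)
current_factor = 3               # the program state changes ...
impure_2 = scale_with_global(5)  # ... and the same call now gives a different result

print(pure_1, pure_2, impure_1, impure_2)   # 10 10 10 15
```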

Functional Programming - Must Read


https://wiki.haskell.org/Functional_programming
11 https://en.wikipedia.org/wiki/StructuredProgramming
12 https://en.wikipedia.com/wiki/ProgrammingParadigm
13 https://en.wikipedia.org/wiki/ProceduralProgramming
14 https://en.wikipedia.org/wiki/FunctionalProgramming

Some Features of Functional Languages - I


From the Haskell Wiki¹⁵
• Higher-order functions: functions that take other functions as their arguments.
• Purity: some functional languages allow expressions to yield actions in addition to
  return values. These actions are called side effects. Languages that prohibit side
  effects are called pure.
  - Immutable data: instead of altering existing values, altered copies are created
    and the original is preserved.
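
Python is not a pure functional language, but it can illustrate two of these features. The sketch below shows a higher-order function and immutable data using a tuple; the names apply_twice and increment are hypothetical.

```python
def apply_twice(func, value):
    """A higher-order function: it takes another function as an argument."""
    return func(func(value))


def increment(n):
    return n + 1


result = apply_twice(increment, 5)                  # 7
squares = list(map(lambda n: n * n, [1, 2, 3]))     # map is a built-in higher-order function

# Immutable data: a tuple cannot be altered in place,
# so "changing" it means building an altered copy.
original = (1, 2, 3)
extended = original + (4,)                          # new tuple; original is preserved

print(result, squares, original, extended)
```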

Some Features of Functional Languages - II


• Lazy evaluation: computations can be performed at any time and still yield the
  same result. This makes it possible to defer the computation of values until they are
  needed.
• Recursion: often the only way to iterate. Implementations will often include tail call
  optimization.
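
Python evaluates eagerly by default, but its generators give a taste of deferred computation: a value is produced only when it is requested. The sketch below also iterates by recursion. Note that CPython does not perform tail call optimization, so deep recursion hits the recursion limit. The function names are invented for this example.

```python
from itertools import islice


def naturals():
    """An endless sequence: each value is computed only when asked for."""
    n = 0
    while True:
        yield n
        n += 1


first_five = list(islice(naturals(), 5))   # only five values are ever computed


def total(items):
    """Iterate by recursion instead of a loop."""
    if not items:
        return 0
    return items[0] + total(items[1:])


print(first_five)             # [0, 1, 2, 3, 4]
print(total([1, 2, 3, 4]))    # 10
```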

Examples of Functional Programming Languages


C++, Clojure, CoffeeScript, Elixir, Erlang, F#, Haskell, Lisp, Python, Ruby, Scala,
SequenceL, Standard ML, JavaScript

1.5.5 Event-Driven Programming


From Wikipedia 16

ˆ Flow of the program is determined by events such as user actions (mouse clicks, key
presses), sensor outputs, or messages from other programs / threads.

ˆ Dominant paradigm used in graphical user interfaces (GUI) and other applications
(e.g. JavaScript web applications) that are centered on performing certain actions in
response to user input.

ˆ This is also true of programming for Device Drivers, Game Programming

ˆ There is generally a main loop that listens for events, and then triggers a callback
function when one of those events is detected.
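The main loop and callback idea can be sketched in a few lines of Python. This is a toy model, not a real GUI toolkit: the names on, emit, run_loop, handlers and event_queue are all invented for the example.

```python
from collections import deque

handlers = {}          # event name -> list of callback functions
event_queue = deque()  # pending (name, payload) pairs


def on(name, callback):
    """Register a callback for an event name."""
    handlers.setdefault(name, []).append(callback)


def emit(name, payload):
    """Queue an event, as a user action or another program would."""
    event_queue.append((name, payload))


def run_loop():
    """Main loop: take each event and trigger its callbacks."""
    while event_queue:
        name, payload = event_queue.popleft()
        for callback in handlers.get(name, []):
            callback(payload)


log = []
on("click", lambda pos: log.append(f"clicked at {pos}"))
on("key", lambda key: log.append(f"pressed {key}"))

emit("click", (10, 20))
emit("key", "q")
run_loop()
print(log)
```

A real event loop usually blocks while waiting for new events instead of stopping when the queue is empty; the sketch stops so that it terminates.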

1.5.6 Object Oriented Programming


From Wikipedia 17

ˆ Based on the concept of objects, which may contain data, in the form of fields, often
known as attributes; and code, in the form of procedures, often known as methods.

ˆ Programs are designed by making them out of objects that interact with one another.

15 https://wiki.haskell.org/Functional_programming
16 https://en.wikipedia.org/wiki/EventDrivenProgramming
17 https://en.wikipedia.org/wiki/ObjectOrientedProgramming

ˆ Java, C++, C#, Python, PHP, Ruby, Perl, Object Pascal, Objective-C, Dart, Swift,
Scala, Common Lisp, and Smalltalk.

ˆ Encapsulation, concept that binds together the data and functions that manipulate
the data, and that keeps both safe from outside interference and misuse.

ˆ Data/Information Hiding 18, ability to prevent certain aspects of a class or software
component from being accessible to its clients.

ˆ Composition, objects can contain other objects in their instance variables.

ˆ Inheritance, allows classes to be arranged in a hierarchy that represents is-a-type-of
relationships.

 All the data and methods available to the parent class also appear in the child
class with the same names.

 Allows easy re-use of the same procedures and data definitions, in addition to
potentially mirroring real-world relationships in an intuitive way.

ˆ Polymorphism 19, provision of a single interface to entities of different types.

ˆ A polymorphic type is one whose operations can also be applied to values of some
other type, or types.

ˆ Several kinds of polymorphism:

 Ad hoc polymorphism, function denotes different and potentially heterogeneous
implementations depending on a limited range of individually specified
types and combinations (function overloading).

 Parametric polymorphism, code is written without mention of any specific
type and can be used transparently with any number of new types (generics).

 Subtyping, name denotes instances of different classes related by some common
superclass (polymorphism).

ˆ Dynamic Binding, linking procedure call to a specific sequence of code (method) at
run-time.

 It means that the code to be executed for a specific procedure call is not known
until run-time.

 Dynamic binding is also known as late binding or run-time binding.

ˆ All predefined types are Objects

ˆ All operations performed by sending messages to Objects

ˆ All user defined types are Objects
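Several of the concepts above (inheritance, subtype polymorphism and dynamic binding) can be seen together in a small Python sketch. The classes Shape, Circle and Square are hypothetical examples, not part of any library.

```python
import math


class Shape:
    """Parent class: defines the common interface."""

    def __init__(self, name):
        self.name = name

    def area(self):
        raise NotImplementedError("subclasses provide area()")

    def describe(self):
        # Which area() runs is decided at run-time (dynamic binding).
        return f"{self.name} with area {self.area():.2f}"


class Circle(Shape):          # a Circle is-a-type-of Shape
    def __init__(self, radius):
        super().__init__("circle")
        self.radius = radius

    def area(self):
        return math.pi * self.radius ** 2


class Square(Shape):          # a Square is-a-type-of Shape
    def __init__(self, side):
        super().__init__("square")
        self.side = side

    def area(self):
        return self.side ** 2


# One interface (describe), entities of different types.
descriptions = [shape.describe() for shape in (Circle(1.0), Square(2.0))]
print(descriptions)
```

Both child classes inherit describe from Shape without redefining it, and each call to describe picks the area method of the object it is called on.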

1.5.7 Declarative Programming


From Wikipedia 20

ˆ Style of building the structure and elements of computer programs that expresses
the logic of a computation without describing its control flow.

ˆ SQL, regular expressions, CSS, Prolog, OWL, SPARQL
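To make the difference concrete, the sketch below solves one small task twice: once with an explicit loop that spells out the control flow, and once with an SQL query that only states what is wanted. It uses Python's standard sqlite3 module with an in-memory database; the table students and its rows are made up for the example.

```python
import sqlite3

rows = [("Ada", 91), ("Alan", 78), ("Grace", 85)]

# Imperative style: we describe how to find the answer, step by step.
passed_loop = []
for name, grade in rows:
    if grade >= 80:
        passed_loop.append(name)
passed_loop.sort()

# Declarative style: we describe what we want and let the engine decide how.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT, grade INTEGER)")
conn.executemany("INSERT INTO students VALUES (?, ?)", rows)
query = "SELECT name FROM students WHERE grade >= 80 ORDER BY name"
passed_sql = [name for (name,) in conn.execute(query)]
conn.close()

print(passed_loop, passed_sql)
```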


18 https://en.wikipedia.org/wiki/InformationHiding
19 https://en.wikipedia.com/wiki/Polymorphism
20 https://en.wikipedia.com/wiki/DeclarativeProgramming

Compiled vs. Interpreted Standard Programming Language


Garbage Collection Type System
Object Oriented Features Functional Programming Support
Multi-threading / Concurrency Pointer Arithmetic
Design by Contract Regular Expressions
Language Integration Built-in Security
Market Share / Adoption

1.5.8 Reactive Programming


ˆ https://en.wikipedia.org/wiki/Reactive_programming

1.5.9 Others
The list continues and includes, but is not limited to:
ˆ Automata based programming

ˆ Logic

ˆ Symbolic

Further Reading http://cs.lmu.edu/~ray/notes/paradigms/

1.6 General Characteristics


ˆ What are the features ?

ˆ Are they equivalent ?

ˆ Do we care about all of them ?

Selected Features
1.6.1 Compiled vs. Interpreted
ˆ Compiled, implementations are typically compilers (translators that generate machine
code from source code), and not interpreters

ˆ Interpreted, step-by-step executors of source code, where no pre-runtime translation
takes place.

ˆ Compiled then Interpreted
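CPython, the reference implementation of Python, is one example of the "compiled then interpreted" case: source code is first translated to bytecode, and the bytecode is then executed by a virtual machine. The two steps can be made visible with the built-in functions compile and exec. This is a small sketch; other Python implementations may work differently.

```python
source = "x = 2 + 3"

# Step 1: translate the source text into a code object (bytecode).
code_object = compile(source, "<demo>", "exec")

# Step 2: let the interpreter execute the bytecode.
namespace = {}
exec(code_object, namespace)

result = namespace["x"]
bytecode_size = len(code_object.co_code)
print(result, bytecode_size)
```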

1.6.2 Standardized Programming Languages


ˆ ANSI / ISO Standard

ˆ Important ?

ˆ Different Implementations for the same Programming Language



1.6.3 Garbage Collection


ˆ Form of automatic memory management.

ˆ The garbage collector attempts to reclaim garbage, or memory occupied by objects
that are no longer in use by the program.

ˆ Invented by John McCarthy to simplify manual memory management in Lisp.

Strategies include 21

ˆ Tracing, strategy consists of determining which objects should be garbage collected by
tracing which objects are reachable by a chain of references from certain root objects,
and considering the rest as garbage and collecting them.

ˆ Reference Counting, each object has a count of the number of references to it.
Garbage is identified by having a reference count of zero.

ˆ Escape Analysis, used to convert heap allocations to stack allocations, thus reducing
the amount of work needed to be done by the garbage collector. This is done using a
compile-time analysis.

ˆ Mark and Sweep, http://www.geeksforgeeks.org/mark-and-sweep-garbage-collection-algorith

ˆ Generational, http://wiki.c2.com/?GenerationalGarbageCollection
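CPython combines two of these strategies: reference counting for most objects, plus a tracing collector for reference cycles. The sketch below observes both with the standard gc and weakref modules. The class Node is invented for the example, and the behavior shown is specific to CPython; other implementations may reclaim objects at different times.

```python
import gc
import weakref


class Node:
    def __init__(self, label):
        self.label = label
        self.partner = None


# Reference counting: the object is reclaimed once its last reference is gone.
solo = Node("solo")
solo_ref = weakref.ref(solo)      # a weak reference does not keep it alive
del solo
solo_reclaimed = solo_ref() is None

# A cycle: two objects refer to each other, so their counts never reach zero.
was_enabled = gc.isenabled()
gc.disable()                      # keep the demonstration deterministic
left, right = Node("left"), Node("right")
left.partner, right.partner = right, left
left_ref = weakref.ref(left)
del left, right
cycle_alive_before = left_ref() is not None

gc.collect()                      # the tracing collector finds the cycle
cycle_alive_after = left_ref() is not None
if was_enabled:
    gc.enable()

print(solo_reclaimed, cycle_alive_before, cycle_alive_after)
```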

1.6.4 Type System


Type System Classification
From 22

ˆ Static vs. Dynamic

ˆ Strong vs. Weak

Static vs. Dynamic Type Checking


ˆ about when type information is acquired

 Static, variable types are checked at compile-time, (should) remain the same, and
require a well-defined type system (variables adhere to restrictions). Type errors
are caught before the program runs; other kinds of run-time error remain possible.

 Dynamic, the type of a variable may change, and no specific type system is required.
Type checking happens at run-time.

Strong vs. Weak Typed


ˆ about how strictly types are distinguished.

 Strong, Programming Language raises errors when data types are not compatible.

 Weak, Programming Language tries to do implicit conversions.
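Python is commonly described as dynamically and strongly typed, which makes it a handy example of how the two axes are independent. In the sketch below the same name refers to values of different types over time (dynamic), yet adding an integer to a string is rejected instead of being converted silently (strong).

```python
# Dynamic: the same variable can refer to values of different types.
value = 42
kinds = [type(value).__name__]
value = "forty-two"
kinds.append(type(value).__name__)

# Strong: incompatible types raise an error; there is no implicit conversion.
try:
    total = 1 + "1"
except TypeError as error:
    outcome = f"TypeError: {error}"
else:
    outcome = f"no error: {total}"

# The conversion has to be requested explicitly.
explicit = 1 + int("1")

print(kinds, outcome, explicit)
```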


21 https://en.wikipedia.com/wiki/GarbageCollection
22 https://stackoverflow.com/questions/2351190/static-dynamic-vs-strong-weak

1.6.5 Object Oriented Programming Features Support


Access Control
Ability for a module's implementation to remain hidden behind its public interface

Generic Classes
ˆ aka Parametric Type

ˆ ex. Stack Class Parameterized with what it Contains

ˆ Allows statically typed languages to retain their compile-time type safety yet remain
nearly as flexible as dynamically typed languages.

ˆ Dynamically typed languages support generic programming inherently
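The Stack example mentioned above can be sketched with Python's standard typing module. Python itself does not enforce the type parameter at run-time; the annotations are meant for a static checker (mypy is one such tool), so this is only an approximation of what a statically typed language provides.

```python
from typing import Generic, List, TypeVar

T = TypeVar("T")  # the type of the elements the stack contains


class Stack(Generic[T]):
    """A stack parameterized with the type of what it contains."""

    def __init__(self) -> None:
        self._items: List[T] = []

    def push(self, item: T) -> None:
        self._items.append(item)

    def pop(self) -> T:
        return self._items.pop()

    def is_empty(self) -> bool:
        return not self._items


numbers: Stack[int] = Stack()
numbers.push(1)
numbers.push(2)
top = numbers.pop()
print(top, numbers.is_empty())
```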

Inheritance
ˆ Multiple Inheritance

ˆ Prototypal Inheritance, objects inherit from objects.

ˆ http://javascript.crockford.com/prototypal.html

Feature Renaming
ˆ Attribute / Method

ˆ Provide a feature with a more natural name for its new context

ˆ Resolve naming ambiguities when a name is inherited from multiple inheritance paths

Operator Overloading - Polymorphism


ˆ Define an operator (such as + or *) for user-defined types.
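In Python, operators are defined for a user-defined type by writing special methods such as __add__ and __mul__. The class Vector below is a hypothetical example.

```python
class Vector:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __add__(self, other):
        """Called for: vector + vector."""
        return Vector(self.x + other.x, self.y + other.y)

    def __mul__(self, scalar):
        """Called for: vector * number."""
        return Vector(self.x * scalar, self.y * scalar)

    def __eq__(self, other):
        return isinstance(other, Vector) and (self.x, self.y) == (other.x, other.y)

    def __repr__(self):
        return f"Vector({self.x}, {self.y})"


total = Vector(1, 2) + Vector(3, 4)
scaled = Vector(1, 2) * 3
print(total, scaled)
```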

Uniform Access
ˆ All services offered by a module should be available through a uniform notation

ˆ Does not betray whether they are implemented through storage or through computation
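Python's built-in property decorator is one way to get uniform access: a stored attribute and a computed one are read with the same notation. The class Rectangle is made up for this sketch.

```python
class Rectangle:
    def __init__(self, width, height):
        self.width = width        # implemented through storage
        self.height = height

    @property
    def area(self):               # implemented through computation
        return self.width * self.height


box = Rectangle(3, 4)
stored = box.width    # plain attribute
computed = box.area   # same notation, no parentheses needed
print(stored, computed)
```

A client of Rectangle cannot tell from the notation which of the two values is stored and which is computed.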

Class Variables / Methods


ˆ Class variables and methods are owned by a class

ˆ and Not by any particular instance of a class

ˆ This means that, for however many instances of a class exist at any given point in
time, only one copy of each class variable/method exists and is shared by every
instance of the class
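A short Python sketch of a class variable and a class method follows. The class Student and its counter are hypothetical; the point is that count lives in the class, so every instance sees the same value.

```python
class Student:
    count = 0                     # class variable: one copy, owned by the class

    def __init__(self, name):
        self.name = name          # instance variable: one copy per object
        Student.count += 1

    @classmethod
    def how_many(cls):            # class method: works without any instance
        return cls.count


first = Student("Mona")
second = Student("Omar")
print(Student.how_many(), first.count, second.count)
```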

Reflection
ˆ Ability for a program to determine and manipulate various pieces of information about
an object at run-time.

ˆ Most object oriented programming languages support some form of reflection.

ˆ This includes ability to determine

 Object type

 Object inheritance structure

 Object methods, including number and types of parameters, and return types

 Object attributes names and types (optional)

Introspection vs. Reflection

ˆ Type Introspection is the ability of a program to examine the type or properties of
an object at runtime.

ˆ Reflection, ability for a program to manipulate the values, meta-data, properties
and/or functions of an object at runtime 23.
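The difference can be illustrated with Python built-ins: type, hasattr and vars only examine an object, while setattr and getattr let a program change it or call its methods by name. The class Robot is invented for the example.

```python
class Robot:
    def __init__(self):
        self.name = "R2"

    def greet(self):
        return f"Hello from {self.name}"


robot = Robot()

# Introspection: examine the object at run-time.
type_name = type(robot).__name__
has_greet = hasattr(robot, "greet")
attributes = sorted(vars(robot))

# Reflection: manipulate the object at run-time, using names held in strings.
setattr(robot, "name", "C3")
method = getattr(robot, "greet")
greeting = method()

print(type_name, has_greet, attributes, greeting)
```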

Object Oriented Programming Language


ˆ Pure Object Oriented Programming Languages

ˆ Hybrid Object Oriented Programming Languages

ˆ Otherwise (Non Object Oriented)

1.6.6 Functional Programming Features Support


Higher Order Functions
ˆ Functions can be treated as if they were data objects

 can be bound to variables

 including the ability to be stored in collections

 can be passed to other functions as parameters

 can be returned as the result of other functions

Lexical Closures
ˆ Bundling up the lexical (static) scope surrounding the function with the function itself

ˆ Function carries its surrounding environment around with it wherever it may be used
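Both ideas can be shown in Python, where functions are ordinary objects. In this sketch make_counter is a hypothetical function that returns another function; the returned function keeps access to the variable count from its surrounding scope, which is what makes it a closure.

```python
def make_counter(start=0):
    count = start                 # lives in the enclosing (lexical) scope

    def increment():
        nonlocal count            # refers to count in make_counter
        count += 1
        return count

    return increment              # a function returned as a result


counter_a = make_counter()        # bound to a variable
counter_b = make_counter(10)      # carries its own separate environment
values = [counter_a(), counter_a(), counter_b()]

# Functions can also be stored in collections and passed as parameters.
operations = {"double": lambda n: n * 2, "negate": lambda n: -n}
applied = [operations[key](5) for key in ("double", "negate")]

print(values, applied)
```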

1.6.7 Multithreading / Concurrency


ˆ Multithreading, ability for a single process to process two or more tasks concurrently

ˆ Concurrency, the decomposability property of a program, algorithm, or problem
into order-independent or partially-ordered components or units 24
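A minimal multithreading sketch with Python's standard threading module follows. Four threads run inside one process and update a shared counter; the lock makes each update safe no matter how the threads are interleaved. The function work and the numbers used are arbitrary choices for the example.

```python
import threading

counter = 0
lock = threading.Lock()


def work(times):
    """Task run by each thread: add to the shared counter."""
    global counter
    for _ in range(times):
        with lock:                # only one thread updates at a time
            counter += 1


threads = [threading.Thread(target=work, args=(1000,)) for _ in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()                 # wait until every task has finished

print(counter)
```

The order in which the threads run is not fixed, but the final total is the same in every run, which is the order-independence described above.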
23 https://stackoverflow.com/questions/25198271/what-is-the-difference-between-introspection-and-reflection
24 https://en.wikipedia.com/wiki/Concurrency

1.6.8 Pointer Arithmetic


Ability for a language to directly manipulate memory addresses and their contents

1.6.9 Design by Contract


ˆ Ability to incorporate important aspects of a specification into the software that is
implementing it.

ˆ Important features are:

 Pre-conditions, conditions that must be true before a method is invoked

 Post-conditions, conditions guaranteed to be true after the invocation of a
method

 Invariant, conditions guaranteed to be true at any stable point during the lifetime
of an object
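Python has no built-in Design by Contract support, but the three kinds of condition can be approximated with assert statements. The class Account below is a made-up example. Keep in mind that assertions are skipped when Python runs with the -O option, so this is a sketch of the idea and not a robust checking mechanism.

```python
class Account:
    def __init__(self, balance=0):
        self.balance = balance
        self._check_invariant()

    def _check_invariant(self):
        # Invariant: true at every stable point in the object's lifetime.
        assert self.balance >= 0, "invariant violated"

    def withdraw(self, amount):
        # Pre-condition: must be true before the method does its work.
        assert 0 < amount <= self.balance, "pre-condition violated"
        old_balance = self.balance

        self.balance -= amount

        # Post-condition: guaranteed to be true after the method has run.
        assert self.balance == old_balance - amount, "post-condition violated"
        self._check_invariant()
        return self.balance


account = Account(100)
remaining = account.withdraw(30)

rejected = None
try:
    account.withdraw(500)         # breaks the pre-condition
except AssertionError as error:
    rejected = str(error)

print(remaining, rejected)
```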

1.6.10 Regular Expression


Pattern matching constructs capable of recognizing the class of languages known as regular
languages 25
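As a small illustration, Python's standard re module can match a pattern for course codes such as CS211P. The pattern itself (two capital letters, three digits, an optional capital letter) is an assumption made for this example, not an official format.

```python
import re

# Two capital letters, three digits, then an optional capital letter.
course_code = re.compile(r"[A-Z]{2}\d{3}[A-Z]?")

# fullmatch: does the whole string belong to the language of the pattern?
checks = {text: bool(course_code.fullmatch(text))
          for text in ("CS211P", "CS21", "cs211")}

# findall: locate every match inside a longer text.
found = course_code.findall("Take CS211P after CS111 and MA101.")

print(checks, found)
```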

1.6.11 Language Integration


Seamless integration with other programming languages

1.6.12 Built-In Security


ˆ Programming language's ability to determine whether or not a piece of code comes
from a trusted source (such as the user's hard disk), limiting the permissions of the
code if it does not

1.7 Market Share / Adoption / Penetration


Interactive: The 2018 Top Programming Languages - IEEE Spectrum
https://spectrum.ieee.org/static/interactive-the-top-programming-languages-2018

TIOBE Index
https://www.tiobe.com/tiobe-index/

1.8 Final Thoughts


1.8.1 Programming Languages Philosophy
Philosophy
Programming Languages that are Optimized for

25 https://en.wikipedia.com/wiki/RegularLanguages

ˆ Concurrency

ˆ Readability

ˆ Overcoming what their designers see as shortcomings in other languages (mainly C/C++)

ˆ Other

1.8.2 What Experts Think ?


Are All PLs the Same?
https://www.coursera.org/learn/programming-languages/lecture/fbcb7/are-all-pls-the-same

1.8.3 What Shall I Do ?


Teach Yourself Programming in 10 Years
http://norvig.com/21-days.html

Experts' Opinion in Programming Languages Comparison


ˆ https://www.quora.com/What-are-the-best-programming-languages-to-learn-today

ˆ https://www.quora.com/What-programming-languages-should-a-modern-day-programmer-have-in
Chapter 2

Variables, expressions and statements

One of the most powerful features of a programming language is the ability to manipulate
variables. A variable is a name that refers to a value.

2.1 Assignment statements


An assignment statement creates a new variable and gives it a value:

>>> message = 'And now for something completely different'
>>> n = 17
>>> pi = 3.1415926535897932

This example makes three assignments. The first assigns a string to a new variable named
message; the second gives the integer 17 to n; the third assigns the (approximate) value of
π to pi.

2.2 Variable names


Programmers generally choose names for their variables that are meaningful; they document
what the variable is used for.
Variable names can be as long as you like. They can contain both letters and numbers,
but they can't begin with a number. It is legal to use uppercase letters, but it is conventional
to use only lower case for variable names.
The underscore character, _, can appear in a name. It is often used in names with
multiple words, such as your_name or airspeed_of_unladen_swallow.
If you give a variable an illegal name, you get a syntax error:

>>> 76trombones = 'big parade'
SyntaxError: invalid syntax
>>> more@ = 1000000
SyntaxError: invalid syntax
>>> class = 'Advanced Theoretical Zymurgy'
SyntaxError: invalid syntax

76trombones is illegal because it begins with a number. more@ is illegal because it contains
an illegal character, @. But what's wrong with class?
It turns out that class is one of Python's keywords. The interpreter uses keywords to
recognize the structure of the program, and they cannot be used as variable names.
Python 3 has these keywords:

False      class      finally    is         return
None       continue   for        lambda     try
True       def        from       nonlocal   while
and        del        global     not        with
as         elif       if         or         yield
assert     else       import     pass
break      except     in         raise

Python 3.7 and later also reserve async and await. You don't have to memorize this list.
In most development environments, keywords are displayed in a different color; if you try
to use one as a variable name, you'll know.
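If you would rather ask Python than rely on colors, the standard keyword module and the string method isidentifier can check a name for you. The list of candidate names below reuses the examples from this section; the variable report is just a name chosen for the sketch.

```python
import keyword

candidates = ['message', 'class', 'your_name', '76trombones', 'more@']

report = {}
for name in candidates:
    if keyword.iskeyword(name):
        report[name] = 'keyword'      # reserved by Python
    elif name.isidentifier():
        report[name] = 'legal'        # fine as a variable name
    else:
        report[name] = 'illegal'      # bad first character or bad symbol

for name, verdict in report.items():
    print(name, verdict)
```

The full list of keywords for the Python version you are running is available as keyword.kwlist.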

2.3 Expressions and statements


An expression is a combination of values, variables, and operators. A value all by itself is
considered an expression, and so is a variable, so the following are all legal expressions:

>>> 42
42
>>> n
17
>>> n + 25
42

When you type an expression at the prompt, the interpreter evaluates it, which means
that it finds the value of the expression. In this example, n has the value 17 and n + 25
has the value 42.
A statement is a unit of code that has an effect, like creating a variable or displaying
a value.

>>> n = 17
>>> print(n)

The first line is an assignment statement that gives a value to n. The second line is a print
statement that displays the value of n.
When you type a statement, the interpreter executes it, which means that it does
whatever the statement says. In general, statements don't have values.

2.4 Script mode


So far we have run Python in interactive mode, which means that you interact directly
with the interpreter. Interactive mode is a good way to get started, but if you are working
with more than a few lines of code, it can be clumsy.
The alternative is to save code in a file called a script and then run the interpreter in
script mode to execute the script. By convention, Python scripts have names that end
with .py.
Because Python provides both modes, you can test bits of code in interactive mode
before you put them in a script. But there are differences between interactive mode and
script mode that can be confusing.
For example, if you are using Python as a calculator, you might type

>>> miles = 26.2
>>> miles * 1.61
42.182

The first line assigns a value to miles, but it has no visible effect. The second line is
an expression, so the interpreter evaluates it and displays the result. It turns out that a
marathon is about 42 kilometers.
But if you type the same code into a script and run it, you get no output at all. In script
mode an expression, all by itself, has no visible effect. Python evaluates the expression, but
it doesn't display the result. To display the result, you need a print statement like this:

miles = 26.2
print(miles * 1.61)
This behavior can be confusing at first. To check your understanding, type the following
statements in the Python interpreter and see what they do:

5
x = 5
x + 1
Now put the same statements in a script and run it. What is the output? Modify the
script by transforming each expression into a print statement and then run it again.

2.5 Order of operations


When an expression contains more than one operator, the order of evaluation depends
on the order of operations. For mathematical operators, Python follows mathematical
convention. The acronym PEMDAS is a useful way to remember the rules:
ˆ Parentheses have the highest precedence and can be used to force an expression to
evaluate in the order you want. Since expressions in parentheses are evaluated first,
2 * (3-1) is 4, and (1+1)**(5-2) is 8. You can also use parentheses to make an
expression easier to read, as in (minute * 100) / 60, even if it doesn't change the
result.

ˆ Exponentiation has the next highest precedence, so 1 + 2**3 is 9, not 27, and 2 *
3**2 is 18, not 36.

ˆ Multiplication and Division have higher precedence than Addition and Subtraction.
So 2*3-1 is 5, not 4, and 6+4/2 is 8.0, not 5.0 (in Python 3, the / operator always
produces a float).

ˆ Operators with the same precedence are evaluated from left to right (except exponentiation).
So in the expression degrees / 2 * pi, the division happens first and the
result is multiplied by pi. To divide by 2π, you can use parentheses or write degrees
/ 2 / pi.
I don't work very hard to remember the precedence of operators. If I can't tell by looking
at the expression, I use parentheses to make it obvious.
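One way to check the rules above is to let Python evaluate the same expressions. The script below is a small sketch; degrees is given an arbitrary value of 180, and math.pi stands in for pi. The last entry in results shows the exception for exponentiation, which is evaluated from right to left.

```python
import math

degrees = 180

results = {
    '2 * (3-1)': 2 * (3 - 1),
    '(1+1)**(5-2)': (1 + 1) ** (5 - 2),
    '1 + 2**3': 1 + 2 ** 3,
    '2 * 3**2': 2 * 3 ** 2,
    '2*3-1': 2 * 3 - 1,
    '6+4/2': 6 + 4 / 2,
    '2**3**2': 2 ** 3 ** 2,       # same as 2 ** (3 ** 2)
}

# Left to right: this divides by 2 and then multiplies by pi.
half_turns = degrees / 2 * math.pi

# Two ways to divide by 2π.
by_parentheses = degrees / (2 * math.pi)
by_chaining = degrees / 2 / math.pi

for text, value in results.items():
    print(text, '=', value)
print(half_turns, by_parentheses, by_chaining)
```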

2.6 String operations


In general, you can't perform mathematical operations on strings, even if the strings look
like numbers, so the following are illegal:

'chinese'-'food' 'eggs'/'easy' 'third'*'a charm'


But there are two exceptions, + and *.
The + operator performs string concatenation, which means it joins the strings by
linking them end-to-end. For example:

>>> first = 'throat'
>>> second = 'warbler'
>>> first + second
'throatwarbler'
The * operator also works on strings; it performs repetition. For example, 'Spam'*3 is
'SpamSpamSpam'. If one of the values is a string, the other has to be an integer.
This use of + and * makes sense by analogy with addition and multiplication. Just as
4*3 is equivalent to 4+4+4, we expect 'Spam'*3 to be the same as 'Spam'+'Spam'+'Spam',
and it is. On the other hand, there is a significant way in which string concatenation and
repetition are different from integer addition and multiplication. Can you think of a property
that addition has that string concatenation does not?

2.7 Comments
As programs get bigger and more complicated, they get more difficult to read. Formal
languages are dense, and it is often difficult to look at a piece of code and figure out what
it is doing, or why.
For this reason, it is a good idea to add notes to your programs to explain in natural
language what the program is doing. These notes are called comments, and they start
with the # symbol:

# compute the percentage of the hour that has elapsed
percentage = (minute * 100) / 60
In this case, the comment appears on a line by itself. You can also put comments at the
end of a line:

percentage = (minute * 100) / 60     # percentage of an hour

Everything from the # to the end of the line is ignored; it has no effect on the execution of
the program.
Comments are most useful when they document non-obvious features of the code. It is
reasonable to assume that the reader can figure out what the code does; it is more useful to
explain why.
This comment is redundant with the code and useless:

v = 5 # assign 5 to v
This comment contains useful information that is not in the code:

v = 5 # velocity in meters/second.
Good variable names can reduce the need for comments, but long names can make complex
expressions hard to read, so there is a tradeoff.

2.8 Debugging
Three kinds of errors can occur in a program: syntax errors, runtime errors, and semantic
errors. It is useful to distinguish between them in order to track them down more quickly.

Syntax error: Syntax refers to the structure of a program and the rules about that
structure. For example, parentheses have to come in matching pairs, so (1 + 2) is
legal, but 8) is a syntax error.
If there is a syntax error anywhere in your program, Python displays an error message
and quits, and you will not be able to run the program. During the first few weeks of
your programming career, you might spend a lot of time tracking down syntax errors.
As you gain experience, you will make fewer errors and find them faster.

Runtime error: The second type of error is a runtime error, so called because the error
does not appear until after the program has started running. These errors are also
called exceptions because they usually indicate that something exceptional (and bad)
has happened.

Runtime errors are rare in the simple programs you will see in the first few chapters,
so it might be a while before you encounter one.

Semantic error: The third type of error is semantic, which means related to meaning.
If there is a semantic error in your program, it will run without generating error
messages, but it will not do the right thing. It will do something else. Specifically, it
will do what you told it to do.

Identifying semantic errors can be tricky because it requires you to work backward by
looking at the output of the program and trying to figure out what it is doing.
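Here is one small example of each kind of error. It is only a sketch: the syntax error is wrapped in a call to the built-in compile function so that the rest of the script can still run, and average_wrong is a made-up function with a deliberate mistake.

```python
# Syntax error: the code cannot even be parsed.
try:
    compile('8)', '<example>', 'eval')
    syntax_result = 'parsed'
except SyntaxError:
    syntax_result = 'syntax error'

# Runtime error (exception): the code parses, but fails while running.
try:
    1 / 0
    runtime_result = 'ran'
except ZeroDivisionError:
    runtime_result = 'runtime error'


# Semantic error: runs without any message, but does the wrong thing.
def average_wrong(a, b):
    return a + b / 2          # meant to be (a + b) / 2


def average(a, b):
    return (a + b) / 2


print(syntax_result, runtime_result, average_wrong(2, 4), average(2, 4))
```

Python reports the first two problems for you. The third one is found only by comparing the output with what you expected.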

2.9 Glossary
variable: A name that refers to a value.

assignment: A statement that assigns a value to a variable.

state diagram: A graphical representation of a set of variables and the values they refer
to.

keyword: A reserved word that is used to parse a program; you cannot use keywords like
if, def, and while as variable names.

operand: One of the values on which an operator operates.

expression: A combination of variables, operators, and values that represents a single
result.

evaluate: To simplify an expression by performing the operations in order to yield a single
value.

statement: A section of code that represents a command or action. So far, the statements
we have seen are assignments and print statements.

execute: To run a statement and do what it says.

interactive mode: A way of using the Python interpreter by typing code at the prompt.

script mode: A way of using the Python interpreter to read code from a script and run
it.

script: A program stored in a file.

order of operations: Rules governing the order in which expressions involving multiple
operators and operands are evaluated.

concatenate: To join two operands end-to-end.

comment: Information in a program that is meant for other programmers (or anyone
reading the source code) and has no effect on the execution of the program.

syntax error: An error in a program that makes it impossible to parse (and therefore
impossible to interpret).

exception: An error that is detected while the program is running.

semantics: The meaning of a program.

semantic error: An error in a program that makes it do something other than what the
programmer intended.
Chapter 3

Lists

This chapter presents one of Python's most useful built-in types, lists. You will also learn
more about objects and what can happen when you have more than one name for the same
object.

3.1 A list is a sequence


Like a string, a list is a sequence of values. In a string, the values are characters; in a list,
they can be any type. The values in a list are called elements or sometimes items.
There are several ways to create a new list; the simplest is to enclose the elements in
square brackets ([ and ]):
[10, 20, 30, 40]
['crunchy frog', 'ram bladder', 'lark vomit']
The first example is a list of four integers. The second is a list of three strings. The elements
of a list don't have to be the same type. The following list contains a string, a float, an
integer, and (lo!) another list:

['spam', 2.0, 5, [10, 20]]


A list within another list is nested.
A list that contains no elements is called an empty list; you can create one with empty
brackets, [].
As you might expect, you can assign list values to variables:

>>> cheeses = ['Cheddar', 'Edam', 'Gouda']
>>> numbers = [42, 123]
>>> empty = []
>>> print(cheeses, numbers, empty)
['Cheddar', 'Edam', 'Gouda'] [42, 123] []

3.2 Lists are mutable


The syntax for accessing the elements of a list is the same as for accessing the characters
of a string: the bracket operator. The expression inside the brackets specifies the index.
Remember that the indices start at 0:

[Figure 3.1: State diagram. cheeses, numbers and empty each refer to a list; in numbers,
the element at index 1 is reassigned from 123 to 5.]

>>> cheeses[0]
'Cheddar'

Unlike strings, lists are mutable. When the bracket operator appears on the left side of an
assignment, it identifies the element of the list that will be assigned.

>>> numbers = [42, 123]
>>> numbers[1] = 5
>>> numbers
[42, 5]

The one-eth element of numbers, which used to be 123, is now 5.


Figure 3.1 shows the state diagram for cheeses, numbers and empty:
Lists are represented by boxes with the word list outside and the elements of the list
inside. cheeses refers to a list with three elements indexed 0, 1 and 2. numbers contains
two elements; the diagram shows that the value of the second element has been reassigned
from 123 to 5. empty refers to a list with no elements.
List indices work the same way as string indices:

ˆ Any integer expression can be used as an index.

ˆ If you try to read or write an element that does not exist, you get an IndexError.

ˆ If an index has a negative value, it counts backward from the end of the list.
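The three rules above can be tried out in a short script. This is only a sketch that reuses the cheeses list; the names second, last and problem are chosen for the example.

```python
cheeses = ['Cheddar', 'Edam', 'Gouda']

# Any integer expression can be used as an index.
i = 0
second = cheeses[i + 1]

# A negative index counts backward from the end of the list.
last = cheeses[-1]

# Reading an element that does not exist raises an IndexError.
try:
    cheeses[3]
    problem = None
except IndexError as error:
    problem = str(error)

print(second, last, problem)
```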

The in operator also works on lists.

>>> cheeses = ['Cheddar', 'Edam', 'Gouda']
>>> 'Edam' in cheeses
True
>>> 'Brie' in cheeses
False

3.3 Traversing a list


The most common way to traverse the elements of a list is with a for loop. The syntax is
the same as for strings:

for cheese in cheeses:
    print(cheese)

This works well if you only need to read the elements of the list. But if you want to write
or update the elements, you need the indices. A common way to do that is to combine the
built-in functions range and len:

for i in range(len(numbers)):
    numbers[i] = numbers[i] * 2

This loop traverses the list and updates each element. len returns the number of elements
in the list. range returns a sequence of indices from 0 to n − 1, where n is the length of the list
(in Python 3 this is a range object, not a list).
Each time through the loop i gets the index of the next element. The assignment statement
in the body uses i to read the old value of the element and to assign the new value.
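To see the indices that range produces, you can convert the result to a list. The sketch below also shows the built-in function enumerate, which is another common way to get the index and the element at the same time; the numbers used are arbitrary.

```python
numbers = [42, 123, 7]

# The indices produced by range(len(numbers)).
indices = list(range(len(numbers)))

# Update each element using its index.
for i in range(len(numbers)):
    numbers[i] = numbers[i] * 2
doubled = list(numbers)

# enumerate yields (index, element) pairs.
for i, value in enumerate(numbers):
    numbers[i] = value + 1

print(indices, doubled, numbers)
```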
A for loop over an empty list never runs the body:

for x in []:
    print('This never happens.')

Although a list can contain another list, the nested list still counts as a single element. The
length of this list is four:

['spam', 1, ['Brie', 'Roquefort', 'Pol le Veq'], [1, 2, 3]]

3.4 List operations


The + operator concatenates lists:

>>> a = [1, 2, 3]
>>> b = [4, 5, 6]
>>> c = a + b
>>> c
[1, 2, 3, 4, 5, 6]

The * operator repeats a list a given number of times:

>>> [0] * 4
[0, 0, 0, 0]
>>> [1, 2, 3] * 3
[1, 2, 3, 1, 2, 3, 1, 2, 3]

The first example repeats [0] four times. The second example repeats the list [1, 2, 3]
three times.

3.5 List slices


The slice operator also works on lists:

>>> t = ['a', 'b', 'c', 'd', 'e', 'f']
>>> t[1:3]
['b', 'c']
>>> t[:4]
['a', 'b', 'c', 'd']
>>> t[3:]
['d', 'e', 'f']

If you omit the first index, the slice starts at the beginning. If you omit the second, the
slice goes to the end. So if you omit both, the slice is a copy of the whole list.

>>> t[:]
['a', 'b', 'c', 'd', 'e', 'f']
Since lists are mutable, it is often useful to make a copy before performing operations that
modify lists.
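The following sketch shows why the copy matters. The names other_name and copy are made up for the example: other_name is a second name for the same list, while copy is a new list made with a slice.

```python
t = ['a', 'b', 'c']

other_name = t     # a second name for the same list object
copy = t[:]        # a new list with the same elements

t[0] = 'z'         # modify the original list

print(t, other_name, copy)
```

The change shows up through other_name, because both names refer to one list, but copy keeps the old elements.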
A slice operator on the left side of an assignment can update multiple elements:

>>> t = ['a', 'b', 'c', 'd', 'e', 'f']
>>> t[1:3] = ['x', 'y']
>>> t
['a', 'x', 'y', 'd', 'e', 'f']

3.6 List methods


Python provides methods that operate on lists. For example, append adds a new element
to the end of a list:

>>> t = ['a', 'b', 'c']
>>> t.append('d')
>>> t
['a', 'b', 'c', 'd']
extend takes a list as an argument and appends all of the elements:

>>> t1 = ['a', 'b', 'c']
>>> t2 = ['d', 'e']
>>> t1.extend(t2)
>>> t1
['a', 'b', 'c', 'd', 'e']

This example leaves t2 unmodified.
sort arranges the elements of the list from low to high:

>>> t = ['d', 'c', 'e', 'b', 'a']
>>> t.sort()
>>> t
['a', 'b', 'c', 'd', 'e']
Most list methods are void; they modify the list and return None. If you accidentally write
t = t.sort(), you will be disappointed with the result.
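Here is what the mistake looks like, next to one way around it. The built-in function sorted returns a new sorted list and leaves the original alone; the variable names are chosen for the example.

```python
t = ['d', 'c', 'e', 'b', 'a']
result = t.sort()        # sort modifies t and returns None

u = ['d', 'c', 'e']
ordered = sorted(u)      # sorted returns a new list; u is unchanged

print(result, t, u, ordered)
```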

3.7 Map, filter and reduce


To add up all the numbers in a list, you can use a loop like this:

def add_all(t):
    total = 0
    for x in t:
        total += x
    return total

total is initialized to 0. Each time through the loop, x gets one element from the list.
The += operator provides a short way to update a variable. This augmented assignment
statement,
total += x

is equivalent to

total = total + x

As the loop runs, total accumulates the sum of the elements; a variable used this way is
sometimes called an accumulator.
Adding up the elements of a list is such a common operation that Python provides it as
a built-in function, sum:

>>> t = [1, 2, 3]
>>> sum(t)
6

An operation like this that combines a sequence of elements into a single value is sometimes
called reduce.
Sometimes you want to traverse one list while building another. For example, the following
function takes a list of strings and returns a new list that contains capitalized strings:

def capitalize_all(t):
    res = []
    for s in t:
        res.append(s.capitalize())
    return res

res is initialized with an empty list; each time through the loop, we append the next element.
So res is another kind of accumulator.
An operation like capitalize_all is sometimes called a map because it maps a function
(in this case the method capitalize) onto each of the elements in a sequence.
Another common operation is to select some of the elements from a list and return a
sublist. For example, the following function takes a list of strings and returns a list that
contains only the uppercase strings:

def only_upper(t):
    res = []
    for s in t:
        if s.isupper():
            res.append(s)
    return res

isupper is a string method that returns True if the string contains only upper case letters.
An operation like only_upper is called a filter because it selects some of the elements
and filters out the others.
Most common list operations can be expressed as a combination of map, filter and reduce.
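Python also provides these three patterns ready-made: map and filter are built-in functions, and reduce lives in the standard functools module. The sketch below repeats the examples from this section with them; the list words is made up for the illustration.

```python
from functools import reduce

words = ['spam', 'EGGS', 'Ham', 'TOAST']

# map: apply a function to every element.
capitalized = list(map(str.capitalize, words))

# filter: keep only the elements for which the function returns True.
upper_only = list(filter(str.isupper, words))

# reduce: combine the elements into a single value.
total = reduce(lambda acc, x: acc + x, [1, 2, 3], 0)

print(capitalized, upper_only, total)
```

In Python 3, map and filter return iterators, which is why the results are passed to list before being printed.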

3.8 Deleting elements


There are several ways to delete elements from a list. If you know the index of the element
you want, you can use pop:
>>> t = ['a', 'b', 'c']
>>> x = t.pop(1)
>>> t
['a', 'c']
>>> x
'b'
pop modifies the list and returns the element that was removed. If you don't provide an
index, it deletes and returns the last element.
If you don't need the removed value, you can use the del operator:

>>> t = ['a', 'b', 'c']
>>> del t[1]
>>> t
['a', 'c']
If you know the element you want to remove (but not the index), you can use remove:
>>> t = ['a', 'b', 'c']
>>> t.remove('b')
>>> t
['a', 'c']
The return value from remove is None.
To remove more than one element, you can use del with a slice index:

>>> t = ['a', 'b', 'c', 'd', 'e', 'f']
>>> del t[1:5]
>>> t
['a', 'f']
As usual, the slice selects all the elements up to but not including the second index.

3.9 Lists and strings


A string is a sequence of characters and a list is a sequence of values, but a list of characters
is not the same as a string. To convert from a string to a list of characters, you can use
list:
>>> s = 'spam'
>>> t = list(s)
>>> t
['s', 'p', 'a', 'm']

Figure 3.2: State diagram (left: a and b refer to two separate 'banana' objects; right: a and b refer to the same 'banana' object).

Because list is the name of a built-in function, you should avoid using it as a variable
name. I also avoid l because it looks too much like 1. So that's why I use t.
The list function breaks a string into individual letters. If you want to break a string
into words, you can use the split method:

>>> s = 'pining for the fjords'
>>> t = s.split()
>>> t
['pining', 'for', 'the', 'fjords']

An optional argument called a delimiter specifies which characters to use as word
boundaries. The following example uses a hyphen as a delimiter:

>>> s = 'spam-spam-spam'
>>> delimiter = '-'
>>> t = s.split(delimiter)
>>> t
['spam', 'spam', 'spam']

join is the inverse of split. It takes a list of strings and concatenates the elements. join is
a string method, so you have to invoke it on the delimiter and pass the list as a parameter:

>>> t = ['pining', 'for', 'the', 'fjords']
>>> delimiter = ' '
>>> s = delimiter.join(t)
>>> s
'pining for the fjords'

In this case the delimiter is a space character, so join puts a space between words. To
concatenate strings without spaces, you can use the empty string, '', as a delimiter.
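A quick way to convince yourself that join undoes split is to make the round trip. This sketch reuses the string from the examples; the variable names are arbitrary.

```python
s = 'pining for the fjords'

words = s.split()             # split on whitespace
rejoined = ' '.join(words)    # a space between the words
squashed = ''.join(words)     # empty delimiter: no separator at all
dashed = '-'.join(words)      # any string can serve as the delimiter

print(rejoined)
print(squashed)
print(dashed)
```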

3.10 Objects and values


If we run these assignment statements:

a = 'banana'
b = 'banana'

We know that a and b both refer to a string, but we don't know whether they refer to the
same string. There are two possible states, shown in Figure 3.2.
In one case, a and b refer to two different objects that have the same value. In the
second case, they refer to the same object.
To check whether two variables refer to the same object, you can use the is operator.

Figure 3.3: State diagram (a and b each refer to a separate list [1, 2, 3]).

Figure 3.4: State diagram (a and b both refer to the same list [1, 2, 3]).

>>> a = 'banana'
>>> b = 'banana'
>>> a is b
True

In this example, Python only created one string object, and both a and b refer to it. But
when you create two lists, you get two objects:

>>> a = [1, 2, 3]
>>> b = [1, 2, 3]
>>> a is b
False

So the state diagram looks like Figure 3.3.


In this case we would say that the two lists are equivalent, because they have the
same elements, but not identical, because they are not the same object. If two objects
are identical, they are also equivalent, but if they are equivalent, they are not necessarily
identical.
Until now, we have been using object and value interchangeably, but it is more precise
to say that an object has a value. If you evaluate [1, 2, 3], you get a list object whose
value is a sequence of integers. If another list has the same elements, we say it has the same
value, but it is not the same object.
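The difference between equivalent and identical corresponds to the difference between the == operator and the is operator. Here is a minimal sketch; the variable names are chosen just for this example.

```python
a = [1, 2, 3]
b = [1, 2, 3]

same_value = (a == b)     # equivalent: the elements are equal
same_object = (a is b)    # identical: only if both names refer to one object

print(same_value, same_object)
```

In most programs == is the comparison you want; is answers a narrower question about object identity.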

3.11 Aliasing
If a refers to an object and you assign b = a, then both variables refer to the same object:

>>> a = [1, 2, 3]
>>> b = a
>>> b is a
True

The state diagram looks like Figure 3.4.


The association of a variable with an object is called a reference. In this example, there
are two references to the same object.
An object with more than one reference has more than one name, so we say that the
object is aliased.
If the aliased object is mutable, changes made with one alias affect the other:

Figure 3.5: Stack diagram (the variable letters in __main__ and the parameter t in delete_head both refer to the same list, with elements 0: 'a', 1: 'b', 2: 'c').

>>> b[0] = 42
>>> a
[42, 2, 3]

Although this behavior can be useful, it is error-prone. In general, it is safer to avoid aliasing
when you are working with mutable objects.
For immutable objects like strings, aliasing is not as much of a problem. In this example:

a = 'banana'
b = 'banana'

It almost never makes a difference whether a and b refer to the same string or not.

3.12 List arguments


When you pass a list to a function, the function gets a reference to the list. If the function
modifies the list, the caller sees the change. For example, delete_head removes the first
element from a list:

def delete_head(t):
    del t[0]

Here's how it is used:

>>> letters = ['a', 'b', 'c']
>>> delete_head(letters)
>>> letters
['b', 'c']

The parameter t and the variable letters are aliases for the same object. The stack
diagram looks like Figure 3.5.
Since the list is shared by two frames, I drew it between them.
It is important to distinguish between operations that modify lists and operations that
create new lists. For example, the append method modifies a list, but the + operator creates
a new list.
Here's an example using append:
>>> t1 = [1, 2]
>>> t2 = t1.append(3)
>>> t1
[1, 2, 3]
>>> print(t2)
None

The return value from append is None.


Here's an example using the + operator:
>>> t3 = t1 + [4]
>>> t1
[1, 2, 3]
>>> t3
[1, 2, 3, 4]
The result of the operator is a new list, and the original list is unchanged.
This difference is important when you write functions that are supposed to modify lists.
For example, this function does not delete the head of a list:
def bad_delete_head(t):
    t = t[1:]              # WRONG!
The slice operator creates a new list and the assignment makes t refer to it, but that doesn't
affect the caller.

>>> t4 = [1, 2, 3]
>>> bad_delete_head(t4)
>>> t4
[1, 2, 3]
At the beginning of bad_delete_head, t and t4 refer to the same list. At the end, t refers
to a new list, but t4 still refers to the original, unmodified list.
An alternative is to write a function that creates and returns a new list. For example,
tail returns all but the first element of a list:

def tail(t):
    return t[1:]

This function leaves the original list unmodified. Here's how it is used:

>>> letters = ['a', 'b', 'c']
>>> rest = tail(letters)
>>> rest
['b', 'c']
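To see the two styles next to each other, the following sketch calls tail first and delete_head second on the same list. The functions are the ones defined in this section; the names rest and is_new are only for illustration.

```python
def delete_head(t):
    """Modify t in place by removing its first element."""
    del t[0]

def tail(t):
    """Return a new list with all but the first element; t is unchanged."""
    return t[1:]

letters = ['a', 'b', 'c']

rest = tail(letters)            # letters still has three elements here
is_new = rest is not letters    # tail built a separate list object
length_after_tail = len(letters)

delete_head(letters)            # now letters itself has been changed

print(rest, letters, is_new)
```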

3.13 Debugging
Careless use of lists (and other mutable objects) can lead to long hours of debugging. Here
are some common pitfalls and ways to avoid them:

1. Most list methods modify the argument and return None. This is the opposite of the
string methods, which return a new string and leave the original alone.

If you are used to writing string code like this:

word = word.strip()

It is tempting to write list code like this:

t = t.sort() # WRONG!

Because sort returns None, the next operation you perform with t is likely to fail.

Before using list methods and operators, you should read the documentation carefully
and then test them in interactive mode.

2. Pick an idiom and stick with it.

Part of the problem with lists is that there are too many ways to do things. For
example, to remove an element from a list, you can use pop, remove, del, or even a
slice assignment.

To add an element, you can use the append method or the + operator. Assuming that
t is a list and x is a list element, these are correct:

t.append(x)
t = t + [x]
t += [x]

And these are wrong:

t.append([x]) # WRONG!
t = t.append(x) # WRONG!
t + [x] # WRONG!
t = t + x # WRONG!

Try out each of these examples in interactive mode to make sure you understand what
they do. Notice that only the last one causes a runtime error; the other three are legal,
but they do the wrong thing.

3. Make copies to avoid aliasing.

If you want to use a method like sort that modifies the argument, but you need to
keep the original list as well, you can make a copy.

>>> t = [3, 1, 2]
>>> t2 = t[:]
>>> t2.sort()
>>> t
[3, 1, 2]
>>> t2
[1, 2, 3]

In this example you could also use the built-in function sorted, which returns a new,
sorted list and leaves the original alone.

>>> t2 = sorted(t)
>>> t
[3, 1, 2]
>>> t2
[1, 2, 3]
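The four statements marked WRONG in the second pitfall can be tried safely by running each one on its own copy of the list. This is only a sketch; the names a, b, c, d and result are made up, and the try statement is there so the last case does not stop the program.

```python
t = [1, 2]
x = 3

a = t[:]                # work on copies so each case starts from [1, 2]
a.append([x])           # adds a nested list, not the element itself

b = t[:]
result = b.append(x)    # b is modified, but result is None

c = t[:]
c + [x]                 # builds a new list and throws it away; c is unchanged

try:
    d = t + x           # a list and an int cannot be concatenated
except TypeError:
    d = None

print(a, b, result, c, d)
```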

3.14 Glossary
list: A sequence of values.

element: One of the values in a list (or other sequence), also called items.

nested list: A list that is an element of another list.

accumulator: A variable used in a loop to add up or accumulate a result.

augmented assignment: A statement that updates the value of a variable using an op-
erator like +=.
reduce: A processing pattern that traverses a sequence and accumulates the elements into
a single result.

map: A processing pattern that traverses a sequence and performs an operation on each
element.

filter: A processing pattern that traverses a list and selects the elements that satisfy some
criterion.

object: Something a variable can refer to. An object has a type and a value.

equivalent: Having the same value.

identical: Being the same object (which implies equivalence).

reference: The association between a variable and its value.

aliasing: A circumstance where two or more variables refer to the same object.

delimiter: A character or string used to indicate where a string should be split.


Chapter 4

Tuples

This chapter presents one more built-in type, the tuple, and then shows how lists, dictionar-
ies, and tuples work together. I also present a useful feature for variable-length argument
lists, the gather and scatter operators.
One note: there is no consensus on how to pronounce tuple. Some people say tuh-ple,
which rhymes with supple. But in the context of programming, most people say too-ple,
which rhymes with quadruple.

4.1 Tuples are immutable


A tuple is a sequence of values. The values can be any type, and they are indexed by
integers, so in that respect tuples are a lot like lists. The important difference is that tuples
are immutable.
Syntactically, a tuple is a comma-separated list of values:

>>> t = 'a', 'b', 'c', 'd', 'e'

Although it is not necessary, it is common to enclose tuples in parentheses:

>>> t = ('a', 'b', 'c', 'd', 'e')

To create a tuple with a single element, you have to include a final comma:

>>> t1 = 'a',
>>> type(t1)
<class 'tuple'>

A value in parentheses is not a tuple:

>>> t2 = ('a')
>>> type(t2)
<class 'str'>

Another way to create a tuple is the built-in function tuple. With no argument, it creates
an empty tuple:

>>> t = tuple()
>>> t
()

If the argument is a sequence (string, list or tuple), the result is a tuple with the elements
of the sequence:

>>> t = tuple('lupins')
>>> t
('l', 'u', 'p', 'i', 'n', 's')

Because tuple is the name of a built-in function, you should avoid using it as a variable
name.
Most list operators also work on tuples. The bracket operator indexes an element:

>>> t = ('a', 'b', 'c', 'd', 'e')
>>> t[0]
'a'

And the slice operator selects a range of elements.

>>> t[1:3]
('b', 'c')

But if you try to modify one of the elements of the tuple, you get an error:

>>> t[0] = 'A'
TypeError: 'tuple' object does not support item assignment

Because tuples are immutable, you can't modify the elements. But you can replace one
tuple with another:

>>> t = ('A',) + t[1:]
>>> t
('A', 'b', 'c', 'd', 'e')

This statement makes a new tuple and then makes t refer to it.
The relational operators work with tuples and other sequences; Python starts by comparing
the first element from each sequence. If they are equal, it goes on to the next elements,
and so on, until it finds elements that differ. Subsequent elements are not considered (even
if they are really big).

>>> (0, 1, 2) < (0, 3, 4)
True
>>> (0, 1, 2000000) < (0, 3, 4)
True
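One practical consequence of this comparison rule is that sorting a list of tuples orders them by the first element and breaks ties with the second. The sketch below uses that to sort words by length; the word list is made up for the example.

```python
first_differs = (0, 1, 2000000) < (0, 3, 4)   # decided at index 1
prefix_smaller = (1, 2) < (1, 2, 0)           # a proper prefix compares as smaller

words = ['pear', 'fig', 'apple', 'kiwi']
pairs = []
for w in words:
    pairs.append((len(w), w))   # (length, word) tuples

pairs.sort()                    # by length first, then alphabetically
print(pairs)
```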

4.2 Tuple assignment


It is often useful to swap the values of two variables. With conventional assignments, you
have to use a temporary variable. For example, to swap a and b:

>>> temp = a
>>> a = b
>>> b = temp

This solution is cumbersome; tuple assignment is more elegant:


>>> a, b = b, a

The left side is a tuple of variables; the right side is a tuple of expressions. Each value is
assigned to its respective variable. All the expressions on the right side are evaluated before
any of the assignments.
The number of variables on the left and the number of values on the right have to be
the same:

>>> a, b = 1, 2, 3
ValueError: too many values to unpack (expected 2)

More generally, the right side can be any kind of sequence (string, list or tuple). For example,
to split an email address into a user name and a domain, you could write:

>>> addr = 'monty@python.org'
>>> uname, domain = addr.split('@')

The return value from split is a list with two elements; the first element is assigned to
uname, the second to domain.

>>> uname
'monty'
>>> domain
'python.org'
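The pieces of this section fit together in a few lines. This sketch swaps two variables, unpacks the result of split, and shows what happens when the counts do not match; the flag name unpacked is just for illustration.

```python
a, b = 1, 2
a, b = b, a      # the right side is evaluated first, so no temp is needed

addr = 'monty@python.org'
uname, domain = addr.split('@')

# three values cannot be unpacked into two variables
try:
    x, y = 1, 2, 3
    unpacked = True
except ValueError:
    unpacked = False

print(a, b, uname, domain, unpacked)
```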

4.3 Tuples as return values


Strictly speaking, a function can only return one value, but if the value is a tuple, the effect
is the same as returning multiple values. For example, if you want to divide two integers
and compute the quotient and remainder, it is inefficient to compute x//y and then x%y. It
is better to compute them both at the same time.
The built-in function divmod takes two arguments and returns a tuple of two values, the
quotient and remainder. You can store the result as a tuple:

>>> t = divmod(7, 3)
>>> t
(2, 1)

Or use tuple assignment to store the elements separately:

>>> quot, rem = divmod(7, 3)
>>> quot
2
>>> rem
1

Here is an example of a function that returns a tuple:

def min_max(t):
    return min(t), max(t)

max and min are built-in functions that find the largest and smallest elements of a sequence.
min_max computes both and returns a tuple of two values.
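Here is a short sketch of how min_max might be called, once with tuple assignment and once keeping the result as a single tuple. The sample arguments are made up.

```python
def min_max(t):
    # the comma builds a tuple, so one return value carries both results
    return min(t), max(t)

low, high = min_max([3, 1, 4, 1, 5])   # unpack into two variables
result = min_max('banana')             # or keep the tuple whole

print(low, high, result)
```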

4.4 Variable-length argument tuples


Functions can take a variable number of arguments. A parameter name that begins with *
gathers arguments into a tuple. For example, printall takes any number of arguments
and prints them:

def printall(*args):
    print(args)

The gather parameter can have any name you like, but args is conventional. Here's how
the function works:

>>> printall(1, 2.0, '3')
(1, 2.0, '3')

The complement of gather is scatter. If you have a sequence of values and you want to pass
it to a function as multiple arguments, you can use the * operator. For example, divmod
takes exactly two arguments; it doesn't work with a tuple:

>>> t = (7, 3)
>>> divmod(t)
TypeError: divmod expected 2 arguments, got 1

But if you scatter the tuple, it works:

>>> divmod(*t)
(2, 1)

Many of the built-in functions use variable-length argument tuples. For example, max and
min can take any number of arguments:

>>> max(1, 2, 3)
3

But sum does not.

>>> sum(1, 2, 3)
TypeError: sum expected at most 2 arguments, got 3

As an exercise, write a function called sum_all that takes any number of arguments and
returns their sum.
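Gather and scatter can be used with the same function. The sketch below uses a made-up function, largest_gap, rather than sum_all, so the exercise is still yours to solve.

```python
def largest_gap(*args):
    """Hypothetical example: difference between the largest and smallest argument."""
    # args is a tuple holding every positional argument
    return max(args) - min(args)

gap = largest_gap(4, 9, 1)      # gather: three arguments become one tuple

values = [10, 2, 7]
gap2 = largest_gap(*values)     # scatter: one list becomes three arguments

print(gap, gap2)
```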

4.5 Lists and tuples


zip is a built-in function that takes two or more sequences and interleaves them. The name
of the function refers to a zipper, which interleaves two rows of teeth.
This example zips a string and a list:

>>> s = 'abc'
>>> t = [0, 1, 2]
>>> zip(s, t)
<zip object at 0x7f7d0a9e7c48>

The result is a zip object that knows how to iterate through the pairs. The most common
use of zip is in a for loop:

>>> for pair in zip(s, t):
...     print(pair)
...
('a', 0)
('b', 1)
('c', 2)

A zip object is a kind of iterator, which is any object that iterates through a sequence.
Iterators are similar to lists in some ways, but unlike lists, you can't use an index to select
an element from an iterator.
If you want to use list operators and methods, you can use a zip object to make a list:

>>> list(zip(s, t))
[('a', 0), ('b', 1), ('c', 2)]

The result is a list of tuples; in this example, each tuple contains a character from the string
and the corresponding element from the list.
If the sequences are not the same length, the result has the length of the shorter one.

>>> list(zip('Anne', 'Elk'))
[('A', 'E'), ('n', 'l'), ('n', 'k')]

You can use tuple assignment in a for loop to traverse a list of tuples:

t = [('a', 0), ('b', 1), ('c', 2)]
for letter, number in t:
    print(number, letter)

Each time through the loop, Python selects the next tuple in the list and assigns the elements
to letter and number. The output of this loop is:

0 a
1 b
2 c

If you combine zip, for and tuple assignment, you get a useful idiom for traversing two (or
more) sequences at the same time. For example, has_match takes two sequences, t1 and
t2, and returns True if there is an index i such that t1[i] == t2[i]:

def has_match(t1, t2):
    for x, y in zip(t1, t2):
        if x == y:
            return True
    return False
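A few sample calls show how has_match behaves, including the case where the sequences have different lengths. The arguments are made up for this sketch.

```python
def has_match(t1, t2):
    # zip pairs up elements that share the same index
    for x, y in zip(t1, t2):
        if x == y:
            return True
    return False

same_spot = has_match('abc', 'xbz')           # 'b' appears at index 1 in both
shifted = has_match([1, 2, 3], [3, 1, 2])     # same values, different indices
uneven = has_match('abc', 'xyzc')             # zip stops after three pairs

print(same_spot, shifted, uneven)
```

Because zip stops at the end of the shorter sequence, the final 'c' in the last call is never compared.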

If you need to traverse the elements of a sequence and their indices, you can use the built-in
function enumerate:

for index, element in enumerate('abc'):
    print(index, element)

The result from enumerate is an enumerate object, which iterates a sequence of pairs; each
pair contains an index (starting from 0) and an element from the given sequence. In this
example, the output is

0 a
1 b
2 c
Again.
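enumerate is handy when you need to remember where something was found. As a small sketch, the made-up function find_all collects every index at which a letter appears in a word.

```python
def find_all(word, letter):
    """Hypothetical helper: list of indices where letter occurs in word."""
    positions = []
    for index, element in enumerate(word):
        if element == letter:
            positions.append(index)
    return positions

spots = find_all('banana', 'a')
print(spots)
```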

4.6 Dictionaries and tuples


Dictionaries have a method called items that returns a sequence of tuples, where each tuple
is a key-value pair.

>>> d = {'a':0, 'b':1, 'c':2}
>>> t = d.items()
>>> t
dict_items([('a', 0), ('b', 1), ('c', 2)])
The result is a dict_items object, a view that you can iterate to get the key-value pairs.
You can use it in a for loop like this:
>>> for key, value in d.items():
...     print(key, value)
...
a 0
b 1
c 2
Since Python 3.7, the items come out in the order the keys were inserted; in earlier
versions they were in no particular order.
Going in the other direction, you can use a list of tuples to initialize a new dictionary:

>>> t = [('a', 0), ('c', 2), ('b', 1)]
>>> d = dict(t)
>>> d
{'a': 0, 'c': 2, 'b': 1}
Combining dict with zip yields a concise way to create a dictionary:

>>> d = dict(zip('abc', range(3)))
>>> d
{'a': 0, 'b': 1, 'c': 2}
The dictionary method update also takes a list of tuples and adds them, as key-value pairs,
to an existing dictionary.
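As a brief sketch of update, the code below builds a dictionary with dict and zip, then passes update a list of tuples. One pair adds a new key and the other replaces an existing value; the data is made up.

```python
d = dict(zip('abc', range(3)))     # 'a' -> 0, 'b' -> 1, 'c' -> 2

# each tuple is treated as a (key, value) pair
d.update([('d', 3), ('a', 10)])

pairs = list(d.items())            # back to a list of tuples
print(d)
```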
It is common to use tuples as keys in dictionaries (primarily because you can't use lists).
For example, a telephone directory might map from last-name, first-name pairs to telephone
numbers. Assuming that we have defined last, first and number, we could write:

directory[last, first] = number
The expression in brackets is a tuple. We could use tuple assignment to traverse this
dictionary.

Figure 4.1: State diagram (a tuple with elements 0: 'Cleese' and 1: 'John').

Figure 4.2: State diagram (a dict that maps the tuples ('Cleese', 'John'), ('Chapman', 'Graham'), ('Idle', 'Eric'), ('Gilliam', 'Terry'), ('Jones', 'Terry') and ('Palin', 'Michael') each to '08700 100 222').

for last, first in directory:
    print(first, last, directory[last,first])

This loop traverses the keys in directory, which are tuples. It assigns the elements of each
tuple to last and first, then prints the name and corresponding telephone number.
There are two ways to represent tuples in a state diagram. The more detailed ver-
sion shows the indices and elements just as they appear in a list. For example, the tuple
('Cleese', 'John') would appear as in Figure 4.1.
But in a larger diagram you might want to leave out the details. For example, a diagram
of the telephone directory might appear as in Figure 4.2.
Here the tuples are shown using Python syntax as a graphical shorthand. The telephone
number in the diagram is the complaints line for the BBC, so please don't call it.
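Putting the directory example together, this sketch fills a small dictionary with tuple keys, traverses it with tuple assignment, and confirms that a list is rejected as a key. It collects the output lines in a list named lines, which is an addition made for this illustration.

```python
directory = {}
directory['Cleese', 'John'] = '08700 100 222'     # the key is a tuple
directory[('Idle', 'Eric')] = '08700 100 222'     # parentheses are optional

lines = []
for last, first in directory:
    lines.append(first + ' ' + last + ': ' + directory[last, first])

# lists are mutable, so they cannot be used as keys
try:
    directory[['Palin', 'Michael']] = '08700 100 222'
    list_key_ok = True
except TypeError:
    list_key_ok = False

for line in lines:
    print(line)
```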

4.7 Sequences of sequences


I have focused on lists of tuples, but almost all of the examples in this chapter also work
with lists of lists, tuples of tuples, and tuples of lists. To avoid enumerating the possible
combinations, it is sometimes easier to talk about sequences of sequences.
In many contexts, the different kinds of sequences (strings, lists and tuples) can be used
interchangeably. So how should you choose one over the others?
To start with the obvious, strings are more limited than other sequences because the
elements have to be characters. They are also immutable. If you need the ability to change
the characters in a string (as opposed to creating a new string), you might want to use a
list of characters instead.
Lists are more common than tuples, mostly because they are mutable. But there are a
few cases where you might prefer tuples:

1. In some contexts, like a return statement, it is syntactically simpler to create a tuple
than a list.

2. If you want to use a sequence as a dictionary key, you have to use an immutable type
like a tuple or string.

3. If you are passing a sequence as an argument to a function, using tuples reduces the
potential for unexpected behavior due to aliasing.

Because tuples are immutable, they don't provide methods like sort and reverse, which
modify existing lists. But Python provides the built-in function sorted, which takes any
sequence and returns a new list with the same elements in sorted order, and reversed,
which takes a sequence and returns an iterator that traverses the list in reverse order.
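For instance, here is a minimal sketch of sorted and reversed applied to a tuple. Neither changes the tuple; the variable names are arbitrary.

```python
t = (3, 1, 2)

s = sorted(t)             # a new list with the elements in order
st = tuple(sorted(t))     # convert back if you need a tuple
r = tuple(reversed(t))    # reversed gives an iterator; tuple() collects it

print(t, s, st, r)
```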

4.8 Debugging
Lists, dictionaries and tuples are examples of data structures; in this chapter we are
starting to see compound data structures, like lists of tuples, or dictionaries that contain
tuples as keys and lists as values. Compound data structures are useful, but they are prone
to what I call shape errors; that is, errors caused when a data structure has the wrong
type, size, or structure. For example, if you are expecting a list with one integer and I give
you a plain old integer (not in a list), it won't work.
To help debug these kinds of errors, I have written a module called structshape that
provides a function, also called structshape, that takes any kind of data structure as
an argument and returns a string that summarizes its shape. You can download it from
http://thinkpython2.com/code/structshape.py
Here's the result for a simple list:

>>> from structshape import structshape
>>> t = [1, 2, 3]
>>> structshape(t)
'list of 3 int'
A fancier program might write "list of 3 ints", but it was easier not to deal with plurals.
Here's a list of lists:

>>> t2 = [[1,2], [3,4], [5,6]]
>>> structshape(t2)
'list of 3 list of 2 int'
If the elements of the list are not the same type, structshape groups them, in order, by
type:

>>> t3 = [1, 2, 3, 4.0, '5', '6', [7], [8], 9]
>>> structshape(t3)
'list of (3 int, float, 2 str, 2 list of int, int)'
Here's a list of tuples:

>>> s = 'abc'
>>> lt = list(zip(t, s))
>>> structshape(lt)
'list of 3 tuple of (int, str)'
And here's a dictionary with 3 items that map integers to strings.

>>> d = dict(lt)
>>> structshape(d)
'dict of 3 int->str'
If you are having trouble keeping track of your data structures, structshape can help.

4.9 Glossary
tuple: An immutable sequence of elements.

tuple assignment: An assignment with a sequence on the right side and a tuple of variables
on the left. The right side is evaluated and then its elements are assigned to the
variables on the left.

gather: An operation that collects multiple arguments into a tuple.

scatter: An operation that makes a sequence behave like multiple arguments.

zip object: The result of calling a built-in function zip; an object that iterates through a
sequence of tuples.

iterator: An object that can iterate through a sequence, but which does not provide list
operators and methods.

data structure: A collection of related values, often organized in lists, dictionaries, tuples,
etc.

shape error: An error caused because a value has the wrong shape; that is, the wrong
type or size.
Chapter 5

Functions

In the context of programming, a function is a named sequence of statements that performs
a computation. When you define a function, you specify the name and the sequence of
statements. Later, you can call the function by name.

5.1 Function calls


We have already seen one example of a function call:
>>> type(42)
<class 'int'>

The name of the function is type. The expression in parentheses is called the argument
of the function. The result, for this function, is the type of the argument.
It is common to say that a function takes an argument and returns a result. The
result is also called the return value.
Python provides functions that convert values from one type to another. The int func-
tion takes any value and converts it to an integer, if it can, or complains otherwise:

>>> int('32')
32
>>> int('Hello')
ValueError: invalid literal for int() with base 10: 'Hello'

int can convert floating-point values to integers, but it doesn't round off; it chops off the
fraction part:

>>> int(3.99999)
3
>>> int(-2.3)
-2

float converts integers and strings to floating-point numbers:

>>> float(32)
32.0
>>> float('3.14159')
3.14159

Finally, str converts its argument to a string:

>>> str(32)
'32'
>>> str(3.14159)
'3.14159'
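The conversion functions can be combined in a small sketch. The helper to_int_or_none is made up for this example and uses a try statement to catch the ValueError that int raises; the call to round is included only to contrast rounding with the chopping that int does.

```python
def to_int_or_none(text):
    """Hypothetical helper: convert text to an int, or return None if it can't."""
    try:
        return int(text)
    except ValueError:
        return None

a = to_int_or_none('32')
b = to_int_or_none('Hello')

chopped = int(3.99999)        # int discards the fraction part
negative = int(-2.3)          # chopping moves toward zero
rounded = round(3.99999)      # round goes to the nearest integer
text = str(float('32'))       # string -> float -> string

print(a, b, chopped, negative, rounded, text)
```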

5.2 Math functions


Python has a math module that provides most of the familiar mathematical functions. A
module is a file that contains a collection of related functions.
Before we can use the functions in a module, we have to import it with an import
statement:

>>> import math

This statement creates a module object named math. If you display the module object,
you get some information about it:

>>> math
<module 'math' (built-in)>

The module object contains the functions and variables defined in the module. To access one
of the functions, you have to specify the name of the module and the name of the function,
separated by a dot (also known as a period). This format is called dot notation.

>>> ratio = signal_power / noise_power
>>> decibels = 10 * math.log10(ratio)

>>> radians = 0.7
>>> height = math.sin(radians)

The first example uses math.log10 to compute a signal-to-noise ratio in decibels (assuming
that signal_power and noise_power are defined). The math module also provides log,
which computes logarithms base e.
The second example finds the sine of radians. The variable name radians is a hint
that sin and the other trigonometric functions (cos, tan, etc.) take arguments in radians.
To convert from degrees to radians, divide by 180 and multiply by π:

>>> degrees = 45
>>> radians = degrees / 180.0 * math.pi
>>> math.sin(radians)
0.707106781187

The expression math.pi gets the variable pi from the math module. Its value is a floating-point
approximation of π, accurate to about 15 digits.
If you know trigonometry, you can check the previous result by comparing it to the
square root of two divided by two:

>>> math.sqrt(2) / 2.0
0.707106781187
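Because floating-point results can differ in the last digit, comparing them by eye is fragile. As a sketch, the check can be done with math.isclose, and the manual conversion can be compared with the library function math.radians. Neither function is used elsewhere in this chapter; they are brought in only for this illustration.

```python
import math

degrees = 45
radians = degrees / 180.0 * math.pi      # manual conversion

# math.radians performs the same conversion
same_angle = math.isclose(radians, math.radians(degrees))

# sin(45 degrees) should agree with sqrt(2) / 2 up to rounding error
agrees = math.isclose(math.sin(radians), math.sqrt(2) / 2.0)

print(same_angle, agrees)
```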

5.3 Composition
So far, we have looked at the elements of a program (variables, expressions, and statements)
in isolation, without talking about how to combine them.
One of the most useful features of programming languages is their ability to take small
building blocks and compose them. For example, the argument of a function can be any
kind of expression, including arithmetic operators:

x = math.sin(degrees / 360.0 * 2 * math.pi)

And even function calls:

x = math.exp(math.log(x+1))

Almost anywhere you can put a value, you can put an arbitrary expression, with one ex-
ception: the left side of an assignment statement has to be a variable name. Any other
expression on the left side is a syntax error (we will see exceptions to this rule later).

>>> minutes = hours * 60                 # right
>>> hours * 60 = minutes # wrong!
SyntaxError: can't assign to operator

5.4 Adding new functions


So far, we have only been using the functions that come with Python, but it is also possible
to add new functions. A function definition specifies the name of a new function and the
sequence of statements that run when the function is called.
Here is an example:

def print_lyrics():
    print("I'm a lumberjack, and I'm okay.")
    print("I sleep all night and I work all day.")

def is a keyword that indicates that this is a function definition. The name of the function
is print_lyrics. The rules for function names are the same as for variable names: letters,
numbers and underscore are legal, but the first character can't be a number. You can't use
a keyword as the name of a function, and you should avoid having a variable and a function
with the same name.
The empty parentheses after the name indicate that this function doesn't take any arguments.
The first line of the function definition is called the header; the rest is called the body.
The header has to end with a colon and the body has to be indented. By convention,
indentation is always four spaces. The body can contain any number of statements.
The strings in the print statements are enclosed in double quotes. Single quotes and
double quotes do the same thing; most people use single quotes except in cases like this
where a single quote (which is also an apostrophe) appears in the string.
All quotation marks (single and double) must be "straight quotes", usually located next
to Enter on the keyboard. “Curly quotes”, like the ones in this sentence, are not legal in
Python.
If you type a function definition in interactive mode, the interpreter prints dots (...)
to let you know that the definition isn't complete:

>>> def print_lyrics():
...     print("I'm a lumberjack, and I'm okay.")
...     print("I sleep all night and I work all day.")
...

To end the function, you have to enter an empty line.
Defining a function creates a function object, which has type function:

>>> print(print_lyrics)
<function print_lyrics at 0xb7e99e9c>
>>> type(print_lyrics)
<class 'function'>

The syntax for calling the new function is the same as for built-in functions:

>>> print_lyrics()
I'm a lumberjack, and I'm okay.
I sleep all night and I work all day.

Once you have defined a function, you can use it inside another function. For example, to
repeat the previous refrain, we could write a function called repeat_lyrics:

def repeat_lyrics():
    print_lyrics()
    print_lyrics()

And then call repeat_lyrics:

>>> repeat_lyrics()
I'm a lumberjack, and I'm okay.
I sleep all night and I work all day.
I'm a lumberjack, and I'm okay.
I sleep all night and I work all day.

But that's not really how the song goes.

5.5 Definitions and uses


Pulling together the code fragments from the previous section, the whole program looks like
this:

def print_lyrics():
    print("I'm a lumberjack, and I'm okay.")
    print("I sleep all night and I work all day.")

def repeat_lyrics():
    print_lyrics()
    print_lyrics()

repeat_lyrics()

This program contains two function definitions: print_lyrics and repeat_lyrics. Function
definitions get executed just like other statements, but the effect is to create function
objects. The statements inside the function do not run until the function is called, and the
function definition generates no output.
As you might expect, you have to create a function before you can run it. In other words,
the function definition has to run before the function gets called.
As an exercise, move the last line of this program to the top, so the function call appears
before the definitions. Run the program and see what error message you get.
Now move the function call back to the bottom and move the definition of print_lyrics
after the definition of repeat_lyrics. What happens when you run this program?
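If you would like to explore the ordering rule in a program that does not stop at the first error, here is one possible sketch. The functions greeting and repeat_greeting and the list calls are made up; they stand in for print_lyrics and repeat_lyrics, and the try statement records the failure instead of ending the program.

```python
calls = []

def repeat_greeting():
    # the name greeting is looked up when this body runs,
    # not when the def statement is executed
    greeting()
    greeting()

try:
    repeat_greeting()            # greeting has not been defined yet
    early_call_worked = True
except NameError:
    early_call_worked = False

def greeting():
    calls.append('hello')

repeat_greeting()                # by now both definitions have run

print(early_call_worked, calls)
```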

5.6 Flow of execution


To ensure that a function is defined before its first use, you have to know the order statements
run in, which is called the flow of execution.
Execution always begins at the first statement of the program. Statements are run one
at a time, in order from top to bottom.
Function definitions do not alter the flow of execution of the program, but remember
that statements inside the function don't run until the function is called.
A function call is like a detour in the flow of execution. Instead of going to the next
statement, the flow jumps to the body of the function, runs the statements there, and then
comes back to pick up where it left off.
That sounds simple enough, until you remember that one function can call another.
While in the middle of one function, the program might have to run the statements in
another function. Then, while running that new function, the program might have to run
yet another function!
Fortunately, Python is good at keeping track of where it is, so each time a function
completes, the program picks up where it left off in the function that called it. When it gets
to the end of the program, it terminates.
In summary, when you read a program, you don't always want to read from top to
bottom. Sometimes it makes more sense if you follow the flow of execution.
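One way to watch the flow of execution is to record a message at each step. In this sketch the functions first and second and the list log are made up; the order of the messages in log is the order in which the statements actually ran.

```python
log = []

def first():
    log.append('first starts')
    second()                     # detour into second
    log.append('first ends')     # execution resumes here afterwards

def second():
    log.append('second runs')

# the two definitions above produce no log entries
log.append('program starts')
first()
log.append('program ends')

for entry in log:
    print(entry)
```

Reading the source from top to bottom, 'second runs' comes after 'first ends', but in the flow of execution it comes before.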

5.7 Parameters and arguments


Some of the functions we have seen require arguments. For example, when you call math.sin
you pass a number as an argument. Some functions take more than one argument: math.pow
takes two, the base and the exponent.
Inside the function, the arguments are assigned to variables called parameters. Here
is a definition for a function that takes an argument:

def print_twice(bruce):
    print(bruce)
    print(bruce)
This function assigns the argument to a parameter named bruce. When the function is
called, it prints the value of the parameter (whatever it is) twice.
This function works with any value that can be printed.

>>> print_twice('Spam')
Spam
Spam
>>> print_twice(42)
42
42
>>> print_twice(math.pi)
3.141592653589793
3.141592653589793
The same rules of composition that apply to built-in functions also apply to programmer-
defined functions, so we can use any kind of expression as an argument for print_twice:

>>> print_twice('Spam '*4)
Spam Spam Spam Spam
Spam Spam Spam Spam
>>> print_twice(math.cos(math.pi))
-1.0
-1.0
The argument is evaluated before the function is called, so in the examples the expressions
'Spam '*4 and math.cos(math.pi) are only evaluated once.
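To check that the argument really is evaluated only once, we can pass a call to a function that keeps count of how often it runs. This is a sketch under stated assumptions: loud_value and the list calls are hypothetical names introduced just for this demonstration.

```python
calls = []  # hypothetical log of how many times the argument expression runs

def loud_value():
    """Hypothetical helper: records each call and returns a string."""
    calls.append('evaluated')
    return 'Spam'

def print_twice(bruce):
    print(bruce)
    print(bruce)

# loud_value() runs once, before print_twice starts; its result is assigned to bruce.
print_twice(loud_value())
print(len(calls))
```

The string is printed twice, but the log holds a single entry, because the function receives the value of the expression and not the expression itself.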
You can also use a variable as an argument:

>>> michael = 'Eric, the half a bee.'
>>> print_twice(michael)
Eric, the half a bee.
Eric, the half a bee.
The name of the variable we pass as an argument (michael) has nothing to do with the
name of the parameter (bruce). It doesn't matter what the value was called back home (in
the caller); here in print_twice, we call everybody bruce.

5.8 Variables and parameters are local


When you create a variable inside a function, it is local, which means that it only exists
inside the function. For example:

def cat_twice(part1, part2):
    cat = part1 + part2
    print_twice(cat)
This function takes two arguments, concatenates them, and prints the result twice. Here is
an example that uses it:

>>> line1 = 'Bing tiddle '
>>> line2 = 'tiddle bang.'
>>> cat_twice(line1, line2)
Bing tiddle tiddle bang.
Bing tiddle tiddle bang.
When cat_twice terminates, the variable cat is destroyed. If we try to print it, we get an
exception:

>>> print(cat)
NameError: name 'cat' is not defined
Parameters are also local. For example, outside print_twice, there is no such thing as
bruce.
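The same behavior can be checked in a script by catching the exception. The sketch below is a variation on cat_twice, not the original: it returns the concatenated string (an assumption made only so the result can be inspected) and then shows that the local name cat is unknown outside the function.

```python
def cat_twice(part1, part2):
    cat = part1 + part2      # cat exists only while cat_twice is running
    return cat               # returned here only so the sketch can inspect it

result = cat_twice('Bing tiddle ', 'tiddle bang.')

try:
    cat                      # the local name is gone once the call has finished
except NameError as err:
    message = str(err)

print(result)
print(message)
```

The value survives because it was handed back to the caller; the name cat does not.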

__main__       line1 -> 'Bing tiddle '
               line2 -> 'tiddle bang.'

cat_twice      part1 -> 'Bing tiddle '
               part2 -> 'tiddle bang.'
               cat   -> 'Bing tiddle tiddle bang.'

print_twice    bruce -> 'Bing tiddle tiddle bang.'

Figure 5.1: Stack diagram.

5.9 Stack diagrams


To keep track of which variables can be used where, it is sometimes useful to draw a stack
diagram. Like state diagrams, stack diagrams show the value of each variable, but they
also show the function each variable belongs to.
Each function is represented by a frame. A frame is a box with the name of a function
beside it and the parameters and variables of the function inside it. The stack diagram for
the previous example is shown in Figure 5.1.
The frames are arranged in a stack that indicates which function called which, and so
on. In this example, print_twice was called by cat_twice, and cat_twice was called
by __main__, which is a special name for the topmost frame. When you create a variable
outside of any function, it belongs to __main__.
Each parameter refers to the same value as its corresponding argument. So, part1 has
the same value as line1, part2 has the same value as line2, and bruce has the same value
as cat.
If an error occurs during a function call, Python prints the name of the function, the
name of the function that called it, and the name of the function that called that, all the
way back to __main__.
For example, if you try to access cat from within print_twice, you get a NameError.
Python 3 labels the topmost frame <module> in its report:

Traceback (most recent call last):
  File "test.py", line 13, in <module>
    cat_twice(line1, line2)
  File "test.py", line 5, in cat_twice
    print_twice(cat)
  File "test.py", line 9, in print_twice
    print(cat)
NameError: name 'cat' is not defined

This list of functions is called a traceback. It tells you what program file the error occurred
in, and what line, and what functions were executing at the time. It also shows the line of
code that caused the error.
The order of the functions in the traceback is the same as the order of the frames in the
stack diagram. The function that is currently running is at the bottom.
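The traceback module in the standard library can read the same information from inside a program. The sketch below is an illustration, not part of the original example: it makes the mistake on purpose, catches the NameError, and collects the function names in the order the traceback lists them.

```python
import traceback

def print_twice(bruce):
    print(cat)               # mistake on purpose: cat is local to cat_twice

def cat_twice(part1, part2):
    cat = part1 + part2
    print_twice(cat)

try:
    cat_twice('Bing tiddle ', 'tiddle bang.')
except NameError as err:
    # One entry per frame, outermost first, as in the printed traceback.
    frames = traceback.extract_tb(err.__traceback__)
    names = [frame.name for frame in frames]

print(names[-2:])
```

The last name in the list is the function that was running when the error occurred, which matches the bottom frame of the stack diagram.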

5.10 Fruitful functions and void functions


Some of the functions we have used, such as the math functions, return results; for lack of a
better name, I call them fruitful functions. Other functions, like print_twice, perform
an action but don't return a value. They are called void functions.
When you call a fruitful function, you almost always want to do something with the
result; for example, you might assign it to a variable or use it as part of an expression:

x = math.cos(radians)
golden = (math.sqrt(5) + 1) / 2

When you call a function in interactive mode, Python displays the result:

>>> math.sqrt(5)
2.23606797749979

But in a script, if you call a fruitful function all by itself, the return value is lost forever!

math.sqrt(5)

This script computes the square root of 5, but since it doesn't store or display the result, it
is not very useful.
Void functions might display something on the screen or have some other effect, but
they don't have a return value. If you assign the result to a variable, you get a special value
called None.

>>> result = print_twice('Bing')
Bing
Bing
>>> print(result)
None

The value None is not the same as the string 'None'. It is a special value that has its own
type:

>>> type(None)
<class 'NoneType'>

The functions we have written so far are all void. We will start writing fruitful functions in
a few chapters.
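As a small preview, the difference can be seen by writing both kinds of function side by side. This is only a sketch: circle_area and print_circle_area are made-up names, and the first one relies on the return statement, which this chapter has not covered yet.

```python
import math

def circle_area(radius):
    """Fruitful: computes a value and hands it back to the caller."""
    return math.pi * radius ** 2

def print_circle_area(radius):
    """Void: displays the value but has no return statement."""
    print(math.pi * radius ** 2)

area = circle_area(2)            # the result can be stored and reused
nothing = print_circle_area(2)   # prints the area, then evaluates to None

print(area > 12.56)
print(nothing is None)
```

Only the fruitful version leaves you with a number you can use in a later expression.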

5.11 Why functions?


It may not be clear why it is worth the trouble to divide a program into functions. There
are several reasons:

• Creating a new function gives you an opportunity to name a group of statements,
  which makes your program easier to read and debug.

• Functions can make a program smaller by eliminating repetitive code. Later, if you
  make a change, you only have to make it in one place.

• Dividing a long program into functions allows you to debug the parts one at a time
  and then assemble them into a working whole.

• Well-designed functions are often useful for many programs. Once you write and debug
  one, you can reuse it.

5.12 Debugging
One of the most important skills you will acquire is debugging. Although it can be frus-
trating, debugging is one of the most intellectually rich, challenging, and interesting parts
of programming.
In some ways debugging is like detective work. You are confronted with clues and you
have to infer the processes and events that led to the results you see.
Debugging is also like an experimental science. Once you have an idea about what
is going wrong, you modify your program and try again. If your hypothesis was correct,
you can predict the result of the modification, and you take a step closer to a working
program. If your hypothesis was wrong, you have to come up with a new one. As Sherlock
Holmes pointed out, "When you have eliminated the impossible, whatever remains, however
improbable, must be the truth." (A. Conan Doyle, The Sign of Four)
For some people, programming and debugging are the same thing. That is, programming
is the process of gradually debugging a program until it does what you want. The idea is
that you should start with a working program and make small modifications, debugging
them as you go.
For example, Linux is an operating system that contains millions of lines of code, but
it started out as a simple program Linus Torvalds used to explore the Intel 80386 chip.
According to Larry Greenfield, "One of Linus's earlier projects was a program that would
switch between printing AAAA and BBBB. This later evolved to Linux." (The Linux Users'
Guide Beta Version 1).

5.13 Glossary
function: A named sequence of statements that performs some useful operation. Functions
may or may not take arguments and may or may not produce a result.

function definition: A statement that creates a new function, specifying its name, pa-
rameters, and the statements it contains.

function object: A value created by a function definition. The name of the function is a
variable that refers to a function object.

header: The first line of a function definition.

body: The sequence of statements inside a function definition.

parameter: A name used inside a function to refer to the value passed as an argument.

function call: A statement that runs a function. It consists of the function name followed
by an argument list in parentheses.

argument: A value provided to a function when the function is called. This value is
assigned to the corresponding parameter in the function.

local variable: A variable defined inside a function. A local variable can only be used
inside its function.

return value: The result of a function. If a function call is used as an expression, the
return value is the value of the expression.

fruitful function: A function that returns a value.

void function: A function that always returns None.


None: A special value returned by void functions.

module: A file that contains a collection of related functions and other definitions.

import statement: A statement that reads a module file and creates a module object.

module object: A value created by an import statement that provides access to the values
defined in a module.

dot notation: The syntax for calling a function in another module by specifying the module
name followed by a dot (period) and the function name.

composition: Using an expression as part of a larger expression, or a statement as part of
a larger statement.

flow of execution: The order statements run in.

stack diagram: A graphical representation of a stack of functions, their variables, and the
values they refer to.

frame: A box in a stack diagram that represents a function call. It contains the local
variables and parameters of the function.

traceback: A list of the functions that are executing, printed when an exception occurs.
Chapter 6

Classes and objects

At this point you know how to use functions to organize code and built-in types to organize
data. The next step is to learn object-oriented programming, which uses programmer-
defined types to organize both code and data. Object-oriented programming is a big topic;
it will take a few chapters to get there.

6.1 Programmer-defined types

We have used many of Python's built-in types; now we are going to define a new type. As
an example, we will create a type called Point that represents a point in two-dimensional
space.
In mathematical notation, points are often written in parentheses with a comma sepa-
rating the coordinates. For example, (0, 0) represents the origin, and (x, y) represents the
point x units to the right and y units up from the origin.
There are several ways we might represent points in Python:

• We could store the coordinates separately in two variables, x and y.

• We could store the coordinates as elements in a list or tuple.

• We could create a new type to represent points as objects.

Creating a new type is more complicated than the other options, but it has advantages
that will be apparent soon.
A programmer-defined type is also called a class. A class definition looks like this:

class Point:
    """Represents a point in 2-D space."""
The header indicates that the new class is called Point. The body is a docstring that explains
what the class is for. You can define variables and methods inside a class definition, but we
will get back to that later.
Defining a class named Point creates a class object.
>>> Point
<class '__main__.Point'>
Because Point is defined at the top level, its "full name" is __main__.Point.
The class object is like a factory for creating objects. To create a Point, you call Point
as if it were a function.

blank -> Point
           x -> 3.0
           y -> 4.0

Figure 6.1: Object diagram.

>>> blank = Point()
>>> blank
<__main__.Point object at 0xb7e9d3ac>

The return value is a reference to a Point object, which we assign to blank.
Creating a new object is called instantiation, and the object is an instance of the
class.
When you print an instance, Python tells you what class it belongs to and where it is
stored in memory (the prefix 0x means that the following number is in hexadecimal).
Every object is an instance of some class, so "object" and "instance" are interchangeable.
But in this chapter I use "instance" to indicate that I am talking about a programmer-defined
type.
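A short script can confirm these relationships. This is a minimal sketch; the variable names first and second are chosen only for the illustration.

```python
class Point:
    """Represents a point in 2-D space."""

first = Point()    # each call to the class object builds a new instance
second = Point()

print(type(first) is Point)        # both objects belong to the same class
print(isinstance(second, Point))
print(first is second)             # but they are two separate objects
```

Calling the class twice gives two distinct instances of the same type.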

6.2 Attributes
You can assign values to an instance using dot notation:

>>> blank.x = 3.0
>>> blank.y = 4.0

This syntax is similar to the syntax for selecting a variable from a module, such as math.pi
or string.whitespace. In this case, though, we are assigning values to named elements of
an object. These elements are called attributes.
As a noun, "AT-trib-ute" is pronounced with emphasis on the first syllable, as opposed
to "a-TRIB-ute", which is a verb.
Figure 6.1 is a state diagram that shows the result of these assignments. A state diagram
that shows an object and its attributes is called an object diagram.
The variable blank refers to a Point object, which contains two attributes. Each attribute
refers to a floating-point number.
You can read the value of an attribute using the same syntax:

>>> blank.y
4.0
>>> x = blank.x
>>> x
3.0

The expression blank.x means, "Go to the object blank refers to and get the value of x."
In the example, we assign that value to a variable named x. There is no conflict between
the variable x and the attribute x.
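To see that the variable and the attribute are independent, we can change one and look at the other. The sketch below also uses the built-in function vars, which is not mentioned elsewhere in this chapter, to list the attributes stored on the instance.

```python
class Point:
    """Represents a point in 2-D space."""

blank = Point()
blank.x = 3.0
blank.y = 4.0

x = blank.x          # a variable named x, separate from the attribute blank.x
x = x + 1            # reassigning the variable leaves the attribute untouched

print(x)
print(blank.x)
print(vars(blank))   # the instance's attributes as a dictionary
```

The variable ends up as 4.0 while the attribute is still 3.0.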
You can use dot notation as part of any expression. For example:

>>> '(%g, %g)' % (blank.x, blank.y)
'(3, 4)'
>>> distance = math.sqrt(blank.x**2 + blank.y**2)
>>> distance
5.0
You can pass an instance as an argument in the usual way. For example:

def print_point(p):
    print('(%g, %g)' % (p.x, p.y))
print_point takes a point as an argument and displays it in mathematical notation. To
invoke it, you can pass blank as an argument:

>>> print_point(blank)
(3, 4)
Inside the function, p is an alias for blank, so if the function modifies p, blank changes.
As an exercise, write a function called distance_between_points that takes two Points
as arguments and returns the distance between them.

6.3 Rectangles
Sometimes it is obvious what the attributes of an object should be, but other times you have
to make decisions. For example, imagine you are designing a class to represent rectangles.
What attributes would you use to specify the location and size of a rectangle? You can ignore
angle; to keep things simple, assume that the rectangle is either vertical or horizontal.
There are at least two possibilities:

• You could specify one corner of the rectangle (or the center), the width, and the height.

• You could specify two opposing corners.

At this point it is hard to say whether either is better than the other, so we'll implement
the first one, just as an example.
Here is the class definition:

class Rectangle:
    """Represents a rectangle.

    attributes: width, height, corner.
    """
The docstring lists the attributes: width and height are numbers; corner is a Point object
that specifies the lower-left corner.
To represent a rectangle, you have to instantiate a Rectangle object and assign values
to the attributes:

box = Rectangle()
box.width = 100.0
box.height = 200.0
box.corner = Point()
box.corner.x = 0.0
box.corner.y = 0.0
The expression box.corner.x means, "Go to the object box refers to and select the attribute
named corner; then go to that object and select the attribute named x."
Figure 6.2 shows the state of this object. An object that is an attribute of another object
is embedded.

box -> Rectangle
         width  -> 100.0
         height -> 200.0
         corner -> Point
                     x -> 0.0
                     y -> 0.0

Figure 6.2: Object diagram.

6.4 Instances as return values


Functions can return instances. For example, find_center takes a Rectangle as an argu-
ment and returns a Point that contains the coordinates of the center of the Rectangle:
def find_center(rect):
    p = Point()
    p.x = rect.corner.x + rect.width/2
    p.y = rect.corner.y + rect.height/2
    return p
Here is an example that passes box as an argument and assigns the resulting Point to
center:
>>> center = find_center(box)
>>> print_point(center)
(50, 100)

6.5 Objects are mutable


You can change the state of an object by making an assignment to one of its attributes. For
example, to change the size of a rectangle without changing its position, you can modify
the values of width and height:
box.width = box.width + 50
box.height = box.height + 100
You can also write functions that modify objects. For example, grow_rectangle takes a
Rectangle object and two numbers, dwidth and dheight, and adds the numbers to the
width and height of the rectangle:

def grow_rectangle(rect, dwidth, dheight):
    rect.width += dwidth
    rect.height += dheight
Here is an example that demonstrates the effect:

>>> box.width, box.height
(150.0, 300.0)
>>> grow_rectangle(box, 50, 100)
>>> box.width, box.height
(200.0, 400.0)
Inside the function, rect is an alias for box, so when the function modifies rect, box changes.
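Aliasing works the same way outside a function. In the sketch below, the second name alias is introduced only for the illustration, and the Rectangle is given just the two attributes the example needs.

```python
class Rectangle:
    """Represents a rectangle. attributes: width, height."""

def grow_rectangle(rect, dwidth, dheight):
    rect.width += dwidth
    rect.height += dheight

box = Rectangle()
box.width = 100.0
box.height = 200.0

alias = box                       # a second name for the same object
grow_rectangle(alias, 50, 100)    # modify the object through the alias

print(alias is box)
print(box.width, box.height)
```

The change made through alias is visible through box because both names refer to one Rectangle.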
As an exercise, write a function named move_rectangle that takes a Rectangle and two
numbers named dx and dy. It should change the location of the rectangle by adding dx to
the x coordinate of corner and adding dy to the y coordinate of corner.

box  -> width  -> 100.0           box2 -> width  -> 100.0
        height -> 200.0                   height -> 200.0
        corner -> (shared Point) <------- corner
                  x -> 0.0
                  y -> 0.0

Figure 6.3: Object diagram.

6.6 Copying
Aliasing can make a program difficult to read because changes in one place might have
unexpected effects in another place. It is hard to keep track of all the variables that might
refer to a given object.
Copying an object is often an alternative to aliasing. The copy module contains a
function called copy that can duplicate any object:

>>> p1 = Point()
>>> p1.x = 3.0
>>> p1.y = 4.0

>>> import copy
>>> p2 = copy.copy(p1)

p1 and p2 contain the same data, but they are not the same Point.

>>> print_point(p1)
(3, 4)
>>> print_point(p2)
(3, 4)
>>> p1 is p2
False
>>> p1 == p2
False

The is operator indicates that p1 and p2 are not the same object, which is what we expected.
But you might have expected == to yield True because these points contain the same data.
In that case, you will be disappointed to learn that for instances, the default behavior
of the == operator is the same as the is operator; it checks object identity, not object
equivalence. That's because for programmer-defined types, Python doesn't know what
should be considered equivalent. At least, not yet.
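Until the class itself says what equivalence means, one workaround is an ordinary function that compares the attributes we care about. This is a sketch only: same_point is a hypothetical helper, not something the copy module or Python provides.

```python
import copy

class Point:
    """Represents a point in 2-D space."""

def same_point(p1, p2):
    """Hypothetical helper: True if both points have equal coordinates."""
    return p1.x == p2.x and p1.y == p2.y

p1 = Point()
p1.x = 3.0
p1.y = 4.0
p2 = copy.copy(p1)

print(p1 == p2)            # default == compares identity
print(same_point(p1, p2))  # the helper compares the coordinates
```

The first comparison is False and the second is True, which separates identity from equivalence.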
If you use copy.copy to duplicate a Rectangle, you will find that it copies the Rectangle
object but not the embedded Point.

>>> box2 = copy.copy(box)
>>> box2 is box
False
>>> box2.corner is box.corner
True

Figure 6.3 shows what the object diagram looks like. This operation is called a
shallow copy because it copies the object and any references it contains, but not the
embedded objects.
For most applications, this is not what you want. In this example, invoking grow_rectangle
on one of the Rectangles would not affect the other, but invoking move_rectangle on either
would affect both! This behavior is confusing and error-prone.
Fortunately, the copy module provides a function named deepcopy that copies not only
the object but also the objects it refers to, and the objects they refer to, and so on. You
will not be surprised to learn that this operation is called a deep copy.

>>> box3 = copy.deepcopy(box)
>>> box3 is box
False
>>> box3.corner is box.corner
False
box3 and box are completely separate objects.
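The practical difference shows up as soon as the embedded Point changes. The following sketch rebuilds the box from this chapter and moves its corner directly, instead of calling move_rectangle, which is left as an exercise; the names shallow and deep are chosen for the illustration.

```python
import copy

class Point:
    """Represents a point in 2-D space."""

class Rectangle:
    """Represents a rectangle. attributes: width, height, corner."""

box = Rectangle()
box.width = 100.0
box.height = 200.0
box.corner = Point()
box.corner.x = 0.0
box.corner.y = 0.0

shallow = copy.copy(box)      # new Rectangle, same embedded Point
deep = copy.deepcopy(box)     # new Rectangle and new Point

box.corner.x = 10.0           # move the original's corner

print(shallow.corner.x)       # the shallow copy shares the corner
print(deep.corner.x)          # the deep copy has its own corner
```

The shallow copy reports 10.0 and the deep copy still reports 0.0.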
As an exercise, write a version of move_rectangle that creates and returns a new Rect-
angle instead of modifying the old one.

6.7 Debugging
When you start working with objects, you are likely to encounter some new exceptions. If
you try to access an attribute that doesn't exist, you get an AttributeError:
>>> p = Point()
>>> p.x = 3
>>> p.y = 4
>>> p.z
AttributeError: 'Point' object has no attribute 'z'
If you are not sure what type an object is, you can ask:

>>> type(p)
<class '__main__.Point'>
You can also use isinstance to check whether an object is an instance of a class:

>>> isinstance(p, Point)
True
If you are not sure whether an object has a particular attribute, you can use the built-in
function hasattr:
>>> hasattr(p, 'x')
True
>>> hasattr(p, 'z')
False
The first argument can be any object; the second argument is a string that contains the
name of the attribute.
You can also use a try statement to see if the object has the attributes you need:

try:
    x = p.x
except AttributeError:
    x = 0
This approach can make it easier to write functions that work with different types; more
on that topic is coming up in Section ??.
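The built-in function getattr offers a compact alternative to the try statement when a default value is all you need. This is a sketch of that alternative, not a replacement for the approach above.

```python
class Point:
    """Represents a point in 2-D space."""

p = Point()
p.x = 3
p.y = 4

x = getattr(p, 'x', 0)   # the attribute exists, so its value is used
z = getattr(p, 'z', 0)   # the attribute is missing, so the default is used

print(x, z)
```

With a third argument, getattr returns the default instead of raising an AttributeError.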

6.8 Glossary
class: A programmer-defined type. A class definition creates a new class object.

class object: An object that contains information about a programmer-defined type. The class
object can be used to create instances of the type.

instance: An object that belongs to a class.

instantiate: To create a new object.

attribute: One of the named values associated with an object.

embedded object: An object that is stored as an attribute of another object.

shallow copy: To copy the contents of an object, including any references to embedded objects;
implemented by the copy function in the copy module.

deep copy: To copy the contents of an object as well as any embedded objects, and any objects
embedded in them, and so on; implemented by the deepcopy function in the copy
module.

object diagram: A diagram that shows objects, their attributes, and the values of the attributes.
Chapter 7

Magic Methods

7.1 Introduction
Magic methods in Python are the special methods which add "magic" to your class. Magic
methods are not meant to be invoked directly by you; Python invokes them internally
when a certain action takes place. For example, when you add two numbers using the +
operator, the __add__() method is called internally. Built-in classes in Python define
many magic methods. Use the dir() function to see the magic methods inherited by a class.
For example, the following lists all the attributes and methods defined in the int class.

>>> dir(int)
['__abs__', '__add__', '__and__', '__bool__', '__ceil__', '__class__', '__delattr__',
'__dir__', '__divmod__', '__doc__', '__eq__', '__float__', '__floor__', '__floordiv__',
'__format__', '__ge__', '__getattribute__', '__getnewargs__', '__gt__', '__hash__',
'__index__', '__init__', '__init_subclass__', '__int__', '__invert__', '__le__',
'__lshift__', '__lt__', '__mod__', '__mul__', '__ne__', '__neg__', '__new__',
'__or__', '__pos__', '__pow__', '__radd__', '__rand__', '__rdivmod__', '__reduce__',
'__reduce_ex__', '__repr__', '__rfloordiv__', '__rlshift__', '__rmod__', '__rmul__',
'__ror__', '__round__', '__rpow__', '__rrshift__', '__rshift__', '__rsub__',
'__rtruediv__', '__rxor__', '__setattr__', '__sizeof__', '__str__', '__sub__',
'__subclasshook__', '__truediv__', '__trunc__', '__xor__', 'bit_length', 'conjugate',
'denominator', 'from_bytes', 'imag', 'numerator', 'real', 'to_bytes']
As you can see above, the int class includes various magic methods surrounded by double
underscores. For example, the __add__ method is a magic method which gets called when
we add two numbers using the + operator. Consider the following example.

>>> num=10
>>> num + 5
15

>>> num.__add__(5)
15
As you can see, when you evaluate num + 5, the + operator calls the num.__add__(5) method.
You can also call num.__add__(5) directly, which gives the same result. However, as
mentioned before, magic methods are not meant to be called directly, but internally,
through some other methods or actions.
Magic methods are most frequently used to define overloaded behaviours of predefined
operators in Python. For instance, arithmetic operators by default operate upon numeric
operands. This means that numeric objects must be used along with operators like +, -, *,
/, etc. The + operator is also defined as a concatenation operator in the string, list and
tuple classes. We can say that the + operator is overloaded.
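The overloading can be observed by calling __add__ on values of different built-in types. The direct calls below are for illustration only; in normal code you would write the + operator. The variable names are chosen for this sketch.

```python
# Each built-in type supplies its own __add__, so + means different things.
int_sum = (10).__add__(5)          # same as 10 + 5
str_sum = 'ab'.__add__('cd')       # same as 'ab' + 'cd'
list_sum = [1, 2].__add__([3])     # same as [1, 2] + [3]
tuple_sum = (1, 2).__add__((3,))   # same as (1, 2) + (3,)

print(int_sum, str_sum, list_sum, tuple_sum)
```

The same operator adds numbers but concatenates strings, lists and tuples, because each class defines its own __add__.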
In order to make the overloaded behaviour available in your own custom class, the
corresponding magic method should be overridden. For example, in order to use the +
operator with objects of a user-defined class, it should include the __add__() method.
Let's see how to implement and use some of the important magic methods.

7.2 __new__() method


Languages such as Java and C# use the new operator to create a new instance of a class.
In Python the __new__() magic method is implicitly called before the __init__() method.
The __new__() method returns a new object, which is then initialized by __init__().

class employee:
    def __new__(cls):
        print("__new__ magic method is called")
        inst = object.__new__(cls)
        return inst
    def __init__(self):
        print("__init__ magic method is called")
        self.name = 'Haitham'
The above example will produce the following output when you create an instance of the
employee class.

>>> e1=employee()
__new__ magic method is called
__init__ magic method is called
Thus, the __new__() method is called before the __init__() method.

7.3 __str__() method


Another useful magic method is __str__(). It is overridden to return a printable string
representation of any user-defined class. We have seen the str() built-in function, which
returns a string from the object parameter. For example, str(12) returns '12'. When
invoked, it calls the __str__() method in the int class.

>>> num=12
>>> str(num)
'12'
>>> #This is equivalent to
>>> int.__str__(num)
'12'
Let us now override the __str__() method in the employee class to return a string
representation of its object.

class employee:
    def __init__(self):
        self.name = 'Haitham'
        self.salary = 10000
    def __str__(self):
        return 'name=' + self.name + ' salary=$' + str(self.salary)
See how the str() function internally calls the __str__() method defined in the employee
class. This is why it is called a magic method.

>>> e1=employee()
>>> print(e1)
name=Haitham salary=$10000
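A closely related magic method is __repr__(), which the built-in repr() function and the interactive prompt use. The sketch below adds one to the employee class; the exact format of the repr string is an assumption made for this illustration, not a requirement of the language.

```python
class employee:
    def __init__(self):
        self.name = 'Haitham'
        self.salary = 10000

    def __str__(self):
        # Readable form, used by print() and str().
        return 'name=' + self.name + ' salary=$' + str(self.salary)

    def __repr__(self):
        # Unambiguous form, used by repr() and by the interactive prompt.
        return 'employee(name=' + repr(self.name) + ', salary=' + repr(self.salary) + ')'

e1 = employee()
print(str(e1))
print(repr(e1))
print([e1])        # containers show their items with repr()
```

If a class defines only __repr__(), str() falls back to it, so __repr__() is often the first of the two to write.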

7.4 __add__() method


In the following example, a class named distance is defined with two instance attributes, ft
and inch. We want to add two distance objects using the overloaded + operator.
To achieve this, the magic method __add__() is overridden; it adds the ft and inch
attributes of the two objects. The __str__() method returns the object's string
representation.

class distance:
    def __init__(self, x=None, y=None):
        self.ft = x
        self.inch = y
    def __add__(self, x):
        temp = distance()
        temp.ft = self.ft + x.ft
        temp.inch = self.inch + x.inch
        # Carry 12 inches over into one foot.
        if temp.inch >= 12:
            temp.ft += 1
            temp.inch -= 12
        return temp
    def __str__(self):
        return 'ft:' + str(self.ft) + ' in: ' + str(self.inch)
Run the above Python script to verify the overloaded operation of the + operator.

>>> d1=distance(3,10)
>>> d2=distance(4,4)
>>> print("d1= {} d2={}".format(d1, d2))
d1= ft:3 in: 10 d2=ft:4 in: 4
>>> d3=d1+d2
>>> print(d3)
ft:8 in: 2
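One possible refinement, offered only as a sketch, uses divmod to carry inches into feet and returns NotImplemented when the other operand is not a distance. The class is named Distance here to keep it apart from the version above, and its default values of 0 are an assumption of the sketch.

```python
class Distance:
    """Hypothetical variant of the distance class, for illustration only."""

    def __init__(self, ft=0, inch=0):
        self.ft = ft
        self.inch = inch

    def __add__(self, other):
        if not isinstance(other, Distance):
            return NotImplemented          # lets Python raise a TypeError
        extra_ft, inch = divmod(self.inch + other.inch, 12)
        return Distance(self.ft + other.ft + extra_ft, inch)

    def __str__(self):
        return 'ft:' + str(self.ft) + ' in: ' + str(self.inch)

total = Distance(3, 10) + Distance(4, 4) + Distance(0, 11)
print(total)
```

Because __add__ returns a new Distance, the additions can be chained, and an expression such as Distance(1, 0) + 5 fails with a TypeError instead of an obscure AttributeError.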

7.5 __ge__() method


The following method is added in the distance class to overload the >= operator.

class distance:
    def __init__(self, x=None, y=None):
        self.ft = x
        self.inch = y
    def __ge__(self, x):
        # Compare the two lengths in inches.
        val1 = self.ft*12 + self.inch
        val2 = x.ft*12 + x.inch
        if val1 >= val2:
            return True
        else:
            return False
This method gets invoked when the >= operator is used and returns True or False.
Accordingly, the appropriate message can be displayed.

>>> d1=distance(2,1)
>>> d2=distance(4,10)
>>> d1>=d2
False

7.6 Important Magic Methods

The following tables list important magic methods in Python 3.

Initialization and Construction   Description

__new__(cls, other)               Called to create the object when a class is instantiated
__init__(self, other)             Called to initialize the object returned by __new__
__del__(self)                     Destructor method

Unary operators and functions     Description

__pos__(self)                     Called for unary positive, e.g. +someobject
__neg__(self)                     Called for unary negative, e.g. -someobject
__abs__(self)                     Called by the built-in abs() function
__invert__(self)                  Called for inversion using the ~ operator
__round__(self, n)                Called by the built-in round() function
__floor__(self)                   Called by the math.floor() function
__ceil__(self)                    Called by the math.ceil() function
__trunc__(self)                   Called by the math.trunc() function

Augmented Assignment              Description

__iadd__(self, other)             Called on addition with assignment, e.g. a += b
__isub__(self, other)             Called on subtraction with assignment, e.g. a -= b
__imul__(self, other)             Called on multiplication with assignment, e.g. a *= b
__ifloordiv__(self, other)        Called on floor division with assignment, e.g. a //= b
__idiv__(self, other)             Python 2 only: division with assignment; Python 3 uses __itruediv__
__itruediv__(self, other)         Called on true division with assignment, e.g. a /= b
__imod__(self, other)             Called on modulo with assignment, e.g. a %= b
__ipow__(self, other)             Called on exponentiation with assignment, e.g. a **= b
__ilshift__(self, other)          Called on left bitwise shift with assignment, e.g. a <<= b
__irshift__(self, other)          Called on right bitwise shift with assignment, e.g. a >>= b
__iand__(self, other)             Called on bitwise AND with assignment, e.g. a &= b
__ior__(self, other)              Called on bitwise OR with assignment, e.g. a |= b
__ixor__(self, other)             Called on bitwise XOR with assignment, e.g. a ^= b
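As an illustration of the augmented assignment methods, the sketch below defines __iadd__() on a small class. Counter and its count attribute are hypothetical names used only here.

```python
class Counter:
    """Hypothetical class used only to illustrate __iadd__."""

    def __init__(self, count=0):
        self.count = count

    def __iadd__(self, other):
        # Called for `counter += other`; update in place and return self.
        self.count += other
        return self

hits = Counter()
same = hits          # second name for the same object
hits += 5            # Python runs hits = hits.__iadd__(5)

print(hits.count)
print(hits is same)
```

Returning self matters: the result of __iadd__() is assigned back to the name on the left, so forgetting the return statement would leave hits set to None.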
Type Conversion Magic Methods     Description

__int__(self)                     Called by the built-in int() function to convert a type to an int
__float__(self)                   Called by the built-in float() function to convert a type to float
__complex__(self)                 Called by the built-in complex() function to convert a type to complex
__oct__(self)                     Python 2 only: called by oct(); in Python 3, oct() uses __index__
__hex__(self)                     Python 2 only: called by hex(); in Python 3, hex() uses __index__
__index__(self)                   Called on type conversion to an int when the object is used in a slice expression
__trunc__(self)                   Called by the math.trunc() function

String Magic Methods              Description

__str__(self)                     Called by the built-in str() function to return a string representation of a type
__repr__(self)                    Called by the built-in repr() function to return a machine readable representation of a type
__unicode__(self)                 Python 2 only: called by unicode(); Python 3 strings are already Unicode
__format__(self, formatstr)       Called by the built-in format() function and str.format() to return a formatted string
__hash__(self)                    Called by the built-in hash() function to return an integer
__nonzero__(self)                 Python 2 only: called by bool(); Python 3 uses __bool__(self)
__dir__(self)                     Called by the built-in dir() function to return a list of attributes of a class
__sizeof__(self)                  Called by sys.getsizeof() to return the size of an object

Attribute Magic Methods           Description

__getattr__(self, name)           Called when accessing an attribute that does not exist
__setattr__(self, name, value)    Called when assigning a value to an attribute
__delattr__(self, name)           Called when deleting an attribute

Operator Magic Methods            Description

__add__(self, other)              Called on addition using the + operator
__sub__(self, other)              Called on subtraction using the - operator
__mul__(self, other)              Called on multiplication using the * operator
__floordiv__(self, other)         Called on floor division using the // operator
__div__(self, other)              Python 2 only: division using /; Python 3 uses __truediv__(self, other)
__mod__(self, other)              Called on modulo using the % operator
__pow__(self, other[, modulo])    Called on calculating the power using the ** operator
__lt__(self, other)               Called on comparison using the < operator
__le__(self, other)               Called on comparison using the <= operator
__eq__(self, other)               Called on comparison using the == operator
__ne__(self, other)               Called on comparison using the != operator
__gt__(self, other)               Called on comparison using the > operator
__ge__(self, other)               Called on comparison using the >= operator
Chapter 8

Python Testing

8.1 Testing Your Code


There are many ways to test your code. In this chapter, you'll learn the techniques from
the most basic steps and work towards advanced methods.

8.2 Automated vs. Manual Testing


The good news is, you've probably already created a test without realizing it. Remember
when you ran your application and used it for the first time? Did you check the features and
experiment using them? That's known as exploratory testing and is a form of manual
testing.
Exploratory testing is a form of testing that is done without a plan. In an exploratory
test, you're just exploring the application. To have a complete set of manual tests, all you
need to do is make a list of all the features your application has, the different types of input
it can accept, and the expected results. Now, every time you make a change to your code,
you need to go through every single item on that list and check it.
This is where automated testing comes in. Automated testing is the execution of your
test plan (the parts of your application you want to test, the order in which you want to test
them, and the expected responses) by a script instead of a human. Python already comes
with a set of tools and libraries to help you create automated tests for your application.
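
As a minimal sketch of what "a script instead of a human" can look like, the snippet below checks Python's built-in sum() with plain assert statements. The function names test_sum and test_sum_tuple and the chosen inputs are illustrative assumptions, not part of any particular test plan.

```python
def test_sum():
    # Expected response: the three numbers add up to 6
    assert sum([1, 2, 3]) == 6, "Should be 6"


def test_sum_tuple():
    # The same check with a tuple as input
    assert sum((1, 2, 3)) == 6, "Should be 6"


# The "test plan": every check listed here runs in order.
# A failing assert raises AssertionError and stops the script.
checks = [test_sum, test_sum_tuple]
for check in checks:
    check()

passed = len(checks)
print(f"{passed} checks passed")
```

Rerunning this script after every code change replaces walking through the manual checklist by hand.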

8.3 Unit Tests vs. Integration Tests


Testing multiple components is known as integration testing.
Think of all the things that need to work correctly in order for a simple task to give the
right result. These components are like the parts to your application, all of those classes,
functions, and modules you've written.
A major challenge with integration testing is when an integration test doesn't give the
right result. It's very hard to diagnose the issue without being able to isolate which part of
the system is failing.
A unit test is a smaller test, one that checks that a single component operates in the
right way. A unit test helps you to isolate what is broken in your application and fix it
faster.
You have just seen two types of tests:


1. An integration test checks that components in your application operate with each
other.

2. A unit test checks a small component in your application.

You can write both integration tests and unit tests in Python.
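
The difference can be sketched with two small helper functions. Both parse_numbers and total are hypothetical names invented for this illustration; the first two tests each exercise one component, while the last one checks that the components work together.

```python
def parse_numbers(text):
    """Convert a comma-separated string such as '1, 2, 3' into a list of ints."""
    return [int(part) for part in text.split(",")]


def total(numbers):
    """Add up a list of numbers."""
    return sum(numbers)


def test_parse_numbers_unit():
    # Unit test: only parse_numbers is involved
    assert parse_numbers("1, 2, 3") == [1, 2, 3]


def test_total_unit():
    # Unit test: only total is involved
    assert total([1, 2, 3]) == 6


def test_parse_then_total_integration():
    # Integration test: a failure here could come from either component
    assert total(parse_numbers("1, 2, 3")) == 6


for test in (test_parse_numbers_unit, test_total_unit,
             test_parse_then_total_integration):
    test()
```

If the integration test fails while both unit tests pass, the problem most likely lies in how the two functions are combined.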

8.4 Choosing a Test Runner


There are many test runners available for Python. The one built into the Python standard
library is called unittest. The principles of unittest are easily portable to other frameworks.
The three most popular test runners are:

• unittest

• nose or nose2

• pytest

Choosing the best test runner for your requirements and level of experience is important.

8.4.1 unittest
unittest has been built into the Python standard library since version 2.1. You'll probably
see it in commercial Python applications and open-source projects. unittest contains both a
testing framework and a test runner. unittest has some important requirements for writing
and executing tests.
unittest requires that:

1. You put your tests into classes as methods

2. You use a series of special assertion methods in the unittest.TestCase class instead of
the built-in assert statement

To build a unittest test case, you would have to:

1. Import unittest from the standard library

2. Create a class called TestSum that inherits from the TestCase class

3. Convert the test functions into methods by adding self as the first argument

4. Change the assertions to use the self.assertEqual() method on the TestCase class

5. Change the command-line entry point to call unittest.main()
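
The five steps above can be sketched as follows. The test method names and the use of the built-in sum() as the code under test are assumptions made for illustration. A test file would normally end with unittest.main(); this sketch loads and runs the suite explicitly so that the result object can be inspected.

```python
import unittest  # Step 1: import unittest from the standard library


class TestSum(unittest.TestCase):  # Step 2: inherit from TestCase

    def test_sum_list(self):  # Step 3: a method with self as first argument
        # Step 4: use self.assertEqual() instead of a bare assert
        self.assertEqual(sum([1, 2, 3]), 6, "Should be 6")

    def test_sum_tuple(self):
        self.assertEqual(sum((1, 2, 3)), 6, "Should be 6")


# Step 5: in a test file, the usual entry point is
#     if __name__ == "__main__":
#         unittest.main()
# Here the suite is run explicitly instead, which does not exit the interpreter.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestSum)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

The runner reports how many tests ran and whether any failed; result.wasSuccessful() returns True when all of them passed.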


Chapter 9

Numpy

Numpy is the core library for scientific computing in Python. It provides a high-performance
multidimensional array object, and tools for working with these arrays.

9.1 Arrays
A numpy array is a grid of values, all of the same type, and is indexed by a tuple of non-
negative integers. The number of dimensions is the rank of the array; the shape of an array
is a tuple of integers giving the size of the array along each dimension.
We can initialize numpy arrays from nested Python lists, and access elements using
square brackets:

import numpy as np

a = np.array([1, 2, 3]) # Create a rank 1 array


print(type(a)) # Prints "<class 'numpy.ndarray'>"
print(a.shape) # Prints "(3,)"
print(a[0], a[1], a[2]) # Prints "1 2 3"
a[0] = 5 # Change an element of the array
print(a) # Prints "[5 2 3]"

b = np.array([[1,2,3],[4,5,6]]) # Create a rank 2 array


print(b.shape) # Prints "(2, 3)"
print(b[0, 0], b[0, 1], b[1, 0]) # Prints "1 2 4"

Numpy also provides many functions to create arrays:

import numpy as np

a = np.zeros((2,2)) # Create an array of all zeros


print(a) # Prints "[[ 0. 0.]
# [ 0. 0.]]"

b = np.ones((1,2)) # Create an array of all ones


print(b) # Prints "[[ 1. 1.]]"

c = np.full((2,2), 7) # Create a constant array


print(c) # Prints "[[7 7]
# [7 7]]"

d = np.eye(2) # Create a 2x2 identity matrix


print(d) # Prints "[[ 1. 0.]
# [ 0. 1.]]"

e = np.random.random((2,2)) # Create an array filled with random values


print(e) # Might print "[[ 0.91940167 0.08143941]
# [ 0.68744134 0.87236687]]"

9.2 Array Indexing


Numpy offers several ways to index into arrays.

9.2.1 Slicing
Slicing: Similar to Python lists, numpy arrays can be sliced. Since arrays may be multidi-
mensional, you must specify a slice for each dimension of the array:

import numpy as np

# Create the following rank 2 array with shape (3, 4)


# [[ 1 2 3 4]
# [ 5 6 7 8]
# [ 9 10 11 12]]
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])

# Use slicing to pull out the subarray consisting of the first 2 rows
# and columns 1 and 2; b is the following array of shape (2, 2):
# [[2 3]
# [6 7]]
b = a[:2, 1:3]

# A slice of an array is a view into the same data, so modifying it


# will modify the original array.
print(a[0, 1]) # Prints "2"
b[0, 0] = 77 # b[0, 0] is the same piece of data as a[0, 1]
print(a[0, 1]) # Prints "77"
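
When the shared data is not wanted, an explicit copy can be taken. The following is a small sketch of the difference; the variable names view and independent are chosen for illustration.

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

view = a[:2, 1:3]                 # basic slicing returns a view onto a's data
independent = a[:2, 1:3].copy()   # copy() allocates separate data

independent[0, 0] = 100           # changes only the copy
after_copy_write = int(a[0, 1])

view[0, 0] = 77                   # writes through to a
after_view_write = int(a[0, 1])

print(after_copy_write, after_view_write)     # Prints "2 77"
print(np.shares_memory(a, view))              # Prints "True"
print(np.shares_memory(a, independent))       # Prints "False"
```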
You can also mix integer indexing with slice indexing. However, doing so will yield an
array of lower rank than the original array.

import numpy as np

# Create the following rank 2 array with shape (3, 4)


# [[ 1 2 3 4]
# [ 5 6 7 8]
# [ 9 10 11 12]]
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])

# Two ways of accessing the data in the middle row of the array.
# Mixing integer indexing with slices yields an array of lower rank,
# while using only slices yields an array of the same rank as the
# original array:
row_r1 = a[1, :] # Rank 1 view of the second row of a
row_r2 = a[1:2, :] # Rank 2 view of the second row of a
print(row_r1, row_r1.shape) # Prints "[5 6 7 8] (4,)"
print(row_r2, row_r2.shape) # Prints "[[5 6 7 8]] (1, 4)"

# We can make the same distinction when accessing columns of an array:


col_r1 = a[:, 1]
col_r2 = a[:, 1:2]
print(col_r1, col_r1.shape) # Prints "[ 2 6 10] (3,)"
print(col_r2, col_r2.shape) # Prints "[[ 2]
# [ 6]
# [10]] (3, 1)"

9.2.2 Integer Array Indexing


Integer array indexing: When you index into numpy arrays using slicing, the resulting array
view will always be a subarray of the original array. In contrast, integer array indexing
allows you to construct arbitrary arrays using the data from another array. Here is an
example:

import numpy as np

a = np.array([[1,2], [3, 4], [5, 6]])

# An example of integer array indexing.


# The returned array will have shape (3,)
print(a[[0, 1, 2], [0, 1, 0]]) # Prints "[1 4 5]"

# The above example of integer array indexing is equivalent to this:


print(np.array([a[0, 0], a[1, 1], a[2, 0]])) # Prints "[1 4 5]"

# When using integer array indexing, you can reuse the same
# element from the source array:
print(a[[0, 0], [1, 1]]) # Prints "[2 2]"

# Equivalent to the previous integer array indexing example


print(np.array([a[0, 1], a[0, 1]])) # Prints "[2 2]"
One useful trick with integer array indexing is selecting or mutating one element from
each row of a matrix:

import numpy as np

# Create a new array from which we will select elements


a = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])

print(a) # Prints "[[ 1 2 3]
# [ 4 5 6]
# [ 7 8 9]
# [10 11 12]]"

# Create an array of indices


b = np.array([0, 2, 0, 1])

# Select one element from each row of a using the indices in b


print(a[np.arange(4), b]) # Prints "[ 1 6 7 11]"

# Mutate one element from each row of a using the indices in b


a[np.arange(4), b] += 10

print(a) # Prints "[[11 2 3]
# [ 4 5 16]
# [17 8 9]
# [10 21 12]]"

9.2.3 Boolean Array Indexing


Boolean array indexing: Boolean array indexing lets you pick out arbitrary elements of an
array. Frequently this type of indexing is used to select the elements of an array that satisfy
some condition. Here is an example:

import numpy as np

a = np.array([[1,2], [3, 4], [5, 6]])

bool_idx = (a > 2) # Find the elements of a that are bigger than 2;


# this returns a numpy array of Booleans of the same
# shape as a, where each slot of bool_idx tells
# whether that element of a is > 2.

print(bool_idx) # Prints "[[False False]


# [ True True]
# [ True True]]"

# We use boolean array indexing to construct a rank 1 array


# consisting of the elements of a corresponding to the True values
# of bool_idx
print(a[bool_idx]) # Prints "[3 4 5 6]"

# We can do all of the above in a single concise statement:


print(a[a > 2]) # Prints "[3 4 5 6]"
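
A boolean mask can also be used to count or to assign to the selected elements. The sketch below is an illustration under assumed variable names (evens, count, clipped); it works on a copy so that the original array is left untouched.

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])

evens = a[a % 2 == 0]            # select the even elements as a rank 1 array
count = int(np.sum(a > 2))       # True counts as 1, so this counts matches

clipped = a.copy()               # work on a copy to keep a unchanged
clipped[clipped > 4] = 4         # assign to every element that is > 4

print(evens)     # Prints "[2 4 6]"
print(count)     # Prints "4"
print(clipped)   # Prints "[[1 2]
                 #          [3 4]
                 #          [4 4]]"
```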

9.3 Data Types


Every numpy array is a grid of elements of the same type. Numpy provides a large set of
numeric datatypes that you can use to construct arrays. Numpy tries to guess a datatype
when you create an array, but functions that construct arrays usually also include an optional
argument to explicitly specify the datatype. Here is an example:

import numpy as np

x = np.array([1, 2]) # Let numpy choose the datatype


print(x.dtype) # Prints "int64" (platform dependent; "int32" on some systems)

x = np.array([1.0, 2.0]) # Let numpy choose the datatype


print(x.dtype) # Prints "float64"

x = np.array([1, 2], dtype=np.int64) # Force a particular datatype


print(x.dtype) # Prints "int64"
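
An existing array can also be converted to another datatype with astype, which returns a new array. The sketch below is illustrative only; note that converting floats to integers truncates toward zero rather than rounding.

```python
import numpy as np

x = np.array([1.7, 2.2, -0.5])
as_int = x.astype(np.int64)      # new array; fractional parts are dropped

mixed = np.array([1, 2.5])       # an int and a float are upcast to float64

print(as_int, as_int.dtype)      # Prints "[1 2 0] int64"
print(mixed.dtype)               # Prints "float64"
```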

9.4 Array Math


Basic mathematical functions operate elementwise on arrays, and are available both as
operator overloads and as functions in the numpy module:

import numpy as np

x = np.array([[1,2],[3,4]], dtype=np.float64)
y = np.array([[5,6],[7,8]], dtype=np.float64)

# Elementwise sum; both produce the array


# [[ 6.0 8.0]
# [10.0 12.0]]
print(x + y)
print(np.add(x, y))

# Elementwise difference; both produce the array


# [[-4.0 -4.0]
# [-4.0 -4.0]]
print(x - y)
print(np.subtract(x, y))

# Elementwise product; both produce the array


# [[ 5.0 12.0]
# [21.0 32.0]]
print(x * y)
print(np.multiply(x, y))

# Elementwise division; both produce the array


# [[ 0.2 0.33333333]
# [ 0.42857143 0.5 ]]
print(x / y)
print(np.divide(x, y))

# Elementwise square root; produces the array


# [[ 1. 1.41421356]
# [ 1.73205081 2. ]]
print(np.sqrt(x))

Note that unlike MATLAB, * is elementwise multiplication, not matrix multiplication.


We instead use the dot function to compute inner products of vectors, to multiply a vector
by a matrix, and to multiply matrices. dot is available both as a function in the numpy
module and as an instance method of array objects:

import numpy as np

x = np.array([[1,2],[3,4]])
y = np.array([[5,6],[7,8]])

v = np.array([9,10])
w = np.array([11, 12])

# Inner product of vectors; both produce 219


print(v.dot(w))
print(np.dot(v, w))

# Matrix / vector product; both produce the rank 1 array [29 67]
print(x.dot(v))
print(np.dot(x, v))

# Matrix / matrix product; both produce the rank 2 array


# [[19 22]
# [43 50]]
print(x.dot(y))
print(np.dot(x, y))
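
In Python 3.5 and later, the @ operator offers a third spelling of the same products for rank 1 and rank 2 arrays. This is a short sketch reusing the values from the example above.

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6], [7, 8]])
v = np.array([9, 10])
w = np.array([11, 12])

inner = v @ w        # inner product of two vectors
mat_vec = x @ v      # matrix / vector product
mat_mat = x @ y      # matrix / matrix product

print(inner)         # Prints "219"
print(mat_vec)       # Prints "[29 67]"
print(mat_mat)       # Prints "[[19 22]
                     #          [43 50]]"
```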

Numpy provides many useful functions for performing computations on arrays; one of
the most useful is sum:

import numpy as np

x = np.array([[1,2],[3,4]])

print(np.sum(x)) # Compute sum of all elements; prints "10"


print(np.sum(x, axis=0)) # Compute sum of each column; prints "[4 6]"
print(np.sum(x, axis=1)) # Compute sum of each row; prints "[3 7]"

Apart from computing mathematical functions using arrays, we frequently need to re-
shape or otherwise manipulate data in arrays. The simplest example of this type of operation
is transposing a matrix; to transpose a matrix, simply use the T attribute of an array object:

import numpy as np

x = np.array([[1,2], [3,4]])
print(x) # Prints "[[1 2]
# [3 4]]"
print(x.T) # Prints "[[1 3]
# [2 4]]"

# Note that taking the transpose of a rank 1 array does nothing:


v = np.array([1,2,3])

print(v) # Prints "[1 2 3]"


print(v.T) # Prints "[1 2 3]"
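
If a row or column vector is needed, the rank 1 array can first be reshaped to rank 2, after which transposing has a visible effect. The names row, col, and col_alt below are illustrative.

```python
import numpy as np

v = np.array([1, 2, 3])

row = v.reshape(1, -1)        # rank 2 row vector, shape (1, 3)
col = v.reshape(-1, 1)        # rank 2 column vector, shape (3, 1)
col_alt = v[:, np.newaxis]    # another way to get shape (3, 1)

print(v.T.shape)      # Prints "(3,)"
print(row.T.shape)    # Prints "(3, 1)"
print(col.shape)      # Prints "(3, 1)"
```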
Chapter 10

Abstract Data Types

10.1 Introduction
The foundation of computer science is based on the study of algorithms. An algorithm
is a sequence of clear and precise step-by-step instructions for solving a problem in a finite
amount of time. Algorithms are implemented by translating the step-by-step instructions
into a computer program that can be executed by a computer. This translation process is
called computer programming or simply programming. Computer programs are
constructed using a programming language appropriate to the problem. While programming
is an important part of computer science, computer science is not the study of
programming. Nor is it about learning a particular programming language. Instead, programming
and programming languages are tools used by computer scientists to solve problems.
Data items are represented within a computer as a sequence of binary digits. To
distinguish between the different types of data, the term type is often used to refer to a
collection of values and the term data type to refer to a given type along with a collection
of operations for manipulating values of the given type.
Programming languages commonly provide data types as part of the language itself.
These data types, known as primitives, come in two categories: simple and complex.
The simple data types consist of values that are in the most basic form and can't
be decomposed into smaller parts. Integer and real types, for example, consist of single
numeric values. The complex data types, on the other hand, are constructed of multiple
components consisting of simple types or other complex types. In Python, objects, strings,
lists, and dictionaries, which can contain multiple values, are all examples of complex types.
The primitive types provided by a language may not be sufficient for solving large complex
problems. Thus, most languages allow for the construction of additional data types, known
as user-defined types since they are defined by the programmer and not the language.
Some of these data types can themselves be very complex.

10.2 Abstractions
An abstraction is a mechanism for separating the properties of an object and restricting the
focus to those relevant in the current context. Abstractions are used to help manage complex
problems and complex data types. The user of the abstraction does not have to understand
all of the details in order to utilize the object, but only those relevant to the current task
or problem. Typically, abstractions of problems occur in layers. Two common types of
abstractions encountered in computer science are procedural, or functional, abstraction and
data abstraction.

10.2.1 Procedural Abstraction


Procedural abstraction is the use of a function or method knowing what it does but ignoring
how it is accomplished. Consider the mathematical square root function which you have
probably used a lot. Do you know how the square root is computed?
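
To make the point concrete, the sketch below compares the library function math.sqrt with one possible way of computing a square root, Newton's method. The function newton_sqrt and its tolerance parameter are assumptions made for this illustration; this is not a description of how math.sqrt is actually implemented. A caller gets the same answer either way, which is exactly what procedural abstraction promises.

```python
import math


def newton_sqrt(x, tolerance=1e-12):
    """Approximate the square root of x with Newton's method (illustrative)."""
    if x < 0:
        raise ValueError("x must be non-negative")
    if x == 0:
        return 0.0
    guess = x if x >= 1 else 1.0
    # Repeatedly average the guess with x / guess until guess*guess is close to x
    while abs(guess * guess - x) > tolerance * x:
        guess = (guess + x / guess) / 2
    return guess


# The caller only needs to know WHAT the function does, not HOW
difference = abs(newton_sqrt(2.0) - math.sqrt(2.0))
print(difference < 1e-9)   # Prints "True"
```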

10.2.2 Data Abstraction


Data abstraction is the separation of the properties of a data type (its values and operations)
from the implementation of that data type. You have used strings in Python many times.
Do you know how they are implemented?

10.3 Abstract Data Types


An abstract data type (ADT) is a programmer-defined data type that specifies a set
of data values and a collection of well-defined operations that can be performed on those
values. Abstract data types are defined independent of their implementation, allowing us
to focus on the use of the new data type instead of how it is implemented. This separation
is typically enforced by requiring interaction with the abstract data type through an
interface or defined set of operations. This is known as information hiding. By hiding
the implementation details and requiring ADTs to be accessed through an interface, we can
work with an abstraction and focus on what functionality the ADT provides instead of how
that functionality is implemented.
Abstract data types can be viewed as black boxes, as illustrated in Figure 10.1 on
page 83. User programs interact with instances of the ADT by invoking one of the several
operations defined by its interface. The set of operations can be grouped into four categories:

• Constructors: Create and initialize new instances of the ADT

• Accessors: Return data contained in an instance without modifying it

• Mutators: Modify the contents of an ADT instance

• Iterators: Process individual data components sequentially
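
The four categories can be sketched with a tiny, hypothetical ADT. The class Marks and its method names are invented for this illustration only; each method is labeled with the category it belongs to.

```python
class Marks:
    """A hypothetical ADT that stores a collection of exam marks."""

    def __init__(self):            # Constructor: creates an empty instance
        self._values = []

    def highest(self):             # Accessor: returns data, changes nothing
        assert len(self._values) > 0, "No marks recorded"
        return max(self._values)

    def record(self, mark):        # Mutator: modifies the instance
        self._values.append(mark)

    def __iter__(self):            # Iterator: visits the marks one by one
        return iter(self._values)


marks = Marks()
marks.record(70)
marks.record(85)
best = marks.highest()
seen = [mark for mark in marks]
print(best, seen)   # Prints "85 [70, 85]"
```

User code touches only these four operations and never the underlying list stored in _values.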

The implementation of the various operations is hidden inside the black box, the contents
of which we do not have to know in order to utilize the ADT. There are several
advantages of working with abstract data types and focusing on 'what' instead of 'how':

• We can focus on solving the problem at hand instead of getting bogged down in the
implementation details

• We can reduce logical errors that can occur from accidental misuse of storage structures
and data types by preventing direct access to the implementation

• The implementation of the abstract data type can be changed without having to modify
the program code that uses the ADT

• It is easier to manage and divide larger programs into smaller modules



Figure 10.1: Separating the ADT definition from its implementation

10.4 Data Structures


Working with abstract data types, which separate the definition from the implementation, is
advantageous in solving problems and writing programs. At some point, however, we must
provide a concrete implementation in order for the program to execute. ADTs provided in
language libraries, like those of Python, are implemented by the maintainers of the library.
When you define and create your own abstract data types, you must eventually provide an
implementation. The choices you make in implementing your ADT can affect its functionality
and efficiency.

10.4.1 ADT and Data Structures


Abstract data types can be simple or complex. A simple ADT is composed of a single
or several individually named data fields, such as those used to represent a date or rational
number. The complex ADTs are composed of a collection of data values, such as the Python
list or dictionary. Complex abstract data types are implemented using a particular data
structure, which is the physical representation of how data is organized and manipulated.
Data structures can be characterized by how they store and organize the individual data
elements and what operations are available for accessing and manipulating the data.
There are many common data structures, including arrays, linked lists, stacks, queues,
and trees, to name a few. All data structures store a collection of values, but differ in how
they organize the individual data items and by what operations can be applied to manage
the collection. The choice of a particular data structure depends on the ADT and the
problem at hand.

10.5 General Definitions


10.5.1 Collection
A collection is a group of values with no implied organization or relationship between the
individual values. Sometimes we may restrict the elements to a specific data type such as a
collection of integers or floating-point values.

10.5.2 Container
A container is any data structure or abstract data type that stores and organizes a col-
lection. The individual values of the collection are known as elements of the container
and a container with no elements is said to be empty. The organization or arrangement
of the elements can vary from one container to the next, as can the operations available
for accessing the elements. Python provides a number of built-in containers, which include
strings, tuples, lists, dictionaries, and sets.

10.5.3 Sequence
A sequence is a container in which the elements are arranged in linear order from front
to back, with each element accessible by position. Access to the individual elements based
on their position within the linear order is provided using the subscript operator. Python
provides two immutable sequences, strings and tuples, and one mutable sequence, the list.
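
The subscript operator and the mutable/immutable distinction can be seen in a few lines. The variable names below are illustrative.

```python
text = "data"            # string: immutable sequence
pair = (3, 4)            # tuple: immutable sequence
values = [10, 20, 30]    # list: mutable sequence

# Positional access works the same way on all three
first_char = text[0]     # 'd'
second = pair[1]         # 4
last = values[-1]        # 30

values[0] = 99           # allowed: lists can be changed in place

try:
    pair[0] = 5          # not allowed: tuples cannot be changed
    tuple_is_immutable = False
except TypeError:
    tuple_is_immutable = True

print(first_char, second, last, values, tuple_is_immutable)
```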

10.5.4 Sorted Sequence


A sorted sequence is one in which the position of the elements is based on a prescribed
relationship between each element and its successor. For example, we can create a sorted
sequence of integers in which the elements are arranged in ascending or increasing order
from smallest to largest value.
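
One way to maintain such a sequence in Python is to insert each new value at its sorted position. The sketch below uses bisect.insort from the standard library; the list name scores and the sample values are assumptions for illustration.

```python
import bisect

scores = []
for value in [23, 4, 15, 42, 8]:
    # insort places the value so that the list stays in ascending order
    bisect.insort(scores, value)

# Every element is less than or equal to its successor
is_sorted = all(scores[i] <= scores[i + 1] for i in range(len(scores) - 1))

print(scores)      # Prints "[4, 8, 15, 23, 42]"
print(is_sorted)   # Prints "True"
```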

10.5.5 List vs. Python list


In computer science, the term List is commonly used to refer to any collection with a linear
ordering. The ordering is such that every element in the collection, except the first one, has
a unique predecessor and every element, except the last one, has a unique successor. By
this definition, a sequence is a list, but a list is not necessarily a sequence since there is no
requirement that a list provide access to the elements by position. Python uses the same
name for its built-in mutable sequence type, which in other languages would be called an
arraylist or vector data type.

10.6 Python and ADT


10.6.1 Step 01: Specify ADT
When defining an ADT, we specify the ADT operations as method prototypes. The class
constructor, which is used to create an instance of the ADT, is indicated by the name of
the class used in the implementation. Python allows classes to define or overload various
operators that can be used more naturally in a program without having to call a method
by name. We define all ADT operations as named methods, but implement some of them
as operators when appropriate instead of using the named method. The ADT operations
that will be implemented as Python operators are indicated in italicized text and a brief
comment is provided in the ADT definition indicating the corresponding operator. This
approach allows us to focus on the general ADT specification that can be easily translated
to other languages if the need arises but also allows us to take advantage of Python's simple
syntax in various sample programs. More information is given in chapter 7 on page 65.

10.6.2 Step 02: Using the ADT


To illustrate the use of the ADT, we present programs which process elements / collections
of the ADT. Working with abstractions allows us to focus on what functionality the ADT
provides instead of how that functionality is implemented. By hiding the implementation
details, we can use an ADT independent of its implementation. In fact, the choice of
implementation for the ADT will have no effect on the instructions in our example programs.
Classes are the foundation of object-oriented programming languages, and they provide a
convenient mechanism for defining and implementing abstract data types. A review of
Python classes is presented in chapter 6 on page 57.

10.6.3 Preconditions and Postconditions


In defining the operations, we must include a specification of required inputs and the
resulting output, if any. In addition, we must specify the preconditions and postconditions
for each operation. A precondition indicates the condition or state of the ADT instance
and inputs before the operation can be performed. A postcondition indicates the result
or ending state of the ADT instance after the operation is performed. The precondition is
assumed to be true while the postcondition is a guarantee as long as the preconditions are
met. Attempting to perform an operation in which the precondition is not satisfied should
be flagged as an error. Python raises an exception when an error occurs. An exception is
an event that can be triggered and optionally handled during program execution. When an
exception is raised indicating an error, the program can contain code to catch and gracefully
handle the exception; otherwise, the program will abort. Python also provides the assert
statement, which can be used to raise an AssertionError exception. An assert statement is
used to state what we assume to be true at a given point in the program. If the assertion
fails, Python automatically raises an AssertionError and aborts the program, unless the
exception is caught.
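
A minimal sketch of a precondition checked with assert is shown below. The function remove_first and its error message are hypothetical names chosen for illustration. Note that Python skips assert statements when run with the -O option, so this style suits checks on assumptions rather than validation of user input.

```python
def remove_first(items):
    """Remove and return the first item.

    Precondition: items is not empty.
    Postcondition: items is one element shorter.
    """
    assert len(items) > 0, "Cannot remove from an empty list"
    return items.pop(0)


values = [19, 74]
removed = remove_first(values)      # precondition holds, so this succeeds

try:
    remove_first([])                # precondition violated
    caught = None
except AssertionError as error:
    caught = str(error)             # the exception is caught, so no abort

print(removed, values, caught)
```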

10.7 Bags
A bag is a simple container like a shopping bag that can be used to store a collection of
items. The bag container restricts access to the individual items by only defining operations
for adding and removing individual items, for determining if an item is in the bag, and for
traversing over the collection of items.

10.7.1 Bag Abstract Data Type


There are several variations of the Bag ADT with the one described here being a simple bag.
A grab bag is similar to the simple bag but the items are removed from the bag at random.
Another common variation is the counting bag, which includes an operation that
returns the number of occurrences in the bag of a given item.

10.7.2 Bag Definition


A bag is a container that stores a collection in which duplicate values are allowed. The items,
each of which is individually stored, have no particular order but they must be comparable.

• Bag(): Creates a bag that is initially empty

• length(): Returns the number of items stored in the bag. Accessed using the len()
function

• contains(item): Determines if the given target item is stored in the bag and returns
the appropriate boolean value. Accessed using the in operator

• add(item): Adds the given item to the bag

• remove(item): Removes and returns an occurrence of item from the bag. An exception
is raised if the element is not in the bag

• iterator(): Creates and returns an iterator that can be used to iterate over the collection
of items

10.7.3 Bag Usage Example


from bag import Bag
my_bag = Bag()
my_bag.add(19)
my_bag.add(74)
my_bag.add(23)
my_bag.add(19)
my_bag.add(12)

# Check if value in our bag
value = int(input('Guess a value in the bag: '))
if value in my_bag:
    print('You Guessed Right!')
else:
    print('Try Again!')

# Print items in the bag
for item in my_bag:
    print(item)

10.7.4 Why a Bag ADT?


You may be wondering, why do we need the Bag ADT when we could simply use the list
to store the items? For a small program and a small collection of data, using a list would
be appropriate. When working with large programs and multiple team members, however,
abstract data types provide several advantages. By working with the abstraction of a bag,
we can:

1. focus on solving the problem at hand instead of worrying about the implementation
of the container

2. reduce the chance of introducing errors from misuse of the list, since the list provides
additional operations that are not appropriate for a bag

3. provide better coordination between different modules and designers

4. easily swap out our current implementation of the Bag ADT for a different, possibly
more efficient, version later

10.7.5 Selecting a Data Structure


The implementation of a complex abstract data type typically requires the use of a data
structure for organizing and managing the collection of data items. There are many different
structures from which to choose. So how do we know which to use? We have to evaluate the
suitability of a data structure for implementing a given abstract data type, which we base
on the following criteria:

1. Does the data structure provide for the storage requirements as specified by the domain
of the ADT? Abstract data types are defined to work with a specific domain of data
values. The data structure we choose must be capable of storing all possible values
in that domain, taking into consideration any restrictions or limitations placed on the
individual items.

2. Does the data structure provide the necessary data access and manipulation functionality
to fully implement the ADT? The functionality of an abstract data type is provided
through its defined set of operations. The data structure must allow for a full and
correct implementation of the ADT without having to violate the abstraction principle
by exposing the implementation details to the user.

3. Does the data structure lend itself to an efficient implementation of the operations?
An important goal in the implementation of an abstract data type is to provide an
efficient solution. Some data structures allow for a more efficient implementation than
others, but not every data structure is suitable for implementing every ADT. Efficiency
considerations can help to select the best structure from among multiple candidates.

There may be multiple data structures suitable for implementing a given abstract data
type, but we attempt to select the best possible based on the context in which the ADT will
be used. Language libraries will commonly provide several implementations of some ADTs,
allowing the programmer to choose the most appropriate. Efficiency will be introduced
later.

10.8 Choose the Data Structure


The possible candidates for implementing Bag ADT now include the list and dictionary
structures.
The list can store any type of comparable object, including duplicates. Each item is
stored individually, including duplicates, which means the reference to each individual object
is stored and later accessible when needed. This satisfies the storage requirements of the
Bag ADT, making the list a candidate structure for its implementation.
The dictionary stores key/value pairs in which the key component must be comparable
and unique. To use the dictionary in implementing the Bag ADT, we must have a way to
store duplicate items as required by the definition of the abstract data type. To accomplish
this, each unique item can be stored in the key part of the key/value pair and a counter
can be stored in the value part. The counter would be used to indicate the number of
occurrences of the corresponding item in the bag. When a duplicate item is added, the
counter is incremented; when a duplicate is removed, the counter is decremented.
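
The counter scheme described above can be sketched as follows. The class name CountingBag and the method num_of are hypothetical, chosen only for this illustration; the sketch also deletes a key once its counter reaches zero so that the in operator stays accurate.

```python
class CountingBag:
    """A sketch of a bag that stores each unique item once, with a counter."""

    def __init__(self):
        self._counts = {}                  # item -> number of occurrences

    def __len__(self):
        return sum(self._counts.values())  # duplicates count toward the size

    def __contains__(self, item):
        return item in self._counts

    def add(self, item):
        # Increment the counter, starting from 0 for a new item
        self._counts[item] = self._counts.get(item, 0) + 1

    def remove(self, item):
        assert item in self._counts, "The item must be in the bag."
        self._counts[item] -= 1
        if self._counts[item] == 0:
            del self._counts[item]         # last occurrence removed
        return item

    def num_of(self, item):
        return self._counts.get(item, 0)


bag = CountingBag()
for value in (19, 74, 19):
    bag.add(value)
bag.remove(19)
print(len(bag), bag.num_of(19), 74 in bag)   # Prints "2 1 True"
```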
Both the list and dictionary structure could be used to implement the Bag ADT. For
the simple version of the bag, however, the list is a better choice since the dictionary would
require twice as much space to store the contents of the bag in the case where most of
the items are unique. The dictionary is an excellent choice for the implementation of the
counting bag variation of the ADT.

Having chosen the list, we must ensure it provides the means to implement the complete
set of bag operations. When implementing an ADT, we must use the functionality provided
by the underlying data structure. Sometimes, an ADT operation is identical to one already
provided by the data structure. In this case, the implementation can be quite simple and
may consist of a single call to the corresponding operation of the structure, while in other
cases, we have to use multiple operations provided by the structure. To help verify a correct
implementation of the Bag ADT using the list, we can outline how each bag operation will
be implemented:

• An empty bag can be represented by an empty list

• The size of the bag can be determined by the size of the list

• Determining if the bag contains a specific item can be done using the equivalent list
operation

• When a new item is added to the bag, it can be appended to the end of the list since
there is no specific ordering of the items in a bag

• Removing an item from the bag can also be handled by the equivalent list operation

• The items in a list can be traversed using a for loop, and Python provides for
user-defined iterators that can be used with a bag

From this itemized list, we see that each Bag ADT operation can be implemented using
the available functionality of the list. Thus, the list is suitable for implementing the bag.

10.9 List-Based Implementation


The implementation of the Bag ADT using a list is shown below. The constructor defines
the data field that holds the items, which is initialized to an empty list. This corresponds to the definition
of the constructor for the Bag ADT in which the container is initially created empty.

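class Bag:
    # Creates an empty bag
    def __init__(self):
        self._items = []
        self._current_item = -1

    # Returns the number of items in the bag
    def __len__(self):
        return len(self._items)

    # Determines whether the bag contains the given item
    def __contains__(self, item):
        return item in self._items

    # Adds a new item to the bag
    def add(self, item):
        self._items.append(item)

    # Removes and returns one occurrence of the item from the bag
    def remove(self, item):
        assert item in self._items, 'The item must be in the bag.'
        idx = self._items.index(item)
        return self._items.pop(idx)

    # Returns an iterator for traversing the items (the bag itself)
    def __iter__(self):
        # Restart from the first item every time a new loop begins
        self._current_item = -1
        return self

    # Returns the next item, or signals the end of the traversal
    def __next__(self):
        if self._current_item < len(self) - 1:
            self._current_item += 1
            return self._items[self._current_item]
        else:
            raise StopIteration

Figure 10.2: Sample instance of the Bag class implemented using a list

Figure 10.3: The Bag and BagIterator objects after the first loop iteration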

10.9.1 Some Implementation Details


Most of the implementation is straightforward, as shown in Figure 10.2 on page
89. Using iterators is illustrated in the figure showing the Bag and BagIterator objects. Some extra details include:

• The ADT definition of the remove() operation specifies the precondition that the item must
exist in the bag in order to be removed. Thus, we must first assert that condition and
verify the existence of the item.

• We need to provide an iteration mechanism that allows us to iterate over the individual
items in the bag.
Chapter 11

Arrays

11.1 Introduction
The most basic structure for storing and accessing a collection of data is the array. Arrays
can be used to solve a wide range of problems in computer science. Most programming
languages provide this structured data type as a primitive and allow for the creation of
arrays with multiple dimensions. Python does not.

11.2 The Array Structure


At the hardware level, most computer architectures provide a mechanism for creating and
using one-dimensional arrays. A one-dimensional array, as shown in Figure 11.1
on page 91, is composed of multiple sequential elements stored in contiguous bytes of memory
and allows for random access to the individual elements.
The entire contents of an array are identified by a single name. Individual elements
within the array can be accessed directly by specifying an integer subscript or index value,
which indicates an offset from the start of the array.

11.2.1 Arrays vs. Python lists


Though arrays are very similar to Python lists, there are major differences:

1. Array has a limited number of operations

(a) Array creation

(b) reading a value from a specic element

(c) writing a value to a specic element

2. Python list provides a large number of operations for working with the contents of the
list

Figure 11.1: A sample 1-D array consisting of 11 elements


3. Python list can grow and shrink during execution as elements are added or removed

4. Size of Array can't be changed after it has been created

11.2.2 When to use Arrays?


Both structures have their uses. If the number of elements is known beforehand and the
flexible set of operations available with the list is not needed, we use arrays.

1. Arrays are best suited for problems requiring sequences in which the maximum number
of elements is known up front

2. Python lists are a better choice when the size of the sequence needs to change after it
has been created

3. A Python list reserves more storage space than is needed to store the items currently in
the list. This extra space, the size of which can be up to twice the necessary capacity,
allows for quick and easy expansion as new items are added

4. However, the extra space is wasteful when using a Python list to store a fixed number of
elements

5. Python lists provide a large set of operations besides retrieving the item at a specific
location, such as searching for an item, removing an item by value or location, easily extracting
a subset of items, and sorting items

6. Arrays, on the other hand, provide only a limited set of operations for accessing the
individual elements

values = [ None ] * 100000


In the worst case described above, a list holding these 100000 values could reserve space for up to 200000 elements. What a waste!

11.3 Array Abstract Data Type


We can define an Array ADT to represent a one-dimensional array for use in Python that works
similarly to arrays found in other languages.

11.3.1 Array ADT


A one-dimensional array is a collection of contiguous elements in which individual
elements are identified by a unique integer subscript starting with zero. Once an array is
created, its size can't be changed.

• Array(size): Creates a one-dimensional array consisting of size elements with each
element initially set to None. size must be greater than zero.

• length(): Returns the length or number of elements in the array.

• get_item(index): Returns the value stored in the array at element position index.
The index argument must be within the valid range. Accessed using the subscript
operator.

• set_item(index, value): Modifies the contents of the array element at position index to
contain value. The index must be within the valid range. Accessed using the subscript
operator.

• clear(value): Clears the array by setting every element to value.

• iterator: Creates and returns an iterator that can be used to traverse the elements of
the array.

11.3.2 Creation and Usage of Array ADT


Basic Example
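# MyArray is the implementation given in Section 11.3.3; the standard-library
# module named array does not define a class called Array
from my_array import MyArray as Array
import random

# Fill the array with random values
value_list = Array(100)
for i in range(len(value_list)):
    value_list[i] = random.random()

# Print the values, one per line
for value in value_list:
    print(value)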

Another example
Suppose you need to read the contents of a text file and count the number of letters occurring
in the file, with the results printed to the terminal. Characters are represented by the ASCII
code, which consists of integer values. The letters of the alphabet, both uppercase and
lowercase, are part of what is known as the printable range of the ASCII code. This range covers
the ASCII codes [32, ..., 126].

# Count the number of occurrences of each letter in a text file

# MyArray is the implementation given in Section 11.3.3; the standard-library
# module named array does not define a class called Array
from my_array import MyArray as Array

# Create an array for the counters and initialize each element to zero
the_counters = Array(127)
the_counters.clear(0)

# Open the text file for reading, extract each line from the file,
# and iterate over each character in the line
with open('text_file.txt', 'r') as the_file:
    for line in the_file:
        for letter in line:
            code = ord(letter)
            # Skip characters outside the range covered by the counters
            if code < 127:
                the_counters[code] += 1

# Print the results
# The uppercase letters have ASCII values in the range 65 .. 90
# The lowercase letters have ASCII values in the range 97 .. 122
for i in range(26):
    print("%c - %4d     %c - %4d" %
          (chr(65+i), the_counters[65+i], chr(97+i), the_counters[97+i]))

11.3.3 Implementing the Array

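import ctypes

class MyArray:
    # Creates an array with size elements, each initialized to None
    def __init__(self, size):
        assert size > 0, 'Array size must be > 0'
        self._size = size
        self._next_item = -1

        # Create the underlying array structure using the ctypes module
        array_type = ctypes.py_object * size
        self._elements = array_type()

        # Initialize each element
        self.clear(None)

    # Returns the number of elements in the array
    def __len__(self):
        return self._size

    # Sets every element of the array to the given value
    def clear(self, value):
        for i in range(len(self)):
            self._elements[i] = value

    # Returns the array itself as its own iterator
    def __iter__(self):
        # Restart from the first element every time a new loop begins
        self._next_item = -1
        return self

    # Returns the next element, or signals the end of the traversal
    def __next__(self):
        if self._next_item < len(self) - 1:
            self._next_item += 1
            return self._elements[self._next_item]
        else:
            raise StopIteration

    # Makes our array subscriptable for reading: value = arr[index]
    def __getitem__(self, index):
        assert index >= 0 and index < len(self), 'Array subscript out of range'
        return self._elements[index]

    # Makes our array subscriptable for writing: arr[index] = value
    def __setitem__(self, index, value):
        assert index >= 0 and index < len(self), 'Array subscript out of range'
        self._elements[index] = value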

11.4 Array 2D
11.4.1 Implementing Array 2D

from my_array import MyArray

class MyArrayTD:
    def __init__(self, no_rows, no_cols):
        # Create the outer array, one element per row
        self._rows = MyArray(no_rows)

        # Create an inner array holding the columns of each row
        for row in range(no_rows):
            self._rows[row] = MyArray(no_cols)

    # Returns the number of rows
    def num_rows(self):
        return len(self._rows)

    # Returns the number of columns
    def num_cols(self):
        return len(self._rows[0])

    # Sets every element of the 2-D array to the given value
    def clear(self, value):
        for row in range(self.num_rows()):
            # Delegate to the clear() implemented inside my_array
            self._rows[row].clear(value)

    # Make the array subscriptable for reading: value = arr[row, col]
    def __getitem__(self, idx_tuple):
        assert len(idx_tuple) == 2, 'Invalid number of array subscripts'
        row = idx_tuple[0]
        col = idx_tuple[1]
        assert 0 <= row < self.num_rows() and \
               0 <= col < self.num_cols(), \
               'Array subscript out of range'
        the_one_d_arr = self._rows[row]
        return the_one_d_arr[col]

    # Make the array subscriptable for writing: arr[row, col] = value
    def __setitem__(self, idx_tuple, value):
        assert len(idx_tuple) == 2, 'Invalid number of array subscripts'
        row = idx_tuple[0]
        col = idx_tuple[1]
        assert 0 <= row < self.num_rows() and \
               0 <= col < self.num_cols(), \
               'Array subscript out of range'
        the_one_d_arr = self._rows[row]
        the_one_d_arr[col] = value

11.5 Game of Life


The Game of Life, also known simply as Life, is a cellular automaton devised by the British
mathematician John Horton Conway in 1970.
The game is a zero-player game, meaning that its evolution is determined by its initial
state, requiring no further input. One interacts with the Game of Life by creating an initial
configuration and observing how it evolves.

11.5.1 Rules
The universe of the Game of Life is an infinite, two-dimensional orthogonal grid of square
cells, each of which is in one of two possible states, alive or dead (or populated and
unpopulated, respectively). Every cell interacts with its eight neighbours, which are the cells
that are horizontally, vertically, or diagonally adjacent. At each step in time, the following
transitions occur:

1. Any live cell with fewer than two live neighbours dies, as if by underpopulation.

2. Any live cell with two or three live neighbours lives on to the next generation.

3. Any live cell with more than three live neighbours dies, as if by overpopulation.

4. Any dead cell with exactly three live neighbours becomes a live cell, as if by repro-
duction.

These rules, which compare the behavior of the automaton to real life, can be condensed
into the following:

1. Any live cell with two or three neighbors survives.

2. Any dead cell with three live neighbors becomes a live cell.

3. All other live cells die in the next generation. Similarly, all other dead cells stay dead.

The initial pattern constitutes the seed of the system. The first generation is created by
applying the above rules simultaneously to every cell in the seed; births and deaths occur
simultaneously, and the discrete moment at which this happens is sometimes called a tick.
Each generation is a pure function of the preceding one. The rules continue to be applied
repeatedly to create further generations.
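
The condensed rules above can be written as a small pure function of a single cell. This is a minimal sketch for illustration only; the function name next_state and its parameters are hypothetical and are not part of the full code that follows.

```python
def next_state(alive, live_neighbours):
    """Return True if the cell is alive in the next generation."""
    if alive:
        # A live cell survives with two or three live neighbours.
        return live_neighbours in (2, 3)
    # A dead cell becomes alive with exactly three live neighbours.
    return live_neighbours == 3


# Show the outcome for every possible neighbour count (0 to 8).
for count in range(9):
    print(count, next_state(True, count), next_state(False, count))
```

Because the function depends only on the cell's current state and its neighbour count, a whole generation can be computed from the previous one without looking at any partially updated cells.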

11.5.2 Game of Life - Core - Full Code

## Rules

### Any live cell with fewer than two live neighbours dies (underpopulation)
### Any live cell with two or three live neighbours lives on to the next generation
### Any live cell with more than three live neighbours dies, as if by overpopulation
### Any dead cell with exactly three live neighbours becomes a live cell, as if by reproduction

import matplotlib.pyplot as plt
import numpy as np

class Life:

    def __init__(self, initial=None):
        if initial is None:
            # Default: an empty 6 x 6 grid
            initial = np.zeros((6, 6), dtype=int)
        self._initial = initial
        self._rows = self._initial.shape[0]
        self._cols = self._initial.shape[1]
        self._next = self._initial.copy()
        # False: the current generation is in _initial; True: it is in _next
        self.flag = False

    def check_underpopulation(self, mat, r, c, res):
        if self.find_sum_around_me(mat, r, c) < 2:
            res[r, c] = 0

    def check_overpopulation(self, mat, r, c, res):
        if self.find_sum_around_me(mat, r, c) > 3:
            res[r, c] = 0

    def check_reproduction(self, mat, r, c, res):
        if self.find_sum_around_me(mat, r, c) == 3:
            res[r, c] = 1

    def next_generation(self):
        # Read from the current generation and write into a copy, so that
        # all births and deaths are applied simultaneously
        current = self._next if self.flag else self._initial
        result = current.copy()
        for r in range(self._rows):
            for c in range(self._cols):
                self.check_underpopulation(current, r, c, result)
                self.check_overpopulation(current, r, c, result)
                self.check_reproduction(current, r, c, result)
        # Store the result in the other buffer and swap the roles
        if self.flag:
            self._initial = result
        else:
            self._next = result
        self.flag = not self.flag
        # self.plot_generation(result)
        # print(f'Next Generation: \n {result}')
        return result

    def get_mat_item(self, mat, r, c):
        # Cells outside the grid count as dead. Without this check a
        # negative index would wrap around to the opposite edge.
        if 0 <= r < mat.shape[0] and 0 <= c < mat.shape[1]:
            return mat[r, c]
        return 0

    def find_sum_around_me(self, mat, r, c):
        # Add up the eight neighbours of cell (r, c)
        total = 0
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                if dr != 0 or dc != 0:
                    total += self.get_mat_item(mat, r + dr, c + dc)
        return total

    def plot_generation(self, mat):
        plt.matshow(mat)
        plt.show()

11.5.3 Game of Life - GUI - Full Code

from life_np import Life

import matplotlib.pyplot as plt
import numpy as np

dim_row = 200
dim_col = 200

# Random initial grid of 0s (dead) and 1s (alive);
# the upper bound of randint is exclusive
data = np.random.randint(0, 2, (dim_row, dim_col))

# print(data)

life = Life(initial=data)

fig, ax = plt.subplots()
ax.imshow(life._initial)

for i in range(100):
    ax.cla()
    ax.imshow(life.next_generation())
    # ax.set_title("frame {}".format(i))
    # Note that using time.sleep does *not* work here!
    plt.pause(1)
Chapter 12

Algorithm Analysis

12.1 Introduction
Algorithms are designed to solve problems, but a given problem can have many different
solutions. To determine the most efficient solution, we can measure the execution time. We
can implement the solution by constructing a computer program, using a given programming
language. We then execute the program and time it using a wall clock or the computer's
internal clock. The execution time is dependent on several factors. First, the amount of data
that must be processed directly affects the execution time. As the data set size increases,
so does the execution time. Second, the execution times can vary depending on the type
of hardware and the time of day a computer is used. If we use a multi-process, multi-user
system to execute the program, the execution of other programs on the same machine
can directly affect the execution time of our program. Finally, the choice of programming
language and compiler used to implement an algorithm can also influence the execution
time. Some compilers are better optimizers than others and some languages produce better
optimized code than others. Thus, we need a method to analyze an algorithm's efficiency
independent of the implementation details.

In computer science, time complexity is the computational complexity that describes the
amount of time it takes to run an algorithm.

Big O notation is a way of describing how the running time of an algorithm grows with the
size of its input. Using Big O notation, we can learn whether our algorithm is fast or slow.
This knowledge lets us design better algorithms.

This chapter is written using language-agnostic Python. That means it will be easy to port the
Big O notation code over to Java, or any other language. If the code isn't agnostic, there's
Java code accompanying it.

12.1.1 How do we measure - Example


$ python3 -m timeit '[print(x) for x in range(100)]'
100 loops, best of 3: 11.1 msec per loop
$ python3 -m timeit '[print(x) for x in range(10)]'
1000 loops, best of 3: 1.09 msec per loop
# We can see that the time per loop changes depending on the input!
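
The same kind of measurement can be made from inside a program with the standard timeit module. The following is a minimal sketch: the function total() and the chosen input sizes are hypothetical examples, and the measured times will differ from machine to machine and from run to run.

```python
import timeit

def total(n):
    """Sum the integers 0 .. n-1 with an explicit loop."""
    result = 0
    for x in range(n):
        result += x
    return result

# Time 20 calls for each input size and report the average per call.
for n in (1_000, 10_000, 100_000):
    seconds = timeit.timeit(lambda: total(n), number=20)
    print(f"n = {n:>7}: {seconds / 20:.6f} s per call")
```

The absolute numbers are not meaningful on their own; what matters is how they change as n grows.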


12.2 Complexity Analysis


To determine the efficiency of an algorithm, we can examine the solution itself and measure
those aspects of the algorithm that most critically affect its execution time. For example, we
can count the number of logical comparisons, data interchanges, or arithmetic operations.
Consider the following algorithm for computing the sum of each row of an n x n matrix and
an overall sum of the entire matrix:

total_sum = 0
for i in range(n):
    row_sum[i] = 0
    for j in range(n):
        row_sum[i] = row_sum[i] + matrix[i, j]
        total_sum = total_sum + matrix[i, j]
Suppose we want to analyze the algorithm based on the number of additions performed.
In this example, there are only two addition operations, making this a simple task. The
algorithm contains two loops, one nested inside the other. The inner loop is executed n
times and since it contains the two addition operations, there are a total of 2n additions
performed by the inner loop for each iteration of the outer loop. The outer loop is also
performed n times, for a total of $2n^2$ additions.
Can we improve upon this algorithm to reduce the total number of addition operations
performed? Consider a new version of the algorithm in which the second addition is moved
out of the inner loop and modified to sum the entries in the row_sum array instead of
individual elements of the matrix:

total_sum = 0
for i in range(n):
    row_sum[i] = 0
    for j in range(n):
        row_sum[i] = row_sum[i] + matrix[i, j]
    total_sum = total_sum + row_sum[i]
In this version, the inner loop is again executed n times, but this time, it only contains
one addition operation. That gives a total of n additions for each iteration of the outer
loop, but the outer loop now contains an addition operation of its own. To calculate the
total number of additions for this version, we take the n additions of the inner loop and add
the single addition of the outer loop. This
gives n + 1 additions for each iteration of the outer loop, which is performed n times for a
total of $n^2 + n$ additions.
If we compare the two results, it is obvious that the number of additions in the second version
is less than in the first for any n greater than 1. Thus, the second version will execute faster
than the first, but the difference in execution times will not be significant. The reason is
that both algorithms execute on the same order of magnitude, namely $n^2$. Thus, as the
size of n increases, both algorithms increase at approximately the same rate (though one is
slightly better).
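
The two counts can be checked by instrumenting the loops. The sketch below only counts the additions instead of performing them on a real matrix; the function names additions_v1 and additions_v2 are hypothetical.

```python
def additions_v1(n):
    """Count the additions of the first version: two per inner iteration."""
    count = 0
    for i in range(n):
        for j in range(n):
            count += 2   # one for row_sum[i], one for total_sum
    return count

def additions_v2(n):
    """Count the additions of the second version."""
    count = 0
    for i in range(n):
        for j in range(n):
            count += 1   # row_sum[i] only
        count += 1       # total_sum, once per row
    return count

for n in (10, 100, 1000):
    print(n, additions_v1(n), additions_v2(n))
```

For n = 10 this gives 200 and 110, which matches $2n^2$ and $n^2 + n$.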
Table 12.1 on page 103 compares the growth rates of the two versions discussed in this
example. Figure 12.1 on page 103 presents the values of the table graphically.

Table 12.1: Growth rate comparisons for different input sizes

n          $2n^2$            $n^2 + n$
10         200               110
100        20,000            10,100
1,000      2,000,000         1,001,000
10,000     200,000,000       100,010,000
100,000    20,000,000,000    10,000,100,000

Figure 12.1: Graphical comparison of the growth rates

12.3 Asymptotic Analysis


Given two algorithms for a task, how do we find out which one is better? One naive way of
doing this is to implement both algorithms and run the two programs on your computer
for different inputs and see which one takes less time. There are many problems with this
approach to the analysis of algorithms.

1. It might be possible that for some inputs, the first algorithm performs better than the
second, and for some inputs the second performs better.

2. It might also be possible that for some inputs, the first algorithm performs better on one
machine and the second works better on another machine for some other inputs.

Asymptotic Analysis is the big idea that handles the above issues in analyzing algorithms.
In Asymptotic Analysis, we evaluate the performance of an algorithm in terms of input size
(we don't measure the actual running time). We calculate how the time (or space)
taken by an algorithm increases with the input size.
For example, let us consider the search problem (searching for a given item) in a sorted
array. One way to search is Linear Search (order of growth is linear) and the other way is
Binary Search (order of growth is logarithmic). To understand how Asymptotic Analysis
solves the above mentioned problems in analyzing algorithms, let us say we run the Linear
Search on a fast computer and the Binary Search on a slow computer. For small values of the input
array size n, the fast computer may take less time. But after a certain value of the input array
size, the Binary Search will definitely start taking less time compared to the Linear Search,
even though the Binary Search is being run on a slow machine. The reason is that the order of
growth of Binary Search with respect to input size is logarithmic, while the order of growth
of Linear Search is linear. So the machine-dependent constants can always be ignored after
a certain value of the input size.
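
The fast-computer versus slow-computer argument can be made concrete with a small cost model. The sketch below is an illustration under stated assumptions: the per-step costs FAST and SLOW are made-up numbers, and the step counts are worst-case figures (n for linear search, floor(log2 n) + 1 for binary search).

```python
def linear_steps(n):
    # Worst case: every one of the n elements is inspected.
    return n

def binary_steps(n):
    # Worst case for n >= 1: floor(log2(n)) + 1 probes,
    # which is exactly the bit length of n.
    return n.bit_length()

FAST = 1      # hypothetical cost of one step on the fast computer
SLOW = 1000   # hypothetical cost of one step on the slow computer

for n in (10, 1_000, 100_000, 10_000_000):
    linear_cost = FAST * linear_steps(n)
    binary_cost = SLOW * binary_steps(n)
    winner = "linear" if linear_cost < binary_cost else "binary"
    print(f"n = {n:>10}: linear {linear_cost:>10}, binary {binary_cost:>6} -> {winner}")
```

Even with a machine that is assumed to be a thousand times slower per step, binary search wins once n is large enough, because its step count grows so much more slowly.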

12.3.1 Does Asymptotic Analysis always work?


Asymptotic Analysis is not perfect, but it is the best way available for analyzing
algorithms. For example, say there are two sorting algorithms that take $1000\,n \log n$ and $2\,n \log n$
time respectively on a machine. Both of these algorithms are asymptotically the same (the order
of growth is $n \log n$). So, with Asymptotic Analysis, we can't judge which one is better, as
we ignore constants in Asymptotic Analysis. Also, in Asymptotic Analysis, we always talk
about input sizes larger than a constant value. It might be possible that those large inputs
are never given to your software and an algorithm which is asymptotically slower always
performs better for your particular situation. So, you may end up choosing an algorithm
that is asymptotically slower but faster for your software.

12.3.2 Three Cases


We can have three cases to analyze an algorithm:

1. Worst Case

2. Average Case

3. Best Case

Let us consider the following implementation of Linear Search.

# Linearly search x in arr[]. If x is present
# then return the index, otherwise return -1
def search(arr, n, x):
    for i in range(n):
        if arr[i] == x:
            return i
    return -1

# Driver Code
arr = [1, 10, 30, 15]
x = 30
n = len(arr)
print(x, "is present at index",
      search(arr, n, x))

30 is present at index 2

Worst Case Analysis (Usually Done)


In worst case analysis, we calculate an upper bound on the running time of an algorithm. We
must know the case that causes the maximum number of operations to be executed. For Linear
Search, the worst case happens when the element to be searched for (x in the above code) is
not present in the array. When x is not present, the search() function compares it with all
the elements of arr[] one by one. Therefore, the worst case time complexity of linear search
is O(n).

Average Case Analysis (Sometimes done)


In average case analysis, we take all possible inputs and calculate computing time for all of
the inputs. Sum all the calculated values and divide the sum by total number of inputs. We
must know (or predict) distribution of cases. For the linear search problem, let us assume
that all cases are uniformly distributed (including the case of x not being present in array).
So we sum all the cases and divide the sum by (n+1). Following is the value of Average
Case Time (ACT) complexity.

\[
\mathrm{ACT} = \frac{\sum_{i=1}^{n+1} \Theta(i)}{n+1} = \Theta(n) \tag{12.1}
\]
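
To see why the average is linear, the sum can be worked out explicitly. The derivation below is a sketch under one simplifying assumption that is not stated in the text: the i-th of the n + 1 equally likely cases is taken to cost exactly i units of work.

```latex
\[
\mathrm{ACT}
  = \frac{1}{n+1}\sum_{i=1}^{n+1} i
  = \frac{1}{n+1}\cdot\frac{(n+1)(n+2)}{2}
  = \frac{n+2}{2}
  = \Theta(n)
\]
```

On average, then, roughly half of the array is inspected, which still grows linearly with n.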

Best Case Analysis


In best case analysis, we calculate a lower bound on the running time of an algorithm. We
must know the case that causes the minimum number of operations to be executed. In the linear
search problem, the best case occurs when x is present at the first location. The number
of operations in the best case is constant (not dependent on n). So the time complexity in the
best case is Ω(1).
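
The best and worst cases of linear search can be observed directly by counting comparisons. This is a minimal sketch; search_with_count is a hypothetical variant of the search() function above that also returns the number of comparisons made.

```python
def search_with_count(arr, x):
    """Linear search that also reports how many comparisons were made."""
    comparisons = 0
    for i, value in enumerate(arr):
        comparisons += 1
        if value == x:
            return i, comparisons
    return -1, comparisons

arr = [1, 10, 30, 15]
print(search_with_count(arr, 1))    # best case, prints (0, 1)
print(search_with_count(arr, 99))   # worst case, prints (-1, 4)
```

The best case costs one comparison regardless of the array size, while the worst case costs one comparison per element.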

General Notes
1. Most of the time, we do worst case analysis to analyze algorithms. In worst case
analysis, we guarantee an upper bound on the running time of an algorithm, which is
good information.

2. The average case analysis is not easy to do in most of the practical cases and it is
rarely done. In the average case analysis, we must know (or predict) the mathematical
distribution of all possible inputs.

3. The Best Case analysis is bogus. Guaranteeing a lower bound on an algorithm doesn't
provide any information as in the worst case, an algorithm may take years to run.

4. For some algorithms, all the cases are asymptotically same, i.e., there are no worst
and best cases.

12.4 Big-O Notation


Instead of counting the precise number of operations or steps, computer scientists are more
interested in classifying an algorithm based on the order of magnitude as applied to execution
time or space requirements. This classification approximates the actual number of required
steps for execution or the actual storage requirements in terms of variable-sized data sets.
The term big-O, which is derived from the expression "on the order of," is used to specify
an algorithm's classification.
Big O is a formal notation that describes the behaviour of a function as its argument
grows without bound.
It was invented by Paul Bachmann, Edmund Landau and others between 1894 and the 1920s,
and popularised in the 1970s by Donald Knuth.
Big O takes the upper bound. The worst case results in the worst execution of the
algorithm. For our shopping list example, the worst case is an infinite list.
Instead of saying the input is 10 billion, or infinite, we say the input is of size n. The exact
size of the input doesn't matter, only how our algorithm performs with the worst input. We
can still work out Big O without knowing the exact size of an input.
Big O is easy to read once we learn the following:
Big O is easy to read once we learn the following:

• Constant: $O(1)$
• Logarithmic: $O(\log n)$
• Linear: $O(n)$
• Polynomial: $O(n^2)$, $O(n^3)$, $O(n^x)$
• Exponential: $O(2^n)$

Other asymptotic notations are:

• Big Omega Ω: lower bound, often associated with the best case

• Big Theta Θ: tight bound, often loosely associated with the average case

• Big O O: upper bound, often associated with the worst case

12.4.1 Constant Time


Constant algorithms do not scale with the input size; they take the same number of steps no
matter how big the input is. An example of this is addition: 1+2 takes the same number of
steps as 500+700. The larger sum may take more physical time, but we do not add more steps
to the algorithm for the addition of big numbers. The underlying algorithm doesn't change at
all. We often write constant time as O(1), but any fixed number could be used and it would
still be constant. We replace the number with 1 because it does not matter exactly how many
steps are taken. What matters is that the algorithm takes a constant number of steps. Constant
time is the fastest of all Big O time complexities. The formal definition of constant time is:
it is upper-bounded by a constant.

def odd_or_even(n):
    # n % 2 is 1 (true) for odd numbers and 0 (false) for even numbers
    return "Odd" if n % 2 else "Even"

12.4.2 Logarithmic Time


Here's a quick explanation of what a logarithm is.

$\log_3 9$

What is being asked here is: 3 to what power gives us 9? Since 3 to the power of 2
gives us 9, the whole expression looks like:

$\log_3 9 = 2$

A logarithmic algorithm halves the remaining input at every step.


Let's look at binary search. Given the below sorted list:

a = [1, 2, 3, 4, 5, 6 , 7, 8, 9, 10]
We want to find the number "2".
We implement Binary Search as:

def binarySearch(alist, item):
    first = 0
    last = len(alist)-1
    found = False

    while first <= last and not found:
        midpoint = (first + last)//2
        if alist[midpoint] == item:
            found = True
        else:
            if item < alist[midpoint]:
                last = midpoint-1
            else:
                first = midpoint+1

    return found
This is:

1. Go to the middle of the list

2. Check to see if that element is the answer

3. If it's not, check to see if that element is more than the item we want to find

4. If it is, ignore the right-hand side (all the numbers higher than the midpoint) of the
list; otherwise, ignore the left-hand side. Then choose a new midpoint.

5. Start over again, by finding the midpoint in the new list.
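
The steps above can be traced by recording which positions are inspected. The following sketch uses a hypothetical helper, binary_search_probes, that follows the same logic as binarySearch but returns the list of midpoints it visits.

```python
def binary_search_probes(alist, item):
    """Return the list of midpoints inspected while searching for item."""
    first, last = 0, len(alist) - 1
    probes = []
    while first <= last:
        midpoint = (first + last) // 2
        probes.append(midpoint)
        if alist[midpoint] == item:
            break
        if item < alist[midpoint]:
            last = midpoint - 1    # ignore the right-hand side
        else:
            first = midpoint + 1   # ignore the left-hand side
    return probes

a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(binary_search_probes(a, 2))                        # prints [4, 1]
print(max(len(binary_search_probes(a, x)) for x in a))   # prints 4
```

Finding "2" takes two probes (the values 5 and then 2), and no item in this ten-element list needs more than four, in line with logarithmic growth.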



Linear Time
In a linear time algorithm, every single element of the input is visited a constant number of
times (typically once), for O(n) steps in total. As the size of the input, n, grows, the run
time of the algorithm scales in proportion to the size of the input.
In other words, linear time is where every single item in a list is visited once in the
worst-case scenario.

shopping_list = ["Bread", "Butter",
                 "The Nacho Libre soundtrack from the 2006 film Nacho Libre",
                 "Reusable Water Bottle"]
for item in shopping_list:
    print(item)
Let's look at another example: finding the largest item of an unsorted array.
Given the list:

a = [2, 16, 7, 9, 8, 23, 12]


How do we work out what the largest item is?
We need to program it like this:

a = [2, 16, 7, 9, 8, 23, 12]


max_item = a[0]
for item in a:
    if item > max_item:
        max_item = item
We have to go through every item in the list, one by one.

Polynomial Time
Polynomial time is a polynomial function of the input size. A polynomial function looks like $n^2$
or $n^3$ and so on.
If one loop through a list is $O(n)$, two nested loops over the same list must be $O(n^2)$. The
outer loop goes over the list once. For each item in that list, the inner loop goes over the
entire list once, resulting in $n^2$ operations.

a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
for i in a:
    for x in a:
        print("x")
Each additional level of nesting over the same list adds 1 to the power.
So a triple nested loop is $O(n^3)$.
Bubble sort is a good example of an $O(n^2)$ algorithm. The sorting algorithm takes the
first number and swaps it with the adjacent number if they are in the wrong order. It does
this for each number, repeating the pass until all numbers are in the right order and thus sorted.

def bubbleSort(arr):
    n = len(arr)

    # Traverse through all array elements
    for i in range(n):

        # Last i elements are already in place,
        # so traverse the array from 0 to n-i-1
        for j in range(0, n-i-1):

            # Swap if the element found is greater
            # than the next element
            if arr[j] > arr[j+1]:
                arr[j], arr[j+1] = arr[j+1], arr[j]

# Driver code to test above
arr = [64, 34, 25, 12, 22, 11, 90]

bubbleSort(arr)
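
To connect the code with its $O(n^2)$ classification, the comparisons can be counted. The sketch below is a hypothetical instrumented variant of the function above; it sorts a copy of the input and returns the number of comparisons alongside the sorted list.

```python
def bubble_sort_comparisons(arr):
    """Sort a copy of arr with bubble sort; return (sorted_list, comparisons)."""
    items = list(arr)
    n = len(items)
    comparisons = 0
    for i in range(n):
        for j in range(0, n - i - 1):
            comparisons += 1
            if items[j] > items[j + 1]:
                items[j], items[j + 1] = items[j + 1], items[j]
    return items, comparisons

result, count = bubble_sort_comparisons([64, 34, 25, 12, 22, 11, 90])
print(result)   # prints [11, 12, 22, 25, 34, 64, 90]
print(count)    # prints 21
```

For n = 7 the count is 21, that is n(n - 1)/2, which grows on the order of $n^2$.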

Exponential Complexity
Exponential time is $2^n$, where the base (2 here) depends on the number of choices involved.
This class of algorithm is the slowest of them all. You saw how my professor reacted to
polynomial algorithms. He was jumping up and down in fury at exponential algorithms!
Say we have a password consisting only of digits (10 digits, 0 through 9). We
want to crack a password which has a length of n. To brute-force through every combination
we'll have

$10^n$
combinations to work through.
One example of exponential time is to find all the subsets of a set.

>>> subsets([''])
['']
>>> subsets(['x'])
['', 'x']
>>> subsets(['a', 'b'])
['', 'a', 'b', 'ab']
We can see that when we have an input size of 2, the output size is $2^2 = 4$.
Now, let's code up subsets.

from itertools import chain, combinations

def subsets(iterable):
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(len(s)+1))
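This recipe is taken from the documentation for itertools. It returns an iterator of tuples
rather than the strings shown above, so wrap the result in list() to see every subset. What's
important here is to see that the output grows exponentially with the input size.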
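As a quick check of that growth, the sketch below repeats the subsets function so that it runs on its own, then counts the subsets for a few input sizes.

```python
from itertools import chain, combinations

def subsets(iterable):
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

print(list(subsets(["a", "b"])))  # [(), ('a',), ('b',), ('a', 'b')]

for n in range(6):
    # the count doubles each time n grows by one: 1, 2, 4, 8, 16, 32
    print(n, len(list(subsets(range(n)))))
```

Because the output itself has 2^n entries, no implementation can list all subsets in less than exponential time.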
Exponential algorithms are horrific, but like polynomial algorithms we can learn a thing
or two from them. Let's say we have to calculate 10^4. We need to do this:

10 * 10 * 10 * 10 = 10^2 * 10^2
We have to calculate 10^2 twice! What if we store that value somewhere and use it later
so we do not have to recalculate it? This is the principle of Dynamic Programming.
When we see an exponential algorithm, dynamic programming can often be used to
speed it up.
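The same store-and-reuse idea can be sketched with the Fibonacci numbers, a common teaching example that is not taken from this chapter. The naive recursion recomputes the same values over and over, while the memoized version keeps each result in a cache (here functools.lru_cache from the standard library). The function names are assumptions for illustration.

```python
from functools import lru_cache

def fib_naive(n, counter):
    """Plain recursion; counter[0] records how many calls were made."""
    counter[0] += 1
    if n < 2:
        return n
    return fib_naive(n - 1, counter) + fib_naive(n - 2, counter)

@lru_cache(maxsize=None)
def fib_memo(n):
    """Same recursion, but every result is stored and reused."""
    if n < 2:
        return n
    return fib_memo(n - 1) + fib_memo(n - 2)

calls = [0]
print(fib_naive(20, calls), calls[0])              # 6765 21891
print(fib_memo(20), fib_memo.cache_info().misses)  # 6765 21
```

The naive version needs thousands of calls for n = 20, while the memoized one computes each of the 21 distinct values exactly once.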
Again, knowing time complexities allows us to build better algorithms.
Chapter 13

Linked List

13.1 Introduction
A linked list is a sequence of data elements that are connected together via links. Each
data element contains a connection to another data element in the form of a pointer. Python
does not have linked lists in its standard library. In this chapter we study the type of
linked list known as a singly linked list. In this type of data structure each data element
holds only one link, which points to the next data element. We create such a list and add
methods to insert, update and remove elements from the list.

13.2 Creation of Linked List


We create a Node class and another class that uses Node objects. Each node stores its
data value and a reference (nextval) that points to the next data element. The program
below creates a linked list with three data elements. In the next section we will see how
to traverse the linked list.

class Node:
    def __init__(self, dataval=None):
        self.dataval = dataval
        self.nextval = None

class SLinkedList:
    def __init__(self):
        self.headval = None

list1 = SLinkedList()
list1.headval = Node("Mon")
e2 = Node("Tue")
e3 = Node("Wed")

# Link first Node to second node
list1.headval.nextval = e2

# Link second Node to third node
e2.nextval = e3


13.3 Traversing a Linked List


Singly linked lists can be traversed only in the forward direction, starting from the first
data element. We print the value of the current data element, then move on by assigning the
pointer to the next node to the current variable, until we reach the end of the list.

class Node:
    def __init__(self, dataval=None):
        self.dataval = dataval
        self.nextval = None

class SLinkedList:
    def __init__(self):
        self.headval = None

    def listprint(self):
        printval = self.headval
        while printval is not None:
            print(printval.dataval)
            printval = printval.nextval

# Named llist so that the built-in list type is not shadowed
llist = SLinkedList()
llist.headval = Node("Mon")
e2 = Node("Tue")
e3 = Node("Wed")

# Link first Node to second node
llist.headval.nextval = e2

# Link second Node to third node
e2.nextval = e3

llist.listprint()
When the above code is executed, it produces the following result:

Mon
Tue
Wed
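The traversal loop can also be wrapped in Python's iteration protocol, so that a linked list works with for loops, list() and len(). This is a sketch under the assumption that the same Node and SLinkedList field names are used; the __iter__ and __len__ methods are additions for illustration and are not part of the original class.

```python
class Node:
    def __init__(self, dataval=None):
        self.dataval = dataval
        self.nextval = None

class SLinkedList:
    def __init__(self):
        self.headval = None

    def __iter__(self):
        # Walk forward from the head, yielding each stored value
        current = self.headval
        while current is not None:
            yield current.dataval
            current = current.nextval

    def __len__(self):
        # Counting requires a full traversal, so this is O(n)
        return sum(1 for _ in self)

days = SLinkedList()
days.headval = Node("Mon")
days.headval.nextval = Node("Tue")
days.headval.nextval.nextval = Node("Wed")

print(list(days))  # ['Mon', 'Tue', 'Wed']
print(len(days))   # 3
```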

13.4 Insertion in a Linked List


Inserting an element in a linked list involves reassigning pointers so that existing nodes
link to the newly inserted node. Depending on whether the new data element is inserted at
the beginning, in the middle, or at the end of the linked list, we have the scenarios below.

13.4.1 Inserting at the Beginning of the Linked List


This involves pointing the next pointer of the new data node to the current head of the
linked list. So the current head of the linked list becomes the second data element and the
new node becomes the head of the linked list.

class Node:
    def __init__(self, dataval=None):
        self.dataval = dataval
        self.nextval = None

class SLinkedList:
    def __init__(self):
        self.headval = None

    # Print the linked list
    def listprint(self):
        printval = self.headval
        while printval is not None:
            print(printval.dataval)
            printval = printval.nextval

    def AtBegining(self, newdata):
        NewNode = Node(newdata)

        # Point the new node's nextval at the current head,
        # then make the new node the head
        NewNode.nextval = self.headval
        self.headval = NewNode

# Named llist so that the built-in list type is not shadowed
llist = SLinkedList()
llist.headval = Node("Mon")
e2 = Node("Tue")
e3 = Node("Wed")

llist.headval.nextval = e2
e2.nextval = e3

llist.AtBegining("Sun")

llist.listprint()
When the above code is executed, it produces the following result:

Sun
Mon
Tue
Wed

13.4.2 Inserting at the End of the Linked List


This involves pointing the next pointer of the current last node of the linked list to the
new data node. So the current last node of the linked list becomes the second-to-last data
node and the new node becomes the last node of the linked list.

class Node:
    def __init__(self, dataval=None):
        self.dataval = dataval
        self.nextval = None

class SLinkedList:
    def __init__(self):
        self.headval = None

    # Function to add newnode
    def AtEnd(self, newdata):
        NewNode = Node(newdata)

        # An empty list: the new node becomes the head
        if self.headval is None:
            self.headval = NewNode
            return

        # Otherwise walk to the last node and link the new node after it
        laste = self.headval
        while laste.nextval:
            laste = laste.nextval
        laste.nextval = NewNode

    # Print the linked list
    def listprint(self):
        printval = self.headval
        while printval is not None:
            print(printval.dataval)
            printval = printval.nextval

# Named llist so that the built-in list type is not shadowed
llist = SLinkedList()
llist.headval = Node("Mon")
e2 = Node("Tue")
e3 = Node("Wed")

llist.headval.nextval = e2
e2.nextval = e3

llist.AtEnd("Thu")

llist.listprint()
When the above code is executed, it produces the following result:

Mon
Tue
Wed
Thu

13.4.3 Inserting in between two Data Nodes


This involves changing the pointer of a specific node to point to the new node. That is
possible by passing in both the new data and the existing node after which the new node
will be inserted. So we define an additional method which sets the next pointer of the
new node to the next pointer of the middle node, then assigns the new node to the next
pointer of the middle node.

class Node:
    def __init__(self, dataval=None):
        self.dataval = dataval
        self.nextval = None

class SLinkedList:
    def __init__(self):
        self.headval = None

    # Function to add node
    def Inbetween(self, middle_node, newdata):
        if middle_node is None:
            print("The mentioned node is absent")
            return

        # Link the new node to what followed middle_node,
        # then link middle_node to the new node
        NewNode = Node(newdata)
        NewNode.nextval = middle_node.nextval
        middle_node.nextval = NewNode

    # Print the linked list
    def listprint(self):
        printval = self.headval
        while printval is not None:
            print(printval.dataval)
            printval = printval.nextval

# Named llist so that the built-in list type is not shadowed
llist = SLinkedList()
llist.headval = Node("Mon")
e2 = Node("Tue")
e3 = Node("Thu")

llist.headval.nextval = e2
e2.nextval = e3

llist.Inbetween(llist.headval.nextval, "Fri")

llist.listprint()
When the above code is executed, it produces the following result:

Mon
Tue
Fri
Thu
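In practice the middle node usually has to be located first. The sketch below, which assumes the same Node fields, searches for a node by value and then inserts after it. The helpers find_node, insert_after and to_list are hypothetical names introduced for this example.

```python
class Node:
    def __init__(self, dataval=None):
        self.dataval = dataval
        self.nextval = None

def find_node(head, target):
    """Return the first node whose dataval equals target, or None."""
    current = head
    while current is not None and current.dataval != target:
        current = current.nextval
    return current

def insert_after(node, newdata):
    """Splice a new node in right after `node` and return the new node."""
    new_node = Node(newdata)
    new_node.nextval = node.nextval
    node.nextval = new_node
    return new_node

def to_list(head):
    """Collect the values of the chain starting at head into a Python list."""
    values = []
    while head is not None:
        values.append(head.dataval)
        head = head.nextval
    return values

head = Node("Mon")
tue = insert_after(head, "Tue")
insert_after(tue, "Thu")

middle = find_node(head, "Tue")
if middle is not None:
    insert_after(middle, "Wed")
print(to_list(head))  # ['Mon', 'Tue', 'Wed', 'Thu']
```

The search is O(n), while the splice itself only touches two pointers and is O(1).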

13.4.4 Removing an Item from a Linked List


We can remove an existing node using the key for that node. In the program below we
locate the node just before the one to be deleted, then point its next pointer to the node
that follows the deleted node.

class Node:
    def __init__(self, data=None):
        self.data = data
        self.next = None

class SLinkedList:
    def __init__(self):
        self.head = None

    def Atbegining(self, data_in):
        NewNode = Node(data_in)
        NewNode.next = self.head
        self.head = NewNode

    # Function to remove node
    def RemoveNode(self, Removekey):
        HeadVal = self.head

        # Case 1: the head itself holds the key
        if HeadVal is not None:
            if HeadVal.data == Removekey:
                self.head = HeadVal.next
                return

        # Case 2: search for the key, remembering the previous node
        prev = None
        while HeadVal is not None:
            if HeadVal.data == Removekey:
                break
            prev = HeadVal
            HeadVal = HeadVal.next

        # Key not found (or the list is empty)
        if HeadVal is None:
            return

        # Unlink the node by bypassing it
        prev.next = HeadVal.next

    def LListprint(self):
        printval = self.head
        while printval:
            print(printval.data)
            printval = printval.next

llist = SLinkedList()
llist.Atbegining("Mon")
llist.Atbegining("Tue")
llist.Atbegining("Wed")
llist.Atbegining("Thu")
llist.RemoveNode("Tue")
llist.LListprint()

When the above code is executed, it produces the following result:

Thu
Wed
Mon
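The three cases of removal (empty list, key at the head, key further along) can also be written as a function that reports whether anything was removed. This is a sketch, not the method above; remove_first, build and to_list are hypothetical helpers, and the field names data and next are assumed to match this section.

```python
class Node:
    def __init__(self, data=None):
        self.data = data
        self.next = None

def remove_first(head, key):
    """Remove the first node holding key. Return (new_head, removed_flag)."""
    if head is None:
        return head, False          # empty list: nothing to remove
    if head.data == key:
        return head.next, True      # the head itself is removed
    prev = head
    while prev.next is not None:
        if prev.next.data == key:
            prev.next = prev.next.next  # bypass the matching node
            return head, True
        prev = prev.next
    return head, False              # key not found

def build(values):
    """Build a chain of nodes holding values in the given order."""
    head = None
    for value in reversed(values):
        node = Node(value)
        node.next = head
        head = node
    return head

def to_list(head):
    values = []
    while head is not None:
        values.append(head.data)
        head = head.next
    return values

head = build(["Thu", "Wed", "Tue", "Mon"])
head, removed = remove_first(head, "Tue")
print(to_list(head), removed)  # ['Thu', 'Wed', 'Mon'] True
head, removed = remove_first(head, "Fri")
print(to_list(head), removed)  # ['Thu', 'Wed', 'Mon'] False
```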
Chapter 14

Queue

14.1 Introduction
A queue is a linear data structure that stores items in a First In First Out (FIFO) manner.
With a queue, the least recently added item is removed first. A good example of a queue is
any line of consumers for a resource, where the consumer that came first is served first.

14.2 Queue Operations


Operations associated with queue are:

1. Enqueue: Adds an item to the queue. If the queue is full, then it is said to be an
Overflow condition. Time Complexity: O(1)

2. Dequeue: Removes an item from the queue. Items are dequeued in the same order
in which they were enqueued. If the queue is empty, then it is said to be an Underflow
condition. Time Complexity: O(1)

3. Front: Get the front item from the queue. Time Complexity: O(1)

4. Rear: Get the last item from the queue. Time Complexity: O(1)
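The four operations can be sketched as a small class. This is an illustration under stated assumptions, not the implementation used later in the chapter: the class name BoundedQueue, its capacity parameter and the choice of exceptions are all made up here, and collections.deque is used as the underlying storage so that each operation stays O(1).

```python
from collections import deque

class BoundedQueue:
    def __init__(self, capacity):
        self._items = deque()
        self._capacity = capacity

    def enqueue(self, item):
        # Overflow condition: no room left in the queue
        if len(self._items) >= self._capacity:
            raise OverflowError("queue is full")
        self._items.append(item)

    def dequeue(self):
        # Underflow condition: nothing to remove
        if not self._items:
            raise IndexError("queue is empty")
        return self._items.popleft()

    def front(self):
        if not self._items:
            raise IndexError("queue is empty")
        return self._items[0]

    def rear(self):
        if not self._items:
            raise IndexError("queue is empty")
        return self._items[-1]

q = BoundedQueue(capacity=3)
for item in ("a", "b", "c"):
    q.enqueue(item)
print(q.front(), q.rear())  # a c
print(q.dequeue())          # a
print(q.front(), q.rear())  # b c
```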

14.3 Implementation
There are various ways to implement a queue in Python. This chapter covers the
implementation of queues using data structures and modules from the Python standard library.

Queue in Python can be implemented by the following ways:

1. list

2. collections.deque

3. queue.Queue


14.4 Implementation using list


list is Python's built-in data structure that can be used as a queue. Instead of enqueue()
and dequeue(), the append() and pop(0) functions are used. However, lists are quite slow for
this purpose because inserting or deleting an element at the beginning requires shifting all
of the other elements by one, requiring O(n) time.

# Initializing a queue
queue = []

# Adding elements to the queue
queue.append('a')
queue.append('b')
queue.append('c')

print("Initial queue")
print(queue)

# Removing elements from the queue
print("\nElements dequeued from queue")
print(queue.pop(0))
print(queue.pop(0))
print(queue.pop(0))

print("\nQueue after removing elements")
print(queue)

# Uncommenting print(queue.pop(0))
# will raise an IndexError
# as the queue is now empty

Output

Initial queue
['a', 'b', 'c']

Elements dequeued from queue
a
b
c

Queue after removing elements
[]

Traceback (most recent call last):
  File "/home/ef51acf025182ccd69d906e58f17b6de.py", line 25, in <module>
    print(queue.pop(0))
IndexError: pop from empty list

14.5 Implementation using collections.deque


Queue in Python can be implemented using the deque class from the collections module. deque
is preferred over list in the cases where we need quicker append and pop operations from
both ends of the container: deque provides O(1) time complexity for append and pop at
either end, whereas removing an item from the front of a list takes O(n) time. Instead of
enqueue and dequeue, the append() and popleft() functions are used.

from collections import deque

# Initializing a queue
q = deque()

# Adding elements to a queue
q.append('a')
q.append('b')
q.append('c')

print("Initial queue")
print(q)

# Removing elements from a queue
print("\nElements dequeued from the queue")
print(q.popleft())
print(q.popleft())
print(q.popleft())

print("\nQueue after removing elements")
print(q)

# Uncommenting q.popleft()
# will raise an IndexError
# as queue is now empty

Output:

Initial queue
deque(['a', 'b', 'c'])

Elements dequeued from the queue
a
b
c

Queue after removing elements
deque([])

Traceback (most recent call last):
  File "/home/b2fa8ce438c2a9f82d6c3e5da587490f.py", line 23, in <module>
    q.popleft()
IndexError: pop from an empty deque
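The difference between O(n) and O(1) removal from the front can be observed with a rough timing. The sketch below uses timeit from the standard library; drain_list and drain_deque are hypothetical helper names, and the measured times depend on the machine, so no particular numbers are claimed.

```python
from collections import deque
from timeit import timeit

def drain_list(n):
    """Dequeue n items from a list with pop(0), which shifts the rest each time."""
    q = list(range(n))
    out = []
    while q:
        out.append(q.pop(0))
    return out

def drain_deque(n):
    """Dequeue n items from a deque with popleft(), which is O(1)."""
    q = deque(range(n))
    out = []
    while q:
        out.append(q.popleft())
    return out

n = 20_000
print("list :", timeit(lambda: drain_list(n), number=1))
print("deque:", timeit(lambda: drain_deque(n), number=1))
```

Both functions return the items in the same FIFO order; only the cost of each removal differs.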

14.6 Implementation using queue.Queue


queue is a built-in module of Python that can be used to implement a queue.
queue.Queue(maxsize) creates a queue that holds at most maxsize items. A maxsize of zero (0)
means an infinite queue. This Queue follows the FIFO rule. The following attribute and
functions are available on a Queue object:

• maxsize - Number of items allowed in the queue.

• empty() - Return True if the queue is empty, False otherwise.

• full() - Return True if there are maxsize items in the queue. If the queue was initialized
with maxsize=0 (the default), then full() never returns True.

• get() - Remove and return an item from the queue. If the queue is empty, wait until an
item is available.

• get_nowait() - Return an item if one is immediately available, else raise queue.Empty.

• put(item) - Put an item into the queue. If the queue is full, wait until a free slot is
available before adding the item.

• put_nowait(item) - Put an item into the queue without blocking. If no free slot is
immediately available, raise queue.Full.

• qsize() - Return the number of items in the queue.

from queue import Queue

# Initializing a queue
q = Queue(maxsize = 3)

# qsize() gives the number of items currently
# in the Queue (0 here), not the maxsize
print(q.qsize())

# Adding elements to the queue
q.put('a')
q.put('b')
q.put('c')

# Return Boolean for Full Queue
print("\nFull: ", q.full())

# Removing elements from the queue
print("\nElements dequeued from the queue")
print(q.get())
print(q.get())
print(q.get())

# Return Boolean for Empty Queue
print("\nEmpty: ", q.empty())

q.put(1)
print("\nEmpty: ", q.empty())
print("Full: ", q.full())

# Calling q.get() on an empty Queue blocks forever.
# Here the Queue still holds 1, so one q.get() would
# return it and a second q.get() would block.
# print(q.get())

Output:

0

Full:  True

Elements dequeued from the queue
a
b
c

Empty:  True

Empty:  False
Full:  False
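To avoid blocking on a full or empty queue, the non-blocking calls can be used and the exceptions queue.Full and queue.Empty caught. The following is a minimal sketch; the messages printed are illustrative only.

```python
import queue

q = queue.Queue(maxsize=2)
q.put_nowait("a")
q.put_nowait("b")

try:
    q.put_nowait("c")      # no free slot, so queue.Full is raised
except queue.Full:
    print("queue is full")

print(q.get_nowait())      # a
print(q.get_nowait())      # b

try:
    q.get_nowait()         # nothing left, so queue.Empty is raised
except queue.Empty:
    print("queue is empty")

try:
    q.get(timeout=0.1)     # waits up to 0.1 seconds, then gives up
except queue.Empty:
    print("timed out waiting for an item")
```

get() and put() also accept a timeout argument, which is often a reasonable middle ground between blocking forever and failing immediately.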
Chapter 15

Stack

15.1 Introduction
A stack is a data structure that stores items in a Last-In/First-Out manner. This is
frequently referred to as LIFO. This is in contrast to a queue, which stores items in a
First-In/First-Out (FIFO) manner.
It's probably easiest to understand a stack if you think of a use case you're likely familiar
with: the Undo feature in your editor.
Let's imagine you're editing a Python file so we can look at some of the operations you
perform. First, you add a new function. This adds a new item to the undo stack:
You can see that the stack now has an Add Function operation on it. After adding the
function, you delete a word from a comment. This also gets added to the undo stack:
Notice how the Delete Word item is placed on top of the stack. Finally you indent a
comment so that it's lined up properly:
You can see that each of these commands is stored in an undo stack, with each new
command being put at the top. When you're working with stacks, adding new items like
this is called push.
Now you've decided to undo all three of those changes, so you hit the undo command.
It takes the item at the top of the stack, which was indenting the comment, and removes
that from the stack:
Your editor undoes the indent, and the undo stack now contains two items. This opera-
tion is the opposite of push and is commonly called pop.
When you hit undo again, the next item is popped off the stack:


This removes the Delete Word item, leaving only one operation on the stack.
Finally, if you hit Undo a third time, then the last item will be popped off the stack:
The undo stack is now empty. Hitting Undo again after this will have no effect because
your undo stack is empty, at least in most editors. You'll see what happens when you call
.pop() on an empty stack in the implementation descriptions below.
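The editor scenario can be sketched in a few lines. This is only an illustration of push and pop on a plain Python list; the undo helper and the choice to return None on an empty stack (mimicking an editor that ignores the extra Undo) are assumptions made for this example.

```python
undo_stack = []

# Each edit is pushed on top of the stack
undo_stack.append("Add Function")
undo_stack.append("Delete Word")
undo_stack.append("Indent Comment")

def undo(stack):
    """Pop and return the most recent operation, or None if nothing is left."""
    if not stack:
        return None  # like an editor where an extra Undo has no effect
    return stack.pop()

print(undo(undo_stack))  # Indent Comment
print(undo(undo_stack))  # Delete Word
print(undo(undo_stack))  # Add Function
print(undo(undo_stack))  # None
```

The operations come back in the reverse of the order in which they were pushed, which is the LIFO behaviour.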

15.2 Implementing a Python Stack


There are a couple of options when you're implementing a Python stack. We will focus on
using data structures that are part of the Python library, rather than writing our own or
using third-party packages.
We will look at the following Python stack implementations:

1. list

2. collections.deque

15.3 Using list to Create a Python Stack


The built-in list structure that you likely use frequently in your programs can be used as
a stack. Instead of .push(), you can use .append() to add new elements to the top of your
stack, while .pop() removes the elements in the LIFO order:

>>> myStack = []

>>> myStack.append('a')
>>> myStack.append('b')
>>> myStack.append('c')

>>> myStack
['a', 'b', 'c']

>>> myStack.pop()
'c'
>>> myStack.pop()
'b'
>>> myStack.pop()
'a'

>>> myStack.pop()
Traceback (most recent call last):
File "<console>", line 1, in <module>
IndexError: pop from empty list

You can see in the final command that a list will raise an IndexError if you call .pop()
on an empty stack.
list has the advantage of being familiar. You know how it works and likely have used it
in your programs already.
Unfortunately, list has a few shortcomings compared to other data structures you'll look
at. The biggest issue is that it can run into speed issues as it grows. The items in a list are
stored with the goal of providing fast access to random elements in the list. At a high level,
this means that the items are stored next to each other in memory.
If your stack grows bigger than the block of memory that currently holds it, then Python
needs to do some memory allocations. This can lead to some .append() calls taking much
longer than other ones.
There is a less serious problem as well. If you use .insert() to add an element to your stack
at a position other than the end, it can take much longer. This is not normally something
you would do to a stack, however.
The next data structure will help you get around the reallocation problem you saw with
list.
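The occasional reallocation can be observed indirectly. The sketch below assumes CPython, where sys.getsizeof reports the memory currently reserved by a list object; other Python implementations may behave differently, and the exact number of resizes is not guaranteed.

```python
import sys

stack = []
last_size = sys.getsizeof(stack)
resizes = 0

for i in range(1000):
    stack.append(i)
    size = sys.getsizeof(stack)
    if size != last_size:
        # The reserved memory changed, so this append triggered a reallocation
        resizes += 1
        last_size = size

print("appends:", len(stack))
print("reallocations observed:", resizes)
```

Most appends reuse memory that was reserved earlier, so only a small fraction of them pay the cost of a reallocation.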

15.4 Using collections.deque


The collections module contains deque, which is useful for creating Python stacks. deque is
pronounced "deck" and stands for "double-ended queue".
You can use the same methods on deque as you saw above for list, .append() and .pop():

>>> from collections import deque


>>> myStack = deque()

>>> myStack.append('a')
>>> myStack.append('b')
>>> myStack.append('c')

>>> myStack
deque(['a', 'b', 'c'])

>>> myStack.pop()
'c'
>>> myStack.pop()
'b'
>>> myStack.pop()
'a'

>>> myStack.pop()
Traceback (most recent call last):
File "<console>", line 1, in <module>
IndexError: pop from an empty deque

This looks almost identical to the list example above. At this point, you might be
wondering why the Python core developers would create two data structures that look the
same.

15.4.1 Why Have deque and list?


As you saw in the discussion about list above, it was built upon blocks of contiguous memory,
meaning that the items in the list are stored right next to each other.
This works great for several operations, like indexing into the list. Getting myList[3] is
fast, as Python knows exactly where to look in memory to find it. This memory layout also
allows slices to work well on lists.
The contiguous memory layout is the reason that list might need to take more time to
.append() some objects than others. If the block of contiguous memory is full, then it will
need to get another block, which can take much longer than a normal .append().
deque, on the other hand, is built upon a doubly linked list. In a linked list structure,
each entry is stored in its own memory block and has a reference to the next entry in the
list.
A doubly linked list is just the same, except that each entry has references to both the
previous and the next entry in the list. This allows you to easily add nodes to either end of
the list.
Adding a new entry into a linked list structure only requires setting the new entry's
reference to point to the current top of the stack and then pointing the top of the stack to
the new entry.
This constant-time addition and removal of entries onto a stack comes with a trade-off,
however. Getting myDeque[3] is slower than it was for a list, because Python needs to walk
through the nodes of the list to reach the element at index 3.
Fortunately, you rarely want to do random indexing or slicing on a stack. Most operations
on a stack are either push or pop.
The constant time .append() and .pop() operations make deque an excellent choice for
implementing a Python stack if your code doesn't use threading.

15.5 Which Implementation to Use?


In general, you should use a deque if you're not using threading. If you are using
threading, then you should use a queue.LifoQueue unless you've measured your performance
and found that a small boost in speed for pushing and popping will make enough difference
to warrant the maintenance risks.
list may be familiar, but it should be avoided because it can potentially have memory
reallocation issues. For stack use, the interfaces for deque and list are identical
(.append() and .pop()), and deque doesn't have these issues, which makes deque the best
choice for your non-threaded Python stack.
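For the threaded case mentioned above, the standard-library class queue.LifoQueue offers the same put() and get() interface as queue.Queue but returns items in LIFO order. The following is a minimal single-threaded sketch, intended only to show the calls involved.

```python
from queue import LifoQueue, Empty

stack = LifoQueue()

# put() plays the role of push
stack.put("a")
stack.put("b")
stack.put("c")

# get() plays the role of pop and returns the newest item first
print(stack.get())  # c
print(stack.get())  # b
print(stack.get())  # a

try:
    # get_nowait() raises Empty instead of blocking on an empty stack
    stack.get_nowait()
except Empty:
    print("stack is empty")
```

Unlike list and deque, a plain get() on an empty LifoQueue waits for another thread to put an item, so get_nowait() or a timeout is the safer choice in single-threaded code.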
Resources

This chapter includes resources that are complementary to the information presented in this
book, and that are useful for further reading. Resources are divided into sections, based on
their categories.

15.6 Book Resources


Now that you have reached the end of the book, I really hope you enjoyed the journey and
found the book useful. Kindly accept my apologies if you have found some difficulties in
interpreting some of the presented figures or code listings. This happens mainly as a result
of decreasing image resolution to fit for printing. This has nothing to do with printing
quality, by the way.

15.7 Errata
Though we have tried hard to make this book free of errors (and by we I mean the review
team that has contributed greatly in debugging this book), I am sure there is no book edited
by humans that is free of errors. In case you have found any error within the book, kindly
email me at:
h.elghareeb@yahoo.com
and use DSA20 Book Errata in the email subject. Though it is not guaranteed, you will most
likely be rewarded with a free edition of this book, or maybe another book from my
library.

15.8 Github Repository


https://www.github.com/helghareeb/DSA20

15.9 Bibliography
This list includes some of the resources we have greatly benefited from during our journey
in learning Data Structures and Algorithms, and in writing this book. The list includes,
but is not limited to:

• Data Structures and Algorithms

• Python
