Version 1.0
December 6, 2019
Haitham El-Ghareeb
Welcome to the Data Structures and Algorithms: Theory and Practice course. To begin,
let's take a short journey to familiarize ourselves with the course specifications, objectives,
and contents, based on our beloved faculty specifications.
Credits
Theory 2
Project 0
Lab 3
TOT 3
Course Description
This course introduces
and developing skills in the design and implementation of complex software systems
Course Syllabus
Secondary Storage Devices
Stacks
Queues
Lists
Sequences
Ranked Sequences
Positional Sequences
General Sequences
Trees
Binary Trees
Priority Queues
Heaps
Dictionaries
AVL Trees
Hash Tables
Sets
Merge Sort
Quick Sort
Radix Sort
Complexity of Sorting
Selection
Graphs
Graph Traversal
Directed Graphs
Strings
Tries
B-Trees
B+-Trees
Course Resources
Telegram Channel: https://t.me/DSA1920
Github: https://www.github.com/helghareeb/DSA20
Google Classroom: https://classroom.google.com
Invite students or give them the class code:
Contact: h.elghareeb@yahoo.com
Book Contents
This book begins with part, the part you are reading right now.
Chapter 1, on page 3, presents an important discussion of the different methods used
to compare programming languages, and of how we should compare them in order to
choose among them. The comparison criteria are many and not always clear, yet
eventually we have to choose.
The book concludes with a resources section, presented in 15.5 at page 15.5. This part
provides links to important resources that broaden the concepts presented in this book.
A very important section (Glossary) is presented at the end of the book, with important
definitions of basic concepts (either from Data Structures and Algorithms or generally
from Computers and Information Sciences) that you must be familiar with, and ready to
answer questions about when asked.
Welcome! - 1 Lecture
Course Mechanics
Python Review
Recursion - 1 Lecture
Tree - 1 Lecture
Graph - 1 Lecture
Github Repository
Besides CIS Faculty Learning Management System (LMS), Course repository is available at
https://www.github.com/helghareeb/DSA20
This shall be your main source of course information. Visiting it regularly, at least
once weekly, you will notice some things:
Content gets updated regularly, with new items and content added weekly
There, you will find folders arranged by Weeks, Lectures, and mapped linearly
Demo: code samples illustrated in the lecture. Some of them already appear in the
book; we present them again here as solutions / projects so you can compile or
interpret them and run them immediately
Lab: lab activities, resources, and (depending on the week) solutions for the
lecture week contents
Lecture: lecture slides, which are useful guidelines for what content to study and
where to find it (in the book and in other useful external resources)
Finally, we are doing our best to make this course as interesting and useful as we can.
We will surely face challenges, but there is always hope. We hope you will enjoy this course
as much as we enjoyed preparing it, and that it will become a checkpoint in your career.
Regards, Dr.Haitham December 6, 2019
1 Note that Github does not display empty folders, however they are there for all lectures
Contents
Preface iii
I Introduction 1
1 Programming Languages are Not the Same 3
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1.1 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1.2 This is Not... . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1.3 Prerequisites . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1.4 Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2 Let's Agree on.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2.1 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2.2 Thoughts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3 Programming Languages Comparison . . . . . . . . . . . . . . . . . . . . . . 6
1.3.1 Why We Need To Compare ? . . . . . . . . . . . . . . . . . . . . . . . 6
1.3.2 How Do We Compare ? . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.4 Academic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.5 Programming Paradigms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.5.1 Imperative Programming . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.5.2 Structured Programming . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.5.3 Procedural Programming . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.5.4 Functional Programming . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.5.5 Event-Driven Programming . . . . . . . . . . . . . . . . . . . . . . . . 10
1.5.6 Object Oriented Programming . . . . . . . . . . . . . . . . . . . . . . 10
1.5.7 Declarative Programming . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.5.8 Reactive Programming . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.5.9 Others . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.6 General Characteristics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.6.1 Compiled vs. Interpreted . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.6.2 Standardized Programming Languages . . . . . . . . . . . . . . . . . . 12
1.6.3 Garbage Collection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.6.4 Type System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.6.5 Object Oriented Programming Features Support . . . . . . . . . . . . 14
1.6.6 Functional Programming Features Support . . . . . . . . . . . . . . . 15
1.6.7 Multithreading / Concurrency . . . . . . . . . . . . . . . . . . . . . . 15
1.6.8 Pointer Arithmetic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.6.9 Design by Contract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.6.10 Regular Expression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3 Lists 25
3.1 A list is a sequence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.2 Lists are mutable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.3 Traversing a list . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.4 List operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.5 List slices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.6 List methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.7 Map, filter and reduce . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.8 Deleting elements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.9 Lists and strings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.10 Objects and values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.11 Aliasing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.12 List arguments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.13 Debugging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.14 Glossary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
4 Tuples 37
4.1 Tuples are immutable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
4.2 Tuple assignment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.3 Tuples as return values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.4 Variable-length argument tuples . . . . . . . . . . . . . . . . . . . . . . . . . 40
4.5 Lists and tuples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
4.6 Dictionaries and tuples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
4.7 Sequences of sequences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
4.8 Debugging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
4.9 Glossary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
5 Functions 47
5.1 Function calls . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
5.2 Math functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
5.3 Composition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
5.4 Adding new functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
5.5 Definitions and uses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
7 Magic Methods 65
7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
7.2 __new__() method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
7.3 __str__() method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
7.4 __add__() method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
7.5 __ge__() method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
7.6 Important Magic Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
8 Python Testing 71
8.1 Testing Your Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
8.2 Automated vs. Manual Testing . . . . . . . . . . . . . . . . . . . . . . . . . . 71
8.3 Unit Tests vs. Integration Tests . . . . . . . . . . . . . . . . . . . . . . . . . . 71
8.4 Choosing a Test Runner . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
8.4.1 unittest . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
9 Numpy 73
9.1 Arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
9.2 Array Indexing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
9.2.1 Slicing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
9.2.2 Integer Array Indexing . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
9.2.3 Boolean Array Indexing . . . . . . . . . . . . . . . . . . . . . . . . . . 76
9.3 Data Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
9.4 Array Math . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
10.5.1 Collection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
10.5.2 Container . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
10.5.3 Sequence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
10.5.4 Sorted Sequence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
10.5.5 List vs. Python list . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
10.6 Python and ADT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
10.6.1 Step 01: Specify ADT . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
10.6.2 02: Using the ADT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
10.6.3 Preconditions and Postconditions . . . . . . . . . . . . . . . . . . . . . 85
10.7 Bags . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
10.7.1 Bag Abstract Data Type . . . . . . . . . . . . . . . . . . . . . . . . . 85
10.7.2 Bag Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
10.7.3 Bag Usage Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
10.7.4 Why a Bag ADT? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
10.7.5 Selecting a Data Structure . . . . . . . . . . . . . . . . . . . . . . . . . 87
10.8 Choose the Data Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
10.9 List-Based Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
10.9.1 Some Implementation Details . . . . . . . . . . . . . . . . . . . . . . . 89
11 Arrays 91
11.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
11.2 The Array Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
11.2.1 Arrays vs. Python lists . . . . . . . . . . . . . . . . . . . . . . . . . . 91
11.2.2 When to use Arrays? . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
11.3 Array Abstract Data Type . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
11.3.1 Array ADT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
11.3.2 Creation and Usage of Array ADT . . . . . . . . . . . . . . . . . . . . 93
11.3.3 Implementing the Array . . . . . . . . . . . . . . . . . . . . . . . . . . 94
11.4 Array 2D . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
11.4.1 Implementing Array 2D . . . . . . . . . . . . . . . . . . . . . . . . . . 94
11.5 Game of Life . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
11.5.1 Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
11.5.2 Game of Life - Core - Full Code . . . . . . . . . . . . . . . . . . . . . 96
11.5.3 Game of Life - GUI - Full Code . . . . . . . . . . . . . . . . . . . . . . 99
14 Queue 119
14.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
14.2 Queue Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
14.3 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
14.4 Implementation using list . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
14.5 Implementation using collections.deque . . . . . . . . . . . . . . . . . . . . . . 121
14.6 Implementation using queue.Queue . . . . . . . . . . . . . . . . . . . . . . . . 122
15 Stack 125
15.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
15.2 Implementing a Python Stack . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
15.3 Using list to Create a Python Stack . . . . . . . . . . . . . . . . . . . . . . . 127
15.4 Using collections.deque . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
15.4.1 Why Have deque and list? . . . . . . . . . . . . . . . . . . . . . . . . . 129
15.5 Which Implementation to Use? . . . . . . . . . . . . . . . . . . . . . . . . . . 130
Resources 131
15.6 Book Resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
15.7 Errata . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
15.8 Github Repository . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
15.9 Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
Part I
Introduction
Chapter 1
Programming Languages are Not the Same
1.1 Introduction
1.1.1 Objectives
Which Programming Language !
Even a Course !
1.1.3 Prerequisites
Familiarity with Programming Concepts (Preferred)
1.1.4 Contents
1. Let's Agree on
3. Programming Paradigms
4. General Characteristics
6. Final Thoughts
Open Source
From Wikipedia
2
Computer software with its source code made available with a license in which
the copyright holder provides the rights to study, change, and distribute the
software to anyone and for any purpose.
1 https://en.wikipedia.org/wiki/ProgrammingLanguage
2 https://en.wikipedia.org/wiki/OpenSourceSoftware
3 https://en.wikipedia.org/wiki/Technical_standard
Standards can also be developed by groups such as trade unions and trade associations.
Standards organizations often have more diverse input and usually develop voluntary
standards.
The standardization process may be by edict or may involve the formal consensus
of technical experts.
1.2.2 Thoughts
Theory vs. Product
Who leads: Academia vs. Standards vs. Industry ?
Who wins ?
https://www.reddit.com/r/explainlikeimfive/comments/1jk4jo/eli5_why_are_there_so_many_programming_languages/
4 https://en.wikipedia.org/wiki/Technical_standard
No Standards !
Even in Academia !!
2. Programming Paradigms
3. General Characteristics
1.4 Academic
How Do Researchers Compare ?
https://scholar.google.com/
2. Structured
3. Procedural
4. Functional
5. Event-Driven
6. Object-Oriented
7. Declarative
8. Reactive
9. Others
The set of states a system can occupy is known as its state space.
Statements can only be combined vertically by writing one after another, or with
block constructs.
8 https://en.wikipedia.org/wiki/Expression
9 https://www.quora.com/Whats-the-difference-between-a-statement-and-an-expression-in-Python
10 https://stackoverflow.com/questions/17826380/what-is-difference-between-functional-and-imperative-programming-languages
A programming paradigm aimed at improving the clarity, quality, and development time
of a computer program
Makes extensive use of subroutines, block structures, and for and while loops, in contrast
to using simple tests and jumps such as the goto statement
It emerged in the late 1950s with the appearance of the ALGOL 58 and ALGOL 60
programming languages
In functional code, the output value of a function depends only on the arguments that
are passed to the function, so calling a function f twice with the same value for an
argument x will produce the same result f(x) each time.
Functional Programming - II
This is in contrast to procedures depending on a local or global state, which may
produce different results at different times when called with the same arguments
but a different program state.
Recursion, often the only way to iterate. Implementations will often include tail call
optimization.
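The contrast between pure and state-dependent functions can be sketched in Python as follows. This is a minimal illustration, not taken from the original text; the names pure_add, impure_add, offset, and factorial are hypothetical.

```python
# Pure function: the result depends only on the arguments.
def pure_add(x, y):
    return x + y

# Impure function: the result also depends on global state.
offset = 0

def impure_add(x, y):
    return x + y + offset

# Recursion used in place of a loop.
def factorial(n):
    return 1 if n <= 1 else n * factorial(n - 1)

first = impure_add(1, 2)    # offset is 0 at this point
offset = 10
second = impure_add(1, 2)   # same arguments, different program state
```

Note that CPython does not perform tail call optimization, so very deep recursion can hit the interpreter's recursion limit.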
Flow of the program is determined by events such as user actions (mouse clicks, key
presses), sensor outputs, or messages from other programs / threads.
Dominant paradigm used in graphical user interfaces (GUI) and other applications
(e.g. JavaScript web applications) that are centered on performing certain actions in
response to user input.
There is generally a main loop that listens for events, and then triggers a callback
function when one of those events is detected.
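A main loop that dispatches events to callbacks might look like the following sketch. It uses a plain queue of (name, payload) pairs in place of a real GUI toolkit; the names handlers, on, and run are hypothetical and chosen only for illustration.

```python
from collections import deque

handlers = {}  # maps an event name to a list of callback functions

def on(event_name, callback):
    """Register a callback for the given event name."""
    handlers.setdefault(event_name, []).append(callback)

def run(event_queue):
    """Main loop: take events one at a time and trigger their callbacks."""
    while event_queue:
        name, payload = event_queue.popleft()
        for callback in handlers.get(name, []):
            callback(payload)

log = []
on("click", lambda pos: log.append(f"clicked at {pos}"))
on("key", lambda key: log.append(f"pressed {key}"))

# The "scroll" event has no registered callback, so it is ignored.
run(deque([("click", (10, 20)), ("key", "a"), ("scroll", 3)]))
```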
Based on the concept of objects, which may contain data, in the form of fields, often
known as attributes; and code, in the form of procedures, often known as methods.
Programs are designed by making them out of objects that interact with one another.
15 https://wiki.haskell.org/Functional_programming
16 https://en.wikipedia.org/wiki/EventDrivenProgramming
17 https://en.wikipedia.org/wiki/ObjectOrientedProgramming
Java, C++, C#, Python, PHP, Ruby, Perl, Object Pascal, Objective-C, Dart, Swift,
Scala, Common Lisp, and Smalltalk.
Encapsulation, concept that binds together the data and functions that manipulate
the data, and that keeps both safe from outside interference and misuse.
All the data and methods available to the parent class also appear in the child
class with the same names.
Allows easy re-use of the same procedures and data definitions, in addition to
potentially mirroring real-world relationships in an intuitive way.
A polymorphic type is one whose operations can also be applied to values of some
other type, or types.
It means that the code to be executed for a specific procedure call is not known
until run-time.
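The ideas above (data and methods bundled in a class, inheritance, polymorphism, and run-time binding of method calls) can be sketched with a small hypothetical class hierarchy. Shape, Square, and Rectangle are illustrative names, not part of the original text.

```python
class Shape:
    def __init__(self, name):
        self.name = name                  # data (attribute)

    def area(self):                       # code (method), overridden by children
        raise NotImplementedError

    def describe(self):
        # Which area() runs here is only decided at run-time.
        return f"{self.name} with area {self.area()}"

class Square(Shape):                      # inherits name and describe()
    def __init__(self, side):
        super().__init__("square")
        self.side = side

    def area(self):
        return self.side ** 2

class Rectangle(Shape):
    def __init__(self, width, height):
        super().__init__("rectangle")
        self.width, self.height = width, height

    def area(self):
        return self.width * self.height

descriptions = [s.describe() for s in (Square(2), Rectangle(2, 3))]
```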
Style of building the structure and elements of computer programs that expresses
the logic of a computation without describing its control flow.
1.5.9 Others
The list continues, including (but not limited to)
Automata based programming
Logic
Symbolic
Selected Features
1.6.1 Compiled vs. Interpreted
Compiled: implementations are typically compilers (translators that generate machine
code from source code), and not interpreters
Important ?
Strategies include
Reference Counting, each object has a count of the number of references to it.
Garbage is identified by having a reference count of zero.
Escape Analysis, used to convert heap allocations to stack allocations, thus reducing
the amount of work needed to be done by the garbage collector. This is done using a
compile-time analysis.
Generational, http://wiki.c2.com/?GenerationalGarbageCollection
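Reference counting can be observed directly in CPython, which uses it as its primary memory-management strategy. The sketch below assumes the standard CPython interpreter; sys.getrefcount reports a count that includes a temporary reference created by the call itself, so only the differences between counts are meaningful.

```python
import sys

data = [1, 2, 3]
before = sys.getrefcount(data)    # baseline count for the list object

alias = data                      # a second name bound to the same object
after = sys.getrefcount(data)     # one reference more than the baseline

del alias                         # removing the name drops that reference
restored = sys.getrefcount(data)
```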
Dynamic: a variable's type can change, and variables need no declared type. Type
checking happens at run-time.
Strong: the programming language raises errors when data types are not compatible.
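Python is a convenient example of a language that is both dynamically and strongly typed. In this short sketch the variable names are illustrative only.

```python
value = 17             # value refers to an int
value = "seventeen"    # the same name may later refer to a str (dynamic typing)

try:
    result = "1" + 1   # str + int: incompatible types (strong typing)
except TypeError as exc:
    result = None
    error_name = type(exc).__name__
```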
Generic Classes
aka Parametric Type
Allows statically typed languages to retain their compile-time type safety yet remain
nearly as flexible as dynamically typed languages.
Inheritance
Multiple Inheritance
http://javascript.crockford.com/prototypal.html
Feature Renaming
Attribute / Method
Provide a feature with a more natural name for its new context
Resolve naming ambiguities when a name is inherited from multiple inheritance paths
Uniform Access
All services offered by a module should be available through a uniform notation
Does not betray whether they are implemented through storage or through computation
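In Python, uniform access is commonly achieved with the built-in property decorator. In the sketch below, the hypothetical Circle class exposes a stored value (radius) and a computed value (area) through the same attribute notation.

```python
import math

class Circle:
    def __init__(self, radius):
        self.radius = radius              # implemented through storage

    @property
    def area(self):                       # implemented through computation
        return math.pi * self.radius ** 2

c = Circle(2)
stored = c.radius      # plain attribute access
computed = c.area      # same notation, no call parentheses
```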
This means that, for however many instances of a class exist at any given point in
time, only one copy of each class variable/method exists and is shared by every
instance of the class
Reflection
Ability for a program to determine and manipulate various pieces of information about
an object at run-time.
Object type
Object methods, including number and types of parameters, and return types
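Python supports this kind of introspection through built-ins such as type and getattr and through the standard inspect module. The Greeter class below is a hypothetical example used only to have something to inspect.

```python
import inspect

class Greeter:
    def greet(self, name, punctuation="!"):
        return f"Hello, {name}{punctuation}"

obj = Greeter()

type_name = type(obj).__name__                                  # object type
method_names = [n for n, _ in inspect.getmembers(obj, inspect.ismethod)]
params = list(inspect.signature(obj.greet).parameters)          # parameter names
reply = getattr(obj, "greet")("world")                          # call a method by name
```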
Lexical Closures
Bundling up the lexical (static) scope surrounding the function with the function itself
Function carries its surrounding environment around with it wherever it may be used
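A lexical closure can be sketched in Python as a function that returns an inner function. The names make_counter and increment are hypothetical; each returned function keeps its own copy of the surrounding variable count.

```python
def make_counter(start=0):
    count = start                 # lives in the enclosing (lexical) scope

    def increment():
        nonlocal count            # refer to the variable captured from make_counter
        count += 1
        return count

    return increment              # the function carries count along with it

counter_a = make_counter()
counter_b = make_counter(10)
values = [counter_a(), counter_a(), counter_b()]
```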
Invariant, conditions guaranteed to be true at any stable point during the lifetime
of an object
TIOBE Index
https://www.tiobe.com/tiobe-index/
25 https://en.wikipedia.com/wiki/RegularLanguages
Concurrency
Readability
Other
https://www.quora.com/What-programming-languages-should-a-modern-day-programmer-have-in
Chapter 2
Variables, expressions and statements
One of the most powerful features of a programming language is the ability to manipulate
variables. A variable is a name that refers to a value.
This example makes three assignments. The first assigns a string to a new variable named
message; the second gives the integer 17 to n; the third assigns the (approximate) value of
π to pi.
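Three assignments of the kind described could be written as follows. The variable names and the value 17 come from the text; the string content and the number of digits of π are assumptions made for illustration.

```python
message = 'A greeting stored as a string'   # hypothetical text
n = 17
pi = 3.141592653589793                      # approximate value of π
```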
76trombones is illegal because it begins with a number. more@ is illegal because it contains
an illegal character, @. But what's wrong with class?
It turns out that class is one of Python's keywords. The interpreter uses keywords to
recognize the structure of the program, and they cannot be used as variable names.
Python 3 has these keywords:
You don't have to memorize this list. In most development environments, keywords are
displayed in a different color; if you try to use one as a variable name, you'll know.
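The keywords can also be checked programmatically with the standard keyword module. The sketch below confirms that class is reserved and that using it as a variable name is rejected when the code is compiled; the "<example>" filename is an arbitrary label.

```python
import keyword

is_reserved = keyword.iskeyword("class")      # class is a keyword
is_free = not keyword.iskeyword("message")    # message is an ordinary name

try:
    compile("class = 1", "<example>", "exec") # keyword used as a variable name
except SyntaxError:
    rejected = True
else:
    rejected = False
```

The full list for the running interpreter is available as keyword.kwlist.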
>>> 42
42
>>> n
17
>>> n + 25
42
When you type an expression at the prompt, the interpreter evaluates it, which means
that it finds the value of the expression. In this example, n has the value 17 and n + 25
has the value 42.
A statement is a unit of code that has an effect, like creating a variable or displaying
a value.
>>> n = 17
>>> print(n)
The first line is an assignment statement that gives a value to n. The second line is a print
statement that displays the value of n.
When you type a statement, the interpreter executes it, which means that it does
whatever the statement says. In general, statements don't have values.
Because Python provides both modes, you can test bits of code in interactive mode
before you put them in a script. But there are differences between interactive mode and
script mode that can be confusing.
For example, if you are using Python as a calculator, you might type
miles = 26.2
print(miles * 1.61)
This behavior can be confusing at first. To check your understanding, type the following
statements in the Python interpreter and see what they do:
5
x = 5
x + 1
Now put the same statements in a script and run it. What is the output? Modify the
script by transforming each expression into a print statement and then run it again.
Exponentiation has the next highest precedence, so 1 + 2**3 is 9, not 27, and 2 *
3**2 is 18, not 36.
Multiplication and Division have higher precedence than Addition and Subtraction.
So 2*3-1 is 5, not 4, and 6+4/2 is 8, not 5.
Operators with the same precedence are evaluated from left to right (except
exponentiation). So in the expression degrees / 2 * pi, the division happens first and the
result is multiplied by pi. To divide by 2π, you can use parentheses or write degrees
/ 2 / pi.
I don't work very hard to remember the precedence of operators. If I can't tell by looking
at the expression, I use parentheses to make it obvious.
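The precedence rules above can be checked directly. This short sketch evaluates the expressions from the text; the value assigned to degrees is an arbitrary assumption.

```python
import math

degrees = 90                                 # arbitrary example value

exponent_first = (1 + 2**3, 2 * 3**2)        # exponentiation before + and *
multiply_first = (2*3 - 1, 6 + 4/2)          # * and / before + and -

left_to_right = degrees / 2 * math.pi        # (degrees / 2) * pi
divide_by_2pi = degrees / 2 / math.pi        # same as degrees / (2 * pi)

right_to_left = 2 ** 3 ** 2                  # exponentiation groups as 2 ** (3 ** 2)
```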
2.7 Comments
As programs get bigger and more complicated, they get more difficult to read. Formal
languages are dense, and it is often difficult to look at a piece of code and figure out what
it is doing, or why.
For this reason, it is a good idea to add notes to your programs to explain in natural
language what the program is doing. These notes are called comments, and they start
with the # symbol:
v = 5 # assign 5 to v
This comment contains useful information that is not in the code:
v = 5 # velocity in meters/second.
Good variable names can reduce the need for comments, but long names can make complex
expressions hard to read, so there is a tradeoff.
2.8 Debugging
Three kinds of errors can occur in a program: syntax errors, runtime errors, and semantic
errors. It is useful to distinguish between them in order to track them down more quickly.
Syntax error: Syntax refers to the structure of a program and the rules about that
structure. For example, parentheses have to come in matching pairs, so (1 + 2) is
legal, but 8) is a syntax error.
If there is a syntax error anywhere in your program, Python displays an error message
and quits, and you will not be able to run the program. During the first few weeks of
your programming career, you might spend a lot of time tracking down syntax errors.
As you gain experience, you will make fewer errors and find them faster.
Runtime error: The second type of error is a runtime error, so called because the error
does not appear until after the program has started running. These errors are also
called exceptions because they usually indicate that something exceptional (and bad)
has happened.
Runtime errors are rare in the simple programs you will see in the first few chapters,
so it might be a while before you encounter one.
Semantic error: The third type of error is semantic, which means related to meaning.
If there is a semantic error in your program, it will run without generating error
messages, but it will not do the right thing. It will do something else. Specifically, it
will do what you told it to do.
Identifying semantic errors can be tricky because it requires you to work backward by
looking at the output of the program and trying to figure out what it is doing.
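The three kinds of errors can be seen side by side in a small sketch. The average function and its bug are hypothetical, and compile is used only so that the syntax error can be caught and inspected instead of stopping the script.

```python
# Syntax error: detected when the code is parsed, before anything runs.
try:
    compile("8)", "<example>", "eval")
except SyntaxError:
    syntax_error_found = True
else:
    syntax_error_found = False

# Runtime error (exception): appears only while the program is running.
try:
    1 / 0
except ZeroDivisionError:
    runtime_error_found = True
else:
    runtime_error_found = False

# Semantic error: runs without any message but gives the wrong answer.
def average(a, b):
    return a + b / 2          # bug: the intended formula was (a + b) / 2

wrong = average(2, 4)         # the intended result was 3.0
```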
2.9 Glossary
variable: A name that refers to a value.
state diagram: A graphical representation of a set of variables and the values they refer
to.
keyword: A reserved word that is used to parse a program; you cannot use keywords like
if, def, and while as variable names.
statement: A section of code that represents a command or action. So far, the statements
we have seen are assignments and print statements.
interactive mode: A way of using the Python interpreter by typing code at the prompt.
script mode: A way of using the Python interpreter to read code from a script and run
it.
order of operations: Rules governing the order in which expressions involving multiple
operators and operands are evaluated.
comment: Information in a program that is meant for other programmers (or anyone
reading the source code) and has no effect on the execution of the program.
syntax error: An error in a program that makes it impossible to parse (and therefore
impossible to interpret).
semantic error: An error in a program that makes it do something other than what the
programmer intended.
Chapter 3
Lists
This chapter presents one of Python's most useful built-in types, lists. You will also learn
more about objects and what can happen when you have more than one name for the same
object.
[Figure: state diagram of three lists. cheeses holds 'Cheddar', 'Edam', 'Gouda' at indices 0 to 2; numbers holds 42 at index 0 and, at index 1, 123 and 5; empty has no elements.]
>>> cheeses[0]
'Cheddar'
Unlike strings, lists are mutable. When the bracket operator appears on the left side of an
assignment, it identifies the element of the list that will be assigned.
If you try to read or write an element that does not exist, you get an IndexError.
If an index has a negative value, it counts backward from the end of the list.
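These rules can be tried out in a short sketch. The list values follow the state diagram for numbers; the flag name out_of_range is hypothetical.

```python
numbers = [42, 123]
numbers[1] = 5            # bracket operator on the left side: update in place

last = numbers[-1]        # negative index counts backward from the end

try:
    numbers[2]            # there is no element at index 2
except IndexError:
    out_of_range = True
else:
    out_of_range = False
```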
This works well if you only need to read the elements of the list. But if you want to write
or update the elements, you need the indices. A common way to do that is to combine the
built-in functions range and len:
for i in range(len(numbers)):
    numbers[i] = numbers[i] * 2
This loop traverses the list and updates each element. len returns the number of elements
in the list. range returns a sequence of indices from 0 to n − 1, where n is the length of
the list. Each time through the loop i gets the index of the next element. The assignment
statement in the body uses i to read the old value of the element and to assign the new value.
A for loop over an empty list never runs the body:
for x in []:
    print('This never happens.')
Although a list can contain another list, the nested list still counts as a single element. The
length of this list is four:
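The example itself is not reproduced here; a hypothetical list of the same shape, with two nested lists among its four elements (all values are assumptions), would be:

```python
nested = ['spam', 1, ['Brie', 'Roquefort', 'Le Jack'], [1, 2, 3]]

length = len(nested)          # each nested list counts as one element
inner_length = len(nested[2]) # the nested list has its own length
```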
>>> a = [1, 2, 3]
>>> b = [4, 5, 6]
>>> c = a + b
>>> c
[1, 2, 3, 4, 5, 6]
>>> [0] * 4
[0, 0, 0, 0]
>>> [1, 2, 3] * 3
[1, 2, 3, 1, 2, 3, 1, 2, 3]
The first example repeats [0] four times. The second example repeats the list [1, 2, 3]
three times.
>>> t[:]
['a', 'b', 'c', 'd', 'e', 'f']
Since lists are mutable, it is often useful to make a copy before performing operations that
modify lists.
A slice operator on the left side of an assignment can update multiple elements:
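A brief sketch of both points, copying with a full slice and updating several elements through slice assignment. The replacement values 'x' and 'y' are arbitrary.

```python
t = ['a', 'b', 'c', 'd', 'e', 'f']

backup = t[:]             # a full slice makes a copy of the list
t[1:3] = ['x', 'y']       # replaces the elements at indices 1 and 2
```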
def add_all(t):
    total = 0
    for x in t:
        total += x
    return total
total is initialized to 0. Each time through the loop, x gets one element from the list.
The += operator provides a short way to update a variable. This augmented assignment
statement,
total += x
is equivalent to
total = total + x
As the loop runs, total accumulates the sum of the elements; a variable used this way is
sometimes called an accumulator.
Adding up the elements of a list is such a common operation that Python provides it as
a built-in function, sum:
>>> t = [1, 2, 3]
>>> sum(t)
6
An operation like this that combines a sequence of elements into a single value is sometimes
called reduce.
Sometimes you want to traverse one list while building another. For example, the following
function takes a list of strings and returns a new list that contains capitalized strings:
def capitalize_all(t):
    res = []
    for s in t:
        res.append(s.capitalize())
    return res
res is initialized with an empty list; each time through the loop, we append the next element.
So res is another kind of accumulator.
An operation like capitalize_all is sometimes called a map because it maps a function
(in this case the method capitalize) onto each of the elements in a sequence.
Another common operation is to select some of the elements from a list and return a
sublist. For example, the following function takes a list of strings and returns a list that
contains only the uppercase strings:
def only_upper(t):
res = []
for s in t:
if s.isupper():
res.append(s)
return res
30 CHAPTER 3. LISTS
isupper is a string method that returns True if the string contains only upper case letters.
An operation like only_upper is called a filter because it selects some of the elements
and filters out the others.
Most common list operations can be expressed as a combination of map, filter and reduce.
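The three patterns also exist as ready-made tools in Python. The following sketch expresses each one with the built-in map and filter and with functools.reduce; the word list is an illustrative assumption, and this is one possible way to write the patterns, not the only one.

```python
from functools import reduce

# Illustrative data, assumed for this sketch.
words = ['spam', 'EGGS', 'Ham']

# map: apply str.capitalize to every element.
capitalized = list(map(str.capitalize, words))

# filter: keep only the elements for which str.isupper is True.
uppercase_only = list(filter(str.isupper, words))

# reduce: accumulate the total number of characters.
total_length = reduce(lambda acc, w: acc + len(w), words, 0)

print(capitalized)     # ['Spam', 'Eggs', 'Ham']
print(uppercase_only)  # ['EGGS']
print(total_length)    # 11
```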
[Figure 3.2: State diagram. Either a and b refer to two separate 'banana' objects, or both refer to the same object.]
Because list is the name of a built-in function, you should avoid using it as a variable
name. I also avoid l because it looks too much like 1. So that's why I use t.
The list function breaks a string into individual letters. If you want to break a string
into words, you can use the split method:
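A minimal sketch of split with no arguments, using an illustrative sentence (the string is an assumption):

```python
# Illustrative string, assumed for this sketch.
s = 'pining for the fjords'

# With no argument, split breaks the string at runs of whitespace.
t = s.split()
print(t)  # ['pining', 'for', 'the', 'fjords']
```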
An optional argument called a delimiter specifies which characters to use as word
boundaries. The following example uses a hyphen as a delimiter:
>>> s = 'spam-spam-spam'
>>> delimiter = '-'
>>> t = s.split(delimiter)
>>> t
['spam', 'spam', 'spam']
join is the inverse of split. It takes a list of strings and concatenates the elements. join is
a string method, so you have to invoke it on the delimiter and pass the list as a parameter:
In this case the delimiter is a space character, so join puts a space between words. To
concatenate strings without spaces, you can use the empty string, '', as a delimiter.
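A minimal sketch of join with both a space and the empty string as delimiter (the word list is an illustrative assumption):

```python
# Illustrative list of words, assumed for this sketch.
t = ['pining', 'for', 'the', 'fjords']

# join is invoked on the delimiter, with the list as its argument.
delimiter = ' '
s = delimiter.join(t)
print(s)  # pining for the fjords

# The empty string concatenates the words without spaces.
no_spaces = ''.join(t)
print(no_spaces)  # piningforthefjords
```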
a = 'banana'
b = 'banana'
We know that a and b both refer to a string, but we don't know whether they refer to the
same string. There are two possible states, shown in Figure 3.2.
In one case, a and b refer to two different objects that have the same value. In the
second case, they refer to the same object.
To check whether two variables refer to the same object, you can use the is operator.
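[Figures: state diagrams. In the first, a and b refer to two separate lists, each with the value [1, 2, 3]. In the second, a and b refer to the same list.]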
>>> a = 'banana'
>>> b = 'banana'
>>> a is b
True
In this example, Python only created one string object, and both a and b refer to it. But
when you create two lists, you get two objects:
>>> a = [1, 2, 3]
>>> b = [1, 2, 3]
>>> a is b
False
3.11 Aliasing
If a refers to an object and you assign b = a, then both variables refer to the same object:
>>> a = [1, 2, 3]
>>> b = a
>>> b is a
True
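[Figure 3.5: Stack diagram. The variable letters in __main__ and the parameter t in delete_head refer to the same list, ['a', 'b', 'c'].]
If the aliased object is mutable, changes made with one alias affect the other: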
>>> b[0] = 42
>>> a
[42, 2, 3]
Although this behavior can be useful, it is error-prone. In general, it is safer to avoid aliasing
when you are working with mutable objects.
For immutable objects like strings, aliasing is not as much of a problem. In this example:
a = 'banana'
b = 'banana'
It almost never makes a difference whether a and b refer to the same string or not.
def delete_head(t):
    del t[0]
The parameter t and the variable letters are aliases for the same object. The stack
diagram looks like Figure 3.5.
Since the list is shared by two frames, I drew it between them.
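A short sketch of how delete_head might be called. The name letters comes from the text; its contents are taken from the stack diagram, and the call itself is an assumption about how the function is meant to be used.

```python
def delete_head(t):
    # del modifies the list object that the caller passed in.
    del t[0]

letters = ['a', 'b', 'c']
delete_head(letters)
print(letters)  # ['b', 'c']
```

Because t and letters are aliases, the change made inside the function is visible to the caller.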
It is important to distinguish between operations that modify lists and operations that
create new lists. For example, the append method modifies a list, but the + operator creates
a new list.
Here's an example using append:
>>> t1 = [1, 2]
>>> t2 = t1.append(3)
>>> t1
[1, 2, 3]
>>> t2
None
>>> t4 = [1, 2, 3]
>>> bad_delete_head(t4)
>>> t4
[1, 2, 3]
At the beginning of bad_delete_head, t and t4 refer to the same list. At the end, t refers
to a new list, but t4 still refers to the original, unmodified list.
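The definition of bad_delete_head does not appear above. A minimal sketch consistent with the description (t starts as an alias of t4 and ends up referring to a new list) could look like this; treat it as a reconstruction, not necessarily the original code. The last lines also show the + operator creating a new list; the names t1 and t3 are illustrative.

```python
def bad_delete_head(t):
    # Slicing builds a new list and rebinds the local name t;
    # the caller's list is never touched.
    t = t[1:]

t4 = [1, 2, 3]
bad_delete_head(t4)
print(t4)  # [1, 2, 3]

# In contrast to append, the + operator returns a new list.
t1 = [1, 2]
t3 = t1 + [3]
print(t1)  # [1, 2]
print(t3)  # [1, 2, 3]
```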
An alternative is to write a function that creates and returns a new list. For example,
tail returns all but the first element of a list:
def tail(t):
    return t[1:]
This function leaves the original list unmodified. Here's how it is used:
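A minimal usage sketch, assuming an illustrative list named letters and a result variable named rest:

```python
def tail(t):
    # The slice creates a new list; t itself is not modified.
    return t[1:]

letters = ['a', 'b', 'c']
rest = tail(letters)
print(rest)     # ['b', 'c']
print(letters)  # ['a', 'b', 'c']
```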
3.13 Debugging
Careless use of lists (and other mutable objects) can lead to long hours of debugging. Here
are some common pitfalls and ways to avoid them:
1. Most list methods modify the argument and return None. This is the opposite of the
string methods, which return a new string and leave the original alone.
If you are used to writing string code like this:
word = word.strip()
it is tempting to write list code like this:
t = t.sort() # WRONG!
Because sort returns None, the next operation you perform with t is likely to fail.
Before using list methods and operators, you should read the documentation carefully
and then test them in interactive mode.
Part of the problem with lists is that there are too many ways to do things. For
example, to remove an element from a list, you can use pop, remove, del, or even a
slice assignment.
To add an element, you can use the append method or the + operator. Assuming that
t is a list and x is a list element, these are correct:
t.append(x)
t = t + [x]
t += [x]
And these are wrong:
t.append([x]) # WRONG!
t = t.append(x) # WRONG!
t + [x] # WRONG!
t = t + x # WRONG!
Try out each of these examples in interactive mode to make sure you understand what
they do. Notice that only the last one causes a runtime error; the other three are legal,
but they do the wrong thing.
If you want to use a method like sort that modifies the argument, but you need to
keep the original list as well, you can make a copy.
>>> t = [3, 1, 2]
>>> t2 = t[:]
>>> t2.sort()
>>> t
[3, 1, 2]
>>> t2
[1, 2, 3]
In this example you could also use the built-in function sorted, which returns a new,
sorted list and leaves the original alone.
>>> t2 = sorted(t)
>>> t
[3, 1, 2]
>>> t2
[1, 2, 3]
3.14 Glossary
list: A sequence of values.
element: One of the values in a list (or other sequence), also called items.
augmented assignment: A statement that updates the value of a variable using an op-
erator like +=.
reduce: A processing pattern that traverses a sequence and accumulates the elements into
a single result.
map: A processing pattern that traverses a sequence and performs an operation on each
element.
filter: A processing pattern that traverses a list and selects the elements that satisfy some
criterion.
object: Something a variable can refer to. An object has a type and a value.
aliasing: A circumstance where two or more variables refer to the same object.
Chapter 4
Tuples
This chapter presents one more built-in type, the tuple, and then shows how lists, dictionar-
ies, and tuples work together. I also present a useful feature for variable-length argument
lists, the gather and scatter operators.
One note: there is no consensus on how to pronounce tuple. Some people say tuh-ple,
which rhymes with supple. But in the context of programming, most people say too-ple,
which rhymes with quadruple.
To create a tuple with a single element, you have to include a final comma:
>>> t1 = 'a',
>>> type(t1)
<class 'tuple'>
>>> t2 = ('a')
>>> type(t2)
<class 'str'>
Another way to create a tuple is the built-in function tuple. With no argument, it creates
an empty tuple:
>>> t = tuple()
>>> t
()
If the argument is a sequence (string, list or tuple), the result is a tuple with the elements
of the sequence:
>>> t = tuple('lupins')
>>> t
('l', 'u', 'p', 'i', 'n', 's')
Because tuple is the name of a built-in function, you should avoid using it as a variable
name.
Most list operators also work on tuples. The bracket operator indexes an element:
>>> t[1:3]
('b', 'c')
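The slice above relies on a tuple whose definition is not shown. A minimal sketch, assuming t holds the letters 'a' through 'e':

```python
# Assumed definition; only the slice result appears in the text.
t = ('a', 'b', 'c', 'd', 'e')

# The bracket operator indexes a single element.
first = t[0]
print(first)   # a

# The slice operator selects a range of elements and returns a tuple.
middle = t[1:3]
print(middle)  # ('b', 'c')
```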
But if you try to modify one of the elements of the tuple, you get an error:
Because tuples are immutable, you can't modify the elements. But you can replace one
tuple with another:
This statement makes a new tuple and then makes t refer to it.
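Both points can be sketched together. The tuple contents are an illustrative assumption, and the try/except is only there so the example runs to completion.

```python
t = ('a', 'b', 'c', 'd', 'e')
original = t

# Item assignment on a tuple raises TypeError.
try:
    t[0] = 'A'
except TypeError as err:
    error_message = str(err)
    print('TypeError:', error_message)

# Replacing the tuple: build a new one and rebind the name t.
t = ('A',) + t[1:]
print(t)              # ('A', 'b', 'c', 'd', 'e')
print(t is original)  # False
```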
The relational operators work with tuples and other sequences; Python starts by comparing
the first element from each sequence. If they are equal, it goes on to the next elements,
and so on, until it finds elements that differ. Subsequent elements are not considered (even
if they are really big).
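A small sketch of this rule, with illustrative values:

```python
# The first elements are equal, so the second elements decide: 1 < 3.
small = (0, 1, 2) < (0, 3, 4)
print(small)  # True

# The third element is never examined, however large it is.
big = (0, 1, 2000000) < (0, 3, 4)
print(big)    # True
```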
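It is often useful to swap the values of two variables. With conventional assignment, you
need a temporary variable:
>>> temp = a
>>> a = b
>>> b = temp
Tuple assignment performs the same swap in a single statement:
>>> a, b = b, a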
The left side is a tuple of variables; the right side is a tuple of expressions. Each value is
assigned to its respective variable. All the expressions on the right side are evaluated before
any of the assignments.
The number of variables on the left and the number of values on the right have to be
the same:
>>> a, b = 1, 2, 3
ValueError: too many values to unpack
More generally, the right side can be any kind of sequence (string, list or tuple). For example,
to split an email address into a user name and a domain, you could write:
The return value from split is a list with two elements; the first element is assigned to
uname, the second to domain.
>>> uname
'monty'
>>> domain
'python.org'
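The assignment that produces these values is not shown above. One way to write it, assuming the address 'monty@python.org' implied by the output and an illustrative variable name addr:

```python
# Assumed input, reconstructed from the values shown for uname and domain.
addr = 'monty@python.org'

# split returns a two-element list, which is unpacked into two variables.
uname, domain = addr.split('@')
print(uname)   # monty
print(domain)  # python.org
```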
>>> t = divmod(7, 3)
>>> t
(2, 1)
def min_max(t):
    return min(t), max(t)
max and min are built-in functions that find the largest and smallest elements of a sequence.
min_max computes both and returns a tuple of two values.
def printall(*args):
    print(args)
The gather parameter can have any name you like, but args is conventional. Here's how
the function works:
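A minimal sketch of a call; the argument values are illustrative assumptions:

```python
def printall(*args):
    # args gathers all positional arguments into a single tuple.
    print(args)

printall(1, 2.0, '3')  # prints (1, 2.0, '3')
```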
The complement of gather is scatter. If you have a sequence of values and you want to pass
it to a function as multiple arguments, you can use the * operator. For example, divmod
takes exactly two arguments; it doesn't work with a tuple:
>>> t = (7, 3)
>>> divmod(t)
TypeError: divmod expected 2 arguments, got 1
>>> divmod(*t)
(2, 1)
Many of the built-in functions use variable-length argument tuples. For example, max and
min can take any number of arguments:
>>> max(1, 2, 3)
3
But sum does not:
>>> sum(1, 2, 3)
TypeError: sum expected at most 2 arguments, got 3
As an exercise, write a function called sum_all that takes any number of arguments and
returns their sum.
>>> s = 'abc'
>>> t = [0, 1, 2]
>>> zip(s, t)
<zip object at 0x7f7d0a9e7c48>
The result is a zip object that knows how to iterate through the pairs. The most common
use of zip is in a for loop:
A zip object is a kind of iterator, which is any object that iterates through a sequence.
Iterators are similar to lists in some ways, but unlike lists, you can't use an index to select
an element from an iterator.
If you want to use list operators and methods, you can use a zip object to make a list:
The result is a list of tuples; in this example, each tuple contains a character from the string
and the corresponding element from the list.
If the sequences are not the same length, the result has the length of the shorter one.
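The uses of zip described above can be sketched as follows. The names s and t come from the earlier example; the strings in the last call are illustrative assumptions.

```python
s = 'abc'
t = [0, 1, 2]

# The most common use: iterate over the pairs in a for loop.
for pair in zip(s, t):
    print(pair)
# ('a', 0)
# ('b', 1)
# ('c', 2)

# Build a list from the zip object to use list operators and methods.
pairs = list(zip(s, t))
print(pairs)    # [('a', 0), ('b', 1), ('c', 2)]

# With sequences of different lengths, the shorter one sets the length.
shorter = list(zip('Anne', 'Elk'))
print(shorter)  # [('A', 'E'), ('n', 'l'), ('n', 'k')]
```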
You can use tuple assignment in a for loop to traverse a list of tuples:
Each time through the loop, Python selects the next tuple in the list and assigns the elements
to letter and number. The output of this loop is:
0 a
1 b
2 c
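The loop that produces this output is not shown above. A sketch consistent with the description, assuming a list of (letter, number) tuples named t:

```python
# Assumed data, chosen to reproduce the output shown in the text.
t = [('a', 0), ('b', 1), ('c', 2)]

# Each tuple is unpacked into letter and number on every iteration.
for letter, number in t:
    print(number, letter)
```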
If you combine zip, for and tuple assignment, you get a useful idiom for traversing two (or
more) sequences at the same time. For example, has_match takes two sequences, t1 and
t2, and returns True if there is an index i such that t1[i] == t2[i]:
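The body of has_match is not shown above. A minimal sketch that matches the description, offered as one possible implementation:

```python
def has_match(t1, t2):
    # zip pairs up the elements that share the same index.
    for x, y in zip(t1, t2):
        if x == y:
            return True
    return False

print(has_match([1, 2, 3], [3, 2, 1]))  # True (index 1 holds 2 in both)
print(has_match('abc', 'xyz'))          # False
```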
If you need to traverse the elements of a sequence and their indices, you can use the built-in
function enumerate:
The result from enumerate is an enumerate object, which iterates a sequence of pairs; each
pair contains an index (starting from 0) and an element from the given sequence. In this
example, the output is
0 a
1 b
2 c
Again.
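A sketch of the enumerate loop that would produce this output, assuming the string 'abc' as the sequence:

```python
# enumerate yields (index, element) pairs, with indices starting from 0.
for index, element in enumerate('abc'):
    print(index, element)

pairs = list(enumerate('abc'))
print(pairs)  # [(0, 'a'), (1, 'b'), (2, 'c')]
```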
[Figure 4.1: State diagram of the tuple ('Cleese', 'John'), with index 0 referring to 'Cleese' and index 1 to 'John'.]
[Figure 4.2: State diagram of the telephone directory, a dict whose keys are the tuples ('Cleese', 'John'), ('Chapman', 'Graham'), ('Idle', 'Eric'), ('Gilliam', 'Terry'), ('Jones', 'Terry') and ('Palin', 'Michael'); each maps to '08700 100 222'.]
This loop traverses the keys in directory, which are tuples. It assigns the elements of each
tuple to last and first, then prints the name and corresponding telephone number.
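The loop itself is not shown above. A sketch consistent with the description, assuming a dictionary named directory (the name used in the text) populated with two of the entries from the figure:

```python
# Tuples can be dictionary keys because they are immutable.
number = '08700 100 222'
directory = {}
directory['Cleese', 'John'] = number
directory['Chapman', 'Graham'] = number

# Each key is a (last, first) tuple, unpacked by the for statement.
for last, first in directory:
    print(first, last, directory[last, first])
```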
There are two ways to represent tuples in a state diagram. The more detailed ver-
sion shows the indices and elements just as they appear in a list. For example, the tuple
('Cleese', 'John') would appear as in Figure 4.1.
But in a larger diagram you might want to leave out the details. For example, a diagram
of the telephone directory might appear as in Figure 4.2.
Here the tuples are shown using Python syntax as a graphical shorthand. The telephone
number in the diagram is the complaints line for the BBC, so please don't call it.
2. If you want to use a sequence as a dictionary key, you have to use an immutable type
like a tuple or string.
3. If you are passing a sequence as an argument to a function, using tuples reduces the
potential for unexpected behavior due to aliasing.
Because tuples are immutable, they don't provide methods like sort and reverse, which
modify existing lists. But Python provides the built-in function sorted, which takes any
sequence and returns a new list with the same elements in sorted order, and reversed,
which takes a sequence and returns an iterator that traverses the list in reverse order.
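A minimal sketch of sorted and reversed applied to a tuple; the values are illustrative:

```python
t = (3, 1, 2)

# sorted accepts any sequence and always returns a new list.
ascending = sorted(t)
print(ascending)  # [1, 2, 3]

# reversed returns an iterator; tuple() collects its elements.
backwards = tuple(reversed(t))
print(backwards)  # (2, 1, 3)

print(t)          # (3, 1, 2), the original is unchanged
```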
4.8 Debugging
Lists, dictionaries and tuples are examples of data structures; in this chapter we are
starting to see compound data structures, like lists of tuples, or dictionaries that contain
tuples as keys and lists as values. Compound data structures are useful, but they are prone
to what I call shape errors; that is, errors caused when a data structure has the wrong
type, size, or structure. For example, if you are expecting a list with one integer and I give
you a plain old integer (not in a list), it won't work.
To help debug these kinds of errors, I have written a module called structshape that
provides a function, also called structshape, that takes any kind of data structure as
an argument and returns a string that summarizes its shape. You can download it from
http://thinkpython2.com/code/structshape.py
Here's the result for a simple list:
>>> s = 'abc'
>>> lt = list(zip(t, s))
>>> structshape(lt)
'list of 3 tuple of (int, str)'
And here's a dictionary with 3 items that map integers to strings.
>>> d = dict(lt)
>>> structshape(d)
'dict of 3 int->str'
If you are having trouble keeping track of your data structures, structshape can help.
4.9 Glossary
tuple: An immutable sequence of elements.
tuple assignment: An assignment with a sequence on the right side and a tuple of variables
on the left. The right side is evaluated and then its elements are assigned to the
variables on the left.
zip object: The result of calling a built-in function zip; an object that iterates through a
sequence of tuples.
iterator: An object that can iterate through a sequence, but which does not provide list
operators and methods.
data structure: A collection of related values, often organized in lists, dictionaries, tuples,
etc.
shape error: An error caused because a value has the wrong shape; that is, the wrong
type or size.
Chapter 5
Functions
The name of the function is type. The expression in parentheses is called the argument
of the function. The result, for this function, is the type of the argument.
It is common to say that a function takes an argument and returns a result. The
result is also called the return value.
Python provides functions that convert values from one type to another. The int func-
tion takes any value and converts it to an integer, if it can, or complains otherwise:
>>> int('32')
32
>>> int('Hello')
ValueError: invalid literal for int() with base 10: 'Hello'
int can convert floating-point values to integers, but it doesn't round off; it chops off the
fraction part:
>>> int(3.99999)
3
>>> int(-2.3)
-2
>>> float(32)
32.0
>>> float('3.14159')
3.14159
>>> str(32)
'32'
>>> str(3.14159)
'3.14159'
This statement creates a module object named math. If you display the module object,
you get some information about it:
>>> math
<module 'math' (built-in)>
The module object contains the functions and variables defined in the module. To access one
of the functions, you have to specify the name of the module and the name of the function,
separated by a dot (also known as a period). This format is called dot notation.
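Two short examples of dot notation follow as a sketch. The names signal_power, noise_power and radians are the ones the text refers to; the numeric values and the names ratio, decibels and height are assumptions made for illustration.

```python
import math

# Example 1: signal-to-noise ratio in decibels (input values are assumed).
signal_power = 100.0
noise_power = 1.0
ratio = signal_power / noise_power
decibels = 10 * math.log10(ratio)

# Example 2: sine of an angle given in radians (value is assumed).
radians = 0.7
height = math.sin(radians)

print(decibels)
print(height)
```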
The first example uses math.log10 to compute a signal-to-noise ratio in decibels (assuming
that signal_power and noise_power are defined). The math module also provides log,
which computes logarithms base e.
The second example finds the sine of radians. The variable name radians is a hint
that sin and the other trigonometric functions (cos, tan, etc.) take arguments in radians.
To convert from degrees to radians, divide by 180 and multiply by π:
>>> degrees = 45
>>> radians = degrees / 180.0 * math.pi
>>> math.sin(radians)
0.707106781187
The expression math.pi gets the variable pi from the math module. Its value is a
floating-point approximation of π, accurate to about 15 digits.
If you know trigonometry, you can check the previous result by comparing it to the
square root of two divided by two:
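A sketch of that check. Because floating-point results can differ in the last digit, it compares the two values with math.isclose instead of relying on identical printed output; the names expected and agree are illustrative.

```python
import math

degrees = 45
radians = degrees / 180.0 * math.pi

# The sine of 45 degrees should equal the square root of two divided by two.
expected = math.sqrt(2) / 2
agree = math.isclose(math.sin(radians), expected)
print(agree)  # True
```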
5.3 Composition
So far, we have looked at the elements of a program (variables, expressions, and statements)
in isolation, without talking about how to combine them.
One of the most useful features of programming languages is their ability to take small
building blocks and compose them. For example, the argument of a function can be any
kind of expression, including arithmetic operators:
x = math.exp(math.log(x+1))
Almost anywhere you can put a value, you can put an arbitrary expression, with one ex-
ception: the left side of an assignment statement has to be a variable name. Any other
expression on the left side is a syntax error (we will see exceptions to this rule later).
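The following sketch illustrates composition and the restriction on the left side of an assignment. The value of degrees and the statement passed to compile are assumptions; compile is used only so that the syntax error can be shown without stopping the script.

```python
import math

# An argument can be an expression that includes arithmetic operators...
degrees = 90
x = math.sin(degrees / 360.0 * 2 * math.pi)

# ...and even other function calls.
y = math.exp(math.log(x + 1))

# An expression on the left side of an assignment is a syntax error.
left_side_rejected = False
try:
    compile('hours * 60 = minutes', '<example>', 'exec')
except SyntaxError:
    left_side_rejected = True

print(left_side_rejected)  # True
```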
def print_lyrics():
    print("I'm a lumberjack, and I'm okay.")
    print("I sleep all night and I work all day.")
def is a keyword that indicates that this is a function definition. The name of the function
is print_lyrics. The rules for function names are the same as for variable names: letters,
numbers and underscore are legal, but the first character can't be a number. You can't use
a keyword as the name of a function, and you should avoid having a variable and a function
with the same name.
The empty parentheses after the name indicate that this function doesn't take any
arguments.
The first line of the function definition is called the header; the rest is called the body.
The header has to end with a colon and the body has to be indented. By convention,
indentation is always four spaces. The body can contain any number of statements.
The strings in the print statements are enclosed in double quotes. Single quotes and
double quotes do the same thing; most people use single quotes except in cases like this
where a single quote (which is also an apostrophe) appears in the string.
All quotation marks (single and double) must be straight quotes, usually located next
to Enter on the keyboard. Curly quotes, like the ones in this sentence, are not legal in
Python.
If you type a function definition in interactive mode, the interpreter prints dots (...)
to let you know that the definition isn't complete:
>>> print(print_lyrics)
<function print_lyrics at 0xb7e99e9c>
>>> type(print_lyrics)
<class 'function'>
The syntax for calling the new function is the same as for built-in functions:
>>> print_lyrics()
I'm a lumberjack, and I'm okay.
I sleep all night and I work all day.
Once you have dened a function, you can use it inside another function. For example, to
repeat the previous refrain, we could write a function called repeat_lyrics:
def repeat_lyrics():
    print_lyrics()
    print_lyrics()
>>> repeat_lyrics()
I'm a lumberjack, and I'm okay.
I sleep all night and I work all day.
I'm a lumberjack, and I'm okay.
I sleep all night and I work all day.
def print_lyrics():
    print("I'm a lumberjack, and I'm okay.")
    print("I sleep all night and I work all day.")

def repeat_lyrics():
    print_lyrics()
    print_lyrics()

repeat_lyrics()
This program contains two function definitions: print_lyrics and repeat_lyrics. Function
definitions get executed just like other statements, but the effect is to create function
objects. The statements inside the function do not run until the function is called, and the
function definition generates no output.
As you might expect, you have to create a function before you can run it. In other words,
the function definition has to run before the function gets called.
As an exercise, move the last line of this program to the top, so the function call appears
before the definitions. Run the program and see what error message you get.
Now move the function call back to the bottom and move the definition of print_lyrics
after the definition of repeat_lyrics. What happens when you run this program?
def print_twice(bruce):
    print(bruce)
    print(bruce)
This function assigns the argument to a parameter named bruce. When the function is
called, it prints the value of the parameter (whatever it is) twice.
This function works with any value that can be printed.
>>> print_twice('Spam')
Spam
Spam
>>> print_twice(42)
42
42
>>> print_twice(math.pi)
3.14159265359
3.14159265359
The same rules of composition that apply to built-in functions also apply to programmer-defined
functions, so we can use any kind of expression as an argument for print_twice:
>>> print(cat)
NameError: name 'cat' is not defined
Parameters are also local. For example, outside print_twice, there is no such thing as
bruce.
an action but don't return a value. They are called void functions.
When you call a fruitful function, you almost always want to do something with the
result; for example, you might assign it to a variable or use it as part of an expression:
x = math.cos(radians)
golden = (math.sqrt(5) + 1) / 2
When you call a function in interactive mode, Python displays the result:
>>> math.sqrt(5)
2.2360679774997898
But in a script, if you call a fruitful function all by itself, the return value is lost forever!
math.sqrt(5)
This script computes the square root of 5, but since it doesn't store or display the result, it
is not very useful.
Void functions might display something on the screen or have some other effect, but
they don't have a return value. If you assign the result to a variable, you get a special value
called None.
The value None is not the same as the string 'None'. It is a special value that has its own
type:
>>> type(None)
<class 'NoneType'>
The functions we have written so far are all void. We will start writing fruitful functions in
a few chapters.
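A short sketch of what a void function returns, reusing print_twice from earlier in the chapter; the argument 'Bing' and the variable name result are illustrative assumptions.

```python
def print_twice(bruce):
    print(bruce)
    print(bruce)

# The function displays its argument twice but returns nothing,
# so the variable receives the special value None.
result = print_twice('Bing')
print(result)            # None
print(result == 'None')  # False: None is not the string 'None'
```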
Functions can make a program smaller by eliminating repetitive code. Later, if you
make a change, you only have to make it in one place.
Dividing a long program into functions allows you to debug the parts one at a time
and then assemble them into a working whole.
Well-designed functions are often useful for many programs. Once you write and debug
one, you can reuse it.
5.12 Debugging
One of the most important skills you will acquire is debugging. Although it can be frus-
trating, debugging is one of the most intellectually rich, challenging, and interesting parts
of programming.
In some ways debugging is like detective work. You are confronted with clues and you
have to infer the processes and events that led to the results you see.
Debugging is also like an experimental science. Once you have an idea about what
is going wrong, you modify your program and try again. If your hypothesis was correct,
you can predict the result of the modification, and you take a step closer to a working
program. If your hypothesis was wrong, you have to come up with a new one. As Sherlock
Holmes pointed out, "When you have eliminated the impossible, whatever remains, however
improbable, must be the truth." (A. Conan Doyle, The Sign of Four)
For some people, programming and debugging are the same thing. That is, programming
is the process of gradually debugging a program until it does what you want. The idea is
that you should start with a working program and make small modifications, debugging
them as you go.
For example, Linux is an operating system that contains millions of lines of code, but
it started out as a simple program Linus Torvalds used to explore the Intel 80386 chip.
According to Larry Greenfield, "One of Linus's earlier projects was a program that would
switch between printing AAAA and BBBB. This later evolved to Linux." (The Linux Users'
Guide Beta Version 1).
5.13 Glossary
function: A named sequence of statements that performs some useful operation. Functions
may or may not take arguments and may or may not produce a result.
function definition: A statement that creates a new function, specifying its name,
parameters, and the statements it contains.
function object: A value created by a function definition. The name of the function is a
variable that refers to a function object.
parameter: A name used inside a function to refer to the value passed as an argument.
function call: A statement that runs a function. It consists of the function name followed
by an argument list in parentheses.
argument: A value provided to a function when the function is called. This value is
assigned to the corresponding parameter in the function.
local variable: A variable defined inside a function. A local variable can only be used
inside its function.
return value: The result of a function. If a function call is used as an expression, the
return value is the value of the expression.
module: A file that contains a collection of related functions and other definitions.
import statement: A statement that reads a module file and creates a module object.
module object: A value created by an import statement that provides access to the values
defined in a module.
dot notation: The syntax for calling a function in another module by specifying the module
name followed by a dot (period) and the function name.
stack diagram: A graphical representation of a stack of functions, their variables, and the
values they refer to.
frame: A box in a stack diagram that represents a function call. It contains the local
variables and parameters of the function.
traceback: A list of the functions that are executing, printed when an exception occurs.
Chapter 6
Classes and Objects
At this point you know how to use functions to organize code and built-in types to organize
data. The next step is to learn object-oriented programming, which uses programmer-defined
types to organize both code and data. Object-oriented programming is a big topic;
it will take a few chapters to get there.
Creating a new type is more complicated than the other options, but it has advantages
that will be apparent soon.
A programmer-defined type is also called a class. A class definition looks like this:
class Point:
    """Represents a point in 2-D space."""
The header indicates that the new class is called Point. The body is a docstring that explains
what the class is for. You can define variables and methods inside a class definition, but we
will get back to that later.
Defining a class named Point creates a class object.
>>> Point
<class '__main__.Point'>
Because Point is defined at the top level, its full name is __main__.Point.
The class object is like a factory for creating objects. To create a Point, you call Point
as if it were a function.
[Figure 6.1: Object diagram. blank refers to a Point object with attributes x = 3.0 and y = 4.0.]
6.2 Attributes
You can assign values to an instance using dot notation:
This syntax is similar to the syntax for selecting a variable from a module, such as math.pi
or string.whitespace. In this case, though, we are assigning values to named elements of
an object. These elements are called attributes.
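A minimal sketch of creating an instance and assigning its attributes. The name blank and the values 3.0 and 4.0 are the ones used in the surrounding text and in Figure 6.1.

```python
class Point:
    """Represents a point in 2-D space."""

# Calling the class object creates a new instance.
blank = Point()

# Dot notation assigns values to named attributes of the instance.
blank.x = 3.0
blank.y = 4.0

print(blank.x, blank.y)  # 3.0 4.0
```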
As a noun, AT-trib-ute is pronounced with emphasis on the first syllable, as opposed
to a-TRIB-ute, which is a verb.
Figure 6.1 is a state diagram that shows the result of these assignments. A state diagram
that shows an object and its attributes is called an object diagram.
The variable blank refers to a Point object, which contains two attributes. Each attribute
refers to a floating-point number.
You can read the value of an attribute using the same syntax:
>>> blank.y
4.0
>>> x = blank.x
>>> x
3.0
The expression blank.x means, "Go to the object blank refers to and get the value of x."
In the example, we assign that value to a variable named x. There is no conflict between
the variable x and the attribute x.
You can use dot notation as part of any expression. For example:
def print_point(p):
    print('(%g, %g)' % (p.x, p.y))
print_point takes a point as an argument and displays it in mathematical notation. To
invoke it, you can pass blank as an argument:
>>> print_point(blank)
(3, 4)
Inside the function, p is an alias for blank, so if the function modifies p, blank changes.
As an exercise, write a function called distance_between_points that takes two Points
as arguments and returns the distance between them.
6.3 Rectangles
Sometimes it is obvious what the attributes of an object should be, but other times you have
to make decisions. For example, imagine you are designing a class to represent rectangles.
What attributes would you use to specify the location and size of a rectangle? You can ignore
angle; to keep things simple, assume that the rectangle is either vertical or horizontal.
There are at least two possibilities:
You could specify one corner of the rectangle (or the center), the width, and the height.
At this point it is hard to say whether either is better than the other, so we'll implement
the first one, just as an example.
Here is the class definition:
class Rectangle:
    """Represents a rectangle.

    attributes: width, height, corner.
    """
box = Rectangle()
box.width = 100.0
box.height = 200.0
box.corner = Point()
box.corner.x = 0.0
box.corner.y = 0.0
The expression box.corner.x means, "Go to the object box refers to and select the attribute
named corner; then go to that object and select the attribute named x."
Figure 6.2 shows the state of this object. An object that is an attribute of another object
is embedded.
[Figure 6.2: Object diagram. box refers to a Rectangle with width = 100.0, height = 200.0, and corner referring to a Point with x = 0.0 and y = 0.0.]
6.6 Copying
Aliasing can make a program difficult to read because changes in one place might have
unexpected effects in another place. It is hard to keep track of all the variables that might
refer to a given object.
Copying an object is often an alternative to aliasing. The copy module contains a
function called copy that can duplicate any object:
>>> p1 = Point()
>>> p1.x = 3.0
>>> p1.y = 4.0
p1 and p2 contain the same data, but they are not the same Point.
>>> print_point(p1)
(3, 4)
>>> print_point(p2)
(3, 4)
>>> p1 is p2
False
>>> p1 == p2
False
The is operator indicates that p1 and p2 are not the same object, which is what we expected.
But you might have expected == to yield True because these points contain the same data.
In that case, you will be disappointed to learn that for instances, the default behavior
of the == operator is the same as the is operator; it checks object identity, not object
equivalence. That's because for programmer-defined types, Python doesn't know what
should be considered equivalent. At least, not yet.
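The call that creates p2 is not shown above. A minimal sketch, assuming copy.copy is used as the text describes:

```python
import copy

class Point:
    """Represents a point in 2-D space."""

p1 = Point()
p1.x = 3.0
p1.y = 4.0

# copy.copy creates a new Point with the same attribute values.
p2 = copy.copy(p1)

print(p2.x, p2.y)  # 3.0 4.0
print(p1 is p2)    # False: two distinct objects
print(p1 == p2)    # False: the default == compares identity
```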
If you use copy.copy to duplicate a Rectangle, you will find that it copies the Rectangle
object but not the embedded Point.
Figure 6.3 shows what the object diagram looks like. This operation is called a
shallow copy because it copies the object and any references it contains, but not the
embedded objects.
For most applications, this is not what you want. In this example, invoking grow_rectangle
on one of the Rectangles would not affect the other, but invoking move_rectangle on either
would affect both! This behavior is confusing and error-prone.
Fortunately, the copy module provides a method named deepcopy that copies not only
the object but also the objects it refers to, and the objects they refer to, and so on. You
will not be surprised to learn that this operation is called a deep copy.
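The difference between the two kinds of copy can be sketched as follows. The names box2 and box3 are illustrative assumptions; Point, Rectangle and box follow the definitions used earlier in the chapter.

```python
import copy

class Point:
    """Represents a point in 2-D space."""

class Rectangle:
    """Represents a rectangle with width, height and corner."""

box = Rectangle()
box.width = 100.0
box.height = 200.0
box.corner = Point()
box.corner.x = 0.0
box.corner.y = 0.0

# Shallow copy: a new Rectangle that shares the embedded Point.
box2 = copy.copy(box)
print(box2 is box)                # False
print(box2.corner is box.corner)  # True

# Deep copy: the embedded Point is copied as well.
box3 = copy.deepcopy(box)
print(box3.corner is box.corner)  # False

# Moving the shared corner through box2 also changes box, but not box3.
box2.corner.x = 50.0
print(box.corner.x, box3.corner.x)  # 50.0 0.0
```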
6.7 Debugging
When you start working with objects, you are likely to encounter some new exceptions. If
you try to access an attribute that doesn't exist, you get an AttributeError:
>>> p = Point()
>>> p.x = 3
>>> p.y = 4
>>> p.z
AttributeError: 'Point' object has no attribute 'z'
If you are not sure what type an object is, you can ask:
>>> type(p)
<class '__main__.Point'>
You can also use isinstance to check whether an object is an instance of a class:
try:
    x = p.x
except AttributeError:
    x = 0
This approach can make it easier to write functions that work with different types; more
on that topic is coming up in Section ??.
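The checks mentioned in this section can be sketched with the built-in functions isinstance and hasattr. The variable names is_point, has_x and has_z are illustrative assumptions.

```python
class Point:
    """Represents a point in 2-D space."""

p = Point()
p.x = 3
p.y = 4

# isinstance reports whether an object is an instance of a class.
is_point = isinstance(p, Point)
print(is_point)  # True

# hasattr reports whether an object has a given attribute;
# the attribute name is passed as a string.
has_x = hasattr(p, 'x')
has_z = hasattr(p, 'z')
print(has_x, has_z)  # True False
```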
6.8 Glossary
class: A programmer-defined type. A class definition creates a new class object.
class object: An object that contains information about a programmer-defined type. The class
object can be used to create instances of the type.
shallow copy: To copy the contents of an object, including any references to embedded objects;
implemented by the copy function in the copy module.
deep copy: To copy the contents of an object as well as any embedded objects, and any objects
embedded in them, and so on; implemented by the deepcopy function in the copy
module.
object diagram: A diagram that shows objects, their attributes, and the values of the attributes.
Chapter 7
Magic Methods
7.1 Introduction
Magic methods in Python are special methods that add "magic" to your class. Magic
methods are not meant to be invoked directly by you; the invocation happens internally
from the class on a certain action. For example, when you add two numbers using the +
operator, the __add__() method is called internally. Built-in classes in Python define many
magic methods. Use the dir() function to list the magic methods inherited by a class. For
example, the following lists all the attributes and methods defined in the int class.
>>> dir(int)
['__abs__', '__add__', '__and__', '__bool__', '__ceil__', '__class__', '__delattr__',
'__dir__', '__divmod__', '__doc__', '__eq__', '__float__', '__floor__', '__floordiv__',
'__format__', '__ge__', '__getattribute__', '__getnewargs__', '__gt__', '__hash__',
'__index__', '__init__', '__init_subclass__', '__int__', '__invert__', '__le__',
'__lshift__', '__lt__', '__mod__', '__mul__', '__ne__', '__neg__', '__new__',
'__or__', '__pos__', '__pow__', '__radd__', '__rand__', '__rdivmod__', '__reduce__',
'__reduce_ex__', '__repr__', '__rfloordiv__', '__rlshift__', '__rmod__', '__rmul__',
'__ror__', '__round__', '__rpow__', '__rrshift__', '__rshift__', '__rsub__',
'__rtruediv__', '__rxor__', '__setattr__', '__sizeof__', '__str__', '__sub__',
'__subclasshook__', '__truediv__', '__trunc__', '__xor__', 'bit_length', 'conjugate',
'denominator', 'from_bytes', 'imag', 'numerator', 'real', 'to_bytes']
As you can see above, the int class includes various magic methods surrounded by double
underscores. For example, the __add__ method is a magic method which gets called when we
add two numbers using the + operator. Consider the following example.
>>> num=10
>>> num + 5
15
>>> num.__add__(5)
15
As you can see, when you do num + 5, the + operator calls the __add__(5) method. You can
also call num.__add__(5) directly, which will give the same result. However, as mentioned
before, magic methods are not meant to be called directly, but internally, through some
other methods or actions.
Magic methods are most frequently used to define overloaded behaviours of predefined
operators in Python. For instance, arithmetic operators by default operate upon numeric
operands. This means that numeric objects must be used along with operators like +, -, *,
/, etc. The + operator is also defined as a concatenation operator in the string, list and
tuple classes. We can say that the + operator is overloaded.
In order to make the overloaded behaviour available in your own custom class, the
corresponding magic method should be overridden. For example, in order to use the +
operator with objects of a user-defined class, the class should include the __add__() method.
Let's see how to implement and use some of the important magic methods.
The __new__() magic method is implicitly called before the __init__() method. The
__new__() method returns a new object, which is then initialized by __init__().
class employee:
    def __new__(cls):
        print("__new__ magic method is called")
        inst = object.__new__(cls)
        return inst

    def __init__(self):
        print("__init__ magic method is called")
        self.name = 'Haitham'
The above example will produce the following output when you create an instance of the
employee class.
>>> e1=employee()
__new__ magic method is called
__init__ magic method is called
Thus, the __new__() method is called before the __init__() method.
7.3 __str__() Method
Another useful magic method is __str__(). It is overridden to return a printable string
representation of any user-defined class. We have seen the str() built-in function, which
returns a string from the object parameter. For example, str(12) returns '12'. When
invoked, it calls the __str__() method in the int class.
>>> num=12
>>> str(num)
'12'
>>> #This is equivalent to
>>> int.__str__(num)
'12'
Let us now override the __str__() method in the employee class to return a string
representation of its object.
class employee:
    def __init__(self):
        self.name = 'Haitham'
        self.salary = 10000

    def __str__(self):
        return 'name=' + self.name + ' salary=$' + str(self.salary)
See how the print() function, through str(), internally calls the __str__() method defined
in the employee class. This is why it is called a magic method.
>>> e1=employee()
>>> print(e1)
name=Haitham salary=$10000
In the following example, a class named distance is defined with two instance attributes,
ft and inch. The __add__() method is overridden to perform the addition of the ft and inch
attributes of the two objects. The __str__() method returns the object's string representation.
class distance:
    def __init__(self, x=None, y=None):
        self.ft = x
        self.inch = y

    def __add__(self, x):
        temp = distance()
        temp.ft = self.ft + x.ft
        temp.inch = self.inch + x.inch
        # Carry 12 inches over into one foot
        if temp.inch >= 12:
            temp.ft += 1
            temp.inch -= 12
        return temp

    def __str__(self):
        return 'ft:' + str(self.ft) + ' in: ' + str(self.inch)
Run the above Python script to verify the overloaded operation of the + operator.
>>> d1=distance(3,10)
>>> d2=distance(4,4)
>>> print("d1= {} d2={}".format(d1, d2))
d1= ft:3 in: 10 d2=ft:4 in: 4
>>> d3 = d1 + d2
>>> print(d3)
ft:8 in: 2
The following example overrides the __ge__() method to implement the >= operator for the
distance class.
class distance:
    def __init__(self, x=None, y=None):
        self.ft = x
        self.inch = y

    def __ge__(self, x):
        # Compare the two distances in inches
        val1 = self.ft * 12 + self.inch
        val2 = x.ft * 12 + x.inch
        return val1 >= val2
This method gets invoked when the >= operator is used and returns True or False.
Accordingly, the appropriate message can be displayed.
>>> d1 = distance(2, 1)
>>> d2 = distance(4, 10)
>>> d1 >= d2
False
7.6 Important Magic Methods
__str__(self): Called by the built-in str() function to return a string representation of a type.
__repr__(self): Called by the built-in repr() function to return a machine-readable representation of a type.
__unicode__(self): Called by the built-in unicode() function to return a Unicode string of a type (Python 2 only).
__format__(self, formatstr): Called by the built-in str.format() method to return a new style of string.
__hash__(self): Called by the built-in hash() function to return an integer.
__nonzero__(self): Called by the built-in bool() function to return True or False (Python 2 name; Python 3 uses __bool__).
__dir__(self): Called by the built-in dir() function to return a list of attributes of a class.
__sizeof__(self): Called by sys.getsizeof() to return the size of an object.
__getattr__(self, name): Called when accessing an attribute that does not exist.
__setattr__(self, name, value): Called when assigning a value to an attribute.
__delattr__(self, name): Called when deleting an attribute.
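A few of the methods listed above can be seen in action with a small class. The sketch below is illustrative only: the Basket class and its items attribute are hypothetical names, and __bool__ is used because it is the Python 3 counterpart of __nonzero__.

```python
class Basket:
    def __init__(self, items=None):
        self.items = list(items or [])

    def __repr__(self):
        # Called by repr(); aims at a machine-readable form
        return 'Basket({!r})'.format(self.items)

    def __bool__(self):
        # Called by bool(); an empty basket counts as False
        return len(self.items) > 0

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails
        raise AttributeError('Basket has no attribute {!r}'.format(name))

empty = Basket()
full = Basket(['bread', 'butter'])

print(repr(full))   # Basket(['bread', 'butter'])
print(bool(empty))  # False
print(bool(full))   # True

try:
    full.price
except AttributeError as err:
    print('Caught:', err)
```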
Chapter 8
Python Testing
1. An integration test checks that components in your application operate with each
other.
You can write both integration tests and unit tests in Python.
unittest
nose or nose2
pytest
Choosing the best test runner for your requirements and level of experience is important.
8.4.1 unittest
unittest has been built into the Python standard library since version 2.1. You'll probably
see it in commercial Python applications and open-source projects. unittest contains both a
testing framework and a test runner. unittest has some important requirements for writing
and executing tests.
unittest requires that:
2. You use a series of special assertion methods in the unittest.TestCase class instead of
the built-in assert statement
2. Create a class called TestSum that inherits from the TestCase class
3. Convert the test functions into methods by adding self as the first argument
4. Change the assertions to use the self.assertEqual() method on the TestCase class
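Applying these steps to a test of the built-in sum() function might look like the sketch below. The class name TestSum comes from the steps above; the method names and test values are assumptions chosen for illustration.

```python
import unittest

class TestSum(unittest.TestCase):
    def test_sum_list(self):
        # self.assertEqual replaces the built-in assert statement
        self.assertEqual(sum([1, 2, 3]), 6, "Should be 6")

    def test_sum_tuple(self):
        self.assertEqual(sum((1, 2, 3)), 6, "Should be 6")
```

If the class is saved in a test file, the tests can be run from the command line with python -m unittest, which discovers TestCase subclasses and reports each test method separately.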
Chapter 9
Numpy
Numpy is the core library for scientific computing in Python. It provides a high-performance
multidimensional array object, and tools for working with these arrays.
9.1 Arrays
A numpy array is a grid of values, all of the same type, and is indexed by a tuple of non-
negative integers. The number of dimensions is the rank of the array; the shape of an array
is a tuple of integers giving the size of the array along each dimension.
We can initialize numpy arrays from nested Python lists, and access elements using
square brackets:
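import numpy as np
a = np.array([[1, 2, 3], [4, 5, 6]])  # Create a rank 2 array from nested lists
print(a.shape)                         # Prints "(2, 3)"
print(a[0, 0], a[0, 1], a[1, 0])       # Prints "1 2 4"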
9.2.1 Slicing
Slicing: Similar to Python lists, numpy arrays can be sliced. Since arrays may be multidi-
mensional, you must specify a slice for each dimension of the array:
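import numpy as np
# Create the following rank 2 array with shape (3, 4):
# [[ 1  2  3  4]
#  [ 5  6  7  8]
#  [ 9 10 11 12]]
a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
# Use slicing to pull out the subarray consisting of the first 2 rows
# and columns 1 and 2; b is the following array of shape (2, 2):
# [[2 3]
#  [6 7]]
b = a[:2, 1:3]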
import numpy as np
# Two ways of accessing the data in the middle row of the array.
# Mixing integer indexing with slices yields an array of lower rank,
# while using only slices yields an array of the same rank as the
# original array:
row_r1 = a[1, :] # Rank 1 view of the second row of a
row_r2 = a[1:2, :] # Rank 2 view of the second row of a
print(row_r1, row_r1.shape) # Prints "[5 6 7 8] (4,)"
print(row_r2, row_r2.shape) # Prints "[[5 6 7 8]] (1, 4)"
import numpy as np
# When using integer array indexing, you can reuse the same
# element from the source array:
print(a[[0, 0], [1, 1]]) # Prints "[2 2]"
import numpy as np
x = np.array([[1,2],[3,4]], dtype=np.float64)
y = np.array([[5,6],[7,8]], dtype=np.float64)
import numpy as np
x = np.array([[1,2],[3,4]])
y = np.array([[5,6],[7,8]])
v = np.array([9,10])
w = np.array([11, 12])
# Matrix / vector product; both produce the rank 1 array [29 67]
print(x.dot(v))
print(np.dot(x, v))
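Numpy provides many useful functions for performing computations on arrays; one of
the most useful is sum:
import numpy as np
x = np.array([[1,2],[3,4]])
print(np.sum(x))          # Compute sum of all elements; prints "10"
print(np.sum(x, axis=0))  # Compute sum of each column; prints "[4 6]"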
Apart from computing mathematical functions using arrays, we frequently need to re-
shape or otherwise manipulate data in arrays. The simplest example of this type of operation
is transposing a matrix; to transpose a matrix, simply use the T attribute of an array object:
import numpy as np
x = np.array([[1,2], [3,4]])
print(x) # Prints "[[1 2]
# [3 4]]"
print(x.T) # Prints "[[1 3]
# [2 4]]"
10.1 Introduction
The foundation of computer science is based on the study of algorithms. An algorithm
is a sequence of clear and precise step-by-step instructions for solving a problem in a finite
amount of time. Algorithms are implemented by translating the step-by-step instructions
into a computer program that can be executed by a computer. This translation process is
called computer programming or simply programming. Computer programs are constructed
using a programming language appropriate to the problem. While programming
is an important part of computer science, computer science is not the study of programming.
Nor is it about learning a particular programming language. Instead, programming
and programming languages are tools used by computer scientists to solve problems.
Data items are represented within a computer as a sequence of binary digits. To distinguish
between the different types of data, the term type is often used to refer to a
collection of values and the term data type to refer to a given type along with a collection
of operations for manipulating values of the given type.
Programming languages commonly provide data types as part of the language itself.
These data types, known as primitives, come in two categories: simple and complex.
The simple data types consist of values that are in the most basic form and can't
be decomposed into smaller parts. Integer and real types, for example, consist of single
numeric values. The complex data types, on the other hand, are constructed of multiple
components consisting of simple types or other complex types. In Python, objects, strings,
lists, and dictionaries, which can contain multiple values, are all examples of complex types.
The primitive types provided by a language may not be sufficient for solving large complex
problems. Thus, most languages allow for the construction of additional data types, known
as user-defined types since they are defined by the programmer and not the language.
Some of these data types can themselves be very complex.
10.2 Abstractions
An abstraction is a mechanism for separating the properties of an object and restricting the
focus to those relevant in the current context. Abstractions are used to help manage complex
problems and complex data types. The user of the abstraction does not have to understand
all of the details in order to utilize the object, but only those relevant to the current task
or problem. Typically, abstractions of problems occur in layers. Two common types of
abstractions encountered in computer science are procedural, or functional, abstraction and
data abstraction.
The implementation of the various operations is hidden inside the black box, the contents
of which we do not have to know in order to utilize the ADT. There are several
advantages of working with abstract data types and focusing on 'what' instead of 'how':
We can focus on solving the problem at hand instead of getting bogged down in the implementation details
We can reduce logical errors that can occur from accidental misuse of storage structures and data types by preventing direct access to the implementation
The implementation of the abstract data type can be changed without having to modify the program code that uses the ADT
10.5.2 Container
A container is any data structure or abstract data type that stores and organizes a col-
lection. The individual values of the collection are known as elements of the container
and a container with no elements is said to be empty. The organization or arrangement
of the elements can vary from one container to the next, as can the operations available
for accessing the elements. Python provides a number of built-in containers, which include
strings, tuples, lists, dictionaries, and sets.
10.5.3 Sequence
A sequence is a container in which the elements are arranged in linear order from front
to back, with each element accessible by position. Access to the individual elements based
on their position within the linear order is provided using the subscript operator. Python
provides two immutable sequences, strings and tuples, and one mutable sequence, the list.
Python provides the assert statement, which can be used to raise an AssertionError
exception. The assert statement is used to state what we assume to be true at a given point
in the program. If the assertion fails, Python automatically raises an AssertionError and
aborts the program, unless the exception is caught.
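A short example may help show both outcomes of an assertion. The function name average and its error message are hypothetical, chosen only for this sketch.

```python
def average(values):
    # State the assumption: the list must not be empty
    assert len(values) > 0, 'Cannot average an empty list'
    return sum(values) / len(values)

print(average([2, 4, 6]))  # 4.0

# A failed assertion raises AssertionError, which can be caught
try:
    average([])
except AssertionError as err:
    print('Caught:', err)
```

Note that assert statements are skipped when Python runs with the -O option, so they suit checks on programmer assumptions rather than validation of user input.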
10.7 Bags
A bag is a simple container like a shopping bag that can be used to store a collection of
items. The bag container restricts access to the individual items by only defining operations
for adding and removing individual items, for determining if an item is in the bag, and for
traversing over the collection of items.
length(): Returns the number of items stored in the bag. Accessed using the len() function.
contains(item): Determines if the given target item is stored in the bag and returns the appropriate boolean value. Accessed using the in operator.
remove(item): Removes and returns an occurrence of item from the bag. An exception is raised if the element is not in the bag.
iterator(): Creates and returns an iterator that can be used to iterate over the collection of items.
1. focus on solving the problem at hand instead of worrying about the implementation
of the container
2. reduce the chance of introducing errors from misuse of the list since it provides addi-
tional operations that are not appropriate for a bag
4. easily swap out our current implementation of the Bag ADT for a different, possibly
more efficient, version later
1. Does the data structure provide for the storage requirements as specified by the domain
of the ADT? Abstract data types are defined to work with a specific domain of data
values. The data structure we choose must be capable of storing all possible values
in that domain, taking into consideration any restrictions or limitations placed on the
individual items.
2. Does the data structure provide the necessary data access and manipulation functionality
to fully implement the ADT? The functionality of an abstract data type is provided
through its defined set of operations. The data structure must allow for a full and
correct implementation of the ADT without having to violate the abstraction principle
by exposing the implementation details to the user.
3. Does the data structure lend itself to an efficient implementation of the operations?
An important goal in the implementation of an abstract data type is to provide an
efficient solution. Some data structures allow for a more efficient implementation than
others, but not every data structure is suitable for implementing every ADT. Efficiency
considerations can help to select the best structure from among multiple candidates.
There may be multiple data structures suitable for implementing a given abstract data
type, but we attempt to select the best possible based on the context in which the ADT will
be used. Language libraries will commonly provide several implementations of some ADTs,
allowing the programmer to choose the most appropriate. Efficiency will be introduced
later.
Having chosen the list, we must ensure it provides the means to implement the complete
set of bag operations. When implementing an ADT, we must use the functionality provided
by the underlying data structure. Sometimes, an ADT operation is identical to one already
provided by the data structure. In this case, the implementation can be quite simple and
may consist of a single call to the corresponding operation of the structure, while in other
cases, we have to use multiple operations provided by the structure. To help verify a correct
implementation of the Bag ADT using the list, we can outline how each bag operation will
be implemented:
The size of the bag can be determined by the size of the list
Determining if the bag contains a specific item can be done using the equivalent list operation
When a new item is added to the bag, it can be appended to the end of the list since there is no specific ordering of the items in a bag
Removing an item from the bag can also be handled by the equivalent list operation
The items in a list can be traversed using a for loop, and Python provides for user-defined iterators that can be used with a bag
From this itemized list, we see that each Bag ADT operation can be implemented using
the available functionality of the list. Thus, the list is suitable for implementing the bag.
class Bag:
    def __init__(self):
        self._items = []
        self._current_item = -1

    def __len__(self):
        return len(self._items)

    def __iter__(self):
        # Restart the traversal each time a new loop begins
        self._current_item = -1
        return self

    def __next__(self):
        if self._current_item < len(self) - 1:
            self._current_item += 1
            return self._items[self._current_item]
        else:
            raise StopIteration
Figure 10.2: Sample instance of the Bag class implemented using a list
Figure 10.3: The Bag and BagIterator objects after the first loop iteration
The ADT definition of the remove() operation specifies the precondition that the item must
exist in the bag in order to be removed. Thus, we must first assert that condition and
verify the existence of the item.
We need to provide an iteration mechanism that allows us to iterate over the individual
items in the bag.
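Putting the outlined operations together, a list-based bag might look like the sketch below. It is one possible arrangement rather than the definitive implementation: the add() method and the separate _BagIterator helper class are assumptions based on the operations described in this chapter.

```python
class Bag:
    def __init__(self):
        self._items = []

    def __len__(self):
        return len(self._items)

    def __contains__(self, item):
        return item in self._items

    def add(self, item):
        # Order does not matter in a bag, so append to the end
        self._items.append(item)

    def remove(self, item):
        # Precondition: the item must be in the bag
        assert item in self._items, 'The item must be in the bag.'
        index = self._items.index(item)
        return self._items.pop(index)

    def __iter__(self):
        # A fresh iterator per loop keeps traversals independent
        return _BagIterator(self._items)


class _BagIterator:
    def __init__(self, items):
        self._items = items
        self._current = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self._current < len(self._items):
            item = self._items[self._current]
            self._current += 1
            return item
        raise StopIteration


bag = Bag()
for value in (19, 74, 23):
    bag.add(value)
print(len(bag), 74 in bag)  # 3 True
print(bag.remove(74))       # 74
print(list(bag))            # [19, 23]
```

Keeping the traversal position in a separate iterator object means two loops over the same bag do not interfere with each other.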
Chapter 11
Arrays
11.1 Introduction
The most basic structure for storing and accessing a collection of data is the array. Arrays
can be used to solve a wide range of problems in computer science. Most programming
languages provide this structured data type as a primitive and allow for the creation of
arrays with multiple dimensions. Python does not.
2. Python list provides a large number of operations for working with the contents of the
list
3. Python list can grow and shrink during execution as elements are added or removed
1. Arrays are best suited for problems requiring sequences in which the maximum number
of elements is known upfront
2. Python lists are a better choice when the size of the sequence needs to change after it
has been created
3. A Python list contains more storage space than is needed to store the items currently in
the list. This extra space, the size of which can be up to twice the necessary capacity,
allows for quick and easy expansion as new items are added
4. However, the extra space is wasteful when using a Python list to store a fixed number of
elements
5. Python lists provide a large set of operations, besides retrieving an item at a specific
location, like searching for an item, removing an item by value or location, easily extracting
a subset of items, and sorting items
6. Arrays, on the other hand, only provide a limited set of operations for accessing the
individual elements
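The extra storage held by a list can be observed indirectly. The sketch below assumes the standard CPython interpreter, where sys.getsizeof reports the memory held by the list object; the exact numbers and growth pattern are implementation details and may differ between versions.

```python
import sys

values = []
sizes = []
for i in range(32):
    values.append(i)
    # Record the size of the list object after each append
    sizes.append(sys.getsizeof(values))

# The size stays flat for several appends, then jumps when the
# list runs out of spare capacity and reallocates
growth_steps = len(set(sizes))
print(sizes)
print('appends:', len(values), 'distinct sizes observed:', growth_steps)
```

Far fewer distinct sizes than appends indicates that most appends reuse space that was reserved in advance.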
get_item(index): Returns the value stored in the array at element position index. The index argument must be within the valid range. Accessed using the subscript operator.
set_item(index, value): Modifies the contents of the array element at position index to contain value. The index must be within the valid range. Accessed using the subscript operator.
iterator(): Creates and returns an iterator that can be used to traverse the elements of the array.
import random
# Fill a 100-element array with random floating-point values
value_list = Array(100)
for i in range(len(value_list)):
    value_list[i] = random.random()
Another example
Suppose you need to read the contents of a text file and count the number of letters occurring
in the file, with the results printed to the terminal. Characters are represented by the ASCII
code, which consists of integer values. The letters of the alphabet, both uppercase and
lowercase, are part of what is known as the printable range of the ASCII code. This includes
the ASCII codes in the range [32, ..., 126].
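# Create an array for the counters and initialize each element to 0
the_counter = Array(127)
the_counter.clear(0)
# Open the text file for reading and extract each line from the file
# and iterate over each character in the line
the_file = open('text_file.txt', 'r')
for line in the_file:
    for letter in line:
        code = ord(letter)
        # Skip characters outside the ASCII range covered by the counters
        if code < 127:
            the_counter[code] += 1
# Close the file
the_file.close()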
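import ctypes

class MyArray:
    def __init__(self, size):
        assert size > 0, 'Array size must be > 0'
        self._size = size
        self._next_item = -1
        # Create the underlying storage using the ctypes module
        py_array_type = ctypes.py_object * size
        self._elements = py_array_type()
        self.clear(None)

    def __len__(self):
        return self._size

    def __getitem__(self, index):
        assert 0 <= index < len(self), 'Array subscript out of range'
        return self._elements[index]

    def __setitem__(self, index, value):
        assert 0 <= index < len(self), 'Array subscript out of range'
        self._elements[index] = value

    def clear(self, value):
        # Set every element to the given value
        for i in range(len(self)):
            self._elements[i] = value

    def __iter__(self):
        self._next_item = -1
        return self

    def __next__(self):
        if self._next_item < len(self) - 1:
            self._next_item += 1
            return self._elements[self._next_item]
        else:
            raise StopIteration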
11.4 Array 2D
11.4.1 Implementing Array 2D
class MyArrayTD:
    def __init__(self, no_rows, no_cols):
        # Store the grid as one sequence per row (nested lists used here as a stand-in)
        self._rows = [[None] * no_cols for _ in range(no_rows)]

    def num_rows(self):
        return len(self._rows)

    def num_cols(self):
        return len(self._rows[0])
11.5.1 Rules
The universe of the Game of Life is an infinite, two-dimensional orthogonal grid of square
cells, each of which is in one of two possible states, alive or dead (or populated and
unpopulated, respectively). Every cell interacts with its eight neighbours, which are the cells
that are horizontally, vertically, or diagonally adjacent. At each step in time, the following
transitions occur:
1. Any live cell with fewer than two live neighbours dies, as if by underpopulation.
2. Any live cell with two or three live neighbours lives on to the next generation.
3. Any live cell with more than three live neighbours dies, as if by overpopulation.
4. Any dead cell with exactly three live neighbours becomes a live cell, as if by repro-
duction.
These rules, which compare the behavior of the automaton to real life, can be condensed
into the following:
1. Any live cell with two or three live neighbors survives.
2. Any dead cell with three live neighbors becomes a live cell.
3. All other live cells die in the next generation. Similarly, all other dead cells stay dead.
The initial pattern constitutes the seed of the system. The first generation is created by
applying the above rules simultaneously to every cell in the seed; births and deaths occur
simultaneously, and the discrete moment at which this happens is sometimes called a tick.
Each generation is a pure function of the preceding one. The rules continue to be applied
repeatedly to create further generations.
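The condensed rules translate almost directly into array operations. The sketch below is a standalone illustration, not the Life class developed in this chapter: the function name life_step is hypothetical, and it assumes a finite grid in which every cell outside the border is dead.

```python
import numpy as np

def life_step(grid):
    """Return the next generation of a 2-D array of 0s (dead) and 1s (alive)."""
    rows, cols = grid.shape
    # Pad with dead cells so border cells have eight neighbours too
    padded = np.pad(grid, 1, mode='constant', constant_values=0)
    neighbours = np.zeros_like(grid)
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue  # a cell is not its own neighbour
            neighbours += padded[1 + dr:1 + dr + rows, 1 + dc:1 + dc + cols]
    # Live cells with two or three neighbours survive;
    # dead cells with exactly three neighbours come alive
    survive = (grid == 1) & ((neighbours == 2) | (neighbours == 3))
    born = (grid == 0) & (neighbours == 3)
    return (survive | born).astype(grid.dtype)

# A "blinker" flips between a vertical and a horizontal bar
blinker = np.array([[0, 0, 0, 0, 0],
                    [0, 0, 1, 0, 0],
                    [0, 0, 1, 0, 0],
                    [0, 0, 1, 0, 0],
                    [0, 0, 0, 0, 0]])
print(life_step(blinker))
```

Because the new grid is built from neighbour counts taken on the old grid, all births and deaths happen simultaneously, as the rules require.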
## Rules
### Any live cell with fewer than two live neighbours dies (underpopulation)
### Any live cell with two or three live neighbors lives on to the next generation
### Any live cell with more than three live neighbours dies, as if by overpopulation
### Any dead cell with exactly three live neighbours becomes a live cell, as if by reproduction
class Life:
    def next_generation(self):
        if self.flag:
            self.flag = not self.flag
            self._initial = self._next.copy()
            for r in range(self._rows):
                for c in range(self._cols):
                    self.check_underpopulation(self._next, r, c, self._initial)
                    self.check_overpopulation(self._next, r, c, self._initial)
                    self.check_reproduction(self._next, r, c, self._initial)
            # self.plot_generation(self._initial)
            # print(f'Next Generation: \n {self._initial}')
            return self._initial
        else:
            self.flag = not self.flag
            self._next = self._initial.copy()
            for r in range(self._rows):
                for c in range(self._cols):
                    self.check_underpopulation(self._initial, r, c, self._next)
                    self.check_overpopulation(self._initial, r, c, self._next)
                    self.check_reproduction(self._initial, r, c, self._next)
            # self.plot_generation(self._next)
            # print(f'Next Generation: \n {self._next}')
            return self._next
mat_data.append(mat[i])
mat_dataset = tuple(mat_data)
plt.matshow(mat_dataset)
plt.show()
import numpy as np
import matplotlib.pyplot as plt

dim_row = 200
dim_col = 200
# randint's upper bound is exclusive, so this draws 0s and 1s
data = np.random.randint(0, 2, (dim_row, dim_col))
# print(data)
life = Life(initial=data)
fig, ax = plt.subplots()
ax.imshow(life._initial)
for i in range(100):
    ax.cla()
    ax.imshow(life.next_generation())
    # ax.set_title("frame {}".format(i))
    # Note that using time.sleep does *not* work here!
    plt.pause(1)
Chapter 12
Algorithm Analysis
12.1 Introduction
Algorithms are designed to solve problems, but a given problem can have many different
solutions. To determine the most efficient solution, we can measure the execution time. We
can implement the solution by constructing a computer program, using a given programming
language. We then execute the program and time it using a wall clock or the computer's
internal clock. The execution time is dependent on several factors. First, the amount of data
that must be processed directly affects the execution time. As the data set size increases,
so does the execution time. Second, the execution times can vary depending on the type
of hardware and the time of day a computer is used. If we use a multi-process, multi-user
system to execute the program, the execution of other programs on the same machine
can directly affect the execution time of our program. Finally, the choice of programming
language and compiler used to implement an algorithm can also influence the execution
time. Some compilers are better optimizers than others and some languages produce better
optimized code than others. Thus, we need a method to analyze an algorithm's efficiency
independent of the implementation details.
In computer science, time complexity is the computational complexity that describes the
amount of time it takes to run an algorithm.
Big O notation is a method for determining how fast an algorithm is. Using Big O
notation, we can learn whether our algorithm is fast or slow. This knowledge lets us design
better algorithms.
The examples in this chapter are written in language-agnostic Python. That means it will
be easy to port the Big O notation code over to Java, or any other language. If the code
isn't agnostic, there's Java code accompanying it.
total_sum = 0
for i in range(n):
    row_sum[i] = 0
    for j in range(n):
        row_sum[i] = row_sum[i] + matrix[i, j]
        total_sum = total_sum + matrix[i, j]
Suppose we want to analyze the algorithm based on the number of additions performed.
In this example, there are only two addition operations, making this a simple task. The
algorithm contains two loops, one nested inside the other. The inner loop is executed n
times and since it contains the two addition operations, there are a total of 2n additions
performed by the inner loop for each iteration of the outer loop. The outer loop is also
performed n times, for a total of 2n^2 additions.
Can we improve upon this algorithm to reduce the total number of addition operations
performed? Consider a new version of the algorithm in which the second addition is moved
out of the inner loop and modified to sum the entries in the row_sum array instead of
individual elements of the matrix:
total_sum = 0
for i in range(n):
    row_sum[i] = 0
    for j in range(n):
        row_sum[i] = row_sum[i] + matrix[i, j]
    total_sum = total_sum + row_sum[i]
In this version, the inner loop is again executed n times, but this time, it only contains
one addition operation. That gives a total of n additions for each iteration of the outer
loop, but the outer loop now contains an addition operation of its own. To calculate the
total number of additions for this version, we take the n additions of the inner loop and
add the one addition performed by the outer loop. This gives n + 1 additions for each
iteration of the outer loop, which is performed n times for a total of n^2 + n additions.
If we compare the two results, it is obvious the number of additions in the second version
is less than the first for any n greater than 1. Thus, the second version will execute faster
than the first, but the difference in execution times will not be significant. The reason is
that both algorithms execute on the same order of magnitude, namely n^2. Thus, as the
size of n increases, both algorithms increase at approximately the same rate (though one is
slightly better).
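The two counts can be checked empirically by replacing each addition with a counter increment. The function names below are hypothetical; the loops mirror the structure of the two versions discussed above.

```python
def count_additions_v1(n):
    """First version: both additions sit inside the inner loop."""
    additions = 0
    for i in range(n):
        for j in range(n):
            additions += 2  # row_sum update and total_sum update
    return additions

def count_additions_v2(n):
    """Second version: the total_sum addition moves to the outer loop."""
    additions = 0
    for i in range(n):
        for j in range(n):
            additions += 1  # row_sum update
        additions += 1      # total_sum update
    return additions

for n in (1, 10, 100):
    print(n, count_additions_v1(n), count_additions_v2(n))
```

For n = 100 the counts are 20000 and 10100: the second version does roughly half the additions, yet both grow in proportion to n^2.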
Table 12.1 on page 103 presents a growth rate comparison illustrating the discussed
example. Figure 12.1 on page 103 presents a graphical representation of the values in
the table.
for different inputs and see which one takes less time. There are many problems with this
approach for analysis of algorithms.
1. It might be possible that for some inputs, the first algorithm performs better than the
second, and for some inputs the second performs better.
2. It might also be possible that for some inputs, the first algorithm performs better on one
machine and the second works better on another machine for some other inputs.
Asymptotic Analysis is the big idea that handles the above issues in analyzing algorithms.
In Asymptotic Analysis, we evaluate the performance of an algorithm in terms of input size
(we don't measure the actual running time). We calculate how the time (or space)
taken by an algorithm increases with the input size.
For example, let us consider the search problem (searching a given item) in a sorted
array. One way to search is Linear Search (order of growth is linear) and the other way is
Binary Search (order of growth is logarithmic). To understand how Asymptotic Analysis
solves the above mentioned problems in analyzing algorithms, let us say we run the Linear
Search on a fast computer and Binary Search on a slow computer. For small values of input
array size n, the fast computer may take less time. But, after a certain value of input array
size, the Binary Search will definitely start taking less time compared to the Linear Search
even though the Binary Search is being run on a slow machine. The reason is that the order
of growth of Binary Search with respect to input size is logarithmic while the order of growth
of Linear Search is linear. So the machine dependent constants can always be ignored after
certain values of input size.
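The gap between the two orders of growth shows up clearly when we count loop iterations instead of measuring time. The helper names below are hypothetical, and the counts are for searching the last element of a sorted list.

```python
def linear_search_steps(arr, target):
    """Count the elements examined by a linear search."""
    steps = 0
    for value in arr:
        steps += 1
        if value == target:
            break
    return steps

def binary_search_steps(arr, target):
    """Count the midpoints examined by a binary search on a sorted list."""
    steps = 0
    low, high = 0, len(arr) - 1
    while low <= high:
        steps += 1
        mid = (low + high) // 2
        if arr[mid] == target:
            break
        if arr[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return steps

for n in (10, 1000, 1000000):
    data = list(range(n))
    print(n, linear_search_steps(data, n - 1), binary_search_steps(data, n - 1))
```

Multiplying the input size by 1000 multiplies the linear count by 1000 but adds only about ten steps to the binary count, which is why no constant-factor speedup of the machine can rescue Linear Search for large n.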
1. Worst Case
2. Average Case
3. Best Case
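The three cases can be made concrete with linear search. The sketch below is an assumed implementation that matches the search(arr, n, x) call used in the driver code: the best case occurs when x is the first element, and the worst case when x is not present at all.

```python
def search(arr, n, x):
    """Return the index of x among the first n items of arr, or -1 if absent."""
    for i in range(n):
        if arr[i] == x:
            return i  # found: stop as soon as the item is seen
    return -1         # worst case: all n items were compared
```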
# Driver Code
arr = [1, 10, 30, 15]
x = 30
n = len(arr)
print(x, "is present at index",
search(arr, n, x))
30 is present at index 2
\[ \text{ACT} = \frac{\sum_{i=1}^{n+1} \theta(i)}{n+1} = \theta(n) \qquad (12.1) \]
General Notes
1. Most of the time, we do worst case analysis to analyze algorithms. In worst case
analysis, we guarantee an upper bound on the running time of an algorithm, which is
good information.
2. The average case analysis is not easy to do in most practical cases and it is
rarely done. In the average case analysis, we must know (or predict) the mathematical
distribution of all possible inputs.
3. The Best Case analysis is bogus. Guaranteeing a lower bound on an algorithm doesn't
provide any information, since in the worst case the algorithm may still take years to run.
4. For some algorithms, all the cases are asymptotically the same, i.e., there are no worst
and best cases.
Constant: O(1)
Logarithmic: O(log n)
Linear: O(n)
Polynomial: O(n^2), O(n^3), O(n^x)
Exponential: O(2^n)
def odd_or_even(n):
    # n % 2 is 1 (truthy) for odd numbers and 0 (falsy) for even numbers
    return "Odd" if n % 2 else "Even"
log_3(9)
What is being asked here is: 3 to what power gives us 9? 3 to the power of 2
gives us 9, so the whole expression looks like:
log_3(9) = 2
a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
We want to find the number "2".
We implement Binary Search as:
return found
This is:
3. If it's not, check to see if that element is more than the item we want to find
4. If it is, ignore the right-hand side (all the numbers higher than the midpoint) of the
list and choose a new midpoint.
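The steps above can be sketched as follows. This is a minimal reconstruction under stated assumptions rather than the original listing: the function name `binary_search` and the variables `first`, `last` and `midpoint` are hypothetical, and the input list is assumed to be sorted in ascending order.

```python
def binary_search(sorted_list, item):
    """Return True if item is in sorted_list, halving the search range each step."""
    first = 0
    last = len(sorted_list) - 1
    found = False
    while first <= last and not found:
        midpoint = (first + last) // 2
        if sorted_list[midpoint] == item:
            found = True
        elif sorted_list[midpoint] > item:
            # Midpoint is too large: ignore the right-hand side
            last = midpoint - 1
        else:
            # Midpoint is too small: ignore the left-hand side
            first = midpoint + 1
    return found


a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(binary_search(a, 2))   # True
print(binary_search(a, 11))  # False
```

Because every iteration discards half of the remaining range, the loop runs at most about log_2(n) + 1 times.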
Linear Time
A linear time algorithm visits every single element of the input exactly once, for O(n)
steps in total. As the size of the input, n, grows, our algorithm's run time scales in
direct proportion to it.
Linear time is where every single item in a list is visited once, in a worst-case scenario.
shopping_list = ["Bread", "Butter",
                 "The Nacho Libre soundtrack from the 2006 film Nacho Libre",
                 "Reusable Water Bottle"]
for item in shopping_list:
    print(item)
Let's look at another example: finding the largest item of an unsorted array.
Given the list:
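As a minimal sketch of this idea, assume an illustrative list `values` (the numbers below are hypothetical, not taken from the original text) and a hypothetical helper `largest`. Since the list is unsorted, every item has to be looked at once before we can be sure which one is the largest.

```python
def largest(values):
    """Return the largest item by visiting every element exactly once: O(n)."""
    biggest = values[0]
    for value in values[1:]:
        if value > biggest:
            biggest = value
    return biggest


values = [7, 3, 19, 4, 12]  # hypothetical unsorted list
print(largest(values))  # 19
```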
Polynomial Time
Polynomial time is a polynomial function of the input. A polynomial function looks like n^2
or n^3 and so on.
If one loop through a list is O(n), two nested loops over it must be O(n^2). For each item
visited by the outer loop, the inner loop goes over the entire list once, resulting in n^2
operations.
a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
for i in a:
    for x in a:
        print("x")
Each additional level of nesting over the same list adds 1 to the power.
So a triple nested loop is O(n^3).
Bubble sort is a good example of an O(n^2) algorithm. The sorting algorithm takes the
first number and swaps it with the adjacent number if they are in the wrong order. It does
this for each number, repeating until all numbers are in the right order, and thus sorted.
def bubbleSort(arr):
    n = len(arr)
bubbleSort(arr)
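A complete bubble sort might look like the sketch below. It is one common formulation, not necessarily the author's listing: the name `bubble_sort`, the early-exit `swapped` flag and the sample data are assumptions added for illustration.

```python
def bubble_sort(arr):
    """Sort arr in place by repeatedly swapping adjacent out-of-order items."""
    n = len(arr)
    for i in range(n - 1):
        swapped = False
        # The last i items are already in their final place, so skip them
        for j in range(n - 1 - i):
            if arr[j] > arr[j + 1]:
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
                swapped = True
        if not swapped:
            break  # no swaps in a full pass means the list is sorted
    return arr


numbers = [64, 34, 25, 12, 22, 11, 90]  # hypothetical sample data
print(bubble_sort(numbers))  # [11, 12, 22, 25, 34, 64, 90]
```

The two nested loops over the same list are what make the worst case O(n^2).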
Exponential Complexity
Exponential time is 2^n, where the base (here 2) depends on the permutations involved.
This class of algorithms is the slowest of them all. You saw how my professor reacted to
polynomial algorithms. He was jumping up and down in fury at exponential algorithms!
Say we have a password consisting only of digits (10 digits, 0 through 9). We
want to crack a password which has a length of n. To brute-force through every combination
we'll have:
10^n
combinations to work through.
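To see the 10^n growth directly, the candidates can be enumerated with `itertools.product`. This is a small sketch for illustration only: the generator name `all_passwords` and the constant `DIGITS` are hypothetical.

```python
from itertools import product

DIGITS = "0123456789"


def all_passwords(length):
    """Yield every digit-only password of the given length."""
    for combo in product(DIGITS, repeat=length):
        yield "".join(combo)


for n in (1, 2, 3, 4):
    count = sum(1 for _ in all_passwords(n))
    print(n, count, count == 10 ** n)
```

Each extra digit multiplies the number of candidates by 10, so even modest lengths quickly become impractical to enumerate.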
One example of exponential time is to find all the subsets of a set.
>>> subsets([''])
['']
>>> subsets(['x'])
['', 'x']
>>> subsets(['a', 'b'])
['', 'a', 'b', 'ab']
We can see that when we have an input size of 2, the output size is 2^2 = 4.
Now, let's code up subsets.
from itertools import chain, combinations

def subsets(iterable):
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(len(s)+1))
Taken from the documentation for itertools. What's important here is to see that the
output grows exponentially with the input size.
Exponential algorithms are horrific, but as with polynomial algorithms we can learn a thing
or two. Let's say we have to calculate 10^4. We need to do this:
10 * 10 * 10 * 10 = 10^2 * 10^2
We have to calculate 10^2 twice! What if we store that value somewhere and use it later
so we do not have to recalculate it? This is the principle of Dynamic Programming.
When we see an exponential algorithm, dynamic programming can often be used to
speed it up.
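The idea of storing 10^2 and reusing it can be sketched as follows. This is an illustration of the "compute once, reuse" principle using exponentiation by squaring, not a full treatment of dynamic programming; the function name `power` is hypothetical and the exponent is assumed to be a non-negative integer.

```python
def power(base, exponent):
    """Compute base ** exponent, reusing the half power instead of recomputing it."""
    if exponent == 0:
        return 1
    half = power(base, exponent // 2)  # computed once, used twice below
    if exponent % 2 == 0:
        return half * half
    return half * half * base


print(power(10, 4))  # 10000
```

Because each call halves the exponent, only about log_2(exponent) recursive calls are needed instead of one multiplication per unit of the exponent.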
Again, knowing time complexities allows us to build better algorithms.
Chapter 13
Linked List
13.1 Introduction
A linked list is a sequence of data elements which are connected together via links. Each
data element contains a connection to another data element in the form of a pointer. Python
does not have linked lists in its standard library. In this chapter we are going to study
the type of linked list known as a singly linked list. In this type of data structure there is
only one link between any two data elements. We create such a list and create additional
methods to insert, update and remove elements from the list.
class Node:
    def __init__(self, dataval=None):
        self.dataval = dataval
        self.nextval = None

class SLinkedList:
    def __init__(self):
        self.headval = None

list1 = SLinkedList()
list1.headval = Node("Mon")
e2 = Node("Tue")
e3 = Node("Wed")
# Link first Node to second node
list1.headval.nextval = e2
# Link second Node to third node
e2.nextval = e3
class Node:
    def __init__(self, dataval=None):
        self.dataval = dataval
        self.nextval = None

class SLinkedList:
    def __init__(self):
        self.headval = None

    # Walk the list from the head, printing each node's data
    def listprint(self):
        printval = self.headval
        while printval is not None:
            print(printval.dataval)
            printval = printval.nextval

llist = SLinkedList()
llist.headval = Node("Mon")
e2 = Node("Tue")
e3 = Node("Wed")
# Link first Node to second node
llist.headval.nextval = e2
# Link second Node to third node
e2.nextval = e3
llist.listprint()
When the above code is executed, it produces the following result:
Mon
Tue
Wed
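class Node:
    def __init__(self, dataval=None):
        self.dataval = dataval
        self.nextval = None

class SLinkedList:
    def __init__(self):
        self.headval = None

    # Print the linked list
    def listprint(self):
        printval = self.headval
        while printval is not None:
            print(printval.dataval)
            printval = printval.nextval

    # Insert a new node before the current head
    def AtBegining(self, newdata):
        NewNode = Node(newdata)
        NewNode.nextval = self.headval
        self.headval = NewNode

llist = SLinkedList()
llist.headval = Node("Mon")
e2 = Node("Tue")
e3 = Node("Wed")
llist.headval.nextval = e2
e2.nextval = e3
llist.AtBegining("Sun")
llist.listprint()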
When the above code is executed, it produces the following result:
Sun
Mon
Tue
Wed
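class Node:
    def __init__(self, dataval=None):
        self.dataval = dataval
        self.nextval = None

class SLinkedList:
    def __init__(self):
        self.headval = None

    def listprint(self):
        printval = self.headval
        while printval is not None:
            print(printval.dataval)
            printval = printval.nextval

    # Walk to the last node and link the new node after it
    def AtEnd(self, newdata):
        NewNode = Node(newdata)
        if self.headval is None:
            self.headval = NewNode
            return
        laste = self.headval
        while laste.nextval:
            laste = laste.nextval
        laste.nextval = NewNode

llist = SLinkedList()
llist.headval = Node("Mon")
e2 = Node("Tue")
e3 = Node("Wed")
llist.headval.nextval = e2
e2.nextval = e3
llist.AtEnd("Thu")
llist.listprint()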
When the above code is executed, it produces the following result:
Mon
Tue
Wed
Thu
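class Node:
    def __init__(self, dataval=None):
        self.dataval = dataval
        self.nextval = None

class SLinkedList:
    def __init__(self):
        self.headval = None

    def listprint(self):
        printval = self.headval
        while printval is not None:
            print(printval.dataval)
            printval = printval.nextval

    # Insert a new node right after middle_node
    def Inbetween(self, middle_node, newdata):
        if middle_node is None:
            print("The mentioned node is absent")
            return
        NewNode = Node(newdata)
        NewNode.nextval = middle_node.nextval
        middle_node.nextval = NewNode

llist = SLinkedList()
llist.headval = Node("Mon")
e2 = Node("Tue")
e3 = Node("Thu")
llist.headval.nextval = e2
e2.nextval = e3
llist.Inbetween(llist.headval.nextval, "Fri")
llist.listprint()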
When the above code is executed, it produces the following result:
Mon
Tue
Fri
Thu
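class Node:
    def __init__(self, data=None):
        self.data = data
        self.next = None

class SLinkedList:
    def __init__(self):
        self.head = None

    # Insert a new node before the current head
    def Atbegining(self, data_in):
        NewNode = Node(data_in)
        NewNode.next = self.head
        self.head = NewNode

    # Remove the first node whose data equals Removekey
    def RemoveNode(self, Removekey):
        HeadVal = self.head
        # Case 1: the node to remove is the head itself
        if HeadVal is not None and HeadVal.data == Removekey:
            self.head = HeadVal.next
            return
        # Case 2: search for the key, remembering the previous node
        while HeadVal is not None:
            if HeadVal.data == Removekey:
                break
            prev = HeadVal
            HeadVal = HeadVal.next
        # Key not found (or empty list): nothing to remove
        if HeadVal is None:
            return
        # Unlink the node by pointing its predecessor past it
        prev.next = HeadVal.next

    def LListprint(self):
        printval = self.head
        while printval:
            print(printval.data)
            printval = printval.next

llist = SLinkedList()
llist.Atbegining("Mon")
llist.Atbegining("Tue")
llist.Atbegining("Wed")
llist.Atbegining("Thu")
llist.RemoveNode("Tue")
llist.LListprint()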
Thu
Wed
Mon
Chapter 14
Queue
14.1 Introduction
A queue is a linear data structure that stores items in a First In First Out (FIFO) manner.
With a queue, the least recently added item is removed first. A good example of a queue is
any queue of consumers for a resource where the consumer that came first is served first.
1. Enqueue: Adds an item to the queue. If the queue is full, then it is said to be an
Overflow condition. Time Complexity: O(1)
2. Dequeue: Removes an item from the queue. The items are removed in the same order
in which they were added. If the queue is empty, then it is said to be an Underflow
condition. Time Complexity: O(1)
3. Front: Get the front item from the queue. Time Complexity: O(1)
4. Rear: Get the last item from the queue. Time Complexity: O(1)
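The four operations above can be sketched with `collections.deque`. This is only one possible mapping, chosen here for illustration: `append` plays the role of Enqueue, `popleft` of Dequeue, and indexing with `[0]` and `[-1]` stands in for Front and Rear. The variable names are hypothetical.

```python
from collections import deque

queue = deque()

# Enqueue: add items at the rear
queue.append("a")
queue.append("b")
queue.append("c")

front = queue[0]    # Front: peek at the oldest item without removing it
rear = queue[-1]    # Rear: peek at the newest item without removing it
print(front, rear)  # a c

# Dequeue: remove items from the front, in the order they were added
first_out = queue.popleft()
print(first_out)    # a
print(list(queue))  # ['b', 'c']
```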
14.3 Implementation
There are various ways to implement a queue in Python. This chapter covers the
implementation of a queue using data structures and modules from the Python standard library.
1. list
2. collections.deque
3. queue.Queue
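# Initializing a queue
queue = []
# Adding elements to the queue
queue.append('a')
queue.append('b')
queue.append('c')
print("Initial queue")
print(queue)
# Removing all elements from the front of the queue
queue.pop(0)
queue.pop(0)
queue.pop(0)
# Uncommenting print(queue.pop(0)) will raise an IndexError
# as the queue is now empty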
Output
Initial queue
['a', 'b', 'c']
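from collections import deque
# Initializing a queue
q = deque()
# Adding elements to the queue
q.append('a')
q.append('b')
q.append('c')
print("Initial queue")
print(q)
# Removing all elements from the front of the queue
q.popleft()
q.popleft()
q.popleft()
# Uncommenting q.popleft() will raise an IndexError as the queue is now empty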
Output:
Initial queue
deque(['a', 'b', 'c'])
full(): Return True if there are maxsize items in the queue. If the queue was initialized
with maxsize=0 (the default), then full() never returns True.
get(): Remove and return an item from the queue. If the queue is empty, wait until an
item is available.
put(item): Put an item into the queue. If the queue is full, wait until a free slot is
available before adding the item.
put_nowait(item): Put an item into the queue without blocking. If no free slot is
immediately available, raise queue.Full.
qsize(): Return the number of items in the queue.
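from queue import Queue

# Initializing a queue
q = Queue(maxsize=3)
for item in ('a', 'b', 'c'):
    q.put(item)
print("Full:", q.full())
while not q.empty():
    q.get()
print("Empty:", q.empty())
q.put(1)
print("Empty:", q.empty())
print("Full:", q.full())
Output:
Full: True
Empty: True
Empty: False
Full: False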
Chapter 15
Stack
15.1 Introduction
A stack is a data structure that stores items in a Last-In/First-Out manner. This is
frequently referred to as LIFO. This is in contrast to a queue, which stores items in a
First-In/First-Out (FIFO) manner.
It's probably easiest to understand a stack if you think of a use case you're likely familiar
with: the Undo feature in your editor.
Let's imagine you're editing a Python file so we can look at some of the operations you
perform. First, you add a new function. This adds a new item to the undo stack:
You can see that the stack now has an Add Function operation on it. After adding the
function, you delete a word from a comment. This also gets added to the undo stack:
Notice how the Delete Word item is placed on top of the stack. Finally you indent a
comment so that it's lined up properly:
You can see that each of these commands is stored in an undo stack, with each new
command being put at the top. When you're working with stacks, adding new items like
this is called push.
Now you've decided to undo all three of those changes, so you hit the undo command.
It takes the item at the top of the stack, which was indenting the comment, and removes
that from the stack:
Your editor undoes the indent, and the undo stack now contains two items. This opera-
tion is the opposite of push and is commonly called pop.
When you hit undo again, the next item is popped off the stack:
This removes the Delete Word item, leaving only one operation on the stack.
Finally, if you hit Undo a third time, then the last item will be popped off the stack:
The undo stack is now empty. Hitting Undo again after this will have no effect because
your undo stack is empty, at least in most editors. You'll see what happens when you call
.pop() on an empty stack in the implementation descriptions below.
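The undo scenario described above can be replayed in a few lines. This is a minimal sketch using a plain Python list as the stack; the names `undo_stack` and `undone` are hypothetical, and the strings simply mirror the three edits from the walkthrough.

```python
undo_stack = []

# push: each new edit goes on top of the stack
undo_stack.append("Add Function")
undo_stack.append("Delete Word")
undo_stack.append("Indent Comment")
print(undo_stack)  # ['Add Function', 'Delete Word', 'Indent Comment']

# pop: undo removes the most recent edit first
undone = []
while undo_stack:
    undone.append(undo_stack.pop())

print(undone)      # ['Indent Comment', 'Delete Word', 'Add Function']
print(undo_stack)  # []
```

The edits come back in the reverse of the order in which they were made, which is exactly the Last-In/First-Out behaviour.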
1. list
2. collections.deque
>>> myStack = []
>>> myStack.append('a')
>>> myStack.append('b')
>>> myStack.append('c')
>>> myStack
['a', 'b', 'c']
>>> myStack.pop()
'c'
>>> myStack.pop()
'b'
>>> myStack.pop()
'a'
>>> myStack.pop()
Traceback (most recent call last):
File "<console>", line 1, in <module>
IndexError: pop from empty list
You can see in the final command that a list will raise an IndexError if you call .pop()
on an empty stack.
list has the advantage of being familiar. You know how it works and likely have used it
in your programs already.
Unfortunately, list has a few shortcomings compared to other data structures you'll look
at. The biggest issue is that it can run into speed issues as it grows. The items in a list are
stored with the goal of providing fast access to random elements in the list. At a high level,
this means that the items are stored next to each other in memory.
If your stack grows bigger than the block of memory that currently holds it, then Python
needs to do some memory allocations. This can lead to some .append() calls taking much
longer than other ones.
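The occasional reallocation can be observed indirectly. The sketch below assumes CPython, where `sys.getsizeof` on a list reflects the currently allocated block; other interpreters may behave differently. The names `stack` and `resize_points` are hypothetical.

```python
import sys

stack = []
last_size = sys.getsizeof(stack)
resize_points = []

for i in range(1000):
    stack.append(i)
    size = sys.getsizeof(stack)
    if size != last_size:
        # The underlying block grew: this append triggered a reallocation
        resize_points.append(len(stack))
        last_size = size

print(len(resize_points), "reallocations for 1000 appends")
print("first few list lengths at which they happened:", resize_points[:8])
```

Only a small fraction of the appends change the allocated size, which matches the description above: most `.append()` calls are cheap, and a few pay for a larger block.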
There is a less serious problem as well. If you use .insert() to add an element to your stack
at a position other than the end, it can take much longer. This is not normally something
you would do to a stack, however.
The next data structure will help you get around the reallocation problem you saw with
list.
>>> from collections import deque
>>> myStack = deque()
>>> myStack.append('a')
>>> myStack.append('b')
>>> myStack.append('c')
>>> myStack
deque(['a', 'b', 'c'])
>>> myStack.pop()
'c'
>>> myStack.pop()
'b'
>>> myStack.pop()
'a'
>>> myStack.pop()
Traceback (most recent call last):
File "<console>", line 1, in <module>
IndexError: pop from an empty deque
This looks almost identical to the list example above. At this point, you might be
wondering why the Python core developers would create two data structures that look the
same.
Fortunately, you rarely want to do random indexing or slicing on a stack. Most operations
on a stack are either push or pop.
The constant time .append() and .pop() operations make deque an excellent choice for
implementing a Python stack if your code doesn't use threading.
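If you prefer explicit push and pop names, a deque can be wrapped in a small class. The `Stack` class below is a hypothetical helper written for illustration, not part of the standard library, and like a bare deque used this way it is intended for code that does not use threading.

```python
from collections import deque


class Stack:
    """A minimal LIFO stack backed by collections.deque (hypothetical helper)."""

    def __init__(self):
        self._items = deque()

    def push(self, item):
        self._items.append(item)  # add to the top

    def pop(self):
        return self._items.pop()  # remove from the top; IndexError if empty

    def peek(self):
        return self._items[-1]    # look at the top without removing it

    def is_empty(self):
        return not self._items


s = Stack()
for letter in "abc":
    s.push(letter)
print(s.peek())      # c
print(s.pop())       # c
print(s.pop())       # b
print(s.is_empty())  # False
```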
This chapter includes resources that are complementary to the information presented in this
book, and that are useful for further reading. Resources are divided by sections, based on
their categories.
15.7 Errata
Though we have tried hard to make this book free of errors (and by "we" I mean the review
team that has contributed greatly to debugging this book), I am sure there is no book edited
by humans that is free of errors. In case you find any error within the book, kindly
email me at:
h.elghareeb@yahoo.com
and use "DSA20 Book Errata" as the email subject. Though it is not guaranteed, you will
most likely be rewarded with a free edition of this book, or maybe another book from my
library.
15.9 Bibliography
This list includes some of the resources we have greatly benefited from during our journey
in learning Data Structures and Algorithms, and in writing this book. The list includes,
but is not limited to:
Python