
CS 2112 : Object-Oriented Design and Data Structures – Honors

Fall 2015

CS 2112 Lecture and Recitation Notes


Andrew Myers

Table of Contents
1. Introduction
2. Objects and values
3. Java execution model: arrays, strings, autoboxing
4. Encapsulation and information hiding
5. Representing Java values
6. Interfaces and subtyping
7. Inheritance and the specialization interface
8. Using exceptions
9. Loop invariants
10. Recursion and linked lists
11. Asymptotic complexity
12. Generics and more lists
13. Trees
14. Hash tables
15. Sorting
16. Grammars and parsing
17. Designing and documenting interfaces
18. Modular design and implementation strategies
19. Testing
20. Design patterns
21. Graphical user interfaces: display and layout
22. User interface design
23. Concurrency
24. Synchronization
25. Graphs and graph representations
26. Graph traversals
27. Dijkstra's single-source shortest path algorithm
28. Minimum spanning trees and strongly connected components
29. Priority queues and heaps
30. Balanced binary trees
31. Interpreters, compilers, and the Java virtual machine
32. Hard and incomputable problems
Introduction
These are course notes from Computer Science 2112, “Object-Oriented Design and Data Struc-
tures”, which is the honors version of CS 2110.

Topics covered in this course include object-oriented programming, program structure and or-
ganization, program reasoning using specifications and invariants, recursion, design patterns,
concurrent programming, graphical user interfaces, data structures, sorting and graph algorithms,
asymptotic complexity, and simple algorithm analysis. Java is the principal programming lan-
guage, but this course is not primarily a programming language instruction course. Students are
expected to have some prior programming experience, though not necessarily in Java.
Objects and Values
Primitive values
Programming languages describe computations that compute over values. These values include
primitive values such as integers and characters, which in some sense exist even before programs
start running, and objects, which are created by programs as they run.

The Java language supports a variety of different primitive values. Each value has a correspond-
ing run-time type. Here are some examples of primitive values:

Type     Examples              Description

int      1, 5, -10             integers between −2^31 and 2^31−1

char     'x', 'a', '\n',       There are about 65,000 Unicode characters from
         '\u0000'              '\u0000' to '\uFFFF'. Ordinary ASCII characters have
                               character codes between 0 and 127 ('\u0000' to
                               '\u007F').

long     1L, 5L, -10L          integers between −2^63 and 2^63−1

boolean  true, false

double   2.0, -1.5e5           A double-precision floating point number, with about
                               17 digits of precision.

float    2.0f                  A single-precision floating point number, with about 7
                               digits of precision. Usually to be avoided because of
                               its limited accuracy.

(null)   null                  null is the only primitive value that can be used where
                               an object is expected. It is generally used to represent
                               the absence of an object value. Its run-time type does
                               not have a corresponding name as a Java type.

An expression like 2+3 is a computation that produces a value (5, of course). The symbol + is a
binary operator (binary, because it takes two arguments). There are also unary operators like
logical negation (!). For example, !true evaluates to false.
Java prevents us from writing certain computations that don't make sense. For example, the ex-
pression 2 + true would be a run-time type error if allowed to proceed. However, Java will not
allow this expression to be written because it can tell that a run-time type error might (and will,
in this case) result. In fact, run-time type errors are not possible in Java. For that reason we say
that Java is a strongly typed language.
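To see the distinction concretely, here is a small sketch: the commented-out line is exactly the kind of expression the Java compiler rejects before the program ever runs.

```java
public class TypeCheckDemo {
    public static void main(String[] args) {
        int sum = 2 + 3;       // fine: both operands are int
        boolean flag = !true;  // fine: ! applies to a boolean
        // int bad = 2 + true; // rejected at compile time: + is not defined on int and boolean
        System.out.println(sum);  // prints 5
        System.out.println(flag); // prints false
    }
}
```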

Variables
Programming languages let us assign values into variables. For example, assuming we have vari-
ables x and y of the right type, we can write code like this:

x = 2;
y = 2;
y = y + 1;

We read the equals sign “=” not as “equals” but as “gets” or “becomes”. This is because the oper-
ator is not testing equality, but rather, changing the value currently in a variable. We can see this
because as an equation, y = y+1 doesn't make much sense!

A useful way to understand what is happening when we run Java programs is to draw an object
diagram. There are different styles of object diagrams; here are two ways to draw what is going
on in the code above:

The view expressed by the picture on the left is that variables contain references to values; the
view of the picture on the right is that variables contain the values themselves. In the view on the
left, the variables x and y are pointing to values that in some sense live outside the computer; we
can't really put the number 2 in the variable x because there is only one 2, and we certainly don't
destroy the number 2 when we assign into y. What is stored in x is something that represents or
names 2. In the view on the right, variables store values, so we can put the value 2 directly into x.
Both views are equally correct in the sense that they give the same answers if we use either of
them to predict what will happen when we run a program and observe its output. Of course, what
actually happens when we run the program is more complicated than either of these diagrams!
So it is very useful to have a simple model that nevertheless manages to accurately explain the
output we get.

In practice, people often draw object diagrams using a hybrid approach in which variables of
primitive type are drawn as shown in the diagram on the right side, whereas variables of object
type are drawn as shown in the diagram on the left. This treats the different types perhaps more
differently than is warranted, but makes the diagrams look as simple as possible. Such hybrid
diagrams also hint at what is going on in the underlying implementation of the programming
language, because objects are typically accessed through references, which is more expensive
than storing primitive values directly.

Types and declarations


Java is a statically typed language, which means that the compiler tries to figure out before the
program runs what the type of each expression will be. To help it do this, the programmer is re-
quired to declare the type of all variables. (Some languages, such as OCaml, actually can auto-
matically infer the types of variables.) This is different from dynamically typed languages such as
Python, where variables do not have a static type and can even be used to store values of differ-
ent types.

In Java we can write a variable declaration by writing the new variable name preceded by its
type, e.g.:

int x;

We can both declare a new variable and initialize it with a value in one statement:

int y = x + 1;

Java will let us use the variable y immediately, but will not let us use the value of variable x until
the compiler can determine that it has definitely been assigned a value; the declaration of y above
therefore compiles only if x has been assigned first.

If we try to assign a value that is not an int to one of these variables, such as in the assignment y
= true;, it will be a compile-time (static) type error. The program will not be allowed to run.

However, Java does allow certain type mismatches to occur, and automatically resolves them by
converting the value to a value of the correct type. If we write:

double z = y;

then the variable z is initialized to the value 2.0, the closest approximation to the integer 2 that
the type double supports.
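The conversion rules can be sketched as follows: widening (int to double) happens implicitly, while the reverse direction requires an explicit cast, which truncates toward zero.

```java
public class ConversionDemo {
    public static void main(String[] args) {
        int y = 2;
        double z = y;          // implicit widening conversion: int -> double
        System.out.println(z); // prints 2.0

        double d = 2.9;
        int t = (int) d;       // explicit narrowing cast: truncates toward zero
        System.out.println(t); // prints 2

        // int bad = d;        // rejected at compile time: possible loss of precision
    }
}
```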

Defining and calling methods


To compute something in Java, we need to put it into a method, which can either define a func-
tion (which returns a value) or a procedure (which doesn't). For example, if we want to compute
the triple of numbers, we might define a method to do it:

/** Return 3 times n */
int triple(int n) {
int r = n + n + n;
return r;
}

Let's dissect this a bit. We start with a specification (or just spec) describing what the method
does. This is a comment that the Java compiler ignores. However, the Javadoc tool will read the
comment and use it to construct web page documentation for the method. It's a good idea when
writing a method to start with the spec. We'll talk more later about how to effectively use specs
and Javadoc.

The expression int triple(int n) is the signature of the method. It starts with the return type
(int), and also gives the types of the expected arguments. The formal parameter (or just formal)
n takes on the value of whatever argument (aka actual) is provided to the function when the func-
tion is invoked (aka called). The body of the method is the part in braces ({...}). The body defines
what computation is done by the method. The bolded keyword return causes the function to pro-
duce the value that its expression n + n + n computes. If we've done our job correctly, this com-
putation agrees with the spec. Note that in Java, braces (“{” and “}”) are used as a way to group
one or more statements together, much like parentheses are used for grouping expressions.

We can visualize what happens when this method is called using an object diagram. Consider the
following code:

int y = 1000;
int z = triple(y+1);

The following figure shows what happens as the code executes. Initially only the variable y ex-
ists, with value 1000. When the call to triple happens, the argument 1001 is first computed.
Then, an activation record is created (often called a stack frame, though this is really a name for
a way to implement an activation record). The activation record holds all the local variables for
the function being called, which in this case is both the formal parameter n, bound to the value of
the argument 1001, and the local variable r. The desired result, 3003, is computed and stored into
r, and then the return statement returns this value to the caller. The just-created activation record
is destroyed as the function returns, and the new variable z is initialized in the activation record
of the calling code to the returned result.

Classes and objects


Objects are unlike primitive values in that programs can create them at run time. The form of an
object is defined by its class, which describes the state of the object and the operations the object
supports. For example, suppose we want values that represent points in the plane. We might de-
fine a class Point that describes the objects that are instances of the class:

class Point {
int x = 0;
int y = 0;
}
The class contains two variable definitions. The variables x and y are called instance variables.
Each object that is an instance of Point has its own instance variables x and y. Instance variables
are also called fields, a term that was inherited from the C programming language.

The new operator is used to create objects. For example, we can create two point objects that both
represent the origin:

Point a = new Point();
Point b = new Point();

Even though these objects represent the same point on the plane, they are different objects. We
represent this by drawing objects as boxes. Viewed as a value, the object is the box, so we refer
to objects as boxed values, whereas primitive values are unboxed values. The object diagram for
the above code shows that we have two distinct values in play:

Although the two objects initially have the same state, we can easily distinguish the two objects.
For example, we can assign to the fields of one of the objects:

b.x = 2;
b.y = b.x + 1;

Notice that we can name instance variables by using the dot notation to describe which object's
instance variable is being used, and we can both assign to instance variables and use their values,
exactly like regular variables.

Each object keeps track of what class it is an instance of. This is represented in the object diagram
by putting the class name in the first slot. Because the object knows its class, it is possible to
write code to check whether an object is of a particular class.
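Using the Point class from above, a short sketch of such a check might look like this; instanceof and getClass are the standard mechanisms for asking an object about its class.

```java
class Point {
    int x = 0;
    int y = 0;
}

public class ClassCheckDemo {
    public static void main(String[] args) {
        Object obj = new Point();
        // The object remembers its class, so we can test it at run time:
        System.out.println(obj instanceof Point);           // prints true
        System.out.println(obj.getClass().getSimpleName()); // prints Point
    }
}
```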

We can also build objects using other objects. Suppose we want objects that represent axis-
aligned rectangles. We can define such a rectangle in terms of its lower-left and upper-right cor-
ners:

class Rectangle {
Point LL;
Point UR;
}
...
Rectangle r = new Rectangle();
r.LL = a;
r.UR = b;

Assuming variables a and b are initialized and updated as in the code above, we arrive at the
following object diagram. Notice that the Rectangle object does not contain the two point ob-
jects; it merely references them. Changes to the point objects through variables a and b will still
affect what rectangle r represents.
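This sharing can be demonstrated directly. The following sketch repeats the setup above and then mutates one of the points through its own variable:

```java
class Point {
    int x = 0;
    int y = 0;
}

class Rectangle {
    Point LL;
    Point UR;
}

public class AliasDemo {
    public static void main(String[] args) {
        Point a = new Point();
        Point b = new Point();
        b.x = 2;
        b.y = 3;
        Rectangle r = new Rectangle();
        r.LL = a;
        r.UR = b;
        // The rectangle only references the points, so updating b through
        // its own variable changes what r represents:
        b.x = 10;
        System.out.println(r.UR.x); // prints 10
    }
}
```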

Instance methods
In addition to instance variables, classes can define methods that provide their instances with op-
erations. For example, we might want to find the area of a rectangle. We extend the class defini-
tion above accordingly:

class Rectangle {
Point LL;
Point UR;
int area() {
return (UR.x - LL.x) * (UR.y - LL.y);
}
}

In order to invoke this method, we need an instance, which is called the receiver object:

Rectangle r = ...;
int ar = r.area(); // evaluates to 6 on rectangle defined above

Inside the method, we can use the instance variables UR and LL. These variable names denote the
instance variables of the receiver object.

Methods do not have to be functions; we can also define methods that are procedures, used for
the side effects they cause. A procedure has a return type of void. For example, suppose we want
to be able to change Points to be back at the origin:

class Point {
int x, y;
void setOrigin() {
x = y = 0; // we can cascade assignments and initializations
}
}
...
Point a = ...;
...
a.setOrigin(); // a refers to the same object, but its instance variables x and y are changed.

Constructors
It's awkward and error-prone to have to initialize the instance variables of objects after creating
them with new. A better solution is to define one or more constructors that initialize the object to
a “good” state. Constructors are methods whose name is exactly the same as that of the class.

For example, it would be convenient to be able to create Point objects to represent the appropri-
ate coordinates directly. An appropriate constructor would be as follows.

class Point {
int x, y;
Point (int xx, int yy) {
x = xx;
y = yy;
}
...
}
...
Point b = new Point(2,3);

Static methods
Sometimes it is helpful to have methods attached to the class itself, rather than to its instances.
Such methods are called static methods (or class methods). They can be invoked without any re-
ceiver object, and they have no access to the instance variables of a receiver object. A method is
made static by adding the keyword static to its signature.
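As a sketch, here is a hypothetical class method added to Point; the name manhattanDistance is an invention for illustration, not part of the notes' Point class. Because the method has no receiver, it takes both points as explicit arguments and cannot mention x or y unqualified.

```java
class Point {
    int x, y;
    Point(int xx, int yy) { x = xx; y = yy; }

    // A (hypothetical) static method: invoked on the class, not on an instance.
    static int manhattanDistance(Point p, Point q) {
        return Math.abs(p.x - q.x) + Math.abs(p.y - q.y);
    }
}

public class StaticDemo {
    public static void main(String[] args) {
        Point a = new Point(0, 0);
        Point b = new Point(2, 3);
        // No receiver object: the class name qualifies the call.
        System.out.println(Point.manhattanDistance(a, b)); // prints 5
    }
}
```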

The main method


A Java program is simply a class with a static main method with exactly the following signature:

public static void main(String[] args);

Because it is static, main can be called when no object of its class is available. Extra arguments
supplied to the program by the operating system are made available in the array args.

For example, we can write a program that prints “Hello world!”:

class Hello {
public static void main(String[] args) {
System.out.println("Hello world!");
}
}
This program can be run using Eclipse, which will compile it automatically. Alternatively, you
can run it from a shell such as bash (or, on Windows, cmd or PowerShell, where the prompt $
appears as a >):

$ javac Hello.java # compile the program
$ java Hello # run the program
Hello world!
Java Execution Model: Arrays, Strings,
Autoboxing
We look at some important language features and see how object diagrams can help us to under-
stand how they work and to avoid common programming errors involving them.

Arrays
Like objects, arrays in Java are boxed values. The type int[] is the type of an array of int, and
any type can be substituted for the int to obtain a corresponding array type.

Since arrays are boxes, we ordinarily create them with the new expression. Consider the following
example:

int[] a = new int[2];
int[] b = new int[] { 10, 20, 30 };

This code creates objects and initializes variables as shown by the following diagram.

As with objects, the variables a and b contain references to the arrays rather than the arrays them-
selves. If we wrote an assignment a = b;, they would subsequently refer to the same underlying
array; a and b would be aliases. Each array contains a first slot that keeps track of the type of the
elements in the array, and each has a single immutable instance variable length that keeps track
of the number of elements. The two elements of a are initialized to the default value of type int,
which is 0.
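A short sketch illustrating default initialization, the length variable, and aliasing after an assignment a = b:

```java
public class ArrayDemo {
    public static void main(String[] args) {
        int[] a = new int[2];
        int[] b = new int[] { 10, 20, 30 };
        System.out.println(a.length); // prints 2
        System.out.println(a[0]);     // prints 0: elements start at int's default value
        System.out.println(b.length); // prints 3
        a = b;                        // now a and b are aliases of the same array...
        a[0] = 99;
        System.out.println(b[0]);     // ...so this prints 99
    }
}
```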

When declaring and initializing an array of type T[] for some T, an abbreviated syntax is allowed
in which the usual new T[] is omitted. For example, the previous declaration of b could be written
more compactly:

int[] b = { 10, 20, 30 };


In general, we can't construct arrays completely at declaration time. To initialize them, it is com-
mon to use loops. The for loop is a useful statement in Java. For example, here is a loop that ini-
tializes an array of Points, where the class Point is defined as in the previous lecture:

int n = ...;
Point[] pa = new Point[n];
for (int i = 0; i < n; i++) {
pa[i] = new Point(i, 0);
}

This code creates an array whose entries are references to a series of newly created Point objects:

The for loop repeatedly executes its body (the code in braces) until a condition is false. It has an
interesting syntax. There are three clauses in the parentheses, separated by semicolons. The first
clause is the loop initializer. It is executed once at the beginning of the loop and may be a varia-
ble declaration. The second clause is the loop guard. It is evaluated at the beginning of every
loop iteration, and the loop terminates if it evaluates to false. The third clause is the increment
statement. It is executed at the end of every loop iteration.

Another way to exit from a loop is to use the break statement. It immediately terminates the clos-
est enclosing loop. The less frequently used continue statement causes the current loop iteration
to end and the next loop iteration to begin immediately (although the increment statement is still
executed, and the guard is still checked.)
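A small sketch combining both statements; note that a continue on an even i skips the break test entirely, so the loop ends only when an odd value exceeds 5.

```java
public class LoopDemo {
    public static void main(String[] args) {
        int sum = 0;
        for (int i = 0; i < 10; i++) {
            if (i % 2 == 0) continue; // skip even i; the increment i++ still runs
            if (i > 5) break;         // leave the loop entirely once an odd i exceeds 5
            sum += i;                 // adds 1, 3, 5
        }
        System.out.println(sum); // prints 9
    }
}
```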

Multidimensional arrays
A multidimensional array is really an array of references to arrays. For example, consider the fol-
lowing code that creates a two-dimensional array (aka matrix):

int[][] m = {{10, 20, 30}, {40, 50, 60}};

For many purposes, we can think of this code as truly creating a 2D array that could, for exam-
ple, be used for linear algebra computations:
However, this code actually creates three objects:

Java does not try very hard to ensure that m continues to represent a nice rectangular matrix. For
example, we can change the length of one of its rows:

m[1] = new int[1];

Or we can even make the rows alias each other!

m[1] = m[0];
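Putting these operations together, here is a sketch of how the "matrix" degrades from a rectangle into a jagged, self-aliasing structure:

```java
public class MatrixDemo {
    public static void main(String[] args) {
        int[][] m = {{10, 20, 30}, {40, 50, 60}};
        System.out.println(m[1][2]);     // prints 60
        m[1] = new int[1];               // the "matrix" is now jagged
        System.out.println(m[1].length); // prints 1
        m[1] = m[0];                     // the rows can even alias each other
        m[0][0] = 99;
        System.out.println(m[1][0]);     // prints 99: both rows are the same array
    }
}
```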

Arrays as objects
Since arrays are objects, we can put an array value into a variable whose type is declared to be
Object:

Object a = new int[]{10, 20};

This also means that we can create an array of objects and store any object, including arrays, into
it:

Object[] b = new Object[]{a, a};
b[0] = b; // !
While this code is legal, it is certainly confusing and probably not a good model of how to write
code!

Strings
Strings in Java are also objects, which leads to some surprises. A string literal like "Hello" actual-
ly causes a call to a constructor for String, resulting in an object. For example, the code on the
left has the effect on the right:

String x = "Hello";
String y = x;
String z = y + "World";
String w = y + "World";

The operator + denotes concatenation when applied to strings, rather than addition. It creates
new string objects. Notice that variables z and w are initialized to refer to string objects that
have exactly the same state, but are actually different objects. Since strings are immutable (they
cannot be changed after they are created), the fact that they are different objects normally does
not matter.

Strings support a large selection of useful methods. For example, one such method is charAt,
which returns the character at a given position in the string. For example, the expression
z.charAt(1) evaluates to the character 'e', and the same is true for w.

The strings referenced by z and w can be distinguished in one way, however. If they are compared
using the == operator, the result of z == w is false. This happens because the == operator on boxed
values simply returns whether the operands are the same box (that is, the same object). Probably
this isn't what we want when we compare two strings!

Therefore, when comparing two objects generally, and strings particularly, you should almost al-
ways use the equals method, which returns whether two objects should be considered inter-
changeable. The expression z.equals(w) evaluates to true, as we'd like. Think twice before you
use == on object values.
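A minimal sketch of the difference; the concatenations happen at run time, so z and w are guaranteed to be distinct objects:

```java
public class StringCompareDemo {
    public static void main(String[] args) {
        String y = "Hello";
        String z = y + "World"; // run-time concatenation creates a new object...
        String w = y + "World"; // ...and so does this: a second, distinct object
        System.out.println(z == w);      // prints false: different boxes
        System.out.println(z.equals(w)); // prints true: same state
        System.out.println(z.charAt(1)); // prints e
    }
}
```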
Based on the discussion of strings above, it is natural to think that strings are very special
objects in Java. But they aren't: the only truly special thing about strings is that string objects
can be created using the convenient quotation mark syntax. The object diagram above is a bit of
a white lie, because strings are actually implemented using arrays of characters. For example,
the string "Hello" is really implemented as two objects, as shown in the object diagram on the
right. The String object contains an instance variable value that refers to the array of characters
making up the string.

Since the entries in the character array never change, you have to work pretty hard to figure out
that this is what a string really is in Java, because you can only access strings through the operations
of the String class. And that is a Good Thing, because it means that the designers of Java can
change the way strings work in future versions of Java without breaking all the existing pro-
grams! In fact, the implementation of Strings has changed significantly in the past few versions
of Java, so even this object diagram is a white lie.

Autoboxing
Sometimes we want to use an unboxed value like an int where a boxed value is expected. For
example, a variable of the type Object can refer to any object, but can't refer directly to a primi-
tive value.

To address this issue, Java introduces a set of classes corresponding to the primitive values. For
int there is Integer, for boolean there is Boolean, for double, Double, and so on. Each of these
classes defines objects that contain a value of the appropriate primitive type, and defines equals
to compare state.

In addition, Java will automatically box primitive values into the corresponding object type when
necessary, and will automatically unbox them in the other direction, too. This feature is called
autoboxing. It can have some counterintuitive effects, however. For example, consider this code:

Integer i = 200;
Object l = i;
int j = i;
Object k = j;
i == j // true
i == l // true
j == k // static error: can't compare Object and int.
i == k // false!

There are a couple of surprises here: first, the compiler does not let us compare j and k. Autobox-
ing causes j to be boxed into an Integer object, but the static type of k is Object, so the Java com-
piler does not know that k can be unboxed into an int.
Another surprise is the last line of code. Since i and k are different objects representing the same
number, they compare as unequal. As with strings, we should use the equals method to compare
values of type Integer.

Perhaps even more surprisingly, changing the number 200 to anything between -128 and 127 will
cause the code above to report true for i == k. This happens because there is a table of Integer
objects that is used only for small integers. Autoboxing is performed by the method
Integer.valueOf, which uses this table when it can and only resorts to new for larger integers.

One moral of the story, again, is that to compare two Integer objects, we need to use the equals()
method on the objects. Even though the expression i==k is false, the expression i.equals(k) is true.

Clearly, the assignment j=i is doing more than just an assignment. In fact, it's really executing
the following code: j = i.intValue(). The intValue() method extracts the int value from the
Integer object. This is an example of syntactic sugar, in which the language permits us to abbre-
viate how we write code. Conversely, if we assigned i=j, this would be syntactic sugar for i =
Integer.valueOf(j), which calls a method that depending on the value of j either looks up an ap-
propriate preexisting object in a table, or creates a new Integer object. Calls to the valueOf and
intValue methods are automatically inserted by the Java compiler to implement boxing and un-
boxing. Similar methods exist for the other primitive types.
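A sketch of the cache behavior: the small-value results are guaranteed by the language specification (which requires caching for values from -128 to 127), while the result for 200 is what a typical JVM produces.

```java
public class BoxingDemo {
    public static void main(String[] args) {
        Integer small1 = Integer.valueOf(100);
        Integer small2 = Integer.valueOf(100);
        System.out.println(small1 == small2);   // prints true: both come from the cache

        Integer big1 = Integer.valueOf(200);
        Integer big2 = Integer.valueOf(200);
        System.out.println(big1 == big2);       // false on a typical JVM: two fresh objects
        System.out.println(big1.equals(big2));  // prints true: equals compares state

        System.out.println(big1.intValue() == 200); // prints true: unboxing recovers the int
    }
}
```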

Names and scope


Names can refer to a variety of things: local variables (including formal parameters), instance
variables (aka fields), methods, types, classes, and packages. The basic rule for deciding what
kind of thing a name refers to is to find the definition of the name with the smallest scope that in-
cludes the use of the name. Different kinds of names have different rules for scope. Local varia-
bles are in scope from the point of declaration until the end of the block in which they are de-
clared. Method and field names are in scope throughout their class. Class and interface names are
in scope throughout the program unless they are nested inside another class, in which case they
are in scope throughout the containing class.

If a name is in the scope of two different declarations at once, the outer declaration is said to be
shadowed by the inner one. Java considers some shadowing to be illegal. For example, this code
will not compile because the variable x is shadowed inside the while loop:

int x = 2;
while (x != 0) {
int x = 5;
// both x's in scope here.
}
// only outer x in scope here.

One place where shadowing is allowed, often getting programmers into trouble, is when a local
variable shadows an instance variable. Shadowing often arises with constructors, because it is
tempting to name formal parameters in the same way as instance variables:

class Point {
int x, y;
Point(int x, int y) {
// locals x and y shadow instance variables x and y
this.x = x;
this.y = y;
}
}

As the example shows, there is a way to talk about shadowed instance variables, using the object
reference this. The expression this can only be used inside instance methods (not static meth-
ods) and refers to the current receiver object: in this case, the object being constructed. Alterna-
tively, you can give the formal parameters different names (for example, by appending an under-
score), and then the instance variables can be used directly.
Encapsulation and Information Hiding
The object-oriented programming model
A major topic of this course is object-oriented design. We use an object-oriented language, Java,
as a vehicle for exploring object-oriented design. We assume some prior familiarity with Java,
but will focus on how to use it in an object-oriented way.

It is useful to distinguish between object-oriented (OO) languages and the object-oriented pro-
gramming model. A programming model is an approach to solving programming problems.
There are many programming models and variants thereof. For example, in addition to the ob-
ject-oriented model, there is a functional programming model that you will learn about in CS
3110. And there are variations on the object-oriented model such as the event-driven model.

Some languages are designed to support some programming models better than others, and it
makes sense to use an OO language like Java for learning OO design. But this course is not pri-
marily about Java. It is a course about object-oriented design (and other computer science top-
ics), and the lessons you learn about object-oriented design should apply to other programming
languages.

Elements of OO programming
What makes a language object-oriented? It should support the essential elements of object-ori-
ented programming. There are three key elements, which we will discuss in the following three
lectures:

1. Encapsulation (also called Information Hiding)


2. Subtyping
3. Inheritance

We'll start by talking about encapsulation, an idea that Java supports with the keywords public,
private, and protected. The use of these keywords prevents code and data from being accessed from outside,
which may seem strange: why is it important to take power away from the programmer? The an-
swer is that limiting access supports modularity.

The need for modular programming

Figure 1: Procedural programming: data is fully accessible by all code


Figure 2: Modular programming: access to data is mediated by code

In early programming languages, the information manipulated by the program got short shrift.
Programs were organized around the algorithms doing the computation, as illustrated in Figure 1.
Those algorithms had full access to the data that they were computing with. This is the procedur-
al model of programming.

As software systems became increasingly complex, the procedural model stopped working well.
It did not scale up to big systems. The problem is that the procedural model offers no control
over which program code can access a given part of the data. Code can reach into the program
data and use it or update it in an arbitrary way. Working on a team of programmers is difficult in
the procedural model because the different parts of the code are too tightly coupled. A bug in one
software component can corrupt program data and look like a bug in a different component.
Code is hard to maintain and to evolve without breaking it, and changes to the way data is represented
in the program tend to affect all of the program code rather than being isolated to a small part
of the code.

Modular programming
To address these problems, modular programming was developed. (Object-oriented program-
ming extends modular programming with some additional ideas that we will get to soon.) The
idea of modular programming is that the software should be broken up into distinct modules that
can be developed relatively independently. A good modular design respects the principle of sepa-
ration of concerns, which says that different aspects of the design should be designed separately.
With a good modular design, changes can be made to one module without changing other mod-
ules, and it is relatively easy for programmers to know whether their changes will affect other
modules that are perhaps owned by other programmers. Separation of concerns is strengthened
by information hiding in which modules do not take advantage of knowledge about detailed in-
formation about how other modules are implemented. Information hiding provides loose cou-
pling that tends to make code evolution easier. In a loosely coupled system, changes to the way
information is represented or modules are implemented tend to propagate less to other modules,
much as loosely coupled train cars can start moving without trying to move the whole train.

A key insight in programming language design was that modular programming can be enforced
by a programming language mechanism that encapsulates its state and behavior, enforcing infor-
mation hiding and controlling how other modules can access it. This approach is suggested by
Figure 2. Code outside a module cannot directly access the data that is internal to the module.
Any access from outside to a module's data must occur via the module's code, and only the code
that the module chooses to expose to the outside. Access to the data is mediated by this public
code.

By client code we mean code outside the module that is using the module through a set of public-
ly exposed operations. These operations are called the interface of the module, which should not
be confused with the Java language feature interface (which is, however, an example of the more
general interface idea, and one we will return to).

The interface that a module exposes to client code is a kind of contract with the rest of the pro-
gram. The idea of modular programming is that if every module lives up to its contract, the whole
program will work correctly. Programmers can then think about and program each module in iso-
lation from the rest. Instead of thinking about the correctness of the entire program, a bewilder-
ingly complex problem, they can just think about the correctness of the particular module they
are working on now. This nice property of modular programming is called local reasoning.

Modules also make it possible to use data from other modules without knowing exactly how that
data is represented. All they have to know is what operations (from the public interface) can be
performed on the data. The data is opaque to the client code, which means that the module im-
plementer is free to change how the data is represented because no client code can depend on the
precise representation. This powerful idea is called data abstraction. The word abstraction refers
to the idea of hiding inessential detail. In this case, the inessential detail is, for the client code,
the precise way that information is represented inside the module.

An example: rational numbers


In an object-oriented language like Java, encapsulation and data abstraction are primarily provid-
ed by classes, though packages are also used as an encapsulation mechanism. A class and its
code are shared by all objects of that class (the instances of the class), and the class's code can
mediate access to all information stored in instances. For example, suppose we want objects that
act like rational numbers, allowing us to write code like the following, using a class Rational:

Rational a = new Rational(1, 2);
Rational b = new Rational(1, 3);
Rational c = Rational.plus(a, b);
System.out.println(a.toString() + "+" + b + "=" + c);
if (c.equals(new Rational(10, 12))) {
    System.out.println(" = 10/12");
}

A class implementing rational numbers is shown here (you can click on code examples to down-
load them).
/** A rational number p/q where p is the numerator and q is the denominator. */
public class Rational {
    private int p, q; // represents p/q
    // class invariant: q > 0, gcd(p,q) = 1
    // Note: gcd(0, x) = x

    /** Create num/den. Requires den != 0. */
    public Rational(int num, int den) {
        if (den < 0) {
            num = -num;
            den = -den;
        }
        int g = gcd(num, den);
        p = num/g;
        q = den/g;
    }

    /** Update this to be this+r. */
    public void add(Rational r) {
        int g = gcd(q, r.q);
        p = r.q/g * p + q/g * r.p;
        q *= r.q/g;
    }

    /** Returns x+y. */
    public static Rational plus(Rational x, Rational y) {
        Rational z = new Rational(x.p, x.q);
        z.add(y);
        return z;
    }

    public boolean equals(Object o) {
        if (!(o instanceof Rational)) return false;
        Rational r = (Rational) o;
        return (p == r.p && q == r.q);
    }

    public String toString() {
        return Integer.toString(p) + "/" + q;
    }

    ...
}

Example: Rational numbers

There are many interesting things going on in this implementation. We start out with a very important comment, which we call the class overview. This describes how client code programmers should think about the values of class Rational. To the client, the objects are simply rational numbers, with a numerator p and a denominator q. The overview also gives the client a notation for talking about these objects abstractly, as a fraction p/q. Having a notation for objects of the class is helpful for expressing the specifications of the methods.

The data in Rational objects consists of the fields p and q, both of type int. This is just one possible representation of rational numbers. For example, it would probably be better to make the types of these
fields long or even use BigInteger. We could also imagine keeping track of the sign of the number
in a separate boolean field, leaving both p and q as nonnegative numbers. The point is that the
client doesn't and shouldn't need to know how the number is represented internally. The client
should think of the objects of class Rational as simply rational numbers.

The fields of the class are marked private to ensure that they are encapsulated inside the class.
The keywords public and private are known as visibility modifiers, because they control which
parts of the class are visible outside the class, and hence can be accessed.

The methods add and equals and the constructor Rational are marked public and hence can be
used by external clients.

Inside the method add, there is a special variable this that refers to the object on which the meth-
od was invoked, called the receiver object. It happens that add does not mention this explicitly,
but it does refer to the fields of this as p and q. Writing these names is equivalent to writing
this.p and this.q respectively.

The method plus is a static method, which means that it does not have a receiver object. The spe-
cial variable this is not in scope in a static method. That means it cannot be named within the
method. A static method should be called using the name of its class, as in the following code:

Rational r3 = Rational.plus(r1, r2);

It is also possible to declare fields to be static, in which case they are shared by all objects of
their class. However, this practice should usually be avoided, and even static methods (except for
constructors) should be used sparingly because they limit reuse and extensibility of the code.
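As a small illustrative sketch (not part of the Rational class; the class names here are ours), the difference between a static field and an instance field can be seen directly:

```java
// Sketch: a static field is shared by every instance of the class,
// while an instance field belongs to each object separately.
class Counter {
    static int created = 0; // one copy, shared by all Counter objects
    int id;                 // one copy per object

    Counter() {
        created++;          // same as Counter.created++
        id = created;
    }
}

public class StaticDemo {
    public static void main(String[] args) {
        Counter a = new Counter();
        Counter b = new Counter();
        System.out.println(Counter.created);   // the shared count: 2
        System.out.println(a.id + " " + b.id); // per-object values: 1 2
    }
}
```

Because the shared field acts like a global variable, every client of the class is coupled to it, which is one reason static state limits reuse.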

Preconditions and postconditions


The specification of the constructor has a Requires clause stating that the argument den must be
non-zero. This clause is a form of precondition that must be satisfied by any correct client imple-
mentation. It is a programmer mistake for client code to call the constructor without satisfying
the precondition: in particular, it is a mistake by the programmer writing the client code. Thus,
when mistakes are made, preconditions help us figure out whose fault they are.

Similarly, postconditions (sometimes called Returns clauses) specify what a method is supposed
to do. If a method doesn't satisfy its postcondition, the mistake is not the client's; it's the imple-
menter's.

Class invariants
An early comment expresses an invariant regarding the fields p and q. An invariant is something
that is always true at certain points in the program, though in programming, invariants can be vi-
olated temporarily. This particular invariant is an invariant about the state of objects of the class,
and is variously called a class invariant, data structure invariant, or representation invariant.
The invariant states that q is positive and that the rational number is always stored in the reduced
form where p and q are relatively prime. A class invariant is expected to hold at the beginning
and end of every public method, but may be temporarily violated in the middle of a method.
Knowing that the class invariant is true is very helpful because when writing the code for the
methods of the class, you can ignore the possibility of a zero denominator.
Having invariants that you can rely on is critical to being able to easily write working code. It is
much easier to make sure you can rely on invariants if the code that enforces the invariant is lo-
calized to one class, as a class invariant (or, at least, to a small number of classes in a package).

Encapsulation aids with this goal of localization. Because the fields p and q are private, the code
of the class can enforce this invariant. Code outside the current class has no way to, say, modify q
to be zero. Conversely, we can see that making any assignable field public completely destroys
the ability to enforce class invariants involving that field: client code can assign an arbitrary val-
ue to the field.

Like the method plus, the class constructor, which must also be named Rational like its class, is
also static in the sense that it is not called using a receiver object. Unlike in a static method, the
variable this and its fields are in scope inside the constructor. They refer to the fields of the ob-
ject currently being constructed. Notice that the constructor does not simply accept the numerator
and denominator directly, but instead computes a new numerator and denominator that represent
the same number while satisfying the invariant. This ensures that at the end of the constructor,
the invariant holds.
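To see the normalization concretely, here is a self-contained, cut-down version of the constructor and toString. The class name RationalDemo and the gcd helper are ours; the notes elide gcd's implementation, so this is one reasonable choice satisfying gcd(0, x) = x:

```java
// Sketch: the constructor establishes the invariant q > 0, gcd(p,q) = 1.
public class RationalDemo {
    private int p, q; // class invariant: q > 0, gcd(p,q) = 1

    RationalDemo(int num, int den) {
        if (den < 0) {
            num = -num;
            den = -den;
        }
        int g = gcd(num, den);
        p = num / g;
        q = den / g;
    }

    // Euclid's algorithm; an assumption, since the notes elide gcd.
    // Satisfies gcd(0, x) = x for x > 0.
    private static int gcd(int a, int b) {
        a = Math.abs(a);
        while (b != 0) {
            int t = a % b;
            a = b;
            b = t;
        }
        return a;
    }

    public String toString() {
        return p + "/" + q;
    }

    public static void main(String[] args) {
        System.out.println(new RationalDemo(2, -4));  // normalized to -1/2
        System.out.println(new RationalDemo(10, 12)); // reduced to 5/6
    }
}
```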

Because of the invariant, the method equals can be implemented very simply and efficiently, by
comparing the corresponding fields of this and r. This implementation relies on the fact that the
invariant we chose ensures there is only one way to represent a given rational number. In general,
it is not required that there be a unique representation for any rational number, but it's handy
here. Without the invariant, we would have to write something more expensive like the follow-
ing:

public boolean equals(Object o) {
    if (!(o instanceof Rational)) return false;
    Rational r = (Rational) o;
    return (p*r.q == q*r.p);
}

There is no free lunch here, of course. We had to pay up front for the simplicity of equals (and
toString), by enforcing the invariant elsewhere in the code.

Visibility modifiers
As we have seen, the annotations public and private can be used to control which code outside a
class can access its components. The full list of visibility modifiers is as follows:

Modifier        Significance                                  Comments

public          Accessible everywhere.                        Instance vars should not
                                                              normally be public.

private         Accessible only within the class.             May limit future
                                                              extensibility.

(no modifier)   Accessible from classes in the same           Does not apply to nested
                package.                                      packages.

protected       Accessible from subclasses (and also
                from other classes in the same package).

Assertions
Preconditions and postconditions define a contract between the client and the implementer, and
class invariants are an internal contract between the module implementation and itself. If every-
one is obeying the contract, the program will work. But if someone doesn't follow their part of a
contract, the program may fail in a way that is hard to debug. How can we gain confidence that
these contracts are all being obeyed?

Using assertions is very helpful for catching these contract violations and speeding up debug-
ging. The assert statement stops the program (with an AssertionError) if the tested condition is
false. Assertions can be used to double-check that anything the programmer believes ought to be
true, actually is. While this has some performance impact, assertions can be turned off for pro-
duction code.

One thing to watch out for with assertions is that they are turned off by default! You should always have them turned on when developing code. This is achieved by giving the Java VM the -ea flag. We recommend setting this flag by default for your Eclipse projects. You can find it in Eclipse in Preferences→Java→Installed JREs→Edit, where you set “Default VM arguments” to -ea.
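The default-off behavior can be seen in a small sketch (the class names here are ours). A class's assertion status is fixed when the class is loaded, so enabling the default status before a class is first used has the same effect for that class as running the JVM with -ea:

```java
// Sketch: assertions are disabled by default. Here we enable them
// programmatically before the Checker class is first loaded, which has
// the same effect for Checker as starting the JVM with -ea.
public class AssertDemo {
    public static void main(String[] args) {
        ClassLoader cl = AssertDemo.class.getClassLoader();
        cl.setDefaultAssertionStatus(true); // like -ea for classes loaded later
        try {
            Checker.check(0); // Checker is loaded here, with assertions on
            System.out.println("no error");
        } catch (AssertionError e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}

class Checker {
    static void check(int den) {
        assert den != 0 : "denominator must be nonzero";
    }
}
```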
Adding assertions to Rational
/** A rational number p/q where p is the numerator and q is the denominator. */
public class Rational {
    private int p, q; // represents p/q

    boolean classInv() {
        return q > 0 && gcd(p, q) == 1;
    }

    /** Create num/den. Checks (assert): den != 0. */
    public Rational(int num, int den) {
        assert den != 0;
        if (den < 0) {
            num = -num;
            den = -den;
        }
        int g = gcd(num, den);
        p = num/g;
        q = den/g;
        assert classInv();
    }

    /** Update this to be this+r. */
    public void add(Rational r) {
        assert classInv() && r.classInv();
        int g = gcd(q, r.q);
        p = r.q/g * p + q/g * r.p;
        q *= r.q/g;
        assert classInv();
    }

    /** Returns x+y. */
    public static Rational plus(Rational x, Rational y) {
        assert x.classInv() && y.classInv();
        Rational z = new Rational(x.p, x.q);
        z.add(y);
        assert z.classInv();
        return z;
    }

    public boolean equals(Object o) {
        if (!(o instanceof Rational)) return false;
        Rational r = (Rational) o;
        assert r.classInv();
        return (p == r.p && q == r.q);
    }

    public String toString() {
        assert classInv();
        return Integer.toString(p) + "/" + q;
    }

    ...
}

Example: class Rational with assertions added to check class invariants and preconditions
Names and packages
Classes in Java live in packages. For example, the class String is really a shorthand for the fully
qualified class name java.lang.String, where java.lang is the name of a Java package containing
many standard Java types.

The dot symbol . is used for several things in Java. As we've seen earlier, it is used to indicate
use of a method or field. Beyond this, it is used to indicate which package a class lives in. Pack-
age names can have dots inside them, and those happen to define how Java source code and com-
piled code are stored in the file system of the computer. Perhaps surprisingly, it is incorrect to
think of java.lang as being “inside” the package java. In particular, something that is made visi-
ble just to classes in the java package will not be visible to classes in the java.lang package, or
vice versa. These are two different packages whose names happen to be related.

When referring to a class in a different package, it is necessary either to use the fully qualified
name for those classes or to use an import statement at the top of the source file. Thus, we can
write the fully qualified class name cs2112.lec03.Rational to refer to the Rational class from a
package other than cs2112.lec03; alternatively, we can write a statement import
cs2112.lec03.Rational; at the top of the file and then just refer to the class as Rational. It is also
possible to use a wildcard import to import all classes in a single package: for example, import
cs2112.lec03.*;. In fact, the entire package java.lang is automatically imported in this way, which
is why we can talk about classes like String and Integer without qualifying their names. In gen-
eral, however, the danger of a wildcard import is that it may import many classes you don't need,
creating confusion about what some names refer to. It is usually better to import just the classes
you are actually using.

Further information
There is a useful Oracle web page about programming with assertions.
Representing Java Values
The Java language abstraction vs. hardware implementation
The Java language is an abstraction. Java is implemented on top of hardware using a compiler
and the software for its run-time system, presenting you with the illusion that there are really
ints, booleans, objects, and so forth. In reality there are circuits and electrons. For the most part, you
can ignore the lower-level details of what is really happening when Java programs run. That's be-
cause the Java abstraction is very effective. But computer scientists ought to have a model of
how computers really work.

Memory, memory addresses, and words


The computer memory is actually a big grid in which a bit is stored at every intersection. But the
hardware itself offers a more abstract view of memory as a big array. The memory can be ac-
cessed using memory addresses that start at 0 and go up to some large number (usually 2^32 or 2^64, depending on whether the computer is 32-bit or 64-bit). Each address names a group of 8
bits (a byte). However, usually computer memories give back multiple bytes at once. When giv-
en an address divisible by four, the computer memory can read the four bytes beginning at that
address, containing 32 bits. These four bytes are called a word. On 64-bit machines, the memory
can fetch 64 bits at a time; for simplicity we'll call a set of eight bytes a double word.

Integral types (int, short, byte) and signed (two's-complement)


representations.
Internally, computers represent numbers using binary (base 2) numbers rather than the decimal
(base 10) representations we've grown up with. In base 2, there are only two possible digits, 0
and 1, which can be represented in the computer's hardware as a switch that is turned off or on,
or a voltage that is low or high.

Binary digits stand for coefficients on powers of two rather than on powers of ten, so 1101₂ = 8₁₀ + 4₁₀ + 0 + 1 = 13₁₀. (Note that we write a subscript 2 or 10 to indicate the base of a number when there is ambiguity.)
In base 10, a number of n digits can represent any number from 0 up to but not including the number 10^n. For example, in three decimal digits we can represent any number between 0 and 999 = 10^3 − 1. Similarly, n binary digits can represent any number from 0 up to 2^n − 1. Therefore, a byte containing 8 bits could represent any number between 0 and 255, since 2^8 = 256. The Java char type is a 16-bit number, so it can have any value between 0 and 2^16 − 1 = 65535. A 32-bit word can represent any number between 0 and 2^32 − 1 = 4294967295.

That may sound good enough for most uses, but what about negative numbers? To handle them,
we need a way to represent the sign of a number.

1. Signed: int (32 bits), short (16 bits), long (64 bits), byte (8 bits)
2. Unsigned: char (16 bits, 0–65535)

Two's complement representation: in a signed n-bit representation, the most significant bit stands for −2^(n−1) instead of 2^(n−1), which would be its value in the unsigned representation we have been discussing above. For Java's type byte, the bits 01111111 represent 127 but 10000000 is −128.

Adding these together as binary numbers in the usual way, we get 11111111, which represents
−1, just as we'd expect when we add −128 and 127. Adding one more to this—and dropping the
final carry—we have 00000000, the representation of 0, also as we would like. Some other repre-
sentations for bytes are as follows:

−7 11111001

−4 11111100

−2 11111110

−1 11111111

0 00000000

1 00000001

2 00000010

4 00000100
7 00000111

In effect, going from an unsigned representation to a two's-complement signed representation


takes the upper half of the representable numbers and makes them wrap around to become nega-
tive numbers. This works because numeric computations are done modulo 2^n, and 128 and −128 are the same number modulo 2^8.

Everything we have said about the byte applies to the other signed Java types (short, int, and
long), except that those types have more bits (16, 32, and 64 respectively).

An interesting identity is that to negate a number, you simply invert all of its bits and then add
one, discarding any final carry. Considering bytes, we have −1 = 11111110 + 1 = 11111111. What
about negating 0? In this case, −0 = 11111111 + 1 = 100000000, but this is just 0 once we discard
the 9th digit.
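These bit-level facts can be checked directly in Java using binary literals:

```java
// Sketch: checking two's-complement facts for the byte type.
public class TwosComplementDemo {
    public static void main(String[] args) {
        byte b = (byte) 0b10000000;            // bit pattern 1000 0000
        System.out.println(b);                 // -128: the top bit counts as -2^7
        System.out.println((byte) 0b11111001); // -7, matching the table above
        System.out.println(~7 + 1);            // -7: invert the bits, then add one
    }
}
```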

Other representations for numbers exist, such as one's-complement, in which the negation of a
number is performed by inverting all of its bits. But the usual two's-complement representation is more convenient to work with and leads to simpler, faster implementations in hardware. At this
point essentially everyone uses two's complement.

Arrays, objects, and pointers


When Java variables are stored in the computer's memory, they may take up more space than the
number of bits indicated. In particular, variables are stored in memory on word boundaries and
always take up an integral number of words. So variables with types short, int, byte, and char all
take up one full word, whereas variables with type long take up two consecutive words. For ex-
ample, given the variable declarations

char c = 'a';
long x = 1;

we might find variables c and x stored in consecutive memory locations 10012–10023 as in the
following figure showing the contents of each byte. Notice that the high 32 bits of x are all zero,
and that the high 16 bits of the word containing c are unused.
Fields of objects (instance variables) are stored in the same way as variables, taking up an inte-
gral number of words that are located contiguously in memory.

For example, consider the following code:

class A {
char c;
B y;
}
class B {
long z;
}
...
B b = new B();
b.z = 1;
A a = new A();
a.c = 'a';
a.y = b;

Suppose that object b is located at memory location 14404 and object a is located at 22288. Then
memory looks something like the following:

Notice that each object starts with one header word that describes its class. This word actually
contains a pointer to another memory location that is shared by all objects of that class.

Because the actual memory addresses don't matter, it is sufficient for understanding this program
to look at the simpler object diagrams we have been using:

Floating point representations


Java supports floating-point numbers that act a lot like real numbers. But it takes an unbounded
amount of memory to store a real number, so floating-point numbers are only an approximation.
There are two floating-point types in Java. The type float is a single-precision floating point
number taking up one word, and the type double is a double-precision floating point number taking up two words.

It is possible to see how floating point numbers are represented in Java by using the methods Float.floatToRawIntBits(float) and Double.doubleToRawLongBits(double), which return an int and a long containing the same bits as the respective float and double. Attached to these notes is some code
that uses these methods to explore how floating point numbers are represented.

According to the IEEE 754 specification, which is almost universally followed at this point, a single-precision floating point number is represented as shown:

A floating point number is represented using three components, the sign, the exponent, and the
mantissa, respectively taking up 1, 8, and 23 bits of the 32-bit word.

Given components s (sign), exp (exponent), and m (mantissa), where the exponent is between
−126 and 127, the floating point number represents the real number

(−1)^s · 2^exp · (1.m)

Here, we intepret m as a sequence of binary digits, so “1.m” represents a binary number that is at
least 1 (and equal to 1 in the case where m = "000000...") and less than 2. The maximum number
that 1.m can represent is "1.11111...", which is a binary representation of a number less than 2 by
the small amount 2^−23.

Thus, if we want to represent the number 1, we choose s=0, exp=0, m=0000... . To represent numbers outside the interval [1,2), an exponent is needed. To represent 2, we use s = 0, exp = 1, m=0000..., since 1 · 2^1 = 2. To represent 3, we choose s=0, exp=1, m=10000..., since 1.1 in binary represents 3/2, and 3/2 · 2^1 = 3. And so on.

The exponent is stored with a “bias” or offset of 127, so the actual bit-level representation of 1.0
has s = 0, bexp = 01111111, m = 00000000000000000000000, and the whole word contains ex-
actly the bits 00111111100000000000000000000000.
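This bit pattern can be checked with Float.floatToRawIntBits. Note that Integer.toBinaryString drops leading zero bits, so the two leading 0's of the whole-word pattern above do not appear in the printed string:

```java
// Sketch: inspecting the bits of 1.0f. The full 32-bit pattern is
// 00111111100000000000000000000000; toBinaryString omits the leading 00.
public class FloatBits {
    public static void main(String[] args) {
        int bits = Float.floatToRawIntBits(1.0f);
        // sign 0, biased exponent 01111111, mantissa all zeros
        System.out.println(Integer.toBinaryString(bits));
    }
}
```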

Some special exponents follow different rules. If the exponent is −127 (and hence the biased ex-
ponent is 0), the represented number is instead:

(−1)^s · 2^−126 · (0.m)

Therefore, a word consisting entirely of 0's represents the floating point number 0.0, since 0.m =
0 in that case. It is also possible to represent the floating point number −0.0, which behaves
slightly differently, such as when its reciprocal is computed.

An exponent of 128 is represented by a biased exponent of 255 (all 1's). “Numbers” represented
with this exponent include positive and negative infinity, with a mantissa of 0 and the appropriate
sign bit. If the mantissa is anything other than 0, the represented “number” is called NaN, for
“not a number”. The value NaN arises when there is no reasonable answer to the computation,
such as when adding positive and negative infinity. Any computation performed using NaN pro-
duces NaN, so a final result of NaN indicates that something went wrong during the computa-
tion.

A single decimal digit corresponds to log₂ 10 ≈ 3.32 bits, so the number of decimal digits represented by 23 bits of mantissa is 23/3.32 ≈ 6.93. You can therefore think of floating point numbers
as representing numbers to about 7 digits of precision. However, errors tend to accumulate with
each computation, so a long computation often has fewer digits of precision. For example, if you
subtract 1000000.0 from 1000000.1, the result will be 0.1 but with only 1 digit of precision. In
fact, as the attached code showed, the result is reported to be 0.125. Oops!
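This loss of precision is easy to reproduce. The surprising 0.125 arises because 1000000.1 is rounded to the nearest representable float, which is 1000000.125:

```java
// Sketch: subtracting nearby large floats loses almost all precision.
public class Cancellation {
    public static void main(String[] args) {
        float x = 1000000.1f; // actually stored as 1000000.125
        float y = 1000000.0f; // exactly representable
        System.out.println(x - y); // 0.125, not 0.1
    }
}
```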

The other thing to watch out for is that the order of operations may affect what you compute, be-
cause rounding gets done at different places. Just because two formulas produce the same real
number doesn't mean they will produce the same floating-point number. As a result, it is usually
necessary when comparing two floating-point numbers for equality to instead check that their
difference lies within some tolerance corresponding to the largest expected round-off error.
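A standard sketch of the tolerance-based comparison (the tolerance 1e-9 is an arbitrary choice for this example; in real code it should be derived from the expected round-off error):

```java
// Sketch: == on floating point is fragile; compare within a tolerance instead.
public class Tolerance {
    public static void main(String[] args) {
        double a = 0.1 + 0.2;                         // not exactly 0.3
        System.out.println(a == 0.3);                 // false
        System.out.println(Math.abs(a - 0.3) < 1e-9); // true
    }
}
```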

The largest number representable using float is Float.MAX_VALUE, which is about 3.4·10^38. And the smallest positive number representable with full precision is about 10^−38. It is possible to represent smaller numbers down to about 10^−45 using the denormalized representation in which the exponent is −127, but in this case the precision of the number is reduced.

Because of the limited precision of single-precision floating point, it's usually a good idea to use
double for any computations you really care about. A double-precision number has 11 exponent
bits (with a bias of 1023) and 53 mantissa bits, so about 16 digits of precision and a decimal ex-
ponent ranging from −308 to 308 (at full precision). For most applications, that's plenty of preci-
sion and range.
Interfaces and subtyping
The word “interface” has more than one meaning. In the context of computer science generally,
it refers to a description or specification of the way that client code interacts with a program
module. In the context of Java, “interface” refers to a language feature that allows programmers
to create interfaces for classes, which are a module mechanism. An interface describes a set of
public methods that must be provided by the class; when using the class via the interface; only
these public methods can be used. The interface includes the name and signature of the methods;
it's also a good place to write specifications for those methods.

Example
For example, suppose that we want to implement the 2048 game discussed in the introductory
lecture. We want to keep track of the state of the game at any given point. Let's define an inter-
face to the state of the game. We'll call it Puzzle:

/** A state of the 2048 game, which contains a 4x4 grid of numeric tiles. */
interface Puzzle {
    /** Reset the game to the initial state, with all tiles blank. */
    void reset();

    /** Returns: the value of the tile at row r and column c, or 0 if that
     * tile position is blank.
     * Checks: both r and c are in the range 0..3.
     */
    int tile(int r, int c);

    /** Effect: performs a game move in the specified direction.
     * Returns: true if the move is legal; that is, tiles can be slid
     * in the specified direction.
     * Checks: d is one of 'U', 'D', 'L', or 'R'.
     */
    boolean move(char d);

    /** Current score. */
    int score();
}

Notice that this interface says nothing about how the various methods are implemented or about
how the puzzle information is represented inside objects. The interface doesn't say that the meth-
ods are public, because interface methods are always public.

An implementation
The Puzzle interface can be implemented by defining a class that is declared to implement it.
Now we get to make some implementation choices, such as how to represent the tiles of the puz-
zle. One obvious representation is as a 2D array of integers, with 0 representing blank tiles:

class APuzzle implements Puzzle {
    private int[][] tiles; // a 4x4 array
    private int score;

    public APuzzle() {
        reset();
    }
    public void reset() {
        tiles = new int[4][4];
        score = 0;
    }
    public int tile(int r, int c) {
        return tiles[r][c]; // no assert needed: array accesses are bounds-checked
    }
    public boolean move(char d) {
        switch (d) {
            case 'U': ...
            case 'D': ...
            case 'L': ...
            case 'R': ...
            default: assert false;
        }
        ...
    }
    public int score() {
        return score;
    }
}

There are a few interesting aspects of this code. First, notice that each method declared in the interface must be implemented as a public method in the class. The class has some other components
as well. For example, the instance variables tiles and score are declared private to hide them
from clients. A class can also add new methods not declared in the interface, such as the
APuzzle() constructor.

We can define an interface (in the general sense) to a class either by declaring which of its meth-
ods are public, or by declaring a Java interface as above. One advantage of using the interface
mechanism is that it allows multiple implementations of the interface.

For example, here is a sketch of a second implementation of the Puzzle interface, in which the
tiles are represented as characters in a string (the char type can represent numbers up to 65535, which should suffice):

class SPuzzle implements Puzzle {
    private String tiles; // a 16-character string
    private int score;

    public SPuzzle() {
        reset();
    }
    public void reset() {
        tiles = "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0";
    }
    public int tile(int r, int c) {
        assert 0 <= r && r < 4 && 0 <= c && c < 4;
        return tiles.charAt(r*4 + c);
    }
    ...
}
Using objects at type Puzzle
We now have two implementations of Puzzle. The important new capability that is added is that
client code can be written that does not care which interface is being used. For example, suppose
we want to write a method that displays a puzzle. That code can be written so it works on objects
from either implementation:

void display(Puzzle p) {
    for (int r = 0; r < 4; r++) {
        for (int c = 0; c < 4; c++) {
            int value = p.tile(r, c);
            // ... do some display work here ...
        }
    }
}

Now, client code can use display() on either kind of object, because both APuzzle and SPuzzle are
implementations of Puzzle.

APuzzle a = new APuzzle();
SPuzzle s = new SPuzzle();
...
display(a); // ok!
display(s); // ok!

As long as APuzzle and SPuzzle are implemented correctly, the code of display() cannot tell which
implementation is being used. That is why display() can be used on objects of either kind. The
abstraction barrier imposed by the Puzzle interface allows the programmer to start with one im-
plementation and later to replace it with a different implementation, with confidence that it won't
break the program.

Notice that at the method call p.tile(r,c), the compiler cannot know which method code it is go-
ing to run. The variable p refers to either an APuzzle or an SPuzzle object (or maybe even to some
other implementation of Puzzle), and in general, a compiler cannot figure out ahead of time
which one it will be. The call to tile() must be dispatched to the correct method implementation at run time. This is called dynamic dispatch.
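A minimal sketch of dynamic dispatch, using simple stand-in types of our own rather than the full Puzzle implementations:

```java
// Sketch: the method that runs is chosen by the object's run-time class,
// not by the static type of the variable holding it.
interface Greeter {
    String greet();
}

class English implements Greeter {
    public String greet() { return "hello"; }
}

class French implements Greeter {
    public String greet() { return "bonjour"; }
}

public class DispatchDemo {
    static void show(Greeter g) {
        // The compiler cannot tell which greet() this call will run.
        System.out.println(g.greet());
    }
    public static void main(String[] args) {
        show(new English()); // hello
        show(new French());  // bonjour
    }
}
```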

Classes vs. types


Both interfaces like Puzzle and classes like APuzzle and SPuzzle may be used as types in Java programs. The difference is that Puzzle can only be used as a static type. The run-time type of an object can never be Puzzle; it is always the class of the object that was provided to the new operator
when the object was created.

For example, consider the following code. The first line creates an object with run-time type
APuzzle and assigns it to a variable of static type APuzzle. The second line assigns the same refer-
ence to a variable of static type Puzzle. This is allowed because APuzzle is an implementation of
Puzzle. The third line tries to use the interface Puzzle with the operator new. This is illegal, be-
cause Puzzle is not a class; the compiler doesn't know what implementation to use.
APuzzle a = new APuzzle(); // ok
Puzzle p = a; // ok
Puzzle p = new Puzzle(); // illegal

This shows that both classes and interfaces in Java may be used as static types, but only classes
can be used to construct objects.

Subtyping and subtype polymorphism


Because APuzzle implements Puzzle, an expression of static type APuzzle can always be used where a Puzzle is expected. This is an example of a subtype relationship between two types. We write
APuzzle <: Puzzle to mean that the type APuzzle is a subtype of the type Puzzle. (Sometimes you
will see this written as APuzzle ≤ Puzzle.) Since an SPuzzle can also be used wherever a Puzzle is
expected, the subtype relationship SPuzzle <: Puzzle also holds.

The subtyping relationships among the various types form a subtype hierarchy, an example of
which is shown in this figure:

In addition to the subtype relationships we've discussed so far, this diagram adds more, such as
Puzzle <: Object. By transitivity, this relationship also implies APuzzle <: Object. (The subtyping relation is a reflexive relation, that is, T <: T for all types T, and it is also transitive.) We can also notice that array types like int[] are subtypes of Object. Standing alone in the diagram are the
primitive types (int, boolean, char, and so on). These types are not subtypes of any other type.

In the diagram above, every type has at most one parent, making this diagram a collection of
trees (a forest). However, a class is allowed to implement more than one interface, as in the fol-
lowing definition:

class SPuzzle implements Puzzle, Collection<Integer> {


...
}

so in general the subtype diagram is a graph in which some nodes have more than one parent.

Subtyping vs. coercion


Java lets us write the following declaration, which might make us think incorrectly that int <:
Object:
Object o = 2;

Although this looks like subtyping, actually Java is automatically inserting a coercion (conver-
sion) to make the types work. The variable o is not being assigned the value appearing on the
right-hand side, but rather a different value of type Integer. The declaration is syntactic sugar for
this one: Object o = Integer.valueOf(2).
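The effect of the coercion can be observed directly: after the assignment, the variable holds an Integer object, not a primitive int. A small demonstration:

```java
public class BoxingDemo {
    public static void main(String[] args) {
        Object o = 2;                     // compiler inserts the coercion
        Object o2 = Integer.valueOf(2);   // what the first line desugars to
        // The run-time class of the boxed value is Integer.
        System.out.println(o instanceof Integer); // prints true
        // Both expressions box the same int value.
        System.out.println(o.equals(o2));         // prints true
    }
}
```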

Casting
With subtyping, a given value can be treated as though it has more than one type. (When a value
can have more than one type, it is called polymorphism, and the kind of polymorphism we get
with subtyping is called subtype polymorphism.)

Java's cast operator can be used to control the type at which we are using a value. As an example,
suppose we have a variable a of type APuzzle. To force it to be treated as a Puzzle, we write
(Puzzle) a. Since APuzzle is a subtype of Puzzle, this cast will always succeed at run time: it is
safe. We refer to this kind of cast as an upcast because it shifts the type upward in the subtype
hierarchy.

It is also possible to cast downward in the subtype hierarchy. This gives us a downcast, an un-
safe cast that can fail at run time. For example, consider this code:

APuzzle a = new APuzzle();


Puzzle p = a;
APuzzle a2 = (APuzzle) p;

Here, all three variables refer to the same underlying APuzzle object. The downcast to a2 succeeds
at run time because the class of the object is a subtype of the type it is being cast to. On the other
hand, if we had reassigned the variable p to refer to an SPuzzle object, the downcast would fail
with a ClassCastException.

A downcast should ordinarily be used only when it is guaranteed to succeed, so a failed downcast
generally means that the programmer has made a mistake. It is possible to ensure that a downcast
will succeed by using the instanceof operator to test the run-time type of the object.

Puzzle p = ...;
if (p instanceof APuzzle) {
APuzzle a2 = (APuzzle) p;
}

However, the use of downcasting and of instanceof, while sometimes necessary, is a danger sign
for your designs. If you find yourself using these operations, it is worth thinking about whether
there is a way to redesign your code to avoid them.

Notice that the ability to downcast means that while upcasting to a supertype can be used to hide
operations not present in the supertype, a downcast can be used to restore the view in which the
operations are visible. This shows that subtyping offers a weak form of information hiding in
which there is no real encapsulation.
Conformance
For one type to be a subtype of another, the methods of the first type must have signatures that
conform to the signatures of those in the supertype. The conformance requirement has implica-
tions for the types and visibility of methods.

For conformance, Java requires that the types of method formal parameters in the subtype be ex-
actly the same as the types of the corresponding formal parameters in the supertype. However,
the return type of a method in the subtype may be a subtype of the corresponding return type in
the supertype.
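As a small illustration of these conformance rules (the class names here are invented for the example), a subclass may narrow the return type of an overridden method but not change its parameter types:

```java
// Illustration of conformance rules (class names invented for the example).
class Animal { }
class Dog extends Animal { }

class Shelter {
    Animal adopt() { return new Animal(); }
}

class DogShelter extends Shelter {
    // Legal override: parameter lists match exactly (both empty), and the
    // return type Dog is a subtype of the supertype's return type Animal.
    @Override
    Dog adopt() { return new Dog(); }
}

public class CovariantDemo {
    public static void main(String[] args) {
        Shelter s = new DogShelter();
        Animal a = s.adopt();                 // dispatches to DogShelter.adopt
        System.out.println(a instanceof Dog); // prints true
    }
}
```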

In addition, the visibility of instance methods in the subclass is not allowed to be any less than
the visibility in the superclass. Visibility modifiers can be ordered as follows:

private < (package) < protected < public

To see why, suppose that a method is declared public in the supertype but is private in the sub-
type. If we have a reference to an object of the subtype, the method will be inaccessible, but will
become accessible if the object is upcast to the supertype. Therefore, the private visibility modi-
fier is not meaningful. On the other hand, it is legal to have a method that is private in the super-
type but public in the subtype.

Interface subtyping
Interfaces can extend each other, creating a subtyping relationship. For example, we might want
a puzzle interface that adds a method for automatically solving the puzzle. If we declare the in-
terface to "extend" Puzzle, it will be a subtype:

interface SolvablePuzzle extends Puzzle {


Move[] solve();
}

The definition of SolvablePuzzle makes it a subtype of Puzzle: that is, SolvablePuzzle <: Puzzle,
and SolvablePuzzle has all of the methods of Puzzle in addition to the new solve() method that it
adds.

The ability to extend interfaces gives even more control over how much of a class is exposed to
client code. Clients that do not need the solve() method can be given the Puzzle view of the puzzle, avoiding unnecessary coupling of the implementing class and its clients.

It is worth noting that subtype relationships need to be explicitly declared in Java. If we declared
another separate interface Q that had all the same methods as Puzzle, plus some more, it might
appear harmless to allow Q <: Puzzle. Unlike some languages, Java does not allow this structural
subtype relationship; it only allows nominal subtyping, which is explicitly declared. There are
two reasons for this design: first, structural subtyping is hard to implement efficiently, and se-
cond, just because two methods have the same name and signature does not mean that they actu-
ally mean the same thing.
Factory methods
The use of interfaces allows us to write code that is independent of the choice of implementation
—except when the objects are actually created. At some point an actual implementing class must
be provided to new. This might seem to break the principle of separating interfaces from imple-
mentations.

When this strong separation is desired, a common solution is to use factory methods to build
objects. A factory method is one that creates objects, typically with interface type. For example,
we could (in some class) declare a method that creates Puzzle objects:

static Puzzle createPuzzle();

Now, code can use this method to create Puzzle objects, without committing to using any particu-
lar implementation of the Puzzle interface.
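A minimal sketch of this pattern, reusing stub versions of the Puzzle types from earlier; here the factory happens to choose APuzzle, and that choice is hidden from clients:

```java
// Sketch of a factory method (stub Puzzle types as before). Only the
// Puzzles class names an implementation; clients see just the interface.
interface Puzzle {
    int tile(int r, int c);
}

class APuzzle implements Puzzle {
    public int tile(int r, int c) { return 0; }
}

class Puzzles {
    // The one place that commits to an implementation class. Changing
    // this line switches every client to a different implementation.
    static Puzzle createPuzzle() {
        return new APuzzle();
    }
}

public class FactoryDemo {
    public static void main(String[] args) {
        Puzzle p = Puzzles.createPuzzle(); // no new, no APuzzle mentioned
        System.out.println(p.tile(0, 0));  // prints 0
    }
}
```

Client code that uses only createPuzzle() never names APuzzle, so swapping in a different implementation requires changing a single line.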

Summary
Java interfaces are a very useful mechanism for writing code in which module clients and mod-
ule implementations are strongly separated and hence loosely coupled. In addition, Java's support
for subtyping means that it is possible to write multiple implementations of the same interface,
and these multiple implementations can even coexist and interoperate within the same program.
This is a valuable feature for building large software.
Inheritance and the specialization interface
Recall that we identified three key elements of OO programming:

• Encapsulation
• Subtyping
• Inheritance

We now discuss the third of these, inheritance. Inheritance is a mechanism that supports extensibility and reuse of code, both across programs and within programs.

Motivating example
Suppose someone gives us an implementation of the Puzzle interface that we saw in the previous
lecture. In this implementation, the puzzle is represented by a 2D array of integers:

class APuzzle implements Puzzle {


private int[][] tiles;

public int tile(int r, int c) {


return tiles[r][c];
}
public void move(Direction d) { ... }
...
}

Now suppose that we want to build a better puzzle implementation by reusing this code. For example, we might want the puzzle to log all of the moves that have been made. We could proceed by
copying the code of APuzzle to create a new class LogPuzzle which we then add new fields and
methods to. But this strategy doesn't work very well when the original supplier fixes some bugs
in APuzzle and gives us APuzzle'. We'll either have to apply the bug fixes to LogPuzzle or our up-
grades to APuzzle', as illustrated in Figure 1. Either one will be hard to do in an automatic way.

Figure 1: The problem of merging upgrades

At least for some kinds of changes that we might make between APuzzle and LogPuzzle, inher-
itance offers a solution to this problem. We can think of inheritance as a language mechanism for
copying a class and making certain kinds of changes to that class.

To add the new functionality to LogPuzzle, we inherit from APuzzle:

class LogPuzzle extends APuzzle {


private int num_moves;
public int numMoves() { return num_moves; }
public void move(Direction d) {
num_moves++;
super.move(d);
}
}

Here APuzzle is the superclass and LogPuzzle is the subclass. In general, classes inheriting from
each other form an inheritance hierarchy or class hierarchy, in which superclasses are above
their subclasses.

The LogPuzzle class is just like the APuzzle class, except:

• It has an additional method, numMoves, and a new field, num_moves.


• It has a new version of the move method that overrides the original version from APuzzle. It overrides the original because it has the same name and signature as the original method. In Java, the signature consists of the method name and the types of its parameters.
• The type LogPuzzle is a subtype of the type APuzzle. That is, LogPuzzle <: APuzzle.

Method dispatch and inheritance


Because of the subtyping relationship between the two classes, we can write the following code:

APuzzle p = new LogPuzzle();


p.move(up); // which move?

Which version of move is used? The static type of p is APuzzle, but at run time p refers to an object whose class is LogPuzzle. For ordinary, non-static methods, the object's run-time class determines which method is used. So it is the LogPuzzle version that is run. This is known as run-time method
dispatch or dynamic dispatch.

There are two different ways to explain how inheritance works when a method is invoked. One
way, which you have probably already heard in the past, is to say that when a method is invoked,
the system searches upward through the inheritance hierarchy looking for the first implementa-
tion of the method. But it is equivalent to say that the methods from the superclasses are copied
down to their subclasses except when overridden.

To better understand how inheritance operates, suppose that the superclass APuzzle had a method
scramble that used move:

class APuzzle {
void scramble() {
... move(random_dir) ...
}
}

APuzzle p = new LogPuzzle();


p.scramble();
// is p.numMoves() equal to 0?
When the method scramble() calls move(), which version is called? We can answer this question
by understanding that the code of scramble acts as though it is copied to the run-time class of p,
which is LogPuzzle. Therefore, when the method move() is called, the LogPuzzle version of move()
is used.

Inheritance and static methods


Static methods complicate the story slightly, because they cannot be overridden by subclasses.
The choice of what method to call is made in the class in which the method call is made. It does
not change when the code making the call is inherited into a subclass. Consider the following
code:

class A {
static int f() { ... }

void g() {
f();
}
}

class B extends A {
static int f() { ... }
}

A x = new B();
x.g();

When the method g() is called, the call to f() goes to the A version of f rather than the B version.
Because f is a static method, the call to f() is exactly the same as if it were written A.f(). There-
fore it does not change in meaning when it is inherited by B.

The special syntax super.move(), which was used in the implementation of LogPuzzle above, is al-
so a static call that always goes to the APuzzle version of move.

Constructors
Constructors are special static methods that are used to ensure that objects are fully initialized
before they are used. If a class defines one or more constructors, new objects of that class can on-
ly be created using the constructor. Therefore the constructor for any subclass must call a super-
class constructor, as in the following example:

class APuzzle {
public APuzzle(int size) {
tiles = new int[size][size];
}
}
class LogPuzzle extends APuzzle {
public LogPuzzle() {
super(4);
num_moves = 0;
}
}
Here, the constructor LogPuzzle always creates puzzles of size 4×4, which is accomplished by
calling the superclass constructor with super(4). The call to the superclass constructor is static.

Protected visibility
What if we want the LogPuzzle code to access the tiles field directly? As defined we cannot, be-
cause tiles is declared as private. However, if we give tiles the protected visibility in APuzzle, it
becomes visible to all subclasses of APuzzle:

class APuzzle {
protected int[][] tiles;
...
}

Protected fields and methods form a second interface to a class. Public methods and fields are the
public interface, which is exposed to client code. Protected methods and fields are the specializa-
tion interface, which is available to subclasses but not to ordinary clients. One of the challenges
of good object-oriented design is to design both of these interfaces effectively, without confusing
their roles. Designing a good specialization interface is especially important for object-oriented
libraries where the classes provided by the library are intended to be extended through inher-
itance.

Figure 2: the specialization interface

Protected methods and the specialization interface


Suppose that scramble() had been defined to call a protected method internal_move instead of the
public move method:

class APuzzle {
public void scramble() {
... internal_move(n); ...
}
protected void internal_move(int d) { ... }
}

class LogPuzzle extends APuzzle {


protected void internal_move(int d) {
num_moves++;
super.internal_move(d);
}
}

This example shows that the specialization interface of APuzzle allows the LogPuzzle class to
change the behavior of existing public methods without overriding them directly. Protected
methods are hooks for future extensibility of OO code. The specialization interface defines how
code can be extended.

Abstract classes
An abstract class is a class that provides some state and behavior that can be inherited by other
classes, but that cannot be the run-time class of any object. An abstract class, indicated by the
keyword abstract in the class declaration, may similarly declare methods that are marked
abstract. Such methods do not need to come with an implementation, but any non-abstract sub-
class must implement them.

Abstract classes are useful as a way to factor out and centralize common functionality needed by
a group of related classes. Using inheritance in this way is much better than copying the code and
state into all the classes.

One useful pattern is to use such methods as holes in the implementation to be filled in by sub-
classes. In this case, the protected visibility is appropriate because the methods are not intended
to be used directly by clients.
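A brief sketch of this pattern (the class names are invented for the example): the abstract class implements shared behavior once, in terms of a protected abstract "hole" that concrete subclasses must fill in.

```java
// Sketch of the "holes" pattern (class names invented for this example):
// the abstract class implements shared behavior once, deferring one step
// to a protected abstract method that concrete subclasses must fill in.
abstract class AbstractPuzzle {
    // Shared functionality, written in terms of the hole below.
    public String describe() {
        return "puzzle of size " + size();
    }

    // The hole: protected, because it is part of the specialization
    // interface, not the public interface.
    protected abstract int size();
}

class SmallPuzzle extends AbstractPuzzle {
    protected int size() { return 3; }
}

public class AbstractDemo {
    public static void main(String[] args) {
        // new AbstractPuzzle() would be illegal: abstract classes cannot
        // be the run-time class of any object.
        AbstractPuzzle p = new SmallPuzzle();
        System.out.println(p.describe()); // prints puzzle of size 3
    }
}
```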
Using Exceptions
Exceptions vs. Errors
Exceptions are a language mechanism that helps transfer control from one point in the program
to another without cluttering the code in between. As always, we should keep a clear distinction
between the mechanisms the programming language designers chose to put at our disposal and
the proper ways to use those mechanisms to write good code.

In particular, an exception should not be thought of as exactly the same thing as an error, alt-
hough exceptions are often used to indicate errors. We will use the word “error” to mean a mis-
take in the code: a programmer error. One reason why the ideas of exceptions and errors are con-
fused is that exceptions are a useful way to stop programs quickly and cleanly when a program-
mer error is detected.

Exceptions have another use, however. We may in some cases want to use exceptions to handle
unusual conditions within the code. These might be “errors” in (that is, misconfigurations of) the
environment in which the program is being run rather than mistakes by the programmer. We
want the program to be able to handle such unusual conditions and respond accordingly; excep-
tions are a nice way to handle such unusual conditions fairly cleanly. Without exceptions, the
code to handle unusual conditions ends up mixed in with the code that handles normal-case exe-
cution of the program. This mixing makes the normal-case code (and hence the code as a whole)
harder to understand. With exceptions we can factor out separately the code that handles unusual
conditions.

Exceptions are generated either by using the throw statement to throw an object of a subclass of
class Throwable, or by using a built-in operation that generates an exception under some condition. Java has quite a few built-in exceptions that can be generated by standard language constructs. Null values generate NullPointerException if used as objects, arrays generate
ArrayIndexOutOfBoundsException if the array index is, well, out of bounds, and so on.

Exceptions can also be handled by using a try statement. It has a body defining what code is al-
lowed to generate an exception, then at least one catch clause (or possibly a finally clause). The
catch clauses define which exceptions generated by the try body will be caught and define what
code is run when the exception is caught. The finally block provides some code that is always
run before the try statement finishes. The finally block is very useful for performing cleanup
work that must happen regardless of how cleanly the try statement completes. For example, it
might close files that were opened in the main body of the try.
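For instance, a finally block is the standard way to guarantee that a file opened in the try body is closed no matter how the body completes. A small sketch:

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.PrintWriter;

public class FinallyDemo {
    /** Returns the first line of the named file. The finally block closes
     *  the reader whether readLine() returns normally or throws. */
    static String firstLine(String path) throws IOException {
        BufferedReader in = new BufferedReader(new FileReader(path));
        try {
            return in.readLine();
        } finally {
            in.close(); // always runs before the method completes
        }
    }

    public static void main(String[] args) throws IOException {
        // Demonstrate on a temporary file.
        File f = File.createTempFile("demo", ".txt");
        PrintWriter w = new PrintWriter(f);
        w.println("hello");
        w.close();
        System.out.println(firstLine(f.getPath())); // prints hello
    }
}
```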

Here is a small example of using exceptions to separate handling of unusual conditions from nor-
mal-case code. This code parses the command line of the program by scanning the argument list
from left to right. However, it needs to handle the case in which the user fails to provide a file-
name to the --file command-line option. The code can be simpler if it doesn't have to check that
every index into the array of arguments is in bounds. As the code below demonstrates, it is possi-
ble to use try...catch to factor out the handling of that problem in an exception handler:

String filename;
try {
for (int i = 0; i < args.length; i++) {
switch (args[i]) {
case "--file":
filename = args[i+1];
i++; // advance past filename
// as a one-liner: filename = args[++i];
break;
...
}
}
} catch (ArrayIndexOutOfBoundsException e) {
print_usage_message();
return;
}

In a typical command-line parser, there will be multiple options for which a user might forget to
supply the corresponding argument. Code in the style above not only avoids cluttering up the
normal-case code with error handling, but it even consolidates the handling of multiple errors in-
to one place.

How not to handle unusual conditions


An alternative to using exceptions is to define special return values to indicate unusual condi-
tions. The Java libraries often (unfortunately) follow this strategy. For example, the specification
for String.indexOf looks like this:

/**
* Returns: if the string argument occurs as a substring within this
* object, then the index of the first character of the first
* such substring is returned; if it does not occur as a
* substring, -1 is returned.
*/
public int indexOf(String str)

The problem with the special-return-value strategy is that it's easy to forget to check for the spe-
cial value, writing code such as this:

String s1 = s2.substring(s2.indexOf("header:") + 7);

If the string doesn't contain header:, then indexOf returns -1, and we end up taking the characters of s2 starting at index 6. This doesn't make much sense! If the library designers had instead chosen to throw an exception, client code could look like that above, and the compiler would force clients to remember to check for the case in which “header:” isn't found.
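For comparison, here is what careful client code has to look like under the special-return-value convention (the helper name afterHeader is made up for this example):

```java
public class HeaderDemo {
    // Under the special-return-value convention, the client must remember
    // to test for -1 before using the result.
    static String afterHeader(String s) {
        int i = s.indexOf("header:");
        if (i == -1) {
            return null; // the unusual case must be handled explicitly
        }
        return s.substring(i + "header:".length());
    }

    public static void main(String[] args) {
        System.out.println(afterHeader("header:payload")); // prints payload
        System.out.println(afterHeader("no colon here"));  // prints null
    }
}
```

Nothing forces the programmer to write the i == -1 test; forgetting it silently produces the nonsensical substring(6) behavior described above.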

Checked vs unchecked exceptions


Java requires that some exceptions be declared in methods that might generate them. These are
called checked exceptions. Checked exceptions force the client to be aware that they might hap-
pen and to handle them appropriately. This helps lead to more robust code. Unchecked excep-
tions include the run-time exceptions like NullPointerException but also subclasses of Error. The
compiler will not warn clients if they are ignoring unchecked exceptions.

When should you use each kind of exception? It depends on why the exception is being used:
• If the exception is thrown because there is a programmer error, unchecked exceptions are the right choice. Particularly, Error or a subclass (e.g., AssertionError) should be used.

• If the exception is thrown because there is an unusual condition, a checked exception should be used. The client should be aware that the condition can happen. Sometimes checked exceptions can be annoying because the programmer “knows” they cannot happen, yet they must be declared. The simplest way to deal with this is to write a catch clause for those exceptions, and throw an Error from the handler.
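A common instance of this idiom: String.getBytes(String) declares the checked UnsupportedEncodingException, yet UTF-8 is required to be supported on every Java platform, so the exception "cannot happen" and the handler converts it to an Error. (The method name utf8Bytes is invented for the example.)

```java
import java.io.UnsupportedEncodingException;

public class WrapDemo {
    // String.getBytes(String) declares a checked exception, but UTF-8 is
    // required to exist on every Java platform, so the exception "cannot
    // happen". We catch it and rethrow as an Error rather than declare it.
    static byte[] utf8Bytes(String s) {
        try {
            return s.getBytes("UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new Error("impossible: UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(utf8Bytes("hi").length); // prints 2
    }
}
```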

Specifying partial functions


The coefficient method is an interesting case, because it is a partial function that has no natural
result in the case where the requested exponent is negative. There are several alternative ways to
deal with this situation, with varying tradeoffs in terms of performance vs. debugging. The alter-
native least friendly to the client is simply to require that the requested exponent be nonnegative,
by giving a requires clause:

/** Returns: the coefficient of the polynomial term with exponent n, or zero
* if there is no such term.
* Requires: n ≥ 0
*/
double coefficient(int n);

What happens if the client calls coefficient(-1)? This spec doesn't say. Maybe it throws an ex-
ception, maybe it goes into an infinite loop, maybe it just returns a wrong answer. If the client
makes this call, the code can do anything it likes. But that is okay. The spec is clear that the client
must ensure that n is not negative. If the requires clause is violated, it is the client's fault.

A more forgiving version of the same spec uses a checks clause, which is a kind of requires
clause. The difference is that the checks clause promises to check that the precondition holds,
and to stop the program as cleanly as possible if it is violated. For example, the method might
throw an exception that is a subclass of Error, which the client should not try to catch. However,
client code that violates the precondition is still wrong code. It is still the client's fault if the pre-
condition is violated.

/** Returns: the coefficient of the polynomial term with exponent n, or zero
* if there is no such term.
* Checks: n ≥ 0
*/
double coefficient(int n);

A good way to implement checks clauses is by using the assert statement. For example, the im-
plementation of coefficient might check this precondition as follows:

double coefficient(int n) {
assert n >= 0;
...
}

The assert statement may check any boolean condition. If Java is used with assertions enabled
(by using the -ea option, which you should enable in Eclipse), then the boolean condition is tested at run time. If it evaluates to true, nothing happens. If it evaluates to false, an AssertionError is generated and will halt the program at the point where the assertion failed.

It is also even more helpful when writing a checks clause to tell the programmer what exception
will be thrown when the check fails, e.g.

// Checks: n ≥ 0 (assert)

or:

// Checks: n ≥ 0 (throws NegException)

This may help when debugging. However, if an exception is thrown to indicate an error, the cli-
ent (caller) should not catch that exception. The exception indicates a problem in the client code
that should not just be papered over.

The most friendly way to deal with partial functions is to make them total, so that the customer is
never wrong:

/** Returns: the coefficient of the polynomial term with exponent n, or zero
* if there is no such term. Throws NegException if n is negative.
*/
double coefficient(int n) throws NegException;

Notice that now the information about the thrown exception is part of the returns clause, and ap-
pears in the signature, indicating that this exception is expected behavior in some situations and
that the client had better be ready to handle it when it happens. The implementation of
coefficient might be exactly the same as when we used a “checks” clause. But with the checks
clause, the contract says that the client is in error; with exceptions appearing in the returns
clause, the client is always “right”.
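One possible implementation of this total version is sketched below. The dense-array representation of the polynomial and the body of the NegException class are assumptions made for this example; the spec above fixes only the method's behavior.

```java
// Sketch of a total coefficient method. The dense-array representation
// and the NegException class body are assumptions made for this example.
class NegException extends Exception { }

public class Polynomial {
    private final double[] coeffs; // coeffs[i] is the coefficient of x^i

    Polynomial(double... coeffs) {
        this.coeffs = coeffs;
    }

    /** Returns: the coefficient of the polynomial term with exponent n,
     *  or zero if there is no such term.
     *  Throws NegException if n is negative. */
    double coefficient(int n) throws NegException {
        if (n < 0) throw new NegException(); // expected behavior, not an error
        return n < coeffs.length ? coeffs[n] : 0.0;
    }

    public static void main(String[] args) throws NegException {
        Polynomial p = new Polynomial(3.0, 0.0, 5.0); // 3 + 5x^2
        System.out.println(p.coefficient(2)); // prints 5.0
    }
}
```

Because NegException is checked and appears in the signature, the compiler forces every caller either to handle it or to declare it, so the unusual case cannot be silently ignored.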

Javadoc and exceptions


Javadoc doesn't completely support the clauses we have been describing thus far, though it has
been evolving in that general direction. If you want to use Javadoc to generate HTML documen-
tation, you will need to adapt this documentation strategy accordingly. The key is not that you
need to have explicitly labeled clauses that Javadoc understands, but that you should know for
each thing you write in the comment which clause it belongs to, and include all the information
that should be found in the clauses that the spec of your code needs.
Loop invariants
A loop invariant is a condition that is true at the beginning and end of every loop iteration, anal-
ogously to the way that a class invariant is true at the beginning and end of every public method.
When you write a loop that works correctly, you are at least implicitly relying on a loop invari-
ant. Knowing what a loop invariant is and thinking explicitly about loop invariants will help you
write correct, efficient code that implements tricky algorithms.

Binary search via iteration


Suppose we want to find an element in a sorted array. We can do much better than scanning from
left to right: we can use binary search. Here is the binary search algorithm, written as a loop.

/** Returns an index i such that a[i] == k.
 * Requires: k is in a, and a is sorted in ascending order.
 */
int search(int[] a, int k) {
int l = 0, r = a.length-1;
while (l < r) {
int m = (l+r)/2;
if (k <= a[m]) r = m;
else l = m+1;
}
return l;
}

Conceptually, this algorithm is simple. But it is deceptively tricky to get exactly right. How do
we know we got the computation of m right? Why is it k <= a[m] and not k < a[m]? Why m and m+1
in the two updates to r and l respectively? If we change any of these decisions, the algorithm can
fail to find the correct element.

Binary search loop invariant


To convince ourselves that we wrote the correct code, we need a loop invariant with three claus-
es:

1. a is sorted in ascending order


2. l≤r
3. k ∈ a[l..r]

Note that we use the notation i..j to denote the set {x | i ≤ x ≤ j} = {i,i+1,...,j-1,j}. We use the
notation a[i..j] to indicate the subsequence of the array a starting from a[i] and continuing up
to and including a[j].

If we know what the loop invariant is for a loop, it is often a good idea to document it. In fact,
we can document it in a checkable way by using an assert statement that is executed on every
loop iteration.
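For the binary search above, clauses (2) and (3) can be documented with an assert. Checking k ∈ a[l..r] literally would take linear time, so the assertion below checks a cheap consequence of the invariant, which follows from clause (1), sortedness: a[l] ≤ k ≤ a[r]. Recall that assertions run only when -ea is enabled.

```java
public class Search {
    /** Returns an index i such that a[i] == k.
     *  Requires: k is in a, and a is sorted in ascending order. */
    static int search(int[] a, int k) {
        int l = 0, r = a.length - 1;
        while (l < r) {
            // Invariant clauses (2) and (3): l <= r and k is in a[l..r].
            // Sortedness (clause 1) lets us check the cheap consequence
            // a[l] <= k <= a[r] instead of scanning the whole range.
            assert l <= r && a[l] <= k && k <= a[r];
            int m = (l + r) / 2;
            if (k <= a[m]) r = m;
            else l = m + 1;
        }
        return l;
    }

    public static void main(String[] args) {
        int[] a = { 1, 3, 5, 7, 9 };
        System.out.println(search(a, 7)); // prints 3
    }
}
```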
Using loop invariants to show code is correct
Loop invariants can help us convince ourselves that our code, especially tricky code, is correct.
They also help us develop code to be correct in the first place, and they help us write efficient
code.

To use a loop invariant to argue that code does what we want, we use the following steps:

1. Establishment. Show that the loop invariant is true at the very beginning of the loop. (also
known as Initialization)
2. Preservation. Show that if we assume the loop invariant is true at the beginning of the loop,
it is also true at the end of the loop. (Other than coming up with the loop invariant in the
first place, this step is typically the most challenging.) Also known as Maintenance.
3. Postcondition. Show that if the loop guard is false (so the loop exits) and the loop invariant
still holds, the loop must have achieved what is desired. This is a crucial step too. If the
chosen loop invariant is too weak, this step will not be possible.

These three steps allow us to conclude that the loop satisfies partial correctness, which means
that if the loop terminates, it will succeed. To show total correctness, meaning that the loop will
always terminate, there is a fourth step:

4. Termination. Assuming the loop invariant holds at the start of each iteration, show that
some quantity strictly decreases, and that it cannot decrease indefinitely without making
either the loop guard or the loop invariant false. This quantity is called the decrementing
function or loop variant.

Let's try these four steps on the binary search algorithm.

1. Establishment. The loop invariant has three parts:


1. The array is sorted because that's a precondition of the method.
2. Since a.length is at least 1, l≤r.
3. k is in a[l..r] because that's the whole array and the precondition guarantees k is there.

2. Preservation. First, notice that the array is never changed in the loop, so part (1) of the
invariant is preserved.
We use l', r' to represent the values of l and r at the end of the loop. Then part (2) requires
l'≤r' and part (3) requires k∈a[l'..r']. Notice that m is the average of l and r, rounded down.
So we know that l≤m≤r. We know that either k∈a[l..m] or k∈a[m+1..r]. We analyze the two
cases separately.

Case k∈a[l..m]:
In this case we must have k≤a[m], so the if guard is true and r' = m and l' = l. We have
l'≤r' as required, since l≤m. Since k∈a[l..m] by assumption, k∈a[l'..r']. No changes
were made to the array, so the array is still sorted.

Case k∉a[l..m]:
In this case we must have r>m≥l and k∈a[m+1..r]. Since k is not in a[l..m] and the array is sorted, we have k > a[m], so the if condition must be false. Therefore, we have l' = m+1 and r' = r. Since r>m, we have l'≤r' as required. Since k∈a[m+1..r] by assumption, k∈a[l'..r']. No changes were made to the array, so the array is still sorted.

3. Postcondition. For the algorithm to be correct, we need a[l] = k. If the loop guard is false,
we know l≥r. But the invariant (2) guarantees l≤r, so this can happen only if l=r. We know
from the invariant (3) that k∈a[l..r], which has been reduced to a single element that must be
where k is.
4. Termination. The value r−l is guaranteed by the invariant (2) to be non-negative. Because integer division rounds down, l ≤ m < r whenever l < r. In the case where k∈a[l..m], we have r' = m < r, so r'−l' < r−l. In the other case, l' = m+1 > l, so again r'−l' < r−l. Therefore r−l strictly decreases on every loop iteration and, being non-negative, cannot decrease forever, so the loop eventually terminates.

This loop invariant has three clauses, but it's easy to leave things out of the loop invariant. If
clauses are omitted from the loop invariant, it makes Establishment easy to argue, but often it be-
comes impossible to show Preservation or Postcondition. (This is the usual error.) If the loop in-
variant has extra things in it that aren't really true during the whole loop execution, Establish-
ment or Preservation become impossible to show.

Let's consider what would have happened had we omitted any of the three clauses from the bina-
ry search loop invariant:

1. a is sorted in ascending order.


Without this clause, we can't show Preservation, because there is no guarantee that the up-
dated range a[l'..r'] contains the desired element.

2. l≤r
Without this clause, the Postcondition argument fails: when the loop guard is false, we can
conclude only that l≥r, not that l=r. The Termination argument also fails because the
decrementing function is no longer guaranteed to be nonnegative.

3. k ∈ a[l..r]
Without this clause, we don't know that the loop has found anything when it terminates, so
Postcondition fails.

Example: Exponentiation by squaring and multiplication


Here is an implementation of exponentiation that is efficient but whose correctness is not instant-
ly apparent.

/** Returns: x^e
 * Requires: e ≥ 0
 * Performance: O(lg(e))
 */
static int pow(int x, int e) {
    int r = 1, b = x, y = e;
    // loop invariant: r·b^y = x^e and y ≥ 0
    while (y > 0) {
        if (y % 2 == 1) r = r * b;
        y = y / 2;
        b = b * b;
    }
    return r;
}

Intuitively, what this algorithm does is to convert the exponent e into a binary representation,
which we can think of as a sum of powers of 2: e = 2^k1 + 2^k2 + .... So x^e = x^(2^k1)·x^(2^k2)· ....
By repeatedly dividing y by 2 and inspecting the resulting parity, the algorithm finds each of the
“1 digits” in the binary representation, corresponding to the terms 2^ki, and for such a digit at
position ki, multiplies into r the appropriate factor x^(2^ki). However, the loop invariant will
help convince us that it really does work. The loop invariant captures that part of the final
result has been transferred into r and what remains is b^y.

Let's consider the four steps outlined above.

1. Establishment. Initially, r=1, b=x and y=e, so trivially we have r·b^y = x^e.
2. Preservation. Let us use y', b', and r' to refer to the values of these variables at the end of
the loop. We need to show that if r·b^y = x^e at the beginning, then r'·b'^y' = x^e at the end.
There are two cases to consider:
    Case: y is even. In this case, r' = r, y' = y/2, and b' = b^2. Therefore, r'·b'^y' =
    r·(b^2)^(y/2) = r·b^y, as desired.
    Case: y is odd. Here we have r' = r·b, y' = (y−1)/2, and b' = b^2. Therefore, r'·b'^y' =
    r·b·(b^2)^((y−1)/2) = r·b·b^(y−1) = r·b^y, again.

3. Postcondition. If the loop guard is false, then y = 0, because y can never become negative
by dividing it by 2. If y = 0, then r·b^y = r, so r must be equal to x^e.
4. Termination. Dividing by two makes the quantity y smaller on every loop iteration, because
it is always nonnegative (nonnegativity is actually a second clause in the loop invariant). It
can never become negative, so eventually it will become zero and the loop will terminate.

Therefore, the loop terminates and computes the correct value for x^e in variable r.
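One way to gain confidence in the invariant r·b^y = x^e is to check it numerically on each iteration against a slow reference implementation. The helper slowPow below is our own instrumentation, introduced just for this check:

```java
class Pow {
    // Reference implementation used only to check the invariant.
    static long slowPow(long b, long e) {
        long p = 1;
        for (long i = 0; i < e; i++) p *= b;
        return p;
    }

    static int pow(int x, int e) {
        long r = 1, b = x, y = e, goal = slowPow(x, e);
        while (y > 0) {
            // loop invariant: r * b^y == x^e and y >= 0
            assert r * slowPow(b, y) == goal && y >= 0;
            if (y % 2 == 1) r = r * b;
            y = y / 2;
            b = b * b;
        }
        return (int) r;
    }
}
```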

Example: Insertion sort

/** Effect: put array a into ascending sorted order */
void sort(int[] a) {
    // loop invariant:
    //   1 ≤ i ≤ a.length
    //   a[0..i-1] is in sorted order
    // Decrementing function: a.length - i
    for (int i = 1; i < a.length; i++) {
        int k = a[i];
        int j; // declared outside the inner loop so it is in scope afterward
        // loop invariant:
        //   0 ≤ j ≤ i < a.length
        //   Elements a[0..i], excluding a[j], are in sorted order,
        //   and include all the elements found originally in a[0..i-1].
        //   Elements a[j+1..i] > k.
        // Decrementing function: j
        for (j = i; j > 0 && a[j-1] > k; j--) {
            a[j] = a[j-1];
        }
        a[j] = k;
    }
}

There are two loops, hence two loop invariants. These loop invariants can be visualized with the
following diagram:

Notice that the loop invariant holds in for loops at the point when the loop guard (i.e., i <
a.length) is evaluated, and not necessarily at the point when the for statement starts executing.
That is, the initialization expression in the for statement can help establish the loop invariant.
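A quick use of insertion sort, restated as a self-contained sketch (with the inner index j declared outside the loop so the final assignment a[j] = k is in scope):

```java
class InsertionSort {
    /** Effect: put array a into ascending sorted order. */
    static void sort(int[] a) {
        for (int i = 1; i < a.length; i++) {
            int k = a[i];
            int j;
            for (j = i; j > 0 && a[j - 1] > k; j--) {
                a[j] = a[j - 1]; // shift larger elements one slot right
            }
            a[j] = k;
        }
    }
}
```

For example, sorting {5, 2, 9, 1, 5} yields {1, 2, 5, 5, 9}.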

Loop invariants in software engineering


Loop invariants capture key facts that explain why code works. This means that if you write code
in which the loop invariant is not obvious, you should add a comment that gives the loop invari-
ant. This helps other programmers understand the code, and helps keep them (or you!) from acci-
dentally breaking the invariant with future changes.

If you have figured out (even part of) the loop invariant, it also makes sense to add an assertion
that checks the loop invariant on every iteration. Such assertions will tend to quickly expose
problems with your understanding of why the code works, and coding errors when implementing
the loop.
Recursion and linked lists
Recursion
Recursion is the definition of something in terms of itself. This sounds circular, but with care, re-
cursive definitions can be a highly effective way to express both algorithms and data structures.
Recursion allows us to solve a problem by using solutions to “smaller” versions of the same
problem.

Example: Fibonacci numbers


The nth Fibonacci number is the sum of the previous two Fibonacci numbers. This is a recursive
definition of a function f(n):

f(n) = f(n−1) + f(n−2)

To make this definition make sense, we need a base case that stops the recursive definition from
expanding indefinitely:

f(0) = f(1) = 1

Since the recursive definition is always in terms of smaller values of n, any given f(n) where n≥0
expands into smaller and smaller arguments until the base case is reached. For example:

f(3) = f(2) + f(1) = f(1) + f(0) + f(1) = 1 + 1 + 1

This recursive definition not only makes sense mathematically, it can be implemented in a direct
way as Java code:

int f(int n) {
    if (n >= 2) return f(n-1) + f(n-2);
    return 1;
}

This is very concise code, but not very efficient, as we'll see.
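To see the inefficiency concretely, we can count the calls made (the calls counter below is our own instrumentation, not part of the method):

```java
class Fib {
    static long calls = 0; // counts every invocation of f

    static int f(int n) {
        calls++;
        if (n >= 2) return f(n - 1) + f(n - 2);
        return 1;
    }
}
```

Computing f(20) this way makes 21,891 calls, because the same subproblems are recomputed over and over; in general the call count is 2·f(n) − 1, since each base-case leaf of the call tree contributes 1 to the result.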

Example: Exponentiation
Previously we saw an efficient, iterative algorithm for exponentiation. It is even easier to write a
similar algorithm recursively:

/** Returns x^e. Requires e ≥ 0 and the result is representable as an int. */
static int pow(int x, int e) {
    if (e == 0) return 1;
    int h = pow(x, e/2);
    if (e % 2 == 0) return h*h;
    return h*h*x;
}
The base case for the recursion is e = 0, which is clearly computed correctly. For all other values
of e, the expression e/2 is smaller than e, so each recursive call gets closer to the base case and so
pow must terminate. When evaluating whether it works for a given value of e, we assume that it
works for all smaller values of e, in particular for pow(x, e/2). Therefore, the value of h is x^⌊e/2⌋
according to the spec for pow. If e is even, then ⌊e/2⌋ = e/2, and h*h is equal to (x^(e/2))^2 = x^e,
as desired. If e is odd, then ⌊e/2⌋ = (e−1)/2, and h*h*x = x^(e−1)·x = x^e. So the method works in
either case.

Execution of recursive methods


Every time a method is invoked, a structure called an activation record is created in the comput-
er's memory, containing all its local variables, including its parameters. We can represent an acti-
vation record in a way similar to the way we've shown objects, as a box containing variables.
(However, it's not a full-fledged object, because we cannot create a reference to the activation
record.) For example, the diagram below shows the activation records created during the call
pow(2,5). Activation records have a pointer to the activation record of the calling method, shown
by a dotted arrow in the diagram. As each recursive call occurs, a new activation record is creat-
ed containing new local variables, so that each distinct call has its own variables.

When a method returns, a value is returned to the calling method, as shown by the numbers be-
side the dotted arrows. The activation record of the called method is then destroyed because it is
no longer needed.

The activation records of a running program form a stack. A stack is an ordered sequence of
items that supports two operations: push and pop. The push operation puts a new item at the be-
ginning of the sequence and pop discards the first item in the sequence. The stack of activation
records begins with the current activation record. Each call to a method causes the creation of a
new activation record that is pushed onto the stack. Each return from a method causes the current
activation record to be popped from the stack. Therefore, activation records are also known as
stack frames.

Termination and the base case


For recursive code to be correct, the base case of the recursion must eventually be reached on
every chain of recursive calls. Just like in the case of correct loops where we have a decrement-
ing function that gets smaller on every loop iteration, something must get smaller on every recur-
sive call, until the base case is reached.

For example, consider the problem of determining the number of ways to select r items from a
set of n items. We write this as C(n,r), where C stands for either choose or combinations. To
write the code, we break the problem into deciding what to do with the first element of the set,
and then solving the problem recursively for the rest of the set. If we choose the first element of
the set of n elements, then there are C(n−1, r−1) ways to pick the remaining elements we need
from the rest of the set. If we don't choose the first element of the set, then there are C(n−1, r)
ways to pick all the r elements from the rest of the set. Therefore, we have the following equa-
tion:

C(n, r) = C(n−1, r−1) + C(n−1, r)

There are two cases where this equation doesn't hold: when r = 0 and when n = r. In both those
cases, there is only one way to pick the elements, so:

C(n, n) = C(n, 0) = 1

We can view the space of inputs (n,r) as a table, in which the value of each cell other than the top
row and the diagonal is determined by adding the numbers directly above and diagonally to the
left. This gives us Pascal's triangle:

        n:  0   1   2   3   4  ...
  r: 0      1   1   1   1   1
     1          1   2   3   4
     2              1   3   6
     3                  1   4
     4                      1

As a decrementing function, we can use the value min(n−r, r), which measures the closest dis-
tance to one of the two lines of 1's in the table. If this value reaches zero, we are in one of the
two base cases. Each recursive call decreases one of the two quantities n−r and r by 1 and leaves
the other unchanged, so their sum n decreases on every call while both quantities remain non-
negative. Eventually one of them must reach zero, so the base case must be reached along any
chain of recursive calls.
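The equation and its two base cases translate directly into Java (a sketch; the method name choose is our choice):

```java
class Combinations {
    /** Returns: C(n, r), the number of ways to choose r of n items.
     *  Requires: 0 <= r <= n. */
    static long choose(int n, int r) {
        if (r == 0 || r == n) return 1; // base cases: only one way to pick
        // first item either chosen or not chosen:
        return choose(n - 1, r - 1) + choose(n - 1, r);
    }
}
```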
Tail recursion and iteration
Earlier we saw that we could code up binary search as an iterative algorithm. We can also imple-
ment it recursively, in a way that makes it more obvious why it works.

/** Returns: an index i between indices fst and lst, inclusive, where
 *  a[i] == k. Requires: that such an index exists, and that a is
 *  sorted in ascending order.
 */
int search(int[] a, int k, int fst, int lst) {
    if (fst == lst) return fst; // base case
    int m = (fst + lst)/2;
    if (k <= a[m])
        return search(a, k, fst, m);
    else
        return search(a, k, m+1, lst);
}

Why does the algorithm work? First, consider the base case, which is when fst=lst. Since we are
assuming that the element k is between fst and lst, it must be at index lst. Otherwise, the code de-
termines whether the element is to the left of m (inclusive) or to the right (exclusive), ensuring
both that the precondition of the method is satisfied for the recursive call and that the distance
between fst and lst strictly decreases. It can never decrease below zero, so the base case is
reached eventually.

The search() method is recursive because it calls itself. It is a particularly interesting kind of re-
cursive method, in that it calls itself as the very last thing done: the result of the method is the re-
sult of a recursive call. Such a function is said to be tail-recursive.

Tail-recursive methods have an interesting property that they are equivalent to loops. Any tail-re-
cursive method can be converted to a loop and any loop can be converted to a tail-recursive
method. The reason this works is that the activation record for a tail-recursive method is not
needed after the call is made. So the same activation record can be reused for the recursive call.
We wrap the whole method body in a big loop, and instead of passing arguments to the recursive
call, we just reassign the formal parameter variables to the new values that we would have
passed in the recursive call. Here is what the code looks like after this transformation:

int search(int[] a, int k, int fst, int lst) {
    while (true) {
        if (fst == lst) return fst; // base case
        int m = (fst + lst)/2;
        if (k <= a[m])
            lst = m;     // recursive call 1
        else
            fst = m + 1; // recursive call 2
    }
}

Of course, we can make further simplifications such as folding the base case into the loop guard.

In Java, the iterative version is likely to run more efficiently than the recursive version, because
it requires only one activation record. In some languages, such as OCaml, the compiler recogniz-
es tail recursion automatically and generates code that is just as efficient as a loop. Since the
transformation to a loop is straightforward, we can write code using tail recursion when that
helps us to get the algorithm right, and convert it into an efficient loop when necessary for good
performance.

The linked list data structure


We're now going to start talking about data structures, which often involve another kind of recur-
sion: recursion on types. A data structure is simply a graph of objects linked together by refer-
ences (pointers), often in order to store or look up information.

A classic and still very useful example of a data structure is the linked list. A linked list consists
of a sequence of node objects in which each node points to the next node in the list. Some data is
stored along with each node. This data structure can be depicted graphically as shown in this fig-
ure, where the first three prime numbers are stored in the list.

The last node in the list does not point to another node; instead, it refers to the special value null
or, alternatively, to a sentinel object that is used to mark the end of the list.

A linked list node is implemented with code like the following:

class Node {
Object data;
Node next; // may be null
}

Notice that the class Node has an instance variable of the same type. That is, Node is defined in
terms of itself. It is an example of a recursive type. Recursive types are perfectly legal in Java,
and very useful.

The information in the list may be contained inside the nodes of the linked list, in which case the
list is said to be endogenous, or it may merely be referenced by the list node, in which case the
list is exogenous. We will be working with exogenous lists here.

A sentinel object can be used instead of the special value null, avoiding the possibility of a null
pointer exception:

static Node Null = new Node();
static { Null.next = Null; } // static initializer: the sentinel's next refers to itself

The list shown above is considered a singly linked list because each list node has one outgoing
link. It is sometimes helpful to use a doubly linked list instead, in which each node points to
both the previous and next nodes in the list. In a doubly linked list, it is possible to walk in both
directions.

The definition of a doubly linked list node looks something like the following:
class DNode {
    DNode prev, next;
    /* invariant: If next ≠ null, next.prev = this.
       If prev ≠ null, prev.next = this. */
    Object data;
}

Doubly linked lists come in both linear and circular varieties, as illustrated below. In the linear
variety, the first and last nodes have null prev and next pointers, respectively. In the circular vari-
ety, no pointers are null. The head node is distinguished by the fact that a pointer is kept to it
from outside. The class java.util.LinkedList is actually implemented as a circular doubly linked
list.

Iterative list traversals


We can use lists to build many interesting algorithms. Usually we keep some small number of
variables pointing into the list, and follow the pointers between nodes to get around. For exam-
ple, we can write code to check whether a list contains an object equal to a particular object x:

boolean contains(Node n, Object x) {
    while (n != null) {
        if (x.equals(n.data)) return true;
        n = n.next;
    }
    return false;
}

We can also scan over a list accumulating information. For example, we might compute the total
of all the numbers contained in a list of integers:
int total(Node n) {
    int sum = 0;
    while (n != null) {
        sum += (Integer) n.data; // data is declared as Object, so a cast is needed
        n = n.next;
    }
    return sum;
}

Something we often want to do with linked lists is to add a new item into the list. It is easiest to
add at the front of the list because the front of the list is immediately accessible. This is some-
times called the cons operation, as in the Lisp language:

Node cons(Object x, Node n) {
    Node ret = new Node();
    ret.data = x;
    ret.next = n;
    return ret;
}

Notice that the new linked list is constructed without making any changes to the existing nodes.
In a sense, we now have two linked lists that happen to share most of their nodes. As long as we
don't make changes to the nodes of either linked list, this sharing is perfectly okay and something
we can exploit for efficiency.
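The sharing can be seen concretely (a sketch that assumes a Node constructor taking the data and the next node):

```java
class Node {
    Object data;
    Node next;
    Node(Object data, Node next) { this.data = data; this.next = next; }
}

class ConsDemo {
    static Node cons(Object x, Node n) {
        return new Node(x, n); // no existing node is modified
    }
}
```

For example, after tail = cons(3, cons(5, null)), the two lists cons(2, tail) and cons(7, tail) share the nodes of tail: their next fields refer to the very same node object.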

If we want to insert at the end of the list, we have to modify the last node. We can find the last
node by scanning down the list until we find it. Or we can keep track of the last node of the list
explicitly.

/** Update the list starting at n to add a node containing x at the end.
 *  Requires: n is not null.
 */
void append(Object x, Node n) {
    while (n.next != null)
        n = n.next;
    n.next = new Node();
    n.next.data = x;
}

Building abstractions using linked lists


Linked lists are useful data structures, but using them directly can lead to programming errors.
Often their utility comes from using them to implement other abstractions.

Immutable lists
For example, we can use a linked list to efficiently implement an immutable list of objects:

/**
 * An immutable, ordered, finite sequence of objects
 * (a0, a1, ..., an-1), which may be empty.
 */
interface ImmList {
    /** Returns: the first object in the list (a0).
     *  Checks: the list is not empty. */
    Object first();

    /** Returns: a list containing all elements but the first, i.e.,
     *  (a1, ..., an-1).
     *  Checks: the list is not empty. */
    ImmList rest();

    /** Returns whether this is the empty list. */
    boolean empty();

    /** Returns: a list containing the same elements as this,
     *  but with the object x inserted at the beginning. That is,
     *  (x, a0, a1, ..., an-1). */
    ImmList cons(Object x);
}

To implement this interface using a null-terminated list, we will need an additional header ob-
ject so that we can represent empty lists with something other than null. (Notice that we don't
bother to repeat the specifications from the interface. No need!)

class ImmListImpl implements ImmList {
    private Node head; // may be null to represent empty list

    public ImmListImpl() {
        head = null;
    }
    public boolean empty() {
        return (head == null);
    }
    public Object first() {
        assert head != null;
        return head.data;
    }
    public ImmList rest() {
        assert head != null;
        ImmListImpl r = new ImmListImpl();
        r.head = head.next;
        return r;
    }
    public ImmList cons(Object x) {
        ImmListImpl r = new ImmListImpl();
        r.head = new Node(x, head); // assuming appropriate Node constructor
        return r;
    }
}

Notice that this implementation allows different lists to share the same list nodes. This makes op-
erations like cons and rest much more efficient than they otherwise would be; the method rest
runs in constant time rather than needing to copy all the remaining nodes of the list. It is safe to
share list nodes precisely because the list abstraction is immutable, and the underlying list nodes
cannot be accessed by any code outside the ImmListImpl class. Abstraction lets us build more effi-
cient code.

Because ImmList is immutable, it makes sense to have an equals operation that compares all the
corresponding elements:

public boolean equals(Object o) {
    if (!(o instanceof ImmList)) return false;
    ImmList lst = (ImmList) o;
    Node n = head;
    while (n != null) {
        if (lst.empty()) return false;
        if (!n.data.equals(lst.first())) return false;
        n = n.next;
        lst = lst.rest();
    }
    return lst.empty();
}

Mutable lists
The sharing that was possible with immutable lists is necessarily lost when we use linked lists to
implement mutable lists. On the other hand, we can offer a larger set of operations:

/** A mutable ordered list (a0, a1, ..., an-1) */
interface MutList {
    /** The number of objects in the list. */
    int size();

    /** Returns: The object at index i (ai).
     *  Requires: 0 ≤ i < n */
    Object get(int i);

    /** Effects: Inserts x at the head of the list. */
    void prepend(Object x);

    /** Effects: Inserts x at the end of the list. */
    void append(Object x);

    /** Returns: true if x is in the list.
     *  Effects: Removes the first occurrence of object x from the list. */
    boolean remove(Object x);

    /** Returns: true if x is in the list. */
    boolean contains(Object x);

    ...more operations...
}

Again, this abstraction can be implemented using a linked list. A header object is again handy,
especially to keep track of auxiliary information like the number of elements in the list and the
last element of the list.
We didn't put a rest() operation in the interface, because it would have to be an O(n) operation.
If the client really wants to perform that computation, they will probably copy the whole list and
remove the first element of the copy.

Below is the implementation of mutable lists. Notice that prepend and append both can be imple-
mented to take constant time thanks to the last field, which avoids scanning down the whole list
to find the end.

A final important thing we want to be able to do with mutable lists is to remove a node. For dou-
bly linked lists, removing nodes is easy, but it is slightly tricky for singly linked lists. The prob-
lem is that when the node is found, the previous node in the list needs to be updated to point to
the next node. The simple loop we've been using so far will have forgotten what that previous
node is. A second wrinkle is that if the node to be removed is the first node in the list, there is no
previous node to be updated. We can solve this problem by marching two pointers through the
list at the same time. The variable n points to the current node, and p points to the previous
node, or contains null if n is the first node:

class MList implements MutList {
    private Node head;
    // invariant: size is the number of nodes in the list starting with head.
    private int size;
    // The last element in the list. Is null iff head is null.
    private Node last;

    public int size() { return size; }

    public void prepend(Object x) {
        head = new Node(x, head);
        if (last == null) last = head; // the list was empty
        size++; // restore the size invariant
    }
    public void append(Object x) {
        Node n = new Node(x, null);
        if (head == null) head = n;
        else last.next = n;
        last = n; // restore the last invariant
        size++;
    }
    public boolean remove(Object x) {
        Node n = head, p = null;
        while (n != null && !x.equals(n.data)) {
            p = n;
            n = n.next;
        }
        if (n == null) return false;
        size--;
        if (p == null) head = n.next;
        else p.next = n.next; // splice out n
        if (n == last) last = p; // removed the final node
        return true;
    }
}
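A brief exercise of these operations, written as a self-contained sketch (the Node constructor is assumed; note that every operation must maintain the invariants on size and last):

```java
class Node {
    Object data;
    Node next;
    Node(Object data, Node next) { this.data = data; this.next = next; }
}

class MListSketch {
    private Node head, last; // last is null iff head is null
    private int size;        // number of nodes reachable from head

    int size() { return size; }

    void prepend(Object x) {
        head = new Node(x, head);
        if (last == null) last = head; // the list was empty
        size++;
    }

    void append(Object x) {
        Node n = new Node(x, null);
        if (head == null) head = n;
        else last.next = n;
        last = n;
        size++;
    }

    boolean remove(Object x) {
        Node n = head, p = null;
        while (n != null && !x.equals(n.data)) { p = n; n = n.next; }
        if (n == null) return false;
        if (p == null) head = n.next;
        else p.next = n.next;    // splice out n
        if (n == last) last = p; // removed the final node
        size--;
        return true;
    }
}
```

For example, after prepend(1), append(2), append(3) the list is (1, 2, 3); remove(2) leaves (1, 3), and a subsequent append still works because last was kept up to date.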
Abstractions vs. data structures
A key observation is that singly or doubly linked lists are merely data structures rather than
abstractions. As we've seen, and as depicted on the right, we can use these data structures to
implement list abstractions such as immutable lists and mutable lists. As we'll see later on,
linked lists are just one of the ways to implement list abstractions. And we can use these data
structures, in turn, to implement other abstractions.

For example, one useful abstraction we will see over and over again is the stack. A stack is an
ordered list that supports two operations:

void push(Object x): insert the element x at the beginning of the list.

Object pop(): remove and return the first element in the list. Requires that the stack is
non-empty.

The stack abstraction is easily and efficiently implemented using linked lists:

class Stack {
    private Node top;

    void push(Object x) {
        top = new Node(x, top);
    }
    Object pop() {
        Object ret = top.data;
        top = top.next;
        return ret;
    }
}

The key is to keep in mind that data structures are ways to implement abstractions, and using
them through an abstraction barrier is preferable to using the data structure directly. This allows
you to change the data structure without breaking client code.
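For example, pushes and pops behave in last-in, first-out order (a self-contained sketch, with pop returning the removed element as its specification requires):

```java
class Node {
    Object data;
    Node next;
    Node(Object data, Node next) { this.data = data; this.next = next; }
}

class Stack {
    private Node top;

    void push(Object x) { top = new Node(x, top); }

    /** Requires: the stack is non-empty. */
    Object pop() {
        Object ret = top.data;
        top = top.next;
        return ret;
    }
}
```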

Recursion on lists
The tail of a list is another, smaller list. That means we can write recursive algorithms that com-
pute over linked lists. For example, we can write the contains method even more compactly us-
ing recursion:

/** Returns whether x is in the list starting at node n. */
boolean contains(Object x, Node n) {
    if (n == null) return false;
    if (x.equals(n.data)) return true;
    return contains(x, n.next);
}

Many different computations on lists can be written recursively, including the total method we
saw earlier:

/** Returns the total of the data in the list starting at n. */
int total(Node n) {
    if (n == null) return 0;
    return (Integer) n.data + total(n.next); // cast because data is an Object
}

The code is shorter and simpler than the iterative code. However, it's likely to be a little bit slow-
er, at least in Java. The reason is that the recursive version creates one activation record for each
list node, but the iterative version uses only one activation record total.

Tail recursion
Fortunately, many recursive functions can be converted to loops. An example of such a function
is contains, above. The key property of contains is that its result in the recursive case is simply
the result of a recursive call. In general, a method call that produces a result that is immediately
returned (that is, return f(...)) is known as a tail call to f. When a tail call is made, the activa-
tion record of the calling method will never be used again. In the case of a recursive tail call, the
activation record of the caller can be reused for the callee. This is known as tail recursion.

Java doesn't automatically reuse the activation record in the way that some other languages do.
However, we can restructure the code slightly to have the same effect. The trick to reusing the
activation record is to wrap the whole function body in a while loop that allows us to restart the
call. Then, the recursive call is replaced with assignments that set the formal parameters to the
values they would take in the recursive call:

boolean contains(Object x, Node n) {
    while (true) {
        if (n == null) return false;
        if (x.equals(n.data)) return true;
        n = n.next; // x is unchanged in the recursive call
    }
}

As an optimization, we then move the (negated) test n == null into the loop guard, and put the
return false after the loop, to get exactly the iterative code above!

boolean contains(Object x, Node n) {
    while (n != null) {
        if (x.equals(n.data)) return true;
        n = n.next; // x is unchanged in the recursive call
    }
    return false;
}
The moral is that we can write short, clear, recursive code and convert it to an efficient loop
when efficiency is paramount.
Asymptotic Complexity
We write expressions like O(n) and O(n2) to describe the performance of algorithms. This is
called “big-O” notation, and describes performance in a way that is largely independent of the
kind of computer on which we are running our code. This is handy.

The statement that f(n) is O(g(n)) means that g(n) is an upper bound for f(n) within a constant
factor, for large enough n. That is, there exists some k such that f(n) ≤ k g(n) for sufficiently large
n.

For example, the function f(n) = 3n−2 is O(n), because (3n−2) ≤ 3n for all n > 0. That is, the con-
stant k is 3. Similarly, the function f'(n) = 3n + 2 is also O(n). It is bounded above by 4n for any n
larger than 2. This points out that kg(n) doesn't have to be larger than f(n) for all n, just for suffi-
ciently large n. There must be some value n0 such that for all n ≥ n0, kg(n) is larger than f(n).

A perhaps surprising consequence of the definition of O(g(n)) is that both f and f' are also O(n^2),
because the quantity 3n±2 is bounded above by k·n^2 (for example, with k = 1) once n grows large
enough. This illustrates that big-O notation only establishes an upper bound on how the function
grows.

A function that is O(n) is said to be asymptotically linear and a function that is O(1) is said to be
constant-time because it is always less than some constant k. A function that is O(n^2) is called
quadratic, and a function that is O(n^y) for some positive integer y is said to be polynomial.

Reasoning with asymptotic complexity


Notice that an expression like O(g(n)) is not a function. It really describes a set of functions: all
functions for which the appropriate constant factor k can be found. Viewed as sets, for example,
this means that O(10) = O(1) and O(n+1) = O(n). Sometimes people write “equations” like 5n+1
= O(n) that are not really equations. What is meant is that 5n + 1 is in the set O(n). Similarly, we
write things like O(n) + O(n2) = O(n2) to mean that the sum of any two functions that are respec-
tively asymptotically linear and asymptotically quadratic is asymptotically quadratic.

It helps to have some rules for reasoning about asymptotic complexity. Suppose f and g are both
functions of n, and c is an arbitrary constant. Then using the shorthand notation of the previous
paragraph, the following rules hold:

c = O(1)
O(c·f) = c·O(f) = O(f)
c·n^m = O(n^k) if m ≤ k
O(f) + O(g) = O(f + g)
O(f)·O(g) = O(f·g)
log_c n = O(log n)

However, we might expect that O(k^n) = O(k'^n) when k≠k', but this is not true: exponential
functions with different bases are not within a constant factor of each other for large n.
Deriving asymptotic complexity
Together, the constants k and n0 form a witness to the asymptotic complexity of the function. To
show that a function has a particular asymptotic complexity, the direct way is to produce the
necessary witness. For the example of the function f'(n) = 3n + 2, one witness is, as we saw above,
the pair (k=4, n0=2). Witnesses are not unique. If (k, n0) is a witness, so is (k', n'0) whenever k'≥k
and n'0≥n0.
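A claimed witness can be spot-checked numerically; checking finitely many n is evidence rather than a proof, but it quickly catches a wrong constant:

```java
class WitnessCheck {
    // Claim: f'(n) = 3n + 2 is O(n) with witness (k, n0).
    // Checks 3n + 2 <= k*n for every n from n0 up to upTo.
    static boolean witnessHolds(long k, long n0, long upTo) {
        for (long n = n0; n <= upTo; n++) {
            if (3 * n + 2 > k * n) return false;
        }
        return true;
    }
}
```

The witness (k=4, n0=2) passes this check, while k=3 fails immediately, since 3n + 2 > 3n for every n.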

Often, a simple way to show asymptotic complexity is to use the limit of the ratio f(n)/g(n) as n
goes to infinity. If this ratio has a finite limit, then f(n) is O(g(n)). On the other hand, if the ratio
limits to infinity, f(n) is not O(g(n)). (Both of these shortcuts can be proved using the definition
of limits.)

To evaluate the limit of f(n)/g(n), L'Hôpital's rule often comes in handy. When both f(n) and g(n)
go to infinity as n goes to infinity, the ratio of the two functions f(n)/g(n) limits to the same value
as the limit of their derivatives: f'(n)/g'(n).

For example, lg n is O(n) because lim_{n→∞} (lg n)/n = lim_{n→∞} (1/n)/1 = 0. In turn, this means
that lg^k n is O(n) for any k, because the derivative of lg^k n is (k/n)·lg^(k−1) n. Since lg n is O(n),
so is lg^2 n, and therefore lg^3 n, and so on for any positive k. (This is an argument by induction.)
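The limit argument can be illustrated numerically by evaluating the ratio (lg n)/n at powers of two, where lg n is exact:

```java
class LogRatio {
    // Returns (lg n)/n for n = 2^k, where lg n = k exactly.
    static double ratio(int k) {
        return (double) k / (1L << k);
    }
}
```

The ratio shrinks toward zero as k grows, which is numeric evidence for the limit, not a proof.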
Generics and More Lists
Generics are not an essentially object-oriented idea, and they were not originally part of Ja-
va. We already saw that object-oriented languages support subtype polymorphism; generics give
us a different kind of polymorphism, parametric polymorphism. Recall that polymorphism
means that something can be used at multiple types. Subtyping allows client code to be written
with respect to a single type, while interoperating with multiple implementations of that type.
Parametric polymorphism, by contrast, allows an implementation to be written with respect to a
single type, but used by differently typed clients.

Application: Collections
Java comes with a library of different collection abstractions and implementations for collections
of information. The biggest reason why generics were added to Java was to make the Collections
Framework more effective. In Java 1.4, the interface Collection looked like the following (only
some key methods are shown):

/** A mutable collection. */
interface Collection {
    /** Return whether object o is in the collection. */
    boolean contains(Object o);
    /** Add object o to the collection. Return true if this changes
     *  the state of the collection.
     */
    boolean add(Object o);
    /** Remove object o from the collection. Return true if this changes
     *  the state of the collection.
     */
    boolean remove(Object o);
    ...
}

All the collection knows about its contained elements is that they are objects. This loss of infor-
mation leads to programmer errors and makes code more awkward. Here is an example:

Collection c = ...;
c.add(2); // no check that we are inserting the right kind of object
...
for (Object o : c) {
    Integer i = (Integer) o;
    // use i here
}

Here, we expect c to be a collection of Integers, but we have to use a downcast to use the ele-
ments of the collections as the type we expect. The downcast is not only awkward and verbose,
but it might fail at run time, because there is nothing about the collection that prevents us from
accidentally putting something into it of the wrong type.
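The failure mode can be demonstrated with a raw (pre-generics) collection: the mistaken add compiles without complaint, and the error surfaces only later, at the cast:

```java
import java.util.ArrayList;
import java.util.Collection;

class RawCollectionDemo {
    @SuppressWarnings({"unchecked", "rawtypes"})
    static boolean castFails() {
        Collection c = new ArrayList();
        c.add("not an Integer"); // compiles: the collection only knows Object
        try {
            Integer i = (Integer) c.iterator().next(); // fails at run time
            return false;
        } catch (ClassCastException e) {
            return true; // the error shows up far from the mistaken add
        }
    }
}
```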

Notice that the same problem doesn't happen when using an array:
Integer[] c = new Integer[10];
c[0] = 2; // statically checked
...
for (Integer i : c) {
    // use i here
}

The key is that array is a parameterized type. We can think of the type Integer[] as the applica-
tion of a type-level function (call it array) to the type parameter Integer. The idea of generics is
to allow user-defined parameterized types.

Type parameterization
Generics allow programmers to define their own parameterized types. For example, we can
make Collection become a parameterized type that can be applied to an arbitrary type T using the
“angle bracket” syntax: Collection<T>. Thus, the type Collection<Integer> is a collection of
Integers, the type Collection<String> is a collection of Strings, and the type
Collection<Collection<String>> is a collection of collections of strings.

A parameterized type is declared by giving it a formal type parameter that can then be used as
a type inside the type's definition—for example, in method signatures:

interface Collection<T> {
    boolean contains(T x);
    boolean remove(T x);
    boolean add(T x);
}

Inside the definition, the type parameter T stands for whatever actual type the client chooses to
apply it to. A type like Collection<String> is called an instantiation of the parameterized type
Collection on the type argument String. The signatures of the methods of Collection<String> are
exactly the signatures appearing in the declaration of Collection, except that every occurrence of T
is replaced with String. For example, the add method of Collection<String> behaves exactly as if
its signature were boolean add(String x).

Now, the compiler can tell when we are trying to add an element of the wrong type, and we don't
have to worry about getting the wrong type of element out of the collection:

Collection<String> c = ...;
c.add("hi"); // checked!
c.add(2); // illegal: static error
for (String s : c) {
    // use s
}

Implementing generics
Parametric polymorphism also helps us when we are implementing abstractions. Let's consider
implementing the Collection interface using a linked list. First, we will want linked list nodes
that can contain data of an arbitrary type:
class Node<T> {
    T data;
    Node<T> next;

    Node(T d, Node<T> n) {
        data = d;
        next = n;
    }
}

We can't use the Node class to implement the Collection interface directly, because an empty list is
represented as a null, which we can't invoke methods on. Therefore, to implement the Collection
interface, we use an additional header object to point to the rest of the list. The header object is
implemented by the LinkedList class:

/** A mutable list. */
class LList<T> implements Collection<T> {
    int size = 0;
    Node<T> head = null; // may be null

    public boolean contains(T x) {
        Node<T> n = head;
        while (n != null) {
            if (x.equals(n.data)) return true;
            n = n.next;
        }
        return false;
    }
    public boolean add(T x) {
        head = new Node<T>(x, head);
        size++;
        return true;
    }
    public boolean remove(T x) {
        Node<T> n = head, p = null;
        while (n != null && !x.equals(n.data)) {
            p = n;
            n = n.next;
        }
        if (n == null) return false;
        size--;
        if (p == null) head = n.next;
        else p.next = n.next; // splice out n
        return true;
    }
    // ... more methods ...
}

Generic methods
So far we've seen that classes and interfaces can be parameterized. We can also give methods their
own type parameters. For example, suppose that some non-generic code outside the implementa-
tion of linked lists or collections needs to be able to print out collections regardless of what kind
of element is in the collection. We can define a generic method to accomplish this:
<T> void print(Collection<T> c) {
    for (T x : c) {
        println("value: " + x);
    }
}

Collection<Integer> c = ...;
print(c); // equivalent to this.<Integer>print(c);

Notice that a call to the print method does not need to specify the actual type parameter Integer.
The compiler is able to infer the missing parameter automatically. It is also possible to supply
type parameters to generic method calls explicitly, by putting the type parameter in angle brack-
ets after the dot.

Subtyping
Like other implements declarations, the declaration above that LList<T> implements Collection<T>
generates a subtype relationship: in fact, a family of subtype relationships, because the subtype
relationship holds regardless of what actual type T is chosen. The compiler therefore understands
that the relationship LList<String> <: Collection<String> holds. What about these other possible
relationships?

LList<String> <: LList<Object> ?

LList<String> <: Collection<Object> ?

Both of these sound reasonable at first glance. But they are actually unsound, leading to possible
run-time type errors. The following example shows the problem:

LList<String> ls = new LList<String>();
LList<Object> lo = ls;
lo.add(2112);
String s = ls.last(); // extract last element from list

The last element of the list, which is assigned to a variable of type String, is actually an Integer!

The idea that there can be a subtyping relationship between different instantiations of the same generic type is called variance. Variance is tricky to support in a sound way, so Java generic types are invariant: no subtype relationship holds between different instantiations. Other languages, such as Scala, do support variance.

Wildcards
To make up for the lack of variance, Java has a feature called wildcards, in which question
marks are used as type arguments. The type LList<?> represents an object that is an LList<T> for
some type T, though precisely which type T is not known at compile time (or, in fact, even at run
time).

A value of type LList<T> (for any T) can be used as if it had type LList<?>, so there is a family of
subtyping relationships LList<T> <: LList<?>. This means that a method can provide a caller with a
list of any type without the client knowing what is really stored in the list; the client can get ele-
ments from the list but cannot change the list:
LList<?> f() {
    LList<Integer> i = new LList<Integer>();
    i.add(2);
    i.add(3);
    i.add(5);
    return i;
}

// in caller
LList<?> lst = f();
lst.add(7); // illegal: type ? not known
for (Object o : lst) {
    println(o);
}

Notice that the type of the elements iterated over is not really known, either, but we do at least
know that the type hidden by ? is a subtype of Object. So it is type-safe to declare the variable o
as an Object.

If we need to know more about the type hidden by the question mark, it is possible to add an
extends clause. For example, suppose we have an interface Animal with two implementing classes
Elephant and Rhino. Then the type Collection<? extends Animal> is a supertype of both
Collection<Elephant> and Collection<Rhino>, and we can iterate over the collection and extract
Animals rather than just Objects.

Collection<? extends Animal> c = new LList<Rhino>();
for (Animal a : c) {
    // use a as Animal here
}

Limitations
The way generics are actually implemented in Java is that all actual type parameters are erased
at run time. This implementation choice leads to a number of limitations on what can be done in a generic context where T is a formal type parameter:

1. Constructors of T cannot be used; we cannot write new T(). The workaround for this
limitation is to have an object with a factory method for creating T objects.
2. Arrays with T as elements cannot be created, either. We cannot write new T[n], because the
type T is not known at run time and so the type T[] cannot be installed into the object's
header. The workaround for this limitation is to use an array of type Object[] instead:
T[] a = (T[]) new Object[n];

This of course creates an array that could in principle be used to store things other than T's,
but as long as we use that array through the variable a, we won't. The compiler gives us an
alarming warning when we use this trick because of the unsafe cast, but this programming
idiom is fairly safe.
3. We can't use instanceof to find out what type parameters are, because the object does not
contain that information. If, for example, we create an LList<String> object, the object's
header word only records that it is an LList. So an LList<String> object that is statically
typed as an Object can be tested to see if it is some kind of LList, but not whether the actual
type parameter is String:
Object co = new LList<String>();

if (co instanceof LList<String>) ... // illegal
if (co instanceof LList<?>) ... // legal
if (co instanceof LList) ... // legal but discouraged

LList<String> ls = (LList<String>) co; // legal but only partly checked
LList<?> ls = (LList<?>) co; // legal
LList<String> ls = (LList<?>) co; // illegal
LList<String> ls = (LList) co; // legal but discouraged

The last four lines above illustrate how downcasts interoperate with generics. Code can cast
to a type with an actual type parameter, but the type parameter is not actually checked at run
time; Java takes the programmer's word that the type parameter is correct. We can cast to a
wildcard instantiation, but such a cast is not very useful if we need to use the elements at
their actual type. Finally, we can cast to the raw type LList; casting to raw types is unsafe. It
is essentially the same as casting to LList<?> except that Java allows a raw type to be used as
if it were any particular instantiation. Raw types should be avoided when possible.
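As an illustration of workaround 2 above, here is a minimal sketch of a growable array class (a hypothetical DynArray, not part of the Collections Framework) that stores its elements in an Object[] cast to T[]:

```java
/** A growable array of T, illustrating the Object[] workaround
 *  for the ban on "new T[n]". A sketch, not a production class. */
class DynArray<T> {
    private T[] elems;
    private int size = 0;

    @SuppressWarnings("unchecked")
    DynArray(int capacity) {
        // new T[capacity] is illegal, so we allocate an Object[] and cast.
        elems = (T[]) new Object[capacity];
    }

    void add(T x) {
        if (size == elems.length) {
            @SuppressWarnings("unchecked")
            T[] bigger = (T[]) new Object[2 * elems.length];
            System.arraycopy(elems, 0, bigger, 0, size);
            elems = bigger;
        }
        elems[size++] = x;
    }

    T get(int i) { return elems[i]; }
    int size() { return size; }
}
```

As long as the array is only used through the field elems, nothing but T's is ever stored in it, so the unchecked cast is safe in practice.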

Accessing type operations


What if we want to use methods of T in a generic context where T is a formal parameter? There
is more than one way to do this, but in Java the most powerful approach is to provide a separate
model object that knows how to perform the operations that are needed. For example, suppose
we want to compare objects of type T using the compareTo method. We declare a generic interface
Comparator<T>:

interface Comparator<T> {
    /** Compare x and y. Return 0 if x and y are equal, a negative number if x < y,
     * and a positive number if x > y.
     */
    int compareTo(T x, T y);
}

Now, a generic method for sorting an array takes an extra comparator parameter:

/** Sort the array a in ascending order using cmp to define the ordering on the
 * elements. */
<T> void sort(T[] a, Comparator<T> cmp) {
    ...
    if (cmp.compareTo(a[i], a[j]) > 0) {
        ...
    }
    ...
}
A class can then implement the comparator interface and be used to make the right comparator
operation available to the generic code.

class SCmp implements Comparator<String> {
    @Override
    public int compareTo(String x, String y) {
        return x.compareTo(y);
    }
}

String[] a = {"z", "Y", "x"};
sort(a, new SCmp());

Notice that here we are using String's own compareTo operation as a model for the comparator, but
we don't have to. For example, we could have used the compareToIgnoreCase method to sort strings
while ignoring the difference between upper and lower case. It turns out that we can also use Ja-
va's new “lambda expressions” to implement the interface even more compactly. Here is how we
would sort the array using a lambda expression while also ignoring case:

sort(a, (x,y) -> x.compareToIgnoreCase(y));

The lambda expression (x,y) -> x.compareToIgnoreCase(y) is actually just a very convenient syn-
tactic sugar for declaring a class like the one above and instantiating it with new.

Generic classes may need to access parameter type operations too. The typical approach is to ac-
cept the model object in constructors and to then store it in an instance variable for later use by
other methods:

class SortedList<T> implements Collection<T> {
    Comparator<T> comparator;

    SortedList(Comparator<T> cmp) {
        comparator = cmp; // save model object
        ...
    }

    public boolean add(T x) {
        ...
        if (comparator.compareTo(x, y) < 0) { // use model object
            ...
        }
        ...
    }
}
Trees
Trees are a very useful class of data structures. Like (singly-)linked lists, trees are acyclic graphs
of node objects in which each node may have some attached information. Whereas linked list
nodes have zero or one successor nodes, tree nodes may have more. Having multiple successor
nodes (called children) makes trees more versatile as a way to represent information and often
more efficient as a way to find information within the data structure.

Trees are recursive data structures. A non-empty tree has a single root node that is the starting
point for walking the graph of tree nodes; all nodes are reachable from the root. All tree nodes
except the root have exactly one predecessor node, called its parent. We often draw the root with
an incoming arrow to distinguish it. It is convenient to draw trees as growing downward:

Because a tree is a recursive data structure, each child of a node in the tree is the root of a subtree.
For example, in the following tree, node G has two children B and H, each of which is the root of
a subtree. This tree is a binary tree in which each node has up to two children. We say that a bi-
nary tree has a branching factor of 2. In a binary tree, the children are usually ordered, so B is
the left child and root of the left subtree, and H is the right child and root of the right subtree.
Nodes that have no children are called leaves.
Each node in a tree has a height and a depth. The depth is the length of the path from the root.
The height is the length of the path from the node to the deepest leaf reachable from it. The
height of a tree is the height of its root node, or the depth of the deepest leaf.

Conventionally we represent a binary tree using a class with instance variables for the left and
right children:

class BinaryNode<T> {
T data;
BinaryNode<T> left, right; // may be null
}
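The height definition above translates directly into code. Here is a sketch of a recursive helper (a hypothetical static method, not from the notes); giving the empty tree (null) height −1 makes the recursion uniform, since a leaf then gets height 0:

```java
class BinaryNode<T> {
    T data;
    BinaryNode<T> left, right; // may be null

    /** Height of the tree rooted at n; the empty tree has height -1,
     *  so a leaf has height 0. */
    static <T> int height(BinaryNode<T> n) {
        if (n == null) return -1;
        return 1 + Math.max(height(n.left), height(n.right));
    }
}
```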

For trees with a larger branching factor, the children may be stored in another data structure such
as an array or linked list:

class NAryNode<T> {
T data;
NAryNode<T>[] children;
}

class NAryNode<T> {
T data;
LinkedList<NAryNode<T>> children;
}

Analogously to doubly-linked lists, tree nodes in some tree data structures maintain a pointer to
their parent node (if any). A parent pointer can be useful for walking upward in a tree, though it
takes up extra memory and creates additional maintenance requirements.

Why trees?
There are two main reasons why tree data structures are important:

• First, some information has a naturally tree-like structure, so storing it in trees makes
sense. Examples of such information include parse trees, inheritance and subtyping hierar-
chies, game search trees, and decision trees.

• Second, trees make it possible to find information within the data structure relatively
quickly. If a significant fraction of the nodes have more than one child, it can be arranged
that all nodes are fairly close to the root. For simplicity, let's think about a full tree in
which all leaves are at equal depth and all non-leaves have two children. In a full binary
tree of depth h, there are 1 + 2 + 4 + ... + 2^h = 2^(h+1) − 1 nodes. Calling this number of nodes n, we have h = lg(n+1) − 1, so h is O(lg n). (Recall that lg x is the logarithm of x base 2.)
Thus, if we are looking for information in a full binary tree, it can be reached along a path
whose length is logarithmic in the total information stored in the tree.

For large n, logarithmic time is a big speedup over the linear-time performance offered by
data structures like linked lists and arrays. For n = 1,000,000, lg n ≈ 20, a speedup of 50,000 when constant factors are ignored. For n = 1,000,000,000, lg n ≈ 30, a speedup of more than 30,000,000.

Binary search trees


Of course, the preceding analysis relies on us knowing which path to take through the tree to find
information. This can be arranged by organizing the tree as a binary search tree.

A binary search tree is a binary tree that satisfies a data structure invariant: for each node in the
tree, all elements in the node's left subtree are less than the element at the node, and all elements
in the node's right subtree are greater than the node's element. Pictorially, we can visualize the
tree relative to a given node as follows:

We can express this data structure invariant as a class invariant:

class BinaryNode<T extends Comparable<T>> {
    T data;
    /** Invariant: left and right are the roots of subtrees,
     * where all elements in the left subtree are less than data,
     * and all elements in the right subtree are greater. The value null
     * represents an empty subtree.
     */
    BinaryNode<T> left, right;
}

interface Comparable<T> {
    /** Return 0 if this == y, a positive number if this > y,
     * and a negative number if this < y.
     */
    int compareTo(T y);
}

Since the invariant is defined on a recursive type, it applies recursively throughout the tree, en-
suring that data structure invariant holds for every node.

For the invariant to make sense, we must be able to compare two elements to see which is greater
(in some ordering). The ordering is specified by the operation compareTo(). One way to ensure
that the type T has such an operation is to specify in the class declaration that T extends
Comparable<T>, where Comparable is the generic interface shown above. The keyword extends
merely signifies that T is a subtype of Comparable<T>; it is perfectly sufficient for T to be a class
that “implements” the interface. The compiler will prevent us from instantiating the class
BinaryNode on any type T that is not a declared subtype of Comparable<T>, and therefore the code of
BinaryNode can assume that T has the compareTo() method.

Searching in a binary search tree


Now, consider what happens when we try to find a path down through the tree looking for an ar-
bitrary element x. We compare x to the root data value. If x is equal to the data value, then we've
already found it. If x is less than the data value and it's in the tree, it must be in the left subtree,
so we can walk down to the left subtree and look for the element there. Conversely, if x is greater
than the data value and it's in the tree, it must be in the right subtree, so we should look for it
there. In either case, if the subtree where x must be is empty (null), the element must not be in
the tree. This algorithm can be expressed compactly as a (tail-)recursive method on BinaryNode:

boolean contains(T x) {
    int c = x.compareTo(data);
    if (c == 0) return true;
    BinaryNode<T> child = (c < 0) ? left : right;
    if (child == null) return false;
    return child.contains(x);
}

We've used Java's “ternary expression” here to make the code more compact (and to show off an-
other coding idiom!) The expression b ? e1 : e2 is a conditional expression whose value is the
result of either e1 or e2, depending on whether the boolean expression b evaluates to true or
false.

Since the method is tail-recursive, we can also write it as a loop. Here is a version where the root
of the tree is passed in explicitly as a parameter n:

static <T extends Comparable<T>> boolean contains(T x, BinaryNode<T> n) {
    while (n != null) {
        int c = x.compareTo(n.data);
        if (c == 0) return true;
        n = (c < 0) ? n.left : n.right;
    }
    return false;
}

Adding an element to a binary search tree


To add an element so we can find it later, it has to be added along the search path that will be used. Therefore, we add an element by searching for it. If found, it need not be added; if not found, it is added as a child of the leaf node reached along the search path. Again, this can be written easily as a tail-recursive method:

/** Add element x to the binary search tree, unless it is already there.
 * Return whether the element was added.
 */
boolean add(T x) {
    int c = x.compareTo(data);
    if (c == 0) return false;
    if (c < 0) {
        if (left != null) return left.add(x);
        left = new BinaryNode<T>(x);
    } else {
        if (right != null) return right.add(x);
        right = new BinaryNode<T>(x);
    }
    return true;
}

To see how this algorithm works, consider adding the element 3 to the tree shown with the black arrows in the following diagram. We start at the root (2) and go to the right (5) because 3 > 2. In the recursive call we then go to the left (3 < 5) to node 4. Since 3 < 4, we try to go to the left but observe left == null, and therefore create a left child containing 3, shown by the gray arrow.

Using trees to implement maps


A map abstraction lets us associate values with keys, and look up the value corresponding to a
given key. This can be implemented using a tree by using the elements as keys, but adding the
associated value to the same tree node as its key. When the key is found, so is the associated val-
ue. An alternate way to view this implementation is that each element stored in the tree is really a
pair of a key and a value, where two elements are ordered according to their keys alone.
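This view can be sketched with a small entry class whose ordering is determined by the key alone (a hypothetical Entry class, shown here under the assumption that keys are Comparable):

```java
/** A key-value pair that compares by key only, so that a BST of
 *  entries behaves as a map. A sketch, not the notes' implementation. */
class Entry<K extends Comparable<K>, V> implements Comparable<Entry<K, V>> {
    K key;
    V value;

    Entry(K k, V v) { key = k; value = v; }

    @Override
    public int compareTo(Entry<K, V> other) {
        return key.compareTo(other.key); // the value is ignored entirely
    }
}
```

Searching the tree for an entry whose key matches then finds the stored value as a side effect.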

Supporting duplicate elements


For some applications, it may be useful to store multiple elements that are considered equal to
each other. Suppose elements are key–value pairs, but we want to allow a key to be associated
with multiple values. To allow equal elements to be stored in the same tree, we need to relax the
BST invariant slightly. Given a node containing value x, we must know whether to go to the left
or right to find the other occurrences of x. To build a tree where we go left, we relax the BST in-
variant so that the left subtree contains elements less than or equal to x, whereas the right subtree
contains elements strictly greater than x.

N-ary search trees


It is possible to define search trees with more than two children per node. The higher branching factor means paths through the tree are shorter, but it considerably complicates all of the algorithms involved. B-trees are an example of an N-ary search tree structure. In an N-ary search tree, each node contains up to N−1 elements e0..eN−2 and has up to N children c0..cN−1, arranged so that the subtrees of the children contain only elements between successive elements at the node. If a node has n children, the node contains n−1 elements obeying the following invariant:

c0 < e0 < c1 < e1 < c2 < e2 < ... < en−2 < cn−1

Given an element to be searched for, we do a binary search on the elements ei and, if it is not found, the invariant indicates the appropriate child subtree to search.
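The search just described can be sketched as follows (a hypothetical node class, using a linear scan in place of binary search for brevity):

```java
import java.util.List;

/** A node of an N-ary search tree: elems holds the node's n-1 sorted
 *  elements and children holds its n subtrees (null entries represent
 *  empty subtrees). A sketch, not a full B-tree implementation. */
class NNode<T extends Comparable<T>> {
    List<T> elems;
    List<NNode<T>> children;

    boolean contains(T x) {
        int i = 0;
        // Find the first element e_i >= x (binary search would also work).
        while (i < elems.size() && x.compareTo(elems.get(i)) > 0) i++;
        if (i < elems.size() && x.compareTo(elems.get(i)) == 0) return true;
        // By the invariant, x can only be in child c_i.
        NNode<T> child = children.get(i);
        return child != null && child.contains(x);
    }
}
```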

Parent pointers
Thus far we have only had pointers going downward in the tree. It is sometimes handy to have
pointers going from nodes to their parents, much as nodes in a doubly linked list contain pointers
to their predecessor nodes.

Removing elements from a binary search tree


Removing elements from a tree is generally more complicated than adding them, because the el-
ements to be removed are not necessarily leaves. The algorithm starts by first finding the node
containing the value x to be removed, and its parent node p. There are three cases to consider:

1. Node x is a leaf.
In this case, we can prune the node x from the tree by setting the pointer to it from p to null.
The other subtree of p (shown as a white triangle, but it may be empty) is unchanged.

2. Node x has one child.


We splice out node x from the tree by redirecting the pointer from p to x to now point to the
single child of x. Since the BST invariant guarantees that A < x < p < B, splicing out x pre-
serves the BST invariant.

3. Node x has two children.


In this case it's not easy to remove node x. Instead, we replace the data element in the node with either the immediately next element in the tree or the immediately previous element, depending on the implementation. Suppose, without loss of generality, that we always use the immediately next element x'. Since it is the immediately next element, its node can't possibly have a left child. Therefore, we either prune the x' node (if it is a leaf) or splice it out (if it is not), and then overwrite the data in the x node with x'.

To find the element x', we start by walking down the tree one step to the right, since the ele-
ment x' must be in the right subtree of x. The element x' must be the smallest element in the
right subtree. We find the smallest element in the subtree by simply walking down to the left
as far as possible. Note that this smallest element may have a right child; when the x' node is spliced out, that child is reattached to the x' node's parent. One interesting case in which that happens is when the right child of x has no left child, so that x' is the right child of x itself.
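The three cases can be combined into one recursive method that returns the (possibly new) root of the subtree it was given. The sketch below is not the notes' implementation; it uses int elements rather than a type parameter for brevity, and it handles case 3 by copying in the in-order successor's data and then removing the successor node:

```java
class TNode {
    int data;
    TNode left, right;
    TNode(int d) { data = d; }

    /** Remove x from the subtree rooted at n; return the new subtree root. */
    static TNode remove(TNode n, int x) {
        if (n == null) return null;               // x was not in the tree
        if (x < n.data) n.left = remove(n.left, x);
        else if (x > n.data) n.right = remove(n.right, x);
        else if (n.left == null) return n.right;  // cases 1 and 2: splice out n
        else if (n.right == null) return n.left;
        else {
            // Case 3: overwrite n's data with its in-order successor's data,
            // then remove the successor node from the right subtree.
            TNode s = n.right;
            while (s.left != null) s = s.left;
            n.data = s.data;
            n.right = remove(n.right, s.data);
        }
        return n;
    }

    /** Add x to the subtree rooted at n; return the new subtree root. */
    static TNode add(TNode n, int x) {
        if (n == null) return new TNode(x);
        if (x < n.data) n.left = add(n.left, x);
        else if (x > n.data) n.right = add(n.right, x);
        return n;
    }

    static boolean contains(TNode n, int x) {
        if (n == null) return false;
        if (x == n.data) return true;
        return contains(x < n.data ? n.left : n.right, x);
    }
}
```

Returning the new subtree root lets the caller's assignment (n.left = ... or n.right = ...) do the pruning and splicing without an explicit parent pointer.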

Asymptotic performance of binary search trees


The time required to search for elements in a search tree is in the worst case proportional to the
longest path from the root of the tree to a leaf, or the height of the tree, h. Therefore, tree opera-
tions take O(h) time.

With the simple binary search tree implementation we've seen so far, the worst performance is
seen when elements are inserted in order, e.g.: add(1), add(2), add(3), ... add(n). The resulting bi-
nary tree will only have right children, and will be functionally identical to a linked list!

For this tree, h is O(n), and tree operations are O(n). Our goal is logarithmic performance, which
requires h is O(lg n). A tree in which h is O(lg n) is said to be balanced. Many techniques exist
for obtaining balanced trees.

One simple-minded way to balance trees is to insert elements into the tree in a random order.
This turns out to result in a tree whose expected height is O(lg n) (Proving this is outside the
scope of this course, but see CLRS, Chapter 12.4 for a proof). However, we need to know how to
shuffle a collection of elements into a random order, which involves a small digression:
How to place a sequence of elements in random order
The Fisher–Yates algorithm (developed in 1938!) places N elements into random order. Re-
call that there are N! possible permutations of N elements; a perfectly random shuffle should
have equal probability 1/N! of producing any given permutation. The algorithm works as
follows. Assume we have the N elements in an array. We iterate from index i = N−1 down to i = 1, deciding which element to place at index i. At each step we randomly choose one of the elements at indices 0 through i and swap it with the element at index i.

static <T> void shuffle(T[] a) {
    int N = a.length;
    Random r = new Random();
    for (int i = N-1; i > 0; i--) {
        int j = r.nextInt(i+1); // uniform over 0..i
        // swap a[j] and a[i]
        T temp = a[j];
        a[j] = a[i];
        a[i] = temp;
    }
}

The first iteration generates one of N possible values, the second iteration one of N-1 possi-
ble values, and so on until the final iteration generates one of two possible values. There-
fore, the total number of possible ways to execute is N×(N-1)×(N-2)×...×2 = N!. Further-
more, given a particular permutation, there is exactly one way for the algorithm to produce
it. Therefore, all permutations are produced with equal probability, assuming the random
number generator is truly random.

Traversing trees
Given a tree containing some number of elements, it is sometimes useful to traverse the tree,
visiting each element and doing something with it, such as printing it out or adding it to another
collection.

The most common traversal strategy is in-order traversal, in which each element is visited between the elements of its subtrees. In-order traversal can be expressed easily using recursion:

/** Apply the method visit() to every node in the tree, using an in-order traversal. */
void traverse() {
    if (left != null) left.traverse();
    visit(data);
    if (right != null) right.traverse();
}

For example, consider using this algorithm on the following search tree:
The elements will be visited in the order 1, 3, 5, 10, 11, 17.

The traversal is not tail-recursive and therefore cannot easily be converted into a loop, but this is not a problem unless the tree is very deep. If iterative traversal is required, it can be done if nodes contain parent pointers.
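With parent pointers, iterative traversal works by repeatedly computing each node's in-order successor: if the node has a right subtree, the successor is the leftmost node in that subtree; otherwise we walk up until we arrive at a parent from its left child. A sketch (using a hypothetical node class with a parent field and int data, not the notes' BinaryNode):

```java
class PNode {
    int data;
    PNode left, right, parent;
    PNode(int d) { data = d; }

    /** The leftmost (smallest) node in the subtree rooted at n. */
    static PNode leftmost(PNode n) {
        while (n.left != null) n = n.left;
        return n;
    }

    /** The in-order successor of n, or null if n is the last node. */
    static PNode successor(PNode n) {
        if (n.right != null) return leftmost(n.right);
        // Walk up until we come from a left child; that parent is next.
        PNode p = n.parent;
        while (p != null && n == p.right) { n = p; p = p.parent; }
        return p;
    }
}
```

Starting at leftmost(root) and following successor() until it returns null visits every element in sorted order, using no recursion and no explicit stack.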

Notice that an in-order traversal of a binary search tree visits the element in sorted order. This
observation gives us, in fact, an asymptotically efficient sorting algorithm. Given a collection of
elements to sort, we add them into a BST, taking O(hn) time. If the elements are first shuffled in-
to a random order, h is O(lg n) with high probability, so adding all the elements takes O(n lg n)
time. Then, we can use an in-order traversal, taking O(n) time, to extract all the elements out in
order. While no general sorting algorithm is more efficient asymptotically than O(n lg n), we will
later see some other sorting algorithms that are just as asymptotically efficient but have lower
constant factors.

Other traversals can be done. By moving the visiting of the element to before or after the descendants, we arrive at preorder and postorder traversals, respectively:

/** Apply the method visit() to every node in the tree, using a preorder traversal. */
void traverse() {
    visit(data);
    if (left != null) left.traverse();
    if (right != null) right.traverse();
}

/** Apply the method visit() to every node in the tree, using a postorder traversal. */
void traverse() {
    if (left != null) left.traverse();
    if (right != null) right.traverse();
    visit(data);
}

A preorder traversal of the tree above visits the nodes in the order 5, 3, 1, 11, 10, 17: a node is
visited before all of its descendants. A postorder traversal visits a node after all of its descend-
ants; for this example, in the order 1, 3, 10, 17, 11, 5.

References
Cormen, Leiserson, Rivest, Stein. Introduction to Algorithms.
Hash tables
Suppose we want a data structure to implement either a mutable set of elements (with operations
like contains, add, and remove that take an element as an argument) or a mutable map from keys
to values (with operations like get, put, and remove that take a key as an argument). A mutable
map is also known as an associative array. We've now seen a few data structures that could be
used for both of these implementation tasks.

We consider the problem of implementing sets and maps together because most data structures
that can implement a set can also implement a map. A set of key–value pairs can act as a map, as
long as the way we compare key–value pairs is to compare only the keys. Alternatively, we can
view the transformation from a set to a map as starting with a data structure that implements set
of keys and then adding an associated value to each data structure node that stores a key.

Here are the data structures we've seen so far, with the asymptotic complexities for each of their
key operations:

Data structure   lookup (contains/get)   add/put   remove

Array            O(n)                    O(1)      O(n)
Sorted array     O(lg n)                 O(n)      O(n)
Linked list      O(n)                    O(1)      O(n)
Search tree      O(lg n)                 O(lg n)   O(lg n)

Naturally, we might wonder if there is a data structure that can do better. And it turns out that
there is: the hash table, one of the best and most useful data structures there is—when used cor-
rectly.

Many variations on hash tables have been developed. We'll explore the most common ones,
building up in steps.

Step 1: Direct address tables


While arrays make a slow data structure when you don't know what index to look at, all of their
operations are very fast when you do. This is the insight behind the direct address table. Sup-
pose that for each element that we want to store in the data structure, we can determine a unique
integer index in the range 0..m–1. That is, we need an injective function that maps elements (or
keys) to integers in the range. Then we can use the indices produced by the function to decide at
which index to store the elements in an array of size m.

For example, suppose we are maintaining a collection of objects representing houses on the same
street. We can use the street address as the index into a direct address table. Not every possible
street address will be used, so some array entries will be empty. This is not a problem as long as
there are not too many empty entries. However, it is often hard to come up with an injective
function that does not require many empty entries. For example, suppose that instead we are
maintaining a collection of employees whom we want to look up by social security number.
Using the social security number as the index into a direct address table means we need an array
of 10 billion elements, almost all of which are likely to be unused. Even assuming our computer
has enough memory to store such a sparse array, it will be a waste of memory. Furthermore, on
most computer hardware, the use of caches means that accesses to large arrays are actually
significantly slower than accesses to small arrays—sometimes, two orders of magnitude slower!
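A direct address table can be sketched in a few lines of Java. This is a minimal illustration, not a class from the Java libraries; the name DirectAddressTable and its int-keyed interface are assumptions made here for the example.

```java
/** A minimal direct address table mapping keys in 0..m-1 to values.
 *  A sketch: assumes each key of interest is a unique small integer. */
class DirectAddressTable<V> {
    private final Object[] slots; // slots[k] holds the value for key k, or null

    DirectAddressTable(int m) { slots = new Object[m]; }

    void put(int key, V value) { slots[key] = value; }   // O(1)

    @SuppressWarnings("unchecked")
    V get(int key) { return (V) slots[key]; }            // O(1); null if absent

    void remove(int key) { slots[key] = null; }          // O(1)
}
```

Every operation is a single array access, so all are O(1); the cost is memory proportional to the size of the key range m rather than to the number of elements stored.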

Step 2: Hashing
Instead of requiring that the key be mapped to an index without any collisions, we allow
collisions in which two keys map to the same array index. To avoid having many collisions, this
mapping is performed by a hash function that maps the key in a reproducible but “random” way
to a hash that is a legal array index. If the hash function is good, collisions occur as if
completely at random. Suppose that we are using an array with 13 entries and our keys are social
security numbers, expressed as long values. Then we might use modular hashing, in which the
array index is computed as key % 13. This is not a very random hash function, but it is likely to
be good enough unless there is an adversary purposely trying to produce collisions.

Step 3: Collision resolution

Open hashing (chaining)


There are two main ideas for how to deal with collisions. The best way is usually chaining: each
array entry corresponds to a bucket containing a mutable set of elements. (Confusingly, this
approach is also known as closed addressing or open hashing.) Typically, the bucket is
implemented as a linked list, so each array entry (if nonempty) contains a pointer to the head of
the linked list.

To check whether an element is in the hash table, the key is first hashed to find the correct bucket
to look in. Then, the linked list is scanned to see if the desired element is present. If the linked
list is short, this scan is very quick.

An element is added or removed by hashing it to find the correct bucket. Then, the bucket is
checked to see if the element is there, and finally the element is added or removed appropriately
from the bucket in the usual way for linked lists.
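The structure just described might be sketched as follows. This minimal version stores int keys, uses modular hashing, and makes no attempt to resize; both simplifications are assumptions of the sketch, not properties of real hash table implementations.

```java
import java.util.LinkedList;

/** A minimal chained hash set of ints: each bucket is a linked list. */
class ChainedHashSet {
    private final LinkedList<Integer>[] buckets;

    @SuppressWarnings("unchecked")
    ChainedHashSet(int m) {
        buckets = new LinkedList[m];
        for (int i = 0; i < m; i++) buckets[i] = new LinkedList<>();
    }

    private int hash(int key) {
        return Math.floorMod(key, buckets.length); // modular hashing
    }

    boolean contains(int key) {
        return buckets[hash(key)].contains(key);   // scan one (short) list
    }

    void add(int key) {
        if (!contains(key)) buckets[hash(key)].add(key);
    }

    void remove(int key) {
        buckets[hash(key)].remove(Integer.valueOf(key));
    }
}
```

Note that remove must pass an Integer object rather than an int, so that LinkedList removes the matching element rather than the element at that index.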

Closed hashing (probing)


Another approach to collision resolution that is worth knowing about is probing. (Confusingly,
this technique is also known as open addressing or closed hashing.) Rather than put colliding
elements in a linked list, all elements are stored in the array itself. When adding a new element
to the hash table creates a collision, the hash table finds somewhere else in the array to put it.
The simple way to
find an empty index is to search ahead through the array indices with a fixed stride (often 1),
looking for an unused entry; this linear probing strategy tends to produce a lot of clustering of
elements in the table, leading to bad performance. A better strategy is to use a second hash func-
tion to compute the probing interval; this strategy is called double hashing. Regardless of how
probing is implemented, however, the time required to search for or add an element grows rapid-
ly as the hash table fills up. By contrast, the performance of chaining degrades more gracefully,
and chaining is usually faster than probing even when the hash table is not nearly full. Therefore
chaining is usually preferred over probing.
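Linear probing can be sketched as follows. This is an illustrative fragment under simplifying assumptions: keys are nonnegative ints, the sentinel -1 marks an empty slot, and deletion is omitted (removal under probing requires tombstones, which would complicate the sketch).

```java
/** Sketch of closed hashing with linear probing (stride 1). */
class ProbingHashSet {
    private final int[] table; // -1 marks an empty slot

    ProbingHashSet(int m) {
        table = new int[m];
        java.util.Arrays.fill(table, -1);
    }

    private int hash(int key) { return key % table.length; }

    /** Requires: the table is not full. */
    void add(int key) {
        int i = hash(key);
        while (table[i] != -1 && table[i] != key)
            i = (i + 1) % table.length;   // probe ahead with stride 1
        table[i] = key;
    }

    boolean contains(int key) {
        int i = hash(key);
        while (table[i] != -1) {          // stop at the first empty slot
            if (table[i] == key) return true;
            i = (i + 1) % table.length;
        }
        return false;
    }
}
```

The search loop stops at the first empty slot it reaches, which is why naive deletion (simply emptying a slot) would break lookups for keys that probed past that slot.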

A recently popular variant of closed hashing is cuckoo hashing, in which two hash functions are
used. Each element is stored at one of the two locations computed by these hash functions, so at
most two table locations must be consulted in order to determine whether the element is present.
If both possible locations are occupied, the newly added element displaces the element that was
there, and this element is then re-added to the table. In general, a chain of displacements occurs.

Performance of hash tables


Suppose we are using a chained hash table with m buckets, and the number of elements in the
hash table is n. Then the average number of elements per bucket is n/m, which is called the load
factor of the hash table, denoted α. When an element that is not in the hash table is searched for,
the expected length of the linked list traversed is α. Since there is always the initial (constant)
cost of hashing, the cost of hash table operations with a good hash function is, on average, O(1 +
α). If we can ensure that the load factor α never exceeds some fixed value α_max, then all
operations will be O(1 + α_max) = O(1).

In practice, we will get the best performance out of hash tables when α is within a narrow range,
from approximately 1/2 to 2. If α is less than 1/2, the bucket array is becoming sparse and a
smaller array is likely to give better performance. If α is greater than 2, the cost of traversing the
linked lists limits performance.

One way to hit the desired range for α is to make the bucket array just the right size for the
number of elements that are being added to it. In general, however, it's hard to know ahead of
time what this size will be, and in any case, the number of elements in the hash table may need to
change over time.

Step 4: Resizable arrays


Since we can't predict how big to make the bucket array ahead of time, why not dynamically ad-
just its size? We can use a resizable array data structure to achieve this. Instead of representing
the hash table as a bucket array, we introduce a header object that contains a pointer to the cur-
rent bucket array, and also keeps track of the number of elements in the hash table.

Whenever adding an element would cause α to exceed α_max, the hash table generates a new
bucket array whose size is a multiple of the original size. Typically, the new bucket array is
twice the size of the current one. Then, all of the elements must be rehashed into the new bucket
array. This means a change of hash function; typically, hash functions are designed so they take
the array size m as a parameter, so this parameter just needs to be changed.
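Resizing might be sketched like this, again using a chained representation of int keys. The class name, the initial size of 4, and the choice α_max = 2 are all illustrative assumptions.

```java
import java.util.LinkedList;
import java.util.List;

/** Sketch of a growable chained hash set with load factor at most 2. */
class ResizingHashSet {
    private static final int ALPHA_MAX = 2;
    private LinkedList<Integer>[] buckets = newBuckets(4);
    private int size = 0;

    @SuppressWarnings("unchecked")
    private static LinkedList<Integer>[] newBuckets(int m) {
        LinkedList<Integer>[] b = new LinkedList[m];
        for (int i = 0; i < m; i++) b[i] = new LinkedList<>();
        return b;
    }

    // The hash function takes the array size m as a parameter.
    private static int hash(int key, int m) { return Math.floorMod(key, m); }

    boolean contains(int key) {
        return buckets[hash(key, buckets.length)].contains(key);
    }

    void add(int key) {
        if (contains(key)) return;
        if (size + 1 > ALPHA_MAX * buckets.length) resize();
        buckets[hash(key, buckets.length)].add(key);
        size++;
    }

    /** Double the bucket array and rehash every element: O(n). */
    private void resize() {
        LinkedList<Integer>[] old = buckets;
        buckets = newBuckets(old.length * 2);
        for (List<Integer> bucket : old)
            for (int key : bucket)
                buckets[hash(key, buckets.length)].add(key);
    }
}
```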

Amortized complexity
Since some add() operations cause all the elements to be rehashed, the cost of each such opera-
tion is O(n) in the number of elements. For a large hash table, this may take enough time that it
causes problems for the program. Perhaps surprisingly, however, the average cost per operation
is still O(1). In particular, any sequence of n operations on the hash table always takes O(n) time,
or O(1) per operation. Therefore we say that the amortized asymptotic complexity of hash table
operations is O(1).

To see why this is true, consider a hash table with α_max = 1. The most expensive sequence of n
operations we can do is a series of n add() calls where n = 2^j, meaning that the hash table resizes
on the very last call to add(). The cost of the operations can be measured in the number of uses of
the hash function. There are n initial hashes when elements are added. The hash table is resized
whenever its size hits a power of two, so the extra hashes caused by resizing are 1 + 2 + 4 + 8 +
... + 2^j. This sum is bounded by 2×2^j = 2n, so the total number of hashes is less than 3n, which is
O(n).
Notice that it is crucial that the array size grows geometrically (doubling). It may be tempting to
grow the array by a fixed increment (e.g., 100 elements at a time), but this causes elements to be
rehashed O(n) times each on average, resulting in O(n²) total insertion time, or amortized
complexity of O(n).

Hash tables in the Java Collection Framework


The standard Java libraries offer multiple implementations of hash tables. The class HashSet<T>
implements a mutable set abstraction: a set of elements of type T. The class HashMap<K,V> imple-
ments a mutable map from keys of type K to values of type V. There is also a second, older muta-
ble map implementation, Hashtable<K,V>, but it should be avoided; the HashMap class is faster and
better designed.

All three of these hash table implementations rely on objects having a hashCode() method that is
used to compute the hash of an object. The hashCode() method as defined by Java is not a hash
function. As shown in the figure, it generates the input to an internal hash function that is
provided by the hash table and that operates on integers. Therefore, the hash function being used
in effect is the composition of the two: h○hashCode.

Generating hash codes


The design of the Java collection classes is intended to relieve the client of the burden of imple-
menting a high-quality hash function. The use of an internal hash function makes it easier to im-
plement hashCode() in such a way that the composed hash function h○hashCode is good enough.

However, a badly designed hashCode() method still can cause the hash table implementation to
fail to work correctly or to exhibit poor performance. There are two main considerations:

1. For the hash table to work, the hashCode() method must be consistent with the equals() meth-
od, because equals() is used by the hash table to determine when it has found the right ele-
ment or key. If two objects are equal according to equals(), they must also have equal hash
codes. Otherwise, a lookup may search the wrong bucket and fail to find an element that is
actually in the table.

2. For good performance, hashCode() should be as injective as is feasible: it should avoid map-
ping unequal objects to the same hash code to the extent possible. This goal implies that the
hash code should be computed using all of the information in the object that determines
equality. If some of the information that distinguishes two objects does not affect the hash
code, objects will always collide when they differ only with respect to that ignored infor-
mation.
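For example, a class whose equality is determined by two fields should override both methods together. The Point class here is hypothetical, written only to illustrate the two conditions:

```java
/** A point whose equality is determined by its coordinates. */
class Point {
    final int x, y;
    Point(int x, int y) { this.x = x; this.y = y; }

    @Override public boolean equals(Object o) {
        if (!(o instanceof Point)) return false;
        Point p = (Point) o;
        return x == p.x && y == p.y;
    }

    // Consistent with equals(), and computed from all the fields that
    // determine equality, so distinct points rarely collide.
    @Override public int hashCode() {
        return 31 * x + y;
    }
}
```

Because both methods are overridden consistently, a Point stored in a HashSet can be found later using a different but equal Point object.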

Java provides a default implementation of hashCode(), which returns the memory address of the
object. For mutable objects, this implementation satisfies the two conditions above. It is usually
the right choice, because two mutable objects are only really equal if they are the same object.
On the other hand, immutable objects such as Strings and Integers have a notion of equality that
ignores the object's memory address, so these classes override hashCode().

Java's collection classes also override hashCode() to look at the current contents of the collection;
this way of computing the hash code is dangerous because mutating the collection used as the
key will change its hash code, breaking the class invariant of the hash table. Any collection being
used as a key must not be mutated.

Designing hash functions


If the client is providing hash codes that are quasi-injective but easily distinguishable from ran-
dom, the internal hash function is essential for providing diffusion of the information in the hash
code throughout the final hash. The goal of diffusion is to make the hash look random and thus to
avoid collisions and clustering. The internal hash function is “good enough” if the client compu-
tation that generates keys is not more likely than chance to cause collisions. For most client com-
putations, the implementation doesn't have to work very hard to achieve this goal.

Assuming the Java design with integer hash codes, we can use an integer hash function to pro-
vide diffusion. There are two standard approaches, modular hashing and multiplicative hashing:

Modular hashing
With modular hashing, the hash function is simply h(k) = k mod m for some modulus m, which
is typically the number of buckets. This hash function is easy to compute quickly when we have
an integer hash code. Some values of m tend to produce poor results, though; in particular, if m is
a power of two (that is, m = 2^j for some j), then h(k) is just the j lowest-order bits of k. Throwing
away the rest of the bits works particularly poorly when the hash code of an object is its memory
address, as is the case for Java. Two or more of the low-order bits of an object address will be
zero, with the result that most buckets are not used! More generally, we want a hash function that uses
all the bits of the key so that any change in the key is likely to change the bucket it maps to. In
practice, primes not too close to powers of 2 work well as moduli.

Multiplicative hashing
A better alternative is multiplicative hashing, which is defined as h(k) = ⌊m × frac(k×A)⌋, where
A is a constant between 0 and 1 (e.g., Knuth recommends φ⁻¹ = 0.61803...), and the function
frac gives the fractional part of a number (that is, frac(x) = x − ⌊x⌋). This formula uses the
fractional part of the product k×A to choose the bucket.

However, the formula above is not the best way to evaluate the hash function. If we choose m to
be a power of two 2^q, we can scale up the multiplier A by 2^31, and then evaluate the hash func-
tion as follows using 64-bit long values, obtaining a q-bit result in [0,m):

h(k) = (k*A & 0x7FFFFFFF) >> (31-q)

Implemented properly, multiplicative hashing is faster and higher-quality than modular hashing.
Intuitively, multiplying together two large numbers diffuses information from each of them into
the product, especially around the middle bits of the product. The formula above picks out q bits
from the middle of the product kA.
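In Java, the formula might be implemented as follows. The constant used here is an assumption of the sketch: an odd integer approximating φ⁻¹ × 2^31, following Knuth's suggested multiplier.

```java
class MultiplicativeHash {
    // Approximately 0.6180339887 * 2^31, chosen odd for better diffusion.
    private static final long A = 1327217885L;

    /** Returns a q-bit hash of k in [0, 2^q), for 1 <= q <= 31. */
    static int hash(int k, int q) {
        // k*A is computed in 64 bits; the mask keeps the low 31 bits
        // (the "fractional part"), and the shift keeps the top q of them.
        return (int) ((k * A & 0x7FFFFFFFL) >> (31 - q));
    }
}
```

The mask also makes the result nonnegative even when k is negative, since only the low 31 bits of the product survive.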
Unfortunately, multiplicative hashing is often implemented incorrectly, and has unfairly acquired
a bad reputation in some quarters because of these bad implementations. The most common mis-
take is to implement it as (kA mod m). By the properties of modular arithmetic, kA mod m = ((k
mod m) × (A mod m) mod m). Therefore, this broken implementation merely shuffles the buck-
ets rather than providing real diffusion.

Adversarial computation and cryptographic hash functions


For good performance, the goal of the hash table is that collisions should occur as if at random.
Therefore, whether collisions occur depends to some extent on the keys being generated by the
client. If the client is an adversary trying to produce collisions, the hash table must work harder.
Many early web sites implemented using the Perl programming language were subject to denial-
of-service attacks that exploited the ability to cause hash table collisions. Attackers used their
knowledge of Perl's hash function on strings to craft strings that collided, effectively turning
Perl's associative arrays into linked lists. Resizing the array of buckets didn't help, because the
collision happened in the space of hash codes.

An alternative way to design a hash table is to give the job of providing a high-quality hash func-
tion entirely to the client code: the hash codes themselves must look random. This approach puts
more of a burden on the client but avoids wasted computation when the client is providing a
high-quality hash function already. In the presence of keys generated by an adversary, the client
already should be providing a hash code that appears random (and ideally one with at least 64
bits), because otherwise the adversary can engineer hash code collisions. For example, it is possi-
ble to choose strings such that Java's String.hashCode() produces collisions.
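For instance, the two-character strings "Aa" and "BB" hash to the same value under String.hashCode(), and because the hash is computed positionally, such colliding blocks can be concatenated to manufacture arbitrarily many colliding strings:

```java
class StringCollisions {
    public static void main(String[] args) {
        // 31*'A' + 'a' = 31*65 + 97 = 2112, and 31*'B' + 'B' = 31*66 + 66 = 2112
        assert "Aa".hashCode() == "BB".hashCode();
        // Collisions compose: any string built from these two blocks collides
        // with any other string built from them.
        assert "AaAa".hashCode() == "BBBB".hashCode();
        assert "AaBB".hashCode() == "BBAa".hashCode();
        System.out.println("Aa".hashCode()); // prints 2112
    }
}
```

With n concatenated blocks, an adversary gets 2^n distinct strings that all share one hash code.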

To produce hashes resistant to an adversary, a cryptographic hash function should be used. The
message digest algorithms MD5, SHA-1, and SHA-2 are good choices whose security increases
(and performance decreases) in that order. They are available in Java through the class
java.security.MessageDigest. Viewing the data to be hashed as a string or byte array s, the value
MD5(R + s) mod m is a cryptographic hash function offering a good balance between security and
performance. MD5 generates 128 bits of output, so if m = 2j, this formula amounts to picking j
bits from the MD5 output. The value R is the initialization vector. It should be randomly gener-
ated when the program starts using a high-entropy input source such as the class
java.security.SecureRandom. The initialization vector prevents the adversary from testing possible
values of s ahead of time. For very long-running programs, it is also prudent to proactively re-
fresh R periodically, though this requires rehashing all hash tables that depend on it.
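The scheme just described might be sketched as follows; the class and method names are illustrative, and taking j bits from the front of the digest is one of several reasonable choices.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;

/** Sketch of a keyed hash: j bits of MD5(R + s), where R is a random
 *  initialization vector generated once at startup. */
class KeyedHash {
    private static final byte[] R = new byte[16];
    static { new SecureRandom().nextBytes(R); }

    /** Returns a j-bit hash of s, for 1 <= j <= 31. */
    static int hash(String s, int j) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            md.update(R); // prepend the initialization vector
            byte[] d = md.digest(s.getBytes(StandardCharsets.UTF_8));
            // take j bits from the first 32 bits of the 128-bit digest
            int h = 0;
            for (int i = 0; i < 4; i++) h = (h << 8) | (d[i] & 0xFF);
            return h >>> (32 - j);
        } catch (NoSuchAlgorithmException e) {
            throw new AssertionError("MD5 is always available", e);
        }
    }
}
```

Within one run the hash is reproducible, but because R is regenerated on each startup, an adversary cannot precompute colliding inputs.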

Precomputing hash codes


High-quality hash functions can be expensive. If the same values are being hashed repeatedly,
one trick is to precompute their hash codes and store them with the value. Hash tables can also
store the full hash codes of values, which makes scanning down one bucket fast; there is no need
to do a full equality test on the keys if their hash codes don't match. In fact, if the hash code is
long and the hash function is cryptographically strong (e.g., 64+ bits of a properly constructed
MD5 digest), two keys with the same hash code are almost certainly the same value. Your com-
puter is then more likely to get a wrong answer from a cosmic ray hitting it than from a collision
in random 64-bit data.

Precomputing and storing hash codes is an example of a space-time tradeoff, in which we speed
up computation at the cost of using extra memory.
Measuring clustering
When the distribution of keys into buckets is not random, we say that the hash table exhibits
clustering. If you care about performance, it's a good idea to test your hash function to make
sure it does not exhibit clustering. With any hash function, it is possible to generate data that
cause it to behave poorly, but a good hash function will make this unlikely.

A good way to determine whether your hash function is working well is to measure clustering. If
bucket i contains xi elements, then a good measure of clustering is the following:

C = (m/(n−1)) ((∑_i x_i²)/n − 1)

A uniform hash function produces clustering C near 1.0 with high probability. A clustering meas-
ure C that is greater than one means that clustering will slow down the performance of the hash
table by approximately a factor of C. For example, if m=n and all elements are hashed into one
bucket, the clustering measure evaluates to n. If the hash function is perfect and every element
lands in its own bucket, the clustering measure will be 0. If the clustering measure is less than
1.0, the hash function is spreading elements out more evenly than a random hash function would;
not something to count on happening!

The reason the clustering measure works is because it is based on an estimate of the variance of
the distribution of bucket sizes. If clustering is occurring, some buckets will have more elements
than they should, and some will have fewer. So there will be a wider range of bucket sizes than
one would expect from a random hash function.

Note that it's not necessary to compute the sum of squares of all bucket lengths; picking enough
buckets so that enough keys are counted (say, at least 100) is good enough.
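The measure is straightforward to compute from bucket sizes; a small helper might look like this (the class name is illustrative):

```java
/** Computes the clustering measure C = (m/(n-1)) * ((sum_i x_i^2)/n - 1)
 *  from the sizes of the m buckets. Values near 1.0 indicate that the
 *  hash function is distributing elements about as well as a random one. */
class Clustering {
    static double measure(int[] bucketSizes) {
        int m = bucketSizes.length;
        long n = 0, sumSq = 0;
        for (int x : bucketSizes) {
            n += x;
            sumSq += (long) x * x;
        }
        return ((double) m / (n - 1)) * ((double) sumSq / n - 1.0);
    }
}
```

For example, with m = n = 4 and every element in one bucket, the measure evaluates to 4 (that is, n), while a perfect spread of one element per bucket yields 0, matching the discussion above.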

Unfortunately most hash table implementations, including those in the Java Collections Frame-
work, do not give the client a way to measure clustering, so clients can't easily tell whether the
hash function is performing well. Hopefully, future hash table designers will provide some
clustering estimation as part of the interface.

Digression for those who have taken some probability theory: Consider bucket i contain-
ing x_i elements. For each of the n elements, we can imagine a random variable e_j, whose
value is 1 if the element lands in bucket i (with probability 1/m), and 0 otherwise. The buck-
et size x_i is a random variable that is the sum of all these random variables:

x_i = ∑_{j∈1..n} e_j

Let's write E(x) for the expected value of variable x, and Var(x) for the variance of x, which
is equal to E((x − E(x))²) = E(x²) − E(x)². Then we have:

E(e_j) = 1/m
E(e_j²) = 1/m
Var(e_j) = 1/m − 1/m²
E(x_i) = n·E(e_j) = α

The variance of the sum of independent random variables is the sum of their variances. If
we assume that the e_j are independent random variables, then:

Var(x_i) = n·Var(e_j) = α − α/m = E(x_i²) − E(x_i)²
E(x_i²) = Var(x_i) + E(x_i)²
        = α(1 − 1/m) + α²

Now, if we sum up all m of the variables x_i, and divide by n, as in the formula, we should ef-
fectively divide this by α:

(1/n) E(∑ x_i²) = (1/α) E(x_i²) = 1 − 1/m + α

Subtracting 1, we get (n−1)/m. The clustering measure multiplies this by its reciprocal to get
1.

Suppose instead we had a hash function that hit only one of every c buckets, but was ran-
dom among those buckets. In this case, for the non-empty buckets, we'd have

E(e_j) = E(e_j²) = c/m
E(x_i) = αc
(1/n) E(∑ x_i²) − 1 = αc − c/m
                    = c(n−1)/m

Therefore, the clustering measure evaluates in this case to c. In other words, if the clustering
measure gives a value significantly greater than one, it is like having a hash function that
only hits a fraction 1/c of the buckets.

Sorting
Sorting a collection of values is a fundamental operation with many uses. Let's look at the most
common algorithms. You might ask why we need to talk about sorting algorithms at all, given
that sorting algorithms are built into Java (see Arrays.sort) and many other languages these days.
One reason is that it is useful to understand the tradeoffs between different sorting algorithms.
Another is that at some point you will probably have to use an environment in which sorting is
not so available. Finally, sorting is a great opportunity to talk about algorithms, loop invariants
and performance analysis.

Insertion sort
Insertion sort is a simple algorithm that is the fastest way to sort small arrays. Intuitively, inser-
tion sort scans through the array from left to right, making sure that the part of the array that has
been scanned is always in sorted order. The code can be written as a loop with a loop invariant
depicted as follows:

Figure 1: Invariant for the outer loop of insertion sort

The loop invariant is maintained by shifting each newly encountered element (at index i) left-
ward into the place it belongs in the sorted part of the array. This insertion causes the sorted part
of the array to grow by one element. Eventually all elements have been inserted into the sorted
part and there is nothing left to sort.

/** Effect: Put a into ascending sorted order. */
void sort(int[] a) {
    for (int i = 1; i < a.length; i++) {
        int k = a[i];
        int j = i;
        for (; j > 0 && a[j-1] > k; j--)
            a[j] = a[j-1];
        a[j] = k;
    }
}

The loop invariant for the outer loop is as depicted above. The invariant is satisfied when i=1
and each loop iteration ensures that the value that index i initially pointed to (k) is inserted into
the right place.

Figure 2: Invariant for the inner loops

The invariant for the inner loop is also illustrated in the figure. The index j points to an array lo-
cation such that everything to the left of j (region A) is less than everything to the right (region
B). Further, everything in region B is greater than the value to be inserted, k. When the loop
terminates, the top element in A is less than or equal to k, so k can be placed in the element
marked “?”. Figuring out loop invariants helps us write code like this that is efficient and correct.

The running time of insertion sort is best when the array is already sorted. In this case the inner
loop stops immediately on each outer iteration, so the total work done per outer iteration is con-
stant. Therefore the total work done by the algorithm is linear in the array size, which we write as
O(n).

The worst case for the algorithm is when the array is sorted in the reverse order. In that case the
loop on j goes all the way down to 0 on each outer iteration. The first iteration does two copies,
the second three copies, and so on, so the total work is 2 + 3 + ⋯ + n = n(n+1)/2 − 1. This function
is O(n²), since O(n² + n) = O(n²).

Recall that in general, we can drop lower-order terms from polynomials when determining as-
ymptotic complexity. For example, in this case (n² + n)/n² approaches a constant (1) as n becomes
large. Therefore the two functions in the ratio have the same asymptotic complexity.

Insertion sort has one other nice property: implemented properly, it is a stable sort, meaning that
if given an array containing elements that are equal to each other, it keeps those elements in the
same relative order as in the original array.

Selection sort
Selection sort is another sorting algorithm, used more commonly by humans than by computers.
Intuitively, it tries to find the right element to put in each location of the final array. Once an ar-
ray location is set to contain the right element, it is never changed.

for (int i = 0; i < n; i++) {
    // find the smallest element in subarray a[i..n-1]
    int min = i;
    for (int j = i+1; j < n; j++)
        if (a[j] < a[min]) min = j;
    // swap it with a[i]
    int t = a[i]; a[i] = a[min]; a[min] = t;
}

Because each loop iteration must in turn iterate over the rest of the array to find the smallest ele-
ment, the best-case performance of this algorithm is the same as the worst-case performance:
O(n²).

Merge sort
More efficient sorting algorithms use recursion to implement a divide-and-conquer strategy.
They break the array into smaller subarrays and recursively sort them. Merge sort is one such al-
gorithm. Given an array to sort, it finds the middle of the array and then recursively sorts the left
half and the right half of the array. Then it merges the resulting arrays. A temporary array tmp is
provided to give space for merging work:

/** Sort a[l..r-1]. Modifies tmp.
 *  Requires: l < r, and tmp is an array at least as long as a. */
void sort(int[] a, int l, int r, int[] tmp) {
    if (l == r-1) return; // already sorted
    int m = (l+r)/2;
    sort(a, l, m, tmp);
    sort(a, m, r, tmp);
    merge(a, l, m, r, tmp);
}

The real work is done in merge, which takes time linear in the total number of elements to be
merged: O(r−l). It works by scanning both subarrays to be merged from left to right, picking the
smaller element from each array as the following diagram suggests:

Array during merge

Here is the code. We use the notation a[l..r) to mean a[l..r-1].

/** Place a[l..r) into sorted order.
 *  Requires: l < m < r, and a[l..m) and a[m..r) are both in sorted order.
 *  Performance: O(r-l) */
void merge(int[] a, int l, int m, int r, int[] tmp) {
    int i = l, j = m, k = l;
    while (i < m && j < r)
        tmp[k++] = (a[i] < a[j]) ? a[i++] : a[j++];
    System.arraycopy(a, i, tmp, k, m-i);
    System.arraycopy(tmp, l, a, l, j-l);
}

At the end of the while loop, either i = m or j = r, but not both, because only one of i and j is in-
cremented on each loop iteration. Therefore, array a still contains some elements that have not
been copied to tmp, either in a[i..m) (if j = r) or in a[j..r) (if i = m). If j = r, the first arraycopy
call transfers the elements a[i..m) to tmp, and the second arraycopy copies all the elements from
tmp back to a (since j-l = r-l). If i = m, however, the elements in a[j..r) are already in the right
place in a, so there is no need to copy them to tmp and back again. The first arraycopy does noth-
ing, and the second arraycopy copies just the elements tmp[l..j) into a[l..j), leaving a[j..r)
alone.

The running time of this algorithm is always O(n lg n), which is a big improvement over O(n²).
For example, if sorting a million elements, the speedup, ignoring constant factors, is
1,000,000 / lg 1,000,000 ≈ 50,000. The speedup probably won't be quite that great when
comparing to insertion sort because of constant factors.

To see why it is n lg n, think about the whole sequence of recursive calls shown in Figure 2. Each
layer of recursive calls takes total merge time proportional to n, and there are lg n recursive calls.
The total time spent in the algorithm is therefore O(n·lg n).

Merge sort, like insertion sort, is a stable sort. This is a major reason why merge sort is com-
monly used. Another is that its run time is predictable.
Figure 2: Merge sort performance analysis

Merge sort is not as fast as the quicksort algorithm that we will see next because it does extra
copying into the temporary array. We can avoid some of the copying by exchanging the roles of
a and tmp on alternate recursive calls. This speeds up the algorithm at the cost of more complex
code. It is actually possible to do an in-place merge in linear time, but in-place merging is tricky
and is slower in practice than using a separate array.

Another trick that is used to speed up merge sort is to use insertion sort when the subarrays get
small enough. For very small arrays insertion sort is faster, because k₁n² is smaller than k₂n lg n
when n and k₁ are small enough!
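That trick might be sketched like this; the cutoff of 16 is an illustrative value that would be tuned empirically, and the merge step here copies the leftover runs explicitly rather than using the arraycopy trick shown earlier.

```java
/** Merge sort that switches to insertion sort for small subarrays. */
class HybridSort {
    static final int CUTOFF = 16; // illustrative; tune empirically

    static void sort(int[] a) {
        mergeSort(a, 0, a.length, new int[a.length]);
    }

    static void mergeSort(int[] a, int l, int r, int[] tmp) {
        if (r - l <= CUTOFF) { insertionSort(a, l, r); return; }
        int m = (l + r) / 2;
        mergeSort(a, l, m, tmp);
        mergeSort(a, m, r, tmp);
        // merge a[l..m) and a[m..r) through tmp
        int i = l, j = m, k = l;
        while (i < m && j < r) tmp[k++] = (a[i] < a[j]) ? a[i++] : a[j++];
        while (i < m) tmp[k++] = a[i++];
        while (j < r) tmp[k++] = a[j++];
        System.arraycopy(tmp, l, a, l, r - l);
    }

    /** Insertion sort on the subarray a[l..r). */
    static void insertionSort(int[] a, int l, int r) {
        for (int i = l + 1; i < r; i++) {
            int k = a[i], j = i;
            for (; j > l && a[j - 1] > k; j--) a[j] = a[j - 1];
            a[j] = k;
        }
    }
}
```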

Quicksort
Quicksort is another divide-and-conquer sorting algorithm. It avoids the work of merging by par-
titioning the array elements before recursively sorting. The algorithm chooses a pivot value p
and then separates all the elements in the array so that the right half contains elements at least as
large as p and the left half contains elements no larger than p.

Final partitioned state of the array

Thus, quicksort does some of the work of sorting before recursing. The two resulting subarrays
can then be sorted recursively and the algorithm is done.

/** Sort a[l..r-1] */
void qsort(int[] a, int l, int r) {
    if (l == r-1) return; // base case: already sorted

    // partition elements around some pivot value p, obtaining partition index k
    int k = partition(a, l, r);

    qsort(a, l, k);
    qsort(a, k, r);
}
One thing we notice is that the choice of pivot matters. If the pivot value is the largest or smallest
element in the array, the subarrays have lengths 1 and n−1. If this happens on every recursion—
which it easily can if the array is sorted to begin with—quicksort will take O(n²) time. One solu-
tion is to choose the pivot randomly from among the elements of the array, and swap it with a[l].
With this choice, quicksort has expected run time O(n lg n), using reasoning similar to that for
merge sort. A different, commonly used heuristic is to choose the median of the first, the last, and
the middle element of the array. This cheaper heuristic makes quicksort perform well on arrays
that are mostly sorted, while usually avoiding the O(n²) case in practice.
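The median-of-three heuristic might be sketched like this; the method sorts the three sampled values in place and then moves the median into a[l], where the partitioning code expects the pivot (the class and method names are illustrative):

```java
/** Sketch of median-of-three pivot selection: puts the median of
 *  a[l], a[m], and a[r-1] into a[l] before partitioning begins.
 *  Requires: r - l >= 2. */
class PivotChoice {
    static void medianOfThreeToFront(int[] a, int l, int r) {
        int m = (l + r) / 2;
        // sort the three sampled positions so that a[l] <= a[m] <= a[r-1]
        if (a[m] < a[l]) swap(a, l, m);
        if (a[r - 1] < a[l]) swap(a, l, r - 1);
        if (a[r - 1] < a[m]) swap(a, m, r - 1);
        swap(a, l, m); // move the median to the front
    }

    static void swap(int[] a, int i, int j) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }
}
```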

Partitioning
Now, how to partition elements efficiently? We want the array to end up looking like the diagram
above. The idea is to start two pointers i and j from opposite ends of the array. They sweep in to-
ward the middle swapping elements as necessary to achieve the final partitioned state shown
above.

We start with the array containing the pivot value in its first element and the rest of the array in
an unknown state:

Initial state of the array

The initial loop advances j so that it points to an element on the wrong side of the array, with l ≤
i ≤ j < r:

State of the array at the start of the main loop

The loop must have an invariant that starts out describing this state but that ends up describing
the desired final state. As the following diagram suggests, the invariant says that all elements
strictly to the left of i are at most p, and all elements strictly to the right of j are at least p. Fur-
ther, the element at i itself is at least p (and therefore belongs on the right-hand side of the array)
and the element at j is at most p (and belongs on the left-hand side). Finally, the inequalities l ≤ i
≤ j+1 ≤ r hold, so both i and j are in bounds and can go past each other by at most a single index.
Therefore, if i and j do pass each other, the values they index will be on the correct side of the
array and the array can be partitioned between them.

Invariant during partitioning

Despite (or because of!) the complexity of the invariant, the partitioning code can then look very
simple and be very efficient:
/** Partition array a into a[l..k) and a[k..r), where l < k < r, and all elements
 *  in a[l..k) are less than or equal to all elements in a[k..r).
 *  Requires: 0 ≤ l, r ≤ a.length, and r-l ≥ 2. */
int partition(int[] a, int l, int r) {
    int p = a[l]; // better: swap a[l] with a random element first
    int i = l, j = r;
    do j--; while (a[j] > p);
    while (i < j) {
        int t = a[i]; a[i] = a[j]; a[j] = t; // swap a[i] and a[j]
        do i++; while (a[i] < p);
        do j--; while (a[j] > p);
    }
    return j+1;
}

The inner loops are written as do...while loops because we want to execute the body of each loop
once even if the loop guard is initially false. Interestingly, these inner loops do not need to do any
bounds checking on i and j. The reason bounds checks are not needed is that after swapping a[i]
and a[j], there must be at least one value “ahead” of both i and j that will stop their inner loops.

An example of partitioning will probably help understand what is going on. We start out with the
following array, with p=5:

5 2 6 7 1 9 3 8
i               j

Before the main loop starts, we move j down to a value that can be swapped with the pivot at in-
dex i:

5 2 6 7 1 9 3 8
i           j

In the first iteration, we swap and then move i and j inward to the next swappable values:

3 2 6 7 1 9 5 8
    i   j

In the second iteration, we swap and then move i and j inward again:

3 2 1 7 6 9 5 8
    j i

Since i > j, the loop halts and the result is j+1 = 3. The two subarrays to be recursively sorted are
(3,2,1) and (7,6,9,5,8).

The loop can either stop with i = j+1 or with i = j in the case where a[i] = a[j] = p. Since j must
be decremented at least twice, either by the initial loop or by the first iteration of the main loop,
the value j+1 will always be a valid array index.
Quicksort is an excellent sorting algorithm for many applications. However, one downside is that
it is not a stable sort. As with merge sort, it makes sense to switch to insertion sort for sufficiently
small subarrays.

Quickselect
Finding the maximum and minimum elements in an array is a straightforward O(n) algorithm.
But what if we want to find the median element? Or the 10th largest element? The problem of
finding the nth smallest element in an array is called the order statistics problem. Clearly, we can
solve this problem by sorting the array and then indexing to the appropriate position, but that
does a lot of sorting work that is not necessary.

Fortunately, the quicksort algorithm can be tweaked a little to solve the order statistics problem
efficiently.

/** Returns: the element that would be at index n if a[l..r) were sorted;
 * that is, the (n-l+1)th smallest element of a[l..r).
 * Requires: l ≤ n < r and 0 ≤ l < r ≤ a.length. */
int qselect(int[] a, int l, int r, int n) {
    if (l+1 == r) return a[l];
    int k = partition(a, l, r);
    if (n < k)
        return qselect(a, l, k, n);
    else
        return qselect(a, k, r, n);
}

If partitioning perfectly splits the array in half each time, the total work is proportional to the
finite geometric series n + n/2 + n/4 + ... + 1 = 2n − 1, which is O(n). Of course, the split won't
usually be perfect, but the average split will still result in a series that is O(n). Therefore,
expected time is O(n); worst-case time is, as with quicksort, O(n²).

Unlike quicksort, this method is tail-recursive, so it can be converted to a loop in the usual way,
obtaining an efficient iterative version:

/** Returns: the element that would be at index n if a[l..r) were sorted.
 * Requires: l ≤ n < r and 0 ≤ l < r ≤ a.length. */
int qselect(int[] a, int l, int r, int n) {
    while (l+1 < r) {
        int k = partition(a, l, r);
        if (n < k) r = k;
        else l = k;
    }
    return a[l];
}
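As a sketch of how quickselect is used, the following program finds the median of the example
array with the iterative qselect. The class name is illustrative, and partition is repeated so the
example is self-contained:

```java
public class QuickselectDemo {
    /** The partition method from the quicksort section, with an explicit swap. */
    static int partition(int[] a, int l, int r) {
        int p = a[l];
        int i = l, j = r;
        do j--; while (a[j] > p);
        while (i < j) {
            int t = a[i]; a[i] = a[j]; a[j] = t; // swap a[i] and a[j]
            do i++; while (a[i] < p);
            do j--; while (a[j] > p);
        }
        return j + 1;
    }

    /** Returns: the element that would be at index n if a[l..r) were sorted. */
    static int qselect(int[] a, int l, int r, int n) {
        while (l + 1 < r) {
            int k = partition(a, l, r);
            if (n < k) r = k;
            else l = k;
        }
        return a[l];
    }

    public static void main(String[] args) {
        int[] a = {5, 2, 6, 7, 1, 9, 3, 8};
        // index 3 of the sorted order 1 2 3 5 6 7 8 9, i.e., the 4th smallest
        System.out.println(qselect(a, 0, a.length, 3)); // prints 5
    }
}
```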
Grammars and Parsing
Parsing is something we do constantly. In fact, you are doing it right now as you read this sen-
tence. Parsing is the process of converting a stream of input into a structured representation. The
input stream may consist of words, characters, or even bits. The output of the process is a tree.

Our brains are remarkably good at parsing. When we hear a sentence like “the rat ate cheese”,
our brains build a parse tree similar to the following diagram:

Notice that the leaves of the tree, in left-to-right order, spell out the sentence, but there are also
some other nodes higher up in the tree describing the function of each word and of subsequences
of words.

Your brain can handle much more complex sentences, though it does have its limits. On the other
hand, when you read a supposed sentence like “rat cheese the ate the”, you instantly recognize
this as not being a sentence at all, because it has no parse tree. This sequence of words is, in fact,
a syntax error.

Parsing is performed by computers as well. Your Java programs are parsed by the Java compiler.
Even more mundane devices such as calculators use parsing to interpret mathematical expres-
sions.

For programming languages, legal syntax is defined by a grammar, which specifies which input
sequences have a parse tree. While the situation in real human languages is more complex, for
programming languages, legal syntax is defined using a context-free grammar. The modifier
context-free means that the legal syntax for a subtree of the parse tree (say, the possible subtrees
of a “noun phrase” node, above) depends only on the node at the root of the subtree and not on
the rest of the tree.

The input parsed using a grammar consists of a series of symbols. The grammar is defined in
terms of both terminal symbols (also called tokens) that can appear as part of the input (e.g.,
“rat”) and appear in the parse tree only at its leaves, and nonterminal symbols (e.g., “noun
phrase”) that appear at all other nodes in the tree.

Productions
A context-free grammar is defined by a set of productions that define how a single nonterminal
can be expanded out into a longer series of symbols. A given nonterminal can have multiple dif-
ferent productions specifying alternative ways to expand it. For example, we can write a pseudo-
English grammar corresponding to the parse tree above:
sentence → noun-phrase verb-phrase noun-phrase
sentence → noun-phrase verb-phrase
noun-phrase → noun
noun-phrase → article noun-phrase
noun-phrase → adjective noun-phrase
verb-phrase → verb
article → the
adjective → old | ...
noun → cat | dog | rat | cheese
verb → ate | barks | ...

As an abbreviation, when a single nonterminal has multiple productions, we write them all to-
gether on the right side separated by a vertical bar.

One of the nonterminals in a context-free grammar is designated as the start symbol; it is the
root of every possible parse tree. The leaves of the parse tree are terminals, and every other node
is a nonterminal whose children correspond to one of the productions in the grammar.

The language of a grammar is the set of strings of terminal symbols that can be produced by
constructing a parse tree with that grammar: the possible strings that can be derived from the
start symbol. The job of a parser is to determine whether an input string is in the language of the
grammar, and typically also to construct its parse tree.

The language of a grammar can be infinite in size if the grammar is recursive. For example, the
production noun-phrase → adjective noun-phrase allows us to derive the following noun
phrases: “the dog”, “the old dog”, “the old old dog”, and so on indefinitely.

Ambiguity
With some grammars, it is possible for a string to have more than one parse tree. Such a gram-
mar is said to be ambiguous.

An example of an ambiguous grammar is the following grammar for arithmetic expressions:

E → n | E + E | E × E | ( E )

The symbols n, +, ×, (, and ) are all terminals and the only nonterminal is the start symbol E. The
symbol n stands for all possible numeric literals.

With this grammar, a string like “2 + 3 × 5” has two parses, as shown in the following figure.

Two parse trees for “2+3×5”

Only one of these parse trees corresponds to our usual understanding of the meaning of the ex-
pression as arithmetic: the one on the left. Usually, we want the grammars we write to be unam-
biguous so their meaning is clear. The problem of determining whether a grammar is ambiguous
is one that cannot be solved in general (it is said to be undecidable). However, in practice it is
possible to design grammars in such a way that ambiguity is avoided.

The problem with the parse tree on the right is that it violates our expectations about prece-
dence. It doesn't make sense to place the production for “+” directly under the production for “×”
because multiplication has higher precedence than addition. To get the expected precedence, the
grammar can be rewritten to use more nonterminals. We add more nonterminals to prevent a “+”
production from appearing directly under a “×” production. Here, T stands for “term” (an
expression at the “+” level of precedence) and F stands for “factor” (at the “×” level of
precedence). The productions in this grammar prevent a T from appearing under an F in the
parse tree:

E → E + T | T
T → T × F | F
F → n | ( E )

This grammar has exactly the same language as the above grammar but is unambiguous. For ex-
ample, the example string parses uniquely as follows:

Recursive-descent parsing
The idea of recursive-descent parsing, also known as top-down parsing, is to parse input
while exploring the corresponding parse tree recursively, starting from the top.

Let's build a parser using the following methods. For simplicity, we will assume that the input ar-
rives as tokens of type String. We assume there is a special token EOF at the end of the stream.

/** Returns: the next token in the input stream. */
String peek();

/** Consumes the next token from the input stream, and returns it. */
String consume();

Using these two methods, we can build two more methods that will be especially useful for pars-
ing:

/** Returns: whether the next token is s. */
boolean peek(String s) {
    return peek().equals(s);
}

/** Effects: Consumes the next token if it is s.
 * Throws SyntaxError otherwise.
 */
void consume(String s) throws SyntaxError {
    if (peek(s)) { consume(); }
    else throw new SyntaxError();
}
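The notes treat peek() and consume() as given. One possible implementation, an assumption
here rather than the course's, backs the stream with an array of tokens and synthesizes the
special EOF token when the input is exhausted:

```java
public class TokenStream {
    public static final String EOF = "EOF"; // special end-of-input token

    private final String[] tokens;
    private int pos = 0;

    public TokenStream(String... tokens) { this.tokens = tokens; }

    /** Returns: the next token in the input stream (EOF if exhausted). */
    public String peek() {
        return pos < tokens.length ? tokens[pos] : EOF;
    }

    /** Consumes the next token from the input stream, and returns it. */
    public String consume() {
        String t = peek();
        if (pos < tokens.length) pos++;
        return t;
    }

    public static void main(String[] args) {
        TokenStream in = new TokenStream("2", "+", "3");
        System.out.println(in.consume()); // prints 2
        System.out.println(in.consume()); // prints +
        System.out.println(in.consume()); // prints 3
        System.out.println(in.peek());    // prints EOF
    }
}
```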

The idea of recursive-descent parsing is that we implement a separate method for each nontermi-
nal. The content of each method corresponds exactly to the productions for that nonterminal. The
key is that it must be possible to predict from the tokens seen on the input stream which produc-
tion is being used. Recursive-descent parsing is therefore also known as predictive parsing.

For example, the method to parse an E nonterminal first parses a T (because both productions
start with T), then peeks at the next token to see whether the input continues with “+ E” or was
just a T.

// E → T | T + E
void parseE() throws SyntaxError {
    parseT();
    if (peek("+")) {
        consume();
        parseE();
    }
}

Similarly, the method parseT looks for “×” to decide which production to use:

// T → F | F × T
void parseT() throws SyntaxError {
    parseF();
    if (peek("×")) {
        consume();
        parseT();
    }
}

And parseF() can decide using the first symbol it sees, assuming we have an appropriate method
isNumber():

// F → n | ( E )
void parseF() throws SyntaxError {
    if (isNumber(peek())) {
        consume();
    } else {
        consume("(");
        parseE();
        consume(")");
    }
}
Building an evaluator
Thus far, the parser we've built is only a recognizer that decides whether the input it sees is in
the language. It doesn't build any kind of representation of the input. With a few modifications,
we can easily convert it into an evaluator that figures out what the expression evaluates to. The
result type of all the methods becomes int. Working from the bottom up:

int parseF() throws SyntaxError {
    if (isNumber(peek())) {
        return Integer.parseInt(consume());
    } else {
        consume("(");
        int ret = parseE();
        consume(")");
        return ret;
    }
}
int parseT() throws SyntaxError {
    int v = parseF();
    if (peek("×")) {
        consume();
        v = v * parseT();
    }
    return v;
}
int parseE() throws SyntaxError {
    int v = parseT();
    if (peek("+")) {
        consume();
        v = v + parseE();
    }
    return v;
}

Now we can parse the previous example input and get a result of 17!
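Here is a runnable sketch of the whole evaluator bundled with a minimal array-backed token
stream. The class name and the use of RuntimeException in place of SyntaxError are
simplifications; note that the “+” case recurses into parseE so that the entire input is consumed:

```java
public class Evaluator {
    private final String[] tokens;
    private int pos = 0;

    Evaluator(String... tokens) { this.tokens = tokens; }

    String peek() { return pos < tokens.length ? tokens[pos] : "EOF"; }
    String consume() { String t = peek(); if (pos < tokens.length) pos++; return t; }
    boolean peek(String s) { return peek().equals(s); }
    void consume(String s) {
        if (peek(s)) consume();
        else throw new RuntimeException("syntax error at " + peek());
    }
    static boolean isNumber(String s) { return s.matches("\\d+"); }

    // E → T | T + E
    int parseE() {
        int v = parseT();
        if (peek("+")) { consume(); v = v + parseE(); }
        return v;
    }

    // T → F | F × T
    int parseT() {
        int v = parseF();
        if (peek("×")) { consume(); v = v * parseT(); }
        return v;
    }

    // F → n | ( E )
    int parseF() {
        if (isNumber(peek())) return Integer.parseInt(consume());
        consume("(");
        int ret = parseE();
        consume(")");
        return ret;
    }

    public static void main(String[] args) {
        System.out.println(new Evaluator("2", "+", "3", "×", "5").parseE()); // prints 17
    }
}
```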

Abstract syntax trees
The output of a parser is usually an abstract syntax tree (AST), which differs from a parse tree
in that it contains no noninformative terminal symbols. For example, if we parse the input
“2+(3×5)”, we expect to get a tree like the following,

Abstract syntax tree

Notice what's not in this tree structure: the parentheses. The parentheses aren't needed because
the tree structure captures the expected evaluation order. In fact, the expressions “2+(3×5)” and
“2+3×5” should produce exactly the same AST. Notice also that this tree doesn't keep track of
nonterminals like T and F. The tree structure alone is enough to determine the correct order of
arithmetic operations.

The abstract syntax tree is implemented as a data structure. However, unlike the tree structures
we have seen up until this point, the nodes are of different types.

class Expr {}
enum BinaryOp { PLUS, TIMES }
class Binary extends Expr {
    BinaryOp operator;
    Expr left, right;
    Binary(BinaryOp op, Expr l, Expr r) {
        operator = op;
        left = l;
        right = r;
    }
}
class Number extends Expr {
    int value;
    Number(int v) { value = v; }
}

Using these classes, we can easily build the AST shown in the previous figure:

Expr e = new Binary(BinaryOp.PLUS,
                    new Number(2),
                    new Binary(BinaryOp.TIMES, new Number(3), new Number(5)));
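One natural next step is to evaluate the AST by structural recursion. The sketch below repeats
the AST classes so it stands alone (the Number class is renamed Num here to avoid clashing with
java.lang.Number) and dispatches on the node type with instanceof; a per-class eval() method
would work equally well:

```java
public class AstDemo {
    static abstract class Expr {}
    enum BinaryOp { PLUS, TIMES }

    static class Binary extends Expr {
        final BinaryOp operator;
        final Expr left, right;
        Binary(BinaryOp op, Expr l, Expr r) { operator = op; left = l; right = r; }
    }

    static class Num extends Expr {
        final int value;
        Num(int v) { value = v; }
    }

    /** Returns: the value of the expression, computed bottom-up. */
    static int eval(Expr e) {
        if (e instanceof Num) return ((Num) e).value;
        Binary b = (Binary) e;
        int l = eval(b.left), r = eval(b.right);
        return b.operator == BinaryOp.PLUS ? l + r : l * r;
    }

    public static void main(String[] args) {
        // the AST for “2+(3×5)”
        Expr e = new Binary(BinaryOp.PLUS,
                            new Num(2),
                            new Binary(BinaryOp.TIMES, new Num(3), new Num(5)));
        System.out.println(eval(e)); // prints 17
    }
}
```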

Parsing to an AST
A parser that constructs an AST can now be written by changing our recognizer once more, this
time to return an Expr. Instead of computing integers, as in the evaluator, we just construct the
corresponding AST nodes:

Expr parseF() throws SyntaxError {
    if (isNumber(peek())) {
        return new Number(Integer.parseInt(consume()));
    } else {
        consume("("); // parentheses discarded here!
        Expr ret = parseE();
        consume(")");
        return ret;
    }
}
Expr parseE() throws SyntaxError {
    Expr e = parseT();
    if (peek("+")) {
        consume();
        e = new Binary(BinaryOp.PLUS, e, parseE());
    }
    return e;
}
Expr parseT() throws SyntaxError {
    Expr e = parseF();
    if (peek("×")) {
        consume();
        e = new Binary(BinaryOp.TIMES, e, parseT());
    }
    return e;
}

Limitations of top-down parsing
Some grammars cannot be parsed top-down. Unfortunately, they include grammars that we
might naturally want to write. Particularly problematic are grammars that contain left-recursive
productions, where the nonterminal being expanded appears on the left-hand side of its own
production. The production E → E + T is left-recursive, whereas the production used above,
E → T + E, is right-recursive. A left-recursive production doesn't lend itself to predictive
parsing: the parser must choose a production based on the first symbol seen, but a method like
parseE would begin by calling itself without consuming any input, recursing forever.

Grammars with left-recursive productions are very useful, because they create parse trees that
describe left-to-right computation. With the right-recursive production used above, the string “1
+ 2 + 3” creates an AST in which evaluation proceeds right-to-left. Programmers normally ex-
pect left-to-right evaluation, and this is even more of a problem if the operator in question is not
associative (for example, subtraction).
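The difference is easy to see with plain arithmetic. For “1 - 2 - 3”, the right-recursive parse
groups the expression as 1 - (2 - 3), while the expected left-to-right reading groups it as
(1 - 2) - 3; the class name below is illustrative:

```java
public class AssocDemo {
    public static void main(String[] args) {
        // right-recursive grammar: "1 - 2 - 3" parses as 1 - (2 - 3)
        System.out.println(1 - (2 - 3)); // prints 2
        // expected left-to-right reading: (1 - 2) - 3
        System.out.println((1 - 2) - 3); // prints -4
    }
}
```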

Reassociation
The two productions E → T + E | T can be viewed as shorthand for an infinite list of productions:

E→T
E→T+T
E→T+T+T
E→T+T+T+T
...

Another way to express all these productions is to adapt the Kleene star notation used in regular
expressions: the expression A* means 0 or more concatenated strings, each chosen from the lan-
guage of A. For example, if the language of A is {"a", "bb"}, then the language of A* has an infi-
nite number of elements, including "a", "aa", "aaa", "bb", "bbbb", "abba", "bbabbbb", and many
more.

Using the Kleene star notation, we can rewrite the infinite list of productions:

E → T ( + T )*

where the parentheses are being used as a grouping construct (they are metasyntax rather than
syntax).

The point of rewriting the productions in this way is that parsing a use of Kleene star naturally
lends itself to an implementation as a loop. Within that loop, the AST can be built bottom-up, so
that the operator associates to the left:

Expr parseE() throws SyntaxError {
    Expr e = parseT();
    while (peek("+")) {
        consume();
        e = new Binary(BinaryOp.PLUS, e, parseT());
    }
    return e;
}

Given input “t₀ + t₁ + t₂ + t₃ + ...”, the corresponding abstract syntax tree built by this code looks
as follows:

Because it allows parsing of left-associative operators, reassociation via bottom-up tree construc-
tion is an important technique for top-down parsing.
Designing and documenting interfaces
Good software engineering is about dividing code into modules that separate concerns and local-
ize them within modules. These modules then interact via interfaces that provide abstraction
barriers supporting local reasoning. Let's look more closely at the problem of designing good in-
terfaces.

What makes a good interface?
Interfaces exist at the boundaries of the modular decomposition. An interface will be most effec-
tive when it has the following three properties:

1. It provides a strong abstraction barrier between modules.

2. It is as narrow as possible while providing the functionality needed by clients.

3. It is clearly specified.

We've already discussed abstraction earlier; our goal here is to examine the latter two attributes
of a good interface.

By a narrow interface, we mean an interface that exposes few operations or other potential de-
pendencies between modules. The opposite of a narrow interface is a wide interface, one that ex-
poses many operations or potential dependencies between modules.

The choice between a narrow interface and a wide interface is not always obvious, because there
are benefits to each approach. We can compare and contrast the philosophies:

narrow                                        wide

few operations, limited functionality         many operations, much functionality
for clients to use                            available for clients

easy to extend, maintain, reimplement         hard to extend, maintain, reimplement

loose coupling: clients less likely to be     tight coupling: clients more likely to be
disrupted by changes                          disrupted by changes

In principle, it's possible to make the interface so narrow that it interferes with clients getting
their job done in an efficient and straightforward way. But this is not the usual mistake of soft-
ware designers; more typically, they make interfaces too wide, leading to software that is hard to
maintain, extend, and reimplement without breaking client code.

The rule of thumb, then, is that interfaces should be made only as wide as necessary for efficient
client code to be written in a straightforward way.
Often when a narrow interface feels awkward to use, it is possible to address this problem by
writing convenience functions that are implemented outside the module, using only the narrow
interface that the module provides. Clients can then use the convenience functions to avoid code
duplication, but without widening the interface and thereby introducing new dependencies be-
tween modules.
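As an illustration of the convenience-function idea (all names here are made up, not from the
notes), a Stack module can keep its interface narrow, while a helper such as pushAll is written
outside the module using only that narrow interface:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

interface Stack<T> {
    void push(T x);
    T pop();
    boolean isEmpty();
}

class ArrayStack<T> implements Stack<T> {
    private final Deque<T> items = new ArrayDeque<>();
    public void push(T x) { items.push(x); }
    public T pop() { return items.pop(); }
    public boolean isEmpty() { return items.isEmpty(); }
}

public class Stacks {
    /** Convenience function: push every element of xs, in order, using only
     *  the narrow Stack interface. Lives outside the Stack module. */
    static <T> void pushAll(Stack<T> s, Iterable<T> xs) {
        for (T x : xs) s.push(x);
    }

    public static void main(String[] args) {
        Stack<Integer> s = new ArrayStack<>();
        pushAll(s, List.of(1, 2, 3));
        System.out.println(s.pop()); // prints 3: the last element pushed
    }
}
```

Clients that find pushAll handy get it without Stack itself growing wider.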

When a module's interface is wide and there doesn't seem to be a way to avoid this by writing
convenience functions or by separating the module into multiple modules, it is often a sign that
you haven't managed to separate program concerns into different modules. When concerns are
not sufficiently separated, there are inherently too many interactions between the different parts
of the program to define a narrow interface between the components.

How should we document interfaces?
Once we've decided what operations and other functionality belongs in an interface, what docu-
mentation should be added? An important principle can guide us here: documentation is code,
code for humans to run. The documentation is a human-readable abstraction of the code that (de-
pending on which documentation we're talking about) supports programmers writing client code
or maintaining the implementation.

The most important function of documentation is to provide specifications of what code does.
Specifications are particularly useful for supporting client code, but also help implementers.

Where to document?
According to the principle that documentation is code, the best place for documentation is with
the code itself, in the form of program comments. When this is not practical, code documentation
should be linked from code so it can be easily accessed. Javadoc documentation is a good exam-
ple of this principle in action: the documentation is extracted from the code, so it cannot be sepa-
rated from it.

Documenting code separately in separate documents may be appealing, but the more separate
documentation is from the code it describes, the more it tends to diverge from the code. The
more it diverges from the code, the less useful it becomes and the less programmers rely on (or
look at) the documentation. Both documentation and code require programmers' attention
to stay fresh!
When to document? (Early!)

Too often, programmers write their documentation at the end of the design and implementation
process, as a kind of afterthought. The workflow of design, coding, debugging, and documenting
tends to look like the figure on the left. A lot of time is spent debugging because the design is not
worked out carefully enough. In general, spending a lot of time debugging is a sign you haven't
worked hard enough on the design.

Documenting the design early, as shown in the figure on the right, helps you work bugs out of
your design and to understand your design better. Typically, this makes both coding and debug-
ging faster. Sometimes your code just works on the first try!

The moral is that documentation is not some kind of esthetic decoration for your code. It is a tool
that can improve your designs and save you time in the long run.

What to write?
1. Know your audience: Tell your reader the things they need to know in a way they can
understand. But your reader's attention is precious: don't waste space on obvious, boring
things, and avoid “filler” comments that add no value and distract from what's important,
such as:

x = x + 1 // add one to x

2. Be coherent: avoid interruptions. Better to write one clear explanation than to intersperse
explanatory comments throughout the code.

3. Respect the abstraction barrier: write specifications in terms the reader/client can under-
stand without knowing the implementation.
An example: polynomials

Step 1: describe the abstraction

/** A polynomial over a single variable x.
 * Example: 2 + 3x - 5.3x³
 */
interface Poly {
    ...
}

Step 2: choose operations

Well-designed methods usually fall into one of three categories: creators (factory methods),
observers (also known as queries), and mutators (also known as commands).

Creators: these methods are used to create new objects, particularly those of the class being
implemented.
Observers: these methods report on object state but have no side effects.
Mutators: the primary purpose of these methods is to have a side effect.

Abstractions that do not have any mutators that can change their state, such as String and
Integer, are immutable abstractions. Abstractions with mutators are mutable. Both kinds of ab-
stractions have their uses. The advantage of immutable abstractions is that their objects can be
shared freely by different code modules.

The useful principle of command-query separation can guide how we design methods. The
principle says that a given operation should fall into one of these three categories, rather than
multiple categories. This makes the interface easier to use. For example, you don't want to be
forced to have side effects in order to check the state of an object.

Considering each of the categories in turn, we might come up with operations like the following:

Creators:
zero: create the zero polynomial
monomial: create a polynomial of the form axᵇ
fromArray: create a polynomial with coefficients defined by an array of doubles.
derivative: create a polynomial that is the derivative of the given polynomial (also
an observer).
plus: create a polynomial that is the sum of two polynomials.
minus: create a polynomial that is the difference of two polynomials.

Observers:
degree: report the largest exponent with a non-zero coefficient.
coefficient: report one coefficient of the polynomial
evaluate: evaluate the polynomial at a given value for x
toString: generate a string representation of the polynomial
equals: report equality of a polynomial with another object.

Mutators: We might not want to have mutators at all, so that we don't have to worry about
polynomials changing their values. If polynomials are to be mutable, however, we need
mutators, e.g.:
clear: set this to the zero polynomial.
add: add another polynomial to this.

Notice that we have not discussed how we are going to implement this polynomial abstraction.
That is a good thing. We want to expose the operations that clients are going to need. We might
have to make sacrifices because some operations are hard or expensive to implement, but that
should be done only after thinking about the ideal interface.
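As a sketch of how some of the chosen creators and observers might be written down, here is a
cut-down Poly interface with one possible dense-array implementation; the design and all names
are illustrative, not the course's official ones:

```java
interface Poly {
    /** Returns: the largest exponent with a non-zero coefficient (0 for the zero polynomial). */
    int degree();
    /** Returns: the coefficient of x^i. Requires: i ≥ 0. */
    double coefficient(int i);
    /** Returns: the value of this polynomial at the given x. */
    double evaluate(double x);
}

public class ArrayPoly implements Poly {
    private final double[] coefficients; // coefficients[i] is the coefficient of x^i

    /** Creator: a polynomial with the given coefficients, constant term first. */
    public ArrayPoly(double... coefficients) {
        this.coefficients = coefficients.clone(); // defensive copy of the argument
    }

    public int degree() {
        for (int i = coefficients.length - 1; i > 0; i--)
            if (coefficients[i] != 0) return i;
        return 0;
    }

    public double coefficient(int i) {
        return i < coefficients.length ? coefficients[i] : 0;
    }

    public double evaluate(double x) {
        double v = 0;
        for (int i = coefficients.length - 1; i >= 0; i--)
            v = v * x + coefficients[i]; // Horner's rule
        return v;
    }

    public static void main(String[] args) {
        Poly p = new ArrayPoly(2, 3, -5.3); // 2 + 3x - 5.3x²
        System.out.println(p.degree());     // prints 2
        System.out.println(p.evaluate(0));  // prints 2.0
    }
}
```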

We want to avoid adding operations that we can implement efficiently using existing operations.
For example, we might be tempted to have an operation that finds zeros of the polynomial. How-
ever, such an operation can probably be implemented efficiently using either factoring (for low-
degree polynomials) or numerically via Newton's method, using evaluate and derivative.

Standard operations. Some operations are so useful that it is worth thinking about whether you
will want them for every data abstraction you define:

equals. Testing whether two values are equal is fundamental to mathematics and to pro-
gramming.

toString. It is very useful for debugging to be able to print a string representation of an
object. Ideally, two objects should have equal answers for toString() if and only if they are
equal according to equals.

hashCode. If you want to use an object as a key in a hash table, it needs to have a hashCode()
method. Two objects that are equal according to equals() must have the same hash code.
Two application-generated objects that are not equal according to equals() should have
different hash codes with high probability.

a copy constructor. For mutable abstractions (that is, abstractions that have mutators), it
is often useful to make a copy of an existing mutable object. Until mutators are used on ei-
ther the copy or the original, the two should be indistinguishable. There should be no way
to affect the original by mutating the copy, or vice versa. Among other uses, copies are
handy for building test cases.

The problem with getters and setters. Some people reflexively add getters and setters to clas-
ses they write. Getters are observer methods that merely report the contents of fields, and setters
are mutators that simply change the values of fields to an arbitrary value. Both getters and setters
can undermine abstraction. Setters are especially pernicious because they allow any client to
change the contents of the object in an arbitrary way. You might as well make the field public.
Often it does make sense to have operations that are implemented by reporting the contents of a
field, when the field contains information that makes sense to the client. For example, we might
implement polynomials with a field that keeps track of the degree of the polynomial. The degree
method might just report the contents of that field, e.g.

int degree() {
    return deg;
}

However, this should not be thought of as a getter. As far as the client knows we might instead
have kept track of all the coefficients in an array and implemented degree by reporting the array's
length:

int degree() {
    return coefficients.length - 1;
}

A second problem with getters is that they may lead to representation exposure, in which muta-
ble state from inside the abstraction is made available in an unconstrained way to clients. For ex-
ample, suppose we stored the coefficients of the polynomial as an array in the instance variable
coefficients, and provided a getter:

double[] coefficients;

double[] getCoefficients() {
    return coefficients;
}

Now a client can get access to the internal array of coefficients and change the polynomial in an
unconstrained way, possibly breaking its class invariant.
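The standard fix, sketched here under the same assumed representation, is a defensive copy:
return a clone of the array, so clients never hold a reference into the representation:

```java
public class DefensiveCopyDemo {
    private final double[] coefficients = {2, 3, -5.3};

    /** Returns: a copy of the coefficient array; the representation stays hidden. */
    public double[] getCoefficients() {
        return coefficients.clone(); // defensive copy
    }

    public static void main(String[] args) {
        DefensiveCopyDemo p = new DefensiveCopyDemo();
        double[] c = p.getCoefficients();
        c[0] = 999; // mutating the copy...
        System.out.println(p.getCoefficients()[0]); // prints 2.0: the original is unchanged
    }
}
```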

So think before adding getters and think twice before adding setters to an interface.

Another operation that is overrated is the default constructor. It is the job of constructors to cre-
ate a properly initialized object. Unfortunately, one often sees programmers using a default con-
structor to create the object, then initializing the fields using setters. This style of programming is
a strong sign that the abstraction is poorly designed. A default constructor does make sense for
some abstractions, typically mutable ones whose state can be changed using additional calls to
mutators.

Step 3: write specifications
We have a rough idea of the operations we want to support. But before we start implementing,
we should write clear, precise specifications so we know when we've implemented the operations
correctly.

For each method, we need to define a signature that gives the types of the parameters and the re-
turn value, and the possible exceptions. And we need to define a specification (spec) that de-
scribes what the client needs to and is allowed to know about the behavior of the method.

For example, we might write a spec for the degree method as follows:

/** Returns: the degree of the polynomial, the largest exponent with a non-zero
* coefficient. */
int degree();
Spec clauses
To help us construct a good spec, it is useful to think of the spec as being composed of various
clauses. These cover different things the client needs to know about:

Outputs: The returns clause describes what results come back when the method is called.
Inputs: The requires clause describes what arguments the caller is permitted to supply.
Effects: The effects clause describes what side effects to the state of objects happen.
Errors: To aid in debugging, we may want to explain what happens when the client
supplies incorrect inputs. This is usually done as part of the requires clause.
Examples: Sometimes an example or two can help clarify what the method does.

The key to writing good specs is to think of the spec as a contract between the client and the im-
plementer. Like a legal contract, its main goal is to help everyone figure out who to blame when
things go wrong. This is very important for successful software engineering, especially in a large
team.

Using Javadoc
Javadoc doesn't completely support the clauses we have been describing thus far, though there
are efforts to make it do so. If you want to use Javadoc to generate HTML documentation, you
will need to adapt this documentation strategy accordingly. The key is not that you need to have
explicit clauses, but that you should know for each thing you write in the comment which clause
it belongs to.

Step 4: implement operations
Now we're ready to implement our specifications. And we'll want to write some documentation
of that implementation...

Documenting implementations: The implementer's view

Our goal for documenting the implementation is to support implementation and maintenance by
describing implementation methods and even abstracting them. Therefore, private methods have
the same clauses as public ones.
We also want to write specifications for fields where it is not obvious. Two kinds of information
are essential:

1. describing what values of the fields mean in terms of the client's abstract view.
2. stating the rep invariant that these fields must satisfy.

Example of field specifications including rep invariants:

class LinkedList<T> {
    /** First node. May be null when list is empty. */
    Node<T> first;
    /** The number of nodes in the list. size ≥ 0. */
    int size;
    /** Last node. last.next = null. May be null when list is empty. */
    Node<T> last;
    ...

When we write methods like LinkedList.add, these invariants may be temporarily broken:

/** Append x to the end of the list. */
void add(T x) {
    // Algorithm: Create a new node. Make it the new head of the list
    // if the list is empty. Otherwise attach it to "last".
    if (first == null) {
        first = last = new Node(x, null); // size invariant broken here
    } else {
        last = last.next = new Node(x, null); // size invariant broken
    }
    size++; // invariant restored here
}

Documenting implementations: the specializer's view
The specializer uses the code as a superclass, with the goal of producing a subclass that reuses
superclass functionality. The specializer may override the behavior of existing methods that have
public or protected visibility. When we write a specification for a method that can be overridden,
there are really two separate goals:

1. Specification for client use. Define the contract with the client that this class and all
subclasses must enforce. The client specification is all that the client can count on, because
the dynamic type of the object being used may not be the same as the static type.
2. Overridable behavior. Defines the behavior of this particular method implementation. This
behavior must be compliant with the client specification, but it may guarantee more than the
client specification does.

For example, consider an implementation of an extensible chess game. We might define a class
Piece that gives an interface for manipulating pieces, with subclasses such as King that specialize
it:
/** A chess piece */
abstract class Piece {
    /** Spec: Iterates over all the legal moves for this piece.
     *  Overridable: uses legalDests() to construct the legal moves.
     */
    public Iterator<Move> legalMoves() { ... }

    /** Iterates over all destinations this piece can move to in an
     *  ordinary move, including captures. */
    abstract protected Iterator<Location> legalDests();
}

Given this specification for Piece, we can implement a piece such as a king with extra castling
moves that are not computed from the legal destinations as with other pieces:

class King extends Piece {
    public Iterator<Move> legalMoves() {
        Collection<Move> moves = ...;
        for (Move m : super.legalMoves()) { // rely on superclass overridable behavior
            moves.add(m);
        }
        moves.add(new CastleMove(...));
        return moves.iterator();
    }

    /** Overridable behavior: iterate over the squares adjacent
     *  to the current location. */
    protected Iterator<Location> legalDests() {
        ...
    }
}

Note that King.legalMoves obeys the specification of Piece.legalMoves, but overrides its behavior.
Because the implementer of Piece defined overridable behavior of the method, the implementer
of King can rely on this behavior in implementing their own method, without needing to read the
details of the implementation of Piece.legalMoves.

Writing classes that can be inherited and reused effectively requires keeping these two different
kinds of specification separate.
Modular Design and Implementation Strategies
Module Dependency Diagrams
We have discussed how to pick operations for modules and how to specify those operations.
Moving to a higher level, it will be helpful to have methods for planning how we are going to
partition a programming effort into modules or even into larger units like groups of modules.

The Module Dependency Diagram (MDD) is one tool that helps with planning and communi-
cating design at the larger scale. The idea is that the modular design has a structure that comes
out when we look at how modules depend on each other. In an OO language, modules are classes
(or, to some extent packages). There are two ways that classes depend on each other: first, by re-
ferring to each other, which we consider to be a "depends on" relationship, and second, by inher-
itance, which is a specialized dependency, a "subclass of" relationship. We can illustrate these
two kinds of relationships with arrows:

For example, consider Assignment 2. The MDD of one reasonable implementation of this assign-
ment is something like the following:

Notice that the subclass hierarchies appear as parts of the MDD, albeit upside-down from the
usual way that we draw them.

The MDD helps diagnose the quality of a design. We can tell which modules are important to the
overall design, because they have a lot of incoming arrows or can be reached from many mod-
ules by following arrows through the MDD. We can also tell which modules are more likely to
break, because they have many outgoing arrows or can reach many other modules by following
arrows.

A good design tends to lead to a clean-looking MDD, because the dependencies of different
modules are easy to understand. A good design in which modules are loosely coupled tends to
look like a tree or a graph without cycles. Cycles in the graph are a danger sign for the design.
For example, two modules that depend on each other are really intimately tied to each other, be-
cause changes to either one will tend to propagate to the other, and perhaps back! A messy look-
ing MDD that can't be drawn without creating spaghetti means you should rethink how you've
divided the tasks into modules.

Implementation strategies
The MDD gives us a nice way to plan how we are going to implement code. There are two basic
ways to go about implementing: top-down and bottom-up.

Top-down: Here we start at the top of the MDD (i.e., the main program) and work our way
down, implementing clients before the modules on which they depend. Top-down development
helps us get high-level design decisions right early on and to understand user needs. The program
is always in a demoable and testable state as long as we write stub implementations of the miss-
ing pieces of the code. With top-down development, we can do system testing in which the
whole system is being tested at once, and modulo functionality not yet implemented in stubs, we
can test as we develop.

top-down development
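For example, a stub for a module low in the MDD can be trivially simple as long as it satisfies the spec well enough to let the system run and be demoed. The interface and names below are invented for illustration:

```java
import java.util.ArrayList;

/** Hypothetical interface for a module lower in the MDD. */
interface WordSet {
    void add(String w);
    boolean contains(String w);
}

/** Stub used during top-down development: slow but obviously correct,
 *  so clients higher in the MDD can run and be tested before the real
 *  (say, hash-based) implementation is written. */
class WordSetStub implements WordSet {
    private final ArrayList<String> words = new ArrayList<>();
    public void add(String w) { words.add(w); }
    public boolean contains(String w) { return words.contains(w); }
}
```

Later, the stub is swapped for the real implementation without changing any client code, since clients depend only on the interface.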

Bottom-up: As the name suggests, we implement modules before the clients that depend on
them. Bottom-up development has the advantage that, assuming we are testing as we go, we are
also building upon solid, fully implemented foundations. As we implement each module, we can
immediately test it with unit tests designed for that module. Since the top levels of the program
are not present, we replace them with a test harness for each module, containing that module's
unit tests. Bottom-up development is particularly effective when it is not clear whether key parts
of the system can be implemented with acceptable performance.

bottom-up development
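A test harness for a bottom-level module can be as simple as the following sketch (plain Java to stay self-contained; in practice we would use JUnit). The stack module and harness are invented for illustration:

```java
/** A small module under bottom-up development. */
class StackModule {
    private int[] a = new int[4];
    private int n = 0;

    void push(int x) {
        if (n == a.length) a = java.util.Arrays.copyOf(a, 2 * n);
        a[n++] = x;
    }

    int pop() {
        if (n == 0) throw new java.util.NoSuchElementException("empty");
        return a[--n];
    }

    boolean isEmpty() { return n == 0; }
}

/** Stands in for the not-yet-written top levels of the program:
 *  runs the module's unit tests and reports failures. */
class StackHarness {
    static int run() {
        int failures = 0;
        StackModule s = new StackModule();
        if (!s.isEmpty()) failures++;
        s.push(1);
        s.push(2);
        if (s.pop() != 2) failures++;
        if (s.pop() != 1) failures++;
        if (!s.isEmpty()) failures++;
        return failures;
    }
}
```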

Which implementation strategy is better? It depends on what we are trying to build. Typically,
we want to reduce the risk that we will do extra work, by discarding code or by setting up a
structure in which the desired functionality is difficult to implement. Top-down development is
helpful for reducing the risk that the customer needs are not going to be met, or that the high-lev-
el structure of the program is not going to work out. Bottom-up development reduces the risk
that core technologies will be infeasible, potentially requiring a redesign of all of the modules de-
pending on them.

In practice, we often want to use both styles of design at once, depending on our assessment of
where the main risks lie. Before implementing a program, especially when working on a team, it
is a good idea to decide on the MDD and come up with an implementation strategy that addresses the risks effectively through a mixture of top-down and bottom-up development. The implementation strategy should include any stubs or test harnesses needed to test the code as it is being developed. After sketching the MDD, the team should design the interfaces that define the
boundaries between different modules. Good interface design and clear specs are especially im-
portant at the boundaries where different programmers will interact.

Testing
Testing in software development
It is tempting to defer testing, like documentation, to fairly late in the software development pro-
cess. For example, in the waterfall model of development, testing is part of the validation (aka
verification) step occurring late in the process:

While the steps of the waterfall model are all real tasks that must be done, in practice each phase
cannot be complete before the next one begins. Each phase can influence the ones that come before it in the model. While the waterfall model has had serious proponents, at this point it is widely criticized; it's really a straw man for how not to do development.

Having validation at the end is particularly problematic. Validation is any process that increases
our confidence in the correctness of the system; it can include both testing and formal verifica-
tion, though we focus on the former in this course. Code that hasn't been tested should be as-
sumed to be broken; waiting until development is finished to start testing is a recipe for disaster.

When to test?
The time to test is before, during, and after implementation. We can even start working on testing
before we have started implementing code. This helps because the process of designing tests for
the system already identifies design flaws. Early testing not only tests the code, it tests the speci-
fications and even the tests themselves. (Failed tests may fail because the tests themselves are
wrong!)

The key to success is continuous testing throughout the development process. As each feature is
added or module is implemented, test cases should be developed to validate the implementation
work done. With continuous testing, new bugs that are discovered will tend to be found in recent-
ly written code, helping you localize the error. Continuous testing works particularly well when
code contains assertions to check preconditions, postconditions, and invariants.

Regression testing
Studies of software development have shown that roughly 1/3 of all bug fixes introduce new
bugs. Adding new features often also adds new bugs to old code. Regression testing helps ad-
dress this problem. A test suite containing tests that cover the functionality of the program is de-
veloped; these tests can then be run to make sure that working features of the code have not "re-
gressed". For regression testing to be effective, it should be as automatic and easy to invoke as
possible. Many software organizations include regression testing as a standard part of the software build process, preventing developers from pushing changes to the code repository if regression tests fail. The time invested in automating regression tests is time well spent for even moderate-sized projects.

The test suites associated with a software project are tremendously valuable. Like code docu-
mentation, they are part of the code of the project and should be curated and maintained with
equal care. It is tempting to take shortcuts with testing code because it isn't shipped to the cus-
tomer; this would be a mistake.

How to add bugs


Why do programmers write code with so many bugs? One important reason is the excessive use
of copying and pasting code. Copying and pasting buggy code automatically multiplies the num-
ber of bugs in your code. Further, programmers frequently fail to completely adapt the copied
code to the new setting where it is being pasted. In fact, one recently popular way to find bugs in
programs is exactly to automatically search for code that appears to be copied and pasted, but
where the pasted version has not been consistently adapted to its new context.

Copying and pasting code is one of those “lazy shortcuts” that often creates more work in the
long run. When you feel the need to copy code, try instead to introduce an abstraction (a class, a
method) that captures the functionality that you are copying. You might have to think a little
harder but your code will typically be shorter and better.
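For instance (an invented example), rather than pasting the same range-limiting expression into several methods, we can name it once:

```java
/** Extracted abstraction: one shared method instead of pasted copies
 *  of the same expression. Illustrative example, not course code. */
class Clamp {
    /** Returns v limited to the range [lo, hi]. Requires lo <= hi. */
    static int clamp(int v, int lo, int hi) {
        return Math.max(lo, Math.min(hi, v));
    }
}
```

Now a fix to the logic happens in exactly one place instead of in every pasted copy.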

Strategies for test design


A program might be run using a staggeringly huge number of possible inputs and different environments. Testing is by its nature a finite process. How can we gain confidence that the program works in all situations from just a finite number of tests? This is the problem of coverage. There
are several good strategies for achieving meaningful coverage.

Black-box testing
In black-box testing, we design test cases by looking at the spec rather than the code. We design
test cases that not only include “typical cases” but also edge and corner cases that (from the spec)
we can see are atypical in one way or another. Because we are designing test cases without look-
ing at the implementation, we can even design black-box test cases before the implementation is
written. In fact, writing black-box test cases also helps get the specs right because thinking about
testing helps us realize when specs are incomplete or ambiguous. Developing tests early also
leads to a better implementation because it causes us to think more deeply about what will hap-
pen in corner cases.

Let's consider an example of developing black-box tests. Suppose we are testing the remove oper-
ation of a linked list implementation. We would want to test some "typical" cases in which the el-
ement is in the list, but also some corner cases:

When the element is not in the list
When the list is empty
When the element is the first in the list
When the element is the last in the list
When the list contains only the element to be removed

Black-box testing requires the programmer to define input/output pairs, in which the correct out-
put corresponding to each input is defined. This can be a time-consuming process.
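For the remove example, the corner cases above might turn into tests like the following, here run against java.util.LinkedList as a stand-in for the implementation under test:

```java
import java.util.LinkedList;
import java.util.List;

/** Black-box tests for List.remove(Object), one per corner case.
 *  Each method returns true if its case passes. */
class RemoveTests {
    static boolean typical() {          // element in the middle
        List<String> l = new LinkedList<>(List.of("a", "b", "c"));
        return l.remove("b") && l.equals(List.of("a", "c"));
    }
    static boolean notInList() {
        List<String> l = new LinkedList<>(List.of("a", "b"));
        return !l.remove("z") && l.equals(List.of("a", "b"));
    }
    static boolean emptyList() {
        List<String> l = new LinkedList<>();
        return !l.remove("a") && l.isEmpty();
    }
    static boolean firstElement() {
        List<String> l = new LinkedList<>(List.of("a", "b"));
        return l.remove("a") && l.equals(List.of("b"));
    }
    static boolean lastElement() {
        List<String> l = new LinkedList<>(List.of("a", "b"));
        return l.remove("b") && l.equals(List.of("a"));
    }
    static boolean onlyElement() {
        List<String> l = new LinkedList<>(List.of("a"));
        return l.remove("a") && l.isEmpty();
    }
}
```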

Glass-box testing
In glass-box testing we design test cases based on the implementation code. The goal is to achieve
coverage of all the ways the code can behave, under the reasonable position that any untested
functionality in the code could hide a bug.

At a minimum, therefore, glass-box testing requires that we test every method. But we should go
further to obtain higher assurance:

Every line of code should be exercised.
Every path through the code should be taken.
Every data structure that the code can build should be built.

In general it is infeasible to take every path through the code in a finite set of test cases, because
even a single loop defines an infinite number of paths depending on how many times the loop re-
peats. Here we can fall back on the strategy of sampling the space intelligently. We test both the
"typical" case in which the loop does a few iterations and the boundary cases of 0 or 1 iterations.

Randomized testing
Designing test cases that exercise all parts of the code and specification is challenging. Often,
higher assurance can be obtained by generating test cases randomly, either by generating random
inputs or, in the case that the module has internal state, sequences of random inputs. If enough
test cases are generated, the coverage may be excellent. In general, some care may be needed in
generating test cases, if the corner cases exhibiting bugs are exceedingly unlikely to be hit by
chance.

With black-box and glass-box testing, input/output pairs must be designed. Where do we get these when doing randomized testing? In general we are not going to be able to fully test that the outputs of the code are according to the specification, and must settle for weaker properties of the code being tested, such as not throwing unexpected exceptions or failing assertions.

One way that we can use randomized testing to test against the spec is if we have a reference
implementation of the same specification—perhaps a less efficient, simpler implementation of
the same abstraction. In this case, the two implementations can be compared on random test cas-
es to ensure they agree.
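The sketch below compares a compact bit-vector set against a slow list-based reference on random operation sequences; both classes and all names are invented for illustration:

```java
import java.util.ArrayList;
import java.util.Random;

/** Implementation under test: a set of ints in [0, 64). */
class BitSet64 {
    private long bits = 0;
    void add(int x) { bits |= 1L << x; }          // requires 0 <= x < 64
    boolean contains(int x) { return (bits & (1L << x)) != 0; }
}

/** Reference implementation: obviously correct but inefficient. */
class RefSet {
    private final ArrayList<Integer> xs = new ArrayList<>();
    void add(int x) { if (!xs.contains(x)) xs.add(x); }
    boolean contains(int x) { return xs.contains(x); }
}

class RandomCompare {
    /** Returns true if the two implementations agree on n random ops. */
    static boolean agree(long seed, int n) {
        Random r = new Random(seed);
        BitSet64 impl = new BitSet64();
        RefSet ref = new RefSet();
        for (int i = 0; i < n; i++) {
            int x = r.nextInt(64);
            if (r.nextBoolean()) {
                impl.add(x);
                ref.add(x);
            } else if (impl.contains(x) != ref.contains(x)) {
                return false; // disagreement: a bug in one of the two
            }
        }
        return true;
    }
}
```

Any disagreement pinpoints a bug in one of the two implementations (or an ambiguity in the spec both claim to meet).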

Randomized testing requires writing a test case generator, which can be challenging, particularly
when the test cases must satisfy a complex precondition that is unlikely to be satisfied by most
randomly generated inputs. One way to solve that problem is to generate random inputs and then
"fix" them into inputs that satisfy the precondition. Another is to generate each input by perturb-
ing a previous input randomly in a way that ensures the precondition still holds.

While automatic test case generation is not always easy, tool support is steadily improving. JUnit
Theories and Haskell QuickCheck are examples of tools that support this style of testing.

Bounded exhaustive testing


An alternative to randomized testing is to generate exhaustive test cases that completely cover
the space tested. Full exhaustive testing is not feasible, of course. However, for most program
bugs, there is some "small" (in some sense) test case on which the bug is exhibited. This is the
small counterexample hypothesis. Therefore, we design test cases so that all test cases (or input
sequences) of up to a certain maximum size are tested. For example, if we want to test a method
that operates on a tree data structure, we might generate all possible trees of up to height 5. A
buggy algorithm that manipulates trees is unlikely to work perfectly on all trees of height 5 or less.
As with randomized testing, the challenge is to generate the test cases, but tool support continues
to improve.

Symbolic execution
Testing on particular inputs leaves the possibility that the inputs not chosen would have exhibited
the bug. Another idea for how to obtain coverage is to use symbolic execution in which the code
is run on symbolic inputs rather than actual values. This approach requires special tool support to
be able to run programs in this alternate (and expensive) mode.

The idea is that rather than giving each variable a concrete value, the execution carries around a
logical formula describing constraints on the possible values that variables can take on. The ini-
tial constraints for the code of a method would come from the precondition of the method, but
new constraints are obtained from the flow of control within the program. For example, consider
the following code. It is not immediately obvious whether this code can crash:

/** Requires: y ≥ 0. */
int f(int x, int y) {
    int d;
    if (y > 0) {
        d = x/y;
    } else {
        assert y == 0;
        d = y + 1;
    }
    x = x/d;
    return x;
}
A symbolic executor proceeds by starting with the preconditions and propagating them through
the program. Where a conditional cannot be unambiguously evaluated, execution splits and two
executions proceed with different information. For example, in the following execution, we don't
know whether y>0 or not, so both paths are followed. In each path, we use the result of the com-
parison of y>0 to strengthen the information we have available:

/** Requires: y ≥ 0. */
int f(int x, int y) {
    int d;
    // x can be anything, y ≥ 0
    if (y > 0) {
        // x can be anything, y > 0
        d = x/y;
        // d can be anything
    } else {
        // x can be anything, y = 0
        assert y == 0; // cannot fail
        d = y + 1;
        // y = 0 and d = 1
    }
    // (y = 0 and d = 1) or (y > 0)
    x = x/d;
    return x;
}

The path of execution when y>0 has no information about the value of d, which means that when
we get to the statement x = x/d, we cannot show that d != 0. We have found a bug. By representing the values of variables logically, symbolic execution effectively runs a very large number
of test cases at the same time.

The weakness of symbolic execution is that it does not scale up to large programs. However, it
can be very effective on smaller implementations. Tools for symbolic execution are still not
mainstream but are maturing.
Design patterns
Design patterns are coding idioms that help build better programs. The goal is often to help make
programs more modular by decoupling communicating code modules. Some design patterns just
help avoid mistakes. Design patterns give programmers a common vocabulary for explaining
their designs and aid in quick understanding of the advantages and disadvantages of particular
designs.

The term design pattern was introduced by the very influential “Gang of Four” book, Design
Patterns: Elements of Reusable Object-Oriented Software, by Erich Gamma, Richard Helm,
Ralph Johnson, and John Vlissides. The book discusses object-oriented programming and intro-
duces (or gives names to) more than 20 design patterns.

Many other design patterns or variations of design patterns have been identified since, some
more useful and meaningful than others. In this lecture we look at some of the more important
design patterns to understand why they are helpful. Understanding patterns will also help you re-
sist the lure of the Cargo Cult Programming antipattern, in which design patterns are used with-
out real purpose!

Iterator pattern
A not uncommon problem when designing programs is how to set up a stream of information
from a producer module A to a consumer module B, while keeping both A and B decoupled so
that each has no dependency on the module they are communicating with. Assuming that the val-
ues communicated have some type T, the communication we want can be depicted as follows:

The Iterator design pattern is one way to solve this programming problem. Module A constructs
objects that provide the ability for the consumer to “pull” values from the producer. These objects provide an interface like the following:

interface Iterator<T> {
    boolean hasNext();
    T next() throws NoSuchElementException;
}

The key operation here is next(), which the consumer uses to get objects. This is a polling-style
interface, in which the consumer can ask any time for a new object, but might have to wait until
something is available. Iterators can be used by calling only next(), but to detect the end of the it-
eration without using exceptions, it is standard to use the hasNext() method instead.

Once consumer B has obtained an iterator from producer A, it can keep getting new elements
from the iterator without mentioning A in any way. The producer code doesn't need to know about
B, either. Thus, we have complete decoupling of A and B.
An additional advantage of this pattern is that multiple consumers can obtain streams of infor-
mation from a single producer without interfering with each other. Whatever state is needed to
keep track of the position in the stream is stored in the iterator object, not in the producer.

Java provides a very convenient syntactic sugar for invoking iterators. A statement of the form
for (T x : c) { ...body... } is syntactic sugar for the following code:

Iterator<T> i = c.iterator();
while (i.hasNext()) {
    T x = i.next();
    // ...body...
}

To use this syntactic sugar, it is necessary that c either be an array or implement the interface
Iterable<T>:

interface Iterable<T> {
    Iterator<T> iterator();
}

Implementing iterators
Iterators are very handy and easy for client code to use. They are a welcome addition to interfac-
es. However, there is one problem: implementing them can be tricky.

1. The iterator needs to keep track of the current state of the iteration so that it can resume at
the right place in the stream on each call to next(). For tree data structures, tracking iteration
state is particularly awkward. The state of the iteration is a path from the root to the current
node; this path must be updated on each call to next.

2. The iterator supports both hasNext and next methods. The hasNext method must figure out
whether there is a next element to be provided; typically, this duplicates work that the next()
might have to do, and in some cases, that work cannot be done separately by the two meth-
ods. The iterator must contain additional state to keep track of whether the current answer to
hasNext() has been computed yet.

3. Dealing with changes to the underlying data structure during iteration is often tricky, so
changes to the collection being iterated over are typically forbidden. In the Java collections
framework, collection classes throw an exception ConcurrentModificationException if an ele-
ment is requested from an iterator after a mutation to the collection that occurred during the
iteration. Note that a concurrent modification can happen even if there is no real concurren-
cy in the system. To detect such requests, every collection class object has a hidden version
number that is incremented after each mutation. Iterator objects record the collection's ver-
sion number when they are created, and compare this version number against the collection's
on each call to next(). A mismatch causes the exception to be thrown.

A commonly desired change to the collection is to remove the element currently referenced
by the iterator. Iterators may support a remove() method whose job it is to remove the current
element; this operation is not considered a concurrent modification. However, if there are
multiple iterators traversing the data structure, a remove() by one iterator will in general
break the others.
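The version-number scheme can be sketched as follows (a minimal array-backed collection of our own invention, not the actual java.util code):

```java
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** A minimal array-backed bag whose iterators detect concurrent
 *  modification via a version number. */
class Bag<T> implements Iterable<T> {
    private Object[] elems = new Object[4];
    private int size = 0;
    private int version = 0;   // incremented on every mutation

    void add(T x) {
        if (size == elems.length) elems = Arrays.copyOf(elems, 2 * size);
        elems[size++] = x;
        version++;
    }

    public Iterator<T> iterator() {
        return new Iterator<T>() {
            private int next = 0;
            private final int expected = version; // snapshot at creation
            public boolean hasNext() { return next < size; }
            @SuppressWarnings("unchecked")
            public T next() {
                if (version != expected)
                    throw new ConcurrentModificationException();
                if (!hasNext()) throw new NoSuchElementException();
                return (T) elems[next++];
            }
        };
    }
}
```

Each call to next() compares the iterator's recorded version against the collection's current one; a mutation after the iterator was created makes them disagree.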

Generators
Some languages other than Java support another language construct that makes it easier to implement iterators. The C#, Python, and Ruby languages support generators that send results to the consumer using the yield statement. An extended version of Java that supports yield is JMatch, developed at Cornell. In these languages, you can think of the iterator as running concurrently with
the consumer, but only when the consumer requests a new value. The iterator and the loop body
are coroutines. For example, with generators, an iterator for trees can be implemented very easi-
ly using recursion:

Iterator<T> iterator() {
    if (left != null)
        for (T x : left)
            yield x;
    yield data;
    if (right != null)
        for (T x : right)
            yield x;
}

By contrast, a Java implementation of the same iterator will take at least 50 lines of code and of-
fer more opportunities for introducing bugs. On the other hand, a careful Java implementation of
a tree iterator can be made to run faster than the generator, by avoiding yielding elements up
through every level of the tree. The trick is to keep the path from the root to the current node in a
stack.
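That stack-based trick looks roughly like this (a sketch; the Tree class is ours). The iterator keeps the path from the root to the current node on an explicit stack, so each element is produced directly rather than being yielded up through every level of recursion:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Binary tree with an in-order iterator using an explicit stack. */
class Tree<T> implements Iterable<T> {
    final T data;
    final Tree<T> left, right;

    Tree(Tree<T> left, T data, Tree<T> right) {
        this.left = left;
        this.data = data;
        this.right = right;
    }

    public Iterator<T> iterator() {
        return new Iterator<T>() {
            // Invariant: the stack holds the path to the next node to visit.
            private final Deque<Tree<T>> path = new ArrayDeque<>();
            { pushLeftSpine(Tree.this); }

            private void pushLeftSpine(Tree<T> t) {
                for (; t != null; t = t.left) path.push(t);
            }

            public boolean hasNext() { return !path.isEmpty(); }

            public T next() {
                if (path.isEmpty()) throw new NoSuchElementException();
                Tree<T> t = path.pop();
                pushLeftSpine(t.right); // descend into the right subtree
                return t.data;
            }
        };
    }
}
```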

Ironically, the term iterator originally referred to this style of implementing iteration, which was
invented in the language CLU in the 70's. The term generator originally referred to what we
now know as the iterator design pattern.

Observer pattern
Sometimes we want to send a stream of information from a producer to a consumer, but it's not
convenient to have the consumer polling the producer. Instead, we want to push information
from the producer to the consumer. We can think of the information being pushed as events that
the consumer wants to know about. This is the idea behind the Observer pattern, which works in
the opposite way from the Iterator pattern:
In the Observer pattern, the consumer provides an object implementing the interface Observer<T>:

interface Observer<T> {
    void notify(T event);
}

Whenever the producer has a new event x to report to the consumer, it calls the observer's meth-
od notify(x). The observer then does something with the data it receives that is appropriate for
the consumer. Since the observer is provided by the consumer, it knows what the consumer needs and is typically inside the consumer's abstraction boundary, perhaps implemented as an inner class.

How does the producer know which observers to notify? This is accomplished by registering the
observer(s) with the producer. The producer implements an interface similar to this:

interface Observable<T> {
    void registerObserver(Observer<T> observer);
}

When the producer receives a call to registerObserver, it records the observer in its collection of
observers to be notified. When the producer has a new event to provide to consumers, it iterates
over the collection, calling notify on each observer in the collection.
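Putting the two interfaces together, a minimal producer might look like the following sketch (Announcer is an invented name; a real producer would announce real events):

```java
import java.util.ArrayList;
import java.util.List;

interface Observer<T> {
    void notify(T event);
}

interface Observable<T> {
    void registerObserver(Observer<T> observer);
}

/** A producer that pushes String events to all registered observers. */
class Announcer implements Observable<String> {
    private final List<Observer<String>> observers = new ArrayList<>();

    public void registerObserver(Observer<String> o) {
        observers.add(o);
    }

    /** Called by the producer when it has a new event to report. */
    void announce(String event) {
        for (Observer<String> o : observers) o.notify(event);
    }
}
```

A consumer registers with registerObserver (here, conveniently, via a lambda) and is then pushed every subsequent event without polling.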

We have already seen an instance of the Observer pattern: Swing listeners. For example, ActionListeners are observers whose notify method is named actionPerformed. If one is setting up a listener for button clicks, the Observable in question is the JButton object, and an observer is registered by calling addActionListener(l).

Like the Iterator pattern, the Observer pattern has the benefit that the producer and consumer can
exchange information without tying either implementation to the other. An observable can also
provide information to multiple observers simultaneously.

We can see that there is a symmetry to Iterators and Observers. We can make this a bit more
compelling. Using A→B to represent the type of a function that takes in an A and returns a B,
and using () to represent the type of an empty argument list (which is really the same thing as
void), we have the following types:

Iterator:

next: ()→T
iterator: () → (()→T)

Observer:

notify: T→()
registerObserver: (T→()) → ()

The types of the Iterator operations are exactly the same as the types of the Observer operations,
except that all the arrows are flipped! This shows that we have a duality between Iterator and
Observer.
Abstract Factory pattern
When we create objects using a constructor, we tie the calling code to a particular choice of im-
plementation. For example, when creating a set, we specify exactly which implementation we are
using (for simplicity, let's ignore type parameters):

Set s = new HashSet();

One way to avoid binding the client code explicitly to an implementing class is to use factory
methods (creators), which we have talked about earlier. We might declare a class with static
methods that create appropriate data structures:

class DataStructs {
    static Set createSet() { return new HashSet(); }
    static Map createMap() { return new HashMap(); }
    static List createList() { return new LinkedList(); }
    ...
}

Now the client can create sets without naming the implementation, and the choice of which im-
plementations to use for all the data structures has been centralized in the DataStructs class.

Sometimes static factory methods still don't provide enough flexibility. The choice of implemen-
tation is still fixed at compile time even if the client code doesn't choose it explicitly. We can
solve this problem by using the Abstract Factory pattern. The idea is to define an interface with
non-static creator methods for the various kinds of things that need to be allocated.

interface DataStructs {
    Set createSet();
    Map createMap();
    List createList();
    ...
}

All the choices about what implementation to use can now be bound into an object that imple-
ments this interface. Assuming that object is in a variable ds, the client might contain:

DataStructs ds;
...
Set s = ds.createSet();

Of course the choice of implementation has to be made somewhere, where ds is initialized, but
that can be far away from the uses of ds, in some other module. Since the abstract factory is an
object, it can be chosen truly dynamically, at run time. There can even be multiple implementa-
tions of an abstract factory interface used within the same program.
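A concrete factory implementing the interface above might look like this (HashedDataStructs is an invented name; raw types are kept for simplicity, as in the notes):

```java
import java.util.*;

// The abstract factory interface, with type parameters omitted:
interface DataStructs {
    Set createSet();
    Map createMap();
    List createList();
}

/** One concrete factory; another might return tree-based structures. */
class HashedDataStructs implements DataStructs {
    public Set createSet() { return new HashSet(); }
    public Map createMap() { return new HashMap(); }
    public List createList() { return new LinkedList(); }
}

class Client {
    /** The client never names HashSet; which factory to pass in
     *  is decided elsewhere, possibly at run time. */
    static Set makeSet(DataStructs ds) {
        return ds.createSet();
    }
}
```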

One place where the abstract factory approach has been used successfully is for user interface li-
braries. We might define an interface for creating UI components:
interface UIFactory {
Button createButton(String label);
Label createLabel(String txt);
Scrollbar createScrollbar();
...
}

Then, different UIFactory objects can encapsulate different choices of look and feel for the user
interface. Swing doesn't take quite this approach, but the look-and-feel choices that make Swing
UIs look different on Windows versus Mac OS X are in fact made by binding each Swing JComponent to a contained UI delegate object (of type ComponentUI). The delegate controls the look and feel of the JComponent, and it is chosen dynamically based on the OS platform being used.

Singleton pattern
Sometimes classes don't need to have more than one instance. A class with just one instance is an
example of the Singleton pattern. For example, if we wanted a class that represented empty
linked lists, we might only allocate a single object of that class, since all empty lists are inter-
changeable anyway. We can store it into a static field of the class to expose it to clients, and hide
the constructor since it shouldn't be used outside the class itself:

class EmptyList implements List {
    public static final EmptyList empty = new EmptyList();
    private EmptyList() {}
}

The Singleton pattern is also frequently used with the Abstract Factory pattern. There is no need
to have more than one object of the class implementing, say, DataStructs or UIFactory in the ex-
amples above.

Composite
The Composite pattern is a pattern that we've already been using: it refers to using a data struc-
ture of objects to provide what appears to the client to be a single object. This idea is simply the
combination of data structures with data abstraction. Even common objects like strings are Com-
posite objects in Java.

Flyweight
The idea of this design pattern is to have objects that take up very little memory. This is done by
having their representations be small and also by placing as much state as possible in underlying
objects that are shared across many instances.

Interning
A related idea to flyweight objects is interning (known in Lisp as hashconsing). A hash table is
used to keep track of all objects of a given class. Object creation is done by a factory method.
When a new object is requested to be created, the factory method uses the parameters to the calls
to look up whether a suitable object has already been created. If so, this object is returned. Other-
wise, a new object is created using the constructor, which is typically made private so the only
way to create objects is to go through the factory method. This pattern makes the most sense for
immutable abstractions, because it may cause the same objects to be shared across unrelated
code or data structures.
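
As a sketch of interning, suppose we want to intern immutable 2-D points. The Point class, its of() factory method, and the string key are all illustrative choices, not part of the notes:

```java
import java.util.HashMap;
import java.util.Map;

final class Point {
    // Table of all Point objects created so far, keyed by their parameters.
    private static final Map<String, Point> cache = new HashMap<>();
    final int x, y;

    // Private constructor: the only way to get a Point is the factory method.
    private Point(int x, int y) { this.x = x; this.y = y; }

    /** Factory method: reuses a cached instance when one exists. */
    static Point of(int x, int y) {
        // The key encodes the constructor parameters.
        return cache.computeIfAbsent(x + "," + y, k -> new Point(x, y));
    }
}
```

Because interned objects are shared, clients can compare them with == rather than equals(), which is one common motivation for interning immutable abstractions.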

Adapter
The Adapter pattern allows an existing object to satisfy an interface it was not originally de-
signed to satisfy, hiding the actual interface provided by the existing object. This is accomplished
by using a wrapper object that implements the interface and that simply redirects calls of the new
interface to the appropriate calls on the underlying wrapped object.
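
A minimal sketch of the pattern, in which Printer, LegacyLogger, and LoggerAdapter are all hypothetical names invented for illustration:

```java
/** Hypothetical target interface that the client code expects. */
interface Printer {
    void print(String message);
}

/** Hypothetical existing class with an incompatible interface. */
class LegacyLogger {
    private final StringBuilder log = new StringBuilder();
    void writeLine(String s) { log.append(s).append('\n'); }
    String contents() { return log.toString(); }
}

/** Adapter: satisfies Printer by redirecting calls to the wrapped object. */
class LoggerAdapter implements Printer {
    private final LegacyLogger wrapped;
    LoggerAdapter(LegacyLogger wrapped) { this.wrapped = wrapped; }
    public void print(String message) { wrapped.writeLine(message); }
}
```

The client sees only the Printer interface; the LegacyLogger interface stays hidden behind the wrapper.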

Decorator
The Decorator pattern is similar to the Adapter pattern. Here the idea is to extend the interface of
some existing objects of a class. Unlike in the Adapter pattern, the Decorator interface is a sub-
type of the interface that the objects already implement; its implementation is a wrapper class
that redirects all calls from the original interface to the wrapped object.
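
A minimal sketch with hypothetical Message and TimestampedMessage interfaces: the decorator implements a subtype of the interface the wrapped object already satisfies, forwarding the old operations and adding a new one:

```java
/** Hypothetical original interface. */
interface Message {
    String text();
}

class Plain implements Message {
    public String text() { return "hello"; }
}

/** Extended interface: a subtype of what the objects already implement. */
interface TimestampedMessage extends Message {
    long timestamp();
}

/** Decorator: wraps a Message, forwarding the old interface and adding
 *  the new operation. */
class Timestamped implements TimestampedMessage {
    private final Message wrapped;
    private final long ts;
    Timestamped(Message wrapped, long ts) { this.wrapped = wrapped; this.ts = ts; }
    public String text() { return wrapped.text(); }   // forwarded
    public long timestamp() { return ts; }            // added
}
```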

External state
Sometimes it is undesirable to record some of the state associated with an object in the object it-
self, perhaps because the class cannot be extended with new instance variables, because only
some of the objects of the class have that extra state, or because that state is involved in an invar-
iant maintained by another module. A second class is defined to contain that external state, and
objects of the second class are created as necessary. To allow quickly finding the external state
for an object, the external state objects are put into a hash table, using the original object itself as
a key to find the state.
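
One possible sketch of this pattern in Java (the class names are invented for illustration). Using a WeakHashMap rather than a plain HashMap is a design choice that lets an entry be garbage-collected once the original object is unreachable:

```java
import java.util.Map;
import java.util.WeakHashMap;

/** Hypothetical external state: extra info for objects we cannot modify. */
class NodeDisplayState {
    boolean highlighted;
}

/** Registry mapping each original object to its external state. */
class DisplayStates {
    private static final Map<Object, NodeDisplayState> states = new WeakHashMap<>();

    /** Find (or lazily create) the external state for obj. */
    static NodeDisplayState of(Object obj) {
        return states.computeIfAbsent(obj, k -> new NodeDisplayState());
    }
}
```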

State machine
Programming in an event-driven style can result in messy designs in which not all events are
handled. One way to ensure all events are handled is to think about the program, or about parts of
the program, as state machines. Often state machines are presented as mathematical abstrac-
tions, but they are also a way to organize code: a design pattern called the state design pattern.

A state machine has a set of states and a set of events that can occur. At any given moment, the
machine is in one of the allowed states. However, when it receives a new event, it can change
states. For each (state, event) pair, there is a new state to which the machine transitions when that
event is received in that state.

A state machine can be represented as a graph in which the nodes are the states and the edges are
the transitions between states. The edges are labeled with the event that causes that transition.

As a simple example, consider a window in a graphical user interface. Simplifying a bit, it can be
in the following states: opened, closed, minimized, or maximized. (One reason that these states
are a simplification is that the window also has a size and position.) The following events can be
received: open, close, minimize, and maximize, corresponding to buttons that can be clicked. As
a graph, the window implements the following state machine:
A diagram like this helps us understand what states the system can get into and how the system
moves among states. It doesn't help as much with ensuring that all combinations of states and
events are considered. To ensure no combinations are missed, we can construct a state-transi-
tion table.

When the number of states in a state machine is finite, the machine is called a finite
state machine, or finite-state automaton. In general, a state machine can have an infinite num-
ber of states, or a very large number of states. The rows in the table correspond to states, and the
columns correspond to events. The entries in the table say what the next state is, given the current
state and event.

State          open   close   minimize   maximize
1. Opened       —       2        3          4
2. Closed       —       —        —          —
3. Minimized    1       2        —          —
4. Maximized    —       2        3?         1

The table helps us think systematically about all the possible things that can happen in the sys-
tem, and make sure we have covered all the possibilities. Thinking about the various entries helps
us find not only missing event handlers but also missing states. For example, when minimizing a
maximized window, the state machine above forgets that the window was maximized. When the
window is reopened, it will no longer be maximized. If that is not the desired behavior, we'll
need to add a fifth state to the state machine, keeping track of windows that are minimized from
a maximized state.

Even the entries marked with —, which represent events that don't make sense in the current
state, are interesting to think about because we need to make sure that the user interface doesn't
permit those events to happen—perhaps by graying out the corresponding UI component.

In the state design pattern, the various events that can be received by the state machine are repre-
sented as different methods on the state machine object. The key is to centralize the code that im-
plements a state machine. It is also possible to implement a state machine as a big switch state-
ment in which the event type is used to dispatch to the appropriate handling code, but this style
of implementation is less object-oriented.
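
For the window example above, here is one way to sketch the state design pattern with a Java enum: each event is a method, each state overrides the methods for the transitions the table allows, and the defaults reject the events marked “—”. The method names and the choice of exception are illustrative:

```java
enum WindowState {
    OPENED {
        WindowState close()    { return CLOSED; }
        WindowState minimize() { return MINIMIZED; }
        WindowState maximize() { return MAXIMIZED; }
    },
    CLOSED,          // no events are allowed in the Closed state
    MINIMIZED {
        WindowState open()  { return OPENED; }
        WindowState close() { return CLOSED; }
    },
    MAXIMIZED {
        WindowState close()    { return CLOSED; }
        WindowState minimize() { return MINIMIZED; }
        WindowState maximize() { return OPENED; }
    };

    // Defaults correspond to the "—" entries: the event is not allowed.
    WindowState open()     { throw new IllegalStateException("open: " + this); }
    WindowState close()    { throw new IllegalStateException("close: " + this); }
    WindowState minimize() { throw new IllegalStateException("minimize: " + this); }
    WindowState maximize() { throw new IllegalStateException("maximize: " + this); }
}
```

Every (state, event) pair from the table has a definite outcome, so no combination is accidentally left unhandled.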

Model-View-Controller
Since the UI components are used to manipulate the information managed by the application, it
is tempting to store that information and the algorithms that manipulate it (the application logic)
directly in components, perhaps by using inheritance. This is usually a bad idea. The code for
graphical presentation of information is different from the underlying operations on that infor-
mation. Interleaving them makes the code harder to understand and maintain. Further, it makes it
difficult to port the application to a new platform. For example, you might implement the appli-
cation in Swing and then want to port it to Android, whose UI toolkit is very different.

This observation leads to the Model-View-Controller pattern, in which the application classes are
separated into one of three categories: the model, which contains the important application state
and operations, and does not refer to the graphical UI classes; the view, which provides a graph-
ical view of the model; and the controller, which handles user input and translates it into either
changes to the view or commands to be performed on the model.

The idea is that the view may hold some state, but only state related to how the model is current-
ly being displayed, or what part of the model is displayed. If the view were destroyed, some ver-
sion of it could be created anew from the model. With this kind of structure, there can be more
than one user interface built on top of the same model. In fact, multiple views can even coexist.

One of the challenges of the MVC pattern is how to allow the view to update when the model
changes, without making the model depend on the view. This task is usually accomplished by us-
ing the Observer pattern. The model allows observers to be registered on its state; the view is
then notified when the state changes.
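
A minimal sketch of this use of the Observer pattern, with an invented Counter model: the view registers a callback, so the model never mentions any view class:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntConsumer;

/** Sketch of a model that notifies registered observers when it changes.
 *  The Counter class and its API are illustrative, not from the notes. */
class Counter {
    private int value;
    private final List<IntConsumer> observers = new ArrayList<>();

    /** A view registers a callback; the model stays ignorant of views. */
    void addObserver(IntConsumer obs) { observers.add(obs); }

    void increment() {
        value++;
        // Notify all observers of the new state.
        for (IntConsumer obs : observers) obs.accept(value);
    }

    int value() { return value; }
}
```
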
This separation between model, view, and controller will be very important for Assignment 7,
where you will build a distributed version of the critter simulation. The model will run on a
shared server, with one or more clients viewing that model through a Swing user interface.

There are many variations of the MVC pattern. Some versions of the MVC pattern make less of a
distinction between the view and the controller; this is usually indicated by talking about the
M/VC pattern, in which the view and the controller are more tightly coupled, but strong separa-
tion is maintained between the model and these two parts of the design.

Visitor
The Visitor pattern allows the traversal of a tree data structure (such as an abstract syntax tree) to
be factored out from the nodes of the tree, in a generic way that can be reused for multiple tra-
versals. There are many variations on the visitor pattern.

Antipatterns
Coding patterns that are used frequently but that people think should be avoided are often
dubbed “antipatterns”. For example, some Java programmers make heavy use of reflection in Ja-
va. Using reflection is generally bad practice, leading to slow, fragile code. A good reason to use
reflection is if you are loading code dynamically at run time (for example, for plugins or for dy-
namic code generation). Most applications do not need this capability, so we do not talk about re-
flection in this course. A good and rather humorous list of antipatterns can be found on Wikipe-
dia.
Graphical User Interfaces: Display and Layout
One of the driving forces behind the development of object-oriented programming was the need
to create graphical user interfaces (GUIs). It's not surprising, then, that the OO programming
model really shines when building GUIs.

In fact, the standard WIMP model of graphical user interfaces (Windows, Icons, Menus, Pointer)
was developed at Xerox PARC during the 1970's in and along with one of the first object-orient-
ed programming languages, Smalltalk. GUIs influenced the language design and vice versa. The
WIMP model first saw widespread deployment by Apple with the first Macintosh computer and
was soon adopted by Microsoft Windows as well.

In this approach to graphical user interface design, the user interface is represented internally as a
tree structure, in which the various user interface components (sometimes called widgets) are
nodes in the tree, and the parent–child relationship corresponds to containment. In fact, the
HTML markup language follows this same approach to defining a user interface.

We will be using JavaFX, a modern object-oriented GUI library. In such a library, there are two
interacting hierarchies: first, the containment hierarchy corresponding to the tree of UI compo-
nents, and second, the class hierarchy of the components themselves. Unlike in HTML, the class
hierarchy is extensible, allowing developers to use inheritance to design new components or to
customize the behavior of existing components.

Scene Graph
JavaFX manages a user interface as a scene graph, which is actually a tree of nodes. At the root of
the scene graph is some node; the node is registered with a Scene object that is in turn registered
with a Stage, which corresponds to a top-level window in the application.

Example
(These notes still in progress)

Node class hierarchy


Node
    Shape
        Text
        Line
        Rectangle
        Circle
        Polygon
    Parent: node that contains other nodes
        Group
        Region: parent that covers a specified area
            Control: some useful predefined GUI widgets
                Accordion
                ChoiceBox
                Separator
                Labeled
                    ButtonBase
                        Button
                        CheckBox
                        Hyperlink
                TextInputControl
                    TextArea
                    TextField
            Pane
                HBox
                VBox
                StackPane
                GridPane
                FlowPane
                BorderPane
            Chart
            Axis
    Canvas
    ImageView
    SwingNode

Building a UI
A scene graph must be constructed. This can be done either by writing code to build the scene
graph one node at a time, or by reading it from a file. All of the non-leaf nodes in the tree will be
of class Parent or some subclass (typically a subclass of Pane).

To add a child using code:

Parent p = ...;
Node n = ...;
p.getChildren().add(n);
The getChildren() method returns a collection of children that is tied to the actual children of the
parent node. Thus, adding a new node to the collection causes the parent node to acquire a new
child. The collection can can be used in various other ways, however; for example, it can be used
to iterate over the children or to listen to its contents to find out when the set of children changes.

The JavaFX Scene Builder can also be used to create a scene graph that can then be saved into a
.fxml file to be loaded later.

Layout
Some JavaFX nodes just display themselves (e.g., Rectangle). Some nodes do something active
(e.g., Button, TextField). Other nodes are just there to control the layout of other nodes in the
window. Examples of these are the various subclasses of Pane. Pane places all its children in the
upper left corner. HBox and VBox lay out their children in a horizontal row or vertical column,
respectively. StackPane stacks all its children on top of each other, centered. GridPane, Flow-
Pane, and BorderPane lay out children in fancier ways.

Appearance
1. Can use existing nodes (controls, shape nodes)
• nodes draw and redraw themselves
• can add, remove, move and transform nodes
• can style nodes using CSS either with code (e.g., p.setStyle("-fx-background-color:
#ffff00");) or by loading style sheets from external files.

2. Can draw onto a Canvas using a GraphicsContext object


• complete control over rendering
• but complete responsibility for rendering (except that exposed pixels are redrawn
automatically, unlike in Swing)

(These notes still in progress)

Event-driven programs
A different programming paradigm: “Don't call us, we'll call you.” JavaFX library invokes your
application code with things to do.

Threads
JavaFX program has multiple threads, unlike simple Java programs: Main thread, event dispatch-
ing thread, rendering thread(s), background worker threads. Event loop in library waits for input,
calls your application code with events to handle. Can do work in background by creating worker
threads.

Long-running computation should not happen in the event dispatching thread, because this will
freeze the UI by preventing events from being handled. To prevent interference between threads
accessing the node graph, however, all access to the node graph has to happen in the event dis-
patching thread. Fortunately, you can cause changes to the UI from a thread other than the event
dispatching thread by using the method Platform.runLater(). It causes some code to run (later,
but usually very soon) in the event dispatching thread.

Events
Program defines event handlers to handle different input events. These can be defined using clas-
ses, inner classes, or lambda expressions. All are basically syntactic sugar for the same thing, but
convenient. Example with lambda expression:

Button b = ...;
b.setOnAction(e -> System.out.println("button clicked"));

This is essentially the Observer pattern, but for many events, there can be only one handler at-
tached, rather than allowing multiple handlers to be registered.
User interface design
High-level goals
A good user interface enables users to get their job done efficiently, easily, and enjoyably.

Not about:

Making the programmer's job easy by exposing underlying functionality as directly as
possible.
Providing the largest feature set possible.
Giving users what they say they want (your users are not good UI designers). (But listen
carefully to your users anyway!)
“If I had asked people what they wanted, they would have said faster horses.” — Henry
Ford.

A few principles will help guide your designs.

Know your users. Design to them.

Frequent or occasional user?
Novice or knowledgeable?
Training expected?
You are not like the user: user testing is required.

Infrequent/novice users: gentle learning curve, discoverability
Discoverability: can learn features, interaction
Protection from dangerous actions, no loaded guns (video)
Clarity: simple, clear displays
Consistency with other applications
Use metaphors to communicate (e.g., icons)

Intermittent, knowledgeable users: focus on discoverable features, reduced
memory burden
Internally consistent appearance and actions
Clear visual structure
Protection from dangerous actions for safe exploration

Frequent/power users: optimize for efficient interaction


Powerful actions, short interaction sequences
Rapid response times
Rich controls, shortcuts for common actions
Exploit muscle memory
Information-rich displays
Customization and macros

UI as dialogue
App needs to be good conversation partner
Ratify actions quickly
Be responsive (e.g., highlighting)
Show progress on longer actions

Work out the conversations


Use a set of use cases to figure out what users will have to do.
Eliminate unnecessary user actions.
Aim for short interactions with clear progress: intermediate goal satisfaction (cf.
DisneyWorld ride lines)
Use testing to find your blind spots (as developer).
May need to write testing scripts for human testers. Key (as usual): coverage.

Choosing the right interaction paradigm: direct manipulation vs. I/O vs. ?
Goal: app feels like extension of user
Example: driving a car vs. programming/remote-controlling a car, or Word vs. TeX.
UI abstracts underlying application state (the model)
Abstraction does not need to match the implementation
Good implementation strategy: Model != View != Controller
Good abstraction: Model = View = Controller
Less direct styles:
Menu selection/hierarchical navigation
Form fill-in + submit
Command languages

GUI helps by restricting the vocabulary


Channels “utterances” (user actions) into meaningful directions
Reduces memorization

Use modes well


Modes: states of UI that restrict interactions.
Modes help with restricted context-sensitive vocabulary
Avoid overusing modes and trapping users in modes.

Interaction time scales


<1/60s: biologically imperceptible: faster than neurons
<1/30s: fast enough for continuous-feedback tasks (e.g., mouse tracking)
<1/10s: imperceptible delay for discrete actions, e.g. button clicks.
<1/2s: fast but noticeable (ok for command-response interaction)
1/2s-5s: increasingly annoying but user stays focused
5s-10s: User starts to lose attention.
10s–1 min: User becomes distracted and productivity declines. App needs to support
parallel activities.
>1 min: Significant loss of productivity. User leaves for coffee.

Learning and memory


Goal: interface that users can learn to use sufficiently quickly and can remember how to
use.
Obvious controls
Easy to find and identify
Don't set up user for a fall: disable invalid/dangerous actions

Avoid overload
Human can only hold 7 things in their head at once
Avoid long menus, lots of buttons
Design visuals carefully for rapid comprehension

Context-sensitive help
Task-focused rather than feature-focused (unlike most modern apps!)

Exploit spatial memory


Tap into humans' ability to remember things spatially: e.g., the memory palace.
Each window or dialogue is a place you can go—make it a place user wants to be
in.
Avoid unnecessary places to learn about and navigate between.
Make navigation simple, clear, easy.
Big-picture overviews help users stay oriented
bird's eye views
menus with highlights

Exploit motor memory (esp. for power users)


Frequent users remember UI in their muscles.
Make actions to get an effect consistent so motor memory can carry users through
without thinking.

Visual design
Avoid visual clutter
Use space, shading, and color rather than lines to organize information.
Use low-contrast separators
Maximize “information/ink ratio” (Tufte)

Aim for visual consistency


Large apps need a written style guide to keep look and feel consistent as app
evolves.
E.g., buttons that change state vs. buttons that navigate vs. buttons that expose
hidden information: user should be able to distinguish.

Employ visual features as an additional communication channel.


Shape (up to 15)
Color (up to 24)
Size, length, thickness: up to 6.
Orientation: up to 24
Texture
Differing perceptual capabilities of some users (esp. for color) ⇒ cannot rely solely
on visual features to communicate, but they can complement the information that is
there in another form.

Useful reading
Envisioning Information, Edward Tufte.
Computer Graphics: Principles and Practice, by Foley and van Dam. Ch. 8.2–8.3, 9
About Face 2.0, Cooper and Reimann.
Concurrency
Concurrency is the simultaneous execution of multiple threads of execution. These threads may
execute within the same program, which is called multithreading. Modern computers also allow
different processes to run separate programs at the same time; this is called multiprogramming.
We've already seen in the context of user interfaces that, like most modern programming lan-
guages, Java supports concurrent threads; Java is a multithreaded programming language.

Concurrency is important for two major reasons. First, it helps us build good user interfaces, be-
cause one thread can be taking care of the UI while other threads are doing the real computation-
al work of the application. JavaFX has a separate Application thread that starts when the first
JavaFX window is opened, and handles delivery of events to JavaFX nodes. JavaFX applications
can also start other threads that run in the background and get work done for the application even
while the user is using it. Second, modern computers have multiple processors that can execute
threads in parallel, so concurrency lets you take full advantage of your computer's processing
power by giving all the available processors work to do.

Programming concurrent applications is challenging because different threads can interfere with
each other, and it is hard to reason about all the ways that this can happen. Some additional tech-
niques and design patterns help.

Concurrency vs. parallelism


Modern computers usually have multiple processors that can be simultaneously computing dif-
ferent things. The individual processors are called cores when they are located together on the
same chip, as they are in most modern multicore machines. Multiprocessor systems have exist-
ed for a long time, but prior to multicore systems, the processors were located on different chips.

Concurrency is different from, but related to, parallelism. Parallelism is when different hardware
units (e.g., cores) are doing work at the same time. Other forms of parallelism exist: graphics
processing units (GPUs), network, and disk hardware all do work in parallel with the main pro-
cessor(s). Modern processors even use parallelism when they are executing a single thread, be-
cause they use pipelining and other techniques to execute multiple machine instructions from a
single thread at the same time.

Thus, concurrency can be present even when there is no parallelism, and parallelism can be pre-
sent without concurrency. However, parallelism makes concurrency more effective, because con-
current threads can execute in parallel on different cores.

Concurrent threads can also execute on a single core. To implement the abstraction that the
threads are all running at the same time, the core rapidly switches among the threads, making a
little progress on executing each thread before moving on to the next thread. This is called con-
text switching. One problem with context switching is that it takes a little time to set up the
hardware state for a new context.

The JVM and your operating system automatically allocate threads to cores so you usually don't
need to worry about how many cores you have. However, creating a very large number of
threads is usually not very efficient because it forces cores to context-switch a lot.
Programming with threads in Java
In Java, the key class for concurrency is java.lang.Thread. It starts a new thread to do some com-
putation. The most important part of the interface is as follows:

class Thread {
    /** Start a new thread that executes this.run() */
    public void start();

    /** Effects: do something. Overridable; the default does nothing. */
    public void run();

    /** Allow other threads to do work. However, other threads may preempt
     *  the current thread even if this is not called. */
    public static void yield();

    /** Set whether this thread is a daemon thread. */
    public void setDaemon(boolean b);
}

Thread objects have other methods, such as stop(), but they should probably be avoided. There
are better ways to accomplish what they do.

To start a new thread, we create a subclass of Thread whose run() method is overridden to do
something useful. Whatever it does will be run concurrently with other threads in the program.

For example, consider a program where we want to start a long-running computation when the
user clicks a button. We don't want to do this computation inside the Application thread because
this will stop the user interface while the computation completes. Therefore, we can start a new
thread when the button is clicked. This can be done very conveniently using a lambda expression
and an anonymous inner class:

Button b = ...;
b.setOnAction(e -> {
    Thread t = new Thread() {
        public void run() {
            // do lots of work here!
        }
    };
    t.start();
});

In Java, threads can preempt each other by starting to run even when yield() is not called. With
preemptive concurrency, a thread that has run long enough might be suspended automatically to
allow other threads to run. It is nearly impossible for the programmer to predict when preemption
will occur, so careful programming is needed to ensure the program works no matter when
threads are preempted.

Another useful method of Thread is setDaemon. A Java program will not stop running until all non-
daemon threads have stopped. If a thread should automatically stop when the program is done, it
should be marked as a daemon thread.
Race conditions
We have to be careful about having threads share objects, because threads can interfere with each
other. If two threads access the same object but only read information from it, it is not a problem.
Read-only sharing is safe. But if one or both of the threads is updating the object state, we need
to make sure that the order in which the updates happen is fixed. Otherwise we have a race con-
dition. Both read–write and write–write races are a problem.

For example, consider the following bank account simulation:

class Account {
    int balance;
    void withdraw(int n) {
        int b = balance - n; // R1
        balance = b;         // W1
    }
    void deposit(int n) {
        int b = balance + n; // R2
        balance = b;         // W2
    }
}

If two threads T1 and T2 are respectively concurrently executing withdraw(50) and deposit(50),
what can happen? Clearly the final balance ought to be 100. But the actions of the different
threads can be interleaved in many different ways. Under some of those interleavings, such as
(R1, W1, R2, W2) or (R2, W2, R1, W1), the final balance is indeed 100. But other interleavings
are more problematic: (R1, R2, W2, W1) destroys 50 dollars, and (R2, R1, W1, W2) creates 50
dollars. The problem is the races between R1 and W2 and between R2 and W1.

We can fix this code by controlling which interleavings are possible. In particular, we want only
interleavings in which the methods withdraw() and deposit() execute atomically, meaning that
their execution can be thought of as an indivisible unit that cannot be interrupted by another
thread. This does not mean that when one thread executes, say, withdraw(), all other threads are
suspended. However, it does mean that as far as the programmer is concerned, the system acts as
if this were true.

Critical sections and atomicity


We have been seeing that sharing mutable objects between different threads is tricky. We need
some kind of synchronization between the different threads to prevent them from interfering
with each other in undesirable ways. For example, we saw that the following two methods on a
BankAccount object got us into trouble:

void withdraw(int n) {
    balance -= n;
}

void deposit(int n) {
    balance += n;
}

There is a problem here even though the updates to balance are done all in one statement rather
than in two as in the previous lecture. Execution of one thread may pause in the middle of that
statement, so it doesn't help to write it as one statement. Two threads that are simultaneously exe-
cuting withdraw and deposit, or even two threads both simultaneously executing withdraw, may
cause the balance to be updated in a way that doesn't make sense.

This example shows that sometimes a piece of code needs to be executed as though nothing else
in the system is making updates. Such code segments are called critical sections. They need to
be executed atomically and in isolation : that is, without interruption from or interaction with
other threads.

However, we don't want to stop all threads just because one thread has entered a critical section.
So we need a mechanism that only stops the interactions of other threads with this one. This is
usually achieved by using locks. (Recently, software- and hardware-based transaction mecha-
nisms have become a popular research topic, but locks remain for now the standard way to iso-
late threads.)

Mutexes and synchronized


Mutexes are mutual exclusion locks. There are two main operations on mutexes: acquire() and
release(). The acquire() operation tries to acquire the mutex for the current thread. At most one
thread can hold a mutex at a time. While a lock is being held by a thread, all other threads that
try to acquire the lock will be blocked until the lock is released, at which point just one waiting
thread will manage to acquire it.

Java supports mutexes directly. Every object has a mutex implicitly associated with it. There is
no way to directly invoke the acquire() and release() operations on an object o; instead, we use
the synchronized statement to acquire the object's mutex, to perform some action, and to release
the mutex:

synchronized (o) {
// ...perform some action while holding o's mutex...
}

The synchronized statement is useful because it makes sure that the mutex is released no matter
how the statement finishes executing, even if it exits through an exception. You can't call the
underlying acquire() and release() operations explicitly, but if you could, the above code using
synchronized would be equivalent to this:

try {
o.acquire();
// ...perform some action while holding o's mutex...
} finally {
o.release();
}

Mutexes take up space, but a mutex is created for an object only when the object is first used for
a synchronized statement, so normally they don't add much overhead.

Mutex syntactic sugar


Using mutexes we can protect the withdraw() and deposit() methods from themselves and from
each other, using the receiver object's mutex:
void withdraw(int n) {
    synchronized (this) {
        balance -= n;
    }
}

void deposit(int n) {
    synchronized (this) {
        balance += n;
    }
}

Because the pattern of wrapping entire method bodies in synchronized(this) is so common, Java
has syntactic sugar for it. Declaring a method to be synchronized has the same effect:

synchronized void withdraw(int n) {
    balance -= n;
}

synchronized void deposit(int n) {
    balance += n;
}

Mutex variations
Java mutexes are reentrant mutexes, meaning that it is harmless for a single thread to acquire the
same mutex more than once. One consequence is that one synchronized method can call another
on the same object without getting stuck trying to acquire the same mutex. Each mutex keeps
track of the number of times the thread has acquired the mutex, and the mutex is only really re-
leased once it has been released by the holding thread the same number of times.
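
A small illustration of reentrancy (the Account2 class is invented for this example): transferIn() already holds the receiver's mutex when it calls deposit(), which acquires the same mutex again without deadlocking:

```java
/** Sketch showing that Java mutexes are reentrant: a synchronized method
 *  may call another synchronized method on the same object. */
class Account2 {
    private int balance = 100;

    synchronized void deposit(int n) { balance += n; }

    /** Holds this object's mutex, then reacquires it inside deposit(). */
    synchronized void transferIn(int n) {
        deposit(n);   // fine: the current thread already holds the mutex
    }

    synchronized int balance() { return balance; }
}
```

With a non-reentrant mutex, the inner call to deposit() would block forever waiting for a lock its own thread holds.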

A locking mechanism closely related to the mutex is the semaphore, named after railway sema-
phores. A binary semaphore acts just like a (non-reentrant) mutex, except that a thread is not re-
quired to hold the semaphore in order to release it. In general, semaphores can be acquired by up
to some fixed number of threads, and additional threads trying to acquire it block until some re-
leases happen. Semaphores are the original locking abstraction, and they make possible some ad-
ditional concurrent algorithms. But semaphores are harder than mutexes to use successfully.
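
Java provides counting semaphores directly as java.util.concurrent.Semaphore. A small sketch (the Pool class is invented for illustration) of a semaphore with two permits:

```java
import java.util.concurrent.Semaphore;

/** Sketch: a semaphore with 2 permits lets at most two threads use a
 *  resource at once. */
class Pool {
    private final Semaphore permits = new Semaphore(2);

    /** Non-blocking acquire: succeeds only if a permit is free. */
    boolean tryUse() {
        return permits.tryAcquire();
    }

    /** Note: unlike a mutex, any thread may release a permit. */
    void release() {
        permits.release();
    }
}
```
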

When is synchronization needed?


Synchronization is not free. It involves manipulating data structures, and on a machine with mul-
tiple processors (or cores), requires communication between the processors. When one is trying
to make code run fast, it is tempting to cheat on synchronization. Usually this leads to disaster.

Synchronization is needed whenever we need to rely on invariants on the state of objects, either
between different fields of one or more objects, or between contents of the same field at different
times. Without synchronization there is no guarantee that some other thread won't be simultane-
ously modifying the fields in question, leading to an inconsistent view of their contents.

Synchronization is also needed when we need to make sure that one thread sees the updates
caused by another thread. It is possible for one thread to update an instance variable and another
thread to later read the same instance variable and see the value it had before the update. This in-
consistency arises because different threads may run on different processors. For speed, each
processor has its own local copy of memory, but updates to local memory need not propagate im-
mediately to other processors. For example, consider two threads executing the following code in
parallel:

Thread 1:
    y = 1;
    x = 1;

Thread 2:
    while (x == 0) {}
    print(y);

What possible values of y might be printed by thread 2, assuming x and y are both initially 0?
Naively it looks like the only possible value is 1. But without synchronization between these two
threads, the update to x can be seen by thread 2 without the update to y being seen. The fact that
the assignment to y comes before the assignment to x in thread 1 does not matter!

The reliable way to ensure that updates done by one thread are seen by another is to explicitly
synchronize the two threads. Synchronization is needed for all accesses to mutable state that is
shared between threads. The mutable state might be entire objects, or, for finer-grained synchro-
nization, just mutable fields of objects. Each piece of mutable state should be protected by a
lock. When the lock protecting a shared mutable field is not being held by the current thread, the
programmer must assume that its value can change at any time. Any invariant that involves the
value of such a field cannot be relied upon.
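As a minimal sketch of this discipline (an invented example), the counter below protects its one mutable field with the object's lock; because every read and write goes through a synchronized method, the increments of two threads are never lost.

```java
/** Sketch: guarding shared mutable state with a single lock. */
public class SharedCounter {
    private int count = 0;

    public synchronized void increment() { count++; }
    public synchronized int get() { return count; }

    public static void main(String[] args) throws InterruptedException {
        SharedCounter c = new SharedCounter();
        Runnable work = () -> { for (int i = 0; i < 100_000; i++) c.increment(); };
        Thread t1 = new Thread(work), t2 = new Thread(work);
        t1.start(); t2.start();
        t1.join(); t2.join();
        // With synchronization, no increments are lost; without it,
        // two threads could read the same value of count and overwrite
        // each other's updates.
        System.out.println(c.get());
    }
}
```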

Note that immutable state shared between threads doesn't need to be locked, because no thread will
try to update it. This fact encourages a style of programming that avoids mutable state.
Synchronization
Synchronization and happens-before
As we've seen, a write to memory by one thread (i.e., an assignment to an instance variable) does
not necessarily affect a later read from the same location by another thread. Java, like most pro-
gramming languages, offers only a weak consistency model in which only some writes are guar-
anteed to be seen by later reads. When the consistency model considers two operations to be
causally related in this way, we say that one operation has a happens-before relationship to the
other. A write that happens-before a read is guaranteed to be seen by the read (though it is also
possible that the read will return the result of an even later write). Conversely, a read that hap-
pens-before a write is guaranteed not to see the written value.

Release consistency
Different weak consistency models make different guarantees about happens-before relation-
ships. However, a useful least common denominator is release consistency. It guarantees that
any operation to a location that occurs before a release of a mutex by one thread will happen-be-
fore any operation following a later acquire of that mutex. Thus, to make sure that updates done
to shared state by one thread are seen by other threads, we simply have to guard all accesses to
the shared state using the same lock.

Barriers
In scientific computing applications, barriers are another popular way to ensure that updates by
one thread or set of threads are seen by computation in other threads. A barrier is created with a
specified number of threads that must reach the barrier. Each thread that reaches the barrier will
block until the specified number of threads have all reached it, at which point all the threads un-
block and are able to go forward. All operations in all threads that occur before the barrier is
reached are guaranteed to happen-before all operations that occur after the barrier. Barriers make
it easy to divide up a parallel computation into a series of communicating stages.

The Java libraries provide a barrier abstraction, the class java.util.concurrent.CyclicBarrier. An


instance of this class is created with the number of threads that are expected to reach the barrier.
Each thread reaches the barrier by calling barrier.await(). This causes the thread to block until
the required number of threads has reached the barrier, at which point all the threads unblock. The
barrier then resets and can be used again.

Barriers also help ensure a consistent view of memory. Once a thread that has reached a barrier
unblocks, it is guaranteed to see all the memory updates that other threads performed before the
barrier. The barrier style of computation allows a set of threads to divide up work and make pro-
gress on it, then exchange information via a barrier.
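The sketch below (an invented example) shows this exchange with a CyclicBarrier: each of two threads fills its half of a shared array, and once await() returns, each thread is guaranteed to see the other's writes.

```java
import java.util.concurrent.CyclicBarrier;

/** Sketch: two threads divide up work, then meet at a barrier. */
public class BarrierDemo {
    public static void main(String[] args) throws Exception {
        int[] data = new int[4];
        CyclicBarrier barrier = new CyclicBarrier(2);   // two threads must arrive

        Thread t = new Thread(() -> {
            data[0] = 1; data[1] = 2;                   // fill first half
            try { barrier.await(); }
            catch (Exception e) { throw new RuntimeException(e); }
        });
        t.start();

        data[2] = 3; data[3] = 4;                       // fill second half
        barrier.await();    // after this, the other thread's writes are visible
        t.join();
        System.out.println(data[0] + data[1] + data[2] + data[3]);
    }
}
```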

Monitors
The monitor pattern is another way to manage synchronization. It builds synchronization into
objects: a monitor is an object with a built-in mutex on which all of the monitor's methods are
synchronized. This design is accomplished in Java easily, because every object has a mutex, and
the synchronized keyword enforces the monitor pattern. Java objects are designed to be used as
monitors. A monitor can also have some number of condition variables, which we'll return to
shortly.

The only objects that should be shared between threads are therefore immutable objects and ob-
jects protected by locks. Objects protected by locks include both monitors and objects encapsu-
lated inside monitors, since objects encapsulated inside monitors are protected by their locks.

Deadlock
Monitors ensure consistency of data. But the locking they engage in can cause deadlock, a con-
dition in which every thread is waiting to acquire a lock that some other thread is holding. For
example, consider two monitors a and b, where a.f() calls b.g() and vice-versa:

class A {
synchronized void f() { b.g(); }
}

class B {
synchronized void g() { a.f(); }
}

Thread 1:
    a.f();

Thread 2:
    b.g();

If two threads try to call a.f() and b.g() respectively, the threads can acquire the locks on a and b
respectively and then deadlock, each trying to acquire the lock the other is holding. We can represent
this situation using a waits-for diagram, with an edge from each thread to the lock it is waiting for
and from each lock to the thread holding it. In such a diagram, deadlocks show up as cycles in the graph.

To avoid creating cycles in the graph, the usual approach is to define an ordering on locks, and
acquire locks in an order consistent with that ordering. For example, we might decide that a < b
in the lock ordering. Therefore, b cannot call a method of a because a method of b already holds
a lock that is higher in the ordering.

In general, a thread may acquire a lock on (synchronize on) an object only if that object comes
later in the lock ordering than all locks that the thread already holds.

The requirement that some locks not be held becomes a precondition of methods, which need to
specify which locks may be held when the method is called. To abstract this requirement, the no-
tion of locking level can be used. The locking level defines the highest level lock in the lock or-
dering that may be held when the method is called.
For example, suppose we have a < b in the lock ordering, and a lock level annotation to the
methods f() and g() that says the highest level that can be held is the corresponding lock. With
such an annotation, the call in B.g() to a.f() violates the lock ordering, so we can see that it
might result in deadlock.

class A {
/** LL: a */
synchronized void f() { b.g(); }
}

class B {
/** LL: b */
synchronized void g() { a.f(); }
}

Locks are not enough for waiting


Locks block threads from making progress, but in general, they are not a sufficiently powerful
mechanism for blocking threads. More generally, we may want to block a thread until some con-
dition becomes true. Examples of such situations are (1) when we want to communicate infor-
mation between threads (which may need to block until some information becomes available)
and (2) when we want to implement our own lock abstractions.

One such abstraction we might want to build is a barrier, because for simple uses of concurrency,
barriers make it easy to build race-free, deadlock-free code.

For example, suppose we want to run two threads in parallel to compute some results and wait
until both results are available. We might define a class WorkerPair that spawns two worker
threads:

class WorkerPair implements Runnable {
    int done;      // number of threads that have finished
    Object result;

    WorkerPair() {
        done = 0;
        new Thread(this).start();
        new Thread(this).start();
    }

    public void run() {
        doWork();
        synchronized (this) {
            done++;
            result = ...
        }
    }

    // not synchronized, to allow concurrent execution
    public void doWork() {
        // use synchronized methods here
    }

    Object getResult() {
        while (done < 2) {} // oops: wasteful!
        return result;      // oops: not synchronized!
    }
}

We might then use this code as follows:

w = new WorkerPair();
Object o = w.getResult();

As the comments in the code suggest, there are two serious problems with the getResult imple-
mentation. First, the loop on done < 2 will waste a lot of time and energy. Second, there is no syn-
chronization ensuring that updates to result are seen.

How can we fix this? We might start by making getResult() synchronized, but then getResult() would
hold the mutex while it spins, blocking the run method from ever performing its final assignments to
done and result. We can't simply hold the mutex of w while waiting for done to become 2.

Condition variables
The solution to the problem is to use a condition variable, which is a mechanism for blocking a
thread until a condition becomes true.

While monitors in general have multiple condition variables, every Java object implicitly has a
single condition variable tied to its mutex. It is accessed using the wait() and notifyAll() meth-
ods. (There is also a notify() method, but it should usually be avoided.)

The wait() method is used when the thread wants to wait for the condition to become true. It may
only be called when the mutex is held. It atomically releases the mutex and blocks the current
thread on the condition variable. The thread will only wake up and start executing when
notifyAll() or notify() are called on the same condition variable. (Java has a version of wait()
that includes a timeout period after which it will automatically wake itself up. This version
should usually be avoided.) In particular, wait() will not wake up simply because the condition
variable's mutex has been released by some other thread. The other thread must call notifyAll().

Another thread should call the notifyAll() method when the condition of the condition variable
might be true. Its effect is to wake up all threads waiting on the condition variable. When a
thread wakes up from wait(), it immediately tries to acquire the mutex. Only one thread can win;
the others all block waiting for the winner to release the mutex. Eventually they acquire the
mutex, though there is no guarantee that the condition is true when any of the threads awakes.

After a thread calls wait(), the condition it is waiting for might be true when wait() returns. But
it need not be. Some other thread might have been scheduled first and may have made the condi-
tion false. So wait() is always called in a loop, like so:

while (!condition) wait();

Failure to test the condition after wait() leads to what is called a wakeup-waiting race, in which
threads awakened by notifyAll() race to observe the condition as true. The winners of the race
can then make the condition false again, spoiling things for later awakeners.

Using condition variables, we can implement getResult() as follows:


synchronized Object getResult() {
while (done < 2) wait();
return result;
}

With this implementation, the mutex is not held while the thread waits. The implementation of
run is also modified to call notifyAll():

...
synchronized(this) {
done++;
result = ...
if (done == 2) notifyAll();
}

In Java, the call to notifyAll() must be done when the mutex is held. Waiting threads will awak-
en but will immediately block trying to acquire the mutex. If there are threads waiting, one of
them will win the race and acquire the mutex. In fact, since each awakened thread will test the
condition, we need not even test it before calling notifyAll():

...
synchronized(this) {
done++;
result = ...
notifyAll();
}

Java objects also have a notify() method that wakes just one thread instead of all of them. Use of
notify() is error-prone and usually should be avoided.

In general a monitor may have multiple conditions under which it wants to wake up threads. Giv-
en that a Java object has only one built-in condition variable, how can this be managed? One
possibility is to use Condition objects from the java.util.concurrent.locks package, obtained from an explicit Lock. A second
easy technique is to combine all the multiple conditions into one condition variable that repre-
sents the boolean disjunction of all of them. A notifyAll() is sent whenever any of the conditions
becomes true; threads awoken by notifyAll() then test to see if their particular condition has be-
come true; otherwise, they go back to sleep.
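The bounded buffer below is a sketch of this technique (an invented example, not code from the lecture): the two conditions "not full" and "not empty" share the object's single built-in condition variable, and every waiter retests its own condition in a loop after being awakened.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Sketch: a bounded buffer monitor using one condition variable
 *  for both "not full" and "not empty". */
public class BoundedBuffer<T> {
    private final Queue<T> items = new ArrayDeque<>();
    private final int capacity;

    public BoundedBuffer(int capacity) { this.capacity = capacity; }

    public synchronized void put(T x) throws InterruptedException {
        while (items.size() == capacity) wait();   // wait until not full
        items.add(x);
        notifyAll();                               // buffer may now be non-empty
    }

    public synchronized T take() throws InterruptedException {
        while (items.isEmpty()) wait();            // wait until not empty
        T x = items.remove();
        notifyAll();                               // buffer may now be non-full
        return x;
    }

    public static void main(String[] args) throws InterruptedException {
        BoundedBuffer<Integer> buf = new BoundedBuffer<>(2);
        Thread producer = new Thread(() -> {
            try { for (int i = 1; i <= 5; i++) buf.put(i); }
            catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        producer.start();
        int sum = 0;
        for (int i = 0; i < 5; i++) sum += buf.take();
        producer.join();
        System.out.println(sum);   // sum of 1..5
    }
}
```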

Using background threads with JavaFX


In JavaFX, any background work must be done in a separate thread, because if the Application
thread is busy doing work instead of handling user interface events, the UI becomes unrespon-
sive. However, UI nodes are not thread-safe, so only one thread is allowed to access the compo-
nent hierarchy: the Application thread.

The Task class encapsulates useful functionality for starting up background threads and for ob-
taining results from them. This is easier than coding up your own mechanism using mutexes and
condition variables. The key methods are these:

/** A concurrent computation that produces a value of type V. */
class Task<V> {
    // To be overridden by subclasses, but not called directly:

    /** Returns: the result of this task. */
    protected abstract V call();

    // For use inside the call() method:

    /** Returns: whether the task was canceled. The implementation of call()
     *  should use this method periodically if another thread might cancel
     *  the task.
     */
    public boolean isCancelled();

    /** Effect: Report progress of the task. When workDone reaches max, the
     *  task should be done. Progress can also be reported using
     *  Platform.runLater().
     */
    protected void updateProgress(long workDone, long max);

    // For use by clients in other threads:

    /** Returns: the value computed by this task. */
    V getValue();

    /** Effect: Set the event handler to invoke when the task completes
     *  successfully.
     */
    void setOnSucceeded(EventHandler<WorkerStateEvent> h);

    /** Effect: Cancel this task. */
    public void cancel();

    /** Returns: the fractional progress. */
    double getProgress();

    /** Returns: a property for the progress. */
    ReadOnlyDoubleProperty progressProperty();
}

Some of the methods are designed to be used within the implementation of the task, and others
are designed to be used by client code in other threads, to control the task and to interact with it.

To compute something of type V in the background, a subclass of Task<V> is defined that overrides
the method call(). Because a Task is a Runnable, the task can be started by creating a new thread
to run it:

Thread th = new Thread(task);
th.start();

The work done by the tasks is defined in the call() method; it should simply return the desired
result at the end of the method in the usual way. Notice that the call() method is not supposed to
be called by clients or by any subclass code; instead, it is automatically called by the run method
of the task.
To report progress back to the Application thread, the implementation of call() may also invoke
updateProgress(). When the task completes by returning a value of type V from the call() method,
the event handler h registered by calling setOnSucceeded(h) is invoked in the Application thread.

It is possible for a task to be canceled by calling the cancel() method; however, it is incumbent
on the implementation of the task to periodically check whether the task has been canceled by
using the isCancelled() method.

By listening on the property progressProperty(), client code in the Application thread can keep
track of the progress of the task and update the GUI to reflect how far along the task is. The Task
can also communicate back to the Application thread by using method Platform.runLater(), but
this approach may couple the task implementation with the GUI more than is desirable.
Graphs and graph representations
Topics:
vertices and edges
directed vs undirected graphs
labeled graphs
adjacency and degree
adjacency-matrix and adjacency-list representations
paths and cycles
topological sorting
more graph problems: shortest paths, graph coloring

A graph is a highly useful mathematical abstraction. A graph consists of a set of vertices (also
called nodes) and a set of edges (also called arcs) connecting those vertices. There are two main
kinds of graphs: undirected graphs and directed graphs. In a directed graph (sometimes abbre-
viated as digraph), the edges are directed: that is, they have a direction, proceeding from a
source vertex to a sink (or destination) vertex. The sink vertex is a successor of the source, and
the source is a predecessor of the sink. In undirected graphs, the edges are symmetrical.

Uses of graphs
Graphs are a highly useful abstraction in computer science because so many important problems
can be expressed in terms of graphs. We have already seen a number of graph structures: for ex-
ample, the objects in a running program form a directed graph in which the vertices are objects
and references between objects are edges. To implement automatic garbage collection (which
discards unused objects), the language implementation uses an algorithm for graph reachability.

Other examples of graphs, many of which we've seen, include:

• states of games and puzzles, which are vertices connected by edges that are the legal
moves in the game,
• state machines, where the states are vertices and the transitions between states are edges,
• road maps, where the vertices are intersections or points along the road and edges are
roads connecting those points,
• scheduling problems, where vertices represent events to be scheduled and edges might
represent events that cannot be scheduled together, or, depending on the problem, edges
that must be scheduled together,
• and in fact, any binary relation ρ can be viewed as a directed graph in which the
relationship x ρ y corresponds to an edge from vertex x to vertex y.

What is the value of having a common mathematical abstraction like graphs? One payoff is that
we can develop algorithms that work on graphs in general. Once we realize we can express a
problem in terms of graphs, we can consult a very large toolbox of efficient graph algorithms, ra-
ther than trying to invent a new algorithm for the specific domain of interest.

On the other hand, some problems over graphs are known to be intractable to solve in a reasonable
amount of time (or at least strongly suspected to be so). If we can show that solving the problem
we are given is at least as hard as solving one of these intractable problems, we know not to waste
effort searching for an efficient exact algorithm.

Vertices and edges


The vertices V of a graph are a set; the edges E can be viewed as set of ordered pairs (v1, v2) rep-
resenting an edge with source vertex v1 and sink vertex v2.

E = {(v1, v2), (v'1, v'2), ...}

If the graph is undirected, then for each edge (v1, v2), the edge set also includes (v2, v1). Alterna-
tively, we can view the edges of an undirected graph as a set of unordered pairs {v1, v2}.

Edges from a vertex to itself may or may not be permitted depending on the setting; also, multi-
ple edges between the same vertices may be permitted in some cases.

Adjacency and degree


Two vertices v and w are adjacent, written v ~ w, if they are connected by an edge. The degree
of a vertex is the total number of adjacent vertices. In a directed graph, we can distinguish be-
tween outgoing and incoming edges. The out-degree of a vertex is the number of outgoing edges
and the in-degree is the number of incoming edges.

Labels
The real value of graphs is obtained when we can use them to organize information. Both edges
and vertices of graphs can have labels that carry meaning about an entity represented by a vertex
or about the relationship between two entities represented by an edge. For example, we might en-
code information about three cities, Syracuse, Ithaca, and Binghamton as edge and vertex labels
in the following undirected graph:
Here, the vertices are labeled with a pair containing the name of the city and its population. The
edges are labeled with the distance between the cities.

A graph in which the edges are labeled with numbers is called a weighted graph. Of course, the
labels do not have to represent weight; they might stand for distance between vertices, or the
probability of transitioning from one state to another, or the similarity between two vertices, etc.

Graph representations
There is more than one way to represent a graph in a computer program. Which representation is
best depends on what graphs are being represented and how they are going to be used. Let us con-
sider the following weighted directed graph and how we might represent it:

Adjacency matrix
An adjacency matrix represents a graph as a two-dimensional array. Each vertex is assigned a
distinct index in [0, |V|). If the graph is represented by the 2D array m, then the edge (or lack
thereof) from vertex i to vertex j is recorded at m[i][j].

The graph structure can be represented by simply storing a boolean value at each array index.
For example, the edges in the directed graph above are represented by the true (T) values in this
matrix:

0 1 2 3

0 F T T F

1 F F F T

2 F T F T

3 F F F F

More compact bit-level representations for the booleans are also possible.

Typically there is some information associated with each edge; instead of a boolean, we store
that information into the corresponding array entry:
0 1 2 3

0 — 10 40 —

1 — — — –5

2 — 25 — 20

3 — — — —

The space required by the adjacency matrix representation is O(V²), so adjacency matrices can
waste a lot of space if the number of edges |E| is O(V). Such graphs are said to be sparse. For ex-
ample, graphs in which in-degree or out-degree are bounded by a constant are sparse. Adjacency
matrices are asymptotically space-efficient, however, when the graphs they represent are dense;
that is, when |E| is O(V²).

The adjacency matrix representation is time-efficient for some operations. Testing whether there
is an edge between two vertices can clearly be done in constant time. However, finding all in-
coming edges to a given vertex, or finding all outgoing edges, takes time proportional to the
number of vertices, even for sparse graphs.

Undirected graphs can be represented with an adjacency matrix too, though the matrix will be
symmetrical around the matrix diagonal. This symmetry invariant makes possible some space
optimizations.
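As a sketch, the boolean matrix above can be stored directly as a Java 2D array; the edge test is constant time, while computing a degree requires an O(V) scan of a row. (The class and method names here are illustrative choices, not from the notes.)

```java
/** Sketch: the boolean adjacency matrix above as a Java 2D array. */
public class AdjMatrix {
    // Rows are source vertices, columns are sink vertices:
    // edges 0→1, 0→2, 1→3, 2→1, 2→3.
    static final boolean[][] m = {
        { false, true,  true,  false },
        { false, false, false, true  },
        { false, true,  false, true  },
        { false, false, false, false },
    };

    /** Returns: whether there is an edge from i to j. O(1). */
    static boolean hasEdge(int i, int j) { return m[i][j]; }

    /** Returns: the out-degree of vertex i. O(V), even for sparse graphs. */
    static int outDegree(int i) {
        int d = 0;
        for (boolean b : m[i]) if (b) d++;
        return d;
    }

    public static void main(String[] args) {
        System.out.println(hasEdge(0, 1));
        System.out.println(hasEdge(1, 0));
        System.out.println(outDegree(2));
    }
}
```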

Adjacency list representation


Since sparse graphs are common, the adjacency list representation is often preferred. This repre-
sentation keeps track of the outgoing edges from each vertex, typically as a linked list. For exam-
ple, the graph above might be represented with the following data structure:

Adjacency lists are asymptotically space-efficient because they only use space proportional to
the number of vertices and the number of edges. We say that they require O(V+E) space.

Finding the outgoing edges from a vertex is very efficient in the adjacency list representation
too; it requires time proportional to the number of edges found. However, finding the incoming
edges to a vertex is not efficient: it requires scanning the entire data structure, requiring O(V+E)
time.
When it is necessary to be able to walk forward on outgoing edges and backward on incoming
edges, a good approach is to maintain two adjacency lists, one representing the graph as above
and one corresponding to the dual (or transposed) graph in which all edges are reversed. That is,
if there is an edge a→b in the original graph, there is an edge b→a in the transposed graph. Of
course, an invariant must be maintained between the two adjacency list representations.

Testing whether there is an edge from vertex i to vertex j requires scanning all the outgoing edg-
es of i, taking O(V) time in the worst case. If this operation needs to be fast, the linked list can be
replaced with a hash table. For example, we might implement the graph using this Java representa-
tion, which preserves the asymptotic space efficiency of adjacency lists while also supporting
fast queries for particular edges:

HashMap<Vertex, HashMap<Vertex, Edge>>
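A minimal sketch of that representation (with Integer vertices and integer edge weights standing in for the Vertex and Edge types, which is an illustrative simplification) might look like this, using the edges of the weighted graph above:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch: an adjacency-list digraph with hash tables for fast edge queries. */
public class AdjListGraph {
    // Maps each vertex to a map from successor vertex to edge weight.
    private final Map<Integer, Map<Integer, Integer>> adj = new HashMap<>();

    void addEdge(int from, int to, int weight) {
        adj.computeIfAbsent(from, k -> new HashMap<>()).put(to, weight);
        adj.computeIfAbsent(to, k -> new HashMap<>());   // ensure sink vertex exists
    }

    /** Returns: whether there is an edge from 'from' to 'to'. Expected O(1). */
    boolean hasEdge(int from, int to) {
        Map<Integer, Integer> out = adj.get(from);
        return out != null && out.containsKey(to);
    }

    /** Returns: the successors of v. Time proportional to out-degree. */
    List<Integer> successors(int v) {
        return new ArrayList<>(adj.getOrDefault(v, Map.of()).keySet());
    }

    public static void main(String[] args) {
        AdjListGraph g = new AdjListGraph();
        g.addEdge(0, 1, 10); g.addEdge(0, 2, 40);
        g.addEdge(1, 3, -5); g.addEdge(2, 1, 25); g.addEdge(2, 3, 20);
        System.out.println(g.hasEdge(0, 2));
        System.out.println(g.hasEdge(3, 0));
        System.out.println(g.successors(2).size());
    }
}
```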

Paths and cycles


Following a series of edges from a starting vertex creates a walk through the graph, a sequence
of vertices (v0, ..., vp) where there is an edge from vi-1 to vi for all i between 1 and p. The length
of the walk is the number of edges followed (that is, p). If no vertex appears twice in the walk,
except that possibly v0 = vp, the walk is called a path. If there are no repeated vertices at all, it is a
simple path. If the first and last vertices are the same, the path is a cycle.

Some graphs have no cycles. For example, linked lists and trees are both examples of graphs in
which there are no cycles. They are directed acyclic graphs, abbreviated as DAGs. In trees and
linked lists, each vertex has at most one predecessor; in general, DAG vertices can have more
than one predecessor.

Topological sorting
One use of directed graphs is to represent an ordering constraint on vertices. We use an edge
from x to y to represent the idea that “x must happen before y”. A topological sort of the vertices
is a total ordering of the vertices that is consistent with all edges. A graph can be topologically
sorted only if it has no cycles; it must be a DAG.

Topological sorts are useful for deciding in what order to do things. For example, consider the
following DAG expressing what we might call the “men's informal dressing problem”:

A valid plan for getting dressed is a topological sort of this graph, and in fact any topological sort
is in principle a workable way to get dressed. For example, the ordering (pants, shirt, belt, socks,
tie, jacket, shoes) is consistent with the ordering on all the graph edges. Less conventional strate-
gies are also workable, such as (socks, pants, shoes, shirt, belt, tie, jacket).

Does every DAG have a topological sort? Yes. To see this, observe that every finite DAG must
have a vertex with in-degree zero. To find such a vertex, we start from an arbitrary vertex in the
graph and walk backward along edges until we reach a vertex with zero in-degree. We know that
the walk must generate a simple path because there are no cycles in the graph. Therefore, the
walk must terminate because we run out of vertices that haven't already been seen along the
walk.

This gives us an (inefficient) way to topologically sort a DAG:

1. Start with an empty ordering.


2. Find a 0 in-degree node and put it at the end of the ordering built thus far (the first node we
do this with will be the first node in the ordering).
3. Remove the node found from the graph.
4. Repeat from step 2 until the graph is empty.

Since finding the 0 in-degree node takes O(V) time, this algorithm takes O(V²) time. We can do
better, as we'll see shortly.
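One way to do better, sketched below, is to track every vertex's in-degree incrementally rather than rescanning the graph for a 0 in-degree node on each round (this is Kahn's algorithm); the running time drops to O(V+E). The method names are illustrative choices.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Sketch: O(V+E) topological sort via incremental in-degree tracking. */
public class TopoSort {
    /** Returns: a topological order of vertices 0..n-1, where adj.get(v)
     *  lists the successors of v. Assumes the graph is a DAG. */
    static List<Integer> topoSort(List<List<Integer>> adj) {
        int n = adj.size();
        int[] inDegree = new int[n];
        for (List<Integer> succs : adj)
            for (int w : succs) inDegree[w]++;

        Queue<Integer> ready = new ArrayDeque<>();   // vertices with in-degree 0
        for (int v = 0; v < n; v++)
            if (inDegree[v] == 0) ready.add(v);

        List<Integer> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            int v = ready.remove();
            order.add(v);
            for (int w : adj.get(v))                 // "remove" v's outgoing edges
                if (--inDegree[w] == 0) ready.add(w);
        }
        return order;   // if order.size() < n, the graph had a cycle
    }

    public static void main(String[] args) {
        // DAG with edges 0→1, 0→2, 1→3, 2→3
        List<List<Integer>> adj = List.of(
            List.of(1, 2), List.of(3), List.of(3), List.of());
        System.out.println(topoSort(adj));
    }
}
```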

Other graph problems


Many problems of interest can be expressed in terms of graphs. Here are a few examples of im-
portant graph problems, some of which can be solved efficiently and some of which are intracta-
ble!

Reachability
One vertex is reachable from another if there is a path from one to the other. Determining which
vertices are reachable from a given vertex is useful and can be done efficiently, in linear time.

Shortest paths
Finding paths with the smallest number of edges is useful and can be solved efficiently. Shortest-
path problems address a generalization: minimizing the total weight of all edges along a path,
where the weight of an edge is thought of as a distance.

For example, if a road map is represented as a graph with vertices representing intersections and
edges representing road segments, the shortest-path problem can be used to find short routes.
There are several variants of the problem, depending on whether one is interested in the distance
from a given root vertex or in the distances between all pairs of vertices. If negative-weight edg-
es exist, these problems become harder and different algorithms (e.g., Bellman–Ford) are need-
ed.

Hamiltonian cycles and the traveling salesman problem


The problem of finding the longest path between two nodes in a graph is, in general, intractable.
It is related to some other important problems. A Hamiltonian path is one that visits every ver-
tex in a graph. The ability to determine whether a graph contains a Hamiltonian path (or a Hamil-
tonian cycle) would be useful, but in general this is an intractable problem for which the best ex-
act algorithms require exponential-time searching.

A weighted version of this problem is the traveling salesman problem (TSP), which tries to
find the Hamiltonian cycle with the minimum total weight. The name comes from imagining a
salesman who wants to visit every one of a set of cities while traveling the least possible total
distance. This problem is at least as hard as finding Hamiltonian cycles. However, finding a solu-
tion that is within a constant factor (e.g., 1.5) of optimal can be done in polynomial time with
some reasonable assumptions. In practice, there exist good heuristics that allow close-to-optimal
solutions to TSP to be found even for large problem instances.

Graph coloring
Imagine that we want to schedule exams into k time slots such that no student has two exams at
the same time. We can represent this problem using an undirected graph in which the exams are
vertices. Exam V1 and exam V2 are connected by an edge if there is some student who needs to
take both exams. We can schedule the exams into the k slots if there is a k-coloring of the graph:
a way to assign one of k colors, representing the time slots, to each of the vertices such that no
two adjacent vertices are assigned the same color.

The problem of determining whether there is a k-coloring turns out to be intractable. The chro-
matic number of a graph is the minimum number of colors that can be used to color it; this is of
course intractable too. Though the worst case is intractable, in practice, graph colorings close to
optimal can be found.
Graph traversals
Topics:
tricolor algorithm
breadth-first search
depth-first search
cycle detection
topological sort
connected components

Graph traversals
We often want to solve problems that are expressible in terms of a traversal or search over a
graph. Examples include:

Finding all reachable nodes (for garbage collection)


Finding the best reachable node (single-player game search) or the minimax-best reachable
node (two-player game search)
Finding the best path through a graph (for routing and map directions)
Determining whether a graph is a DAG.
Topologically sorting a graph.

The goal of a graph traversal, generally, is to find all nodes reachable from a given set of root
nodes. In an undirected graph we follow all edges; in a directed graph we follow only out-edges.

Tricolor algorithm
Abstractly, graph traversal can be expressed in terms of the tricolor algorithm due to Dijkstra
and others. In this algorithm, graph nodes are assigned one of three colors that can change over
time:

White nodes are undiscovered nodes that have not been seen yet in the current traversal
and may even be unreachable.
Black nodes are nodes that are reachable and that the algorithm is done with.
Gray nodes are nodes that have been discovered but that the algorithm is not done with
yet. These nodes are on a frontier between white and black.

The progress of the algorithm is depicted by the following figure. Initially there are no black
nodes and the roots are gray. As the algorithm progresses, white nodes turn into gray nodes and
gray nodes turn into black nodes. Eventually there are no gray nodes left and the algorithm is
done.
The algorithm maintains a key invariant at all times: there are no edges from black nodes to
white nodes. This is clearly true initially, and because it is true at the end, we know that any re-
maining white nodes cannot be reached from the black nodes.

The algorithm pseudo-code is as follows:

1. Color all nodes white, except for the root nodes, which are colored gray.
2. While some gray node n exists:
       color some white successors of n gray.
       if n has no white successors, optionally color n black.

This algorithm is abstract enough to describe many different graph traversals. It allows the par-
ticular implementation to choose the node n from among the gray nodes; it allows choosing
which and how many white successors to color gray, and it allows delaying the coloring of gray
nodes black. We say that such an algorithm is nondeterministic because its behavior is not fully
defined. However, as long as it does some work on each gray node that it picks, any implementa-
tion that can be described in terms of this algorithm will finish. Further, because the black-white
invariant is maintained, it must reach all reachable nodes in the graph.

One value of defining graph search in terms of the tricolor algorithm is that the tricolor algorithm
works even when gray nodes are worked on concurrently, as long as the black-white invariant is
maintained. Thinking about this invariant therefore helps us ensure that whatever graph traversal
we choose will work when parallelized, which is increasingly important.
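To make the abstraction concrete, here is a minimal Java sketch of the tricolor algorithm, representing the gray and black colors as sets of node names (white nodes are simply those in neither set). The adjacency-list representation and the arbitrary choice of gray node are assumptions of this sketch; concrete traversals such as BFS and DFS fix that choice with a queue or a stack.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class Tricolor {
    /** Returns the set of nodes reachable from the roots. */
    public static Set<String> reachable(Map<String, List<String>> adj, Set<String> roots) {
        Set<String> gray = new HashSet<>(roots);
        Set<String> black = new HashSet<>();
        while (!gray.isEmpty()) {
            String n = gray.iterator().next();            // choose some gray node n
            for (String w : adj.getOrDefault(n, List.of()))
                if (!gray.contains(w) && !black.contains(w))
                    gray.add(w);                          // color white successors gray
            gray.remove(n);                               // n now has no white successors,
            black.add(n);                                 // so it may be colored black
        }
        return black;                                     // invariant: no black-to-white edges
    }
}
```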

Breadth-first search
Breadth-first search (BFS) is a graph traversal algorithm that explores nodes in the order of their
distance from the roots, where distance is defined as the minimum path length from a root to the
node. Its pseudo-code looks like this:

// let root be the source node; initially v.distance = ∞ for every node

frontier = new Queue();
root.distance = 0;
frontier.push(root);
while (frontier not empty) {
    Vertex v = frontier.pop();
    foreach (w ≻ v) {
        if (w.distance == ∞) {
            w.distance = v.distance + 1;
            frontier.push(w);
        }
    }
}

Here the white nodes are those not yet visited, the gray nodes are visited nodes still in the
frontier, and the black nodes are visited nodes no longer in the frontier. Rather than
keeping an explicit visited flag, we track a node's distance in the field v.distance; a node is
unvisited exactly when its distance is ∞. When a new node is discovered, its distance is set to
one greater than that of its predecessor v.

When frontier is a first-in, first-out (FIFO) queue, we get breadth-first search. All the nodes on
the queue have a minimum path length within one of each other. In general, there is a set of
nodes to be popped off, at some distance k from the source, and another set of elements, later on
the queue, at distance k+1. Every time a new node is pushed onto the queue, it is at distance k+1
until all the nodes at distance k are gone, and k then goes up by one. Therefore newly pushed
nodes are always at a distance at least as great as any other gray node.

Suppose that we run this algorithm on the following graph, assuming that successors are visited
in alphabetic order from any given node:

In that case, the following sequence of nodes pass through the queue, where each node is anno-
tated by its minimum distance from the source node A. Note that we're pushing onto the right of
the queue and popping from the left.

A0 B1 D1 E1 C2

Clearly, nodes are popped in distance order: A, B, D, E, C. This is very useful when we are trying
to find the shortest path through the graph to something. When a queue is used in this way, it is
known as a worklist; it keeps track of work left to be done.
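The pseudo-code above can be transcribed into Java roughly as follows. This sketch assumes an adjacency-list representation (a map from each node name to its list of successors) and represents an infinite distance by the absence of a map entry; the example graph in main is a hypothetical one consistent with the queue contents A0 B1 D1 E1 C2 above.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Bfs {
    /** Returns minimum path lengths from root; absent keys mean distance ∞. */
    public static Map<String, Integer> distances(Map<String, List<String>> adj, String root) {
        Map<String, Integer> dist = new HashMap<>();
        Deque<String> frontier = new ArrayDeque<>();   // FIFO queue of gray nodes
        dist.put(root, 0);
        frontier.addLast(root);
        while (!frontier.isEmpty()) {
            String v = frontier.removeFirst();         // v is finished (black) from here on
            for (String w : adj.getOrDefault(v, List.of())) {
                if (!dist.containsKey(w)) {            // w was white: discover it
                    dist.put(w, dist.get(v) + 1);
                    frontier.addLast(w);               // w is now gray
                }
            }
        }
        return dist;
    }

    public static void main(String[] args) {
        Map<String, List<String>> adj = Map.of(
                "A", List.of("B", "D", "E"),
                "B", List.of("C"),
                "D", List.of("C", "E"),
                "E", List.of());
        System.out.println(distances(adj, "A"));       // distance of each node from A
    }
}
```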

Depth-first search
What if we were to replace the FIFO queue with a LIFO stack? In that case we get a completely
different order of traversal. Assuming that successors are pushed onto the stack in reverse alpha-
betic order, the successive stack states look like this:

A
B D E
C D E
D E
E
With a stack, the search will proceed from a given node as far as it can before backtracking and
considering other nodes on the stack. For example, the node E had to wait until all nodes reacha-
ble from B and D were considered. This is a depth-first search.

A more standard way of writing depth-first search is as a recursive function, using the program
stack as the stack above. We start with every node white except the starting node and apply the
function DFS to the starting node:

DFS(Vertex v) {
    v.color = gray;
    foreach (w ≻ v) {
        if (w.color == white) DFS(w);
    }
    v.color = black;
}

You can think of this as a person walking through the graph following arrows and never visiting
a node twice except when backtracking, when a dead end is reached. Running this code on the
graph above yields the following graph colorings in sequence, which are reminiscent of but a bit
different from what we saw with the stack-based version:
Notice that at any given time there is a single path of gray nodes leading from the starting node
and leading to the current node v. This path corresponds to the stack in the earlier implementa-
tion, although the nodes end up being visited in a different order because the recursive algorithm
only marks one successor gray at a time.

The calls to DFS form a tree: the call tree of the program. Every execution of a program
traverses a call tree. In this case the call tree is a subgraph of the original graph:

The algorithm maintains an amount of state that is proportional to the size of this path from the
root. This makes DFS rather different from BFS, where the amount of state (the queue size) cor-
responds to the size of the perimeter of nodes at distance k from the starting node. In both algo-
rithms the amount of state can be O(|V|). For DFS this happens when searching a linked list. For
BFS this happens when searching a graph with a lot of branching, such as a binary tree, because
there are 2^k nodes at distance k from the root. On a balanced binary tree, DFS maintains state
proportional to the height of the tree, or O(log |V|). Often the graphs that we want to search are
more like trees than linked lists, and so DFS tends to run faster.
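A recursive DFS along the lines of the pseudo-code above might look like this in Java. The explicit color map and the small example graph in the test are assumptions of the sketch; the returned list records the order in which nodes are discovered (colored gray).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Dfs {
    private enum Color { WHITE, GRAY, BLACK }

    /** Returns the nodes reachable from start, in discovery order. */
    public static List<String> visitOrder(Map<String, List<String>> adj, String start) {
        Map<String, Color> color = new HashMap<>();
        for (String v : adj.keySet()) color.put(v, Color.WHITE);
        List<String> order = new ArrayList<>();
        dfs(adj, start, color, order);
        return order;
    }

    private static void dfs(Map<String, List<String>> adj, String v,
                            Map<String, Color> color, List<String> order) {
        color.put(v, Color.GRAY);                      // v joins the current gray path
        order.add(v);
        for (String w : adj.getOrDefault(v, List.of()))
            if (color.get(w) == Color.WHITE) dfs(adj, w, color, order);
        color.put(v, Color.BLACK);                     // all of v's descendants are done
    }
}
```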

There can be at most |V| calls to DFS, and the body of the loop over successors can be executed
at most |E| times in total. So the asymptotic performance of DFS is O(|V| + |E|), just like for
breadth-first search.
If we want to search the whole graph, then a single recursive traversal may not suffice. If we had
started a traversal with node C, we would miss all the rest of the nodes in the graph. To do a
depth-first search of an entire graph, we call DFS on an arbitrary unvisited node, and repeat until
every node has been visited. For example, consider the original graph expanded with two new
nodes F and G:

DFS starting at A will not search all the nodes. Suppose we next choose F to start from. Then we
will reach all nodes. Instead of constructing just one tree that is a subgraph of the original graph,
we get a forest of two trees:

Topological sort
One of the most useful algorithms on graphs is topological sort, in which the nodes of an acyclic
graph are placed in an order consistent with the edges of the graph. This is useful when you need
to order a set of elements where some elements have no ordering constraint relative to other ele-
ments.

For example, suppose you have a set of tasks to perform, but some tasks have to be done before
other tasks can start. In what order should you perform the tasks? This problem can be solved by
representing the tasks as nodes in a graph, where there is an edge from task 1 to task 2 if task 1
must be done before task 2. Then a topological sort of the graph will give an ordering in which
task 1 precedes task 2. Obviously, to topologically sort a graph, it cannot have cycles. For exam-
ple, if you were making lasagna, you might need to carry out tasks described by the following
graph:
There is some flexibility about what order to do things in, but clearly we need to make the sauce
before we assemble the lasagna. A topological sort will find some ordering that obeys this and
the other ordering constraints. Of course, it is impossible to topologically sort a graph with a cy-
cle in it.

The key observation is that a node finishes (is marked black) after all of its descendants have
been marked black. Therefore, a node that is marked black later must come earlier when
topologically sorted. A postorder traversal generates nodes in the reverse of a topological sort:

Algorithm:
Perform a depth-first search over the entire graph, starting anew with an unvisited node
if previous starting nodes did not visit every node. As each node is finished (colored
black), put it on the head of an initially empty list. This clearly takes time linear in the
size of the graph: O(|V| + |E|).

For example, in the traversal example above, nodes are marked black in the order C, E, D, B, A.
Reversing this, we get the ordering A, B, D, E, C. This is a topological sort of the graph. Similar-
ly, in the lasagna example, assuming that we choose successors top-down, nodes are marked
black in the order bake, assemble lasagna, make sauce, fry sausage, boil pasta, grate cheese. So
the reverse of this ordering gives us a recipe for successfully making lasagna, even though suc-
cessful cooks are likely to do things more in parallel!
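The algorithm can be sketched in Java as follows. Since a node is prepended to the list exactly when it finishes, the list ends up in topological order. The simplified cooking tasks in main are an assumption for illustration, not the full lasagna graph from the notes.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class TopoSort {
    /** Returns the nodes of an acyclic graph in topological order. */
    public static List<String> sort(Map<String, List<String>> adj) {
        Set<String> visited = new HashSet<>();
        LinkedList<String> order = new LinkedList<>();
        for (String v : adj.keySet())
            if (!visited.contains(v)) dfs(adj, v, visited, order);
        return order;
    }

    private static void dfs(Map<String, List<String>> adj, String v,
                            Set<String> visited, LinkedList<String> order) {
        visited.add(v);
        for (String w : adj.getOrDefault(v, List.of()))
            if (!visited.contains(w)) dfs(adj, w, visited, order);
        order.addFirst(v);               // v finishes: put it before its successors
    }

    public static void main(String[] args) {
        Map<String, List<String>> tasks = new LinkedHashMap<>();
        tasks.put("make sauce", List.of("assemble"));
        tasks.put("boil pasta", List.of("assemble"));
        tasks.put("assemble", List.of("bake"));
        tasks.put("bake", List.of());
        System.out.println(sort(tasks)); // some valid ordering of the tasks
    }
}
```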

Detecting cycles
Since a node finishes after its descendants, a cycle involves a gray node pointing to one of its
gray ancestors that hasn't finished yet. If one of a node's successors is gray, there must be a cycle.
To detect cycles in graphs, therefore, we choose an arbitrary white node and run DFS. If that
completes and there are still white nodes left over, we choose another white node arbitrarily and
repeat. Eventually all nodes are colored black. If at any time we follow an edge to a gray node,
there is a cycle in the graph. Therefore, cycles can be detected in O(|V| + |E|) time.
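Here is a Java sketch of this cycle-detection procedure, using an integer color per node as in the tricolor algorithm. The adjacency-list representation is an assumption.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CycleCheck {
    private static final int WHITE = 0, GRAY = 1, BLACK = 2;

    /** Returns true iff the directed graph contains a cycle. */
    public static boolean hasCycle(Map<String, List<String>> adj) {
        Map<String, Integer> color = new HashMap<>();
        for (String v : adj.keySet())
            if (color.getOrDefault(v, WHITE) == WHITE && dfs(adj, v, color))
                return true;
        return false;
    }

    private static boolean dfs(Map<String, List<String>> adj, String v,
                               Map<String, Integer> color) {
        color.put(v, GRAY);
        for (String w : adj.getOrDefault(v, List.of())) {
            int c = color.getOrDefault(w, WHITE);
            if (c == GRAY) return true;               // edge to a gray node: cycle found
            if (c == WHITE && dfs(adj, w, color)) return true;
        }
        color.put(v, BLACK);
        return false;
    }
}
```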

Edge classifications
We can classify the various edges of the graph based on the color of the node reached when the
algorithm follows the edge. Here is the expanded (A–G) graph with the edges colored to show
their classification.
Note that the classification of edges depends on what trees are constructed, and therefore de-
pends on what node we start from and in what order the algorithm happens to select successors
to visit.

When the destination node of a followed edge is white, the algorithm performs a recursive call.
These edges are called tree edges, shown as solid black arrows. The graph looks different in this
picture because the nodes have been moved to make all the tree edges go downward. We have al-
ready seen that tree edges show the precise sequence of recursive calls performed during the tra-
versal.

When the destination of the followed edge is gray, it is a back edge, shown in red. Because there
is only a single path of gray nodes, a back edge is looping back to an earlier gray node, creating a
cycle. A graph has a cycle if and only if it contains a back edge when traversed from some node.

When the destination of the followed edge is colored black, it is a forward edge or a cross edge.
It is a forward edge if it goes to a descendant of the current node in the same tree; otherwise it
is a cross edge, which may go between two trees in the forest or between subtrees of the same tree.

Detecting cycles
It is often useful to know whether a graph has cycles. To detect whether a graph has cycles, we
perform a depth-first search of the entire graph. If a back edge is found during any traversal, the
graph contains a cycle. If all nodes have been visited and no back edge has been found, the graph
is acyclic.

Connected components
Graphs need not be connected, although we have been drawing connected graphs thus far. A
graph is connected if there is a path between every two nodes. However, it is entirely possible to
have a graph in which there is no path from one node to another node, even following edges
backward. For connectedness, we don't care which direction the edges go in, so we might as well
consider an undirected graph. A connected component is a maximal subset S of the vertices such
that there is a path between every two vertices in S. Equivalently, a component is closed under
adjacency: for every two adjacent vertices v and v', either v and v' are both in S or neither one is.

For example, the following undirected graph has three connected components:
The connected components problem is to determine how many connected components make up a
graph, and to make it possible to find, for each node in the graph, which component it belongs to.
This can be a useful way to solve problems. For example, suppose that different components cor-
respond to different jobs that need to be done, and there is an edge between two components if
they need to be done on the same day. Then to find out what is the maximum number of days that
can be used to carry out all the jobs, we need to count the components.

Algorithm:
Perform a depth-first search over the graph. As each traversal starts, create a new com-
ponent. All nodes reached during the traversal belong to that component. The number
of traversals done during the depth-first search is the number of components. Note that
if the graph is directed, the DFS needs to follow both in- and out-edges.
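A Java sketch of this component-labeling algorithm follows, assuming the adjacency list is symmetric (so that following out-edges already covers both directions). Each new DFS start allocates a fresh component number.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Components {
    /** Labels each node of an undirected graph with its component number. */
    public static Map<String, Integer> label(Map<String, List<String>> adj) {
        Map<String, Integer> comp = new HashMap<>();
        int next = 0;
        for (String v : adj.keySet())
            if (!comp.containsKey(v)) dfs(adj, v, next++, comp);  // new traversal = new component
        return comp;
    }

    private static void dfs(Map<String, List<String>> adj, String v, int id,
                            Map<String, Integer> comp) {
        comp.put(v, id);
        for (String w : adj.getOrDefault(v, List.of()))
            if (!comp.containsKey(w)) dfs(adj, w, id, comp);
    }
}
```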

Strongly Connected Components


For directed graphs, it is usually more useful to define strongly connected components. A
strongly connected component (SCC) is a maximal subset of vertices such that every vertex in
the set is reachable from every other. All nodes in a cycle in a graph are part of the same strongly
connected component, so every graph can be viewed as a DAG composed of SCCs.

Kosaraju's algorithm
A simple and efficient algorithm due to Kosaraju finds SCCs by performing depth-first graph
traversal twice:
1. Topologically sort the nodes using DFS. The SCCs will appear in sequence.
2. Now traverse the transposed graph, but pick new (white) roots in topological order. Each
new subtraversal reaches a distinct SCC.

For example, consider the following graph, which is clearly not a DAG:

Running a depth-first traversal in which we happen to choose children left-to-right, and extract-
ing the nodes in reverse postorder, we obtain the ordering 1, 4, 5, 6, 2, 3, 7. Notice that the SCCs
occur sequentially within this ordering. The job of the second part of the algorithm is to identify
the boundaries.
In the second phase, we start with 1 and find it has no predecessors, so {1} is the first SCC. We
then start with 4 and find that 5 and 6 are reachable via backward edges, so the second SCC is
{4,5,6}. Starting from 2, we discover {2,3}, and the final SCC is {7}. The resulting DAG of
SCCs is the following:

Notice that the SCCs are also topologically sorted by this algorithm, modulo back edges.
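The two passes of Kosaraju's algorithm can be sketched in Java as follows. The sketch computes a reverse postorder in pass 1, builds the transposed graph explicitly, and in pass 2 starts a new collection at each still-unassigned node in that order; each such start yields one SCC.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class Kosaraju {
    /** Returns the strongly connected components, one Set per SCC. */
    public static List<Set<String>> sccs(Map<String, List<String>> adj) {
        // Pass 1: compute a reverse postorder of the original graph.
        Deque<String> order = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        for (String v : adj.keySet()) post(adj, v, seen, order);

        // Build the transposed graph (every edge reversed).
        Map<String, List<String>> rev = new HashMap<>();
        for (Map.Entry<String, List<String>> e : adj.entrySet())
            for (String w : e.getValue())
                rev.computeIfAbsent(w, k -> new ArrayList<>()).add(e.getKey());

        // Pass 2: traverse the transpose, picking roots in reverse postorder.
        List<Set<String>> result = new ArrayList<>();
        Set<String> assigned = new HashSet<>();
        for (String v : order) {
            if (assigned.contains(v)) continue;
            Set<String> scc = new HashSet<>();
            collect(rev, v, assigned, scc);            // one subtraversal = one SCC
            result.add(scc);
        }
        return result;
    }

    private static void post(Map<String, List<String>> adj, String v,
                             Set<String> seen, Deque<String> order) {
        if (!seen.add(v)) return;
        for (String w : adj.getOrDefault(v, List.of())) post(adj, w, seen, order);
        order.addFirst(v);                             // prepend at finish time
    }

    private static void collect(Map<String, List<String>> adj, String v,
                                Set<String> assigned, Set<String> scc) {
        if (!assigned.add(v)) return;
        scc.add(v);
        for (String w : adj.getOrDefault(v, List.of())) collect(adj, w, assigned, scc);
    }
}
```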

Tarjan's algorithm
One inconvenience of Kosaraju's algorithm is that it requires being able to walk edges backward.
Tarjan's algorithm for finding strongly connected components performs just one depth-first tra-
versal of the graph, only walking edges forward, and is only slightly more complicated by the
use of a stack.

Each node has an index variable that keeps track of the order in which the nodes were discov-
ered; its value is ∞ for undiscovered nodes. Each node also has a lowlink variable that keeps
track of the lowest-index node that is part of the same SCC. In addition, there is a stack s that
keeps track of the nodes that have been visited but have not yet been assigned to an SCC, and a
list SCCs of already discovered SCCs.

scc(Vertex v) {
    v.index = v.lowlink = index++;
    s.push(v);
    foreach (Vertex w ≻ v) {
        if (w.index == ∞) {
            scc(w);
            v.lowlink = min(v.lowlink, w.lowlink);
        } else if (w ∈ s) {
            v.lowlink = min(v.lowlink, w.index);
        }
    }
    if (v.lowlink == v.index) {
        // pop everything up to v off s and make an SCC from it
        SCC nodes = new SCC();
        do {
            w = s.pop();
            nodes.add(w);
        } while (w != v);
        SCCs.addFirst(nodes);
    }
}

At the time each node finishes, the lowlink variable contains the index of the earliest stack node
that is part of the same SCC. The algorithm returns from the recursive calls until that node is
reached, at which point all nodes pushed on the stack after that node are part of the same SCC.
Its nodes are popped off the stack to form the new SCC. The new SCC is then prepended to the
head of the current list of SCCs.
For example, consider running this algorithm on the following graph:

A graph and its DAG of SCCs

The state of the stack and the SCCs as the algorithm executes is as follows. Node names are fol-
lowed by the values of index and lowlink at that point during execution, and a dot marks the cur-
rent node v, which is not necessarily at the top of the stack.

Stack s → SCCs
A:0,0•
A:0,0 B:1,1•
A:0,0 B:1,1 C:2,2•
A:0,0 B:1,1 C:2,1•
A:0,0 B:1,1 C:2,1 G:3,3•
A:0,0 B:1,1 C:2,1• {G}
A:0,0 B:1,1• C:2,1 {G}
A:0,0• {B,C}, {G}
A:0,0 D:4,4• {B,C}, {G}
A:0,0 D:4,4 E:5,5• {B,C}, {G}
A:0,0 D:4,4 E:5,5 F:6,4• {B,C}, {G}
A:0,0 D:4,4 E:5,4• F:6,4 {B,C}, {G}
A:0,0 D:4,4• E:5,4 F:6,4 {B,C}, {G}
A:0,0• {D,E,F}, {B,C}, {G}
{A}, {D,E,F}, {B,C}, {G}

The output is a topologically ordered list of SCCs. Ignoring back edges, each SCC is also topo-
logically ordered.

See Cormen, Leiserson, and Rivest for more details.

Further reading
Carrano. Data Structures and Abstractions with Java, Chapter 31.
Cormen, Leiserson, and Rivest. Introduction to Algorithms, Chapter 23.

Notes by Andrew Myers, 4/19/12.


Dijkstra's single-source shortest path algorithm
Topics:
The single-source shortest path problem
Dijkstra's algorithm
Proving correctness of the algorithm
Extensions: generalizing distance
Extensions: A*

The single-source shortest path problem


Suppose we have a weighted directed graph like the follow-
ing, and we want to find the path between two vertices with
the minimum total weight. Interpreting edge weights as dis-
tances, this is a shortest-path problem. There is a path with
total weight 50, but it is not trivial to find it. We could find
the shortest path by enumerating all possible permutations of
the vertices and checking which one, but there are O(V!) such
permutations.

Finding shortest paths in graphs is a very useful ability. In a


sense, we have already seen an algorithm for finding shortest
paths: breadth-first search finds shortest paths from a root
node or nodes to all other nodes, where the length of a path is
simply the number of edges on the path. In general it is useful
to have edges with different “distances”, so shortest-path
problems are expressed in terms of weighted graphs, where the weights represent distances.
(Weighted undirected graphs can be represented as weighted directed graphs where each
undirected edge is converted to a pair of directed edges, one in each direction.)

Both breadth-first search and depth-first search are instances of the abstract tricolor algorithm.
The tricolor algorithm is nondeterministic: it allows certain things to be done at certain steps in
the algorithm without requiring them to be done. BFS and DFS resolve the nondeterministic
choices in different ways, and therefore visit nodes in different orders. In this lecture we will see
a third instance of the tricolor algorithm, which solves the single-source shortest path problem.

We will be assuming that all edges have nonnegative weights. If edges have negative weights, it
is possible to have cycles with net negative weight. In such a graph, minimum distance doesn't
make much sense, because such a cycle can be traversed an arbitrary number of times. (The Bell-
man-Ford algorithm can be used in this case.)

We will be solving the problem of finding the shortest path from a given root node or set of root
nodes. When given multiple root nodes, the algorithm finds, for each vertex, the distance from the
nearest root; it does not find the distances from every possible starting point separately. To
answer that question, the single-source shortest-path algorithm can be run repeatedly, once for
each starting node. However, if the graph is dense, it is more efficient to use an algorithm that
solves the all-pairs shortest-path problem. The Floyd–Warshall algorithm is the standard one, and
is covered in CS 3110 or CS 4820.
Dijkstra's algorithm
The key insight behind Dijkstra's algorithm is to generalize breadth-first search, which both dis-
covers and finishes nodes in order of path length from the root. When edges have arbitrary
nonnegative weights, they can't be discovered in distance order, but they can be finished in dis-
tance order, as we will see.

Dijkstra's algorithm is an instance of the tricolor algorithm. Recall that a tricolor algorithm
changes white nodes to gray and gray nodes to black while never allowing a black node to have
an outgoing edge to a white node.

The progress of the algorithm is depicted by the following figure. Initially there are no black
nodes and the roots are gray. As the algorithm progresses, reachable white nodes turn into gray
nodes and these gray nodes eventually turn into black nodes. Eventually there are no gray nodes
left and the algorithm is done.

For Dijkstra's algorithm, the three colors are represented as follows:

White (undiscovered) nodes have their distance field set to ∞ because no path to them has
been found yet. (v.dist = ∞)
Gray (frontier) nodes have their distance field set to the total distance of some path from a
root to the node. In addition, like BFS, the algorithm keeps track of frontier nodes in a
queue. Unlike BFS, the queue is a priority queue, which we explain shortly. (v.dist < ∞,
v∈frontier)

Black (finished) nodes have their distance field set to the true minimum distance from the
roots, and are not in the frontier priority queue. (v.dist < ∞ , v∉frontier)

Dijkstra's algorithm starts with the roots in the priority queue at distance 0 (v.dist = 0), so they
are gray. All other nodes are at distance ∞. The algorithm then works roughly like BFS, except
that the node popped from the priority queue is always the one with the lowest current distance.
When the algorithm completes, all reachable nodes are black and hold their true minimum distance.

Here is the pseudo-code:


// initially, v.dist = ∞ for every node except the root

frontier = new PriorityQueue();
root.dist = 0;
frontier.push(root);
while (frontier not empty) {
    Vertex v = frontier.pop();
    foreach (Edge e = (v, Vertex v', int d)) {
        if (v'.dist == ∞) {
            v'.dist = v.dist + d;
            frontier.push(v'); // color v' gray
        } else if (v' ∈ frontier and v.dist + d < v'.dist) {
            v'.dist = v.dist + d;
            // v' already gray, but update to the shorter distance
            frontier.increase_priority(v');
        }
    }
    // color v black
}

Dijkstra's algorithm is an example of a greedy algorithm: an algorithm in which making simple


local choices (like choosing the gray vertex with the lowest distance) leads to globally optimal
results (minimal distances computed to all vertices). Many other problems, such as finding a
minimum spanning tree, can be solved by greedy algorithms; others cannot.

Example
Suppose we run the algorithm on the graph above, with v0 as the only root node. On the first
loop iteration we put the three successors of v0 in the queue and update their distances
accordingly. On the second iteration, we pop the node with the lowest distance (9) and add its
own successor to the queue at distance 9+23=32. The new gray frontier consists of the three nodes shown
in the figure below.

After the first iteration

The next one to come off the queue is the one at distance 14. Inspecting its successors, we dis-
cover a new white vertex and set its distance to 44. We also discover a new path of length 31 to a
node already on the frontier, so we adjust its distance.
After the second iteration

At the end, the frontier is empty, all reachable nodes are black, and we have computed the
minimum distance to each of them:

How do we recover the actual shortest paths? Observe that along any minimum-distance path
containing an edge v→v' with length d, v'.dist = v.dist + d. If that were not true, there would have
to be another shorter path. We can't have v'.dist > v.dist+d because there is clearly a path to v'
with length v.dist+d. And we can't have v'.dist < v.dist+d, because that means we could make a
shorter path to the final vertex by taking the better path to v' and concatenating it onto the tail of
the current path starting at v'.

We can find the best paths, then, by working backward from each destination vertex v', at each
step finding the predecessor satisfying this equation. We iterate the process until a root node is
reached. The thick edges in the previous figure show the edges that are traversed during this pro-
cess. They form a spanning tree that includes every vertex in the graph and shows a minimum-
distance to each of them.
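Putting the algorithm and the path-recovery idea together, here is a Java sketch using java.util.PriorityQueue. Since that class has no operation for raising an element's priority, the sketch instead pushes a fresh entry whenever a distance improves and skips stale entries when they are popped; it also records each node's predecessor directly rather than recovering it from the distance equation afterward. The nested-map edge representation (neighbor → weight) is an assumption.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.Set;

public class Dijkstra {
    private record Entry(String v, int d) {}

    /** Returns a minimum-weight path from source to target, or an empty list. */
    public static List<String> shortestPath(Map<String, Map<String, Integer>> adj,
                                            String source, String target) {
        Map<String, Integer> dist = new HashMap<>();   // absent key = distance ∞
        Map<String, String> pred = new HashMap<>();    // predecessor on a best path
        Set<String> finished = new HashSet<>();        // black nodes
        PriorityQueue<Entry> frontier =
                new PriorityQueue<>(Comparator.comparingInt(Entry::d));
        dist.put(source, 0);
        frontier.add(new Entry(source, 0));
        while (!frontier.isEmpty()) {
            Entry e = frontier.poll();
            if (!finished.add(e.v())) continue;        // stale duplicate entry: skip it
            for (Map.Entry<String, Integer> edge :
                     adj.getOrDefault(e.v(), Map.of()).entrySet()) {
                String w = edge.getKey();
                int nd = e.d() + edge.getValue();
                if (nd < dist.getOrDefault(w, Integer.MAX_VALUE)) {
                    dist.put(w, nd);
                    pred.put(w, e.v());
                    frontier.add(new Entry(w, nd));    // push instead of increase_priority
                }
            }
        }
        // Walk predecessors backward from the target to recover the path.
        LinkedList<String> path = new LinkedList<>();
        for (String v = target; v != null; v = pred.get(v)) path.addFirst(v);
        return path.getFirst().equals(source) ? path : List.of();
    }
}
```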

Performance of Dijkstra's algorithm


The algorithm does some work per vertex and some work per edge. The outer loop happens once
for each vertex (as we will show), and uses the priority queue to get the minimum of up to V ver-
tices. Getting the minimum vertex must take Ω(lg V) time; otherwise we could use a priority
queue to sort items faster than the lower bound of Ω(n lg n). In fact, one straightforward
way to implement a priority queue is as a balanced binary tree. Its operations, including finding
the minimum, are all O(lg V).

We also do work proportional to the number of edges; in the worst case every edge will cause the
distance to its sink vertex to be adjusted. The priority queue needs to be informed of the new dis-
tance. This will take O(lg V) time if we use a balanced binary tree. So the total time taken by the
algorithm is O(V lg V) + O(E lg V). Since the number of reachable vertices is O(E), this is O(E
lg V).

It turns out there is a way to implement a priority queue as a data structure called a Fibonacci
heap, which makes the operation of updating the distance O(1). The total time is then O(V lg V +
E). However, Fibonacci heaps have poor constant factors so in practice they are not used. They
only make sense for very large, dense graphs where E is not O(V).

Showing Dijkstra's algorithm works


To show this algorithm works, we will need a loop invariant.

Let us start by defining an interior path to a node v to be a path to v that starts from a root node
and gets to v using only black vertices (except possibly for v itself).

The loop invariant for the outer loop is then the usual tricolor black–white invariant, plus the fol-
lowing:

1. For every finished vertex v and frontier vertex v', v.dist ≤ v'.dist.
2. For every discovered (gray or black) vertex, v.dist is the length of the shortest interior path
from the roots to v.
Establishment. We can see that the invariant is true initially, because all the discovered vertices
are roots at distance 0.

Postcondition. When the loop finishes, all paths are interior paths, so each distance must be the
minimum.

Preservation. We consider each of the two parts of the loop invariant in turn.

1. Each iteration of the loop first colors the minimum-distance gray node v black. This
preserves (1) because for every black node b and gray node g, b.dist ≤ v.dist ≤ g.dist. Some new vertices may be
discovered on each iteration, but they will all be on the frontier at a distance greater than
v.dist too. So the first part of the invariant holds. We can see from this that vertices are
moved from gray to black in order of increasing distance.
2. We need to show that we have the correct interior path distance to every discovered vertex.
Coloring v black doesn't create new interior paths to it, but it might create new paths to other
nodes.

We don't have to worry about new interior paths to any nodes that are not successors to v,
because any such interior path must have as its second-to-last step a different black node
than v, one that is already at a closer distance than v. So going through v cannot produce a
shorter interior path.
However, we still need to think about the interior paths to successors to v. Consider an arbi-
trary successor v' encountered in the inner loop. It can have one of three colors:

black. By part 1 of the invariant, any black successor v' must have v'.dist ≤ v.dist, so
there can be no shorter interior path via v.
white. By the tricolor invariant, there is no preexisting interior path to v'. Therefore
the new path via v with distance v.dist+d is the only interior path.
gray. If the node is already on the frontier, there is an existing interior path whose
length is recorded in v'.dist. In addition, there is a new interior path to v' via v,
with total distance v.dist + d. The code compares these two paths and picks the shorter
one. Coloring v black can also create other new interior paths that don't go directly
to v' via the new edge, but instead go via some other black node v''. By invariant part
(1), v''.dist ≤ v.dist, so any such interior path must be at least as long as the
preexisting interior path that goes from the roots directly to v'' and then to v', and
we know by invariant part (2) that that path is no shorter than v'.dist.

Generalizing shortest paths


The weights in a shortest path problem need not represent distance, of course. They could repre-
sent time, for example. They could also represent probabilities. For example, suppose that we
had a state machine that transitioned to new states with some given probability on each outgoing
edge. The probability of taking a particular path through the graph would then be the product of
the probabilities on edges along that path. We can then answer questions like “what is the most
likely path that the system will follow in order to arrive at a given state?” by solving a shortest
path problem in a weighted graph in which the weights are the negative logarithms of the proba-
bilities (since (-log a) + (-log b) = -(log ab)).

Heuristic search (A*)


Dijkstra's algorithm is a simple and efficient way to search graphs. However, in some applica-
tions it can search much more of the graph than necessary if we are only interested in the shortest
path to a particular vertex. One simple optimization is to stop the algorithm at the point when the
destination vertex is popped off the priority queue. We know that v.dist is the minimal distance to
v at that point because it is being colored black.
A second and more interesting optimization is to take advantage of more knowledge of distances
in the graph. Dijkstra's algorithm searches outward in distance order from the source node and
finds the best route to every node in the graph. In general we don't want to know every best
route. For example, if we are trying to go from NYC to Mount Rushmore, we don't need to ex-
plore the streets of Miami, but that is what Dijkstra's algorithm will do:

A* search generalizes Dijkstra's algorithm to take advantage of knowledge about the node we
are trying to find. The idea is that for any given vertex, we may be able to estimate the remaining
distance to the destination. We define a heuristic function h(v) that constructs this estimate.
Then, the priority queue uses v.dist + h(v) as the priority instead of just v.dist. Depending on how
accurate the heuristic function is, this change causes the search to focus on nodes that look like
they are in the right direction.

Adding the heuristic to the node distance effectively changes the weights of edges. Given an
edge from vertex v to v' with weight d, the change in priority along the edge is d + h(v') − h(v). If
h(v) is a perfect estimator of remaining distance, this quantity will be zero along the optimum
path; in that case we can trivially "search" for the optimum path by simply following such zero-
weight edges. More realistically, h(v) will be inaccurate. If it is inaccurate by up to an amount ε,
we end up searching around the optimum path in a region whose size is determined by ε.

However, we have to be careful when defining h(v). If for an edge v→v' with distance d, the total
distance including the heuristic function decreases along the edge, the conditions of Dijkstra's al-
gorithm (nonnegative edge weights) are no longer met. In that case we have an inadmissible
heuristic that could cause us to miss the optimal solution. A heuristic function is admissible if it
never overestimates the true remaining distance. When searching with an admissible heuristic,
each node is seen only once and the optimum path has been found immediately when the goal
node is reached.
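An A* sketch differs from Dijkstra's algorithm only in the priority used for the queue. In the following Java sketch the heuristic is passed in as a function, and because it is assumed admissible, the search can stop as soon as the goal is popped. The graph representation and duplicate-entry trick are the same assumptions as in the Dijkstra sketch.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.function.ToIntFunction;

public class AStar {
    private record Entry(String v, int dist, int priority) {}

    /** Returns the minimum distance from start to goal, or -1 if unreachable.
     *  The heuristic h must be admissible (never overestimate). */
    public static int search(Map<String, Map<String, Integer>> adj,
                             String start, String goal, ToIntFunction<String> h) {
        Map<String, Integer> dist = new HashMap<>();
        PriorityQueue<Entry> frontier =
                new PriorityQueue<>(Comparator.comparingInt(Entry::priority));
        dist.put(start, 0);
        frontier.add(new Entry(start, 0, h.applyAsInt(start)));
        while (!frontier.isEmpty()) {
            Entry e = frontier.poll();
            if (e.v().equals(goal)) return e.dist();   // first pop of the goal is optimal
            if (e.dist() > dist.getOrDefault(e.v(), Integer.MAX_VALUE)) continue; // stale
            for (Map.Entry<String, Integer> edge :
                     adj.getOrDefault(e.v(), Map.of()).entrySet()) {
                int nd = e.dist() + edge.getValue();
                if (nd < dist.getOrDefault(edge.getKey(), Integer.MAX_VALUE)) {
                    dist.put(edge.getKey(), nd);
                    frontier.add(new Entry(edge.getKey(), nd,
                                           nd + h.applyAsInt(edge.getKey())));
                }
            }
        }
        return -1;
    }
}
```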

If the heuristic is inconsistent (the priority can decrease along some edge), the resulting
negative-weight edges might cause us to find a new path to an already “finished” node, pushing it
back onto the frontier queue. Even after finding the goal node at some distance D, it is necessary
to keep searching above priority level D to account for these negative-weight edges. Despite this
issue, the reason we might want to use such a heuristic is that it can be more accurate than a
consistent one, and thus guide the search more effectively to the goal. In some cases inadmissible
heuristics can pay off.
A* search is often used for single-player game search and various other optimization problems.
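
As a concrete illustration, here is a minimal sketch of A* in Java. The graph representation,
names, and heuristic array are invented for the example, and `java.util.PriorityQueue` with
re-insertion (lazy deletion) stands in for a queue supporting increasePriority.

```java
import java.util.*;

/** Illustrative A* sketch: vertices are ints 0..n-1, the graph is an
 *  adjacency list of (target, weight) edges, and h[v] is a heuristic
 *  estimate of the remaining distance from v to the goal. */
class AStar {
    record Edge(int to, int weight) {}

    static int shortestPath(List<List<Edge>> graph, int src, int goal, int[] h) {
        int n = graph.size();
        int[] dist = new int[n];
        Arrays.fill(dist, Integer.MAX_VALUE);
        dist[src] = 0;
        // Entries are {vertex, priority}; priority is dist[v] + h[v],
        // not dist[v] alone as in Dijkstra's algorithm.
        PriorityQueue<int[]> frontier =
            new PriorityQueue<int[]>((a, b) -> Integer.compare(a[1], b[1]));
        frontier.add(new int[]{src, h[src]});
        while (!frontier.isEmpty()) {
            int v = frontier.poll()[0];
            if (v == goal) return dist[v];   // safe when h is consistent
            for (Edge e : graph.get(v)) {
                int d = dist[v] + e.weight;
                if (d < dist[e.to]) {        // found a shorter path to e.to
                    dist[e.to] = d;
                    frontier.add(new int[]{e.to, d + h[e.to]});
                }
            }
        }
        return -1;                           // goal unreachable
    }
}
```

Stale queue entries (for vertices whose distance has since improved) are simply processed and
relax nothing, which wastes a little work but keeps the sketch short.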

Notes by Andrew Myers, 11/15/12.


Minimum spanning trees and strongly
connected components
Graphs connect nodes with edges, but graphs may have more edges than necessary for some
applications. For example, suppose we have several locations we want to interconnect with
communication cables. There are possible routes that the cables can be laid along, with different
costs per route. These routes can be represented as edges in a graph in which the nodes are the
locations to be connected. Ultimately we just need to ensure that any two locations are connected
via some route. Can we choose edges from the original graph in such a way that every pair of nodes
is connected and, what's more, the total cost is minimized? Perhaps surprisingly, this problem
and similar problems can be solved efficiently.

An undirected graph is a tree if there is exactly one simple path between any pair of nodes.
Equivalently, an undirected graph is a tree if it is connected—there is a path between any pair of
nodes—and acyclic. In fact, any two of the following three facts ensure that an undirected graph
is a tree, and imply the third:

1. |E| = |V|−1
2. The graph is connected.
3. The graph is acyclic.

Spanning trees
A spanning tree of a graph (V, E) is a subgraph (V, E') in which E' is a subset of E and the
subgraph is a tree. It has the same set of nodes but possibly fewer edges.

Finding a spanning tree is not difficult. We can start with the original graph and keep removing
any edges that are part of cycles until there are no more cycles and a spanning tree is reached.
Notice that if we remove an edge from a cycle, it cannot disconnect the graph. This is a subtrac-
tive algorithm.

Alternatively, we can do things additively: start with no edges and keep adding edges as long as
they do not create cycles. Stop when there is only one connected component. At that point we
will have an acyclic graph with |E| = |V|−1, which is a spanning tree.

Minimum spanning trees


Assume the graph edges come with weights. We want to find a spanning tree that minimizes the
total weight of all its edges. There may be more than one equally good minimum spanning tree,
in which case we are satisfied with any of them.

A greedy algorithm for minimum spanning trees: Kruskal's algorithm


An additive algorithm. Idea: start with an empty set of edges, and repeatedly add the
minimum-weight remaining edge that does not create a cycle with the edges already chosen, until a
spanning tree is achieved.
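
The idea above can be sketched in Java using a union-find structure to detect cycles: an edge
closes a cycle exactly when its endpoints are already in the same connected component. This is an
illustrative sketch, not the course's implementation; the union-find uses path compression only.

```java
import java.util.*;

/** Sketch of Kruskal's algorithm over an undirected graph with n vertices,
 *  where each edge is an array {u, v, weight}. */
class Kruskal {
    static int[] parent;                 // union-find forest

    static int find(int v) {             // representative of v's component
        return parent[v] == v ? v : (parent[v] = find(parent[v]));
    }

    /** Returns the total weight of a minimum spanning tree. */
    static int mstWeight(int n, int[][] edges) {
        parent = new int[n];
        for (int v = 0; v < n; v++) parent[v] = v;
        // Consider edges in increasing order of weight.
        Arrays.sort(edges, Comparator.comparingInt(e -> e[2]));
        int total = 0;
        for (int[] e : edges) {
            int a = find(e[0]), b = find(e[1]);
            if (a != b) {                // endpoints not yet connected: no cycle
                parent[a] = b;           // merge the two components
                total += e[2];
            }
        }
        return total;
    }
}
```
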
A greedy algorithm for minimum spanning trees: Prim's algorithm

An additive algorithm. Idea: start from a single node and repeatedly add the minimum-weight edge
that connects a node in the existing connected set to a node outside it (such an edge never
creates a cycle), until a spanning tree is achieved. This is a bit reminiscent of Dijkstra's
single-source shortest path algorithm.

A greedy algorithm for minimum spanning trees: reverse-delete

A subtractive algorithm. Idea: start with all edges and repeatedly remove the maximum-weight edge
whose removal does not disconnect the graph, until a spanning tree is achieved.

(These notes still under construction.)


Priority Queues and Heaps
For Dijkstra's shortest-path algorithm, we needed a priority queue: a queue whose elements are
removed in priority order. A priority queue is an abstraction with several important uses. And it
can be implemented efficiently, as we'll see.

We've already seen that priority queues are useful for implementing Dijkstra's algorithm and A*
search. Priority queues are also very useful for event-driven simulation, where the simulator
needs to handle events in the order in which they occur, and handling one event can result in add-
ing new events in the future, which need to be pushed onto the queue. Another use for priority
queues is for the compression algorithm known as Huffman coding, the optimal way to compress
individual symbols in a stream. Priority queues can also be used for sorting, since elements to be
sorted can be pushed into the priority queue and then removed in sorted order.

Priority queue interface


A priority queue can be described via the following interface:

/** A priority queue of elements of type E. A priority is
 *  an int where smaller ints represent higher priorities.
 */
interface PriorityQueue<E> {
    /** Effects: pushes a new element elem onto the queue. */
    void add(E elem);

    /** Returns: the highest-priority element in the queue
     *  (i.e., the one with the smallest priority value).
     *  Effects: removes the returned element from the queue.
     */
    E extractMin();

    /** Effects: increases the priority of elem to priority.
     *  Requires: elem is already in the queue, and priority
     *  is at least as high a priority as the current
     *  priority of elem.
     */
    void increasePriority(E elem, int priority);
}

This interface suffices to implement Dijkstra's shortest path algorithm, for example.

Implementing increasePriority() requires that it be possible to find the element in the priority
queue. This can be accomplished by using a hash table to look up element locations, or by aug-
menting the elements themselves with an extra instance variable that holds their location. We
largely ignore that issue here.

Implementation 1: Binary Search Tree


One simple implementation of a priority queue is as a binary search tree, using element priorities
as keys. New elements are added by using the ordinary BST add operation. The minimum ele-
ment in the tree can be found by simply walking leftward in the tree as far as possible and then
pruning or splicing out the element found there. The priority of an element can be adjusted by
first finding the element; then the element is removed from the tree and readded with its new pri-
ority. Assuming that we can find elements in the tree in logarithmic time, and that the tree is bal-
anced, all of these operations can be done in logarithmic time, asymptotically.

Implementation 2: Binary Heap


However, a balanced binary tree is overkill for implementing priority queues; it is conventional
to instead use a simpler data structure, the binary heap. The term heap is a bit overloaded; bina-
ry heaps should not be confused with memory heaps. A memory heap is a low-level data struc-
ture used to keep track of the computer's memory so that the programming language implemen-
tation knows where to place objects in memory.

A binary heap, on the other hand, is a binary tree satisfying the heap order invariant:

(Order) For each non-root node n, the priority of n is no higher than the priority of n's parent.
Equivalently, a heap stores its minimum element at the root, and the left and right subtrees are al-
so both heaps.

Here is an example of a binary heap in which only the priorities of the elements are shown:

Notice that the root of each subtree contains the highest-priority element.

It is possible to manipulate binary heaps as tree structures. However, additional speedup can be
achieved if the binary heap satisfies a second invariant:

(Shape) If there is a node at depth h, then every possible node at depth h−1 exists, and the
nodes at depth h are filled in from left to right. Therefore, the leaves of the tree occur only at
depths h and h−1. This shape invariant may be easier to understand visually:

In fact, the example tree above also satisfies this shape invariant.

The reason the shape invariant helps is because it makes it possible to represent the binary heap
as a resizable array. The elements of the array are “read out” into the heap structure row by row,
so the heap structure above is represented by the following array of length 9, with array indices
shown on the bottom.
How is it possible to represent a tree structure without pointers? The shape invariant guarantees
that the children of a node at index i are found at indices 2i+1 (left) and 2i+2 (right). Conversely,
the parent of a node at index i is found at index (i–1)/2, rounded down. So we can walk up and
down through the tree by using simple arithmetic.

Binary heap operations

Add
Adding is done by adding the element at the end of the array to preserve the Shape invariant.
This violates the Order invariant in general, though. To restore the Order invariant, we bubble up
the element by swapping it with its parent until it reaches either the root or a parent node of high-
er priority. This requires at most lg n swaps, so the algorithm is O(lg n). For example, if we add
an element with priority 2, it goes at the end of the array and then bubbles up to the position
where 3 was:

ExtractMin
The minimum element is always at the root, but it needs to be replaced with something. The last
element in the array needs to go somewhere anyway, so we put it at the root of the tree. However,
this breaks the order invariant in general. We fix the order invariant by bubbling the element
down. The element is compared against the two children nodes and if either is higher-priority, it
is swapped with the higher priority child. The process repeats until either the element is higher-
priority than its children or a leaf is reached. Bubbling down ensures that the heap order invariant
is restored along the path from the root to the last heap element. Here is what happens with our
example heap:
IncreasePriority
Increasing the priority of an element is easy. After increasing the priority, we simply bubble it up
to restore Order.
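
The add, extractMin, and bubbling operations described above can be sketched as follows. This is
an illustrative array-based min-heap of bare int priorities, not a full implementation of the
PriorityQueue interface.

```java
import java.util.*;

/** Sketch of an array-based binary min-heap of int priorities. Children of
 *  index i live at 2i+1 and 2i+2; the parent of i is (i-1)/2, rounding down. */
class BinaryHeap {
    private final List<Integer> a = new ArrayList<>();

    void add(int p) {
        a.add(p);                        // preserve Shape: append at the end
        bubbleUp(a.size() - 1);          // then restore Order
    }

    int extractMin() {
        int min = a.get(0);
        int last = a.remove(a.size() - 1);
        if (!a.isEmpty()) {
            a.set(0, last);              // move the last element to the root
            bubbleDown(0);               // then restore Order
        }
        return min;
    }

    private void bubbleUp(int i) {
        while (i > 0 && a.get(i) < a.get((i - 1) / 2)) {
            swap(i, (i - 1) / 2);        // higher priority than parent: swap up
            i = (i - 1) / 2;
        }
    }

    private void bubbleDown(int i) {
        while (2 * i + 1 < a.size()) {
            int c = 2 * i + 1;                                     // left child
            if (c + 1 < a.size() && a.get(c + 1) < a.get(c)) c++;  // right child smaller?
            if (a.get(i) <= a.get(c)) break;
            swap(i, c);                  // swap with the higher-priority child
            i = c;
        }
    }

    private void swap(int i, int j) {
        int t = a.get(i); a.set(i, a.get(j)); a.set(j, t);
    }
}
```
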

HeapSort
The heapsort algorithm sorts an array by first heapifying it to make it satisfy Order. Then
extractMin() is used repeatedly to read out the elements in increasing order.

Heapifying can be done by bubbling every element down, starting from the last non-leaf node in
the tree (at index n/2 - 1) and working backward and up toward the root:

for (int i = n/2 - 1; i >= 0; i--) {
    bubbleDown(i);
}

The total time required to do this is linear. At most half the elements need to be bubbled down
one step, at most a quarter of the elements need to be bubbled down two steps, and so on. So the
total work is at most n/2 + 2·n/4 + 3·n/8 + 4·n/16 + ..., which is O(n).
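
Heapsort as described can be sketched as follows, assuming an input array of ints. A separate
output array is used so that extractMin reads out elements in increasing order; the classic
in-place variant uses a max-heap instead.

```java
/** Sketch of heapsort: heapify the array bottom-up in O(n), then repeatedly
 *  extract the minimum into an output array. */
class HeapSort {
    static int[] sort(int[] a) {
        int n = a.length;
        // Heapify: bubble every non-leaf down, from index n/2 - 1 to the root.
        for (int i = n / 2 - 1; i >= 0; i--) bubbleDown(a, n, i);
        int[] out = new int[n];
        for (int k = 0; k < n; k++) {    // extractMin, n times
            out[k] = a[0];
            a[0] = a[--n];               // move the last heap element to the root
            bubbleDown(a, n, 0);
        }
        return out;
    }

    static void bubbleDown(int[] a, int n, int i) {
        while (2 * i + 1 < n) {
            int c = 2 * i + 1;                     // left child
            if (c + 1 < n && a[c + 1] < a[c]) c++; // right child smaller?
            if (a[i] <= a[c]) break;
            int t = a[i]; a[i] = a[c]; a[c] = t;   // swap with smaller child
            i = c;
        }
    }
}
```
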

Treaps
A treap is a binary search tree that is balanced with high probability. This is achieved by ensur-
ing the tree has exactly the same structure that it would have had if the elements had been insert-
ed in random order. Each node in a treap contains both a key and a randomly chosen priority. The
treap satisfies the BST invariant with respect to all of its keys, so elements can be found in the
treap in expected logarithmic time. The treap satisfies the heap invariant with respect to all its
priorities.
This ensures that the tree structure is exactly what you'd get if the elements had been inserted in
priority order.

For example, the following is a treap where the keys are the letters and the priorities are the num-
bers.

Elements are added to the treap in the usual way for binary search trees. However, this may
break the Order invariant on priorities. To fix that invariant, the element is bubbled up through
the treap using tree rotations that swap a node with its parent while preserving the BST invari-
ant. If the node is x and it is the right child of a parent p, the tree rotation is performed by chang-
ing pointers so the data structure on the left turns into the data structure on the right.
Notice that A, B, and C here represent entire subtrees that are not affected by the rotation except
that their parent node may change. Conversely, if the node is p and it's the left child of a parent x,
the tree rotation to swap p with x transforms the data structure on the right to the one on the left.
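
The two rotations described above amount to a few pointer updates. Here is a sketch with an
invented Node class; subtree B from the figures is the child that moves across, while subtrees A
and C are untouched except that their parent may change.

```java
/** Sketch of the two tree rotations used to bubble a treap node up while
 *  preserving the BST invariant. */
class Rotations {
    static class Node {
        int key, priority;
        Node left, right;
        Node(int key, int priority) { this.key = key; this.priority = priority; }
    }

    /** x is the right child of p; make p the left child of x. Returns x, the
     *  new root of this subtree. Keys stay in BST order: p < B < x. */
    static Node rotateLeft(Node p) {
        Node x = p.right;
        p.right = x.left;    // subtree B moves from x over to p
        x.left = p;
        return x;
    }

    /** Mirror image: p is the left child of x; make x the right child of p. */
    static Node rotateRight(Node x) {
        Node p = x.left;
        x.left = p.right;    // subtree B moves from p over to x
        p.right = x;
        return p;
    }
}
```
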

For example, adding a node D with priority 2 to the treap above results in the following rotations
being done to restore the Order invariant:

Notice that this is exactly the tree structure we'd get if we had inserted all the nodes into a simple
BST in the order specified by the priorities of the nodes.
Balanced binary trees
We've already seen that by imposing the binary search tree invariant (BST invariant), we can
search for keys in a tree of height h in O(h) time, assuming that the keys are part of a total order
that permits pairwise ordering tests. However, nothing thus far ensured that h is not linear in the
number of nodes n in the tree, whereas we would like to know that trees are balanced: that their
height h, and therefore their worst-case search time, is logarithmic in the number of nodes in the
tree.

AVL Trees
AVL trees, named after their inventors, Adelson-Velsky and Landis, were the original balanced
binary search trees. They strengthen the usual BST invariant with an additional invariant regarding
the heights of subtrees. The AVL invariant is that at each node, the heights of the left and right
subtrees differ by at most one.

The balance factor of a tree node is defined as the difference between the height of the left and
right subtrees. Letting h(t) be the height of the subtree rooted at node t, where an empty tree is
considered to have height −1, the balance factor BF(t) is:

BF(t) = h(t.left) − h(t.right)

The AVL invariant at node t is, then, that |BF(t)| ≤ 1 and also that the invariant holds on both the
left and right subtrees if they are nonempty.

Is an AVL tree balanced?


A balanced tree has the property that the height h is O(lg n); that is, that h ≤ k lg n for some k. To
show this, we need to show that a tree of height h contains a number of nodes that grows geomet-
rically with h.

Let us determine the minimum number of nodes that can exist in an AVL tree of height h. Call it
N(h). The root node of a tree of height h has two subtrees, one of which must have height h−1. By
the AVL invariant, the other subtree must have height at least h−2. Therefore, the least number of
nodes that can be in the tree is N(h−1) + N(h−2) + 1: that is, the minimum number of nodes in the
two subtrees, plus the root node itself. This gives us an equation called a recurrence:

N(h) = 1 + N(h-1) + N(h-2)

From the recurrence and the facts that N(-1) = 0 and N(0) = 1, we can derive the minimum num-
ber of nodes for some small values of h:

N(-1) = 0
N(0) = 1
N(1) = 2
N(2) = 4
N(3) = 7
N(4) = 12
N(5) = 20
...
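
The recurrence is easy to check mechanically; the table of values above can be reproduced with a
few lines of Java (illustrative, and deliberately a direct, exponential-time transcription of the
recurrence):

```java
/** Computes N(h), the minimum number of nodes in an AVL tree of height h,
 *  directly from the recurrence N(h) = 1 + N(h-1) + N(h-2), with base cases
 *  N(-1) = 0 and N(0) = 1. */
class AvlMin {
    static long minNodes(int h) {
        if (h == -1) return 0;           // empty tree
        if (h == 0) return 1;            // single node
        return 1 + minNodes(h - 1) + minNodes(h - 2);
    }
}
```
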
It's not obvious that this function is growing exponentially. However, you may already have no-
ticed that the recurrence above is very similar to the Fibonacci recurrence:

Fn = Fn−1 + Fn−2

In fact, if we add 1 to each term in the sequence of values for N(h), the familiar Fibonacci
sequence emerges: 1, 2, 3, 5, 8, 13, ... . In general, N(h) = F(h+3) − 1. Asymptotically, the −1
term doesn't matter, so N(h) grows asymptotically in the same way as the Fibonacci sequence.

If we can show that the Fibonacci sequence grows exponentially, we'll know N(h) does too. In
fact, it does. The exact formula for the Fibonacci numbers is:

F(n) = (φ^n − ψ^n)/√5

Here, φ is the golden ratio (1+√5)/2 (≈ 1.618) and ψ is its negative reciprocal (1−√5)/2
(≈ −0.618). The two numbers share an interesting property: φ^2 = φ + 1 (and ψ^2 = ψ + 1).
Multiplying both sides of the equation by φ^(n−2), we can conclude that for any exponent n, we
have φ^n = φ^(n−1) + φ^(n−2), and similarly for ψ.

To show that the exact formula for the Fibonacci numbers is correct, first observe that the
formula for F(n) works for n=0 if we consider F(0) = 0, and it also works for n=1 since
φ − ψ = √5.

We can show using induction that the formula above works for all greater values of n too. Assume
that the formula works for the entire sequence up to but not including F(n) (this is called a
strong induction hypothesis). Then,

F(n) = F(n−1) + F(n−2)
     = (φ^(n−1) − ψ^(n−1))/√5 + (φ^(n−2) − ψ^(n−2))/√5
     = (φ^(n−1) + φ^(n−2) − ψ^(n−1) − ψ^(n−2))/√5
     = (φ^n − ψ^n)/√5

Since we know the formula works for 0 and 1, it must work for n=2, and therefore for n=3, and
by induction, all greater values of n as well.

Now, we can apply this formula to show that AVL trees are balanced. Observe that since |ψ| < 1,
the term ψ^n becomes vanishingly small for large n. Asymptotically, the Fibonacci numbers grow as
φ^n (technically, they are Θ(φ^n), meaning that φ^n is both an asymptotic upper and lower bound
on F(n)).

In fact, for all h ≥ 0, N(h) ≥ φ^h. Given an arbitrary AVL tree of height h containing n nodes,
we know n ≥ N(h) ≥ φ^h, so h ≤ log_φ n. Since all logarithms are related by constant factors, h
is therefore O(lg n). AVL trees are balanced.

Another inductive argument


Without showing an exact formula for N(h) or F(n), we can prove the asymptotic behavior of N(h)
using induction. What we'd really like to know is that N(h) ≥ φ^h. This formula holds for
N(0) = 1 and N(1) = 2. Now assume that the condition holds for all h from 0 up to some value x.
We can show that the formula must also hold for h = x+1, meaning that we need to show that
N(x+1) ≥ φ^(x+1). This is not hard, using the induction hypothesis and the properties of φ. The
properties of φ tell us that φ^(x+1) = φ^(x−1)·φ^2 = φ^(x−1)(φ + 1) = φ^x + φ^(x−1). Then,

N(x+1) = 1 + N(x) + N(x−1)      (definition of N)
       ≥ 1 + φ^x + φ^(x−1)      (by the induction hypothesis)
       = 1 + φ^(x+1)            (using the formula above)
       ≥ φ^(x+1)

Since the condition holds for h=0 and h=1, it must hold for h=2. And therefore for h=3. And so on
for all larger h. So for all h ≥ 0, we have N(h) ≥ φ^h, and therefore h ≤ log_φ N(h) ≤ log_φ n.
Hence h is O(lg n).

Inserting into an AVL tree


To insert a new element (key/value pair) into an AVL tree, we start by using the key in the usual
way to find where the key can be inserted as a leaf while preserving the BST invariant. Adding a
new leaf makes the path to that leaf one longer than previously, so the AVL invariant may now be
broken. To fix the invariant, we find the lowest node along that path where the invariant is
broken, and apply one or two tree rotations. If the insertion is done recursively, it is easy to
identify where along the path the invariant is broken as the recursive calls return, provided
that each node keeps track of its height in the tree. Of course, that also means that nodes'
heights must be updated as the recursion unwinds.

If inserting a new leaf breaks the AVL invariant at some node t, the invariant can only be broken
by 1; that is, either BF(t) = 2 or BF(t) = –2, depending on whether the insertion happened under
the left child of t or the right child. Without loss of generality, let us consider the left-child case,
where BF(t) = 2. Suppose the height of the right subtree is some h and the left subtree has height
h+2. Since insertion only affected one path, one of the two subtrees of the left subtree must have
height h+1 and the other, h. Depending on which subtree has height h+1, there are two cases to
consider:

LL case LR case
In this figure, z is the lowest node that is unbalanced. Therefore, the shorter subtree (c on the left
side and a on the right side) must have height h; if their heights were lower, z would not be the
lowest unbalanced node.

Now, how to fix the AVL invariant? If we are in the LL case shown on the left, we perform a sin-
gle tree rotation to make y the parent of z, and update z so that its left child is now the old left
child of y (that is, c). The resulting tree looks as follows:

After tree rotation(s)

Does this rotation preserve the BST invariant? Because the BST invariant held before the rota-
tion, we know that a < x < b < y < c < z < d (where a, b, c, d stand for all nodes in subtrees a, b,
c, d). This ordering of the keys is preserved in the rotated tree.

Does this rotation establish the AVL invariant? Since subtrees a, b, c, and d all have at most
height h, the nodes x and z are now at height h+1, and node y is at height h+2. The longest path
within this part of the tree is now one shorter than before (it's back to h+2), so this change also
fixes the AVL invariant for all nodes above y.

To fix the AVL invariant in the LR case, we convert the tree into exactly the same structure as for
the LL case. However, this requires more work: all three nodes x, y, and z must be changed. We
can think of this as a double rotation in which we first rotate x and y, and then rotate y and z. Or
we can just update x, y, and z directly.

Symmetrically to the LL and LR cases, there are RR and RL cases. They are handled in exactly
the same way, yielding the same resulting tree shown above.

RR case RL case
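
The LL-case fix can be sketched as follows. The Node class and helper names are invented for the
example, and an empty subtree is taken to have height −1, as above.

```java
/** Sketch of the LL-case AVL fix: a single rotation making y the new subtree
 *  root, with node heights recomputed afterward. */
class AvlRotate {
    static class Node {
        int key, height;                 // a new leaf has height 0 by default
        Node left, right;
        Node(int key) { this.key = key; }
    }

    static int h(Node t) { return t == null ? -1 : t.height; }

    static void update(Node t) { t.height = 1 + Math.max(h(t.left), h(t.right)); }

    /** z is unbalanced with BF(z) = 2, and the taller grandchild is the left
     *  child of z's left child y (the LL case). Returns y, the new root of
     *  this subtree. */
    static Node fixLL(Node z) {
        Node y = z.left;
        z.left = y.right;   // subtree c moves from y over to z
        y.right = z;
        update(z);          // update z first: it is now below y
        update(y);
        return y;
    }
}
```

The LR case first rotates x above y (turning it into the LL shape) and then applies the same fix.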

An optimization
It is not really necessary to store the height of each node at the node. Instead, we can store just
the balance factor for the node: –1, 0, or 1. Storing one of three options requires only two bits of
space if the programming language is cooperative. If only the balance factor is stored in the
node, balance factors only need to be updated when they change. When inserting a node, the only
nodes whose balance factors change are those on the path from the leaf up to the first unbalanced
node (or the root), and those involved in whatever tree rotations are performed.

Deleting from an AVL tree


Removing a key from the tree can also make it unbalanced. The algorithm works in the usual
way for BST deletion, depending on the number of children of the node storing the deleted key.
Recall that if that node has 0 children, it is pruned; if 1 child, it is spliced out; and if 2
children, its
element is replaced with that from the node storing the next (or previous) key in the tree. The
node storing that next or previous key is the node that is deleted. In any case, deleting a node
(whether the node storing the key or the node storing the next/previous key) may break the AVL
invariant along the path to the deleted node. AVL deletion therefore walks back up the tree from
the deletion point using tree rotations to restore the AVL invariant.

To see how this works, consider the lowest tree node that becomes unbalanced as the result of
deleting a node below it. Without loss of generality, let's assume that the deleted node is on the
right side of the unbalanced node. Just as for insertion, the cases for deletion on the other side are
symmetrical. The right child is at some height h (formerly h+1), and the left side is at height h+2.
One of the two grandchildren on the left side must have height h+1. Let us first consider the case
in which the left grandchild has height h+1 but the right grandchild has only height h. The tree
then looks as shown in this figure, essentially the same as the LL case above:

To rebalance the tree, we simply use the same single rotation of y and z as in the LL case above,
with the y node becoming the new root of the subtree. However, notice that this rotation reduces
the height of the subtree from h+3 to h+2. Therefore, it is necessary to continue walking up to-
ward the root, potentially fixing other unbalanced nodes along the way.

We just assumed that the right grandchild had height h. What if the right grandchild has height
h+1, instead? The picture then looks like much like the LR case above:
As in the LR case, we use a double rotation of the tree to arrive at the tree rotation shown above.
There is one twist, however. Depending on the height of the subtree a, the y node after rotation
may be either h+2 or h+3. In other words, the double rotation may or may not change the height
of the whole subtree. It is only necessary to check whether nodes above are still balanced if the
height of the y node becomes h+2.

Other balanced binary trees


Other balanced binary search trees (and more generally, n-ary search trees) also strengthen the
search tree invariant to ensure that the tree remains balanced. There are many balanced search
tree data structures; some of the most important are:

Red-black trees

Nodes are colored red or black.
Every path from the root to a leaf has the same number of black nodes.
There are no red–red edges.
The longest root-to-leaf path is at most twice the length of the shortest, so the tree is balanced.

2-3-4 trees
Every leaf has the same depth.
Every node has 2, 3, or 4 children, except the leaves.

B-trees
An n-ary generalization of 2-3-4 trees.
Every leaf at same depth.
Every node except root has at least ceiling(n/2) children ⇒ balanced
Large n ⇒ few edges followed to reach any leaf
Complex logic with a lot of corner cases that are tricky to get right.

Splay trees
Binary search tree with no additional balance invariant.
Searched-for nodes are rotated to the top of the tree.
O(lg n) amortized performance.

Each of these except splay trees imposes an additional invariant that ensures the tree remains bal-
anced. For example, red–black trees have a color invariant: every node is either red or black,
and on every path from the root to a leaf, there are the same number of black nodes but no adja-
cent red nodes. B-trees and 2-3-4 trees are perfectly balanced n-ary search trees in which the
number of children varies between ⌈n/2⌉ and n (except that the root may have as few as 2 chil-
dren).

Note that search trees with a branching factor of b must store at least b–1 keys at each node in or-
der to know which child to go to when searching. For example, in a 2–3–4 tree, there can be up
to 3 keys at a given node.
Interpreters, compilers, and the Java Virtual
Machine
Interpreters vs. compilers
There are two strategies for obtaining runnable code from a program written in some
programming language that we will call the source language. The first is compilation, or
translation, in which the program is translated into a second, target language.

Figure 1: Compiling a program

Compilation can be slow because it is not simple to translate from a high-level source language
to low-level languages. But once this translation is done, it can result in fast code. Traditionally,
the target language of compilers is machine code, which the computer's processor knows how to
execute. This is an instance of the second method for running code: interpretation, in which an
interpreter reads the program and does whatever computation it describes.

Figure 2: Interpreting a program

Ultimately all programs have to be interpreted by either hardware or software, since compilers
only translate them. One advantage that a software interpreter offers over a compiler is that, giv-
en a program, it can quickly start running it without spending time to compile it. A second ad-
vantage is that the code is more portable to different hardware architectures; it can run on any
hardware architecture that the interpreter itself can run on.

The disadvantage of software interpretation is that it is orders of magnitude slower than hardware
execution of the same computation. This is because for each machine operation (say, adding two
numbers), a software interpreter has to do many operations to figure out what it is supposed to be
doing. Adding two numbers can be done in a single machine-code instruction requiring just one
machine cycle.

Interpreter designs
There are multiple kinds of software interpreters. The simplest interpreters are AST interpreters
of the sort you built in your project. These are implemented as recursive traversals of the AST.
However, traversing the AST makes them typically hundreds of times slower than the equivalent
machine code.

A faster and very common interpreter design is a bytecode interpreter, in which the program is
compiled to bytecode instructions somewhat similar to machine code, and these instructions are
then interpreted. Language implementations based on a bytecode interpreter include Java,
Smalltalk, OCaml, Python, and C#. Java's bytecode language is that of the Java Virtual Machine
(JVM).

A third interpreter design often used is a threaded interpreter. Here the word “threaded” has
nothing to do with the threads related to concurrency. The code is represented as a data structure
in which the leaf nodes are machine code and the non-leaf nodes are arrays of pointers to other
nodes. Execution proceeds largely as a recursive tree traversal, which can be implemented as a
short, efficient loop written in machine code. Threaded interpreters are usually a little faster
than bytecode interpreters, but the interpreted code takes up more space (a space–time tradeoff).
The language FORTH uses this approach; these days the language is commonly used in device
firmware.
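
The core of any bytecode interpreter is a fetch-and-dispatch loop over an operand stack. The
following toy sketch uses three invented opcodes and is nothing like the real JVM instruction
set, but it has the same basic shape:

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy stack-machine interpreter: PUSH takes its operand from the next code
 *  word; ADD and MUL pop two values and push the result. */
class ToyVM {
    static final int PUSH = 0, ADD = 1, MUL = 2;

    static int run(int[] code) {
        Deque<Integer> stack = new ArrayDeque<>();
        int pc = 0;                               // program counter
        while (pc < code.length) {
            switch (code[pc++]) {                 // fetch, then dispatch
                case PUSH -> stack.push(code[pc++]);          // operand follows opcode
                case ADD  -> stack.push(stack.pop() + stack.pop());
                case MUL  -> stack.push(stack.pop() * stack.pop());
                default   -> throw new IllegalStateException("bad opcode");
            }
        }
        return stack.pop();                       // result left on the stack
    }
}
```

The per-instruction overhead of the dispatch switch is exactly why software interpretation is
slower than running the equivalent machine code directly.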

Java compilation
Java combines the two strategies of compilation and interpretation, as depicted in Figure 3.

Figure 3: The Java execution architecture

Source code is compiled to JVM bytecode. This bytecode can immediately be interpreted by the
JVM interpreter. The interpreter also monitors how much each piece of bytecode is executed
(run-time profiling) and hands off frequently executed code (the hot spots) to the just-in-time
(JIT) compiler. The JIT compiler converts the bytecode into corresponding machine code to be
used instead. Because the JIT knows what code is frequently executed, and which classes are ac-
tually loaded into the current JVM, it can optimize code in ways that an ordinary offline compil-
er could not. It generates reasonably good code even though it does not include some (expensive)
optimizations and even though the Java compiler generates bytecode in a straightforward way.

The Java architecture thus allows code to run on any machine to which the JVM interpreter has
been ported, and to run fast on any machine that the JIT compiler has been designed to target.
Serious code optimization effort is reserved for the hot spots in the code.

The Java Virtual Machine


JVM bytecode is stored in class files (.class) containing the bytecode for each of the methods of
the class, along with a constant pool defining the values of string constants and other constants
used in the code. Other information is found in a .class file as well, such as attributes. (Consult
the Java Virtual Machine Specification for a thorough and detailed description of the JVM.)

It is not difficult to inspect the bytecode generated by the Java compiler, using the program javap,
a standard part of the Java software release. Run javap -c ⟨fully-qualified-class-
name⟩ to see the bytecode for each of the methods of the named class.

The JVM is a stack-based language with support for objects. When a method executes, its stack
frame includes an array of local variables and an operand stack.

Local variables have indices ranging from 0 up to some maximum. The first few local variables are
the arguments to the method, including this at index 0; the remainder represent local variables
and perhaps temporary values computed during the method. A given local variable slot may be
reused to represent different variables in the Java code.

For example, consider the following Java code:

if (b) x = y+1;
else x = z;

The corresponding bytecode instructions might look as shown in Figure 4. Each bytecode
instruction is located at some offset (in bytes) from the beginning of the method code, which is
shown in the first column in the figure. In this case the bytecode happens to start at offset 13,
and variables b, x, y, and z are found in local variables 3, 4, 5, and 6 respectively.

13: iload_3
14: ifeq 26
17: iload 5
19: iconst_1
20: iadd
21: istore 4
23: goto 30
26: iload 6
28: istore 4
30: return

Figure 4: Some bytecode

All computation is done using the operand stack. The purpose of the first instruction is to load
the (integer) variable at location 3 and push its value onto the stack. This is the variable b,
showing that Java booleans are represented as integers at the JVM level. The second instruction
pops the value on the top of the stack and tests whether it is equal to zero (i.e., whether it
represents false). If so, the code branches to offset 26. If it is non-zero (true), execution
continues to the next instruction, which pushes y onto the stack. The instruction iconst_1 pushes
a constant 1 onto the stack, and then iadd pops the top two values (which must be integers), adds
them, and pushes the result onto the stack. The result is stored into x by the instruction
istore 4. Notice that for small integer constants and local variables with small indices, there
are special bytecode instructions. These are used to make the code more compact than it otherwise
would be: by looking at the offsets for each of the instructions, we can see that iload_3 takes
just one byte whereas iload 5 takes two.

Method dispatch
When methods are called, the arguments to the method are popped from the operand stack and
used to construct the first few entries of the local variable array in the called method's stack
frame. The result from the method, if any, is pushed onto the operand stack of the caller.

There are multiple bytecode instructions for calling methods. Given a call x.f(5), where f is
an ordinary non-final, non-private method and x has static type Bar, the corresponding invocation
bytecode is something like this:

invokevirtual #23

Here the #23 is an index into the constant pool. The corresponding entry contains the string
f:(I)V, showing that the invoked method is named f, that it has a single integer argument (I), and
that it returns void (V). From the fact that the name of the invoked method includes the types of
the arguments, we can see that all overloading has been fully resolved by the Java compiler. Unlike
the Java compiler, the JVM doesn't have to make any decisions about which method to call based
on the arguments to the call.

At run time, method dispatch must be done to find the right bytecode to run. For example, sup-
pose that the actual class of the object that x refers to is Baz, a subclass of its static type Bar, but
that Baz inherits its implementation of f from Bar. The situation inside the JVM is shown in Fig-
ure 5.

Each object contains a pointer to its class object, an object representing its class. The class
object in turn points to the dispatch table, an array of pointers to its method bytecode. (In C++
implementations, this array is known as the vtable, and objects point directly to their vtables
rather than to an intervening class object.) In the depicted example, the JVM has decided to put method
f:(I)V in position 2 within this array. To find the bytecode for this method, the appropriate point-
er is loaded from the array. The figure shows that the classes Bar and Baz share inherited
bytecode. If Baz had overridden the method f, the two dispatch tables would have pointed to dif-
ferent bytecode.
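The dispatch behavior described here can be observed directly in Java. The sketch below mirrors the Bar/Baz situation from the text, plus a hypothetical subclass Qux that overrides f; f returns a String rather than void here, simply so that the difference is observable.

```java
// Baz inherits f, so a call through a Bar reference reaches Bar's code;
// Qux overrides f, so the same call site reaches different code.
class Bar {
    String f(int x) { return "Bar.f(" + x + ")"; }
}

class Baz extends Bar { }              // inherits f from Bar

class Qux extends Bar {                // hypothetical overriding subclass
    @Override String f(int x) { return "Qux.f(" + x + ")"; }
}

public class Dispatch {
    public static void main(String[] args) {
        Bar b1 = new Baz();
        Bar b2 = new Qux();
        // Both calls compile to the same invokevirtual instruction;
        // the method actually run is chosen at run time by dispatch.
        System.out.println(b1.f(5));   // prints Bar.f(5)
        System.out.println(b2.f(5));   // prints Qux.f(5)
    }
}
```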

The JIT compiler converts bytecode into machine code. In doing so, it may create specialized
versions of inherited methods such as f, so that different code ends up being executed for Bar.f
and Baz.f. Specialization of code allows the compiler to generate more efficient code, at the cost
of greater space usage.

There are other ways to invoke methods, and the JVM has bytecode instructions for them:
• invokestatic invokes static methods, using a table in the specified class object. No receiver
(this) object is passed as an argument.

• invokeinterface invokes methods on objects via their interface. It looks like invokevirtual,
but because a class can implement multiple interfaces, this operation is often a bit more
expensive.

• invokespecial invokes object methods that do not involve dispatch, such as constructors,
final methods, and private methods.

• invokedynamic invokes object methods without requiring that the static type of the receiver
object support the method. Run-time checking is used to ensure that the method can be
called. This is a recent addition to the JVM, intended to support dynamically typed lan-
guages. It should not ordinarily be needed for Java code.

Bytecode verification
One of the most important properties of the JVM is that JVM bytecode can be type-checked.
This process is known as bytecode verification. It makes sure that bytecode instructions can be
run safely with little run-time checking, making Java both faster and more secure. For example,
when a method is invoked on an object, verification ensures that the receiver object is of the right
type (and is not, for example, actually an integer).

Type parameterization via erasure
The JVM knows nothing about type parameters. All type parameters are erased by the Java com-
piler and replaced with the type Object. An array of parameter type T then becomes an array of
Object in the context of the JVM, which is why you can't write expressions like new T[n].
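For example, a generic class that needs backing storage typically allocates an Object[] and casts elements on the way out. The class below is a minimal sketch written for illustration, not a library class.

```java
// Minimal generic stack illustrating erasure: we cannot write
// new T[16], so we allocate Object[] and cast when popping.
class SimpleStack<T> {
    private Object[] elems = new Object[16];  // not new T[16]: T is erased
    private int size = 0;

    void push(T x) {
        if (size == elems.length)
            elems = java.util.Arrays.copyOf(elems, 2 * size);
        elems[size++] = x;
    }

    @SuppressWarnings("unchecked")
    T pop() {
        return (T) elems[--size];  // unchecked cast: T is Object at run time
    }
}
```

The cast in pop generates no run-time check on T itself; the compiler instead inserts checks at the call sites where the result is used at a specific type.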

Generating code at run time with Java
The Java architecture also makes it relatively easy to generate new code at run time. This can
help increase performance for some tasks, such as implementing a domain-specific language for
a particular application. The HW6 project is an example where this strategy would help.

The class javax.tools.ToolProvider provides access to the Java compiler at run time. The compil-
er appears simply as an object implementing the interface javax.tools.JavaCompiler. It can be
used to dynamically generate bytecode. Using a classloader (also obtainable from ToolProvider),
this bytecode can be loaded into the running program as a new class or classes. The new classes
are represented as objects of class java.lang.Class. To run their code, the reflection capabilities
of Java are used. In particular, from a Class object one can obtain a Method object for each of its
methods. The Method object can be used to run the code of the method, using the method
Method.invoke. At this point the code of the newly generated class will be run. If it runs many
times, the JIT compiler will be invoked to convert it into machine code.
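The steps above can be sketched as follows. The generated class name Gen and its method f are made up for the example, and error handling is minimal.

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;
import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch of run-time code generation: write Java source to a temp
// directory, compile it with the platform compiler, load the resulting
// class with a classloader, and invoke a method reflectively.
public class RunTimeCodeGen {
    public static int runGenerated() throws Exception {
        Path dir = Files.createTempDirectory("gen");
        Path src = dir.resolve("Gen.java");
        Files.write(src,
            "public class Gen { public static int f(int x) { return x + 1; } }"
                .getBytes());

        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        int status = compiler.run(null, null, null, src.toString());
        if (status != 0) throw new RuntimeException("compile failed");

        try (URLClassLoader loader =
                 new URLClassLoader(new URL[] { dir.toUri().toURL() })) {
            Class<?> cls = loader.loadClass("Gen");
            Method m = cls.getMethod("f", int.class);
            return (Integer) m.invoke(null, 41);  // runs the generated code
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runGenerated());
    }
}
```

Note that getSystemJavaCompiler returns null when only a JRE (without the compiler) is installed, so production code should check for that case.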

Another strategy for generating code is to generate bytecode directly, possibly using one of the
several JVM bytecode assemblers that are available. This bytecode can be loaded using a
classloader as above. Unfortunately JVM bytecode does not expose many capabilities that aren't
available in Java already, so it is usually easier just to target Java code.
Notes by Andrew Myers, 11/27/12.
Hard and incomputable problems
Hard problems
We have been seeing various useful algorithms and data structures for solving problems. Howev-
er, some problems are intractably hard, in the sense that they require an unreasonable amount of
time or space to compute, or even cannot be solved in general at all.

The class of problems that are generally considered to be tractable are those that can be solved in
polynomial time, which is to say O(n^k) for some k. In practice, algorithms that take polynomial
time with k larger than 1 scale poorly as well, and algorithms that require k≥5 are not used in
practice (and even k=3 and k=4 are often impractically slow).

We define the complexity class P as the set of all problems that can be solved by an algorithm
taking polynomial time:

P = ⋃_k O(n^k)

Beyond polynomial time there is the class EXPTIME, which includes all algorithms that take
time O(2^(n^k)) for some k. Algorithms in EXPTIME effectively hit a wall when the problem size
becomes too large. For example, if an algorithm takes time 2^n, increasing the problem size from n
to size n+1 requires twice the time. Even if the algorithm is fast for small n, increasing n quickly
reaches a size where even a small increase is far too expensive. Contrast this with the case of an
O(n^k) algorithm, where going from n to n+1 increases the time only by a factor of roughly 1 + k/n,
a factor that approaches 1 as n increases. (To see why, note that ((n+1)/n)^k ≈ 1 + k/n when n ≫ k.)
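The contrast is easy to see numerically; the loop below is just an illustration of the two growth rates.

```java
// Compare polynomial (n^3) and exponential (2^n) growth as n increments.
public class Growth {
    static long cube(int n) { return (long) n * n * n; }
    static long pow2(int n) { return 1L << n; }

    public static void main(String[] args) {
        for (int n = 10; n <= 13; n++) {
            // n^3 grows by a factor of about 1 + 3/n per step;
            // 2^n doubles at every step.
            System.out.printf("n=%d  n^3=%d  2^n=%d%n", n, cube(n), pow2(n));
        }
    }
}
```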

Another important class is nondeterministic polynomial time, or NP. These are the problems
for which a possible answer can be checked in polynomial time, or equivalently, which can be
computed in polynomial time using an unbounded number of machines computing independently
in parallel. Given an unbounded number of machines, one simply has each machine generate one
of the possible answers and then check in polynomial time whether it is correct.

Examples of problems in NP are the following:

• Graph coloring. What is the minimum number of colors needed to color a graph so no
edge connects two vertices of the same color? Given a coloring, checking whether it is
correct can be done in linear time.

• Hamiltonian cycles. A Hamiltonian cycle is a cycle that includes every node in a directed
graph. The problem is whether there is such a cycle. Clearly, given a candidate cycle, it
can be checked in polynomial time.

• SAT. Given a boolean formula using logical “and”, “or”, and “not”, is there a way to
choose values for variables in the formula such that the whole formula is true? Given
candidate assignments to variables, only polynomial time is required to determine whether
the formula evaluates to true.
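The two sides of this definition can be made concrete for SAT: checking a candidate assignment is cheap, while the only known general way to find one is to try all 2^n assignments. The formula below is a made-up example.

```java
import java.util.function.Predicate;

// SAT is in NP: checking an assignment takes polynomial time, but the
// brute-force search below examines up to 2^n assignments.
public class Sat {
    // Example formula: (x0 or x1) and (not x0 or x2) and (not x1 or not x2)
    static boolean formula(boolean[] v) {
        return (v[0] || v[1]) && (!v[0] || v[2]) && (!v[1] || !v[2]);
    }

    // Exponential-time search: try every assignment of n variables.
    static boolean satisfiable(int n, Predicate<boolean[]> f) {
        for (int bits = 0; bits < (1 << n); bits++) {
            boolean[] v = new boolean[n];
            for (int i = 0; i < n; i++) v[i] = ((bits >> i) & 1) == 1;
            if (f.test(v)) return true;   // the polynomial-time check
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(satisfiable(3, Sat::formula));  // prints true
    }
}
```

With unboundedly many machines, each could test one assignment in parallel, which is the sense in which NP problems are solvable in nondeterministic polynomial time.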
These are three famous problems in NP. However, the best known deterministic algorithms to
solve these problems require exponential time. It is not known whether there is a polynomial-
time algorithm to solve these problems, though most computer scientists believe there is none.

These three problems have the interesting property that in polynomial time, any problem in NP
can be encoded into any of them. Because they can express any problem in NP, these problems
are said to be NP-complete. If we had a polynomial-time algorithm to solve an NP-complete
problem, we could solve any problem in NP in polynomial time! This result would mean that the
complexity classes P and NP were exactly the same. Most computer scientists believe that P and
NP are not the same; that is, that there is no algorithm that solves problems in NP in worst-case
polynomial time. However, no one has managed to prove it. Whether P=NP is probably the
best-known unsolved problem in computer science.

It is also possible to classify algorithms in terms of the memory space they require. Algorithms in
PSPACE require a polynomial amount of space. Algorithms in L require only a logarithmic
amount of space in addition to the input data; in effect, they can use a constant number of
pointers into the input data. For example, a recent surprising result is that undirected graph
reachability is in L.

Some relationships among complexity classes have been proved, such as the following inclusion
relationships:

L ⊆ P ⊆ NP ⊆ PSPACE ⊆ EXPTIME

It is also known that L≠PSPACE and that P≠EXPTIME. However, many important things are not
known! We don't know where the dividing line is between these classes. It is not known whether
P=NP, whether L=P, or whether NP=EXPTIME. The complexity of some important problems is
not known either. For example, even though the security of RSA encryption rests on the difficul-
ty of factoring numbers, it is not known whether factoring is in P. (It is, however, known that fac-
toring can in principle be solved in polynomial time on a quantum computer, though no one has
been able to build a useful quantum computer.)

Computability and decidability
Beyond hard problems, there are even incomputable problems that can't be solved by any algo-
rithm running on a computer. In fact, we can prove that such problems exist. We will focus on
decision problems where the goal is to compute a boolean result about some input, and we will
show that some decision problems are undecidable by any algorithm.

An example of such a decision problem is the halting problem: does a program terminate when
run on an arbitrary input? We will see that the halting problem is undecidable (i.e., incomputa-
ble) in general, assuming that our programming language is expressive enough to write an inter-
preter for itself.

We have seen in previous lectures and from the programming assignments that programs can be
represented as a data structure such as an abstract syntax tree or bytecode. Let us assume there is
a type "Program" that we can use to represent programs.

We want to know whether we can implement a method with the following specification:

/** Returns whether program p terminates when given inp as input. */
boolean terminates(Program p, Object inp);

That is, for every program p and input inp, it successfully returns either true or false. Note that
although terminates() is just one method, it is allowed to use as many other classes and methods
as it likes. We have the full power of Java at our disposal.

For simplicity we will consider only programs p that themselves implement a decision problem,
and therefore have the following structure:

class p {
    boolean main(Object inp) { ... }
}

Again, the method main is allowed to use other classes and methods. However, we will only con-
sider programs that receive no input from and send no output to the outside environment. The
only input to the program is inp and the only output is the boolean result of main. If we can't de-
termine whether such simple programs terminate, we of course have no hope of determining
whether more complex programs do.

We have seen that it is possible to write an interpreter for a programming language. An interpret-
er for programs like p can be written with the following signature:

/** Simulate the execution of program p on input inp, returning
 * the same result as p would. If p would fail to terminate on
 * this input, so does interpret.
 */
boolean interpret(Program p, Object inp);

In other words, running p.main(inp) has exactly the same result as running interpret(p, inp).

However, the ability to implement interpret implies that we cannot implement terminates. To see
why, let us assume that we can implement both of these methods.

Creating a contradiction
We introduced the idea of an adjective being autological previously. We can think of a decision
program as an adjective. Let us say that a program p is autological if it terminates and returns
true whenever it is passed itself as an input: that is, p.main(p) == true.

Similarly, a program is heterological if it does not terminate or does not return true when passed
itself as an argument. Using terminates, we can implement a test for heterologicity!

class H {
    boolean main(Program p) {
        if (terminates(p, p))
            return !interpret(p, p);
        else
            return true;
    }
}

Program H must terminate because it only calls interpret when interpret is guaranteed to termi-
nate. Therefore H always returns either true or false.
But now consider what happens when we test whether H itself is heterological by evaluating
H.main(H). We know this must terminate, so we are going to be in the “then” branch of the code.
From the code, we see that H.main(H) == !interpret(H, H). But from the specification of
interpret, we know that H.main(H) == interpret(H, H). So H cannot return either true or false, yet
it must return one of them.

This contradiction means that our original assumption must be wrong. We cannot implement
both terminates and interpret. Since we know (roughly) how to implement interpret in Java, we
must not be able to solve the halting problem for Java. Conversely, if we have a programming
language in which we can compute whether programs terminate, that programming language
must not be expressive enough to implement an interpreter for itself. In practice, such program-
ming languages tend to lose even more expressive power than that.

One conclusion is that some useful things are simply not computable by any programming lan-
guages we know how to build.

Implications for program analysis
This result has some practical implications. In particular, many different program analyses other
than termination cannot be precisely decided, because if they could be, the halting problem
would be decidable. As a result, these program analyses must be conservative, giving answers
“true”, “false” or “not sure”. Type checking is one example. Ideally a type checker would tell
you at compile time whether there was any input that could cause a program to have a type error
at run time. But we could apply such a type checker to code like the following:

while (...) {
    // complex computation
}
int x = 32 + "hi";

This code has a dynamic type error if and only if there is an input that causes the while loop to
terminate. If we had a precise type checker, we could use it to determine whether the while loop
terminates—if it reports no type error, the while loop cannot terminate, and vice versa. Even if
we can build a type checker that works precisely on some programs, in general there will be
some programs that cannot be type-checked precisely.

By similar arguments, we see that we have to be conservative about many other facts we'd like to
know about programs—for example, whether they are correct, whether they are secure, or
whether they leak memory. All automatic tools for analyzing programs will either be incom-
plete, meaning that they reject some safe programs as possibly unsafe (this is a false positive), or
else unsound, meaning that they accept some unsafe programs as safe (this is a false negative).
However, incomplete and even unsound automatic tools can still be useful!
Notes by Andrew Myers, 5/3/12.

©2015 Andrew Myers, Cornell University
