Programming in Standard ML
(WORKING DRAFT OF DECEMBER 9, 2002.)
Robert Harper
Carnegie Mellon University
Spring Semester, 2001
Copyright © 1997, 1998, 1999, 2000, 2001.
All Rights Reserved.
Preface
This book is an introduction to programming with the Standard ML programming language. It began life as a set of lecture notes for Computer Science 15–212: Principles of Programming, the second semester of the introductory sequence in the undergraduate computer science curriculum at Carnegie Mellon University. It has subsequently been used in many other courses at Carnegie Mellon, and at a number of universities around the world. It is intended to supersede my Introduction to Standard ML, which has been widely circulated over the last ten years.
Standard ML is a formally defined programming language. The Definition of Standard ML (Revised) is the formal definition of the language. It is supplemented by the Standard ML Basis Library, which defines a common basis of types that are shared by all implementations of the language. Commentary on Standard ML discusses some of the decisions that went into the design of the first version of the language.
There are several implementations of Standard ML available for a wide variety of hardware and software platforms. The best-known compilers are Standard ML of New Jersey, Moscow ML, MLKit, and PolyML. These are all freely available on the worldwide web. Please refer to The Standard ML Home Page for up-to-date information on Standard ML and its implementations.
Numerous people have contributed directly and indirectly to this text. I am especially grateful to the following people for their helpful comments and suggestions: Marc Bezem, Franck van Breugel, Karl Crary, Mike Erdmann, Matthias Felleisen, Joel Jones, John Lafferty, Adrian Moos, Bryce Nichols, Frank Pfenning, Chris Stone, Dave Swasey, Johan Wallen, Scott Williams, and Jeannette Wing. I am also grateful to the many students of 15–212 who used these notes and sent in their suggestions over the years.
These notes are a work in progress. Corrections, comments and suggestions are most welcome.
Contents

Preface

I Overview

1 Programming in Standard ML
1.1 A Regular Expression Package
1.2 Sample Code

II The Core Language

2 Types, Values, and Effects
2.1 Evaluation and Execution
2.2 The ML Computation Model
2.2.1 Type Checking
2.2.2 Evaluation
2.3 Types, Types, Types
2.4 Type Errors

3 Declarations
3.1 Variables
3.2 Basic Bindings
3.2.1 Type Bindings
3.2.2 Value Bindings
3.3 Compound Declarations
3.4 Limiting Scope
3.5 Typing and Evaluation

4 Functions
4.1 Functions as Templates
4.2 Functions and Application
4.3 Binding and Scope, Revisited

5 Products and Records
5.1 Product Types
5.1.1 Tuples
5.1.2 Tuple Patterns
5.2 Record Types
5.3 Multiple Arguments and Multiple Results

6 Case Analysis
6.1 Homogeneous and Heterogeneous Types
6.2 Clausal Function Expressions
6.3 Booleans and Conditionals, Revisited
6.4 Exhaustiveness and Redundancy

7 Recursive Functions
7.1 Self-Reference and Recursion
7.2 Iteration
7.3 Inductive Reasoning
7.4 Mutual Recursion

8 Type Inference and Polymorphism
8.1 Type Inference
8.2 Polymorphic Definitions
8.3 Overloading

9 Programming with Lists
9.1 List Primitives
9.2 Computing With Lists

10 Concrete Data Types
10.1 Datatype Declarations
10.2 Non-Recursive Datatypes
10.3 Recursive Datatypes
10.4 Heterogeneous Data Structures
10.5 Abstract Syntax

11 Higher-Order Functions
11.1 Functions as Values
11.2 Binding and Scope
11.3 Returning Functions
11.4 Patterns of Control
11.5 Staging

12 Exceptions
12.1 Exceptions as Errors
12.1.1 Primitive Exceptions
12.1.2 User-Defined Exceptions
12.2 Exception Handlers
12.3 Value-Carrying Exceptions

13 Mutable Storage
13.1 Reference Cells
13.2 Reference Patterns
13.3 Identity
13.4 Aliasing
13.5 Programming Well With References
13.5.1 Private Storage
13.5.2 Mutable Data Structures
13.6 Mutable Arrays

14 Input/Output
14.1 Textual Input/Output

15 Lazy Data Structures
15.1 Lazy Data Types
15.2 Lazy Function Definitions
15.3 Programming with Streams

16 Equality and Equality Types

17 Concurrency

III The Module Language

18 Signatures and Structures
18.1 Signatures
18.1.1 Basic Signatures
18.1.2 Signature Inheritance
18.2 Structures
18.2.1 Basic Structures
18.2.2 Long and Short Identifiers

19 Signature Matching
19.1 Principal Signatures
19.2 Matching
19.3 Satisfaction

20 Signature Ascription
20.1 Ascribed Structure Bindings
20.2 Opaque Ascription
20.3 Transparent Ascription
20.4 Transparency, Opacity, and Dependency

21 Module Hierarchies
21.1 Substructures

22 Sharing Specifications
22.1 Combining Abstractions

23 Parameterization
23.1 Functor Bindings and Applications
23.2 Functors and Sharing Specifications
23.3 Avoiding Sharing Specifications

IV Programming Techniques

24 Specifications and Correctness
24.1 Specifications
24.2 Correctness Proofs
24.3 Enforcement and Compliance

25 Induction and Recursion
25.1 Exponentiation
25.2 The GCD Algorithm

26 Structural Induction
26.1 Natural Numbers
26.2 Lists
26.3 Trees
26.4 Generalizations and Limitations
26.5 Abstracting Induction

27 Proof-Directed Debugging
27.1 Regular Expressions and Languages
27.2 Specifying the Matcher

28 Persistent and Ephemeral Data Structures
28.1 Persistent Queues
28.2 Amortized Analysis

29 Options, Exceptions, and Continuations
29.1 The n-Queens Problem
29.2 Solution Using Options
29.3 Solution Using Exceptions
29.4 Solution Using Continuations

30 Higher-Order Functions
30.1 Infinite Sequences
30.2 Circuit Simulation

31 Memoization
31.1 Caching Results
31.2 Laziness
31.3 Lazy Data Types in SML/NJ
31.4 Recursive Suspensions

32 Data Abstraction
32.1 Dictionaries
32.2 Binary Search Trees
32.3 Balanced Binary Search Trees
32.4 Abstraction vs. Run-Time Checking

33 Representation Independence and ADT Correctness

34 Modularity and Reuse

35 Dynamic Typing and Dynamic Dispatch

36 Concurrency

V Appendices

The Standard ML Basis Library

Compilation Management
Overview of CM
Building Systems with CM

Sample Programs
Part I
Overview
Standard ML is a type-safe programming language that embodies many innovative ideas in programming language design. It is a statically typed language, with an extensible type system. It supports polymorphic type inference, which all but eliminates the burden of specifying types of variables and greatly facilitates code reuse. It provides efficient automatic storage management for data structures and functions. It encourages functional (effect-free) programming where appropriate, but allows imperative (effectful) programming where necessary. It facilitates programming with recursive and symbolic data structures by supporting the definition of functions by pattern matching. It features an extensible exception mechanism for handling error conditions and effecting non-local transfers of control. It provides a richly expressive and flexible module system for structuring large programs, including mechanisms for enforcing abstraction, imposing hierarchical structure, and building generic modules. It is portable across platforms and implementations because it has a precise definition. It provides a portable standard basis library that defines a rich collection of commonly used types and routines.
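To make a few of these points concrete, here is a small example (not taken from the text) showing a recursive datatype, a function defined by pattern matching, and polymorphic type inference at work:

```sml
(* A recursive datatype and a function defined by pattern matching.
   The compiler infers the polymorphic type 'a tree -> int with no
   type annotations required. *)
datatype 'a tree = Leaf | Node of 'a tree * 'a * 'a tree

fun size Leaf = 0
  | size (Node (l, _, r)) = size l + 1 + size r

val three = size (Node (Node (Leaf, "a", Leaf), "b", Node (Leaf, "c", Leaf)))
(* three evaluates to 3 *)
```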
Many implementations go beyond the standard to provide experimental language features, extensive libraries of commonly used routines, and useful program development tools. Details can be found with the documentation for your compiler, but here's some of what you may expect. Most implementations provide an interactive system supporting online program development, including tools for compiling, linking, and analyzing the behavior of programs. A few implementations are "batch compilers" that rely on the ambient operating system to manage the construction of large programs from compiled parts. Nearly every compiler generates native machine code, even when used interactively, but some also generate code for a portable abstract machine. Most implementations support separate compilation and provide tools for managing large systems and shared libraries. Some implementations provide tools for tracing and stepping programs; many provide tools for time and space profiling. Most implementations supplement the standard basis library with a rich collection of handy components such as dictionaries, hash tables, or interfaces to the ambient operating system. Some implementations support language extensions such as support for concurrent programming (using message-passing or locking), richer forms of modularity constructs, and support for "lazy" data structures.
Chapter 1
Programming in Standard ML
1.1 A Regular Expression Package
To develop a feel for the language and how it is used, let us consider the implementation of a package for matching strings against regular expressions. We'll structure the implementation into two modules, an implementation of regular expressions themselves and an implementation of a matching algorithm for them.
These two modules are concisely described by the following signatures.
signature REGEXP = sig
  datatype regexp =
    Zero | One | Char of char |
    Plus of regexp * regexp |
    Times of regexp * regexp |
    Star of regexp
  exception SyntaxError of string
  val parse : string -> regexp
  val format : regexp -> string
end

signature MATCHER = sig
  structure RegExp : REGEXP
  val match : RegExp.regexp -> string -> bool
end
The signature REGEXP describes a module that implements regular expressions. It consists of a description of the abstract syntax of regular expressions, together with operations for parsing and unparsing them. The signature MATCHER describes a module that implements a matcher for a given notion of regular expression. It contains a function match that, when given a regular expression, returns a function that determines whether or not a given string matches that expression. Obviously the matcher is dependent on the implementation of regular expressions. This is expressed by a structure specification that specifies a hierarchical dependence of an implementation of a matcher on an implementation of regular expressions — any implementation of the MATCHER signature must include an implementation of regular expressions as a constituent module. This ensures that the matcher is self-contained, and does not rely on implicit conventions for determining which implementation of regular expressions it employs.
The definition of the abstract syntax of regular expressions in the signature REGEXP takes the form of a datatype declaration that is reminiscent of a context-free grammar, but which abstracts from matters of lexical presentation (such as precedences of operators, parenthesization, conventions for naming the operators, etc.). The abstract syntax consists of six clauses, corresponding to the regular expressions 0, 1, a, r1 + r2, r1 r2, and r*.¹ The functions parse and format specify the parser and unparser for regular expressions. The parser takes a string as argument and yields a regular expression; if the string is ill-formed, the parser raises the exception SyntaxError with an associated string describing the source of the error. The unparser takes a regular expression and yields a string that parses to that regular expression. In general there are many strings that parse to the same regular expression; the unparser generally tries to choose one that is easiest to read.
The implementation of the matcher consists of two modules: an implementation of regular expressions and an implementation of the matcher itself. An implementation of a signature is called a structure. The implementation of the matching package consists of two structures, one implementing the signature REGEXP, the other implementing MATCHER. Thus the overall package is implemented by the following two structure declarations:

structure RegExp :> REGEXP = ...
structure Matcher :> MATCHER = ...

The structure identifier RegExp is bound to an implementation of the REGEXP signature. Conformance with the signature is ensured by the ascription of the signature REGEXP to the binding of RegExp using the ":>" notation. Not only does this check that the implementation (which has been elided here) conforms with the requirements of the signature REGEXP, but it also ensures that subsequent code cannot rely on any properties of the implementation other than those explicitly specified in the signature. This helps to ensure that modules are kept separate, facilitating subsequent changes to the code.

¹Some authors use ∅ for 0 and ε for 1.
Similarly, the structure identifier Matcher is bound to a structure that implements the matching algorithm in terms of the preceding implementation RegExp of REGEXP. The ascribed signature specifies that the structure Matcher must conform to the requirements of the signature MATCHER. Notice that the structure Matcher refers to the structure RegExp in its implementation.
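As a tiny aside (not from the text), here is a sketch of what opaque ascription with ":>" enforces: outside the structure, clients see only what the signature declares, so the representation type stays abstract.

```sml
(* Opaque ascription hides the representation of t: the signature
   declares an abstract type, so clients cannot exploit the fact
   that t is implemented as int. *)
structure Counter :> sig
  type t
  val zero : t
  val inc : t -> t
  val toInt : t -> int
end = struct
  type t = int
  val zero = 0
  fun inc n = n + 1
  fun toInt n = n
end

val two = Counter.toInt (Counter.inc (Counter.inc Counter.zero))
(* two evaluates to 2; Counter.zero + 1 would be rejected,
   because Counter.t is not int outside the structure. *)
```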
Once these structure declarations have been processed, we may use the package by referring to its components using paths, or long identifiers. The function Matcher.match has type

Matcher.RegExp.regexp -> string -> bool,

reflecting the fact that it takes a regular expression as implemented within the package itself and yields a matching function on strings. We may build a regular expression by applying the parser, Matcher.RegExp.parse, to a string representing a regular expression, then passing the resulting value of type Matcher.RegExp.regexp to Matcher.match.²
Here’s an example of the matcher in action:
val regexp =
Matcher.RegExp.parse "(a+b)*"
val matches =
Matcher.match regexp
val ex1 = matches "aabba" (* yields true *)
val ex2 = matches "abac" (* yields false *)
The use of long identifiers can get tedious at times. There are two typical methods for alleviating the burden. One is to introduce a synonym for a long package name. Here's an example:

²It might seem that one can apply Matcher.match to the output of RegExp.parse, since Matcher.RegExp.parse is just RegExp.parse. However, this relationship is not stated in the interface, so there is a pro forma distinction between the two. See Chapter 22 for more information on the subtle issue of sharing.
WORKING DRAFT DECEMBER 9, 2002
8 Programming in Standard ML
structure M = Matcher
structure R = M.RegExp
val regexp = R.parse "((a + %).(b + %))*"
val matches = M.match regexp
val ex1 = matches "aabba"
val ex2 = matches "abac"
Another is to "open" the structure, incorporating its bindings into the current environment:
open Matcher Matcher.RegExp
val regexp = parse "(a+b)*"
val matches = match regexp
val ex1 = matches "aabba"
val ex2 = matches "abac"
It is advisable to be sparing in the use of open because it is often hard to
anticipate exactly which bindings are incorporated into the environment
by its use.
Now let’s look at the internals of the structures RegExp and Matcher.
Here’s a bird’s eye view of RegExp:
structure RegExp :> REGEXP = struct

  datatype regexp =
    Zero | One | Char of char |
    Plus of regexp * regexp |
    Times of regexp * regexp |
    Star of regexp

  ...

  fun tokenize s = ...

  ...

  fun parse s =
    let
      val (r, s') =
        parse_rexp (tokenize (String.explode s))
    in
      case s' of
        nil => r
      | _ => raise SyntaxError "Bad input."
    end
    handle LexicalError =>
      raise SyntaxError "Bad input."

  ...

  fun format r =
    String.implode (format_exp r)

end
The elision indicates that portions of the code have been omitted so that we can get a high-level view of the structure of the implementation.

The structure RegExp is bracketed by the keywords struct and end. The type regexp is implemented precisely as specified by the datatype declaration in the signature REGEXP. The parser is implemented by a function that, when given a string, "explodes" it into a list of characters, transforms the character list into a list of "tokens" (abstract symbols representing lexical atoms), and finally parses the resulting list of tokens to obtain its abstract syntax. If there is remaining input after the parse, or if the tokenizer encountered an illegal token, an appropriate syntax error is signalled. The formatter is implemented by a function that, when given a piece of abstract syntax, formats it into a list of characters that are then "imploded" to form a string. The parser and formatter work with character lists, rather than strings, because it is easier to process lists incrementally than it is to process strings.
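As a small aside (not from the text), the conversions between strings and character lists are performed by the basis functions String.explode and String.implode, which are mutually inverse:

```sml
(* A character list is easier to process incrementally than a string. *)
val cs = String.explode "a+b"   (* [#"a", #"+", #"b"] *)
val s  = String.implode cs      (* "a+b" *)
```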
It is interesting to consider in more detail the structure of the parser since it exemplifies well the use of pattern matching to define functions. Let's start with the tokenizer, which we present here in toto:
datatype token =
  AtSign | Percent | Literal of char |
  PlusSign | TimesSign |
  Asterisk | LParen | RParen

exception LexicalError

fun tokenize nil = nil
  | tokenize (#"+" :: cs) = PlusSign :: tokenize cs
  | tokenize (#"." :: cs) = TimesSign :: tokenize cs
  | tokenize (#"*" :: cs) = Asterisk :: tokenize cs
  | tokenize (#"(" :: cs) = LParen :: tokenize cs
  | tokenize (#")" :: cs) = RParen :: tokenize cs
  | tokenize (#"@" :: cs) = AtSign :: tokenize cs
  | tokenize (#"%" :: cs) = Percent :: tokenize cs
  | tokenize (#"\\" :: c :: cs) = Literal c :: tokenize cs
  | tokenize (#"\\" :: nil) = raise LexicalError
  | tokenize (#" " :: cs) = tokenize cs
  | tokenize (c :: cs) = Literal c :: tokenize cs
The symbol "@" stands for the empty regular expression and the symbol "%" stands for the regular expression accepting only the null string. Concatenation is indicated by ".", alternation by "+", and iteration by "*".

We use a datatype declaration to introduce the type of tokens corresponding to the symbols of the input language. The function tokenize has type char list -> token list; it transforms a list of characters into a list of tokens. It is defined by a series of clauses that dispatch on the first character of the list of characters given as input, yielding a list of tokens. The correspondence between characters and tokens is relatively straightforward, the only non-trivial case being to admit the use of a backslash to "quote" a reserved symbol as a character of input. (More sophisticated languages have more sophisticated token structures; for example, words (consecutive sequences of letters) are often regarded as a single token of input.) Notice that it is quite natural to "look ahead" in the input stream in the case of the backslash character, using a pattern that dispatches on the first two characters (if there are such) of the input, and proceeding accordingly. (It is a lexical error to have a backslash at the end of the input.)
Let's turn to the parser. It is a simple recursive-descent parser implementing the precedence conventions for regular expressions given earlier. These conventions may be formally specified by the following grammar, which not only enforces precedence conventions, but also allows for the use of parenthesization to override them.

rexp ::= rtrm | rtrm+rexp
rtrm ::= rfac | rfac.rtrm
rfac ::= ratm | ratm*
ratm ::= @ | % | a | (rexp)

The implementation of the parser follows this grammar quite closely. It consists of four mutually recursive functions, parse_rexp, parse_rtrm, parse_rfac, and parse_ratm. These implement what is known as a recursive descent parser that dispatches on the head of the token list to determine how to proceed.
fun parse_rexp ts =
  let
    val (r, ts') = parse_rtrm ts
  in
    case ts'
      of (PlusSign :: ts'') =>
         let
           val (r', ts''') = parse_rexp ts''
         in
           (Plus (r, r'), ts''')
         end
       | _ => (r, ts')
  end

and parse_rtrm ts = ...

and parse_rfac ts =
  let
    val (r, ts') = parse_ratm ts
  in
    case ts'
      of (Asterisk :: ts'') => (Star r, ts'')
       | _ => (r, ts')
  end

and parse_ratm nil =
    raise SyntaxError ("No atom")
  | parse_ratm (AtSign :: ts) = (Zero, ts)
  | parse_ratm (Percent :: ts) = (One, ts)
  | parse_ratm ((Literal c) :: ts) = (Char c, ts)
  | parse_ratm (LParen :: ts) =
    let
      val (r, ts') = parse_rexp ts
    in
      case ts'
        of (RParen :: ts'') => (r, ts'')
         | _ =>
           raise SyntaxError "No close paren"
    end
Notice that it is quite simple to implement "lookahead" using patterns that inspect the token list for specified tokens. This parser makes no attempt to recover from syntax errors, but one could imagine doing so, using standard techniques.
This completes the implementation of regular expressions. Now for the matcher. The matcher proceeds by a recursive analysis of the regular expression. The main difficulty is to account for concatenation — to match a string against the regular expression r1 r2 we must match some initial segment against r1, then match the corresponding final segment against r2. This suggests that we generalize the matcher to one that checks whether some initial segment of a string matches a given regular expression, then passes the remaining final segment to a continuation, a function that determines what to do after the initial segment has been successfully matched. This facilitates implementation of concatenation, but how do we ensure that at the outermost call the entire string has been matched? We achieve this by using an initial continuation that checks whether the final segment is empty.
Here’s the code, written as a structure implementing the signature MATCHER:
structure Matcher :> MATCHER =
struct

  structure RegExp = RegExp
  open RegExp

  fun match_is Zero _ k = false
    | match_is One cs k = k cs
    | match_is (Char c) nil _ = false
    | match_is (Char c) (d::cs) k = (c=d) andalso (k cs)
    | match_is (Plus (r1, r2)) cs k =
        match_is r1 cs k orelse match_is r2 cs k
    | match_is (Times (r1, r2)) cs k =
        match_is r1 cs (fn cs' => match_is r2 cs' k)
    | match_is (Star r) cs k =
        k cs orelse
        match_is r cs (fn cs' => match_is (Star r) cs' k)

  fun match regexp string =
    match_is
      regexp
      (String.explode string)
      (fn nil => true | _ => false)

end
Note that we incorporate the structure RegExp into the structure Matcher, in accordance with the requirements of the signature. The function match explodes the string into a list of characters (to facilitate sequential processing of the input), then calls match_is with an initial continuation that ensures that the remaining input is empty to determine the result. The type of match_is is

RegExp.regexp -> char list -> (char list -> bool) -> bool.

That is, match_is takes in succession a regular expression, a list of characters, and a continuation of type char list -> bool; it yields as result a value of type bool. This is a fairly complicated type, but notice that nowhere did we have to write this type in the code! The type inference mechanism of ML took care of determining what that type must be based on an analysis of the code itself.
Since match_is takes a function as argument, it is said to be a higher-order function. The simplicity of the matcher is due in large measure to the ease with which we can manipulate functions in ML. Notice that we create a new, unnamed function to pass as a continuation in the case of concatenation — it is the function that matches the second part of the regular expression to the characters remaining after matching an initial segment against the first part. We use a similar technique to implement matching against an iterated regular expression — we attempt to match the null string, but if this fails, we match against the regular expression being iterated followed by the iteration once again. This neatly captures the "zero or more times" interpretation of iteration of a regular expression.

Important: the code given above contains a subtle error. Can you find it? If not, see Chapter 27 for further discussion!
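To see the continuation at work on concatenation, here is a small usage example (not part of the original text). Matching "ab" against a.b succeeds because the continuation passed while matching a goes on to match b against the remaining input:

```sml
(* Assumes the RegExp and Matcher structures defined above. *)
val r = Matcher.RegExp.parse "a.b"   (* Times (Char #"a", Char #"b") *)
val yes = Matcher.match r "ab"       (* true: "a", then "b", nothing left *)
val no  = Matcher.match r "abc"      (* false: "c" remains unmatched *)
```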
This completes our brief overview of Standard ML. The remainder of these notes is structured into three parts. The first part is a detailed introduction to the core language, the language in which we write programs in ML. The second part is concerned with the module language, the means by which we structure large programs in ML. The third is about programming techniques, methods for building reliable and robust programs. I hope you enjoy it!
1.2 Sample Code
Here is the complete code for this chapter:
signature REGEXP = sig
  datatype regexp =
    Zero | One | Char of char |
    Plus of regexp * regexp | Times of regexp * regexp |
    Star of regexp
  exception SyntaxError of string
  val parse : string -> regexp
  val format : regexp -> string
end

signature MATCHER = sig
  structure RegExp : REGEXP
  val match : RegExp.regexp -> string -> bool
end

structure RegExp :> REGEXP = struct

  datatype token =
    AtSign | Percent | Literal of char | PlusSign | TimesSign |
    Asterisk | LParen | RParen

  exception LexicalError

  fun tokenize nil = nil
    | tokenize (#"+" :: cs) = (PlusSign :: tokenize cs)
    | tokenize (#"." :: cs) = (TimesSign :: tokenize cs)
    | tokenize (#"*" :: cs) = (Asterisk :: tokenize cs)
    | tokenize (#"(" :: cs) = (LParen :: tokenize cs)
    | tokenize (#")" :: cs) = (RParen :: tokenize cs)
    | tokenize (#"@" :: cs) = (AtSign :: tokenize cs)
    | tokenize (#"%" :: cs) = (Percent :: tokenize cs)
    | tokenize (#"\\" :: c :: cs) = Literal c :: tokenize cs
    | tokenize (#"\\" :: nil) = raise LexicalError
    | tokenize (#" " :: cs) = tokenize cs
    | tokenize (c :: cs) = Literal c :: tokenize cs

  datatype regexp =
    Zero | One | Char of char |
    Plus of regexp * regexp | Times of regexp * regexp |
    Star of regexp

  exception SyntaxError of string

  fun parse_exp ts =
      let
        val (r, ts') = parse_term ts
      in
        case ts'
          of (PlusSign::ts'') =>
             let
               val (r', ts''') = parse_exp ts''
             in
               (Plus (r, r'), ts''')
             end
           | _ => (r, ts')
      end

  and parse_term ts =
      let
        val (r, ts') = parse_factor ts
      in
        case ts'
          of (TimesSign::ts'') =>
             let
               val (r', ts''') = parse_term ts''
             in
               (Times (r, r'), ts''')
             end
           | _ => (r, ts')
      end

  and parse_factor ts =
      let
        val (r, ts') = parse_atom ts
      in
        case ts'
          of (Asterisk :: ts'') => (Star r, ts'')
           | _ => (r, ts')
      end

  and parse_atom nil = raise SyntaxError ("Factor expected\n")
    | parse_atom (AtSign :: ts) = (Zero, ts)
    | parse_atom (Percent :: ts) = (One, ts)
    | parse_atom ((Literal c) :: ts) = (Char c, ts)
    | parse_atom (LParen :: ts) =
      let
        val (r, ts') = parse_exp ts
      in
        case ts'
          of nil => raise SyntaxError ("Right parenthesis expected\n")
           | (RParen :: ts'') => (r, ts'')
           | _ => raise SyntaxError ("Right parenthesis expected\n")
      end

  fun parse s =
      let
        val (r, ts) = parse_exp (tokenize (String.explode s))
      in
        case ts
          of nil => r
           | _ => raise SyntaxError "Unexpected input.\n"
      end
      handle LexicalError => raise SyntaxError "Illegal input.\n"

  fun format_exp Zero = [#"@"]
    | format_exp One = [#"%"]
    | format_exp (Char c) = [c]
    | format_exp (Plus (r1, r2)) =
      let
        val s1 = format_exp r1
        val s2 = format_exp r2
      in
        [#"("] @ s1 @ [#"+"] @ s2 @ [#")"]
      end
    | format_exp (Times (r1, r2)) =
      let
        val s1 = format_exp r1
        val s2 = format_exp r2
      in
        s1 @ [#"*"] @ s2
      end
    | format_exp (Star r) =
      let
        val s = format_exp r
      in
        [#"("] @ s @ [#")"] @ [#"*"]
      end

  fun format r = String.implode (format_exp r)

end

structure Matcher :> MATCHER = struct

  structure RegExp = RegExp
  open RegExp

  fun match_is Zero _ k = false
    | match_is One cs k = k cs
    | match_is (Char c) nil _ = false
    | match_is (Char c) (c'::cs) k = (c=c') andalso (k cs)
    | match_is (Plus (r1, r2)) cs k =
      (match_is r1 cs k) orelse (match_is r2 cs k)
    | match_is (Times (r1, r2)) cs k =
      match_is r1 cs (fn cs' => match_is r2 cs' k)
    | match_is (r as Star r1) cs k =
      (k cs) orelse match_is r1 cs (fn cs' => match_is r cs' k)

  fun match regexp string =
      match_is regexp (String.explode string)
               (fn nil => true | _ => false)

end
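To see how these pieces fit together, here is a small usage sketch (ours, not part of the chapter's code; the names r, ex1, and ex2 are invented for illustration). We parse through Matcher.RegExp so that the resulting regexp has exactly the abstract type expected by Matcher.match:

val r = Matcher.RegExp.parse "(a+b)*"
val ex1 = Matcher.match r "abba"
val ex2 = Matcher.match r "abc"

Here ex1 is true, since "abba" is a string of a's and b's, while ex2 is false, since no iteration of the pattern can account for the character c.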
Part II
The Core Language
All Standard ML is divided into two parts. The first part, the core language, comprises the fundamental programming constructs of the language: the primitive types and operations, the means of defining and using functions, mechanisms for defining new types, and so on. The second part, the module language, comprises the mechanisms for structuring programs into separate units and is described in Part III. Here we introduce the core language.
Chapter 2
Types, Values, and Effects
2.1 Evaluation and Execution
Most familiar programming languages, such as C or Java, are based on an imperative model of computation. Programs are thought of as specifying a sequence of commands that modify the memory of the computer. Each step of execution examines the current contents of memory, performs a simple computation, modifies the memory, and continues with the next instruction. The individual commands are executed for their effect on the memory (which we may take to include both the internal memory and registers and the external input/output devices). The progress of the computation is controlled by evaluation of expressions, such as boolean tests or arithmetic operations, that are executed for their value. Conditional commands branch according to the value of some expression. Many languages maintain a distinction between expressions and commands, but often (in C, for example) expressions may also modify the memory, so that even expression evaluation has an effect.
Computation in ML is of a somewhat different nature. The emphasis in ML is on computation by evaluation of expressions, rather than execution of commands. The idea of computation is as a generalization of your experience from high school algebra in which you are given a polynomial in a variable x and are asked to calculate its value at a given value of x. We proceed by "plugging in" the given value for x, and then, using the rules of arithmetic, determine the value of the polynomial. The evaluation model of computation used in ML is based on the same idea, but rather than restrict ourselves to arithmetic operations on the reals, we admit a richer variety of values and a richer variety of primitive operations on them.
The evaluation model of computation enjoys several advantages over the more familiar imperative model. Because of its close relationship to mathematics, it is much easier to develop mathematical techniques for reasoning about the behavior of programs. These techniques are important tools for helping us to ensure that programs work properly without having to resort to tedious testing and debugging that can only show the presence of errors, never their absence. Moreover, they provide important tools for documenting the reasoning that went into the formulation of a program, making the code easier to understand and maintain.
What is more, the evaluation model subsumes the imperative model as a special case. Execution of commands for the effect on memory can be seen as a special case of evaluation of expressions by introducing primitive operations for allocating, accessing, and modifying memory. Rather than forcing all aspects of computation into the framework of memory modification, we instead take expression evaluation as the primary notion. Doing so allows us to support imperative programming without destroying the mathematical elegance of the evaluation model for programs that don't use memory. As we will see, it is quite remarkable how seldom memory modification is required. Nevertheless, the language provides for storage-based computation for those few times that it is actually necessary.
2.2 The ML Computation Model
Computation in ML consists of evaluation of expressions. Each expression
has three important characteristics:
• It may or may not have a type.
• It may or may not have a value.
• It may or may not engender an effect.
These characteristics are all that you need to know to compute with an
expression.
The type of an expression is a description of the value it yields, should
it yield a value at all. For example, for an expression to have type int is to
say that its value (should it have one) is a number, and for an expression to
have type real is to say that its value (if any) is a ﬂoating point number.
In general we can think of the type of an expression as a “prediction” of
the form of the value that it has, should it have one. Not every expression has a type; those that do are said to be well-typed.
Those without a type are said to be ill-typed; they are considered ineligible for evaluation. The type checker determines whether or not an expression is well-typed, rejecting with an error those that are not.

A well-typed expression is evaluated to determine its value, if indeed it has one. An expression can fail to have a value because its evaluation never terminates or because it raises an exception, either because of a run-time fault such as division by zero or because some programmer-defined condition is signalled during its evaluation. If an expression has a value, the form of that value is predicted by its type. For example, if an expression evaluates to a value v and its type is bool, then v must be either true or false; it cannot be, say, 17 or 3.14. The soundness of the type system ensures the accuracy of the predictions made by the type checker.
Evaluation of an expression might also engender an effect. Effects include such phenomena as raising an exception, modifying memory, performing input or output, or sending a message on the network. It is important to note that the type of an expression says nothing about its possible effects! An expression of type int might well display a message on the screen before returning an integer value. This possibility is not accounted for in the type of the expression, which classifies only its value. For this reason effects are sometimes called side effects, to stress that they happen "off to the side" during evaluation, and are not part of the value of the expression. We will ignore effects until chapter 13. For the time being we will assume that all expressions are effect-free, or pure.
2.2.1 Type Checking
What is a type? What types are there? Generally speaking, a type is deﬁned
by specifying three things:
• a name for the type,
• the values of the type, and
• the operations that may be performed on values of the type.
Often the division of labor into values and operations is not completely clear-cut, but it nevertheless serves as a very useful guideline for describing types.
Let's consider first the type of integers. Its name is int. The values of type int are the numerals 0, 1, ~1, 2, ~2, and so on. (Note that negative numbers are written with a prefix tilde, rather than a minus sign!) Operations on integers include addition, +, subtraction, -, multiplication, *, quotient, div, and remainder, mod. Arithmetic expressions are formed in the familiar manner, for example, 3*2+6, governed by the usual rules of precedence. Parentheses may be used to override the precedence conventions, just as in ordinary mathematical practice. Thus the preceding expression may be equivalently written as (3*2)+6, but we may also write 3*(2+6) to override the default precedences.
The formation of expressions is governed by a set of typing rules that define the types of expressions in terms of the types of their constituent expressions (if any). The typing rules are generally quite intuitive since they are consistent with our experience in mathematics and in other programming languages. In their full generality the rules are somewhat involved, but we will sneak up on them by first considering only a small fragment of the language, building up additional machinery as we go along.
Here are some simple arithmetic expressions, written using infix notation for the operations (meaning that the operator comes between the arguments, as is customary in mathematics):
3
3 + 4
4 div 3
4 mod 3
Each of these expressions is well-formed; in fact, they each have type int. This is indicated by a typing assertion of the form exp : typ, which states that the expression exp has the type typ. A typing assertion is said to be valid iff the expression exp does indeed have the type typ. The following are all valid typing assertions:
3 : int
3 + 4 : int
4 div 3 : int
4 mod 3 : int
Why are these typing assertions valid? In the case of the value 3, it is an axiom that integer numerals have integer type. What about the expression 3+4? The addition operation takes two arguments, each of which must have type int. Since both arguments in fact have type int, it follows that the entire expression is of type int. For more complex cases we reason analogously, for example, deducing that (3+4) div (2+3) : int by observing that (3+4) : int and (2+3) : int.
The reasoning involved in demonstrating the validity of a typing assertion may be summarized by a typing derivation consisting of a nested sequence of typing assertions, each justified either by an axiom, or a typing rule for an operation. For example, the validity of the typing assertion (3+7) div 5 : int is justified by the following derivation:
1. (3+7): int, because
(a) 3 : int because it is an axiom
(b) 7 : int because it is an axiom
(c) the arguments of + must be integers, and the result of + is an
integer
2. 5 : int because it is an axiom
3. the arguments of div must be integers, and the result is an integer
The outermost steps justify the assertion (3+7) div 5 : int by demonstrating that the arguments each have type int. Recursively, the inner steps justify that (3+7) : int.
2.2.2 Evaluation

Evaluation of expressions is defined by a set of evaluation rules that determine how the value of a compound expression is determined as a function of the values of its constituent expressions (if any). Since the value of an operator is determined by the values of its arguments, ML is sometimes said to be a call-by-value language. While this may seem like the only sensible way to define evaluation, we will see in chapter 15 that this need not be the case: some operations may yield a value without evaluating their arguments. Such operations are sometimes said to be lazy, to distinguish them from eager operations that require their arguments to be evaluated before the operation is performed.
An evaluation assertion has the form exp ⇓ val. This assertion states that the expression exp has value val. It should be intuitively clear that the following evaluation assertions are valid.
5 ⇓ 5
2+3 ⇓ 5
(2+3) div (1+4) ⇓ 1
An evaluation assertion may be justiﬁed by an evaluation derivation, which
is similar to a typing derivation. For example, we may justify the assertion
(3+7) div 5 ⇓ 2 by the derivation
1. (3+7) ⇓ 10 because
(a) 3 ⇓ 3 because it is an axiom
(b) 7 ⇓ 7 because it is an axiom
(c) Adding 3 to 7 yields 10.
2. 5 ⇓ 5 because it is an axiom
3. Dividing 10 by 5 yields 2.
Note that it is an axiom that a numeral evaluates to itself; numerals are fully evaluated expressions, or values. Second, the rules of arithmetic are used to determine that adding 3 and 7 yields 10.
Not every expression has a value. A simple example is the expression 5 div 0, which is undefined. If you attempt to evaluate this expression it will incur a run-time error, reflecting the erroneous attempt to find the number n that, when multiplied by 0, yields 5. The error is expressed in ML by raising an exception; we will have more to say about exceptions in chapter 12. Another reason that a well-typed expression might not have a value is that the attempt to evaluate it leads to an infinite loop. We don't yet have the machinery in place to define such expressions, but we will soon see that it is possible for an expression to diverge, or run forever, when evaluated.
2.3 Types, Types, Types

What types are there besides the integers? Here are a few useful base types of ML:

• Type name: real
  – Values: 3.14, 2.17, 0.1E6, ...
  – Operations: +, -, *, /, =, <, ...
• Type name: char
  – Values: #"a", #"b", ...
  – Operations: ord, chr, =, <, ...
• Type name: string
  – Values: "abc", "1234", ...
  – Operations: ^, size, =, <, ...
• Type name: bool
  – Values: true, false
  – Operations: if exp then exp1 else exp2

There are many, many (in fact, infinitely many!) others, but these are enough to get us started. (See V for a complete description of the primitive types of ML, including the ones given above.)
Notice that some of the arithmetic operations for real numbers are written the same way as for the corresponding operation on integers. For example, we may write 3.1+2.7 to perform a floating point addition of two floating point numbers. This is called overloading; the addition operation is said to be overloaded at the types int and real. In an expression involving addition the type checker tries to resolve which form of addition (fixed point or floating point) you mean. If the arguments are int's, then fixed point addition is used; if the arguments are real's, then floating addition is used; otherwise an error is reported.¹ Note that ML does not perform any implicit conversions between types! For example, the expression 3+3.14 is rejected as ill-formed! If you intend floating point addition, you must write instead real(3)+3.14, which converts the integer 3 to its floating point representation before performing the addition. If, on the other hand, you intend integer addition, you must write 3+round(3.14), which converts 3.14 to an integer by rounding before performing the addition.

Finally, note that floating point division is a different operation from integer quotient! Thus we write 3.1/2.7 for the result of dividing 3.1 by 2.7, which results in a floating point number. We reserve the operator div for integers, and use / for floating point division.
The conditional expression

if exp then exp1 else exp2

is used to discriminate on a Boolean value. It has type typ if exp has type bool and both exp1 and exp2 have type typ. Notice that both "arms" of the conditional must have the same type! It is evaluated by first evaluating exp, then proceeding to evaluate either exp1 or exp2, according to whether the value of exp is true or false. For example,
¹If the type of the arguments cannot be determined, the type defaults to int.
if 1<2 then "less" else "greater"
evaluates to "less" since the value of the expression 1<2 is true.
Note that the expression
if 1<2 then 0 else (1 div 0)
evaluates to 0, even though 1 div 0 incurs a run-time error. This is because evaluation of the conditional proceeds either to the then clause or to the else clause, depending on the outcome of the boolean test. Whichever clause is evaluated, the other is simply discarded without further consideration.
Although we may, in fact, test equality of two boolean expressions, it is rarely useful to do so. Beginners often write conditionals of the form

if exp = true then exp1 else exp2.

But this is equivalent to the simpler expression

if exp then exp1 else exp2.

Similarly, rather than write

if exp = false then exp1 else exp2,

it is better to write

if not exp then exp1 else exp2

or, better yet, just

if exp then exp2 else exp1.
2.4 Type Errors

Now that we have more than one type, we have enough rope to hang ourselves by forming ill-typed expressions. For example, the following expressions are not well-typed:

size 45
#"1" + 1
#"2" ^ "1"
3.14 + 2
In each case we are "misusing" an operator with arguments of the wrong type.

This raises a natural question: is the following expression well-typed or not?

if 1<2 then 0 else ("abc"+4)

Since the boolean test will come out true, the else clause will never be executed, and hence need not be constrained to be well-typed. While this reasoning is sensible for such a simple example, in general it is impossible for the type checker to determine the outcome of the boolean test during type checking. To be safe the type checker "assumes the worst" and insists that both clauses of the conditional be well-typed, and in fact have the same type, to ensure that the conditional expression can be given a type, namely that of both of its clauses.
Chapter 3
Declarations
3.1 Variables
Just as in any other programming language, values may be assigned to
variables, which may then be used in expressions to stand for that value.
However, in sharp contrast to most familiar languages, variables in ML do
not vary! A value may be bound to a variable using a construct called a value
binding. Once a variable is bound to a value, it is bound to it for life; there is
no possibility of changing the binding of a variable once it has been bound.
In this respect variables in ML are more akin to variables in mathematics
than to variables in languages such as C.
A type may also be bound to a type constructor using a type binding. A bound type constructor stands for the type bound to it, and can never stand for any other type. For this reason a type binding is sometimes called a type abbreviation: the type constructor stands for the type to which it is bound.¹

A value or type binding introduces a "new" variable or type constructor, distinct from all others of that class, for use within its range of significance, or scope. Scoping in ML is static, or lexical, meaning that the range of significance of a variable or type constructor is determined by the program text, not by the order of evaluation of its constituent expressions. (Languages with dynamic scope adopt the opposite convention.) For the time being variables and type constructors have global scope, meaning that the range of significance of the variable or type constructor is the "rest" of the program, the part that lexically follows the binding. However, we will soon introduce mechanisms for limiting the scopes of variables or type constructors to a given expression.

¹By the same token a value binding might also be called a value abbreviation, but for some reason it never is.
3.2 Basic Bindings
3.2.1 Type Bindings
Any type may be given a name using a type binding. At this stage we have
so few types that it is hard to justify binding type names to identiﬁers, but
we’ll do it anyway because we’ll need it later. Here are some examples of
type bindings:
type float = real
type count = int and average = real
The first type binding introduces the type constructor float, which subsequently is synonymous with real. The second introduces two type constructors, count and average, which stand for int and real, respectively.
In general a type binding introduces one or more new type constructors
simultaneously in the sense that the deﬁnitions of the type constructors may
not involve any of the type constructors being deﬁned. Thus a binding such
as
type float = real and average = float
is nonsensical (in isolation) since the type constructors float and average
are introduced simultaneously, and hence cannot refer to one another.
The syntax for type bindings is

type tycon1 = typ1
and ...
and tyconn = typn

where each tyconi is a type constructor and each typi is a type expression.
3.2.2 Value Bindings

A value may be given a name using a value binding. Here are some examples:

val m : int = 3+2
val pi : real = 3.14 and e : real = 2.17

The first binding introduces the variable m, specifying its type to be int and its value to be 5. The second introduces two variables, pi and e, simultaneously, both having type real, and with pi having value 3.14 and e having value 2.17. Notice that a value binding specifies both the type and the value of a variable.
The syntax of value bindings is

val var1 : typ1 = exp1
and ...
and varn : typn = expn

where each vari is a variable, each typi is a type expression, and each expi is an expression.
A value binding of the form

val var : typ = exp

is type-checked by ensuring that the expression exp has type typ. If not, the binding is rejected as ill-formed. If so, the binding is evaluated using the bind-by-value rule: first exp is evaluated to obtain its value val, then val is bound to var. If exp does not have a value, then the declaration does not bind anything to the variable var.
The purpose of a binding is to make a variable available for use within its scope. In the case of a type binding we may use the type variable introduced by that binding in type expressions occurring within its scope. For example, in the presence of the type bindings above, we may write
val pi : float = 3.14
since the type constructor float is bound to the type real, the type of the
expression 3.14. Similarly, we may make use of the variable introduced
by a value binding in value expressions occurring within its scope.
Continuing from the preceding binding, we may use the expression
sin pi
to stand for 0.0 (approximately), and we may bind this value to a variable
by writing
val x : float = sin pi
As these examples illustrate, type checking and evaluation are context-dependent in the presence of type and value bindings since we must refer to these bindings to determine the types and values of expressions. For example, to determine that the above binding for x is well-formed, we must consult the binding for pi to determine that it has type float, consult the binding for float to determine that it is synonymous with real, which is necessary for the binding of x to have type float.

The rough-and-ready rule for both type-checking and evaluation is that a bound variable or type constructor is implicitly replaced by its binding prior to type checking and evaluation. This is sometimes called the substitution principle for bindings. For example, to evaluate the expression cos x in the scope of the above declarations, we first replace the occurrence of x by its value (approximately 0.0), then compute as before, yielding (approximately) 1.0. Later on we will have to refine this simple principle to take account of more sophisticated language features, but it is useful nonetheless to keep this simple idea in mind.
3.3 Compound Declarations

Bindings may be combined to form declarations. A binding is an atomic declaration, even though it may introduce many variables simultaneously. Two declarations may be combined by sequential composition by simply writing them one after the other, optionally separated by a semicolon. Thus we may write the declaration

val m : int = 3+2
val n : int = m*m

which binds m to 5 and n to 25. Subsequently, we may evaluate m+n to obtain the value 30. In general a sequential composition of declarations has the form dec1 ... decn, where n is at least 2. The scopes of these declarations are nested within one another: the scope of dec1 includes dec2, ..., decn, the scope of dec2 includes dec3, ..., decn, and so on.
One thing to keep in mind is that binding is not assignment. The binding of a variable never changes; once bound to a value, it is always bound to that value (within the scope of the binding). However, we may shadow a binding by introducing a second binding for a variable within the scope of the first binding. Continuing the above example, we may write

val n : real = 2.17

to introduce a new variable n with both a different type and a different value than the earlier binding. The new binding eclipses the old one, which may then be discarded since it is no longer accessible. (Later on, we will see that in the presence of higher-order functions shadowed bindings are not always discarded, but are preserved as private data in a closure. One might say that old bindings never die, they just fade away.)
3.4 Limiting Scope

The scope of a variable or type constructor may be delimited by using let expressions and local declarations. A let expression has the form

let dec in exp end

where dec is any declaration and exp is any expression. The scope of the declaration dec is limited to the expression exp. The bindings introduced by dec are discarded upon completion of evaluation of exp.

Similarly, we may limit the scope of one declaration to another declaration by writing

local dec in dec' end.

The scope of the bindings in dec is limited to the declaration dec'. After processing dec', the bindings in dec may be discarded.
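As a small illustration (our example, using only the constructs just described), the declaration

local
  val m : int = 3
in
  val n : int = m*m
end

binds n to 9; the auxiliary binding for m is available only while processing the binding for n, and is discarded afterwards.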
The value of a let expression is determined by evaluating the declaration part, then evaluating the expression relative to the bindings introduced by the declaration, yielding this value as the overall value of the let expression. An example will help clarify the idea:

let
  val m : int = 3
  val n : int = m*m
in
  m*n
end

This expression has type int and value 27, as you can readily verify by first calculating the bindings for m and n, then computing the value of m*n relative to these bindings. The bindings for m and n are local to the expression m*n, and are not accessible from outside the expression.

If the declaration part of a let expression eclipses earlier bindings, the ambient bindings are restored upon completion of evaluation of the let expression. Thus the following expression evaluates to 54:
val m : int = 2
val r : int =
  let
    val m : int = 3
    val n : int = m*m
  in
    m*n
  end * m
The binding of m is temporarily overridden during the evaluation of the
let expression, then restored upon completion of this evaluation.
3.5 Typing and Evaluation

To complete this chapter, let's consider in more detail the context-sensitivity of type checking and evaluation in the presence of bindings. The key ideas are:

• Type checking must take account of the declared type of a variable.
• Evaluation must take account of the declared value of a variable.

This is achieved by maintaining environments for type checking and evaluation. The type environment records the types of variables; the value environment records their values. For example, after processing the compound declaration
val m : int = 0
val x : real = Math.sqrt(2.0)
val c : char = #"a"
the type environment contains the information
val m : int
val x : real
val c : char
and the value environment contains the information
val m = 0
val x = 1.414
val c = #"a"
In a sense the value declarations have been divided in "half", separating the type from the value information.

Thus we see that value bindings have significance for both type checking and evaluation. In contrast type bindings have significance only for type checking, and hence contribute only to the type environment. A type binding such as

type float = real

is recorded in its entirety in the type environment, and no change is made to the value environment. Subsequently, whenever we encounter the type constructor float in a type expression, it is replaced by real in accordance with the type binding above.

In chapter 2 we said that a typing assertion has the form exp : typ, and that an evaluation assertion has the form exp ⇓ val. While two-place typing and evaluation assertions are sufficient for closed expressions (those without variables), we must extend these relations to account for open expressions (those with variables). Each must be equipped with an environment recording information about type constructors and variables introduced by declarations.
Typing assertions are generalized to have the form

typenv ⊢ exp : typ

where typenv is a type environment that records the bindings of type constructors and the types of variables that may occur in exp.² We may think of typenv as a sequence of specifications of one of the following two forms:

1. type typvar = typ
2. val var : typ

Note that the second form does not include the binding for var, only its type!

Evaluation assertions are generalized to have the form

valenv ⊢ exp ⇓ val

where valenv is a value environment that records the bindings of the variables that may occur in exp. We may think of valenv as a sequence of specifications of the form

val var = val

that bind the value val to the variable var.

²The turnstile symbol, "⊢", is simply a punctuation mark separating the type environment from the expression and its type.
Finally, we also need a new assertion, called type equivalence, that determines when two types are equivalent, relative to a type environment. This is written

typenv ⊢ typ1 ≡ typ2

Two types are equivalent iff they are the same when the type constructors defined in typenv are replaced by their bindings.

The primary use of a type environment is to record the types of the value variables that are available for use in a given expression. This is expressed by the following axiom:

... val var : typ ... ⊢ var : typ

In words, if the specification val var : typ occurs in the type environment, then we may conclude that the variable var has type typ. This rule glosses over an important point. In order to account for shadowing we require that the rightmost specification govern the type of a variable. That way rebinding of variables at different types behaves as expected.

Similarly, the evaluation relation must take account of the value environment. Evaluation of variables is governed by the following axiom:

... val var = val ... ⊢ var ⇓ val

Here again we assume that the val specification is the rightmost one governing the variable var to ensure that the scoping rules are respected.

The role of the type equivalence assertion is to ensure that type constructors always stand for their bindings. This is expressed by the following axiom:

... type typvar = typ ... ⊢ typvar ≡ typ

Once again, the rightmost specification for typvar governs the assertion.
Chapter 4
Functions
4.1 Functions as Templates
So far we just have the means to calculate the values of expressions, and to
bind these values to variables for future reference. In this chapter we will
introduce the ability to abstract the data from a calculation, leaving behind
the bare pattern of the calculation. This pattern may then be instantiated as
often as you like so that the calculation may be repeated with speciﬁed data
values plugged in.
For example, consider the expression 2*(3+4). The data might be taken to be the values 2, 3, and 4, leaving behind the pattern □ * (□ + □), with "holes" where the data used to be. We might equally well take the data to just be 2 and 3, and leave behind the pattern □ * (□ + 4). Or we might even regard * and + as the data, leaving 2 □ (3 □ 4) as the pattern! What is important is that a complete expression can be recovered by ﬁlling in the holes with chosen data.
Since a pattern can contain many different holes that can be independently instantiated, it is necessary to give names to the holes so that instantiation consists of plugging in a given value for all occurrences of a name in an expression. These names are, of course, just variables, and instantiation is just the process of substituting a value for all occurrences of a variable in a given expression. A pattern may therefore be viewed as a function of the variables that occur within it; the pattern is instantiated by applying the function to argument values.
This view of functions is similar to our experience from high school algebra. In algebra we manipulate polynomials such as x² + 2x + 1 as a form of expression denoting a real number, with the variable x representing a ﬁxed, but unknown, quantity. (Indeed, variables in algebra are sometimes called unknowns, or indeterminates.) It is also possible to think of a polynomial as a function on the real line: given a real number x, a polynomial determines a real number y computed by the given combination of arithmetic operations. Indeed, we sometimes write equations such as f(x) = x² + 2x + 1, to stand for the function f determined by the polynomial. In the univariate case we can get away with just writing the polynomial for the function, but in the multivariate case we must be more careful: we may regard the polynomial x² + 2xy + y² to be a function of x, a function of y, or a function of both x and y. In these cases we write f(x) = x² + 2xy + y² when x varies and y is held ﬁxed, g(y) = x² + 2xy + y² when y varies for ﬁxed x, and h(x, y) = x² + 2xy + y², when both vary jointly.
In algebra it is usually left implicit that the variables x and y range over the real numbers, and that f, g, and h are functions on the real line. However, to be fully explicit, we sometimes write something like
f : R → R : x ∈ R ↦ x² + 2x + 1
to indicate that f is a function on the reals sending x ∈ R to x² + 2x + 1 ∈ R. This notation has the virtue of separating the name of the function, f, from the function itself, the mapping that sends x ∈ R to x² + 2x + 1. It also emphasizes that functions are a kind of "value" in mathematics (namely, a certain set of ordered pairs), and that the variable f is bound to that value (i.e., that set) by the declaration. This viewpoint is especially important once we consider operators, such as the differential operator, that map functions to functions. For example, if f is a differentiable function on the real line, the function Df is its ﬁrst derivative, another function on the real line. In the case of the function f deﬁned above the function Df sends x ∈ R to 2x + 2.
4.2 Functions and Application
The treatment of functions in ML is very similar, except that we stress both the algorithmic aspects of functions (how they determine values from arguments), as well as the extensional aspects (what they compute). As in mathematics, a function in ML is a kind of value, namely a value of function type of the form typ -> typ'. The type typ is the domain type (the type of arguments) of the function, and typ' is its range type (the type of its results). We compute with a function by applying it to an argument value of its domain type and calculating the result, a value of its range type. Function application is indicated by juxtaposition: we simply write the argument next to the function.
The values of function type consist of primitive functions, such as addition and square root, and function expressions, which are also called lambda expressions (for purely historical reasons), of the form
fn var : typ => exp
The variable var is called the parameter, and the expression exp is called the body. It has type typ -> typ' provided that exp has type typ' under the assumption that the parameter var has the type typ.
To apply such a function expression to an argument value val, we add the binding
val var = val
to the value environment, and evaluate exp, obtaining a value val'. Then the value binding for the parameter is removed, and the result value, val', is returned as the value of the application.
For example, Math.sqrt is a primitive function of type real -> real that may be applied to a real number to obtain its square root. For example, the expression Math.sqrt 2.0 evaluates to 1.414 (approximately). We can, if we wish, parenthesize the argument, writing Math.sqrt (2.0) for the sake of clarity; this is especially useful for expressions such as Math.sqrt (Math.sqrt 2.0). The square root function is built in. We may write the fourth root function as the following function expression:
fn x : real => Math.sqrt (Math.sqrt x)
It may be applied to an argument by writing an expression such as
(fn x : real => Math.sqrt (Math.sqrt x)) (16.0),
which calculates the fourth root of 16.0. The calculation proceeds by binding the variable x to the argument 16.0, then evaluating the expression Math.sqrt (Math.sqrt x) in the presence of this binding. When evaluation completes, we drop the binding of x from the environment, since it is no longer needed.
Notice that we did not give the fourth root function a name; it is an
“anonymous” function. We may give it a name using the declaration forms
introduced in chapter 3. For example, we may bind the fourth root function
to the variable fourthroot using the following declaration:
val fourthroot : real -> real =
  fn x : real => Math.sqrt (Math.sqrt x)
We may then write fourthroot 16.0 to compute the fourth root of 16.0.
This notation for deﬁning functions quickly becomes tiresome, so ML
provides a special syntax for function bindings that is more concise and
natural. Instead of using the val binding above to deﬁne fourthroot,
we may instead write
fun fourthroot (x:real):real = Math.sqrt (Math.sqrt x)
This declaration has the same meaning as the earlier val binding, namely it
binds fn x:real => Math.sqrt(Math.sqrt x) to the variable fourthroot.
It is important to note that function applications in ML are evaluated according to the call-by-value rule: the arguments to a function are evaluated before the function is called. Put in other terms, functions are deﬁned to act on values, rather than on unevaluated expressions. Thus, to evaluate an expression such as fourthroot (2.0+2.0), we proceed as follows:
1. Evaluate fourthroot to the function value fn x : real => Math.sqrt (Math.sqrt x).
2. Evaluate the argument 2.0+2.0 to its value 4.0.
3. Bind x to the value 4.0.
4. Evaluate Math.sqrt (Math.sqrt x) to 1.414 (approximately).
(a) Evaluate Math.sqrt to a function value (the primitive square root function).
(b) Evaluate the argument expression Math.sqrt x to its value, 2.0.
i. Evaluate Math.sqrt to a function value (the primitive square root function).
ii. Evaluate x to its value, 4.0.
iii. Compute the square root of 4.0, yielding 2.0.
(c) Compute the square root of 2.0, yielding 1.414 (approximately).
5. Drop the binding for the variable x.
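The call-by-value order can be observed directly, since evaluating the argument before the call is visible when the argument has a print effect. The following is a small sketch (the print messages are mine, not from the text):

```sml
(* The fourth root function from the text. *)
fun fourthroot (x : real) : real = Math.sqrt (Math.sqrt x)

(* The argument expression runs first, printing its message, and only
   then is the body evaluated with x bound to 4.0. *)
val arg = (print "argument evaluated\n"; 2.0 + 2.0)
val result = fourthroot arg  (* approximately 1.414 *)
```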
Notice that we evaluate both the function and argument positions of an
application expression — both the function and argument are expressions
yielding values of the appropriate type. The value of the function position
must be a value of function type, either a primitive function or a lambda
expression, and the value of the argument position must be a value of the
domain type of the function. In this case the result value (if any) will be of
the range type of the function. Functions in ML are ﬁrst-class, meaning that
they may be computed as the value of an expression. We are not limited to
applying only named functions, but rather may compute “new” functions
on the ﬂy and apply these to arguments. This is a source of considerable
expressive power, as we shall see in the sequel.
Using similar techniques we may deﬁne functions with arbitrary domain and range. For example, the following are all valid function declarations:
fun srev (s:string):string = implode (rev (explode s))
fun pal (s:string):string = s ^ (srev s)
fun double (n:int):int = n + n
fun square (n:int):int = n * n
fun halve (n:int):int = n div 2
fun is_even (n:int):bool = (n mod 2 = 0)
Thus pal "ot" evaluates to the string "otto", and is_even 4 evaluates to true.
4.3 Binding and Scope, Revisited
A function expression of the form
fn var:typ => exp
binds the variable var within the body exp of the function. Unlike val
bindings, function expressions bind a variable without giving it a speciﬁc
value. The value of the parameter is only determined when the function is
applied, and then only temporarily, for the duration of the evaluation of its
body.
It is worth reviewing the rules for binding and scope of variables that we introduced in chapter 3 in the presence of function expressions. As before we adhere to the principle of static scope, according to which variables are taken to refer to the nearest enclosing binding of that variable, whether by a val binding or by a fn expression.
Thus, in the following example, the occurrences of x in the body of the
function f refer to the parameter of f, whereas the occurrences of x in the
body of g refer to the preceding val binding.
val x:real = 2.0
fun f(x:real):real = x+x
fun g(y:real):real = x+y
Local val bindings may shadow parameters, as well as other val bindings.
For example, consider the following function declaration:
fun h(x:real):real =
let val x:real = 2.0 in x+x end * x
The inner binding of x by the val declaration shadows the parameter x of
h, but only within the body of the let expression. Thus the last occurrence
of x refers to the parameter of h, whereas the preceding two occurrences
refer to the inner binding of x to 2.0.
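As a concrete check of this scoping behavior, here is a sketch (the test argument 3.0 is my own choice, not from the text):

```sml
(* The let-bound x shadows the parameter only inside the let body, so
   x+x uses 2.0 while the trailing x uses the parameter. *)
fun h (x : real) : real =
  let val x : real = 2.0 in x + x end * x

(* h 3.0 evaluates (2.0 + 2.0) * 3.0, i.e. 12.0. *)
val twelve = h 3.0
```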
The phrases “inner” and “outer” binding refer to the logical structure,
or abstract syntax of an expression. In the preceding example, the body
of h lies “within” the scope of the parameter x, and the expression x+x
lies within the scope of the val binding for x. Since the occurrences of x
within the body of the let lie within the scope of the inner val binding,
they are taken to refer to that binding, rather than to the parameter. On the
other hand the last occurrence of x does not lie within the scope of the val
binding, and hence refers to the parameter of h.
In general the names of parameters do not matter; we can rename them
at will without affecting the meaning of the program, provided that we
simultaneously (and consistently) rename the binding occurrence and all
uses of that variable. Thus the functions f and g below are completely
equivalent to each other:
fun f(x:int):int = x*x
fun g(y:int):int = y*y
A parameter is just a placeholder; its name is not important.
Our ability to rename parameters is constrained by the static scoping
rule. We may rename a parameter to whatever we’d like, provided that we
don’t change the way in which uses of a variable are resolved. For example,
consider the following situation:
val x:real = 2.0
fun h(y:real):real = x+y
The parameter y to h may be renamed to z without affecting its meaning.
However, we may not rename it to x, for doing so changes its meaning!
That is, the function
fun h'(x:real):real = x+x
does not have the same meaning as h, because now both occurrences of x in the body of h' refer to the parameter, whereas in h the variable x refers to the outer val binding and the variable y refers to the parameter.
While this may seem like a minor technical issue, it is essential that you
master these concepts now, for they play a central, and rather subtle, role
later on.
Chapter 5
Products and Records
5.1 Product Types
A distinguishing feature of ML is that aggregate data structures, such as tuples, lists, arrays, or trees, may be created and manipulated with ease. In contrast to most familiar languages it is not necessary in ML to be concerned with allocation and deallocation of data structures, nor with any particular representation strategy involving, say, pointers or address arithmetic. Instead we may think of data structures as ﬁrst-class values, on a par with every other value in the language. Just as it is unnecessary to think about "allocating" integers to evaluate an arithmetic expression, it is unnecessary to think about allocating more complex data structures such as tuples or lists.
5.1.1 Tuples
This chapter is concerned with the simplest form of aggregate data structure, the n-tuple. An n-tuple is a ﬁnite ordered sequence of values of the form
(val1,...,valn),
where each vali is a value. A 2-tuple is usually called a pair, a 3-tuple a triple, and so on.
An n-tuple is a value of a product type of the form
typ1 * ... * typn.
Values of this type are n-tuples of the form
(val1,...,valn),
where vali is a value of type typi (for each 1 ≤ i ≤ n).
Thus the following are well-formed bindings:
val pair : int * int = (2, 3)
val triple : int * real * string = (2, 2.0, "2")
val quadruple
: int * int * real * real
= (2,3,2.0,3.0)
val pair_of_pairs
: (int * int) * (real * real)
= ((2,3),(2.0,3.0))
The nesting of parentheses matters! A pair of pairs is not the same as
a quadruple, so the last two bindings are of distinct values with distinct
types.
There are two limiting cases, n = 0 and n = 1, that deserve special attention. A 0-tuple, which is also known as a null tuple, is the empty sequence of values, (). It is a value of type unit, which may be thought of as the 0-tuple type. (In Java, and other languages, the type unit is misleadingly written void. The name "void" suggests that the type has no members, but in fact it has exactly one member, the null tuple!) The null type is surprisingly useful, especially when programming with effects. On the other hand there seems to be no particular use for 1-tuples, and so they are absent from the language.
As a convenience, ML also provides a general tuple expression of the form
(exp1,...,expn),
where each expi is an arbitrary expression, not necessarily a value. Tuple expressions are evaluated from left to right, so that the above tuple expression evaluates to the tuple value
(val1,...,valn),
provided that exp1 evaluates to val1, exp2 evaluates to val2, and so on. For example, the binding
val pair : int * int = (1+1, 5-2)
binds the value (2, 3) to the variable pair.
Strictly speaking, it is not essential to have tuple expressions as a primitive notion in the language. Rather than write
(exp1,...,expn),
with the (implicit) understanding that the expi's are evaluated from left to right, we may instead write
let
  val x1 = exp1
  val x2 = exp2
  ...
  val xn = expn
in
  (x1,...,xn)
end
which makes the evaluation order explicit.
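Left-to-right evaluation of tuple components can be observed with print effects, as in this sketch (the component expressions are illustrative, not from the text):

```sml
(* Each component prints as it is evaluated, so the output
   "first second third" confirms the left-to-right order. *)
val triple =
  ( (print "first ";  1),
    (print "second "; 2.0),
    (print "third\n"; "three") )
```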
5.1.2 Tuple Patterns
One of the most powerful, and distinctive, features of ML is the use of
pattern matching to access components of aggregate data structures. For
example, suppose that val is a value of type
(int * string) * (real * char)
and we wish to retrieve the ﬁrst component of the second component of
val , a value of type real. Rather than explicitly “navigate” to this position
to retrieve it, we may simply use a generalized form of value binding in
which we select that component using a pattern:
val ((_, _), (r:real, _)) = val
The left-hand side of the val binding is a tuple pattern that describes a
pair of pairs, binding the ﬁrst component of the second component to the
variable r. The underscores indicate “don’t care” positions in the pattern
— their values are not bound to any variable. If we wish to give names to
all of the components, we may use the following value binding:
val ((i:int, s:string), (r:real, c:char)) = val
If we’d like we can even give names to the ﬁrst and second components of
the pair, without decomposing them into constituent parts:
val (is:int*string,rc:real*char) = val
The general form of a value binding is
val pat = exp,
where pat is a pattern and exp is an expression. A pattern is one of three forms:
1. A variable pattern of the form var:typ.
2. A tuple pattern of the form (pat1,...,patn), where each pati is a pattern. This includes as a special case the null-tuple pattern, ().
3. A wildcard pattern of the form _.
The type of a pattern is determined by an inductive analysis of the form of the pattern:
1. A variable pattern var:typ is of type typ.
2. A tuple pattern (pat1,...,patn) has type typ1 * ... * typn, where each pati is a pattern of type typi. The null-tuple pattern () has type unit.
3. The wildcard pattern _ has any type whatsoever.
A value binding of the form
val pat = exp
is well-typed iff pat and exp have the same type; otherwise the binding is ill-typed and is rejected.
For example, the following bindings are welltyped:
val (m:int, n:int) = (7+1,4 div 2)
val (m:int, r:real, s:string) = (7, 7.0, "7")
val ((m:int,n:int), (r:real, s:real)) = ((4,5),(3.1,2.7))
val (m:int, n:int, r:real, s:real) = (4,5,3.1,2.7)
In contrast, the following are ill-typed:
val (m:int,n:int,r:real,s:real) = ((4,5),(3.1,2.7))
val (m:int, r:real) = (7+1,4 div 2)
val (m:int, r:real) = (7, 7.0, "7")
Value bindings are evaluated using the bind-by-value principle discussed earlier, except that the binding process is now more complex than before. First, we evaluate the right-hand side of the binding to a value (if indeed it has one). This happens regardless of the form of the pattern — the right-hand side is always evaluated. Second, we perform pattern matching to determine the bindings for the variables in the pattern.
The process of matching a value against a pattern is deﬁned by a set of rules for reducing bindings with complex patterns to a set of bindings with simpler patterns, stopping once we reach a binding with a variable pattern. The rules are as follows:
1. The variable binding val var = val is irreducible.
2. The wildcard binding val _ = val is discarded.
3. The tuple binding
val (pat1,...,patn) = (val1,...,valn)
is reduced to the set of n bindings
val pat1 = val1
...
val patn = valn
In the case that n = 0 the tuple binding is simply discarded.
These simpliﬁcations are repeated until all bindings are irreducible, which leaves us with a set of variable bindings that constitute the result of pattern matching.
For example, evaluation of the binding
val ((m:int,n:int), (r:real, s:real)) = ((2,3),(2.0,3.0))
proceeds as follows. First, we decompose this binding into the following two bindings:
val (m:int, n:int) = (2,3)
val (r:real, s:real) = (2.0,3.0).
Then we decompose each of these bindings in turn, resulting in the following set of four atomic bindings:
val m:int = 2
val n:int = 3
val r:real = 2.0
val s:real = 3.0
At this point the patternmatching process is complete.
5.2 Record Types
Tuples are most useful when the number of positions is small. When the
number of components grows beyond a small number, it becomes difﬁcult
to remember which position plays which role. In that case it is more natural
to attach a label to each component of the tuple that mediates access to it.
This is the notion of a record type.
A record type has the form
{lab1:typ1,...,labn:typn},
where n ≥ 0, and all of the labels labi are distinct. A record value has the form
{lab1=val1,...,labn=valn},
where vali has type typi. A record pattern has the form
{lab1=pat1,...,labn=patn}
which has type
{lab1:typ1,...,labn:typn}
provided that each pati has type typi.
A record value binding of the form
val {lab1=pat1,...,labn=patn} = {lab1=val1,...,labn=valn}
is decomposed into the following set of bindings
val pat1 = val1
...
val patn = valn.
Since the components of a record are identiﬁed by name, not position, the order in which they occur in a record value or record pattern is not important. However, in a record expression (in which the components may not be fully evaluated), the ﬁelds are evaluated from left to right in the order written, just as for tuple expressions.
Here are some examples to help clarify the use of record types. First, let
us deﬁne the record type hyperlink as follows:
type hyperlink =
  { protocol : string,
    address : string,
    display : string }
The record binding
val mailto_rwh : hyperlink =
  { protocol="mailto",
    address="rwh@cs.cmu.edu",
    display="Robert Harper" }
deﬁnes a variable of type hyperlink. The record binding
val { protocol=prot, display=disp, address=addr } = mailto_rwh
decomposes into the three variable bindings
val prot = "mailto"
val addr = "rwh@cs.cmu.edu"
val disp = "Robert Harper"
which extract the values of the ﬁelds of mailto_rwh.
Using wild cards we can extract selected ﬁelds from a record. For example, we may write
val {protocol=prot, address=_, display=_} = mailto_rwh
to bind the variable prot to the protocol ﬁeld of the record value mailto_rwh.
It is quite common to encounter record types with tens of ﬁelds. In such cases even the wild card notation doesn't help much when it comes to selecting one or two ﬁelds from such a record. For this we often use ellipsis patterns in records, as illustrated by the following example.
val {protocol=prot,...} = intro_home
The pattern {protocol=prot,...} stands for the expanded pattern
{protocol=prot, address=_, display=_}
in which the elided ﬁelds are implicitly bound to wildcard patterns.
In general the ellipsis is replaced by as many wildcard bindings as are necessary to ﬁll out the pattern to be consistent with its type. In order for this to occur the compiler must be able to determine unambiguously the type of the record pattern. Here the right-hand side of the value binding determines the type of the pattern, which then determines which additional ﬁelds to ﬁll in. In some situations the context does not disambiguate, in which case you must supply additional type information, or avoid the use of ellipsis notation.
Finally, ML provides a convenient abbreviated form of record pattern
{lab1,...,labn}
which stands for the pattern
{lab1=var1,...,labn=varn}
where the variables vari are variables with the same name as the corresponding label labi. For example, the binding
val { protocol, address, display } = mailto_rwh
decomposes into the sequence of atomic bindings
val protocol = "mailto"
val address = "rwh@cs.cmu.edu"
val display = "Robert Harper"
This avoids the need to think up a variable name for each ﬁeld; we can just make the label do "double duty" as a variable.
5.3 Multiple Arguments and Multiple Results
A function may bind more than one argument by using a pattern, rather than a variable, in the argument position. Function expressions are generalized to have the form
fn pat => exp
where pat is a pattern and exp is an expression. Application of such a function proceeds much as before, except that the argument value is matched against the parameter pattern to determine the bindings of zero or more variables, which are then used during the evaluation of the body of the function.
For example, we may make the following deﬁnition of the Euclidean distance function:
val dist : real * real -> real =
  fn (x:real, y:real) => Math.sqrt (x*x + y*y)
This function may then be applied to a pair (a two-tuple!) of arguments to yield the distance between them. For example, dist (2.0,3.0) evaluates to (approximately) 3.6.
Using fun notation, the distance function may be deﬁned more concisely as follows:
fun dist (x:real, y:real):real = Math.sqrt (x*x + y*y)
The meaning is the same as the more verbose val binding given earlier.
Keyword parameter passing is supported through the use of record patterns. For example, we may deﬁne the distance function using keyword parameters as follows:
fun dist' {x=x:real, y=y:real} = Math.sqrt (x*x + y*y)
The expression dist' {x=2.0,y=3.0} invokes this function with the indicated x and y values.
Functions with multiple results may be thought of as functions yielding tuples (or records). For example, we might compute two different notions of distance between two points at once as follows:
fun dist2 (x:real, y:real):real*real =
  (Math.sqrt (x*x+y*y), abs(x-y))
Notice that the result type is a pair, which may be thought of as two results.
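Pattern matching then recovers the two results individually, as in this sketch (the variable names euclid and diff are mine, not from the text):

```sml
(* dist2 returns both the Euclidean distance from the origin and the
   absolute difference of the coordinates, as a pair. *)
fun dist2 (x : real, y : real) : real * real =
  (Math.sqrt (x*x + y*y), abs (x - y))

(* A tuple pattern binds each result to its own name. *)
val (euclid, diff) = dist2 (3.0, 4.0)  (* euclid = 5.0, diff = 1.0 *)
```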
These examples illustrate a pleasing regularity in the design of ML.
Rather than introduce ad hoc notions such as multiple arguments, multiple
results, or keyword parameters, we make use of the general mechanisms
of tuples, records, and pattern matching.
It is sometimes useful to have a function to select a particular component from a tuple or record (e.g., the third component or the component with a given label). Such functions may be easily deﬁned using pattern matching. But since they arise so frequently, they are predeﬁned in ML using sharp notation. For any tuple type
typ1 * ... * typn,
and each 1 ≤ i ≤ n, there is a function #i of type
typ1 * ... * typn -> typi
deﬁned as follows:
fun #i (_, ..., _, x, _, ..., _) = x
where x occurs in the i-th position of the tuple (and there are underscores in the other n-1 positions).
Thus we may refer to the second ﬁeld of a three-tuple val by writing #2(val). It is bad style, however, to overuse the sharp notation; code is generally clearer and easier to maintain if you use patterns wherever possible. Compare, for example, the following deﬁnition of the Euclidean distance function written using sharp notation with the original.
fun dist (p:real*real):real =
  Math.sqrt ((#1 p)*(#1 p) + (#2 p)*(#2 p))
You can easily see that this gets out of hand very quickly, leading to unreadable code. Use of the sharp notation is strongly discouraged!
A similar notation is provided for record ﬁeld selection. The following function #lab selects the component of a record with label lab.
fun #lab {lab=x,...} = x
Notice the use of ellipsis! Bear in mind the disambiguation requirement: any use of #lab must be in a context sufﬁcient to determine the full record type of its argument.
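To make the disambiguation requirement concrete, here is a sketch (the function getProtocol and the reuse of the hyperlink type are my own illustration, not from the text): an explicit type annotation on the argument tells the compiler how to expand the ellipsis behind the selector.

```sml
type hyperlink =
  { protocol : string, address : string, display : string }

(* Without the annotation on r the record type would be ambiguous and
   the use of #protocol would be rejected by the compiler. *)
fun getProtocol (r : hyperlink) : string = #protocol r

val p = getProtocol { protocol = "mailto",
                      address  = "rwh@cs.cmu.edu",
                      display  = "Robert Harper" }
```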
Chapter 6
Case Analysis
6.1 Homogeneous and Heterogeneous Types
Tuple types have the property that all values of that type have the same form (n-tuples, for some n determined by the type); they are said to be homogeneous. For example, all values of type int*real are pairs whose ﬁrst component is an integer and whose second component is a real. Any type-correct pattern will match any value of that type; there is no possibility of failure of pattern matching. The pattern (x:int,y:real) is of type int*real and hence will match any value of that type. On the other hand the pattern (x:int,y:real,z:string) is of type int*real*string and cannot be used to match against values of type int*real; attempting to do so fails at compile time.
Other types have values of more than one form; they are said to be heterogeneous types. For example, a value of type int might be 0, 1, ~1, . . . or a value of type char might be #"a" or #"z". (Other examples of heterogeneous types will arise later on.) Corresponding to each of the values of these types is a pattern that matches only that value. Attempting to match any other value against that pattern fails at execution time with an error condition called a bind failure.
Here are some examples of pattern-matching against values of a heterogeneous type:
val 0 = 1-1
val (0,x) = (1-1, 34)
val (0, #"0") = (2-1, #"0")
The ﬁrst two bindings succeed, the third fails. In the case of the second, the variable x is bound to 34 after the match. No variables are bound in the ﬁrst or third examples.
6.2 Clausal Function Expressions
The importance of constant patterns becomes clearer once we consider how to deﬁne functions over heterogeneous types. This is achieved in ML using a clausal function expression whose general form is
fn pat1 => exp1
 | ...
 | patn => expn
Each pati is a pattern and each expi is an expression involving the variables of the pattern pati. Each component pat=>exp is called a clause, or a rule. The entire assembly of rules is called a match.
The typing rules for matches ensure consistency of the clauses. Speciﬁcally, there must exist types typ1 and typ2 such that
1. Each pattern pati has type typ1.
2. Each expression expi has type typ2, given the types of the variables in pattern pati.
If these requirements are satisﬁed, the function has the type typ1 -> typ2.
Application of a clausal function to a value val proceeds by considering the clauses in the order written. At stage i, where 1 ≤ i ≤ n, the argument value val is matched against the pattern pati; if the pattern match succeeds, evaluation continues with the evaluation of expression expi, with the variables of pati replaced by their values as determined by pattern matching. Otherwise we proceed to stage i+1. If no pattern matches (i.e., we reach stage n+1), then the application fails with an execution error called a match failure.
Here's an example. Consider the following clausal function:
val recip : int -> int =
  fn 0 => 0 | n:int => 1 div n
This deﬁnes an integer-valued reciprocal function on the integers, where the reciprocal of 0 is arbitrarily deﬁned to be 0. The function has two clauses, one for the argument 0, the other for non-zero arguments n. (Note that n is guaranteed to be non-zero because the patterns are considered in order: we reach the pattern n:int only if the argument fails to match the pattern 0.)
The fun notation is also generalized so that we may deﬁne recip using the following more concise syntax:
fun recip 0 = 0
  | recip (n:int) = 1 div n
One annoying thing to watch out for is that the fun form uses an equal sign to separate the pattern from the expression in a clause, whereas the fn form uses a double arrow.
Case analysis on the values of a heterogeneous type is performed by
application of a clausallydeﬁned function. The notation
case exp
of pat
1
=> exp
1
 ...
 pat
n
=> exp
n
is short for the application
(fn pat
1
=> exp
1
 ...
 pat
n
=> exp
n
)
exp.
Evaluation proceeds by first evaluating exp, then matching its value successively against the patterns in the match until one succeeds, and continuing with evaluation of the corresponding expression. The case expression fails if no pattern matches the value.
6.3 Booleans and Conditionals, Revisited
The type bool of booleans is perhaps the most basic example of a heterogeneous type. Its values are true and false. Functions may be defined on booleans using clausal definitions that match against the patterns true and false.
For example, the negation function may be deﬁned clausally as follows:
fun not true = false
  | not false = true
The conditional expression
if exp then exp_1 else exp_2
is shorthand for the case analysis
case exp
  of true => exp_1
   | false => exp_2
which is itself shorthand for the application
(fn true => exp_1 | false => exp_2) exp.
The “short-circuit” conjunction and disjunction operations are defined as follows. The expression exp_1 andalso exp_2 is short for
if exp_1 then exp_2 else false
and the expression exp_1 orelse exp_2 is short for
if exp_1 then true else exp_2.
You should expand these into case expressions and check that they behave as expected. Pay particular attention to the evaluation order, and observe that the call-by-value principle is not violated by these expressions.
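Carrying out this expansion for conjunction, using the abbreviations above, gives (schematically, with exp_1 and exp_2 standing for arbitrary boolean expressions):

```sml
(fn true => exp_2 | false => false) exp_1
```

Since application evaluates the argument first, exp_1 is evaluated exactly once, and exp_2 is evaluated only when exp_1 yields true; this is consistent with call-by-value.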
6.4 Exhaustiveness and Redundancy
Matches are subject to two forms of “sanity check” as an aid to the ML programmer. The first, called exhaustiveness checking, ensures that a well-formed match covers its domain type in the sense that every value of the domain must match one of its clauses. The second, called redundancy checking, ensures that no clause of a match is subsumed by the clauses that precede it. This means that the set of values covered by a clause in a match must not be contained entirely within the set of values covered by the preceding clauses of that match.
Redundant clauses are always a mistake: such a clause can never be executed. Redundant rules often arise accidentally. For example, the second rule of the following clausal function definition is redundant:
fun not True = false
  | not False = true
By capitalizing True we have turned it into a variable, rather than a constant pattern. Consequently, every value matches the first clause, rendering the second redundant.
Since the clauses of a match are considered in the order they are written, redundancy checking is correspondingly order-sensitive. In particular, changing the order of clauses in a well-formed, irredundant match can make it redundant, as in the following example:
fun recip (n:int) = 1 div n
  | recip 0 = 0
The second clause is redundant because the ﬁrst matches any integer value,
including 0.
Inexhaustive matches may or may not be in error, depending on whether
the match might ever be applied to a value that is not covered by any
clause. Here is an example of a function with an inexhaustive match that is
plausibly in error:
fun is_numeric #"0" = true
  | is_numeric #"1" = true
  | is_numeric #"2" = true
  | is_numeric #"3" = true
  | is_numeric #"4" = true
  | is_numeric #"5" = true
  | is_numeric #"6" = true
  | is_numeric #"7" = true
  | is_numeric #"8" = true
  | is_numeric #"9" = true
When applied to, say, #"a", this function fails. Indeed, the function never
returns false for any argument!
Perhaps what was intended here is to include a catch-all clause at the end:
fun is_numeric #"0" = true
  | is_numeric #"1" = true
  | is_numeric #"2" = true
  | is_numeric #"3" = true
  | is_numeric #"4" = true
  | is_numeric #"5" = true
  | is_numeric #"6" = true
  | is_numeric #"7" = true
  | is_numeric #"8" = true
  | is_numeric #"9" = true
  | is_numeric _ = false
The addition of a final catch-all clause renders the match exhaustive, because any value not matched by the first ten clauses will surely be matched by the eleventh.
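As an aside, the Standard ML Basis Library provides a predicate with the same behavior, so in practice one might simply write (assuming an implementation that supplies the Basis Library):

```sml
fun is_numeric (c:char) : bool = Char.isDigit c
```

This version has a single clause covering all characters, so it is trivially exhaustive.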
Having said that, it is a very bad idea to simply add a catch-all clause to the end of every match to suppress inexhaustiveness warnings from the compiler. The exhaustiveness checker is your friend! Each such warning is a suggestion to double-check that match to be sure that you’ve not made a silly error of omission, but rather have intentionally left out cases that are ruled out by the invariants of the program. In chapter 10 we will see that the exhaustiveness checker is an extremely valuable tool for managing code evolution.
Chapter 7
Recursive Functions
So far we’ve only considered very simple functions (such as the reciprocal function) whose value is computed by a simple composition of primitive functions. In this chapter we introduce recursive functions, the principal means of iterative computation in ML. Informally, a recursive function is one that computes the result of a call by possibly making further calls to itself. Obviously, to avoid infinite regress, some calls must return their results without making any recursive calls. Those that do make recursive calls must ensure that the arguments are, in some sense, “smaller”, so that the process will eventually terminate.
This informal description obscures a central point, namely the means by which we may convince ourselves that a function computes the result that we intend. In general we must prove that, for all inputs of the domain type, the body of the function computes the “correct” value of result type. Usually the argument imposes some additional assumptions on the inputs, called the preconditions. The correctness requirement for the result is called a postcondition. Our burden is to prove that for every input satisfying the preconditions, the body evaluates to a result satisfying the postcondition. In fact we may carry out such an analysis for many different pre- and postcondition pairs, according to our interest. For example, the ML type checker proves that the body of a function yields a value of the range type (if it terminates) whenever it is given an argument of the domain type. Here the domain type is the precondition, and the range type is the postcondition. In most cases we are interested in deeper properties, examples of which we shall consider below.
To prove the correctness of a recursive function (with respect to given pre- and postconditions) it is typically necessary to use some form of inductive reasoning. The base cases of the induction correspond to those cases that make no recursive calls; the inductive step corresponds to those that do. The beauty of inductive reasoning is that we may assume that the recursive calls work correctly when showing that a case involving recursive calls is correct. We must separately show that the base cases satisfy the given pre- and postconditions. Taken together, these two steps are sufficient to establish the correctness of the function itself, by appeal to an induction principle that justifies the particular pattern of recursion.
No doubt this all sounds fairly theoretical. The point of this chapter is
to show that it is also profoundly practical.
7.1 SelfReference and Recursion
In order for a function to “call itself”, it must have a name by which it can refer to itself. This is achieved by using a recursive value binding, which is an ordinary value binding qualified by the keyword rec. The simplest form of a recursive value binding is as follows:
val rec var : typ = val
As in the non-recursive case, the left-hand side is a pattern, but here the right-hand side must be a value. In fact the right-hand side must be a function expression, since only functions may be defined recursively in ML. The function may refer to itself by using the variable var.
Here’s an example of a recursive value binding:
val rec factorial : int -> int =
  fn 0 => 1 | n:int => n * factorial (n-1)
Using fun notation we may write the deﬁnition of factorial much more
clearly and concisely as follows:
fun factorial 0 = 1
  | factorial (n:int) = n * factorial (n-1)
There is obviously a close correspondence between this formulation of factorial and the usual textbook definition of the factorial function in terms of recursion equations:
0! = 1
n! = n × (n − 1)!   (n > 0)
Recursive value bindings are typechecked in a manner that may, at ﬁrst
glance, seem paradoxical. To check that the binding
val rec var : typ = val
is well-formed, we ensure that the value val has type typ, assuming that var has type typ. Since var refers to the value val itself, we are in effect assuming what we intend to prove while proving it!
(Incidentally, since val is required to be a function expression, the type
typ will always be a function type.)
Let’s look at an example. To check that the binding for factorial given above is well-formed, we assume that the variable factorial has type int->int, then check that its definition, the function
fn 0 => 1 | n:int => n * factorial (n-1),
has type int->int. To do so we must check that each clause has type int->int by checking for each clause that its pattern has type int and that its expression has type int. This is clearly true for the first clause of the definition. For the second, we assume that n has type int, then check that n * factorial (n-1) has type int. This is so because of the rules for the primitive arithmetic operations and because of our assumption that factorial has type int->int.
How are applications of recursive functions evaluated? The rules are
almost the same as before, with one modiﬁcation. We must arrange that
all occurrences of the variable standing for the function are replaced by the
function itself before we evaluate the body. That way all references to the
variable standing for the function itself are indeed references to the function
itself!
Suppose that we have the following recursive function binding
val rec var : typ =
  fn pat_1 => exp_1 | ... | pat_n => exp_n
and we wish to apply var to the value val of type typ. As before, we consider each clause in turn, until we find the first pattern pat_i matching val. We proceed, as before, by evaluating exp_i, replacing the variables in pat_i by the bindings determined by pattern matching, but, in addition, we replace all occurrences of the variable var by its binding in exp_i before continuing evaluation.
For example, to evaluate factorial 3, we proceed by retrieving the
binding of factorial and evaluating
(fn 0 => 1 | n:int => n * factorial (n-1)) (3).
Considering each clause in turn, we find that the first doesn’t match, but the second does. We therefore continue by evaluating its right-hand side, the expression n * factorial (n-1), after replacing n by 3 and factorial by its definition. We are left with the subproblem of evaluating the expression
3 * (fn 0 => 1 | n:int => n * factorial (n-1)) (2)
Proceeding as before, we reduce this to the subproblem of evaluating
3 * (2 * (fn 0 => 1 | n:int => n * factorial (n-1)) (1)),
which reduces to the subproblem of evaluating
3 * (2 * (1 * (fn 0 => 1 | n:int => n * factorial (n-1)) (0))),
which reduces to
3 * (2 * (1 * 1)),
which then evaluates to 6, as desired.
Observe that the repeated substitution of factorial by its definition ensures that the recursive calls really do refer to the factorial function itself. Also observe that the size of the subproblems grows until there are no more recursive calls, at which point the computation can complete. In broad outline, the computation proceeds as follows:
1. factorial 3
2. 3 * factorial 2
3. 3 * 2 * factorial 1
4. 3 * 2 * 1 * factorial 0
5. 3 * 2 * 1 * 1
6. 3 * 2 * 1
7. 3 * 2
8. 6
Notice that the size of the expression ﬁrst grows (in direct proportion to
the argument), then shrinks as the pending multiplications are completed.
This growth in expression size corresponds directly to a growth in runtime
storage required to record the state of the pending computation.
7.2 Iteration
The definition of factorial given above should be contrasted with the following two-part definition:
fun helper (0, r:int) = r
  | helper (n:int, r:int) = helper (n-1, n*r)
fun factorial (n:int) = helper (n, 1)
First we define a “helper” function that takes two parameters, an integer argument and an accumulator that records the running partial result of the computation. The idea is that the accumulator reassociates the pending multiplications in the evaluation trace given above so that they can be performed prior to the recursive call, rather than after it completes. This reduces the space required to keep track of those pending steps. Second, we define factorial by calling helper with argument n and initial accumulator value 1, corresponding to the product of zero terms (the empty prefix).
As a matter of programming style, it is usual to conceal the deﬁnitions
of helper functions using a local declaration. In practice we would make
the following deﬁnition of the iterative version of factorial:
local
fun helper (0, r:int) = r
  | helper (n:int, r:int) = helper (n-1, n*r)
in
fun factorial (n:int) = helper (n,1)
end
This way the helper function is not visible; only the function of interest is “exported” by the declaration.
The important thing to observe about helper is that it is iterative, or
tail recursive, meaning that the recursive call is the last step of evaluation of
an application of it to an argument. This means that the evaluation trace of
a call to helper with arguments (3,1) has the following general form:
1. helper (3, 1)
2. helper (2, 3)
3. helper (1, 6)
4. helper (0, 6)
5. 6
Notice that there is no growth in the size of the expression because there are no pending computations to be resumed upon completion of the recursive call. Consequently, there is no growth in the space required for an application, in contrast to the first definition given above. Tail recursive definitions are analogous to loops in imperative languages: they merely iterate a computation, without requiring auxiliary storage.
7.3 Inductive Reasoning
Time and space usage are important, but what is more important is that
the function compute the intended result. The key to the correctness of
a recursive function is an inductive argument establishing its correctness.
The critical ingredients are these:
1. An input-output specification of the intended behavior, stating preconditions on the arguments and a postcondition on the result.
2. A proof that the speciﬁcation holds for each clause of the function,
assuming that it holds for any recursive calls.
3. An induction principle that justiﬁes the correctness of the function as a
whole, given the correctness of its clauses.
We’ll illustrate the use of inductive reasoning by a graduated series of examples. First consider the simple, non-tail-recursive definition of factorial given in section 7.1. One reasonable specification for factorial is as follows:
1. Precondition: n ≥ 0.
2. Postcondition: factorial n evaluates to n!.
We are to establish the following statement of correctness of factorial:
if n ≥ 0, then factorial n evaluates to n!.
That is, we show that the preconditions imply that the postcondition holds of the result of any application. This is called a total correctness assertion because it states not only that the postcondition holds of any result of application, but, moreover, that every application in fact yields a result (subject to the precondition on the argument).
In contrast, a partial correctness assertion does not insist on termination,
only that the postcondition holds whenever the application terminates.
This may be stated as the assertion
if n ≥ 0 and factorial n evaluates to p, then p = n!.
Notice that this statement is true of a function that diverges whenever it is
applied! In this sense a partial correctness assertion is weaker than a total
correctness assertion.
Let us establish the total correctness of factorial using the pre- and postconditions stated above. To do so, we apply the principle of mathematical induction on the argument n. Recall that this means we are to establish the specification for the case n = 0, and, assuming it to hold for some n ≥ 0, show that it holds for n + 1. The base case, n = 0, is trivial: by definition factorial n evaluates to 1, which is 0!. Now suppose that n = m + 1 for some m ≥ 0. By the inductive hypothesis we have that factorial m evaluates to m! (since m ≥ 0), and so by definition factorial n evaluates to
n × m! = (m + 1) × m! = (m + 1)! = n!,
as required. This completes the proof.
That was easy. What about the iterative definition of factorial? We focus on the behavior of helper. A suitable specification is given as follows:
1. Precondition: n ≥ 0.
2. Postcondition: helper (n, r) evaluates to n! × r.
To show the total correctness of helper with respect to this speciﬁcation,
we once again proceed by mathematical induction on n. We leave it as an
exercise to give the details of the proof.
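In outline, the inductive step of that exercise runs as follows: assuming that helper (m, r) evaluates to m! × r for every r, a call helper (m+1, r) reduces, by the second clause of helper, to helper (m, (m+1) × r), and so evaluates to

m! × ((m + 1) × r) = (m + 1)! × r,

as the specification requires.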
With this in hand it is easy to prove the correctness of factorial: if n ≥ 0 then factorial n evaluates to the result of helper (n, 1), which evaluates to n! × 1 = n!. This completes the proof.
Helper functions correspond to lemmas; main functions correspond to theorems. Just as we use lemmas to help us prove theorems, we use helper functions to help us define main functions. The foregoing argument shows that this is more than an analogy: it lies at the heart of good programming style.
Here’s an example of a function defined by complete induction (or strong induction), the Fibonacci function, defined on integers n ≥ 0:
(* for n >= 0, fib n yields the nth Fibonacci number *)
fun fib 0 = 1
  | fib 1 = 1
  | fib (n:int) = fib (n-1) + fib (n-2)
The recursive calls are made not only on n-1, but also n-2, which is why we must appeal to complete induction to justify the definition. This definition of fib is very inefficient because it performs many redundant computations: to compute fib n requires that we compute fib (n-1) and fib (n-2). To compute fib (n-1) requires that we compute fib (n-2) a second time, and fib (n-3). Computing fib (n-2) requires computing fib (n-3) again, and fib (n-4). As you can see, there is considerable redundancy here. It can be shown that the running time of fib is exponential in its argument, which is quite awful.
Here’s a better solution: for each n ≥ 0 compute not only the nth Fibonacci number, but also the (n − 1)st as well. (For n = 0 we define the “−1st” Fibonacci number to be zero.) That way we can avoid redundant recomputation, resulting in a linear-time algorithm. Here’s the code:
(* for n >= 0, fib' n evaluates to (a, b), where
   a is the nth Fibonacci number, and
   b is the (n-1)st *)
fun fib' 0 = (1, 0)
  | fib' 1 = (1, 1)
  | fib' (n:int) =
    let
      val (a:int, b:int) = fib' (n-1)
    in
      (a+b, a)
    end
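If only the nth Fibonacci number itself is wanted, the original interface can be recovered by projecting the first component of the pair returned by fib'; a small sketch:

```sml
fun fib (n:int) : int = #1 (fib' n)
```

This version runs in time linear in n, in contrast to the exponential-time definition given earlier.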
You might feel satisfied with this solution since it runs in time linear in n. It turns out (see Graham, Knuth, and Patashnik, Concrete Mathematics (Addison-Wesley, 1989) for a derivation) that the recurrence
F_0 = 1
F_1 = 1
F_n = F_{n-1} + F_{n-2}
has a closed-form solution over the real numbers. This means that the nth Fibonacci number can be calculated directly, without recursion, by using floating point arithmetic. However, this is an unusual case. In most instances recursively-defined functions have no known closed-form solution, so that some form of iteration is inevitable.
7.4 Mutual Recursion
It is often useful to define two functions simultaneously, each of which calls the other (and possibly itself) to compute its result. Such functions are said to be mutually recursive. Here’s a simple example to illustrate the point, namely testing whether a natural number is odd or even. The most obvious approach is to test whether the number is congruent to 0 mod 2, and indeed this is what one would do in practice. But to illustrate the idea of mutual recursion we instead use the following inductive characterization: 0 is even, and not odd; n > 0 is even iff n − 1 is odd; n > 0 is odd iff n − 1 is even. This may be coded up using two mutually-recursive procedures as follows:
fun even 0 = true
  | even n = odd (n-1)
and odd 0 = false
  | odd n = even (n-1)
Notice that even calls odd and odd calls even, so they are not deﬁnable
separately from one another. We join their deﬁnitions using the keyword
and to indicate that they are deﬁned simultaneously by mutual recursion.
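Tracing a call shows the two functions passing control back and forth; each step below follows one clause of the definitions above:

1. even 3
2. odd 2
3. even 1
4. odd 0
5. false

Like helper in the previous section, these definitions are tail recursive, so the trace does not grow.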
Chapter 8
Type Inference and Polymorphism
8.1 Type Inference
So far we’ve mostly written our programs in what is known as the explicitly-typed style. This means that whenever we’ve introduced a variable, we’ve assigned it a type at its point of introduction. In particular, every variable in a pattern has a type associated with it. As you may have noticed, this gets a little tedious after a while, especially when you’re using clausal function definitions. A particularly pleasant feature of ML is that it allows you to omit this type information whenever it can be determined from context. This process is known as type inference, since the compiler is inferring the missing type information based on context.
For example, there is no need to give a type to the variable s in the
function
fn s:string => s ^ "\n".
The reason is that no other type for s makes sense, since s is used as an
argument to string concatenation. Consequently, you may write simply
fn s => s ^ "\n",
leaving ML to insert “:string” for you.
When is it allowable to omit this information? Almost always, with very few exceptions. It is a deep, and important, result about ML that missing type information can (almost) always be reconstructed completely and
unambiguously where it is omitted. This is called the principal typing property of ML: whenever type information is omitted, there is always a most general (i.e., least restrictive) way to recover the omitted type information. If there is no way to recover the omitted type information, then the expression is ill-typed. Otherwise there is a “best” way to fill in the blanks, which will (almost) always be found by the compiler. This is an amazingly useful, and widely underappreciated, property of ML. It means, for example, that the programmer can enjoy the full benefits of a static type system without paying any notational penalty whatsoever!
The prototypical example is the identity function, fn x => x. The body of the function places no constraints on the type of x, since it merely returns x as the result without performing any computation on it. Since the behavior of the identity function is the same for all possible choices of type for its argument, it is said to be polymorphic. Therefore the identity function has infinitely many types, one for each choice of the type of the parameter x. Choosing the type of x to be typ, the type of the identity function is typ->typ. In other words, every type for the identity function has the form typ->typ, where typ is the type of the argument.
Clearly there is a pattern here, which is captured by the notion of a type scheme. A type scheme is a type expression involving one or more type variables standing for an unknown, but arbitrary, type expression. Type variables are written ’a (pronounced “α”), ’b (pronounced “β”), ’c (pronounced “γ”), and so on. An instance of a type scheme is obtained by replacing each of the type variables occurring in it with a type scheme, with the same type scheme replacing each occurrence of a given type variable. For example, the type scheme ’a->’a has instances int->int, string->string, (int*int)->(int*int), and (’b->’b)->(’b->’b), among infinitely many others. However, it does not have the type int->string as an instance, since we are constrained to replace all occurrences of a type variable by the same type scheme. By contrast, the type scheme ’a->’b has both int->int and int->string as instances, since there are different type variables occurring in the domain and range positions.
Type schemes are used to express the polymorphic behavior of functions. For example, we may write fn x:’a => x for the polymorphic identity function of type ’a->’a, meaning that the behavior of the identity function is independent of the type of x. Similarly, the behavior of the function fn (x,y) => x+1 is independent of the type of y, but constrains the type of x to be int. This may be expressed using type schemes by writing this function in the explicitly-typed form fn (x:int, y:’a) => x+1 with type int*’a->int.
In these examples we needed only one type variable to express the polymorphic behavior of a function, but usually we need more than one. For example, the function fn (x,y) => x constrains neither the type of x nor the type of y. Consequently we may choose their types freely and independently of one another. This may be expressed by writing this function in the form fn (x:’a, y:’b) => x with type scheme ’a*’b->’a. Notice that while it is correct to assign the type ’a*’a->’a to this function, doing so would be overly restrictive, since the types of the two parameters need not be the same. However, we could not assign the type ’a*’b->’c to this function because the type of the result must be the same as the type of the first parameter: it returns its first parameter when invoked! The type scheme ’a*’b->’a precisely captures the constraints that must be satisfied for the function to be type correct. It is said to be the most general or principal type scheme for the function.
It is a remarkable fact about ML that every expression (with the exception of a few pesky examples that we’ll discuss below) has a principal type scheme. That is, there is (almost) always a best or most general way to infer types for expressions that maximizes generality, and hence maximizes flexibility in the use of the expression. Every expression “seeks its own depth” in the sense that an occurrence of that expression is assigned a type that is an instance of its principal type scheme determined by the context of use.
For example, if we write
(fn x => x)(0),
the context forces the type of the identity function to be int->int, and if we write
(fn x => x)(fn x => x)(0)
the context forces the instance (int->int)->(int->int) of the principal type scheme for the identity at the first occurrence, and the instance int->int for the second.
How is this achieved? Type inference is a process of constraint satisfaction. First, the expression determines a set of equations governing the relationships among the types of its subexpressions. For example, if a function is applied to an argument, then a constraint equating the domain type of the function with the type of the argument is generated. Second, the constraints are solved using a process similar to Gaussian elimination, called unification. The equations can be classified by their solution sets as follows:
1. Overconstrained: there is no solution. This corresponds to a type error.
2. Underconstrained: there are many solutions. There are two subcases: ambiguous (due to overloading, which we will discuss further in section 8.3), or polymorphic (there is a “best” solution).
3. Uniquely determined: there is precisely one solution. This corresponds to a completely unambiguous type inference problem.
The free type variables in the solution to the system of equations may be thought of as determining the “degrees of freedom” or “range of polymorphism” of the type of an expression: the constraints are solvable for any choice of types to substitute for these free type variables.
This description of type inference as a constraint satisfaction procedure
accounts for the notorious obscurity of type checking errors in ML. If a
program is not type correct, then the system of constraints associated with
it will not have a solution. The type inference procedure attempts to ﬁnd
a solution to these constraints, and at some point discovers that it cannot
succeed. It is fundamentally impossible to attribute this inconsistency to
any particular constraint; all that can be said is that the constraint set as a
whole has no solution. The checker usually reports the ﬁrst unsatisﬁable
equation it encounters, but this may or may not correspond to the “reason”
(in the mind of the programmer) for the type error. The usual method for
ﬁnding the error is to insert sufﬁcient type information to narrow down the
source of the inconsistency until the source of the difﬁculty is uncovered.
8.2 Polymorphic Deﬁnitions
There is an important interaction between polymorphic expressions and value bindings that may be illustrated by the following example. Suppose that we wish to bind the identity function to a variable I so that we may refer to it by name. We’ve previously observed that the identity function is polymorphic, with principal type scheme ’a->’a. This may be captured by ascribing this type scheme to the variable I at the val binding. That is, we may write
val I : ’a->’a = fn x => x
to ascribe the type scheme ’a->’a to the variable I. (We may also write
fun I(x:’a):’a = x
for an equivalent binding of I.) Having done this, each use of I determines a distinct instance of the ascribed type scheme ’a->’a. That is, both I 0 and I
I 0 are well-formed expressions, the first assigning the type int->int to I, the second assigning the types
(int->int)->(int->int)
and
int->int
to the two occurrences of I. Thus the variable I behaves precisely the same as its definition, fn x => x, in any expression where it is used.
As a convenience, ML also provides a form of type inference on value bindings that eliminates the need to ascribe a type scheme to the variable when it is bound. If no type is ascribed to a variable introduced by a val binding, then it is implicitly ascribed the principal type scheme of the right-hand side. For example, we may write
val I = fn x => x
or
fun I(x) = x
as a binding for the variable I. The type checker implicitly assigns the principal type scheme, ’a->’a, of the binding to the variable I. In practice we often allow the type checker to infer the principal type of a variable, but it is often good form to explicitly indicate the intended type as a consistency check and for documentation purposes.
The treatment of val bindings ensures that a bound variable has precisely the same types as its binding. In other words, the type checker behaves as though all uses of the bound variable are implicitly replaced by its binding before type checking. Since this may involve replication of the binding, the meaning of a program is not necessarily preserved by this transformation. (Think, for example, of any expression that opens a window on your screen: if you replicate the expression and evaluate it twice, it will open two windows. This is not the same as evaluating it only once, which results in one window.)
To ensure semantic consistency, variables introduced by a val binding are allowed to be polymorphic only if the right-hand side is a value. This is called the value restriction on polymorphic declarations. For fun bindings this restriction is always met, since the right-hand side is implicitly a lambda expression, which is a value. However, it might be thought that the following declaration introduces a polymorphic variable of type ’a->’a, but in fact it is rejected by the compiler:
val J = I I
The reason is that the right-hand side is not a value; it requires computation to determine its value. It is therefore ruled out as inadmissible for polymorphism; the variable J may not be used polymorphically in the remainder of the program. In this case the difficulty may be avoided by writing instead
fun J x = I I x
because now the binding of J is a lambda expression, which is a value.
In some rare circumstances this is not possible, and some polymorphism is lost. For example, the following declaration of a value of list type (lists are introduced in chapter 9),
val l = nil @ nil,
does not introduce an identifier with a polymorphic type, even though the almost equivalent declaration
val l = nil
does do so. Since the right-hand side is a list, we cannot apply the “trick” of defining l to be a function; we are stuck with a loss of polymorphism in this case. This particular example is not very impressive, but occasionally similar examples do arise in practice.
Why is the value restriction necessary? Later on, when we study mutable storage, we'll see that some restriction on polymorphism is essential if the language is to be type safe. The value restriction is an easily remembered sufficient condition for soundness, but as the examples above illustrate, it is by no means necessary. The designers of ML were faced with a choice of simplicity vs. flexibility; in this case they opted for simplicity at the expense of some expressiveness in the language.
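To make the tradeoff concrete, here is a small sketch (the names I, J, and Jint are illustrative) showing both the eta-expansion trick and an alternative that gives up polymorphism by committing to a single type:

```sml
(* The polymorphic identity function. *)
val I : 'a -> 'a = fn x => x

(* val J = I I  -- rejected for polymorphism: the right-hand
   side is an application, not a value. *)

(* Workaround 1: eta-expand, so the binding is a lambda (a value). *)
fun J x = I I x

(* Workaround 2: give up polymorphism and fix a single type. *)
val Jint : int -> int = I I
```

The second workaround is always available, since the value restriction only forbids generalizing the type of a non-value binding, not assigning it one specific type.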
8.3 Overloading
Type information cannot always be omitted. There are a few corner cases that create problems for type inference, most of which arise because of concessions that are motivated by long-standing, if dubious, notational practices.
The main source of difficulty stems from overloading of arithmetic operators. As a concession to long-standing practice in informal mathematics
and in many programming languages, the same notation is used for both integer and floating point arithmetic operations. As long as we are programming in an explicitly-typed style, this convention creates no particular problems. For example, in the function
fn x:int => x+x
it is clear that integer addition is called for, whereas in the function
fn x:real => x+x
it is equally obvious that ﬂoating point addition is intended.
However, if we omit type information, then a problem arises. What are
we to make of the function
fn x => x+x ?
Does "+" stand for integer or floating point addition? There are two distinct reconstructions of the missing type information in this example, corresponding to the preceding two explicitly-typed programs. Which is the compiler to choose?
When presented with such a program, the compiler has two choices:
1. Declare the expression ambiguous, and force the programmer to provide enough explicit type information to resolve the ambiguity.
2. Arbitrarily choose a "default" interpretation, say integer arithmetic, that forces one interpretation or the other.
Each approach has its advantages and disadvantages. Many compilers choose the second approach, but issue a warning indicating that they have done so. To avoid ambiguity, explicit type information is required from the programmer.
The situation is actually a bit more subtle than the preceding discussion implies. The reason is that the type inference process makes use of the surrounding context of an expression to help resolve ambiguities. For example, if the expression fn x=>x+x occurs in the following, larger expression, there is in fact no ambiguity:

(fn x => x+x)(3).
Since the function is applied to an integer argument, there is no question
that the only possible resolution of the missing type information is to treat
x as having type int, and hence to treat + as integer addition.
The important question is: how much context is considered before the expression is deemed ambiguous? The rule of thumb is that context is considered up to the nearest enclosing function declaration. For example, consider the following declaration:
let
val double = fn x => x+x
in
(double 3, double 4)
end
The function expression fn x=>x+x will be flagged as ambiguous, even though its only uses are with integer arguments. The reason is that value bindings are considered to be "units" of type inference for which all ambiguity must be resolved before type checking continues. If your compiler adopts the integer interpretation as default, the above program will be accepted (with a warning), but the following one will be rejected:
let
val double = fn x => x+x
in
(double 3.0, double 4.0)
end
Finally, note that the following program must be rejected because no
resolution of the overloading of addition can render it meaningful:
let
val double = fn x => x+x
in
(double 3, double 3.0)
end
The ambiguity must be resolved at the val binding, which means that the
compiler must commit at that point to treating the addition operation as
either integer or ﬂoating point. No single choice can be correct, since we
subsequently use double at both types.
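One way to repair the rejected program is to split double into two explicitly-typed bindings, one per intended type (the names doubleInt and doubleReal are illustrative):

```sml
let
  val doubleInt  = fn x : int  => x+x
  val doubleReal = fn x : real => x+x
in
  (doubleInt 3, doubleReal 3.0)
end
```

Each binding now resolves its own overloading of +, so no ambiguity remains at either val binding.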
A closely-related source of ambiguity arises from the "record elision" notation described in chapter 5. Consider the function #name, defined by

fun #name {name=n:string, ...} = n
which selects the name field of a record. This definition is ambiguous because the compiler cannot uniquely determine the domain type of the function! Any of the following types are legitimate domain types for #name, none of which is "best":
{name:string}
{name:string, salary:real}
{name:string, salary:int}
{name:string, address:string}
Of course there are infinitely many such examples, none of which is clearly preferable to the others. This function definition is therefore rejected as ambiguous by the compiler; there is no one interpretation of the function that suffices for all possible uses.
In chapter 5 we mentioned that functions such as #name are predefined by the ML compiler, yet we just now claimed that such a function definition is rejected as ambiguous. Isn't this a contradiction? Not really, because what happens is that each occurrence of #name is replaced by the function

fn {name=n, ...} => n
and then context is used to resolve the "local" ambiguity. This works well, provided that the complete record type of the arguments to #name can be determined from context. If not, the uses are rejected as ambiguous. Thus, the following expression is well-typed

fn r : {name:string, address:string, salary:int} =>
  (#name r, #address r)

because the record type of r is explicitly given. If the type of r were omitted, the expression would be rejected as ambiguous (unless the context resolves the ambiguity).
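A type abbreviation is often the most convenient way to supply the needed record type; the following sketch (the type and field names are illustrative) ascribes it to the function's argument:

```sml
type person = {name : string, address : string, salary : int}

(* The ascription on r fixes the complete record type, so #name
   and #address are unambiguous. *)
fun summarize (r : person) =
  #name r ^ " lives at " ^ #address r
```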
Chapter 9
Programming with Lists
9.1 List Primitives
In chapter 5 we noted that aggregate data structures are especially easy to handle in ML. In this chapter we consider another important aggregate type, the list type. In addition to being an important form of aggregate type, lists illustrate two other general features of the ML type system:
1. Type constructors, or parameterized types. The type of a list reveals the type of its elements.
2. Recursive types. The set of values of a list type is given by an inductive definition.
Informally, the values of type typ list are the finite lists of values of type typ. More precisely, the values of type typ list are given by an inductive definition, as follows:
1. nil is a value of type typ list.
2. If h is a value of type typ, and t is a value of type typ list, then h::t is a value of type typ list.
3. Nothing else is a value of type typ list.
The type expression typ list is a postfix notation for the application of the type constructor list to the type typ. Thus list is a kind of "function" mapping types to types: given a type typ, we may apply list to it to get another type, written typ list. The forms nil and :: are the value constructors of type typ list. The nullary (no argument) constructor nil
may be thought of as the empty list. The binary (two argument) constructor :: constructs a non-empty list from a value h of type typ and another value t of type typ list; the resulting value, h::t, of type typ list is pronounced "h cons t" (for historical reasons). We say that "h is cons'd onto t", that h is the head of the list, and that t is its tail.
The definition of the values of type typ list given above is an example of an inductive definition. The type is said to be recursive because this definition is "self-referential" in the sense that the values of type typ list are defined in terms of (other) values of the same type. This is especially clear if we examine the types of the value constructors for the type typ list:

val nil : typ list
val (op ::) : typ * typ list -> typ list

The notation op :: is used to refer to the :: operator as a function, rather than to use it to form a list, which requires infix notation.
Two things are notable here:
1. The :: operation takes as its second argument a value of type typ list, and yields a result of type typ list. This self-referential aspect is characteristic of an inductive definition.
2. Both nil and op :: are polymorphic in the type of the underlying elements of the list. Thus nil is the empty list of type typ list for any element type typ, and op :: constructs a non-empty list independently of the type of the elements of that list.
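For example, the same two constructors build lists at different element types:

```sml
val ints    = 1 :: 2 :: 3 :: nil          (* int list *)
val strings = "a" :: "b" :: nil           (* string list *)
val nested  = (1 :: nil) :: nil           (* int list list *)
```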
It is easy to see that a value val of type typ list has the form

val1 :: (val2 :: (... :: (valn :: nil)...))

for some n ≥ 0, where each vali is a value of type typ. For, according to the inductive definition of the values of type typ list, the value val must either be nil, which is of the above form, or val1 :: val', where val' is a value of type typ list. By induction val' has the form

val2 :: (... :: (valn :: nil)...)

and hence val again has the specified form.
By convention the operator :: is right-associative, so we may omit the parentheses and just write

val1 :: val2 :: ... :: valn :: nil
as the general form of val of type typ list. This may be further abbreviated using list notation, writing

[ val1, val2, ..., valn ]

for the same list. This notation emphasizes the interpretation of lists as finite sequences of values, but it obscures the fundamental inductive character of lists as being built up from nil using the :: operation.
9.2 Computing With Lists
How do we compute with values of list type? Since the values are defined inductively, it is natural that functions on lists be defined recursively, using a clausal definition that analyzes the structure of a list. Here's a definition of the function length that computes the number of elements of a list:

fun length nil = 0
  | length (_::t) = 1 + length t

The definition is given by induction on the structure of the list argument. The base case is the empty list, nil. The inductive step is the non-empty list _::t (notice that we do not need to give a name to the head). Its definition is given in terms of the tail of the list t, which is "smaller" than the list _::t. The type of length is 'a list -> int; it is defined for lists of values of any type whatsoever.
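Many other functions follow the same pattern of structural recursion; for example, a function summing a list of integers:

```sml
fun sum nil = 0
  | sum (h::t) = h + sum t

val six = sum [1, 2, 3]   (* 6 *)
```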
We may define other functions following a similar pattern. Here's the function to append two lists:

fun append (nil, l) = l
  | append (h::t, l) = h :: append (t, l)

This function is built into ML; it is written using infix notation as exp1 @ exp2. The running time of append is proportional to the length of the first list, as should be obvious from its definition.
Here's a function to reverse a list.

fun rev nil = nil
  | rev (h::t) = rev t @ [h]

Its running time is O(n^2), where n is the length of the argument list. This can be demonstrated by writing down a recurrence that defines the running time T(n) on a list of length n.

T(0) = O(1)
T(n+1) = T(n) + O(n)
Solving the recurrence we obtain the result T(n) = O(n^2).
Can we do better? Oddly, we can take advantage of the non-associativity of :: to give a tail-recursive definition of rev.

local
  fun helper (nil, a) = a
    | helper (h::t, a) = helper (t, h::a)
in
  fun rev' l = helper (l, nil)
end
The general idea of introducing an accumulator is the same as before, except that by reordering the applications of :: we reverse the list! The helper function reverses its first argument and prepends it to its second argument. That is, helper (l, a) evaluates to (rev l) @ a, where we assume here an independent definition of rev for the sake of the specification. Notice that helper runs in time proportional to the length of its first argument, and hence rev' runs in time proportional to the length of its argument.
The correctness of functions defined on lists may be established using the principle of structural induction. We illustrate this by establishing that the function helper satisfies the following specification:

for every l and a of type typ list, helper (l, a) evaluates to the result of appending a to the reversal of l.

That is, there are no preconditions on l and a, and we establish the postcondition that helper (l, a) yields (rev l) @ a.
The proof is by structural induction on the list l. If l is nil, then helper (l, a) evaluates to a, which fulfills the postcondition. If l is the list h::t, then the application helper (l, a) reduces to the value of helper (t, h::a). By the inductive hypothesis this is just (rev t) @ (h::a), which is equivalent to (rev t) @ [h] @ a. But this is just rev (h::t) @ a, which was to be shown.
The principle of structural induction may be summarized as follows. To show that a function works correctly for every list l, it suffices to show
1. the correctness of the function for the empty list, nil, and
2. the correctness of the function for h::t, assuming its correctness for t.
As with mathematical induction over the natural numbers, structural induction over lists allows us to focus on the basic and incremental behavior of a function to establish its correctness for all lists.
Chapter 10
Concrete Data Types
10.1 Datatype Declarations
Lists are one example of the general notion of a recursive type. ML provides a general mechanism, the datatype declaration, for introducing programmer-defined recursive types. Earlier we introduced type declarations as an abbreviation mechanism. Types are given names as documentation and as a convenience to the programmer, but doing so is semantically inconsequential: one could replace all uses of the type name by its definition and not affect the behavior of the program. In contrast the datatype declaration provides a means of introducing a new type that is distinct from all other types and that does not merely stand for some other type. It is the means by which the ML type system may be extended by the programmer.
The datatype declaration in ML has a number of facets. A datatype declaration introduces
1. One or more new type constructors. The type constructors introduced may, or may not, be mutually recursive.
2. One or more new value constructors for each of the type constructors introduced by the declaration.
The type constructors may take zero or more arguments; a zero-argument, or nullary, type constructor is just a type. Each value constructor may also take zero or more arguments; a nullary value constructor is just a constant.
The type and value constructors introduced by the declaration are "new" in the sense that they are distinct from all other type and value constructors previously introduced; if a datatype redefines an "old" type or value
constructor, then the old definition is shadowed by the new one, rendering it inaccessible in the scope of the new definition.
10.2 NonRecursive Datatypes
Here's a simple example of a nullary type constructor with four nullary value constructors.

datatype suit = Spades | Hearts | Diamonds | Clubs

This declaration introduces a new type suit with four nullary value constructors: Spades, Hearts, Diamonds, and Clubs. This declaration may be read as introducing a type suit such that a value of type suit is either Spades, or Hearts, or Diamonds, or Clubs. There is no significance to the ordering of the constructors in the declaration; we could just as well have written

datatype suit = Hearts | Diamonds | Spades | Clubs

(or any other ordering, for that matter). It is conventional to capitalize the names of value constructors, but this is not required by the language.
Given the declaration of the type suit, we may define functions on it by case analysis on the value constructors using a clausal function definition. For example, we may define the suit ordering in the card game of bridge by the function

fun outranks (Spades, Spades) = false
  | outranks (Spades, _) = true
  | outranks (Hearts, Spades) = false
  | outranks (Hearts, Hearts) = false
  | outranks (Hearts, _) = true
  | outranks (Diamonds, Clubs) = true
  | outranks (Diamonds, _) = false
  | outranks (Clubs, _) = false
This defines a function of type suit * suit -> bool that determines whether or not the first suit outranks the second.
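For instance (repeating the declarations so the sketch is self-contained):

```sml
datatype suit = Spades | Hearts | Diamonds | Clubs

fun outranks (Spades, Spades) = false
  | outranks (Spades, _) = true
  | outranks (Hearts, Spades) = false
  | outranks (Hearts, Hearts) = false
  | outranks (Hearts, _) = true
  | outranks (Diamonds, Clubs) = true
  | outranks (Diamonds, _) = false
  | outranks (Clubs, _) = false

val b1 = outranks (Spades, Hearts)    (* true *)
val b2 = outranks (Clubs, Diamonds)   (* false *)
```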
Data types may be parameterized by a type. For example, the declaration

datatype 'a option = NONE | SOME of 'a

introduces the unary type constructor 'a option with two value constructors: NONE, with no arguments, and SOME, with one. The values of type typ option are
1. The constant NONE, and
2. Values of the form SOME val, where val is a value of type typ.
For example, some values of type string option are NONE, SOME "abc", and SOME "def".
The option type constructor is predefined in Standard ML. One common use of option types is to handle functions with an optional argument. For example, here is a function to compute the base-b exponential function for natural number exponents that defaults to base 2:

fun expt (NONE, n) = expt (SOME 2, n)
  | expt (SOME b, 0) = 1
  | expt (SOME b, n) =
    if n mod 2 = 0 then
      expt (SOME (b*b), n div 2)
    else
      b * expt (SOME b, n-1)
The advantage of the option type in this sort of situation is that it avoids the need to make a special case of a particular argument, e.g., using 0 as the first argument to mean "use the default exponent".
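For example (repeating the definition so the sketch is self-contained):

```sml
fun expt (NONE, n) = expt (SOME 2, n)
  | expt (SOME b, 0) = 1
  | expt (SOME b, n) =
    if n mod 2 = 0 then
      expt (SOME (b*b), n div 2)
    else
      b * expt (SOME b, n-1)

val eight    = expt (NONE, 3)       (* 8: the default base 2 *)
val thousand = expt (SOME 10, 3)    (* 1000 *)
```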
A related use of option types is in aggregate data structures. For example, an address book entry might have a record type with fields for various bits of data about a person. But not all data is relevant to all people. For example, someone may not have a spouse, but everyone has a name. For this we might use a type definition of the form

type entry = { name:string, spouse:string option }

so that one would create an entry for an unmarried person with a spouse field of NONE.
Option types may also be used to represent an optional result. For example, we may wish to define a function reciprocal that returns the reciprocal of an integer, if it has one, and otherwise indicates that it has no reciprocal. This is achieved by defining reciprocal to have type int -> int option as follows:

fun reciprocal 0 = NONE
  | reciprocal n = SOME (1 div n)

To use the result of a call to reciprocal we must perform a case analysis of the form
case reciprocal exp
  of NONE => exp1
   | SOME r => exp2

where exp1 covers the case that exp has no reciprocal, and exp2 covers the case that exp has reciprocal r.
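For example, a small function built on this pattern (the function describe is hypothetical):

```sml
fun reciprocal 0 = NONE
  | reciprocal n = SOME (1 div n)

fun describe n =
  case reciprocal n
    of NONE => "no reciprocal"
     | SOME r => "reciprocal is " ^ Int.toString r

val s1 = describe 0   (* "no reciprocal" *)
val s2 = describe 1   (* "reciprocal is 1" *)
```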
10.3 Recursive Datatypes
The next level of generality is the recursive type definition. For example, one may define a type typ tree of binary trees with values of type typ at the nodes using the following declaration:

datatype 'a tree =
  Empty |
  Node of 'a tree * 'a * 'a tree
This declaration corresponds to the informal definition of binary trees with values of type typ at the nodes:
1. The empty tree Empty is a binary tree.
2. If tree1 and tree2 are binary trees, and val is a value of type typ, then Node (tree1, val, tree2) is a binary tree.
3. Nothing else is a binary tree.
The distinguishing feature of this definition is that it is recursive in the sense that binary trees are constructed out of other binary trees, with the empty tree serving as the base case.
(Incidentally, a leaf in a binary tree is here represented as a node both of whose children are the empty tree. This definition of binary trees is analogous to starting the natural numbers with zero, rather than one. One can think of the children of a node in a binary tree as the "predecessors" of that node, the only difference compared to the usual definition of predecessor being that a node has two, rather than one, predecessors.)
To compute with a recursive type, use a recursive function. For example, here is the function to compute the height of a binary tree:

fun height Empty = 0
  | height (Node (lft, _, rht)) =
    1 + max (height lft, height rht)
Notice that height is called recursively on the children of a node, and is defined outright on the empty tree. This pattern of definition is another instance of structural induction (on the tree type). The function height is said to be defined by induction on the structure of a tree. The general idea is to define the function directly for the base cases of the recursive type (i.e., value constructors with no arguments or whose arguments do not involve values of the type being defined), and to define it for non-base cases in terms of its definitions for the constituent values of that type. We will see numerous examples of this as we go along.
Here's another example. The size of a binary tree is the number of nodes occurring in it. Here's a straightforward definition in ML:

fun size Empty = 0
  | size (Node (lft, _, rht)) =
    1 + size lft + size rht

The function size is defined by structural induction on trees.
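A small example tree exercises both functions; this self-contained sketch assumes Int.max from the Standard Basis for the max function used by height:

```sml
datatype 'a tree = Empty | Node of 'a tree * 'a * 'a tree

fun height Empty = 0
  | height (Node (lft, _, rht)) =
    1 + Int.max (height lft, height rht)

fun size Empty = 0
  | size (Node (lft, _, rht)) = 1 + size lft + size rht

val t  = Node (Node (Empty, 1, Empty), 2, Node (Empty, 3, Empty))
val ht = height t   (* 2 *)
val sz = size t     (* 3 *)
```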
A word of warning. One reason to capitalize value constructors is to avoid a pitfall in the ML syntax that we mentioned in chapter 2. Suppose we gave the following definition of size:

fun size empty = 0
  | size (Node (lft, _, rht)) =
    1 + size lft + size rht

The compiler will warn us that the second clause of the definition is redundant! Why? Because empty, spelled with a lower-case "e", is a variable, not a constructor, and hence matches any tree whatsoever. Consequently the second clause never applies. By capitalizing constructors we can hope to make mistakes such as these more evident, but in practice you are bound to run into this sort of mistake.
The tree data type is appropriate for binary trees: those for which every node has exactly two children. (Of course, either or both children might be the empty tree, so we may consider this to define the type of trees with at most two children; it's a matter of terminology which interpretation you prefer.) It should be obvious how to define the type of ternary trees, whose nodes have at most three children, and so on for other fixed arities. But what if we wished to define a type of trees with a variable number of children? In a so-called variadic tree some nodes might have three children, some might have two, and so on. This can be achieved in at least two ways. One way combines lists and trees, as follows:
datatype 'a tree =
  Empty |
  Node of 'a * 'a tree list

Each node has a list of children, so that distinct nodes may have different numbers of children. Notice that the empty tree is distinct from the tree with one node and no children because there is no data associated with the empty tree, whereas there is a value of type 'a at each node.
Another approach is to simultaneously define trees and "forests". A variadic tree is either empty, or a node gathering a "forest" to form a tree; a forest is either empty or a variadic tree together with another forest. This leads to the following definition:

datatype 'a tree =
  Empty |
  Node of 'a * 'a forest
and 'a forest =
  Nil |
  Cons of 'a tree * 'a forest
This example illustrates the introduction of two mutually recursive datatypes.
Mutually recursive datatypes beget mutually recursive functions. Here's a definition of the size (number of nodes) of a variadic tree:

fun size_tree Empty = 0
  | size_tree (Node (_, f)) = 1 + size_forest f
and size_forest Nil = 0
  | size_forest (Cons (t, f')) = size_tree t + size_forest f'

Notice that we define the size of a tree in terms of the size of a forest, and vice versa, just as the type of trees is defined in terms of the type of forests.
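For example, a node with three leaf children (the helper leaf is illustrative; the declarations are repeated so the sketch is self-contained):

```sml
datatype 'a tree =
  Empty |
  Node of 'a * 'a forest
and 'a forest =
  Nil |
  Cons of 'a tree * 'a forest

fun size_tree Empty = 0
  | size_tree (Node (_, f)) = 1 + size_forest f
and size_forest Nil = 0
  | size_forest (Cons (t, f')) = size_tree t + size_forest f'

(* A leaf is a node with an empty forest of children. *)
fun leaf d = Node (d, Nil)

val t = Node (0, Cons (leaf 1, Cons (leaf 2, Cons (leaf 3, Nil))))
val n = size_tree t   (* 4 *)
```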
Many other variations are possible. Suppose we wish to define a notion of binary tree in which data items are associated with branches, rather than nodes. Here's a datatype declaration for such trees:

datatype 'a tree =
  Empty |
  Node of 'a branch * 'a branch
and 'a branch =
  Branch of 'a * 'a tree
In contrast to our first definition of binary trees, in which the branches from a node to its children were implicit, we now make the branches themselves explicit, since data is attached to them.
For example, we can collect into a list the data items labelling the branches of such a tree using the following code:

fun collect Empty = nil
  | collect (Node (Branch (ld, lt), Branch (rd, rt))) =
    ld :: rd :: (collect lt) @ (collect rt)
10.4 Heterogeneous Data Structures
Returning to the original definition of binary trees (with data items at the nodes), observe that the type of the data items at the nodes must be the same for every node of the tree. For example, a value of type int tree has an integer at every node, and a value of type string tree has a string at every node. Therefore an expression such as

Node (Empty, 43, Node (Empty, "43", Empty))

is ill-typed. The type system insists that trees be homogeneous in the sense that the type of the data items is the same at every node.
It is quite rare to encounter heterogeneous data structures in real programs. For example, a dictionary with strings as keys might be represented as a binary search tree with strings at the nodes; there is no need for heterogeneity to represent such a data structure. But occasionally one might wish to work with a heterogeneous tree, whose data values at each node are of different types. How would one represent such a thing in ML?
To discover the answer, first think about how one might manipulate such a data structure. When accessing a node, we would need to check at run-time whether the data item is an integer or a string; otherwise we would not know whether to, say, add 1 to it, or concatenate "1" to the end of it. This suggests that the data item must be labelled with sufficient information so that we may determine the type of the item at run-time. We must also be able to recover the underlying data item itself so that familiar operations (such as addition or string concatenation) may be applied to it.
The required labelling and discrimination is neatly achieved using a datatype declaration. Suppose we wish to represent the type of integer-or-string trees. First, we define the type of values to be integers or strings, marked with a constructor indicating which:
datatype int_or_string =
  Int of int |
  String of string

Then we define the type of interest as follows:

type int_or_string_tree =
  int_or_string tree

Voilà! Perfectly natural and easy: heterogeneity is really a special case of homogeneity!
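Putting the pieces together (and repeating the original binary tree declaration so the sketch is self-contained), the previously ill-typed expression can now be written:

```sml
datatype 'a tree = Empty | Node of 'a tree * 'a * 'a tree

datatype int_or_string =
  Int of int |
  String of string

type int_or_string_tree = int_or_string tree

val t : int_or_string_tree =
  Node (Empty, Int 43, Node (Empty, String "43", Empty))
```

The constructors Int and String mark each data item with its type, which a consumer of the tree may recover by pattern matching.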
10.5 Abstract Syntax
Datatype declarations and pattern matching are extremely useful for defining and manipulating the abstract syntax of a language. For example, we may define a small language of arithmetic expressions using the following declaration:

datatype expr =
  Numeral of int |
  Plus of expr * expr |
  Times of expr * expr
This definition has only three clauses, but one could readily imagine adding others. Here is the definition of a function to evaluate expressions of the language of arithmetic expressions written using pattern matching:

fun eval (Numeral n) = Numeral n
  | eval (Plus (e1, e2)) =
    let
      val Numeral n1 = eval e1
      val Numeral n2 = eval e2
    in
      Numeral (n1+n2)
    end
  | eval (Times (e1, e2)) =
    let
      val Numeral n1 = eval e1
      val Numeral n2 = eval e2
    in
      Numeral (n1*n2)
    end
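For example, evaluating a small expression (repeating the declarations so the sketch is self-contained):

```sml
datatype expr =
  Numeral of int |
  Plus of expr * expr |
  Times of expr * expr

fun eval (Numeral n) = Numeral n
  | eval (Plus (e1, e2)) =
    let
      val Numeral n1 = eval e1
      val Numeral n2 = eval e2
    in
      Numeral (n1+n2)
    end
  | eval (Times (e1, e2)) =
    let
      val Numeral n1 = eval e1
      val Numeral n2 = eval e2
    in
      Numeral (n1*n2)
    end

val e = Plus (Numeral 1, Times (Numeral 2, Numeral 3))
val v = eval e   (* Numeral 7 *)
```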
The combination of datatype declarations and pattern matching contributes enormously to the readability of programs written in ML. A less obvious, but more important, benefit is the error checking that the compiler can perform for you if you use these mechanisms in tandem. As an example, suppose that we extend the type expr with a new component for the reciprocal of a number, yielding the following revised definition:

datatype expr =
  Numeral of int |
  Plus of expr * expr |
  Times of expr * expr |
  Recip of expr
First, observe that the "old" definition of eval is no longer applicable to values of type expr! For example, the expression

eval (Plus (Numeral 1, Numeral 2))

is ill-typed, even though it doesn't use the Recip constructor. The reason is that the redeclaration of expr introduces a "new" type that just happens to have the same name as the "old" type, but is in fact distinct from it. This is a boon because it reminds us to recompile the old code relative to the new definition of the expr type.
Second, upon recompiling the definition of eval we encounter an inexhaustive match warning: the old code no longer applies to every value of type expr according to its new definition! We are of course lacking a case for Recip, which we may provide as follows:

fun eval (Numeral n) = Numeral n
  | eval (Plus (e1, e2)) = ... as before ...
  | eval (Times (e1, e2)) = ... as before ...
  | eval (Recip e) =
    let
      val Numeral n = eval e
    in
      Numeral (1 div n)
    end
The value of the checks provided by the compiler in such cases cannot be overestimated. When recompiling a large program after making a change to a datatype declaration, the compiler will automatically point out every line of code that must be changed to conform to the new definition; it is
impossible to forget to attend to even a single case. This is a tremendous help to the developer, especially if she is not the original author of the code being modified, and is another reason why the static type discipline of ML is a positive benefit, rather than a hindrance, to programmers.
Chapter 11
Higher-Order Functions
11.1 Functions as Values
Values of function type are first-class, which means that they have the same rights and privileges as values of any other type. In particular, functions may be passed as arguments and returned as results of other functions, and functions may be stored in and retrieved from data structures such as lists and trees. We will see that first-class functions are an important source of expressive power in ML.
Functions which take functions as arguments or yield functions as results are known as higher-order functions (or, less often, as functionals or operators). Higher-order functions arise frequently in mathematics. For example, the differential operator is the higher-order function that, when given a (differentiable) function on the real line, yields its first derivative as a function on the real line. We also encounter functionals mapping functions to real numbers, and real numbers to functions. An example of the former is provided by the definite integral viewed as a function of its integrand, and an example of the latter is the definite integral of a given function on the interval [a, x], viewed as a function of a, that yields the area under the curve from a to x as a function of x.
Higher-order functions are less familiar tools for many programmers since the best-known programming languages have only rudimentary mechanisms to support their use. In contrast higher-order functions play a prominent role in ML, with a variety of interesting applications. Their use may be classified into two broad categories:
1. Abstracting patterns of control. Higher-order functions are design patterns that "abstract out" the details of a computation to lay bare the
skeleton of the solution. The skeleton may be ﬂeshed out to form a
solution of a problem by applying the general pattern to arguments
that isolate the speciﬁc problem instance.
2. Staging computation. It arises frequently that computation may be staged
by expending additional effort "early" to simplify the computation of
"later" results. Staging can be used both to improve efficiency and, as we
will see later, to control sharing of computational resources.
11.2 Binding and Scope
Before discussing these programming techniques, we will review the
critically important concept of scope as it applies to function definitions.
Recall that Standard ML is a statically scoped language, meaning that
identifiers are resolved according to the static structure of the program.
A use of the variable var is considered to be a reference to the nearest
lexically enclosing declaration of var. We say "nearest" because of the
possibility of shadowing: if we redeclare a variable var, then subsequent
uses of var refer to the "most recent" (lexically!) declaration of it; any
"previous" declarations are temporarily shadowed by the latest one.
This principle is easy to apply when considering sequences of declarations.
For example, it should be clear by now that the variable y is bound to 32
after processing the following sequence of declarations:
val x = 2 (* x=2 *)
val y = x*x (* y=4 *)
val x = y*x (* x=8 *)
val y = x*y (* y=32 *)
In the presence of function definitions the situation is the same, but it
can be a bit tricky to understand at first.
Here’s an example to test your grasp of the lexical scoping principle:
val x = 2
fun f y = x+y
val x = 3
val z = f 4
After processing these declarations the variable z is bound to 6, not to 7!
The reason is that the occurrence of x in the body of f refers to the first
declaration of x, since it is the nearest lexically enclosing declaration of
the occurrence, even though x has been subsequently redeclared.
This example illustrates three important points:
1. Binding is not assignment! If we were to view the second binding of x as
an assignment statement, then the value of z would be 7, not 6.
2. Scope resolution is lexical, not temporal. We sometimes refer to the
"most recent" declaration of a variable, which has a temporal flavor, but we
always mean "nearest lexically enclosing at the point of occurrence".
3. "Shadowed" bindings are not lost. The "old" binding for x is still
available (through calls to f), even though a more recent binding has
shadowed it.
One way to understand what's going on here is through the concept of a
closure, a technique for implementing higher-order functions. When a
function expression is evaluated, a copy of the environment is attached to
the function. Subsequently, all free variables of the function (i.e., those
variables not occurring as parameters) are resolved with respect to the
environment attached to the function; the function is therefore said to be
"closed" with respect to the attached environment. This is achieved at
function application time by "swapping" the attached environment of the
function for the environment active at the point of the call. The swapped
environment is restored after the call is complete. Returning to the example
above, the environment associated with the function f contains the
declaration val x = 2 to record the fact that at the time the function was
evaluated, the variable x was bound to the value 2. The variable x is
subsequently rebound to 3, but when f is applied, we temporarily reinstate
the binding of x to 2, add a binding of y to 4, then evaluate the body of
the function, yielding 6. We then restore the binding of x and drop the
binding of y before yielding the result.
11.3 Returning Functions
While seemingly very simple, the principle of lexical scope is the source of
considerable expressive power. We'll demonstrate this through a series of
examples.
To warm up, let's consider some simple examples of passing functions as
arguments and yielding functions as results. The standard example of passing
a function as argument is the map' function, which applies a given function
to every element of a list. It is defined as follows:
fun map' (f, nil) = nil
  | map' (f, h::t) = (f h) :: map' (f, t)
For example, the application
map’ (fn x => x+1, [1,2,3,4])
evaluates to the list [2,3,4,5].
Functions may also yield functions as results. What is surprising is that we
can create new functions during execution, not just return functions that
have been previously defined. The most basic (and deceptively simple)
example is the function constantly that creates constant functions: given a
value k, the application constantly k yields a function that yields k
whenever it is applied. Here's a definition of constantly:
val constantly = fn k => (fn a => k)
The function constantly has type 'a -> ('b -> 'a). We used the fn notation
for clarity, but the declaration of the function constantly may also be
written using fun notation as follows:
fun constantly k a = k
Note well that white space separates the two successive arguments to
constantly! The meaning of this declaration is precisely the same as the
earlier definition using fn notation.
The value of the application constantly 3 is the function that is constantly
3; i.e., it always yields 3 when applied. Yet nowhere have we defined the
function that always yields 3. The resulting function is "created" by the
application of constantly to the argument 3, rather than merely "retrieved"
off the shelf of previously-defined functions. In implementation terms the
result of the application constantly 3 is a closure consisting of the
function fn a => k with the environment val k = 3 attached to it. The
closure is a data structure (a pair) that is created by each application of
constantly to an argument; the closure is the representation of the "new"
function yielded by the application. Notice, however, that the only
difference between any two results of applying the function constantly lies
in the attached environment; the underlying function is always fn a => k. If
we think of the lambda as the "executable code" of the function, then this
amounts to the observation that no new code is created at run time, just new
instances of existing code.
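For instance, each application of constantly builds a fresh closure over its
own binding of k; the underlying code is shared, and only the attached
environment differs. A small sketch (the bindings always3 and alwaysTrue are
illustrative names, not from the text):

```sml
fun constantly k a = k

(* Two closures over the same code, with different environments. *)
val always3 = constantly 3        (* environment: k = 3 *)
val alwaysTrue = constantly true  (* environment: k = true *)

val x = always3 "ignored"   (* 3 *)
val y = always3 [1,2,3]     (* 3 *)
val z = alwaysTrue 17       (* true *)
```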
This also points out why functions in ML are not the same as code pointers
in C. You may be familiar with the idea of passing a pointer to a C function
to another C function as a means of passing functions as arguments or
yielding functions as results. This may be considered to be a form of
"higher-order" function in C, but it must be emphasized that code pointers
are significantly less expressive than closures because in C there are only
statically many possibilities for a code pointer (it must point to one of
the functions defined in your code), whereas in ML we may generate
dynamically many different instances of a function, differing in the
bindings of the variables in its environment. The non-varying part of the
closure, the code, is directly analogous to a function pointer in C, but
there is no counterpart in C of the varying part of the closure, the dynamic
environment.
The definition of the function map' given above takes a function and a list
as arguments, yielding a new list as result. Often it occurs that we wish to
map the same function across several different lists. It is inconvenient
(and a tad inefficient) to keep passing the same function to map', with the
list argument varying each time. Instead we would prefer to create an
instance of map specialized to the given function that can then be applied
to many different lists. This leads to the following definition of the
function map:
fun map f nil = nil
  | map f (h::t) = (f h) :: (map f t)
The function map so defined has type ('a -> 'b) -> 'a list -> 'b list. It
takes a function of type 'a -> 'b as argument, and yields another function
of type 'a list -> 'b list as result.
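To see the specialization at work, we can apply map to a function alone and
reuse the resulting function on several lists; the binding double below is
an illustrative name, not from the text:

```sml
fun map f nil = nil
  | map f (h::t) = (f h) :: (map f t)

(* A new function specialized to doubling, applicable to many lists. *)
val double = map (fn x => 2 * x)

val l1 = double [1,2,3]    (* [2,4,6] *)
val l2 = double [10,20]    (* [20,40] *)
```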
The passage from map' to map is called currying. We have changed a
two-argument function (more properly, a function taking a pair as argument)
into a function that takes two arguments in succession, yielding after the
first a function that takes the second as its sole argument. This passage
can be codified as follows:
fun curry f x y = f (x, y)
The type of curry is
('a * 'b -> 'c) -> ('a -> ('b -> 'c)).
Given a two-argument function, curry returns another function that, when
applied to the first argument, yields a function that, when applied to the
second, applies the original two-argument function to the first and second
arguments, given separately.
Observe that map may be alternately defined by the binding
fun map f l = curry map' f l
Applications are implicitly left-associated, so this definition is
equivalent to the more verbose declaration
fun map f l = ((curry map') f) l
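A quick check that the curried form behaves like map; inc_all is an
illustrative name, not from the text:

```sml
fun map' (f, nil) = nil
  | map' (f, h::t) = (f h) :: map' (f, t)

fun curry f x y = f (x, y)

(* Partially applying (curry map') yields a specialized mapper. *)
val inc_all = curry map' (fn x => x + 1)

val r = inc_all [1,2,3]   (* [2,3,4] *)
```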
11.4 Patterns of Control
We turn now to the idea of abstracting patterns of control. There is an
obvious similarity between the following two functions, one to add up the
numbers in a list, the other to multiply them.
fun add_up nil = 0
  | add_up (h::t) = h + add_up t
fun mul_up nil = 1
  | mul_up (h::t) = h * mul_up t
What precisely is the similarity? We will look at it from two points of view.
One view is that in each case we have a binary operation and a unit element
for it. The result on the empty list is the unit element, and the result on
a nonempty list is the operation applied to the head of the list and the
result on the tail. This pattern can be abstracted as the function reduce
defined as follows:
fun reduce (unit, opn, nil) = unit
  | reduce (unit, opn, h::t) = opn (h, reduce (unit, opn, t))
Here is the type of reduce:
val reduce : 'b * ('a * 'b -> 'b) * 'a list -> 'b
The first argument is the unit element, the second is the operation, and the
third is the list of values. Notice that the type of the operation admits
the possibility of the first argument having a different type from the
second argument and result.
Using reduce, we may redefine add_up and mul_up as follows:
fun add_up l = reduce (0, op +, l)
fun mul_up l = reduce (1, op *, l)
To further check your understanding, consider the following declaration:
fun mystery l = reduce (nil, op ::, l)
(Recall that "op ::" is the function of type 'a * 'a list -> 'a list that
adds a given value to the front of a list.) What function does mystery
compute?
Another view of the commonality between add_up and mul_up is that they are
both defined by induction on the structure of the list argument, with a base
case for nil, and an inductive case for h::t, defined in terms of its
behavior on t. But this is really just another way of saying that they are
defined in terms of a unit element and a binary operation! The difference is
one of perspective: whether we focus on the pattern part of the clauses (the
inductive decomposition) or the result part of the clauses (the unit and
operation). The recursive structure of add_up and mul_up is abstracted by
the reduce functional, which is then specialized to yield add_up and mul_up.
Said another way, the function reduce abstracts the pattern of defining a
function by induction on the structure of a list.
The definition of reduce leaves something to be desired. One thing to notice
is that the arguments unit and opn are carried unchanged through the
recursion; only the list parameter changes on recursive calls. While this
might seem like a minor overhead, it's important to remember that
multi-argument functions are really single-argument functions that take a
tuple as argument. This means that each time around the loop we are
constructing a new tuple whose first and second components remain fixed, but
whose third component varies. Is there a better way? Here's another
definition that isolates the "inner loop" as an auxiliary function:
fun better_reduce (unit, opn, l) =
  let
    fun red nil = unit
      | red (h::t) = opn (h, red t)
  in
    red l
  end
Notice that each call to better_reduce creates a new function red that uses
the parameters unit and opn of the call to better_reduce. This means that
red is bound to a closure consisting of the code for the function together
with the environment active at the point of definition, which will provide
bindings for unit and opn arising from the application of better_reduce to
its arguments. Furthermore, the recursive calls to red no longer carry
bindings for unit and opn, saving the overhead of creating tuples on each
iteration of the loop.
11.5 Staging
An interesting variation on reduce may be obtained by staging the
computation. The motivation is that unit and opn often remain fixed for many
different lists (e.g., we may wish to sum the elements of many different
lists). In this case unit and opn are said to be "early" arguments and the
list is said to be a "late" argument. The idea of staging is to perform as
much computation as possible on the basis of the early arguments, yielding a
function of the late arguments alone.
In the case of the function reduce this amounts to building red on the basis
of unit and opn, yielding it as a function that may be later applied to many
different lists. Here's the code:
fun staged_reduce (unit, opn) =
  let
    fun red nil = unit
      | red (h::t) = opn (h, red t)
  in
    red
  end
The definition of staged_reduce bears a close resemblance to the definition
of better_reduce; the only difference is that the creation of the closure
bound to red occurs as soon as unit and opn are known, rather than each time
the list argument is supplied. Thus the overhead of closure creation is
"factored out" of multiple applications of the resulting function to list
arguments.
We could just as well have replaced the body of the let expression with
the function
fn l => red l
but a moment’s thought reveals that the meaning is the same.
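Concretely, partially applying staged_reduce builds the closure for red
once, and the result can be reused on many lists; sum is an illustrative
name, not from the text:

```sml
fun staged_reduce (unit, opn) =
  let
    fun red nil = unit
      | red (h::t) = opn (h, red t)
  in
    red
  end

(* The closure for red is created here, exactly once. *)
val sum = staged_reduce (0, op +)

val s1 = sum [1,2,3]   (* 6 *)
val s2 = sum [4,5,6]   (* 15 *)
```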
Note well that we would not obtain the effect of staging were we to use
the following deﬁnition:
fun curried_reduce (unit, opn) nil = unit
  | curried_reduce (unit, opn) (h::t) =
      opn (h, curried_reduce (unit, opn) t)
If we unravel the fun notation, we see that while we are taking two
arguments in succession, we are not doing any useful work in between the
arrival of the first argument (a pair) and the second (a list). A curried
function does not take significant advantage of staging. Since
staged_reduce and curried_reduce have the same iterated function type,
namely
('b * ('a * 'b -> 'b)) -> 'a list -> 'b
the contrast between these two examples may be summarized by saying that
not every function of iterated function type is curried. Some are, and some
aren't. The "interesting" examples (such as staged_reduce) are the ones that
aren't curried. (This directly contradicts established terminology, but it
is necessary to deviate from standard practice to avoid a serious
misapprehension.)
The time saved by staging the computation in the definition of
staged_reduce is admittedly minor. But consider the following definition of
an append function for lists that takes both arguments at once:
fun append (nil, l) = l
  | append (h::t, l) = h :: append (t, l)
Suppose that we will have occasion to append many lists to the end of a
given list. What we'd like is to build a specialized appender for the first
list that, when applied to a second list, appends the second to the end of
the first. Here's a naive solution that merely curries append:
fun curried_append nil l = l
  | curried_append (h::t) l = h :: curried_append t l
Unfortunately this solution doesn't exploit the fact that the first argument
is fixed for many second arguments. In particular, each application of the
result of applying curried_append to a list results in the first list being
traversed so that the second can be appended to it.
We can improve on this by staging the computation as follows:
fun staged_append nil = (fn l => l)
  | staged_append (h::t) =
    let
      val tail_appender = staged_append t
    in
      fn l => h :: tail_appender l
    end
Notice that the first list is traversed once for all applications to a
second argument. When applied to a list [v1, ..., vn], the function
staged_append yields a function that is equivalent to, but not quite as
efficient as, the function
fn l => v1 :: v2 :: ... :: vn :: l
This still takes time proportional to n, but a substantial savings accrues
from avoiding the pattern matching required to destructure the original list
argument on each call.
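A usage sketch: the first list is traversed when staged_append is applied to
it, and the resulting appender can then be reused; append123 is an
illustrative name, not from the text:

```sml
fun staged_append nil = (fn l => l)
  | staged_append (h::t) =
    let
      val tail_appender = staged_append t
    in
      fn l => h :: tail_appender l
    end

(* The list [1,2,3] is traversed once, here. *)
val append123 = staged_append [1,2,3]

val l1 = append123 [4,5]   (* [1,2,3,4,5] *)
val l2 = append123 [6]     (* [1,2,3,6] *)
```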
Chapter 12
Exceptions
In the first chapter of these notes we mentioned that expressions in
Standard ML always have a type, may have a value, and may have an effect. So
far we've concentrated on typing and evaluation. In this chapter we will
introduce the concept of an effect. While it's hard to give a precise
general definition of what we mean by an effect, the idea is that an effect
is any action resulting from evaluation of an expression other than
returning a value. From this point of view we might consider
non-termination to be an effect, but we don't usually think of failure to
terminate as a positive "action" in its own right, rather as a failure to
take any action.
The main examples of effects in ML are these:
1. Exceptions. Evaluation may be aborted by signaling an exceptional
condition.
2. Mutation. Storage may be allocated and modified during evaluation.
3. Input/output. It is possible to read from an input source and write to an
output sink during evaluation.
4. Communication. Data may be sent to and received from communication
channels.
This chapter is concerned with exceptions; the other forms of effects will be
considered later.
12.1 Exceptions as Errors
ML is a safe language in the sense that its execution behavior may be
understood entirely in terms of the constructs of the language itself.
Behavior such as "dumping core" or incurring a "bus error" are
extra-linguistic notions that may only be explained by appeal to the
underlying implementation of the language. These cannot arise in ML. This is
ensured by a combination of a static type discipline, which rules out
expressions that are manifestly ill-defined (e.g., adding a string to an
integer or casting an integer as a function), and by dynamic checks that
rule out violations that cannot be detected statically (e.g., division by
zero or arithmetic overflow). Static violations are signalled by type
checking errors; dynamic violations are signalled by raising exceptions.
12.1.1 Primitive Exceptions
The expression 3 + "3" is ill-typed, and hence cannot be evaluated. In
contrast, the expression 3 div 0 is well-typed (with type int), but incurs a
run-time fault that is signalled by raising the exception Div. We will
indicate this by writing
3 div 0 ⇓ raise Div
An exception is a form of "answer" to the question "what is the value of
this expression?". In most implementations an exception such as this is
reported by an error message of the form "Uncaught exception Div", together
with the line number (or some other indication) of the point in the program
where the exception occurred.
Exceptions have names so that we may distinguish different sources of error
in a program. For example, evaluation of the expression maxint * maxint
(where maxint is the largest representable integer) causes the exception
Overflow to be raised, indicating that an arithmetic overflow error arose in
the attempt to carry out the multiplication. This is usefully distinguished
from the exception Div, corresponding to division by zero.
(You may be wondering about the overhead of checking for arithmetic faults.
The compiler must generate instructions that ensure that an overflow fault
is caught before any subsequent operations are performed. This can be quite
expensive on pipelined processors, which sacrifice precise delivery of
arithmetic faults in the interest of speeding up execution in the
non-faulting case. Unfortunately it is necessary to incur this overhead if
we are to avoid having the behavior of an ML program depend on the
underlying processor on which it is implemented.)
Another source of run-time exceptions is an inexhaustive match. Suppose we
define the function hd as follows:
fun hd (h::_) = h
This definition is inexhaustive since it makes no provision for the
possibility of the argument being nil. What happens if we apply hd to nil?
The exception Match is raised, indicating the failure of the
pattern-matching process:
hd nil ⇓ raise Match
The occurrence of a Match exception at run time is indicative of a violation
of a precondition to the invocation of a function somewhere in the program.
Recall that it is often sensible for a function to be inexhaustive, provided
that we take care to ensure that it is never applied to a value outside of
its domain. Should this occur (because of a programming mistake, evidently),
the result is nevertheless well-defined because ML checks for, and signals,
pattern match failure. That is, ML programs are implicitly "bulletproofed"
against failures of pattern matching. The flip side is that if no
inexhaustive match warnings arise during type checking, then the exception
Match can never be raised during evaluation (and hence no run-time checking
need be performed).
A related situation is the use of a pattern in a val binding to destructure
a value. If the pattern can fail to match a value of this type, then a Bind
exception is raised at run time. For example, evaluation of the binding
val h::_ = nil
raises the exception Bind since the pattern h::_ does not match the value
nil. Here again observe that a Bind exception cannot arise unless the
compiler has previously warned us of the possibility: no warning, no Bind
exception.
12.1.2 User-Defined Exceptions
So far we have considered examples of predefined exceptions that indicate
fatal error conditions. Since the built-in exceptions have a built-in
meaning, it is generally inadvisable to use these to signal program-specific
error conditions. Instead we introduce a new exception using an exception
declaration, and signal it using a raise expression when a run-time
violation occurs. That way we can associate specific exceptions with
specific pieces of code, easing the process of tracking down the source of
the error.
Suppose that we wish to define a "checked factorial" function that ensures
that its argument is non-negative. Here's a first attempt at defining such a
function:
exception Factorial
fun checked_factorial n =
  if n < 0 then
    raise Factorial
  else if n = 0 then
    1
  else
    n * checked_factorial (n-1)
The declaration exception Factorial introduces an exception Factorial, which
we raise in the case that checked_factorial is applied to a negative number.
The definition of checked_factorial is unsatisfactory in at least two
respects. One, relatively minor, issue is that it does not make effective
use of pattern matching, but instead relies on explicit comparison
operations. To some extent this is unavoidable since we wish to check
explicitly for negative arguments, which cannot be done using a pattern. A
more significant problem is that checked_factorial repeatedly checks the
validity of its argument on each recursive call, even though we can prove
that if the initial argument is non-negative, then so must be the argument
on each recursive call. This fact is not reflected in the code. We can
improve the definition by introducing an auxiliary function:
exception Factorial
local
  fun fact 0 = 1
    | fact n = n * fact (n-1)
in
  fun checked_factorial n =
    if n >= 0 then
      fact n
    else
      raise Factorial
end
Notice that we perform the range check exactly once, and that the auxiliary
function makes effective use of pattern matching.
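A brief usage sketch of the improved definition; the binding v is an
illustrative name, not from the text:

```sml
exception Factorial
local
  fun fact 0 = 1
    | fact n = n * fact (n-1)
in
  fun checked_factorial n =
    if n >= 0 then fact n else raise Factorial
end

val v = checked_factorial 5    (* 120 *)
(* checked_factorial ~3 raises the exception Factorial. *)
```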
12.2 Exception Handlers
The use of exceptions to signal error conditions suggests that raising an
exception is fatal: execution of the program terminates with the raised
exception. But signaling an error is only one use of the exception
mechanism. More generally, exceptions can be used to effect non-local
transfers of control. By using an exception handler we may "catch" a raised
exception and continue evaluation along some other path. A very simple
example is provided by the following driver for the factorial function that
accepts numbers from the keyboard, computes their factorial, and prints the
result.
fun factorial_driver () =
  let
    val input = read_integer ()
    val result = toString (checked_factorial input)
  in
    print result
  end
  handle Factorial => print "Out of range."
An expression of the form exp handle match is called an exception handler.
It is evaluated by attempting to evaluate exp. If it returns a value, then
that is the value of the entire expression; the handler plays no role in
this case. If, however, exp raises an exception exc, then the exception
value is matched against the clauses of the match (exactly as in the
application of a clausal function to an argument) to determine how to
proceed. If the pattern of a clause matches the exception exc, then
evaluation resumes with the expression part of that clause. If no pattern
matches, the exception exc is re-raised so that outer exception handlers may
dispatch on it. If no handler handles the exception, then the uncaught
exception is signaled as the final result of evaluation. That is,
computation is aborted with the uncaught exception exc.
In more operational terms, evaluation of exp handle match proceeds by
installing an exception handler determined by match, then evaluating exp.
The previous binding of the exception handler is preserved so that it may be
restored once the given handler is no longer needed. Raising an exception
consists of passing a value of type exn to the current exception handler.
Passing an exception to a handler de-installs that handler, and re-installs
the previously active handler. This ensures that if the handler itself
raises an exception, or fails to handle the given exception, then the
exception is propagated to the handler active prior to evaluation of the
handle expression. If the expression does not raise an exception, the
previous handler is restored as part of completing the evaluation of the
handle expression.
Returning to the function factorial_driver, we see that evaluation proceeds
by attempting to compute the factorial of a given number (read from the
keyboard by an unspecified function read_integer), printing the result if
the given number is in range, and otherwise reporting that the number is out
of range. The example is trivialized to focus on the role of exceptions, but
one could easily imagine generalizing it in a number of ways that also make
use of exceptions. For example, we might repeatedly read integers until the
user terminates the input stream (by typing the end-of-file character).
Termination of input might be signaled by an EndOfFile exception, which is
handled by the driver. Similarly, we might expect that the function
read_integer raises the exception SyntaxError in the case that the input is
not properly formatted. Again we would handle this exception, print a
suitable message, and resume.
Here's a sketch of a more complicated factorial driver:
fun factorial_driver () =
  let
    val input = read_integer ()
    val result = toString (checked_factorial input)
    val _ = print result
  in
    factorial_driver ()
  end
  handle EndOfFile => print "Done."
       | SyntaxError =>
         let
           val _ = print "Syntax error."
         in
           factorial_driver ()
         end
       | Factorial =>
         let
           val _ = print "Out of range."
         in
           factorial_driver ()
         end
We will return to a more detailed discussion of input/output later in these
notes. The point to notice here is that the code is structured with a
completely uncluttered "normal path" that reads an integer, computes its
factorial, formats it, prints it, and repeats. The exception handler takes
care of the exceptional cases: end of file, syntax error, and domain error.
In the latter two cases we report an error, and resume reading. In the
former we simply report completion and we are done.
The reader is encouraged to imagine how one might structure this program
without the use of exceptions. The primary benefits of the exception
mechanism are as follows:
1. They force you to consider the exceptional case (if you don't, you'll get
an uncaught exception at run time), and
2. They allow you to segregate the special case from the normal case in the
code (rather than clutter the code with explicit checks).
These aspects work hand-in-hand to facilitate writing robust programs.
A typical use of exceptions is to implement backtracking, a programming
technique based on exhaustive search of a state space. A very simple, if
somewhat artificial, example is provided by the following function to
compute change from an arbitrary list of coin values. What is at issue is
that the obvious "greedy" algorithm for making change that proceeds by
doling out as many coins as possible in decreasing order of value does not
always work. Given only a 5 cent and a 2 cent coin, we cannot make 16 cents
in change by first taking three 5's and then proceeding to dole out 2's. In
fact we must use two 5's and three 2's to make 16 cents. Here's a method
that works for any set of coins:
exception Change
fun change _ 0 = nil
  | change nil _ = raise Change
  | change (coin::coins) amt =
    if coin > amt then
      change coins amt
    else
      (coin :: change (coin::coins) (amt-coin))
      handle Change => change coins amt
The idea is to proceed greedily, but if we get "stuck", we undo the most
recent greedy decision and proceed again from there. Simulate the evaluation
of change [5,2] 16 to see how the code works.
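For reference, the expected behavior under the definition above: the greedy
choice of a third 5 fails deep in the recursion, so backtracking settles on
two 5's and three 2's.

```sml
exception Change
fun change _ 0 = nil
  | change nil _ = raise Change
  | change (coin::coins) amt =
    if coin > amt then
      change coins amt
    else
      (coin :: change (coin::coins) (amt-coin))
      handle Change => change coins amt

(* Attempting 5,5,5 gets stuck at amount 1; the handler retries with 2's. *)
val result = change [5,2] 16   (* [5,5,2,2,2] *)
```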
12.3 Value-Carrying Exceptions
So far exceptions are just "signals" that indicate that an exceptional
condition has arisen. Often it is useful to attach additional information
that is passed to the exception handler. This is achieved by attaching
values to exceptions.
For example, we might associate with a SyntaxError exception a string
indicating the precise nature of the error. In a parser for a language we
might write something like
raise SyntaxError "Integer expected"
to indicate a malformed expression in a situation where an integer is
expected, and write
raise SyntaxError "Identifier expected"
to indicate a badly-formed identifier.
To associate a string with the exception SyntaxError, we declare it as
exception SyntaxError of string
This declaration introduces the exception constructor SyntaxError, which
carries a string as its value.
Exception constructors are in many ways similar to value constructors. In
particular they can be used in patterns, as in the following code fragment:
... handle SyntaxError msg => print ("Syntax error: " ^ msg)
Here we specify a pattern for SyntaxError exceptions that also binds the
string associated with the exception to the identifier msg and prints that
string along with an error indication.
Recall that we may use value constructors in two ways:
1. We may use them to create values of a datatype (perhaps by applying them
to other values).
2. We may use them to match values of a datatype (perhaps also matching a
constituent value).
The situation with exception constructors is symmetric.
1. We may use them to create an exception (perhaps with an associated
value).
2. We may use them to match an exception (perhaps also matching the
associated value).
Value constructors have types, as we previously mentioned. For example, the
list constructors nil and :: have types 'a list and 'a * 'a list -> 'a list,
respectively. What about exception constructors? A "bare" exception
constructor (such as Factorial above) has type exn, and a value-carrying
exception constructor (such as SyntaxError) has type string -> exn. Thus
Factorial is a value of type exn, and
SyntaxError "Integer expected"
is a value of type exn.
The type exn is the type of exception packets, the data values associated
with an exception. The primitive operation raise takes any value of type
exn as argument and raises an exception with that value. The clauses of
a handler may be applied to any value of type exn using the rules of pattern matching described earlier; if an exception constructor is no longer in scope, then the handler cannot catch it (other than via a wildcard pattern).
The type exn may be thought of as a kind of built-in datatype, except
that the constructors of this type are not determined once and for all (as they
are with a datatype declaration), but rather are incrementally introduced
as needed in a program. For this reason the type exn is sometimes called
an extensible datatype.
Chapter 13
Mutable Storage
In this chapter we consider a second form of effect, called a storage effect, the allocation or mutation of storage during evaluation. The introduction of storage effects has profound consequences, not all of which are desirable. Indeed, they are sometimes called side effects, by analogy with the unintended actions of some medications. While it is excessive to dismiss storage effects as completely undesirable, it is advantageous to minimize the use of storage effects except in situations where the task clearly demands them. We will explore some techniques for programming with storage effects later in this chapter, but first we introduce the primitive mechanisms for programming with mutable storage in ML.
13.1 Reference Cells
To support mutable storage the execution model that we described in chapter 2 is enriched with a memory consisting of a finite set of mutable cells. A
mutable cell may be thought of as a container in which a data value of a
speciﬁed type is stored. During execution of a program the contents of
a cell may be retrieved or replaced by any other value of the appropriate
type. Since cells are used by issuing “commands” to modify and retrieve
their contents, programming with cells is called imperative programming.
Changing the contents of a mutable cell introduces a temporal aspect to
evaluation. We speak of the current contents of a cell, meaning the value
most recently assigned to it. We also speak of previous and future values
of a reference cell when discussing the behavior of a program. This is in
sharp contrast to the effect-free fragment of ML, for which no such concepts apply. For example, the binding of a variable does not change while
evaluating within the scope of that variable, lending a “permanent” quality
to statements about variables — the “current” binding is the only binding
that variable will ever have.
The type typ ref is the type of reference cells containing values of type
typ. Reference cells are, like all values, first-class — they may be bound to variables, passed as arguments to functions, returned as results of functions, appear within data structures, and even be stored within other reference cells.
A reference cell is created, or allocated, by the function ref of type typ -> typ ref. When applied to a value val of type typ, ref allocates a "new" cell, initializes its content to val, and returns a reference to the cell. By "new" we mean that the allocated cell is distinct from all other cells previously allocated, and does not share storage with them.
The contents of a cell of type typ is retrieved using the function ! of type typ ref -> typ. Applying ! to a reference cell yields the current contents of that cell. The contents of a cell is changed by applying the assignment operator op :=, which has type typ ref * typ -> unit. Assignment is usually written using infix syntax. When applied to a cell and a value, it replaces the content of that cell with that value, and yields the null-tuple as result.
Here are some examples:
val r = ref 0
val s = ref 0
val _ = r := 3
val x = !s + !r
val t = r
val _ = t := 5
val y = !s + !r
val z = !t + !r
After execution of these bindings, the variable x is bound to 3, the variable
y is bound to 5, and z is bound to 10.
Notice the use of a val binding of the form val _ = exp when exp is to be evaluated purely for its effect. The value of exp is discarded by the binding, since the left-hand side is a wildcard pattern. In most cases the expression exp has type unit, so that its value is guaranteed to be the null-tuple, (), if it has a value at all.
A wildcard binding is used to define sequential composition of expressions in ML. The expression
exp1 ; exp2
is shorthand for the expression
let val _ = exp1 in exp2 end
that first evaluates exp1 for its effect, then evaluates exp2.
Functions of type typ -> unit are sometimes called procedures, because they are executed purely for their effect. This is apparent from the type: it is assured that the value of applying such a function is the null-tuple, (), so the only point of applying it is for its effects on memory.
13.2 Reference Patterns
It is a common mistake to omit the exclamation point when referring to
the content of a reference, especially when that cell is bound to a variable.
In more familiar languages such as C all variables are implicitly bound to
reference cells, and they are implicitly dereferenced whenever they are used
so that a variable always stands for its current contents. This is both a
boon and a bane. It is obviously helpful in many common cases since it
alleviates the burden of having to explicitly dereference variables whenever
their content is required. However, it shifts the burden to the programmer
in the case that the address, and not the content, is intended. In C one writes &x for the address of (the cell bound to) x. Whether explicit or implicit dereferencing is preferable is to a large extent a matter of taste.
The burden of explicit dereferencing is not nearly so onerous in ML as
it might be in other languages simply because reference cells are used so
infrequently in ML programs, whereas they are the sole means of binding
variables in more familiar languages.
An alternative to explicitly dereferencing cells is to use ref patterns. A pattern of the form ref pat matches a reference cell whose content matches the pattern pat. This means that the cell's contents are implicitly retrieved during pattern matching, and may be subsequently used without explicit dereferencing. For example, the second implementation of rot3 (given in section 13.4 below) might be written using ref patterns as follows:
fun rot3 (a, b, c) =
let
val (ref x, ref y, ref z) = (a, b, c)
in
a := y; b := z; c := x
end
In practice it is common to use both explicit dereferencing and ref patterns,
depending on the situation.
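As a further small illustration, the dereferencing operation itself could be programmed with a ref pattern; we call it deref here to avoid shadowing the primitive !:

```sml
(* deref is our own name; it behaves like the primitive ! *)
fun deref (ref x) = x

val cell = ref 42
val v = deref cell    (* 42 *)
```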
13.3 Identity
Reference cells raise delicate issues of equality that considerably complicate
reasoning about programs. In general we say that two expressions (of the
same type) are equal iff they cannot be distinguished by any operation in
the language. That is, two expressions are distinct iff there is some way
within the language to tell them apart. This is called Leibniz's Principle of identity of indiscernibles — we equate everything that we cannot tell apart. As a consequence we obtain the principle of indiscernibility of identicals —
that which we deem equal cannot be told apart. (Think about this for a
minute. One can have the latter without the former. Leibniz’s Principle is a
principle of maximality for equality.)
What makes Leibniz’s Principle tricky to grasp is that it hinges on what
we mean by a “way to tell expressions apart”. The crucial idea is that we
can tell two expressions apart iff there is a complete program containing one
of the expressions whose observable behavior changes when we replace that
expression by the other. That is, two expressions are considered equal iff
there is no such scenario that distinguishes them. But what do we mean by
“complete program”? And what do we mean by “observable behavior”?
For the present purposes we will consider a complete program to be
any expression of basic type (say, int or bool or string). The idea is
that a complete program is one that computes a concrete result such as a
number. The observable behavior of a complete program includes at least
these aspects:
1. Its value, or lack thereof, either by nontermination or by raising an
uncaught exception.
2. Its visible side effects, including visible modifications to mutable storage or any input/output it may perform.
In contrast here are some behaviors that we will not count as observations:
1. Execution time or space usage.
2. "Private" uses of storage (e.g., internally-allocated reference cells).
3. The name of uncaught exceptions (i.e., we will not distinguish between terminating with the uncaught exception Bind and the uncaught exception Match).
With these ideas in mind, it should be plausible that if we evaluate these
bindings
val r = ref 0
val s = ref 0
then r and s are not equivalent. Consider the following usage of r to compute an integer result:
(s := 1 ; !r)
Clearly this expression evaluates to 0, and mutates the contents of s. Now
replace r by s to obtain
(s := 1 ; !s)
This expression evaluates to 1, and mutates s as before. These two complete programs distinguish r from s, and therefore must be considered distinct.
Had we replaced the binding for s by the binding
val s = r
then the two expressions that formerly distinguished r from s no longer do so — they are, after all, bound to the same reference cell! In fact, no program
can be concocted that would distinguish them. In this case r and s are
equivalent.
Now consider a third, very similar scenario. Let us declare r and s as
follows:
val r = ref ()
val s = ref ()
Are r and s equivalent or not? We might ﬁrst try to distinguish them by
a variant of the experiment considered above. This breaks down because
there is only one possible value we can assign to a variable of type unit
ref! Indeed, one may suspect that r and s are equivalent in this case,
but in fact there is a way to distinguish them! Here’s a complete program
involving r that we will use to distinguish r from s:
if r = r then "it's r" else "it's not"
Now replace the ﬁrst occurrence of r by s to obtain
if s = r then "it's r" else "it's not"
and the result is different.
This example hinges on the fact that ML deﬁnes equality for values of
reference type to be reference equality (or, occasionally, pointer equality). Two
reference cells (of the same type) are equal in this sense iff they both arise
from the exact same use of the ref operation to allocate that cell; otherwise
they are distinct. Thus the two cells bound to r and s above are observably
distinct (by testing reference equality), even though they can only ever hold
the value (). Had equality not been included as a primitive, any two reference cells of unit type would have been equal.
Why does ML provide such a fine-grained notion of equality? "True" equality, as defined by Leibniz's Principle, is, unfortunately, undecidable — there is no computer program that determines whether two expressions are equivalent in this sense. ML provides a useful, conservative approximation to true equality that in some cases is not defined (you cannot test two functions for equality) and in other cases is too picky (it distinguishes reference cells that are otherwise indistinguishable). Such is life.
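The behavior just described can be observed directly at top level:

```sml
val r = ref ()
val s = ref ()
val b1 = (r = r)    (* true:  a cell is equal to itself *)
val b2 = (r = s)    (* false: r and s are distinct allocations *)
val t = r
val b3 = (t = r)    (* true:  t and r are bound to the same cell *)
```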
13.4 Aliasing
To see how reference cells complicate programming, let us consider the
problem of aliasing. Any two variables of the same reference type might
be bound to the same reference cell, or to two different reference cells. For
example, after the declarations
val r = ref 0
val s = ref 0
the variables r and s are not aliases, but after the declaration
val r = ref 0
val s = r
the variables r and s are aliases for the same reference cell.
These examples show that we must be careful when programming with
variables of reference type. This is particularly problematic in the case of
functions, because we cannot assume that two different argument variables
are bound to different reference cells. They might, in fact, be bound to the
same reference cell, in which case we say that the two variables are aliases
for one another. For example, in a function of the form
fn (x:typ ref, y:typ ref) => exp
we may not assume that x and y are bound to different reference cells.
We must always ask ourselves whether we’ve properly considered aliasing
when writing such a function. This is harder to do than it sounds. Aliasing
is a huge source of bugs in programs that work with reference cells.
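The effect of aliasing is easy to observe at top level:

```sml
val r = ref 0
val s = r     (* s is an alias for r: both name the same cell *)
val _ = s := 5
val x = !r    (* 5: the assignment through s changed the cell named by r *)
```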
A classic example is a program to “rotate” the contents of three cells:
given reference cells a, b, and c, with initial contents x, y, and z, set their
contents to y, z, and x, respectively. Here’s a candidate implementation:
fun rot3 (a, b, c) =
let
val t = !a
in
a := !b; b := !c; c := t
end
This code works fine if a, b, and c are three distinct reference cells. But suppose that a and c are the same cell. Afterwards the contents of a, b, and c are x, y, and x: no rotation has occurred at all! Here's a solution that works correctly, even in the presence
of aliasing:
fun rot3 (a, b, c) =
let
val (x, y, z) = (!a, !b, !c)
in
a := y; b := z; c := x
end
Notice that we use ordinary immutable variables to temporarily hold the
initial contents of the cells while their values are being updated.
13.5 Programming Well With References
Using references it is possible to mimic the style of programming used in
imperative languages such as C. For example, we might deﬁne the factorial
function in imitation of such languages as follows:
fun imperative_fact (n : int) =
let
val result = ref 1
val i = ref 0
fun loop () =
if !i = n then
()
else
(i := !i + 1;
result := !result * !i;
loop ())
in
loop (); !result
end
Notice that the function loop is essentially just a while loop; it repeatedly
executes its body until the contents of the cell bound to i reaches n. The
tail call to loop is essentially just a goto statement to the top of the loop.
It is (appallingly) bad style to program in this fashion. The purpose of the function imperative_fact is to compute a simple function on the natural numbers. There is nothing about its definition that suggests that state must be maintained, and so it is senseless to allocate and modify storage to compute it. The definition we gave earlier is shorter, simpler, more efficient,
and hence more suitable to the task. This is not to suggest, however, that
there are no good uses of references. We will now discuss some important
uses of state in ML.
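For comparison, here is a direct recursive definition in the style referred to above (the earlier chapter's exact definition is not reproduced here; this is a standard version):

```sml
(* the effect-free definition: no cells, no assignment *)
fun fact (n : int) : int =
    if n = 0 then 1 else n * fact (n - 1)
```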
13.5.1 Private Storage
The first example is the use of higher-order functions to manage shared private state. This programming style is closely related to the use of objects to manage state in object-oriented programming languages. Here's an example to frame the discussion:
local
val counter = ref 0
in
fun tick () = (counter := !counter + 1; !counter)
fun reset () = (counter := 0)
end
This declaration introduces two functions, tick of type unit -> int and reset of type unit -> unit. Their definitions share a private variable counter that is bound to a mutable cell containing the current value
of a shared counter. The tick operation increments the counter and returns its new value, and the reset operation resets its value to zero. The
types of the operations suggest that implicit state is involved. In the absence of exceptions and implicit state, there is only one useful function of type unit -> unit, namely the function that always returns its argument (and it's debatable whether this is really useful!).
The declaration above defines two functions, tick and reset, that share a single private counter. Suppose now that we wish to have several different instances of a counter — different pairs of functions tick and reset that share different state. We can achieve this by defining a counter generator (or constructor) as follows:
fun new_counter () =
let
val counter = ref 0
fun tick () = (counter := !counter + 1; !counter)
fun reset () = (counter := 0)
in
{ tick = tick, reset = reset }
end
The type of new_counter is
unit -> { tick : unit -> int, reset : unit -> unit }.
We've packaged the two operations into a record containing two functions that share private state. There is an obvious analogy with class-based object-oriented programming. The function new_counter may be thought of as a constructor for a class of counter objects. Each object has a private instance variable counter that is shared between the methods tick and reset of the object, represented as a record with two fields.
Here’s how we use counters.
val c1 = new_counter ()
val c2 = new_counter ()
#tick c1 ();    (* 1 *)
#tick c1 ();    (* 2 *)
#tick c2 ();    (* 1 *)
#reset c1 ();
#tick c1 ();    (* 1 *)
#tick c2 ();    (* 2 *)
Notice that c1 and c2 are distinct counters that increment and reset independently of one another.
13.5.2 Mutable Data Structures
A second important use of references is to build mutable data structures.
The data structures (such as lists and trees) we’ve considered so far are
immutable in the sense that it is impossible to change the structure of the list
or tree without building a modified copy of that structure. This is both a benefit and a drawback. The principal benefit is that immutable data structures are persistent in that operations performed on them do not destroy the original structure — in ML we can eat our cake and have it too. For example, we can simultaneously maintain a dictionary both before and after
insertion of a given word. The principal drawback is that if we aren’t really
relying on persistence, then it is wasteful to make a copy of a structure if
the original is going to be discarded anyway. What we’d like in this case
is to have an “update in place” operation to build an ephemeral (opposite of
persistent) data structure. To do this in ML we make use of references.
A simple example is the type of possibly circular lists, or pcl’s. Informally,
a pcl is a ﬁnite graph in which every node has at most one neighbor, called
its predecessor, in the graph. In contrast to ordinary lists the predecessor relation is not necessarily well-founded: there may be an infinite sequence of nodes arranged in descending order of predecession. Since the graph
is ﬁnite, this can only happen if there is a cycle in the graph: some node
has an ancestor as predecessor. How can such a structure ever come into
existence? If the predecessors of a cell are needed to construct a cell, then
the ancestor that is to serve as predecessor in the cyclic case can never be
created! The “trick” is to employ backpatching: the predecessor is initialized
to Nil, so that the node and its ancestors can be constructed, then it is reset
to the appropriate ancestor to create the cycle.
This can be achieved in ML using the following datatype declaration:
datatype 'a pcl = Pcl of 'a pcell ref
and 'a pcell = Nil | Cons of 'a * 'a pcl;
A value of type typ pcl is essentially a reference to a value of type typ pcell. A value of type typ pcell is either Nil, the cell at the end of a non-circular possibly-circular list, or Cons (h, t), where h is a value of type typ and t is another such possibly-circular list.
Here are some convenient functions for creating and taking apart possibly-circular lists:
fun cons (h, t) = Pcl (ref (Cons (h, t)));
fun nill () = Pcl (ref Nil);
fun phd (Pcl (ref (Cons (h, _)))) = h;
fun ptl (Pcl (ref (Cons (_, t)))) = t;
To implement backpatching, we need a way to "zap" the tail of a possibly-circular list.
fun stl (Pcl (r as ref (Cons (h, _))), u) =
    (r := Cons (h, u));
If you’d like, it would make sense to require that the tail of the Cons cell
be the empty pcl, so that you’re only allowed to backpatch at the end of a
ﬁnite pcl.
Here are a finite and an infinite pcl.
val finite = cons (4, cons (3, cons (2, cons (1, nill ()))))
val tail = cons (1, nill());
val infinite = cons (4, cons (3, cons (2, tail)));
val _ = stl (tail, infinite)
The last step backpatches the tail of the last cell of infinite to be infinite
itself, creating a circular list.
Now let us define the size of a pcl to be the number of distinct nodes occurring in it. It is an interesting problem to define a size function for pcls that makes no use of auxiliary storage (e.g., no set of previously encountered nodes) and runs in time proportional to the number of cells in
the pcl. The idea is to think of running a long race between a tortoise and a
hare. If the course is circular, then the hare, which quickly runs out ahead
of the tortoise, will eventually come from behind and pass it! Conversely,
if this happens, the course must be circular.
local
fun race (Nil, Nil) = 0
  | race (Cons (_, Pcl (ref c)), Nil) =
      1 + race (c, Nil)
  | race (Cons (_, Pcl (ref c)), Cons (_, Pcl (ref Nil))) =
      1 + race (c, Nil)
  | race (Cons (_, l), Cons (_, Pcl (ref (Cons (_, m))))) =
      1 + race' (l, m)
and race' (Pcl (r as ref c), Pcl (s as ref d)) =
      if r = s then 0 else race (c, d)
in
fun size (Pcl (ref c)) = race (c, c)
end
The hare runs twice as fast as the tortoise. We let the tortoise do the counting; the hare's job is simply to detect cycles. If the hare reaches the finish line, it simply waits for the tortoise to finish counting. This covers the first three clauses of race. If the hare has not yet finished, we must continue with the hare running at twice the pace, checking whether the hare catches the tortoise from behind. Notice that it can never arise that the tortoise reaches the end before the hare does! Consequently, the definition of race is inexhaustive.
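Applied to the two pcls built above, size counts the distinct nodes in each:

```sml
val n1 = size finite      (* 4: the nodes 4, 3, 2, 1 *)
val n2 = size infinite    (* 4: the cycle is detected, not followed forever *)
```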
13.6 Mutable Arrays
In addition to reference cells, ML also provides mutable arrays as a primitive data structure. The type typ array is the type of arrays carrying values of type typ. The basic operations on arrays are these:
val array : int * 'a -> 'a array
val size : 'a array -> int
val sub : 'a array * int -> 'a
val update : 'a array * int * 'a -> unit
The function array creates a new array of a given size, with the given
value as the initial value of every element of the array. The function size
returns the size of an array. The function sub performs a subscript operation, returning the ith element of an array A, where 0 ≤ i < size(A). (These are just the basic operations on arrays; please see V for complete information.)
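Here is a small example of these operations, using the Basis Library structure Array (where the size operation above appears under the name Array.length):

```sml
val a = Array.array (3, 0)      (* a three-element array, initially all 0 *)
val _ = Array.update (a, 0, 7)
val x = Array.sub (a, 0)        (* 7 *)
val n = Array.length a          (* 3 *)
```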
One simple use of arrays is for memoization. Here's a function to compute the nth Catalan number, which may be thought of as the number of distinct ways to parenthesize an arithmetic expression consisting of a sequence of n consecutive multiplications. It makes use of an auxiliary summation function that you can easily define for yourself. (Applying sum to f and n computes the sum f 1 + ... + f n.)
fun C 1 = 1
  | C n = sum (fn k => (C k) * (C (n-k))) (n-1)
This definition of C is hugely inefficient because a given computation may be repeated exponentially many times. For example, to compute C 10 we must compute C 1, C 2, ..., C 9, and the computation of C i engenders the computation of C 1, ..., C (i-1) for each 1 ≤ i ≤ 9. We can do better by caching previously-computed results in an array, leading to an enormous improvement in execution speed. Here's the code:
local
val limit : int = 100
val memopad : int option array =
Array.array (limit, NONE)
in
fun C' 1 = 1
  | C' n = sum (fn k => (C k) * (C (n-k))) (n-1)
and C n =
if n < limit then
case Array.sub (memopad, n)
of SOME r => r
| NONE =>
let
val r = C’ n
in
Array.update (memopad, n, SOME r);
r
end
else
C’ n
end
Note carefully the structure of the solution. The function C is a memoized version of the Catalan number function. When called it consults the memopad to determine whether or not the required result has already been computed. If so, the answer is simply retrieved from the memopad; otherwise the result is computed, stored in the cache, and returned. The function C' looks superficially similar to the earlier definition of C, with the important difference that the recursive calls are to C, rather than C' itself. This ensures that subcomputations are properly cached and that the cache is consulted whenever possible.
The main weakness of this solution is that we must fix an upper bound
on the size of the cache. This can be alleviated by implementing a more
sophisticated cache management scheme that dynamically adjusts the size
of the cache based on the calls made to it.
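One such scheme, sketched here with names of our own choosing (memo, ensure), keeps the memopad in a reference cell and grows it on demand; the function to be cached is supplied as an argument:

```sml
local
  val memopad : int option array ref = ref (Array.array (16, NONE))
  (* grow the pad, if necessary, so that index n is in bounds *)
  fun ensure n =
      let
        val old = !memopad
        val len = Array.length old
      in
        if n < len then ()
        else
          let
            val new = Array.array (Int.max (n + 1, 2 * len), NONE)
            fun blit i =
                if i < len
                then (Array.update (new, i, Array.sub (old, i)); blit (i + 1))
                else ()
          in
            blit 0; memopad := new
          end
      end
in
  (* memoize f at argument n, with no fixed upper bound *)
  fun memo (f : int -> int) (n : int) : int =
      (ensure n;
       case Array.sub (!memopad, n) of
           SOME r => r
         | NONE => let val r = f n
                   in Array.update (!memopad, n, SOME r); r end)
end
```

Since a single pad is shared here, memo should be applied to only one function; a production version would allocate a fresh pad per memoized function.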
Chapter 14
Input/Output
The Standard ML Basis Library (described in V) defines a three-layer input and output facility for Standard ML. These modules provide a rudimentary, platform-independent text I/O facility that we summarize briefly here. The reader is referred to V for more details. Unfortunately, there is at present no standard library for graphical user interfaces; each implementation provides its own package. See your compiler's documentation for details.
14.1 Textual Input/Output
The text I/O primitives are based on the notions of an input stream and an output stream, which are values of type instream and outstream, respectively. An input stream is an unbounded sequence of characters arising from some source. The source could be a disk file, an interactive user, or another program (to name a few choices). Any source of characters can be attached to an input stream. An input stream may be thought of as a buffer containing zero or more characters that have already been read from the source, together with a means of requesting more input from the source should the program require it. Similarly, an output stream is an unbounded sequence of characters leading to some sink. The sink could be a disk file, an interactive user, or another program (to name a few choices). Any sink for characters can be attached to an output stream. An output stream may be thought of as a buffer containing zero or more characters that have been produced by the program but have yet to be flushed to the sink.
Each program comes with one input stream and one output stream,
called stdIn and stdOut, respectively. These are ordinarily connected to
the user’s keyboard and screen, and are used for performing simple text
I/O in a program. The output stream stdErr is also predeﬁned, and is
used for error reporting. It is ordinarily connected to the user’s screen.
Textual input and output are performed on streams using a variety of primitives. The simplest are inputLine and print. To read a line of input from a stream, use the function inputLine of type instream -> string. It reads a line of input from the given stream and yields that line as a string whose last character is the line terminator. If the source is exhausted, it returns the empty string. To write a line to stdOut, use the function print of type string -> unit. To write to a specific stream, use the function output of type outstream * string -> unit, which writes the given string to the specified output stream. For interactive applications it is often important to ensure that the output stream is flushed to the sink (e.g., so that it is displayed on the screen). This is achieved by calling flushOut of type outstream -> unit, which ensures that the output stream is flushed to the sink. The print function is a composition of output (to stdOut) and flushOut.
A new input stream may be created by calling the function openIn of type string -> instream. When applied to a string, the system attempts to open a file with that name (according to operating system-specific naming conventions) and attaches it as a source to a new input stream. Similarly, a new output stream may be created by calling the function openOut of type string -> outstream. When applied to a string, the system attempts to create a file with that name (according to operating system-specific naming conventions) and attaches it as a sink for a new output stream. An input stream may be closed using the function closeIn of type instream -> unit. A closed input stream behaves as if there is no further input available; requests for input from a closed input stream yield the empty string. An output stream may be closed using closeOut of type outstream -> unit. A closed output stream is unavailable for further output; an attempt to write to a closed output stream raises the exception TextIO.IO.
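These operations combine naturally into a file-copying procedure. The sketch below uses inputLine with the type given above, instream -> string (in later revisions of the Basis Library it instead returns a string option); copyFile is our own illustrative function:

```sml
(* copy the text file named src to the file named dst, line by line *)
fun copyFile (src : string, dst : string) : unit =
    let
      val ins = TextIO.openIn src
      val outs = TextIO.openOut dst
      fun loop () =
          if TextIO.endOfStream ins then ()
          else (TextIO.output (outs, TextIO.inputLine ins); loop ())
    in
      loop (); TextIO.closeIn ins; TextIO.closeOut outs
    end
```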
The function input of type instream -> string is a blocking read operation that returns a string consisting of the characters currently available from the source. If none are currently available, but the end of source has not been reached, then the operation blocks until at least one character is available from the source. If the source is exhausted or the input stream is closed, input returns the null string. To test whether an input operation would block, use the function canInput of type instream * int -> int option. Given a stream s and a bound n, the function canInput determines whether or not a call to input on s would immediately yield up to
n characters. If the input operation would block, canInput yields NONE; otherwise it yields SOME k, with 0 ≤ k ≤ n being the number of characters immediately available on the input stream. If canInput yields SOME 0, the stream is either closed or exhausted. The function endOfStream of type instream -> bool tests whether the input stream is currently at the end (no further input is available from the source). This condition is transient since, for example, another process might append data to an open file in between calls to endOfStream.
The function output of type outstream * string -> unit writes a string to an output stream. It may block until the sink is able to accept the entire string. The function flushOut of type outstream -> unit forces any pending output to the sink, blocking until the sink accepts the remaining buffered output.
This collection of primitive I/O operations is sufficient for performing rudimentary textual I/O. For further information on textual I/O, and support for binary I/O and Posix I/O primitives, see the Standard ML Basis Library.
Chapter 15
Lazy Data Structures
In ML all variables are bound by value, which means that the bindings of variables are fully evaluated expressions, or values. This general principle has several consequences:
1. The right-hand side of a val binding is evaluated before the binding is effected. If the right-hand side has no value, the val binding does not take effect.
2. In a function application the argument is evaluated before being passed to the function by binding that value to the parameter of the function. If the argument does not have a value, then neither does the application.
3. The arguments to value constructors are evaluated before the constructed value is created.
According to the by-value discipline, the bindings of variables are evaluated, regardless of whether that variable is ever needed to complete execution. For example, to compute the result of applying the function fn x => 1 to an argument, we never actually need to evaluate the argument, but we do anyway. For this reason ML is sometimes said to be an eager language.
An alternative is to bind variables by name,[1] which means that the binding of a variable is an unevaluated expression, known as a computation or a suspension or a thunk.[2]
[1] The terminology is historical, and not well-motivated. It is, however, firmly established.
[2] For reasons that are lost in the mists of time.
This principle has several consequences:
1. The right-hand side of a val binding is not evaluated before the binding is effected. The variable is bound to a computation (unevaluated expression), not a value.
2. In a function application the argument is passed to the function in unevaluated form by binding it directly to the parameter of the function. This holds regardless of whether the argument has a value or not.
3. The arguments to value constructors are left unevaluated when the constructed value is created.
According to the by-name discipline, the bindings of variables are only evaluated (if ever) when their values are required by a primitive operation. For example, to evaluate the expression x+x, it is necessary to evaluate the binding of x in order to perform the addition. Languages that adopt the by-name discipline are, for this reason, said to be lazy.
This discussion glosses over another important aspect of lazy evaluation, called memoization. In actual fact laziness is based on a refinement of the by-name principle, called the by-need principle. According to the by-name principle, variables are bound to unevaluated computations, and are evaluated as often as the value of that variable's binding is required to complete the computation. In particular, to evaluate the expression x+x the value of the binding of x is needed twice, and hence it is evaluated twice. According to the by-need principle, the binding of a variable is evaluated at most once: not at all if it is never needed, and exactly once if it is ever needed at all. Re-evaluation of the same computation is avoided by memoization. Once a computation is evaluated, its value is saved for future reference should that computation ever be needed again.
The advantages and disadvantages of lazy vs. eager languages have been hotly debated. We will not enter into this debate here, but rather content ourselves with the observation that laziness is a special case of eagerness. (Recent versions of) ML have lazy data types that allow us to treat unevaluated computations as values of such types, allowing us to incorporate laziness into the language without disrupting its fundamental character on which so much else depends. This affords the benefits of laziness, but on a controlled basis: we can use it when it is appropriate, and ignore it when it is not.
The main benefit of laziness is that it supports demand-driven computation. This is useful for representing online data structures that are created
only insofar as we examine them. Infinite data structures, such as the sequence of all prime numbers in order of magnitude, are one example of an online data structure. Clearly we cannot ever “finish” creating the sequence of all prime numbers, but we can create as much of this sequence as we need for a given run of a program. Interactive data structures, such as the sequence of inputs provided by the user of an interactive system, are another example of an online data structure. In such a system the user's inputs are not predetermined at the start of execution, but rather are created “on demand” in response to the progress of computation up to that point. The demand-driven nature of online data structures is precisely what is needed to model this behavior.
Note: Lazy evaluation is a non-standard feature of ML that is supported only by the SML/NJ compiler. The lazy evaluation features must be enabled by executing the following at top level:

Compiler.Control.lazysml := true;
open Lazy;
15.1 Lazy Data Types
SML/NJ provides a general mechanism for introducing lazy data types by simply attaching the keyword lazy to an ordinary datatype declaration. The ideas are best illustrated by example. We will focus attention on the type of infinite streams, which may be declared as follows:

datatype lazy 'a stream = Cons of 'a * 'a stream

Notice that this type definition has no “base case”! Had we omitted the keyword lazy, such a datatype would not be very useful, since there would be no way to create a value of that type!
Adding the keyword lazy makes all the difference. Doing so specifies that the values of type typ stream are computations of values of the form Cons (val, val'), where val is of type typ, and val' is another such computation. Notice how this description captures the “incremental” nature of lazy data structures. The computation is not evaluated until we examine it. When we do, its structure is revealed as consisting of an element val together with another suspended computation of the same type. Should we inspect that computation, it will again have this form, and so on ad infinitum.
Values of type typ stream are created using a val rec lazy declaration that provides a means for building a “circular” data structure. Here is a declaration of the infinite stream of 1's as a value of type int stream:

val rec lazy ones = Cons (1, ones)

The keyword lazy indicates that we are binding ones to a computation, rather than a value. The keyword rec indicates that the computation is recursive (or self-referential or circular). It is the computation whose underlying value is constructed using Cons (the only possibility) from the integer 1 and the very same computation itself.
We can inspect the underlying value of a computation by pattern matching. For example, the binding

val Cons (h, t) = ones

extracts the “head” and “tail” of the stream ones. This is performed by evaluating the computation bound to ones, yielding Cons (1, ones), then performing ordinary pattern matching to bind h to 1 and t to ones.
Had the pattern been “deeper”, further evaluation would be required, as in the following binding:

val Cons (h, Cons (h', t')) = ones

To evaluate this binding, we evaluate ones to Cons (1, ones), binding h to 1 in the process, then evaluate ones again to Cons (1, ones), binding h' to 1 and t' to ones. The general rule is that pattern matching forces evaluation of a computation to the extent required by the pattern. This is the means by which lazy data structures are evaluated only insofar as required.
15.2 Lazy Function Deﬁnitions
The combination of (recursive) lazy function definitions and decomposition by pattern matching are the core mechanisms required to support lazy evaluation. However, there is a subtlety about function definitions that requires careful consideration, and a third new mechanism, the lazy function declaration.
Using pattern matching we may easily define functions over lazy data structures in a familiar manner. For example, we may define two functions to extract the head and tail of a stream as follows:

fun shd (Cons (h, _)) = h
fun stl (Cons (_, s)) = s
These are functions that, when applied to a stream, evaluate it, and match it against the given patterns to extract the head and tail, respectively.
While these functions are surely very natural, there is a subtle issue that deserves careful discussion. The issue is whether these functions are “lazy enough”. From one point of view, what we are doing is decomposing a computation by evaluating it and retrieving its components. In the case of the shd function there is no other interpretation: we are extracting a value of type typ from a value of type typ stream, which is a computation of a value of the form Cons (exp_h, exp_t). We can adopt a similar viewpoint about stl, namely that it is simply extracting a component value from a computation of a value of the form Cons (exp_h, exp_t).
However, in the case of stl, another point of view is also possible. Rather than think of stl as extracting a value from a stream, we may instead think of it as creating a stream out of another stream. Since streams are computations, the stream created by stl (according to this view) should also be suspended until its value is required. Under this interpretation the argument to stl should not be evaluated until its result is required, rather than at the time stl is applied. This leads to a variant notion of “tail” that may be defined as follows:

fun lazy lstl (Cons (_, s)) = s

The keyword lazy indicates that an application of lstl to a stream does not immediately perform pattern matching on its argument, but rather sets up a stream computation that, when forced, forces the argument and extracts the tail of the stream.
The behavior of the two forms of tail function can be distinguished using print statements as follows:

val rec lazy s = (print "."; Cons (1, s))
val _ = stl s (* prints "." *)
val _ = stl s (* silent *)

val rec lazy s = (print "."; Cons (1, s));
val _ = lstl s (* silent *)
val _ = stl s (* prints "." *)

Since stl evaluates its argument when applied, the “.” is printed when it is first called, but not if it is called again. However, since lstl only sets up a computation, its argument is not evaluated when it is called, but only when its result is evaluated.
15.3 Programming with Streams
Let's define a function smap that applies a function to every element of a stream, yielding another stream. The type of smap should be ('a -> 'b) -> 'a stream -> 'b stream. The thing to keep in mind is that the application of smap to a function and a stream should set up (but not compute) another stream that, when forced, forces the argument stream to obtain the head element, applies the given function to it, and yields this as the head of the result.
Here's the code:

fun smap f =
    let
      fun lazy loop (Cons (x, s)) =
          Cons (f x, loop s)
    in
      loop
    end
We have “staged” the computation so that the partial application of smap to a function yields a function that loops over a given stream, applying the given function to each element. This loop is a lazy function to ensure that it merely sets up a stream computation, rather than evaluating its argument when it is called. Had we dropped the keyword lazy from the definition of the loop, then an application of smap to a function and a stream would immediately force the computation of the head element of the stream, rather than merely set up a future computation of the same result.
To illustrate the use of smap, here's a definition of the infinite stream of natural numbers:

val one_plus = smap (fn n => n+1)
val rec lazy nats = Cons (0, one_plus nats)
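The first few elements of nats may be inspected with shd and stl from above; this is a sketch assuming the SML/NJ lazy extensions have been enabled as described earlier.

```sml
(* Forcing nats reveals Cons (0, one_plus nats); forcing its tail
   applies fn n => n+1 to the head of nats, and so on. *)
val zero = shd nats          (* 0 *)
val one  = shd (stl nats)    (* 1 *)
val two  = shd (stl (stl nats))
```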
Now let's define a function sfilter of type

('a -> bool) -> 'a stream -> 'a stream

that filters out all elements of a stream that do not satisfy a given predicate.
fun sfilter pred =
    let
      fun lazy loop (Cons (x, s)) =
          if pred x then
            Cons (x, loop s)
          else
            loop s
    in
      loop
    end
We can use sfilter to define a function sieve that, when applied to a stream of numbers, retains only those numbers that are not divisible by a preceding number in the stream:

fun m mod n = m - n * (m div n)
fun divides m n = n mod m = 0
fun lazy sieve (Cons (x, s)) =
    Cons (x, sieve (sfilter (not o (divides x)) s))
We may now define the infinite stream of primes by applying sieve to the natural numbers greater than or equal to 2:

val nats2 = stl (stl nats)
val primes = sieve nats2
To inspect the values of a stream it is often useful to use the following function that takes n ≥ 0 elements from a stream and builds a list of those n values:

fun take 0 _ = nil
  | take n (Cons (x, s)) = x :: take (n-1) s
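Putting the pieces together, take lets us observe a finite prefix of the infinite stream of primes; this sketch assumes the declarations above have been evaluated in SML/NJ with lazy evaluation enabled.

```sml
(* Only as much of the sieve as is needed for five elements is forced. *)
val first_primes = take 5 primes   (* [2, 3, 5, 7, 11] *)
```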
Here's an example to illustrate the effects of memoization:

val rec lazy s = Cons ((print "."; 1), s)
val Cons (h, _) = s;
(* prints ".", binds h to 1 *)
val Cons (h, _) = s;
(* silent, binds h to 1 *)

Replace print "."; 1 by a time-consuming operation yielding 1 as result, and you will see that the second time we force s the result is returned instantly, taking advantage of the effort expended on the time-consuming operation induced by the first force of s.
Chapter 16
Equality and Equality Types
Chapter 17
Concurrency
Concurrent ML (CML) is an extension of Standard ML with mechanisms
for concurrent programming. It is available as part of the Standard ML of
New Jersey compiler. The eXene Library for programming the X windows
system is based on CML.
Part III
The Module Language
The Standard ML module language comprises the mechanisms for structuring programs into separate units. Program units are called structures. A structure consists of a collection of components, including types and values, that constitute the unit. Composition of units to form a larger unit is mediated by a signature, which describes the components of that unit. A signature may be thought of as the type of a unit. Large units may be structured into hierarchies using substructures. Generic, or parameterized, units may be defined as functors.
Chapter 18
Signatures and Structures
The fundamental constructs of the ML module system are signatures and structures. A signature may be thought of as an interface or specification of a structure, and a structure may correspondingly be thought of as an implementation of a signature. Many languages (such as Modula, Ada, or Java) have similar constructs: signatures are analogous to interfaces or package specifications or class types, and structures are analogous to implementations or packages or classes. However, these are only rough analogies that should not be taken too seriously.
18.1 Signatures
A signature is a specification, or a description, of a program unit, or structure. Structures consist of declarations of type constructors, exception constructors, and value bindings. A signature is an item-by-item specification of these components of a structure. A structure matches, or implements, a signature iff the requirements of the signature are met by the structure. (This will be made precise below.)
18.1.1 Basic Signatures
A basic signature expression has the form sig specs end, where specs is a sequence of specifications. There are four basic forms of specification that may occur in specs:¹
¹There are two other forms of specification beyond these four, substructure specifications and sharing specifications. These will be introduced in chapter 21.
1. A type specification of the form

   type (tyvar_1,...,tyvar_n) tycon [= typ],

   where the definition typ of tycon may or may not be present.

2. A datatype specification, which has precisely the same form as a datatype declaration.

3. An exception specification of the form

   exception excon of typ.

4. A value specification of the form

   val id : typ.

The sequence of specifications is to be understood in the order given, and no component may be specified more than once. Each specification may refer to the type constructors introduced earlier in the sequence.
Signatures may be given names using a signature binding

signature sigid = sigexp,

where sigid is a signature identifier and sigexp is a signature expression. Signature identifiers are abbreviations for the signatures to which they are bound. In practice we nearly always bind signature expressions to identifiers and refer to them by name.
Here is an illustrative example of a signature definition. We will refer back to this definition often in the rest of this chapter.
signature QUEUE =
  sig
    type 'a queue
    exception Empty
    val empty : 'a queue
    val insert : 'a * 'a queue -> 'a queue
    val remove : 'a queue -> 'a * 'a queue
  end
The signature QUEUE specifies a structure that must provide

1. a unary type constructor 'a queue,
2. a nullary exception Empty,

3. a polymorphic value empty of type 'a queue,

4. two polymorphic functions, insert and remove, with the specified type schemes.

Notice that queues are polymorphic in the type of elements of the queue: the same operations are used regardless of the element type.
18.1.2 Signature Inheritance
Signatures may be built up from one another using two principal tools, signature inclusion and signature specialization. Each is a form of inheritance in which a new signature is created by enriching another signature with additional information.
Signature inclusion is used to add more components to an existing signature. For example, if we wish to add an emptiness test to the signature QUEUE we might define the augmented signature, QUEUE_WITH_EMPTY, using the following signature binding:
signature QUEUE_WITH_EMPTY =
  sig
    include QUEUE
    val is_empty : 'a queue -> bool
  end

As the notation suggests, the signature QUEUE is included into the body of the signature QUEUE_WITH_EMPTY, and an additional component is added.
It is not strictly necessary to use include to define this signature. Indeed, we may define it directly using the following signature binding:
signature QUEUE_WITH_EMPTY =
  sig
    type 'a queue
    exception Empty
    val empty : 'a queue
    val insert : 'a * 'a queue -> 'a queue
    val remove : 'a queue -> 'a * 'a queue
    val is_empty : 'a queue -> bool
  end
There is no semantic difference between the two definitions of QUEUE_WITH_EMPTY. Signature inclusion is a convenience that documents the “history” of how the more refined signature was created.
Signature specialization is used to augment an existing signature with additional type definitions. For example, if we wish to refine the signature QUEUE to specify that the type constructor 'a queue must be defined as a pair of lists, we may proceed as follows:

signature QUEUE_AS_LISTS =
  QUEUE where type 'a queue = 'a list * 'a list

The where type clause “patches” the signature QUEUE by adding a definition for the type constructor 'a queue.
The signature QUEUE_AS_LISTS may also be defined directly as follows:

signature QUEUE_AS_LISTS =
  sig
    type 'a queue = 'a list * 'a list
    exception Empty
    val empty : 'a queue
    val insert : 'a * 'a queue -> 'a queue
    val remove : 'a queue -> 'a * 'a queue
  end
A where type clause may not be used to redefine a type that is already defined in a signature. For example, the following is illegal:

signature QUEUE_AS_LISTS_AS_LIST =
  QUEUE_AS_LISTS where type 'a queue = 'a list

If you wish to replace the definition of a type constructor in a signature with another definition using where type, you must go back to a common ancestor in which that type is not yet defined.

signature QUEUE_AS_LIST =
  QUEUE where type 'a queue = 'a list
Two signatures are said to be equivalent iff they differ only up to the type equivalences induced by type abbreviations.² For example, the signature QUEUE where type 'a queue = 'a list is equivalent to the signature
²In some languages signatures are compared by name, which means that two signatures are equivalent iff they are the same signature identifier. This is not the case in ML.
signature QUEUE_AS_LIST =
  sig
    type 'a queue = 'a list
    exception Empty
    val empty : 'a list
    val insert : 'a * 'a list -> 'a list
    val remove : 'a list -> 'a * 'a list
  end

Within the scope of the definition of the type 'a queue as 'a list, the two are equivalent, and hence the specifications of the value components are equivalent.
This principle of equivalence is sometimes called propagation of type sharing. Within the scope of the type declaration in the signature QUEUE_AS_LIST, the type constructors 'a queue and 'a list are said to share, or are equivalent. Therefore they may be used interchangeably within their scope, as we have done above, without affecting the meaning.
18.2 Structures
A structure is a unit of program consisting of a sequence of declarations of types, exceptions, and values. Structures are implementations of signatures; signatures are the “types” of structures.
18.2.1 Basic Structures
The basic form of structure is an encapsulated sequence of declarations of the form struct decs end. The declarations in decs are of one of the following four forms:

1. A type declaration defining a type constructor.

2. A datatype declaration defining a new datatype.

3. An exception declaration defining a new exception constructor with a specified argument type.

4. A value declaration defining a new value variable with a specified type.

These are precisely the declarations introduced in Part II.
A structure expression is well-formed iff it consists of a well-formed sequence of well-formed declarations (according to the rules given in Part II).
A structure expression is evaluated by evaluating each of the declarations within it, in the order given. This amounts to evaluating the right-hand sides of each value declaration in turn to determine its value, which is then bound to the corresponding value identifier. This means, in particular, that any side effects that arise out of evaluating the constituent value bindings occur when the structure expression is evaluated. A structure value is a structure expression in which all bindings are fully evaluated.
A structure may be bound to a structure identifier using a structure binding of the form

structure strid = strexp

This declaration defines strid to stand for the value of strexp. Such a declaration is well-formed exactly when strexp is well-formed. It is evaluated by evaluating the right-hand side, and binding the resulting structure value to strid.
Here is an example of a structure binding:

structure Queue =
  struct
    type 'a queue = 'a list * 'a list
    exception Empty
    val empty = (nil, nil)
    fun insert (x, (b,f)) = (x::b, f)
    fun remove (nil, nil) = raise Empty
      | remove (bs, nil) = remove (nil, rev bs)
      | remove (bs, f::fs) = (f, (bs, fs))
  end

Recall that a fun binding is really an abbreviation for a val rec binding! Thus the bindings of the value identifiers insert and remove are function expressions.
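As a brief usage sketch (assuming the Queue structure above), elements are inserted on the “back” list and removed from the “front” list, which is replenished by reversing the back when it runs dry:

```sml
val q0 = Queue.empty
val q1 = Queue.insert (1, q0)      (* ([1], nil) *)
val q2 = Queue.insert (2, q1)      (* ([2,1], nil) *)
val (x, q3) = Queue.remove q2      (* x = 1: first in, first out *)
```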
18.2.2 Long and Short Identiﬁers
Once a structure has been bound to a structure identifier, we may access its components using paths, or long identifiers, or qualified names. A path has the form strid.id.³ It stands for the id component of the structure bound to strid. For example, Queue.empty refers to the empty component of the
³In chapter 21, we will generalize this to admit an arbitrary sequence of strid's separated by a dot.
structure Queue. It has type 'a Queue.queue (note well the syntax!), stating that it is a polymorphic value whose type is built up from the unary type constructor Queue.queue. Similarly, the function Queue.insert has type 'a * 'a Queue.queue -> 'a Queue.queue and Queue.remove has type 'a Queue.queue -> 'a * 'a Queue.queue.
Type definitions permeate structure boundaries. For example, the type 'a Queue.queue is equivalent to the type 'a list * 'a list because it is defined to be so in the structure Queue. We will shortly introduce the means for limiting the visibility of type definitions. Unless special steps are taken, the definitions of types within a structure determine the definitions of the long identifiers that refer to those types within a structure. Consequently, it is correct to write an expression such as

val q = Queue.insert (1, ([6,5,4],[1,2,3]))

even though the list [6,5,4] was not obtained by using the operations from the structure Queue. This is because the type int Queue.queue is equivalent to the type int list * int list, and hence the call to insert is well-typed.
The use of long identifiers can get out of hand, cluttering the program, rather than clarifying it. Suppose that we are frequently using the Queue operations in a program, so that the code is cluttered with calls to Queue.empty, Queue.insert, and Queue.remove. One way to reduce clutter is to introduce a structure abbreviation of the form

structure strid = strid'

that introduces one structure identifier as an abbreviation for another. For example, after declaring

structure Q = Queue

we may write Q.empty, Q.insert, and Q.remove, rather than the more verbose forms mentioned above.
Another way to reduce clutter is to open the structure Queue to incorporate its bindings directly into the current environment. An open declaration has the form

open strid_1 ... strid_n

which incorporates the bindings from the given structures in left-to-right order (later structures override earlier ones when there is overlap). For example, the declaration
open Queue
incorporates the body of the structure Queue into the current environment so that we may write just empty, insert, and remove, without qualification, to refer to the corresponding components of the structure Queue.
Although this is surely convenient, using open has its disadvantages. One is that we cannot simultaneously open two structures that have a component with the same name. For example, if we write

open Queue Stack

where the structure Stack also has a component empty, then uses of empty (without qualification) will stand for Stack.empty, not for Queue.empty.
Another problem with open is that it is hard to control its behavior, since it incorporates the entire body of a structure, and hence may inadvertently shadow identifiers that happen to be also used in the structure. For example, if the structure Queue happened to define an auxiliary function helper, that function would also be incorporated into the current environment by the declaration open Queue, which may not have been intended. This turns out to be a source of many bugs; it is best to use open sparingly, and only then in a let or local declaration (to limit the damage).
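This recommendation can be sketched as follows: opening a structure inside a let confines the unqualified names to a single scope (a sketch assuming the Queue structure from earlier; doubleton is an illustrative name).

```sml
(* Build a two-element queue without polluting the top level. *)
fun doubleton (x, y) =
    let
      open Queue
    in
      insert (y, insert (x, empty))
    end
(* Outside the let, empty/insert/remove are not available unqualified. *)
```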
Chapter 19
Signature Matching
When does a structure implement a signature? Roughly speaking, the structure must provide all of the components and satisfy all of the type definitions required by the signature. Exception components must be provided with types equivalent to those in the signature. Value components must be provided with types at least as general as those in the signature. Type components must be provided with the right arities, and with equivalent definitions (if any).
These are useful rules of thumb; nailing them down is surprisingly tricky. Let us mention a few issues that complicate matters:

• To minimize bureaucracy, a structure may provide more components than are strictly required by the signature. If a signature requires components x, y, and z, it is sufficient for the structure to provide x, y, z, and w.

• To enhance reuse, a structure may provide values with more general types than are required by the signature. If a signature demands a function of type int -> int, it is enough to provide a function of type 'a -> 'a.

• To avoid over-specification, a datatype may be provided where a type is required, and a value constructor may be provided where a value is required.

• To increase flexibility, a structure may consist of declarations presented in any sensible order, not just the order specified in the signature, provided that the requirements of the specification are met.
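The first two points can be illustrated with a small sketch (the names S and A and their components are made up for illustration): the structure supplies an extra component and a more general type than the signature demands, yet still implements it.

```sml
signature S =
  sig
    val id : int -> int
  end

structure A =
  struct
    fun id x = x              (* principal type 'a -> 'a: more general *)
    val extra = "unrequired"  (* extra component: also permitted *)
  end

(* A implements S despite the extra component and more general type. *)
structure A' : S = A
```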
What it means for a structure to implement a signature is very similar to what it means for an expression to have a type. An expression exp has type typ iff typ is an instance of the principal type of exp. Similarly, we will define a structure to implement a signature iff the principal signature of the structure matches that signature.
19.1 Principal Signatures
The principal signature of a structure is, in a precise sense, the most specific description of the components of that structure. It captures everything that needs to be known about that structure during type checking. It is an important, and highly non-trivial, property of ML that there is always a principal signature for any well-formed structure. For the purposes of type checking, the principal signature is the official proxy for the structure. We need never examine the code of the structure during type checking, once its principal signature has been determined.
A structure expression is assigned a principal signature by a component-by-component analysis of its constituent declarations. The principal signature of a structure is obtained as follows:¹
1. Corresponding to a declaration of the form

   type (tyvar_1,...,tyvar_n) tycon = typ,

   the principal signature contains the specification

   type (tyvar_1,...,tyvar_n) tycon = typ

   The principal signature includes the definition of tycon.

2. Corresponding to a declaration of the form

   datatype (tyvar_1,...,tyvar_n) tycon =
     con_1 of typ_1 | ... | con_k of typ_k

   the principal signature contains the specification

   datatype (tyvar_1,...,tyvar_n) tycon =
     con_1 of typ_1 | ... | con_k of typ_k

¹These rules gloss over some technical complications that arise only in unusual circumstances. See The Definition of Standard ML [2] for complete details.
   The specification is identical to the declaration.

3. Corresponding to a declaration of the form

   exception id of typ

   the principal signature contains the specification

   exception id of typ

4. Corresponding to a declaration of the form

   val id = exp

   the principal signature contains the specification

   val id : typ

   where typ is the principal type scheme of the expression exp (relative to the preceding declarations). Keep in mind that fun bindings are really (recursive) val bindings of function type.

In brief, the principal signature contains all of the type definitions, datatype definitions, and exception bindings of the structure, plus the principal types of its value bindings.
An important point to keep in mind is that the principal signature of a structure is obtained by inspecting the code of that structure. While this may seem like an obvious point, it has important implications for managing dependencies between modules. We will return to this point in section 20.4 of chapter 20.
19.2 Matching
A candidate signature sigexp_c is said to match a target signature sigexp_t iff sigexp_c has all of the components and all of the type equations specified by sigexp_t. More precisely,

1. Every type constructor in the target must also be present in the candidate, with the same arity (number of arguments) and an equivalent definition (if any).

2. Every datatype in the target must be present in the candidate, with equivalent types for the value constructors.
3. Every exception in the target must be present in the candidate, with an equivalent argument type.

4. Every value in the target must be present in the candidate, with at least as general a type.

The candidate may have additional components not mentioned in the target, or satisfy additional type equations not required in the target, but it cannot have fewer of either. The target signature may therefore be seen as a weakening of the candidate signature, since all of the properties of the former are true of the latter.
It is easy to see that the matching relation is reflexive (every signature matches itself) and transitive (if sigexp_1 matches sigexp_2 and sigexp_2 matches sigexp_3, then sigexp_1 matches sigexp_3). This means that the signature matching relation is a preorder. It is also a partial order, which is to say that if sigexp_1 matches sigexp_2 and vice versa, then sigexp_1 and sigexp_2 are equivalent signatures (to within propagation of type equations).
It will be helpful to consider some examples. Recall the following signatures from chapter 18.
signature QUEUE =
  sig
    type 'a queue
    exception Empty
    val empty : 'a queue
    val insert : 'a * 'a queue -> 'a queue
    val remove : 'a queue -> 'a * 'a queue
  end

signature QUEUE_WITH_EMPTY =
  sig
    include QUEUE
    val is_empty : 'a queue -> bool
  end

signature QUEUE_AS_LISTS =
  QUEUE where type 'a queue = 'a list * 'a list
The signature QUEUE_WITH_EMPTY matches the signature QUEUE, because all of the requirements of QUEUE are met by QUEUE_WITH_EMPTY. The converse does not hold, because QUEUE lacks the component is_empty, which is required by QUEUE_WITH_EMPTY.
The signature QUEUE_AS_LISTS matches the signature QUEUE. It is identical to QUEUE, apart from the additional specification of the type 'a queue. The converse fails, because the signature QUEUE does not satisfy the requirement that 'a queue be equivalent to 'a list * 'a list.
Matching does not distinguish between equivalent signatures. For example, consider the following signature:

signature QUEUE_AS_LIST =
sig
  type 'a queue = 'a list
  exception Empty
  val empty : 'a list
  val insert : 'a * 'a list -> 'a list
  val remove : 'a list -> 'a * 'a list
  val is_empty : 'a list -> bool
end
At first glance you might think that this signature does not match the signature QUEUE, since the components of QUEUE_AS_LIST have superficially dissimilar types from those in QUEUE. However, the signature QUEUE_AS_LIST is equivalent to the signature QUEUE_WITH_EMPTY where type 'a queue = 'a list, which matches QUEUE for reasons noted earlier. Therefore, QUEUE_AS_LIST matches QUEUE as well.
Signature matching may also involve instantiation of polymorphic types.
The types of values in the candidate may be more general than required by
the target. For example, the signature
signature MERGEABLE_QUEUE =
sig
  include QUEUE
  val merge : 'a queue * 'a queue -> 'a queue
end

matches the signature

signature MERGEABLE_INT_QUEUE =
sig
  include QUEUE
  val merge : int queue * int queue -> int queue
end
because the polymorphic type of merge in MERGEABLE_QUEUE instantiates to its type in MERGEABLE_INT_QUEUE.
Finally, a datatype specification matches a signature that specifies a type with the same name and arity (but no definition), and zero or more value components corresponding to some (or all) of the value constructors of the datatype. The types of the value components must match exactly the types of the corresponding value constructors; no specialization is allowed in this case. For example, the signature
signature RBT_DT =
sig
  datatype 'a rbt =
    Empty
  | Red of 'a rbt * 'a * 'a rbt
  | Black of 'a rbt * 'a * 'a rbt
end

matches the signature

signature RBT =
sig
  type 'a rbt
  val Empty : 'a rbt
  val Red : 'a rbt * 'a * 'a rbt -> 'a rbt
end
The signature RBT specifies the type 'a rbt as abstract, and includes two value specifications that are met by value constructors in the signature RBT_DT.
One way to understand this is to mentally rewrite the signature RBT_DT in the (fanciful) form

signature RBT_DTS =
sig
  type 'a rbt
  con Empty : 'a rbt
  con Red : 'a rbt * 'a * 'a rbt -> 'a rbt
  con Black : 'a rbt * 'a * 'a rbt -> 'a rbt
end

(Unfortunately, the "signature" RBT_DTS is not a legal ML signature!) The rule is simply that a val specification may be matched by a con specification.
19.3 Satisfaction
Returning to the motivating question of this chapter, a candidate structure implements a target signature iff the principal signature of the candidate structure matches the target signature. By the reflexivity of the matching relation it is immediate that a structure satisfies its principal signature. Therefore any signature implemented by a structure is weaker than the principal signature of that structure. That is, the principal signature is the strongest signature implemented by a structure.
Chapter 20
Signature Ascription
Signature ascription imposes the requirement that a structure implement a signature and, in so doing, weakens the signature of that structure for all subsequent uses of it. There are two forms of ascription in ML. Both require that a structure implement a signature; they differ in the extent to which the assigned signature of the structure is weakened by the ascription.

1. Transparent, or descriptive ascription. The structure is assigned the target signature augmented by propagating to the target the definitions of those types in the candidate.

2. Opaque, or restrictive ascription. The structure is assigned the target signature as is, without augmentation.

Both forms of ascription hide components not present in the target signature.

Using modules effectively requires careful control over the propagation of type information. What is held back is at least as important as what is revealed. Opaque ascription is the means by which type information is curtailed; transparent ascription is the means by which it is propagated.
20.1 Ascribed Structure Bindings
The most common form of signature ascription is in a structure binding. There are two forms, the transparent

structure strid : sigexp = strexp

and the opaque

structure strid :> sigexp = strexp

The only difference is that transparent ascription is written using a single colon, ":", whereas opaque ascription is written as ":>".
Ascribed structure bindings are type checked as follows. First, the compiler checks that strexp implements sigexp according to the rules given in chapter 19. Specifically, the principal signature sigexp_0 of strexp is determined and matched against sigexp. This determines an augmentation sigexp' of sigexp by propagating type equations from the principal signature, sigexp_0. Second, the structure identifier is assigned a signature based on the form of ascription. For an opaque ascription, it is assigned the signature sigexp; for a transparent ascription, it is assigned the signature sigexp'.
Incidentally, this makes clear that transparent ascription is really a special case of opaque ascription! Since the augmented signature, sigexp', is itself a signature, we could have written it ourselves and opaquely ascribed it to strexp, with the same net effect as a transparent ascription. Thus transparent ascription is really a form of "signature inference" in which we are asking the compiler to fill in details that it would be too inconvenient to write ourselves. As we will see below, this is more than a minor convenience!
Ascribed structure bindings are evaluated by first evaluating strexp. Then a view of the resulting value is formed by dropping all components that are not present in the target signature, sigexp. (There are some technical complications to do with dropping type components. For example, if we attempt to include only the value constructors of a datatype, and not the datatype itself, the compiler will implicitly include the datatype to ensure that the types of the constructors are expressible in the signature of the view. Any such type implicitly included in the view is marked as "hidden". See The Definition of Standard ML for complete details.) The structure variable strid is bound to the view. The formation of the view ensures that the components of a structure may always be accessed in constant time, and that there are no "space leaks" because of components that are present in the structure, but not in the signature.

20.2 Opaque Ascription

The primary use of opaque ascription is to enforce data abstraction. A good example is provided by the implementation of queues as, say, pairs of lists.

structure Queue :> QUEUE =
struct
  type 'a queue = 'a list * 'a list
  val empty = (nil, nil)
  fun insert (x, (bs, fs)) = (x::bs, fs)
  exception Empty
  fun remove (nil, nil) = raise Empty
    | remove (bs, f::fs) = (f, (bs, fs))
    | remove (bs, nil) = remove (nil, rev bs)
end
The use of opaque ascription ensures that the type 'a Queue.queue is abstract. No definition is provided for it in the signature QUEUE, and therefore it has no definition in terms of other types of the language; the type 'a Queue.queue is abstract.

The principal effect of rendering 'a Queue.queue abstract is that only the operations empty, insert, and remove may be performed on values of that type. We may not make use of the fact that a queue is "really" a pair of lists simply because it is implemented this way. Instead we have obscured this fact by opaquely ascribing a signature that does not define the type 'a queue. This ensures that all clients of the structure Queue are insulated from the details of how queues are implemented. This means that the implementation can be changed without fear of breaking any client code, a strong tool for enhancing maintainability of large programs. Were abstraction not enforced, the client might well (inadvertently or deliberately) make use of the representation of queues as pairs of lists, forcing all client code to be reconsidered whenever a change of representation occurs.
A closely related reason for hiding the representation of a type is that it allows us to isolate the enforcement of representation invariants to the implementation of the abstraction. We may think of the type 'a Queue.queue as the type of states of an abstract machine whose sole instructions are empty (the initial state), insert, and remove. Internally to the structure Queue we may wish to impose invariants on the internal state of the machine. The beauty of data abstraction is that it supports an elegant technique for enforcing such invariants.

The technique, called the assume-ensure, or rely-guarantee, technique reduces enforcement of representation invariants to these two requirements:

1. All initialization instructions must ensure that the invariant holds true of the machine state after execution.

2. All state transition instructions may assume that the invariant holds of the input states, and must ensure that it holds of the output state.
By induction on the number of “instructions” executed, the invariant must
hold for all states — i.e., it must really be invariant!
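A small illustration of the technique (a hypothetical example, not from the text): an abstract type whose representation invariant is that the underlying integer is never negative. The initializer ensures the invariant, each transition assumes it of its input and re-ensures it of its output, and opaque ascription guarantees that no client can construct a state that violates it.

```sml
signature TICKETS =
sig
  type t
  val none : t              (* initialization instruction *)
  val put : t -> t          (* state transition instructions *)
  val take : t -> t
  val count : t -> int
end

structure Tickets :> TICKETS =
struct
  type t = int
  (* ensure: the initial state satisfies the invariant t >= 0 *)
  val none = 0
  (* assume the input is >= 0; ensure the output is >= 0 *)
  fun put n = n + 1
  fun take n = if n = 0 then 0 else n - 1
  fun count n = n
end

val two  = Tickets.count (Tickets.put (Tickets.put Tickets.none))
val zero = Tickets.count (Tickets.take Tickets.none)  (* never below zero *)
```

Because t is abstract, the only t-valued expressions a client can write are built from none, put, and take, so by the induction sketched above every reachable state satisfies the invariant.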
Now suppose that we wish to implement an abstract type of priority queues for an arbitrary element type. The queue operations are no longer polymorphic in the element type because they actually "touch" the elements to determine their relative priorities. Here is a possible signature for priority queues that expresses this dependency:

signature PQ =
sig
  type elt
  val lt : elt * elt -> bool
  type queue
  exception Empty
  val empty : queue
  val insert : elt * queue -> queue
  val remove : queue -> elt * queue
end
Now let us consider an implementation of priority queues in which the elements are taken to be strings. Since priority queues form an abstract type, we would expect to use opaque ascription to ensure that its representation is hidden. This suggests an implementation along these lines:

structure PrioQueue :> PQ =
struct
  type elt = string
  val lt : string * string -> bool = (op <)
  type queue = ...
  ...
end
But not only is the type PrioQueue.queue abstract, so is PrioQueue.elt! This leaves us no means of creating a value of type PrioQueue.elt, and hence we can never call PrioQueue.insert. The problem is that the interface is "too abstract" — it should only obscure the identity of the type queue, and not that of the type elt.
The solution is to augment the signature PQ with a definition for the type elt, then opaquely ascribe this to PrioQueue. (In chapter 21 we'll introduce better means for structuring this module, but the central points discussed here will not be affected.)

signature STRING_PQ = PQ where type elt = string

structure PrioQueue :> STRING_PQ = ...

Now the type PrioQueue.elt is equivalent to string, and we may call PrioQueue.insert with a string, as expected.
The moral is that there is always an element of judgement involved in deciding which types to hold abstract, and which to make transparent. In the case of priority queues, the determining factor is that we specified only the operations on elt that were required for the implementation of priority queues, and no others. This means that elt could not usefully be held abstract, but must instead be specified in the signature. On the other hand the operations on queues are intended to be complete, and so we hold the type abstract.
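To make the example concrete, here is one possible completion of the elided implementation, with the queue represented as a list kept sorted by priority. The representation and the completed clauses are a sketch of our own choosing; the text leaves them open. (The signatures are repeated so the sketch is self-contained.)

```sml
signature PQ =
sig
  type elt
  val lt : elt * elt -> bool
  type queue
  exception Empty
  val empty : queue
  val insert : elt * queue -> queue
  val remove : queue -> elt * queue
end

signature STRING_PQ = PQ where type elt = string

(* One possible representation: a list sorted by increasing priority. *)
structure PrioQueue :> STRING_PQ =
struct
  type elt = string
  val lt : string * string -> bool = (op <)
  type queue = elt list
  exception Empty
  val empty = nil
  fun insert (x, nil) = [x]
    | insert (x, y :: ys) =
        if lt (x, y) then x :: y :: ys else y :: insert (x, ys)
  fun remove nil = raise Empty
    | remove (x :: xs) = (x, xs)
end

val q = PrioQueue.insert ("b", PrioQueue.insert ("a", PrioQueue.empty))
val (least, _) = PrioQueue.remove q   (* least = "a" *)
```

Note that the internal invariant — the list is sorted — is maintained by exactly the assume-ensure discipline of the previous section, and opaque ascription prevents clients from disturbing it.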
20.3 Transparent Ascription
Transparent ascription cuts down on the need for explicit specification of type definitions in signatures. As we remarked earlier, we can always replace uses of transparent ascription by a use of opaque ascription with a hand-crafted augmented signature. This can become burdensome. On the other hand, excessive use of transparent ascription impedes modular programming by exposing type information that would better be left abstract.

The prototypical use of transparent ascription is to form a view of a structure that eliminates the components that are not necessary in a given context without obscuring the identities of its type components. Consider the signature ORDERED defined as follows:
signature ORDERED =
sig
  type t
  val lt : t * t -> bool
end
This signature specifies a type t equipped with a comparison operation lt. It should be clear that it would not make sense to opaquely ascribe this signature to a structure. Doing so would preclude ever calling the lt operation, for there would be no means of creating values of type t. Such a signature is only useful once it has been augmented with a definition for the type t. This is precisely what transparent ascription does for you.

For example, consider the following structure binding:
structure String : ORDERED =
struct
  type t = string
  val clt = Char.<
  fun lt (s, t) = ... clt ...
end
This structure implements string comparison in terms of character comparison (say, to implement the lexicographic ordering of strings). Ascription of the signature ORDERED ensures two things:

1. The auxiliary function clt is pruned out of the structure. It was intended for internal use, and was not meant to be externally visible.

2. The type String.t is equivalent to string, even though this fact is not present in the signature ORDERED. Transparent ascription computes an augmentation of ORDERED with this definition exposed. The "true" signature of String is the signature

ORDERED where type t = string

which makes clear the underlying definition of t.
A related use of transparent ascription is to document an interpretation of a type without rendering it abstract. For example, we may wish to consider the integers ordered in two different ways, one by the standard arithmetic comparison, the other by divisibility. We might make the following declarations to express this:
structure IntLt : ORDERED =
struct
  type t = int
  val lt = (op <)
end

structure IntDiv : ORDERED =
struct
  type t = int
  fun lt (m, n) = (n mod m = 0)
end
The ascription specifies the interpretation of int as partially ordered, in two senses, but does not hide the type of elements. In particular, IntLt.t and IntDiv.t are both equivalent to int.
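Since both ascriptions are transparent, IntLt.t and IntDiv.t are statically equal to int, so ordinary integer literals may be passed to either comparison directly. (The declarations are repeated here so the sketch is self-contained.)

```sml
signature ORDERED =
sig
  type t
  val lt : t * t -> bool
end

structure IntLt : ORDERED =
struct
  type t = int
  val lt = (op <)
end

structure IntDiv : ORDERED =
struct
  type t = int
  fun lt (m, n) = (n mod m = 0)
end

(* Both t's are equivalent to int, so plain integers flow in freely. *)
val b1 = IntLt.lt (3, 5)     (* true: 3 < 5 *)
val b2 = IntDiv.lt (3, 6)    (* true: 3 divides 6 evenly *)
val b3 = IntDiv.lt (4, 6)    (* false: 4 does not divide 6 *)
```

Had the ascriptions been opaque, both comparisons would be uncallable, for the reasons given above.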
20.4 Transparency, Opacity, and Dependency

An important observation is that transparent ascription is, in a sense, a form of opaque ascription. The target signature is augmented with the type definitions of the candidate, and then the augmented signature is opaquely ascribed to the structure. In principle one can always write out the augmented signature by hand (using where type clauses), then opaquely ascribe it to the structure. While this is true in principle, in practice it is simply too inconvenient to require the programmer to be fully explicit about the propagation of type information in signatures. (Worse, for very technical reasons, it is not always possible to write the augmented signature by hand, because to do so requires access to "hidden" types.)

Transparent ascription is a convenience akin to type inference. The compiler automatically fleshes out omitted information in a manner that is convenient for the programmer. However, this convenience comes at a price: whenever you use transparent ascription, the compiler must have access to the source code of the structure to determine the "true" signature of that structure. This means that any code that makes reference to the ascribed structure depends on the implementation of that structure, and not just its (apparent) interface. With transparent ascription, what you see (the ascribed signature) is not what you get! Rather, what you get is an augmentation of what you see obtained by inspecting the implementation of that structure.

On the other hand, whereas transparent ascription therefore introduces implementation dependencies, opaque ascription eliminates them. Once a signature has been opaquely ascribed to a structure, all future uses of that structure may rely only on the ascribed signature. Since the signature is given independently of the implementation, client code is insulated from changes to the implementation that do not also affect its interface. This greatly facilitates team development and code evolution.

Implementation dependencies are fundamentally anti-modular. (That's why inheritance in class-based languages is a bad idea. If a class D inherits from a class C, you have an implementation dependency of D on C. This is called the fragile base class problem, because changes to C force reconsideration of D.) The goal of modular programming is to isolate parts of programs from one another so that they can be independently developed and modified with minimal interference. The fewer the dependencies between modules, the fewer the problems with integrating modules to form a complete system.
Chapter 21
Module Hierarchies
So far we have confined ourselves to considering "flat" modules consisting of a linear sequence of declarations of types, exceptions, and values. As programs grow in size and complexity, it becomes important to introduce further structuring mechanisms to support their growth. The ML module language also supports module hierarchies, tree-structured configurations of modules that reflect the architecture of a large system.
21.1 Substructures
A substructure is a "structure within a structure". Structure bindings (either opaque or transparent) are admitted as components of other structures. Structure specifications of the form

structure strid : sigexp

may appear in signatures. There is no distinction between transparent and opaque specifications in a signature, because there is no structure to ascribe!
The type checking and evaluation rules for structures are extended to substructures recursively. The principal signature of a substructure binding is determined according to the rules given in chapter 19. A substructure binding in one signature matches the corresponding one in another iff their signatures match according to the rules in chapter 19. Evaluation of a substructure binding consists of evaluating the structure expression, then binding the resulting structure value to that identifier.
To see how substructures arise in practice, consider the following programming scenario. The first version of a system makes use of a polymorphic dictionary data structure whose search keys are strings. The signature for such a data structure might be as follows:
signature MY_STRING_DICT =
sig
  type 'a dict
  val empty : 'a dict
  val insert : 'a dict * string * 'a -> 'a dict
  val lookup : 'a dict * string -> 'a option
end
The return type of lookup is 'a option, since there may be no entry in the dictionary with the specified key.
The implementation of this abstraction looks approximately like this:
structure MyStringDict :> MY_STRING_DICT =
struct
  datatype 'a dict =
    Empty
  | Node of 'a dict * string * 'a * 'a dict
  val empty = Empty
  fun insert (d, k, v) = ...
  fun lookup (d, k) = ...
end
The omitted implementations of insert and lookup make use of the built-in lexicographic ordering of strings.

The second version of the system requires another dictionary whose keys are integers, leading to another signature and implementation for dictionaries.
signature MY_INT_DICT =
sig
  type 'a dict
  val empty : 'a dict
  val insert : 'a dict * int * 'a -> 'a dict
  val lookup : 'a dict * int -> 'a option
end

structure MyIntDict :> MY_INT_DICT =
struct
  datatype 'a dict =
    Empty
  | Node of 'a dict * int * 'a * 'a dict
  val empty = Empty
  fun insert (d, k, v) = ...
  fun lookup (d, k) = ...
end
The elided implementations of insert and lookup make use of the primitive comparison operations on integers.

At this point we may observe an obvious pattern, that of a dictionary with keys of a specific type. To avoid further repetition we decide to abstract out the key type from the signature so that it can be filled in later.
signature MY_GEN_DICT =
sig
  type key
  type 'a dict
  val empty : 'a dict
  val insert : 'a dict * key * 'a -> 'a dict
  val lookup : 'a dict * key -> 'a option
end
Notice that the dictionary abstraction carries with it the type of its keys. Specific instances of this generic dictionary signature are obtained using where type.
signature MY_STRING_DICT =
  MY_GEN_DICT where type key = string

signature MY_INT_DICT =
  MY_GEN_DICT where type key = int
A string dictionary might then be implemented as follows:
structure MyStringDict :> MY_STRING_DICT =
struct
  type key = string
  datatype 'a dict =
    Empty
  | Node of 'a dict * key * 'a * 'a dict
  val empty = Empty
  fun insert (Empty, k, v) = Node (Empty, k, v, Empty)
  fun lookup (Empty, _) = NONE
    | lookup (Node (dl, l, v, dr), k) =
        if k < l then (* string comparison *)
          lookup (dl, k)
        else if k > l then (* string comparison *)
          lookup (dr, k)
        else
          SOME v
end
By a similar process we may build an implementation MyIntDict of the signature MY_INT_DICT, with integer keys ordered by the standard integer comparison operations.

Now suppose that we require a third dictionary, with integers as keys, but ordered according to the divisibility ordering, for which m < n iff m divides n evenly. This implementation, say MyIntDivDict, makes use of modular arithmetic for its comparison operations, but has the same signature MY_INT_DICT as MyIntDict.
structure MyIntDivDict :> MY_INT_DICT =
struct
  type key = int
  datatype 'a dict =
    Empty
  | Node of 'a dict * key * 'a * 'a dict
  fun divides (k, l) = (l mod k = 0)
  val empty = Empty
  fun insert (Empty, k, v) = Node (Empty, k, v, Empty)
  fun lookup (Empty, _) = NONE
    | lookup (Node (dl, l, v, dr), k) =
        if divides (k, l) then (* divisibility test *)
          lookup (dl, k)
        else if divides (l, k) then (* divisibility test *)
          lookup (dr, k)
        else
          SOME v
end
Notice that we required an auxiliary function, divides, to implement the comparison in the required sense.
With this in mind, let us reconsider our initial attempt to consolidate the signatures of the various versions of dictionaries in play. In one sense there is nothing to do — the signature MY_GEN_DICT suffices. However, as we've just seen, the instances of this signature, which are ascribed to particular implementations, do not determine the interpretation. What we'd like to do is to package the type with its interpretation so that the dictionary module is self-contained. Not only does the dictionary module carry with it the type of its keys, but it also carries the interpretation used on that type.
This is achieved by introducing a substructure binding in the dictionary structure. To begin with we first isolate the notion of an ordered type.
signature ORDERED =
sig
  type t
  val lt : t * t -> bool
  val eq : t * t -> bool
end
This signature describes modules that contain a type t equipped with an equality and a comparison operation on it.

An implementation of this signature specifies the type and the interpretation, as in the following examples.
(* Lexicographically ordered strings. *)
structure LexString : ORDERED =
struct
  type t = string
  val eq = (op =)
  val lt = (op <)
end

(* Integers ordered conventionally. *)
structure LessInt : ORDERED =
struct
  type t = int
  val eq = (op =)
  val lt = (op <)
end

(* Integers ordered by divisibility. *)
structure DivInt : ORDERED =
struct
  type t = int
  fun lt (m, n) = (n mod m = 0)
  fun eq (m, n) = lt (m, n) andalso lt (n, m)
end
Notice that the use of transparent ascription is very natural here, since ORDERED is not intended as a self-contained abstraction.

The signature of dictionaries is restructured as follows:
signature DICT =
sig
  structure Key : ORDERED
  type 'a dict
  val empty : 'a dict
  val insert : 'a dict * Key.t * 'a -> 'a dict
  val lookup : 'a dict * Key.t -> 'a option
end
The signature DICT includes as a substructure the key type together with its interpretation as an ordered type.

To enforce abstraction we introduce specialized versions of this signature that specify the key type using a where type clause.
signature STRING_DICT =
  DICT where type Key.t = string

signature INT_DICT =
  DICT where type Key.t = int
These are, respectively, signatures for the abstract type of dictionaries whose keys are strings and integers.

How are these signatures to be implemented? Corresponding to the layering of the signatures, we have a layering of the implementation.
structure StringDict :> STRING_DICT =
struct
  structure Key : ORDERED = LexString
  datatype 'a dict =
    Empty
  | Node of 'a dict * Key.t * 'a * 'a dict
  val empty = Empty
  fun insert (Empty, k, v) = Node (Empty, k, v, Empty)
  fun lookup (Empty, _) = NONE
    | lookup (Node (dl, l, v, dr), k) =
        if Key.lt (k, l) then
          lookup (dl, k)
        else if Key.lt (l, k) then
          lookup (dr, k)
        else
          SOME v
end
Observe that the implementations of insert and lookup make use of the comparison operations Key.lt and Key.eq.

Similarly, we may implement IntDict, with the standard ordering, as follows:
structure LessIntDict :> INT_DICT =
struct
  structure Key : ORDERED = LessInt
  datatype 'a dict =
    Empty
  | Node of 'a dict * Key.t * 'a * 'a dict
  val empty = Empty
  fun insert (Empty, k, v) = Node (Empty, k, v, Empty)
  fun lookup (Empty, _) = NONE
    | lookup (Node (dl, l, v, dr), k) =
        if Key.lt (k, l) then
          lookup (dl, k)
        else if Key.lt (l, k) then
          lookup (dr, k)
        else
          SOME v
end
Similarly, dictionaries with integer keys ordered by divisibility may be implemented as follows:
structure IntDivDict :> INT_DICT =
struct
  structure Key : ORDERED = DivInt
  datatype 'a dict =
    Empty
  | Node of 'a dict * Key.t * 'a * 'a dict
  val empty = Empty
  fun insert (Empty, k, v) = Node (Empty, k, v, Empty)
  fun lookup (Empty, _) = NONE
    | lookup (Node (dl, l, v, dr), k) =
        if Key.lt (k, l) then
          lookup (dl, k)
        else if Key.lt (l, k) then
          lookup (dr, k)
        else
          SOME v
end
Taking stock of the development, what we have done is to structure the signature of dictionaries to allow the type of keys, together with its interpretation, to vary from one implementation to another. The Key substructure may be viewed as a "parameter" of the signature DICT that is "instantiated" by specialization to specific types of interest. In this sense substructures subsume the notion of a parameterized signature found in some languages. There are several advantages to this:

1. A signature with one or more substructures is still a complete signature. Parameterized signatures, in contrast, are incomplete signatures that must be completed to be used.

2. Any substructure of a signature may play the role of a "parameter". There is no need to designate in advance which are "arguments" and which are "results".

In chapter 23 we will introduce the mechanisms needed to build a generic implementation of dictionaries that may be instantiated by the key type and its ordering.
Chapter 22
Sharing Specifications
In chapter 21 we illustrated the use of substructures to express the dependence of one abstraction on another. In this chapter we will consider the problem of symmetric combination of modules to form larger modules.
22.1 Combining Abstractions
The discussion is based on a representation of geometry in ML, given by the following (drastically simplified) signature.
signature GEOMETRY =
sig
  structure Point : POINT
  structure Sphere : SPHERE
end
For the purposes of this example, we have reduced geometry to two concepts, that of a point in space and that of a sphere.

Points and vectors are fundamental to representing geometry. They are described by the following (abbreviated) signatures:
signature VECTOR =
sig
  type vector
  val zero : vector
  val scale : real * vector -> vector
  val add : vector * vector -> vector
  val dot : vector * vector -> real
end

signature POINT =
sig
  structure Vector : VECTOR
  type point
  (* move a point along a vector *)
  val translate : point * Vector.vector -> point
  (* the vector from a to b *)
  val ray : point * point -> Vector.vector
end
The vector operations support addition, scalar multiplication, and inner product, and include a unit element for addition. The point operations support translation of a point along a vector and the creation of a vector as the "difference" of two points (i.e., the vector from the first to the second).

Spheres are implemented by a module implementing the following (abbreviated) signature:
signature SPHERE =
sig
  structure Vector : VECTOR
  structure Point : POINT
  type sphere
  val sphere : Point.point * Vector.vector -> sphere
end
The operation sphere creates a sphere centered at a given point and with the radius vector given.

These signatures are intentionally designed so that the dimension of the space is not part of the specification. This allows us — using the mechanisms to be introduced in chapter 23 — to build packages that work in an arbitrary dimension without requiring run-time conformance checks. It is the structures, and not the signatures, that specify the dimension. Two- and three-dimensional geometry are defined by structure bindings like these:

structure Geom2D :> GEOMETRY = ...
structure Geom3D :> GEOMETRY = ...

As a consequence of the use of opaque ascription, the types Geom2D.Point.point and Geom3D.Point.point are distinct. This means that dimensional conformance is enforced by the type checker. For example, we cannot apply Geom3D.Sphere.sphere to a point in two-space and a vector in three-space.
This is a good thing: the more static checking we have, the better off we are. Closer inspection reveals that, unfortunately, we have too much of a good thing. Suppose that p and q are two-dimensional points of type Geom2D.Point.point. We might expect to be able to form a sphere centered at p with radius determined by the vector from p to q:

Geom2D.Sphere.sphere (p, Geom2D.Point.ray (p, q))

But this expression is ill-typed! The reason is that the types Geom2D.Point.Vector.vector and Geom2D.Sphere.Vector.vector are also distinct from one another, which is not at all what we intend.
What has gone wrong? The situation is quite subtle. In keeping with the
guidelines discussed in section 21.1, we have incorporated as substructures
the structures on which a given structure depends. For example, forming
a ray from one point to another yields a vector, so an implementation of
POINT depends on an implementation of VECTOR. Thus, POINT has a sub
structure implementing VECTOR, and, similarly, SPHERE has substructures
implementing VECTOR and POINT.
This leads to a proliferation of structures. Even in the very simpliﬁed
geometry signature given above, we have two “copies” of the point abstraction, and three “copies” of the vector abstraction! Since we used opaque ascription to define the two- and three-dimensional implementations of the signature GEOMETRY, all of these abstractions are kept distinct from one
another, even though they may be implemented identically.
In a sense this is the correct state of affairs. The various “copies” of,
say, the vector abstraction might well be distinct from one another. In the
elided implementation of two-dimensional geometry, we might have used
completely incompatible notions of vector in each of the three places where
they are required. Of course, this may not be what is intended, but (so far)
there is nothing in the signature to prevent it. Hence, we are compelled to keep
these types distinct.
What is missing is the expression of the intention that the various “copies”
of vectors and points within the geometry abstraction be identical, so that
we can mix and match the vectors constructed in various components of the package. To support this it is necessary to constrain the implementation to use the same notion of vector throughout. This is achieved using
a type sharing constraint. The revised signatures for the geometry package
look like this:
signature SPHERE =
  sig
    structure Vector : VECTOR
    structure Point : POINT
    sharing type Point.Vector.vector = Vector.vector
    type sphere
    val sphere : Point.point * Vector.vector -> sphere
  end
signature GEOMETRY =
  sig
    structure Point : POINT
    structure Sphere : SPHERE
    sharing type Point.point = Sphere.Point.point
        and Point.Vector.vector = Sphere.Vector.vector
  end
These equations specify that the two “copies” of the point abstraction and the three “copies” of the vector abstraction must coincide. In the presence of the above sharing specification, the ill-typed expression above becomes well-typed, since now the required type equation holds by explicit specification in the signature.
As a notational convenience we may use a structure sharing constraint
instead to express the same requirements:
signature SPHERE =
  sig
    structure Vector : VECTOR
    structure Point : POINT
    sharing Point.Vector = Vector
    type sphere
    val sphere : Point.point * Vector.vector -> sphere
  end
signature GEOMETRY =
  sig
    structure Point : POINT
    structure Sphere : SPHERE
    sharing Point = Sphere.Point
        and Point.Vector = Sphere.Vector
  end
Rather than specify the required sharing type-by-type, we can instead specify it structure-by-structure, with the meaning that corresponding types of shared structures are required to share. Since each structure in our example contains only one type, the effect of the structure sharing specification above is identical to the preceding type sharing specification.
Not only does the sharing specification ensure that the desired equations hold amongst the various components of an implementation of GEOMETRY, but it also constrains the implementation to ensure that these types are the same. It is easy to achieve this requirement by defining a single implementation of points and vectors that is reused in the higher-level abstractions.
structure Vector3D : VECTOR = ...
structure Point3D : POINT =
  struct
    structure Vector : VECTOR = Vector3D
    ...
  end
structure Sphere3D : SPHERE =
  struct
    structure Vector : VECTOR = Vector3D
    structure Point : POINT = Point3D
    ...
  end
structure Geom3D :> GEOMETRY =
  struct
    structure Point = Point3D
    structure Sphere = Sphere3D
  end
The required type sharing constraints are true by construction.
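With these bindings in place, the ill-typed expression from the beginning of this section becomes well-typed. A sketch, assuming two points p and q of type Geom3D.Point.point, and the Point.ray operation used earlier:

```sml
(* Geom3D.Point.ray (p, q) yields a Geom3D.Point.Vector.vector, which by
   the sharing specification is the same type as Geom3D.Sphere.Vector.vector,
   so the components may be mixed freely: *)
val s = Geom3D.Sphere.sphere (p, Geom3D.Point.ray (p, q))
```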
Had we instead replaced the above declaration of Geom3D by the following one, the type checker would reject it on the grounds that the required sharing between Sphere.Point and Point does not hold, because Sphere2D.Point is distinct from Point3D.
structure Geom3D :> GEOMETRY =
  struct
    structure Point = Point3D
    structure Sphere = Sphere2D
  end
It is natural to wonder whether it might be possible to restructure the
GEOMETRY signature so that the duplication of the point and vector components is avoided, thereby obviating the need for sharing specifications. One
can restructure the code in this manner, but doing so would do violence to
the overall structure of the program. This is why sharing speciﬁcations are
so important.
Let’s try to reorganize the signature GEOMETRY so that duplication of
the point and vector structures is avoided. One step is to eliminate the
substructure Vector from SPHERE, replacing uses of Vector.vector by Point.Vector.vector.
signature SPHERE =
  sig
    structure Point : POINT
    type sphere
    val sphere :
      Point.point * Point.Vector.vector -> sphere
  end
After all, since the structure Point comes equipped with a notion of vector,
why not use it?
This cuts down the number of sharing speciﬁcations to one:
signature GEOMETRY =
  sig
    structure Point : POINT
    structure Sphere : SPHERE
    sharing Point = Sphere.Point
  end
If we could further eliminate the substructure Point from the signature
SPHERE we would have only one copy of Point and no need for a sharing specification.
But what would the signature SPHERE look like in this case?
signature SPHERE =
  sig
    type sphere
    val sphere :
      Point.point * Point.Vector.vector -> sphere
  end
The problem now is that the signature SPHERE is no longer self-contained. It makes reference to a structure Point, but which Point are we talking
about? Any commitment would tie the signature to a speciﬁc structure, and
hence a speciﬁc dimension, contrary to our intentions. Rather, the notion
of point must be a generic concept within SPHERE, and hence Point must
appear as a substructure. The substructure Point may be thought of as a
parameter of the signature SPHERE in the sense discussed earlier.
The only other move available to us is to eliminate the structure Point from the signature GEOMETRY. This is indeed possible, and would eliminate the need for any sharing specifications. But it only defers the problem, rather than solving it. A full-scale geometry package would contain more
abstractions that involve points, so that there will still be copies in the other
abstractions. Sharing speciﬁcations would then be required to ensure that
these copies are, in fact, identical.
Here is an example. Let us introduce another geometric abstraction, the semi-space.
signature SEMI_SPACE =
  sig
    structure Point : POINT
    type semispace
    val side : Point.point * semispace -> bool option
  end
The function side determines (if possible) whether a given point lies in one half of the semi-space or the other.
The expanded GEOMETRY signature would look like this (with the elimination of the Point structure in place).
signature EXTD_GEOMETRY =
  sig
    structure Sphere : SPHERE
    structure SemiSpace : SEMI_SPACE
    sharing Sphere.Point = SemiSpace.Point
  end
By an argument similar to the one we gave for the signature SPHERE, we cannot eliminate the substructure Point from the signature SEMI_SPACE, and hence we wind up with two copies. Therefore the sharing specification is required.
What is at issue here is a fundamental tension in the very notion of modular programming. On the one hand we wish to separate modules from one
another so that they may be treated independently. This requires that the signatures of these modules be self-contained. Unbound references to a structure — such as Point — tie that signature to a specific implementation, in violation of our desire to treat modules separately from one another. On the other hand we wish to combine modules together to form programs. Doing so requires that the composition be coherent, which is achieved by the use of sharing specifications. What sharing specifications do for you is to provide an after-the-fact means of tying together several different abstractions to form a coherent whole. This approach to the problem of coherence is a unique — and uniquely effective — feature of the ML module system.
Chapter 23
Parameterization
To support code reuse it is useful to define generic, or parameterized, modules that leave unspecified some aspects of the implementation of a module. The unspecified parts may be instantiated to determine specific instances of the module. The common part is thereby implemented once and shared among all instances.
In ML such generic modules are called functors. A functor is a module-level function that takes a structure as argument and yields a structure as result. Instances are created by applying the functor to an argument specifying the interpretation of the parameters.
23.1 Functor Bindings and Applications
Functors are defined using a functor binding. There are two forms, the opaque and the transparent. A transparent functor has the form

functor funid(decs):sigexp = strexp

where the result signature, sigexp, is transparently ascribed; an opaque functor has the form

functor funid(decs):>sigexp = strexp

where the result signature is opaquely ascribed. A functor is a module-level function whose argument is a sequence of declarations, and whose result is a structure.
Type checking of a functor consists of checking that the functor body matches the ascribed result signature, given that the parameters of the functor have the specified signatures. For a transparent functor the result signature is the augmented signature determined by matching the principal
signature of the body (relative to the assumptions governing the parameters) against the given result signature. For opaque functors the ascribed result signature is used as-is, without further augmentation by type equations.
In chapter 21 we developed a modular implementation of dictionaries
in which the ordering of keys is made explicit as a substructure. The implementation of dictionaries is the same for each choice of order structure on the keys. This can be expressed by a single functor binding that defines the implementation of dictionaries parametrically in the choice of keys.
functor DictFun
  (structure K : ORDERED) :>
    DICT where type Key.t = K.t =
  struct
    structure Key : ORDERED = K
    datatype 'a dict =
        Empty
      | Node of 'a dict * Key.t * 'a * 'a dict
    val empty = Empty
    fun insert (Empty, k, v) =
          Node (Empty, k, v, Empty)
      | insert (Node (dl, l, v, dr), k, v') =
          if Key.lt (k, l) then
            Node (insert (dl, k, v'), l, v, dr)
          else if Key.lt (l, k) then
            Node (dl, l, v, insert (dr, k, v'))
          else
            Node (dl, k, v', dr)
    fun lookup (Empty, _) = NONE
      | lookup (Node (dl, l, v, dr), k) =
          if Key.lt (k, l) then
            lookup (dl, k)
          else if Key.lt (l, k) then
            lookup (dr, k)
          else
            SOME v
  end
The functor DictFun takes as argument a single structure specifying the ordered key type as a structure implementing signature ORDERED. The
opaquely ascribed result signature is
DICT where type Key.t = K.t.
This signature holds the type ’a dict abstract, but speciﬁes that the key
type is the type K.t passed as argument to the functor. The body of the
functor has been written so that the comparison operations are obtained from the Key substructure. This ensures that the dictionary code is independent of the choice of key type and ordering.
Instances of functors are obtained by application. A functor application
has the form
funid(binds)
where binds is a sequence of bindings of the arguments of the functor.
The signature of a functor application is determined by the following procedure. We assume we are given the signatures of the functor parameters, and also the “true” result signature of the functor (the given signature for opaque functors, the augmented signature for transparent functors).

1. For each argument, match the argument signature against the corresponding parameter signature of the functor. This determines an augmentation of the parameter signature for each argument (as described in chapter 20).

2. For each reference to a type component of a functor parameter in the result signature, propagate the type definitions of the augmented parameter signature to the result signature.
The signature of the application determined by this procedure is then opaquely ascribed to the application. This means that if a type is left abstract in the result signature of a functor, that type is “new” in every instance of that functor. This behavior is called generativity of the functor. (The alternative, called applicativity, means that there is one abstract type shared by all instances of that functor.)
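Generativity can be observed directly: two applications of DictFun, even to the same argument, yield distinct abstract dict types. A sketch, assuming the LessInt structure from chapter 21:

```sml
structure D1 = DictFun (structure K = LessInt)
structure D2 = DictFun (structure K = LessInt)
(* D1.dict and D2.dict are distinct abstract types, so an expression
   such as D1.lookup (D2.empty, 0) is rejected by the type checker,
   even though both structures were built by identical applications. *)
```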
Returning to the example of the dictionary functor, the three versions of
dictionaries considered in chapter 21 may be obtained by applying DictFun
to appropriate arguments.
structure LtIntDict = DictFun (structure K = LessInt)
structure LexStringDict = DictFun (structure K = LexString)
structure DivIntDict = DictFun (structure K = DivInt)
In each case the functor DictFun is instantiated by specifying a binding for its argument structure K. The argument structures are, as described in chapter 21, implementations of the signature ORDERED. They specify the type of keys and the sense in which they are ordered.
The signatures for the structures LtIntDict, LexStringDict, and
DivIntDict are determined by instantiating the result signature of the
functor DictFun according to the above procedure. Consider the application of DictFun to LessInt. The augmented signature resulting from matching the signature of LessInt against the parameter signature ORDERED is the signature
ORDERED where type t = int
Assigning this to the parameter K, we deduce that the type K.t is equivalent to int, and hence the result signature of DictFun is
DICT where type Key.t = int
so that LtIntDict.Key.t is equivalent to int, as desired. By a similar process we deduce that the signature of LexStringDict is
DICT where type Key.t = string
and that the signature of DivIntDict is
DICT where type Key.t = int.
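The propagated type equations make these instances directly usable with their native key types. A sketch of typical usage, assuming the bindings above:

```sml
(* Key.t = int in LtIntDict, so ordinary integer literals are valid keys. *)
val d = LtIntDict.insert (LtIntDict.empty, 7, "seven")
val s = LtIntDict.lookup (d, 7)   (* SOME "seven" *)
val n = LtIntDict.lookup (d, 8)   (* NONE *)
```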
23.2 Functors and Sharing Speciﬁcations
In chapter 22 we developed a signature of geometric primitives that contained sharing specifications to ensure that the constituent abstractions may be combined properly. The signature GEOMETRY is defined as follows:
signature GEOMETRY =
  sig
    structure Point : POINT
    structure Sphere : SPHERE
    sharing Point = Sphere.Point
        and Point.Vector = Sphere.Vector
        and Sphere.Vector = Sphere.Point.Vector
  end
The sharing clauses ensure that the Point and Sphere components are
compatible with each other.
Since we expect to define vectors, points, and spheres of various dimensions, it makes sense to implement these as functors, according to the following scheme:
functor PointFun
  (structure V : VECTOR) : POINT = ...
functor SphereFun
  (structure V : VECTOR
   structure P : POINT) : SPHERE =
  struct
    structure Vector = V
    structure Point = P
    ...
  end
functor GeomFun
  (structure P : POINT
   structure S : SPHERE) : GEOMETRY =
  struct
    structure Point = P
    structure Sphere = S
  end
A two-dimensional geometry package may then be defined as follows:
structure Vector2D : VECTOR = ...
structure Point2D : POINT =
  PointFun (structure V = Vector2D)
structure Sphere2D : SPHERE =
  SphereFun (structure V = Vector2D and P = Point2D)
structure Geom2D : GEOMETRY =
  GeomFun (structure P = Point2D and S = Sphere2D)
A three-dimensional version is defined similarly.
There is only one problem: the functors SphereFun and GeomFun are not well-typed! The reason is that in both cases their result signatures require type equations that are not true of their parameters! For example, the signature SPHERE requires that Point.Vector be the same as Vector, which is not satisfied by the body of SphereFun. For these to be true, the structures P.Vector and V must be equivalent. This is not true in general, because the functors might be applied to arguments for which this is false. Similar problems plague the functor GeomFun. The solution is to include sharing constraints in the parameter list of the functors, as follows:
functor SphereFun
  (structure V : VECTOR
   structure P : POINT
   sharing P.Vector = V) : SPHERE =
  struct
    structure Vector = V
    structure Point = P
    ...
  end
functor GeomFun
  (structure P : POINT
   structure S : SPHERE
   sharing P.Vector = S.Vector and P = S.Point) : GEOMETRY =
  struct
    structure Point = P
    structure Sphere = S
  end
These equations preclude instantiations for which the required equations
do not hold, and are sufﬁcient to ensure that the requirements of the result
signatures of the functors are met.
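The effect of the parameter sharing constraint is that dimension mismatches are caught at the point of functor application. A sketch, assuming the structures Vector2D and Point2D defined above and a three-dimensional Vector3D as in chapter 22 (the names GoodSphere and BadSphere are hypothetical):

```sml
(* Accepted: Point2D.Vector is Vector2D, so the constraint holds. *)
structure GoodSphere : SPHERE =
  SphereFun (structure V = Vector2D and P = Point2D)

(* Rejected: Point2D.Vector (= Vector2D) does not share with Vector3D.
structure BadSphere : SPHERE =
  SphereFun (structure V = Vector3D and P = Point2D)
*)
```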
23.3 Avoiding Sharing Speciﬁcations
As with sharing specifications in signatures, it is natural to wonder whether they can be avoided in functor parameters. Once again, the answer is “yes”, but doing so does violence to the structure of your program. The chief virtue of sharing specifications is that they express directly and concisely the required relationships without requiring that these relationships be anticipated when defining the signatures of the parameters. This greatly facilitates reuse of off-the-shelf code, for which it is impossible to assume any sharing relationships that one may wish to impose in an application.
To see what happens, let’s consider the best-case scenario from chapter 22 in which we have minimized sharing specifications to one tying together the sphere and semi-space components. That is, we’re to implement the following signature:
signature EXTD_GEOMETRY =
  sig
    structure Sphere : SPHERE
    structure SemiSpace : SEMI_SPACE
    sharing Sphere.Point = SemiSpace.Point
  end
The implementation is a functor of the form

functor ExtdGeomFun
  (structure Sp : SPHERE
   structure Ss : SEMI_SPACE
   sharing Sp.Point = Ss.Point) =
  struct
    structure Sphere = Sp
    structure SemiSpace = Ss
  end
To eliminate the sharing equation in the functor parameter, we must arrange that the sharing equation in the signature EXTD_GEOMETRY holds. Simply dropping the sharing specification will not do, because then there is no reason to believe that it will hold as required in the signature. A natural move is to “factor out” the implementation of POINT, and use it to ensure that the required equation is true of the functor body. There are two methods for doing this, each with disadvantages compared to the use of sharing specifications.
One is to make the desired equation true by construction. Rather than take implementations of SPHERE and SEMI_SPACE as arguments, the functor ExtdGeomFun takes only an implementation of POINT, then creates in the functor body appropriate implementations of spheres and semi-spaces.
functor SphereFun
  (structure P : POINT) : SPHERE =
  struct
    structure Vector = P.Vector
    structure Point = P
    ...
  end

functor SemiSpaceFun
  (structure P : POINT) : SEMI_SPACE =
  struct
    ...
  end

functor ExtdGeomFun1
  (structure P : POINT) : EXTD_GEOMETRY =
  struct
    structure Sphere =
      SphereFun (structure P = P)
    structure SemiSpace =
      SemiSpaceFun (structure P = P)
  end
The problems with this solution are these:

• The body of ExtdGeomFun1 makes use of the functors SphereFun and SemiSpaceFun. In effect we are limiting the geometry functor to arguments that are built from these specific functors, and no other. This is a significant loss of generality that is otherwise present in the functor ExtdGeomFun, which may be applied to any implementations of SPHERE and SEMI_SPACE.

• The functor ExtdGeomFun1 must have as parameter the common element(s) of the components of its body, which is then used to build up the appropriate substructures in a manner consistent with the required sharing. This approach does not scale well when many abstractions are layered atop one another. We must reconstruct the entire hierarchy, starting with the components that are conceptually “furthest away”, as arguments.

• There is no inherent reason why ExtdGeomFun1 must take an implementation of POINT as argument. It does so only so that it can reconstruct the hierarchy so as to satisfy the sharing requirements of the result signature.
Another approach is to factor out the common component, and use this to constrain the arguments to the functor, ensuring that the possible arguments are limited to situations for which the required sharing holds.
functor ExtdGeomFun2
  (structure P : POINT
   structure Sp : SPHERE where Point = P
   structure Ss : SEMI_SPACE where Point = P) =
  struct
    structure Sphere = Sp
    structure SemiSpace = Ss
  end
Now the required sharing requirements are met, but it is also clear that this approach has no particular advantages over just using a sharing specification. It has the disadvantage of requiring a third argument, whose only role is to make it possible to express the required sharing. An application of this functor must provide not only implementations of SPHERE and SEMI_SPACE, but also an implementation of POINT that is used to build these!
A slightly more sophisticated version of this solution is as follows:
functor ExtdGeomFun3
  (structure Sp : SPHERE
   structure Ss : SEMI_SPACE where Point = Sp.Point) =
  struct
    structure Sphere = Sp
    structure SemiSpace = Ss
  end
The “extra” parameter to the functor has been eliminated by choosing one of the components as a “representative” and insisting that the others be compatible with it by using a where clause. (Officially, we must write where type Point.point = Sp.Point.point, but many compilers accept the syntax above.)
This solution has all of the advantages of the direct use of sharing specifications, and no further disadvantages. However, we are forced to violate arbitrarily the inherent symmetry of the situation. We could just as well have written
functor ExtdGeomFun4
  (structure Ss : SEMI_SPACE
   structure Sp : SPHERE where Point = Ss.Point) =
  struct
    structure Sphere = Sp
    structure SemiSpace = Ss
  end
without changing the meaning.
Here is the point: sharing specifications allow a symmetric situation to be treated in a symmetric manner. The compiler breaks the symmetry by choosing representatives arbitrarily in the manner illustrated above. Sharing specifications offload the burden of making such tedious (because arbitrary) decisions to the compiler, rather than imposing it on the programmer.
Part IV
Programming Techniques
In this part of the book we will explore the use of Standard ML to build
elegant, reliable, and efﬁcient programs. The discussion takes the form of
a series of worked examples illustrating various techniques for building
programs.
Chapter 24
Speciﬁcations and Correctness
The most important tools for getting programs right are specification and verification. In this chapter we review the main ideas in preparation for their subsequent use in the rest of the book.
24.1 Speciﬁcations
A specification is a description of the behavior of a piece of code. Specifications take many forms:
• Typing. A type specification describes the “form” of the value of an expression, without saying anything about the value itself.

• Effect Behavior. An effect specification resembles a type specification, but instead of describing the value of an expression, it describes the effects it may engender when evaluated.

• Input-Output Behavior. An input-output specification is a mathematical formula, usually an implication, that describes the output of a function for all inputs satisfying some assumptions.

• Time and Space Complexity. A complexity specification states the time or space required to evaluate an expression. The specification is most often stated asymptotically, in terms of the number of execution steps or the size of a data structure.

• Equational Specifications. An equational specification states that one code fragment is equivalent to another. This means that wherever one is used we may replace it with the other without changing the observable behavior of any program in which they occur.
This list by no means exhausts the possibilities. What speciﬁcations have
in common, however, is a descriptive, or declarative, ﬂavor, rather than a
prescriptive, or operational, one. A speciﬁcation states what a piece of code
does, not how it does it.
Good programmers use speciﬁcations to state precisely and concisely
their intentions when writing a piece of code. The code is written to solve
a problem; the speciﬁcation records for future reference what problem the
code was intended to solve. The very act of formulating a precise speci
ﬁcation of a piece of code is often the key to ﬁnding a good solution to a
tricky problem. The guiding principle is this: if you are unable to write a clear
speciﬁcation, you do not understand the problem well enough to solve it correctly.
The greatest difﬁculty in using speciﬁcations is in knowing what to
say. A common misunderstanding is that a given piece of code has one
speciﬁcation stating all there is to know about it. Rather you should see
a speciﬁcation as describing one of perhaps many properties of interest.
In ML every program comes with at least one speciﬁcation, its type. But
we may also consider other speciﬁcations, according to interest and need.
Sometimes only relatively weak properties are important — the function square always yields a nonnegative result. Other times stronger properties are needed — the function fib' applied to n yields the nth and n − 1st Fibonacci numbers. It’s a matter of taste and experience to know what to say and how best to say it.
Recall the following two ML functions from chapter 7:
fun fib 0 = 1
  | fib 1 = 1
  | fib n = fib (n-1) + fib (n-2)

fun fib' 0 = (1, 0)
  | fib' 1 = (1, 1)
  | fib' n =
    let
      val (a, b) = fib' (n-1)
    in
      (a+b, a)
    end
Here are some speciﬁcations pertaining to these functions:
• Type specifications:
  – val fib : int -> int
  – val fib' : int -> int * int

• Effect specifications:
  – The application fib n may raise the exception Overflow.
  – The application fib' n may raise the exception Overflow.

• Input-output specifications:
  – If n ≥ 0, then fib n evaluates to the nth Fibonacci number.
  – If n ≥ 0, then fib' n evaluates to the nth and n − 1st Fibonacci numbers, in that order.

• Time complexity specifications:
  – If n ≥ 0, then fib n terminates in O(2^n) steps.
  – If n ≥ 0, then fib' n terminates in O(n) steps.

• Equivalence specification:
  For all n ≥ 0, fib n is equivalent to #1(fib' n).
24.2 Correctness Proofs
A program satisfies, or meets, a specification iff its execution behavior is as described by the specification. Verification is the process of checking that a program satisfies a specification. This takes the form of proving a mathematical theorem (by hand or with machine assistance) stating that the program implements a specification.
There are many misunderstandings in the literature about speciﬁcation
and veriﬁcation. It is worthwhile to take time out to address some of them
here.
It is often said that a program is correct if it meets a specification. The verification of this fact is then called a correctness proof. While there is nothing wrong with this usage, it invites misinterpretation. As we remarked in section 24.1, there is in general no single preferred specification of a piece of code. Verification is always relative to a specification, and hence so is correctness. In particular, a program can be “correct” with respect to
one speciﬁcation, and “incorrect” with respect to another. Consequently, a
program is never inherently correct or incorrect; it is only so relative to a
particular speciﬁcation.
A related misunderstanding is the inference that a program “works” from the fact that it is “correct” in this sense. Specifications usually make assumptions about the context in which the program is used. If these assumptions are not satisfied, then all bets are off — nothing can be said about its behavior. For this reason it is entirely possible that a correct program will malfunction when used, not because the specification is not met, but rather because the specification is not relevant to the context in which the code is used.
Another common misconception about specifications is that they can always be implemented as runtime checks. (This misconception is encouraged by the C assert macro, which introduces an executable test that a certain computable condition holds. This is a fine thing, but from it many people draw the conclusion that assertions — specifications — are simply boolean tests. This is false.) There are at least two fallacies here:
1. The specification is stated at the level of the source code. It may not even be possible to test it at runtime. See chapter 32 for further discussion of this point.

2. The specification may not be mechanically checkable. For example, we might specify that a function f of type int -> int always yields a nonnegative result. But there is no way to implement this as a runtime check — this is an undecidable, or uncomputable, problem.
For this reason specifications are strictly more powerful than runtime checks. Correspondingly, it takes more work than a mere conditional branch to ensure that a program satisfies a specification.
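To make the distinction concrete, consider a hypothetical wrapper that checks nonnegativity of results at runtime. It can detect a violation that actually occurs during execution, but it cannot establish the specification itself, which quantifies over all inputs:

```sml
(* A runtime check validates only the results actually computed;
   it does not verify that f n >= 0 for every n. *)
fun checked (f : int -> int) (x : int) : int =
  let
    val y = f x
  in
    if y >= 0 then y else raise Fail "nonnegativity violated"
  end
```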
Finally, it is important to note that specification, implementation, and verification go hand-in-hand. It is unrealistic to propose to verify that an arbitrary piece of code satisfies an arbitrary specification. Fundamental computability and complexity results make clear that we can never succeed in such an endeavor. Fortunately, it is also completely artificial. In practice we specify, code, and verify simultaneously, with each activity informing the other. If the verification breaks down, reconsider the code or the specification (or both). If the code is difficult to write, look for insights from the specification and verification. If the specification is complex, rough out the code and proof to see what it should be. There is no “magic bullet”,
but these are some of the most important, and useful, tools for building
elegant, robust programs.
Veriﬁcations of speciﬁcations take many different forms.
• Type specifications are verified automatically by the ML compiler. If you put type annotations on a function stating the types of the parameters or results of that function, the correctness of these annotations is ensured by the compiler. For example, we may state the intended typing of fib as follows:

fun fib (n:int):int =
  case n
    of 0 => 1
     | 1 => 1
     | n => fib (n-1) + fib (n-2)

This ensures that the type of fib is int -> int.
• Effect specifications must be checked by hand. These generally state that a piece of code “may raise” one or more exceptions. If an exception is not mentioned in such a specification, we cannot conclude that the code does not raise it, only that we have no information. Notice that a handler serves to eliminate a “may raise” specification. For example, the following function may not raise the exception Overflow, even though fib might:

fun protected_fib n =
  (fib n) handle Overflow => 0

A handler makes it possible to specify that an exception may not be raised by a given expression (because the handler traps it).
• Input-output specifications require proof, typically using some form of induction. For example, in chapter 7 we proved that fib n yields the nth Fibonacci number by complete induction on n.
• Complexity specifications are often verified by solving a recurrence describing the execution time of a program. In the case of fib we may read off the following recurrence:

  T(0) = 1
  T(1) = 1
  T(n + 2) = T(n) + T(n + 1) + 1

Solving this recurrence yields the proof that T(n) = O(2^n).
• Equivalence specifications also require proof. Since equivalence of expressions must account for all possible uses of them, these proofs are, in general, very tricky. One method that often works is "induction plus hand-simulation". For example, it is not hard to prove by induction on n that fib n is equivalent to #1(fib' n). First, plug in n = 0 and n = 1, and calculate using the definitions. Then assume the result for n and n + 1, and consider n + 2, once again calculating based on the definitions, to obtain the result.
24.3 Enforcement and Compliance
Most specifications have the form of an implication: if certain conditions are met, then the program behaves a certain way. Type specifications, for example, are conditional on the types of the expression's free variables: if x has type int, then x+1 has type int. Input-output specifications are characteristically of this form as well. For example, if x is non-negative, then so is x+1.
Just as in ordinary mathematical reasoning, if the premises of such a specification are not true, then all bets are off: nothing can be said about the behavior of the code. The premises of the specification are preconditions that must be met in order for the program to behave in the manner described by the conclusion, or postcondition, of the specification. This means that the preconditions impose obligations on the caller, the user of the code, in order for the callee, the code itself, to be well-behaved. A conditional specification is a contract between the caller and the callee: if the caller meets the preconditions, the callee promises to fulfill the postcondition.
In the case of type specifications the compiler enforces this obligation by ruling out as ill-typed any attempt to use a piece of code in a context that does not fulfill its typing assumptions. Returning to the example above, if one attempts to use the expression x+1 in a context where x is not an integer, one can hardly expect that x+1 will yield an integer. It is therefore rejected by the type checker as a violation of the stated assumptions governing the types of its free variables.
What about specifications that are not mechanically enforced? For example, if x is negative, then we cannot infer anything about x+1 from the specification given above. (There are obviously other specifications that carry more information, but we are concerned here only with the one given. Moreover, if f is an unknown function, then we will, in general, have only the specification, and not the code, to reason about.) To make use of the specification in reasoning about its use in a larger program, it is essential that this precondition be met in the context of its use.

Lacking mechanical enforcement of these obligations, it is all too easy to neglect them when writing code. Many programming mistakes can be traced to violations of assumptions made by the callee that are not met by the caller. (Sadly, these assumptions are often unstated and can only be culled from the code with great effort, if at all.) What can be done about this?

A standard method, called bullet-proofing, is to augment the callee with run-time checks that ensure that its preconditions are met, raising an exception if they are not. For example, we might write a "bullet-proofed" version of fib that ensures that its argument is non-negative as follows:

local
  exception PreCond
  fun unchecked_fib 0 = 1
    | unchecked_fib 1 = 1
    | unchecked_fib n =
        unchecked_fib (n-1) + unchecked_fib (n-2)
in
  fun checked_fib n =
    if n < 0 then
      raise PreCond
    else
      unchecked_fib n
end

It is worth noting that we have structured this program to take the precondition check out of the loop. It would be poor practice to define checked_fib as follows:

fun bad_checked_fib n =
  if n < 0 then
    raise PreCond
  else
    case n
      of 0 => 1
       | 1 => 1
       | n => bad_checked_fib (n-1) + bad_checked_fib (n-2)
Once we know that the initial argument is non-negative, it is assured that the recursive calls also satisfy this requirement, provided that we have done the inductive reasoning to validate the specification of the function.
However, bullet-proofing in this form has several drawbacks. First, it imposes the overhead of checking on all callers, even those that have ensured that the desired precondition is true. In truth the run-time overhead is minor; the real overhead is requiring that the implementor of the callee take the trouble to impose the checks.

Second, and far more importantly, bullet-proofing applies only to specifications that can be checked at run time. As we remarked earlier, not all specifications are amenable to run-time checks, and for these cases there is no question of inserting run-time checks to enforce the precondition. For example, we may wish to impose the requirement that a function argument of type int -> int always yields a non-negative result. There is no run-time check for this condition: we cannot write a function nonneg of type (int -> int) -> bool that determines whether or not a function f always yields a non-negative result. In chapter 32 we will consider the use of data abstraction to enforce, at compile time, specifications that cannot be checked at run time.
Chapter 25
Induction and Recursion
This chapter is concerned with the close relationship between recursion and induction in programming. If a function is recursively defined, an inductive proof is required to show that it meets a specification of its behavior. The
motto is
when programming recursively, think inductively.
Doing so signiﬁcantly reduces the time spent debugging, and often leads
to more efﬁcient, robust, and elegant programs.
25.1 Exponentiation
Let's start with a very simple series of examples, all involving the computation of the integer exponential function. Our first example is to compute 2^n for integers n ≥ 0. We seek to define the function exp of type int -> int satisfying the specification

if n ≥ 0, then exp n evaluates to 2^n.

The precondition, or assumption, is that the argument n is non-negative. The postcondition, or guarantee, is that the result of applying exp to n is the number 2^n. The caller is required to establish the precondition before applying exp; in exchange, the caller may assume that the result is 2^n.

Here's the code:

fun exp 0 = 1
  | exp n = 2 * exp (n-1)
Does this function satisfy the specification? It does, and we can prove this by induction on n. If n = 0, then exp n evaluates to 1 (as you can see from the first line of its definition), which is, of course, 2^0. Otherwise, assume that exp is correct for n − 1 ≥ 0, and consider the value of exp n. From the second line of its definition we can see that this is the value of 2 × p, where p is the value of exp (n−1). Inductively, p = 2^(n−1), so 2 × p = 2 × 2^(n−1) = 2^n, as desired. Notice that we need not consider arguments n < 0 since the precondition of the specification requires that this be so. We must, however, ensure that each recursive call satisfies this requirement in order to apply the inductive hypothesis.
That was pretty simple. Now let us consider the running time of exp expressed as a function of n. Assuming that arithmetic operations are executed in constant time, we can read off a recurrence describing its execution time as follows:

T(0) = O(1)
T(n + 1) = O(1) + T(n)

We are interested in solving a recurrence by finding a closed-form expression for it. In this case the solution is easily obtained:

T(n) = O(n)
Thus we have a linear time algorithm for computing the integer exponential
function.
What about space? This is a much more subtle issue than time, because it is much more difficult in a high-level language such as ML to see where the space is used. Based on our earlier discussions of recursion and iteration, we can argue informally that the definition of exp given above requires
space given by the following recurrence:
S(0) = O(1)
S(n + 1) = O(1) + S(n)
The justiﬁcation is that the implementation requires a constant amount of
storage to record the pending multiplication that must be performed upon
completion of the recursive call.
Solving this simple recurrence yields the equation
S(n) = O(n)
expressing that exp is also a linear space algorithm for the integer exponential function.
Can we do better? Yes, on both counts! Here's how. Rather than counting down by ones, multiplying by two at each stage, we use successive squaring to achieve logarithmic time and space requirements. The idea is that if the exponent is even, we square the result of raising 2 to half the given power; otherwise, we reduce the exponent by one and double the result, ensuring that the next exponent will be even. Here's the code:
fun square (n:int) = n*n
fun double (n:int) = n+n

fun fast_exp 0 = 1
  | fast_exp n =
      if n mod 2 = 0 then
        square (fast_exp (n div 2))
      else
        double (fast_exp (n-1))
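To see the logarithmic behavior concretely, here is a hand-evaluation of this function (written here as fast_exp) at argument 10, performing only five recursive calls rather than ten:

```sml
val it = fast_exp 10
(* fast_exp 10 = square (fast_exp 5)      since 10 is even
   fast_exp 5  = double (fast_exp 4)      since 5 is odd
   fast_exp 4  = square (fast_exp 2)
   fast_exp 2  = square (fast_exp 1)
   fast_exp 1  = double (fast_exp 0) = double 1 = 2
   so fast_exp 2 = 4, fast_exp 4 = 16, fast_exp 5 = 32,
   and fast_exp 10 = 1024 *)
```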
Its specification is precisely the same as before. Does this code satisfy the specification? Yes, and we can prove this by using complete induction, a form of mathematical induction in which we may prove that n > 0 has a desired property by assuming not only that its predecessor has it, but that all preceding numbers have it, and arguing that therefore n must have it. Here's how it's done. For n = 0 the argument is exactly as before. Suppose, then, that n > 0. If n is even, the value of fast_exp n is the result of squaring the value of fast_exp (n div 2). Inductively this value is 2^(n÷2), so squaring it yields 2^(n÷2) × 2^(n÷2) = 2^(2×(n÷2)) = 2^n, as required. If, on the other hand, n is odd, the value is the result of doubling fast_exp (n−1). Inductively the latter value is 2^(n−1), so doubling it yields 2^n, as required.
Here's a recurrence governing the running time of fast_exp as a function of its argument:

T(0) = O(1)
T(2n) = O(1) + T(n)
T(2n + 1) = O(1) + T(2n) = O(1) + T(n)

Solving this recurrence using standard techniques yields the solution

T(n) = O(lg n)

You should convince yourself that fast_exp also requires logarithmic space usage.
Can we do better? Well, it's not possible to improve the time requirement (at least not asymptotically), but we can reduce the space required to O(1) by putting the function into iterative (tail recursive) form. However, this may not be achieved in this case by simply adding an accumulator argument, without also increasing the running time! The obvious approach is to attempt to satisfy the specification

if n ≥ 0, then skinny_fast_exp (n, a) evaluates to 2^n × a.

Here's some code that achieves this specification:

fun skinny_fast_exp (0, a) = a
  | skinny_fast_exp (n, a) =
      if n mod 2 = 0 then
        skinny_fast_exp (n div 2,
          skinny_fast_exp (n div 2, a))
      else
        skinny_fast_exp (n-1, 2*a)
It is easy to see that this code works properly for n = 0 and for n > 0 when n is odd, but what if n > 0 is even? Then by induction we compute 2^(n÷2) × 2^(n÷2) × a by two recursive calls to skinny_fast_exp. This yields the desired result, but what is the running time? Here's a recurrence to describe its running time as a function of n:

T(0) = 1
T(2n) = O(1) + 2T(n)
T(2n + 1) = O(1) + T(2n) = O(1) + 2T(n)

Here again we have a standard recurrence whose solution is

T(n) = O(n lg n).
Yuck! Can we do better? The key is to recall the following important fact:

2^(2n) = (2^2)^n = 4^n.

We can achieve a logarithmic time and constant space bound by a change of base. Here's the specification:

if n ≥ 0, then gen_skinny_fast_exp (b, n, a) evaluates to b^n × a.
Here's the code:

fun gen_skinny_fast_exp (b, 0, a) = a
  | gen_skinny_fast_exp (b, n, a) =
      if n mod 2 = 0 then
        gen_skinny_fast_exp (b*b, n div 2, a)
      else
        gen_skinny_fast_exp (b, n-1, b*a)
Let's check its correctness by complete induction on n. The base case is obvious. Assume the specification for arguments smaller than n > 0. If n is even, then by induction the result is (b × b)^(n÷2) × a = b^n × a, and if n is odd, we obtain inductively the result b^(n−1) × b × a = b^n × a. This completes the proof.
The trick to achieving an efficient implementation of the exponential function was to compute a more general function that can be implemented using less time and space. Since this is a trick employed by the implementor of the exponential function, it is important to insulate the client from it. This is easily achieved by using a local declaration to "hide" the generalized function, making available to the caller a function satisfying the original specification. Here's what the code looks like in this case:

local
  fun gen_skinny_fast_exp (b, 0, a) = ...
    | gen_skinny_fast_exp (b, n, a) = ...
in
  fun exp n = gen_skinny_fast_exp (2, n, 1)
end
(The elided code is the same as above.) The point here is to see how induction and recursion go hand-in-hand, and how we used induction not only to verify programs after the fact, but, more importantly, to help discover the program in the first place. If the verification is performed simultaneously with the coding, it is far more likely that the proof will go through and the program will work the first time you run it.
It is important to notice the correspondence between strengthening the specification by adding additional assumptions (and guarantees) and adding accumulator arguments. What we observe is the apparent paradox that it is often easier to do something (superficially) harder! In terms of proving, it is often easier to push through an inductive argument for a stronger specification, precisely because we get to assume the result as the inductive
hypothesis when arguing the inductive step(s). We are limited only by the requirement that the specification be proved outright at the base case(s); no inductive assumption is available to help us along here. In terms of programming, it is often easier to compute a more complicated function involving accumulator arguments, precisely because we get to exploit the accumulator when making recursive calls. We are limited only by the requirement that the result be defined outright for the base case(s); no recursive calls are available to help us along here.
25.2 The GCD Algorithm
Let's consider a more complicated example, the computation of the greatest common divisor of a pair of non-negative integers. Recall that m is a divisor of n, written m | n, iff n is a multiple of m, which is to say that there is some k ≥ 0 such that n = k × m. The greatest common divisor of non-negative integers m and n is the largest p such that p | m and p | n. (By convention the g.c.d. of 0 and 0 is taken to be 0.) Here's the specification of the gcd function:

if m, n ≥ 0, then gcd (m, n) evaluates to the g.c.d. of m and n.

Euclid's algorithm for computing the g.c.d. of m and n is defined by complete induction on the product m × n. Here's the algorithm, written in ML:
fun gcd (m:int, 0):int = m
  | gcd (0, n:int):int = n
  | gcd (m:int, n:int):int =
      if m > n then
        gcd (m mod n, n)
      else
        gcd (m, n mod m)
Why is this algorithm correct? We may prove that gcd satisfies the specification by complete induction on the product m × n. If m × n is zero, then either m or n is zero, in which case the answer is, correctly, the other number. Otherwise the product is positive, and we proceed according to whether m > n or m ≤ n. Suppose that m > n. Observe that m mod n = m − (m ÷ n) × n, so that (m mod n) × n = m × n − (m ÷ n) × n² < m × n, so that by induction we return the g.c.d. of m mod n and n. It remains to show that this is the g.c.d. of m and n. If d divides both m mod n and n, then k × d = m mod n = m − (m ÷ n) × n and l × d = n for some non-negative k and l. Consequently, k × d = m − (m ÷ n) × l × d, so m = (k + (m ÷ n) × l) × d, which is to say that d divides m. Now if d′ is any other divisor of m and n, then it is also a divisor of m mod n and n, so d ≥ d′. That is, d is the g.c.d. of m and n. The other case, m ≤ n, follows similarly. This completes the proof.
At this point you may well be thinking that all this inductive reasoning is surely helpful, but it's no replacement for good old-fashioned "bullet-proofing": conditional tests inserted at critical junctures to ensure that key invariants do indeed hold at execution time. Sure, you may be thinking, these checks have a run-time cost, but they can be turned off once the code is in production, and anyway the cost is minimal compared to, say, the time required to read and write from disk. It's hard to complain about this attitude, provided that sufficiently cheap checks can be put into place and provided that you know where to put them to maximize their effectiveness. For example, there's no use checking i > 0 at the start of the then clause of a test for i > 0. Barring compiler bugs, it can't possibly be anything other than true at that point in the program. Or it may be possible to insert a check whose computation is more expensive (or more complicated) than the one we're trying to perform, in which case we're defeating the purpose by including it!
This raises the question of where we should put such checks, and what checks should be included to help ensure the correct operation (or, at least, graceful malfunction) of our programs. This is an instance of the general problem of writing self-checking programs. We'll illustrate the idea by elaborating on the g.c.d. example a bit further. Suppose we wish to write a self-checking g.c.d. algorithm that computes the g.c.d., and then checks the result to ensure that it really is the greatest common divisor of the two given non-negative integers before returning it as the result. The code might look something like this:
exception GCD_ERROR

fun checked_gcd (m, n) =
  let
    val d = gcd (m, n)
  in
    if m mod d = 0 andalso
       n mod d = 0 andalso ???
    then
      d
    else
      raise GCD_ERROR
  end
It's obviously no problem to check that the putative g.c.d., d, is in fact a common divisor of m and n, but how do we check that it's the greatest common divisor? Well, one choice is to simply try all numbers between d and the smaller of m and n to ensure that no intervening number is also a divisor, refuting the maximality of d. But this is clearly so inefficient as to be impractical. There's a better way, which, it has to be emphasized, relies on the kind of mathematical reasoning we've been considering right along. Here's an important fact:

d is the g.c.d. of m and n iff d divides both m and n and can be written as a linear combination of m and n.

That is, d is the g.c.d. of m and n iff m = k × d for some k ≥ 0, n = l × d for some l ≥ 0, and d = a × m + b × n for some integers (possibly negative!) a and b. We'll prove this constructively by giving a program to compute not only the g.c.d. d of m and n, but also the coefficients a and b such that d = a × m + b × n. Here's the specification:

if m, n ≥ 0, then ggcd (m, n) evaluates to (d, a, b) such that d divides m, d divides n, and d = a × m + b × n; consequently, d is the g.c.d. of m and n.
And here's the code to compute it:

fun ggcd (0, n) = (n, 0, 1)
  | ggcd (m, 0) = (m, 1, 0)
  | ggcd (m, n) =
      if m > n then
        let
          val (d, a, b) = ggcd (m mod n, n)
        in
          (d, a, b - a * (m div n))
        end
      else
        let
          val (d, a, b) = ggcd (m, n mod m)
        in
          (d, a - b * (n div m), b)
        end
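As a quick sanity check (a worked example, not one from the text), evaluating ggcd on a small pair of numbers yields both the g.c.d. and the coefficients of the linear combination:

```sml
val (d, a, b) = ggcd (12, 8)
(* d = 4, a = 1, b = ~1, and indeed 4 = 1*12 + ~1*8 *)
```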
We may easily check that this code satisfies the specification by induction on the product m × n. If m × n = 0, then either m or n is 0, in which case the result follows immediately. Otherwise assume the result for smaller products, and show it for m × n > 0. Suppose m > n; the other case is handled analogously. Inductively we obtain d, a, and b such that d is the g.c.d. of m mod n and n, and hence is the g.c.d. of m and n, and d = a × (m mod n) + b × n. Since m mod n = m − (m ÷ n) × n, it follows that d = a × m + (b − a × (m ÷ n)) × n, from which the result follows.
Now we can write a self-checking g.c.d. as follows:

exception GCD_ERROR

fun checked_gcd (m, n) =
  let
    val (d, a, b) = ggcd (m, n)
  in
    if m mod d = 0 andalso
       n mod d = 0 andalso d = a*m + b*n
    then
      d
    else
      raise GCD_ERROR
  end
This algorithm takes no more time (asymptotically) than the original and, moreover, ensures that the result is correct. This illustrates the power of the interplay between mathematical reasoning methods, such as induction and number theory, and programming methods, such as bullet-proofing, to achieve robust, reliable, and, what is more important, elegant programs.
Chapter 26
Structural Induction
The importance of induction and recursion is not limited to functions defined over the integers. Rather, the familiar concept of mathematical induction over the natural numbers is an instance of the more general notion of structural induction over values of an inductively-defined type. Rather than develop a general treatment of inductively-defined types, we will rely on a few examples to illustrate the point. Let's begin by considering the natural numbers as an inductively defined type.
26.1 Natural Numbers
The set of natural numbers, N, may be thought of as the smallest set containing 0 and closed under the formation of successors. In other words, n is an element of N iff either n = 0 or n = m + 1 for some m in N. Still another way of saying it is to define N by the following clauses:
1. 0 is an element of N.
2. If m is an element of N, then so is m + 1.
3. Nothing else is an element of N.
(The third clause is sometimes called the extremal clause; it ensures that we
are talking about N and not just some superset of it.) All of these deﬁnitions
are equivalent ways of saying the same thing.
Since N is inductively defined, we may prove properties of the natural numbers by structural induction, which in this case is just ordinary mathematical induction. Specifically, to prove that a property P(n) holds of every n in N, it suffices to demonstrate the following facts:
1. Show that P(0) holds.
2. Assuming that P(m) holds, show that P(m + 1) holds.
The pattern of reasoning follows exactly the structure of the inductive definition: the base case is handled outright, and the inductive step is handled by assuming the property for the predecessor and showing that it holds for its successor.
The principle of structural induction also licenses the definition of functions by structural recursion. To define a function f with domain N, it suffices to proceed as follows:
1. Give the value of f(0).
2. Give the value of f(m + 1) in terms of the value of f(m).
Given this information, there is a unique function f with domain N
satisfying these requirements. Speciﬁcally, we may show by induction on
n ≥ 0 that the value of f is uniquely determined on all values m ≤ n. If
n = 0, this is obvious since f(0) is deﬁned by the ﬁrst clause. If n = m + 1,
then by induction the value of f is determined for all values k ≤ m. But the
value of f at n is determined as a function of f(m), and hence is uniquely
determined. Thus f is uniquely determined for all values of n in N, as was
to be shown.
The natural numbers, viewed as an inductively-defined type, may be represented in ML using a datatype declaration, as follows:

datatype nat = Zero | Succ of nat
The constructors correspond one-for-one with the clauses of the inductive definition. The extremal clause is implicit in the datatype declaration, since the given constructors are assumed to be all the ways of building values of type nat. This assumption forms the basis for exhaustiveness checking of clausal function definitions.
(You may object that this definition of the type nat amounts to a unary (base 1) representation of natural numbers, an unnatural and space-wasting representation. This is indeed true; in practice the natural numbers are represented as non-negative machine integers to avoid excessive overhead. Note, however, that this representation places a fixed upper bound on the size of numbers, whereas the unary representation does not. Hybrid representations that enjoy the benefits of both are, of course, possible and occasionally used when enormous numbers are required.)
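To make the correspondence with the machine-integer representation concrete, one might write conversion functions along the following lines. This is a sketch: the names nat_to_int and int_to_nat are not from the text, and int_to_nat assumes a non-negative argument.

```sml
datatype nat = Zero | Succ of nat  (* as declared above *)

fun nat_to_int Zero = 0
  | nat_to_int (Succ m) = 1 + nat_to_int m

(* precondition: n >= 0 *)
fun int_to_nat 0 = Zero
  | int_to_nat n = Succ (int_to_nat (n-1))
```

Both conversions are themselves defined by structural recursion on their argument.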
Functions defined by structural recursion are naturally represented by clausal function definitions, as in the following example:

fun double Zero = Zero
  | double (Succ n) = Succ (Succ (double n))

fun exp Zero = Succ (Zero)
  | exp (Succ n) = double (exp n)
The type checker ensures that we have covered all cases, but it does not ensure that the pattern of structural recursion is strictly followed: we may accidentally define f(m + 1) in terms of itself or some f(k) where k > m, breaking the pattern. The reason this is admitted is that the ML compiler cannot always follow our reasoning: we may have a clever algorithm in mind that isn't easily expressed by a simple structural induction. To avoid restricting the programmer, the language assumes the best and allows any form of definition.
Using the principle of structural induction for the natural numbers, we may prove properties of functions defined over the naturals. For example, we may easily prove by structural induction over the type nat that for every n ∈ N, exp n evaluates to a positive number. (In previous chapters we carried out proofs of more interesting program properties.)
26.2 Lists
Generalizing a bit, we may think of the type 'a list as inductively defined by the following clauses:

1. nil is a value of type 'a list.

2. If h is a value of type 'a, and t is a value of type 'a list, then h::t is a value of type 'a list.

3. Nothing else is a value of type 'a list.
This deﬁnition licenses the following principle of structural induction
over lists. To prove that a property P holds of all lists l, it sufﬁces to proceed
as follows:
1. Show that P holds for nil.
2. Show that P holds for h::t, assuming that P holds for t.
Similarly, we may deﬁne functions by structural recursion over lists as
follows:
1. Deﬁne the function for nil.
2. Deﬁne the function for h::t in terms of its value for t.
The clauses of the inductive definition of lists correspond to the following (built-in) datatype declaration in ML:

datatype 'a list = nil | :: of 'a * 'a list

(We are neglecting the fact that :: is regarded as an infix operator.)
The principle of structural recursion may be applied to define the reverse function as follows:

fun reverse nil = nil
  | reverse (h::t) = reverse t @ [h]

There is one clause for each constructor, and the value of reverse for h::t is defined in terms of its value for t. (We have ignored questions of time and space efficiency to avoid obscuring the induction principle underlying the definition of reverse.)
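For comparison, an accumulator-based version runs in linear time while following the same clausal pattern, now in an auxiliary function. This is an illustrative sketch; reverse' is not a name used in the text:

```sml
fun reverse' l =
  let
    (* loop (l, acc) evaluates to (reversal of l) @ acc *)
    fun loop (nil, acc) = acc
      | loop (h::t, acc) = loop (t, h::acc)
  in
    loop (l, nil)
  end
```

The repeated appends of the simple definition are traded for a single pass that conses each element onto the accumulator.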
Using the principle of structural induction over lists, we may prove that reverse l evaluates to the reversal of l. First, we show that reverse nil yields nil, as indeed it does and ought to. Second, we assume that reverse t yields the reversal of t, and argue that reverse (h::t) yields the reversal of h::t, as indeed it does since it returns reverse t @ [h].
26.3 Trees
Generalizing even further, we can introduce new inductively-defined types such as 2-3 trees, in which interior nodes are either binary (have two children) or ternary (have three children). Here's a definition of 2-3 trees in ML:

datatype 'a twth_tree =
    Empty
  | Bin of 'a * 'a twth_tree * 'a twth_tree
  | Ter of 'a * 'a twth_tree * 'a twth_tree * 'a twth_tree
How might one define the "size" of a value of this type? Your first thought should be to write down a template like the following:

fun size Empty = ???
  | size (Bin (_, t1, t2)) = ???
  | size (Ter (_, t1, t2, t3)) = ???
We have one clause per constructor, and will fill in the elided expressions to complete the definition. In many cases (including this one) the function is defined by structural recursion. Here's the complete definition:

fun size Empty = 0
  | size (Bin (_, t1, t2)) =
      1 + size t1 + size t2
  | size (Ter (_, t1, t2, t3)) =
      1 + size t1 + size t2 + size t3

Obviously this function computes the number of nodes in the tree, as you can readily verify by structural induction over the type 'a twth_tree.
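The same template accommodates other structurally recursive functions on 2-3 trees. For instance, here is a sketch of a height function (not from the text), again with one clause per constructor and a recursive call on each subtree:

```sml
fun height Empty = 0
  | height (Bin (_, t1, t2)) =
      1 + Int.max (height t1, height t2)
  | height (Ter (_, t1, t2, t3)) =
      1 + Int.max (height t1, Int.max (height t2, height t3))
```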
26.4 Generalizations and Limitations
Does this pattern apply to every datatype declaration? Yes and no. No matter what the form of the declaration, it always makes sense to define a function over it by a clausal function definition with one clause per constructor. Such a definition is guaranteed to be exhaustive (cover all cases), and serves as a valuable guide to structuring your code. (It is especially valuable if you change the datatype declaration, because then the compiler will inform you of what clauses need to be added or removed from functions defined over that type in order to restore them to a sensible definition.) The slogan is:

To define functions over a datatype, use a clausal definition with one clause per constructor.
The catch is that not every datatype declaration supports a principle of structural induction, because it is not always clear what constitutes the predecessor(s) of a constructed value. For example, the declaration

datatype D = Int of int | Fun of D -> D

is problematic, because a value of the form Fun(f) is not constructed directly from another value of type D, and hence it is not clear what to regard as its predecessor. In practice this sort of definition comes up only rarely; in most cases datatypes are naturally viewed as inductively defined.
26.5 Abstracting Induction
It is interesting to observe that the pattern of structural recursion may be directly codified in ML as a higher-order function. Specifically, we may associate with each inductively-defined type a higher-order function that takes as arguments values that determine the base case(s) and step case(s) of the definition, and defines a function by structural induction based on these arguments. An example will illustrate the point. The pattern of structural induction over the type nat may be codified by the following function:
fun nat_rec base step =
  let
    fun loop Zero = base
      | loop (Succ n) = step (loop n)
  in
    loop
  end
This function has the type 'a -> ('a -> 'a) -> nat -> 'a. Given the first two arguments, nat_rec yields a function of type nat -> 'a whose behavior is determined at the base case by the first argument and at the inductive step by the second. Here's an example of the use of nat_rec to define the exponential function:

val double =
  nat_rec Zero (fn result => Succ (Succ result))

val exp =
  nat_rec (Succ Zero) double

Note well the pattern! The arguments to nat_rec are

1. The value for Zero.

2. The value for Succ n, defined in terms of its value for n.
Similarly, the pattern of list recursion may be captured by the following functional:

fun list_recursion base step =
  let
    fun loop nil = base
      | loop (h::t) = step (h, loop t)
  in
    loop
  end

The type of the function list_recursion is

'a -> ('b * 'a -> 'a) -> 'b list -> 'a
It may be instantiated to define the reverse function as follows:

val reverse = list_recursion nil (fn (h, t) => t @ [h])
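Other familiar list functions fit the same mold. For instance (length' is an illustrative name, not one from the text):

```sml
val length' = list_recursion 0 (fn (_, n) => n + 1)
(* length' [10, 20, 30] evaluates to 3 *)
```

Readers familiar with the Standard ML Basis Library will recognize list_recursion as essentially List.foldr with its arguments rearranged.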
Finally, the principle of structural recursion for values of type 'a twth_tree is given as follows:

fun twth_rec base bin_step ter_step =
  let
    fun loop Empty = base
      | loop (Bin (v, t1, t2)) =
          bin_step (v, loop t1, loop t2)
      | loop (Ter (v, t1, t2, t3)) =
          ter_step (v, loop t1, loop t2, loop t3)
  in
    loop
  end
Notice that we have two inductive steps, one for each form of node. The type of twth_rec is

'a -> ('b * 'a * 'a -> 'a) -> ('b * 'a * 'a * 'a -> 'a) -> 'b twth_tree -> 'a

We may instantiate it to define the function size as follows:

val size =
  twth_rec 0
    (fn (_, s1, s2) => 1 + s1 + s2)
    (fn (_, s1, s2, s3) => 1 + s1 + s2 + s3)
Summarizing, the principle of structural induction over a recursive datatype is naturally codified in ML using pattern matching and higher-order functions. Whenever you're programming with a datatype, you should use the techniques outlined in this chapter to structure your code.
WORKING DRAFT DECEMBER 9, 2002
Chapter 27

Proof-Directed Debugging
In this chapter we'll put specification and verification techniques to work in devising a regular expression matcher. The code is similar to that sketched in chapter 1, but we will use verification techniques to detect and correct a subtle error that may not be immediately apparent from inspecting or even testing the code. We call this process proof-directed debugging.

The first task is to devise a precise specification of the regular expression matcher. This is a difficult problem in itself. We then attempt to verify that the matching program developed in chapter 1 satisfies this specification. The proof attempt breaks down. Careful examination of the failure reveals a counterexample to the specification — the program does not satisfy it. We then consider how best to resolve the problem, not by change of implementation, but instead by change of specification.
27.1 Regular Expressions and Languages
Before we begin work on the matcher, let us first define the set of regular expressions and their meaning as a set of strings. The set of regular expressions is given by the following grammar:

r ::= 0 | 1 | a | r1 r2 | r1 + r2 | r*
Here a ranges over a given alphabet, a set of primitive "letters" that may be used in a regular expression. A string is a finite sequence of letters of the alphabet. We write ε for the null string, the empty sequence of letters. We write s1 s2 for the concatenation of the strings s1 and s2, the string consisting of the letters in s1 followed by those in s2. The length of a string is the number of letters in it. We do not distinguish between a character and the unit-length string consisting solely of that character. Thus we write a s for the extension of s with the letter a at the front.
A language is a set of strings. Every regular expression r stands for a particular language L(r), the language of r, which is defined by induction on the structure of r as follows:

L(0) = 0
L(1) = 1
L(a) = { a }
L(r1 r2) = L(r1) L(r2)
L(r1 + r2) = L(r1) + L(r2)
L(r*) = L(r)*
This definition employs the following operations on languages:

0 = ∅
1 = { ε }
L1 + L2 = L1 ∪ L2
L1 L2 = { s1 s2 | s1 ∈ L1, s2 ∈ L2 }
L^(0) = 1
L^(i+1) = L L^(i)
L* = ⋃_{i≥0} L^(i)
An important fact about L* is that it is the smallest language L' such that 1 + L L' ⊆ L'. Spelled out, this means two things:

1. 1 + L L* ⊆ L*, which is to say that
(a) ε ∈ L*, and
(b) if s ∈ L and s' ∈ L*, then s s' ∈ L*.

2. If 1 + L L' ⊆ L', then L* ⊆ L'.

This means that L* is the smallest language (with respect to language containment) that contains the null string and is closed under concatenation on the left by L.
Let's prove that this is the case. First, since L^(0) = 1, it follows immediately that ε ∈ L*. Second, if l ∈ L and l' ∈ L*, then l' ∈ L^(i) for some i ≥ 0, and hence l l' ∈ L^(i+1) by definition of concatenation of languages. This completes the first step. Now suppose that L' is such that 1 + L L' ⊆ L'. We are to show that L* ⊆ L'. We show by induction on i ≥ 0 that L^(i) ⊆ L', from which the result follows immediately. If i = 0, then it suffices to show that ε ∈ L'. But this follows from the assumption that 1 + L L' ⊆ L', which implies that 1 ⊆ L'. To show that L^(i+1) ⊆ L', we observe that, by definition, L^(i+1) = L L^(i). By induction L^(i) ⊆ L', and hence L L^(i) ⊆ L', since L L' ⊆ L' by assumption.
Having proved that L* is the smallest language L' such that 1 + L L' ⊆ L', it is not hard to prove that L* satisfies the recurrence L* = 1 + L L*. We just proved the right-to-left containment. For the converse, it suffices to observe that 1 + L (1 + L L*) ⊆ 1 + L L*, for then the result follows by minimality. This is easily established by a simple case analysis.

Exercise 1
Give a full proof of the fact that L* = 1 + L L*.
Finally, a word about implementation. We will assume in what follows that the alphabet is given by the type char of characters, that strings are elements of the type string, and that regular expressions are defined by the following datatype declaration:

datatype regexp =
  Zero | One | Char of char |
  Plus of regexp * regexp |
  Times of regexp * regexp |
  Star of regexp

We will also work with lists of characters, values of type char list, using ML notation for primitive operations on lists such as concatenation and extension. Occasionally we will abuse notation and not distinguish (in the informal discussion) between a string and a list of characters. In particular we will speak of a character list as being a member of a language, when in fact we mean that the corresponding string is a member of that language.
27.2 Specifying the Matcher
Let us begin by devising a specification for the regular expression matcher. As a first cut we write down a type specification. We seek to define a function match of type regexp -> string -> bool that determines whether or not a given string matches a given regular expression. More precisely, we wish to satisfy the following specification:

For every regular expression r and every string s, match r s terminates, and evaluates to true iff s ∈ L(r).
We saw in chapter 1 that a natural way to define the procedure match is to use a technique called continuation passing. We defined an auxiliary function match_is with the type

regexp -> char list -> (char list -> bool) -> bool

that takes a regular expression, a list of characters (essentially a string, but in a form suitable for incremental processing), and a continuation, and yields a boolean. The idea is that match_is takes a regular expression r, a character list cs, and a continuation k, and determines whether or not some initial segment of cs matches r, passing the remaining characters cs' to k in the case that there is such an initial segment, and yields false otherwise. Put more precisely,

For every regular expression r, character list cs, and continuation k, if cs = cs' @ cs'' with cs' ∈ L(r) and k cs'' evaluates to true, then match_is r cs k evaluates to true; otherwise, match_is r cs k evaluates to false.
Unfortunately, this specification is too strong to ever be satisfied by any program! Can you see why? The difficulty is that if k is not guaranteed to terminate for all inputs, then there is no way that match_is can behave as required. For example, if there is no input on which k terminates, the specification requires that match_is return false. It should be intuitively clear that we can never implement such a function. Instead, we must restrict attention to total continuations, those that always terminate with true or false on any input. This leads to the following revised specification:

For every regular expression r, character list cs, and total continuation k, if cs = cs' @ cs'' with cs' ∈ L(r) and k cs'' evaluates to true, then match_is r cs k evaluates to true; otherwise, match_is r cs k evaluates to false.
Observe that this specification makes use of an implicit existential quantification. Written out in full, we might say "For all ..., if there exist cs' and cs'' such that cs = cs' @ cs'' with ..., then ...". This observation makes clear that we must search for a suitable splitting of cs into two parts such that the first part is in L(r) and the second is accepted by k. There may, in general, be many ways to partition the input so as to satisfy both of these requirements; we need only find one such way. Note, however, that if cs = cs' @ cs'' with cs' ∈ L(r) but k cs'' yields false, we must reject this partitioning and search for another. In other words we cannot simply accept any partitioning whose initial segment matches r, but rather only those that also induce k to accept its corresponding final segment. We may return false only if there is no such splitting, not merely if a particular splitting fails to work.
Suppose for the moment that match_is satisfies this specification. Does it follow that match satisfies the original specification? Recall that the function match is defined as follows:

fun match r s =
    match_is r (String.explode s) (fn nil => true | _ => false)

Notice that the initial continuation is indeed total, and that it yields true (accepts) iff it is applied to the null string. Therefore match satisfies the following property obtained from the specification of match_is by plugging in the initial continuation:

For every regular expression r and string s, if s ∈ L(r), then match r s evaluates to true, and otherwise match r s evaluates to false.

This is precisely the property that we desire for match. Thus match is correct (satisfies its specification) if match_is is correct.
So far so good. But does match_is satisfy its specification? If so, we are done. How might we check this? Recall the definition of match_is given in the overview:

fun match_is Zero _ k = false
  | match_is One cs k = k cs
  | match_is (Char c) nil k = false
  | match_is (Char c) (d::cs) k =
    if c=d then k cs else false
  | match_is (Times (r1, r2)) cs k =
    match_is r1 cs (fn cs' => match_is r2 cs' k)
  | match_is (Plus (r1, r2)) cs k =
    match_is r1 cs k orelse match_is r2 cs k
  | match_is (Star r) cs k =
    k cs orelse
    match_is r cs (fn cs' => match_is (Star r) cs' k)
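For experimentation, the pieces of the matcher can be assembled into a self-contained program (the sample expression at the end is an added illustration; as the analysis in the sequel reveals, results should only be trusted for standard regular expressions):

```sml
datatype regexp =
  Zero | One | Char of char |
  Plus of regexp * regexp |
  Times of regexp * regexp |
  Star of regexp

fun match_is Zero _ k = false
  | match_is One cs k = k cs
  | match_is (Char c) nil k = false
  | match_is (Char c) (d::cs) k =
    if c = d then k cs else false
  | match_is (Times (r1, r2)) cs k =
    match_is r1 cs (fn cs' => match_is r2 cs' k)
  | match_is (Plus (r1, r2)) cs k =
    match_is r1 cs k orelse match_is r2 cs k
  | match_is (Star r) cs k =
    k cs orelse
    match_is r cs (fn cs' => match_is (Star r) cs' k)

fun match r s =
    match_is r (String.explode s) (fn nil => true | _ => false)

(* the language of an a followed by any number of b's *)
val ab_star = Times (Char #"a", Star (Char #"b"))
```

For example, match ab_star "abb" evaluates to true, while match ab_star "ba" evaluates to false.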
Since match_is is defined by a recursive analysis of the regular expression r, it is natural to proceed by induction on the structure of r. That is, we treat the specification as a conjecture about match_is, and attempt to prove it by structural induction on r.
We first consider the three base cases. Suppose that r is 0. Then no string is in L(r), so match_is must return false, which indeed it does. Suppose that r is 1. Since the null string is an initial segment of every string, and the null string is in L(1), we must yield true iff k cs yields true, and false otherwise. This is precisely how match_is is defined. Suppose that r is a. Then to succeed cs must have the form a cs' with k cs' evaluating to true; otherwise we must fail. The code for match_is checks that cs has the required form and, if so, passes cs' to k to determine the outcome, and otherwise yields false. Thus match_is behaves correctly for each of the three base cases.
We now consider the three inductive steps. For r = r1 + r2, we observe that some initial segment of cs matches r and causes k to accept the corresponding final segment of cs iff either some initial segment matches r1 and drives k to accept the rest or some initial segment matches r2 and drives k to accept the rest. By induction match_is works as specified for r1 and r2, which is sufficient to justify the correctness of match_is for r = r1 + r2.
For r = r1 r2, the proof is slightly more complicated. By induction match_is behaves according to the specification if it is applied to either r1 or r2, provided that the continuation argument is total. Note that the continuation k' given by fn cs' => match_is r2 cs' k is total, since by induction the inner recursive call to match_is always terminates. Suppose that there exists a partitioning cs = cs' @ cs'' with cs' ∈ L(r) and k cs'' evaluating to true. Then cs' = cs1 cs2 with cs1 ∈ L(r1) and cs2 ∈ L(r2), by definition of L(r1 r2). Consequently, match_is r2 (cs2 cs'') k evaluates to true, and hence match_is r1 (cs1 cs2 cs'') k' evaluates to true, as required. If, however, no such partitioning exists, then one of three situations occurs:

1. either no initial segment of cs matches r1, in which case the outer recursive call yields false, as required, or

2. for every initial segment matching r1, no initial segment of the corresponding final segment matches r2, in which case the inner recursive call yields false on every call, and hence the outer call yields false, as required, or

3. every pair of successive initial segments of cs matching r1 and r2 successively results in k evaluating to false, in which case the inner recursive call always yields false, and hence the continuation k' always yields false, and hence the outer recursive call yields false, as required.

Be sure you understand the reasoning involved here; it is quite tricky to get right!
We seem to be on track, with one more case to consider, r = r1*. This case would appear to be a combination of the preceding two cases for alternation and concatenation, with a similar argument sufficing to establish correctness. But there is a snag: the second recursive call to match_is leaves the regular expression unchanged! Consequently we cannot apply the inductive hypothesis to establish that it behaves correctly in this case, and the obvious proof attempt breaks down.

What to do? A moment's thought suggests that we proceed by an inner induction on the length of the string, based on the idea that if some initial segment of cs matches L(r), then either that initial segment is the null string (base case), or cs = cs' @ cs'' with cs' ∈ L(r1) and cs'' ∈ L(r) (induction step). We then handle the base case directly, and handle the inductive case by assuming that match_is behaves correctly for cs'' and showing that it behaves correctly for cs. But there is a flaw in this argument — the string cs'' need not be shorter than cs in the case that cs' is the null string! In that case the inductive hypothesis does not apply, and we are once again unable to complete the proof.
This time we can use the failure of the proof to obtain a counterexample to the specification! For if r = 1*, for example, then match_is r cs k does not terminate! In general if r = r1* with ε ∈ L(r1), then match_is r cs k fails to terminate. In other words, match_is does not satisfy the specification we have given for it. Our conjecture is false!

Our failure to establish that match_is satisfies its specification led to a counterexample that refuted our conjecture and uncovered a genuine bug in the program — the matcher may not terminate for some inputs. What to do? One approach is to explicitly check for looping behavior during matching by ensuring that each recursive call matches some nonempty initial segment of the string. This will work, but at the expense of cluttering the code and imposing additional runtime overhead. You should write out a version of the matcher that works this way, and check that it indeed satisfies the specification we've given above.
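One way to sketch such a progress-checked matcher is to have the continuation in the Star case refuse to iterate unless some input was actually consumed. This is only one possible solution to the exercise, not the book's; the length comparison is the added progress check:

```sml
datatype regexp =
  Zero | One | Char of char |
  Plus of regexp * regexp |
  Times of regexp * regexp |
  Star of regexp

(* As match_is, but the Star case iterates only if the iterated
   expression consumed at least one character, so matching always
   terminates, even on nonstandard regular expressions. *)
fun match_is Zero _ k = false
  | match_is One cs k = k cs
  | match_is (Char c) nil k = false
  | match_is (Char c) (d::cs) k =
    if c = d then k cs else false
  | match_is (Times (r1, r2)) cs k =
    match_is r1 cs (fn cs' => match_is r2 cs' k)
  | match_is (Plus (r1, r2)) cs k =
    match_is r1 cs k orelse match_is r2 cs k
  | match_is (Star r) cs k =
    k cs orelse
    match_is r cs (fn cs' =>
      length cs' < length cs andalso match_is (Star r) cs' k)

fun match r s =
    match_is r (String.explode s) (fn nil => true | _ => false)
```

Now match (Star One) "a" returns false instead of looping, while match (Star One) "" still returns true. (Recomputing length at each step costs time; a more careful version would thread the remaining length through the calls.)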
An alternative is to observe that the proof goes through under the additional assumption that no iterated regular expression matches the null string. Call a regular expression r standard iff whenever r'* occurs within r, the null string is not an element of L(r'). It is easy to check that the proof given above goes through under the assumption that the regular expression r is standard.
This says that the matcher works correctly for standard regular expressions. But what about the nonstandard ones? The key observation is that every regular expression is equivalent to one in standard form. By "equivalent" we mean "accepting the same language". For example, the regular expressions r + 0 and r are easily seen to be equivalent. Using this observation we may avoid the need to consider nonstandard regular expressions. Instead we can preprocess the regular expression to put it into standard form, then call the matcher on the standardized regular expression.
The required preprocessing is based on the following definitions. We will associate with each regular expression r two standard regular expressions δ(r) and r^- with the following properties:

1. L(δ(r)) = 1 if ε ∈ L(r), and L(δ(r)) = 0 otherwise.

2. L(r^-) = L(r) \ 1.

With these equations in mind, we see that every regular expression r may be written in the form δ(r) + r^-, which is in standard form.
The function δ mapping regular expressions to regular expressions is defined by induction on regular expressions by the following equations:

δ(0) = 0
δ(1) = 1
δ(a) = 0
δ(r1 + r2) = δ(r1) ⊕ δ(r2)
δ(r1 r2) = δ(r1) ⊗ δ(r2)
δ(r*) = 1

Here we define 0 ⊕ 1 = 1 ⊕ 0 = 1 ⊕ 1 = 1 and 0 ⊕ 0 = 0 and 0 ⊗ 1 = 1 ⊗ 0 = 0 ⊗ 0 = 0 and 1 ⊗ 1 = 1.
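These equations transcribe directly into ML. In the following sketch the names delta, oplus, and otimes are chosen here for illustration, not mandated by the text; delta computes δ(r) as a value of type regexp:

```sml
datatype regexp =
  Zero | One | Char of char |
  Plus of regexp * regexp |
  Times of regexp * regexp |
  Star of regexp

(* oplus and otimes are only ever applied to Zero and One. *)
fun oplus (Zero, Zero) = Zero
  | oplus _ = One

fun otimes (One, One) = One
  | otimes _ = Zero

(* delta r is One if the null string is in L(r), Zero otherwise. *)
fun delta Zero = Zero
  | delta One = One
  | delta (Char _) = Zero
  | delta (Plus (r1, r2)) = oplus (delta r1, delta r2)
  | delta (Times (r1, r2)) = otimes (delta r1, delta r2)
  | delta (Star _) = One
```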
Exercise 2
Show that L(δ(r)) = 1 iff ε ∈ L(r).
The definition of r^- is given by induction on the structure of r by the following equations:

0^- = 0
1^- = 0
a^- = a
(r1 + r2)^- = r1^- + r2^-
(r1 r2)^- = δ(r1) r2^- + r1^- δ(r2) + r1^- r2^-
(r*)^- = r^- (r^-)*

The only tricky case is the one for concatenation, which must take account of the possibility that r1 or r2 accepts the null string.
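These equations, too, can be transcribed into ML. In this sketch nonempty stands for the superscripted minus (the names, and the repetition of delta so the example runs on its own, are editorial additions); standardize then puts an arbitrary regular expression into the standard form δ(r) + r^-:

```sml
datatype regexp =
  Zero | One | Char of char |
  Plus of regexp * regexp |
  Times of regexp * regexp |
  Star of regexp

fun oplus (Zero, Zero) = Zero
  | oplus _ = One
fun otimes (One, One) = One
  | otimes _ = Zero

fun delta Zero = Zero
  | delta One = One
  | delta (Char _) = Zero
  | delta (Plus (r1, r2)) = oplus (delta r1, delta r2)
  | delta (Times (r1, r2)) = otimes (delta r1, delta r2)
  | delta (Star _) = One

(* nonempty r computes r^-, matching L(r) minus the null string. *)
fun nonempty Zero = Zero
  | nonempty One = Zero
  | nonempty (Char c) = Char c
  | nonempty (Plus (r1, r2)) = Plus (nonempty r1, nonempty r2)
  | nonempty (Times (r1, r2)) =
    Plus (Times (delta r1, nonempty r2),
      Plus (Times (nonempty r1, delta r2),
            Times (nonempty r1, nonempty r2)))
  | nonempty (Star r) = Times (nonempty r, Star (nonempty r))

(* standardize r is the standard form delta(r) + r^-. *)
fun standardize r = Plus (delta r, nonempty r)
```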
Exercise 3
Show that L(r^-) = L(r) \ 1.
Chapter 28

Persistent and Ephemeral Data Structures
This chapter is concerned with persistent and ephemeral abstract types. The distinction is best explained in terms of the logical future of a value. Whenever a value of an abstract type is created it may be subsequently acted upon by the operations of the type (and, since the type is abstract, by no other operations). Each of these operations may yield (other) values of that abstract type, which may themselves be handed off to further operations of the type. Ultimately a value of some other type, say a string or an integer, is obtained as an observable outcome of the succession of operations on the abstract value. The sequence of operations performed on a value of an abstract type constitutes a logical future of that type — a computation that starts with that value and ends with a value of some observable type. We say that a type is ephemeral iff every value of that type has at most one logical future, which is to say that it is handed off from one operation of the type to another until an observable value is obtained from it. This is the normal case in familiar imperative programming languages because in such languages the operations of an abstract type destructively modify the value upon which they operate; its original state is irretrievably lost by the performance of an operation. It is therefore inherent in the imperative programming model that a value have at most one logical future. In contrast, values of an abstract type in functional languages such as ML may have many different logical futures, precisely because the operations do not "destroy" the value upon which they operate, but rather create fresh values of that type to yield as results. Such values are said to be persistent because they persist after application of an operation of the type, and in fact may serve as arguments to further operations of that type.
Some examples will help to clarify the distinction. The primitive list types of ML are persistent because the performance of an operation such as cons'ing, appending, or reversing a list does not destroy the original list. This leads naturally to the idea of multiple logical futures for a given value, as illustrated by the following code sequence:

val l = [1,2,3]           (* original list *)
val m1 = tl l             (* first future of l *)
val n1 = rev m1
val m2 = l @ [4,5,6]      (* second future of l *)

Notice that the original list value, [1,2,3], has two distinct logical futures, one in which we remove its head, then reverse the tail, and the other in which we append the list [4,5,6] to it. The ability to easily handle multiple logical futures for a data structure is a tremendous source of flexibility and expressive power, alleviating the need to perform tedious bookkeeping to manage "versions" or "copies" of a data structure to be passed to different operations.
The prototypical ephemeral data structure in ML is the reference cell. Performing an assignment operation on a reference cell changes it irrevocably; the original contents of the cell are lost, even if we keep a handle on it.

val r = ref 0             (* original cell *)
val s = r
val _ = (s := 1)
val x = !r                (* 1! *)

Notice that the contents of (the cell bound to) r changes as a result of performing an assignment to the underlying cell. There is only one future for this cell; a reference to its original binding does not yield its original contents.
More elaborate forms of ephemeral data structures are certainly possible. For example, the following declaration defines a type of lists whose tails are mutable. It is therefore a singly-linked list, one whose predecessor relation may be changed dynamically by assignment:

datatype 'a mutable_list =
  Nil |
  Cons of 'a * 'a mutable_list ref

Values of this type are ephemeral in the sense that some operations on values of this type are destructive, and hence are irreversible (so to speak!). For example, here's an implementation of a destructive reversal of a mutable list. Given a mutable list l, this function reverses the links in the cell so that the elements occur in reverse order of their occurrence in l.
local
  fun ipr (Nil, a) = a
    | ipr (this as Cons (_, r as ref next), a) =
      ipr (next, (r := a; this))
in
  (* destructively reverse a list *)
  fun inplace_reverse l = ipr (l, Nil)
end
As you can see, the code is quite tricky to understand! The idea is the same as the iterative reverse function for pure lists, except that we reuse the nodes of the original list, rather than generate new ones, when moving elements onto the accumulator argument.
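To see the destruction happen, the following sketch builds a mutable list, reverses it in place, and observes both the result and what remains of the original (from_list and to_list are illustrative helpers, not from the text):

```sml
datatype 'a mutable_list =
  Nil |
  Cons of 'a * 'a mutable_list ref

local
  fun ipr (Nil, a) = a
    | ipr (this as Cons (_, r as ref next), a) =
      ipr (next, (r := a; this))
in
  fun inplace_reverse l = ipr (l, Nil)
end

(* Hypothetical helpers for building and observing mutable lists. *)
fun from_list nil = Nil
  | from_list (h::t) = Cons (h, ref (from_list t))

fun to_list Nil = nil
  | to_list (Cons (h, ref t)) = h :: to_list t

val l = from_list [1, 2, 3]
val r = inplace_reverse l
```

Afterwards to_list r is [3,2,1], but to_list l is just [1]: the original list has been destroyed, retaining only its first node, whose tail now points to Nil.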
The distinction between ephemeral and persistent data structures is essentially the distinction between functional (effect-free) and imperative (effectful) programming — functional data structures are persistent; imperative data structures are ephemeral. However, this characterization is oversimplified in two respects. First, it is possible to implement a persistent data structure that exploits mutable storage. Such a use of mutation is an example of what is called a benign effect because for all practical purposes the data structure is "purely functional" (i.e., persistent), but is in fact implemented using mutable storage. As we will see later the exploitation of benign effects is crucial for building efficient implementations of persistent data structures. Second, it is possible for a persistent data type to be used in such a way that persistence is not exploited — rather, every value of the type has at most one future in the program. Such a type is said to be single-threaded, reflecting the linear, as opposed to branching, structure of the future uses of values of that type. The significance of a single-threaded type is that it may as well have been implemented as an ephemeral data structure (e.g., by having observable effects on values) without changing the behavior of the program.
28.1 Persistent Queues
Here is a signature of persistent queues:

signature QUEUE = sig
  type 'a queue
  exception Empty
  val empty : 'a queue
  val insert : 'a * 'a queue -> 'a queue
  val remove : 'a queue -> 'a * 'a queue
end

This signature describes a structure providing a representation type for queues, together with operations to create an empty queue, insert an element onto the back of the queue, and to remove an element from the front of the queue. It also provides an exception that is raised in response to an attempt to remove an element from the empty queue. Notice that removing an element from a queue yields both the element at the front of the queue, and the queue resulting from removing that element. This is a direct reflection of the persistence of queues implemented by this signature; the original queue remains available as an argument to further queue operations.
operations.
By a sequence of queue operations we shall mean a succession of uses
of empty, insert, and remove operations in such a way that the queue
argument of one operation is obtained as a result of the immediately pre
ceding queue operation. Thus a sequence of queue operations represents a
singlethreaded timeline in the life of a queue value. Here is an example
of a sequence of queue operations:
val q0 : int queue = empty
val q1 = insert (1, q0)
val q2 = insert (2, q1)
val (h1, q3) = remove q2 (* h1 = 1, q3 = q1 *)
val (h2, q4) = remove q3 (* h2 = 2, q4 = q0 *)
By contrast the following operations do not form a single thread, but rather a branching development of the queue's lifetime:

val q0 : int queue = empty
val q1 = insert (1, q0)
val q2 = insert (2, q0) (* NB: q0, not q1! *)
val (h1, q3) = remove q1 (* h1 = 1, q3 = q0 *)
val (h2, q4) = remove q3 (* raises Empty *)
val (h2, q4) = remove q2 (* h2 = 2, q4 = q0 *)
In the remainder of this section we will be concerned with single-threaded sequences of queue operations.

How might we implement the signature QUEUE? The most obvious approach is to represent the queue as a list with, say, the head element of the list representing the "back" (most recently enqueued element) of the queue. With this representation enqueueing is a constant-time operation, but dequeueing requires time proportional to the number of elements in the queue. Thus in the worst case a sequence of n enqueue and dequeue operations will take time O(n^2), which is clearly excessive. We can make dequeue simpler, at the expense of enqueue, by regarding the head of the list as the "front" of the queue, but the time bound for n operations remains the same in the worst case.
Can we do better? A well-known "trick" achieves an O(n) worst-case performance for any sequence of n operations, which means that each operation takes O(1) steps if we amortize the cost over the entire sequence. Notice that this is a worst-case bound for the sequence, yielding an amortized bound for each operation of the sequence. This means that some operations may be relatively expensive, but, in compensation, many operations will be cheap.

How is this achieved? By combining the two naive solutions sketched above. The idea is to represent the queue by two lists, one for the back "half" consisting of recently inserted elements in the order of arrival, and one for the front "half" consisting of soon-to-be-removed elements in reverse order of arrival (i.e., in order of removal). We put "half" in quotes because we will not, in general, maintain an even split of elements between the front and the back lists. Rather, we will arrange things so that the following representation invariants hold true:

1. The elements of the queue listed in order of removal are the elements of the front followed by the elements of the back in reverse order.

2. The front is empty only if the back is empty.

These invariants are maintained by using a "smart constructor" that creates a queue from two lists representing the back and front parts of the queue. This constructor ensures that the representation invariant holds by ensuring that condition (2) is always true of the resulting queue. The constructor proceeds by a case analysis on the back and front parts of the queue.
If the front list is nonempty, or both the front and back are empty, the resulting queue consists of the back and front parts as given. If the front is empty and the back is nonempty, the queue constructor yields the queue consisting of an empty back part and a front part equal to the reversal of the given back part. Observe that this is sufficient to ensure that the representation invariant holds of the resulting queue in all cases. Observe also that the smart constructor either runs in constant time, or in time proportional to the length of the back part, according to whether the front part is empty or not.

Insertion of an element into a queue is achieved by cons'ing the element onto the back of the queue, then calling the queue constructor to ensure that the result is in conformance with the representation invariant. Thus an insert can either take constant time, or time proportional to the size of the back of the queue, depending on whether the front part is empty. Removal of an element from a queue requires a case analysis. If the front is empty, then by condition (2) the queue is empty, so we raise an exception. If the front is nonempty, we simply return the head element together with the queue created from the original back part and the front part with the head element removed. Here again the time required is either constant or proportional to the size of the back of the queue, according to whether the front part becomes empty after the removal. Notice that if an insertion or removal requires a reversal of k elements, then the next k operations are constant-time. This is the fundamental insight as to why we achieve O(n) time complexity over any sequence of n operations. (We will give a more rigorous analysis shortly.)

Here's the implementation of this idea in ML:
structure Queue :> QUEUE = struct
  type 'a queue = 'a list * 'a list
  fun make_queue (q as (nil, nil)) = q
    | make_queue (bs, nil) = (nil, rev bs)
    | make_queue (q as (bs, fs)) = q
  val empty = (nil, nil)
  fun insert (x, (back, front)) =
    make_queue (x::back, front)
  exception Empty
  fun remove (_, nil) = raise Empty
    | remove (bs, f::fs) = (f, make_queue (bs, fs))
end
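A quick single-threaded check confirms that the structure behaves as the signature promises. The demo below is an editorial addition, repeating the declarations so that it runs on its own (note that empty is written directly as (nil, nil), a syntactic value, so that its polymorphic type is not blocked by the value restriction):

```sml
signature QUEUE = sig
  type 'a queue
  exception Empty
  val empty : 'a queue
  val insert : 'a * 'a queue -> 'a queue
  val remove : 'a queue -> 'a * 'a queue
end

structure Queue :> QUEUE = struct
  type 'a queue = 'a list * 'a list
  fun make_queue (q as (nil, nil)) = q
    | make_queue (bs, nil) = (nil, rev bs)
    | make_queue (q as (bs, fs)) = q
  val empty = (nil, nil)
  fun insert (x, (back, front)) = make_queue (x::back, front)
  exception Empty
  fun remove (_, nil) = raise Empty
    | remove (bs, f::fs) = (f, make_queue (bs, fs))
end

(* a single-threaded sequence of queue operations *)
val q1 = Queue.insert (1, Queue.empty)
val q2 = Queue.insert (2, q1)
val (h1, q3) = Queue.remove q2
val (h2, q4) = Queue.remove q3
```

Here h1 is 1 and h2 is 2: elements leave in their order of arrival, with the first remove triggering the reversal of the back list behind the abstraction boundary.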
Notice that we call the "smart constructor" make_queue whenever we wish to return a queue to ensure that the representation invariant holds. Consequently, some queue operations are more expensive than others, according to whether or not the queue needs to be reorganized to satisfy the representation invariant. However, each such reorganization makes a corresponding number of subsequent queue operations "cheap" (constant-time), so the overall effort required evens out in the end to constant time per operation. More precisely, the running time of a sequence of n queue operations is now O(n), rather than O(n^2), as it was in the naive implementation. Consequently, each operation takes O(1) (constant) time "on average," i.e., when the total effort is evenly apportioned among each of the operations in the sequence. Note that this is a worst-case time bound for each operation, amortized over the entire sequence, not an average-case time bound based on assumptions about the distribution of the operations.
28.2 Amortized Analysis
How can we prove this claim? First we give an informal argument, then
we tighten it up with a more rigorous analysis. We are to account for the
total work performed by a sequence of n operations by showing that any
sequence of n operations can be executed in cn steps for some constant c.
Dividing by n, we obtain the result that each operation takes c steps when
amortized over the entire sequence. The key is to observe first that the work
required to execute a sequence of queue operations may be apportioned to
the elements themselves, then that only a constant amount of work is
expended on each element. The "life" of a queue element may be divided
into three stages: its arrival in the queue, its transit time in the queue,
and its departure from the queue. In the worst case each element passes
through each of these stages (but may "die young", never participating in
the second or third stage). Arrival requires constant time to add the
element to the back of the queue. Transit consists of being moved from the
back to the front by a reversal, which takes constant time per element on
the back. Departure takes constant time to pattern match and extract the
element. Thus at worst we require three steps per element to account for the
entire effort expended to perform a sequence of queue operations. This is
in fact a conservative upper bound in the sense that we may need less than
3n steps for the sequence, but asymptotically the bound is optimal — we
cannot do better than constant time per operation! (You might reasonably
wonder whether there is a worst-case, non-amortized constant-time
implementation of persistent queues. The answer is "yes", but the code is far
more complicated than the simple implementation we are sketching here.)
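The implementation under discussion can be sketched as follows. This is a minimal sketch under assumed names (Queue, insert, remove), not the book's exact code; one common variant, shown here, performs the reversal when a remove finds the front empty, whereas the accounting below charges the reversal to an insert, but the amortized bound is the same either way:

```sml
(* Sketch of a persistent queue as a pair of lists: the back holds
   recent inserts in reverse order, the front holds elements ready
   to be removed.  Names and details are assumptions. *)
structure Queue =
struct
  exception Empty

  type 'a queue = 'a list * 'a list  (* (back, front) *)

  val empty : 'a queue = (nil, nil)

  (* arrival: constant-time cons onto the back *)
  fun insert (x, (bs, fs)) = (x::bs, fs)

  (* departure: constant time, except when the front is exhausted,
     in which case the back is reversed onto the front -- the cost
     that the amortized analysis apportions to the elements *)
  fun remove (nil, nil) = raise Empty
    | remove (bs, nil) = remove (nil, rev bs)
    | remove (bs, f::fs) = (f, (bs, fs))
end
```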
This argument can be made rigorous as follows. The general idea is to
introduce the notion of a charge scheme that provides an upper bound on
the actual cost of executing a sequence of operations. An upper bound on
the charge will then provide an upper bound on the actual cost. Let T(n)
be the cumulative time required (in the worst case) to execute a sequence
of n queue operations. We will introduce a charge function, C(n),
representing the cumulative charge for executing a sequence of n operations and
show that T(n) ≤ C(n) = O(n). It is convenient to express this in terms
of a function R(n) = C(n) − T(n) representing the cumulative residual, or
overcharge, which is the amount that the charge for n operations exceeds the
actual cost of executing them. We will arrange things so that R(n) ≥ 0 and
that C(n) = O(n), from which the result follows immediately.
Down to specifics. By charging 2 for each insert operation and 1 for
each remove, it follows that C(n) ≤ 2n for any sequence of n inserts and
removes. Thus C(n) = O(n). After any sequence of n ≥ 0 operations
has been performed, the queue contains 0 ≤ b ≤ n elements on the back
"half" and 0 ≤ f ≤ n elements on the front "half". We claim that for every
n ≥ 0, R(n) = b. We prove this by induction on n ≥ 0. The condition
clearly holds after performing 0 operations, since T(0) = 0, C(0) = 0, and
hence R(0) = C(0) − T(0) = 0. Consider the (n+1)st operation. If it is an
insert, and f > 0, then T(n+1) = T(n) + 1, C(n+1) = C(n) + 2, and hence
R(n+1) = R(n) + 1 = b + 1. This is correct because an insert operation
adds one element to the back of the queue. If, on the other hand, f = 0,
then T(n+1) = T(n) + b + 2 (charging one for the cons and one for creating
the new pair of lists), C(n+1) = C(n) + 2, so R(n+1) = R(n) + 2 − b − 2 =
b + 2 − b − 2 = 0. This is correct because the back is now empty; we have
used the residual overcharge to pay for the cost of the reversal. If the (n+1)st
operation is a remove, and f > 0, then T(n+1) = T(n) + 1 and C(n+1) =
C(n) + 1 and hence R(n+1) = R(n) = b. This is correct because the
remove doesn't disturb the back in this case. Finally, if we are performing
a remove with f = 0, then T(n+1) = T(n) + b + 1, C(n+1) = C(n) + 1,
and hence R(n+1) = R(n) − b = b − b = 0. Here again we use the
residual overcharge to pay for the reversal of the back to the front. The
result follows immediately since R(n) = b ≥ 0, and hence C(n) ≥ T(n).
It is instructive to examine where this solution breaks down in the
multi-threaded case (i.e., where persistence is fully exploited). Suppose that we
perform a sequence of n insert operations on the empty queue, resulting in
a queue with n elements on the back and none on the front. Call this queue
q. Let us suppose that we have n independent "futures" for q, each of which
removes an element from it, for a total of 2n operations. How much time
do these 2n operations take? Since each independent future must reverse
all n elements onto the front of the queue before performing the removal,
the entire collection of 2n operations takes n + n² steps, or O(n) steps per
operation, breaking the amortized constant-time bound just derived for a
single-threaded sequence of queue operations. Can we recover a
constant-time amortized cost in the persistent case? We can, provided that we share
the cost of the reversal among all futures of q — as soon as one performs
the reversal, they all enjoy the benefit of its having been done. This may be
achieved by using a benign side effect to cache the result of the reversal in a
reference cell that is shared among all uses of the queue. We will return to
this once we introduce memoization and lazy evaluation.
Chapter 29
Options, Exceptions, and
Continuations
In this chapter we discuss the close relationships between option types,
exceptions, and continuations. They each provide the means for handling
failure to produce a value in a computation. Option types provide the
means of explicitly indicating in the type of a function the possibility that it
may fail to yield a "normal" result. The result type of the function forces the
caller to dispatch explicitly on whether or not it returned a normal value.
Exceptions provide the means of implicitly signalling failure to return a
normal result value, without sacrificing the requirement that an application
of such a function cannot ignore failure to yield a value. Continuations
provide another means of handling failure by providing a function to invoke
in the case that normal return is impossible.
29.1 The n-Queens Problem
We will explore the tradeoffs between these three approaches by
considering three different implementations of the n-queens problem: find a way to
place n queens on an n × n chessboard in such a way that no two queens
attack one another. The general strategy is to place queens in successive
columns in such a way that each newly placed queen is not attacked by a
previously placed queen. Unfortunately it's not possible to do this in one
pass; we may find that we can safely place k < n queens on the board, only
to discover that there is no way to place the next one. To find a solution we
must reconsider earlier decisions, and work forward from there. If all
possible reconsiderations of all previous decisions lead to failure, then the
problem is unsolvable. For example, there is no safe placement of three
queens on a 3 × 3 chessboard. This trial-and-error approach to solving the
n-queens problem is called backtracking search.
A solution to the n-queens problem consists of an n × n chessboard with
n queens safely placed on it. The following signature defines a chessboard
abstraction:
signature BOARD =
sig
  type board
  val new : int -> board
  val complete : board -> bool
  val place : board * int -> board
  val safe : board * int -> bool
  val size : board -> int
  val positions : board -> (int * int) list
end
The operation new creates a new board of a given dimension n ≥ 0. The
operation complete checks whether the board contains a complete safe
placement of n queens. The function safe checks whether it is safe to
place a queen at row i in the next free column of a board B. The operation
place puts a queen at row i in the next available column of the board. The
function size returns the size of a board, and the function positions
returns the coordinates of the queens on the board.
The board abstraction may be implemented as follows:
structure Board :> BOARD =
struct
  (* rep: size, next free column, number placed, placements
     inv: size >= 0, 1 <= next free <= size,
          length(placements) = number placed
  *)
  type board = int * int * int * (int * int) list
  fun new n = (n, 1, 0, nil)
  fun size (n, _, _, _) = n
  fun complete (n, _, k, _) = (k=n)
  fun positions (_, _, _, qs) = qs
  fun place ((n, i, k, qs), j) =
      (n, i+1, k+1, (i,j)::qs)
  fun threatens ((i,j), (i',j')) =
      i=i' orelse j=j' orelse
      i+j = i'+j' orelse
      i-j = i'-j'
  fun conflicts (q, nil) = false
    | conflicts (q, q'::qs) =
      threatens (q, q') orelse conflicts (q, qs)
  fun safe ((_, i, _, qs), j) =
      not (conflicts ((i,j), qs))
end
The representation type contains "redundant" information in order to make
the individual operations more efficient. The representation invariant
ensures that the components of the representation are properly related to one
another (e.g., the claimed number of placements is indeed the length of the
list of placed queens, and so on).
Our goal is to define a function

val queens : int -> Board.board option

such that if n ≥ 0, then queens n evaluates either to NONE if there is no safe
placement of n queens on an n × n board, or to SOME B otherwise, with B a
complete board containing a safe placement of n queens. We will consider
three different solutions, one using option types, one using exceptions, and
one using a failure continuation.
29.2 Solution Using Options
Here’s a solution based on option types:
(* addqueen bd yields SOME bd', with bd' a
   complete safe placement extending bd,
   if one exists, and yields NONE otherwise
*)
fun addqueen bd =
    let
      fun try j =
          if j > Board.size bd then
            NONE
          else if Board.safe (bd, j) then
            case addqueen (Board.place (bd, j))
              of NONE => try (j+1)
               | r as (SOME bd') => r
          else
            try (j+1)
    in
      if Board.complete bd then
        SOME bd
      else
        try 1
    end

fun queens n = addqueen (Board.new n)
The characteristic feature of this solution is that we must explicitly check
the result of each recursive call to addqueen to determine whether a safe
placement is possible from that position. If so, we simply return it; if not,
we must reconsider the placement of a queen in row j of the next available
column. If no placement is possible in the current column, the function
yields NONE, which forces reconsideration of the placement of the queen in
the preceding column. Eventually we either find a safe placement, or yield
NONE indicating that no solution is possible.
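As a quick sanity check, the solver can be exercised at the top level. This is a sketch under the assumption that the Board and queens declarations above are in scope; which particular safe placement is returned depends on the search order:

```sml
(* no safe placement of 3 queens exists on a 3 x 3 board *)
val NONE = queens 3

(* but a safe placement of 4 queens on a 4 x 4 board does *)
val SOME b = queens 4
val true = Board.complete b
val ps = Board.positions b  (* four (column, row) coordinates *)
```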
29.3 Solution Using Exceptions
The explicit check on the result of each recursive call can be replaced by
the use of exceptions. Rather than have addqueen return a value of type
Board.board option, we instead have it return a value of type Board.board,
if possible, and otherwise raise an exception indicating failure. The case
analysis on the result is replaced by a use of an exception handler. Here’s
the code:
exception Fail

(* addqueen bd yields bd', where bd' is a complete safe
   placement extending bd, if one exists, and raises Fail otherwise
*)
fun addqueen bd =
    let
      fun try j =
          if j > Board.size bd then
            raise Fail
          else if Board.safe (bd, j) then
            addqueen (Board.place (bd, j))
            handle Fail => try (j+1)
          else
            try (j+1)
    in
      if Board.complete bd then
        bd
      else
        try 1
    end

fun queens n =
    SOME (addqueen (Board.new n))
    handle Fail => NONE
The main difference between this solution and the previous one is that both
calls to addqueen must handle the possibility that it raises the exception
Fail. In the outermost call this corresponds to a complete failure to find
a safe placement, which means that queens must return NONE. If a safe
placement is indeed found, it is wrapped with the constructor SOME to
indicate success. In the recursive call within try, an exception handler is
required to handle the possibility of there being no safe placement starting
in the current position. This check corresponds directly to the case analysis
required in the solution based on option types.
What are the tradeoffs between the two solutions?
1. The solution based on option types makes explicit in the type of the
function addqueen the possibility of failure. This forces the
programmer to explicitly test for failure using a case analysis on the
result of the call. The type checker will ensure that one cannot use a
Board.board option where a Board.board is expected. The
solution based on exceptions does not explicitly indicate failure in its
type. However, the programmer is nevertheless forced to handle the
failure, for otherwise an uncaught exception error would be raised at
run time, the omission having gone undetected at compile time.
2. The solution based on option types requires an explicit case analysis
on the result of each recursive call. If "most" results are successful,
the check is redundant and therefore excessively costly. The solution
based on exceptions is free of this overhead: it is biased towards the
"normal" case of returning a board, rather than the "failure" case of
not returning a board at all. The implementation of exceptions
ensures that the use of a handler is more efficient than an explicit case
analysis in the case that failure is rare compared to success.
For the n-queens problem it is not clear which solution is preferable. In
general, if efficiency is paramount, we tend to prefer exceptions if failure is
a rarity, and to prefer options if failure is relatively common. If, on the other
hand, static checking is paramount, then it is advantageous to use options
since the type checker will enforce the requirement that the programmer
check for failure, rather than having the error arise only at run time.
29.4 Solution Using Continuations
We turn now to a third solution based on continuation-passing. The idea is
quite simple: an exception handler is essentially a function that we invoke
when we reach a blind alley. Ordinarily we achieve this invocation by
raising an exception and relying on the caller to catch it and pass control to the
handler. But we can, if we wish, pass the handler around as an additional
argument, the failure continuation of the computation. Here's how it's done
in the case of the n-queens problem:
(* addqueen bd yields bd', where bd' is a complete safe
   placement extending bd, if one exists, and otherwise
   yields the value of fc ()
*)
fun addqueen (bd, fc) =
    let
      fun try j =
          if j > Board.size bd then
            fc ()
          else if Board.safe (bd, j) then
            addqueen
              (Board.place (bd, j),
               fn () => try (j+1))
          else
            try (j+1)
    in
      if Board.complete bd then
        SOME bd
      else
        try 1
    end

fun queens n =
    addqueen (Board.new n, fn () => NONE)
Here again the differences are small, but significant. The initial
continuation simply yields NONE, reflecting the ultimate failure to find a safe
placement. On a recursive call we pass to addqueen a continuation that resumes
search at the next row of the current column. Should we exceed the number
of rows on the board, we invoke the failure continuation of the most recent
call to addqueen.
The solution based on continuations is very close to the solution based
on exceptions, both in form and in terms of efficiency. Which is
preferable? Here again there is no easy answer; we can only offer general advice.
First off, as we've seen in the case of regular expression matching, failure
continuations are more powerful than exceptions; there is no obvious way
to replace the use of a failure continuation with a use of exceptions in the
matcher. However, in the case that exceptions would suffice, it is
generally preferable to use them since one may then avoid passing an explicit
failure continuation. More significantly, the compiler ensures that an
uncaught exception aborts the program gracefully, whereas failure to invoke
a continuation is not in itself a runtime fault. Using the right tool for the
right job makes life easier.
Chapter 30
Higher-Order Functions
Higher-order functions — those that take functions as arguments or return
functions as results — are powerful tools for building programs. An
interesting application of higher-order functions is to implement infinite
sequences of values as (total) functions from the natural numbers (non-negative
integers) to the type of values of the sequence. We will develop a small
package of operations for creating and manipulating sequences, all of which
are higher-order functions since they take sequences (functions!) as
arguments and/or return them as results. A natural way to define many
sequences is by recursion, or self-reference. Since sequences are functions,
we may use recursive function definitions to define such sequences.
Alternatively, we may think of such a sequence as arising from a "loopback" or
"feedback" construct. We will explore both approaches.

Sequences may be used to simulate digital circuits by thinking of a
"wire" as a sequence of bits developing over time. The ith value of the
sequence corresponds to the signal on the wire at time i. For simplicity we
will assume a perfect waveform: the signal is always either high or low
(or is undefined); we will not attempt to model electronic effects such as
attenuation or noise. Combinational logic elements (such as and-gates or
inverters) are operations on wires: they take in one or more wires as
input and yield one or more wires as results. Digital logic elements (such as
flip-flops) are obtained from combinational logic elements by feedback, or
recursion — a flip-flop is a recursively-defined wire!
30.1 Inﬁnite Sequences
Let us begin by developing a sequence package. Here is a suitable signature
deﬁning the type of sequences:
signature SEQUENCE =
sig
  type 'a seq = int -> 'a
  (* constant sequence *)
  val constantly : 'a -> 'a seq
  (* alternating values *)
  val alternately : 'a * 'a -> 'a seq
  (* insert at front *)
  val insert : 'a * 'a seq -> 'a seq
  val map : ('a -> 'b) -> 'a seq -> 'b seq
  val zip : 'a seq * 'b seq -> ('a * 'b) seq
  val unzip : ('a * 'b) seq -> 'a seq * 'b seq
  (* fair merge *)
  val merge : 'a seq * 'a seq -> 'a seq
  val stretch : int -> 'a seq -> 'a seq
  val shrink : int -> 'a seq -> 'a seq
  val take : int -> 'a seq -> 'a list
  val drop : int -> 'a seq -> 'a seq
  val shift : 'a seq -> 'a seq
  val loopback : ('a seq -> 'a seq) -> 'a seq
end
Observe that we expose the representation of sequences as functions. This
is done to simplify the definition of recursive sequences as recursive
functions. Alternatively we could have hidden the representation type, at the
expense of making it a bit more awkward to define recursive sequences.
In the absence of this exposure of representation, recursive sequences may
only be built using the loopback operation, which constructs a recursive
sequence by "looping back" the output of a sequence transformer to its
input. Most of the other operations of the signature are adaptations of
familiar operations on lists. Two exceptions to this rule are the functions
stretch and shrink that dilate and contract the sequence by a given
time parameter — if a sequence is expanded by k, its value at i is the value
of the original sequence at i/k, and dually for shrinking.
Here’s an implementation of sequences as functions.
structure Sequence :> SEQUENCE =
struct
  type 'a seq = int -> 'a
  fun constantly c n = c
  fun alternately (c,d) n =
      if n mod 2 = 0 then c else d
  fun insert (x, s) 0 = x
    | insert (x, s) n = s (n-1)
  fun map f s = f o s
  fun zip (s1, s2) n = (s1 n, s2 n)
  fun unzip (s : ('a * 'b) seq) =
      (map #1 s, map #2 s)
  fun merge (s1, s2) n =
      (if n mod 2 = 0 then s1 else s2) (n div 2)
  fun stretch k s n = s (n div k)
  fun shrink k s n = s (n * k)
  fun drop k s n = s (n+k)
  fun shift s = drop 1 s
  fun take 0 s = nil
    | take n s = s 0 :: take (n-1) (shift s)
  fun loopback loop n = loop (loopback loop) n
end
Most of this implementation is entirely straightforward, given the ease with
which we may manipulate higherorder functions in ML. The only tricky
function is loopback, which must arrange that the output of the function
loop is “looped back” to its input. This is achieved by a simple recursive
deﬁnition of a sequence whose value at n is the value at n of the sequence
resulting from applying the loop to this very sequence.
The sensibility of this definition of loopback relies on two separate
ideas. First, notice that we may not simplify the definition of loopback as
follows:

(* bad definition *)
fun loopback loop = loop (loopback loop)

The reason is that any application of loopback will immediately loop
forever! In contrast, the original definition is arranged so that application of
loopback immediately returns a function. This may be made more
apparent by writing it in the following form, which is entirely equivalent to the
definition given above:

fun loopback loop =
    fn n => loop (loopback loop) n

This format makes it clear that loopback immediately returns a function
when applied to a loop functional.

Second, for an application of loopback to a loop to make sense, it must
be the case that the loop returns a sequence without "touching" the
argument sequence (i.e., without applying the argument to a natural number).
Otherwise accessing the sequence resulting from an application of
loopback would immediately loop forever. Some examples will help to
illustrate the point.
First, let's build a few sequences without using the loopback function,
just to get familiar with using sequences:

val evens : int seq = fn n => 2*n
val odds : int seq = fn n => 2*n+1
val nats : int seq = merge (evens, odds)

fun fibs n =
    (insert
      (1, insert
        (1, map (op +)
          (zip (drop 1 fibs, fibs))))) (n)
We may “inspect” the sequence using take and drop, as follows:
take 10 nats (* [0,1,2,3,4,5,6,7,8,9] *)
take 5 (drop 5 nats) (* [5,6,7,8,9] *)
take 5 fibs (* [1,1,2,3,5] *)
Now let's consider an alternative definition of fibs that uses the loopback
operation:

fun fibs_loop s =
    insert (1, insert (1,
      map (op +) (zip (drop 1 s, s))))

val fibs = loopback fibs_loop;
The definition of fibs_loop is exactly like the original definition of fibs,
except that the reference to fibs itself is replaced by a reference to the
argument s. Notice that the application of fibs_loop to an argument s
does not inspect the argument s!

One way to understand loopback is that it solves a system of equations
for an unknown sequence. In the case of the second definition of fibs, we
are solving the following system of equations for f:

f(0) = 1
f(1) = 1
f(n + 2) = f(n + 1) + f(n)

These equations are derived by inspecting the definitions of insert, map,
zip, and drop given earlier. It is obvious that the solution is the Fibonacci
sequence; this is precisely the sequence obtained by applying loopback to
fibs_loop.
Here's an example of a loop that, when looped back, yields an
undefined sequence — any attempt to access it results in an infinite loop:

fun bad_loop s n = s n + 1
val bad = loopback bad_loop
val _ = bad 0 (* infinite loop! *)

In this example we are, in effect, trying to solve the equation s(n) = s(n) + 1 for
s, which has no solution (except the totally undefined sequence). The
problem is that the "next" element of the output is defined in terms of the next
element itself, rather than in terms of "previous" elements. Consequently,
no solution exists.
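By contrast, here is a sketch of a well-defined use of loopback (assuming the Sequence structure above is open; nats_loop and nats' are illustrative names, not part of the chapter's code). It solves the system n(0) = 0, n(i+1) = n(i) + 1, whose solution is the sequence of natural numbers; note that the loop body never applies its argument s to an index:

```sml
(* each element is defined only in terms of *previous* elements,
   so a solution exists *)
fun nats_loop s = insert (0, map (fn x => x + 1) s)
val nats' = loopback nats_loop
val [0,1,2,3,4] = take 5 nats'
```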
30.2 Circuit Simulation
With these ideas in mind, we may apply the sequence package to build an
implementation of digital circuits. Let's start with wires, which are
represented as sequences of levels:

datatype level = High | Low | Undef
type wire = level seq
type pair = (level * level) seq

val Zero : wire = constantly Low
val One : wire = constantly High
(* clock pulse with given duration of each pulse *)
fun clock (freq:int):wire =
stretch freq (alternately (Low, High))
We include the “undeﬁned” level to account for propagation delays and
settling times in circuit elements.
Combinational logic elements (gates) may be defined as follows. We
introduce an explicit unit-time propagation delay for each gate — the output
is undefined initially, and is then determined as a function of its inputs. As
we build up layers of circuit elements, it takes longer and longer
(proportional to the length of the longest path through the circuit) for the output to
settle, exactly as in "real life".
(* apply two functions in parallel *)
infixr **;
fun (f ** g) (x, y) = (f x, g y)

(* hardware logical and *)
fun logical_and (Low, _) = Low
  | logical_and (_, Low) = Low
  | logical_and (High, High) = High
  | logical_and _ = Undef

fun logical_not Undef = Undef
  | logical_not High = Low
  | logical_not Low = High

fun logical_nop l = l

(* a nor b = not a and not b *)
val logical_nor =
    logical_and o (logical_not ** logical_not)

type unary_gate = wire -> wire
type binary_gate = pair -> wire

(* logic gate with unit propagation delay *)
fun gate f w 0 = Undef
  | gate f w i = f (w (i-1))

val delay : unary_gate = gate logical_nop (* unit delay *)
val inverter : unary_gate = gate logical_not
val nor_gate : binary_gate = gate logical_nor
It is a good exercise to build a one-bit adder out of these elements, then to
string them together to form an n-bit ripple-carry adder. Be sure to present
the inputs to the adder with sufficient pulse widths to ensure that the circuit
has time to settle!
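As a starting point for that exercise, here is a hedged sketch of how further gates can be derived from nor_gate alone (the names not_gate, or_gate, and and_gate are assumptions, not part of the chapter's code), using the identities not a = a nor a, a or b = not (a nor b), and a and b = (not a) nor (not b). Each derived gate accumulates the unit delays of its components:

```sml
(* not a = a nor a; one unit delay *)
fun not_gate (w : wire) : wire = nor_gate (zip (w, w))

(* a or b = not (a nor b); two unit delays through the circuit *)
fun or_gate (a : wire, b : wire) : wire =
    not_gate (nor_gate (zip (a, b)))

(* a and b = (not a) nor (not b); two unit delays *)
fun and_gate (a : wire, b : wire) : wire =
    nor_gate (zip (not_gate a, not_gate b))
```

An xor gate built the same way, together with and_gate, yields the sum and carry outputs of a half adder.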
Combining these basic logic elements with recursive definitions allows
us to define digital logic elements such as the RS flip-flop. The
propagation delay inherent in our definition of a gate is fundamental to ensuring
that the behavior of the flip-flop is well-defined! This is consistent with
"real life" — flip-flops depend on the existence of a hardware propagation
delay for their proper functioning. Note also that presentation of "illegal"
inputs (such as setting both the R and the S leads high) results in metastable
behavior of the flip-flop, here as in real life. Finally, observe that the flip-flop
exhibits a momentary "glitch" in its output before settling, exactly as in the
hardware case. (All of these behaviors may be observed by using take and
drop to inspect the values on the circuit.)
fun RS_ff (S : wire, R : wire) =
    let
      fun X n = nor_gate (zip (S, Y)) (n)
      and Y n = nor_gate (zip (X, R)) (n)
    in
      Y
    end

(* generate a pulse of b's n wide, followed by w *)
fun pulse b 0 w i = w i
  | pulse b n w 0 = b
  | pulse b n w i = pulse b (n-1) w (i-1)

val S = pulse Low 2 (pulse High 2 Zero);
val R = pulse Low 6 (pulse High 2 Zero);
val Q = RS_ff (S, R);
val _ = take 20 Q;
val X = RS_ff (S, S); (* unstable! *)
val _ = take 20 X;
It is a good exercise to derive a system of equations governing the RS
flip-flop from the definition we've given here, using the implementation of the
sequence operations given above. Observe that the delays arising from the
combinational logic elements ensure that a solution exists by ensuring that
the "next" element of the output refers only to the "previous" elements, and
not the "current" element.
Finally, we consider a variant implementation of an RS flip-flop using
the loopback operation:
fun loopback2 (f : wire * wire -> wire * wire) =
    unzip (loopback (zip o f o unzip))

fun RS_ff' (S : wire, R : wire) =
    let
      fun RS_loop (X, Y) =
          (nor_gate (zip (S, Y)),
           nor_gate (zip (X, R)))
    in
      loopback2 RS_loop
    end

Here we must define a "binary loopback" function to implement the
flip-flop. This is achieved by reducing binary loopback to unary loopback by
composing with zip and unzip.
Chapter 31
Memoization
In this chapter we will discuss memoization, a programming technique for
cacheing the results of previous computations so that they can be quickly
retrieved without repeated effort. Memoization is fundamental to the
implementation of lazy data structures, either "by hand" or using the
provisions of the SML/NJ compiler.
31.1 Cacheing Results
We begin with a discussion of memoization to increase the efficiency of
computing a recursively-defined function whose pattern of recursion
involves a substantial amount of redundant computation. The problem is to
compute the number of ways to parenthesize an expression consisting of a
sequence of n multiplications as a function of n. For example, the
expression 2 ∗ 3 ∗ 4 ∗ 5 can be parenthesized in 5 ways:

((2 ∗ 3) ∗ 4) ∗ 5, (2 ∗ (3 ∗ 4)) ∗ 5, (2 ∗ 3) ∗ (4 ∗ 5), 2 ∗ (3 ∗ (4 ∗ 5)), 2 ∗ ((3 ∗ 4) ∗ 5).

A simple recurrence expresses the number of ways of parenthesizing a
sequence of n multiplications:

fun sum f 0 = 0
  | sum f n = (f n) + sum f (n-1)

fun p 1 = 1
  | p n = sum (fn k => (p k) * (p (n-k))) (n-1)

where sum f n computes the sum of the values f(k) for 1 ≤ k ≤ n. This
program is extremely inefficient because of the redundancy in the
pattern of the recursive calls.
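For small inputs the recurrence can be checked by hand. The following top-level bindings are a sketch, assuming the definitions of sum and p above are in scope; a 4-factor product such as 2 ∗ 3 ∗ 4 ∗ 5 corresponds to p 4:

```sml
val 1 = p 1
val 1 = p 2  (* (2*3) only *)
val 2 = p 3  (* (2*3)*4 and 2*(3*4) *)
val 5 = p 4  (* the five parenthesizations listed above *)
```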
What can we do about this problem? One solution is to be clever and
solve the recurrence. As it happens this recurrence has a closed-form
solution (the Catalan numbers). But in many cases there is no known closed
form, and something else must be done to cut down the overhead. In this
case a simple cacheing technique proves effective. The idea is to maintain
a table of values of the function that is filled in whenever the function is
applied. If the function is called on an argument n, the table is consulted
to see whether the value has already been computed; if so, it is simply
returned. If not, we compute the value and store it in the table for future
use. This ensures that no redundant computations are performed. We will
maintain the table as an array so that its entries can be accessed in constant
time. The penalty is that the array has a fixed size, so we can only record
the values of the function at some predetermined set of arguments. Once
we exceed the bounds of the table, we must compute the value the "hard
way". An alternative is to use a dictionary (e.g., a balanced binary search
tree) which has no a priori size limitation, but which takes logarithmic time
to perform a lookup. For simplicity we'll use a solution based on arrays.
Here's the code to implement a memoized version of the parenthesization
function:
local
  val limit = 100
  val memopad = Array.array (limit, NONE)
in
  fun p' 1 = 1
    | p' n = sum (fn k => (p k) * (p (n-k))) (n-1)
  and p n =
      if n < limit then
        case Array.sub (memopad, n)
          of SOME r => r
           | NONE =>
             let
               val r = p' n
             in
               Array.update (memopad, n, SOME r);
               r
             end
      else
        p' n
end
The main idea is to modify the original definition so that the recursive
calls consult and update the memo pad. The "exported" version of the
function is the one that refers to the memo pad. Notice that the definitions of p
and p' are mutually recursive!
31.2 Laziness
Lazy evaluation is a combination of delayed evaluation and memoization.
Delayed evaluation is implemented using thunks, functions of type unit
-> 'a. To delay the evaluation of an expression exp of type 'a, simply
write fn () => exp. This is a value of type unit -> 'a; the expression
exp is effectively "frozen" until the function is applied. To "thaw" the
expression, simply apply the thunk to the null tuple, (). Here's a simple
example:

val thunk =
    fn () => print "hello" (* nothing printed *)
val _ = thunk () (* prints hello *)
While this example is especially simple-minded, remarkable effects can
be achieved by combining delayed evaluation with memoization. To do so,
we will consider the following signature of suspensions:

signature SUSP =
sig
  type 'a susp
  val force : 'a susp -> 'a
  val delay : (unit -> 'a) -> 'a susp
end

The function delay takes a suspended computation (in the form of a
thunk) and yields a suspension. Its job is to "memoize" the suspension
so that the suspended computation is evaluated at most once — once the
result is computed, the value is stored in a reference cell so that subsequent
forces are fast. The implementation is slick. Here's the code to do it:
structure Susp :> SUSP =
struct
  type 'a susp = unit -> 'a
  fun force t = t ()
  fun delay (t : 'a susp) =
      let
        exception Impossible
        val memo : 'a susp ref =
            ref (fn () => raise Impossible)
        fun t' () =
            let val r = t ()
            in memo := (fn () => r); r end
      in
        memo := t';
        fn () => (!memo) ()
      end
end
It's worth discussing this code in detail because it is rather tricky.
Suspensions are just thunks; force simply applies the suspension to the null
tuple to force its evaluation. What about delay? When applied, delay
allocates a reference cell containing a thunk that, if forced, raises an internal
exception. This can never happen for reasons that will become apparent in
a moment; it is merely a placeholder with which we initialize the reference
cell. We then define another thunk t' that, when forced, does three things:
1. It forces the thunk t to obtain its value r.
2. It replaces the contents of the memo pad with the constant function
that immediately returns r.
3. It returns r as result.
We then assign t' to the memo pad (hence obliterating the placeholder),
and return a thunk dt that, when forced, forces the contents of the memo
pad. However, the contents of the memo pad change as a result of forcing
it, so that subsequent forces exhibit different behavior. Specifically, the
first time dt is forced, it forces the thunk t', which then forces t to obtain
its value r, “zaps” the memo pad, and returns r. The second time dt is
forced, it forces the contents of the memo pad, as before, but this time it
contains the constant function that immediately returns r. Altogether we
have ensured that t is forced at most once by using a form of “self-modifying”
code.
Here's an example to illustrate the effect of delaying a thunk:

val t = Susp.delay (fn () => print "hello")
val _ = Susp.force t (* prints hello *)
val _ = Susp.force t (* silent *)

Notice that hello is printed once, not twice! The reason is that the
suspended computation is evaluated at most once, so the message is printed
at most once on the screen.
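The payoff is more apparent when the suspended computation is expensive. The following sketch is our own illustration, not from the text; the deliberately slow function slow_fib stands in for any costly computation. Assuming the Susp structure above, the expensive work is performed only on the first force:

```sml
(* A deliberately slow function; any expensive computation would do. *)
fun slow_fib 0 = 0
  | slow_fib 1 = 1
  | slow_fib n = slow_fib (n - 1) + slow_fib (n - 2)

val s = Susp.delay (fn () => slow_fib 30)
val x = Susp.force s  (* computes slow_fib 30, slowly *)
val y = Susp.force s  (* returns the memoized result immediately *)
```

The second force merely dereferences the memo pad and applies the constant function installed by the first.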
31.3 Lazy Data Types in SML/NJ
The lazy datatype declaration

datatype lazy 'a stream = Cons of 'a * 'a stream

expands into the following pair of type declarations:

datatype 'a stream! = Cons of 'a * 'a stream
withtype 'a stream = 'a stream! Susp.susp
The first defines the type of stream values, the result of forcing a stream
computation; the second defines the type of stream computations, which are
suspensions yielding stream values. Thus streams are represented by
suspended (unevaluated, memoized) computations of stream values, which
are formed by applying the constructor Cons to a value and another stream.
The value constructor Cons, when used to build a stream, automatically
suspends computation. This is achieved by regarding Cons e as
shorthand for Cons (Susp.delay (fn () => e)). When used in a pattern,
the value constructor Cons induces a use of force. For example, the
binding

val Cons (h, t) = e

becomes

val Cons (h, t) = Susp.force e

which forces the right-hand side before performing pattern matching.
A similar transformation applies to non-lazy function definitions: the
argument is forced before pattern matching commences. (See Chapter 15
for a description of the SML/NJ lazy data type mechanism.) Thus the
“eager” tail function
fun stl (Cons (_, t)) = t

expands into

fun stl! (Cons (_, t)) = t
and stl s = stl! (Susp.force s)

which forces the argument as soon as it is applied.
On the other hand, lazy function definitions defer pattern matching until
the result is forced. Thus the lazy tail function

fun lstl (Cons (_, t)) = t

expands into

fun lstl! (Cons (_, t)) = t
and lstl s = Susp.delay (fn () => lstl! (Susp.force s))

which yields a suspension that, when forced, performs the pattern match.
Finally, the recursive stream definition

val rec lazy ones = Cons (1, ones)

expands into the following recursive definition:

val rec ones = Susp.delay (fn () => Cons (1, ones))

Unfortunately this is not quite legal in SML, since the right-hand side
involves an application of a function to another function. This can be
remedied either by extending SML to admit such definitions, or by extending
the Susp package to include an operation for building recursive suspensions
such as this one. Since it is an interesting exercise in itself, we'll explore the
latter alternative.
31.4 Recursive Suspensions
We seek to add a function to the Susp package with signature

val loopback : ('a susp -> 'a susp) -> 'a susp

that, when applied to a function f mapping suspensions to suspensions,
yields a suspension s whose behavior is the same as f(s), the application of
f to the resulting suspension. In the above example the function in question
is

fun ones_loop s = Susp.delay (fn () => Cons (1, s))

We use loopback to define ones as follows:

val ones = Susp.loopback ones_loop

The idea is that ones should be equivalent to Susp.delay (fn () =>
Cons (1, ones)), as in the original definition; this is indeed the result
of evaluating Susp.loopback ones_loop, assuming Susp.loopback
is implemented properly.
How is loopback implemented? We use a technique known as
backpatching. Here's the code:

fun loopback f =
  let
    exception Circular
    val r = ref (fn () => raise Circular)
    val t = fn () => (!r)()
  in
    r := f t; t
  end
First we allocate a reference cell which is initialized to a placeholder
that, if forced, raises the exception Circular. Then we deﬁne a thunk
that, when forced, forces the contents of this reference cell. This will be the
return value of loopback. But before returning, we assign to the reference
cell the result of applying the given function to the result thunk. This “ties
the knot” to ensure that the output is “looped back” to the input. Observe
that if the loop function touches its input suspension before yielding an
output suspension, the exception Circular will be raised.
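To see that last point concretely, here is a hypothetical ill-founded loop function of our own devising; it touches its input suspension before yielding an output suspension. Assuming the loopback implementation above, applying loopback to it raises the (locally declared) Circular exception:

```sml
(* Ill-founded: forces the input before producing an output suspension. *)
fun bad_loop (s : int Susp.susp) =
  let
    val n = Susp.force s  (* touches the input eagerly *)
  in
    Susp.delay (fn () => n + 1)
  end

val t = Susp.loopback bad_loop  (* raises Circular *)
```

The force of s inside bad_loop dereferences the reference cell while it still holds the placeholder, so the knot is never tied.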
Chapter 32
Data Abstraction
An abstract data type (ADT) is a type equipped with a set of operations for
manipulating values of that type. An ADT is implemented by providing a
representation type for the values of the ADT and an implementation of the
operations defined on values of the representation type. What makes an
ADT abstract is that the representation type is hidden from clients of the
ADT. Consequently, the only operations that may be performed on a value
of the ADT are the given ones. This ensures that the representation may be
changed without affecting the behavior of the client: since the representation
is hidden from it, the client cannot depend on it. This also facilitates
the implementation of efficient data structures by imposing a condition,
called a representation invariant, on the representation that is preserved by
the operations of the type. Each operation that takes a value of the ADT as
argument may assume that the representation invariant holds. In compensation,
each operation that yields a value of the ADT as result must guarantee
that the representation invariant holds of it. If the operations of the ADT
preserve the representation invariant, then it must truly be invariant: no
other code in the system could possibly disrupt it. Put another way, any
violation of the representation invariant may be localized to the implementation
of one of the operations. This significantly reduces the time required
to find an error in a program.
32.1 Dictionaries
To make these ideas concrete we will consider the abstract data type of
dictionaries. A dictionary is a mapping from keys to values. For simplicity
we take keys to be strings, but it is possible to define a dictionary for
any ordered type; the values associated with keys are completely arbitrary.
Viewed as an ADT, a dictionary is a type 'a dict of dictionaries mapping
strings to values of type 'a, together with empty, insert, and lookup
operations that create a new dictionary, insert a value with a given key, and
retrieve the value associated with a key (if any). In short, a dictionary is an
implementation of the following signature:
signature DICT =
sig
  type key = string
  type 'a entry = key * 'a
  type 'a dict
  exception Lookup of key
  val empty : 'a dict
  val insert : 'a dict * 'a entry -> 'a dict
  val lookup : 'a dict * key -> 'a
end

Notice that the type 'a dict is not specified in the signature, whereas the
types key and 'a entry are defined to be string and string * 'a,
respectively.
32.2 Binary Search Trees
A simple implementation of a dictionary is a binary search tree. A binary
search tree is a binary tree with values of an ordered type at the nodes,
arranged in such a way that for every node in the tree, the value at that node
is greater than the value at any node in the left child of that node, and
smaller than the value at any node in the right child. It follows immediately
that no two nodes in a binary search tree are labelled with the same
value. The binary search tree property is an example of a representation
invariant on an underlying data structure. The underlying structure is a
binary tree with values at the nodes; the representation invariant isolates a
set of structures satisfying some additional, more stringent, conditions.
We may use a binary search tree to implement a dictionary as follows:
structure BinarySearchTree :> DICT =
struct
  type key = string
  type 'a entry = key * 'a

  (* Rep invariant: 'a tree is a binary search tree *)
  datatype 'a tree =
      Empty
    | Node of 'a tree * 'a entry * 'a tree
  type 'a dict = 'a tree

  exception Lookup of key

  val empty = Empty

  fun insert (Empty, entry) =
      Node (Empty, entry, Empty)
    | insert (n as Node (l, e as (k, _), r), e' as (k', _)) =
      (case String.compare (k', k)
         of LESS => Node (insert (l, e'), e, r)
          | GREATER => Node (l, e, insert (r, e'))
          | EQUAL => n)

  fun lookup (Empty, k) = raise (Lookup k)
    | lookup (Node (l, (k, v), r), k') =
      (case String.compare (k', k)
         of EQUAL => v
          | LESS => lookup (l, k')
          | GREATER => lookup (r, k'))
end
Notice that empty is deﬁned to be a valid binary search tree, that insert
yields a binary search tree if its argument is one, and that lookup relies
on its argument being a binary search tree (if not, it might fail to ﬁnd a
key that in fact occurs in the tree!). The structure BinarySearchTree is
sealed with the signature DICT to ensure that the representation type is
held abstract.
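A brief interaction, of our own devising, illustrates use of the sealed structure; the keys and values here are arbitrary. Note that the client goes only through the DICT operations, since the tree constructors are hidden:

```sml
val d =
  BinarySearchTree.insert
    (BinarySearchTree.insert (BinarySearchTree.empty, ("two", 2)),
     ("one", 1))

val two = BinarySearchTree.lookup (d, "two")  (* yields 2 *)

(* An absent key raises Lookup, which the client may handle. *)
val missing =
  BinarySearchTree.lookup (d, "three")
    handle BinarySearchTree.Lookup _ => 0
```

Attempting to pattern-match on the representation, say with BinarySearchTree.Node, is a type error outside the structure: the sealing with DICT keeps the constructors out of scope.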
32.3 Balanced Binary Search Trees
The difficulty with binary search trees is that they may become unbalanced.
In particular, if we insert keys in ascending order, the representation is
essentially just a list! The left child of each node is empty; the right child is
the rest of the dictionary. Consequently, it takes O(n) time in the worst case
to perform a lookup on a dictionary containing n elements. Such a tree is
said to be unbalanced because the children of a node have widely varying
heights. Were the children of every node of roughly equal height, a lookup
would instead take O(lg n) time, a considerable improvement.
Can we do better? Many approaches have been suggested. One that
we will consider here is an instance of what is called a self-adjusting tree,
called a red-black tree (the reason for the name will be apparent shortly). The
general idea of a self-adjusting tree is that operations on the tree may cause
a reorganization of its structure to ensure that some invariant is maintained.
In our case we will arrange things so that the tree is self-balancing, meaning
that the children of any node have roughly the same height. As we just
remarked, this ensures that lookup is efficient.
How is this achieved? By imposing a clever representation invariant on
the binary search tree, called the red-black tree condition. A red-black tree
is a binary search tree in which every node is colored either red or black
(with the empty tree being regarded as black) and such that the following
properties hold:
1. The children of a red node are black.
2. For any node in the tree, the number of black nodes on any two paths
from that node to a leaf is the same. This number is called the black
height of the node.
These two conditions ensure that a red-black tree is a balanced binary
search tree. Here's why. First, observe that a red-black tree of black height
h has at least 2^h − 1 nodes. We may prove this by induction on the structure
of the red-black tree. The empty tree has black height 1 (since we consider
it to be black), which is at least 2^1 − 1, as required. Suppose we have a red
node. The black height of both children must be h, hence each has at least
2^h − 1 nodes, yielding a total of 2(2^h − 1) + 1 = 2^(h+1) − 1 nodes, which is
at least 2^h − 1. If, on the other hand, we have a black node, then the black
height of both children is h−1, and each has at least 2^(h−1) − 1 nodes, for a
total of 2(2^(h−1) − 1) + 1 = 2^h − 1 nodes. Now, observe that a red-black tree
of height h with n nodes has black height at least h/2 (since the children of
a red node are black, at least half the nodes on any path are black), and hence
has at least 2^(h/2) − 1 nodes. Consequently, lg(n+1) ≥ h/2, so h ≤ 2 lg(n+1).
In other words, its height is logarithmic in the number of nodes, which
implies that the tree is height balanced.
To ensure logarithmic behavior, all we have to do is to maintain the
red-black invariant. The empty tree is a red-black tree, so the only question
is how to perform an insert operation. First, we insert the entry as usual
for a binary search tree, with the fresh node starting out colored red. In
doing so we do not disturb the black height condition, but we might
introduce a red-red violation, a situation in which a red node has a red child. We
then remove the red-red violation by propagating it upwards towards the
root by a constant-time transformation on the tree (one of several
possibilities, which we'll discuss shortly). These transformations either eliminate
the red-red violation outright, or, in logarithmic time, push the violation
to the root, where it is neatly resolved by recoloring the root black (which
preserves the black height invariant!).
The violation is propagated upwards by one of four rotations. We will
maintain the invariant that there is at most one red-red violation in the tree.
The insertion may or may not create such a violation, and each propagation
step will preserve this invariant. It follows that the parent of a red-red
violation must be black. Consequently, the situation must look like this. This
diagram represents four distinct situations, according to whether the
uppermost red node is a left or right child of the black node, and whether the
red child of the red node is itself a left or right child. In each case the red-red
violation is propagated upwards by transforming it to look like this.
Notice that by making the uppermost node red we may be introducing a
red-red violation further up the tree (since the black node's parent might
have been red), and that we are preserving the black height invariant, since
the great-grandchildren of the black node in the original situation will
appear as children of the two black nodes in the reorganized situation. Notice
as well that the binary search tree conditions are also preserved by this
transformation. As a limiting case, if the red-red violation is propagated to
the root of the entire tree, we recolor the root black, which preserves the
black height condition, and we are done rebalancing the tree.
Let's look in detail at two of the four cases of removing a red-red
violation, those in which the uppermost red node is the left child of the black
node; the other two cases are handled symmetrically. If the situation looks
like this, we reorganize the tree to look like this. You should check that the
black height and binary search tree invariants are preserved by this
transformation. Similarly, if the situation looks like this, then we reorganize the
tree to look like this (precisely as before). Once again, the black height and
binary search tree invariants are preserved by this transformation, and the
red-red violation is pushed further up the tree.
Here is the ML code to implement dictionaries using a red-black tree.
Notice that the tree rotations are neatly expressed using pattern matching.
structure RedBlackTree :> DICT =
struct
  type key = string
  type 'a entry = string * 'a

  (* Inv: binary search tree + red-black conditions *)
  datatype 'a dict =
      Empty
    | Red of 'a entry * 'a dict * 'a dict
    | Black of 'a entry * 'a dict * 'a dict

  val empty = Empty

  exception Lookup of key

  fun lookup (dict, key) =
    let
      fun lk (Empty) = raise (Lookup key)
        | lk (Red tree) = lk' tree
        | lk (Black tree) = lk' tree
      and lk' ((key1, datum1), left, right) =
        (case String.compare (key, key1)
           of EQUAL => datum1
            | LESS => lk left
            | GREATER => lk right)
    in
      lk dict
    end

  fun restoreLeft
        (Black (z, Red (y, Red (x, d1, d2), d3), d4)) =
      Red (y, Black (x, d1, d2), Black (z, d3, d4))
    | restoreLeft
        (Black (z, Red (x, d1, Red (y, d2, d3)), d4)) =
      Red (y, Black (x, d1, d2), Black (z, d3, d4))
    | restoreLeft dict = dict

  fun restoreRight
        (Black (x, d1, Red (y, d2, Red (z, d3, d4)))) =
      Red (y, Black (x, d1, d2), Black (z, d3, d4))
    | restoreRight
        (Black (x, d1, Red (z, Red (y, d2, d3), d4))) =
      Red (y, Black (x, d1, d2), Black (z, d3, d4))
    | restoreRight dict = dict

  fun insert (dict, entry as (key, datum)) =
    let
      (* ins : 'a dict -> 'a dict  inserts entry *)
      (* ins (Red _) may have a red-red violation at the root *)
      (* ins (Black _) and ins Empty are red-black trees *)
      (* ins preserves the black height *)
      fun ins (Empty) = Red (entry, Empty, Empty)
        | ins (Red (entry1 as (key1, datum1), left, right)) =
          (case String.compare (key, key1)
             of EQUAL => Red (entry, left, right)
              | LESS => Red (entry1, ins left, right)
              | GREATER => Red (entry1, left, ins right))
        | ins (Black (entry1 as (key1, datum1), left, right)) =
          (case String.compare (key, key1)
             of EQUAL => Black (entry, left, right)
              | LESS => restoreLeft (Black (entry1, ins left, right))
              | GREATER => restoreRight (Black (entry1, left, ins right)))
    in
      case ins dict
        of Red (t as (_, Red _, _)) => Black t (* recolor *)
         | Red (t as (_, _, Red _)) => Black t (* recolor *)
         | dict => dict
    end
end
It is worthwhile to contemplate the role played by the red-black invariant
in ensuring the correctness of the implementation and the time complexity
of the operations.
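Because both structures are sealed with the same signature, a client written against DICT cannot tell them apart and may be re-linked against either implementation without change. A small illustration of our own, assuming a lookup of type 'a dict * key -> 'a as specified in DICT:

```sml
(* Ascending-order insertions, the worst case for a plain BST,
   are rebalanced transparently by the red-black rotations. *)
val d =
  foldl (fn (e, d) => RedBlackTree.insert (d, e))
        RedBlackTree.empty
        [("a", 1), ("b", 2), ("c", 3), ("d", 4)]

val c = RedBlackTree.lookup (d, "c")  (* yields 3 *)
```

The client cannot observe the colors or the shape of the tree; balance is purely an internal matter of the representation invariant.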
32.4 Abstraction vs. Run-Time Checking
You might wonder whether we could equally well use run-time checks to
enforce representation invariants. The idea would be to introduce a “debug
flag” that, when set, causes the operations of the dictionary to check that the
representation invariant holds of their arguments and results. In the case of
a binary search tree this is surely possible, but at considerable expense, since
the time required to check the binary search tree invariant is proportional
to the size of the binary search tree itself, whereas an insert (for example)
can be performed in logarithmic time. But wouldn't we turn off the debug
flag before shipping the production copy of the code? Yes, indeed, but
then the benefits of checking are lost for the code we care about most! (To
paraphrase Tony Hoare, it's as if we used our life jackets while learning
to sail on a pond, then tossed them away when we set out to sea.) By
using the type system to enforce abstraction, we can confine the possible
violations of the representation invariant to the dictionary package itself;
moreover, we need not turn off the check for production code, because
static enforcement incurs no run-time penalty in the first place.
A more subtle point is that it may not always be possible to enforce
data abstraction at run time. Efficiency considerations aside, you might
think that we can always replace static localization of representation errors
by dynamic checks for violations of them. But this is false! One reason is
that the representation invariant might not be computable. As an example,
consider an abstract type of total functions on the integers, those that are
guaranteed to terminate when called, without performing any I/O or having
any other computational effect. It is a theorem of recursion theory that
no run-time check can be defined that ensures that a given integer-valued
function is total. Yet we can define an abstract type of total functions that,
while not admitting every possible total function on the integers as values,
provides a useful set of such functions as elements of a structure. By using
these specified operations to create a total function, we are in effect encoding
a proof of totality in the code itself.
Here's a sketch of such a package:

signature TIF =
sig
  type tif
  val apply : tif -> (int -> int)
  val id : tif
  val compose : tif * tif -> tif
  val double : tif
  ...
end

structure Tif :> TIF =
struct
  type tif = int -> int
  fun apply t n = t n
  fun id x = x
  fun compose (f, g) = f o g
  fun double x = 2 * x
  ...
end
Should the application of some value of type Tif.tif fail to terminate,
we know where to look for the error. No run-time check can assure
us that an arbitrary integer function is in fact total.
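Assuming the sketched operations above, a client builds total functions only by composing pieces already known to be total; it cannot smuggle an arbitrary int -> int function into the type. A hypothetical use:

```sml
(* Composition of total functions is total, so quadruple
   carries an implicit proof of totality. *)
val quadruple = Tif.compose (Tif.double, Tif.double)

val n = Tif.apply quadruple 3  (* yields 12 *)
```

The coercion out of the abstract type (via apply) is harmless; it is the absence of a coercion *into* the type that encodes the totality guarantee.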
Another reason why a run-time check to enforce data abstraction is
impossible is that it may not be possible to tell from looking at a given value
whether or not it is a legitimate value of the abstract type. Here's an
example. In many operating systems processes are “named” by integer-valued
process identifiers. Using the process identifier we may send messages to
the process, cause it to terminate, or perform any number of other operations
on it. The thing to notice here is that any integer at all is a possible
process identifier; we cannot tell by looking at the integer whether it is
indeed valid. No run-time check on the value will reveal whether a given
integer is a “real” or “bogus” process identifier. The only way to know is
to consider the “history” of how that integer came into being, and what
operations were performed on it. Using the abstraction mechanisms just
described, we can enforce the requirement that a value of type pid, whose
underlying representation is int, is indeed a process identifier. You are
invited to imagine how this might be achieved in ML.
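One hedged sketch of how this might look, entirely of our own invention: seal the representation so that values of type pid arise only from the operation that actually creates a process. The spawn and kill bodies below are stand-ins for whatever the system provides; the point is only that clients cannot forge a pid from a raw integer.

```sml
signature PROCESS =
sig
  type pid  (* abstract; int representation hidden *)
  val spawn : (unit -> unit) -> pid
  val kill  : pid -> unit
end

structure Process :> PROCESS =
struct
  type pid = int
  val next = ref 0
  (* stand-ins for the real system operations *)
  fun spawn _ = (next := !next + 1; !next)
  fun kill _ = ()
end
```

Every value of type Process.pid has, by construction, the right “history”: it was returned by spawn. The invariant is enforced statically, with no run-time validity check on the integer itself.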
Chapter 33
Representation Independence and ADT Correctness
This chapter is concerned with proving correctness of ADT implementations
by exhibiting a simulation relation between a reference implementation
(taken, or known, to be correct) and a candidate implementation
(whose correctness is to be established). The methodology generalizes Hoare's
notion of abstraction functions to an arbitrary relation, and relies on Reynolds'
notion of parametricity to conclude that related implementations engender
the same observable behavior in all clients.
Chapter 34
Modularity and Reuse
1. Naming conventions.
2. Exploiting structural subtyping (the type t convention).
3. Impedance-matching functors.
Chapter 35
Dynamic Typing and Dynamic Dispatch
This chapter is concerned with dynamic typing in a statically typed
language. It is commonly thought that there is an “opposition” between
statically typed languages (such as Standard ML) and dynamically typed
languages (such as Scheme). In fact, dynamically typed languages are a
special case of statically typed languages! We will demonstrate this by
exhibiting a faithful representation of Scheme inside of ML.
Chapter 36
Concurrency
In this chapter we consider some fundamental techniques for concurrent
programming using CML.
Part V
Appendices
The Standard ML Basis Library
The Standard ML Basis Library is a collection of modules providing a basic
collection of abstract types that are shared by all implementations of
Standard ML. All of the primitive types of Standard ML are defined in
structures in the Standard Basis. It also defines a variety of other commonly
used abstract types.
Most implementations of Standard ML include module libraries
implementing a wide variety of services. These libraries are usually not portable
across implementations, particularly not those that are concerned with the
internals of the compiler or its interaction with the host computer system.
Please refer to the documentation of your compiler for information on its
libraries.
Compilation Management
All program development environments provide tools to support building
systems out of collections of separately developed modules. These tools
usually provide services such as:
1. Source code management, such as version and revision control.
2. Separate compilation and linking, to support simultaneous development
and to reduce build times.
3. Libraries of reusable modules, with consistent conventions for identifying
modules and their components.
4. Release management, for building and disseminating systems for general
use.
Different languages, and different vendors, support these activities in
different ways. Some rely on generic tools, such as the familiar Unix tools;
others provide proprietary tools, commonly known as IDEs (integrated
development environments).
Most implementations of Standard ML rely on a combination of generic
program development tools and tools specific to that implementation of the
language. Rather than attempt to summarize all of the known implementations,
we will instead consider the SML/NJ Compilation Manager (CM) as
a representative program development framework for ML. Other compilers
provide similar tools; please consult your compiler's documentation for
details of how to use them.
36.1 Overview of CM
36.2 Building Systems with CM
Sample Programs
A number of example programs illustrating the concepts discussed in the
preceding chapters are available on the world-wide web at the following
URL:
http://www.cs.cmu.edu/~rwh/smlbook/examples
Bibliography
[1] Emden R. Gansner and John H. Reppy, editors. The Standard ML Basis
Library. Cambridge University Press, 2000.
[2] Robin Milner, Mads Tofte, Robert Harper, and David MacQueen. The
Deﬁnition of Standard ML (Revised). MIT Press, 1997.
.1 Inﬁnite Sequences . .2 Specifying the Matcher . . . . . . . . . . . . . . . . . .2 Lists . .
.2 Building Systems with CM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 301 36. .x 35 Dynamic Typing and Dynamic Dispatch 36 Concurrency CONTENTS 293 295 V Appendices The Standard ML Basis Library 297 299 Compilation Management 301 36. . . . . 301 Sample Programs 303 W ORKING D RAFT D ECEMBER 9. 2002 . . . . .1 Overview of CM . . . . .
Part I

Overview
Standard ML is a type-safe programming language that embodies many innovative ideas in programming language design. It is a statically typed language, with an extensible type system. It supports polymorphic type inference, which all but eliminates the burden of specifying types of variables and greatly facilitates code reuse. It provides efficient automatic storage management for data structures and functions. It encourages functional (effect-free) programming where appropriate, but allows imperative (effectful) programming where necessary. It facilitates programming with recursive and symbolic data structures by supporting the definition of functions by pattern matching. It features an extensible exception mechanism for handling error conditions and effecting non-local transfers of control. It provides a richly expressive and flexible module system for structuring large programs, including mechanisms for enforcing abstraction, imposing hierarchical structure, and building generic modules. It is portable across platforms and implementations because it has a precise definition. It provides a portable standard basis library that defines a rich collection of commonly-used types and routines.

Many implementations go beyond the standard to provide experimental language features, extensive libraries of commonly-used routines, and useful program development tools. Details can be found with the documentation for your compiler, but here's some of what you may expect. Most implementations provide an interactive system supporting on-line program development, including tools for compiling, linking, and analyzing the behavior of programs. A few implementations are "batch compilers" that rely on the ambient operating system to manage the construction of large programs from compiled parts. Nearly every compiler generates native machine code, even when used interactively, but some also generate code for a portable abstract machine. Most implementations support separate compilation and provide tools for managing large systems and shared libraries. Some implementations provide tools for tracing and stepping programs; many provide tools for time and space profiling. Most implementations supplement the standard basis library with a rich collection of handy components such as dictionaries, hash tables, or interfaces to the ambient operating system. Some implementations support language extensions such as support for concurrent programming (using message-passing or locking), richer forms of modularity constructs, and support for "lazy" data structures.
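Several of these features can be seen at once in a tiny program. The fragment below is a sketch added for illustration (it does not appear in the original text): it defines a polymorphic function by pattern matching and relies on type inference to supply all the types.

```sml
(* A polymorphic map over lists, defined by pattern matching.
   ML infers the type ('a -> 'b) -> 'a list -> 'b list;
   no type annotations are required anywhere. *)
fun map' f nil = nil
  | map' f (x::xs) = f x :: map' f xs

val doubled = map' (fn x => 2 * x) [1, 2, 3]   (* [2, 4, 6] *)
```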
Chapter 1

Programming in Standard ML

1.1 A Regular Expression Package

To develop a feel for the language and how it is used, let us consider the implementation of a package for matching strings against regular expressions. We'll structure the implementation into two modules, an implementation of regular expressions themselves and an implementation of a matching algorithm for them. These two modules are concisely described by the following signatures.

    signature REGEXP = sig
      datatype regexp =
        Zero | One | Char of char |
        Plus of regexp * regexp |
        Times of regexp * regexp |
        Star of regexp
      exception SyntaxError of string
      val parse : string -> regexp
      val format : regexp -> string
    end

    signature MATCHER = sig
      structure RegExp : REGEXP
      val match : RegExp.regexp -> string -> bool
    end

The signature REGEXP describes a module that implements regular expressions. It consists of a description of the abstract syntax of regular expressions, together with operations for parsing and unparsing them. The signature MATCHER describes a module that implements a matcher for a given notion of regular expression. It contains a function match that, when given a regular expression, returns a function that determines whether or not a given string matches that expression. Obviously the matcher is dependent on the implementation of regular expressions. This is expressed by a structure specification that specifies a hierarchical dependence of an implementation of a matcher on an implementation of regular expressions: any implementation of the MATCHER signature must include an implementation of regular expressions as a constituent module. This ensures that the matcher is self-contained, and does not rely on implicit conventions for determining which implementation of regular expressions it employs.

The definition of the abstract syntax of regular expressions in the signature REGEXP takes the form of a datatype declaration that is reminiscent of a context-free grammar, but which abstracts from matters of lexical presentation (such as precedences of operators, parenthesization, conventions for naming the operators, etc.). The abstract syntax consists of six clauses, corresponding to the regular expressions 0, 1, a, r1+r2, r1 r2, and r*. (Some authors use ∅ for 0 and ε for 1.) The functions parse and format specify the parser and unparser for regular expressions. The parser takes a string as argument and yields a regular expression; if the string is ill-formed, the parser raises the exception SyntaxError with an associated string describing the source of the error. The unparser takes a regular expression and yields a string that parses to that regular expression. In general there are many strings that parse to the same regular expressions; the unparser generally tries to choose one that is easiest to read.

The implementation of the matching package consists of two structures, one implementing the signature REGEXP, the other implementing MATCHER. An implementation of a signature is called a structure. Thus the overall package is implemented by the following two structure declarations:

    structure RegExp :> REGEXP = ...
    structure Matcher :> MATCHER = ...

The structure identifier RegExp is bound to an implementation of the REGEXP signature. Conformance with the signature is ensured by the ascription of the signature REGEXP to the binding of RegExp using the ":>" notation. Not only does this check that the implementation (which has been elided here) conforms with the requirements of the signature REGEXP, but it also ensures that subsequent code cannot rely on any properties of the implementation other than those explicitly specified in the signature. This helps to ensure that modules are kept separate, facilitating subsequent changes to the code.

Similarly, the structure identifier Matcher is bound to a structure that implements the matching algorithm in terms of the preceding implementation RegExp of REGEXP. The ascribed signature specifies that the structure Matcher must conform to the requirements of the signature MATCHER. Notice that the structure Matcher refers to the structure RegExp in its implementation.

Once these structure declarations have been processed, we may use the package by referring to its components using paths, or long identifiers. The function Matcher.match has type Matcher.RegExp.regexp -> string -> bool, reflecting the fact that it takes a regular expression as implemented within the package itself and yields a matching function on strings. We may build a regular expression by applying the parser, Matcher.RegExp.parse, to a string representing a regular expression, then passing the resulting value of type Matcher.RegExp.regexp to Matcher.match. (It might seem that one can apply Matcher.match to the output of RegExp.parse, since Matcher.RegExp.parse is just RegExp.parse. However, this relationship is not stated in the interface, so there is a pro forma distinction between the two. See Chapter 22 for more information on the subtle issue of sharing.)

Here's an example of the matcher in action:

    val regexp = Matcher.RegExp.parse "(a+b)*"
    val matches = Matcher.match regexp
    val ex1 = matches "aabba"   (* yields true *)
    val ex2 = matches "abac"    (* yields false *)

The use of long identifiers can get tedious at times. There are two typical methods for alleviating the burden. One is to introduce a synonym for a long package name. Here's an example:
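The force of opaque ascription can be seen in a small example of the same shape. The COUNTER signature and Counter structure below are hypothetical, invented here for illustration: because the type counter is held abstract by ":>", clients can use only the operations listed in the signature.

```sml
signature COUNTER = sig
  type counter
  val new  : counter
  val tick : counter -> counter
  val read : counter -> int
end

structure Counter :> COUNTER = struct
  type counter = int        (* hidden from clients by ":>" *)
  val new = 0
  fun tick c = c + 1
  fun read c = c
end

val c = Counter.tick (Counter.tick Counter.new)
val n = Counter.read c      (* 2 *)
(* val bad = Counter.new + 1   -- rejected: counter is abstract,
   so clients cannot depend on its representation as int *)
```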
    structure M = Matcher
    structure R = M.RegExp
    val regexp = R.parse "((a + %).(b + %))*"
    val matches = M.match regexp
    val ex1 = matches "aabba"
    val ex2 = matches "abac"

Another is to "open" the structure, incorporating its bindings into the current environment:

    open Matcher Matcher.RegExp
    val regexp = parse "(a+b)*"
    val matches = match regexp
    val ex1 = matches "aabba"
    val ex2 = matches "abac"

It is advisable to be sparing in the use of open because it is often hard to anticipate exactly which bindings are incorporated into the environment by its use.

Now let's look at the internals of the structures RegExp and Matcher. Here's a bird's eye view of RegExp:

    structure RegExp :> REGEXP = struct
      datatype regexp =
        Zero | One | Char of char |
        Plus of regexp * regexp |
        Times of regexp * regexp |
        Star of regexp
      ...
      fun tokenize s = ...
      ...
      fun parse s =
          let
            val (r, s') = parse_rexp (tokenize (String.explode s))
          in
            case s' of
              nil => r
            | _ => raise SyntaxError "Bad input."
          end
            handle LexicalError => raise SyntaxError "Bad input."
      ...
      fun format r = String.implode (format_exp r)
    end

The elision indicates that portions of the code have been omitted so that we can get a high-level view of the structure of the implementation. The structure RegExp is bracketed by the keywords struct and end. The type regexp is implemented precisely as specified by the datatype declaration in the signature REGEXP. The parser and formatter work with character lists, rather than strings, because it is easier to process lists incrementally than it is to process strings. The parser is implemented by a function that, when given a string, "explodes" it into a list of characters, transforms the character list into a list of "tokens" (abstract symbols representing lexical atoms), and finally parses the resulting list of tokens to obtain its abstract syntax. If there is remaining input after the parse, or if the tokenizer encountered an illegal token, an appropriate syntax error is signalled. The formatter is implemented by a function that, when given a piece of abstract syntax, formats it into a list of characters that are then "imploded" to form a string.

Let's start with the tokenizer, which we present here in toto:

    datatype token =
      AtSign | Percent | Literal of char | PlusSign |
      TimesSign | Asterisk | LParen | RParen

    exception LexicalError

    fun tokenize nil = nil
      | tokenize (#"+" :: cs) = PlusSign :: tokenize cs
      | tokenize (#"." :: cs) = TimesSign :: tokenize cs
      | tokenize (#"*" :: cs) = Asterisk :: tokenize cs
      | tokenize (#"(" :: cs) = LParen :: tokenize cs
      | tokenize (#")" :: cs) = RParen :: tokenize cs
      | tokenize (#"@" :: cs) = AtSign :: tokenize cs
      | tokenize (#"%" :: cs) = Percent :: tokenize cs
      | tokenize (#"\\" :: c :: cs) = Literal c :: tokenize cs
      | tokenize (#"\\" :: nil) = raise LexicalError
      | tokenize (#" " :: cs) = tokenize cs
      | tokenize (c :: cs) = Literal c :: tokenize cs

The symbol "@" stands for the empty regular expression and the symbol "%" stands for the regular expression accepting only the null string. Concatenation is indicated by ".", alternation by "+", and iteration by "*". We use a datatype declaration to introduce the type of tokens corresponding to the symbols of the input language. The function tokenize has type char list -> token list; it transforms a list of characters into a list of tokens. It is defined by a series of clauses that dispatch on the first character of the list of characters given as input, proceeding accordingly. The correspondence between characters and tokens is relatively straightforward, the only non-trivial case being to admit the use of a backslash to "quote" a reserved symbol as a character of input. (More sophisticated languages have more sophisticated token structures; for example, words (consecutive sequences of letters) are often regarded as a single token of input.) Notice that it is quite natural to "look ahead" in the input stream in the case of the backslash character, using a pattern that dispatches on the first two characters (if there are such) of the input. (It is a lexical error to have a backslash at the end of the input.)

Let's turn to the parser. It is a simple recursive-descent parser implementing the precedence conventions for regular expressions given earlier. These conventions may be formally specified by the following grammar, which not only enforces precedence conventions, but also allows for the use of parenthesization to override them.

    rexp ::= rtrm | rtrm + rexp
    rtrm ::= rfac | rfac . rtrm
    rfac ::= ratm | ratm *
    ratm ::= @ | % | a | ( rexp )

The implementation of the parser follows this grammar quite closely. It consists of four mutually recursive functions, parse_rexp, parse_rtrm, parse_rfac, and parse_ratm. These implement what is known as a recursive descent parser that dispatches on the head of the token list to determine how to proceed.
    fun parse_rexp ts =
        let
          val (r, ts') = parse_rtrm ts
        in
          case ts' of
            (PlusSign :: ts'') =>
              let
                val (r', ts''') = parse_rexp ts''
              in
                (Plus (r, r'), ts''')
              end
          | _ => (r, ts')
        end

    and parse_rtrm ts = ...

    and parse_rfac ts =
        let
          val (r, ts') = parse_ratm ts
        in
          case ts' of
            (Asterisk :: ts'') => (Star r, ts'')
          | _ => (r, ts')
        end

    and parse_ratm nil = raise SyntaxError ("No atom")
      | parse_ratm (AtSign :: ts) = (Zero, ts)
      | parse_ratm (Percent :: ts) = (One, ts)
      | parse_ratm ((Literal c) :: ts) = (Char c, ts)
      | parse_ratm (LParen :: ts) =
          let
            val (r, ts') = parse_rexp ts
          in
            case ts' of
              (RParen :: ts'') => (r, ts'')
            | _ => raise SyntaxError "No close paren"
          end

Notice that it is quite simple to implement "lookahead" using patterns that inspect the token list for specified tokens. This parser makes no attempt to recover from syntax errors, but one could imagine doing so, using standard techniques. This completes the implementation of regular expressions. Now for the matcher.
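As a sketch of the expected behavior (assuming the RegExp structure above is in scope; this trace is not from the original text), the string "(a+b)*" first tokenizes and then parses as follows:

```sml
(* "(a+b)*" tokenizes to
     [LParen, Literal #"a", PlusSign, Literal #"b", RParen, Asterisk]
   and then parses to the abstract syntax tree below. *)
val r = RegExp.parse "(a+b)*"
(* r = Star (Plus (Char #"a", Char #"b")) *)
val s = RegExp.format r   (* some string that parses back to r *)
```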
The matcher proceeds by a recursive analysis of the regular expression. The main difficulty is to account for concatenation: to match a string against the regular expression r1 r2 we must match some initial segment against r1, then match the corresponding final segment against r2. This suggests that we generalize the matcher to one that checks whether some initial segment of a string matches a given regular expression, then passes the remaining final segment to a continuation, a function that determines what to do after the initial segment has been successfully matched. This facilitates implementation of concatenation, but how do we ensure that at the outermost call the entire string has been matched? We achieve this by using an initial continuation that checks whether the final segment is empty. Here's the code, written as a structure implementing the signature MATCHER:

    structure Matcher :> MATCHER = struct
      structure RegExp = RegExp
      open RegExp
      fun match_is Zero cs k = false
        | match_is One cs k = k cs
        | match_is (Char c) nil k = false
        | match_is (Char c) (d::cs) k = (c=d) andalso (k cs)
        | match_is (Plus (r1, r2)) cs k =
            match_is r1 cs k orelse match_is r2 cs k
        | match_is (Times (r1, r2)) cs k =
            match_is r1 cs (fn cs' => match_is r2 cs' k)
        | match_is (Star r) cs k =
            k cs orelse
            match_is r cs (fn cs' => match_is (Star r) cs' k)
      fun match regexp string =
          match_is regexp (String.explode string)
            (fn nil => true | _ => false)
    end

Note that we incorporate the structure RegExp into the structure Matcher, in accordance with the requirements of the signature. The function match explodes the string into a list of characters (to facilitate sequential processing of the input), then calls match_is with an initial continuation that ensures that the remaining input is empty to determine the result.

The type of match_is is RegExp.regexp -> char list -> (char list -> bool) -> bool. That is, match_is takes in succession a regular expression, a list of characters, and a continuation of type char list -> bool; it yields as result a value of type bool. This is a fairly complicated type, but notice that nowhere did we have to write this type in the code! The type inference mechanism of ML took care of determining what that type must be based on an analysis of the code itself.

Since match_is takes a function as argument, it is said to be a higher-order function. The simplicity of the matcher is due in large measure to the ease with which we can manipulate functions in ML. Notice that we create a new, unnamed function to pass as a continuation in the case of concatenation: it is the function that matches the second part of the regular expression to the characters remaining after matching an initial segment against the first part. We use a similar technique to implement matching against an iterated regular expression: we attempt to match the null string, but if this fails, we match against the regular expression being iterated followed by the iteration once again. This neatly captures the "zero or more times" interpretation of iteration of a regular expression.

Important: the code given above contains a subtle error. Can you find it? If not, see chapter 27 for further discussion!

This completes our brief overview of Standard ML. The remainder of these notes are structured into three parts. The first part is a detailed introduction to the core language, the language in which we write programs in ML. The second part is concerned with the module language, the means by which we structure large programs in ML. The third is about programming techniques, methods for building reliable and robust programs. I hope you enjoy it!

1.2 Sample Code

Here is the complete code for this chapter:

    signature REGEXP = sig
      datatype regexp =
        Zero | One | Char of char |
        Plus of regexp * regexp |
        Times of regexp * regexp |
        Star of regexp
      exception SyntaxError of string
      val parse : string -> regexp
      val format : regexp -> string
    end

    signature MATCHER = sig
      structure RegExp : REGEXP
      val match : RegExp.regexp -> string -> bool
    end

    structure RegExp :> REGEXP = struct

      datatype token =
        AtSign | Percent | Literal of char | PlusSign |
        TimesSign | Asterisk | LParen | RParen

      exception LexicalError

      fun tokenize nil = nil
        | tokenize (#"+" :: cs) = PlusSign :: tokenize cs
        | tokenize (#"." :: cs) = TimesSign :: tokenize cs
        | tokenize (#"*" :: cs) = Asterisk :: tokenize cs
        | tokenize (#"(" :: cs) = LParen :: tokenize cs
        | tokenize (#")" :: cs) = RParen :: tokenize cs
        | tokenize (#"@" :: cs) = AtSign :: tokenize cs
        | tokenize (#"%" :: cs) = Percent :: tokenize cs
        | tokenize (#"\\" :: c :: cs) = Literal c :: tokenize cs
        | tokenize (#"\\" :: nil) = raise LexicalError
        | tokenize (#" " :: cs) = tokenize cs
        | tokenize (c :: cs) = Literal c :: tokenize cs

      datatype regexp =
        Zero | One | Char of char |
        Plus of regexp * regexp |
        Times of regexp * regexp |
        Star of regexp

      exception SyntaxError of string

      fun parse_exp ts =
          let
            val (r, ts') = parse_term ts
          in
            case ts' of
              (PlusSign :: ts'') =>
                let
                  val (r', ts''') = parse_exp ts''
                in
                  (Plus (r, r'), ts''')
                end
            | _ => (r, ts')
          end

      and parse_term ts =
          let
            val (r, ts') = parse_factor ts
          in
            case ts' of
              (TimesSign :: ts'') =>
                let
                  val (r', ts''') = parse_term ts''
                in
                  (Times (r, r'), ts''')
                end
            | _ => (r, ts')
          end

      and parse_factor ts =
          let
            val (r, ts') = parse_atom ts
          in
            case ts' of
              (Asterisk :: ts'') => (Star r, ts'')
            | _ => (r, ts')
          end

      and parse_atom nil = raise SyntaxError ("Factor expected\n")
        | parse_atom (AtSign :: ts) = (Zero, ts)
        | parse_atom (Percent :: ts) = (One, ts)
        | parse_atom ((Literal c) :: ts) = (Char c, ts)
        | parse_atom (LParen :: ts) =
            let
              val (r, ts') = parse_exp ts
            in
              case ts' of
                nil => raise SyntaxError ("Right-parenthesis expected\n")
              | (RParen :: ts'') => (r, ts'')
              | _ => raise SyntaxError ("Right-parenthesis expected\n")
            end

      fun parse s =
          let
            val (r, ts) = parse_exp (tokenize (String.explode s))
          in
            case ts of
              nil => r
            | _ => raise SyntaxError "Unexpected input.\n"
          end
            handle LexicalError => raise SyntaxError "Illegal input.\n"

      fun format_exp Zero = [#"@"]
        | format_exp One = [#"%"]
        | format_exp (Char c) = [c]
        | format_exp (Plus (r1, r2)) =
            let
              val s1 = format_exp r1
              val s2 = format_exp r2
            in
              [#"("] @ s1 @ [#"+"] @ s2 @ [#")"]
            end
        | format_exp (Times (r1, r2)) =
            let
              val s1 = format_exp r1
              val s2 = format_exp r2
            in
              s1 @ [#"."] @ s2
            end
        | format_exp (Star r) =
            let
              val s = format_exp r
            in
              [#"("] @ s @ [#")"] @ [#"*"]
            end

      fun format r = String.implode (format_exp r)

    end

    structure Matcher :> MATCHER = struct

      structure RegExp = RegExp

      open RegExp

      fun match_is Zero _ k = false
        | match_is One cs k = k cs
        | match_is (Char c) nil _ = false
        | match_is (Char c) (c'::cs) k = (c=c') andalso (k cs)
        | match_is (Plus (r1, r2)) cs k =
            (match_is r1 cs k) orelse (match_is r2 cs k)
        | match_is (Times (r1, r2)) cs k =
            match_is r1 cs (fn cs' => match_is r2 cs' k)
        | match_is (r as Star r1) cs k =
            (k cs) orelse
            match_is r1 cs (fn cs' => match_is r cs' k)

      fun match regexp string =
          match_is regexp (String.explode string)
            (fn nil => true | _ => false)

    end
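The continuation-passing technique that match_is uses can be seen in isolation in a smaller example (a sketch for illustration, not part of the package): the continuation k says what to do with the result of the rest of the computation.

```sml
(* Summing a list in continuation-passing style. *)
fun sum_k nil k = k 0
  | sum_k (x::xs) k = sum_k xs (fn s => k (x + s))

(* The initial continuation simply returns the final result,
   much as match supplies (fn nil => true | _ => false). *)
val ten = sum_k [1, 2, 3, 4] (fn s => s)   (* 10 *)
```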
2002 .Part II The Core Language W ORKING D RAFT D ECEMBER 9.
.
The second part. The ﬁrst part. the means of deﬁning and using functions. 2002 . the core language. mechanisms for deﬁnining new types. comprises the mechanisms for structuring programs into separate units and is described in Part III.21 All Standard ML is divided into two parts. W ORKING D RAFT D ECEMBER 9. Here we introduce the core language. the module language. and so on. comprises the fundamental programming constructs of the language — the primitive types and operations.
2002 .22 W ORKING D RAFT D ECEMBER 9.
such as C or Java. using the rules of arithmetic. Programs are thought of as specifying a sequence of commands that modify the memory of the computer. Conditional commands branch according to the value of some expression. The evaluation model of computation used in ML is based on the same idea. we admit a richer variety of values and a richer variety of primitive operations on them. determine the value of the polynomial. modiﬁes the memory. but often (in C. and then.Chapter 2 Types. but rather than restrict ourselves to arithmetic operations on the reals. rather than execution of commands. Values. are based on an imperative model of computation. Computation in ML is of a somewhat different nature.1 Evaluation and Execution Most familiar programming languages. Many languages maintain a distinction between expressions and commands. The progress of the computation is controlled by evaluation of expressions. W ORKING D RAFT D ECEMBER 9. 2002 . that are executed for their value. such as boolean tests or arithmetic operations. We proceed by “plugging in” the given value for x . for example) expressions may also modify the memory. The emphasis in ML is on computation by evaluation of expressions. so that even expression evaluation has an effect. Each step of execution examines the current contents of memory. The idea of computation is as a generalization of your experience from high school algebra in which you are given a polynomial in a variable x and are asked to calculate its value at a given value of x . and continues with the next instruction. The individual commands are executed for their effect on the memory (which we may take to include both the internal memory and registers and the external input/output devices). performs a simple computation. and Effects 2.
and modifying memory. those that do are said to be welltyped. should it yield a value at all. What is more. making the code easier to understand and maintain. 2002 . • It may or may not have a value. Values. and for an expression to have type real is to say that its value (if any) is a ﬂoating point number. never their absence. it is quite remarkable how seldom memory modiﬁcation is required. Doing so allows us to support imperative programming without destroying the mathematical elegance of the evaluation model for programs that don’t use memory. should it have one. As we will see. Each expression has three important characteristics: • It may or may not have a type. Nevertheless. Every expression is required to have at least one type. for an expression to have type int is to say that its value (should it have one) is a number. the evaluation model subsumes the imperative model as a special case. Execution of commands for the effect on memory can be seen as a special case of evaluation of expressions by introducing primitive operations for allocating. it is much easier to develop mathematical techniques for reasoning about the behavior of programs. 2. and Effects The evaluation model of computation enjoys several advantages over the more familiar imperative model. accessing. Moreover. they provide important tools for documenting the reasoning that went into the formulation of a program. we instead take expression evaluation as the primary notion. W ORKING D RAFT D ECEMBER 9. the language provides for storagebased computation for those few times that it is actually necessary.24 Types. • It may or may not engender an effect. Rather than forcing all aspects of computation into the framework of memory modiﬁcation. The type of an expression is a description of the value it yields. For example. 
These techniques are important tools for helping us to ensure that programs work properly without having to resort to tedious testing and debugging, which can only show the presence of errors, never their absence. Because of its close relationship to mathematics, ML is an excellent vehicle for studying these techniques.

2.2 The ML Computation Model

Computation in ML consists of evaluation of expressions. Each expression may or may not have a type, may or may not have a value, and its evaluation may or may not engender an effect. These characteristics are all that you need to know to compute with an expression.

In general we can think of the type of an expression as a "prediction" of the form of the value that it has, if indeed it has one. The type checker determines whether or not an expression is well-typed, rejecting with an error those that are not. Those without a type are said to be ill-typed; they are considered ineligible for evaluation. A well-typed expression is evaluated to determine its value, if indeed it has one. An expression can fail to have a value because its evaluation never terminates or because it raises an exception, either because of a run-time fault such as division by zero or because some programmer-defined condition is signalled during its evaluation. If an expression has a value, the form of that value is predicted by its type. For example, if an expression evaluates to a value v and its type is bool, then v must be either true or false; it cannot be, say, 17 or 3.14. The soundness of the type system ensures the accuracy of the predictions made by the type checker.

Evaluation of an expression might also engender an effect. Effects include such phenomena as raising an exception, modifying memory, performing input or output, or sending a message on the network. It is important to note that the type of an expression says nothing about its possible effects! An expression of type int might well display a message on the screen before returning an integer value. This possibility is not accounted for in the type of the expression, which classifies only its value. For this reason effects are sometimes called side effects, to stress that they happen "off to the side" during evaluation, and are not part of the value of the expression. For the time being we will assume that all expressions are effect-free, or pure. We will ignore effects until chapter 13.

2.2.1 Type Checking

What is a type? What types are there? Generally speaking, a type is defined by specifying three things:

• a name for the type,
• the values of the type, and
• the operations that may be performed on values of the type.

Often the division of labor into values and operations is not completely clear-cut, but it nevertheless serves as a very useful guideline for describing types.

Let's consider first the type of integers. Its name is int. The values of type int are the numerals 0, 1, ~1, 2, ~2, and so on. (Note that negative numbers are written with a prefix tilde, rather than a minus sign!) Operations on integers include addition, +, subtraction, -, multiplication, *, quotient, div, and remainder, mod.

Arithmetic expressions are formed in the familiar manner, for example, 3*2+6, governed by the usual rules of precedence. Parentheses may be used to override the precedence conventions, just as in ordinary mathematical practice. Thus the preceding expression may be equivalently written as (3*2)+6, but we may also write 3*(2+6) to override the default precedences.

The formation of expressions is governed by a set of typing rules that define the types of expressions in terms of the types of their constituent expressions (if any). The typing rules are generally quite intuitive since they are consistent with our experience in mathematics and in other programming languages. In their full generality the rules are somewhat involved, but we will sneak up on them by first considering only a small fragment of the language, building up additional machinery as we go along.

Here are some simple arithmetic expressions, written using infix notation for the operations (meaning that the operator comes between the arguments, as is customary in mathematics):

  3
  3 + 4
  4 div 3
  4 mod 3

Each of these expressions is well-formed; in fact, they each have type int. This is indicated by a typing assertion of the form exp : typ, which states that the expression exp has the type typ. A typing assertion is said to be valid iff the expression exp does indeed have the type typ. The following are all valid typing assertions:

  3 : int
  3 + 4 : int
  4 div 3 : int
  4 mod 3 : int

Why are these typing assertions valid? In the case of the value 3, it is an axiom that integer numerals have integer type. What about the expression 3+4? The addition operation takes two arguments, each of which must have type int. Since both arguments in fact have type int, it follows that the entire expression is of type int. For more complex cases we reason analogously, for example, deducing that (3+4) div (2+3) : int by observing that (3+4) : int and (2+3) : int.
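To make these typing rules concrete, the assertions above can be entered into any Standard ML system as bindings; a small sketch (the variable names are arbitrary):

```sml
val a : int = 3 + 4       (* 3 + 4 : int; a is bound to 7 *)
val b : int = 4 div 3     (* integer quotient: 1 *)
val c : int = 4 mod 3     (* integer remainder: 1 *)
val d : int = ~1          (* negative numbers use the prefix tilde *)
val e : int = 3 * 2 + 6       (* parsed as (3*2)+6, yielding 12 *)
val f : int = 3 * (2 + 6)     (* parentheses override precedence: 24 *)
val g : int = (3 + 4) div (2 + 3)   (* both arguments have type int, so g : int *)
```

If any right-hand side were ill-typed, the system would reject the binding rather than evaluate it.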
The reasoning involved in demonstrating the validity of a typing assertion may be summarized by a typing derivation consisting of a nested sequence of typing assertions, each justified either by an axiom, or by a typing rule for an operation. For example, the validity of the typing assertion (3+7) div 5 : int is justified by the following derivation:

1. (3+7) : int, because
   (a) 3 : int because it is an axiom
   (b) 7 : int because it is an axiom
   (c) the arguments of + must be integers, and the result of + is an integer
2. 5 : int because it is an axiom
3. the arguments of div must be integers, and the result is an integer

The outermost steps justify the assertion (3+7) div 5 : int by demonstrating that the arguments each have type int; the inner steps justify that (3+7) : int.

2.2.2 Evaluation

Evaluation of expressions is defined by a set of evaluation rules that determine how the value of a compound expression is determined as a function of the values of its constituent expressions (if any). Since the value of an operator is determined by the values of its arguments, ML is sometimes said to be a call-by-value language. While this may seem like the only sensible way to define evaluation, we will see in chapter 15 that this need not be the case: some operations may yield a value without evaluating their arguments. Such operations are sometimes said to be lazy, to distinguish them from eager operations that require their arguments to be evaluated before the operation is performed.

An evaluation assertion has the form exp ⇓ val. This assertion states that the expression exp has value val. It should be intuitively clear that the following evaluation assertions are valid:

  5 ⇓ 5
  2+3 ⇓ 5
  (2+3) div (1+4) ⇓ 1

An evaluation assertion may be justified by an evaluation derivation, which is similar to a typing derivation. For example, we may justify the assertion (3+7) div 5 ⇓ 2 by the derivation:

1. (3+7) ⇓ 10, because
   (a) 3 ⇓ 3 because it is an axiom
   (b) 7 ⇓ 7 because it is an axiom
   (c) adding 3 to 7 yields 10
2. 5 ⇓ 5 because it is an axiom
3. dividing 10 by 5 yields 2

Note that it is an axiom that a numeral evaluates to itself; numerals are fully-evaluated expressions, or values. The rules of arithmetic are used to determine that adding 3 and 7 yields 10.

Not every expression has a value. A simple example is the expression 5 div 0, which is undefined. If you attempt to evaluate this expression it will incur a run-time error, reflecting the erroneous attempt to find the number n that, when multiplied by 0, yields 5. The error is expressed in ML by raising an exception; we will have more to say about exceptions in chapter 12. Another reason that a well-typed expression might not have a value is that the attempt to evaluate it leads to an infinite loop. We don't yet have the machinery in place to define such expressions, but we will soon see that it is possible for an expression to diverge, or run forever, when evaluated.

2.3 Types

What types are there besides the integers? Here are a few useful base types of ML:

• Type name: real
  – Values: 3.14, 2.17, 0.1E6, ...
  – Operations: +, -, *, /, =, <, ...
• Type name: char
  – Values: #"a", #"b", ...
  – Operations: ord, chr, =, <, ...
• Type name: string
  – Values: "abc", "1234", ...
  – Operations: ^, size, =, <, ...
• Type name: bool
  – Values: true, false
  – Operations: if exp then exp1 else exp2

There are many, many (in fact, infinitely many!) others, but these are enough to get us started. (See V for a complete description of the primitive types of ML, including the ones given above.)

The conditional expression if exp then exp1 else exp2 is used to discriminate on a Boolean value. It has type typ if exp has type bool and both exp1 and exp2 have type typ. Notice that both "arms" of the conditional must have the same type! It is evaluated by first evaluating exp, then proceeding to evaluate either exp1 or exp2, according to whether the value of exp is true or false.

Notice that some of the arithmetic operations for real numbers are written the same way as the corresponding operations on integers. For example, we may write 3.1+2.7 to perform a floating point addition of two floating point numbers. This is called overloading; the addition operation is said to be overloaded at the types int and real. In an expression involving addition the type checker tries to resolve which form of addition (fixed point or floating point) you mean. If the arguments are int's, then fixed point addition is used; if the arguments are real's, then floating point addition is used; otherwise an error is reported. (If the type of the arguments cannot be determined, the type defaults to int.)

Note that ML does not perform any implicit conversions between types! For example, the expression 3+3.14 is rejected as ill-formed! If you intend floating point addition, you must write instead real(3)+3.14, which converts the integer 3 to its floating point representation before performing the addition. If, on the other hand, you intend integer addition, you must write 3+round(3.14), which converts 3.14 to an integer by rounding before performing the addition.

Finally, note that floating point division is a different operation from integer quotient! Thus we write 3.1/2.7 for the result of dividing 3.1 by 2.7, which results in a floating point number. We reserve the operator div for integers, and use / for floating point division.
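A sketch exercising the conversions just described, together with a few of the base-type operations listed above (real and round are the standard basis conversion functions; results in comments are approximate where noted):

```sml
val q : int = (3 + 7) div 5     (* 2, as in the evaluation derivation earlier *)
val x : real = 3.1 + 2.7        (* floating point addition; approximately 5.8 *)
(* val bad = 3 + 3.14 *)        (* rejected: no implicit int-to-real conversion *)
val y : real = real 3 + 3.14    (* convert 3 to 3.0 first; approximately 6.14 *)
val z : int = 3 + round 3.14    (* round 3.14 to 3 first: z is 6 *)
val d : real = 3.1 / 2.7        (* floating point division; div is for int only *)
val k : int = ord #"a"          (* character code of #"a"; 97 in ASCII *)
val c : char = chr 98           (* the character with code 98, namely #"b" *)
```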
For example, the conditional expression

  if 1<2 then "less" else "greater"

evaluates to "less" since the value of the expression 1<2 is true.

Note that the expression if 1<2 then 0 else (1 div 0) evaluates to 0, even though 1 div 0 incurs a run-time error. This is because evaluation of the conditional proceeds either to the then clause or to the else clause, depending on the outcome of the boolean test. Whichever clause is evaluated, the other is simply discarded without further consideration.

Although we may, in fact, test equality of two boolean expressions, it is rarely useful to do so. Beginners often write conditionals of the form

  if exp = true then exp1 else exp2

But this is equivalent to the simpler expression

  if exp then exp1 else exp2

Similarly, rather than write

  if exp = false then exp1 else exp2

it is better to write

  if not exp then exp1 else exp2

or, better yet, just

  if exp then exp2 else exp1

2.4 Type Errors

Now that we have more than one type, we have enough rope to hang ourselves by forming ill-typed expressions. For example, the following expressions are not well-typed:

  size 45
  #"1" + 1
  #"2" ^ "1"
  3.14 + 2

In each case we are "misusing" an operator with arguments of the wrong type.

This raises a natural question: is the following expression well-typed or not?

  if 1<2 then 0 else ("abc"+4)

Since the boolean test will come out true, the else clause will never be executed, and hence need not be constrained to be well-typed. While this reasoning is sensible for such a simple example, in general it is impossible for the type checker to determine the outcome of the boolean test during type checking. To be safe the type checker "assumes the worst" and insists that both clauses of the conditional be well-typed, and in fact have the same type, to ensure that the conditional expression can be given a type, namely that of both of its clauses.
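The points about conditionals can be summarized in a few declarations; a sketch:

```sml
val s : string = if 1 < 2 then "less" else "greater"   (* evaluates to "less" *)
val n : int = if 1 < 2 then 0 else 1 div 0   (* 0; the else clause is discarded unevaluated *)
(* val bad = if 1 < 2 then 0 else "abc" *)   (* rejected: the arms have different types *)
fun sign (neg : bool) : int = if neg then ~1 else 1    (* preferred over testing neg = true *)
```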
Chapter 3

Declarations

3.1 Variables

Just as in any other programming language, values may be assigned to variables, which may then be used in expressions to stand for that value. However, in sharp contrast to most familiar languages, variables in ML do not vary! A value may be bound to a variable using a construct called a value binding. Once a variable is bound to a value, it is bound to it for life; there is no possibility of changing the binding of a variable once it has been bound. In this respect variables in ML are more akin to variables in mathematics than to variables in languages such as C.

A type may also be bound to a type constructor using a type binding. A bound type constructor stands for the type bound to it, and can never stand for any other type. For this reason a type binding is sometimes called a type abbreviation: the type constructor stands for the type to which it is bound. (By the same token a value binding might also be called a value abbreviation, but for some reason it never is.)

A value or type binding introduces a "new" variable or type constructor, distinct from all others of that class, for use within its range of significance, or scope. Scoping in ML is static, or lexical, meaning that the range of significance of a variable or type constructor is determined by the program text, not by the order of evaluation of its constituent expressions. (Languages with dynamic scope adopt the opposite convention.) For the time being variables and type constructors have global scope, meaning that the range of significance of the variable or type constructor is the "rest" of the program: the part that lexically follows the binding. However, we will soon introduce mechanisms for limiting the scopes of variables or type constructors to a given expression.

3.2 Basic Bindings

3.2.1 Type Bindings

Any type may be given a name using a type binding. At this stage we have so few types that it is hard to justify binding type names to identifiers, but we'll do it anyway because we'll need it later. Here are some examples of type bindings:

  type float = real
  type count = int and average = real

The first type binding introduces the type constructor float, which subsequently is synonymous with real. The second introduces two type constructors, count and average, which stand for int and real, respectively.

In general a type binding introduces one or more new type constructors simultaneously, in the sense that the definitions of the type constructors may not involve any of the type constructors being defined. Thus a binding such as

  type float = real and average = float

is nonsensical (in isolation) since the type constructors float and average are introduced simultaneously, and hence cannot refer to one another.

The syntax for type bindings is

  type tycon1 = typ1
  and ...
  and tyconn = typn

where each tyconi is a type constructor and each typi is a type expression.

3.2.2 Value Bindings

A value may be given a name using a value binding. Here are some examples:

  val m : int = 3+2
  val pi : real = 3.14 and e : real = 2.17

The first binding introduces the variable m, specifying its type to be int and its value to be 5. The second introduces two variables, pi and e, simultaneously, both having type real, and with pi having value 3.14 and e having value 2.17. Notice that a value binding specifies both the type and the value of a variable.

The syntax of value bindings is

  val var1 : typ1 = exp1
  and ...
  and varn : typn = expn

where each vari is a variable, each typi is a type expression, and each expi is an expression.

A value binding of the form val var : typ = exp is type-checked by ensuring that the expression exp has type typ. If not, the binding is rejected as ill-formed. If so, the binding is evaluated using the bind-by-value rule: first exp is evaluated to obtain its value val, then val is bound to var. If exp does not have a value, then the declaration does not bind anything to the variable var.

The purpose of a binding is to make a variable available for use within its scope. In the case of a type binding we may use the type constructor introduced by that binding in type expressions occurring within its scope. For example, in the presence of the type bindings above, we may write

  val pi : float = 3.14

since the type constructor float is bound to the type real, the type of the expression 3.14. Similarly, we may make use of the variable introduced by a value binding in value expressions occurring within its scope. Continuing from the preceding binding, we may use the expression sin pi to stand for 0.0 (approximately), and we may bind this value to a variable by writing

  val x : float = sin pi

As these examples illustrate, type checking and evaluation are context dependent in the presence of type and value bindings, since we must refer to these bindings to determine the types and values of expressions.
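Collecting the bindings of this section into one program gives the following sketch; entering it into an SML system checks each binding in order (sin is written here in its Basis Library form, Math.sin):

```sml
type float = real                  (* float is now synonymous with real *)
type count = int and average = real

val m : int = 3 + 2                (* m is bound to 5 *)
val pi : float = 3.14 and e : float = 2.17
val x : float = Math.sin pi        (* approximately 0.0 *)
```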
For example, to determine that the above binding for x is well-formed, we must consult the binding for pi to determine that it has type float, then consult the binding for float to determine that it is synonymous with real, which is necessary for the binding of x to have type float.

The rough-and-ready rule for both type-checking and evaluation is that a bound variable or type constructor is implicitly replaced by its binding prior to type checking and evaluation. This is sometimes called the substitution principle for bindings. For example, to evaluate the expression cos x in the scope of the above declarations, we first replace the occurrence of x by its value (approximately 0.0), then compute as before, yielding (approximately) 1.0. Later on we will have to refine this simple principle to take account of more sophisticated language features, but it is useful nonetheless to keep this simple idea in mind.

3.3 Compound Declarations

Bindings may be combined to form declarations. A binding is an atomic declaration, even though it may introduce many variables simultaneously. Two declarations may be combined by sequential composition by simply writing them one after the other, optionally separated by a semicolon. Thus we may write the declaration

  val m : int = 3+2
  val n : int = m*m

which binds m to 5 and n to 25. Subsequently, we may evaluate m+n to obtain the value 30. In general a sequential composition of declarations has the form dec1 . . . decn, where n is at least 2. The scopes of these declarations are nested within one another: the scope of dec1 includes dec2, . . ., decn, the scope of dec2 includes dec3, . . ., decn, and so on.

One thing to keep in mind is that binding is not assignment. The binding of a variable never changes; once bound to a value, it is always bound to that value (within the scope of the binding). However, we may shadow a binding by introducing a second binding for a variable within the scope of the first binding. For example, continuing the above example, we may write

  val n : real = 2.17

to introduce a new variable n with both a different type and a different value than the earlier binding. The new binding eclipses the old one, which may then be discarded since it is no longer accessible. (Later on, we will see that in the presence of higher-order functions shadowed bindings are not always discarded, but are preserved as private data in a closure. One might say that old bindings never die; they just fade away.)

3.4 Limiting Scope

The scope of a variable or type constructor may be delimited by using let expressions and local declarations. A let expression has the form

  let dec in exp end

where dec is any declaration and exp is any expression. The scope of the declaration dec is limited to the expression exp. The bindings introduced by dec are discarded upon completion of evaluation of exp.

Similarly, we may limit the scope of one declaration to another declaration by writing

  local dec in dec' end

The scope of the bindings in dec is limited to the declaration dec'. After processing dec', the bindings in dec may be discarded.

An example will help clarify the idea:

  let
    val m : int = 3
    val n : int = m*m
  in
    m*n
  end

This expression has type int and value 27, as you can readily verify by first calculating the bindings for m and n, then computing the value of m*n relative to these bindings. The bindings for m and n are local to the expression m*n, and are not accessible from outside the expression.

If the declaration part of a let expression eclipses earlier bindings, the ambient bindings are restored upon completion of evaluation of the let expression. Thus the following expression evaluates to 54:

  val m : int = 2
  val r : int =
    let
      val m : int = 3
      val n : int = m*m
    in
      m*n
    end * m

The binding of m is temporarily overridden during the evaluation of the let expression, then restored upon completion of this evaluation.

3.5 Typing and Evaluation

To complete this chapter, let's consider in more detail the context-sensitivity of type checking and evaluation in the presence of bindings. The key ideas are:

• Type checking must take account of the declared type of a variable.
• Evaluation must take account of the declared value of a variable.

This is achieved by maintaining environments for type checking and evaluation. The type environment records the types of variables; the value environment records their values. For example, after processing the compound declaration

  val m : int = 0
  val x : real = Math.sqrt(2.0)
  val c : char = #"a"

the type environment contains the information

  val m : int
  val x : real
  val c : char

and the value environment contains the information

  val m = 0
  val x = 1.414
  val c = #"a"
In a sense the value declarations have been divided in "half", separating the type from the value information.

Thus we see that value bindings have significance for both type checking and evaluation. In contrast type bindings have significance only for type checking, and hence contribute only to the type environment. A type binding such as

  type float = real

is recorded in its entirety in the type environment, and no change is made to the value environment. Subsequently, whenever we encounter the type constructor float in a type expression, it is replaced by real in accordance with the type binding above.

In chapter 2 we said that a typing assertion has the form exp : typ, and that an evaluation assertion has the form exp ⇓ val. While two-place typing and evaluation assertions are sufficient for closed expressions (those without variables), we must extend these relations to account for open expressions (those with variables). Each must be equipped with an environment recording information about type constructors and variables introduced by declarations.

Typing assertions are generalized to have the form

  typenv ⊢ exp : typ

where typenv is a type environment that records the bindings of type constructors and the types of variables that may occur in exp. (The turnstile symbol, "⊢", is simply a punctuation mark separating the type environment from the expression and its type.) We may think of typenv as a sequence of specifications of one of the following two forms:

1. type typvar = typ
2. val var : typ

Note that the second form does not include the binding for var, only its type!

Evaluation assertions are generalized to have the form

  valenv ⊢ exp ⇓ val

where valenv is a value environment that records the bindings of the variables that may occur in exp. We may think of valenv as a sequence of specifications of the form

  val var = val

that bind the value val to the variable var.

The primary use of a type environment is to record the types of the value variables that are available for use in a given expression. This is expressed by the following axiom:

  ..., val var : typ, ... ⊢ var : typ

In words, if the specification val var : typ occurs in the type environment, then we may conclude that the variable var has type typ. This rule glosses over an important point: in order to account for shadowing we require that the rightmost specification govern the type of a variable. That way rebinding of variables at different types behaves as expected.

Similarly, the evaluation relation must take account of the value environment. Evaluation of variables is governed by the following axiom:

  ..., val var = val, ... ⊢ var ⇓ val

Here again we assume that the val specification is the rightmost one governing the variable var, to ensure that the scoping rules are respected.

Finally, we also need a new assertion, called type equivalence, that determines when two types are equivalent, relative to a type environment. This is written

  typenv ⊢ typ1 ≡ typ2

Two types are equivalent iff they are the same when the type constructors defined in typenv are replaced by their bindings. The role of the type equivalence assertion is to ensure that type constructors always stand for their bindings. This is expressed by the following axiom:

  ..., type typvar = typ, ... ⊢ typvar ≡ typ

Once again, the rightmost specification for typvar governs the assertion.
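The rightmost-specification rule matches the shadowing behavior seen earlier in this chapter; a sketch, with the environments traced informally in comments:

```sml
val x : int = 5          (* value env: ..., val x = 5 *)
val x : real = 2.17      (* a new, rightmost specification for x governs it now *)
val y : real = x + 1.0   (* x has type real and value 2.17; y is approximately 3.17 *)

val m : int = 2
val r : int =
  let val m : int = 3 in m * m end * m   (* inner m governs inside the let: 9 * 2 = 18 *)
```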
Chapter 4

Functions

4.1 Functions as Templates

So far we just have the means to calculate the values of expressions, and to bind these values to variables for future reference. In this chapter we will introduce the ability to abstract the data from a calculation, leaving behind the bare pattern of the calculation. This pattern may then be instantiated as often as you like so that the calculation may be repeated with specified data values plugged in.

For example, consider the expression 2*(3+4). The data might be taken to be the values 2, 3, and 4, leaving behind the pattern ◻*(◻+◻), with "holes" where the data used to be. We might equally well take the data to just be 2 and 3, leaving behind the pattern ◻*(◻+4). Or we might even regard * and + as the data, leaving 2◻(3◻4) as the pattern! What is important is that a complete expression can be recovered by filling in the holes with chosen data.

Since a pattern can contain many different holes that can be independently instantiated, it is necessary to give names to the holes, so that instantiation consists of plugging in a given value for all occurrences of a name in an expression. These names are, of course, just variables, and instantiation is just the process of substituting a value for all occurrences of a variable in a given expression. A pattern may therefore be viewed as a function of the variables that occur within it; the pattern is instantiated by applying the function to argument values.

This view of functions is similar to our experience from high school algebra. In algebra we manipulate polynomials such as x² + 2x + 1 as a form of expression denoting a real number, with the variable x representing a fixed, but unknown, quantity. (Indeed, variables in algebra are sometimes called unknowns, or indeterminates.) It is also possible to think of a polynomial as a function on the real line: given a real number x, a polynomial determines a real number y, computed by the given combination of arithmetic operations. Indeed, we sometimes write equations such as f(x) = x² + 2x + 1 to stand for the function f determined by the polynomial. In the univariate case we can get away with just writing the polynomial for the function, but in the multivariate case we must be more careful: we may regard the polynomial x² + 2xy + y² to be a function of x, a function of y, or a function of both x and y. In these cases we write f(x) = x² + 2xy + y² when x varies and y is held fixed, g(y) = x² + 2xy + y² when y varies for fixed x, and h(x, y) = x² + 2xy + y² when both vary jointly.

In algebra it is usually left implicit that the variables x and y range over the real numbers, and that f, g, and h are functions on the real line. However, to be fully explicit, we sometimes write something like

  f : R → R : x ∈ R ↦ x² + 2x + 1

to indicate that f is a function on the reals sending x ∈ R to x² + 2x + 1 ∈ R. This notation has the virtue of separating the name of the function, f, from the function itself, the mapping that sends x ∈ R to x² + 2x + 1. It also emphasizes that functions are a kind of "value" in mathematics (namely, a certain set of ordered pairs), and that the variable f is bound to that value (i.e., that set) by the declaration. This viewpoint is especially important once we consider operators, such as the differential operator, that map functions to functions. For example, if f is a differentiable function on the real line, the function D f is its first derivative, another function on the real line. In the case of the function f defined above, the function D f sends x ∈ R to 2x + 2.

4.2 Functions and Application

The treatment of functions in ML is very similar, except that we stress both the algorithmic aspects of functions (how they determine values from arguments) as well as the extensional aspects (what they compute). As in mathematics, a function in ML is a kind of value, namely a value of function type of the form typ -> typ'. The type typ is the domain type (the type of arguments) of the function, and typ' is its range type (the type of its results). We compute with a function by applying it to an argument value of its domain type and calculating the result, a value of its range type. Function application is indicated by juxtaposition: we simply write the argument next to the function.
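Anticipating the notation introduced just below, the algebraic functions f, g, and h can be transcribed directly into ML; a sketch (the curried form chosen for g is one of several reasonable choices):

```sml
(* f(x) = x^2 + 2x + 1, a function on the reals *)
val f : real -> real = fn x : real => x * x + 2.0 * x + 1.0

(* x^2 + 2xy + y^2, viewed as a function of x for fixed y, and of the pair (x, y) *)
fun g (y : real) : real -> real = fn x : real => x * x + 2.0 * x * y + y * y
fun h (x : real, y : real) : real = x * x + 2.0 * x * y + y * y

val nine = f 2.0        (* (2 + 1)^2 = 9.0 *)
```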
The values of function type consist of primitive functions, such as addition and square root, and function expressions, which are also called lambda expressions (for purely historical reasons), of the form

  fn var : typ => exp

The variable var is called the parameter, and the expression exp is called the body. It has type typ -> typ' provided that exp has type typ' under the assumption that the parameter var has the type typ.

To apply such a function expression to an argument value val, we add the binding

  val var = val

to the value environment, and evaluate exp, obtaining a value val'. Then the value binding for the parameter is removed, and the result value, val', is returned as the value of the application.

For example, Math.sqrt is a primitive function of type real -> real that may be applied to a real number to obtain its square root. For example, the expression Math.sqrt 2.0 evaluates to 1.414 (approximately). We can, if we wish, parenthesize the argument, writing Math.sqrt (2.0) for the sake of clarity; this is especially useful for expressions such as Math.sqrt (Math.sqrt 2.0), which evaluates to 1.189 (approximately).

The square root function is built in; we may write the fourth root function as the following function expression:

  fn x : real => Math.sqrt (Math.sqrt x)

It may be applied to an argument by writing an expression such as

  (fn x : real => Math.sqrt (Math.sqrt x)) (16.0)

which calculates the fourth root of 16.0. The calculation proceeds by binding the variable x to the argument 16.0, then evaluating the expression Math.sqrt (Math.sqrt x) in the presence of this binding. When evaluation completes, we drop the binding of x from the environment, since it is no longer needed.

Notice that we did not give the fourth root function a name; it is an "anonymous" function. We may give it a name using the declaration forms introduced in chapter 3. For example, we may bind the fourth root function to the variable fourthroot using the following declaration:

  val fourthroot : real -> real = fn x : real => Math.sqrt (Math.sqrt x)

We may then write fourthroot 16.0 to compute the fourth root of 16.0.

This notation for defining functions quickly becomes tiresome, so ML provides a special syntax for function bindings that is more concise and natural. Instead of using the val binding above to define fourthroot, we may instead write

  fun fourthroot (x:real):real = Math.sqrt (Math.sqrt x)

This declaration has the same meaning as the earlier val binding, namely it binds fn x:real => Math.sqrt (Math.sqrt x) to the variable fourthroot.

It is important to note that function applications in ML are evaluated according to the call-by-value rule: the arguments to a function are evaluated before the function is called. Put in other terms, functions are defined to act on values, rather than on unevaluated expressions. Thus, to evaluate an expression such as fourthroot (2.0+2.0), we proceed as follows:

1. Evaluate fourthroot to the function value fn x : real => Math.sqrt (Math.sqrt x).
2. Evaluate the argument 2.0+2.0 to its value 4.0.
3. Bind x to the value 4.0.
4. Evaluate Math.sqrt (Math.sqrt x) to 1.414 (approximately):
   (a) Evaluate Math.sqrt to a function value (the primitive square root function).
   (b) Evaluate the argument expression Math.sqrt x to its value, 2.0:
       i. Evaluate Math.sqrt to a function value (the primitive square root function).
       ii. Evaluate x to its value, 4.0.
       iii. Compute the square root of 4.0, yielding 2.0.
   (c) Compute the square root of 2.0, yielding 1.414 (approximately).
5. Drop the binding for the variable x.

Notice that we evaluate both the function and argument positions of an application expression: both the function and argument are expressions yielding values of the appropriate type.
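Before continuing, the evaluation steps above can be replayed concretely; a sketch:

```sml
fun fourthroot (x : real) : real = Math.sqrt (Math.sqrt x)

val a = fourthroot 16.0          (* sqrt (sqrt 16.0) = sqrt 4.0 = 2.0 *)
val b = fourthroot (2.0 + 2.0)   (* the argument is evaluated to 4.0 first;
                                    the result is approximately 1.414 *)
```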
Unlike val bindings. in the following example. meaning that they may be computed as the value of an expression. but rather may compute “new” functions on the ﬂy and apply these to arguments. 2002 . In this case the result value (if any) will be of the range type of the function. and then only temporarily. Functions in ML are ﬁrstclass. as we shall see in the sequel. whether by a val binding or by a fn expression.3 Binding and Scope. For example. according to which variables are taken to refer to the nearest enclosing binding of that variable. Using similar techniques we may deﬁne functions with arbitrary domain and range. for the duration of the evaluation of its body. W ORKING D RAFT D ECEMBER 9. It is worth reviewing the rules for binding and scope of variables that we introduced in chapter 3 in the presence of function expressions.4. Thus. 4. and the value of the argument position must be a value of the domain type of the function. This is a source of considerable expressive power. We are not limited to applying only named functions. function expressions bind a variable without giving it a speciﬁc value. The value of the function position must be a value of function type. the following are all valid function declarations: fun fun fun fun fun fun srev (s:string):string = implode (rev (explode s)) pal (s:string):string = s ˆ (srev s) double (n:int):int = n + n square (n:int):int = n * n halve (n:int):int = n div 2 is even (n:int):bool = (n mod 2 = 0) Thus pal "ot" evaluates to the string "otto". the occurrences of x in the body of the function f refer to the parameter of f. either a primitive function or a lambda expression. and is even 4 evaluates to true. Revisited 45 yielding values of the appropriate type. The value of the parameter is only determined when the function is applied. Revisited fn var :typ => exp A function expression of the form binds the variable var within the body exp of the function.3 Binding and Scope. 
whereas the occurrences of x in the body of g refer to the preceding val binding. As before we adhere to the principle of static scope.
val x:real = 2.0
fun f(x:real):real = x+x
fun g(y:real):real = x+y

Local val bindings may shadow parameters, as well as other val bindings. For example, consider the following function declaration:

fun h(x:real):real = let val x:real = 2.0 in x+x end * x

The inner binding of x by the val declaration shadows the parameter x of h, but only within the body of the let expression. Thus the last occurrence of x refers to the parameter of h, whereas the preceding two occurrences refer to the inner binding of x to 2.0.

The phrases "inner" and "outer" binding refer to the logical structure, or abstract syntax, of an expression. In the preceding example, the body of h lies "within" the scope of the parameter x, and the expression x+x lies within the scope of the val binding for x. Since the occurrences of x within the body of the let lie within the scope of the inner val binding, they are taken to refer to that binding, rather than to the parameter. On the other hand the last occurrence of x does not lie within the scope of the val binding, and hence refers to the parameter of h.

In general the names of parameters do not matter; we can rename them at will without affecting the meaning of the program, provided that we simultaneously (and consistently) rename the binding occurrence and all uses of that variable. Thus the functions f and g below are completely equivalent to each other:

fun f(x:int):int = x*x
fun g(y:int):int = y*y

A parameter is just a placeholder; its name is not important.

Our ability to rename parameters is constrained by the static scoping rule. We may rename a parameter to whatever we'd like, provided that we don't change the way in which uses of a variable are resolved. For example, consider the following situation:

val x:real = 2.0
fun h(y:real):real = x+y

The parameter y to h may be renamed to z without affecting its meaning. However, we may not rename it to x, for doing so changes its meaning! That is, the function
fun h'(x:real):real = x+x

does not have the same meaning as h, because now both occurrences of x in the body of h' refer to the parameter, whereas in h the variable x refers to the outer val binding and the variable y refers to the parameter.

While this may seem like a minor technical issue, it is essential that you master these concepts now, for they play a central, and rather subtle, role later on.
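To make the shadowing rules concrete, here is a small sketch of our own, in the same style as the declarations above, which can be typed into an interactive top level:

```sml
val x : real = 2.0

(* The parameter x shadows the outer val binding of x; the let-bound
   x shadows the parameter, but only inside the let body. *)
fun h (x:real) : real =
    (let val x : real = 2.0 in x + x end) * x

(* The let body uses the inner binding (2.0 + 2.0 = 4.0); the final
   occurrence of x is the parameter, so h 3.0 is 4.0 * 3.0 = 12.0. *)
val twelve = h 3.0
```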
Chapter 5

Products and Records

5.1 Product Types

A distinguishing feature of ML is that aggregate data structures, such as tuples, lists, arrays, or trees, may be created and manipulated with ease. In contrast to most familiar languages it is not necessary in ML to be concerned with allocation and deallocation of data structures, nor with any particular representation strategy involving, say, pointers or address arithmetic. Instead we may think of data structures as first-class values, on a par with every other value in the language. Just as it is unnecessary to think about "allocating" integers to evaluate an arithmetic expression, it is unnecessary to think about allocating more complex data structures such as tuples or lists.

5.1.1 Tuples

This chapter is concerned with the simplest form of aggregate data structure, the n-tuple. An n-tuple is a finite ordered sequence of values of the form (val1,...,valn), where each vali is a value. A 2-tuple is usually called a pair, a 3-tuple a triple, and so on. An n-tuple is a value of a product type of the form typ1*...*typn. Values of this type are n-tuples of the form
(val1,...,valn),

where vali is a value of type typi (for each 1 ≤ i ≤ n). Thus the following are well-formed bindings:

val pair : int * int = (2, 3)
val triple : int * real * string = (2, 2.0, "2")
val pair_of_pairs : (int * int) * (real * real) = ((2,3),(2.0,3.0))
val quadruple : int * int * real * real = (2,3,2.0,3.0)

The nesting of parentheses matters! A pair of pairs is not the same as a quadruple, so the last two bindings are of distinct values with distinct types.

There are two limiting cases, n = 0 and n = 1, that deserve special attention. A 0-tuple, which is also known as a null tuple, is the empty sequence of values, (). It is a value of type unit, which may be thought of as the 0-tuple type.1 The null type is surprisingly useful, especially when programming with effects. On the other hand there seems to be no particular use for 1-tuples, and so they are absent from the language.

As a convenience, ML also provides a general tuple expression of the form (exp1,...,expn), where each expi is an arbitrary expression, not necessarily a value. Tuple expressions are evaluated from left to right, so that the above tuple expression evaluates to the tuple value (val1,...,valn), provided that exp1 evaluates to val1, exp2 evaluates to val2, and so on. For example, the binding

val pair : int * int = (1+1, 2+1)

binds the value (2, 3) to the variable pair.

1 In Java (and other languages) the type unit is misleadingly written void. The name "void" suggests that the type has no members, but in fact it has exactly one member, the null tuple!
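The left-to-right evaluation of tuple expressions can be made visible with print (a device of ours, not from the text):

```sml
(* Each component announces itself as it is evaluated; "first" is
   printed before "second", and pair is bound to (2, 3). *)
val pair : int * int =
    ( (print "first\n";  1 + 1),
      (print "second\n"; 2 + 1) )
```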
Strictly speaking, it is not essential to have tuple expressions as a primitive notion in the language. Rather than write (exp1,...,expn), with the (implicit) understanding that the expi's are evaluated from left to right, we may instead write

let
  val x1 = exp1
  val x2 = exp2
  ...
  val xn = expn
in
  (x1,...,xn)
end

which makes the evaluation order explicit.

5.1.2 Tuple Patterns

One of the most powerful, and distinctive, features of ML is the use of pattern matching to access components of aggregate data structures. For example, suppose that val is a value of type (int * string) * (real * char) and we wish to retrieve the first component of the second component of val, a value of type real. Rather than explicitly "navigate" to this position to retrieve it, we may simply use a generalized form of value binding in which we select that component using a pattern:

val ((_, _), (r:real, _)) = val

The left-hand side of the val binding is a tuple pattern that describes a pair of pairs, binding the first component of the second component to the variable r. The underscores indicate "don't care" positions in the pattern — their values are not bound to any variable. If we wish to give names to all of the components, we may use the following value binding:

val ((i:int, s:string), (r:real, c:char)) = val

If we'd like we can even give names to the first and second components of the pair, without decomposing them into constituent parts:

val (is:int*string, rc:real*char) = val
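As a further illustration of tuple patterns (our own example), a function that swaps the components of a pair simply matches the pair apart and rebuilds it in the opposite order:

```sml
(* Swap the components of an int * string pair. *)
fun swap (n:int, s:string) : string * int = (s, n)

(* A tuple pattern on the left decomposes the result:
   s is bound to "three" and n to 3. *)
val (s, n) = swap (3, "three")
```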
The general form of a value binding is

val pat = exp,
where pat is a pattern and exp is an expression. A pattern is one of three forms:

1. A variable pattern of the form var:typ.
2. A tuple pattern of the form (pat1,...,patn), where each pati is a pattern. This includes as a special case the null-tuple pattern, ().
3. A wildcard pattern of the form _.

The type of a pattern is determined by an inductive analysis of the form of the pattern:

1. A variable pattern var:typ is of type typ.
2. A tuple pattern (pat1,...,patn) has type typ1*...*typn, where each pati is a pattern of type typi. The null-tuple pattern () has type unit.
3. The wildcard pattern _ has any type whatsoever.

A value binding of the form val pat = exp is well-typed iff pat and exp have the same type; otherwise the binding is ill-typed and is rejected. For example, the following bindings are well-typed:

val (m:int, n:int) = (7+1, 4 div 2)
val (m:int, r:real, s:string) = (7, 7.0, "7")
val ((m:int, n:int), (r:real, s:real)) = ((4,5),(3.1,2.7))
val (m:int, n:int, r:real, s:real) = (4,5,3.1,2.7)
In contrast, the following are ill-typed:

val (m:int, n:int, r:real, s:real) = ((4,5),(3.1,2.7))
val (m:int, r:real) = (7+1, 4 div 2)
val (m:int, r:real) = (7, 7.0, "7")
Value bindings are evaluated using the bind-by-value principle discussed earlier, except that the binding process is now more complex than before. First, we evaluate the right-hand side of the binding to a value (if indeed it has one). This happens regardless of the form of the pattern — the right-hand side is always evaluated. Second, we perform pattern matching to determine the bindings for the variables in the pattern.

The process of matching a value against a pattern is defined by a set of rules for reducing bindings with complex patterns to a set of bindings with simpler patterns, stopping once we reach a binding with a variable pattern. The rules are as follows:

1. The variable binding val var = val is irreducible.
2. The wildcard binding val _ = val is discarded.
3. The tuple binding val (pat1,...,patn) = (val1,...,valn) is reduced to the set of n bindings

   val pat1 = val1
   ...
   val patn = valn

   In the case that n = 0 the tuple binding is simply discarded.

These simplifications are repeated until all bindings are irreducible, which leaves us with a set of variable bindings that constitute the result of pattern matching. For example, evaluation of the binding

val ((m:int, n:int), (r:real, s:real)) = ((2,3),(2.0,3.0))

proceeds as follows. First, we decompose this binding into the following two bindings:

val (m:int, n:int) = (2,3)
val (r:real, s:real) = (2.0,3.0)

Then we decompose each of these bindings in turn, resulting in the following set of four atomic bindings:
val m:int = 2
val n:int = 3
val r:real = 2.0
val s:real = 3.0

At this point the pattern-matching process is complete.
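The atomic bindings produced by pattern matching are ordinary variable bindings, usable like any others; for example (a sketch of ours):

```sml
(* The nested pattern is reduced to four atomic bindings ... *)
val ((m:int, n:int), (r:real, s:real)) = ((2,3), (2.0,3.0))

(* ... which may then be used independently: sum is 5, prod is 6.0. *)
val sum  = m + n
val prod = r * s
```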
5.2 Record Types
Tuples are most useful when the number of positions is small. When the number of components grows beyond a small number, it becomes difficult to remember which position plays which role. In that case it is more natural to attach a label to each component of the tuple that mediates access to it. This is the notion of a record type.

A record type has the form

{lab1:typ1, ..., labn:typn},

where n ≥ 0, and all of the labels labi are distinct. A record value has the form

{lab1=val1, ..., labn=valn},

where vali has type typi. A record pattern has the form

{lab1=pat1, ..., labn=patn}

which has type {lab1:typ1, ..., labn:typn} provided that each pati has type typi.

A record value binding of the form

val {lab1=pat1, ..., labn=patn} = {lab1=val1, ..., labn=valn}

is decomposed into the following set of bindings:

val pat1 = val1
...
val patn = valn
Since the components of a record are identified by name, not position, the order in which they occur in a record value or record pattern is not important. However, in a record expression (in which the components may not be fully evaluated), the fields are evaluated from left to right in the order written, just as for tuple expressions.

Here are some examples to help clarify the use of record types. First, let us define the record type hyperlink as follows:

type hyperlink =
  { protocol : string,
    address : string,
    display : string }

The record binding

val mailto_rwh : hyperlink =
  { protocol = "mailto",
    address = "rwh@cs.cmu.edu",
    display = "Robert Harper" }

defines a variable of type hyperlink. The record binding

val { protocol=prot, display=disp, address=addr } = mailto_rwh

decomposes into the three variable bindings

val prot = "mailto"
val addr = "rwh@cs.cmu.edu"
val disp = "Robert Harper"

which extract the values of the fields of mailto_rwh.

Using wild cards we can extract selected fields from a record. For example, we may write

val { protocol=prot, address=_, display=_ } = mailto_rwh

to bind the variable prot to the protocol field of the record value mailto_rwh.

It is quite common to encounter record types with tens of fields. In such cases even the wild card notation doesn't help much when it comes to selecting one or two fields from such a record. For this we often use ellipsis patterns in records, as illustrated by the following example:

val { protocol=prot, ... } = intro_home

The pattern {protocol=prot,...} stands for the expanded pattern
{protocol=prot, address=_, display=_}
in which the elided fields are implicitly bound to wildcard patterns. In general the ellipsis is replaced by as many wildcard bindings as are necessary to fill out the pattern to be consistent with its type. In order for this to occur the compiler must be able to determine unambiguously the type of the record pattern. Here the right-hand side of the value binding determines the type of the pattern, which then determines which additional fields to fill in. In some situations the context does not disambiguate, in which case you must supply additional type information, or avoid the use of ellipsis notation.

Finally, ML provides a convenient abbreviated form of record pattern

{lab1, ..., labn}

which stands for the pattern

{lab1=var1, ..., labn=varn}

where the variables vari are variables with the same name as the corresponding label labi. For example, the binding

val { protocol, address, display } = mailto_rwh

decomposes into the sequence of atomic bindings

val protocol = "mailto"
val address = "rwh@cs.cmu.edu"
val display = "Robert Harper"

This avoids the need to think up a variable name for each field; we can just make the label do "double duty" as a variable.
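The record features above can be combined in one small sketch (using the hyperlink type from the text; the type annotations on the patterns are there to satisfy the disambiguation requirement):

```sml
type hyperlink =
  { protocol : string, address : string, display : string }

val mailto_rwh : hyperlink =
  { protocol = "mailto",
    address  = "rwh@cs.cmu.edu",
    display  = "Robert Harper" }

(* Ellipsis pattern: the annotation determines the elided fields. *)
val { protocol = prot, ... } : hyperlink = mailto_rwh

(* Abbreviated pattern: the label address doubles as a variable. *)
val { address, ... } : hyperlink = mailto_rwh
```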
5.3 Multiple Arguments and Multiple Results
A function may bind more than one argument by using a pattern, rather than a variable, in the argument position. Function expressions are generalized to have the form fn pat => exp
where pat is a pattern and exp is an expression. Application of such a function proceeds much as before, except that the argument value is matched against the parameter pattern to determine the bindings of zero or more variables, which are then used during the evaluation of the body of the function.

For example, we may make the following definition of the Euclidean distance function:

val dist : real * real -> real =
  fn (x:real, y:real) => Math.sqrt (x*x + y*y)

This function may then be applied to a pair (a two-tuple!) of arguments to yield the distance of that point from the origin. For example, dist (2.0,3.0) evaluates to approximately 3.61.

Using fun notation, the distance function may be defined more concisely as follows:

fun dist (x:real, y:real):real = Math.sqrt (x*x + y*y)

The meaning is the same as the more verbose val binding given earlier.

Keyword parameter passing is supported through the use of record patterns. For example, we may define the distance function using keyword parameters as follows:

fun dist' {x=x:real, y=y:real} = Math.sqrt (x*x + y*y)

The expression dist' {x=2.0, y=3.0} invokes this function with the indicated x and y values.

Functions with multiple results may be thought of as functions yielding tuples (or records). For example, we might compute two different notions of distance between two points at once as follows:

fun dist2 (x:real, y:real):real*real = (Math.sqrt (x*x+y*y), abs (x-y))

Notice that the result type is a pair, which may be thought of as two results.

These examples illustrate a pleasing regularity in the design of ML. Rather than introduce ad hoc notions such as multiple arguments, multiple results, or keyword parameters, we make use of the general mechanisms of tuples, records, and pattern matching.
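Putting the pieces together, here is a short sketch of ours showing a multiple-result function applied and its two results taken apart with a tuple pattern (we write Math.sqrt for the square-root function from the Basis Library):

```sml
(* Two notions of distance computed at once, as in the text. *)
fun dist2 (x:real, y:real) : real * real =
    (Math.sqrt (x*x + y*y), abs (x - y))

(* A tuple pattern names the two results separately:
   euclid is 5.0 and diff is 1.0. *)
val (euclid, diff) = dist2 (3.0, 4.0)
```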
It is sometimes useful to have a function to select a particular component from a tuple or record (e.g., the third component, or the component with a given label). Such functions may be easily defined using pattern matching. But since they arise so frequently, they are predefined in ML using sharp notation. For any tuple type typ1*...*typn, and each 1 ≤ i ≤ n, there is a function #i of type typ1*...*typn -> typi defined as follows:

fun #i (_, ..., x, ..., _) = x

where x occurs in the i-th position of the tuple (and there are underscores in the other n−1 positions). Thus we may refer to the second field of a three-tuple val by writing #2(val).

A similar notation is provided for record field selection. The following function #lab selects the component of a record with label lab:

fun #lab {lab=x, ...} = x

Notice the use of ellipsis! Bear in mind the disambiguation requirement: any use of #lab must be in a context sufficient to determine the full record type of its argument.

It is bad style, however, to over-use the sharp notation; code is generally clearer and easier to maintain if you use patterns wherever possible. Compare, for example, the following definition of the Euclidean distance function written using sharp notation with the original:

fun dist (p:real*real):real = Math.sqrt ((#1 p)*(#1 p) + (#2 p)*(#2 p))

You can easily see that this gets out of hand very quickly, leading to unreadable code. Use of the sharp notation is strongly discouraged!
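The contrast between sharp notation and patterns can be seen on a single pair (our example):

```sml
val p : int * string = (7, "seven")

(* With a pattern, the selection is part of the binding: n is 7. *)
val (n, _) = p

(* With a clausal function, the pattern names what we keep. *)
fun first (x:int, _:string) : int = x
val n' = first p

(* Sharp notation does the same job, but is discouraged style. *)
val n'' = #1 p
```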
Chapter 6

Case Analysis

6.1 Homogeneous and Heterogeneous Types

Tuple types have the property that all values of that type have the same form (n-tuples, for some n determined by the type); they are said to be homogeneous. For example, all values of type int*real are pairs whose first component is an integer and whose second component is a real. Any type-correct pattern will match any value of that type; there is no possibility of failure of pattern matching. The pattern (x:int, y:real) is of type int*real and hence will match any value of that type. On the other hand the pattern (x:int, y:real, z:string) is of type int*real*string and cannot be used to match against values of type int*real; attempting to do so fails at compile time.

Other types have values of more than one form; they are said to be heterogeneous types. For example, a value of type int might be 0, 1, ~1, ..., or a value of type char might be #"a" or #"z". (Other examples of heterogeneous types will arise later on.) Corresponding to each of the values of these types is a pattern that matches only that value. Attempting to match any other value against that pattern fails at execution time with an error condition called a bind failure.

Here are some examples of pattern-matching against values of a heterogeneous type:

val 0 = 1-1
val (0, x) = (1-1, 34)
val (0, #"0") = (2-1, #"0")

The first two bindings succeed; the third fails. In the case of the second, the variable x is bound to 34 after the match. No variables are bound in the first or third examples.
6.2 Clausal Function Expressions

The importance of constant patterns becomes clearer once we consider how to define functions over heterogeneous types. This is achieved in ML using a clausal function expression whose general form is

fn pat1 => exp1
 | ...
 | patn => expn

Each component pati => expi is called a clause, or a rule. The entire assembly of rules is called a match.

The typing rules for matches ensure consistency of the clauses. Specifically, there must exist types typ1 and typ2 such that

1. Each pattern pati has type typ1.
2. Each expression expi has type typ2, given the types of the variables in pattern pati.

If these requirements are satisfied, the function has the type typ1->typ2.

Application of a clausal function to a value val proceeds by considering the clauses in the order written. At stage i, where 1 ≤ i ≤ n, the argument value val is matched against the pattern pati; if the pattern match succeeds, evaluation continues with the evaluation of expression expi, with the variables of pati replaced by their values as determined by pattern matching. Otherwise we proceed to stage i + 1. If no pattern matches (i.e., we reach stage n+1), then the application fails with an execution error called a match failure.

Here's an example. Consider the following clausal function:

val recip : int -> int =
  fn 0 => 0
   | n:int => 1 div n

This defines an integer-valued reciprocal function on the integers, where the reciprocal of 0 is arbitrarily defined to be 0. The function has two clauses, one for the argument 0, the other for non-zero arguments n. (Note that n is guaranteed to be non-zero because the patterns are considered in order: we reach the pattern n:int only if the argument fails to match the pattern 0.)
The fun notation is also generalized so that we may define recip using the following more concise syntax:

fun recip 0 = 0
  | recip (n:int) = 1 div n

One annoying thing to watch out for is that the fun form uses an equal sign to separate the pattern from the expression in a clause, whereas the fn form uses a double arrow.

Case analysis on the values of a heterogeneous type is performed by application of a clausally-defined function. The notation

case exp
  of pat1 => exp1
   | ...
   | patn => expn

is short for the application

(fn pat1 => exp1 | ... | patn => expn) exp

Evaluation proceeds by first evaluating exp, then matching its value successively against the patterns in the match until one succeeds, and continuing with evaluation of the corresponding expression. The case expression fails if no pattern succeeds to match the value.

6.3 Booleans and Conditionals, Revisited

The type bool of booleans is perhaps the most basic example of a heterogeneous type. Its values are true and false. Functions may be defined on booleans using clausal definitions that match against the patterns true and false. For example, the negation function may be defined clausally as follows:

fun not true = false
  | not false = true

The conditional expression
if exp then exp1 else exp2

is shorthand for the case analysis

case exp of true => exp1 | false => exp2

which is itself shorthand for the application

(fn true => exp1 | false => exp2) exp

The "short-circuit" conjunction and disjunction operations are defined as follows. The expression exp1 andalso exp2 is short for

if exp1 then exp2 else false

and the expression exp1 orelse exp2 is short for

if exp1 then true else exp2

You should expand these into case expressions and check that they behave as expected. Pay particular attention to the evaluation order, and observe that the call-by-value principle is not violated by these expressions.

6.4 Exhaustiveness and Redundancy

Matches are subject to two forms of "sanity check" as an aid to the ML programmer. The first, called exhaustiveness checking, ensures that a well-formed match covers its domain type in the sense that every value of the domain must match one of its clauses. The second, called redundancy checking, ensures that no clause of a match is subsumed by the clauses that precede it. This means that the set of values covered by a clause in a match must not be contained entirely within the set of values covered by the preceding clauses of that match.

Redundant clauses are always a mistake — such a clause can never be executed. Redundant rules often arise accidentally. For example, the second rule of the following clausal function definition is redundant:

fun not True = false
  | not False = true
By capitalizing True we have turned it into a variable, rather than a constant pattern. Consequently, every value matches the first clause, rendering the second redundant. Indeed, the function never returns false for any argument!

Since the clauses of a match are considered in the order they are written, redundancy checking is correspondingly order-sensitive. In particular, changing the order of clauses in a well-formed, irredundant match can make it redundant, as in the following example:

fun recip (n:int) = 1 div n
  | recip 0 = 0

The second clause is redundant because the first matches any integer value, including 0.

Inexhaustive matches may or may not be in error, depending on whether the match might ever be applied to a value that is not covered by any clause. Here is an example of a function with an inexhaustive match that is plausibly in error:

fun is_numeric #"0" = true
  | is_numeric #"1" = true
  | is_numeric #"2" = true
  | is_numeric #"3" = true
  | is_numeric #"4" = true
  | is_numeric #"5" = true
  | is_numeric #"6" = true
  | is_numeric #"7" = true
  | is_numeric #"8" = true
  | is_numeric #"9" = true

When applied to, say, #"a", this function fails. Perhaps what was intended here is to include a catch-all clause at the end:
fun is_numeric #"0" = true
  | is_numeric #"1" = true
  | is_numeric #"2" = true
  | is_numeric #"3" = true
  | is_numeric #"4" = true
  | is_numeric #"5" = true
  | is_numeric #"6" = true
  | is_numeric #"7" = true
  | is_numeric #"8" = true
  | is_numeric #"9" = true
  | is_numeric _ = false

The addition of a final catch-all clause renders the match exhaustive, because any value not matched by the first ten clauses will surely be matched by the eleventh.

Having said that, it is a very bad idea to simply add a catch-all clause to the end of every match to suppress inexhaustiveness warnings from the compiler. The exhaustiveness checker is your friend! Each such warning is a suggestion to double-check that match to be sure that you've not made a silly error of omission, but rather have intentionally left out cases that are ruled out by the invariants of the program. In chapter 10 we will see that the exhaustiveness checker is an extremely valuable tool for managing code evolution.
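Incidentally, for this particular predicate the ten constant clauses can be collapsed into a single range test; the Basis Library's Char.isDigit computes the same thing. A sketch:

```sml
(* Equivalent to the eleven-clause definition above: a character is
   numeric iff it lies between #"0" and #"9" in the ordering on char. *)
fun is_numeric (c:char) : bool = c >= #"0" andalso c <= #"9"

val yes = is_numeric #"7"     (* true *)
val no  = is_numeric #"a"     (* false *)

(* The Basis Library predicate agrees on every character. *)
val agree = (is_numeric #"5" = Char.isDigit #"5")
```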
Chapter 7

Recursive Functions

So far we've only considered very simple functions (such as the reciprocal function) whose value is computed by a simple composition of primitive functions. In this chapter we introduce recursive functions, the principal means of iterative computation in ML. Informally, a recursive function is one that computes the result of a call by possibly making further calls to itself. Obviously, to avoid infinite regress, some calls must return their results without making any recursive calls. Those that do must ensure that the arguments are, in some sense, "smaller" so that the process will eventually terminate.

This informal description obscures a central point, namely the means by which we may convince ourselves that a function computes the result that we intend. In general we must prove that for all inputs of the domain type, the body of the function computes the "correct" value of result type. Usually the argument imposes some additional assumptions on the inputs, called the preconditions. The correctness requirement for the result is called a postcondition. Our burden is to prove that for every input satisfying the preconditions, the body evaluates to a result satisfying the postcondition.

In fact we may carry out such an analysis for many different pre- and postcondition pairs, according to our interest. For example, the ML type checker proves that the body of a function yields a value of the range type (if it terminates) whenever it is given an argument of the domain type. Here the domain type is the precondition, and the range type is the postcondition. In most cases we are interested in deeper properties, examples of which we shall consider below.

To prove the correctness of a recursive function (with respect to given pre- and postconditions) it is typically necessary to use some form of inductive reasoning. The beauty of inductive reasoning is that we may assume that the recursive calls work correctly when showing that a case involving recursive calls is correct, by appeal to an induction principle that justifies the particular pattern of recursion. We must separately show that the base cases satisfy the given pre- and postconditions. The base cases of the induction correspond to those cases that make no recursive calls; the inductive step corresponds to those that do. Taken together, these two steps are sufficient to establish the correctness of the function itself.

No doubt this all sounds fairly theoretical. The point of this chapter is to show that it is also profoundly practical.

7.1 Self-Reference and Recursion

In order for a function to "call itself", it must have a name by which it can refer to itself. This is achieved by using recursive value bindings, which are ordinary value bindings qualified by the keyword rec. The simplest form of a recursive value binding is as follows:

val rec var:typ = val

As in the non-recursive case, the left-hand side is a pattern, but here the right-hand side must be a value. In fact the right-hand side must be a function expression, since only functions may be defined recursively in ML. The function may refer to itself by using the variable var.

Here's an example of a recursive value binding:

val rec factorial : int->int =
  fn 0 => 1
   | n:int => n * factorial (n-1)

Using fun notation we may write the definition of factorial much more clearly and concisely as follows:

fun factorial 0 = 1
  | factorial (n:int) = n * factorial (n-1)

There is obviously a close correspondence between this formulation of factorial and the usual textbook definition of the factorial function in terms of recursion equations:

0! = 1
n! = n × (n−1)!   (n > 0)

Recursive value bindings are type-checked in a manner that may, at first glance, seem paradoxical. To check that the binding
val rec var : typ = val

is well-formed, we ensure that the value val has type typ, assuming that var has type typ. Since var refers to the value val itself, we are in effect assuming what we intend to prove while proving it! (Incidentally, since val is required to be a function expression, the type typ will always be a function type.)

Let's look at an example. To check that the binding for factorial given above is well-formed, we assume that the variable factorial has type int->int, then check that its definition, the function

fn 0 => 1 | n:int => n * factorial (n-1),

has type int->int. To do so we must check that each clause has type int->int by checking for each clause that its pattern has type int and that its expression has type int. This is clearly true for the first clause of the definition. For the second, we assume that n has type int, then check that n * factorial (n-1) has type int. This is so because of the rules for the primitive arithmetic operations and because of our assumption that factorial has type int->int.

How are applications of recursive functions evaluated? The rules are almost the same as before, with one modification. We must arrange that all occurrences of the variable standing for the function are replaced by the function itself before we evaluate the body. That way all references to the variable standing for the function itself are indeed references to the function itself!

Suppose that we have the following recursive function binding

val rec var : typ = fn pat1 => exp1 | ... | patn => expn

and we wish to apply var to the value val of type typ. As before, we consider each clause in turn, until we find the first pattern pati matching val. We proceed, as before, by evaluating expi, replacing the variables in pati by the bindings determined by pattern matching, but, in addition, we replace all occurrences of the var by its binding in expi before continuing evaluation.

For example, to evaluate factorial 3, we proceed by retrieving the binding of factorial and evaluating
(fn 0=>1 | n:int => n*factorial(n-1))(3).

Considering each clause in turn, we find that the first doesn't match, but the second does. We therefore continue by evaluating its right-hand side, after replacing n by 3 and factorial by its definition, the expression n * factorial(n-1). We are left with the subproblem of evaluating the expression

3 * (fn 0=>1 | n:int => n*factorial(n-1))(2)

Proceeding as before, we reduce this to the subproblem of evaluating

3 * (2 * (fn 0=>1 | n:int => n*factorial(n-1))(1)),

which reduces to the subproblem of evaluating

3 * (2 * (1 * (fn 0=>1 | n:int => n*factorial(n-1))(0))),

which reduces to 3 * (2 * (1 * 1)), which then evaluates to 6.

Observe that the repeated substitution of factorial by its definition ensures that the recursive calls really do refer to the factorial function itself. Also observe that the size of the subproblems grows until there are no more recursive calls, at which point the computation can complete.

In broad outline, the computation proceeds as follows:

1. factorial 3
2. 3 * factorial 2
3. 3 * 2 * factorial 1
4. 3 * 2 * 1 * factorial 0
5. 3 * 2 * 1 * 1
6. 3 * 2 * 1
7. 3 * 2
8. 6

Notice that the size of the expression first grows (in direct proportion to the argument), then shrinks as the pending multiplications are completed. This growth in expression size corresponds directly to a growth in run-time storage required to record the state of the pending computation.
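The same growth-then-shrink pattern arises for any non-tail recursive definition; for instance, this sketch of integer exponentiation (our own example, in the style of factorial above) accumulates n pending multiplications before collapsing them:

```sml
(* power (b, n) computes b raised to the n-th power, for n >= 0. *)
fun power (_, 0) = 1
  | power (b:int, n:int) = b * power (b, n - 1)

(* Evaluation of power (2, 3) builds 2 * (2 * (2 * 1)) before
   performing any multiplication, then collapses it to 8. *)
val eight = power (2, 3)
```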
7.2 Iteration

The definition of factorial given above should be contrasted with the following two-part definition:

fun helper (0,r:int) = r
  | helper (n:int,r:int) = helper (n-1,n*r)
fun factorial (n:int) = helper (n,1)

First we define a "helper" function that takes two parameters, an integer argument and an accumulator that records the running partial result of the computation. The idea is that the accumulator re-associates the pending multiplications in the evaluation trace given above so that they can be performed prior to the recursive call, rather than after it completes. This reduces the space required to keep track of those pending steps. Second, we define factorial by calling helper with argument n and initial accumulator value 1, corresponding to the product of zero terms (empty prefix).

As a matter of programming style, it is usual to conceal the definitions of helper functions using a local declaration. In practice we would make the following definition of the iterative version of factorial:

local
  fun helper (0,r:int) = r
    | helper (n:int,r:int) = helper (n-1,n*r)
in
  fun factorial (n:int) = helper (n,1)
end

This way the helper function is not visible; only the function of interest is "exported" by the declaration.

The important thing to observe about helper is that it is iterative, or tail recursive, meaning that the recursive call is the last step of evaluation of an application of it to an argument. This means that the evaluation trace of a call to helper with arguments (3,1) has the following general form:

1. helper (3, 1)
2. helper (2, 3)
3. helper (1, 6)
4. helper (0, 6)
5. 6

Notice that there is no growth in the size of the expression because there are no pending computations to be resumed upon completion of the recursive call. Consequently, there is no growth in the space required for an application, in contrast to the first definition given above. Tail recursive definitions are analogous to loops in imperative languages: they merely iterate a computation, without requiring auxiliary storage.

7.3 Inductive Reasoning

Time and space usage are important, but what is more important is that the function compute the intended result. The key to the correctness of a recursive function is an inductive argument establishing its correctness. The critical ingredients are these:

1. An input-output specification of the intended behavior stating pre-conditions on the arguments and a post-condition on the result.

2. A proof that the specification holds for each clause of the function, assuming that it holds for any recursive calls.

3. An induction principle that justifies the correctness of the function as a whole, given the correctness of its clauses.

We'll illustrate the use of inductive reasoning by a graduated series of examples. First consider the simple, non-tail recursive definition of factorial given in section 7.1. One reasonable specification for factorial is as follows:

1. Pre-condition: n ≥ 0.
2. Post-condition: factorial n evaluates to n!.

We are to establish the following statement of correctness of factorial: if n ≥ 0, then factorial n evaluates to n!. That is, we show that the pre-conditions imply the post-condition holds of the result of any application. This is called a total correctness assertion because it states not only that the post-condition holds of any result of application, but, moreover, that every application in fact yields a result (subject to the pre-condition on the argument).
In contrast, a partial correctness assertion does not insist on termination, only that the post-condition holds whenever the application terminates. This may be stated as the assertion

if n ≥ 0 and factorial n evaluates to p, then p = n!.

Notice that this statement is true of a function that diverges whenever it is applied! In this sense a partial correctness assertion is weaker than a total correctness assertion.

Let us establish the total correctness of factorial using the pre- and post-conditions stated above. To do so, we apply the principle of mathematical induction on the argument n. Recall that this means we are to establish the specification for the case n = 0, and, assuming it to hold for n ≥ 0, show that it holds for n + 1.

The base case, n = 0, is trivial: by definition factorial n evaluates to 1, which is 0!. Now suppose that n = m + 1 for some m ≥ 0. By the inductive hypothesis we have that factorial m evaluates to m! (since m ≥ 0), and so by definition factorial n evaluates to n × m! = (m + 1) × m! = (m + 1)! = n!, as required. This completes the proof.

That was easy. What about the iterative definition of factorial? We focus on the behavior of helper. A suitable specification is given as follows:

1. Pre-condition: n ≥ 0.
2. Post-condition: helper (n, r) evaluates to n! × r.

To show the total correctness of helper with respect to this specification, we once again proceed by mathematical induction on n. We leave it as an exercise to give the details of the proof.

With this in hand it is easy to prove the correctness of factorial — if n ≥ 0 then factorial n evaluates to the result of helper (n, 1), which evaluates to n! × 1 = n!. This completes the proof.

Helper functions correspond to lemmas, main functions correspond to theorems. Just as we use lemmas to help us prove theorems, we use helper functions to help us define main functions. The foregoing argument shows that this is more than an analogy, but lies at the heart of good programming style.
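The specification for helper can also be spot-checked on sample arguments. Here is a small sketch; helper is repeated from the text so the check is self-contained, and fact is a reference implementation of n! introduced only for the comparison (it is not part of the text):

```sml
fun helper (0, r:int) = r
  | helper (n:int, r:int) = helper (n-1, n*r)

(* reference definition of n!, used only to state the check *)
fun fact 0 = 1
  | fact (n:int) = n * fact (n-1)

(* post-condition: helper (n, r) evaluates to n! * r *)
val ok = List.all (fn (n, r) => helper (n, r) = fact n * r)
                  [(0, 1), (1, 5), (3, 2), (5, 1)]
```

Of course such testing only samples the specification; the inductive proof is what establishes it for all n ≥ 0.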
Here's an example of a function defined by complete induction (or strong induction), the Fibonacci function, defined on integers n ≥ 0:

(* for n>=0, fib n yields the nth Fibonacci number *)
fun fib 0 = 1
  | fib 1 = 1
  | fib (n:int) = fib (n-1) + fib (n-2)

The recursive calls are made not only on n-1, but also on n-2, which is why we must appeal to complete induction to justify the definition.

This definition of fib is very inefficient because it performs many redundant computations: to compute fib n requires that we compute fib (n-1) and fib (n-2). To compute fib (n-1) requires that we compute fib (n-2) a second time, and fib (n-3). Computing fib (n-2) requires computing fib (n-3) again, and fib (n-4). As you can see, there is considerable redundancy here. It can be shown that the running time of fib is exponential in its argument, which is quite awful.

Here's a better solution: for each n ≥ 0 compute not only the nth Fibonacci number, but also the (n−1)st as well. (For n = 0 we define the "−1st" Fibonacci number to be zero.) That way we can avoid redundant recomputation, resulting in a linear-time algorithm. Here's the code:

(* for n>=0, fib' n evaluates to (a, b), where a is the nth
   Fibonacci number, and b is the (n-1)st *)
fun fib' 0 = (1, 0)
  | fib' 1 = (1, 1)
  | fib' (n:int) =
      let
        val (a:int, b:int) = fib' (n-1)
      in
        (a+b, a)
      end

You might feel satisfied with this solution since it runs in time linear in n. It turns out (see Graham, Knuth, and Patashnik, Concrete Mathematics (Addison-Wesley 1989) for a derivation) that the recurrence

F0 = 1
F1 = 1
Fn = Fn−1 + Fn−2
has a closed-form solution over the real numbers. This means that the nth Fibonacci number can be calculated directly, without recursion, by using floating point arithmetic, and indeed this is what one would do in practice. However, this is an unusual case. In most instances recursively-defined functions have no known closed-form solution, so that some form of iteration is inevitable.

7.4 Mutual Recursion

It is often useful to define two functions simultaneously, each of which calls the other (and possibly itself) to compute its result. Such functions are said to be mutually recursive. Here's a simple example to illustrate the point, namely testing whether a natural number is odd or even. The most obvious approach is to test whether the number is congruent to 0 mod 2. But to illustrate the idea of mutual recursion we instead use the following inductive characterization: 0 is even, and not odd; n > 0 is even iff n − 1 is odd; n > 0 is odd iff n − 1 is even. This may be coded up using two mutually-recursive procedures as follows:

fun even 0 = true
  | even n = odd (n-1)
and odd 0 = false
  | odd n = even (n-1)

Notice that even calls odd and odd calls even, so they are not definable separately from one another. We join their definitions using the keyword and to indicate that they are defined simultaneously by mutual recursion.
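For comparison, the congruence test mentioned above can be written directly, and it agrees with the mutually recursive definitions on the natural numbers. A quick sketch (the name even' is ours, introduced only for the comparison; the mutually recursive pair is repeated from the text):

```sml
fun even 0 = true
  | even n = odd (n-1)
and odd 0 = false
  | odd n = even (n-1)

(* the direct test: n is even iff it is congruent to 0 mod 2 *)
fun even' (n:int) = n mod 2 = 0

(* the two characterizations agree on these natural numbers *)
val agree =
  List.all (fn n => even n = even' n andalso odd n = not (even' n))
           [0, 1, 2, 3, 4, 5, 6, 7]
```

The direct test runs in constant time, whereas the inductive pair takes time proportional to n; the point of the inductive version is only to illustrate mutual recursion.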
Chapter 8

Type Inference and Polymorphism

8.1 Type Inference

So far we've mostly written our programs in what is known as the explicitly typed style. This means that whenever we've introduced a variable, we've assigned it a type at its point of introduction. In particular every variable in a pattern has a type associated with it. As you may have noticed, this gets a little tedious after a while, especially when you're using clausal function definitions. A particularly pleasant feature of ML is that it allows you to omit this type information whenever it can be determined from context. For example, there is no need to give a type to the variable s in the function fn s:string => s ^ "\n". The reason is that no other type for s makes sense, since s is used as an argument to string concatenation. Consequently, you may write simply fn s => s ^ "\n", leaving ML to insert ":string" for you. This process is known as type inference since the compiler is inferring the missing type information based on context.

When is it allowable to omit this information? Almost always, with very few exceptions. It is a deep, and important, result about ML that missing type information can (almost) always be reconstructed completely and unambiguously where it is omitted. This is called the principal typing property of ML: whenever type information is omitted, there is always a most general (i.e., least restrictive) way to recover the omitted type information. If there is no way to recover the omitted type information, then the expression is ill-typed. Otherwise there is a "best" way to fill in the blanks, which will (almost) always be found by the compiler. This is an amazingly useful, and widely under-appreciated, property of ML. It means, for example, that the programmer can enjoy the full benefits of a static type system without paying any notational penalty whatsoever!

The prototypical example is the identity function, fn x=>x. The body of the function places no constraints on the type of x, since it merely returns x as the result without performing any computation on it. Choosing the type of x to be typ, the type of the identity function is typ->typ, where typ is the type of the argument. In other words every type for the identity function has the form typ->typ. Therefore the identity function has infinitely many types, one for each choice of the type of the parameter x. Since the behavior of the identity function is the same for all possible choices of type for its argument, it is said to be polymorphic.

Clearly there is a pattern here, which is captured by the notion of a type scheme. A type scheme is a type expression involving one or more type variables standing for an unknown, but arbitrary type expression. Type variables are written 'a (pronounced "α"), 'b (pronounced "β"), 'c (pronounced "γ"), etc. An instance of a type scheme is obtained by replacing each of the type variables occurring in it with a type scheme, with the same type scheme replacing each occurrence of a given type variable. For example, the type scheme 'a->'a has instances int->int, string->string, (int*int)->(int*int), and ('b->'b)->('b->'b), among infinitely many others. However, it does not have the type int->string as instance, since we are constrained to replace all occurrences of a type variable by the same type scheme.

Type schemes are used to express the polymorphic behavior of functions. For example, we may write fn x:'a=>x for the polymorphic identity function of type 'a->'a, meaning that the behavior of the identity function is independent of the type of x. Similarly, the behavior of the function fn (x,y)=>x+1 is independent of the type of y, but constrains the type of x to be int. This may be expressed using type schemes by writing this function in the explicitly-typed form fn (x:int,y:'a)=>x+1 with type int*'a->int.
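The instances of the identity's type scheme can be observed directly at the interactive top level. The following sketch (bindings and sample values are ours, for illustration only) applies an unannotated identity function at several different instances of 'a->'a:

```sml
val id = fn x => x      (* principal type scheme 'a -> 'a *)

val a = id 0            (* instance int -> int *)
val b = id "hello"      (* instance string -> string *)
val c = id (3, 4)       (* instance (int * int) -> (int * int) *)
val d = (id id) 7       (* outer occurrence at (int -> int) -> (int -> int) *)
```

Each occurrence of id is assigned whatever instance its context demands, exactly as described below for expressions in general.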
In these examples we needed only one type variable to express the polymorphic behavior of a function, but usually we need more than one. For example, the function fn (x,y)=>x constrains neither the type of x nor the type of y. Consequently we may choose their types freely and independently of one another. This may be expressed by writing this function in the form fn (x:'a,y:'b)=>x with type scheme 'a*'b->'a. Notice that while it is correct to assign the type 'a*'a->'a to this function, doing so would be overly restrictive since the types of the two parameters need not be the same. However, we could not assign the type 'a*'b->'c to this function because the type of the result must be the same as the type of the first parameter: it returns its first parameter when invoked! The type scheme 'a*'b->'a precisely captures the constraints that must be satisfied for the function to be type correct. It is said to be the most general or principal type scheme for the function.

It is a remarkable fact about ML that every expression (with the exception of a few pesky examples that we'll discuss below) has a principal type scheme. That is, there is (almost) always a best or most general way to infer types for expressions that maximizes generality, and hence maximizes flexibility in the use of the expression. Every expression "seeks its own depth" in the sense that an occurrence of that expression is assigned a type that is an instance of its principal type scheme determined by the context of use. For example, if we write (fn x=>x)(0), the context forces the type of the identity function to be int->int, and if we write (fn x=>x)(fn x=>x)(0) the context forces the instance (int->int)->(int->int) of the principal type scheme for the identity at the first occurrence, and the instance int->int for the second.

How is this achieved? Type inference is a process of constraint satisfaction. First, the expression determines a set of equations governing the relationships among the types of its subexpressions. For example, if a function is applied to an argument, then a constraint equating the domain type of the function with the type of the argument is generated. Second, the constraints are solved using a process similar to Gaussian elimination, called unification. The equations can be classified by their solution sets as follows:

1. Over-constrained: there is no solution. This corresponds to a type error.
2. Under-constrained: there are many solutions. There are two sub-cases: ambiguous (due to overloading, which we will discuss further in section 8.3), or polymorphic (there is a "best" solution).

3. Uniquely determined: there is precisely one solution. This corresponds to a completely unambiguous type inference problem.

The free type variables in the solution to the system of equations may be thought of as determining the "degrees of freedom" or "range of polymorphism" of the type of an expression — the constraints are solvable for any choice of types to substitute for these free type variables.

This description of type inference as a constraint satisfaction procedure accounts for the notorious obscurity of type checking errors in ML. If a program is not type correct, then the system of constraints associated with it will not have a solution. The type inference procedure attempts to find a solution to these constraints, and at some point discovers that it cannot succeed. It is fundamentally impossible to attribute this inconsistency to any particular constraint; all that can be said is that the constraint set as a whole has no solution. The checker usually reports the first unsatisfiable equation it encounters, but this may or may not correspond to the "reason" (in the mind of the programmer) for the type error. The usual method for finding the error is to insert sufficient type information to narrow down the source of the inconsistency until the source of the difficulty is uncovered.

8.2 Polymorphic Definitions

There is an important interaction between polymorphic expressions and value bindings that may be illustrated by the following example. Suppose that we wish to bind the identity function to a variable I so that we may refer to it by name. We've previously observed that the identity function is polymorphic, with principal type scheme 'a->'a. This may be captured by ascribing this type scheme to the variable I at the val binding. That is, we may write

val I : 'a->'a = fn x=>x

to ascribe the type scheme 'a->'a to the variable I. (We may also write

fun I(x:'a):'a = x

for an equivalent binding of I.) Having done this, each use of I determines a distinct instance of the ascribed type scheme 'a->'a. That is, both I 0 and I I 0 are well-formed expressions, the first assigning the type int->int to I, the second assigning the types (int->int)->(int->int) and int->int to the two occurrences of I. Thus the variable I behaves precisely the same as its definition, fn x=>x, in any expression where it is used.

As a convenience ML also provides a form of type inference on value bindings that eliminates the need to ascribe a type scheme to the variable when it is bound. If no type is ascribed to a variable introduced by a val binding, then it is implicitly ascribed the principal type scheme of the right-hand side. For example, we may write

val I = fn x=>x

or

fun I(x) = x

as a binding for the variable I. The type checker implicitly assigns the principal type scheme, 'a->'a, of the binding to the variable I. In practice we often allow the type checker to infer the principal type of a variable, but it is often good form to explicitly indicate the intended type as a consistency check and for documentation purposes.

The treatment of val bindings ensures that a bound variable has precisely the same types as its binding. In other words the type checker behaves as though all uses of the bound variable are implicitly replaced by its binding before type checking. However, since this may involve replication of the binding, the meaning of a program is not necessarily preserved by this transformation. (Think, for example, of any expression that opens a window on your screen: if you replicate the expression and evaluate it twice, it will open two windows. This is not the same as evaluating it only once, which results in one window.) To ensure semantic consistency, variables introduced by a val binding are allowed to be polymorphic only if the right-hand side is a value. This is called the value restriction on polymorphic declarations. For fun bindings this restriction is always met since the right-hand side is implicitly a lambda expression, which is a value. For example, it might be thought that the following declaration introduces a polymorphic variable of type 'a -> 'a, but in fact it is rejected by the compiler:
val J = I I

The reason is that the right-hand side is not a value; it requires computation to determine its value. It is therefore ruled out as inadmissible for polymorphism; the variable J may not be used polymorphically in the remainder of the program. In this case the difficulty may be avoided by writing instead

fun J x = I I x

because now the binding of J is a lambda, which is a value.

In some rare circumstances this is not possible, and some polymorphism is lost. For example, the following declaration of a value of list type¹

val l = nil @ nil

does not introduce an identifier with a polymorphic type, even though the almost equivalent declaration val l = nil does do so. Since the right-hand side is a list, we cannot apply the "trick" of defining l to be a function; we are stuck with a loss of polymorphism in this case. This particular example is not very impressive, but occasionally similar examples do arise in practice.

Why is the value restriction necessary? Later on, when we study mutable storage, we'll see that some restriction on polymorphism is essential if the language is to be type safe. The value restriction is an easily-remembered sufficient condition for soundness, but as the examples above illustrate, it is by no means necessary. The designers of ML were faced with a choice of simplicity vs flexibility; in this case they opted for simplicity at the expense of some expressiveness in the language.

¹To be introduced in chapter 9.

8.3 Overloading

Type information cannot always be omitted. There are a few corner cases that create problems for type inference, most of which arise because of concessions that are motivated by long-standing, if dubious, notational practices. The main source of difficulty stems from overloading of arithmetic operators. As a concession to long-standing practice in informal mathematics and in many programming languages, the same notation is used for both integer and floating point arithmetic operations. As long as we are programming in an explicitly-typed style, this convention creates no particular problems. For example, in the function fn x:int => x+x it is clear that integer addition is called for, whereas in the function fn x:real => x+x it is equally obvious that floating point addition is intended. However, if we omit type information, then a problem arises. What are we to make of the function fn x => x+x? Does "+" stand for integer or floating point addition? There are two distinct reconstructions of the missing type information in this example, corresponding to the preceding two explicitly-typed programs. Which is the compiler to choose?

When presented with such a program, the compiler has two choices:

1. Declare the expression ambiguous, and force the programmer to provide enough explicit type information to resolve the ambiguity.

2. Arbitrarily choose a "default" interpretation, say the integer arithmetic, but issue a warning indicating that it has done so.

Each approach has its advantages and disadvantages. Many compilers choose the second approach.

The situation is actually a bit more subtle than the preceding discussion implies. The reason is that the type inference process makes use of the surrounding context of an expression to help resolve ambiguities. For example, if the expression fn x=>x+x occurs in the following, larger expression that forces one interpretation or another, there is in fact no ambiguity:

(fn x => x+x)(3)

Since the function is applied to an integer argument, there is no question that the only possible resolution of the missing type information is to treat x as having type int, and hence to treat + as integer addition. To avoid ambiguity, explicit type information is required from the programmer.
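In an explicitly-typed style, a single annotation on the parameter (or on the body) suffices to pin down the overloaded operator. A brief sketch of the possibilities (the binding names are ours, for illustration):

```sml
val doubleInt  = fn x:int  => x + x   (* + is integer addition *)
val doubleReal = fn x:real => x + x   (* + is floating point addition *)

(* annotating the result expression also resolves the overloading *)
val doubleInt' = fn x => (x + x) : int

val six   = doubleInt 3
val eight = doubleReal 4.0
```

Any one of these annotations gives the type checker enough information to choose between the integer and floating point interpretations of +.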
The important question is how much context is considered before the situation is considered ambiguous? The rule of thumb is that context is considered up to the nearest enclosing function declaration. For example, consider the following example:

let
  val double = fn x => x+x
in
  (double 3, double 4)
end

The function expression fn x=>x+x will be flagged as ambiguous, even though its only uses are with integer arguments. The reason is that value bindings are considered to be "units" of type inference for which all ambiguity must be resolved before type checking continues. The ambiguity must be resolved at the val binding, which means that the compiler must commit at that point to treating the addition operation as either integer or floating point. If your compiler adopts the integer interpretation as default, the above program will be accepted (with a warning), but the following one will be rejected:

let
  val double = fn x => x+x
in
  (double 3.0, double 4.0)
end

Finally, note that the following program must be rejected because no resolution of the overloading of addition can render it meaningful:

let
  val double = fn x => x+x
in
  (double 3, double 3.0)
end

No single choice can be correct, since we subsequently use double at both types.

A closely-related source of ambiguity arises from the "record elision" notation described in chapter 5. Consider the function #name, defined by

fun #name {name=n:string,...} = n

which selects the name field of a record. This definition is ambiguous because the compiler cannot uniquely determine the domain type of the function! Any of the following types are legitimate domain types for #name, none of which is "best":

{name:string}
{name:string,address:string}
{name:string,salary:int}
{name:string,salary:real}
{name:string,address:string,salary:int}

Of course there are infinitely many such examples, none of which is clearly preferable to the other. This function definition is therefore rejected as ambiguous by the compiler — there is no one interpretation of the function that suffices for all possible uses.

In chapter 5 we mentioned that functions such as #name are pre-defined by the ML compiler, yet we just now claimed that such a function definition is rejected as ambiguous. Isn't this a contradiction? Not really, because what happens is that each occurrence of #name is replaced by the function

fn {name=n,...} => n

and then context is used to resolve the "local" ambiguity. This works well, provided that the complete record type of the arguments to #name can be determined from context. If not, the uses are rejected as ambiguous. Thus, the following expression is well-typed

fn r : {name:string,address:string,salary:int} => (#name r, #address r)

because the record type of r is explicitly given. If the type of r were omitted, the expression would be rejected as ambiguous (unless the context resolves the ambiguity).
Chapter 9

Programming with Lists

9.1 List Primitives

In chapter 5 we noted that aggregate data structures are especially easy to handle in ML. In this chapter we consider another important aggregate type, the list type. In addition to being an important form of aggregate type it also illustrates two other general features of the ML type system:

1. Type constructors, or parameterized types. The type of a list reveals the type of its elements.

2. Recursive types. The set of values of a list type are given by an inductive definition.

Thus list is a kind of "function" mapping types to types: given a type typ, we may apply list to it to get another type, written typ list. The type expression typ list is a postfix notation for the application of the type constructor list to the type typ. Informally, the values of type typ list are the finite lists of values of type typ. More precisely, the values of type typ list are given by an inductive definition, as follows:

1. nil is a value of type typ list.

2. if h is a value of type typ, and t is a value of type typ list, then h::t is a value of type typ list.

3. Nothing else is a value of type typ list.

The forms nil and :: are the value constructors of type typ list. The nullary (no argument) constructor nil
may be thought of as the empty list. The binary (two argument) constructor :: constructs a non-empty list from a value h of type typ and another value t of type typ list; the resulting value, h::t, of type typ list is pronounced "h cons t" (for historical reasons). We say that "h is cons'd onto t", that h is the head of the list, and that t is its tail.

The definition of the values of type typ list given above is an example of an inductive definition. The type is said to be recursive because this definition is "self-referential" in the sense that the values of type typ list are defined in terms of (other) values of the same type. Two things are notable here:

1. The :: operation takes as its second argument a value of type typ list, and yields a result of type typ list. This self-referential aspect is characteristic of an inductive definition.

2. Both nil and op :: are polymorphic in the type of the underlying elements of the list. Thus nil is the empty list of type typ list for any element type typ, and op :: constructs a non-empty list independently of the type of the elements of that list.

This is especially clear if we examine the types of the value constructors for the type typ list:

val nil : typ list
val (op ::) : typ * typ list -> typ list

The notation op :: is used to refer to the :: operator as a function, rather than to use it to form a list, which requires infix notation.

It is easy to see that a value val of type typ list has the form

val1 :: (val2 :: (... :: (valn :: nil) ...))

for some n ≥ 0, where vali is a value of type typ for each 1 ≤ i ≤ n. For according to the inductive definition of the values of type typ list, the value val must either be nil, which is of the above form, or val1 :: val', where val' is a value of type typ list. By induction val' has the form (val2 :: (... :: (valn :: nil) ...)) and hence val again has the specified form.

By convention the operator :: is right-associative, so we may omit the parentheses and just write

val1 :: val2 :: ... :: valn :: nil

as the general form of val of type typ list. This may be further abbreviated using list notation, writing

[ val1, val2, ..., valn ]

for the same list. This notation emphasizes the interpretation of lists as finite sequences of values, but it obscures the fundamental inductive character of lists as being built up from nil using the :: operation.

9.2 Computing With Lists

How do we compute with values of list type? Since the values are defined inductively, it is natural that functions on lists be defined recursively, using a clausal definition that analyzes the structure of a list. Here's a definition of the function length that computes the number of elements of a list:

fun length nil = 0
  | length (_::t) = 1 + length t

The definition is given by induction on the structure of the list argument. The base case is the empty list, nil. The inductive step is the non-empty list _::t (notice that we do not need to give a name to the head). Its definition is given in terms of the tail of the list t, which is "smaller" than the list _::t. The type of length is 'a list -> int; it is defined for lists of values of any type whatsoever.

We may define other functions following a similar pattern. Here's the function to append two lists:

fun append (nil, l) = l
  | append (h::t, l) = h :: append (t, l)

This function is built into ML; it is written using infix notation as exp1 @ exp2. The running time of append is proportional to the length of the first list, as should be obvious from its definition.

Here's a function to reverse a list:

fun rev nil = nil
  | rev (h::t) = rev t @ [h]

Its running time is O(n²), where n is the length of the argument list. This can be demonstrated by writing down a recurrence that defines the running time T(n) on a list of length n:

T(0) = O(1)
T(n+1) = T(n) + O(n)
88 Programming with Lists Solving the recurrence we obtain the result T (n) = O(n2 ). and 2.a) evaluates to a. The correctness of functions deﬁned on lists may be established using the principle of structural induction. there are no preconditions on l and a. it sufﬁces to show 1. which is equivalent to (rev t) @ [h] @ a. we can take advantage of the nonassociativity of :: to give a tailrecursive deﬁnition of rev. a) evaluates to the result of appending a to the reversal of l. To show that a function works correctly for every list l. By the inductive hypothesis this is just (rev t) @ (h :: a). That is. 2002 . We illustrate this by establishing that the function helper satisﬁes the following speciﬁcation: for every l and a of type typ list. The proof is by structural induction on the list l. The correctness of the function for h::t. (h::a)). If l is nil. a) = a  helper (h::t. a) reduces to the value of helper (t. That is. a) = helper (t. If l is the list h::t. Notice that helper runs in time proportional to the length of its ﬁrst argument. helper(l. The principle of structural induction may be summarized as follows. a) yields (rev l) @ a. The correctness of the function for the empty list. W ORKING D RAFT D ECEMBER 9. But this is just rev (h::t) @ a. a) evaluates to (rev l) @ a. local fun helper (nil. except that by reordering the applications of :: we reverse the list! The helper function reverses its ﬁrst argument and prepends it to its second argument. which was to be shown. then the application helper (l. nil. and we establish the postcondition that helper (l. which fulﬁlls the postcondition. h::a) in fun rev’ l = helper (l. helper (l. nil) end The general idea of introducing an accumulator is the same as before. where we assume here an independent deﬁnition of rev for the sake of the speciﬁcation. and hence rev’ runs in time proportional to the length of its argument. assuming its correctness for t. Can we do better? Oddly. then helper (l.
As with mathematical induction over the natural numbers, structural induction over lists allows us to focus on the basic and incremental behavior of a function to establish its correctness for all lists.
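To connect the specification to practice, here is a small self-contained check of the accumulator-based reversal defined above (a sketch; the numeric list is illustrative only):

```sml
(* Accumulator-based reversal, as defined above. *)
local
  fun helper (nil, a) = a
    | helper (h::t, a) = helper (t, h::a)
in
  fun rev' l = helper (l, nil)
end

(* helper ([1,2,3], nil) steps through
   helper ([2,3], [1]), helper ([3], [2,1]), helper (nil, [3,2,1]). *)
val r = rev' [1,2,3]   (* evaluates to [3,2,1] *)
```

Each step moves one element from the first argument to the accumulator, which is exactly why the running time is proportional to the length of the argument.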
Chapter 10

Concrete Data Types

10.1 Datatype Declarations

Lists are one example of the general notion of a recursive type. ML provides a general mechanism, the datatype declaration, for introducing programmer-defined recursive types. Earlier we introduced type declarations as an abbreviation mechanism. Types are given names as documentation and as a convenience to the programmer, but doing so is semantically inconsequential; one could replace all uses of the type name by its definition and not affect the behavior of the program. In contrast the datatype declaration provides a means of introducing a new type that is distinct from all other types and that does not merely stand for some other type. It is the means by which the ML type system may be extended by the programmer.

The datatype declaration in ML has a number of facets. A datatype declaration introduces

1. One or more new type constructors. The type constructors introduced may, or may not, be mutually recursive. The type constructors may take zero or more arguments; a zero-argument, or nullary, type constructor is just a type.

2. One or more new value constructors for each of the type constructors introduced by the declaration. Each value constructor may also take zero or more arguments; a nullary value constructor is just a constant.

The type and value constructors introduced by the declaration are "new" in the sense that they are distinct from all other type and value constructors previously introduced. In particular, if a datatype redefines an "old" type or value
constructor, then the old definition is shadowed by the new one, rendering the old ones inaccessible in the scope of the new definition.

10.2 Non-Recursive Datatypes

Here's a simple example of a nullary type constructor with four nullary value constructors:

datatype suit = Spades | Hearts | Diamonds | Clubs

This declaration introduces a new type suit with four nullary value constructors, Spades, Hearts, Diamonds, and Clubs. This declaration may be read as introducing a type suit such that a value of type suit is either Spades, or Hearts, or Diamonds, or Clubs. It is conventional to capitalize the names of value constructors, but this is not required by the language.

There is no significance to the ordering of the constructors in the declaration; we could just as well have written

datatype suit = Hearts | Diamonds | Spades | Clubs

(or any other ordering, for that matter).

Given the declaration of the type suit, we may define functions on it by case analysis on the value constructors using a clausal function definition. For example, we may define the suit ordering in the card game of bridge by the function

fun outranks (Spades, Spades) = false
  | outranks (Spades, _) = true
  | outranks (Hearts, Spades) = false
  | outranks (Hearts, Hearts) = false
  | outranks (Hearts, _) = true
  | outranks (Diamonds, Clubs) = true
  | outranks (Diamonds, _) = false
  | outranks (Clubs, _) = false

This defines a function of type suit * suit -> bool that determines whether or not the first suit outranks the second.

Data types may be parameterized by a type. For example, the declaration

datatype 'a option = NONE | SOME of 'a

introduces the unary type constructor 'a option with two value constructors, NONE, with no arguments, and SOME, with one. The values of type typ option are
1. The constant NONE, and

2. Values of the form SOME val, where val is a value of type typ.

For example, some values of type string option are NONE, SOME "abc", and SOME "def". The option type constructor is predefined in Standard ML.

One common use of option types is to handle functions with an optional argument. For example, here is a function to compute the base-b exponential function for natural number exponents that defaults to base 2:

fun expt (NONE, n) = expt (SOME 2, n)
  | expt (SOME b, 0) = 1
  | expt (SOME b, n) =
      if n mod 2 = 0 then
         expt (SOME (b*b), n div 2)
      else
         b * expt (SOME b, n-1)

The advantage of the option type in this sort of situation is that it avoids the need to make a special case of a particular argument, e.g., using 0 as first argument to mean "use the default exponent".

A related use of option types is in aggregate data structures. For example, an address book entry might have a record type with fields for various bits of data about a person. But not all data is relevant to all people. For example, someone may not have a spouse, but they all have a name. For this we might use a type definition of the form

type entry = { name:string, spouse:string option }

so that one would create an entry for an unmarried person with a spouse field of NONE.

Option types may also be used to represent an optional result. For example, we may wish to define a function reciprocal that returns the reciprocal of an integer, if it has one, and otherwise indicates that it has no reciprocal. This is achieved by defining reciprocal to have type int -> int option as follows:

fun reciprocal 0 = NONE
  | reciprocal n = SOME (1 div n)

To use the result of a call to reciprocal we must perform a case analysis of the form
case reciprocal exp
  of NONE => exp1
   | SOME r => exp2

where exp1 covers the case that exp has no reciprocal, and exp2 covers the case that exp has reciprocal r.

10.3 Recursive Datatypes

The next level of generality is the recursive type definition. For example, one may define a type typ tree of binary trees with values of type typ at the nodes using the following declaration:

datatype 'a tree = Empty | Node of 'a tree * 'a * 'a tree

This declaration corresponds to the informal definition of binary trees with values of type typ at the nodes:

1. The empty tree Empty is a binary tree.

2. If tree1 and tree2 are binary trees, and val is a value of type typ, then Node (tree1, val, tree2) is a binary tree.

3. Nothing else is a binary tree.

The distinguishing feature of this definition is that it is recursive in the sense that binary trees are constructed out of other binary trees, with the empty tree serving as the base case. (Incidentally, a leaf in a binary tree is here represented as a node both of whose children are the empty tree. This definition of binary trees is analogous to starting the natural numbers with zero. One can think of the children of a node in a binary tree as the "predecessors" of that node, the only difference compared to the usual definition of predecessor being that a node has two, rather than one, predecessors.)

To compute with a recursive type, use a recursive function. For example, here is the function to compute the height of a binary tree:

fun height Empty = 0
  | height (Node (lft, _, rht)) = 1 + max (height lft, height rht)
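To make the definition concrete, here is a sketch that builds a two-level tree and computes its height. The text relies on a prior definition of max; Int.max is a plausible stand-in, and the integer labels are illustrative only:

```sml
datatype 'a tree = Empty | Node of 'a tree * 'a * 'a tree

(* The text assumes a prior definition of max; Int.max fits. *)
val max = Int.max

fun height Empty = 0
  | height (Node (lft, _, rht)) = 1 + max (height lft, height rht)

(* A root labelled 2 with two leaves, each a node whose
   children are both the empty tree. *)
val t = Node (Node (Empty, 1, Empty), 2, Node (Empty, 3, Empty))
val h = height t   (* evaluates to 2 *)
```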
Notice that height is called recursively on the children of a node, and is defined outright on the empty tree. The function height is said to be defined by induction on the structure of a tree. The general idea is to define the function directly for the base cases of the recursive type (i.e., value constructors with no arguments or whose arguments do not involve values of the type being defined), and to define it for non-base cases in terms of its definitions for the constituent values of that type. We will see numerous examples of this as we go along.

Here's another example. The size of a binary tree is the number of nodes occurring in it. Here's a straightforward definition in ML:

fun size Empty = 0
  | size (Node (lft, _, rht)) = 1 + size lft + size rht

The function size is defined by structural induction on trees. This pattern of definition is another instance of structural induction (on the tree type).

A word of warning. One reason to capitalize value constructors is to avoid a pitfall in the ML syntax that we mentioned in chapter 2. Suppose we gave the following definition of size:

fun size empty = 0
  | size (Node (lft, _, rht)) = 1 + size lft + size rht

The compiler will warn us that the second clause of the definition is redundant! Why? Because empty, spelled with a lower-case "e", is a variable, not a constructor, and hence matches any tree whatsoever. Consequently the second clause never applies. By capitalizing constructors we can hope to make mistakes such as these more evident, but in practice you are bound to run into this sort of mistake.

The tree data type is appropriate for binary trees: those for which every node has exactly two children. (Of course, either or both children might be the empty tree, so we may consider this to define the type of trees with at most two children; it's a matter of terminology which interpretation you prefer.) It should be obvious how to define the type of ternary trees, whose nodes have at most three children, and so on for other fixed arities. But what if we wished to define a type of trees with a variable number of children? In a so-called variadic tree some nodes might have three children, some might have two, and so on. This can be achieved in at least two ways. One way combines lists and trees, as follows:
datatype 'a tree = Empty | Node of 'a * 'a tree list
Each node has a list of children, so that distinct nodes may have different numbers of children. Notice that the empty tree is distinct from the tree with one node and no children because there is no data associated with the empty tree, whereas there is a value of type 'a at each node.

Another approach is to simultaneously define trees and "forests". A variadic tree is either empty, or a node gathering a "forest" to form a tree; a forest is either empty or a variadic tree together with another forest. This leads to the following definition:

datatype 'a tree = Empty | Node of 'a * 'a forest
and 'a forest = Nil | Cons of 'a tree * 'a forest

This example illustrates the introduction of two mutually recursive datatypes. Mutually recursive datatypes beget mutually recursive functions. Here's a definition of the size (number of nodes) of a variadic tree:

fun size_tree Empty = 0
  | size_tree (Node (_, f)) = 1 + size_forest f
and size_forest Nil = 0
  | size_forest (Cons (t, f')) = size_tree t + size_forest f'
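For instance, reusing the mutually recursive declarations just given, a root with two single-node children has size 3 (a small illustrative check; the integer labels are arbitrary):

```sml
datatype 'a tree = Empty | Node of 'a * 'a forest
and 'a forest = Nil | Cons of 'a tree * 'a forest

fun size_tree Empty = 0
  | size_tree (Node (_, f)) = 1 + size_forest f
and size_forest Nil = 0
  | size_forest (Cons (t, f')) = size_tree t + size_forest f'

(* A root labelled 0 whose forest holds two childless nodes. *)
val t = Node (0, Cons (Node (1, Nil), Cons (Node (2, Nil), Nil)))
val n = size_tree t   (* evaluates to 3 *)
```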
Notice that we define the size of a tree in terms of the size of a forest, and vice versa, just as the type of trees is defined in terms of the type of forests.

Many other variations are possible. Suppose we wish to define a notion of binary tree in which data items are associated with branches, rather than nodes. Here's a datatype declaration for such trees:

datatype 'a tree = Empty | Node of 'a branch * 'a branch
and 'a branch = Branch of 'a * 'a tree
In contrast to our first definition of binary trees, in which the branches from a node to its children were implicit, we now make the branches themselves explicit, since data is attached to them. For example, we can collect into a list the data items labelling the branches of such a tree using the following code:

fun collect Empty = nil
  | collect (Node (Branch (ld, lt), Branch (rd, rt))) =
      ld :: rd :: (collect lt) @ (collect rt)
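As a small illustration of collect (reusing the branch-labelled declarations above; the string labels are arbitrary):

```sml
datatype 'a tree = Empty | Node of 'a branch * 'a branch
and 'a branch = Branch of 'a * 'a tree

fun collect Empty = nil
  | collect (Node (Branch (ld, lt), Branch (rd, rt))) =
      ld :: rd :: (collect lt) @ (collect rt)

(* One node whose left and right branches carry "L" and "R"
   and lead to empty subtrees. *)
val t = Node (Branch ("L", Empty), Branch ("R", Empty))
val labels = collect t   (* evaluates to ["L", "R"] *)
```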
10.4 Heterogeneous Data Structures
Returning to the original definition of binary trees (with data items at the nodes), observe that the type of the data items at the nodes must be the same for every node of the tree. For example, a value of type int tree has an integer at every node, and a value of type string tree has a string at every node. Therefore an expression such as

Node (Empty, 43, Node (Empty, "43", Empty))

is ill-typed. The type system insists that trees be homogeneous in the sense that the type of the data items is the same at every node.

It is quite rare to encounter heterogeneous data structures in real programs. For example, a dictionary with strings as keys might be represented as a binary search tree with strings at the nodes; there is no need for heterogeneity to represent such a data structure. But occasionally one might wish to work with a heterogeneous tree, whose data values at each node are of different types. How would one represent such a thing in ML?

To discover the answer, first think about how one might manipulate such a data structure. When accessing a node, we would need to check at run-time whether the data item is an integer or a string; otherwise we would not know whether to, say, add 1 to it, or concatenate "1" to the end of it. This suggests that the data item must be labelled with sufficient information so that we may determine the type of the item at run-time. We must also be able to recover the underlying data item itself so that familiar operations (such as addition or string concatenation) may be applied to it.

The required labelling and discrimination is neatly achieved using a datatype declaration. Suppose we wish to represent the type of integer-or-string trees. First, we define the type of values to be integers or strings, marked with a constructor indicating which:
datatype int_or_string = Int of int | String of string

Then we define the type of interest as follows:

type int_or_string_tree = int_or_string tree
Voila! Perfectly natural and easy — heterogeneity is really a special case of homogeneity!
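To see this in action, here is a sketch in which the previously ill-typed mixture becomes well-typed once each item is marked; the show function is an illustrative helper, not part of the text:

```sml
datatype 'a tree = Empty | Node of 'a tree * 'a * 'a tree
datatype int_or_string = Int of int | String of string
type int_or_string_tree = int_or_string tree

(* The tree that was rejected before is accepted once each
   data item carries its marker. *)
val t : int_or_string_tree =
    Node (Empty, Int 43, Node (Empty, String "43", Empty))

(* Case analysis on the marker recovers the underlying item. *)
fun show (Int n) = Int.toString n
  | show (String s) = s
```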
10.5 Abstract Syntax
Datatype declarations and pattern matching are extremely useful for defining and manipulating the abstract syntax of a language. For example, we may define a small language of arithmetic expressions using the following declaration:

datatype expr =
   Numeral of int
 | Plus of expr * expr
 | Times of expr * expr

This definition has only three clauses, but one could readily imagine adding others. Here is the definition of a function to evaluate expressions of the language of arithmetic expressions written using pattern matching:

fun eval (Numeral n) = Numeral n
  | eval (Plus (e1, e2)) =
      let
         val Numeral n1 = eval e1
         val Numeral n2 = eval e2
      in
         Numeral (n1+n2)
      end
  | eval (Times (e1, e2)) =
      let
         val Numeral n1 = eval e1
         val Numeral n2 = eval e2
      in
         Numeral (n1*n2)
      end
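For example, evaluating 1 + (2 * 3) with this definition proceeds as follows (a small check reusing the declarations above):

```sml
datatype expr =
   Numeral of int
 | Plus of expr * expr
 | Times of expr * expr

fun eval (Numeral n) = Numeral n
  | eval (Plus (e1, e2)) =
      let val Numeral n1 = eval e1
          val Numeral n2 = eval e2
      in Numeral (n1+n2) end
  | eval (Times (e1, e2)) =
      let val Numeral n1 = eval e1
          val Numeral n2 = eval e2
      in Numeral (n1*n2) end

(* The abstract syntax of 1 + (2 * 3). *)
val e = Plus (Numeral 1, Times (Numeral 2, Numeral 3))
val v = eval e   (* evaluates to Numeral 7 *)
```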
The combination of datatype declarations and pattern matching contributes enormously to the readability of programs written in ML. A less obvious, but more important, benefit is the error checking that the compiler can perform for you if you use these mechanisms in tandem.

As an example, suppose that we extend the type expr with a new component for the reciprocal of a number, yielding the following revised definition:

datatype expr =
   Numeral of int
 | Plus of expr * expr
 | Times of expr * expr
 | Recip of expr

First, observe that the "old" definition of eval is no longer applicable to values of type expr! For example, the expression eval (Plus (Numeral 1, Numeral 2)) is ill-typed, even though it doesn't use the Recip constructor. The reason is that the redeclaration of expr introduces a "new" type that just happens to have the same name as the "old" type, but is in fact distinct from it. This is a boon because it reminds us to recompile the old code relative to the new definition of the expr type.

Second, upon recompiling the definition of eval we encounter an inexhaustive match warning: the old code no longer applies to every value of type expr according to its new definition! We are of course lacking a case for Recip, which we may provide as follows:

fun eval (Numeral n) = Numeral n
  | eval (Plus (e1, e2)) = ... as before ...
  | eval (Times (e1, e2)) = ... as before ...
  | eval (Recip e) =
      let val Numeral n = eval e in Numeral (1 div n) end
The value of the checks provided by the compiler in such cases cannot be overestimated. When recompiling a large program after making a change to a datatype declaration, the compiler will automatically point out every line of code that must be changed to conform to the new definition; it is
impossible to forget to attend to even a single case. This is a tremendous help to the developer, especially if she is not the original author of the code being modified, and is another reason why the static type discipline of ML is a positive benefit, rather than a hindrance, to programmers.
Chapter 11
Higher-Order Functions
11.1 Functions as Values
Values of function type are first-class, which means that they have the same rights and privileges as values of any other type. In particular, functions may be passed as arguments and returned as results of other functions, and functions may be stored in and retrieved from data structures such as lists and trees. We will see that first-class functions are an important source of expressive power in ML.

Functions which take functions as arguments or yield functions as results are known as higher-order functions (or, less often, as functionals or operators). Higher-order functions arise frequently in mathematics. For example, the differential operator is the higher-order function that, when given a (differentiable) function on the real line, yields its first derivative as a function on the real line. We also encounter functionals mapping functions to real numbers, and real numbers to functions. An example of the former is provided by the definite integral viewed as a function of its integrand, and an example of the latter is the definite integral of a given function on the interval [a, x], viewed as a function of a, that yields the area under the curve from a to x as a function of x.

Higher-order functions are less familiar tools for many programmers since the best-known programming languages have only rudimentary mechanisms to support their use. In contrast higher-order functions play a prominent role in ML, with a variety of interesting applications. Their use may be classified into two broad categories:

1. Abstracting patterns of control. Higher-order functions are design patterns that "abstract out" the details of a computation to lay bare the
skeleton of the solution. The skeleton may be fleshed out to form a solution of a problem by applying the general pattern to arguments that isolate the specific problem instance.

2. Staging computation. It arises frequently that computation may be staged by expending additional effort "early" to simplify the computation of "later" results. Staging can be used both to improve efficiency and, as we will see later, to control sharing of computational resources.

11.2 Binding and Scope

Before discussing these programming techniques, we will review the critically important concept of scope as it applies to function definitions. Recall that Standard ML is a statically scoped language, meaning that identifiers are resolved according to the static structure of the program. A use of the variable var is considered to be a reference to the nearest lexically enclosing declaration of var. We say "nearest" because of the possibility of shadowing; if we redeclare a variable var, then subsequent uses of var refer to the "most recent" (lexically!) declaration of it; any "previous" declarations are temporarily shadowed by the latest one.

This principle is easy to apply when considering sequences of declarations. For example, it should be clear by now that the variable y is bound to 32 after processing the following sequence of declarations:

val x = 2    (* x=2 *)
val y = x*x  (* y=4 *)
val x = y*x  (* x=8 *)
val y = x*y  (* y=32 *)

In the presence of function definitions the situation is the same, but it can be a bit tricky to understand at first. Here's an example to test your grasp of the lexical scoping principle:

val x = 2
fun f y = x+y
val x = 3
val z = f 4

After processing these declarations the variable z is bound to 6, not to 7! The reason is that the occurrence of x in the body of f refers to the first
declaration of x since it is the nearest lexically enclosing declaration of the occurrence, even though it has been subsequently redeclared. This example illustrates three important points:

1. Binding is not assignment! If we were to view the second binding of x as an assignment statement, then the value of z would be 7, not 6.

2. Scope resolution is lexical, not temporal. We sometimes refer to the "most recent" declaration of a variable, which has a temporal flavor, but we always mean "nearest lexically enclosing at the point of occurrence".

3. "Shadowed" bindings are not lost. The "old" binding for x is still available (through calls to f), even though a more recent binding has shadowed it.

One way to understand what's going on here is through the concept of a closure, a technique for implementing higher-order functions. When a function expression is evaluated, a copy of the environment is attached to the function. Subsequently, all free variables of the function (i.e., those variables not occurring as parameters) are resolved with respect to the environment attached to the function; the function is therefore said to be "closed" with respect to the attached environment. This is achieved at function application time by "swapping" the attached environment of the function for the environment active at the point of the call. The swapped environment is restored after the call is complete.

Returning to the example above, the environment associated with the function f contains the declaration val x = 2 to record the fact that at the time the function was evaluated, the variable x was bound to the value 2. The variable x is subsequently re-bound to 3, but when f is applied, we temporarily reinstate the binding of x to 2, add a binding of y to 4, then evaluate the body of the function, yielding 6. We then restore the binding of x and drop the binding of y before yielding the result.

11.3 Returning Functions

While seemingly very simple, the principle of lexical scope is the source of considerable expressive power. We'll demonstrate this through a series of examples.

To warm up let's consider some simple examples of passing functions as arguments and yielding functions as results. The standard example of
passing a function as argument is the map' function, which applies a given function to every element of a list. It is defined as follows:

fun map' (f, nil) = nil
  | map' (f, h::t) = (f h) :: map' (f, t)

For example, the application map' (fn x => x+1, [1,2,3,4]) evaluates to the list [2,3,4,5].

Functions may also yield functions as results. What is surprising is that we can create new functions during execution, not just return functions that have been previously defined. The most basic (and deceptively simple) example is the function constantly that creates constant functions: given a value k, the application constantly k yields a function that yields k whenever it is applied. Here's a definition of constantly:

val constantly = fn k => (fn a => k)

The function constantly has type 'a -> ('b -> 'a). We used the fn notation for clarity, but the declaration of the function constantly may also be written using fun notation as follows:

fun constantly k a = k

Note well that a white space separates the two successive arguments to constantly! The meaning of this declaration is precisely the same as the earlier definition using fn notation.

The value of the application constantly 3 is the function that is constantly 3; i.e., it always yields 3 when applied. Yet nowhere have we defined the function that always yields 3. The resulting function is "created" by the application of constantly to the argument 3, rather than merely "retrieved" off the shelf of previously-defined functions.

In implementation terms the result of the application constantly 3 is a closure consisting of the function fn a => k with the environment val k = 3 attached to it. The closure is a data structure (a pair) that is created by each application of constantly to an argument; the underlying function is always fn a => k. Notice, however, that the only difference between any two results of applying the function constantly lies in the attached environment. If we think of the lambda as the "executable code" of the function,
then this amounts to the observation that no new code is created at run-time, just new instances of existing code.

This also points out why functions in ML are not the same as code pointers in C. You may be familiar with the idea of passing a pointer to a C function to another C function as a means of passing functions as arguments or yielding functions as results. This may be considered to be a form of "higher-order" function in C. The non-varying part of the closure, the code, is directly analogous to a function pointer in C, but there is no counterpart in C of the varying part of the closure, the dynamic environment. It must be emphasized that code pointers are significantly less expressive than closures because in C there are only statically many possibilities for a code pointer (it must point to one of the functions defined in your code), whereas in ML we may generate dynamically many different instances of a function, differing in the bindings of the variables in its environment.

Often it occurs that we wish to map the same function across several different lists, with the list argument varying each time. It is inconvenient (and a tad inefficient) to keep passing the same function to map'. Instead we would prefer to create an instance of map specialized to the given function that can then be applied to many different lists. This leads to the following definition of the function map:

fun map f nil = nil
  | map f (h::t) = (f h) :: (map f t)

The function map so defined has type ('a -> 'b) -> 'a list -> 'b list. It takes a function of type 'a -> 'b as argument, and yields another function of type 'a list -> 'b list as result.

The passage from map' to map is called currying. We have changed a two-argument function (more properly, a function taking a pair as argument) into a function that takes two arguments in succession, yielding after the first a function that takes the second as its sole argument. This passage can be codified as follows:

fun curry f x y = f (x, y)

The type of curry is ('a * 'b -> 'c) -> ('a -> ('b -> 'c)). Given a two-argument function, curry returns another function that, when applied to the first argument, yields a function that, when applied to the
second, applies the original two-argument function to the first and second arguments, given separately.

Observe that map may be alternately defined by the binding

fun map f l = curry map' f l

Applications are implicitly left-associated, so that this definition is equivalent to the more verbose declaration

fun map f l = ((curry map') f) l

11.4 Patterns of Control

We turn now to the idea of abstracting patterns of control. There is an obvious similarity between the following two functions, one to add up the numbers in a list, the other to multiply them.

fun add_up nil = 0
  | add_up (h::t) = h + add_up t

fun mul_up nil = 1
  | mul_up (h::t) = h * mul_up t

What precisely is the similarity? We will look at it from two points of view. One view is that in each case we have a binary operation and a unit element for it. The result on the empty list is the unit element, and the result on a non-empty list is the operation applied to the head of the list and the result on the tail. This pattern can be abstracted as the function reduce defined as follows:

fun reduce (unit, opn, nil) = unit
  | reduce (unit, opn, h::t) = opn (h, reduce (unit, opn, t))

Here is the type of reduce:

val reduce : 'b * ('a * 'b -> 'b) * 'a list -> 'b

The first argument is the unit element, the second is the operation, and the third is the list of values. Notice that the type of the operation admits the possibility of the first argument having a different type from the second argument and result.

Using reduce, we may redefine add_up and mul_up as follows:
fun add_up l = reduce (0, op +, l)
fun mul_up l = reduce (1, op *, l)

To further check your understanding, consider the following declaration:

fun mystery l = reduce (nil, op ::, l)

(Recall that "op ::" is the function of type 'a * 'a list -> 'a list that adds a given value to the front of a list.) What function does mystery compute?

Another view of the commonality between add_up and mul_up is that they are both defined by induction on the structure of the list argument, with a base case for nil, and an inductive case for h::t, defined in terms of its behavior on t. But this is really just another way of saying that they are defined in terms of a unit element and a binary operation! The difference is one of perspective: whether we focus on the pattern part of the clauses (the inductive decomposition) or the result part of the clauses (the unit and operation). Said another way, the function reduce abstracts the pattern of defining a function by induction on the structure of a list. The recursive structure of add_up and mul_up is abstracted by the reduce functional, which is then specialized to yield add_up and mul_up.

The definition of reduce leaves something to be desired. One thing to notice is that the arguments unit and opn are carried unchanged through the recursion; only the list parameter changes on recursive calls. While this might seem like a minor overhead, it's important to remember that multi-argument functions are really single-argument functions that take a tuple as argument. This means that each time around the loop we are constructing a new tuple whose first and second components remain fixed, but whose third component varies. Is there a better way? Here's another definition that isolates the "inner loop" as an auxiliary function:

fun better_reduce (unit, opn, l) =
    let
       fun red nil = unit
         | red (h::t) = opn (h, red t)
    in
       red l
    end

Notice that each call to better_reduce creates a new function red that uses the parameters unit and opn of the call to better_reduce. This means that red is bound to a closure consisting of the code for the function together with the environment active at the point of definition, which will provide bindings for unit and opn arising from the application of better_reduce to its arguments. Furthermore, the recursive calls to red no longer carry bindings for unit and opn, saving the overhead of creating tuples on each iteration of the loop.

11.5 Staging

An interesting variation on reduce may be obtained by staging the computation. The motivation is that unit and opn often remain fixed for many different lists (e.g., we may wish to sum the elements of many different lists). In this case unit and opn are said to be "early" arguments and the list is said to be a "late" argument. The idea of staging is to perform as much computation as possible on the basis of the early arguments, yielding a function of the late arguments alone. In the case of the function reduce this amounts to building red on the basis of unit and opn, yielding it as a function that may be later applied to many different lists. Here's the code:

fun staged_reduce (unit, opn) =
    let
       fun red nil = unit
         | red (h::t) = opn (h, red t)
    in
       red
    end

The definition of staged_reduce bears a close resemblance to the definition of better_reduce; the only difference is that the creation of the closure bound to red occurs as soon as unit and opn are known, rather than each time the list argument is supplied. Thus the overhead of closure creation is "factored out" of multiple applications of the resulting function to list arguments.

We could just as well have replaced the body of the let expression with the function

fn l => red l

but a moment's thought reveals that the meaning is the same.

Note well that we would not obtain the effect of staging were we to use the following definition:
5 Staging An interesting variation on reduce may be obtained by staging the computation.108 HigherOrder Functions means that red is bound to a closure consisting of the code for the function together with the environment active at the point of deﬁnition. yielding a function of the late arguments alone. 2002 . yielding it as a function that may be later applied to many different lists. We could just as well have replaced the body of the let expression with the function fn l => red l but a moment’s thought reveals that the meaning is the same. rather than each time the list argument is supplied. 11. the only difference is that the creation of the closure bound to red occurs as soon as unit and opn are known. opn) = let fun red nil = unit  red (h::t) = opn (h. The idea of staging is to perform as much computation as possible on the basis of the early arguments. saving the overhead of creating tuples on each iteration of the loop. which will provide bindings for unit and opn arising from the application of better reduce to its arguments. Here’s the code: fun staged reduce (unit. the recursive calls to red no longer carry bindings for unit and opn. Note well that we would not obtain the effect of staging were we to use the following deﬁnition: W ORKING D RAFT D ECEMBER 9.. red t) in red end The deﬁnition of staged reduce bears a close resemblance to the deﬁnition of better reduce. The motivation is that unit and opn often remain ﬁxed for many different lists (e. In this case unit and opn are said to be “early” arguments and the list is said to be a “late” argument. Thus the overhead of closure creation is “factored out” of multiple applications of the resulting function to list arguments. In the case of the function reduce this amounts to building red on the basis of unit and opn. we may wish to sum the elements of many different lists).g. Furthermore.
    fun curried_reduce (unit, opn) nil = unit
      | curried_reduce (unit, opn) (h::t) =
            opn (h, curried_reduce (unit, opn) t)

In particular, if we unravel the fun notation, we see that while we are taking two arguments in succession, we are not doing any useful work in between the arrival of the first argument (a pair) and the second (a list). A curried function does not take significant advantage of staging. Since staged_reduce and curried_reduce have the same iterated function type, namely

    ('b * ('a * 'b -> 'b)) -> 'a list -> 'b

the contrast between these two examples may be summarized by saying: not every function of iterated function type is curried. Some are, and some aren't. The "interesting" examples (such as staged_reduce) are the ones that aren't curried. (This directly contradicts established terminology, but it is necessary to deviate from standard practice to avoid a serious misapprehension.)

The time saved by staging the computation in the definition of staged_reduce is admittedly minor. But consider the following definition of an append function for lists that takes both arguments at once:

    fun append (nil, l) = l
      | append (h::t, l) = h :: append (t, l)

Suppose that we will have occasion to append many lists to the end of a given list. What we'd like is to build a specialized appender for the first list that, when applied to a second list, appends the second to the end of the first. Here's a naive solution that merely curries append:

    fun curried_append nil l = l
      | curried_append (h::t) l = h :: curried_append t l

Unfortunately this solution doesn't exploit the fact that the first argument is fixed for many second arguments. Each application of the result of applying curried_append to a list results in the first list being traversed so that the second can be appended to it. We can improve on this by staging the computation as follows:
    fun staged_append nil = (fn l => l)
      | staged_append (h::t) =
            let
                val tail_appender = staged_append t
            in
                fn l => h :: tail_appender l
            end

Notice that the first list is traversed once for all applications to a second argument. When applied to a list [v1, ..., vn], the function staged_append yields a function that is equivalent to, but not quite as efficient as, the function fn l => v1 :: v2 :: ... :: vn :: l. Applying this function still takes time proportional to n, but a substantial savings accrues from avoiding the pattern matching required to destructure the original list argument on each call.
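As a small illustration (assuming the definition of staged_append above), the first list is traversed when the appender is built, not on each application:

```sml
val append123 = staged_append [1, 2, 3]   (* traverses [1,2,3] once, here *)

val l1 = append123 [4, 5]   (* [1, 2, 3, 4, 5] *)
val l2 = append123 [6]      (* [1, 2, 3, 6]; the prebuilt closure chain is reused *)
```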
Chapter 12

Exceptions

In the first chapter of these notes we mentioned that expressions in Standard ML always have a type, may have a value, and may have an effect. So far we've concentrated on typing and evaluation. In this chapter we will introduce the concept of an effect. While it's hard to give a precise general definition of what we mean by an effect, the idea is that an effect is any action resulting from evaluation of an expression other than returning a value. From this point of view we might consider non-termination to be an effect, but we don't usually think of failure to terminate as a positive "action" in its own right, rather as a failure to take any action. The main examples of effects in ML are these:

1. Exceptions. Evaluation may be aborted by signaling an exceptional condition.

2. Mutation. Storage may be allocated and modified during evaluation.

3. Input/output. It is possible to read from an input source and write to an output sink during evaluation.

4. Communication. Data may be sent to and received from communication channels.

This chapter is concerned with exceptions; the other forms of effects will be considered later.

12.1 Exceptions as Errors

ML is a safe language in the sense that its execution behavior may be understood entirely in terms of the constructs of the language itself. Behavior such as "dumping core" or incurring a "bus error" are extra-linguistic notions that may only be explained by appeal to the underlying implementation of the language. These cannot arise in ML. This is ensured by a combination of a static type discipline, which rules out expressions that are manifestly ill-defined (e.g., adding a string to an integer or casting an integer as a function), and by dynamic checks that rule out violations that cannot be detected statically (e.g., division by zero or arithmetic overflow). Static violations are signalled by type checking errors; dynamic violations are signalled by raising exceptions.

12.1.1 Primitive Exceptions

The expression 3 + "3" is ill-typed, and hence cannot be evaluated. In contrast the expression 3 div 0 is well-typed (with type int), but incurs a run-time fault, corresponding to division by zero, that is signalled by raising the exception Div. We will indicate this by writing

    3 div 0 ⇓ raise Div

An exception is a form of "answer" to the question "what is the value of this expression?". In most implementations an exception such as this is reported by an error message of the form "Uncaught exception Div", together with the line number (or some other indication) of the point in the program where the exception occurred, easing the process of tracking down the source of the error.

Exceptions have names so that we may distinguish different sources of error in a program. For example, evaluation of the expression maxint * maxint (where maxint is the largest representable integer) causes the exception Overflow to be raised, indicating that an arithmetic overflow error arose in the attempt to carry out the multiplication. This is usefully distinguished from the exception Div.

(You may be wondering about the overhead of checking for arithmetic faults. The compiler must generate instructions that ensure that an overflow fault is caught before any subsequent operations are performed. This can be quite expensive on pipelined processors, which sacrifice precise delivery of arithmetic faults in the interest of speeding up execution in the non-faulting case. Unfortunately it is necessary to incur this overhead if we are to avoid having the behavior of an ML program depend on the underlying processor on which it is implemented.)
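For instance, on an implementation whose default int type is fixed precision (such as SML/NJ, where Int.maxInt from the Basis Library is SOME of the largest representable integer), one can observe both primitive exceptions:

```sml
val maxint = valOf Int.maxInt

(* Both of the following are well-typed, but evaluation faults:
     maxint * maxint  ⇓  raise Overflow
     3 div 0          ⇓  raise Div
   At top level, each is reported as an uncaught exception. *)
```

On an implementation with arbitrary-precision default integers, Int.maxInt is NONE and the first expression simply computes a large number.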
Another source of run-time exceptions is an inexhaustive match. Suppose we define the function hd as follows:

    fun hd (h::_) = h

This definition is inexhaustive since it makes no provision for the possibility of the argument being nil. What happens if we apply hd to nil? The exception Match is raised, indicating the failure of the pattern-matching process:

    hd nil ⇓ raise Match

The occurrence of a Match exception at run-time is indicative of a violation of a precondition to the invocation of a function somewhere in the program. Recall that it is often sensible for a function to be inexhaustive, provided that we take care to ensure that it is never applied to a value outside of its domain. Should this occur (because of a programming mistake, evidently), the result is nevertheless well-defined because ML checks for, and signals, pattern match failure. That is, ML programs are implicitly "bulletproofed" against failures of pattern matching. The flip side is that if no inexhaustive match warnings arise during type checking, then the exception Match can never be raised during evaluation (and hence no run-time checking need be performed).

A related situation is the use of a pattern in a val binding to destructure a value. If the pattern can fail to match a value of this type, then a Bind exception is raised at run-time. For example, evaluation of the binding

    val h::_ = nil

raises the exception Bind since the pattern h::_ does not match the value nil. Here again observe that a Bind exception cannot arise unless the compiler has previously warned us of the possibility: no warning, no Bind exception.

12.1.2 User-Defined Exceptions

So far we have considered examples of predefined exceptions that indicate fatal error conditions. Since the built-in exceptions have a built-in meaning, it is generally inadvisable to use these to signal program-specific error conditions. Instead we introduce a new exception using an exception declaration, and signal it using a raise expression when a run-time violation occurs. That way we can associate specific exceptions with specific pieces of code.

Suppose that we wish to define a "checked factorial" function that ensures that its argument is non-negative. Here's a first attempt at defining such a function:
    exception Factorial

    fun checked_factorial n =
        if n < 0 then
            raise Factorial
        else if n = 0 then
            1
        else
            n * checked_factorial (n-1)

The declaration exception Factorial introduces an exception Factorial, which we raise in the case that checked_factorial is applied to a negative number.

The definition of checked_factorial is unsatisfactory in at least two respects. One, relatively minor, issue is that it does not make effective use of pattern matching, but instead relies on explicit comparison operations. To some extent this is unavoidable since we wish to check explicitly for negative arguments, which cannot be done using a pattern. A more significant problem is that checked_factorial repeatedly checks the validity of its argument on each recursive call, even though we can prove that if the initial argument is non-negative, then so must be the argument on each recursive call. This fact is not reflected in the code. We can improve the definition by introducing an auxiliary function:

    exception Factorial

    local
        fun fact 0 = 1
          | fact n = n * fact (n-1)
    in
        fun checked_factorial n =
            if n >= 0 then
                fact n
            else
                raise Factorial
    end

Notice that we perform the range check exactly once, and that the auxiliary function makes effective use of pattern-matching.

12.2 Exception Handlers

The use of exceptions to signal error conditions suggests that raising an exception is fatal: execution of the program terminates with the raised exception. But signaling an error is only one use of the exception mechanism. More generally, exceptions can be used to effect non-local transfers of control. By using an exception handler we may "catch" a raised exception and continue evaluation along some other path. A very simple example is provided by the following driver for the factorial function that accepts numbers from the keyboard, computes their factorial, and prints the result.

    fun factorial_driver () =
        let
            val input = read_integer ()
            val result = toString (checked_factorial input)
        in
            print result
        end
        handle Factorial => print "Out of range."

An expression of the form exp handle match is called an exception handler. It is evaluated by attempting to evaluate exp. If it returns a value, then that is the value of the entire expression; the handler plays no role in this case. If, however, exp raises an exception exc, then the exception value is matched against the clauses of the match (exactly as in the application of a clausal function to an argument) to determine how to proceed. If the pattern of a clause matches the exception exc, then evaluation resumes with the expression part of that clause. If no pattern matches, the exception exc is re-raised so that outer exception handlers may dispatch on it. If no handler handles the exception, then the uncaught exception is signaled as the final result of evaluation.

In more operational terms, evaluation of exp handle match proceeds by installing an exception handler determined by match, then evaluating exp. The previous binding of the exception handler is preserved so that it may be restored once the given handler is no longer needed. Raising an exception consists of passing a value of type exn to the current exception handler. Passing an exception to a handler de-installs that handler, and re-installs the previously active handler. This ensures that if the handler itself raises an exception, or fails to handle the given exception, then the exception is propagated to the handler active prior to evaluation of the handle expression. If the expression does not raise an exception, the previous handler is restored as part of completing the evaluation of the handle expression. If no handler handles the exception, computation is aborted with the uncaught exception exc.
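The re-raising behavior can be seen with a pair of hypothetical exceptions: an inner handler whose patterns do not match passes the exception along to the enclosing handler.

```sml
exception Inner
exception Outer

val report =
    ((raise Inner) handle Outer => "caught Outer")
        handle Inner => "caught Inner"
(* report is "caught Inner": the inner handle expression has no
   clause matching Inner, so the exception propagates outward. *)
```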
Returning to the function factorial_driver, we see that evaluation proceeds by attempting to compute the factorial of a given number (read from the keyboard by an unspecified function read_integer), printing the result if the given number is in range, and otherwise reporting that the number is out of range. The example is trivialized to focus on the role of exceptions, but one could easily imagine generalizing it in a number of ways that also make use of exceptions. For example, we might repeatedly read integers until the user terminates the input stream (by typing the end of file character). Termination of input might be signaled by an EndOfFile exception, which is handled by the driver. Similarly, we might expect that the function read_integer raises the exception SyntaxError in the case that the input is not properly formatted. Again we would handle this exception, print a suitable message, and resume. Here's a sketch of a more complicated factorial driver:

    fun factorial_driver () =
        let
            val input = read_integer ()
            val result = toString (checked_factorial input)
            val _ = print result
        in
            factorial_driver ()
        end
        handle EndOfFile => print "Done."
             | SyntaxError =>
                   let
                       val _ = print "Syntax error."
                   in
                       factorial_driver ()
                   end
             | Factorial =>
                   let
                       val _ = print "Out of range."
                   in
                       factorial_driver ()
                   end

We will return to a more detailed discussion of input/output later in these notes. The point to notice here is that the code is structured with a completely uncluttered "normal path" that reads an integer, computes its factorial, formats it, prints it, and repeats. The exception handler takes care of the exceptional cases: end of file, syntax error, and domain error. In the latter two cases we report an error and resume reading; in the former we simply report completion and we are done. The reader is encouraged to imagine how one might structure this program without the use of exceptions.

The primary benefits of the exception mechanism are as follows:

1. They force you to consider the exceptional case (if you don't, you'll get an uncaught exception at run-time), and

2. They allow you to segregate the special case from the normal case in the code (rather than clutter the code with explicit checks).

These aspects work hand-in-hand to facilitate writing robust programs.

A typical use of exceptions is to implement backtracking, a programming technique based on exhaustive search of a state space. A very simple, if somewhat artificial, example is provided by the following function to compute change from an arbitrary list of coin values. What is at issue is that the obvious "greedy" algorithm for making change that proceeds by doling out as many coins as possible in decreasing order of value does not always work. Given only a 5 cent and a 2 cent coin, we cannot make 16 cents in change by first taking three 5's and then proceeding to dole out 2's. In fact we must use two 5's and three 2's to make 16 cents. Here's a method that works for any set of coins:

    exception Change

    fun change _ 0 = nil
      | change nil _ = raise Change
      | change (coin::coins) amt =
            if coin > amt then
                change coins amt
            else
                (coin :: change (coin::coins) (amt-coin))
                handle Change => change coins amt

The idea is to proceed greedily, but if we get "stuck", we undo the most recent greedy decision and proceed again from there. Simulate evaluation of the example of change [5,2] 16 to see how the code works.
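Tracing the suggested example shows the backtracking in action; the steps below follow the definition of change above.

```sml
(* change [5,2] 16
     => 5 :: change [5,2] 11            greedily take a 5
     => 5 :: 5 :: change [5,2] 6        and another
     => 5 :: 5 :: 5 :: change [5,2] 1   and another
   but change [5,2] 1 => change [2] 1 => change nil 1 => raise Change,
   so the handler installed at amount 6 undoes the third 5:
     => 5 :: 5 :: change [2] 6
     => 5 :: 5 :: 2 :: 2 :: 2 :: change [2] 0
     => [5, 5, 2, 2, 2]                 two 5's and three 2's *)
val coins = change [5, 2] 16   (* [5, 5, 2, 2, 2] *)
```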
12.3 Value-Carrying Exceptions

So far exceptions are just "signals" that indicate that an exceptional condition has arisen. Often it is useful to attach additional information that is passed to the exception handler. This is achieved by attaching values to exceptions. For example, we might associate with a SyntaxError exception a string indicating the precise nature of the error. To associate a string with the exception SyntaxError, we declare it as

    exception SyntaxError of string

This declaration introduces the exception constructor SyntaxError, carrying a string as value. In a parser for a language we might write something like

    raise SyntaxError "Integer expected"

to indicate a malformed expression in a situation where an integer is expected, and write

    raise SyntaxError "Identifier expected"

to indicate a badly-formed identifier.

Exception constructors are in many ways similar to value constructors. Recall that we may use value constructors in two ways:

1. We may use them to create values of a datatype (perhaps by applying them to other values).

2. We may use them to match values of a datatype (perhaps also matching a constituent value).

The situation with exception constructors is symmetric:

1. We may use them to create an exception (perhaps with an associated value).

2. We may use them to match an exception (perhaps also matching the associated value).

In particular, exception constructors can be used in patterns, as in the following code fragment:

    ... handle SyntaxError msg => print ("Syntax error: " ^ msg)

Here we specify a pattern for SyntaxError exceptions that also binds the string associated with the exception to the identifier msg and prints that string along with an error indication.

Value constructors have types, as we previously mentioned. For example, the list constructors nil and :: have types 'a list and 'a * 'a list -> 'a list, respectively. What about exception constructors? A "bare" exception constructor (such as Factorial above) has type

    exn

and a value-carrying exception constructor (such as SyntaxError) has type

    string -> exn

Thus Factorial is a value of type exn, and SyntaxError "Integer expected" is a value of type exn. The type exn is the type of exception packets, the data values associated with an exception. The primitive operation raise takes any value of type exn as argument and raises an exception with that value. The clauses of a handler may be applied to any value of type exn using the rules of pattern matching described earlier. If an exception constructor is no longer in scope, then the handler cannot catch it (other than via a wildcard pattern).

The type exn may be thought of as a kind of built-in datatype, except that the constructors of this type are not determined once and for all (as they are with a datatype declaration), but rather are incrementally introduced as needed in a program. For this reason the type exn is sometimes called an extensible datatype.
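For example, a single handler can dispatch on several exception constructors, including value-carrying ones, with a wildcard as a catch-all. The function parse below is a hypothetical parser supplied by the caller, not part of the examples above:

```sml
exception SyntaxError of string

fun describe (parse : string -> int) (s : string) : string =
    Int.toString (parse s)
    handle SyntaxError msg => "syntax error: " ^ msg
         | Overflow => "number too large"
         | _ => "some other failure"
```

Every clause matches against the exception packet of type exn, so predefined constructors such as Overflow and user-defined ones such as SyntaxError may appear side by side.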
Chapter 13

Mutable Storage

In this chapter we consider a second form of effect, called a storage effect: the allocation or mutation of storage during evaluation. The introduction of storage effects has profound consequences, not all of which are desirable. Indeed, storage effects are sometimes called side effects, by analogy with the unintended actions of some medications. While it is excessive to dismiss storage effects as completely undesirable, it is advantageous to minimize the use of storage effects except in situations where the task clearly demands them. We will explore some techniques for programming with storage effects later in this chapter, but first we introduce the primitive mechanisms for programming with mutable storage in ML.

13.1 Reference Cells

To support mutable storage the execution model that we described in chapter 2 is enriched with a memory consisting of a finite set of mutable cells. A mutable cell may be thought of as a container in which a data value of a specified type is stored. During execution of a program the contents of a cell may be retrieved or replaced by any other value of the appropriate type. Since cells are used by issuing "commands" to modify and retrieve their contents, programming with cells is called imperative programming.

Changing the contents of a mutable cell introduces a temporal aspect to evaluation. We speak of the current contents of a cell, meaning the value most recently assigned to it. We also speak of previous and future values of a reference cell when discussing the behavior of a program. This is in sharp contrast to the effect-free fragment of ML, for which no such concepts apply. For example, the binding of a variable does not change while evaluating within the scope of that variable, lending a "permanent" quality to statements about variables: the "current" binding is the only binding that variable will ever have.

The type typ ref is the type of reference cells containing values of type typ. Reference cells are, like all values, first class: they may be bound to variables, passed as arguments to functions, returned as results of functions, appear within data structures, and even be stored within other reference cells.

A reference cell is created, or allocated, by the function ref of type typ -> typ ref. When applied to a value val of type typ, ref allocates a "new" cell, initializes its content to val, and returns a reference to the cell. By "new" we mean that the allocated cell is distinct from all other cells previously allocated, and does not share storage with them.

The contents of a cell of type typ is retrieved using the function ! of type typ ref -> typ. Applying ! to a reference cell yields the current contents of that cell. The contents of a cell is changed by applying the assignment operator op :=, which has type typ ref * typ -> unit. When applied to a cell and a value, it replaces the content of that cell with that value, and yields the null-tuple as result. Assignment is usually written using infix syntax.

Here are some examples:

    val r = ref 0
    val s = ref 0
    val _ = r := 3
    val x = !s + !r
    val t = r
    val _ = t := 5
    val y = !s + !r
    val z = !t + !r

After execution of these bindings, the variable x is bound to 3, the variable y is bound to 5, and z is bound to 10.

Notice the use of a val binding of the form val _ = exp, which is used when exp is to be evaluated purely for its effect. The value of exp is discarded by the binding, since the left-hand side is a wildcard pattern. In most cases the expression exp has type unit, so that its value is guaranteed to be the null-tuple, ().
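As another small example (the names are illustrative), a cell can hold a running total that is updated purely for effect:

```sml
val total : int ref = ref 0

fun add (n : int) : unit = total := !total + n   (* evaluated purely for its effect *)

val _ = add 3
val _ = add 4
val answer = !total   (* 7 *)
```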
A wildcard binding is also used to define sequential composition of expressions in ML. The expression

    exp1 ; exp2

is shorthand for the expression

    let val _ = exp1 in exp2 end

that first evaluates exp1 for its effect, then evaluates exp2.

Functions of type typ -> unit are sometimes called procedures, because they are executed purely for their effect. This is apparent from the type: it is assured that the value of applying such a function is the null-tuple, (), so the only point of applying it is for its effects on memory.

13.2 Reference Patterns

It is a common mistake to omit the exclamation point when referring to the content of a reference, especially when that cell is bound to a variable. In more familiar languages such as C all variables are implicitly bound to reference cells, and they are implicitly dereferenced whenever they are used, so that a variable always stands for its current contents. This is both a boon and a bane. It is obviously helpful in many common cases since it alleviates the burden of having to explicitly dereference variables whenever their content is required. However, it shifts the burden to the programmer in the case that the address, and not the content, is intended. In C one writes &x for the address of (the cell bound to) x. The burden of explicit dereferencing is not nearly so onerous in ML as it might be in other languages, simply because reference cells are used so infrequently in ML programs.

An alternative to explicitly dereferencing cells is to use ref patterns. A pattern of the form ref pat matches a reference cell whose content matches the pattern pat. This means that the cell's contents are implicitly retrieved during pattern matching, and may be subsequently used without explicit dereferencing. Whether explicit or implicit dereferencing is preferable is to a large extent a matter of taste. For example, the second implementation of rot3 (given below, in section 13.4) might be written using ref patterns as follows:
    fun rot3 (a, b, c) =
        let
            val (ref x, ref y, ref z) = (a, b, c)
        in
            a := y; b := z; c := x
        end

In practice it is common to use both explicit dereferencing and ref patterns, depending on the situation.

13.3 Identity

Reference cells raise delicate issues of equality that considerably complicate reasoning about programs. In general we say that two expressions (of the same type) are equal iff they cannot be distinguished by any operation in the language. That is, two expressions are distinct iff there is some way within the language to tell them apart. This is called Leibniz's Principle of identity of indiscernables: we equate everything that we cannot tell apart. As a consequence we obtain the principle of indiscernability of identicals: that which we deem equal cannot be told apart. Leibniz's Principle is a principle of maximality for equality; one can have the latter without the former. (Think about this for a minute.)

What makes Leibniz's Principle tricky to grasp is that it hinges on what we mean by a "way to tell expressions apart". The crucial idea is that we can tell two expressions apart iff there is a complete program containing one of the expressions whose observable behavior changes when we replace that expression by the other. That is, two expressions are considered equal iff there is no such scenario that distinguishes them. But what do we mean by "complete program"? And what do we mean by "observable behavior"? For the present purposes we will consider a complete program to be any expression of basic type (say, int or bool or string). The idea is that a complete program is one that computes a concrete result such as a number. The observable behavior of a complete program includes at least these aspects:

1. Its value, or lack thereof, either by non-termination or by raising an uncaught exception.

2. Its visible side effects, including visible modifications to mutable storage or any input/output it may perform.

In contrast here are some behaviors that we will not count as observations:

1. Execution time or space usage.

2. "Private" uses of storage (e.g., internally-allocated reference cells).

3. The name of uncaught exceptions (i.e., we will not distinguish between terminating with the uncaught exception Bind and the uncaught exception Match).

With these ideas in mind, it should be plausible that if we evaluate these bindings

    val r = ref 0
    val s = ref 0

then r and s are not equivalent. Consider the following usage of r to compute an integer result:

    (s := 1 ; !r)

Clearly this expression evaluates to 0, and mutates the binding of s. Now replace r by s to obtain

    (s := 1 ; !s)

This expression evaluates to 1, and mutates s as before. These two complete programs distinguish r from s, and therefore r and s must be considered distinct.

Had we replaced the binding for s by the binding

    val s = r

then the two expressions that formerly distinguished r from s no longer do so: they are, after all, bound to the same reference cell! In fact, no program can be concocted that would distinguish them. In this case r and s are equivalent.

Now consider a third, very similar scenario. Let us declare r and s as follows:

    val r = ref ()
    val s = ref ()
Are r and s equivalent or not? We might first try to distinguish them by a variant of the experiment considered above. This breaks down because there is only one possible value we can assign to a variable of type unit ref! Indeed, one may suspect that r and s are equivalent in this case, but in fact there is a way to distinguish them! Here's a complete program involving r that we will use to distinguish r from s:

    if r = r then "it's r" else "it's not"

Now replace the first occurrence of r by s to obtain

    if s = r then "it's r" else "it's not"

and the result is different. This example hinges on the fact that ML defines equality for values of reference type to be reference equality (or, occasionally, pointer equality). Two reference cells (of the same type) are equal in this sense iff they both arise from the exact same use of the ref operation to allocate that cell; otherwise they are distinct. Thus the two cells bound to r and s above are observably distinct (by testing reference equality), even though they can only ever hold the value ().

Why does ML provide such a fine-grained notion of equality? "True" equality, as defined by Leibniz's Principle, is, unfortunately, undecidable: there is no computer program that determines whether two expressions are equivalent in this sense. ML provides a useful, conservative approximation to true equality that in some cases is not defined (you cannot test two functions for equality) and in other cases is too picky (it distinguishes reference cells that are otherwise indistinguishable). Had equality not been included as a primitive, any two reference cells of unit type would have been equal. Such is life.

13.4 Aliasing

To see how reference cells complicate programming, let us consider the problem of aliasing. Any two variables of the same reference type might be bound to the same reference cell, or to two different reference cells. For example, after the declarations

    val r = ref 0
    val s = ref 0

the variables r and s are not aliases, but after the declarations

    val r = ref 0
    val s = r

the variables r and s are aliases for the same reference cell.

These examples show that we must be careful when programming with variables of reference type. This is particularly problematic in the case of functions, because we cannot assume that two different argument variables are bound to different reference cells. For example, in a function of the form

    fn (x:typ ref, y:typ ref) => exp

we may not assume that x and y are bound to different reference cells. They might, in fact, be bound to the same reference cell, in which case we say that the two variables are aliases for one another.

Aliasing is a huge source of bugs in programs that work with reference cells. A classic example is a program to "rotate" the contents of three cells: given reference cells a, b, and c, with initial contents x, y, and z, set their contents to y, z, and x, respectively. Here's a candidate implementation:

    fun rot3 (a, b, c) =
        let
            val t = !a
        in
            a := !b; b := !c; c := t
        end

This code works fine if a, b, and c are three distinct reference cells. But suppose that a and c are the same cell, so that the initial contents are x, y, and x! We must always ask ourselves whether we've properly considered aliasing when writing such a function. This is harder to do than it sounds. Here's a solution that works correctly, even in the presence of aliasing:

    fun rot3 (a, b, c) =
        let
            val (x, y, z) = (!a, !b, !c)
        in
            a := y; b := z; c := x
        end

Notice that we use ordinary immutable variables to temporarily hold the initial contents of the cells while their values are being updated.
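A quick check of the alias-safe version (with illustrative values) makes the behavior concrete; here a and c are deliberately bound to the same cell:

```sml
val a = ref 1
val b = ref 2
val c = a                  (* c is an alias for a *)

val _ = rot3 (a, b, c)     (* the alias-safe version above *)

(* The initial contents are x = 1, y = 2, z = !c = 1, so the assignments
   a := y, b := z, c := x leave the shared cell holding x (the last write
   wins) and b holding z: *)
val va = !a   (* 1 *)
val vb = !b   (* 1 *)
```

The first version would instead leave b holding 2, because its assignment b := !c reads the shared cell after it has already been updated by a := !b.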
13.5 Programming Well With References

Using references it is possible to mimic the style of programming used in imperative languages such as C. For example, we might define the factorial function in imitation of such languages as follows:

    fun imperative_fact (n:int) =
        let
            val result = ref 1
            val i = ref 0
            fun loop () =
                if !i = n then
                    ()
                else
                    (i := !i + 1; result := !result * !i; loop ())
        in
            loop (); !result
        end

Notice that the function loop is essentially just a while loop: it repeatedly executes its body until the contents of the cell bound to i reaches n. The tail call to loop is essentially just a goto statement to the top of the loop.

It is (appallingly) bad style to program in this fashion. The purpose of the function imperative_fact is to compute a simple function on the natural numbers. There is nothing about its definition that suggests that state must be maintained, and so it is senseless to allocate and modify storage to compute it. The definition we gave earlier is shorter, simpler, more efficient, and hence more suitable to the task. This is not to suggest, however, that there are no good uses of references. We will now discuss some important uses of state in ML.

13.5.1 Private Storage

The first example is the use of higher-order functions to manage shared private state. This programming style is closely related to the use of objects to manage state in object-oriented programming languages.
    local
      val counter = ref 0
    in
      fun tick () = (counter := !counter + 1; !counter)
      fun reset () = (counter := 0)
    end

This declaration introduces two functions, tick of type unit -> int and reset of type unit -> unit. Their definitions share a private variable counter that is bound to a mutable cell containing the current value of a shared counter. The tick operation increments the counter and returns its new value, and the reset operation resets its value to zero. The types of the operations suggest that implicit state is involved. In the absence of exceptions and implicit state, there is only one useful function of type unit -> unit, namely the function that always returns its argument (and it's debatable whether this is really useful!).

The declaration above defines two functions, tick and reset, that share a single private counter. Suppose now that we wish to have several different instances of a counter: different pairs of functions tick and reset that share different state. We can achieve this by defining a counter generator (or constructor) as follows:

    fun new_counter () =
      let
        val counter = ref 0
        fun tick () = (counter := !counter + 1; !counter)
        fun reset () = (counter := 0)
      in
        { tick = tick, reset = reset }
      end

The type of new_counter is unit -> { tick : unit -> int, reset : unit -> unit }. We've packaged the two operations into a record containing two functions that share private state. There is an obvious analogy with class-based object-oriented programming. The function new_counter may be thought of as a constructor for a class of counter objects. Each object has a private instance variable counter that is shared between the methods tick and reset of the object, represented as a record with two fields. Here's how we use counters.
    val c1 = new_counter ()
    val c2 = new_counter ()
    #tick c1 ();   (* 1 *)
    #tick c1 ();   (* 2 *)
    #tick c2 ();   (* 1 *)
    #reset c1 ();
    #tick c1 ();   (* 1 *)
    #tick c2 ();   (* 2 *)

Notice that c1 and c2 are distinct counters that increment and reset independently of one another.

13.5.2 Mutable Data Structures

A second important use of references is to build mutable data structures. The data structures (such as lists and trees) we've considered so far are immutable, in the sense that it is impossible to change the structure of the list or tree without building a modified copy of that structure. This is both a benefit and a drawback. The principal benefit is that immutable data structures are persistent, in that operations performed on them do not destroy the original structure; in ML we can eat our cake and have it too. For example, we can simultaneously maintain a dictionary both before and after insertion of a given word. The principal drawback is that if we aren't really relying on persistence, then it is wasteful to make a copy of a structure if the original is going to be discarded anyway. What we'd like in this case is to have an "update in place" operation to build an ephemeral (opposite of persistent) data structure. To do this in ML we make use of references.

A simple example is the type of possibly circular lists, or pcl's. Informally, a pcl is a finite graph in which every node has at most one neighbor, called its predecessor, in the graph. In contrast to ordinary lists the predecessor relation is not necessarily well-founded: there may be an infinite sequence of nodes arranged in descending order of predecession. Since the graph is finite, this can only happen if there is a cycle in the graph: some node has an ancestor as predecessor.

How can such a structure ever come into existence? If the predecessors of a cell are needed to construct a cell, then
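The persistence of immutable structures mentioned above can be observed directly; here is a small illustration (a sketch, not from the text):

```sml
(* Immutable lists are persistent: "inserting" builds a new list
   and leaves the original untouched. *)
val before_ins = [1, 2, 3]
val after_ins = 0 :: before_ins

(* Both versions remain available simultaneously. *)
val still_there = before_ins   (* [1, 2, 3] *)
val new_one = after_ins        (* [0, 1, 2, 3] *)
```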
the ancestor that is to serve as predecessor in the cyclic case can never be created! The "trick" is to employ backpatching: the predecessor is initialized to Nil, so that the node and its ancestors can be constructed, then it is reset to the appropriate ancestor to create the cycle. This can be achieved in ML using the following datatype declaration:

    datatype 'a pcl = Pcl of 'a pcell ref
    and 'a pcell = Nil | Cons of 'a * 'a pcl

A value of type typ pcell is either Nil, or Cons (h, t), where h is a value of type typ and t is another such possibly-circular list. A value of type typ pcl is essentially a reference to a value of type typ pcell.

Here are some convenient functions for creating and taking apart possibly-circular lists:

    fun cons (h, t) = Pcl (ref (Cons (h, t)))
    fun nill () = Pcl (ref Nil)
    fun phd (Pcl (ref (Cons (h, _)))) = h
    fun ptl (Pcl (ref (Cons (_, t)))) = t

To implement backpatching, we need a way to "zap" the tail of a possibly-circular list:

    fun stl (Pcl (r as ref (Cons (h, _))), u) = (r := Cons (h, u))

If you'd like, it would make sense to require that the tail of the Cons cell be the empty pcl, that is, the cell at the end of a non-circular possibly-circular list, so that you're only allowed to backpatch at the end of a finite pcl.

Here are a finite and an infinite pcl:

    val finite = cons (4, cons (3, cons (2, cons (1, nill ()))))
    val tail = cons (1, nill ())
    val infinite = cons (4, cons (3, cons (2, tail)))
    val _ = stl (tail, infinite)

The last step backpatches the tail of the last cell of infinite to be infinite itself, creating a circular list.

Now let us define the size of a pcl to be the number of distinct nodes occurring in it. It is an interesting problem to define a size function for pcls that makes no use of auxiliary storage (e.g., no set of previously-encountered nodes) and runs in time proportional to the number of cells in
the pcl. The idea is to think of running a long race between a tortoise and a hare. The hare runs twice as fast as the tortoise. If the course is circular, then the hare, which quickly runs out ahead of the tortoise, will eventually come from behind and pass it! Conversely, if this happens, the course must be circular. We let the tortoise do the counting; the hare's job is simply to detect cycles.

    local
      fun race (Nil, Nil) = 0
        | race (Cons (_, Pcl (ref c)), Nil) =
            1 + race (c, Nil)
        | race (Cons (_, Pcl (ref c)), Cons (_, Pcl (ref Nil))) =
            1 + race (c, Nil)
        | race (Cons (_, l), Cons (_, Pcl (ref (Cons (_, m))))) =
            1 + race' (l, m)
      and race' (Pcl (r as ref c), Pcl (s as ref d)) =
            if r = s then 0 else race (c, d)
    in
      fun size (Pcl (ref c)) = race (c, c)
    end

If the hare reaches the finish line, it simply waits for the tortoise to finish counting. This covers the first three clauses of race. If the hare has not yet finished, we must continue with the hare running at twice the pace, checking whether the hare catches the tortoise from behind. Notice that it can never arise that the tortoise reaches the end before the hare does! Consequently, the definition of race is inexhaustive.

13.6 Mutable Arrays

In addition to reference cells, ML also provides mutable arrays as a primitive data structure. The type typ array is the type of arrays carrying values of type typ. The basic operations on arrays are these:

    val array : int * 'a -> 'a array
    val size : 'a array -> int
    val sub : 'a array * int -> 'a
    val update : 'a array * int * 'a -> unit

The function array creates a new array of a given size, with the given value as the initial value of every element of the array. The function size
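The basic array operations can be exercised as follows (a sketch; note that in the current Basis Library these operations live in the Array structure, with Array.length playing the role of size):

```sml
val a = Array.array (3, 0)         (* an array of three 0's *)
val () = Array.update (a, 1, 42)   (* now holds 0, 42, 0 *)
val x = Array.sub (a, 1)           (* 42 *)
val n = Array.length a             (* 3 *)
```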
returns the size of an array. The function sub performs a subscript operation, returning the ith element of an array A, where 0 ≤ i < size(A). (These are just the basic operations on arrays; please see V for complete information.)

One simple use of arrays is for memoization. Here's a function to compute the nth Catalan number, which may be thought of as the number of distinct ways to parenthesize an arithmetic expression consisting of a sequence of n consecutive multiplications. It makes use of an auxiliary summation function that you can easily define for yourself. (Applying sum to f and n computes the sum of f 0 + · · · + f n.)

    fun C 1 = 1
      | C n = sum (fn k => (C k) * (C (n-k))) (n-1)

This definition of C is hugely inefficient because a given computation may be repeated exponentially many times. For example, to compute C 10 we must compute C 1, C 2, ..., C 9, and the computation of C i engenders the computation of C 1, ..., C i-1 for each 1 ≤ i ≤ 9. We can do better by caching previously-computed results in an array, leading to an enormous improvement in execution speed. Here's the code:

    local
      val limit : int = 100
      val memopad : int option array =
        Array.array (limit, NONE)
    in
      fun C' 1 = 1
        | C' n = sum (fn k => (C k) * (C (n-k))) (n-1)
      and C n =
        if n < limit then
          case Array.sub (memopad, n) of
            SOME r => r
          | NONE =>
              let
                val r = C' n
              in
                Array.update (memopad, n, SOME r);
                r
              end
        else
          C' n
    end

Note carefully the structure of the solution. The function C is a memoized version of the Catalan number function. When called it consults the memopad to determine whether or not the required result has already been computed. If so, the answer is simply retrieved from the memopad; otherwise the result is computed, stored in the cache, and returned. The function C' looks superficially similar to the earlier definition of C, with the important difference that the recursive calls are to C, rather than C' itself. This ensures that sub-computations are properly cached and that the cache is consulted whenever possible.

The main weakness of this solution is that we must fix an upper bound on the size of the cache. This can be alleviated by implementing a more sophisticated cache management scheme that dynamically adjusts the size of the cache based on the calls made to it.
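The auxiliary summation function left as an exercise above might be defined like this (a sketch; here sum f n computes f 1 + ... + f n, which is the range the recurrence for C actually uses):

```sml
(* Sum f k for k from 1 to n. *)
fun sum (f : int -> int) 0 = 0
  | sum f n = f n + sum f (n - 1)

(* With this helper, C 1, C 2, C 3, C 4 compute the Catalan
   numbers 1, 1, 2, 5 via
   C n = sum (fn k => (C k) * (C (n-k))) (n-1). *)
```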
Chapter 14

Input/Output

The Standard ML Basis Library (described in V) defines a three-layer input and output facility for Standard ML. These modules provide a rudimentary, platform-independent text I/O facility that we summarize briefly here. The reader is referred to V for more details. Unfortunately, there is at present no standard library for graphical user interfaces; each implementation provides its own package. See your compiler's documentation for details.

14.1 Textual Input/Output

The text I/O primitives are based on the notions of an input stream and an output stream, which are values of type instream and outstream, respectively. An input stream is an unbounded sequence of characters arising from some source. The source could be a disk file, an interactive user, or another program (to name a few choices). Any source of characters can be attached to an input stream. An input stream may be thought of as a buffer containing zero or more characters that have already been read from the source, together with a means of requesting more input from the source should the program require it.

Similarly, an output stream is an unbounded sequence of characters leading to some sink. The sink could be a disk file, an interactive user, or another program (to name a few choices). Any sink for characters can be attached to an output stream. An output stream may be thought of as a buffer containing zero or more characters that have been produced by the program but have yet to be flushed to the sink.

Each program comes with one input stream and one output stream, called stdIn and stdOut, respectively. These are ordinarily connected to
the user's keyboard and screen, and are used for performing simple text I/O in a program. The output stream stdErr is also predefined; it is ordinarily connected to the user's screen, and is used for error reporting.

Textual input and output are performed on streams using a variety of primitives. The simplest are inputLine and print. To read a line of input from a stream, use the function inputLine of type instream -> string. It reads a line of input from the given stream and yields that line as a string whose last character is the line terminator. If the source is exhausted or the input stream is closed, inputLine returns the empty string. To write a line to stdOut, use the function print of type string -> unit. To write to a specific stream, use the function output of type outstream * string -> unit, which writes the given string to the specified output stream. For interactive applications it is often important to ensure that the output stream is flushed to the sink (e.g., so that it is displayed on the screen). This is achieved by calling flushOut of type outstream -> unit. The print function is a composition of output (to stdOut) and flushOut, which ensures that the output stream is flushed to the sink.

A new input stream may be created by calling the function openIn of type string -> instream. When applied to a string, the system attempts to open a file with that name (according to operating system-specific naming conventions) and attaches it as a source to a new input stream. Similarly, a new output stream may be created by calling the function openOut of type string -> outstream. When applied to a string, the system attempts to create a file with that name (according to operating system-specific naming conventions) and attaches it as a sink for a new output stream.

An input stream may be closed using the function closeIn of type instream -> unit. A closed input stream behaves as if there is no further input available; requests for input from a closed input stream yield the empty string. An output stream may be closed using closeOut of type outstream -> unit. A closed output stream is unavailable for further output; an attempt to write to a closed output stream raises the exception TextIO.IO.

The function input of type instream -> string is a blocking read operation that returns a string consisting of the characters currently available from the source. If none are currently available, but the end of source has not been reached, then the operation blocks until at least one character is available from the source. If the source is exhausted, input returns the null string.

To test whether an input operation would block, use the function canInput of type instream * int -> int option. Given a stream s and a bound n, the function canInput determines whether or not a call to input on s would immediately yield up to n characters. If the input operation would block, canInput yields NONE; otherwise it yields SOME k, with 0 ≤ k ≤ n being the number of characters immediately available on the input stream. If canInput yields SOME 0, the stream is either closed or exhausted.

The function endOfStream of type instream -> bool tests whether the input stream is currently at the end (no further input is available from the source). This condition is transient since, for example, another process might append data to an open file in between calls to endOfStream.

The function output of type outstream * string -> unit writes a string to an output stream. It may block until the sink is able to accept the entire string. The function flushOut of type outstream -> unit forces any pending output to the sink, blocking until the sink accepts the remaining buffered output.

This collection of primitive I/O operations is sufficient for performing rudimentary textual I/O. For further information on textual I/O, and for support for binary I/O and Posix I/O primitives, see the Standard ML Basis Library.
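Putting the primitives together, a line-by-line file copy might look like this (a sketch using only the operations described above; the filenames are hypothetical):

```sml
(* Copy infile to outfile one line at a time.  inputLine returns
   the empty string once the source is exhausted. *)
fun copyFile (infile : string, outfile : string) =
  let
    val ins = TextIO.openIn infile
    val outs = TextIO.openOut outfile
    fun loop () =
      let val line = TextIO.inputLine ins
      in
        if line = "" then ()
        else (TextIO.output (outs, line); loop ())
      end
  in
    loop ();
    TextIO.closeIn ins;
    TextIO.closeOut outs
  end
```

Note that in the current Basis Library, TextIO.inputLine returns a string option rather than a string, so a modern version of this loop would match on NONE/SOME instead of testing for the empty string.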
Chapter 15

Lazy Data Structures

In ML all variables are bound by value, which means that the bindings of variables are fully evaluated expressions, or values.1 For this reason ML is sometimes said to be an eager language. According to the by-value discipline, the bindings of variables are evaluated, regardless of whether that variable is ever needed to complete execution. For example, to compute the result of applying the function fn x => 1 to an argument, we never actually need to evaluate the argument, but we do anyway. This general principle has several consequences:

1. The right-hand side of a val binding is evaluated before the binding is effected. If the right-hand side has no value, the val binding does not take effect.

2. In a function application the argument is evaluated before being passed to the function by binding that value to the parameter of the function. If the argument does not have a value, then neither does the application.

3. The arguments to value constructors are evaluated before the constructed value is created.

An alternative is to bind variables by name,2 which means that the binding of a variable is an unevaluated expression, known as a computation or a suspension or a thunk. This principle has several consequences:

1. The right-hand side of a val binding is not evaluated before the binding is effected. The variable is bound to a computation (unevaluated expression), not a value.

2. In a function application the argument is passed to the function in unevaluated form by binding it directly to the parameter of the function. This holds regardless of whether the argument has a value or not.

3. The arguments to value constructors are left unevaluated when the constructed value is created.

According to the by-name discipline, variables are bound to unevaluated computations, and are evaluated only as often as the value of that variable's binding is required to complete the computation. For example, to evaluate the expression x+x, it is necessary to evaluate the binding of x in order to perform the addition; the value of the binding of x is needed twice, and hence it is evaluated twice. Languages that adopt the by-name discipline are, for this reason, said to be lazy.

This discussion glosses over another important aspect of lazy evaluation. In actual fact laziness is based on a refinement of the by-name principle, called the by-need principle. According to the by-need principle, the binding of a variable is evaluated at most once: not at all if it is never needed, and exactly once if it is ever needed at all. Re-evaluation of the same computation is avoided by memoization. Once a computation is evaluated, its value is saved for future reference should that computation ever be needed again. In particular, to evaluate the expression x+x the value of the binding of x is needed twice, but the binding is evaluated at most once.

The advantages and disadvantages of lazy vs. eager languages have been hotly debated. We will not enter into this debate here, but rather content ourselves with the observation that laziness is a special case of eagerness. (Recent versions of) ML have lazy data types that allow us to treat unevaluated computations as values of such types. This affords the benefits of laziness, but on a controlled basis: we can use it when it is appropriate, and ignore it when it is not. This allows us to incorporate laziness into the language without disrupting its fundamental character, on which so much else depends.

____
1The terminology is historical; it is not well-motivated, but it is firmly established.
2For reasons that are lost in the mists of time, this discipline is known as binding "by name".

The main benefit of laziness is that it supports demand-driven computation. This is useful for representing online data structures that are created
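Before turning to the built-in lazy types, note that suspensions can be simulated in ordinary SML with unit-accepting functions; this sketch (not from the text) contrasts by-name with by-need behavior:

```sml
(* A thunk is a function of type unit -> 'a: evaluation is delayed
   until the thunk is applied. *)
val thunk : unit -> int = fn () => (print "evaluating\n"; 2)

(* By-name: every use re-runs the computation
   ("evaluating" is printed twice here). *)
val byName = thunk () + thunk ()

(* By-need: memoize the result in a ref cell so the computation
   runs at most once, no matter how often the value is demanded. *)
fun memoize (f : unit -> int) : unit -> int =
  let
    val cell = ref (NONE : int option)
  in
    fn () =>
      case !cell of
        SOME v => v
      | NONE => let val v = f () in cell := SOME v; v end
  end

val once = memoize thunk
val byNeed = once () + once ()   (* "evaluating" printed only once *)
```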
only insofar as we examine them. Interactive data structures, such as the sequence of inputs provided by the user of an interactive system, are one example of an online data structure. In such a system the user's inputs are not predetermined at the start of execution, but rather are created "on demand" in response to the progress of computation up to that point. The demand-driven nature of online data structures is precisely what is needed to model this behavior.

Infinite data structures, such as the sequence of all prime numbers in order of magnitude, are another example of online data structure. Clearly we cannot ever "finish" creating the sequence of all prime numbers, but we can create as much of this sequence as we need for a given run of a program.

15.1 Lazy Data Types

Note: Lazy evaluation is a non-standard feature of ML that is supported only by the SML/NJ compiler. The lazy evaluation features must be enabled by executing the following at top level:

    Compiler.Control.lazysml := true;
    open Lazy;

SML/NJ provides a general mechanism for introducing lazy data types by simply attaching the keyword lazy to an ordinary datatype declaration. The ideas are best illustrated by example. We will focus attention on the type of infinite streams, which may be declared as follows:

    datatype lazy 'a stream = Cons of 'a * 'a stream

Notice that this type definition has no "base case"! Had we omitted the keyword lazy, such a datatype would not be very useful, since there would be no way to create a value of that type! Adding the keyword lazy makes all the difference. Doing so specifies that the values of type typ stream are computations of values of the form Cons (val, val'), where val is of type typ, and val' is another such computation. The computation is not evaluated until we examine it. When we do, its structure is revealed as consisting of an element val together with another suspended computation of the same type. Should we inspect that computation, it will again have this form, and so on ad infinitum. Notice how this description captures the "incremental" nature of lazy data structures.
Values of type typ stream are created using a val rec lazy declaration that provides a means for building a "circular" data structure. Here is a declaration of the infinite stream of 1's as a value of type int stream:

    val rec lazy ones = Cons (1, ones)

The keyword lazy indicates that we are binding ones to a computation, rather than a value. The keyword rec indicates that the computation is recursive (or self-referential or circular). It is the computation whose underlying value is constructed using Cons (the only possibility) from the integer 1 and the very same computation itself.

We can inspect the underlying value of a computation by pattern matching. For example, the binding

    val Cons (h, t) = ones

extracts the "head" and "tail" of the stream ones. This is performed by evaluating the computation bound to ones, yielding Cons (1, ones), then performing ordinary pattern matching to bind h to 1 and t to ones. Had the pattern been "deeper", further evaluation would be required, as in the following binding:

    val Cons (h, (Cons (h', t'))) = ones

To evaluate this binding, we evaluate ones to Cons (1, ones), binding h to 1 in the process, then evaluate ones again to Cons (1, ones), binding h' to 1 and t' to ones. The general rule is that pattern matching forces evaluation of a computation to the extent required by the pattern. This is the means by which lazy data structures are evaluated only insofar as required.

15.2 Lazy Function Definitions

The combination of (recursive) lazy function definitions and decomposition by pattern matching are the core mechanisms required to support lazy evaluation. However, there is a subtlety about function definitions that requires careful consideration, and a third new mechanism, the lazy function declaration.

Using pattern matching we may easily define functions over lazy data structures in a familiar manner. For example, we may define two functions to extract the head and tail of a stream as follows:

    fun shd (Cons (h, _)) = h
    fun stl (Cons (_, s)) = s
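As a quick check of these definitions, one might evaluate the following (a sketch; it assumes the stream datatype above, with lazy evaluation enabled in SML/NJ):

```sml
val rec lazy ones = Cons (1, ones)

val one = shd ones          (* 1 *)
val one' = shd (stl ones)   (* also 1: the tail of ones is ones itself *)
```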
" *) val = stl s (* silent *) val rec lazy s = (print ". its argument is not evaluated when it is called." *) Since stl evaluates its argument when applied. val = lstl s (* silent *) val = stl s (* prints ". The behavior of the two forms of tail function can be distinguished using print statements as follows: val rec lazy s = (print ". 2002 . the “. s)) = s The keyword lazy indicates that an application of lstl to a stream does not immediately perform pattern matching on its argument. another point of view is also possible. when applied to a stream. evaluate it. While these functions are surely very natural. The issue is whether these functions are “lazy enough”. W ORKING D RAFT D ECEMBER 9. Cons (1. when forced. we may instead think of it as creating a stream out of another stream. s)) val = stl s (* prints ". However. respectively. In the case of the shd function there is no other interpretation — we are extracting a value of type typ from a value of type typ stream. Under this interpretation the argument to stl should not be evaluated until its result is required. expt ). the stream created by stl (according to this view) should also be suspended until its value is required. and match it against the given patterns to extract the head and tail. expt ). s)). We can adopt a similar viewpoint about stl. there is a subtle issue that deserves careful discussion. namely that it is simply extracting a component value from a computation of a value of the form Cons (exph . This leads to a variant notion of “tail” that may be deﬁned as follows: fun lazy lstl (Cons ( .". However. Since streams are computations. which is a computation of a value of the form Cons (exph . From one point of view.2 Lazy Function Deﬁnitions 143 These are functions that. rather than at the time stl is applied. what we are doing is decomposing a computation by evaluating it and retrieving its components.” is printed when it is ﬁrst called.15.". Cons (1. but only when its result is evaluated. 
in the case of stl. Rather than think of stl as extracting a value from a stream. since lstl only sets up a computation. but rather sets up a stream computation that. forces the argument and extracts the tail of the stream. but not if it is called again.
15.3 Programming with Streams

Let's define a function smap that applies a function to every element of a stream, yielding another stream. The type of smap should be ('a -> 'b) -> 'a stream -> 'b stream. The thing to keep in mind is that the application of smap to a function and a stream should set up (but not compute) another stream that, when forced, forces the argument stream to obtain the head element, applies the given function to it, and yields this as the head of the result. Here's the code:

    fun smap f =
      let
        fun lazy loop (Cons (x, s)) =
          Cons (f x, loop s)
      in
        loop
      end

We have "staged" the computation so that the partial application of smap to a function yields a function that loops over a given stream, applying the given function to each element. This loop is a lazy function to ensure that it merely sets up a stream computation, rather than evaluating its argument when it is called. Had we dropped the keyword lazy from the definition of the loop, then an application of smap to a function and a stream would immediately force the computation of the head element of the stream, rather than merely set up a future computation of the same result.

To illustrate the use of smap, here's a definition of the infinite stream of natural numbers:

    val one_plus = smap (fn n => n+1)
    val rec lazy nats = Cons (0, one_plus nats)

Now let's define a function sfilter of type ('a -> bool) -> 'a stream -> 'a stream that filters out all elements of a stream that do not satisfy a given predicate:

    fun sfilter pred =
      let
        fun lazy loop (Cons (x, s)) =
          if pred x then
            Cons (x, loop s)
          else
            loop s
      in
        loop
      end

We can use sfilter to define a function sieve that, when applied to a stream of numbers, retains only those numbers that are not divisible by a preceding number in the stream:

    fun m mod n = m - n * (m div n)
    fun divides m n = n mod m = 0
    fun lazy sieve (Cons (x, s)) =
      Cons (x, sieve (sfilter (not o (divides x)) s))

We may now define the infinite stream of primes by applying sieve to the natural numbers greater than or equal to 2:

    val nats2 = stl (stl nats)
    val primes = sieve nats2

To inspect the values of a stream it is often useful to use the following function that takes n ≥ 0 elements from a stream and builds a list of those n values:

    fun take 0 _ = nil
      | take n (Cons (x, s)) = x :: take (n-1) s

Here's an example to illustrate the effects of memoization:

    val rec lazy s = Cons ((print "."; 1), s)
    val Cons (h, _) = s    (* prints ".", binds h to 1 *)
    val Cons (h, _) = s    (* silent, binds h to 1 *)

Replace the expression (print "."; 1) by a time-consuming operation yielding 1 as result, and you will see that the second time we force s the result is returned instantly, taking advantage of the effort expended on the time-consuming operation induced by the first force of s.
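With these definitions in hand, the stream of primes can be sampled; assuming the declarations above, one would expect (a sketch):

```sml
(* take n s returns the first n elements of stream s as a list. *)
val first_five = take 5 primes   (* [2, 3, 5, 7, 11] *)
```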
Chapter 16

Equality and Equality Types
Chapter 17

Concurrency

Concurrent ML (CML) is an extension of Standard ML with mechanisms for concurrent programming. It is available as part of the Standard ML of New Jersey compiler. The eXene Library for programming the X windows system is based on CML.
Part III

The Module Language
The Standard ML module language comprises the mechanisms for structuring programs into separate units. Program units are called structures. A structure consists of a collection of components, including types and values, that constitute the unit. Composition of units to form a larger unit is mediated by a signature, which describes the components of that unit. A signature may be thought of as the type of a unit. Large units may be structured into hierarchies using substructures. Generic, or parameterized, units may be defined as functors.
Chapter 18

Signatures and Structures

The fundamental constructs of the ML module system are signatures and structures. A signature may be thought of as an interface or specification of a structure, and a structure may correspondingly be thought of as an implementation of a signature. Many languages (such as Modula, Ada, or Java) have similar constructs: signatures are analogous to interfaces or package specifications or class types, and structures are analogous to implementations or packages or classes. However, these are only rough analogies that should not be taken too seriously.

18.1 Signatures

A signature is a specification, or a description, of a program unit, or structure. Structures consist of declarations of type constructors, exception constructors, and value bindings. A signature is an item-by-item specification of these components of a structure. A structure matches, or implements, a signature iff the requirements of the signature are met by the structure. (This will be made precise below.)

18.1.1 Basic Signatures

A basic signature expression has the form sig specs end, where specs is a sequence of specifications. There are four basic forms of specification that may occur in specs:1

____
1There are two other forms of specification beyond these four, substructure specifications and sharing specifications. These will be introduced in chapter 21.
4. Each speciﬁcation may refer to the type constructors introduced earlier in the sequence. A type speciﬁcation of the form Signatures and Structures type (tyvar1 . Here is an illustrative example of a signature deﬁnition.. Signature identiﬁers are abbreviations for the signatures to which they are bound.. signature QUEUE = sig type ’a queue exception Empty val empty : ’a queue val insert : ’a * ’a queue > ’a queue val remove : ’a queue > ’a * ’a queue end The signature QUEUE speciﬁes a structure that must provide 1. A datatype speciﬁcation. We will refer back to this deﬁnition often in the rest of this chapter. 2002 . An exception speciﬁcation of the form exception excon of typ.. 3. W ORKING D RAFT D ECEMBER 9. 2. Signatures may be given names using a signature binding signature sigid = sigexp.tyvarn ) tycon [ = typ ]. which has precisely the same form as a datatype declaration. and no component may be speciﬁed more than once. where the deﬁnition typ of tycon may or may not be present. A value speciﬁcation of the form val id : typ. where sigid is a signature identiﬁer and sigexp is a signature expression.156 1. In practice we nearly always bind signature expressions to identiﬁers and refer to them by name. The sequence of speciﬁcations are to be understood in the order given. a unary type constructor ’a queue..
2. a nullary exception Empty,

3. a polymorphic value empty of type 'a queue, and

4. two polymorphic functions, insert and remove, with the specified type schemes.

Notice that queues are polymorphic in the type of elements of the queue: the same operations are used regardless of the element type.

18.1.2 Signature Inheritance

Signatures may be built up from one another using two principal tools, signature inclusion and signature specialization. Each is a form of inheritance in which a new signature is created by enriching another signature with additional information.

Signature inclusion is used to add more components to an existing signature. For example, if we wish to add an emptiness test to the signature QUEUE, we might define the augmented signature, QUEUE_WITH_EMPTY, using the following signature binding:

   signature QUEUE_WITH_EMPTY = sig
     include QUEUE
     val is_empty : 'a queue -> bool
   end

As the notation suggests, the signature QUEUE is included into the body of the signature QUEUE_WITH_EMPTY, and an additional component is added. It is not strictly necessary to use include to define this signature. Indeed, we may define it directly using the following signature binding:

   signature QUEUE_WITH_EMPTY = sig
     type 'a queue
     exception Empty
     val empty : 'a queue
     val insert : 'a * 'a queue -> 'a queue
     val remove : 'a queue -> 'a * 'a queue
     val is_empty : 'a queue -> bool
   end
There is no semantic difference between the two definitions of QUEUE_WITH_EMPTY. Signature inclusion is a convenience that documents the "history" of how the more refined signature was created.

Signature specialization is used to augment an existing signature with additional type definitions. For example, if we wish to refine the signature QUEUE to specify that the type constructor 'a queue must be defined as a pair of lists, we may proceed as follows:

   signature QUEUE_AS_LISTS =
     QUEUE where type 'a queue = 'a list * 'a list

The where type clause "patches" the signature QUEUE by adding a definition for the type constructor 'a queue. The signature QUEUE_AS_LISTS may also be defined directly as follows:

   signature QUEUE_AS_LISTS = sig
     type 'a queue = 'a list * 'a list
     exception Empty
     val empty : 'a queue
     val insert : 'a * 'a queue -> 'a queue
     val remove : 'a queue -> 'a * 'a queue
   end

A where type clause may not be used to redefine a type that is already defined in a signature. For example, the following is illegal:

   signature QUEUE_AS_LISTS_AS_LIST =
     QUEUE_AS_LISTS where type 'a queue = 'a list

If you wish to replace the definition of a type constructor in a signature with another definition using where type, you must go back to a common ancestor in which that type is not yet defined.

Two signatures are said to be equivalent iff they differ only up to the type equivalences induced by type abbreviations. (In some languages signatures are compared by name, which means that two signatures are equivalent iff they are the same signature identifier. This is not the case in ML.) For example, the signature

   signature QUEUE_AS_LIST = QUEUE where type 'a queue = 'a list

is equivalent to the signature
   signature QUEUE_AS_LIST = sig
     type 'a queue = 'a list
     exception Empty
     val empty : 'a list
     val insert : 'a * 'a list -> 'a list
     val remove : 'a list -> 'a * 'a list
   end

Within the scope of the definition of the type 'a queue as 'a list, the two are equivalent. Within the scope of the type declaration in the signature QUEUE_AS_LIST, the type constructors 'a queue and 'a list are said to share, or are equivalent, and hence the specifications of the value components are equivalent. Therefore they may be used interchangeably within their scope, without affecting the meaning. This principle of equivalence is sometimes called propagation of type sharing.

18.2 Structures

A structure is a unit of program consisting of a sequence of declarations of types, exceptions, and values. Structures are implementations of signatures; signatures are the "types" of structures.

18.2.1 Basic Structures

The basic form of structure is an encapsulated sequence of declarations of the form

   struct decs end

The declarations in decs are of one of the following four forms:

1. A type declaration defining a type constructor.

2. A datatype declaration defining a new datatype.

3. An exception declaration defining a new exception constructor with a specified argument type.

4. A value declaration defining a new value variable with a specified type.

These are precisely the declarations introduced in Part II. A structure expression is well-formed iff it consists of a well-formed sequence of well-formed declarations (according to the rules given in Part II).
A structure expression is evaluated by evaluating each of the declarations within it, in the order given. This amounts to evaluating the right-hand sides of each value declaration in turn to determine its value, which is then bound to the corresponding value identifier. This means, in particular, that any side effects that arise out of evaluating the constituent value bindings occur when the structure expression is evaluated.

A structure may be bound to a structure identifier using a structure binding of the form

   structure strid = strexp

This declaration defines strid to stand for the value of strexp. Such a declaration is well-formed exactly when strexp is well-formed. It is evaluated by evaluating the right-hand side, and binding the resulting structure value to strid. A structure value is a structure expression in which all bindings are fully evaluated.

Here is an example of a structure binding:

   structure Queue = struct
     type 'a queue = 'a list * 'a list
     exception Empty
     val empty = (nil, nil)
     fun insert (x, (b, f)) = (x::b, f)
     fun remove (nil, nil) = raise Empty
       | remove (bs, nil) = remove (nil, rev bs)
       | remove (bs, f::fs) = (f, (bs, fs))
   end

Recall that a fun binding is really an abbreviation for a val rec binding! Thus the bindings of the value identifiers insert and remove are function expressions.
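As noted above, any effects in the constituent value bindings happen when the structure expression itself is evaluated, not when its components are later used. A minimal sketch (the structure Noisy and its print string are illustrative, not from the text):

```sml
structure Noisy = struct
  (* the print is executed once, when this structure
     expression is evaluated *)
  val x = (print "initializing\n"; 7)
end

(* subsequent uses of Noisy.x cause no further output *)
val y = Noisy.x + Noisy.x
```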
18.2.2 Long and Short Identifiers

Once a structure has been bound to a structure identifier, we may access its components using paths, or long identifiers, or qualified names. A path has the form strid.id. (In chapter 21 we will generalize this to admit an arbitrary sequence of strid's separated by a dot.) It stands for the id component of the structure bound to strid. For example, Queue.empty refers to the empty component of the structure Queue. It has type 'a Queue.queue (note well the syntax!), stating that it is a polymorphic value whose type is built up from the unary type constructor Queue.queue. Similarly, Queue.insert has type 'a * 'a Queue.queue -> 'a Queue.queue, and Queue.remove has type 'a Queue.queue -> 'a * 'a Queue.queue.

Unless special steps are taken, type definitions permeate structure boundaries. That is, the definitions of types within a structure determine the definitions of the long identifiers that refer to those types. For example, the type 'a Queue.queue is equivalent to the type 'a list * 'a list because it is defined to be so in the structure Queue. Consequently, it is correct to write an expression such as

   val q = Queue.insert (1, ([6,5,4],[1,2,3]))

even though the pair of lists ([6,5,4],[1,2,3]) was not obtained by using the operations from the structure Queue. This is because the type int Queue.queue is equivalent to the type int list * int list, and hence the call to insert is well-typed. We will shortly introduce the means for limiting the visibility of type definitions.

The use of long identifiers can get out of hand, cluttering the program rather than clarifying it. Suppose that we are frequently using the Queue operations in a program, so that the code is cluttered with calls to Queue.empty, Queue.insert, and Queue.remove. One way to reduce clutter is to introduce a structure abbreviation of the form

   structure strid = strid'

that introduces one structure identifier as an abbreviation for another. For example, after declaring

   structure Q = Queue

we may write Q.empty, Q.insert, and Q.remove, rather than the more verbose forms mentioned above. The type 'a Q.queue is equivalent to the type 'a list * 'a list because it is defined to be so in the structure Q, which is bound to the structure Queue.

Another way to reduce clutter is to open the structure Queue to incorporate its bindings directly into the current environment. An open declaration has the form

   open strid1 ... stridn

which incorporates the bindings from the given structures in left-to-right order (later structures override earlier ones when there is overlap). For example, the declaration

   open Queue

incorporates the body of the structure Queue into the current environment so that we may write just empty, insert, and remove, without qualification, to refer to the corresponding components of the structure Queue.

Although this is surely convenient, using open has its disadvantages. One is that we cannot simultaneously open two structures that have a component with the same name. For example, if we write

   open Queue Stack

where the structure Stack also has a component empty, then uses of empty (without qualification) will stand for Stack.empty, not for Queue.empty, which may not have been intended. This turns out to be a source of many bugs.

Another problem with open is that it is hard to control its behavior, since it incorporates the entire body of a structure, and hence may inadvertently shadow identifiers that happen to be also used in the structure. For example, if the structure Queue happened to define an auxiliary function helper, that function would also be incorporated into the current environment by the declaration open Queue. For these reasons, it is best to use open sparingly, and only then in a let or local declaration (to limit the damage).
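Following this advice, here is a sketch of confining an open to a local declaration (the function singleton is illustrative, not from the text):

```sml
local
  open Queue  (* empty, insert, remove visible only here *)
in
  (* only singleton is exported beyond this declaration *)
  fun singleton x = insert (x, empty)
end
```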
Chapter 19

Signature Matching

When does a structure implement a signature? Roughly speaking, the structure must provide all of the components and satisfy all of the type definitions required by the signature. Type components must be provided with the right arities, and with equivalent definitions (if any). Value components must be provided with types at least as general as those in the signature. Exception components must be provided with types equivalent to those in the signature.

These are useful rules of thumb, but nailing them down is surprisingly tricky. Let us mention a few issues that complicate matters:

* To minimize bureaucracy, a structure may consist of declarations presented in any sensible order, not just the order specified in the signature.

* To increase flexibility, a structure may provide more components than are strictly required by the signature. If a signature requires components x, y, and z, it is sufficient for the structure to provide x, y, z, and w, provided that the requirements of the specification are met.

* To enhance reuse, a structure may provide values with more general types than are required by the signature. If a signature demands a function of type int->int, it is enough to provide a function of type 'a->'a.

* To avoid over-specification, a datatype may be provided where a type is required, and a value constructor may be provided where a value is required.

What it means for a structure to implement a signature is very similar to what it means for an expression to have a type. An expression exp has type typ iff typ is an instance of the principal type of exp. Similarly, we will define a structure to implement a signature iff the principal signature of the structure matches that signature.

19.1 Principal Signatures

The principal signature of a structure is, in a precise sense, the most specific description of the components of that structure. It captures everything that needs to be known about that structure during type checking. For the purposes of type checking, the principal signature is the official proxy for the structure. We need never examine the code of the structure during type checking, once its principal signature has been determined. It is an important, and highly nontrivial, property of ML that there is always a principal signature for any well-formed structure.

A structure expression is assigned a principal signature by a component-by-component analysis of its constituent declarations. The principal signature of a structure is obtained as follows. (These rules gloss over some technical complications that arise only in unusual circumstances. See The Definition of Standard ML [2] for complete details.)

1. Corresponding to a declaration of the form

      type (tyvar1,...,tyvarn) tycon = typ

   the principal signature contains the specification

      type (tyvar1,...,tyvarn) tycon = typ

   That is, the principal signature includes the definition of tycon.

2. Corresponding to a declaration of the form

      datatype (tyvar1,...,tyvarn) tycon = con1 of typ1 | ... | conk of typk

   the principal signature contains the specification

      datatype (tyvar1,...,tyvarn) tycon = con1 of typ1 | ... | conk of typk

   The specification is identical to the declaration.

3. Corresponding to a declaration of the form

      exception id of typ

   the principal signature contains the specification

      exception id of typ

4. Corresponding to a declaration of the form

      val id = exp

   the principal signature contains the specification

      val id : typ

   where typ is the principal type scheme of the expression exp (relative to the preceding declarations). Keep in mind that fun bindings are really (recursive) val bindings of function type.

In brief, the principal signature contains all of the type definitions, datatype definitions, and exception bindings of the structure, plus the principal types of its value bindings.

An important point to keep in mind is that the principal signature of a structure is obtained by inspecting the code of that structure. While this may seem like an obvious point, it has important implications for managing dependencies between modules. We will return to this point in section 20.4 of chapter 20.

19.2 Matching

A candidate signature sigexpc is said to match a target signature sigexpt iff sigexpc has all of the components and all of the type equations specified by sigexpt. More precisely:

1. Every type constructor in the target must also be present in the candidate, with the same arity (number of arguments) and an equivalent definition (if any).

2. Every datatype in the target must be present in the candidate, with equivalent types for the value constructors.
3. Every exception in the target must be present in the candidate, with an equivalent argument type.

4. Every value in the target must be present in the candidate, with at least as general a type.

The candidate may have additional components not mentioned in the target, or satisfy additional type equations not required in the target, but it cannot have fewer of either. The target signature may therefore be seen as a weakening of the candidate signature, since all of the properties of the latter are true of the former.

It is easy to see that the matching relation is reflexive (every signature matches itself) and transitive (if sigexp1 matches sigexp2 and sigexp2 matches sigexp3, then sigexp1 matches sigexp3). This means that the signature matching relation is a preorder. It is also a partial order, which is to say that if sigexp1 matches sigexp2 and vice versa, then sigexp1 and sigexp2 are equivalent signatures (to within propagation of type equations).

It will be helpful to consider some examples. Recall the following signatures from chapter 18:

   signature QUEUE = sig
     type 'a queue
     exception Empty
     val empty : 'a queue
     val insert : 'a * 'a queue -> 'a queue
     val remove : 'a queue -> 'a * 'a queue
   end

   signature QUEUE_WITH_EMPTY = sig
     include QUEUE
     val is_empty : 'a queue -> bool
   end

   signature QUEUE_AS_LISTS =
     QUEUE where type 'a queue = 'a list * 'a list

The signature QUEUE_WITH_EMPTY matches the signature QUEUE, because all of the requirements of QUEUE are met by QUEUE_WITH_EMPTY. The converse does not hold, because QUEUE lacks the component is_empty, which is required by QUEUE_WITH_EMPTY.
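The flexibility in these matching rules can be seen in a small example of my own devising (the names SIG and S are hypothetical): the structure below matches SIG even though it declares its components in a different order, provides an extra component w, and gives id a more general type than the signature demands.

```sml
signature SIG = sig
  val z : int
  val id : int -> int
end

structure S : SIG = struct
  fun id x = x     (* principal type 'a -> 'a, more general than int -> int *)
  val w = "extra"  (* extra component, not mentioned in SIG *)
  val z = 0        (* declared after id, unlike in SIG *)
end
```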
The signature QUEUE_AS_LISTS matches the signature QUEUE, since it is identical to QUEUE, apart from the additional specification of the type 'a queue. The converse fails, because the signature QUEUE does not satisfy the requirement that 'a queue be equivalent to 'a list * 'a list.

Matching does not distinguish between equivalent signatures. For example, consider the following signature:

   signature QUEUE_AS_LIST = sig
     type 'a queue = 'a list
     exception Empty
     val empty : 'a list
     val insert : 'a * 'a list -> 'a list
     val remove : 'a list -> 'a * 'a list
     val is_empty : 'a list -> bool
   end

At first glance you might think that this signature does not match the signature QUEUE, since the components of QUEUE_AS_LIST have superficially dissimilar types from those in QUEUE. However, the signature QUEUE_AS_LIST is equivalent to the signature QUEUE_WITH_EMPTY where type 'a queue = 'a list, which matches QUEUE for reasons noted earlier. Therefore, QUEUE_AS_LIST matches QUEUE as well.

Signature matching may also involve instantiation of polymorphic types. The types of values in the candidate may be more general than required by the target. For example, the signature

   signature MERGEABLE_QUEUE = sig
     include QUEUE
     val merge : 'a queue * 'a queue -> 'a queue
   end

matches the signature

   signature MERGEABLE_INT_QUEUE = sig
     include QUEUE
     val merge : int queue * int queue -> int queue
   end

because the polymorphic type of merge in MERGEABLE_QUEUE instantiates to its type in MERGEABLE_INT_QUEUE.

Finally, a datatype specification matches a signature that specifies a type with the same name and arity (but no definition), and zero or more value components corresponding to some (or all) of the value constructors of the datatype. The types of the value components must match exactly the types of the corresponding value constructors; no specialization is allowed in this case. For example, the signature

   signature RBT_DT = sig
     datatype 'a rbt =
       Empty
     | Red of 'a rbt * 'a * 'a rbt
     | Black of 'a rbt * 'a * 'a rbt
   end

matches the signature

   signature RBT = sig
     type 'a rbt
     val Empty : 'a rbt
     val Red : 'a rbt * 'a * 'a rbt -> 'a rbt
   end

The signature RBT specifies the type 'a rbt as abstract, and includes two value specifications that are met by value constructors in the signature RBT_DT. One way to understand this is to mentally rewrite the signature RBT_DT in the following fanciful form (unfortunately, the "signature" RBT_DTS is not a legal ML signature!):

   signature RBT_DTS = sig
     type 'a rbt
     con Empty : 'a rbt
     con Red : 'a rbt * 'a * 'a rbt -> 'a rbt
     con Black : 'a rbt * 'a * 'a rbt -> 'a rbt
   end

The rule is simply that a val specification may be matched by a con specification.
19.3 Satisfaction

Returning to the motivating question of this chapter: a candidate structure implements a target signature iff the principal signature of the candidate structure matches the target signature. That is, the principal signature is the strongest signature implemented by a structure. By the reflexivity of the matching relation it is immediate that a structure satisfies its principal signature. Therefore any signature implemented by a structure is weaker than the principal signature of that structure.
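As a small, hypothetical illustration: the structure C below has the principal signature shown in the comment, which matches both of the weaker signatures that follow, so C implements both.

```sml
structure C = struct
  type t = int
  val x = 3
end
(* principal signature of C:
   sig type t = int  val x : int end *)

signature WEAK1 = sig type t  val x : t end
signature WEAK2 = sig val x : int end

structure C1 : WEAK1 = C  (* ok: the principal signature matches WEAK1 *)
structure C2 : WEAK2 = C  (* ok: t is simply an extra component here *)
```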
Chapter 20

Signature Ascription

Signature ascription imposes the requirement that a structure implement a signature and, in so doing, weakens the signature of that structure for all subsequent uses of it. What is held back is at least as important as what is revealed. Using modules effectively requires careful control over the propagation of type information: opaque ascription is the means by which type information is curtailed, and transparent ascription is the means by which it is propagated.

There are two forms of ascription in ML. Both require that a structure implement a signature; they differ in the extent to which the assigned signature of the structure is weakened by the ascription.

1. Opaque, or restrictive, ascription. The structure is assigned the target signature as is, without augmentation.

2. Transparent, or descriptive, ascription. The structure is assigned the target signature augmented by propagating to the target the definitions of those types in the candidate.

Both forms of ascription hide components not present in the target signature.

20.1 Ascribed Structure Bindings

The most common form of signature ascription is in a structure binding. There are two forms, the transparent

   structure strid : sigexp = strexp

and the opaque
   structure strid :> sigexp = strexp

The only difference is that transparent ascription is written using a single colon, ":", whereas opaque ascription is written as ":>". In both cases the structure identifier is assigned a signature based on the form of ascription.

Ascribed structure bindings are type checked as follows. First, the compiler checks that strexp implements sigexp according to the rules given in chapter 19. Specifically, the principal signature sigexp0 of strexp is determined and matched against sigexp. Second, for a transparent ascription, this determines an augmentation sigexp' of sigexp by propagating type equations from the principal signature sigexp0, and the structure is assigned the signature sigexp'. For an opaque ascription, it is assigned the signature sigexp.

Incidentally, this makes clear that transparent ascription is really a special case of opaque ascription! Since the augmented signature, sigexp', is itself a signature, we could have written it ourselves and opaquely ascribed it to strexp, with the same net effect as a transparent ascription. Thus transparent ascription is really a form of "signature inference" in which we are asking the compiler to fill in details that it is too inconvenient to write ourselves. As we will see below, this is more than a minor convenience!

Ascribed structure bindings are evaluated by first evaluating strexp. Then a view of the resulting value is formed by dropping all components that are not present in the target signature. (There are some technical complications to do with dropping type components. For example, if we attempt to include only the value constructors of a datatype, and not the datatype itself, the compiler will implicitly include the datatype to ensure that the types of the constructors are expressible in the signature of the view. Any such type implicitly included in the view is marked as "hidden". See The Definition of Standard ML for complete details.) The structure variable strid is bound to the view. The formation of the view ensures that the components of a structure may always be accessed in constant time, and that there are no "space leaks" because of components that are present in the structure, but not in the signature.

20.2 Opaque Ascription

The primary use of opaque ascription is to enforce data abstraction. A good example is provided by the implementation of queues as, say, pairs of lists:

   structure Queue :> QUEUE = struct
     type 'a queue = 'a list * 'a list
     val empty = (nil, nil)
     fun insert (x, (bs, fs)) = (x::bs, fs)
     exception Empty
     fun remove (nil, nil) = raise Empty
       | remove (bs, f::fs) = (f, (bs, fs))
       | remove (bs, nil) = remove (nil, rev bs)
   end

The use of opaque ascription ensures that the type 'a Queue.queue is abstract. No definition is provided for it in the signature QUEUE, and therefore it has no definition in terms of other types of the language. We may not make use of the fact that a queue is "really" a pair of lists simply because it is implemented this way. Instead we have obscured this fact by opaquely ascribing a signature that does not define the type 'a queue.

The principal effect of rendering 'a Queue.queue abstract is that only the operations empty, insert, and remove may be performed on values of that type. This ensures that all clients of the structure Queue are insulated from the details of how queues are implemented. This means that the implementation can be changed without fear of breaking any client code, a strong tool for enhancing maintainability of large programs. Were abstraction not enforced, the client might well (inadvertently or deliberately) make use of the representation of queues as pairs of lists, forcing all client code to be reconsidered whenever a change of representation occurs.

A closely related reason for hiding the representation of a type is that it allows us to isolate the enforcement of representation invariants to the implementation of the abstraction. We may think of the type 'a Queue.queue as the type of states of an abstract machine whose sole instructions are empty (the initial state), insert, and remove. Internally to the structure Queue we may wish to impose invariants on the internal state of the machine. The beauty of data abstraction is that it supports an elegant technique for enforcing such invariants. The technique, called the assume-ensure, or rely-guarantee, technique, reduces enforcement of representation invariants to these two requirements:

1. All initialization instructions must ensure that the invariant holds true of the machine state after execution.

2. All state transition instructions may assume that the invariant holds of the input states, and must ensure that it holds of the output state.

By induction on the number of "instructions" executed, the invariant must hold for all states; that is, it must really be invariant!

Now suppose that we wish to implement an abstract type of priority queues for an arbitrary element type. The queue operations are no longer polymorphic in the element type because they actually "touch" the elements to determine their relative priorities. Here is a possible signature for priority queues that expresses this dependency. (In chapter 21 we'll introduce better means for structuring this module, but the central points discussed here will not be affected.)

   signature PQ = sig
     type elt
     val lt : elt * elt -> bool
     type queue
     exception Empty
     val empty : queue
     val insert : elt * queue -> queue
     val remove : queue -> elt * queue
   end

Now let us consider an implementation of priority queues in which the elements are taken to be strings. Since priority queues form an abstract type, we would expect to use opaque ascription to ensure that its representation is hidden. This suggests an implementation along these lines:

   structure PrioQueue :> PQ = struct
     type elt = string
     val lt : string * string -> bool = (op <)
     type queue = ...
     ...
   end

But not only is the type PrioQueue.queue abstract, so is PrioQueue.elt! This leaves us no means of creating a value of type PrioQueue.elt, and hence we can never call PrioQueue.insert. The problem is that the interface is "too abstract": it should only obscure the identity of the type queue, and not that of the type elt. The solution is to augment the signature PQ with a definition for the type elt, then opaquely ascribe this to PrioQueue:
   signature STRING_PQ = PQ where type elt = string

   structure PrioQueue :> STRING_PQ = ...

Now the type PrioQueue.elt is equivalent to string, and we may call PrioQueue.insert with a string, as expected. The type PrioQueue.queue remains abstract.

The moral is that there is always an element of judgement involved in deciding which types to hold abstract and which to reveal. In the case of priority queues, the determining factor is that we specified only the operations on elt that were required for the implementation of priority queues, and no others. This means that elt could not usefully be held abstract, but must instead be specified in the signature. On the other hand, the operations on queues are intended to be complete, and so we hold the type abstract.

20.3 Transparent Ascription

Transparent ascription cuts down on the need for explicit specification of type definitions in signatures. As we remarked earlier, we can always replace uses of transparent ascription by a use of opaque ascription with a hand-crafted augmented signature. This can become burdensome. On the other hand, excessive use of transparent ascription impedes modular programming by exposing type information that would better be left abstract.

The prototypical use of transparent ascription is to form a view of a structure that eliminates the components that are not necessary in a given context, without obscuring the identities of its type components. Consider the signature ORDERED defined as follows:

   signature ORDERED = sig
     type t
     val lt : t * t -> bool
   end

This signature specifies a type t equipped with a comparison operation lt. It should be clear that it would not make sense to opaquely ascribe this signature to a structure. Doing so would preclude ever calling the lt operation, for there would be no means of creating values of type t. Such a signature is only useful once it has been augmented with a definition for the type t, and this is precisely what transparent ascription does for you. For example, consider the following structure binding:

   structure String : ORDERED = struct
     type t = string
     val clt = Char.<
     fun lt (s, t) = ...
   end

This structure implements string comparison in terms of character comparison (say, to implement the lexicographic ordering of strings). Ascription of the signature ORDERED ensures two things:

1. The auxiliary function clt is pruned out of the structure. It was intended for internal use, and was not meant to be externally visible.

2. The type String.t is equivalent to string. The "true" signature of String is the signature

      ORDERED where type t = string

   which makes clear the underlying definition of t. Transparent ascription computes an augmentation of ORDERED with this definition exposed, but does not hide the type of elements.

A related use of transparent ascription is to document an interpretation of a type without rendering it abstract. For example, we may wish to consider the integers ordered in two different ways: one by the standard arithmetic comparison, the other by divisibility. We might make the following declarations to express this:

   structure IntLt : ORDERED = struct
     type t = int
     val lt = (op <)
   end

   structure IntDiv : ORDERED = struct
     type t = int
     fun lt (m, n) = (n mod m = 0)
   end

The ascriptions specify the interpretation of int as ordered, in two senses, without hiding the type of elements. In particular, IntLt.t and IntDiv.t are both equivalent to int, even though this fact is not present in the signature ORDERED.
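With the declarations above, both orderings may be applied directly to integer literals, since neither ascription hides the type t:

```sml
val a = IntLt.lt (2, 3)   (* true: 2 < 3 *)
val b = IntDiv.lt (2, 6)  (* true: 2 divides 6, since 6 mod 2 = 0 *)
val c = IntDiv.lt (4, 6)  (* false: 6 mod 4 = 2, so 4 does not divide 6 *)
```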
Since the signature is given independently of the implementation. In principle one can always write out the augmented signature by hand (using where type clauses). this convenience comes at a price: whenever you use transparent ascription.3 Transparent ascription is a convenience akin to type inference. and not just its (apparent) interface.4 The goal of modular programming is to isolate parts of programs from one another so that they can be independently developed and modiﬁed with minimal interference.20. the fewer the problems with integrating modules to form a complete system. 3 Worse. 2002 .4 Transparency. because changes to C force reconsideration of D. Once a signature has been opaquely ascribed to a structure. This is called the fragile base class problem. for very technical reasons. and Dependency An important observation is that transparent ascription is. because to do so requires access to “hidden” types. in practice it is simply too inconvenient to require the programmer to be fully explicit about the propagation of type information in signatures. However. you have an implementation dependency of C on D. in a sense. all future uses of that structure may rely only on the ascribed signature. The compiler automatically ﬂeshes out omitted information in a manner that is convenient for the programmer. This greatly facilitates team development and code evolution. W ORKING D RAFT D ECEMBER 9. what you see (the ascribed signature) is not what you get! Rather. 4 That’s why inheritance in classbased languages is a bad idea. On the other hand. a form of opaque ascription. Implementation dependencies are fundamentally antimodular. If a class D inherits from a class C. the compiler must have access to the source code of the structure to determine the “true” signature of that structure. it is not always possible to write the augmented signature by hand. client code is insulated from changes to the implementation that do not also affect its interface. 
what you get is an augmentation of what you see obtained by inspecting the implementation of that structure. Opacity. and Dependency 177 20. With transparent ascription. then opaquely ascribe it to the structure. This means that any code that makes reference to the ascribed structure depends on the implementation of that structure. opaque ascription eliminates them. While this is true in principle. The target signature is augmented with the type deﬁnitions of the candidate. and then the augmented signature is opaquely ascribed to the structure.4 Transparency. Opacity. whereas transparent ascription therefore introduces implementation dependencies. The fewer the dependencies between modules.
2002 .178 Signature Ascription W ORKING D RAFT D ECEMBER 9.
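The contrast between the two forms of ascription can be seen directly in a small, self-contained sketch (this is an illustration, not the chapter's exact code; ORDERED is pared down to lt alone). The commented-out line shows what the type checker would reject under opaque ascription.

```sml
(* A sketch contrasting transparent and opaque ascription. *)
signature ORDERED =
  sig
    type t
    val lt : t * t -> bool
  end

(* Transparent: IntLt.t is known to equal int. *)
structure IntLt : ORDERED =
  struct
    type t = int
    val lt = (op <)
  end

(* Opaque: IntLtAbs.t is held abstract. *)
structure IntLtAbs :> ORDERED =
  struct
    type t = int
    val lt = (op <)
  end

val ok = IntLt.lt (2, 3)
(* val bad = IntLtAbs.lt (2, 3) *)
(* rejected: the literals have type int, not IntLtAbs.t *)
```

Under opaque ascription there is no way at all to obtain a value of type IntLtAbs.t here, since ORDERED provides no operations that construct one; this is exactly the sense in which the ascribed signature is all that clients may rely on.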
Chapter 21

Module Hierarchies

So far we have confined ourselves to considering "flat" modules consisting of a linear sequence of declarations of types, exceptions, and values. As programs grow in size and complexity, it becomes important to introduce further structuring mechanisms to support their growth. The ML module language also supports module hierarchies, tree-structured configurations of modules that reflect the architecture of a large system.

21.1 Substructures

A substructure is a "structure within a structure". Structure bindings (either opaque or transparent) are admitted as components of other structures. Structure specifications of the form

structure strid : sigexp

may appear in signatures. There is no distinction between transparent and opaque specifications in a signature, because there is no structure to ascribe! The type checking and evaluation rules for structures are extended to substructures recursively. The principal signature of a substructure binding is determined according to the rules given in chapter 19. A substructure binding in one signature matches the corresponding one in another iff their signatures match according to the rules in chapter 19. Evaluation of a substructure binding consists of evaluating the structure expression, then binding the resulting structure value to that identifier.

To see how substructures arise in practice, consider the following programming scenario. The first version of a system makes use of a polymorphic dictionary data structure whose search keys are strings. The signature for such a data structure might be as follows:

signature MY_STRING_DICT =
  sig
    type 'a dict
    val empty : 'a dict
    val insert : 'a dict * string * 'a -> 'a dict
    val lookup : 'a dict * string -> 'a option
  end

The return type of lookup is 'a option, since there may be no entry in the dictionary with the specified key. The implementation of this abstraction looks approximately like this:

structure MyStringDict :> MY_STRING_DICT =
  struct
    datatype 'a dict =
      Empty
    | Node of 'a dict * string * 'a * 'a dict
    val empty = Empty
    fun insert (d, k, v) = ...
    fun lookup (d, k) = ...
  end

The omitted implementations of insert and lookup make use of the built-in lexicographic ordering of strings.

The second version of the system requires another dictionary whose keys are integers, leading to another signature and implementation for dictionaries.

signature MY_INT_DICT =
  sig
    type 'a dict
    val empty : 'a dict
    val insert : 'a dict * int * 'a -> 'a dict
    val lookup : 'a dict * int -> 'a option
  end

structure MyIntDict :> MY_INT_DICT =
  struct
    datatype 'a dict =
      Empty
    | Node of 'a dict * int * 'a * 'a dict
    val empty = Empty
    fun insert (d, k, v) = ...
    fun lookup (d, k) = ...
  end

The elided implementations of insert and lookup make use of the primitive comparison operations on integers.

At this point we may observe an obvious pattern, that of a dictionary with keys of a specific type. To avoid further repetition we decide to abstract out the key type from the signature so that it can be filled in later.

signature MY_GEN_DICT =
  sig
    type key
    type 'a dict
    val empty : 'a dict
    val insert : 'a dict * key * 'a -> 'a dict
    val lookup : 'a dict * key -> 'a option
  end

Notice that the dictionary abstraction carries with it the type of its keys. Specific instances of this generic dictionary signature are obtained using where type.

signature MY_STRING_DICT =
  MY_GEN_DICT where type key = string
signature MY_INT_DICT =
  MY_GEN_DICT where type key = int

A string dictionary might then be implemented as follows:

structure MyStringDict :> MY_STRING_DICT =
  struct
    type key = string
    datatype 'a dict =
      Empty
    | Node of 'a dict * key * 'a * 'a dict
    val empty = Empty
    fun insert (Empty, k, v) = Node (Empty, k, v, Empty)
      | insert (Node (dl, l, v', dr), k, v) = ...
    fun lookup (Empty, _) = NONE
      | lookup (Node (dl, l, v, dr), k) =
          if k < l then         (* string comparison *)
            lookup (dl, k)
          else if k > l then    (* string comparison *)
            lookup (dr, k)
          else SOME v
  end

By a similar process we may build an implementation MyIntDict of the signature MY_INT_DICT, with integers as keys, ordered by the standard integer comparison operations.

Now suppose that we require a third dictionary, say MyIntDivDict, with integer keys, but ordered according to the divisibility ordering.[1] This implementation makes use of modular arithmetic to implement the comparison operations, but has the same signature MY_INT_DICT as MyIntDict.

structure MyIntDivDict :> MY_INT_DICT =
  struct
    type key = int
    datatype 'a dict =
      Empty
    | Node of 'a dict * key * 'a * 'a dict
    fun divides (k, l) = (l mod k = 0)
    val empty = Empty
    fun insert (Empty, k, v) = Node (Empty, k, v, Empty)
      | insert (Node (dl, l, v', dr), k, v) = ...
    fun lookup (Empty, _) = NONE
      | lookup (Node (dl, l, v, dr), k) =
          if divides (k, l) then        (* divisibility test *)
            lookup (dl, k)
          else if divides (l, k) then   (* divisibility test *)
            lookup (dr, k)
          else SOME v
  end

[1] For which m < n iff m divides n evenly.

Notice that we required an auxiliary function, divides, to implement the comparison in the required sense.

With this in mind, let us reconsider our initial attempt to consolidate the signatures of the various versions of dictionaries in play. In one sense there is nothing to do: the signature MY_GEN_DICT suffices. However, as we've just seen, the instances of this signature, which are ascribed to particular implementations, do not determine the interpretation. Not only does the dictionary module carry with it the type of its keys, but it also carries the interpretation used on that type. What we'd like to do is to package the type with its interpretation so that the dictionary module is self-contained. This is achieved by introducing a substructure binding in the dictionary structure.

To begin with we first isolate the notion of an ordered type.

signature ORDERED =
  sig
    type t
    val lt : t * t -> bool
    val eq : t * t -> bool
  end

This signature describes modules that contain a type t equipped with an equality and comparison operation on it. An implementation of this signature specifies the type and the interpretation, as in the following examples.

(* Lexicographically ordered strings. *)
structure LexString : ORDERED =
  struct
    type t = string
    val eq = (op =)
    val lt = (op <)
  end

(* Integers ordered conventionally. *)
structure LessInt : ORDERED =
  struct
    type t = int
    val eq = (op =)
    val lt = (op <)
  end

(* Integers ordered by divisibility. *)
structure DivInt : ORDERED =
  struct
    type t = int
    fun lt (m, n) = (n mod m = 0)
    fun eq (m, n) = lt (m, n) andalso lt (n, m)
  end

Notice that the use of transparent ascription is very natural here, since ORDERED is not intended as a self-contained abstraction.

The signature of dictionaries is restructured as follows:

signature DICT =
  sig
    structure Key : ORDERED
    type 'a dict
    val empty : 'a dict
    val insert : 'a dict * Key.t * 'a -> 'a dict
    val lookup : 'a dict * Key.t -> 'a option
  end

The signature DICT includes as a substructure the key type together with its interpretation as an ordered type. To enforce abstraction we introduce specialized versions of this signature that specify the key type using a where type clause.

signature STRING_DICT =
  DICT where type Key.t = string
signature INT_DICT =
  DICT where type Key.t = int

These are, respectively, signatures for the abstract type of dictionaries whose keys are strings and integers. How are these signatures to be implemented? Corresponding to the layering of the signatures, we have a layering of the implementation.

structure StringDict :> STRING_DICT =
  struct
    structure Key : ORDERED = LexString
    datatype 'a dict =
      Empty
    | Node of 'a dict * Key.t * 'a * 'a dict
    val empty = Empty
    fun insert (Empty, k, v) = Node (Empty, k, v, Empty)
      | insert (Node (dl, l, v', dr), k, v) = ...
    fun lookup (Empty, _) = NONE
      | lookup (Node (dl, l, v, dr), k) =
          if Key.lt (k, l) then
            lookup (dl, k)
          else if Key.lt (l, k) then
            lookup (dr, k)
          else SOME v
  end

Similarly, we may implement LessIntDict, with the standard ordering, as follows:

structure LessIntDict :> INT_DICT =
  struct
    structure Key : ORDERED = LessInt
    datatype 'a dict =
      Empty
    | Node of 'a dict * Key.t * 'a * 'a dict
    val empty = Empty
    fun insert (Empty, k, v) = Node (Empty, k, v, Empty)
      | insert (Node (dl, l, v', dr), k, v) = ...
    fun lookup (Empty, _) = NONE
      | lookup (Node (dl, l, v, dr), k) =
          if Key.lt (k, l) then
            lookup (dl, k)
          else if Key.lt (l, k) then
            lookup (dr, k)
          else SOME v
  end

Similarly, dictionaries with integer keys ordered by divisibility may be implemented as follows:

structure IntDivDict :> INT_DICT =
  struct
    structure Key : ORDERED = DivInt
    datatype 'a dict =
      Empty
    | Node of 'a dict * Key.t * 'a * 'a dict
    val empty = Empty
    fun insert (Empty, k, v) = Node (Empty, k, v, Empty)
      | insert (Node (dl, l, v', dr), k, v) = ...
    fun lookup (Empty, _) = NONE
      | lookup (Node (dl, l, v, dr), k) =
          if Key.lt (k, l) then
            lookup (dl, k)
          else if Key.lt (l, k) then
            lookup (dr, k)
          else SOME v
  end

Observe that the implementations of insert and lookup make use of the comparison operations Key.lt and Key.eq.

Taking stock of the development, what we have done is to structure the signature of dictionaries to allow the type of keys, together with its interpretation, to vary from one implementation to another. The Key substructure may be viewed as a "parameter" of the signature DICT that is "instantiated" by specialization to specific types of interest. In this sense substructures subsume the notion of a parameterized signature found in some languages. Parameterized signatures, in contrast, are incomplete signatures that must be completed to be used. There are several advantages to using substructures instead:

1. A signature with one or more substructures is still a complete signature.

2. Any substructure of a signature may play the role of a "parameter". There is no need to designate in advance which are "arguments" and which are "results".

In chapter 23 we will introduce the mechanisms needed to build a generic implementation of dictionaries that may be instantiated by the key type and its ordering.
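To see a dictionary of this kind in action, here is a small self-contained sketch in the same style. The structure name StrDict and the completed insert are illustrative only (the chapter's listings elide the implementations), and no signature is ascribed so that the example stands alone.

```sml
(* A sketch of the string-keyed dictionary in use; names are illustrative. *)
structure StrDict =
  struct
    datatype 'a dict =
      Empty
    | Node of 'a dict * string * 'a * 'a dict
    val empty = Empty
    fun insert (Empty, k, v) = Node (Empty, k, v, Empty)
      | insert (Node (dl, l, v', dr), k, v) =
          if k < l then Node (insert (dl, k, v), l, v', dr)
          else if l < k then Node (dl, l, v', insert (dr, k, v))
          else Node (dl, k, v, dr)
    fun lookup (Empty, _) = NONE
      | lookup (Node (dl, l, v, dr), k) =
          if k < l then lookup (dl, k)
          else if l < k then lookup (dr, k)
          else SOME v
  end

val d = StrDict.insert (StrDict.insert (StrDict.empty, "one", 1), "two", 2)
val r1 = StrDict.lookup (d, "one")    (* SOME 1 *)
val r2 = StrDict.lookup (d, "three")  (* NONE *)
```

Note that the built-in overloaded comparison < resolves at type string here, which is the "built-in lexicographic ordering" the chapter refers to.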
Chapter 22

Sharing Specifications

In chapter 21 we illustrated the use of substructures to express the dependence of one abstraction on another. In this chapter we will consider the problem of symmetric combination of modules to form larger modules.

22.1 Combining Abstractions

The discussion will be based on a representation of geometry in ML based on the following (drastically simplified) signature.

signature GEOMETRY =
  sig
    structure Point : POINT
    structure Sphere : SPHERE
  end

For the purposes of this example, we have reduced geometry to two concepts: that of a point in space and that of a sphere. Points and vectors are fundamental to representing geometry. They are described by the following (abbreviated) signatures:

signature VECTOR =
  sig
    type vector
    val zero : vector
    val scale : real * vector -> vector
    val add : vector * vector -> vector
    val dot : vector * vector -> real
  end

signature POINT =
  sig
    structure Vector : VECTOR
    type point
    (* move a point along a vector *)
    val translate : point * Vector.vector -> point
    (* the vector from a to b *)
    val ray : point * point -> Vector.vector
  end

The vector operations support addition, scalar multiplication, and inner product, and include a unit element for addition. The point operations support translation of a point along a vector and the creation of a vector as the "difference" of two points (i.e., the vector from the first to the second).

Spheres are implemented by a module implementing the following (abbreviated) signature:

signature SPHERE =
  sig
    structure Vector : VECTOR
    structure Point : POINT
    type sphere
    val sphere : Point.point * Vector.vector -> sphere
  end

The operation sphere creates a sphere centered at a given point and with the radius vector given.

These signatures are intentionally designed so that the dimension of the space is not part of the specification. Two- and three-dimensional geometry are defined by structure bindings like these:

structure Geom2D :> GEOMETRY = ...
structure Geom3D :> GEOMETRY = ...

It is the structures, and not the signatures, that specify the dimension. As a consequence of the use of opaque ascription, the types Geom2D.Point.point and Geom3D.Point.point are distinct. This means that dimensional conformance is enforced by the type checker. For example, we cannot apply Geom3D.Sphere.sphere to a point in two-space and a vector in three-space. This allows us (using the mechanisms to be introduced in chapter 23) to build packages that work in an arbitrary dimension without requiring run-time conformance checks.

This is a good thing: the more static checking we have, the better off we are. But, unfortunately, we have too much of a good thing. Suppose that p and q are two-dimensional points of type Geom2D.Point.point. We might expect to be able to form a sphere centered at p with radius determined by the vector from p to q:

Geom2D.Sphere.sphere (p, Geom2D.Point.ray (p, q))

But this expression is ill-typed! The reason is that the types Geom2D.Point.Vector.vector and Geom2D.Sphere.Vector.vector are distinct from one another, even though they may be implemented identically.

What has gone wrong? The situation is quite subtle. In keeping with the guidelines discussed in chapter 21, we have incorporated as substructures the structures on which a given structure depends. POINT has a substructure implementing VECTOR, so an implementation of POINT depends on an implementation of VECTOR. Similarly, SPHERE has substructures implementing VECTOR and POINT. This leads to a proliferation of structures. Even in the very simplified geometry signature given above, we have two "copies" of the point abstraction, and three "copies" of the vector abstraction! Since we used opaque ascription to define the two- and three-dimensional implementations of the signature GEOMETRY, all of these abstractions are kept distinct from one another.

In a sense this is the correct state of affairs. The various "copies" of, say, the vector abstraction might well be distinct from one another. In the elided implementation of two-dimensional geometry, we might have used completely incompatible notions of vector in each of the three places where they are required. Of course, this may not be what is intended, but (so far) there is nothing in the signature to prevent it. Hence, we are compelled to keep these types distinct.

What is missing is the expression of the intention that the various "copies" of vectors and points within the geometry abstraction be identical. To support this it is necessary to constrain the implementation to use the same notion of vector throughout, so that we can mix-and-match the vectors constructed in various components of the package. This is achieved using a type sharing constraint. The revised signatures for the geometry package look like this:

signature SPHERE =
  sig
    structure Vector : VECTOR
    structure Point : POINT
    sharing type Point.Vector.vector = Vector.vector
    type sphere
    val sphere : Point.point * Vector.vector -> sphere
  end

signature GEOMETRY =
  sig
    structure Point : POINT
    structure Sphere : SPHERE
    sharing type Point.point = Sphere.Point.point
    sharing type Point.Vector.vector = Sphere.Vector.vector
  end

These equations specify that the two "copies" of the point abstraction, and the three "copies" of the vector abstraction, must coincide. In the presence of the above sharing specification, the ill-typed expression above becomes well-typed, since now the required type equation holds by explicit specification in the signature.

As a notational convenience we may use a structure sharing constraint instead to express the same requirements:

signature SPHERE =
  sig
    structure Vector : VECTOR
    structure Point : POINT
    sharing Point.Vector = Vector
    type sphere
    val sphere : Point.point * Vector.vector -> sphere
  end

signature GEOMETRY =
  sig
    structure Point : POINT
    structure Sphere : SPHERE
    sharing Point = Sphere.Point
    and Point.Vector = Sphere.Vector
  end
Rather than specify the required sharing type-by-type, we can instead specify it structure-by-structure, with the meaning that corresponding types of shared structures are required to share. Since each structure in our example contains only one type, the effect of the structure sharing specification above is identical to the preceding type sharing specification.

Not only does the sharing specification ensure that the desired equations hold amongst the various components of an implementation of GEOMETRY, but it also constrains the implementation to ensure that these types are the same. It is easy to achieve this requirement by defining a single implementation of points and vectors that is reused in the higher-level abstractions.

structure Vector3D : VECTOR = ...

structure Point3D : POINT =
  struct
    structure Vector : VECTOR = Vector3D
    ...
  end

structure Sphere3D : SPHERE =
  struct
    structure Vector : VECTOR = Vector3D
    structure Point : POINT = Point3D
    ...
  end

structure Geom3D :> GEOMETRY =
  struct
    structure Point = Point3D
    structure Sphere = Sphere3D
  end

The required type sharing constraints are true by construction. Had we instead replaced the above declaration of Geom3D by the following one, the type checker would reject it on the grounds that the required sharing between Sphere.Point and Point does not hold, because Sphere2D.Point is distinct from Point3D.

structure Geom3D :> GEOMETRY =
  struct
    structure Point = Point3D
    structure Sphere = Sphere2D
  end
It is natural to wonder whether it might be possible to restructure the GEOMETRY signature so that the duplication of the point and vector components is avoided, thereby obviating the need for sharing specifications. One can restructure the code in this manner, but doing so would do violence to the overall structure of the program. This is why sharing specifications are so important.

Let's try to reorganize the signature GEOMETRY so that duplication of the point and vector structures is avoided. One step is to eliminate the substructure Vector from SPHERE, replacing uses of Vector.vector by Point.Vector.vector.

signature SPHERE =
  sig
    structure Point : POINT
    type sphere
    val sphere : Point.point * Point.Vector.vector -> sphere
  end

After all, since the structure Point comes equipped with a notion of vector, why not use it? This cuts down the number of sharing specifications to one:

signature GEOMETRY =
  sig
    structure Point : POINT
    structure Sphere : SPHERE
    sharing Point = Sphere.Point
  end

If we could further eliminate the substructure Point from the signature SPHERE we would have only one copy of Point and no need for a sharing specification. But what would the signature SPHERE look like in this case?

signature SPHERE =
  sig
    type sphere
    val sphere : Point.point * Point.Vector.vector -> sphere
  end
The problem now is that the signature SPHERE is no longer self-contained. It makes reference to a structure Point, but which Point are we talking about? Any commitment would tie the signature to a specific structure, and hence a specific dimension, contrary to our intentions. Rather, the notion of point must be a generic concept within SPHERE, and hence Point must appear as a substructure. The substructure Point may be thought of as a parameter of the signature SPHERE in the sense discussed earlier.

The only other move available to us is to eliminate the structure Point from the signature GEOMETRY. This is indeed possible, and would eliminate the need for any sharing specifications. But it only defers the problem, rather than solving it. A full-scale geometry package would contain more abstractions that involve points, so that there will still be copies in the other abstractions. Sharing specifications would then be required to ensure that these copies are, in fact, identical. Here is an example. Let us introduce another geometric abstraction, the semispace.

signature SEMI_SPACE =
  sig
    structure Point : POINT
    type semispace
    val side : Point.point * semispace -> bool option
  end

The function side determines (if possible) whether a given point lies in one half of the semispace or the other. The expanded GEOMETRY signature would look like this (with the elimination of the Point structure in place).

signature EXTD_GEOMETRY =
  sig
    structure Sphere : SPHERE
    structure SemiSpace : SEMI_SPACE
    sharing Sphere.Point = SemiSpace.Point
  end

By an argument similar to the one we gave for the signature SPHERE, we cannot eliminate the substructure Point from the signature SEMI_SPACE, and hence we wind up with two copies. Therefore the sharing specification is required.

What is at issue here is a fundamental tension in the very notion of modular programming. On the one hand we wish to separate modules from one
another so that they may be treated independently. This requires that the signatures of these modules be self-contained. Unbound references to a structure (such as Point) tie that signature to a specific implementation, in violation of our desire to treat modules separately from one another. On the other hand we wish to combine modules together to form programs. Doing so requires that the composition be coherent, which is achieved by the use of sharing specifications. What sharing specifications do for you is to provide an after-the-fact means of tying together several different abstractions to form a coherent whole. This approach to the problem of coherence is a unique, and uniquely effective, feature of the ML module system.
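The coherence role of sharing specifications can be seen in miniature in the following self-contained sketch (all names here are illustrative, not from the text). The sharing constraint in SYS is satisfied because both components are built from the same Item structure; two structurally identical but distinct abstract copies would be rejected.

```sml
(* A miniature sharing-specification example; names are illustrative. *)
signature ITEM = sig type t end

signature PRODUCER =
  sig
    structure Item : ITEM
    val make : Item.t -> Item.t list
  end

signature CONSUMER =
  sig
    structure Item : ITEM
    val consume : Item.t list -> int
  end

signature SYS =
  sig
    structure Producer : PRODUCER
    structure Consumer : CONSUMER
    sharing Producer.Item = Consumer.Item
  end

structure IntItem = struct type t = int end

structure P = struct structure Item = IntItem fun make x = [x, x] end
structure C = struct structure Item = IntItem fun consume l = length l end

(* Transparent ascription: the sharing constraint is checked, and the
   types remain visible so the composition below typechecks. *)
structure Sys : SYS =
  struct
    structure Producer = P
    structure Consumer = C
  end

val n = Sys.Consumer.consume (Sys.Producer.make 3)  (* 2 *)
```

Because both Item substructures are the single structure IntItem, the sharing holds by construction, exactly as in the Geom3D example above.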
Chapter 23
Parameterization
To support code reuse it is useful to define generic, or parameterized, modules that leave unspecified some aspects of the implementation of a module. The unspecified parts may be instantiated to determine specific instances of the module. The common part is thereby implemented once and shared among all instances. In ML such generic modules are called functors. A functor is a module-level function that takes a structure as argument and yields a structure as result. Instances are created by applying the functor to an argument specifying the interpretation of the parameters.
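The idea can be sketched in miniature before turning to the details; the names in this self-contained example are illustrative only. A functor whose parameter supplies a type and a value is applied twice to yield two distinct instances.

```sml
(* A minimal functor: the parameter is a sequence of declarations. *)
functor PairFun (type t val init : t) :
  sig
    val pair : t * t
  end =
  struct
    val pair = (init, init)
  end

(* Two instances of the generic module. *)
structure IntPair = PairFun (type t = int val init = 3)
structure StrPair = PairFun (type t = string val init = "a")

val (x, y) = IntPair.pair      (* (3, 3) *)
val (s1, s2) = StrPair.pair    (* ("a", "a") *)
```

The body is implemented once, yet IntPair.pair has type int * int and StrPair.pair has type string * string, which is the sense in which the common part is shared among all instances.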
23.1 Functor Bindings and Applications
Functors are defined using a functor binding. There are two forms, the opaque and the transparent. A transparent functor has the form

functor funid (decs) : sigexp = strexp

where the result signature, sigexp, is transparently ascribed; an opaque functor has the form

functor funid (decs) :> sigexp = strexp

where the result signature is opaquely ascribed. A functor is a module-level function whose argument is a sequence of declarations, and whose result is a structure.

Type checking of a functor consists of checking that the functor body matches the ascribed result signature, given that the parameters of the functor have the specified signatures. For a transparent functor the result signature is the augmented signature determined by matching the principal
signature of the body (relative to the assumptions governing the parameters) against the given result signature. For opaque functors the ascribed result signature is used as-is, without further augmentation by type equations.

In chapter 21 we developed a modular implementation of dictionaries in which the ordering of keys is made explicit as a substructure. The implementation of dictionaries is the same for each choice of order structure on the keys. This can be expressed by a single functor binding that defines the implementation of dictionaries parametrically in the choice of keys.

functor DictFun (structure K : ORDERED)
  :> DICT where type Key.t = K.t =
  struct
    structure Key : ORDERED = K
    datatype 'a dict =
      Empty
    | Node of 'a dict * Key.t * 'a * 'a dict
    val empty = Empty
    fun insert (Empty, k, v) = Node (Empty, k, v, Empty)
      | insert (Node (dl, l, v', dr), k, v) = ...
    fun lookup (Empty, _) = NONE
      | lookup (Node (dl, l, v, dr), k) =
          if Key.lt (k, l) then
            lookup (dl, k)
          else if Key.lt (l, k) then
            lookup (dr, k)
          else SOME v
  end

The functor DictFun takes as argument a single structure specifying the ordered key type as a structure implementing signature ORDERED. The opaquely ascribed result signature is DICT where type Key.t = K.t. This signature holds the type 'a dict abstract, but specifies that the key type is the type K.t passed as argument to the functor. The body of the functor has been written so that the comparison operations are obtained from the Key substructure. This ensures that the dictionary code is independent of the choice of key type and ordering.
Instances of functors are obtained by application. A functor application has the form

funid (binds)

where binds is a sequence of bindings of the arguments of the functor. Returning to the example of the dictionary functor, the three versions of dictionaries considered in chapter 21 may be obtained by applying DictFun to appropriate arguments.

structure LtIntDict = DictFun (structure K = LessInt)
structure LexStringDict = DictFun (structure K = LexString)
structure DivIntDict = DictFun (structure K = DivInt)

In each case the functor DictFun is instantiated by specifying a binding for its argument structure K. The argument structures are, as described in chapter 21, implementations of the signature ORDERED. They specify the type of keys and the sense in which they are ordered.

The signature of a functor application is determined by the following procedure. We assume we are given the signatures of the functor parameters, and also the "true" result signature of the functor (the given signature for opaque functors, the augmented signature for transparent functors).

1. For each argument, match the argument signature against the corresponding parameter signature of the functor. This determines an augmentation of the parameter signature for each argument (as described in chapter 20).

2. For each reference to a type component of a functor parameter in the result signature, propagate the type definitions of the augmented parameter signature to the result signature.

The signature of the application determined by this procedure is then opaquely ascribed to the application. This means that if a type is left abstract in the result signature of a functor, that type is "new" in every instance of that functor. This behavior is called generativity of the functor.[1]

[1] The alternative, called applicativity, means that there is one abstract type shared by all instances of that functor.

The signatures for the structures LtIntDict, LexStringDict, and DivIntDict are determined by instantiating the result signature of the functor DictFun according to the above procedure. Consider the application of DictFun to LessInt that defines LtIntDict. The augmented signature resulting from matching the signature of LessInt against the parameter signature ORDERED is the signature

ORDERED where type t = int

Assigning this to the parameter K, we deduce that the type K.t is equivalent to int, and hence the result signature of DictFun is

DICT where type Key.t = int

so that LtIntDict.Key.t is equivalent to int, as desired. By a similar process we deduce that the signature of LexStringDict is

DICT where type Key.t = string

and that the signature of DivIntDict is

DICT where type Key.t = int
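The propagation of type information through a functor application can be checked concretely. The following condensed, self-contained sketch mirrors the DictFun development (it is an illustration, not the chapter's exact code: ORDERED is pared down to lt alone, and insert is completed, since the chapter elides it).

```sml
signature ORDERED =
  sig
    type t
    val lt : t * t -> bool
  end

signature DICT =
  sig
    structure Key : ORDERED
    type 'a dict
    val empty : 'a dict
    val insert : 'a dict * Key.t * 'a -> 'a dict
    val lookup : 'a dict * Key.t -> 'a option
  end

functor DictFun (structure K : ORDERED)
  :> DICT where type Key.t = K.t =
  struct
    structure Key = K
    datatype 'a dict =
      Empty
    | Node of 'a dict * Key.t * 'a * 'a dict
    val empty = Empty
    fun insert (Empty, k, v) = Node (Empty, k, v, Empty)
      | insert (Node (dl, l, v', dr), k, v) =
          if Key.lt (k, l) then Node (insert (dl, k, v), l, v', dr)
          else if Key.lt (l, k) then Node (dl, l, v', insert (dr, k, v))
          else Node (dl, k, v, dr)
    fun lookup (Empty, _) = NONE
      | lookup (Node (dl, l, v, dr), k) =
          if Key.lt (k, l) then lookup (dl, k)
          else if Key.lt (l, k) then lookup (dr, k)
          else SOME v
  end

structure LessInt : ORDERED =
  struct
    type t = int
    val lt = (op <)
  end

structure LtIntDict = DictFun (structure K = LessInt)

(* Key.t = int has been propagated to the instance, so integer keys
   are accepted here, while 'a dict itself remains abstract. *)
val d = LtIntDict.insert (LtIntDict.empty, 7, "seven")
val r = LtIntDict.lookup (d, 7)   (* SOME "seven" *)
```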
23.2 Functors and Sharing Specifications

In chapter 22 we developed a signature of geometric primitives that contained sharing specifications to ensure that the constituent abstractions may be combined properly. The signature GEOMETRY is defined as follows:

signature GEOMETRY =
  sig
    structure Point : POINT
    structure Sphere : SPHERE
    sharing Point = Sphere.Point
    and Point.Vector = Sphere.Vector
  end

The sharing clauses ensure that the Point and Sphere components are compatible with each other. Since we expect to define vectors, points, and spheres of various dimensions, it makes sense to implement these as functors, according to the following scheme:

functor PointFun (structure V : VECTOR) : POINT = ...
functor SphereFun
  (structure V : VECTOR
   structure P : POINT) : SPHERE =
  struct
    structure Vector = V
    structure Point = P
    ...
  end

functor GeomFun
  (structure P : POINT
   structure S : SPHERE) : GEOMETRY =
  struct
    structure Point = P
    structure Sphere = S
  end

A two-dimensional geometry package may then be defined as follows:

structure Vector2D : VECTOR = ...
structure Point2D : POINT =
  PointFun (structure V = Vector2D)
structure Sphere2D : SPHERE =
  SphereFun (structure V = Vector2D and P = Point2D)
structure Geom2D : GEOMETRY =
  GeomFun (structure P = Point2D and S = Sphere2D)

A three-dimensional version is defined similarly.

There is only one problem: the functors SphereFun and GeomFun are not well-typed! The reason is that in both cases their result signatures require type equations that are not true of their parameters. For example, the signature SPHERE requires that the structures P.Vector and V be equivalent, which is not satisfied by the body of SphereFun. This is not true in general, because the functors might be applied to arguments for which this is false. Similar problems plague the functor GeomFun. The solution is to include sharing constraints in the parameter lists of the functors, as follows:

functor SphereFun
  (structure V : VECTOR
   structure P : POINT
   sharing P.Vector = V) : SPHERE =
  struct
    structure Vector = V
    structure Point = P
    ...
  end

functor GeomFun
  (structure P : POINT
   structure S : SPHERE
   sharing P.Vector = S.Vector
   and P = S.Point) : GEOMETRY =
  struct
    structure Point = P
    structure Sphere = S
  end

These equations preclude instantiations for which the required equations do not hold, and are sufficient to ensure that the requirements of the result signatures of the functors are met.

23.3 Avoiding Sharing Specifications

As with sharing specifications in signatures, it is natural to wonder whether they can be avoided in functor parameters. Once again, the answer is "yes", but doing so does violence to the structure of your program. The chief virtue of sharing specifications is that they express directly and concisely the required relationships without requiring that these relationships be anticipated when defining the signatures of the parameters. This greatly facilitates reuse of off-the-shelf code, for which it is impossible to anticipate any sharing relationships that one may wish to impose in an application.

To see what happens, let's consider the best-case scenario from chapter 22 in which we have minimized the sharing specifications to a single one tying together the sphere and semispace components. That is, we are to implement the following signature:

signature EXTD_GEOMETRY =
  sig
    structure Sphere : SPHERE
    structure SemiSpace : SEMI_SPACE
    sharing Sphere.Point = SemiSpace.Point
  end
The implementation is a functor of the form

functor ExtdGeomFun (structure Sp : SPHERE
                     structure Ss : SEMI_SPACE
                     sharing Sp.Point = Ss.Point) =
struct
  structure Sphere = Sp
  structure SemiSpace = Ss
end

To eliminate the sharing equation in the functor parameter, we must arrange that the sharing equation in the signature EXTD_GEOMETRY holds. Simply dropping the sharing specification will not do, because then there is no reason to believe that it will hold as required in the signature. There are two methods for doing this, each with disadvantages compared to the use of sharing specifications.

One is to make the desired equation true by construction. A natural move is to "factor out" the implementation of POINT, and use it to ensure that the required equation is true of the functor body. That is, rather than take implementations of SPHERE and SEMI_SPACE as arguments, the functor ExtdGeomFun takes only an implementation of POINT, then creates in the functor body appropriate implementations of spheres and semi-spaces.

functor SphereFun (structure P : POINT) : SPHERE =
struct
  structure Vector = P.Vector
  structure Point = P
  ...
end

functor SemiSpaceFun (structure P : POINT) : SEMI_SPACE =
struct
  ...
end

functor ExtdGeomFun1 (structure P : POINT) : EXTD_GEOMETRY =
struct
  structure Point = P
  structure Sphere = SphereFun (structure P = Point)
  structure SemiSpace = SemiSpaceFun (structure P = Point)
end

The problems with this solution are these:

• The body of ExtdGeomFun1 makes use of the functors SphereFun and SemiSpaceFun, and no other. In effect we are limiting the geometry functor to arguments that are built from these specific functors. This is a significant loss of generality that is otherwise present in the functor ExtdGeomFun, which may be applied to any implementations of SPHERE and SEMI_SPACE.

• There is no inherent reason why ExtdGeomFun1 must take an implementation of POINT as argument. It does so only so that it can reconstruct the hierarchy so as to satisfy the sharing requirements of the result signature.

• The functor ExtdGeomFun1 must have as parameter the common element(s) of the components of its body. We must reconstruct the entire hierarchy, starting with the components that are conceptually "furthest away" as arguments, each of which is then used to build up the appropriate substructures in a manner consistent with the required sharing. This approach does not scale well when many abstractions are layered atop one another.

Another approach is to factor out the common component, and use this to constrain the arguments to the functor to ensure that the possible arguments are limited to situations for which the required sharing holds.

functor ExtdGeomFun2 (structure P : POINT
                      structure Sp : SPHERE where Point = P
                      structure Ss : SEMI_SPACE where Point = P) =
struct
  structure Sphere = Sp
  structure SemiSpace = Ss
end

Now the required sharing requirements are met, but it is also clear that this approach has no particular advantages over just using a sharing specification. It has the disadvantage of requiring a third argument, whose only role is to make it possible to express the required sharing. An application of this functor must provide not only implementations of SPHERE and SEMI_SPACE, but also an implementation of POINT that is used to build these!

A slightly more sophisticated version of this solution is as follows:

functor ExtdGeomFun3 (structure Sp : SPHERE
                      structure Ss : SEMI_SPACE where Point = Sp.Point) =
struct
  structure Sphere = Sp
  structure SemiSpace = Ss
end

The "extra" parameter to the functor has been eliminated by choosing one of the components as a "representative" and insisting that the others be compatible with it by using a where clause.² This solution has all of the advantages of the direct use of sharing specifications, and no further disadvantages. However, we are forced to violate arbitrarily the inherent symmetry of the situation. We could just as well have written

functor ExtdGeomFun4 (structure Ss : SEMI_SPACE
                      structure Sp : SPHERE where Point = Ss.Point) =
struct
  structure Sphere = Sp
  structure SemiSpace = Ss
end

without changing the meaning. Here is the point: sharing specifications allow a symmetric situation to be treated in a symmetric manner. The compiler breaks the symmetry by choosing representatives arbitrarily in the manner illustrated above. Sharing specifications offload the burden of making such tedious (because arbitrary) decisions to the compiler, rather than imposing it on the programmer.

²Officially, we must write where type Point.point = Sp.Point.point, but many compilers accept the syntax above.
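The asymmetry discussed above can be seen in miniature. In the following standalone sketch (with hypothetical signatures, unrelated to the geometry example), the official where type form from the footnote refines one parameter's type component to agree with the other's:

```sml
signature ITEM  = sig type item end
signature STORE = sig type item val empty : item list end

(* The second parameter is constrained, via the official "where type"
   syntax, to agree with the first.  Choosing which parameter plays
   the "representative" role is exactly the arbitrary, symmetry-
   breaking decision discussed above. *)
functor Join (structure I : ITEM
              structure S : STORE where type item = I.item) =
struct
  (* Legal only because S.item and I.item are known to be equal. *)
  fun extend (x : I.item) = x :: S.empty
end
```

We could just as well have constrained ITEM to agree with STORE; nothing in the problem itself dictates the choice.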
Part IV

Programming Techniques
In this part of the book we will explore the use of Standard ML to build elegant, reliable, and efficient programs. The discussion takes the form of a series of worked examples illustrating various techniques for building programs.
Chapter 24

Specifications and Correctness

The most important tools for getting programs right are specification and verification. In this chapter we review the main ideas in preparation for their subsequent use in the rest of the book.

24.1 Specifications

A specification is a description of the behavior of a piece of code. Specifications take many forms:

• Typing. A type specification describes the "form" of the value of an expression, without saying anything about the value itself.

• Effect Behavior. An effect specification resembles a type specification, but instead of describing the value of an expression, it describes the effects it may engender when evaluated.

• Input-Output Behavior. An input-output specification is a mathematical formula, usually an implication, that describes the output of a function for all inputs satisfying some assumptions.

• Time and Space Complexity. A complexity specification states the time or space required to evaluate an expression. The specification is most often stated asymptotically in terms of the number of execution steps or the size of a data structure.

• Equational Specifications. An equational specification states that one code fragment is equivalent to another. This means that wherever one is used we may replace it with the other without changing the observable behavior of any program in which they occur.

This list by no means exhausts the possibilities. What specifications have in common, however, is a descriptive, or declarative, flavor, rather than a prescriptive, or operational, one. A specification states what a piece of code does, not how it does it.

A common misunderstanding is that a given piece of code has one specification stating all there is to know about it. Rather you should see a specification as describing one of perhaps many properties of interest. In ML every program comes with at least one specification, its type. But we may also consider other specifications, according to interest and need. Sometimes only relatively weak properties are important — the function square always yields a non-negative result. Other times stronger properties are needed — the function fib' applied to n yields the nth and n−1st Fibonacci numbers.

The greatest difficulty in using specifications is in knowing what to say. Good programmers use specifications to state precisely and concisely their intentions when writing a piece of code. The code is written to solve a problem; the specification records for future reference what problem the code was intended to solve. The very act of formulating a precise specification of a piece of code is often the key to finding a good solution to a tricky problem. It's a matter of taste and experience to know what to say and how best to say it. The guiding principle is this: if you are unable to write a clear specification, you do not understand the problem well enough to solve it correctly.

Recall the following two ML functions from chapter 7:

fun fib 0 = 1
  | fib 1 = 1
  | fib n = fib (n-1) + fib (n-2)

fun fib' 0 = (1, 0)
  | fib' 1 = (1, 1)
  | fib' n =
    let val (a, b) = fib' (n-1) in (a+b, a) end

Here are some specifications pertaining to these functions:
• Type specifications:
  – val fib : int -> int
  – val fib' : int -> int * int

• Effect specifications:
  – The application fib n may raise the exception Overflow.
  – The application fib' n may raise the exception Overflow.

• Input-output specifications:
  – If n ≥ 0, then fib n evaluates to the nth Fibonacci number.
  – If n ≥ 0, then fib' n evaluates to the nth and n−1st Fibonacci numbers, in that order.

• Time complexity specifications:
  – If n ≥ 0, then fib n terminates in O(2^n) steps.
  – If n ≥ 0, then fib' n terminates in O(n) steps.

• Equivalence specification: For all n ≥ 0, fib n is equivalent to #1(fib' n).

24.2 Correctness Proofs

A program satisfies, or meets, a specification iff its execution behavior is as described by the specification. Verification is the process of checking that a program satisfies a specification. This takes the form of proving a mathematical theorem (by hand or with machine assistance) stating that the program implements a specification. The verification of this fact is then called a correctness proof.

There are many misunderstandings in the literature about specification and verification. It is worthwhile to take time out to address some of them here.

It is often said that a program is correct if it meets a specification. While there is nothing wrong with this usage, it invites misinterpretation. Verification is always relative to a specification, and hence so is correctness. As we remarked in section 24.1, there is in general no single preferred specification of a piece of code. In particular, a program can be "correct" with respect to one specification, and "incorrect" with respect to another. A program is never inherently correct or incorrect; it is only so relative to a particular specification.

A related misunderstanding is the inference that a program "works" from the fact that it is "correct" in this sense. This is false.¹ There are at least two fallacies here:

1. Specifications usually make assumptions about the context in which the program is used. If these assumptions are not satisfied, then all bets are off — nothing can be said about its behavior. For this reason it is entirely possible that a correct program will malfunction when used, not because the specification is not met, but rather because the specification is not relevant to the context in which the code is used.

2. The specification is stated at the level of the source code. Failures that arise below that level lie outside its scope.

Another common misconception about specifications is that they can always be implemented as runtime checks. This misconception is encouraged by the C assert macro, which introduces an executable test that a certain computable condition holds. This is a fine thing, but from this many people draw the conclusion that assertions (specifications) are simply boolean tests. This is false. The specification may not be mechanically checkable; it may not even be possible to test it at runtime. For example, we might specify that a function f of type int->int always yields a non-negative result. But there is no way to implement this as a runtime check — this is an undecidable, or uncomputable, problem. For this reason specifications are strictly more powerful than runtime checks. Consequently, it takes more work than a mere conditional branch to ensure that a program satisfies a specification. See chapter 32 for further discussion of this point.

Finally, it is important to note that specification, implementation, and verification go hand-in-hand. In practice we specify, code, and verify simultaneously, with each activity informing the other. If the code is difficult to write, look for insights from the specification and verification. If the verification breaks down, reconsider the code or the specification (or both). If the specification is complex, rough out the code and proof to see what it should be. There is no "magic bullet", but these are the most important tools for building elegant, useful, and robust programs.

Verifications of specifications take many different forms, but these are some of the most important:

• Type specifications are verified automatically by the ML compiler. If you put type annotations on a function stating the types of the parameters or results of that function, the correctness of these annotations is ensured by the compiler. For example, we may state the intended typing of fib as follows:

fun fib (n:int):int =
  case n of
    0 => 1
  | 1 => 1
  | n => fib (n-1) + fib (n-2)

This ensures that the type of fib is int->int.

• Effect specifications must be checked by hand. These generally state that a piece of code "may raise" one or more exceptions. If an exception is not mentioned in such a specification, we cannot conclude that the code does not raise it, only that we have no information. A handler makes it possible to specify that an exception may not be raised by a given expression (because the handler traps it). For example, the following function may not raise the exception Overflow, even though fib might:

fun protected_fib n = (fib n) handle Overflow => 0

Notice that a handler serves to eliminate a "may raise" specification.

• Input-output specifications require proof, typically using some form of induction. For example, in chapter 7 we proved that fib n yields the nth Fibonacci number by complete induction on n.

• Complexity specifications are often verified by solving a recurrence describing the execution time of a program. In the case of fib we may read off the following recurrence:

T(0) = 1
T(1) = 1
T(n+2) = T(n) + T(n+1) + 1

¹It is unrealistic to propose to verify that an arbitrary piece of code satisfies an arbitrary specification. Fundamental computability and complexity results make clear that we can never succeed in such an endeavor. Fortunately, it is also completely artificial.
Solving this recurrence yields the proof that T(n) = O(2^n).

• Equivalence specifications also require proof. Since equivalence of expressions must account for all possible uses of them, these proofs are, in general, very tricky. One method that often works is "induction plus hand-simulation". Returning to the example above, it is not hard to prove by induction on n that fib n is equivalent to #1(fib' n). First, plug in n = 0 and n = 1, and calculate using the definitions. Then assume the result for n and n + 1, and consider n + 2, once again calculating based on the definitions, to obtain the result.

24.3 Enforcement and Compliance

Most specifications have the form of an implication: if certain conditions are met, then the program behaves a certain way. The premises of the specification are preconditions that must be met in order for the program to behave in the manner described by the conclusion, or postcondition, of the specification. Input-output specifications are characteristically of this form. Just as in ordinary mathematical reasoning, if the premises of such a specification are not true, then all bets are off — nothing can be said about the behavior of the code.

In general, type specifications are conditional on the types of their free variables. For example, if x has type int, then x+1 has type int; but if one attempts to use the expression x+1 in a context where x is not an integer, one can hardly expect that x+1 will yield an integer. This means that the preconditions impose obligations on the caller, the user of the code, in order for the callee, the code itself, to be well-behaved. A conditional specification is a contract between the caller and the callee: if the caller meets the preconditions, the callee promises to fulfill the postcondition. In the case of type specifications the compiler enforces this obligation by ruling out as ill-typed any attempt to use a piece of code in a context that does not fulfill its typing assumptions. An expression x+1 occurring where x is not an integer is rejected by the type checker as a violation of the stated assumptions governing the types of its free variables.

What about specifications that are not mechanically enforced? For example, suppose we specify that if x is non-negative, then so is x+1. If x is negative, then we cannot infer anything about x+1 from the specification given above.² To make use of the specification in reasoning about its use in a larger program, it is essential that this precondition be met in the context of its use.

Many programming mistakes can be traced to violation of assumptions made by the callee that are not met by the caller.³ Lacking mechanical enforcement of these obligations, it is all too easy to neglect them when writing code. What can be done about this? A standard method, called bulletproofing, is to augment the callee with runtime checks that ensure that its preconditions are met, raising an exception if they are not. For example, we might write a "bulletproofed" version of fib that ensures that its argument is non-negative as follows:

local
  exception PreCond
  fun unchecked_fib 0 = 1
    | unchecked_fib 1 = 1
    | unchecked_fib n =
      unchecked_fib (n-1) + unchecked_fib (n-2)
in
  fun checked_fib n =
    if n < 0 then raise PreCond else unchecked_fib n
end

It is worth noting that we have structured this program to take the precondition check out of the loop. It would be poor practice to define checked_fib as follows:

fun bad_checked_fib n =
  if n < 0 then
    raise PreCond
  else
    case n of
      0 => 1
    | 1 => 1
    | n => bad_checked_fib (n-1) + bad_checked_fib (n-2)

²There are obviously other specifications that carry more information, but we're only concerned here with the one given.

³Sadly, these assumptions are often unstated and can only be culled from the code with great effort, if at all. Moreover, if f is an unknown function, then we will, in general, only have the specification, and not the code, to reason about.
Once we know that the initial argument is non-negative, it is assured that recursive calls also satisfy this requirement, provided that you've done the inductive reasoning to validate the specification of the function. As we remarked earlier, bulletproofing in this form has several drawbacks. First, it imposes the overhead of checking on all callers, even those that have ensured that the desired precondition is true. In truth the runtime overhead is minor; the real overhead is requiring that the implementor of the callee take the trouble to impose the checks. Second, and far more importantly, bulletproofing only applies to specifications that can be checked at runtime. However, not all specifications are amenable to runtime checks. For example, we may wish to impose the requirement that a function argument of type int->int always yields a non-negative result. There is no runtime check for this condition — we cannot write a function nonneg of type (int->int)->bool that determines whether or not a function f always yields a non-negative result. For these cases there is no question of inserting runtime checks to enforce the precondition. In chapter 32 we will consider the use of data abstraction to enforce at compile time specifications that may not be checked at runtime.
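The bulletproofing pattern of this chapter can be captured once and for all by a small higher-order function. This is a sketch under our own naming (bulletproof is not a Basis Library function):

```sml
exception PreCond

(* Wrap a function with a runtime check of its precondition.
   The wrapped function pays for exactly one check per call from
   the outside; the underlying function is reached only when the
   precondition holds. *)
fun bulletproof pre f x =
  if pre x then f x else raise PreCond

fun fib 0 = 1
  | fib 1 = 1
  | fib n = fib (n-1) + fib (n-2)

val checked_fib = bulletproof (fn n => n >= 0) fib

(* A failed comparison raises Bind, so these act as assertions. *)
val true = (checked_fib 10 = 89)
val true = ((checked_fib ~1; false) handle PreCond => true)
```

As with the hand-written checked_fib above, the check is kept out of the recursive loop: only the outermost call is guarded.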
Chapter 25

Induction and Recursion

This chapter is concerned with the close relationship between recursion and induction in programming. If a function is recursively-defined, an inductive proof is required to show that it meets a specification of its behavior. The motto is: when programming recursively, think inductively. Doing so significantly reduces the time spent debugging, and often leads to more efficient, robust, and elegant programs.

25.1 Exponentiation

Let's start with a very simple series of examples, all involving the computation of the integer exponential function. Our first example is to compute 2^n for integers n ≥ 0. We seek to define the function exp of type int->int satisfying the specification

if n ≥ 0, then exp n evaluates to 2^n.

The precondition, or assumption, is that the argument n is non-negative. The postcondition, or guarantee, is that the result of applying exp to n is the number 2^n. The caller is required to establish the precondition before applying exp; in exchange, the caller may assume that the result is 2^n. Here's the code:

fun exp 0 = 1
  | exp n = 2 * exp (n-1)
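As a quick sanity check of this specification on concrete inputs (not a substitute for the inductive proof that follows), we can compare exp against a few known powers of two. The val true = … bindings raise the Bind exception if a comparison fails:

```sml
fun exp 0 = 1
  | exp n = 2 * exp (n-1)

(* Spot-check exp n = 2^n for a few n >= 0. *)
val true = (exp 0 = 1)
val true = (exp 1 = 2)
val true = (exp 10 = 1024)
```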
Does this function satisfy the specification? It does, and we can prove this by induction on n. If n = 0, then exp n evaluates to 1 (as you can see from the first line of its definition), which is, of course, 2^0. Otherwise, assume that exp is correct for n − 1 ≥ 0, and consider the value of exp n. From the second line of its definition we can see that this is the value of 2 × p, where p is the value of exp (n − 1). Inductively, p = 2^(n−1), so 2 × p = 2 × 2^(n−1) = 2^n, as desired. Notice that we need not consider arguments n < 0 since the precondition of the specification requires that this be so. We must, however, ensure that each recursive call satisfies this requirement in order to apply the inductive hypothesis.

That was pretty simple. Now let us consider the running time of exp expressed as a function of n. Assuming that arithmetic operations are executed in constant time, we can read off a recurrence describing its execution time as follows:

T(0) = O(1)
T(n+1) = O(1) + T(n)

We are interested in solving a recurrence by finding a closed-form expression for it. In this case the solution is easily obtained:

T(n) = O(n)

Thus we have a linear time algorithm for computing the integer exponential function.

What about space? This is a much more subtle issue than time, because it is much more difficult in a high-level language such as ML to see where the space is used. Based on our earlier discussions of recursion and iteration we can argue informally that the definition of exp given above requires space given by the following recurrence:

S(0) = O(1)
S(n+1) = O(1) + S(n)

The justification is that the implementation requires a constant amount of storage to record the pending multiplication that must be performed upon completion of the recursive call. Solving this simple recurrence yields the equation

S(n) = O(n)

expressing that exp is also a linear space algorithm for the integer exponential function.
Can we do better? Yes, on both counts! Here's how. Rather than count down by one's, multiplying by two at each stage, we use successive squaring to achieve logarithmic time and space requirements. The idea is that if the exponent is even, we square the result of raising 2 to half the given power; otherwise, we reduce the exponent by one and double the result, ensuring that the next exponent will be even. Here's the code:

fun square (n:int) = n*n
fun double (n:int) = n+n

fun fast_exp 0 = 1
  | fast_exp n =
    if n mod 2 = 0 then
      square (fast_exp (n div 2))
    else
      double (fast_exp (n-1))

Its specification is precisely the same as before. Does this code satisfy the specification? Yes, and we can prove this by using complete induction, a form of mathematical induction in which we may prove that n > 0 has a desired property by assuming not only that its predecessor has it, but that all preceding numbers have it, and arguing that therefore n must have it. For n = 0 the argument is exactly as before. Suppose, then, that n > 0. If n is even, the value of fast_exp n is the result of squaring the value of fast_exp (n ÷ 2). Inductively this value is 2^(n÷2), so squaring it yields 2^(n÷2) × 2^(n÷2) = 2^(2×(n÷2)) = 2^n, as required. If, on the other hand, n is odd, the value is the result of doubling fast_exp (n − 1). Inductively the latter value is 2^(n−1), so doubling it yields 2^n, as required.

Here's a recurrence governing the running time of fast_exp as a function of its argument:

T(0) = O(1)
T(2n) = O(1) + T(n)
T(2n+1) = O(1) + T(2n) = O(1) + T(n)

Solving this recurrence using standard techniques yields the solution

T(n) = O(lg n)

You should convince yourself that fast_exp also requires logarithmic space usage.
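Since fast_exp and exp satisfy the same specification, the two definitions must agree wherever both are defined. A small self-checking sketch that compares them on the first several arguments (a failed check raises Bind):

```sml
fun exp 0 = 1
  | exp n = 2 * exp (n-1)

fun square (n:int) = n*n
fun double (n:int) = n+n

fun fast_exp 0 = 1
  | fast_exp n =
    if n mod 2 = 0 then
      square (fast_exp (n div 2))
    else
      double (fast_exp (n-1))

(* Compare the two definitions on 0, 1, ..., 20. *)
val true =
  List.all (fn n => fast_exp n = exp n)
           (List.tabulate (21, fn i => i))
```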
Can we do better? Well, it's not possible to improve the time requirement (at least not asymptotically), but we can reduce the space required to O(1) by putting the function into iterative (tail recursive) form. However, this may not be achieved in this case by simply adding an accumulator argument, without also increasing the running time! The obvious approach is to attempt to satisfy the specification

if n ≥ 0, then skinny_fast_exp (n, a) evaluates to 2^n × a.

Here's some code that achieves this specification:

fun skinny_fast_exp (0, a) = a
  | skinny_fast_exp (n, a) =
    if n mod 2 = 0 then
      skinny_fast_exp (n div 2, skinny_fast_exp (n div 2, a))
    else
      skinny_fast_exp (n-1, 2*a)

It is easy to see that this code works properly for n = 0 and for n > 0 when n is odd, but what if n > 0 is even? Then by induction we compute 2^(n÷2) × 2^(n÷2) × a by two recursive calls to skinny_fast_exp. This yields the desired result, but what is the running time? Here's a recurrence to describe its running time as a function of n:

T(0) = 1
T(2n) = O(1) + 2T(n)
T(2n+1) = O(1) + T(2n) = O(1) + 2T(n)

Here again we have a standard recurrence, whose solution is

T(n) = O(n lg n)

Yuck! Can we do better? The key is to recall the following important fact:

2^(2n) = (2^2)^n = 4^n

We can achieve a logarithmic time bound, in constant space, by a change of base. Here's the specification:

if n ≥ 0, then gen_skinny_fast_exp (b, n, a) evaluates to b^n × a.
Here's the code:

fun gen_skinny_fast_exp (b, 0, a) = a
  | gen_skinny_fast_exp (b, n, a) =
    if n mod 2 = 0 then
      gen_skinny_fast_exp (b*b, n div 2, a)
    else
      gen_skinny_fast_exp (b, n-1, b*a)

Let's check its correctness by complete induction on n. The base case is obvious. Assume the specification for arguments smaller than n > 0. If n is even, then by induction the result is (b × b)^(n÷2) × a = b^n × a, and if n is odd, we obtain inductively the result b^(n−1) × b × a = b^n × a. This completes the proof.

The trick to achieving an efficient implementation of the exponential function was to compute a more general function that can be implemented using less time and space. Since this is a trick employed by the implementor of the exponential function, it is important to insulate the client from it. This is easily achieved by using a local declaration to "hide" the generalized function, making available to the caller a function satisfying the original specification. Here's what the code looks like in this case:

local
  fun gen_skinny_fast_exp (b, 0, a) = ...
    | gen_skinny_fast_exp (b, n, a) = ...
in
  fun exp n = gen_skinny_fast_exp (2, n, 1)
end

(The elided code is the same as above.)

The point here is to see how induction and recursion go hand-in-hand, and how we used induction not only to verify programs after-the-fact, but, more importantly, to help discover the program in the first place. If the verification is performed simultaneously with the coding, it is far more likely that the proof will go through and the program will work the first time you run it.

It is important to notice the correspondence between strengthening the specification by adding additional assumptions (and guarantees) and adding accumulator arguments. What we observe is the apparent paradox that it is often easier to do something (superficially) harder! In terms of proving, it is often easier to push through an inductive argument for a stronger specification, precisely because we get to assume the result as the inductive hypothesis when arguing the inductive step(s). We are limited only by the requirement that the specification be proved outright at the base case(s). In terms of programming, it is often easier to compute a more complicated function involving accumulator arguments, precisely because we get to exploit the accumulator when making recursive calls. We are limited only by the requirement that the result be defined outright for the base case(s).

25.2 The GCD Algorithm

Let's consider a more complicated example, the computation of the greatest common divisor of a pair of non-negative integers. Recall that m is a divisor of n, written m|n, iff n is a multiple of m, which is to say that there is some k ≥ 0 such that n = k × m. The greatest common divisor of non-negative integers m and n is the largest p such that p|m and p|n. (By convention the g.c.d. of 0 and 0 is taken to be 0.) Here's the specification of the gcd function:

if m, n ≥ 0, then gcd (m, n) evaluates to the g.c.d. of m and n.

Euclid's algorithm for computing the g.c.d. of m and n is defined by complete induction on the product m × n. Here's the algorithm, written in ML:

fun gcd (m:int, 0):int = m
  | gcd (0, n:int):int = n
  | gcd (m:int, n:int):int =
    if m > n then gcd (m mod n, n) else gcd (m, n mod m)

Why is this algorithm correct? We may prove that gcd satisfies the specification by complete induction on the product m × n. If m × n is zero, then either m or n is zero, in which case the answer is, correctly, the other number. Otherwise the product is positive, and we proceed according to whether m > n or m ≤ n. Suppose that m > n. Observe that m mod n = m − (m ÷ n) × n, so that (m mod n) × n = m × n − (m ÷ n) × n^2 < m × n, so that by induction we return the g.c.d. of m mod n and n. It remains to show that this is the g.c.d. of m and n. If d divides both m mod n and n, then k × d = m mod n = m − (m ÷ n) × n and l × d = n for some non-negative k
and l. Consequently, k × d = m − (m ÷ n) × l × d, so m = (k + (m ÷ n) × l) × d, which is to say that d divides m. Now if d' is any other divisor of m and n, then it is also a divisor of m mod n and n, so d ≥ d'. That is, d is the g.c.d. of m and n. The other case, m ≤ n, follows similarly. This completes the proof.

At this point you may well be thinking that all this inductive reasoning is surely helpful, but it's no replacement for good old-fashioned "bulletproofing" — conditional tests inserted at critical junctures to ensure that key invariants do indeed hold at execution time. Sure, you may be thinking, these checks have a runtime cost, but they can be turned off once the code is in production, and anyway the cost is minimal compared to, say, the time required to read and write from disk. It's hard to complain about this attitude, provided that sufficiently cheap checks can be put into place and provided that you know where to put them to maximize their effectiveness. For example, barring compiler bugs, there's no use checking i > 0 at the start of the then clause of a test for i > 0; it can't possibly be anything other than the case at that point in the program. Or it may be possible to insert a check whose computation is more expensive (or more complicated) than the one we're trying to perform, in which case we're defeating the purpose by including it! This raises the question of where we should put such checks, and what checks should be included to help ensure the correct operation (or, at least, graceful malfunction) of our programs. This is an instance of the general problem of writing self-checking programs. We'll illustrate the idea by elaborating on the g.c.d. example a bit further.

Suppose we wish to write a self-checking g.c.d. algorithm that computes the g.c.d., and then checks the result to ensure that it really is the greatest common divisor of the two given non-negative integers before returning it as result. The code might look something like this:

exception GCD_ERROR

fun checked_gcd (m, n) =
  let
    val d = gcd (m, n)
  in
    if m mod d = 0 andalso n mod d = 0 andalso ???
      then d
      else raise GCD_ERROR
  end
it has to be emphasized. but also the coefﬁcients a and b such that d = a × m + b × n. 0) = (m. 0.c. refuting the maximality of d.c. and d = a × m + b × n for some integers (possibly negative!) a and b.d. of m and n. relies on the kind of mathematical reasoning we’ve been considering right along. 2002 . n mod m) in (d. which. but how do we check that it’s the greatest common divisor? Well. a .224 raise GCD ERROR end Induction and Recursion It’s obviously no problem to check that putative g. 0)  ggcd (m. d of m and n. But there’s a better way.. of m and n iff d divides both m and n and can be written as a linear combination of m and n. one choice is to simply try all numbers between d and the smaller of m and n to ensure that no intervening number is also a divisor. n) = (n.d. a.c. n = l × d for some l ≥ 0. We’ll prove this constructively by giving a program to compute not only the g. b) end W ORKING D RAFT D ECEMBER 9. But this is clearly so inefﬁcient as to be impractical. n) in (d.d. and d = a × m + b × n.d. b) = ggcd (m. a. d is the g.b*(n div m). a. d is the g. of mand n iff m = k × d for some k ≥ 0. Here’s an important fact: d is the g. n) = if m>n then let val (d.c. n ≥ 0. d. d divides n. consequently.a * (m div n)) end else let val (d. b) = ggcd (m mod n. then ggcd (m. b) such that d divides m. 1)  ggcd (m.d. b .c. n) evaluates to (d. And here’s the code to compute it: fun ggcd (0. is in fact a common divisor of mand n. That is. a. 1. Here’s the speciﬁcation: if m.
of mmodn and n. then either m or n is 0. a.25. This illustrates the power of the interplay between mathematical reasoning methods such as induction and number theory and programming methods such as bulletprooﬁng to achieve robust. n) = let val (d.d. and b such that d is the g. n) in if m mod d = 0 andalso n mod d = 0 andalso d = a*m+b*n then d else raise GCD ERROR end This algorithm takes no more time (asymptotically) than the original. 2002 .2 The GCD Algorithm 225 We may easily check that this code satisﬁes the speciﬁcation by induction on the product m × n. moreover. If m × n = 0.c. and show it for m × n > 0. of m and n. Otherwise assume the result for smaller products. ensures that the result is correct. from which the result follows.c. and hence is the g. reliable.d. Now we can write a selfchecking g. Inductively we obtain d. and d = a × (mmodn) + b × n. it follows that d = a × m + (b − a × (m ÷ n)) × n. Suppose m > n. elegant programs.c. W ORKING D RAFT D ECEMBER 9. what is more important. and. as follows: exception GCD ERROR fun checked gcd (m.d. and. a. b) = ggcd (m. in which case the result follows immediately. the other case is handled analogously. Since mmodn = m − (m ÷ n) × n.
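To make the specification concrete, here is a small worked example; this is a sketch, and it assumes the ggcd and checked_gcd functions defined above.

```sml
(* Assumes ggcd and checked_gcd as defined in the text. *)
val (d, a, b) = ggcd (12, 8)
(* Unfolding the recursion: ggcd (12, 8) calls ggcd (4, 8),
   which calls ggcd (4, 0) = (4, 1, 0); propagating the
   coefficient adjustments back up yields (4, 1, ~1),
   and indeed 1 * 12 + ~1 * 8 = 4. *)

val g = checked_gcd (12, 8)
(* The certificate check succeeds: 12 mod 4 = 0, 8 mod 4 = 0,
   and 4 = 1*12 + ~1*8, so checked_gcd returns 4. *)
```

Note that checking the certificate d = a × m + b × n takes constant time, in contrast to searching for an intervening divisor.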
Chapter 26

Structural Induction

The importance of induction and recursion is not limited to functions defined over the integers. Rather, the familiar concept of mathematical induction over the natural numbers is an instance of the more general notion of structural induction over values of an inductively-defined type. Rather than develop a general treatment of inductively-defined types, we will rely on a few examples to illustrate the point.

26.1 Natural Numbers

Let's begin by considering the natural numbers as an inductively defined type. The set of natural numbers, N, may be thought of as the smallest set containing 0 and closed under the formation of successors. In other words, n is an element of N iff either n = 0 or n = m + 1 for some m in N. Still another way of saying it is to define N by the following clauses:

1. 0 is an element of N.

2. If m is an element of N, then so is m + 1.

3. Nothing else is an element of N.

(The third clause is sometimes called the extremal clause; it ensures that we are talking about N and not just some superset of it.) All of these definitions are equivalent ways of saying the same thing.

Since N is inductively defined, we may prove properties of the natural numbers by structural induction, which in this case is just ordinary mathematical induction. Specifically, to prove that a property P(n) holds of every n in N, it suffices to demonstrate the following facts:

1. Show that P(0) holds.

2. Assuming that P(m) holds, show that P(m + 1) holds.

The pattern of reasoning follows exactly the structure of the inductive definition — the base case is handled outright, and the inductive step is handled by assuming the property for the predecessor and showing that it holds for the successor.

The principle of structural induction also licenses the definition of functions by structural recursion. To define a function f with domain N, it suffices to proceed as follows:

1. Give the value of f(0).

2. Give the value of f(m + 1) in terms of the value of f(m).

Given this information, there is a unique function f with domain N satisfying these requirements. Specifically, we may show by induction on n ≥ 0 that the value of f is uniquely determined on all values m ≤ n. If n = 0, this is obvious since f(0) is defined by the first clause. If n = m + 1, then by induction the value of f is determined for all values k ≤ m. But the value of f at n is determined as a function of f(m), and hence is uniquely determined. Thus f is uniquely determined for all values of n in N, as was to be shown.

The natural numbers, viewed as an inductively-defined type, may be represented in ML using a datatype declaration, as follows:

    datatype nat = Zero | Succ of nat

The constructors correspond one-for-one with the clauses of the inductive definition. The extremal clause is implicit in the datatype declaration since the given constructors are assumed to be all the ways of building values of type nat. This assumption forms the basis for exhaustiveness checking for clausal function definitions.

(You may object that this definition of the type nat amounts to a unary (base 1) representation of natural numbers, an unnatural and space-wasting representation. This is indeed true; in practice the natural numbers are represented as non-negative machine integers to avoid excessive overhead. Note, however, that this representation places a fixed upper bound on the size of numbers, whereas the unary representation does not. Hybrid representations that enjoy the benefits of both are, of course, possible and occasionally used when enormous numbers are required.)

Functions defined by structural recursion are naturally represented by clausal function definitions, as in the following example:
    fun double Zero = Zero
      | double (Succ n) = Succ (Succ (double n))

    fun exp Zero = Succ (Zero)
      | exp (Succ n) = double (exp n)

The type checker ensures that we have covered all cases, but it does not ensure that the pattern of structural recursion is strictly followed — we may accidentally define f(m + 1) in terms of itself or some f(k) where k > m, breaking the pattern. The reason this is admitted is that the ML compiler cannot always follow our reasoning: we may have a clever algorithm in mind that isn't easily expressed by a simple structural induction. To avoid restricting the programmer, the language assumes the best and allows any form of definition.

Using the principle of structural induction for the natural numbers, we may prove properties of functions defined over the naturals. For example, we may easily prove by structural induction over the type nat that for every n ∈ N, exp n evaluates to a positive number. (In previous chapters we carried out proofs of more interesting program properties.)

26.2 Lists

Generalizing a bit, we may think of the type 'a list as inductively defined by the following clauses:

1. nil is a value of type 'a list.

2. If h is a value of type 'a, and t is a value of type 'a list, then h::t is a value of type 'a list.

3. Nothing else is a value of type 'a list.

This definition licenses the following principle of structural induction over lists. To prove that a property P holds of all lists l, it suffices to proceed as follows:

1. Show that P holds for nil.

2. Show that P holds for h::t, assuming that P holds for t.

Similarly, we may define functions by structural recursion over lists as follows:

1. Define the function for nil.

2. Define the function for h::t in terms of its value for t.

The principle of structural recursion may be applied to define the reverse function as follows:

    fun reverse nil = nil
      | reverse (h::t) = reverse t @ [h]

There is one clause for each constructor, and the value of reverse for h::t is defined in terms of its value for t. Using the principle of structural induction over lists, we may prove that reverse l evaluates to the reversal of l. First, we show that reverse nil yields nil, as indeed it does and ought to. Second, we assume that reverse t yields the reversal of t, and argue that reverse (h::t) yields the reversal of h::t, as indeed it does since it returns reverse t @ [h]. (We have ignored questions of time and space efficiency to avoid obscuring the induction principle underlying the definition of reverse.)

The clauses of the inductive definition of lists correspond to the following (built-in) datatype declaration in ML:

    datatype 'a list = nil | :: of 'a * 'a list

(We are neglecting the fact that :: is regarded as an infix operator.)

26.3 Trees

Generalizing even further, we can introduce new inductively-defined types such as 2-3 trees, in which interior nodes are either binary (have two children) or ternary (have three children). Here's a definition of 2-3 trees in ML:

    datatype 'a twth_tree =
        Empty
      | Bin of 'a * 'a twth_tree * 'a twth_tree
      | Ter of 'a * 'a twth_tree * 'a twth_tree * 'a twth_tree

How might one define the "size" of a value of this type? Your first thought should be to write down a template like the following:

    fun size Empty = ???
      | size (Bin (_, t1, t2)) = ???
      | size (Ter (_, t1, t2, t3)) = ???
We have one clause per constructor, and will fill in the elided expressions to complete the definition. In many cases (including this one) the function is defined by structural recursion. Here's the complete definition:

    fun size Empty = 0
      | size (Bin (_, t1, t2)) = 1 + size t1 + size t2
      | size (Ter (_, t1, t2, t3)) = 1 + size t1 + size t2 + size t3

Obviously this function computes the number of nodes in the tree, as you can readily verify by structural induction over the type 'a twth_tree.

26.4 Generalizations and Limitations

Does this pattern apply to every datatype declaration? Yes and no. No matter what the form of the declaration, it always makes sense to define a function over it by a clausal function definition with one clause per constructor. Such a definition is guaranteed to be exhaustive (cover all cases), and serves as a valuable guide to structuring your code. (It is especially valuable if you change the datatype declaration, because then the compiler will inform you of what clauses need to be added or removed from functions defined over that type in order to restore it to a sensible definition.) The slogan is:

    To define functions over a datatype, use a clausal definition
    with one clause per constructor.

The catch is that not every datatype declaration supports a principle of structural induction, because it is not always clear what constitutes the predecessor(s) of a constructed value. While in most cases datatypes are naturally viewed as inductively defined, the declaration

    datatype D = Int of int | Fun of D -> D

is problematic, because a value of the form Fun(f) is not constructed directly from another value of type D, and hence it is not clear what to regard as its predecessor. In practice this sort of definition comes up only rarely.
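Following the slogan, other functions on 2-3 trees take the same shape: one clause per constructor. As a further illustration (a sketch, not from the text; it assumes the 'a twth_tree declaration above, and uses Int.max from the Standard ML Basis Library), a height function might be written:

```sml
(* One clause per constructor of 'a twth_tree, defined by
   structural recursion on the subtrees. *)
fun height Empty = 0
  | height (Bin (_, t1, t2)) =
        1 + Int.max (height t1, height t2)
  | height (Ter (_, t1, t2, t3)) =
        1 + Int.max (height t1, Int.max (height t2, height t3))
```

As with size, its correctness can be verified by structural induction over 'a twth_tree.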
26.5 Abstracting Induction

It is interesting to observe that the pattern of structural recursion may be directly codified in ML as a higher-order function. Specifically, we may associate with each inductively-defined type a higher-order function that takes as arguments values that determine the base case(s) and step case(s) of the definition, and defines a function by structural induction based on these arguments. An example will illustrate the point.

The pattern of structural induction over the type nat may be codified by the following function:

    fun nat_rec base step =
        let
            fun loop Zero = base
              | loop (Succ n) = step (loop n)
        in
            loop
        end

This function has the type

    'a -> ('a -> 'a) -> nat -> 'a

Given the first two arguments, nat_rec yields a function of type nat -> 'a whose behavior is determined at the base case by the first argument and at the inductive step by the second. Here's an example of the use of nat_rec to define the exponential function:

    val double = nat_rec Zero (fn result => Succ (Succ result))
    val exp = nat_rec (Succ Zero) double

Note well the pattern! The arguments to nat_rec are

1. The value for Zero.

2. The value for Succ n, defined in terms of its value for n.

Similarly, the pattern of list recursion may be captured by the following functional:

    fun list_recursion base step =
        let
            fun loop nil = base
              | loop (h::t) = step (h, loop t)
        in
            loop
        end

The type of the function list_recursion is

    'a -> ('b * 'a -> 'a) -> 'b list -> 'a

It may be instantiated to define the reverse function as follows:

    val reverse = list_recursion nil (fn (h, t) => t @ [h])

Finally, the principle of structural recursion for values of type 'a twth_tree is given as follows:

    fun twth_rec base bin_step ter_step =
        let
            fun loop Empty = base
              | loop (Bin (v, t1, t2)) =
                    bin_step (v, loop t1, loop t2)
              | loop (Ter (v, t1, t2, t3)) =
                    ter_step (v, loop t1, loop t2, loop t3)
        in
            loop
        end

Notice that we have two inductive steps, one for each form of node. The type of twth_rec is

    'a -> ('b * 'a * 'a -> 'a) -> ('b * 'a * 'a * 'a -> 'a) -> 'b twth_tree -> 'a

We may instantiate it to define the function size as follows:

    val size =
        twth_rec 0
            (fn (_, s1, s2) => 1 + s1 + s2)
            (fn (_, s1, s2, s3) => 1 + s1 + s2 + s3)

Summarizing, the principle of structural induction over a recursive datatype is naturally codified in ML using pattern matching and higher-order functions. Whenever you're programming with a datatype, you should use the techniques outlined in this chapter to structure your code.
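Other familiar list functions arise as instances of the same functional. For example (a sketch, assuming list_recursion as defined above):

```sml
(* length: base 0; the step ignores the head and increments
   the recursive result. *)
val length = list_recursion 0 (fn (_, n) => n + 1)

(* map f: base nil; the step conses f h onto the recursive
   result for the tail. *)
fun map f = list_recursion nil (fn (h, t) => f h :: t)
```

In each case the two arguments are exactly the two clauses of a structural recursion: the value for nil, and the value for h::t in terms of its value for t.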
Chapter 27

Proof-Directed Debugging

In this chapter we'll put specification and verification techniques to work in devising a regular expression matcher. The code is similar to that sketched in chapter 1, but we will use verification techniques to detect and correct a subtle error that may not be immediately apparent from inspecting or even testing the code. We call this process proof-directed debugging.

The first task is to devise a precise specification of the regular expression matcher. This is a difficult problem in itself. We then attempt to verify that the matching program developed in chapter 1 satisfies this specification. The proof attempt breaks down. Careful examination of the failure reveals a counterexample to the specification — the program does not satisfy it. We then consider how best to resolve the problem, not by change of implementation, but instead by change of specification.

27.1 Regular Expressions and Languages

Before we begin work on the matcher, let us first define the set of regular expressions and their meaning as a set of strings. The set of regular expressions is given by the following grammar:

    r ::= 0 | 1 | a | r1 r2 | r1 + r2 | r*

Here a ranges over a given alphabet, a set of primitive "letters" that may be used in a regular expression.

A string is a finite sequence of letters of the alphabet. The length of a string is the number of letters in it. We write ε for the null string, the empty sequence of letters. We write s1 s2 for the concatenation of the strings s1 and s2, the string s consisting of the letters in s1 followed by those in s2. We do not distinguish between a character and the unit-length string consisting solely of that character. Thus we write a s for the extension of s with the letter a at the front.

A language is a set of strings. Every regular expression r stands for a particular language L(r), the language of r, which is defined by induction on the structure of r as follows:

    L(0)       = 0
    L(1)       = 1
    L(a)       = {a}
    L(r1 r2)   = L(r1) L(r2)
    L(r1 + r2) = L(r1) + L(r2)
    L(r*)      = L(r)*

This definition employs the following operations on languages:

    0        = ∅
    1        = {ε}
    L1 + L2  = L1 ∪ L2
    L1 L2    = { s1 s2 | s1 ∈ L1, s2 ∈ L2 }
    L^(0)    = 1
    L^(i+1)  = L L^(i)
    L*       = ⋃_{i≥0} L^(i)

An important fact about L* is that it is the smallest language L' such that 1 + L L' ⊆ L'. Spelled out, this means two things:

1. 1 + L L* ⊆ L*, which is to say that (a) ε ∈ L*, and (b) if s ∈ L and s' ∈ L*, then s s' ∈ L*.

2. If 1 + L L' ⊆ L', then L* ⊆ L'.

This means that L* is the smallest language (with respect to language containment) that contains the null string and is closed under concatenation on the left by L. Let's prove that this is the case.

First, we check that 1 + L L* ⊆ L*. Since L^(0) = 1, it follows immediately that ε ∈ L*. Furthermore, if s ∈ L and s' ∈ L*, then s' ∈ L^(i) for some i ≥ 0, and hence s s' ∈ L^(i+1) by definition of concatenation of languages, from which the result follows immediately. This completes the first step.

Now suppose that L' is such that 1 + L L' ⊆ L'. We are to show that L* ⊆ L'. We show by induction on i ≥ 0 that L^(i) ⊆ L'. If i = 0, then it suffices to show that ε ∈ L'. But this follows from the assumption that 1 + L L' ⊆ L', which implies that 1 ⊆ L'. To show that L^(i+1) ⊆ L', we observe that, by definition, L^(i+1) = L L^(i). By induction L^(i) ⊆ L', and hence L L^(i) ⊆ L L' ⊆ L', since L L' ⊆ L' by assumption.

Having proved that L* is the smallest language L' such that 1 + L L' ⊆ L', it is not hard to prove that L* satisfies the recurrence L* = 1 + L L*. We just proved the right to left containment. For the converse, it suffices to observe that 1 + L (1 + L L*) ⊆ 1 + L L*, for then the result follows by minimality of L*. This is easily established by a simple case analysis.

Exercise 1 Give a full proof of the fact that L* = 1 + L L*.

27.2 Specifying the Matcher

Let us begin by devising a specification for the regular expression matcher. First, a word about implementation. We will assume in what follows that the alphabet is given by the type char of characters, that strings are elements of the type string, and that regular expressions are defined by the following datatype declaration:

    datatype regexp =
        Zero
      | One
      | Char of char
      | Plus of regexp * regexp
      | Times of regexp * regexp
      | Star of regexp

We will also work with lists of characters, values of type char list, using ML notation for primitive operations on lists such as concatenation and extension. Occasionally we will abuse notation and not distinguish (in the informal discussion) between a string and a list of characters. In particular we will speak of a character list as being a member of a language, when in fact we mean that the corresponding string is a member of that language.

As a first cut we write down a type specification. We seek to define a function match of type

    regexp -> string -> bool

that determines whether or not a given string matches a given regular expression. More precisely, we wish to satisfy the following specification:

For every regular expression r and every string s, match r s terminates, and evaluates to true iff s ∈ L(r).
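To fix intuitions, here is how a matcher meeting this specification should behave on a few inputs; this is a sketch assuming the regexp constructors above and a match function satisfying the specification.

```sml
(* The regular expression a b*, whose language is
   { "a", "ab", "abb", ... }. *)
val r = Times (Char #"a", Star (Char #"b"))

val t1 = match r "a"      (* true:  "a" is in L(a b*)    *)
val t2 = match r "abbb"   (* true:  "abbb" is in L(a b*) *)
val t3 = match r "ba"     (* false: "ba" is not          *)
```

Any candidate implementation can be spot-checked against such cases, though of course testing alone cannot establish the specification.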
We saw in chapter 1 that a natural way to define the procedure match is to use a technique called continuation passing. We defined an auxiliary function match_is with the type

    regexp -> char list -> (char list -> bool) -> bool

that takes a regular expression, a list of characters (essentially a string, but in a form suitable for incremental processing), and a continuation, and yields a boolean. The idea is that match_is takes a regular expression r, a character list cs, and a continuation k, and determines whether or not some initial segment of cs matches r, passing the remaining characters cs'' to k in the case that there is such an initial segment, and yields false otherwise. Put more precisely,

For every regular expression r, character list cs, and continuation k, if cs = cs' @ cs'' with cs' ∈ L(r) and k cs'' evaluates to true, then match_is r cs k evaluates to true; otherwise, match_is r cs k evaluates to false.

Unfortunately, this specification is too strong to ever be satisfied by any program! Can you see why? The difficulty is that if k is not guaranteed to terminate for all inputs, then there is no way that match_is can behave as required. For example, if there is no input on which k terminates, the specification requires that match_is r cs k evaluate to false; it should be intuitively clear that we can never implement such a function. Instead, we must restrict attention to total continuations, those that always terminate with true or false on any input. This leads to the following revised specification:

For every regular expression r, character list cs, and total continuation k, if cs = cs' @ cs'' with cs' ∈ L(r) and k cs'' evaluates to true, then match_is r cs k evaluates to true; otherwise, match_is r cs k evaluates to false.

Observe that this specification makes use of an implicit existential quantification. Written out in full, we might say "For all r, cs, and k, if there exist cs' and cs'' such that cs = cs' @ cs'' with cs' ∈ L(r) and k cs'' evaluates to true, then ...". There may, in general, be many ways to partition the input so as to satisfy both of these requirements; we need only find one such way. Note, however, that if cs = cs' @ cs'' with cs' ∈ L(r) but k cs'' yielding false, we must reject this partitioning and search for another. In other words we cannot simply accept any partitioning whose initial segment matches r, but rather only those that also induce k to accept its corresponding final segment. We may return false only if there is no such splitting, not merely if a particular splitting fails to work. This observation makes clear that we must search for a suitable splitting of cs into two parts such that the first part is in L(r) and the second is accepted by k.

Suppose for the moment that match_is satisfies this specification. Does it follow that match satisfies the original specification? Recall that the function match is defined as follows:

    fun match r s =
        match_is r (String.explode s) (fn nil => true | _ => false)

Notice that the initial continuation is indeed total, and that it yields true (accepts) iff it is applied to the null string. Therefore match satisfies the following property obtained from the specification of match_is by plugging in the initial continuation:

For every regular expression r and string s, if s ∈ L(r), then match r s evaluates to true, and otherwise match r s evaluates to false.

This is precisely the property that we desire for match. Thus match is correct (satisfies its specification) if match_is is correct.

So far so good. But does match_is satisfy its specification? If so, we are done. How might we check this? Recall the definition of match_is given in the overview:

    fun match_is Zero cs k = false
      | match_is One cs k = k cs
      | match_is (Char c) nil k = false
      | match_is (Char c) (d::cs) k = if c=d then k cs else false
      | match_is (Times (r1, r2)) cs k =
            match_is r1 cs (fn cs' => match_is r2 cs' k)
      | match_is (Plus (r1, r2)) cs k =
            match_is r1 cs k orelse match_is r2 cs k
      | match_is (Star r) cs k =
            k cs orelse
            match_is r cs (fn cs' => match_is (Star r) cs' k)

Since match_is is defined by a recursive analysis of the regular expression r, it is natural to proceed by induction on the structure of r. That is, we treat the specification as a conjecture about match_is, and attempt to prove it by structural induction on r.

We first consider the three base cases. Suppose that r is 0. Then no string is in L(r), so match_is must return false, which indeed it does. Suppose that r is 1. Since the null string is an initial segment of every string, and the null string is in L(1), we must yield true iff k cs yields true. This is precisely how match_is is defined. Suppose that r is a. Then to succeed cs must have the form a cs' with k cs' evaluating to true; otherwise we must fail. The code for match_is checks that cs has the required form and, if so, passes cs' to k to determine the outcome, and otherwise yields false. Thus match_is behaves correctly for each of the three base cases.

We now consider the three inductive steps. For r = r1 + r2, we observe that some initial segment of cs matches r and causes k to accept the corresponding final segment of cs iff either some initial segment matches r1 and drives k to accept the rest, or some initial segment matches r2 and drives k to accept the rest. By induction match_is behaves according to the specification if it is applied to either r1 or r2, which is sufficient to justify the correctness of match_is for r = r1 + r2.

For r = r1 r2, the proof is slightly more complicated. By induction match_is works as specified for r1 and r2, provided that the continuation argument is total. Note that the continuation given by fn cs' => match_is r2 cs' k is total, since by induction the inner recursive call to match_is always terminates. Suppose that there exists a partitioning cs = cs' @ cs'' with cs' ∈ L(r) and k cs'' evaluating to true. Then cs' = cs1 cs2 with cs1 ∈ L(r1) and cs2 ∈ L(r2), by definition of L(r1 r2). Consequently, match_is r2 (cs2 cs'') k evaluates to true, and hence the outer call to match_is on r1 evaluates to true, as required. If, however, no such partitioning exists, then one of three situations occurs:

1. No initial segment of cs matches r1, in which case the outer recursive call yields false, as required.

2. For every initial segment matching r1, no initial segment of the corresponding final segment matches r2, in which case the inner recursive call yields false on every call, and hence the outer call yields false, as required.

3. Every pair of successive initial segments of cs matching r1 and r2 successively results in k evaluating to false, in which case the inner recursive call always yields false, and hence the outer recursive call yields false, as required.
We seem to be on track, with one more case to consider, r = r1*. This case would appear to be a combination of the preceding two cases for alternation and concatenation, based on the idea that if some initial segment of cs matches L(r), then either that initial segment is the null string (base case), or cs = cs' @ cs'' with cs' ∈ L(r1) and cs'' ∈ L(r) (induction step). But there is a snag: the second recursive call to match_is leaves the regular expression unchanged! Consequently we cannot apply the inductive hypothesis to establish that it behaves correctly in this case, and the obvious proof attempt breaks down.

What to do? A moment's thought suggests that we proceed by an inner induction on the length of the string: we handle the base case directly, and handle the inductive case by assuming that match_is behaves correctly for cs'' and showing that it behaves correctly for cs. But there is a flaw in this argument — the string cs'' need not be shorter than cs in the case that cs' is the null string! In that case the inductive hypothesis does not apply, and we are once again unable to complete the proof.

This time we can use the failure of the proof to obtain a counterexample to the specification! For if r = 1*, and the continuation k always yields false, then match_is r cs k does not terminate. In general if r = r1* with ε ∈ L(r1), then match_is r cs k fails to terminate. Our conjecture is false! Our failure to establish that match_is satisfies its specification led to a counterexample that refuted our conjecture and uncovered a genuine bug in the program — the matcher may not terminate for some inputs. In other words, match_is does not satisfy the specification we have given for it.

What to do? One approach is to explicitly check for looping behavior during matching by ensuring that each recursive call matches some nonempty initial segment of the string. This will work, but at the expense of cluttering the code and imposing additional runtime overhead. You should write out a version of the matcher that works this way, and check that it indeed satisfies the specification we've given above. Be sure you understand the reasoning involved here; it is quite tricky to get right!

An alternative is to observe that the proof goes through under the additional assumption that no iterated regular expression matches the null string, with a similar argument sufficing to establish correctness. Call a regular expression r standard iff whenever r1* occurs within
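For concreteness, here is one way the explicit loop-avoidance check might be coded. This is a sketch, not the book's solution; it assumes the regexp datatype from the text, and it differs from match_is only in the clause for Star, which re-enters the loop only when the body consumed at least one character.

```sml
(* A loop-avoiding variant of match_is (a sketch).  The Star
   clause iterates only if cs' is strictly shorter than cs,
   i.e., some nonempty initial segment was matched. *)
fun match_is' Zero cs k = false
  | match_is' One cs k = k cs
  | match_is' (Char c) nil k = false
  | match_is' (Char c) (d::cs) k = if c = d then k cs else false
  | match_is' (Times (r1, r2)) cs k =
        match_is' r1 cs (fn cs' => match_is' r2 cs' k)
  | match_is' (Plus (r1, r2)) cs k =
        match_is' r1 cs k orelse match_is' r2 cs k
  | match_is' (Star r) cs k =
        k cs orelse
        match_is' r cs
            (fn cs' =>
                List.length cs' < List.length cs andalso
                match_is' (Star r) cs' k)

fun match' r s =
    match_is' r (String.explode s) (fn nil => true | _ => false)
```

With this variant, match' (Star One) "" returns true rather than looping, since an iteration that consumes no input is cut off by the length test — at the cost of a length computation on each iteration, which is the runtime overhead alluded to above.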
r, the null string is not an element of L(r1). It is easy to check that the proof given above goes through under the assumption that the regular expression r is standard. This says that the matcher works correctly for standard regular expressions. But what about the nonstandard ones? The key observation is that every regular expression is equivalent to one in standard form. By "equivalent" we mean "accepting the same language". For example, the regular expressions r + 0 and r are easily seen to be equivalent. Using this observation we may avoid the need to consider nonstandard regular expressions. Instead we can preprocess the regular expression to put it into standard form, then call the matcher on the standardized regular expression.

The required preprocessing is based on the following definitions. We will associate with each regular expression r two standard regular expressions δ(r) and r⁻ with the following properties:

1. L(δ(r)) = 1 iff ε ∈ L(r), and L(δ(r)) = 0 otherwise.

2. L(r⁻) = L(r) \ 1; that is, the null string is not an element of L(r⁻).

With these equations in mind, we see that every regular expression r may be written in the form δ(r) + r⁻, which is in standard form.

The function δ mapping regular expressions to regular expressions is defined by induction on regular expressions by the following equations:

    δ(0)        = 0
    δ(1)        = 1
    δ(a)        = 0
    δ(r1 + r2)  = δ(r1) ⊕ δ(r2)
    δ(r1 r2)    = δ(r1) ⊗ δ(r2)
    δ(r*)       = 1

Here we define 0 ⊕ 1 = 1 ⊕ 0 = 1 ⊕ 1 = 1 and 0 ⊕ 0 = 0, and 0 ⊗ 1 = 1 ⊗ 0 = 0 ⊗ 0 = 0 and 1 ⊗ 1 = 1.

Exercise 2 Show that L(δ(r)) = 1 iff ε ∈ L(r).
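The function δ transcribes directly into ML. Since δ(r) is always 0 or 1, it can be represented by a regexp that is either Zero or One; the following is a sketch, assuming the regexp datatype from the text.

```sml
(* delta r is One if the null string is in L(r), Zero otherwise. *)
fun delta Zero = Zero
  | delta One = One
  | delta (Char _) = Zero
  | delta (Plus (r1, r2)) =
        (case (delta r1, delta r2) of
             (Zero, Zero) => Zero
           | _ => One)
  | delta (Times (r1, r2)) =
        (case (delta r1, delta r2) of
             (One, One) => One
           | _ => Zero)
  | delta (Star _) = One
```

The two case analyses implement the tables for ⊕ and ⊗ given above.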
The definition of r⁻ is given by induction on the structure of r by the following equations:

    0⁻          = 0
    1⁻          = 0
    a⁻          = a
    (r1 + r2)⁻  = r1⁻ + r2⁻
    (r1 r2)⁻    = δ(r1) r2⁻ + r1⁻ δ(r2) + r1⁻ r2⁻
    (r*)⁻       = r⁻ (r⁻)*

The only tricky case is the one for concatenation, which must take account of the possibility that r1 or r2 accepts the null string.

Exercise 3 Show that L(r⁻) = L(r) \ 1.
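The definition of r⁻ likewise transcribes clause by clause; the following is a sketch, assuming the regexp datatype from the text and a function delta implementing δ. (Simplifying away Zero and One subterms of the results would keep them small, but is omitted here.)

```sml
(* nonnull r denotes L(r) \ 1: r with the null string removed. *)
fun nonnull Zero = Zero
  | nonnull One = Zero
  | nonnull (Char c) = Char c
  | nonnull (Plus (r1, r2)) = Plus (nonnull r1, nonnull r2)
  | nonnull (Times (r1, r2)) =
        Plus (Times (delta r1, nonnull r2),
              Plus (Times (nonnull r1, delta r2),
                    Times (nonnull r1, nonnull r2)))
  | nonnull (Star r) = Times (nonnull r, Star (nonnull r))

(* standardize r is equivalent to r and in standard form:
   every starred subexpression rejects the null string. *)
fun standardize r = Plus (delta r, nonnull r)
```

Calling the matcher on standardize r rather than r then puts us in the situation covered by the proof for standard regular expressions.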
Chapter 28

Persistent and Ephemeral Data Structures

This chapter is concerned with persistent and ephemeral abstract types. The distinction is best explained in terms of the logical future of a value. Whenever a value of an abstract type is created it may be subsequently acted upon by the operations of the type (and, since the type is abstract, by no other operations). Each of these operations may yield (other) values of that abstract type, which may themselves be handed off to further operations of the type. Ultimately a value of some other type, say a string or an integer, is obtained as an observable outcome of the succession of operations on the abstract value. The sequence of operations performed on a value of an abstract type constitutes a logical future of that type — a computation that starts with that value and ends with a value of some observable type.

We say that a type is ephemeral iff every value of that type has at most one logical future, which is to say that it is handed off from one operation of the type to another until an observable value is obtained from it. This is the normal case in familiar imperative programming languages because in such languages the operations of an abstract type destructively modify the value upon which they operate; its original state is irretrievably lost by the performance of an operation. It is therefore inherent in the imperative programming model that a value have at most one logical future.

In contrast, values of an abstract type in functional languages such as ML may have many different logical futures, precisely because the operations do not "destroy" the value upon which they operate, but rather create fresh values of that type to yield as results. Such values are said to be persistent because they persist after application of an operation of the type, and in fact may serve as arguments to further operations of that type.

Some examples will help to clarify the distinction. The prototypical ephemeral data structure in ML is the reference cell. Performing an assignment operation on a reference cell changes it irrevocably: the original contents of the cell are lost, and even if we keep a handle on it, a reference to its original binding does not yield its original contents.

    val r = ref 0 (* original cell *)
    val s = r
    val _ = (s := 1)
    val x = !r (* 1! *)

Notice that the contents of (the cell bound to) r changes as a result of performing an assignment to the underlying cell. There is only one future for this cell.

The primitive list types of ML are persistent because the performance of an operation such as cons'ing, appending, or reversing a list does not destroy the original list. This leads naturally to the idea of multiple logical futures for a given value, as illustrated by the following code sequence:

    (* original list *)
    val l = [1,2,3]
    (* first future of l *)
    val m1 = tl l
    val n1 = rev m1
    (* second future of l *)
    val m2 = l @ [4,5,6]

Notice that the original list value, [1,2,3], has two distinct logical futures: one in which we remove its head, then reverse the tail, and the other in which we append the list [4,5,6] to it. The ability to easily handle multiple logical futures for a data structure is a tremendous source of flexibility and expressive power, alleviating the need to perform tedious bookkeeping to manage "versions" or "copies" of a data structure to be passed to different operations.

More elaborate forms of ephemeral data structures are certainly possible. For example, the following declaration defines a type of lists whose tails are mutable. It is therefore a singly-linked list, one whose predecessor relation may be changed dynamically by assignment:

    datatype 'a mutable_list = Nil | Cons of 'a * 'a mutable_list ref

Values of this type are ephemeral in the sense that some operations on values of this type are destructive, and hence are irreversible (so to speak!). For example, here's an implementation of a destructive reversal of a mutable list. Given a mutable list l, this function reverses the links in the cell so that the elements occur in reverse order of their occurrence in l.

    local
      fun ipr (Nil, a) = a
        | ipr (this as (Cons (_, r as ref next)), a) =
            ipr (next, (r := a; this))
    in
      (* destructively reverse a list *)
      fun inplace_reverse l = ipr (l, Nil)
    end

As you can see, the code is quite tricky to understand! The idea is the same as the iterative reverse function for pure lists, except that, when moving elements onto the accumulator argument, we reuse the nodes of the original list, rather than generate new ones.

The distinction between ephemeral and persistent data structures is essentially the distinction between functional (effect-free) and imperative (effect-ful) programming — functional data structures are persistent, imperative data structures are ephemeral. However, this characterization is oversimplified in two respects. First, it is possible for a persistent data type to be used in such a way that persistence is not exploited — rather, every value of the type has at most one future in the program, reflecting the linear, as opposed to branching, structure of the future uses of values of that type. Such a type is said to be single-threaded. The significance of a single-threaded type is that it may as well have been implemented as an ephemeral data structure (e.g., by having observable effects on values) without changing the behavior of the program. Second, it is possible to implement a persistent data structure that exploits mutable storage. Such a use of mutation is an example of what is called a benign effect, because for all practical purposes the data structure is "purely functional" (i.e., persistent), but is in fact implemented using mutable storage. As we will see later, the exploitation of benign effects is crucial for building efficient implementations of persistent data structures.
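A short session (a sketch, not from the text) makes the ephemerality of mutable lists concrete. The mutable list type and destructive reversal are repeated here so the example stands alone, and a hypothetical helper to_list is added for inspection:

```sml
(* Sketch: the mutable list type and destructive reversal from the text,
   plus a session showing that reversing a mutable list uses up the one
   logical future of the original list. *)
datatype 'a mutable_list = Nil | Cons of 'a * 'a mutable_list ref

local
  fun ipr (Nil, a) = a
    | ipr (this as Cons (_, r as ref next), a) =
        ipr (next, (r := a; this))
in
  fun inplace_reverse l = ipr (l, Nil)
end

(* hypothetical helper: convert to an ordinary list for inspection *)
fun to_list Nil = nil
  | to_list (Cons (x, ref t)) = x :: to_list t

val l = Cons (1, ref (Cons (2, ref (Cons (3, ref Nil)))))
val r = inplace_reverse l
val _ = to_list r   (* [3,2,1] *)
val _ = to_list l   (* [1], since l's tail was redirected to Nil by the reversal *)
```

Contrast this with rev on a pure list, after which the original list remains intact and available for other futures.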
28.1 Persistent Queues

Here is a signature of persistent queues:

    signature QUEUE = sig
      type 'a queue
      exception Empty
      val empty : 'a queue
      val insert : 'a * 'a queue -> 'a queue
      val remove : 'a queue -> 'a * 'a queue
    end

This signature describes a structure providing a representation type for queues, together with operations to create an empty queue, to insert an element onto the back of the queue, and to remove an element from the front of the queue. It also provides an exception that is raised in response to an attempt to remove an element from the empty queue. Notice that removing an element from a queue yields both the element at the front of the queue, and the queue resulting from removing that element. This is a direct reflection of the persistence of queues implemented by this signature: the original queue remains available as an argument to further queue operations.

By a sequence of queue operations we shall mean a succession of uses of empty, insert, and remove operations in such a way that the queue argument of one operation is obtained as a result of the immediately preceding queue operation. Thus a sequence of queue operations represents a single-threaded timeline in the life of a queue value. Here is an example of a sequence of queue operations:

    val q0 : int queue = empty
    val q1 = insert (1, q0)
    val q2 = insert (2, q1)
    val (h1, q3) = remove q2   (* h1 = 1 *)
    val (h2, q4) = remove q3   (* h2 = 2, q4 = q0 *)

By contrast the following operations do not form a single thread, but rather a branching development of the queue's lifetime:

    val q0 : int queue = empty
    val q1 = insert (1, q0)
    val q2 = insert (2, q0)    (* NB: q0, not q1! *)
    val (h1, q3) = remove q1   (* h1 = 1, q3 = q0 *)
    val (h2, q4) = remove q3   (* raise Empty *)
    val (h2, q4) = remove q2   (* h2 = 2, q4 = q0 *)

In the remainder of this section we will be concerned with single-threaded sequences of queue operations.

How might we implement the signature QUEUE? The most obvious approach is to represent the queue as a list, with, say, the head element of the list representing the "back" (most recently enqueued element) of the queue. With this representation enqueueing is a constant-time operation, but dequeueing requires time proportional to the number of elements in the queue. Thus in the worst case a sequence of n enqueue and dequeue operations will take time O(n²), which is clearly excessive. We can make dequeue simpler, at the expense of enqueue, by regarding the head of the list as the "front" of the queue, but the time bound for n operations remains the same in the worst case.

Can we do better? A well-known "trick" achieves an O(n) worst-case performance for any sequence of n operations, which means that each operation takes O(1) steps if we amortize the cost over the entire sequence. Notice that this is a worst-case bound for the sequence, yielding an amortized bound for each operation of the sequence. This means that some operations may be relatively expensive, but, in compensation, many operations will be cheap.

How is this achieved? By combining the two naive solutions sketched above. The idea is to represent the queue by two lists, one for the back "half" consisting of recently inserted elements in the order of arrival, and one for the front "half" consisting of soon-to-be-removed elements in reverse order of arrival (i.e., in order of removal). We put "half" in quotes because we will not, in general, maintain an even split of elements between the front and the back lists. Rather, we will arrange things so that the following representation invariants hold true:

1. The elements of the queue listed in order of removal are the elements of the front followed by the elements of the back in reverse order.

2. The front is empty only if the back is empty.

These invariants are maintained by using a "smart constructor" that creates a queue from two lists representing the back and front parts of the queue. This constructor ensures that the representation invariant holds by ensuring that condition (2) is always true of the resulting queue. The constructor proceeds by a case analysis on the back and front parts of the queue. If the front list is non-empty, the resulting queue consists of the back and front parts as given. If the front is empty and the back is non-empty, the queue constructor yields the queue consisting of an empty back part and a front part equal to the reversal of the given back part. Observe that this is sufficient to ensure that the representation invariant holds of the resulting queue in all cases. Observe also that the smart constructor either runs in constant time, or in time proportional to the length of the back part, according to whether the front part is empty or not.

Insertion of an element into a queue is achieved by cons'ing the element onto the back of the queue, then calling the queue constructor to ensure that the result is in conformance with the representation invariant. Thus an insert can either take constant time, or time proportional to the size of the back of the queue, depending on whether the front part is empty.

Removal of an element from a queue requires a case analysis. If the front is empty, then by condition (2) the queue is empty, so we raise an exception. If the front is non-empty, we simply return the head element together with the queue created from the original back part and the front part with the head element removed. Here again the time required is either constant or proportional to the size of the back of the queue, according to whether the front part becomes empty after the removal.

Notice that if an insertion or removal requires a reversal of k elements, then the next k operations are constant-time. This is the fundamental insight as to why we achieve O(n) time complexity over any sequence of n operations. (We will give a more rigorous analysis shortly.)

Here's the implementation of this idea in ML:

    structure Queue :> QUEUE = struct
      type 'a queue = 'a list * 'a list
      fun make_queue (q as (nil, nil)) = q
        | make_queue (bs, nil) = (nil, rev bs)
        | make_queue (q as (bs, fs)) = q
      val empty = (nil, nil)
      fun insert (x, (back, front)) = make_queue (x::back, front)
      exception Empty
      fun remove (_, nil) = raise Empty
        | remove (bs, f::fs) = (f, make_queue (bs, fs))
    end
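To make the behavior concrete, here is a small single-threaded session (a sketch, not from the text; the structure is repeated without the opaque signature ascription so that the two-list representation can be noted in comments):

```sml
(* Sketch: the two-list queue, followed by a single-threaded session. *)
structure Queue =
struct
  type 'a queue = 'a list * 'a list
  exception Empty
  (* smart constructor: reverse the back onto the front when the front empties *)
  fun make_queue (bs, nil) = (nil, rev bs)
    | make_queue q = q
  val empty = (nil, nil)
  fun insert (x, (back, front)) = make_queue (x::back, front)
  fun remove (_, nil) = raise Empty
    | remove (bs, f::fs) = (f, make_queue (bs, fs))
end

val q0 = Queue.empty : int Queue.queue
val q1 = Queue.insert (1, q0)    (* rep: ([], [1]) after the smart constructor *)
val q2 = Queue.insert (2, q1)    (* rep: ([2], [1]) *)
val (h1, q3) = Queue.remove q2   (* h1 = 1; the back [2] is reversed to the front *)
val (h2, q4) = Queue.remove q3   (* h2 = 2 *)
val (h1', _) = Queue.remove q2   (* persistence: q2 is still usable; h1' = 1 *)
```

The last line exploits persistence: removing from q2 a second time re-examines the same queue value, untouched by the earlier removal.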
Notice that we call the "smart constructor" make_queue whenever we wish to return a queue, to ensure that the representation invariant holds. Consequently, some queue operations are more expensive than others, according to whether or not the queue needs to be reorganized to satisfy the representation invariant. However, each such reorganization makes a corresponding number of subsequent queue operations "cheap" (constant-time), so the overall effort required evens out in the end to constant time per operation. More precisely, the running time of a sequence of n queue operations is now O(n), rather than O(n²), as it was in the naive implementation. Consequently, each operation takes O(1) (constant) time "on average," i.e., when the total effort is evenly apportioned among each of the operations in the sequence. Note that this is a worst-case time bound for each operation, amortized over the entire sequence, not an average-case time bound based on assumptions about the distribution of the operations.

28.2 Amortized Analysis

How can we prove this claim? First we give an informal argument, then we tighten it up with a more rigorous analysis.

The key is to observe first that the work required to execute a sequence of queue operations may be apportioned to the elements themselves, then that only a constant amount of work is expended on each element. The "life" of a queue element may be divided into three stages: its arrival in the queue, its transit time in the queue, and its departure from the queue. In the worst case each element passes through each of these stages (but may "die young", never participating in the second or third stage). Arrival requires constant time to add the element to the back of the queue. Transit consists of being moved from the back to the front by a reversal, which takes constant time per element on the back. Departure takes constant time to pattern match and extract the element. Thus at worst we require three steps per element to account for the entire effort expended to perform a sequence of queue operations. This is in fact a conservative upper bound in the sense that we may need less than 3n steps for the sequence, but asymptotically the bound is optimal — we cannot do better than constant time per operation! (You might reasonably wonder whether there is a worst-case, non-amortized constant-time implementation of persistent queues. The answer is "yes", but the code is far more complicated than the simple implementation we are sketching here.)

This argument can be made rigorous as follows. We are to account for the total work performed by a sequence of n operations by showing that any sequence of n operations can be executed in cn steps for some constant c. Dividing by n, we obtain the result that each operation takes c steps when amortized over the entire sequence. The general idea is to introduce the notion of a charge scheme that provides an upper bound on the actual cost of executing a sequence of operations. An upper bound on the charge will then provide an upper bound on the actual cost.

Down to specifics. Let T(n) be the cumulative time required (in the worst case) to execute a sequence of n queue operations. We will introduce a charge function, C(n), representing the cumulative charge for executing a sequence of n operations, and show that T(n) ≤ C(n) = O(n), from which the result follows immediately. It is convenient to express this in terms of a function R(n) = C(n) − T(n) representing the cumulative residual, or overcharge, which is the amount that the charge for n operations exceeds the actual cost of executing them. We will arrange things so that R(n) ≥ 0, and hence C(n) ≥ T(n), and that C(n) = O(n).

By charging 2 for each insert operation and 1 for each remove, it follows that C(n) ≤ 2n for any sequence of n inserts and removes. Thus C(n) = O(n). We claim that for every n ≥ 0, R(n) = b, where, after any sequence of n ≥ 0 operations have been performed, the queue contains 0 ≤ b ≤ n elements on the back "half" and 0 ≤ f ≤ n elements on the front "half". We prove this by induction on n ≥ 0. The condition clearly holds after performing 0 operations, since T(0) = 0 and C(0) = 0, and hence R(0) = C(0) − T(0) = 0.

Consider the n+1st operation. If it is an insert, and f > 0, then T(n+1) = T(n) + 1, C(n+1) = C(n) + 2, and hence R(n+1) = R(n) + 1 = b + 1. This is correct because an insert operation adds one element to the back of the queue. If it is an insert, and f = 0, then T(n+1) = T(n) + b + 2 (charging one for the cons, one for creating the new pair of lists, and b for the reversal), and C(n+1) = C(n) + 2, so R(n+1) = R(n) + 2 − b − 2 = b + 2 − b − 2 = 0. This is correct because the back is now empty; we have used the residual overcharge to pay for the cost of the reversal. If the n+1st operation is a remove, and f > 0, then T(n+1) = T(n) + 1 and C(n+1) = C(n) + 1, and hence R(n+1) = R(n) = b. This is correct because the remove doesn't disturb the back in this case. Finally, if we are performing a remove with f = 0, then T(n+1) = T(n) + b + 1, C(n+1) = C(n) + 1, and hence R(n+1) = R(n) − b = b − b = 0. Here again we use the residual overcharge to pay for the reversal of the back to the front. The result follows immediately, since R(n) = b ≥ 0, and hence C(n) ≥ T(n).

It is instructive to examine where this solution breaks down in the multi-threaded case (i.e., where persistence is fully exploited). Suppose that we perform a sequence of n insert operations on the empty queue, resulting in a queue with n elements on the back and none on the front. Call this queue q. Let us suppose that we have n independent "futures" for q, each of which removes an element from it, for a total of 2n operations. How much time do these 2n operations take? Since each independent future must reverse all n elements onto the front of the queue before performing the removal, the entire collection of 2n operations takes n + n² steps, or O(n) steps per operation, breaking the amortized constant-time bound just derived for a single-threaded sequence of queue operations.

Can we recover a constant-time amortized cost in the persistent case? We can, provided that we share the cost of the reversal among all futures of q — as soon as one performs the reversal, they all enjoy the benefit of its having been done. This may be achieved by using a benign side effect to cache the result of the reversal in a reference cell that is shared among all uses of the queue. We will return to this once we introduce memoization and lazy evaluation.
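The caching idea can be sketched in miniature. The following is not the implementation the text develops later (that one uses memoization and lazy evaluation); it is a minimal illustration of a benign effect, with hypothetical names, in which a back list carries a reference cell that memoizes its reversal so that all futures share one reversal:

```sml
(* Sketch: sharing the cost of a reversal among all futures of a value
   via a benign effect.  The first future to demand the reversal pays
   for it; every later future reads the cached result. *)
type 'a rev_cache = 'a list * 'a list option ref

fun cached (bs : 'a list) : 'a rev_cache = (bs, ref NONE)

fun force ((bs, cache) : 'a rev_cache) : 'a list =
    case !cache of
        SOME fs => fs                   (* reversal already performed *)
      | NONE =>
          let val fs = rev bs           (* pay for the reversal once *)
          in cache := SOME fs; fs end
```

From the client's point of view the structure remains persistent: force always yields a value equal to rev bs, and the assignment is observable only as improved running time.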
Chapter 29

Options, Exceptions, and Continuations

In this chapter we discuss the close relationships between option types, exceptions, and continuations. They each provide the means for handling failure to produce a value in a computation. Option types provide the means of explicitly indicating in the type of a function the possibility that it may fail to yield a "normal" result. The result type of the function forces the caller to dispatch explicitly on whether or not it returned a normal value. Exceptions provide the means of implicitly signalling failure to return a normal result value, without sacrificing the requirement that an application of such a function cannot ignore failure to yield a value. Continuations provide another means of handling failure by providing a function to invoke in the case that normal return is impossible.

29.1 The n-Queens Problem

We will explore the tradeoffs between these three approaches by considering three different implementations of the n-queens problem: find a way to place n queens on an n × n chessboard in such a way that no two queens attack one another. The general strategy is to place queens in successive columns in such a way that each newly placed queen is not attacked by a previously placed queen. Unfortunately it's not possible to do this in one pass; we may find that we can safely place k < n queens on the board, only to discover that there is no way to place the next one. To find a solution we must reconsider earlier decisions, and work forward from there. If all possible reconsiderations of all previous decisions all lead to failure, then the problem is unsolvable.
For example, there is no safe placement of three queens on a 3×3 chessboard. This trial-and-error approach to solving the n-queens problem is called backtracking search.

A solution to the n-queens problem consists of an n×n chessboard with n queens safely placed on it. The following signature defines a chessboard abstraction:

    signature BOARD = sig
      type board
      val new : int -> board
      val complete : board -> bool
      val place : board * int -> board
      val safe : board * int -> bool
      val size : board -> int
      val positions : board -> (int * int) list
    end

The operation new creates a new board of a given dimension n ≥ 0. The operation complete checks whether the board contains a complete safe placement of n queens. The operation place puts a queen at row i in the next available column of the board. The function safe checks whether it is safe to place a queen at row i in the next free column of a board B. The function size returns the size of a board, and the function positions returns the coordinates of the queens on the board.

The board abstraction may be implemented as follows:

    structure Board :> BOARD = struct
      (* rep: size, next free column, number placed, placements
         inv: size >= 0, 1 <= next free <= size,
              length(placements) = number placed *)
      type board = int * int * int * (int * int) list
      fun new n = (n, 1, 0, nil)
      fun size (n, _, _, _) = n
      fun complete (n, _, k, _) = (k = n)
      fun positions (_, _, _, qs) = qs
      fun place ((n, i, k, qs), j) = (n, i+1, k+1, (i, j)::qs)
      fun threatens ((i, j), (i', j')) =
            i = i' orelse j = j' orelse
            i + j = i' + j' orelse i - j = i' - j'
      fun conflicts (q, nil) = false
        | conflicts (q, q'::qs) =
            threatens (q, q') orelse conflicts (q, qs)
      fun safe ((_, i, _, qs), j) = not (conflicts ((i, j), qs))
    end

The representation type contains "redundant" information in order to make the individual operations more efficient. The representation invariant ensures that the components of the representation are properly related to one another (e.g., the claimed number of placements is indeed the length of the list of placed queens, and so on).

Our goal is to define a function

    val queens : int -> Board.board option

such that if n ≥ 0, then queens n evaluates either to NONE if there is no safe placement of n queens on an n × n board, or to SOME B otherwise, with B a complete board containing a safe placement of n queens. We will consider three different solutions: one using option types, one using exceptions, and one using a failure continuation.

29.2 Solution Using Options

Here's a solution based on option types:

    (* addqueen bd yields SOME bd', with bd' a complete safe placement
       extending bd, if possible, and yields NONE otherwise *)
    fun addqueen bd =
        let
          fun try j =
              if j > Board.size bd then
                  NONE
              else if Board.safe (bd, j) then
                  case addqueen (Board.place (bd, j))
                    of NONE => try (j+1)
                     | r as (SOME bd') => r
              else
                  try (j+1)
        in
          if Board.complete bd then SOME bd else try 1
        end

    fun queens n = addqueen (Board.new n)

The characteristic feature of this solution is that we must explicitly check the result of each recursive call to addqueen to determine whether a safe placement is possible from that position. If so, we simply return it; if not, we must reconsider the placement of a queen in row j of the next available column. If no placement is possible in the current column, the function yields NONE, which forces reconsideration of the placement of a queen in the preceding row. Eventually we either find a safe placement, or yield NONE indicating that no solution is possible.

29.3 Solution Using Exceptions

The explicit check on the result of each recursive call can be replaced by the use of exceptions. Rather than have addqueen return a value of type Board.board option, we instead have it return a value of type Board.board, if possible, and otherwise raise an exception indicating failure. The case analysis on the result is replaced by a use of an exception handler. Here's the code:

    exception Fail

    (* addqueen bd yields bd', where bd' is a complete safe placement
       extending bd, if one exists, and raises Fail otherwise *)
    fun addqueen bd =
        let
          fun try j =
              if j > Board.size bd then
                  raise Fail
              else if Board.safe (bd, j) then
                  addqueen (Board.place (bd, j))
                  handle Fail => try (j+1)
              else
                  try (j+1)
        in
          if Board.complete bd then bd else try 1
        end

    fun queens n =
        SOME (addqueen (Board.new n))
        handle Fail => NONE

The main difference between this solution and the previous one is that both calls to addqueen must handle the possibility that it raises the exception Fail. In the outermost call this corresponds to a complete failure to find a safe placement, which means that queens must return NONE. If a safe placement is indeed found, it is wrapped with the constructor SOME to indicate success. In the recursive call within try, an exception handler is required to handle the possibility of there being no safe placement starting in the current position. This check corresponds directly to the case analysis required in the solution based on option types.

What are the tradeoffs between the two solutions?

1. The solution based on option types makes explicit in the type of the function addqueen the possibility of failure. This forces the programmer to explicitly test for failure using a case analysis on the result of the call. The type checker will ensure that one cannot use a Board.board option where a Board.board is expected. The solution based on exceptions does not explicitly indicate failure in its type. However, the programmer is nevertheless forced to handle the failure, for otherwise an uncaught exception error would be raised at run-time, rather than compile-time.

2. The solution based on option types requires an explicit case analysis on the result of each recursive call. If "most" results are successful, the check is redundant and therefore excessively costly. The solution based on exceptions is free of this overhead: it is biased towards the "normal" case of returning a board, rather than the "failure" case of not returning a board at all. The implementation of exceptions ensures that the use of a handler is more efficient than an explicit case analysis in the case that failure is rare compared to success.

For the n-queens problem it is not clear which solution is preferable. In general, if efficiency is paramount, we tend to prefer exceptions if failure is a rarity, and to prefer options if failure is relatively common. If, on the other hand, static checking is paramount, then it is advantageous to use options, since the type checker will enforce the requirement that the programmer check for failure, rather than having the error arise only at run-time.

29.4 Solution Using Continuations

We turn now to a third solution based on continuation-passing. The idea is quite simple: an exception handler is essentially a function that we invoke when we reach a blind alley. Ordinarily we achieve this invocation by raising an exception and relying on the caller to catch it and pass control to the handler. But we can, if we wish, pass the handler around as an additional argument, the failure continuation of the computation. Here's how it's done in the case of the n-queens problem:

    (* addqueen bd yields bd', where bd' is a complete safe placement
       extending bd, if one exists, and otherwise yields the value of fc () *)
    fun addqueen (bd, fc) =
        let
          fun try j =
              if j > Board.size bd then
                  fc ()
              else if Board.safe (bd, j) then
                  addqueen (Board.place (bd, j), fn () => try (j+1))
              else
                  try (j+1)
        in
          if Board.complete bd then SOME bd else try 1
        end

    fun queens n = addqueen (Board.new n, fn () => NONE)

Here again the differences are small, but significant. On a recursive call we pass to addqueen a continuation that resumes search at the next row of the current column. Should we exceed the number of rows on the board, we invoke the failure continuation of the most recent call to addqueen. The initial continuation simply yields NONE, reflecting the ultimate failure to find a safe placement.

The solution based on continuations is very close to the solution based on exceptions, both in form and in terms of efficiency. Which is preferable? Here again there is no easy answer; we can only offer general advice. First off, as we've seen in the case of regular expression matching, failure continuations are more powerful than exceptions; there is no obvious way to replace the use of a failure continuation with a use of exceptions in the matcher. However, in the case that exceptions would suffice, it is generally preferable to use them, since one may then avoid passing an explicit failure continuation. More significantly, the compiler ensures that an uncaught exception aborts the program gracefully, whereas failure to invoke a continuation is not in itself a run-time fault. Using the right tool for the right job makes life easier.
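The three styles can be compared side by side on a much smaller search problem. The following sketch (not from the text; the function names are illustrative) finds the first even number in a list using options, exceptions, and a failure continuation:

```sml
(* Sketch: the three failure-handling styles of this chapter applied to
   a tiny search problem. *)

(* 1. Options: failure is visible in the result type. *)
fun find_even_opt nil = NONE
  | find_even_opt (x::xs) =
      if x mod 2 = 0 then SOME x else find_even_opt xs

(* 2. Exceptions: failure is signalled implicitly. *)
exception NotFound
fun find_even_exn nil = raise NotFound
  | find_even_exn (x::xs) =
      if x mod 2 = 0 then x else find_even_exn xs

(* 3. Failure continuation: the caller supplies the code to run on failure. *)
fun find_even_cont (nil, fc) = fc ()
  | find_even_cont (x::xs, fc) =
      if x mod 2 = 0 then SOME x else find_even_cont (xs, fc)

val a = find_even_opt [1,3,4,7]                           (* SOME 4 *)
val b = SOME (find_even_exn [1,3,4,7]) handle NotFound => NONE
val c = find_even_cont ([1,3,5], fn () => NONE)           (* NONE *)
```

The same tradeoffs arise in miniature: the option version forces a case analysis at every call site, the exception version is biased towards success, and the continuation version lets the caller decide what failure means.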
Chapter 30
Higher-Order Functions
Higher-order functions — those that take functions as arguments or return functions as results — are powerful tools for building programs. An interesting application of higher-order functions is to implement infinite sequences of values as (total) functions from the natural numbers (non-negative integers) to the type of values of the sequence. We will develop a small package of operations for creating and manipulating sequences, all of which are higher-order functions since they take sequences (functions!) as arguments and/or return them as results.

A natural way to define many sequences is by recursion, or self-reference. Since sequences are functions, we may use recursive function definitions to define such sequences. Alternatively, we may think of such a sequence as arising from a "loopback" or "feedback" construct. We will explore both approaches.

Sequences may be used to simulate digital circuits by thinking of a "wire" as a sequence of bits developing over time. The ith value of the sequence corresponds to the signal on the wire at time i. For simplicity we will assume a perfect waveform: the signal is always either high or low (or is undefined); we will not attempt to model electronic effects such as attenuation or noise. Combinational logic elements (such as and-gates or inverters) are operations on wires: they take in one or more wires as input and yield one or more wires as results. Digital logic elements (such as flip-flops) are obtained from combinational logic elements by feedback, or recursion — a flip-flop is a recursively-defined wire!
30.1 Infinite Sequences
Let us begin by developing a sequence package. Here is a suitable signature defining the type of sequences:

    signature SEQUENCE = sig
      type 'a seq = int -> 'a
      (* constant sequence *)
      val constantly : 'a -> 'a seq
      (* alternating values *)
      val alternately : 'a * 'a -> 'a seq
      (* insert at front *)
      val insert : 'a * 'a seq -> 'a seq
      val map : ('a -> 'b) -> 'a seq -> 'b seq
      val zip : 'a seq * 'b seq -> ('a * 'b) seq
      val unzip : ('a * 'b) seq -> 'a seq * 'b seq
      (* fair merge *)
      val merge : ('a * 'a) seq -> 'a seq
      val stretch : int -> 'a seq -> 'a seq
      val shrink : int -> 'a seq -> 'a seq
      val take : int -> 'a seq -> 'a list
      val drop : int -> 'a seq -> 'a seq
      val shift : 'a seq -> 'a seq
      val loopback : ('a seq -> 'a seq) -> 'a seq
    end

Observe that we expose the representation of sequences as functions. This is done to simplify the definition of recursive sequences as recursive functions. Alternatively we could have hidden the representation type, at the expense of making it a bit more awkward to define recursive sequences. In the absence of this exposure of representation, recursive sequences may only be built using the loopback operation, which constructs a recursive sequence by "looping back" the output of a sequence transformer to its input.

Most of the other operations of the signature are adaptations of familiar operations on lists. Two exceptions to this rule are the functions stretch and shrink that dilate and contract the sequence by a given time parameter — if a sequence is expanded by k, its value at i is the value of the original sequence at i/k, and dually for shrinking.

Here's an implementation of sequences as functions.
structure Sequence :> SEQUENCE = struct
  type 'a seq = int -> 'a

  fun constantly c n = c
  fun alternately (c, d) n = if n mod 2 = 0 then c else d
  fun insert (x, s) 0 = x
    | insert (x, s) n = s (n-1)
  fun map f s = f o s
  fun zip (s1, s2) n = (s1 n, s2 n)
  fun unzip (s : ('a * 'b) seq) = (map #1 s, map #2 s)
  fun merge (s1, s2) n = (if n mod 2 = 0 then s1 else s2) (n div 2)
  fun stretch k s n = s (n div k)
  fun shrink k s n = s (n * k)
  fun drop k s n = s (n+k)
  fun shift s = drop 1 s
  fun take 0 s = nil
    | take n s = s 0 :: take (n-1) (shift s)
  fun loopback loop n = loop (loopback loop) n
end
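For readers who want to experiment with this representation outside of ML, the idea translates almost word for word into any language with first-class functions. The following Python sketch is our own translation, not part of the book, and mirrors a few of the operations above:

```python
# A sequence is a function from an index (a non-negative int) to a value,
# mirroring the SML representation type 'a seq = int -> 'a.

def constantly(c):
    return lambda n: c

def alternately(c, d):
    return lambda n: c if n % 2 == 0 else d

def insert(x, s):
    # insert at front: shifts the sequence s one step later
    return lambda n: x if n == 0 else s(n - 1)

def seq_map(f, s):
    return lambda n: f(s(n))      # composition, as in "map f s = f o s"

def drop(k, s):
    return lambda n: s(n + k)

def take(n, s):
    # the first n elements, as a list
    return [s(i) for i in range(n)]

evens = lambda n: 2 * n
print(take(5, insert(99, evens)))  # [99, 0, 2, 4, 6]
```

Only take ever observes a sequence; every other operation builds a new function without computing any elements, which is what makes infinite sequences representable.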
Most of this implementation is entirely straightforward, given the ease with which we may manipulate higher-order functions in ML. The only tricky function is loopback, which must arrange that the output of the function loop is "looped back" to its input. This is achieved by a simple recursive definition of a sequence whose value at n is the value at n of the sequence resulting from applying the loop to this very sequence.

The sensibility of this definition of loopback relies on two separate ideas. First, notice that we may not simplify the definition of loopback as follows:

(* bad definition *)
fun loopback loop = loop (loopback loop)

The reason is that any application of loopback will immediately loop forever! In contrast, the original definition is arranged so that application of loopback immediately returns a function. This may be made more apparent by writing it in the following form, which is entirely equivalent to the definition given above:

fun loopback loop = fn n => loop (loopback loop) n

This format makes it clear that loopback immediately returns a function when applied to a loop functional.

Second, for an application of loopback to a loop to make sense, it must be the case that the loop returns a sequence without "touching" the argument sequence (i.e., without applying the argument to a natural number). Otherwise accessing the sequence resulting from an application of loopback would immediately loop forever.

Some examples will help to illustrate the point. First, let's build a few sequences without using the loopback function, just to get familiar with using sequences:

val evens : int seq = fn n => 2*n
val odds : int seq = fn n => 2*n+1
val nats : int seq = merge (evens, odds)

fun fibs n =
  (insert (1, insert (1, map (op +) (zip (drop 1 fibs, fibs)))))(n)

We may "inspect" the sequence using take and drop, as follows:

take 10 nats          (* [0,1,2,3,4,5,6,7,8,9] *)
take 5 (drop 5 nats)  (* [5,6,7,8,9] *)
take 5 fibs           (* [1,1,2,3,5] *)
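The eta-expansion trick that makes loopback productive is not specific to ML. Here is a Python sketch of loopback and of fibs built with it (our translation; helper names are prefixed to avoid clashing with Python builtins):

```python
# loopback computes a recursive sequence from a sequence transformer.
# Wrapping the body in "lambda n: ..." is essential: it lets loopback(loop)
# return immediately instead of recursing forever.

def loopback(loop):
    return lambda n: loop(loopback(loop))(n)

def insert(x, s):
    return lambda n: x if n == 0 else s(n - 1)

def drop(k, s):
    return lambda n: s(n + k)

def zip_seq(s1, s2):
    return lambda n: (s1(n), s2(n))

def seq_map(f, s):
    return lambda n: f(s(n))

def take(n, s):
    return [s(i) for i in range(n)]

# fibs_loop never samples its argument s when applied, so looping it back
# is safe: elements are demanded only when the result is sampled.
def fibs_loop(s):
    return insert(1, insert(1, seq_map(lambda p: p[0] + p[1],
                                       zip_seq(drop(1, s), s))))

fibs = loopback(fibs_loop)
print(take(8, fibs))  # [1, 1, 2, 3, 5, 8, 13, 21]
```

Without memoization this recomputes earlier elements on every sample, so it is exponentially slow for large indices; the next chapter's memoization technique addresses exactly this kind of redundancy.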
Now let's consider an alternative definition of fibs that uses the loopback operation:

fun fibs_loop s =
  insert (1, insert (1, map (op +) (zip (drop 1 s, s))))
val fibs = loopback fibs_loop;
The definition of fibs_loop is exactly like the original definition of fibs, except that the reference to fibs itself is replaced by a reference to the argument s. Notice that the application of fibs_loop to an argument s does not inspect the argument s!

One way to understand loopback is that it solves a system of equations for an unknown sequence. In the case of the second definition of fibs, we are solving the following system of equations for f:

f 0 = 1
f 1 = 1
f (n+2) = f (n+1) + f (n)

These equations are derived by inspecting the definitions of insert, map, zip, and drop given earlier. It is obvious that the solution is the Fibonacci sequence; this is precisely the sequence obtained by applying loopback to fibs_loop.

Here's an example of a loop that, when looped back, yields an undefined sequence; any attempt to access it results in an infinite loop:

fun bad_loop s n = s n + 1
val bad = loopback bad_loop
val _ = bad 0   (* infinite loop! *)
In this example we are, in effect, trying to solve the equation s n = s n + 1 for s, which has no solution (except the totally undefined sequence). The problem is that the "next" element of the output is defined in terms of the next element itself, rather than in terms of "previous" elements. Consequently, no solution exists.
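The failure is easy to reproduce concretely. In a Python rendering of the same bad loop (ours), sampling the looped-back sequence exhausts the recursion stack instead of producing a value:

```python
# The equation s(n) = s(n) + 1 has no solution: the element at n is defined
# in terms of itself. Any attempt to sample the sequence recurses forever;
# Python reports this as a RecursionError rather than silently looping.

def loopback(loop):
    return lambda n: loop(loopback(loop))(n)

def bad_loop(s):
    return lambda n: s(n) + 1

bad = loopback(bad_loop)
try:
    bad(0)
except RecursionError:
    print("no solution: sampling the sequence does not terminate")
```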
30.2 Circuit Simulation
With these ideas in mind, we may apply the sequence package to build an implementation of digital circuits. Let's start with wires, which are represented as sequences of levels:

datatype level = High | Low | Undef
type wire = level seq
type pair = (level * level) seq

val Zero : wire = constantly Low
val One : wire = constantly High
We include the "undefined" level to account for propagation delays and settling times in circuit elements. For example, here is a clock pulse with a given duration of each pulse:

(* clock pulse with given duration of each pulse *)
fun clock (freq:int):wire = stretch freq (alternately (Low, High))

Combinational logic elements (gates) may be defined as follows. We introduce an explicit unit time propagation delay for each gate: the output is undefined initially, and is then determined as a function of its inputs. As we build up layers of circuit elements, it takes longer and longer (proportional to the length of the longest path through the circuit) for the output to settle, exactly as in "real life".

(* apply two functions in parallel *)
infixr **;
fun (f ** g) (x, y) = (f x, g y)

(* hardware logical and *)
fun logical_and (Low, _) = Low
  | logical_and (_, Low) = Low
  | logical_and (High, High) = High
  | logical_and _ = Undef

fun logical_not Undef = Undef
  | logical_not High = Low
  | logical_not Low = High

fun logical_nop l = l

(* a nor b = not a and not b *)
val logical_nor = logical_and o (logical_not ** logical_not)

type unary_gate = wire -> wire
type binary_gate = pair -> wire

(* logic gate with unit propagation delay *)
fun gate f w 0 = Undef
  | gate f w i = f (w (i-1))

val delay : unary_gate = gate logical_nop   (* unit delay *)
val inverter : unary_gate = gate logical_not
val nor_gate : binary_gate = gate logical_nor

It is a good exercise to build a one-bit adder out of these elements, then to string them together to form an n-bit ripple-carry adder. Be sure to present the inputs to the adder with sufficient pulse widths to ensure that the circuit has time to settle!

Combining these basic logic elements with recursive definitions allows us to define digital logic elements such as the RS flip-flop:

fun RS_ff (S : wire, R : wire) =
  let
    fun X n = nor_gate (zip (S, Y))(n)
    and Y n = nor_gate (zip (X, R))(n)
  in
    Y
  end

(* generate a pulse of b's n wide, followed by w *)
fun pulse b 0 w i = w i
  | pulse b n w 0 = b
  | pulse b n w i = pulse b (n-1) w (i-1)

val S = pulse Low 2 (pulse High 2 Zero)
val R = pulse Low 6 (pulse High 2 Zero)
val Q = RS_ff (S, R)
val _ = take 20 Q
val X = RS_ff (S, S)   (* unstable! *)
val _ = take 20 X

The propagation delay inherent in our definition of a gate is fundamental to ensuring that the behavior of the flip-flop is well-defined! This is consistent with "real life": flip-flops depend on the existence of a hardware propagation delay for their proper functioning. Observe that the delays arising from the combinational logic elements ensure that a solution exists by ensuring that the "next" element of the output refers only to the "previous" elements, and not the "current" element. Using the implementation of the sequence operations given above, observe that the flip-flop exhibits a momentary "glitch" in its output before settling, exactly as in the hardware case. Note also that presentation of "illegal" inputs (such as setting both the R and the S leads high) results in metastable behavior of the flip-flop, here as in real life. (All of these behaviors may be observed by using take and drop to inspect the values on the circuit.) It is a good exercise to derive a system of equations governing the RS flip-flop from the definition we've given here.

Finally, we consider a variant implementation of an RS flip-flop using the loopback operation. Here we must define a "binary loopback" function to implement the flip-flop. This is achieved by reducing binary loopback to unary loopback by composing with zip and unzip.

fun loopback2 (f : wire * wire -> wire * wire) =
  unzip (loopback (zip o f o unzip))

fun RS_ff' (S : wire, R : wire) =
  let
    fun RS_loop (X, Y) =
      (nor_gate (zip (S, Y)), nor_gate (zip (X, R)))
  in
    loopback2 RS_loop
  end
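The role of the unit propagation delay can be watched directly. The following Python miniature of the gate model and the RS flip-flop is our own rendering, not the book's code; the levels High, Low, and Undef become the strings "H", "L", and "X":

```python
# Wires are sequences: functions from time (int) to a level. A gate has one
# unit of propagation delay: its output is undefined at time 0 and is a
# function of its inputs at time i-1 thereafter.

H, L, X = "H", "L", "X"

def constantly(c):
    return lambda n: c

def logical_nor(a, b):
    if a == L and b == L:
        return H
    if a == H or b == H:
        return L          # a High input dominates a nor gate
    return X              # otherwise an undefined input gives an undefined output

def nor_gate(w1, w2):
    return lambda i: X if i == 0 else logical_nor(w1(i - 1), w2(i - 1))

def pulse(b, n, w):
    # a pulse of b's n wide, followed by the wire w
    return lambda i: b if i < n else w(i - n)

def rs_ff(S, R):
    # two cross-coupled nor gates; the delay makes the recursion well-defined
    def Xw(n): return nor_gate(S, Yw)(n)
    def Yw(n): return nor_gate(Xw, R)(n)
    return Yw

Zero = constantly(L)
S = pulse(L, 2, pulse(H, 2, Zero))   # set pulse at times 2-3
R = pulse(L, 6, pulse(H, 2, Zero))   # reset pulse at times 6-7
Q = rs_ff(S, R)
print([Q(i) for i in range(12)])     # undefined at first, then H after set, L after reset
```

Sampling the output shows the settling behavior described above: a few undefined steps while the levels propagate, then a stable H after the set pulse, then a stable L after the reset pulse.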
Chapter 31

Memoization

In this chapter we will discuss memoization, a programming technique for cacheing the results of previous computations so that they can be quickly retrieved without repeated effort. Memoization is fundamental to the implementation of lazy data structures, either "by hand" or using the provisions of the SML/NJ compiler.

31.1 Cacheing Results

We begin with a discussion of memoization to increase the efficiency of computing a recursively-defined function whose pattern of recursion involves a substantial amount of redundant computation. The problem is to compute the number of ways to parenthesize an expression consisting of a sequence of n multiplications as a function of n. For example, the expression 2 ∗ 3 ∗ 4 ∗ 5 can be parenthesized in 5 ways:

((2 ∗ 3) ∗ 4) ∗ 5, (2 ∗ (3 ∗ 4)) ∗ 5, (2 ∗ 3) ∗ (4 ∗ 5), 2 ∗ ((3 ∗ 4) ∗ 5), 2 ∗ (3 ∗ (4 ∗ 5))

A simple recurrence expresses the number of ways of parenthesizing a sequence of n multiplications:

fun sum f 0 = 0
  | sum f n = (f n) + sum f (n-1)

fun p 1 = 1
  | p n = sum (fn k => (p k) * (p (n-k))) (n-1)

where sum f n computes the sum of values of a function f (k) with 1 ≤ k ≤ n. This program is extremely inefficient because of the redundancy in the pattern of the recursive calls.
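The recurrence is easy to transcribe into Python for experimentation (the transcription is ours); p(n) here counts the parenthesizations of a product of n factors, so p(4) = 5 matches the example above:

```python
# Number of ways to parenthesize a product of n factors: a direct
# transcription of the recurrence. Exponentially slow for larger n because
# the same subproblems are recomputed over and over.

def p(n):
    if n == 1:
        return 1
    return sum(p(k) * p(n - k) for k in range(1, n))

print([p(n) for n in range(1, 7)])  # [1, 1, 2, 5, 14, 42], the Catalan numbers
```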
What can we do about this problem? One solution is to be clever and solve the recurrence. As it happens this recurrence has a closed-form solution (the Catalan numbers). But in many cases there is no known closed form, and something else must be done to cut down the overhead. In this case a simple cacheing technique proves effective. The idea is to maintain a table of values of the function that is filled in whenever the function is applied. If the function is called on an argument n, the table is consulted to see whether the value has already been computed; if so, it is simply returned. If not, we compute the value and store it in the table for future use. This ensures that no redundant computations are performed.

We will maintain the table as an array so that its entries can be accessed in constant time. The penalty is that the array has a fixed size, so we can only record the values of the function at some predetermined set of arguments. Once we exceed the bounds of the table, we must compute the value the "hard way". An alternative is to use a dictionary (e.g., a balanced binary search tree), which has no a priori size limitation, but which takes logarithmic time to perform a lookup. For simplicity we'll use a solution based on arrays. Here's the code to implement a memoized version of the parenthesization function:

local
  val limit = 100
  val memopad = Array.array (limit, NONE)
in
  fun p' 1 = 1
    | p' n = sum (fn k => (p k) * (p (n-k))) (n-1)
  and p n =
    if n < limit then
      case Array.sub (memopad, n) of
          SOME r => r
        | NONE =>
            let val r = p' n
            in
              Array.update (memopad, n, SOME r);
              r
            end
    else
      p' n
end
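The memo-pad structure carries over directly. Here is a Python version (ours) with a fixed-size table and a fall-back to the unmemoized function beyond the table bound:

```python
# Memoized parenthesization counter: a fixed-size memo pad is consulted
# before recomputing. Entries start as None, playing the role of NONE.

LIMIT = 100
memopad = [None] * LIMIT

def p_prime(n):
    # the "hard way": one layer of the recurrence, calling the memoized p
    if n == 1:
        return 1
    return sum(p(k) * p(n - k) for k in range(1, n))

def p(n):
    if n < LIMIT:
        if memopad[n] is None:
            memopad[n] = p_prime(n)      # compute once, then cache
        return memopad[n]
    return p_prime(n)                    # beyond the table: no cacheing

print(p(12))  # 58786, computed without redundant work
```

As in the SML code, p and p_prime are mutually recursive: the recursive calls inside p_prime go back through the memoized p.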
The main idea is to modify the original definition so that the recursive calls consult and update the memopad. The "exported" version of the function is the one that refers to the memo pad. Notice that the definitions of p and p' are mutually recursive!

31.2 Laziness

Lazy evaluation is a combination of delayed evaluation and memoization. Delayed evaluation is implemented using thunks, functions of type unit -> 'a. To delay the evaluation of an expression exp of type 'a, simply write fn () => exp. This is a value of type unit -> 'a; the expression exp is effectively "frozen" until the function is applied. To "thaw" the expression, simply apply the thunk to the null tuple, (). Here's a simple example:

val thunk = fn () => print "hello"   (* nothing printed *)
val _ = thunk ()                     (* prints hello *)

While this example is especially simple-minded, remarkable effects can be achieved by combining delayed evaluation with memoization. To do so, we will consider the following signature of suspensions:

signature SUSP = sig
  type 'a susp
  val force : 'a susp -> 'a
  val delay : (unit -> 'a) -> 'a susp
end

The function delay takes a suspended computation (in the form of a thunk) and yields a suspension. Its job is to "memoize" the suspension so that the suspended computation is evaluated at most once; once the result is computed, the value is stored in a reference cell so that subsequent forces are fast. The implementation is slick. Here's the code to do it:
structure Susp :> SUSP = struct
  type 'a susp = unit -> 'a

  fun force t = t ()

  fun delay (t : 'a susp) =
    let
      exception Impossible
      val memo : 'a susp ref = ref (fn () => raise Impossible)
      fun t' () =
        let val r = t ()
        in memo := (fn () => r); r end
    in
      memo := t';
      fn () => (!memo)()
    end
end

It's worth discussing this code in detail because it is rather tricky. Suspensions are just thunks; force simply applies the suspension to the null tuple to force its evaluation. What about delay? When applied, delay allocates a reference cell containing a thunk that, if forced, raises an internal exception. This can never happen, for reasons that will become apparent in a moment; it is merely a placeholder with which we initialize the reference cell. We then define another thunk t' that, when forced, does three things:

1. It forces the thunk t to obtain its value r.
2. It replaces the contents of the memopad with the constant function that immediately returns r.
3. It returns r as result.

We then assign t' to the memo pad (hence obliterating the placeholder), and return a thunk dt that, when forced, simply forces the contents of the memo pad. However, the contents of the memo pad changes as a result of forcing it, so that subsequent forces exhibit different behavior. Specifically, the first time dt is forced, it forces the thunk t', which then forces t to obtain its value r, "zaps" the memo pad, and returns r. The second time dt is forced, it again forces the contents of the memo pad, but this time it contains the constant function that immediately returns r. Altogether we have ensured that t is forced at most once by using a form of "self-modifying" code. Here's an example to illustrate the effect of delaying a thunk:

val t = Susp.delay (fn () => print "hello")
val _ = Susp.force t   (* prints hello *)
val _ = Susp.force t   (* silent *)

Notice that hello is printed once, not twice! The reason is that the suspended computation is evaluated at most once, so the message is printed at most once on the screen.

31.3 Lazy Data Types in SML/NJ

The lazy datatype declaration (see chapter 15 for a description of the SML/NJ lazy data type mechanism)

datatype lazy 'a stream = Cons of 'a * 'a stream

expands into the following pair of type declarations:

datatype 'a stream! = Cons of 'a * 'a stream
withtype 'a stream = 'a stream! Susp.susp

The first defines the type of stream values (the result of forcing a stream computation), which are formed by applying the constructor Cons to a value and another stream; the second defines the type of stream computations, which are suspensions yielding stream values. Thus streams are represented by suspended (unevaluated, memoized) computations of stream values.

The value constructor Cons, when used to build a stream, automatically suspends computation. This is achieved by regarding Cons e as shorthand for Cons (Susp.delay (fn () => e)). When used in a pattern, the value constructor Cons induces a use of force. For example, the binding

val Cons (h, t) = e

becomes

val Cons (h, t) = Susp.force e

which forces the right-hand side before performing pattern matching. A similar transformation applies to non-lazy function definitions: the argument is forced before pattern matching commences.
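The self-modifying memo cell underlying all of this is worth trying interactively. In this Python rendering (ours, not the book's), a one-element list stands in for the ref cell, and a counter shows that the suspended computation runs only once:

```python
# delay(t) returns a suspension: a zero-argument function that evaluates the
# thunk t at most once. On the first force, the cell is overwritten with a
# constant function returning the cached result ("zapping" the memo pad).

def delay(t):
    def placeholder():
        raise RuntimeError("impossible")  # never reachable, as in the SML code
    memo = [placeholder]
    def t_prime():
        r = t()
        memo[0] = lambda: r               # self-modification happens here
        return r
    memo[0] = t_prime
    return lambda: memo[0]()

def force(s):
    return s()

count = 0
def work():
    global count
    count += 1
    return 42

s = delay(work)
print(force(s), force(s), count)  # 42 42 1: the computation ran exactly once
```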
Thus the "eager" tail function

fun stl (Cons (_, t)) = t

expands into

fun stl! (Cons (_, t)) = t
and stl s = stl! (Susp.force s)

which forces the argument as soon as it is applied. On the other hand, lazy function definitions defer pattern matching until the result is forced. Thus the lazy tail function

fun lstl (Cons (_, t)) = t

expands into

fun lstl! (Cons (_, t)) = t
and lstl s = Susp.delay (fn () => lstl! (Susp.force s))

which is a suspension that, when forced, performs the pattern match.

Finally, the recursive stream definition

val rec lazy ones = Cons (1, ones)

expands into the following recursive function definition:

val rec ones = Susp.delay (fn () => Cons (1, ones))

Unfortunately this is not quite legal in SML since the right-hand side involves an application of a function to another function. This can either be provided by extending SML to admit such definitions, or by extending the Susp package to include an operation for building recursive suspensions such as this one. Since it is an interesting exercise in itself, we'll explore the latter alternative.

31.4 Recursive Suspensions

We seek to add a function to the Susp package with signature

val loopback : ('a susp -> 'a susp) -> 'a susp

that, when applied to a function f mapping suspensions to suspensions, yields a suspension s whose behavior is the same as f (s). In the above example the function in question is
fun ones_loop s = Susp.delay (fn () => Cons (1, s))

We use loopback to define ones as follows:

val ones = Susp.loopback ones_loop

The idea is that ones should be equivalent to Susp.delay (fn () => Cons (1, ones)), as in the original definition; and this is the result of evaluating Susp.loopback ones_loop, assuming Susp.loopback is implemented properly.

How is loopback implemented? We use a technique known as backpatching. Here's the code:

fun loopback f =
  let
    exception Circular
    val r = ref (fn () => raise Circular)
    val t = fn () => (!r)()
  in
    r := f t;
    t
  end

First we allocate a reference cell which is initialized to a placeholder that, if forced, raises the exception Circular. Then we define a thunk that, when forced, forces the contents of this reference cell. This will be the return value of loopback. But before returning, we assign to the reference cell the result of applying the given function to the result thunk. This "ties the knot" to ensure that the output is "looped back" to the input. Observe that if the loop function touches its input suspension before yielding an output suspension, the exception Circular will be raised.
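Backpatching also translates directly: allocate a cell holding a placeholder, hand out a thunk that dereferences the cell, then overwrite the cell. A Python sketch (ours), reusing a memoizing delay in simplified form:

```python
# loopback(f) yields a suspension s whose behavior is that of f(s), by
# "tying the knot" through a mutable cell. If f forces its argument before
# returning, Circular is raised instead of looping forever.

class Circular(Exception):
    pass

def loopback(f):
    def placeholder():
        raise Circular()
    r = [placeholder]
    t = lambda: r[0]()       # the result: forces whatever r currently holds
    r[0] = f(t)              # backpatch: loop the output back to the input
    return t

def delay(t):
    # memoizing suspension, as in the previous section
    cell = [t, None]
    def s():
        if cell[0] is not None:
            cell[1], cell[0] = cell[0](), None
        return cell[1]
    return s

# The stream ones = Cons (1, ones), where a stream cell is a (head, tail) pair:
ones_loop = lambda s: delay(lambda: (1, s))
ones = loopback(ones_loop)
head, tail = ones()
print(head, tail()[0])  # 1 1

# A loop that touches its input suspension immediately is rejected:
try:
    loopback(lambda s: delay(s()))
except Circular:
    print("circular")
```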
Chapter 32

Data Abstraction

An abstract data type (ADT) is a type equipped with a set of operations for manipulating values of that type. What makes an ADT abstract is that the representation type is hidden from clients of the ADT. Put another way, the only operations that may be performed on a value of the ADT are the given ones. Consequently, the client cannot depend on the representation. This ensures that the representation may be changed without affecting the behavior of the client, since the representation is hidden from it.

An ADT is implemented by providing a representation type for the values of the ADT and an implementation for the operations defined on values of the representation type. This also facilitates the implementation of efficient data structures by imposing a condition, called a representation invariant, on the representation that is preserved by the operations of the type. Each operation that takes a value of the ADT as argument may assume that the representation invariant holds. In compensation, each operation that yields a value of the ADT as result must guarantee that the representation invariant holds of it. If the operations of the ADT preserve the representation invariant, then it must truly be invariant: no other code in the system could possibly disrupt it. Consequently, any violation of the representation invariant may be localized to the implementation of one of the operations. This significantly reduces the time required to find an error in a program.

32.1 Dictionaries

To make these ideas concrete we will consider the abstract data type of dictionaries. A dictionary is a mapping from keys to values. For simplicity we take keys to be strings, but it is possible to define a dictionary for
any ordered type. The values associated with keys are completely arbitrary. Viewed as an ADT, a dictionary is a type 'a dict of dictionaries mapping strings to values of type 'a, together with empty, insert, and lookup operations that create a new dictionary, insert a value with a given key, and retrieve the value associated with a key (if any). In short a dictionary is an implementation of the following signature:

signature DICT = sig
  type key = string
  type 'a entry = key * 'a
  type 'a dict
  exception Lookup of key
  val empty : 'a dict
  val insert : 'a dict * 'a entry -> 'a dict
  val lookup : 'a dict * key -> 'a
end

Notice that the type 'a dict is not specified in the signature, whereas the types key and 'a entry are defined to be string and string * 'a, respectively.

32.2 Binary Search Trees

A simple implementation of a dictionary is a binary search tree. A binary search tree is a binary tree with values of an ordered type at the nodes, arranged in such a way that for every node in the tree, the value at that node is greater than the value at any node in the left child of that node, and smaller than the value at any node in the right child. It follows immediately that no two nodes in a binary search tree are labelled with the same value.

The binary search tree property is an example of a representation invariant on an underlying data structure. The underlying structure is a binary tree with values at the nodes; the representation invariant isolates a set of structures satisfying some additional, more stringent, conditions.

We may use a binary search tree to implement a dictionary as follows:

structure BinarySearchTree :> DICT = struct
  type key = string
  type 'a entry = key * 'a

  (* Rep invariant: 'a tree is a binary search tree *)
  datatype 'a tree = Empty | Node of 'a tree * 'a entry * 'a tree
  type 'a dict = 'a tree

  exception Lookup of key

  val empty = Empty

  fun insert (Empty, entry) = Node (Empty, entry, Empty)
    | insert (n as Node (l, e' as (k', _), r), e as (k, _)) =
        (case String.compare (k, k') of
             LESS => Node (insert (l, e), e', r)
           | GREATER => Node (l, e', insert (r, e))
           | EQUAL => n)

  fun lookup (Empty, k) = raise (Lookup k)
    | lookup (Node (l, (k', v), r), k) =
        (case String.compare (k, k') of
             EQUAL => v
           | LESS => lookup (l, k)
           | GREATER => lookup (r, k))
end

Notice that empty is defined to be a valid binary search tree, that insert yields a binary search tree if its argument is one, and that lookup relies on its argument being a binary search tree (if not, it might fail to find a key that in fact occurs in the tree!). The structure BinarySearchTree is sealed with the signature DICT to ensure that the representation type is held abstract.

32.3 Balanced Binary Search Trees

The difficulty with binary search trees is that they may become unbalanced. In particular, if we insert keys in ascending order, the representation is essentially just a list! The left child of each node is empty; the right child is the rest of the dictionary. Such a tree is said to be unbalanced because the children of a node have widely varying heights. Consequently, it takes O(n) time in the worst case to perform a lookup on a dictionary containing n elements. Were it to be the case that the children of every node had roughly equal height, then the lookup would take O(lg n) time, a considerable improvement.
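For experimentation, the dictionary operations transcribe naturally into Python (the transcription is ours). Note that nothing here enforces the representation invariant; that enforcement, via sealing, is exactly what the SML version provides:

```python
# Binary search tree dictionary. Rep invariant: at every node, all keys in
# the left subtree are smaller than the node's key and all keys in the right
# subtree are larger. A tree is None (empty) or (left, (key, value), right).

def insert(tree, key, value):
    if tree is None:
        return (None, (key, value), None)
    left, (k, v), right = tree
    if key < k:
        return (insert(left, key, value), (k, v), right)
    elif key > k:
        return (left, (k, v), insert(right, key, value))
    else:
        return tree        # key already present: keep the old entry, as in the SML

def lookup(tree, key):
    if tree is None:
        raise KeyError(key)
    left, (k, v), right = tree
    if key == k:
        return v
    return lookup(left if key < k else right, key)

d = None
for k, v in [("b", 2), ("a", 1), ("c", 3)]:
    d = insert(d, k, v)
print(lookup(d, "a"), lookup(d, "c"))  # 1 3
```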
Can we do better? Many approaches have been suggested. One that we will consider here is an instance of what is called a self-adjusting tree. The general idea of a self-adjusting tree is that operations on the tree may cause a reorganization of its structure to ensure that some invariant is maintained. In our case we will arrange things so that the tree is self-balancing, meaning that the children of any node have roughly the same height. As we just remarked, this ensures that lookup is efficient.

How is this achieved? By imposing a clever representation invariant on the binary search tree, called the red-black tree condition (the reason for the name will be apparent shortly). A red-black tree is a binary search tree in which every node is colored either red or black (with the empty tree being regarded as black), and such that the following properties hold:

1. The children of a red node are black.

2. For any node in the tree, the number of black nodes on any two paths from that node to a leaf is the same. This number is called the black height of the node.

These two conditions ensure that a red-black tree is a balanced binary search tree. Here's why. First, observe that a red-black tree of black height h has at least 2^h - 1 nodes. We may prove this by induction on the structure of the red-black tree. The empty tree has black height 1 (since we consider it to be black), which is at least 2^1 - 1, as required. Suppose we have a red node. The black height of both children must be h, hence each has at least 2^h - 1 nodes, yielding a total of 2 x (2^h - 1) + 1 = 2^(h+1) - 1 nodes, which is at least 2^h - 1, as required. If, on the other hand, we have a black node, then the black height of both children is h - 1, and each has at least 2^(h-1) - 1 nodes, for a total of 2 x (2^(h-1) - 1) + 1 = 2^h - 1 nodes, as required. Second, observe that a red-black tree of height h with n nodes has black height at least h/2 (since the children of a red node must be black), and hence has at least 2^(h/2) - 1 nodes. Consequently, lg(n + 1) >= h/2, so h <= 2 x lg(n + 1). In other words, its height is logarithmic in the number of nodes, which implies that the tree is height balanced.

To ensure logarithmic behavior, all we have to do is to maintain the red-black invariant. The empty tree is a red-black tree, so the only question is how to perform an insert operation. First, we insert the entry as usual for a binary search tree, with the fresh node starting out colored red. In doing so we do not disturb the black height condition, but we might introduce a red-red violation, a situation in which a red node has a red child. We then remove the red-red violation by propagating it upwards towards the root by a constant-time transformation on the tree (one of several possibilities, which we'll discuss shortly). These transformations either eliminate the red-red violation outright, or, in logarithmic time, push the violation to the root, where it is neatly resolved by recoloring the root black (which preserves the black-height invariant!).

We will maintain the invariant that there is at most one red-red violation in the tree; the insertion may or may not create such a violation, and each propagation step will preserve this invariant. It follows that the parent of a red-red violation must be black.

The violation is propagated upwards by one of four rotations. Let's look in detail at two of the four cases of removing a red-red violation, those in which the uppermost red node is the left child of the black node; the other two cases are handled symmetrically. (The original presents these cases with diagrams, which are not reproduced here; they depict a black node with a red child and red grandchild being reorganized so that the middle key becomes a red node whose children are the other two keys, now colored black.) The diagram represents four distinct situations, according to whether the uppermost red node is a left or right child of the black node, and whether the red child of the red node is itself a left or right child. In each case the red-red violation is propagated upwards by the same reorganization. Notice that by making the uppermost node red we may be introducing a red-red violation further up the tree (since the black node's parent might have been red), and that we are preserving the black-height invariant, since the great-grandchildren of the black node in the original situation will appear as children of the two black nodes in the reorganized situation. Notice as well that the binary search tree conditions are also preserved by this transformation. As a limiting case, if the red-red violation is propagated to the root of the entire tree, we recolor the root black, which preserves the black-height condition, and we are done rebalancing the tree.

Here is the ML code to implement dictionaries using a red-black tree. Notice that the tree rotations are neatly expressed using pattern matching.

structure RedBlackTree :> DICT = struct
  type key = string
  type 'a entry = string * 'a

  (* Inv: binary search tree + red-black conditions *)
  datatype 'a dict =
      Empty
    | Red of 'a entry * 'a dict * 'a dict
    | Black of 'a entry * 'a dict * 'a dict

  val empty = Empty

  exception Lookup of key

  fun lookup dict key =
    let
      fun lk (Empty) = raise (Lookup key)
        | lk (Red tree) = lk' tree
        | lk (Black tree) = lk' tree
      and lk' ((key1, datum1), left, right) =
        (case String.compare (key, key1) of
             EQUAL => datum1
           | LESS => lk left
           | GREATER => lk right)
    in
      lk dict
    end
  fun restoreLeft (Black (z, Red (y, Red (x, d1, d2), d3), d4)) =
        Red (y, Black (x, d1, d2), Black (z, d3, d4))
    | restoreLeft (Black (z, Red (x, d1, Red (y, d2, d3)), d4)) =
        Red (y, Black (x, d1, d2), Black (z, d3, d4))
    | restoreLeft dict = dict

  fun restoreRight (Black (x, d1, Red (y, d2, Red (z, d3, d4)))) =
        Red (y, Black (x, d1, d2), Black (z, d3, d4))
    | restoreRight (Black (x, d1, Red (z, Red (y, d2, d3), d4))) =
        Red (y, Black (x, d1, d2), Black (z, d3, d4))
    | restoreRight dict = dict

  fun insert (dict, entry as (key, datum)) =
    let
      (* val ins : 'a dict -> 'a dict  inserts entry *)
      (* ins (Red _) may have a red-red violation at the root *)
      (* ins (Black _) or ins (Empty) is red-black *)
      (* ins preserves black height *)
      fun ins (Empty) = Red (entry, Empty, Empty)
        | ins (Red (entry1 as (key1, datum1), left, right)) =
            (case String.compare (key, key1) of
                 EQUAL => Red (entry, left, right)
               | LESS => Red (entry1, ins left, right)
               | GREATER => Red (entry1, left, ins right))
        | ins (Black (entry1 as (key1, datum1), left, right)) =
            (case String.compare (key, key1) of
                 EQUAL => Black (entry, left, right)
               | LESS => restoreLeft (Black (entry1, ins left, right))
               | GREATER => restoreRight (Black (entry1, left, ins right)))
    in
      case ins dict of
          Red (t as (_, Red _, _)) => Black t   (* recolor *)
        | Red (t as (_, _, Red _)) => Black t   (* recolor *)
        | dict => dict
    end
end

It is worthwhile to contemplate the role played by the red-black invariant in ensuring the correctness of the implementation and the time complexity of the operations.

32.4 Abstraction vs. Run-Time Checking

You might wonder whether we could equally well use run-time checks to enforce representation invariants. The idea would be to introduce a "debug flag" that, when set, causes the operations of the dictionary to check that the representation invariant holds of their arguments and results. In the case of a binary search tree this is surely possible, but at considerable expense, since the time required to check the binary search tree invariant is proportional to the size of the binary search tree itself, whereas an insert (for example) can be performed in logarithmic time. But wouldn't we turn off the debug flag before shipping the production copy of the code? Yes, indeed, but then the benefits of checking are lost for the code we care about most! (To paraphrase Tony Hoare, it's as if we used our life jackets while learning to sail on a pond, then tossed them away when we set out to sea.) By using the type system to enforce abstraction, we can confine the possible violations of the representation invariant to the dictionary package itself, and, moreover, we need not turn off the check for production code because there is no run-time penalty for doing so.

A more subtle point is that it may not always be possible to enforce data abstraction at run time. Efficiency considerations aside, you might
tif fail to terminate. Here’s an example. . In many operating systems processes are “named” by integervalue process identiﬁers. It is a theorem of recursion theory that no runtime check can be deﬁned that ensures that a given integervalued function is total. Yet we can deﬁne an abstract type of total functions that. without performing any I/O or having any other computational effect. g) = f o g fun double x = 2 * x . No runtime check can assure us that an arbitrary integer function is in fact total. or perform any number of other operaW ORKING D RAFT D ECEMBER 9. consider an abstract type of total functions on the integers. while not admitting every possible total function on the integers as values. . provides a useful set of such functions as elements of a structure. Another reason why a runtime check to enforce data abstraction is impossible is that it may not be possible to tell from looking at a given value whether or not it is a legitimate value of the abstact type. end structure Tif :> TIF = struct type tif = int>int fun apply t n = t n fun id x = x fun compose (f. Here’s a sketch of such a package: signature TIF = sig type tif val apply : tif > (int > int) val id : tif val compose : tif * tif > tif val double : tif . we know where to look for the error. . end Should the application of such some value of type Tif. .286 Data Abstraction think that we can always replace static localization of representation errors by dynamic checks for violations of them. those that are guaranteed to terminate when called. Using the process identiﬁer we may send messages to the process. As an example. By using these speciﬁed operations to create a total function. 2002 . cause it to terminate. But this is false! One reason is that the representation invariant might not be computable. we are in effect encoding a proof of totality in the code itself.
is indeed a process identiﬁer. whose underlying representation is int. The only way to know is to consider the “history” of how that integer came into being.4 Abstraction vs. You are invited to imagine how this might be achieved in ML. we cannot tell by looking at the integer whether it is indeed valid. we can enforce the requirement that a value of type pid. The thing to notice here is that any integer at all is a possible process identiﬁer. Using the abstraction mechanisms just described. No runtime check on the value will reveal whether a given integer is a “real” or “bogus” process identiﬁer. RunTime Checking 287 tions on it.32. 2002 . W ORKING D RAFT D ECEMBER 9. and what operations were performed on it.
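One way to carry out this exercise is sketched below. This is an editorial elaboration, not part of the original development: the structure Process, its operations, and the use of a counter in place of a real operating system call are all hypothetical.

```sml
(* Hypothetical sketch: opaque ascription makes spawn the only
   source of values of type pid, even though a pid is just an int. *)
signature PROCESS = sig
    type pid                    (* abstract process identifier *)
    val spawn : unit -> pid     (* the sole way to obtain a pid *)
    val kill : pid -> unit
end

structure Process :> PROCESS = struct
    type pid = int
    val next = ref 0
    (* stand-ins for real operating system calls *)
    fun spawn () = (next := !next + 1; !next)
    fun kill p = ()
end
```

Because the ascription Process :> PROCESS is opaque, no client can coerce an arbitrary int into a pid; every value of type Process.pid carries, by construction, the "history" demanded above.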
Chapter 33

Representation Independence and ADT Correctness

This chapter is concerned with proving correctness of ADT implementations by exhibiting a simulation relation between a reference implementation (taken, or known, to be correct) and a candidate implementation (whose correctness is to be established). The methodology generalizes Hoare's notion of abstraction functions to an arbitrary relation, and relies on Reynolds' notion of parametricity to conclude that related implementations engender the same observable behavior in all clients.
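To make the idea concrete, here is a small illustrative sketch (invented for illustration, not drawn from the chapter itself): two implementations of a counter abstraction whose representations are related by the simulation relation R(n, l) holding iff n is the length of l.

```sml
signature COUNTER = sig
    type counter
    val new : counter
    val inc : counter -> counter
    val get : counter -> int
end

(* reference implementation: a counter is an integer *)
structure C1 :> COUNTER = struct
    type counter = int
    val new = 0
    fun inc n = n + 1
    fun get n = n
end

(* candidate implementation: a counter is a list, in unary *)
structure C2 :> COUNTER = struct
    type counter = unit list
    val new = []
    fun inc l = () :: l
    fun get l = List.length l
end
```

Each operation preserves R (new yields related values, inc and get preserve and respect relatedness), so parametricity ensures that no client of COUNTER can distinguish C1 from C2.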
Chapter 34

Modularity and Reuse

1. Naming conventions.

2. Exploiting structural subtyping (type t convention).

3. Impedance-matching functors.
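These points can be previewed with a small sketch; the signatures and functor below are invented for illustration. Under the type t convention each module exports its principal type under the name t, so that clients may uniformly write Stack.t, Queue.t, and so on; an impedance-matching functor adapts a module written under one naming convention to the interface a client expects.

```sml
(* The interface a client expects, following the "type t" convention *)
signature STACK = sig
    type 'a t
    val empty : 'a t
    val push : 'a * 'a t -> 'a t
    val pop : 'a t -> ('a * 'a t) option
end

(* A hypothetical third-party module using different names *)
signature LIFO = sig
    type 'a lifo
    val new : 'a lifo
    val ins : 'a * 'a lifo -> 'a lifo
    val rem : 'a lifo -> ('a * 'a lifo) option
end

(* Impedance-matching functor: adapts any LIFO to STACK *)
functor StackFromLifo (L : LIFO) :> STACK = struct
    type 'a t = 'a L.lifo
    val empty = L.new
    val push = L.ins
    val pop = L.rem
end
```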
Chapter 35

Dynamic Typing and Dynamic Dispatch

This chapter is concerned with dynamic typing in a statically typed language. It is commonly thought that there is an "opposition" between statically-typed languages (such as Standard ML) and dynamically-typed languages (such as Scheme). In fact, dynamically-typed languages are a special case of statically-typed languages! We will demonstrate this by exhibiting a faithful representation of Scheme inside of ML.
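The representation rests on a single ML datatype of dynamically-typed values. The sketch below is a simplified illustration of the idea for a fragment of Scheme, not the chapter's full development.

```sml
(* Every Scheme value is a value of the single static type dyn;
   Scheme's run-time type errors become ML exceptions. *)
datatype dyn
  = Num of int
  | Bool of bool
  | Nil
  | Cons of dyn * dyn
  | Proc of dyn -> dyn

exception TypeError

(* primitive operations check tags at run-time, as Scheme does *)
fun add (Num m, Num n) = Num (m + n)
  | add _ = raise TypeError

fun apply (Proc f) arg = f arg
  | apply _ _ = raise TypeError
```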
Chapter 36

Concurrency

In this chapter we consider some fundamental techniques for concurrent programming using CML.
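As a foretaste, the fragment below sketches CML's message-passing primitives (channel, send, recv, and spawn from the CML structure); the producer/consumer split is an invented example, and in SML/NJ such code is started through the RunCML structure.

```sml
(* Two threads communicating over a typed channel.  send and recv
   are synchronous: each blocks until the other party is ready. *)
fun example () =
    let
        val ch : int CML.chan = CML.channel ()
        fun producer () = CML.send (ch, 42)
        fun consumer () =
            print ("received " ^ Int.toString (CML.recv ch) ^ "\n")
    in
        ignore (CML.spawn producer);
        ignore (CML.spawn consumer)
    end
```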
Part V

Appendices
The Standard ML Basis Library

The Standard ML Basis Library is a collection of modules providing a basic collection of abstract types that are shared by all implementations of Standard ML. All of the primitive types of Standard ML are defined in structures in the Standard Basis. It also defines a variety of other commonly-used abstract types.

Most implementations of Standard ML include module libraries implementing a wide variety of services. These libraries are usually not portable across implementations, particularly not those that are concerned with the internals of the compiler or its interaction with the host computer system. Please refer to the documentation of your compiler for information on its libraries.
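For instance, the following bindings use only portable Standard Basis structures; Int, Real, and String are among the structures in which the primitive types are defined.

```sml
val n : int = Int.max (3, 7)            (* operations on int *)
val r : string = Real.toString 2.5      (* operations on real *)
val c : char = String.sub ("abc", 0)    (* operations on string *)
val _ = print (Int.toString n ^ "\n")   (* prints 7 *)
```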
Compilation Management

All program development environments provide tools to support building systems out of collections of separately-developed modules. These tools usually provide services such as:

1. Separate compilation and linking to support simultaneous development and to reduce build times.

2. Source code management such as version and revision control.

3. Release management for building and disseminating systems for general use.

4. Libraries of reusable modules with consistent conventions for identifying modules and their components.

Different languages, and different vendors, support these activities in different ways. Some rely on generic tools, such as the familiar Unix tools; others provide proprietary tools, commonly known as IDE's (integrated development environments). Most implementations of Standard ML rely on a combination of generic program development tools and tools specific to that implementation of the language. Rather than attempt to summarize all of the known implementations, we will instead consider the SML/NJ Compilation Manager (CM) as a representative program development framework for ML. Other compilers provide similar tools; please consult your compiler's documentation for details of how to use them.

36.1  Overview of CM

36.2  Building Systems with CM
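As a flavor of what CM provides, a project's modules are listed in a description file, conventionally named sources.cm. The file below is a hypothetical example with invented file names; consult the CM manual for the precise syntax accepted by your version of SML/NJ.

```
Group is
    $/basis.cm      (* the Standard Basis Library *)
    queue.sig
    queue.sml
    main.sml
```

Given such a file, CM determines the dependencies among the listed modules, recompiles only what is out of date, and links the results into a running system.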
Sample Programs

A number of example programs illustrating the concepts discussed in the preceding chapters are available on the world-wide web at the following URL:

    http://www.cs.cmu.edu/~rwh/smlbook/examples
Bibliography

[1] Emden R. Gansner and John H. Reppy, editors. The Standard ML Basis Library. Cambridge University Press, 2000.

[2] Robin Milner, Mads Tofte, Robert Harper, and David MacQueen. The Definition of Standard ML (Revised). MIT Press, 1997.