You are on page 1of 15

A Type System for Smalltalk

Justin O. Graver
University of Florida
Ralph E. Johnson
University of Illinois at Urbana-Champaign

Abstract [Suz81, BI82, SCB*86, Str86, Mey88]. In our type


system, types are based on classes (i.e. each class
This paper describes a type system for Smalltalk that de nes a type or a family of types) but subclass-
is type-safe, that allows most Smalltalk programs to ing has no relationship to subtyping. This is be-
be type-checked, and that can be used as the basis of cause Smalltalk classes inherit implementation and
an optimizing compiler. not speci cation.
Our type system uses discriminated union types
1 Introduction and signature types to describe inclusion polymor-
phism and describes functional polymorphism (some-
There has been a lot of interest recently in type times called parametric polymorphism, see [DT88])
systems for object-oriented programming languages using bounded universal quanti cation. It has the
[CW85, DT88]. Since Smalltalk was one of the ear- following features:
liest object-oriented languages, it is not surprising  automatic case analysis of union types,
that there have been several attempts to provide
a type system for it [Suz81, BI82]. Unfortunately,  e ective treatment of side-e ects by basing the
none of the attempts have been completely successful de nition of the subtype relation on equality of
[Joh86]. In particular, none of the proposed type sys- parameterized types, and
tems are both type-safe and capable of type-checking  type-safety, i.e. a variable's value always con-
most common Smalltalk programs. Smalltalk vio- forms to its type.
lates many of the assumptions on which most object-
oriented type systems are based, and a successful Because the type system uses class information, it can
type system for Smalltalk is necessarily di erent from be used for optimization. It has been implemented as
those for other languages. part of the TS (Typed Smalltalk) optimizing compiler
We have designed a new type system for [JGZ88].
Smalltalk. The biggest di erence between our
type system and others is that most type systems
for object-oriented programming languages equate 2 Background
classes with types and subclassing with subtyping Smalltalk [GR83] is a pure object-oriented program-
Authors' address, telephone, and e-mail: ming language in that everything, from integers to
Departmentof Computer and InformationSciences, E301 CSE, text windows to the execution state, is an object. In
Gainesville, FL 32611, (904) 392-1507, graver@cis.u .edu particular, classes are objects. Since everything is an
Department of Computer Science, 1304 W. Spring eld Ave., object, the only operation needed in Smalltalk is mes-
Urbana, IL 61801, (217) 244-0093, johnson@cs.uiuc.edu
sage sending. Smalltalk uses run-time type-checking.
Every message send is dynamically bound to an im-
plementation depending on the class of the receiver.
This uni ed view of the universe makes Smalltalk a
compact yet powerful language.
Smalltalk is extremely extensible. Most opera-
tions are built out of a small set of powerful primi-
tives. For example, control structures are all imple-
mented in terms of blocks, which are the Smalltalk
equivalent of closures. Smalltalk's equivalent to if-
then-else is the ifTrue:ifFalse: message, which is sent class Change:
to a boolean object with two block arguments. If the values
receiver is an instance of class True then the \true "Array with: self class with: self parameters
block" is evaluated. Similarly, if the receiver is an parameters
instance of class False then the \false block" is eval- self subclassResponsibility
uated. Looping is implemented in a similar way us-
ing recursion. These primitives can be used to im- class ClassRelatedChange:
plement case statements, generators, exceptions, and parameters
coroutines[Deu81]. "className
Smalltalk has been used mostly for prototyping
and exploratory development. It is ideal for these class MethodChange:
purposes because Smalltalk programs are easy to parameters
change and reuse, and Smalltalk comes with a pow- "Array with: className with: selector
erful programming environment and large library of
generally useful components. However, it has not class OtherChange:
been used as much for production programming. This parameters
is partly because it is hard to use for multi-person "self text
projects, partly because it is hard to deliver applica-
tion programs without the large programming envi-
ronment, and partly because it is not as ecient as Figure 1: Change and its subclasses.
languages like C. In spite of these problems, the de-
velopment and maintenance of Smalltalk programs is
so easy that more and more companies are developing
applications with it.
A type system can help solve Smalltalk's prob- scientious programmers remove these improprieties
lems. Type information makes programs easier to from nished programs, we wanted a type system that
understand and can be used to automatically check would allow them, since they are often important in-
interface consistency, which makes Smalltalk bet- termediate steps in the development process. Thus,
ter suited for multiperson projects. However, our a static type system for Smalltalk must be as exible
main motivation for type-checking Smalltalk is to as the traditional dynamic type-checking.
provide information needed by an optimizing com- In an untyped object-oriented language like
piler. Smalltalk methods (procedures) tend to be Smalltalk, classes inherit only the implementation
very small, making aggressive inline substitution nec- of their superclasses. In contrast, most type sys-
essary to achieve good performance. Type informa- tems for object-oriented programming languages re-
tion is required to bind methods at compile-time. Al- quire classes to inherit the speci cation of their super-
though some type information can be acquired by classes [BI82, Car84, CW85, SCB*86, Str86, Mey88],
data ow analysis of individual methods [CUL89], ex- not just the implementation [Suz81, Joh86]. A class
plicit type declarations produce better results and speci cation can be an explicit signature [BHJL86],
also make programs more reliable and easier to un- the implicit signature implied by the name of a class
derstand. From this point of view, the type system [SCB*86], or a signature combined with method pre-
has been quite successful, because the TS compiler and post-conditions, class invariants, etc. [Mey88].
can make Smalltalk programs run nearly as fast as C Inheriting speci cation means that the type of meth-
programs. ods in subclasses must be subtypes of their speci ca-
It is important that a type system for Smalltalk tions in superclasses.
not hurt Smalltalk's good qualities. In particular, Since Smalltalk is untyped, only implementation
it must not hinder the rapid-prototyping style often is inherited. This makes retro tting a type system
used by Smalltalk programmers. Exploratory pro- dicult. Since speci cation inheritance is a logical
grammers try to build programs as quickly as possi- organization, many parts of the Smalltalk inheritance
ble, making localized changes instead of global reor- hierarchy conform to speci cation inheritance, but
ganization. They often use explicit run-time checks there is nothing in the language that requires or en-
for particular classes (evidence that code is in the forces this. Thus, it is common to nd classes in
wrong class) and create new subclasses to reuse code the Smalltalk-80 class library that inherit implemen-
rather than to organize abstraction (evidence that a tation and ignore speci cation.
new abstract class is needed) [JF88]. Although con- Dictionary is a good example of a class that in-
herits implementation but not speci cation; it is a represents zero or more repetitions of the something,
subclass of Set. A Dictionary is a keyed lookup table and (something)+ represents one or more repetitions
that is implemented as a hash table of <key, value> of the something. Each of the type forms will be de-
pairs. Dictionary inherits the hash table implemen- scribed in detail in the following sections.
tation from Set, but applications using Sets would We use the abstract syntax in inference rules
behave quite di erently if given Dictionaries instead. and the concrete syntax in examples. Type expres-
Abstract classes, a commonly used Smalltalk pro- sions are denoted by s, t, and u. Type variables
gramming technique, provide other examples where are denoted by p (abstract syntax) or by P (con-
classes inherit implementation but not speci cation. crete syntax). Lists (tuples) of types are denoted by
Consider the method de nitions shown in Figure 1. ~t = < t1; t2; : : :; tn >. The empty list is denoted by
These (and other) classes are used to track changes <> and t ~t is the list < t; t1; t2; : : :; tn >. Similar


made to the system. The abstract class Change de- notation is used for lists of type variables. C denotes
nes the values method in terms of the parameters a class name and m denotes a message selector.
method. The implementation of the values method is Strictly speaking, a type variable is not a type; it
inherited by the subclasses of Change. The result of is a place holder (or representative) for a type. We
sending the values message depends on the result of assume that all free type variables are (implicitly)
sending the parameters message, which is di erent for universally quanti ed at an appropriate level. For
each class in Figure 1. Hence, the speci cation of the example, the local type variables of a method type
values method is also di erent for each of the classes. are assured to be universally quanti ed over the type
A class-based type system with explicit union of the method; the class type parameters of a class
types works for an implementation inheritance hier- are assumed to be universally quanti ed over all type
archy and greatly simpli es optimization, but is not de nitions in the class. We also assume that all type
very exible when it comes to adding new classes to variables are unique (i.e. have unique names) and are
the system. To regain some of the exibility we in- implicitly renamed whenever a type containing type
corporate type-inference and signatures. This gives variables is conceptually instantiated. For example,
us both exibility and the ability to optimize. the local type variables of a method type are renamed
A type system suitable for an optimizing each time the method type is \used" at a call site. We
Smalltalk compiler must also have parameterized do use explicit range declarations for type variables
types. It is dicult to imagine a (non-trivial) to achieve bounded universal quanti cation (see Sec-
Smalltalk application that does not use Collections . tions 3.5 and 3.6). Given these assumptions, we treat
The compiler must be able to associate the type of type variables as types.
objects being added to a Collection with the type of
objects being removed from it. Without parameter- 3.1 Subtyping
ized types, all Collections are reduced to homogeneous
groups of generic objects. Furthermore, due to the The intuitive meaning of the subtype relation is v

potentially imperative side-e ects of any operation, a that if s t then everything of type s is also of type
v

special imperative de nition of subtyping for param- t. The subtyping relation for our type system can
eterized types is used (see Section 3.1). be formalized as a set of type inference rules (as in
The type system must also be able to describe [CW85]). is a set of inclusion constraints for type
C

the functional and inclusion polymorphism exhibited variables. The notation :p t means to extend the
C v

by many Smalltalk methods. Functional polymor- set with the constraint that the type variable p is a
C

phism can be described, in the usual manner, with subtype of the type t. The notation s t meansC ` v

(implicit) bounded universal quanti cation of type that from we can infer s t. A horizontal bar is
C v

variables. Inclusion polymorphism can be described logical implication: if we can infer what is above it
with explicit union types or signatures. then we can infer what is below it.
For a detailed discussion of these issues see The following are basic inference rules.
[Gra89].
C ` t Anything
v

3 Types C ` Nothing t v

The abstract and concrete syntax for type expressions C :p t p t


v ` v

is shown in Figure 2 (abstract syntax on the left and C :t p t p


v ` v

concrete syntax on the right). In the concrete syn- p p


tax grammar terminals are underlined, (something)* C ` v
t ::= Type ::=
(object type) C(~t ) ClassName (of: Type )
(block type) j~t t ! j Block (of: Type ) returns: Type

(union type) j t+t j Type + Type

j (t) j (Type )

(signature type) j < m; ~t ! t > j understands: ((MsgName

(of: Type )+ returns: Type ) )


j (t) j (Type )

(type variable) j p j P
( )
> j Anything j Anything
( )
? j Nothing j Nothing

Figure 2: The abstract and concrete syntax of types.

The following rules are used for the type parame- type list that may accompany a class name is xed by
ter lists of object types (see Section 3.2) and for the the corresponding class de nition. For example, the
argument type lists for block types (see Section 3.4). classes SmallInteger and Character have zero class type
parameters, so SmallInteger and Character are valid
<> <>
C ` v object types. The classes Array and Set each have one
~t ~t 0 class type parameter, so Array of: SmallInteger and
s s0
C ` v C ` v
Set of: (Array of: Character) are valid object types.
s ~t s0 ~t 0
C `  v  Due to the imperative nature of Smalltalk [GR83,
Subtyping is re exive, transitive, and antisym- Joh86] and because of the antimonotonic ordering of
metric. function types [DT88] we de ne subtyping for ob-
t t C ` v ject types as equality, i.e. one object type is included
s t t u in another if and only if they are equal (Cardelli
also takes this approach for \updatable" objects in
C ` v C ` v

s u
C ` v
[Car85]). (Recall that is a set of inclusion con-
C
s t
C ` v t s C ` v straints and C is a class name.)
s=t C `
C `~s = ~t
3.2 Object Types C `C(~s ) C(~t )
v

Some variables contain only one kind of object. For


example, all Characters1 have an instance variable 3.3 Union Types
that contains a SmallInteger ASCII value. Other vari- The type system de ned so far cannot describe an
ables can contain di erent kinds of objects. For ex- Array containing both Characters and SmallIntegers .
ample, a Set can contain SmallIntegers , Characters, In Smalltalk, a variable can contain many di erent
Arrays, Sets, and so on. Each class has a set of zero kinds of objects over its lifetime. Thus, a type can
or more type variables called class type parameters be a nonempty \set" of object types (or other types).
that can be used in type expressions to specify the An example of a union type is
types of instance variables. These are the only type
variables that can appear in type declarations for in- (Array of: SmallInteger) + (Array of: Character)
stance variables. + (Array of: (SmallInteger + Character)).
An object type is a class name together with a pos- Here, the plus operator is read as \or," so a variable of
sibly empty list of types. The type list of an object the above type could contain an Array whose elements
type provides values for the class type parameters de- are all SmallIntegers , all Characters, or a mixture of
ned for the corresponding class. Thus, the size of the the two.
1 The phrase \a SomeClass" refers to an instance of class Our type system does not use the subclass relation
SomeClass.
itself.
The phrase \class SomeClass" refers to the class on classes to induce the subtype relation on types. In-
stead, type inclusion is based on discriminated union
types. An object type is included in a union type
t u

only if is included in one of 's elements.


t u SequenceableCollection subclass: #OrderedCollection
C` v instanceVariables: ' rstIndex <SmallInteger >
lastIndex <SmallInteger >'
C` v +
s t

s t u
classVariables: "
A union type is included in a union type
u u
0
only if typeParameters: 'ElementType'
each element of is included in .
u u
0 poolDictionaries: "
C` v s C` v category: 'Sequenceable-Collections'
C`
u

v
t u

+
Figure 3: Class de nition for OrderedCollection .
s t u

Some examples are:


Character v Character + SmallInteger
Array of: SmallInteger v
(Array of: SmallInteger) + (Array of: Character) Block returns: Character
Array of: SmallInteger 6v v Block returns: (Character + SmallInteger)
Array of: (SmallInteger + Character)
Note that our type system does not re ect class inher- Block of: (Float + SmallInteger) returns: SmallInteger
itance. Even though Integer is a subclass of Number , v Block of: Float returns: (Character + SmallInteger).
Integer is not a subtype of Number. However, we can
specify the type of all subclasses of Number by listing 3.5 Typed Class De nitions
them, i.e. Integer + Float + Fraction.2 All instance variables and class variables must have
their types declared. Each class de nes a set of type
3.4 Block Types variables called class type parameters, which may be
Blocks (function abstractions) are treated di erently used to specify the types of instance and class vari-
from other objects. A block type consists of the name ables.
Block, a possibly empty list of types for block argu- Figure 3 shows the de nition of OrderedCollection ,
ments, and a return type. For example, which is a subclass of SequenceableCollection . Note
that the type of each instance variable is declared
Block of: (Array of: Character) of: SmallInteger when the variable is declared and that angle brackets
returns: Character are used to set o type expressions from the regular
represents the type of a block with two arguments, Smalltalk code.
and Block returns: SmallInteger represents a block A class de nition de nes an \object type" in
with no arguments. Block types di er from object which the type list is the list of class type parameters
types in several ways. Unlike object types, there is (appearing in the same order as in the class de ni-
no class Block. Types beginning with the name Block tion). The type de ned by the above class de nition
can have di erent sized argument type lists. would be
A block type is included in a block type only
u u
0
OrderedCollection of: ElementType:
if the return type of is included in the return type
u

of and if each argument type of is included in


u
0
u
0
The scope of the class type parameter
the corresponding argument type of (this is the u ElementType is limited to the method and variable
standard antimonotonic subtype relation for function de nitions in class OrderedCollection (the same scope
types [MS82]). as a class variable). Class type parameters are inher-
C` v C` v0 0 ited by subclasses, just like instance and class vari-
C` ! v !
~
s ~
s t t
ables. Any use of the OrderedCollection type outside
of this scope must \instantiate" its class type param-
0 0
~
s t ~
s t

For example, eter to a type constant (like Character ) or to a locally


2 Common union types such as Integer + Float + Fraction known type variable. Thus, the above type actually
and True + False are presently abbreviated using the global de nes a family of object types.
variables NumberType and BooleanType , respectively. Another The range (upper bound for type instantiation)
approach is to use the object type of an abstract class (classes of a class type parameter may be restricted by
such as Number, Collection, and Boolean that provide an im- including an optional range type declaration. A
plementation template for subclasses but have no instances of
their own) to automatically denote the union of the types of class type parameter may only be instantiated to
all its non-abstract subclasses. a type that is a subtype of its range type. The
de nition. Notice that certain elds of the type decla-
f localTypeVariables: <P1 > <P2 > rations have been omitted. The syntax rules for type
receiverType: <Self > declarations are fairly liberal. Fields can occur in any
arguments: arg1 <arg1Type > arg2 <arg2Type > order. All elds except receiverType: and returnType:
temporaries: temp <tempType > can have multiple occurrences. The smallest type
blockArguments: blkArg <blkArgType > declaration for a method is \fg."
returnType: <Self > g Self represents the type of the pseudo-variable self
(and super). Unfortunately, the class of self is di er-
message selector and argument names ent in each class that inherits a method, so Self must
\comment stating purpose of message" di er, too. Thus, each method is type checked for
each class that inherits it by replacing Self with the
j temporary variable names j type appropriate for the inheriting class. In the ab-
statements sence of a receiverType: declaration (which is over 99%
of the time), Self defaults to the object type for the
Figure 4: Template for typed methods. class in which the method is being compiled. Oth-
erwise, Self is replaced with the type given in the
receiverType: declaration. A legal receiverType: must
be a subtype of the default value of Self.
Returning to the example of Figure 5,
range declaration of a class type parameter is syn- SequenceableCollection has one class type parameter
tactically identical to the type declaration for a called ElementType, so the type substituted for Self
variable. In the above example, if we wished to will be SequenceableCollection of: ElementType. The
restrict the elements of OrderedCollections to be argument aBlock is a block that takes one argument of
Characters, the typeParameters: eld would be de- type ElementType and returns an object of unknown
clared as 'ElementType <Character >'. If the range type P. When do: is envoked in another method
declaration of a class type parameter is omitted it is de nition both ElementType and P will be associ-
assumed to be Anything. ated with actual receiver and argument types. The
temporary variables index and length have values de-
3.6 Typed Method De nitions pendent on the size of Collections . In Smalltalk im-
plementations with 30+ bit SmallIntegers , the possi-
The types of method arguments, temporary variables, ble size of any Collection is well within the range of
and block arguments can be explicitly declared or in- SmallIntegers .3 There is no explicit return statement
ferred by the TS type-inference mechanism [Gra89]. in the method so it implicitly returns self.
The only di erence between a typed method and an Type declarations and typed methods introduce
untyped method is that the typed method has type the notions of message and method types. A message
declarations. Type declarations must precede any type consists of a message selector, a receiver type, a
part of the method de nition. A typed method tem- list of argument types, and a return type. A method
plate is shown in Figure 4. The arguments for the type consists of a message type and a set of constraints
elds arguments:, temporaries: , and blockArguments: on the type variables used in the message type. A
are lists of identi er <type > declaration pairs. The method type denotes the type of a method relative
arguments for the receiverType: and returnType: elds to a speci c class of receivers. Therefore, Self should
are single type declarations. The arguments for already have been expanded and should never appear
the localTypeVariables: eld are capitalized identi ers. in a method type. The type of the above method is
The range of a type variable can be restricted by using denoted by
the range: modi er. For example, the local type vari-
able declaration <P range: (Integer + Character)> <SequenceableCollection of: ElementType >
declares P to be a type variable that can be asso- do: <Block of: ElementType returns: T >
ciated with any of the types Integer, Character, or "<SequenceableCollection of: ElementType >.
Integer + Character. Local type variables are instan-
tiated during type-checking to types determined by Message types use the same notation. Local range
the types of actual arguments at a call site. Local constraints on type variables are shown enclosed in
type variables will usually, though not necessarily, 3 Smalltalk de nes in nite precision integer arithmetic
correspond to the class type parameters of some class. using the classes LargeNegativeInteger, SmallInteger, and
The do: method for SequenceableCollection is LargePositiveInteger (the three subclasses of class Integer).
shown in Figure 5 as an example of a typed method SmallIntegers are essentially machine integers; instances of the
Large-Integer classes are more complex.
f localTypeVariables: P < >

arguments: aBlock <Block of: ElementType returns: P >


temporaries: index <SmallInteger > length <SmallInteger >
returnType: <Self > g
do: aBlock
\Evaluate aBlock for each of the receiver's elements."
j index length j
index 0.
length self size.
[(index index + 1) <= length]
whileTrue: [aBlock value: (self at: index)]

Figure 5: The do: method for SequenceableCollection .

braces following the return type of a method type.


For example, f receiverType: Self< >

arguments: anInteger <SmallInteger >


<Object foo: P " P
> < > < > returnType: <InstanceType > g
f P v Integer + Character g.
< > < >

new: anInteger
Subtyping for message types, although compli- \Answer a new instance of size anInteger."
cated by type variables (see [Gra89]), is essentially
the same as for block types, with the added condi- "(super new: (anInteger max: 1)) setTally
tion that the selectors must be equal.
Figure 6: The new: method in Set class.
3.7 Metaclass Method De nitions
In regular class method de nitions the type Self
refers to the type of the receiver. It is useful to extend
this notion to be able to refer to the type of the class be inferred for an expression the expression is type-
of the receiver. This is done by using the Self class correct. The following rule describes the essence of
type speci cation. subtyping:
Similarly, when de ning methods in a metaclass it
is useful to be able to refer to the type of instances of C A` :
; e C` v s s t

the metaclass. This is done by using the InstanceType C A` : ; e t


type speci cation. For example, Figure 6 shows the
typed method de nition of new: for the Set metaclass. 4.1 Basic Type-Checking
A method is type-correct if each of its statements is
type-correct and if the type of the expression in each
4 Type-Checking return statement is a subtype of the declared return
type of the method. Type-checking an expression re-
Type-checking is speci ed using inference rules simi- quires knowing the types of its subexpressions.
lar to those used to describe subtyping. Besides a set The types of all variables are declared.
of type constraints C , we need a set of assumptions
A about the types of variables. The notation A : :v t CA : ` :
; :v t v t

means to extend the set A with the assumption that


variable has type . The notation C A ` : means
v t ; e t The types of the pseudo-variables true , false, and nil
that from the constraints C and assumptions A we can are constants.
infer that an expression has type . If a type can
e t ` true : True
` false : False Type-checking a message send involves looking up one
` nil : Unde nedObject or more method types. The message send is type-
correct if, for each method type, there exists a map-
The type of a literal (constant) is inferred from its ping of type variables to types such that, under this
class. mapping, the type of each actual argument is a sub-
` n : Integer type of the corresponding formal argument. The type
` $c : Character of a message send is the union of the return types of
each of these method types, evaluated in their respec-
etc. tive type assignment environments.
If this type is not general enough then an explicit type Let H be a hierarchy of typed class de nitions and
declaration can be used to supply the \correct" type H < C; m > be the de nition in class C for method
[Gra89]. m, i.e.
An assignment statement v e is type-correct if
both v and e are type-correct (variables are trivially H < C; m >= ~p v ~u C(~s ) m y~ : ~t "u ~z : ~t e
0

type-correct) and if the type of e is a subtype of the where ~p v ~u are local type variables with range
type of v. The type of an assignment statement v e declarations, C(~s ) is the type of the receiver, ~y : ~t
is the type of e. are typed arguments, u is the return type, ~z : ~t are 0

C; A ` v : s C; A ` e : t C ` t v s the typed temporaries, and e is the method body.


C ; A ` (v e) : t The notation
A statement sequence e1 : : :en is type-correct if C(~s )  ~t ! u j ~p v ~u
each statement in the sequence is type-correct. The
type of a statement sequence is the type of the last denotes a method type where C(~s ) is the receiver
statement in the sequence. type, ~t is the list of argument types, u is the return
C ; A ` e 1 : t 1 : : : C ; A ` e n : tn type, and p~ v ~u are range constraints. For notational
convenience, we use the following inference rule to
C ; A ` (e1 : : :en ) : tn extract method types from method de nitions.
The type of a block is inferred from the declared H < C; m >= ~p v ~u C(~s ) m ~y : ~t "u ~z : ~t e
types of its arguments and the inferred type of its
0

statement list. `< C; m >d : C(~s )  ~t ! u j ~p v ~u


C ; A:~y : ~t ` e : u The notation `< C; m >d : t means it can be inferred
C ; A ` [~y j e] : ~t ! u that t is the declared method type of the method
invoked when sending the message m to an instance
An additional inference rule for blocks is given in Sec- of class C.
tion 4.2. In the following inference rule for message sends,
A return statement "e is type-correct if e is type- e : C(~s ) signi es that the type of the receiver is an
correct and if the type of e is a subtype of the de- object type, ~e : ~t denotes the typed arguments, and
clared return type of the method. The set of assump- e m ~e denotes a message send.
tions A contains a special variable returnValue whose
type is the declared return type of the method being C ; A ` e : C(~s ) C ; A ` ~e : ~t
type-checked. The only place a return statement can
appear is as the last statement in the statement list C ; A `< C; m >d : C(~s )  ~t ! u j ~p v ~u
0 0 0

of a method or a block. The types of the top-level C :(~p v ~u):(C(~s ) v C(~s )):(~t v ~t ) ` u v u
0 0 0

statements in the statement list of a method can be C ; A ` (e m ~e ) : u


ignored. The type of a block will always be checked
for inclusion within some block type. To a ord the Message sends to receivers with union types are han-
maximum freedom to such blocks, the type given to dled by an inference rule in Section 4.2.
return statements is the special type Nothing, which Consider, for example, the message send
is included in any type. anArray at: 2, where the type of the variable anArray
is Array of: Character. Let the method type associ-
C ; A ` returnValue : t C ; A ` e : t ated with the method that would be invoked if the
C ; A ` ("e) : Nothing at: message were sent to an Array be
Due to the simplicity of Smalltalk, the only re- <Array of: P > at: <SmallInteger > "<P >.
maining kind of expression is the message send.
The inference rule for message sends produces a set Case analysis is useful when used with the fol-
of inclusion equations lowing rule: a block whose body is type-incorrect has
type IllegalBlock. (i.e. if no better type can be in-
P v Anything (range of P) ferred for a block then it can always be inferred to be
Array of: Character v Array of: P (receiver type) IllegalBlock )
SmallInteger v SmallInteger (argument type)
whose solution is C ; A ` [~y j e] : IllegalBlock
P v Character: Thus, a type-incorrect block will not necessarily cause
the containing method to be type-incorrect. This
In general, there may not be a unique solution to a rule, combined with case analysis and the de nitions
given set of inclusion equations. How to deal with of the normal Smalltalk control structures, provides
(avoid) multiple solutions is discussed in [Gra89]. If automatic type discrimination, as shown in the next
no solution exists then there is a type error some- example.
where. The typed method
An inference rule is also needed to type-check de nition of controlToNextLevel for class Controller is
method de nitions. Recall that the abstract de ni- shown in Figure 7. The notNil message is de ned
tion for a method m looks like to return an object of type True for all classes ex-
cept Unde nedObject , for which it is de ned to re-
p~ v ~u C(~s ) m ~y : ~t "u ~z : ~t e:
0
turn an object of type False. If the type of aView is
assumed to be View then the type of the expression
If, by adding type declarations appropriately to C and aView notNil is True. The ifTrue: method de ned for
A, it can be inferred that e has type u then it can be class True has a type
inferred that the de nition of m is type-correct and
is of type <True > ifTrue: <Block returns: P > "<P >
C(~s )  ~t ! u j ~p v ~u:
This is expressed in the following inference rule where where P is a local type variable for the method. Since
self : C(~s ) is the implicit type of the pseudo-variable the type of aView is View, the controller and startUp
self and returnValue is a special variable used for type- messages are type-correct and P can be mapped to
checking return statements. the return type for the actual block argument.
On the other hand, if the type of aView is assumed
H < C; m >= p~ v ~u C(~s ) m ~y : ~t "u ~z : ~t e 0 to be Unde nedObject then the type of the expression
aView notNil is False. Class False de nes the ifTrue:
C :~p v ~u; A:self : C(~s ): ~y : ~t: returnValue : u: ~z : ~t ` e : u
0
method to have type
C ; A `< C; m >: C(~s )  ~t ! u j ~p v ~u
<False > ifTrue: <P > "<Unde nedObject >
Note that < C; m >: t denotes the true inferred type
of a method m for class C, which must be included so objects of type False accept ifTrue: messages with
in the declared method type < C; m >d : t . 0
an argument of any type. The method has this type
because it ignores its arguments and simply returns
4.2 Case Analysis nil . Since the type of aView is Unde nedObject, the
controller message is unde ned; the body of the block
Type-checking Smalltalk programs frequently re- is illegally typed and the block's type is IllegalBlock.
quires case analysis of a union type. Even though However, P can be mapped to IllegalBlock, so the
the type of a variable may be a union type when its block can legally be an argument of the ifTrue:
use throughout an entire method is considered, its message for class False. Thus, the entire method
type in any particular use can be considered an ob- ControlToNextLevel is type-correct.
ject type (or one of several object types). This re ects
the fact that a variable may reference only one object
at a time. It is therefore more precise, when type- 4.3 Inheritance
checking an expression containing a variable with a One way that inheritance complicates type-checking
union type, to type-check the expression separately is that the type of a method for a subclass that in-
for each type in the union type. The type of the ex- herits it is slightly di erent from its type in the class
pression is then the union of the separate result types. that de nes it. For example, the type of the receiver
C ; A:v : s ` e : u C ; A:v : t ` e : u is di erent, and the type of the returned value will be
C ; A:v : s + t ` e : u di erent when the receiver is returned. This problem
f temporaries: aView <View + Unde nedObject >
returnType: <Self > g
controlToNextLevel
\Pass control to the next control level, that is, to the Controller of a subView
of the receiver's view if possible. The receiver nds the subView (if any)
whose controller wants control and sends that controller the startUp message."
j aView j
aView view subViewWantingControl.
aView notNil ifTrue: [aView controller startUp]

Figure 7: The controlToNextLevel method for class Controller .

is solved by referring to the type of the receiver as methods in each subclass that inherits them. The
Self and expanding Self to the type of the receiver method de nition (de ned in terms of Self ) is inher-
in each subclass. Self is not a type, but instead is ited and Self is expanded to refer to the current class
\macro-expanded" to a type at compile-time. as described in section 3.6. We assume that the hi-
Abstract classes provide another way in which the erarchy of typed class de nitions H (see Section 4.1)
type of a method can change when it is inherited. contains not only the user declared method de ni-
Consider the method de nitions shown in Figure 1. tions for a class, but also any methods that can be
The values method is de ned in class Change and in- meaningfully (i.e. type-correctly) inherited by a class.
herited by the other classes. The return type of values If H0 is a partial function from < C; m > pairs to
depends on the return type of parameters, which is method de nitions representing user declared meth-
di erent for each class. The types of the di erent ods then H is derived by extending H0 to include
parameters methods are de nition points for inherited methods. In other
words, if H0 < C; m > is a user de ned method
<Change > parameters "<Change > then H < C ; m > has the same de nition provided
0

<ClassRelatedChange > parameters "<Symbol > that C is a subclass of C and H0 < C ; m > is
0 0

<MethodChange > parameters "<Array of: Symbol > not de ned. There are many possible H function. A
<OtherChange > parameters "<String > type-correct H is one in which, for all method types
where the types in \receiver position" are receiver derivable from H, the declared type of H < C; m >
types and the types following the \"" are return is the same as the type inferred by type-checking.
types. Thus, the values method must be recompiled (`< C; m >d : t) =) (C ; A `< C; m >: t)
in the context of each subclass of Change to compute ` C; A
its correct return type for that context.
Another way in which abstract classes show that Strictly speaking, if every method de nition in H
Smalltalk classes inherit implementation, not speci - must type-check then certain methods in abstract
cation, is that some methods cannot be executed by classes must be removed in order to have a type-
all of the classes that inherit them. For example, class correct H (e.g. the addAll: method in class Collection ).
Collection has a number of methods that use the add: In practice, inherited methods are type-checked
message, such as addAll:, but it does not implement or upon demand, i.e. when code is rst compiled that
inherit the add: message itself. Instead, add: is imple- might cause that method to be invoked. Array can
mented by the various subclasses of Collection . Some then be a legitimate subclass of Collection ; no inher-
of its subclasses, such as Array, do not implement add: ited code that invokes add: will be type-correct.
and thus cannot use methods like addAll: . Smalltalk The bene ts of delaying type-checking of inherited
relies on run-time type-checking and the \does not methods is that a method is only retype-checked for
understand" error to detect when an unde ned mes- every subclass that actually inherits it, and that this
sage is sent to an object. Our type system detects all type-checking is spread out over a large amount of
such cases at compile-time. time. A new class does not require any of the methods
These problems are solved by retype-checking that it inherits to be type-checked when it is created.
The message types in a signature type may have oc-
f localTypeVariables: <P > currences of Self to denote the type represented by
receiverType: <Block returns: (True + False)> the signature. Receiver types of Self may be omitted.
arguments: aBlock <Block returns: P > When an object type is being checked for inclusion
returnType: <Unde nedObject > g in a signature type Self is instantiated to the object
type.
whileTrue: aBlock A signature type includes object types belonging
"self value to classes not yet created. Thus, signature types con-
ifTrue: tain an in nite number of object types. In fact, the
[aBlock value. type speci ed by an empty signature contains every
self whileTrue: aBlock] possible object type, since every object type under-
stands every message type in the signature.
Figure 8: whileTrue: for class BlockContext . C ; A ` t v <>
Since signature types specify the largest possible
types, procedures that use them exhibit the most
Adding a new method to a superclass does not require polymorphism and so are the most exible. However,
type-checking it for every subclass that inherits it, nor use of signature types prevents the compiler from per-
does adding a new variable. However, changing the forming some important kinds of optimizations, since
type of a method or variable might require a lot of they do not provide enough information about the
computation to ensure that each of its uses is still class of the receiver to allow compile-time binding of
type-correct. message sends.
A signature type is speci ed by a special kind of
4.4 Speci c Receivers type declaration.
Some classes have methods that can be executed <understands: #(tm1 tm2 : : :)>
by only a subset of their instances. For example,
the whileTrue: message should be sent only to blocks Here, the tm 's are strings containing message type
that return Booleans. We can specify this by using speci cations. An example will be given shortly. Sig-
the receiverType: eld in the type declaration of the nature types may also be used to restrict the range of
method de nition as shown in Figure 8. Recall that type variables. A local type variable with a signature
if this eld is omitted then Self defaults to the object range would have a declaration of the form
type for the class in which the method is being com- <P understands: #(tm1 tm2 : : :)>.
piled. Although whileTrue: is nearly the only method
in the Smalltalk-80 class library that needs to declare Since such declarations may be recursive, complex
a speci c receiver, it is easy to imagine other meth- mutually recursive signature types can be built.
ods that would need this feature, such as a summation As an example, consider the occurrencesOf:
method in Collection . method for class Collection shown in Figure 9. The
only restriction placed on the type of the argument
4.5 Signature Types anObject is that its corresponding class (or classes)
must implement (or inherit) an = (equality) message
A signature type is a type (i.e. a set of object and that will take an argument of type ElementType and
block types) speci ed by a set of message types. An return a boolean. It is helpful to compare the use of
object type (or block type) t is included in a signature the do: message in this example with its de nition in
type s if, for each message type m 2 s, a message of Figure 5.
type m sent to an object of type t is type-correct. In Type-checking a message send to a receiver whose
other words, an object type is in a signature type if type is a signature type is straightforward since all
it \understands each message in the signature." relevant type information is contained in the signa-
C ; A ` e : t C ; A ` ~e : ~t C ; A ` (e m ~e ) : u ture type.
C ; A ` t v << m; t  ~t ! u >> C ; A ` e :< : : : < m; ~t ! u > : : : >
0 0

C ; A ` t v << m; t  ~t ! u >> C ; A ` ~e : ~t C :~t v ~t ` u v u


0 0

C ; A ` t v << m ; t  ~t ! u >>
0 0 0 0
C ; A ` (e m ~e ) : u
C ; A ` t v << m; t  ~t ! u >; < m ; t  ~t ! u >>
0 0 0 0
f receiverType: <Self >
arguments: anObject <understands: #('= <ElementType > "<False + True >')>
temporaries: tally <SmallInteger >
blockArguments: each <ElementType >
returnType: <SmallInteger > g
occurrencesOf: anObject
\Answer how many of the receiver's elements are equal to anObject."
j tally j
tally 0.
self do: [:each j anObject = each ifTrue: [tally tally + 1]].
"tally
Figure 9: The occurrencesOf: method in class Collection .

5 Beyond Signatures
class Object:
The type system described so far, with explicit union
types, case analysis, and signature types, is quite isLookupKey
powerful. However, there are still some methods that "false
elude type-checking.
Although Smalltalk is a typeless language, it is
possible to discriminate between classes of objects class LookupKey:
and provide a simple kind of explicit run-time type-
checking. An example is the isLookupKey message isLookupKey
whose implementation is shown in Figure 10. Mes-
sages like isLookupKey can be used in a method to
"true
\type-check" an argument before sending it a mes- Figure 10: Implementation of isLookupKey .
sage that it might not understand.
A example of this style of programming is the =
(equality) message shown in Figure 11. Equality must
be de ned between any two objects and should not
be subject to run-time errors. This style of imple- When a send of = to a LookupKey needs to be
mentation meets both of these criteria. The problem type-checked, the code in Figure 11 is substituted for
is how to type-check such a method. the call. The types of the actual arguments then re-
The type of this method can be roughly stated place those of the formal arguments, allowing both
as follows. An argument must understand the de nition and use information to be used in type-
isLookupKey message. If an argument responds to checking. This usually permits the = method to be
this message with true then it must also understand type-checked completely. If not, then type-checking
the key message and the method will (presumedly) for the method that sends = must also be deferred.
return an object of type True + False, otherwise the This static analysis technique has proven useful for
method returns an object of type False. type-inference as well as for type-checking [Gra89].
In general, the type of a polymorphic method like
= is complicated, but its type for a speci c use is
simple and easy to understand. Thus, type-checking a 6 Type Safety
method defers some of the type-checking for a method A type system can only be shown correct relative to
until its invocations are type-checked. It doesn't mat- a formal de nition of the language. This section out-
ter when a method is type-checked, only that type- lines a proof of correctness for the type system rela-
checking is completed before any code that invokes tive to Kamin's denotational semantics of Smalltalk
the method is executed. [Kam88].
= aLookupKey
De nition 1 (Object/type consistency) An ob-
aLookupKey isLookupKey ject o is consistent with a mapping from objects to
"
ifTrue: [ self key = aLookupKey key] types  , relative to a class hierarchy H, and a state ,
"
ifFalse: [ false] if
1.  (o) = C(~t),
Figure 11: = (equality) in class LookupKey .
2. the class of o is C, and
3. for each instance variable xi of o, let ti be the
declared type of xi with type variables replaced
Usually a type is described as a set of objects. by the types given in  (o). Then the type of the
Type safety then means that the value of an expres- object referred to by xi is a subtype of ti .
sion is always contained in its type. However, our
types are not sets of objects, so a di erent de nition
of type safety is needed. We will de ne type safety
by assigning an object type to each object and then
showing that the value of an expression always is as- Kamin's semantics. The rst is when a variable
signed a type contained in the expression's type. changes its value. This happens in an assignment
The type of an object depends not only on its statement and in assigning arguments to formal pa-
current state but also on its past and future states. rameters when evaluating a method or block.
Thus, it may not be possible to decide whether an ob- The second way that the state changes is when a
ject is in a particular type, though it is often possible new object is created. The type of some new objects,
to show that it is not. For example, a is never in
Set such as new blocks, are known in advance so the new
Array of: Character, and a containing
Set Characters object will always be of the correct type. Unfortu-
is not in Set of: SmallInteger. However, it is not easy nately, problems occur with the new primitive and
to tell whether a Set containing only is
Characters when creating a new context to evaluate a method.
in Set of: (SmallInteger + Character)|it all depends This is because instance variables of new objects and
on whether or not the is referenced by a variable
Set local variables of new contexts are both initialized to
that requires it to contain only .
Characters nil, but the types of these variables usually do not
If each object had a type assigned to it then we include Unde nedObject. Thus, until the variables
could check whether an object was consistent with its are initialized, their value is not consistent with their
type by checking whether its class was consistent with type. We ensure type-safety by requiring all variables
its type and whether the types assigned to the values whose types do not include Unde nedObject to be as-
of its instance variables were included in the declared signed before they are read, and use ow analysis to
types of its instance variables. This assignment would check this. Thus, we will change the statement of the
be consistent if every object was consistent with the proposition slightly.
type assigned to it. Although this type-assignment Type-Safety Theorem: If a type-correct program
might not be unique, any consistent type-assignment is in a state that is consistent with a type assignment
could be thought of as describing the types of all ob-  except for unassigned variables, and if a variable is
jects. These ideas are formalized in De nition 1. always assigned before it is read, then any succeeding
A state is consistent with a type-assignment  if state will be consistent with  .
every object in the state is consistent with  .
Proposition: If a type-correct program is in a state Proof: In a type-correct program, the type of the
that is consistent with a type assignment  then any expression of an assignment statement must be in-
succeeding state will be consistent with  . cluded in the type of the variable, so, if each expres-
The only way that a consistent type-assignment can sion returns a value that is consistent with its type
become inconsistent is if the state changes. This is then assignment statements maintain the consistency
because the type-assignment itself is constant and, of the type-assignment. Once a variable is assigned,
in Kamin's semantics, objects never change and the its value will have a type that is included in the type
class hierarchy does not change. Thus, we can prove of the variable. Thus, if the value of any expression
the proposition by showing that  is maintained as an has a type that is included in the type of the expres-
invariant every place where the state can be changed. sion then whenever a variable v is read, the type of
There are two ways that the state changes in the value of v will be consistent with the type of v.
The theorem is proved by structural induction on Type-safety also depends on all the primitives be-
the number of message sends in the evaluation of an ing given correct types. Since primitives are written
expression. The base case is where there are no mes- in a language other than Smalltalk, it is impossible
sage sends. Since there are no message sends there to reason about them within the framework presented
can only be assignment statements, whose right-hand here. A few primitives are inherently unsafe. These
sides are literals or variables. The variables either are are primarily used by the debugger. Most of the prim-
type-consistent or will be assigned before they are itives have simple types, however.
read, so the assignment statements will all maintain
the consistency of the type-assignment.
The induction step is to assume that any expres- 7 Conclusion
sion that can be evaluated in ? 1 message sends
n
Our type system for Smalltalk is type-safe. It has
(or less) is type-safe and to prove that evaluating an been implemented in Smalltalk and used in the TS
expression optimizing compiler for Smalltalk [JGZ88]. It has
rcv k1: exp1 k2: exp2 : : : km: expm been able to solve most type-checking problems in
the standard Smalltalk-80 class hierarchy. Thus, it is
that requires message sends is type-safe. Since an
n correct, useful, and usable.
expression must be type-checked before it can be eval- Our type system is also unique. It di ers from
uated, we know that the types of the argument ex- other type systems for object-oriented programming
pressions to the k1:k2: km: message are included in
::: languages by acknowledging that only implementa-
the types of the formal parameters of the method that tion is inherited, not speci cation, and by handling
is invoked. Evaluating any expi will take less than n case analysis of union types automatically.
message sends, so each argument is consistent with its Our type system is more complicated than other
corresponding type. When the new context is created type systems for object-oriented programming lan-
and the method is executed, the formal parameters guages. Whether this complication is justi ed de-
of the method will be bound to objects whose type is pends partly on whether a new language is being de-
included in the type of the parameter. Methods also ned or whether a type system is being de ned for
create temporary variables, but they are unassigned, Smalltalk. Current Smalltalk programming practice
so invoking a method maintains the consistency of the requires a type system like ours.
type-assignment (except for unassigned variables).
We assumed that the expression involves mes- n

sage sends. One of them is the k1:k2: km: message,


:::
Acknowlegements
so evaluating the body of the method will involve less This research was supported by NSF contract CCR-
than message sends. Thus, every expression eval-
n
8715752, by the AT&T ISEP grant, and by an equip-
uation in it will result in an object whose type is ment grant from Tektronix.
included in the type of the expression that produced
it. In particular, the type of the object returned from
the method will be included in the type of the ex-
pression in the return statement, and, according to
References
the type-checking rules for the return statement, the [BHJL86] Andrew Black, Norman Hutchinson, Eric
type of the expression in a return statement must be Jul, and Henry Levy. Object structure
included in the declared return type of the method. in the Emerald system. In Proceedings
Thus, the type of the object returned as a result of of OOPSLA `86, pages 78{86, Novem-
the k1:k2: km: message will be included in the re-
::: ber 1986. printed as SIGPLAN Notices,
turn type of the method. 21(11).
The type of
[BI82] A. H. Borning and D. H. H. Ingalls.
rcv k1: exp1 k2: exp2 : : : km: expm A type declaration and inference sys-
tem for Smalltalk. In Conference Record
is the union of the return types of a number of meth- of the Ninth Annual ACM Symposium
ods, but it certainly includes the return type of the on Principles of Programming Languages,
method that was invoked. Thus, the result of the ex- pages 133{139, 1982.
pression, which is the object being returned by the
method, has a type that is included in the type of the [Car84] Luca Cardelli. A semantics of multiple in-
expression. heritance. In Semantics of Data Types,
Lecture Notes in Computer Science, n. [Kam88] Samuel Kamin. Inheritance in Smalltalk-
173, pages 51{67, Springer-Verlag, 1984. 80: A denotational de nition. In Confer-
ence Record of the Fifteenth Annual ACM
[Car85] Luca Cardelli. Amber. In Combinators Symposium on Principles of Programming
and Functional Programming Languages, Languages, pages 80{87, 1988.
Proceedings of the 13th Summer School of
the LITP, Le Val d'Ajol, Vosges (France), [Mey88] Bertrand Meyer. Object-oriented Software
May 1985. Construction. Prentice Hall, 1988.
[CUL89] Craig Chambers, David Ungar, and Elgin [MS82] David MacQueen and Ravi Sethi. A
Lee. An ecient implementation of Self, higher order polymorphic type system for
a dynamically-typed object-oriented lan- applicative languages. In ACM Sympo-
guage based on prototypes. In Proceed- sium of LISP and Functional Program-
ings of OOPSLA `89, pages 49{70, Octo- ming, pages 243{252, 1982.
ber 1989. printed as SIGPLAN Notices,
24(10). [SCB*86] Craig Scha ert, Topher Cooper, Bruce
Bullis, Mike Kilian, and Carrie Wilpolt.
[CW85] Luca Cardelli and Peter Wegner. On An introduction to Trellis/Owl. In Pro-
understanding types, data abstraction, ceedings of OOPSLA `86, pages 9{16,
and polymorphism. Computing Surveys, November 1986. printed as SIGPLAN No-
17(4):471{522, December 1985. tices, 21(11).
[Deu81] L. Peter Deutsch. Building control struc- [Str86] Bjarne Stroustrup. The C++ Program-
tures in the Smalltalk-80 system. Byte, ming Language. Addison-Wesley Publish-
6(8):322{347, August 1981. ing Co., Reading, MA, 1986.
[DT88] Scott Danforth and Chris Tomlinson. [Suz81] Norihisa Suzuki. Inferring types in
Type theories and object-oriented pro- Smalltalk. In Conference Record of
gramming. Computing Surveys, 20(1):29{ the Eighth Annual ACM Symposium on
72, March 1988. Principles of Programming Languages,
pages 187{199, 1981.
[GR83] Adele Goldberg and David Robson.
Smalltalk-80: The Language and its Im-
plementation. Addison-Wesley, Reading,
Massachusetts, 1983.
[Gra89] Justin Graver. Type-Checking and Type-
Inference for Object-Oriented Program-
ming Languages. PhD thesis, University
of Illinois at Urbana-Champaign, 1989.
[JF88] Ralph E. Johnson and Brian Foote. De-
signing reusable classes. Journal of Object-
Oriented Programming, 1(2):22{35, 1988.
[JGZ88] Ralph E. Johnson, Justin O. Graver, and
Lawrence W. Zurawski. TS: An optimiz-
ing compiler for Smalltalk. In Proceed-
ings of OOPSLA `88, pages 18{26, Novem-
ber 1988. printed as SIGPLAN Notices,
23(11).
[Joh86] Ralph E. Johnson. Type-checking
Smalltalk. In Proceedings of OOPSLA `86,
pages 315{321, November 1986. printed as
SIGPLAN Notices, 21(11).

You might also like