Professional Documents
Culture Documents
A Type System For Smalltalk
A Type System For Smalltalk
Justin O. Graver
University of Florida
Ralph E. Johnson
University of Illinois at Urbana-Champaign
made to the system. The abstract class Change de- notation is used for lists of type variables. C denotes
nes the values method in terms of the parameters a class name and m denotes a message selector.
method. The implementation of the values method is Strictly speaking, a type variable is not a type; it
inherited by the subclasses of Change. The result of is a place holder (or representative) for a type. We
sending the values message depends on the result of assume that all free type variables are (implicitly)
sending the parameters message, which is di erent for universally quanti ed at an appropriate level. For
each class in Figure 1. Hence, the speci cation of the example, the local type variables of a method type
values method is also di erent for each of the classes. are assured to be universally quanti ed over the type
A class-based type system with explicit union of the method; the class type parameters of a class
types works for an implementation inheritance hier- are assumed to be universally quanti ed over all type
archy and greatly simpli es optimization, but is not de nitions in the class. We also assume that all type
very exible when it comes to adding new classes to variables are unique (i.e. have unique names) and are
the system. To regain some of the exibility we in- implicitly renamed whenever a type containing type
corporate type-inference and signatures. This gives variables is conceptually instantiated. For example,
us both exibility and the ability to optimize. the local type variables of a method type are renamed
A type system suitable for an optimizing each time the method type is \used" at a call site. We
Smalltalk compiler must also have parameterized do use explicit range declarations for type variables
types. It is dicult to imagine a (non-trivial) to achieve bounded universal quanti cation (see Sec-
Smalltalk application that does not use Collections . tions 3.5 and 3.6). Given these assumptions, we treat
The compiler must be able to associate the type of type variables as types.
objects being added to a Collection with the type of
objects being removed from it. Without parameter- 3.1 Subtyping
ized types, all Collections are reduced to homogeneous
groups of generic objects. Furthermore, due to the The intuitive meaning of the subtype relation is v
potentially imperative side-e ects of any operation, a that if s t then everything of type s is also of type
v
special imperative de nition of subtyping for param- t. The subtyping relation for our type system can
eterized types is used (see Section 3.1). be formalized as a set of type inference rules (as in
The type system must also be able to describe [CW85]). is a set of inclusion constraints for type
C
the functional and inclusion polymorphism exhibited variables. The notation :p t means to extend the
C v
by many Smalltalk methods. Functional polymor- set with the constraint that the type variable p is a
C
phism can be described, in the usual manner, with subtype of the type t. The notation s t meansC ` v
(implicit) bounded universal quanti cation of type that from we can infer s t. A horizontal bar is
C v
variables. Inclusion polymorphism can be described logical implication: if we can infer what is above it
with explicit union types or signatures. then we can infer what is below it.
For a detailed discussion of these issues see The following are basic inference rules.
[Gra89].
C ` t Anything
v
3 Types C ` Nothing t v
j (t) j (Type )
(type variable) j p j P
( )
> j Anything j Anything
( )
? j Nothing j Nothing
The following rules are used for the type parame- type list that may accompany a class name is xed by
ter lists of object types (see Section 3.2) and for the the corresponding class de nition. For example, the
argument type lists for block types (see Section 3.4). classes SmallInteger and Character have zero class type
parameters, so SmallInteger and Character are valid
<> <>
C ` v object types. The classes Array and Set each have one
~t ~t 0 class type parameter, so Array of: SmallInteger and
s s0
C ` v C ` v
Set of: (Array of: Character) are valid object types.
s ~t s0 ~t 0
C ` v Due to the imperative nature of Smalltalk [GR83,
Subtyping is re exive, transitive, and antisym- Joh86] and because of the antimonotonic ordering of
metric. function types [DT88] we de ne subtyping for ob-
t t C ` v ject types as equality, i.e. one object type is included
s t t u in another if and only if they are equal (Cardelli
also takes this approach for \updatable" objects in
C ` v C ` v
s u
C ` v
[Car85]). (Recall that is a set of inclusion con-
C
s t
C ` v t s C ` v straints and C is a class name.)
s=t C `
C `~s = ~t
3.2 Object Types C `C(~s ) C(~t )
v
s t u
classVariables: "
A union type is included in a union type
u u
0
only if typeParameters: 'ElementType'
each element of is included in .
u u
0 poolDictionaries: "
C` v s C` v category: 'Sequenceable-Collections'
C`
u
v
t u
+
Figure 3: Class de nition for OrderedCollection .
s t u
new: anInteger
Subtyping for message types, although compli- \Answer a new instance of size anInteger."
cated by type variables (see [Gra89]), is essentially
the same as for block types, with the added condi- "(super new: (anInteger max: 1)) setTally
tion that the selectors must be equal.
Figure 6: The new: method in Set class.
3.7 Metaclass Method De nitions
In regular class method de nitions the type Self
refers to the type of the receiver. It is useful to extend
this notion to be able to refer to the type of the class be inferred for an expression the expression is type-
of the receiver. This is done by using the Self class correct. The following rule describes the essence of
type speci cation. subtyping:
Similarly, when de ning methods in a metaclass it
is useful to be able to refer to the type of instances of C A` :
; e C` v s s t
type-correct) and if the type of e is a subtype of the where ~p v ~u are local type variables with range
type of v. The type of an assignment statement v e declarations, C(~s ) is the type of the receiver, ~y : ~t
is the type of e. are typed arguments, u is the return type, ~z : ~t are 0
of a method or a block. The types of the top-level C :(~p v ~u):(C(~s ) v C(~s )):(~t v ~t ) ` u v u
0 0 0
is solved by referring to the type of the receiver as methods in each subclass that inherits them. The
Self and expanding Self to the type of the receiver method de nition (de ned in terms of Self ) is inher-
in each subclass. Self is not a type, but instead is ited and Self is expanded to refer to the current class
\macro-expanded" to a type at compile-time. as described in section 3.6. We assume that the hi-
Abstract classes provide another way in which the erarchy of typed class de nitions H (see Section 4.1)
type of a method can change when it is inherited. contains not only the user declared method de ni-
Consider the method de nitions shown in Figure 1. tions for a class, but also any methods that can be
The values method is de ned in class Change and in- meaningfully (i.e. type-correctly) inherited by a class.
herited by the other classes. The return type of values If H0 is a partial function from < C; m > pairs to
depends on the return type of parameters, which is method de nitions representing user declared meth-
di erent for each class. The types of the di erent ods then H is derived by extending H0 to include
parameters methods are de nition points for inherited methods. In other
words, if H0 < C; m > is a user de ned method
<Change > parameters "<Change > then H < C ; m > has the same de nition provided
0
<ClassRelatedChange > parameters "<Symbol > that C is a subclass of C and H0 < C ; m > is
0 0
<MethodChange > parameters "<Array of: Symbol > not de ned. There are many possible H function. A
<OtherChange > parameters "<String > type-correct H is one in which, for all method types
where the types in \receiver position" are receiver derivable from H, the declared type of H < C; m >
types and the types following the \"" are return is the same as the type inferred by type-checking.
types. Thus, the values method must be recompiled (`< C; m >d : t) =) (C ; A `< C; m >: t)
in the context of each subclass of Change to compute ` C; A
its correct return type for that context.
Another way in which abstract classes show that Strictly speaking, if every method de nition in H
Smalltalk classes inherit implementation, not speci - must type-check then certain methods in abstract
cation, is that some methods cannot be executed by classes must be removed in order to have a type-
all of the classes that inherit them. For example, class correct H (e.g. the addAll: method in class Collection ).
Collection has a number of methods that use the add: In practice, inherited methods are type-checked
message, such as addAll:, but it does not implement or upon demand, i.e. when code is rst compiled that
inherit the add: message itself. Instead, add: is imple- might cause that method to be invoked. Array can
mented by the various subclasses of Collection . Some then be a legitimate subclass of Collection ; no inher-
of its subclasses, such as Array, do not implement add: ited code that invokes add: will be type-correct.
and thus cannot use methods like addAll: . Smalltalk The bene ts of delaying type-checking of inherited
relies on run-time type-checking and the \does not methods is that a method is only retype-checked for
understand" error to detect when an unde ned mes- every subclass that actually inherits it, and that this
sage is sent to an object. Our type system detects all type-checking is spread out over a large amount of
such cases at compile-time. time. A new class does not require any of the methods
These problems are solved by retype-checking that it inherits to be type-checked when it is created.
The message types in a signature type may have oc-
f localTypeVariables: <P > currences of Self to denote the type represented by
receiverType: <Block returns: (True + False)> the signature. Receiver types of Self may be omitted.
arguments: aBlock <Block returns: P > When an object type is being checked for inclusion
returnType: <Unde nedObject > g in a signature type Self is instantiated to the object
type.
whileTrue: aBlock A signature type includes object types belonging
"self value to classes not yet created. Thus, signature types con-
ifTrue: tain an in nite number of object types. In fact, the
[aBlock value. type speci ed by an empty signature contains every
self whileTrue: aBlock] possible object type, since every object type under-
stands every message type in the signature.
Figure 8: whileTrue: for class BlockContext . C ; A ` t v <>
Since signature types specify the largest possible
types, procedures that use them exhibit the most
Adding a new method to a superclass does not require polymorphism and so are the most exible. However,
type-checking it for every subclass that inherits it, nor use of signature types prevents the compiler from per-
does adding a new variable. However, changing the forming some important kinds of optimizations, since
type of a method or variable might require a lot of they do not provide enough information about the
computation to ensure that each of its uses is still class of the receiver to allow compile-time binding of
type-correct. message sends.
A signature type is speci ed by a special kind of
4.4 Speci c Receivers type declaration.
Some classes have methods that can be executed <understands: #(tm1 tm2 : : :)>
by only a subset of their instances. For example,
the whileTrue: message should be sent only to blocks Here, the tm 's are strings containing message type
that return Booleans. We can specify this by using speci cations. An example will be given shortly. Sig-
the receiverType: eld in the type declaration of the nature types may also be used to restrict the range of
method de nition as shown in Figure 8. Recall that type variables. A local type variable with a signature
if this eld is omitted then Self defaults to the object range would have a declaration of the form
type for the class in which the method is being com- <P understands: #(tm1 tm2 : : :)>.
piled. Although whileTrue: is nearly the only method
in the Smalltalk-80 class library that needs to declare Since such declarations may be recursive, complex
a speci c receiver, it is easy to imagine other meth- mutually recursive signature types can be built.
ods that would need this feature, such as a summation As an example, consider the occurrencesOf:
method in Collection . method for class Collection shown in Figure 9. The
only restriction placed on the type of the argument
4.5 Signature Types anObject is that its corresponding class (or classes)
must implement (or inherit) an = (equality) message
A signature type is a type (i.e. a set of object and that will take an argument of type ElementType and
block types) speci ed by a set of message types. An return a boolean. It is helpful to compare the use of
object type (or block type) t is included in a signature the do: message in this example with its de nition in
type s if, for each message type m 2 s, a message of Figure 5.
type m sent to an object of type t is type-correct. In Type-checking a message send to a receiver whose
other words, an object type is in a signature type if type is a signature type is straightforward since all
it \understands each message in the signature." relevant type information is contained in the signa-
C ; A ` e : t C ; A ` ~e : ~t C ; A ` (e m ~e ) : u ture type.
C ; A ` t v << m; t ~t ! u >> C ; A ` e :< : : : < m; ~t ! u > : : : >
0 0
C ; A ` t v << m ; t ~t ! u >>
0 0 0 0
C ; A ` (e m ~e ) : u
C ; A ` t v << m; t ~t ! u >; < m ; t ~t ! u >>
0 0 0 0
f receiverType: <Self >
arguments: anObject <understands: #('= <ElementType > "<False + True >')>
temporaries: tally <SmallInteger >
blockArguments: each <ElementType >
returnType: <SmallInteger > g
occurrencesOf: anObject
\Answer how many of the receiver's elements are equal to anObject."
j tally j
tally 0.
self do: [:each j anObject = each ifTrue: [tally tally + 1]].
"tally
Figure 9: The occurrencesOf: method in class Collection .
5 Beyond Signatures
class Object:
The type system described so far, with explicit union
types, case analysis, and signature types, is quite isLookupKey
powerful. However, there are still some methods that "false
elude type-checking.
Although Smalltalk is a typeless language, it is
possible to discriminate between classes of objects class LookupKey:
and provide a simple kind of explicit run-time type-
checking. An example is the isLookupKey message isLookupKey
whose implementation is shown in Figure 10. Mes-
sages like isLookupKey can be used in a method to
"true
\type-check" an argument before sending it a mes- Figure 10: Implementation of isLookupKey .
sage that it might not understand.
A example of this style of programming is the =
(equality) message shown in Figure 11. Equality must
be de ned between any two objects and should not
be subject to run-time errors. This style of imple- When a send of = to a LookupKey needs to be
mentation meets both of these criteria. The problem type-checked, the code in Figure 11 is substituted for
is how to type-check such a method. the call. The types of the actual arguments then re-
The type of this method can be roughly stated place those of the formal arguments, allowing both
as follows. An argument must understand the de nition and use information to be used in type-
isLookupKey message. If an argument responds to checking. This usually permits the = method to be
this message with true then it must also understand type-checked completely. If not, then type-checking
the key message and the method will (presumedly) for the method that sends = must also be deferred.
return an object of type True + False, otherwise the This static analysis technique has proven useful for
method returns an object of type False. type-inference as well as for type-checking [Gra89].
In general, the type of a polymorphic method like
= is complicated, but its type for a speci c use is
simple and easy to understand. Thus, type-checking a 6 Type Safety
method defers some of the type-checking for a method A type system can only be shown correct relative to
until its invocations are type-checked. It doesn't mat- a formal de nition of the language. This section out-
ter when a method is type-checked, only that type- lines a proof of correctness for the type system rela-
checking is completed before any code that invokes tive to Kamin's denotational semantics of Smalltalk
the method is executed. [Kam88].
= aLookupKey
De nition 1 (Object/type consistency) An ob-
aLookupKey isLookupKey ject o is consistent with a mapping from objects to
"
ifTrue: [ self key = aLookupKey key] types , relative to a class hierarchy H, and a state ,
"
ifFalse: [ false] if
1. (o) = C(~t),
Figure 11: = (equality) in class LookupKey .
2. the class of o is C, and
3. for each instance variable xi of o, let ti be the
declared type of xi with type variables replaced
Usually a type is described as a set of objects. by the types given in (o). Then the type of the
Type safety then means that the value of an expres- object referred to by xi is a subtype of ti .
sion is always contained in its type. However, our
types are not sets of objects, so a di erent de nition
of type safety is needed. We will de ne type safety
by assigning an object type to each object and then
showing that the value of an expression always is as- Kamin's semantics. The rst is when a variable
signed a type contained in the expression's type. changes its value. This happens in an assignment
The type of an object depends not only on its statement and in assigning arguments to formal pa-
current state but also on its past and future states. rameters when evaluating a method or block.
Thus, it may not be possible to decide whether an ob- The second way that the state changes is when a
ject is in a particular type, though it is often possible new object is created. The type of some new objects,
to show that it is not. For example, a is never in
Set such as new blocks, are known in advance so the new
Array of: Character, and a containing
Set Characters object will always be of the correct type. Unfortu-
is not in Set of: SmallInteger. However, it is not easy nately, problems occur with the new primitive and
to tell whether a Set containing only is
Characters when creating a new context to evaluate a method.
in Set of: (SmallInteger + Character)|it all depends This is because instance variables of new objects and
on whether or not the is referenced by a variable
Set local variables of new contexts are both initialized to
that requires it to contain only .
Characters nil, but the types of these variables usually do not
If each object had a type assigned to it then we include Unde nedObject. Thus, until the variables
could check whether an object was consistent with its are initialized, their value is not consistent with their
type by checking whether its class was consistent with type. We ensure type-safety by requiring all variables
its type and whether the types assigned to the values whose types do not include Unde nedObject to be as-
of its instance variables were included in the declared signed before they are read, and use ow analysis to
types of its instance variables. This assignment would check this. Thus, we will change the statement of the
be consistent if every object was consistent with the proposition slightly.
type assigned to it. Although this type-assignment Type-Safety Theorem: If a type-correct program
might not be unique, any consistent type-assignment is in a state that is consistent with a type assignment
could be thought of as describing the types of all ob- except for unassigned variables, and if a variable is
jects. These ideas are formalized in De nition 1. always assigned before it is read, then any succeeding
A state is consistent with a type-assignment if state will be consistent with .
every object in the state is consistent with .
Proposition: If a type-correct program is in a state Proof: In a type-correct program, the type of the
that is consistent with a type assignment then any expression of an assignment statement must be in-
succeeding state will be consistent with . cluded in the type of the variable, so, if each expres-
The only way that a consistent type-assignment can sion returns a value that is consistent with its type
become inconsistent is if the state changes. This is then assignment statements maintain the consistency
because the type-assignment itself is constant and, of the type-assignment. Once a variable is assigned,
in Kamin's semantics, objects never change and the its value will have a type that is included in the type
class hierarchy does not change. Thus, we can prove of the variable. Thus, if the value of any expression
the proposition by showing that is maintained as an has a type that is included in the type of the expres-
invariant every place where the state can be changed. sion then whenever a variable v is read, the type of
There are two ways that the state changes in the value of v will be consistent with the type of v.
The theorem is proved by structural induction on Type-safety also depends on all the primitives be-
the number of message sends in the evaluation of an ing given correct types. Since primitives are written
expression. The base case is where there are no mes- in a language other than Smalltalk, it is impossible
sage sends. Since there are no message sends there to reason about them within the framework presented
can only be assignment statements, whose right-hand here. A few primitives are inherently unsafe. These
sides are literals or variables. The variables either are are primarily used by the debugger. Most of the prim-
type-consistent or will be assigned before they are itives have simple types, however.
read, so the assignment statements will all maintain
the consistency of the type-assignment.
The induction step is to assume that any expres- 7 Conclusion
sion that can be evaluated in ? 1 message sends
n
Our type system for Smalltalk is type-safe. It has
(or less) is type-safe and to prove that evaluating an been implemented in Smalltalk and used in the TS
expression optimizing compiler for Smalltalk [JGZ88]. It has
rcv k1: exp1 k2: exp2 : : : km: expm been able to solve most type-checking problems in
the standard Smalltalk-80 class hierarchy. Thus, it is
that requires message sends is type-safe. Since an
n correct, useful, and usable.
expression must be type-checked before it can be eval- Our type system is also unique. It di ers from
uated, we know that the types of the argument ex- other type systems for object-oriented programming
pressions to the k1:k2: km: message are included in
::: languages by acknowledging that only implementa-
the types of the formal parameters of the method that tion is inherited, not speci cation, and by handling
is invoked. Evaluating any expi will take less than n case analysis of union types automatically.
message sends, so each argument is consistent with its Our type system is more complicated than other
corresponding type. When the new context is created type systems for object-oriented programming lan-
and the method is executed, the formal parameters guages. Whether this complication is justi ed de-
of the method will be bound to objects whose type is pends partly on whether a new language is being de-
included in the type of the parameter. Methods also ned or whether a type system is being de ned for
create temporary variables, but they are unassigned, Smalltalk. Current Smalltalk programming practice
so invoking a method maintains the consistency of the requires a type system like ours.
type-assignment (except for unassigned variables).
We assumed that the expression involves mes- n