
Personal Statement and Outline of Proposed Research to support a PhD application

Stephen Kell March 2006

1 Interests

1.1 General Interests

My primary research interests lie in the fields of operating systems, distributed systems and programming languages. I also have interests in software engineering, networks, continuous media applications, sentient environments and human-computer interaction. I intend to pursue a career in research, and during 2005–6 have been a Research Assistant at the Computer Laboratory of the University of Cambridge. Further details may be found in my CV.

1.2 Immediate Interests

I am particularly interested in the ways in which the design and implementation of operating systems and programming languages affect the development and use of applications. This includes, for example, supporting application-level quality-of-service, reliability and security guarantees, and facilitating the adaptation, re-use and redeployment of software components. In today’s increasingly complex, mobile and distributed systems, these properties are ever more important. Their absence in real systems leads to recurrent problems in system development [14], deployment [15] and evolution; their importance in achieving reliability [16, 3], quality-of-service [4] and security [3] is also well-known.

Modularity is a structural property which I will define loosely (for the moment) as the ability to adapt, combine, re-use and redeploy components of a software system. These abilities are not only useful in their own right, for instance in reducing development and administration costs, but can contribute to the provision of quality-of-service, reliability, verifiability and security, among others. A modular approach can enable the structural enforcement of these properties across the entire system.


For my PhD I intend to explore techniques for providing modularity within such systems. In contrast to traditional software engineering work, my focus is on “dynamic modularity”, by which I mean modularity among an open-ended set of independently-developed components. Here a “component” is a run-time instance of any unit of code: the state corresponding to a single source file, a library, an executable or some other grouping. This kind of modularity is dependent on the support of the operating system’s linker and system call interface, since these enforce the boundaries between components.

2 Problems

Conventional programming models make it hard to write highly modular applications. For example, the Unix-like model espoused by most programming languages’ standard libraries is inadequate in the following ways.

• There is insufficient indirection when accessing services external to a particular component. Conventionally, a variety of low-level interfaces provide IPC, I/O and intra-process linkage – in the case of Unix, these are devices, files, sockets, processes and the linker. Each of these has its own namespace and is restricted to a particular set of implementations – Unix’s heavyweight processes, and the sets of devices, VFS implementations and socket types known to the operating system. The application programmer must commit to these implementations at development time.

• The low-level nature of these interfaces forces applications to layer their own abstractions on top. This leads to many similar but mutually incompatible conventions for data encoding, procedure call, resource management, type systems, communication protocols and so on. This precludes direct interoperability between components which do not use the same conventions. For example, it is hard for an application written in one language to make use of a library written in another.

• As a result of the above, software at higher layers of a system is tightly coupled to particular lower-layer implementations. This is witnessed in deployment problems (e.g. IPv6) and in code duplication. Even within the open-source community, where code is freely available, it is common for libraries to be duplicated in different languages, or applications to be dependent on a particular windowing toolkit or run-time system. Likewise, an application targeting one networking stack or file format cannot be made to use another, even when the application uses only those abstract features which are common to both (e.g. IPv6 versus IPv4 [15]). To overcome these dense inter-module dependencies may require relinking, recompilation or ad-hoc coding effort [17].

• The implementation-specific nature of these interfaces leads programmers to make assumptions which subsequently limit re-use and scalability.

For example, distributed filesystems on Unix are notoriously problematic because programmers conventionally assume access to a local file to be fast, and ignore partial failure modes [18]. Similarly, shared libraries frequently expose data structures directly, assuming that clients reside in the same address space.

• The only provision for access control, quality of service, reliability and other cross-cutting concerns is implementation-specific. For example, the Unix file system provides access control at the API level, but for network communication this must be implemented in the application. There is frequently no means of propagating quality-of-service requirements to foreign modules. Use of a shared library precludes memory-based fault isolation since it implies a shared address space. Lack of a pervasive type system limits the potential for static analysis across module boundaries, leading to undetected bugs and security exploits.

Existing middlewares [19], virtual machines [20] and component systems [5] attempt to solve problems of dynamic modularity, with some success. However, they are typically mutually incompatible, have high compulsory overheads [21], and still carry dependencies on particular network protocols, programming paradigms or other implementation details. Moreover, since middlewares are perceived as (and often focussed towards) tackling only problems of distribution rather than the more general modularity, they are not popular among software developers, except where the need for both modularity and distribution is obvious from the design stages.

Since modularity’s primary goals of reusability, adaptability and replaceability all address problems caused by a lack of foresight among developers, there is a strong argument that modularity should be naturally provided “by default”, within the most basic programming models targeted by application developers, and without requiring pre-commitment to particular implementations, and be accessible to all kinds of applications and supporting user-space code. Accordingly, I intend to research ways of supporting modularity at the operating system level, by devising new programming models which lead naturally to modular applications. This should include demonstrable improvements to both internal (reusability, replaceability, adaptability) and one or more external (quality-of-service, security, reliability) characteristics.

3 Ideas and Approach

3.1 Foundations

My approach will be based on the following principles, motivated by well-known existing research.

1. Separation of interface from implementation: this principle, introduced by Parnas as “information hiding” [9], is now well-accepted.

It is embodied in many programming languages [22] and operating systems [6, 23]. Nemesis provides a particularly strong separation in its programming model [1], and is a useful starting point. However, it does not solve the “interface mismatch” problem, since it employs direct linking to pre-defined interfaces specified using an IDL.

2. Correct placement of abstraction: Engler et al [10] argue that operating systems should not include compulsory abstractions, since they limit application-level flexibility and, ultimately, compromise performance and reliability. This can be inverted: applications should not build in abstractions themselves, since this hinders flexibility, reusability and portability. This argument motivates a three-layered approach, where a “middle” layer of abstraction implementations sits underneath applications. In this model, operating system services are directed towards the middle layer, rather than towards applications themselves.

3. First-class consideration of connectivity, separate from the components themselves: this principle follows naturally from point 2. Shaw [8] approaches the same issue, albeit from a “static” closed-project perspective, and covers many directly relevant problems. In summary, she concludes that a software component should contain as little specification as possible of how it connects to others, but that these should be formalised in a separate domain. I add that to allow dynamic modularity, this formalisation must be supported by the run-time system, specifically the operating system, supporting abstractions analogous to (but distinct from) those in component languages.

4. Unification of interfaces and namespaces: Unix [11] achieved much of its power and elegance by partially unifying the programming interfaces used to access files, devices and communication streams. Use of a single API also enforces a consistent interface to access control. Many later developments, including the VFS interface [12] and Plan 9 [7], extended this idea by noting that the abstract data type exposed by a Unix file is very general. However, extending this simple storage-oriented interface too far causes problems: programmers may make incorrect assumptions (e.g. about distribution-hiding interfaces [18]) and all interactions must somehow be characterised as read or write operations (a problem first acknowledged, but hardly solved, by Unix’s ioctl()).¹ It is worth exploring the trade-off here: increased unification of programming interfaces may offer better modularity, since they allow fewer assumptions on the part of the programmer, but the resulting interfaces may also be more difficult to use.

¹ In fact, the Unix filesystem API is often found inadequate even for storage applications. Policroniades [24] presents one argument.

5. The benefits of reflection: reflection [25] is a technique by which the internal workings of a system are rendered tractable from within that system’s computational processes.

This has been applied to programming languages [13] and middlewares [4] to enable run-time extension, adaptation and other aspects of dynamic modularity. Reflection is often realised as a unified programming interface (referring back to point 4): for example, consider Java’s fixed set of interfaces for manipulation of the JVM’s run-time type metadata.

6. The importance of names: influential papers by Saltzer [2], Needham [26] and others [27] motivate the importance of naming within systems. Names are crucial to both sharing and protection, since any component may only access that which it can name. As the fundamental mechanism for indirection, names are also crucial to abstraction: using a more abstract namespace removes dependency on particular implementations. Delaying binding until link-, load- or run-time is a common technique for adding flexibility: examples include dynamic linking, virtual functions in C++ [22], environment variables in Unix [11], application scripting, the Internet’s domain name system and countless others. Naming is a particularly deep area, and there is a rich taxonomy of names: pure or impure, structured or flat, well-known or secret.

7. The benefits of type systems: static type-checking is well known as a useful way to detect and avoid bugs during software development. However, a pervasive type system aids verification across module boundaries and at run-time: static analysis, perhaps augmented by trusted toolchains and proof-carrying code, can be used to guarantee correctness and reliability properties without the need for heavyweight run-time checks [28, 29]. Additionally, retaining typing information at run-time is also useful in any application where logic dealing in higher-level semantic concerns may be replaced or extended at run-time. This includes security policies [3], dynamic extensibility and adaptation [28, 20, 4].

3.2 Novel Contributions

To these, I contribute two possibly-novel suggestions.

Firstly, I propose that to counter interface mismatch problems, we begin by “admitting defeat”. Political solutions (i.e. standardisation) cannot succeed in ensuring interface matching when there is no common administration between component developers. Rather, I propose a technical solution which makes explicit provision for mismatched interfaces. Developers should not need, and in fact should not attempt, to target their code at existing concrete interfaces. Instead, they should devise their own abstract interface, and rely on the runtime support of the operating system to allow this to be joined together with the interfaces exposed by supporting components. Selection of these components should be left until run-time. Under such a system, each module exposes its own interface to higher-layer components, by exporting a namespace of typed objects; the run-time system will “glue” the abstract interfaces targeted by client components to these exported interfaces. This embodies point 3 above: inter-module connectivity is given first-class consideration and run-time support.
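The “admitting defeat” idea can be illustrated with a minimal sketch, which is not part of the proposal itself. It assumes a client that targets its own abstract interface and a provider that was written independently. The names LineSource, ListReader, Glue and first_lines are all hypothetical, and the name-to-name mapping stands in for whatever connection logic a real run-time system would accept.

```python
from typing import Dict, List, Protocol


class LineSource(Protocol):
    """Abstract interface devised by the client component (hypothetical)."""

    def next_line(self) -> str: ...


class ListReader:
    """Independently written provider whose interface does not match."""

    def __init__(self, items: List[str]) -> None:
        self._items = list(items)
        self._pos = 0

    def read_item(self) -> str:
        item = self._items[self._pos]
        self._pos += 1
        return item


class Glue:
    """Adapter assembled at run time from a mapping of operation names."""

    def __init__(self, provider: object, mapping: Dict[str, str]) -> None:
        # mapping: abstract operation name -> provider's concrete method name
        for abstract_name, concrete_name in mapping.items():
            setattr(self, abstract_name, getattr(provider, concrete_name))


def first_lines(source: LineSource, count: int) -> List[str]:
    """Client logic, written only against the abstract interface."""
    return [source.next_line() for _ in range(count)]


# The provider and the glue are chosen at run time, not at development time.
provider = ListReader(["alpha", "beta", "gamma"])
source = Glue(provider, {"next_line": "read_item"})
result = first_lines(source, 2)
print(result)
```

Here the client code never names ListReader or read_item, so a different provider could be substituted by changing only the mapping. A real system would need far richer glue than method renaming, which is the subject of the naming-expression proposal.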

Secondly, I present a generalisation of the familiar concept of name to a naming expression, subscribing roughly to descriptivist theories of naming.² Names are typically a subcategory of expressions in formal languages: while expressions are tree-structured entities evaluated against an environment by some well-defined reduction process, names are atomic or linear-structured objects resolved against a context by some well-defined resolution algorithm. In other words, by increasing the expressivity of names, the dynamism and flexibility provided by names (as outlined in point 6 above) can be applied to the problems of inter-module connectivity and, particularly, interface mismatch. The key idea is that the naming expression supplied by the user specifies how to adapt the foreign module’s exported interface into the abstract one targeted by the local module. Since adaptation and “glue” code is most easily specified in functional or scripting-oriented languages, this is a convenient approach. Making this sufficiently powerful and efficient will be a substantial part of the proposed research. One approach is described in the following section.

² For an example see Russell, “On Denoting”, Mind, 1910.

3.3 Outline of Proposed System

Consider, for example, embedding features of a functional language such as ML [30] within the name service of a file system. Instead of supplying the name of a pre-existing directory to a call such as opendir(), a program might supply a function application expression, whose evaluation yields the set of objects to “open”. In this way, the program’s logic may be applied to a set of objects which do not reside in a physical directory on disk. By introducing an expression-like name, we have removed some implementation dependency and hence improved the program’s flexibility. With a suitable language design, this technique may be extended to provide arbitrary transformations of all kinds of interfaces, not just filesystems. Crucially, the necessary connection logic can be supplied at run time, allowing components to be replaced without the need for recompilation or relinking, albeit possibly requiring complex naming expressions.³ Design of the naming language, including its type system and computational power (e.g. ability to express recursion), is a matter for research.

³ This does not preclude grouping of particularly useful transformations into libraries, where each would appear as a named function.

Some other remaining questions concern how to integrate this system with the programming languages used to write components. These questions include the following.

• What does a naming expression denote? This question effectively asks what primitives the model should include. In the spirit of information hiding, it is expected that the model will be oriented around functions (or sets of functions) rather than plain data.

There must also be some notion of environment, i.e. a function mapping from names to interfaces. One implementation could involve some sort of hereditary environment, similar to the Unix environment or Nemesis’s per-domain contexts.

• How does code get hold of a name? Some names will be explicitly represented in input data; others, for example command-line arguments or Unix’s standard I/O streams, must be assumed by the program and are bound at run time, often implicitly. These implicit bindings effectively bootstrap a component, by defining the initial space of nameable interfaces.

• How is a name resolved by a component’s code? A call analogous to open() is a possibility; the caller should specify an authority and an interface type, rather than specifying an access mode and (implicitly) calling identity, as in Unix.⁴ Dynamic type checking can confirm whether the name resolves to an object exporting the specified interface, hence allowing some level of type-safety guarantee.

⁴ Note that this is a natural generalisation of Unix’s open(), where the programmer specifies an access mode but must assume other interface characteristics, such as support for particular ioctl() or seek() operations. These unstated assumptions force the programmer to handle additional error cases; making the assumptions explicit removes the need for this.

• How are foreign objects accessed by the component code? An open() call must return some kind of reference to the foreign interface, or the analogue of a file handle. This would probably include an unforgeable token identifying the interface reference to the system – i.e. a capability. It must then be possible to invoke() named operations, and close() the interface. This raises many questions about how arguments and return values are represented, what other operations are required, and how component languages might abstract away from these basic operations.

• How does a component export its own interfaces? In general this is done by updating some naming environment which is accessible to potential clients, where “environment” is itself an interface similar to Nemesis’s Context. A writable environment might be a subtype of the basic environment, supporting bind() and (optionally) unbind() operations. These would allow a local object to be exported into some widely-accessible environment.

• How are the type systems of the naming language and component language resolved? Clearly, a correspondence must be known for each component language, and the naming language’s types must have some run-time representation within the component language. Note that the run-time system need only be ported once per language, and from that point will allow free interoperability between that language and all other supported languages. By contrast, current systems typically involve binding effort per library as well as per language, although the necessary code can sometimes be autogenerated by tools.

• How is resource management performed? The open() and close() pattern puts bounds on the period of an object’s use by each client. These calls themselves may be hidden by the component language’s usual resource management constructs, e.g. scope-based as in C++, or collector-based as in Java. The usual problems of partial failure and resource leakage will be subjects of research.

• How are access control and quality-of-service features integrated into the model? Use of interface references naturally suggests a capability-based approach to access control; possible implementations will be a subject of research. Quality-of-service information might be integrated into the notion of interface: when specifying a set of operations which the named entity must perform, a client may annotate these with service-level requirements, enabling admission control to be performed during the call to open() analogously with type-checking.

4 Implementation and Evaluation

The discussion so far has identified the following requirements.

• The programming model must be naturally modular.

• There must be demonstrable, quantifiable improvements to modularity among applications developed using the model, relative to applications developed conventionally.

• There should be demonstrable, quantifiable improvements to one or more externally-visible characteristics, e.g. reliability, security or provision for quality of service, among applications developed using the model.

• Applications developed against the model should not be subject to significant performance penalties, compared to the nearest equivalent under conventional models.

In addition to these, I suggest the following practical constraints.

• The model must be developed for an existing widely-used system, most likely GNU/Linux.

• It must be implemented so as to provide backwards compatibility and interoperability with “legacy” (i.e. conventionally-developed) applications on the same system.

• It must be possible to show a transition path towards the new model for existing applications.

In order to achieve these, the following approaches may be helpful.

• To prototype the model, a mock-up could be produced in a high-level language, or perhaps as a C library. Toy bindings for a variety of languages could then be created as proof of concept. Note that this approach will not provide actual modularity until dependency on the underlying system call interface is removed.

• A new class of process could be added to Linux, with a new set of system calls, allowing communication with “legacy” processes but offering improved modularity. These calls should functionally (but not syntactically) subsume all previous interfaces. This may be achieved through elimination of implementation-specific interfaces, and unification of naming. However, it may not be possible to offer high performance without making extensive changes to the Linux kernel.

• A tool could be developed which splits an existing program, say a monolithic application written in C, into a set of modules. This could be done by static analysis on the dependencies between object files. Note that this does not address inter-process module boundaries, and reconstruction of abstract, strongly-typed interfaces between components would be extremely difficult. However, it may be feasible in some limited cases, and is worthy of research. Some existing work on modularising monolithic code may be helpful [33, 34].

For evaluation purposes, some or all of the following will also be required.

• A rigorous definition of the kind of modularity under consideration, and one or more corresponding measures. This is remarkably difficult, and is not attempted here. Existing software measurement work, such as that of Fenton [31, 32], provides a useful starting point.

• Tools or methods to evaluate the measure on real software.

• Suitable measures for the chosen external characteristics, and the tools to measure them.

• Empirical data on the modularity (and other characteristics) of software developed using the new model, either from deliberate reimplementations of existing software or (preferably) experiments conducted on real programmers asked to develop a piece of software using the new model and a set of existing components.

5 Afterword

I am currently working on a more detailed proposal, entitled “Operating System Support for Application Modularity”, which I will forward to supplement my application in due course.
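Section 4 suggests prototyping the model as a mock-up in a high-level language. As a purely illustrative sketch of the operations named in Section 3.3 (bind(), unbind(), open(), invoke() and close()), the following Python fragment models a hereditary environment and an interface reference. Every class and function name here (Environment, InterfaceRef, InterfaceMismatch, open_interface, Counter) is an assumption made for illustration. A naming expression is modelled very crudely as either a plain name or a function of the environment, and the interface "type" is reduced to a list of required operation names.

```python
from typing import Any, Callable, Dict, Iterable, Optional, Union


class InterfaceMismatch(TypeError):
    """Raised when a named object lacks a required operation."""


class Environment:
    """Maps names to objects; lookups fall back to the parent environment."""

    def __init__(self, parent: Optional["Environment"] = None) -> None:
        self._bindings: Dict[str, Any] = {}
        self._parent = parent

    def bind(self, name: str, obj: Any) -> None:
        self._bindings[name] = obj

    def unbind(self, name: str) -> None:
        del self._bindings[name]

    def resolve(self, name: str) -> Any:
        env = self
        while env is not None:
            if name in env._bindings:
                return env._bindings[name]
            env = env._parent
        raise KeyError(name)


# Assumption: a naming expression is a plain name or a function of the environment.
NamingExpression = Union[str, Callable[[Environment], Any]]


class InterfaceRef:
    """Handle returned by open_interface(); a stand-in for a capability."""

    def __init__(self, target: Any, operations: Iterable[str]) -> None:
        self._target = target
        self._operations = frozenset(operations)
        self._is_open = True

    def invoke(self, operation: str, *args: Any) -> Any:
        if not self._is_open:
            raise ValueError("interface reference has been closed")
        if operation not in self._operations:
            raise InterfaceMismatch(operation)
        return getattr(self._target, operation)(*args)

    def close(self) -> None:
        self._is_open = False

    def __enter__(self) -> "InterfaceRef":
        return self

    def __exit__(self, *exc_info: Any) -> None:
        self.close()


def open_interface(env: Environment, expression: NamingExpression,
                   required: Iterable[str]) -> InterfaceRef:
    """Resolve the expression, then check the required operations up front."""
    target = expression(env) if callable(expression) else env.resolve(expression)
    required = list(required)
    missing = [op for op in required if not callable(getattr(target, op, None))]
    if missing:
        raise InterfaceMismatch("missing operations: " + ", ".join(missing))
    return InterfaceRef(target, required)


class Counter:
    """Example object exported into the environment."""

    def __init__(self) -> None:
        self.value = 0

    def increment(self, amount: int) -> int:
        self.value += amount
        return self.value


root = Environment()
root.bind("counter", Counter())
child = Environment(parent=root)  # inherits the parent's bindings

with open_interface(child, "counter", ["increment"]) as ref:
    total = ref.invoke("increment", 2)
```

Checking the required operations inside open_interface() mirrors the suggestion that interface assumptions be made explicit when a name is opened, so that a mismatch is reported once rather than at each later call. The context-manager methods show one way the open() and close() pair might be hidden behind a language's scope-based constructs.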

References

[1] T. Roscoe, “The Structure of a Multi-Service Operating System”, PhD thesis, University of Cambridge Computer Laboratory, April 1995.
[2] J.H. Saltzer, “Naming and Binding of Objects”, Lecture Notes in Computer Science, vol. 60, pp. 99–208, 1978.
[3] T.A. Linden, “Operating System Structures to Support Security and Reliable Software”, ACM Computing Surveys, 8(4), pp. 409–445, 1976.
[4] G.S. Blair, G. Coulson, P. Robin, M. Papathomas, “An Architecture For Next Generation Middleware”, Proceedings of the IFIP International Conference on Distributed Systems Platforms and Open Distributed Processing, 1998.
[5] W. Emmerich, “Distributed Component Technologies and their Software Engineering Implications”, Proceedings of the 24th International Conference on Software Engineering, 2002.
[6] I.M. Leslie, D. McAuley, R. Black, T. Roscoe, P. Barham, D. Evers, R. Fairbairns and E. Hyden, “The Design and Implementation of an Operating System to Support Distributed Multimedia Applications”, IEEE Journal on Selected Areas in Communications, 1996.
[7] R. Pike, D. Presotto, S. Dorward, B. Flandrena, K. Thompson, H. Trickey, P. Winterbottom, “Plan 9 From Bell Labs”, Computing Systems, 1995.
[8] M. Shaw, “Procedure Calls Are the Assembly Language of Software Interconnection: Connectors Deserve First-Class Status”, ICSE Workshop on Studies of Software Design, 1993.
[9] D.L. Parnas, “On the criteria to be used in decomposing systems into modules”, Communications of the ACM, 15(12), pp. 1053–1058, December 1972.
[10] D.R. Engler, M.F. Kaashoek, “Exterminate All Operating System Abstractions”, Proceedings of the 5th IEEE Workshop on Hot Topics in Operating Systems, 1995.
[11] D.M. Ritchie, K. Thompson, “The Unix Time-Sharing System”, Communications of the ACM, 17(7), July 1974.
[12] S.R. Kleiman, “Vnodes: An Architecture for Multiple File System Types in Sun UNIX”, USENIX Association Summer Conference Proceedings, Atlanta, 1986.
[13] J. Gosling, B. Joy, G. Steele, G. Bracha, “The Java Language Specification”, second edition, Addison Wesley, 2000.

[14] D. Garlan, R. Allen, J. Ockerbloom, “Architectural Mismatch or Why It’s Hard to Build Systems out of Existing Parts”, Proceedings of the 17th International Conference on Software Engineering, Seattle, Washington, pp. 179–185, April 1995.
[15] H. Afifi, L. Toutain, “Methods for IPv4-IPv6 transition”, Proceedings of the Fourth IEEE Symposium on Computers and Communications, p. 478, 1999.
[16] M.M. Swift, B.N. Bershad, H.M. Levy, “Improving the Reliability of Commodity Operating Systems”, in Proc. 19th Symp. on Operating Systems Principles (SOSP), October 2003.
[17] A. Fraser, “Orion: Named Flows With Access Control”, invited talk, University of Cambridge Computer Laboratory, November 2005.
[18] J. Waldo, G. Wyant, A. Wollrath, S. Kendall, “A Note On Distributed Computing”, Sun Microsystems Technical Report SMLI TR-94-29, November 1994.
[19] Object Management Group, “The Common Object Request Broker: Architecture and Specification”, Revision 1.1, OMG TC Document Number 91.12.1, December 1991.
[20] T. Lindholm, F. Yellin, “The Java virtual machine specification”, Addison Wesley, September 1996.
[21] S. Mount, R.M. Newman, Lakin, “Communication in ad hoc networks or: CORBA considered harmful”, Workshop on Building Software for Pervasive Computing at the 19th Annual ACM Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA’04), Vancouver, Canada, October 2004.
[22] B. Stroustrup, “The Design and Evolution of C++”, Addison Wesley, 1994.
[23] D. Solomon, H. Custer, “Inside Windows NT”, second edition, Microsoft Press, 1998.
[24] C. Policroniades, “Datom: A Proposal for an Alternative Storage System API”, invited talk, University of Cambridge Computer Laboratory, August 2005.
[25] B.C. Smith, “Procedural Reflection in Programming Languages”, PhD thesis, Mass. Inst. of Technology, January 1982.
[26] R.M. Needham, “Names”, chapter in S. Mullender (ed.) “Distributed Systems”, Addison Wesley, pp. 315–327, 1993.
[27] R. Pike, P.J. Weinberger, “The Hideous Name”, USENIX Summer Conference Proceedings 1985, pp. 563–568.

[28] G. Hunt, J. Larus, M. Abadi, M. Aiken, P. Barham, M. Fähndrich, C. Hawblitzel, O. Hodson, S. Levi, N. Murphy, B. Steensgaard, D. Tarditi, T. Wobber, B. Zill, “An Overview of the Singularity Project”, Microsoft Research Technical Report MSR-TR-2005-135, October 2005.
[29] B.N. Bershad, S. Savage, P. Pardyak, E.G. Sirer, M.E. Fiuczynski, D. Becker, C. Chambers, S. Eggers, “Extensibility, safety and performance in the SPIN operating system”, Proceedings of the fifteenth ACM Symposium on Operating Systems Principles, 1995.
[30] R. Milner, M. Tofte, R. Harper, D. MacQueen, “The Definition of Standard ML”, MIT Press, revised 1997.
[31] N. Fenton, “Software Measurement: A Necessary Scientific Basis”, IEEE Transactions on Software Engineering, vol. 20, issue 3, pp. 199–206, March 1994.
[32] N. Fenton, A. Melton, “Deriving structurally based software measures”, Journal of Systems and Software, vol. 12, issue 3, pp. 177–187, July 1990.
[33] L. Deri, “Droplets: Breaking Monolithic Applications Apart”, IBM Research Report RZ 2799, September 1995.
[34] R. Schwanke, “An intelligent tool for re-engineering software modularity”, Proceedings of the 13th International Conference on Software Engineering, pp. 83–92, May 1991.