C Interfaces and Implementations


Published by: Hòa Nguyễn on Jul 23, 2012
Copyright: Attribution Non-commercial


11/26/2012

Table of Contents
Copyright
Addison-Wesley Professional Computing Series
Preface
Acknowledgments

Chapter 1. Introduction
    Section 1.1. Literate Programs
    Section 1.2. Programming Style
    Section 1.3. Efficiency
    Further Reading
    Exercises

Chapter 2. Interfaces and Implementations
    Section 2.1. Interfaces
    Section 2.2. Implementations
    Section 2.3. Abstract Data Types
    Section 2.4. Client Responsibilities
    Section 2.5. Efficiency
    Further Reading
    Exercises

Chapter 3. Atoms
    Section 3.1. Interface
    Section 3.2. Implementation
    Further Reading
    Exercises

Chapter 4. Exceptions and Assertions
    Section 4.1. Interface
    Section 4.2. Implementation
    Section 4.3. Assertions
    Further Reading
    Exercises

Chapter 5. Memory Management
    Section 5.1. Interface
    Section 5.2. Production Implementation
    Section 5.3. Checking Implementation
    Further Reading
    Exercises

Chapter 6. More Memory Management
    Section 6.1. Interface
    Section 6.2. Implementation
    Further Reading
    Exercises

Chapter 7. Lists
    Section 7.1. Interface
    Section 7.2. Implementation
    Further Reading
    Exercises

Chapter 8. Tables
    Section 8.1. Interface
    Section 8.2. Example: Word Frequencies
    Section 8.3. Implementation
    Further Reading
    Exercises

Chapter 9. Sets
    Section 9.1. Interface
    Section 9.2. Example: Cross-Reference Listings
    Section 9.3. Implementation
    Further Reading
    Exercises

Chapter 10. Dynamic Arrays
    Section 10.1. Interfaces
    Section 10.2. Implementation
    Further Reading
    Exercises

Chapter 11. Sequences
    Section 11.1. Interface
    Section 11.2. Implementation
    Further Reading
    Exercises

Prepared for frliu@microsoft.com, Frank Liu Copyright © 1997 by David R. Hanson. This download file is made available for personal use only and is subject to the Terms of Service. Any other use requires prior written consent from the copyright owner. Unauthorized use, reproduction and/or distribution are strictly prohibited and violate applicable laws. All rights reserved.

Chapter 12. Rings
    Section 12.1. Interface
    Section 12.2. Implementation
    Further Reading
    Exercises

Chapter 13. Bit Vectors
    Section 13.1. Interface
    Section 13.2. Implementation
    Further Reading
    Exercises

Chapter 14. Formatting
    Section 14.1. Interface
    Section 14.2. Implementation
    Further Reading
    Exercises

Chapter 15. Low-Level Strings
    Section 15.1. Interface
    Section 15.2. Example: Printing Identifiers
    Section 15.3. Implementation
    Further Reading
    Exercises

Chapter 16. High-Level Strings
    Section 16.1. Interface
    Section 16.2. Implementation
    Further Reading
    Exercises

Chapter 17. Extended-Precision Arithmetic
    Section 17.1. Interface
    Section 17.2. Implementation
    Further Reading
    Exercises

Chapter 18. Arbitrary-Precision Arithmetic
    Section 18.1. Interface
    Section 18.2. Example: A Calculator
    Section 18.3. Implementation
    Further Reading
    Exercises

Chapter 19. Multiple-Precision Arithmetic
    Section 19.1. Interface
    Section 19.2. Example: Another Calculator
    Section 19.3. Implementation
    Further Reading
    Exercises

Chapter 20. Threads
    Section 20.1. Interfaces
    Section 20.2. Examples
    Section 20.3. Implementations
    Further Reading
    Exercises

Interface Summary
    AP
    Arena
    Arith
    Array
    ArrayRep
    Assert
    Atom
    Bit
    Chan
    Except
    Fmt
    List
    Mem
    MP
    Ring
    Sem
    Seq
    Set
    Stack
    Str
    Table
    Text
    Thread
    XP

Bibliography....................................................................................................................... 508
...................................................................................................................................................................................................................................................... 515

bvdindexIndex..................................................................................................................... 515

Addison-Wesley Professional Computing Series
Brian W. Kernighan, Consulting Editor

Ken Arnold/John Peyton, A C User’s Guide to ANSI C
Tom Cargill, C++ Programming Style
William R. Cheswick/Steven M. Bellovin, Firewalls and Internet Security: Repelling the Wily Hacker
David A. Curry, UNIX® System Security: A Guide for Users and System Administrators
Erich Gamma/Richard Helm/Ralph Johnson/John Vlissides, Design Patterns: Elements of Reusable Object-Oriented Software
David R. Hanson, C Interfaces and Implementations: Techniques for Creating Reusable Software
John Lakos, Large Scale C++ Software Design
Scott Meyers, Effective C++: 50 Specific Ways to Improve Your Programs and Designs
Scott Meyers, More Effective C++: 35 New Ways to Improve Your Programs and Designs
Robert B. Murray, C++ Strategies and Tactics
David R. Musser/Atul Saini, STL Tutorial and Reference Guide: C++ Programming with the Standard Template Library
John K. Ousterhout, Tcl and the Tk Toolkit
Craig Partridge, Gigabit Networking
J. Stephen Pendergrast Jr., Desktop KornShell Graphical Programming
Radia Perlman, Interconnections: Bridges and Routers
David M. Piscitello/A. Lyman Chapin, Open Systems Networking: TCP/IP and OSI
Stephen A. Rago, UNIX® System V Network Programming
Curt Schimmel, UNIX® Systems for Modern Architectures: Symmetric Multiprocessing and Caching for Kernel Programmers
W. Richard Stevens, Advanced Programming in the UNIX® Environment
W. Richard Stevens, TCP/IP Illustrated, Volume 1: The Protocols
W. Richard Stevens, TCP/IP Illustrated, Volume 3: TCP for Transactions, HTTP, NNTP, and the UNIX Domain Protocols
Gary R. Wright/W. Richard Stevens, TCP/IP Illustrated, Volume 2: The Implementation

C INTERFACES
AND IMPLEMENTATIONS

Techniques for Creating Reusable Software
David R. Hanson
Princeton University

ADDISON-WESLEY
An imprint of Addison Wesley Longman, Inc. Reading, Massachusetts • Harlow, England • Menlo Park, California Berkeley, California • Don Mills, Ontario • Sydney Bonn • Amsterdam • Tokyo • Mexico City

This book was prepared from camera-ready copy supplied by the author.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book and Addison Wesley Longman, Inc. was aware of a trademark claim, the designations have been printed in initial caps or all caps.

The authors and publishers have taken care in the preparation of this book, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein.

The publisher offers discounts on this book when ordered in quantity for special sales. For more information, please contact:

Corporate & Professional Publishing Group
Addison Wesley Longman, Inc.
One Jacob Way
Reading, Massachusetts 01867

Library of Congress Cataloging-in-Publication Data

Hanson, David R.
C interfaces and implementations : techniques for creating reusable software / David R. Hanson.
p. cm. -- (Addison-Wesley professional computing series)
Includes bibliographical references and index.
ISBN 0-201-49841-3 (pbk.)
1. C (Computer program language) 2. Computer software--Reusability I. Title. II. Series.
QA76.73.C15H37 1996
005.13'3--dc20 96-28817 CIP

Copyright © 1997 by David R. Hanson. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form, or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of the publisher. Printed in the United States of America. Published simultaneously in Canada.

Text design by Wilson Graphics & Design (Kenneth J. Wilson).

Text printed on recycled and acid-free paper.

ISBN 0-201-49841-3
2 3 4 5 6 7 8 9 10-MA-00999897
Second printing, January 1997

PREFACE

Programmers are inundated with information about application programming interfaces, or APIs. Yet, while most programmers use APIs and the libraries that implement them in almost every application they write, relatively few create and disseminate new, widely applicable APIs. Indeed, programmers seem to prefer to “roll their own” instead of searching for a library that might meet their needs, perhaps because it is easier to write application-specific code than to craft well-designed APIs.

I’m as guilty as the next programmer: lcc, a compiler for ANSI/ISO C written by Chris Fraser and myself, was built from the ground up. (lcc is described in A Retargetable C Compiler: Design and Implementation, Addison-Wesley, 1995.) A compiler exemplifies the kind of application for which it is possible to use standard interfaces and to create interfaces that are useful elsewhere. Examples include interfaces for memory management, string and symbol tables, and list manipulation. But lcc uses only a few routines from the standard C library, and almost none of its code can be used directly in other applications.

This book advocates a design methodology based on interfaces and their implementations, and it illustrates this methodology by describing 24 interfaces and their implementations in detail. These interfaces span a large part of the computing spectrum and include data structures, arithmetic, string processing, and concurrent programming. The implementations aren’t toys; they’re designed for use in production code. As described below, the source code is freely available.

There’s little support in the C programming language for the interface-based design methodology. Object-oriented languages, like C++ and Modula-3, have language features that encourage the separation of an interface from its implementation. Interface-based design is independent of any particular language, but it does require more programmer willpower and vigilance in languages like C, because it’s too easy to pollute an interface with implicit knowledge of its implementation and vice versa.

Once mastered, however, interface-based design can speed development time by building upon a foundation of general-purpose interfaces that can serve many applications. The foundation class libraries in some C++ environments are examples of this effect. Increased reuse of existing software (libraries of interface implementations) reduces initial development costs. It also reduces maintenance costs, because more of an application rests on well-tested implementations of general-purpose interfaces.

The 24 interfaces come from several sources, and all have been revised for this book. Some of the interfaces for data structures (abstract data types) originated in lcc code, and in implementations of the Icon programming language done in the late 1970s and early 1980s (see R. E. Griswold and M. T. Griswold, The Icon Programming Language, Prentice Hall, 1990). Others come from the published work of other programmers; the “Further Reading” sections at the end of each chapter give the details.

Some of the interfaces are for data structures, but this is not a data structures book, per se. The emphasis is more on algorithm engineering (packaging data structures for general use in applications) than on data-structure algorithms. Good interface design does rely on appropriate data structures and efficient algorithms, however, so this book complements traditional data structure and algorithms texts like Robert Sedgewick’s Algorithms in C (Addison-Wesley, 1990).

Most chapters describe one interface and its implementation; a few describe related interfaces. The “Interface” section in each chapter gives a concise, detailed description of the interface alone. For programmers interested only in the interfaces, these sections form a reference manual. A few chapters include “Example” sections, which illustrate the use of one or more interfaces in simple applications.

The “Implementation” section in each chapter is a detailed tour of the code that implements the chapter’s interface. In a few cases, more than one implementation for the same interface is described, which illustrates an advantage of interface-based design. These sections are most useful for those modifying or extending an interface or designing related interfaces. Many of the exercises explore design and implementation alternatives. It should not be necessary to read an “Implementation” section in order to understand how to use an interface.

The interfaces, examples, and implementations are presented as literate programs; that is, the source code is interleaved with its explanation in an order that best suits understanding the code. The code is extracted automatically from the text files for this book and assembled into the
order dictated by the C programming language. Other book-length examples of literate programming in C include A Retargetable C Compiler and The Stanford GraphBase: A Platform for Combinatorial Computing by D. E. Knuth (Addison-Wesley, 1993).

Organization
The material in this book falls into the following broad categories:
Foundations
    1. Introduction
    2. Interfaces and Implementations
    4. Exceptions and Assertions
    5. Memory Management
    6. More Memory Management

Data Structures
    7. Lists
    8. Tables
    9. Sets
    10. Dynamic Arrays
    11. Sequences
    12. Rings
    13. Bit Vectors

Strings
    3. Atoms
    14. Formatting
    15. Low-Level Strings
    16. High-Level Strings

Arithmetic
    17. Extended-Precision Arithmetic
    18. Arbitrary-Precision Arithmetic
    19. Multiple-Precision Arithmetic

Threads
    20. Threads

Most readers will benefit from reading all of Chapters 1 through 4, because these chapters form the framework for the rest of the book. The remaining chapters can be read in any order, although some of the later chapters refer to their predecessors.

Chapter 1 covers literate programming and issues of programming style and efficiency. Chapter 2 motivates and describes the interface-based design methodology, defines the relevant terminology, and tours two simple interfaces and their implementations. Chapter 3 describes the prototypical Atom interface, which is the simplest production-quality interface in this book. Chapter 4 introduces exceptions and assertions, which are used in every interface. Chapters 5 and 6 describe the memory management interfaces used by almost all the implementations. The rest of the chapters each describe an interface and its implementation.

Instructional Use
I assume that readers understand C at the level covered in undergraduate introductory programming courses, and have a working understanding of fundamental data structures at the level presented in texts like Algorithms in C. At Princeton, the material in this book is used in systems programming courses from the sophomore to first-year graduate levels. Many of the interfaces use advanced C programming techniques, such as opaque pointers and pointers to pointers, and thus serve as nontrivial examples of those techniques, which are useful in systems programming and data structure courses.

This book can be used for courses in several ways, the simplest being in project-oriented courses. In a compiler course, for example, students often build a compiler for a toy language. Substantial projects are common in graphics courses as well. Many of the interfaces can simplify the projects in these kinds of courses by eliminating some of the grunt programming needed to get such projects off the ground. This usage helps students realize the enormous savings that reuse can bring to a project, and it often induces them to try interface-based design for their own parts of the project. This latter effect is particularly valuable in team projects, because that’s a way of life in the “real world.”

Interfaces and implementations are the focus of Princeton’s sophomore-level systems programming course. Assignments require students to be interface clients, implementors, and designers. In one assignment, for example, I distribute Section 8.1’s Table interface, the object code for its implementation, and the specifications for Section 8.2’s word frequency program, wf. The students must implement wf using only my object code for Table. In the next assignment, they get the object code for wf, and they must implement Table. Sometimes, I reverse these assignments, but both orders are eye-openers for most students. They are unaccustomed to having only object code for major parts of their program, and these assignments are usually their first exposure to the semiformal notation used in interfaces and program specification.
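The client's position in these assignments, holding an interface and object code but no source, rests on the opaque-pointer technique mentioned above. The following is only a minimal single-file sketch of that technique; Counter_T and its functions are hypothetical names invented for illustration and are not one of the interfaces in this book. In practice the first part would live in a header and the second in a separately compiled source file.

```c
#include <assert.h>
#include <stdlib.h>

/* What a client would see (normally a header file).
   Counter_T is a hypothetical type, not one of the book's interfaces. */
typedef struct Counter_T *Counter_T;   /* opaque: the fields are hidden */
Counter_T Counter_new(void);
int Counter_next(Counter_T c);
void Counter_free(Counter_T *cp);

/* What the implementor keeps private (normally a separate .c file). */
struct Counter_T {
    int count;
};

/* Allocates a counter whose first value will be 1. */
Counter_T Counter_new(void) {
    Counter_T c = malloc(sizeof *c);

    assert(c != NULL);
    c->count = 0;
    return c;
}

/* Advances the counter and returns its new value. */
int Counter_next(Counter_T c) {
    assert(c != NULL);
    return ++c->count;
}

/* Takes a pointer to the handle so it can clear the caller's copy. */
void Counter_free(Counter_T *cp) {
    assert(cp != NULL && *cp != NULL);
    free(*cp);
    *cp = NULL;
}
```

A client that calls Counter_new, Counter_next, and Counter_free never mentions the count field, so the implementor is free to change the representation without recompiling client code. Counter_free also shows the pointer-to-pointer idiom: it receives the address of the handle so that the handle can be set to NULL after the storage is released.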

Initial assignments also introduce checked runtime errors and assertions as integral parts of interface specifications. Again, it takes a few assignments before students begin to appreciate the value of these concepts. I forbid “unannounced” crashes; that is, crashes that are not announced by an assertion failure diagnostic. Programs that crash get a grade of zero. This penalty may seem unduly harsh, but it gets the students’ attention. They also gain an appreciation of the advantages of safe languages, like ML and Modula-3, in which unannounced crashes are impossible. (This grading policy is less harsh than it sounds, because in multipart assignments, only the offending part is penalized, and different assignments have different weights. I’ve given many zeros, but none has ever caused a course grade to shift by a whole point.)

Once students have a few interfaces under their belts, later assignments ask them to design new interfaces and to live with their design choices. For example, one of Andrew Appel’s favorite assignments is a primality testing program. Students work in groups to design the interfaces for the arbitrary-precision arithmetic that is needed for this assignment. The results are similar to the interfaces described in Chapters 17 through 19. Different groups design interfaces, and a postassignment comparison of these interfaces, in which the groups critique one another’s work, is always quite revealing. Kai Li accomplishes similar goals with a semester-long project that builds an X-based editor using the Tcl/Tk system (J. K. Ousterhout, Tcl and the Tk Toolkit, Addison-Wesley, 1994) and editor-specific interfaces designed and implemented by the students. Tk itself provides another good example of interface-based design.

In advanced courses, I usually package assignments as interfaces and give the students free rein to revise and improve on them, and even to change the goals of the assignment. Giving them a starting point reduces the time required for the assignment, and allowing substantial changes encourages creative students to explore alternatives. The unsuccessful alternatives are often more educational than the successful ones. Students invariably go down the wrong road, and they pay for it with greatly increased development time. When, in hindsight, they understand their mistakes, they come to appreciate that designing good interfaces is hard, but worth the effort, and they almost always become converts to interface-based design.
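The distinction between an announced and an unannounced crash can be made concrete with a small sketch. It uses the standard C assert macro rather than the exception and assertion interfaces that Chapter 4 introduces, and array_max is a hypothetical function invented for this illustration. Its specification would say that passing a null array or a nonpositive count is a checked runtime error.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical interface function: it is a checked runtime error
   to pass a null array or a count less than one. */
int array_max(const int *a, int n) {
    int i, max;

    assert(a != NULL);   /* a violation is reported, not left to crash later */
    assert(n > 0);
    max = a[0];
    for (i = 1; i < n; i++)
        if (a[i] > max)
            max = a[i];
    return max;
}
```

With assertions enabled, a call such as array_max(NULL, 3) stops the program with a diagnostic that names the failed condition and its source location, which is the kind of announcement the grading policy above asks for. Without the checks, the same call would read through a null pointer and the behavior would be undefined.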

How to Get the Software
The software in this book has been tested on the following platforms:

processor     operating systems             compilers
SPARC         SunOS 4.1                     lcc 3.5, gcc 2.7.2
Alpha         OSF/1 3.2A                    lcc 4.0, gcc 2.6.3, cc
MIPS R3000    IRIX 5.3                      lcc 3.5, gcc 2.6.3, cc
MIPS R3000    Ultrix 4.3                    lcc 3.5, gcc 2.5.7
Pentium       Windows 95, Windows NT 3.51   Microsoft Visual C/C++ 4.0

A few of the implementations are machine-specific; that is, they assume that the machine has two’s-complement integer and IEEE floating-point arithmetic, and that unsigned longs can hold object pointers.

The source code for everything in this book is available for anonymous ftp at ftp.cs.princeton.edu in pub/packages/cii. Use an ftp client to connect to ftp.cs.princeton.edu, change to the directory pub/packages/cii, and download the file README, which describes the contents of the directory and how to download the distribution.

The most recent distributions are usually in files with names like ciixy.tar.gz or ciixy.zip, where xy is the version number; for example, 10 is version 1.0. ciixy.tar.gz is a UNIX tar file compressed with gzip, and ciixy.zip is a ZIP file compatible with PKZIP version 2.04g. The files in ciixy.zip are DOS/Windows text files; that is, their lines end with carriage returns and linefeeds. ciixy.zip may also be available on America Online, CompuServe, and other online services.

Information is also available on the World Wide Web at the URL http://www.cs.princeton.edu/software/cii/. This page includes instructions on reporting bugs.
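The three machine assumptions listed in this section can be checked on a given platform before building the software. The sketch below is not part of the distribution; the function names are hypothetical, and the floating-point test only checks the parameters of the double format as reported by float.h, which suggests IEEE double precision without proving full conformance.

```c
#include <assert.h>
#include <float.h>
#include <limits.h>

/* Returns 1 if signed integers use two's-complement representation.
   In two's complement, -1 has every bit set, so its low two bits are 11. */
int has_twos_complement(void) {
    return (-1 & 3) == 3;
}

/* Returns 1 if an unsigned long is wide enough to hold an object pointer. */
int ulong_holds_pointer(void) {
    return sizeof (unsigned long) >= sizeof (void *);
}

/* Returns 1 if double has the radix, precision, and exponent range
   of an IEEE double-precision value. */
int has_ieee_double(void) {
    return FLT_RADIX == 2 && DBL_MANT_DIG == 53 && DBL_MAX_EXP == 1024;
}
```

The second check deserves attention on current systems: on 64-bit Windows, for example, unsigned long is 32 bits wide while pointers are 64 bits wide, so ulong_holds_pointer returns zero there.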

Acknowledgments
I have been using some of the interfaces in this book for my own research projects and in courses at the University of Arizona and Princeton University since the late 1970s. Students in these courses have been guinea pigs for my drafts of these interfaces. Their feedback over the years has been an important contribution to both the code in this book and its explanation. The Princeton students in several offerings of COS 217 and COS 596 deserve special thanks, because they suffered unknowingly through the drafts of most of what’s in this book.

Interfaces are a way of life at Digital’s System Research Center (SRC), and my 1992 and 1993 summers at SRC working on the Modula-3 project erased any doubts I may have harbored about the efficacy of this approach. My thanks to SRC for supporting my visits, and to Bill Kalsow, Eric Muller, and Greg Nelson for many illuminating discussions.

My thanks to IDA’s Centers for Communications Research in Princeton and La Jolla for their support during the summer of 1994 and during my 1995–96 sabbatical. The CCRs provided ideal hideouts at which to plan and complete this book.

Technical interactions with colleagues and students have contributed to this book in many ways. Even seemingly unrelated discussions have provoked improvements in my code and in its explanation. Thanks to Andrew Appel, Greg Astfalk, Jack Davidson, John Ellis, Mary Fernández, Chris Fraser, Alex Gounares, Kai Li, Jacob Navia, Maylee Noah, Rob Pike, Bill Plauger, John Reppy, Anne Rogers, and Richard Stevens. Careful readings of my code and prose by Rex Jaeschke, Brian Kernighan, Taj Khattra, Richard O’Keefe, Norman Ramsey, and David Spuler made a significant contribution to the quality of both.

David R. Hanson

1 INTRODUCTION

A big program is made up of many small modules. These modules provide the functions, procedures, and data structures used in the program. Ideally, most of these modules are ready-made and come from libraries; only those that are specific to the application at hand need to be written from scratch. Assuming that library code has been tested thoroughly, only the application-specific code will contain bugs, and debugging can be confined to just that code.

Unfortunately, this theoretical ideal rarely occurs in practice. Most programs are written from scratch, and they use libraries only for the lowest level facilities, such as I/O and memory management. Programmers often write application-specific code for even these kinds of low-level components; it’s common, for example, to find applications in which the C library functions malloc and free have been replaced by custom memory-management functions.

There are undoubtedly many reasons for this situation; one of them is that widely available libraries of robust, well-designed modules are rare. Some of the libraries that are available are mediocre and lack standards. The C library has been standardized since 1989, and is only now appearing on most platforms.

Another reason is size: Some libraries are so big that mastering them is a major undertaking. If this effort even appears to be close to the effort required to write the application, programmers may simply reimplement the parts of the library they need. User-interface libraries, which have proliferated recently, often exhibit this problem.

Library design and implementation are difficult. Designers must tread carefully between generality, simplicity, and efficiency. If the routines and data structures in a library are too general, they may be too hard to
use or inefficient for their intended purposes. If they’re too simple, they run the risk of not satisfying the demands of applications that might use them. If they’re too confusing, programmers won’t use them. The C library itself provides a few examples; its realloc function, for instance, is a marvel of confusion.

Library implementors face similar hurdles. Even if the design is done well, a poor implementation will scare off users. If an implementation is too slow or too big, or just perceived to be so, programmers will design their own replacements. Worst of all, if an implementation has bugs, it shatters the ideal outlined above and renders the library useless.

This book describes the design and implementation of a library that is suitable for a wide range of applications written in the C programming language. The library exports a set of modules that provide functions and data structures for “programming-in-the-small.” These modules are suitable for use as “piece parts” in applications or application components that are a few thousand lines long.

Most of the facilities described in the subsequent chapters are those covered in undergraduate courses on data structures and algorithms. But here, more attention is paid to how they are packaged and to making them robust. Each module is presented as an interface and its implementation. This design methodology, explained in Chapter 2, separates module specifications from their implementations, promotes clarity and precision in those specifications, and helps provide robust implementations.

1.1 Literate Programs
This book describes modules not by prescription, but by example. Each chapter describes one or two interfaces and their implementations in full. These descriptions are presented as literate programs. The code for an interface and its implementation is intertwined with prose that explains it. More important, each chapter is the source code for the interfaces and implementations it describes. The code is extracted automatically from the source text for this book; what you see is what you get.

A literate program is composed of English prose and labeled chunks of program code. For example,

    ⟨compute x • y⟩≡
        sum = 0;
        for (i = 0; i < n; i++)
            sum += x[i]*y[i];

defines a chunk named ⟨compute x • y⟩; its code computes the dot product of the arrays x and y. This chunk is used by referring to it in another chunk:

    ⟨function dotproduct⟩≡
        int dotProduct(int x[], int y[], int n) {
            int i, sum;

            ⟨compute x • y⟩
            return sum;
        }

When the chunk ⟨function dotproduct⟩ is extracted from the file that holds this chapter, its code is copied verbatim, uses of chunks are replaced by their code, and so on. The result of extracting ⟨function dotproduct⟩ is a file that holds just the code:

    int dotProduct(int x[], int y[], int n) {
        int i, sum;

        sum = 0;
        for (i = 0; i < n; i++)
            sum += x[i]*y[i];
        return sum;
    }

A literate program can be presented in small pieces and documented thoroughly. English prose subsumes traditional program comments, and isn’t limited by the comment conventions of the programming language.

The chunk facility frees literate programs from the ordering constraints imposed by programming languages. The code can be revealed in whatever order is best for understanding it, not in the order dictated by rules that insist, for example, that definitions of program entities precede their uses.

The literate-programming system used in this book has a few more features that help describe programs piecemeal. To illustrate these features and to provide a complete example of a literate C program, the rest
4 INTRODUCTION

of this section describes double, a program that detects adjacent identical words in its input, such as “the the.” For example, the UNIX command

    % double intro.txt inter.txt
    intro.txt:10: the
    inter.txt:110: interface
    inter.txt:410: type
    inter.txt:611: if

shows that “the” occurs twice in the file intro.txt; the second occurrence appears on line 10; and double occurrences of “interface,” “type,” and “if” appear in inter.txt at the lines shown. If double is invoked with no arguments, it reads its standard input and omits the file names from its output. For example:

    % cat intro.txt inter.txt | double
    10: the
    143: interface
    343: type
    544: if

In these and other displays, commands typed by the user are shown in a slanted typewriter font, and the output is shown in a regular typewriter font.

Let’s start double by defining a root chunk that uses other chunks for each of the program’s components:

    ⟨double.c 4⟩≡
        ⟨includes 5⟩
        ⟨data 6⟩
        ⟨prototypes 6⟩
        ⟨functions 5⟩

By convention, the root chunk is labeled with the program’s file name; extracting the chunk ⟨double.c 4⟩ extracts the program. The other chunks are labeled with double’s top-level components. These components are listed in the order dictated by the C programming language, but they can be presented in any order.

The 4 in ⟨double.c 4⟩ is the page number on which the definition of the chunk begins. The numbers in the chunks used in ⟨double.c 4⟩ are the

LITERATE PROGRAMS 5

page numbers on which their definitions begin. These page numbers help readers navigate the code.

The main function handles double’s arguments. It opens each file and calls doubleword to scan the file:

    ⟨functions 5⟩≡
        int main(int argc, char *argv[]) {
            int i;

            for (i = 1; i < argc; i++) {
                FILE *fp = fopen(argv[i], "r");
                if (fp == NULL) {
                    fprintf(stderr, "%s: can't open '%s' (%s)\n",
                        argv[0], argv[i], strerror(errno));
                    return EXIT_FAILURE;
                } else {
                    doubleword(argv[i], fp);
                    fclose(fp);
                }
            }
            if (argc == 1) doubleword(NULL, stdin);
            return EXIT_SUCCESS;
        }

    ⟨includes 5⟩≡
        #include <stdio.h>
        #include <stdlib.h>
        #include <errno.h>

The function doubleword needs to read words from a file. For the purposes of this program, a word is one or more nonspace characters, and case doesn’t matter. getword reads the next word from an opened file into buf[0..size-1] and returns one; it returns zero when it reaches the end of file.

    ⟨functions 5⟩+≡
        int getword(FILE *fp, char *buf, int size) {
            int c;

            c = getc(fp);
            ⟨scan forward to a nonspace character or EOF 6⟩

6 INTRODUCTION

            ⟨copy the word into buf[0..size-1] 7⟩
            if (c != EOF)
                ungetc(c, fp);
            return ⟨found a word? 7⟩;
        }

    ⟨prototypes 6⟩≡
        int getword(FILE *, char *, int);

This chunk illustrates another literate programming feature: The +≡ that follows the chunk labeled ⟨functions 5⟩ indicates that the code for getword is appended to the code for the chunk ⟨functions 5⟩, so that chunk now holds the code for main and for getword. This feature permits the code in a chunk to be doled out a little at a time. The page number in the label for a continued chunk refers to the first definition for the chunk, so it’s easy to find the beginning of a chunk’s definition.

Since getword follows main, the call to getword in main needs a prototype, which is the purpose of the ⟨prototypes 6⟩ chunk. This chunk is something of a concession to C’s declaration-before-use rule, but if it is defined consistently and appears before ⟨functions 5⟩ in the root chunk, then functions can be presented in any order.

In addition to plucking the next word from the input, getword increments linenum whenever it runs across a new-line character. doubleword uses linenum when it emits its output.

    ⟨data 6⟩≡
        int linenum;

    ⟨scan forward to a nonspace character or EOF 6⟩≡
        for ( ; c != EOF && isspace(c); c = getc(fp))
            if (c == '\n')
                linenum++;

    ⟨includes 5⟩+≡
        #include <ctype.h>

The definition of linenum exemplifies chunks that are presented in an order different from what is required by C. linenum is given here, when it is first used, instead of at the top of the file or before the definition of getword, which is where C insists that it be defined.

LITERATE PROGRAMS 7

The value of size is the limit on the length of words stored by getword, which discards the excess characters and folds uppercase letters to lowercase:

    ⟨copy the word into buf[0..size-1] 7⟩≡
        {
            int i = 0;

            for ( ; c != EOF && !isspace(c); c = getc(fp))
                if (i < size - 1)
                    buf[i++] = tolower(c);
            if (i < size)
                buf[i] = '\0';
        }

The index i is compared to size - 1 to guarantee there’s room to store a null character at the end of the word. The if statement protecting this assignment handles the case when size is zero. This case won’t occur in double, but this kind of defensive programming helps catch “can’t happen” bugs.

All that remains is for getword to return one if buf holds a word, and zero otherwise:

    ⟨found a word? 7⟩≡
        buf[0] != '\0'

This definition shows that chunks don’t have to correspond to statements or to any other syntactic unit of C; they’re simply text.

doubleword reads each word, compares it with the previous word, and complains about duplicates. It looks only at words that begin with letters:

    ⟨functions 5⟩+≡
        void doubleword(char *name, FILE *fp) {
            char prev[128], word[128];

            linenum = 1;
            prev[0] = '\0';
            while (getword(fp, word, sizeof word)) {
                if (isalpha(word[0]) && strcmp(prev, word)==0)
                    ⟨word is a duplicate 8⟩
                strcpy(prev, word);

8 INTRODUCTION

            }
        }

    ⟨prototypes 6⟩+≡
        void doubleword(char *, FILE *);

    ⟨includes 5⟩+≡
        #include <string.h>

Emitting the output is easy, but the file name and its trailing colon are printed only if name isn’t null:

    ⟨word is a duplicate 8⟩≡
        {
            if (name)
                printf("%s:", name);
            printf("%d: %s\n", linenum, word);
        }

This chunk is defined as a compound statement so that it can appear as the consequent of the if statement in which it is used.

1.2 Programming Style

double illustrates many of the stylistic conventions used for the programs in this book. It is more important for programs to be read easily and understood by people than it is for them to be compiled easily by computers. The compiler doesn’t care about the names chosen for variables, how the code is laid out, or how the program is divided into modules. But these kinds of details can have enormous impact on how easily programmers can read and understand a program.

The code in this book follows established stylistic conventions for C programs. It uses consistent conventions for naming variables, types, and routines, and, to the extent permitted by the typographical constraints imposed by this book, a consistent indentation style. Stylistic conventions are not a rigid set of rules that must be followed at all costs; rather, they express a philosophical approach to programming that seeks to maximize readability and understanding. Thus, the “rules” are broken whenever varying the conventions helps to emphasize important facets of the code or makes complicated code more readable.

PROGRAMMING STYLE 9

In general, longer, evocative names are used for global variables and routines, and short names, which may mirror common mathematical notation, are used for local variables. The loop index i in ⟨compute x • y⟩ is an example of the latter convention. Using longer names for indices and variables that are used for similarly traditional purposes usually makes the code harder to read; for example, in

    sum = 0;
    for (theindex = 0; theindex < numofElements; theindex++)
        sum += x[theindex]*y[theindex];

the variable names obscure what the code does.

Variables are declared near their first use, perhaps in chunks. The declaration of linenum near its first use in getword is an example. Locals are declared at the beginning of the compound statements in which they are used, when possible. An example is the declaration of i in ⟨copy the word into buf[0..size-1] 7⟩.

In general, the names of procedures and functions are chosen to reflect what the procedures do and what the functions return. Thus getword returns the next word in the input and doubleword finds and announces words that occur two or more times. Most routines are short, no more than a page of code; chunks are even shorter, usually less than a dozen lines.

There are almost no comments in the code because the prose surrounding the chunks that comprise the code takes their place. Stylistic advice on commenting conventions can evoke nearly religious wars among programmers. This book follows the lead of classics in C programming, in which comments are kept to a minimum. Code that is clear and that uses good naming and indentation conventions usually explains itself. Comments are called for only to explain, for example, the details of data structures, special cases in algorithms, and exceptional conditions. Compilers can’t check that comments and code agree; misleading comments are usually worse than no comments. Finally, some comments are just clutter; those in which the noise and excess typography drown out the content do nothing but smother the code.

Literate programming avoids many of the battles that occur in comment wars because it isn’t constrained by the comment mechanisms of the programming language. Programmers can use whatever typographical features are best for conveying their intentions, including tables, equations, pictures, and citations. Literate programming seems to encourage accuracy, precision, and clarity.

10 INTRODUCTION

The code in this book is written in C; it uses most of the idioms commonly accepted (and expected) by experienced C programmers. Some of these idioms can confuse programmers new to C, but they must master them to become fluent in C. Idioms involving pointers are often the most confusing because C provides several unique and expressive operators for manipulating pointers. The library function strcpy, which copies one string to another and returns the destination string, illustrates the differences between “idiomatic C” and code written by newcomers to C; the latter kind of code often uses arrays:

    char *strcpy(char dst[], const char src[]) {
        int i;

        for (i = 0; src[i] != '\0'; i++)
            dst[i] = src[i];
        dst[i] = '\0';
        return dst;
    }

The idiomatic version uses pointers:

    char *strcpy(char *dst, const char *src) {
        char *s = dst;

        while (*dst++ = *src++)
            ;
        return s;
    }

Both versions are reasonable implementations of strcpy. The pointer version uses the common idiom that combines assignment, incrementing a pointer, and testing the result of the assignment into the single assignment expression. It also modifies its arguments, dst and src, which is acceptable in C because all arguments are passed by value; arguments are just initialized locals.

A good case can be made for preferring the array version to the pointer version. For example, the array version is easier for all programmers to understand, regardless of their fluency in C. But the pointer version is the one most experienced C programmers would write, and hence the one programmers are most likely to encounter when reading existing

EFFICIENCY 11

code. This book can help you learn these idioms, understand C’s strong points, and avoid common pitfalls.

1.3 Efficiency

Programmers seem obsessed with efficiency. They can spend hours tweaking code to make it run faster. Unfortunately, much of this effort is wasted. Programmers’ intuitions are notoriously bad at guessing where programs spend their time.

Tuning a program to make it faster almost always makes it bigger, more difficult to understand, and more likely to contain errors. There’s no point in such tuning unless measurements of execution time show that the program is too slow. A program needs only to be fast enough, not necessarily as fast as possible.

Tuning is often done in a vacuum. If a program is too slow, the only way to find its bottlenecks is to measure it. A program’s bottlenecks rarely occur where you expect them or for the reasons you suspect, and there’s no point in tuning programs in the wrong places. When you’ve found the right place, tuning is called for only if the time spent in that place is a significant amount of the running time. It’s pointless to save 1 percent in a search routine if I/O accounts for 60 percent of the program’s running time.

Tuning often introduces errors. The fastest program to a crash isn’t a winner. Reliability is more important than efficiency; delivering fast software that crashes is more expensive in the long run than delivering reliable software that’s fast enough.

Tuning is often done at the wrong level. Straightforward implementations of inherently fast algorithms are better than hand-tuned implementations of slow algorithms. For example, squeezing instructions out of the inner loop of a linear search is doomed to be less profitable than using a binary search in the first place.

Tuning can’t fix a bad design. If the program is slow everywhere, the inefficiency is probably built into the design. This unfortunate situation occurs when designs are drawn from poorly written or imprecise problem specifications, or when there’s no overall design at all.

Most of the code in this book uses efficient algorithms that have good average-case performance and whose worst-case performance is easy to characterize. Their execution times on typical inputs will almost always be fast enough for most applications. Those cases where performance might pose problems in some applications are clearly identified.
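The advice to measure first, and to prefer a better algorithm over a tuned inner loop, can be tried with a small experiment. The following is only a sketch: the names linear_search, binary_search, search_fn, and time_search are hypothetical and do not come from this book, and the timing relies on the standard clock function, which reports processor time at coarse resolution. A real investigation would more likely use a profiler.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical type for the two search routines compared below. */
typedef long (*search_fn)(const int a[], size_t n, int key);

/* Linear search: index of key in a[0..n-1], or -1 if absent. */
long linear_search(const int a[], size_t n, int key) {
    size_t i;

    for (i = 0; i < n; i++)
        if (a[i] == key)
            return (long)i;
    return -1;
}

/* Binary search: same result, but a[0..n-1] must be sorted
   in increasing order. */
long binary_search(const int a[], size_t n, int key) {
    size_t lo = 0, hi = n;      /* the half-open range [lo, hi) */

    while (lo < hi) {
        size_t mid = lo + (hi - lo)/2;
        if (a[mid] < key)
            lo = mid + 1;
        else if (a[mid] > key)
            hi = mid;
        else
            return (long)mid;
    }
    return -1;
}

/* Calls search(a, n, key) reps times and returns the processor
   time used, in seconds. The sum of the results is stored in
   *checksum so that the calls have an observable effect. */
double time_search(search_fn search, const int a[], size_t n,
                   int key, long reps, long *checksum) {
    clock_t start = clock();
    long r, sum = 0;

    for (r = 0; r < reps; r++)
        sum += search(a, n, key);
    *checksum = sum;
    return (double)(clock() - start)/CLOCKS_PER_SEC;
}
```

Timing both routines on arrays of realistic size, with the same keys, shows whether the search is worth attention at all before any effort goes into tuning it.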

12 INTRODUCTION

Some C programmers make heavy use of macros and conditional compilation in their quests for efficiency. This book avoids both whenever possible. Using macros to avoid function calls is rarely necessary. It pays only when objective measurements demonstrate that the costs of the calls in question overwhelm the running times of the rest of the code. I/O is one of the few places where macros are justified; the standard I/O functions getc, putc, getchar, and putchar, for example, are often implemented as macros.

Conditional compilation is often used to configure code for specific platforms or environments, or to enable or disable debugging code. These problems are real, but conditional compilation is usually the easy way out of them and always makes the code harder to read. And it’s often more useful to rework the code so that platform dependencies are selected during execution. For example, a single compiler that can select one of, say, six architectures for which to generate code at execution time (a cross compiler) is more useful than having to configure and build six different compilers, and it’s probably easier to maintain.

If an application must be configured at compile time, version-control tools are better at it than C’s conditional-compilation facilities. The code isn’t littered with preprocessor directives that make the code hard to read and obscure what’s being compiled and what isn’t. With version-control tools, what you see is what is executed. These tools are also ideal for keeping track of performance improvements.

Further Reading

The ANSI standard (1990) and the technically equivalent ISO standard (1990) are the definitive references for the standard C library, but Plauger (1992) gives a more detailed description and a complete implementation. Similarly, the standards are the last word on C, but Kernighan and Ritchie (1988) is probably the most widely used reference. The latest edition of Harbison and Steele (1995) is perhaps the most up-to-date with respect to the standards, and it also describes how to write “clean C,” that is, C code that can be compiled with C++ compilers. Jaeschke (1991) condenses the essence of Standard C into a compact dictionary format, which is a useful reference for C programmers.

Software Tools by Kernighan and Plauger (1976) gives early examples of literate programs, although the authors used ad hoc tools to include code in the book. WEB is one of the first tools designed explicitly for literate programming. Knuth (1992) describes WEB and some of its variants and uses.

EXERCISES 13

Sewell (1989) is a tutorial introduction to WEB. Simpler tools (Hanson 1987; Ramsey 1994) can go a long way to providing much of WEB’s essential functionality. This book uses notangle, one of the programs in Ramsey’s noweb system, to extract the chunks. noweb is also used by Fraser and Hanson (1995) to present an entire C compiler as a literate program. This compiler is also a cross compiler.

double is taken from Kernighan and Pike (1984) where it’s implemented in the AWK programming language (Aho, Kernighan, and Weinberger 1988). Despite its age, Kernighan and Pike remains one of the best books on the UNIX programming philosophy.

The best way to learn good programming style is to read programs that use good style. This book follows the enduring style used in Kernighan and Pike (1984) and Kernighan and Ritchie (1988). Kernighan and Plauger (1978) is the classic book on programming style, but it doesn’t include any examples in C. Ledgard’s brief book (1987) offers similar advice, and Maguire (1993) provides a perspective from the world of PC programming. Koenig (1989) exposes C’s dark corners and highlights the ones that should be avoided. McConnell (1993) offers sound advice on many aspects of program construction, and gives a balanced discussion of the pros and cons of using goto statements.

The best way to learn to write efficient code is to have a thorough grounding in algorithms and to read other code that is efficient. Sedgewick (1990) surveys all of the important algorithms most programmers need to know, and Knuth (1973a) gives the gory details on the fundamental ones. Bentley (1982) is 170 pages of good advice and common sense on how to write efficient code.

Exercises

1.1 getword increments linenum in ⟨scan forward to a nonspace or EOF 6⟩ but not after ⟨copy the word into buf[0..size-1] 7⟩ when a word ends at a new-line character. Explain why. What would happen if linenum were incremented in this case?

1.2 What does double print when it sees three or more identical words in its input? Change double to fix this “feature.”

1.3 Many experienced C programmers would include an explicit comparison in strcpy’s loop:

com. because such usage is a common source of errors. reproduction and/or distribution are strictly prohibited and violate applicable laws. const char *src) { char *s = dst. while ((*dst++ = *src++) != '\0') . } The explicit comparison makes it clear that the assignment isn’t a typographical error. If you have PCLint or LCLint. return s. experiment with it on some “tested” programs..14 INTRODUCTION char *strcpy(char *dst. This download file is made available for personal use only and is subject to the Terms of Service. Any other use requires prior written consent from the copyright owner. Unauthorized use. All rights reserved. Hanson. Some C compilers and related tools. C Interfaces and Implementations: Techniques for Creating Reusabl Prepared for frliu@microsoft. issue a warning when the result of an assignment is used as a conditional. C Interfaces and Implementations: Techniques for Creating Reusable Software. Frank Liu Copyright © 1997 by David R. like Gimpel Software’s PC-Lint and LCLint (Evans 1996). .

2 INTERFACES AND IMPLEMENTATIONS

A module comes in two parts, its interface and its implementation. The interface specifies what a module does. It declares the identifiers, types, and routines that are available to code that uses the module. An implementation specifies how a module accomplishes the purpose advertised by its interface. For a given module, there is usually one interface, but there might be many implementations that provide the facilities specified by the interface. Each implementation might use different algorithms and data structures, but they all must meet the specification given by the interface.

A client is a piece of code that uses a module. Clients import interfaces; implementations export them. Clients need to see only the interface. Indeed, they may have only the object code for an implementation. Clients share interfaces and implementations, thus avoiding unnecessary code duplication. This methodology also helps avoid bugs: interfaces and implementations are written and debugged once, but used often.

2.1 Interfaces

An interface specifies only those identifiers that clients may use, hiding irrelevant representation details and algorithms as much as possible. This helps clients avoid dependencies on the specifics of particular implementations. This kind of dependency between a client and an implementation, called coupling, causes bugs when an implementation

15

16 INTERFACES AND IMPLEMENTATIONS

changes; these bugs can be particularly hard to fix when the dependencies are buried in hidden or implicit assumptions about an implementation. A well-designed and precisely specified interface reduces coupling.

C has only minimal support for separating interfaces from implementations, but simple conventions can yield most of the benefits of the interface/implementation methodology. In C, an interface is specified by a header file, which usually has a .h file extension. This header file declares the macros, types, data structures, variables, and routines that clients may use. A client imports an interface with the C preprocessor #include directive.

The following example illustrates the conventions used in this book’s interfaces. The interface

    ⟨arith.h⟩≡
        extern int Arith_max(int x, int y);
        extern int Arith_min(int x, int y);
        extern int Arith_div(int x, int y);
        extern int Arith_mod(int x, int y);
        extern int Arith_ceiling(int x, int y);
        extern int Arith_floor (int x, int y);

declares six integer arithmetic functions. An implementation provides definitions for each of these functions.

The interface is named Arith and the interface header file is named arith.h. The interface name appears as a prefix for each of the identifiers in the interface. This convention isn’t pretty, but C offers few alternatives. All file-scope identifiers (variables, functions, type definitions, and enumeration constants) share a single name space. All global structure, union, and enumeration tags share another single name space. In a large program, it’s easy to use the same name for different purposes in otherwise unrelated modules. One way to avoid these name collisions is to use a prefix, such as the module name. A large program can easily have thousands of global identifiers, but usually has only hundreds of modules. Module names not only provide suitable prefixes, but help document client code.

The functions in the Arith interface provide some useful pieces missing from the standard C library and provide well-defined results for division and modulus where the standard leaves the behavior of these operations undefined or implementation-defined.

Arith_min and Arith_max return the minimum and maximum of their integer arguments.

INTERFACES 17

Arith_div returns the quotient obtained by dividing x by y, and Arith_mod returns the corresponding remainder. When x and y are both positive or both negative, Arith_div(x, y) is equal to x/y and Arith_mod(x, y) is equal to x%y. When the operands have different signs, however, the values returned by C’s built-in operators depend on the implementation. When y is zero, Arith_div and Arith_mod behave the same as x/y and x%y.

The C standard insists only that if x/y is representable, then (x/y)•y + x%y must be equal to x. These semantics permit integer division to truncate toward zero or toward minus infinity when one of the operands is negative. For example, if −13/5 is −2, then the standard says that −13%5 must be equal to −13 − (−13/5)•5 = −13 − (−2)•5 = −3. But if −13/5 is −3, then the value of −13%5 must be −13 − (−3)•5 = 2.

The built-in operators are thus useful only for positive operands. The standard library functions div and ldiv take two integers or long integers and return the quotient and remainder in the quot and rem fields of a structure. Their semantics are well defined: they always truncate toward zero, so div(-13, 5).quot is always equal to −2. Arith_div and Arith_mod are similarly well defined. They always truncate toward the left on the number line; toward zero when their operands have the same sign, and toward minus infinity when their signs are different, so Arith_div(-13, 5) returns −3.

The definitions for Arith_div and Arith_mod are couched in more precise mathematical terms. Arith_div(x, y) is the maximum integer that does not exceed the real number z such that z•y = x. Thus, for x = −13 and y = 5 (or x = 13 and y = −5), z is −2.6, so Arith_div(-13, 5) is −3. Arith_mod(x, y) is defined to be equal to x − y•Arith_div(x, y), so Arith_mod(-13, 5) is −13 − 5•(−3) = 2.

The functions Arith_ceiling and Arith_floor follow similar conventions. Arith_ceiling(x, y) returns the least integer not less than the real quotient of x/y, and Arith_floor(x, y) returns the greatest integer not exceeding the real quotient of x/y. Arith_ceiling returns the integer to the right of x/y on the number line, and Arith_floor returns the integer to the left of x/y for all operands. For example:

    Arith_ceiling( 13,5) = ⌈13/5⌉  = ⌈2.6⌉  = 3
    Arith_ceiling(-13,5) = ⌈−13/5⌉ = ⌈−2.6⌉ = −2
    Arith_floor  ( 13,5) = ⌊13/5⌋  = ⌊2.6⌋  = 2
    Arith_floor  (-13,5) = ⌊−13/5⌋ = ⌊−2.6⌋ = −3
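The semantics described above can be sketched in a few lines of C. This is an illustration of the specification, not the implementation given in this book: the names floor_div, floor_mod, and ceil_div are hypothetical stand-ins for Arith_div, Arith_mod, and Arith_ceiling, and the sketch asserts that y is nonzero instead of deferring to the built-in operators in that case. It adjusts the built-in quotient by inspecting the sign of the built-in remainder, so it gives the same answers whether x/y truncates toward zero (which C99 and later standards require) or toward minus infinity (which the 1990 standard also permits).

```c
#include <assert.h>

/* Hypothetical stand-in for Arith_div: the greatest integer that
   does not exceed the real quotient x/y. Requires y != 0; like the
   built-in operators, it cannot represent INT_MIN / -1. */
int floor_div(int x, int y) {
    int q, r;

    assert(y != 0);
    q = x/y;
    r = x%y;
    /* A nonzero remainder whose sign differs from y's means the
       built-in division rounded up toward zero; step down by one. */
    if (r != 0 && (r < 0) != (y < 0))
        q--;
    return q;
}

/* Hypothetical stand-in for Arith_mod, defined in terms of
   floor_div exactly as the text defines Arith_mod. */
int floor_mod(int x, int y) {
    return x - y*floor_div(x, y);
}

/* Hypothetical stand-in for Arith_ceiling: the least integer that
   is not less than the real quotient x/y. Requires y != 0. */
int ceil_div(int x, int y) {
    int q, r;

    assert(y != 0);
    q = x/y;
    r = x%y;
    /* A nonzero remainder with the same sign as y means the
       built-in division rounded down; step up by one. */
    if (r != 0 && (r < 0) == (y < 0))
        q++;
    return q;
}
```

With these definitions, floor_div(-13, 5) is −3 and floor_mod(-13, 5) is 2, matching the values the text gives for Arith_div and Arith_mod, and ceil_div(13, 5) and ceil_div(-13, 5) are 3 and −2, matching the first two rows of the table.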

This laborious specification for an interface as simple as Arith is unfortunately both typical and necessary for most interfaces. Most programming languages include holes in their semantics where the precise meanings of some operations are ill-defined or simply undefined. C's semantics are riddled with such holes. Well-designed interfaces plug these holes, define what is undefined, and make explicit decisions about behaviors that the language specifies as undefined or implementation-defined.

Arith is not just an artificial example designed to show C's pitfalls. It is useful, for example, for algorithms that involve modular arithmetic, like those used in hash tables. Suppose i is to range from zero to N-1, where N exceeds 1, and incrementing and decrementing i is to be done modulo N. That is, if i is N-1, i+1 is 0, and if i is 0, i-1 is N-1. The expressions

    i = Arith_mod(i + 1, N);
    i = Arith_mod(i - 1, N);

increment and decrement i correctly. The expression i = (i+1)%N works, too, but i = (i-1)%N doesn't work because when i is 0, (i-1)%N can be -1 or N-1. The programmer who uses (i-1)%N on a machine where (-1)%N returns N-1 and counts on that behavior is in for a rude surprise when the code is ported to a machine where (-1)%N returns -1. The library function div(x, y) doesn't help either. It returns a structure whose quot and rem fields hold the quotient and remainder of x/y. When i is zero, div(i-1, N).rem is always −1. It is possible to use i = (i-1+N)%N, but only when i-1+N can't overflow.

2.2 Implementations

An implementation exports an interface. It defines the variables and functions necessary to provide the facilities specified by the interface. An implementation reveals the representation details and algorithms of its particular rendition of the interface, but, ideally, clients never need to see these details. Clients share object code for implementations, usually by loading them from libraries.

An interface can have more than one implementation. As long as the implementation adheres to the interface, it can be changed without affecting clients. A different implementation might provide better performance, for example. Well-designed interfaces avoid machine dependencies, but may force implementations to be machine-dependent, so different implementations or parts of implementations might be needed for each machine on which the interface is used.

In C, an implementation is provided by one or more .c files. An implementation must provide the facilities specified by the interface it exports. Implementations include the interface's .h file to ensure that its definitions are consistent with the interface's declarations. Beyond this, however, there are no linguistic mechanisms in C to check an implementation's compliance.

Like the interfaces, the implementations described in this book have a stylized format illustrated by arith.c:

⟨arith.c⟩≡
    #include "arith.h"
    ⟨arith.c functions 19⟩

⟨arith.c functions 19⟩≡
    int Arith_max(int x, int y) {
        return x > y ? x : y;
    }

    int Arith_min(int x, int y) {
        return x > y ? y : x;
    }

In addition to ⟨arith.c functions 19⟩, more involved implementations may have chunks named ⟨data⟩, ⟨types⟩, ⟨macros⟩, ⟨prototypes⟩, etc. File names in chunks, such as arith.c, are omitted when no confusion results.

Arith_div must cope with the two possible behaviors for division when its arguments have different signs. If division truncates toward zero and y doesn't divide x evenly, then Arith_div(x, y) is x/y - 1; otherwise, x/y will do:

⟨arith.c functions 19⟩+≡
    int Arith_div(int x, int y) {
        if (⟨division truncates toward 0 20⟩
        && ⟨x and y have different signs 20⟩ && x%y != 0)
            return x/y - 1;
        else
            return x/y;
    }
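With the two chunk references expanded in place, Arith_div can be compiled and exercised on its own. The following is a sketch rather than the book's arith.c: arith_div_sketch, arith_mod_sketch, and ring_prev are hypothetical names, the remainder is computed straight from the definition x − y•Arith_div(x, y), and ring_prev illustrates the modular decrement discussed earlier in this chapter.

```c
/* Hypothetical stand-in for Arith_div with the chunks written out. */
int arith_div_sketch(int x, int y) {
    if (-13/5 == -2                 /* division truncates toward 0 */
        && (x < 0) != (y < 0)       /* x and y have different signs */
        && x%y != 0)                /* y doesn't divide x evenly */
        return x/y - 1;
    else
        return x/y;
}

/* Remainder taken directly from the definition x - y*Arith_div(x, y). */
int arith_mod_sketch(int x, int y) {
    return x - y*arith_div_sketch(x, y);
}

/* Step an index in 0..n-1 backward, wrapping 0 around to n-1. */
int ring_prev(int i, int n) {
    return arith_mod_sketch(i - 1, n);
}
```

With n equal to 8, ring_prev(0, 8) yields 7 and ring_prev(3, 8) yields 2, regardless of how the built-in operators treat negative operands.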

The example from the previous section, dividing −13 by 5, tests which way division truncates. Capturing the outcomes of testing whether x and y are less than zero and comparing these outcomes checks the signs:

⟨division truncates toward 0 20⟩≡
    -13/5 == -2

⟨x and y have different signs 20⟩≡
    (x < 0) != (y < 0)

Arith_mod could be implemented as it's defined:

    int Arith_mod(int x, int y) {
        return x - y*Arith_div(x, y);
    }

Arith_mod can also use the % operator if it tests for the same conditions as Arith_div. When those conditions are true,

    Arith_mod(x, y) = x - y*Arith_div(x, y)
                    = x - y*(x/y - 1)
                    = x - y*(x/y) + y

The subexpression x - y*(x/y) in the last line is the Standard C definition of x%y, so Arith_mod is

⟨arith.c functions 19⟩+≡
    int Arith_mod(int x, int y) {
        if (⟨division truncates toward 0 20⟩
        && ⟨x and y have different signs 20⟩ && x%y != 0)
            return x%y + y;
        else
            return x%y;
    }

Arith_floor is just Arith_div, and Arith_ceiling is Arith_div plus one, unless y divides x evenly:

⟨arith.c functions 19⟩+≡
    int Arith_floor(int x, int y) {
        return Arith_div(x, y);
    }

    int Arith_ceiling(int x, int y) {
        return Arith_div(x, y) + (x%y != 0);
    }

2.3 Abstract Data Types

An abstract data type is an interface that defines a data type and operations on values of that type. A data type is a set of values. In C, built-in data types include characters, integers, floating-point numbers, and so forth. Structures themselves define new types and can be used to form higher-level types, such as lists, trees, lookup tables, and more.

A high-level type is abstract because the interface hides the details of its representation and specifies the only legal operations on values of the type. Ideally, these operations don't reveal representation details on which clients might implicitly depend. The canonical example of an abstract data type, or ADT, is the stack. Its interface defines the type and its five operations:

⟨initial version of stack.h⟩≡
    #ifndef STACK_INCLUDED
    #define STACK_INCLUDED

    typedef struct Stack_T *Stack_T;

    extern Stack_T Stack_new  (void);
    extern int     Stack_empty(Stack_T stk);
    extern void    Stack_push (Stack_T stk, void *x);
    extern void   *Stack_pop  (Stack_T stk);
    extern void    Stack_free (Stack_T *stk);

    #endif

The typedef defines the type Stack_T, which is a pointer to a structure with a tag of the same name. This definition is legal because structure, union, and enumeration tags occupy a name space that is separate from the space for variables, functions, and type names. This idiom is used throughout this book. The typename, Stack_T, is the name of interest in this interface; the tag name may be important only to the implementation. Using the same name avoids polluting the code with excess names that are rarely used.

The macro STACK_INCLUDED pollutes the name space, too, but the _INCLUDED suffix helps avoid collisions. Another common convention is to prefix an underscore to these kinds of names, such as _STACK or _STACK_INCLUDED. However, Standard C reserves leading underscores for implementors and for future extensions, so it seems prudent to avoid leading underscores.

This interface reveals that stacks are represented by pointers to structures, but it says nothing about what those structures look like. Stack_T is an opaque pointer type; clients can manipulate such pointers freely, but they can't dereference them; that is, they can't look at the innards of the structure pointed to by them. Only the implementation has that privilege.

Opaque pointers hide representation details and help catch errors. Only Stack_Ts can be passed to the functions above; attempts to pass other kinds of pointers, such as pointers to other structures, yield compilation errors. The lone exception is a void pointer, which can be passed to any kind of pointer.

The conditional compilation directives #ifndef and #endif, and the #define for STACK_INCLUDED, permit stack.h to be included more than once, which occurs when interfaces import other interfaces. Without this protection, second and subsequent inclusions would cause compilation errors about the redefinition of Stack_T in the typedef.

This convention seems the least offensive of the few available alternatives. Forbidding interfaces to include other interfaces avoids the need for repeated inclusion altogether, but forces interfaces to specify the other interfaces that must be imported some other way, such as in comments, and forces programmers to provide the includes. Putting the conditional compilation directives in a client instead of the interface avoids reading the interface unnecessarily, but litters the directives in many places instead of only in the interface. The convention illustrated above makes the compiler do the dirty work.

By convention, an interface X that specifies an ADT defines it as a type named X_T. The interfaces in this book carry this convention one step further by using a macro to abbreviate X_T to just T within the interface. With this convention, stack.h is

⟨stack.h⟩≡
    #ifndef STACK_INCLUDED
    #define STACK_INCLUDED

    #define T Stack_T
    typedef struct T *T;

    extern T     Stack_new  (void);
    extern int   Stack_empty(T stk);
    extern void  Stack_push (T stk, void *x);
    extern void *Stack_pop  (T stk);
    extern void  Stack_free (T *stk);

    #undef T
    #endif

This interface is semantically equivalent to the previous one. The abbreviation is just syntactic sugar that makes interfaces a bit easier to read; T always refers to the primary type in the interface. Clients, however, must use Stack_T because the #undef directive at the end of stack.h removes the abbreviation.

This interface provides unbounded stacks of arbitrary pointers. Stack_new manufactures new stacks; it returns a value of type T that can be passed to the other four functions. Stack_push pushes a pointer onto a stack, Stack_pop removes and returns the pointer on the top of a stack, and Stack_empty returns one if the stack is empty and zero otherwise. Stack_free takes a pointer to a T, deallocates the stack pointed to by that pointer, and sets the variable of type T to the null pointer. This design helps avoid dangling pointers, that is, pointers that point to deallocated memory. For example, if names is defined and initialized by

    #include "stack.h"
    Stack_T names = Stack_new();

the statement

    Stack_free(&names);

deallocates the stack assigned to names and sets names to the null pointer.
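The free-and-clear design of Stack_free can be tried in isolation. The sketch below uses a hypothetical ADT, Buf_T, with hypothetical functions Buf_new and Buf_free; it calls malloc and free directly instead of the book's Mem interface, and it defines the structure in the same file only so that the example is self-contained.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical ADT used only for this illustration. */
typedef struct Buf_T *Buf_T;
struct Buf_T { int size; };

Buf_T Buf_new(int size) {
    Buf_T b = malloc(sizeof *b);
    assert(b);          /* sketch only: treat allocation failure as fatal */
    b->size = size;
    return b;
}

/* Takes the address of the client's handle so that it can clear the
   handle after releasing the storage, as Stack_free does. */
void Buf_free(Buf_T *b) {
    assert(b && *b);
    free(*b);
    *b = NULL;
}
```

After Buf_T b = Buf_new(16); followed by Buf_free(&b);, the handle b compares equal to NULL, so a later accidental use can be caught by a null check such as assert(b) instead of silently touching deallocated memory.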

When an ADT is represented by an opaque pointer, the exported type is a pointer type, which is why Stack_T is a typedef for a pointer to a struct Stack_T. Similar typedefs are used for most of the ADTs in this book. When an ADT reveals its representation and exports functions that accept and return structures by value, it defines the structure type as the exported type. This convention is illustrated by the Text interface in Chapter 16, which declares Text_T to be a typedef for struct Text_T. In any case, T always abbreviates the primary type in the interface.

2.4 Client Responsibilities

An interface is a contract between its implementations and its clients. An implementation must provide the facilities specified in the interface, and clients must use these facilities in accordance with the implicit and explicit rules described in the interface. The programming language provides some implicit rules governing the use of types, functions, and variables declared in the interface. For example, C's type-checking rules catch errors in the types and in the numbers of arguments to interface functions.

Those rules that are not specified by C usage or checked by the C compiler must be spelled out in the interface. Clients must adhere to them, and implementations must enforce them. Interfaces often specify unchecked runtime errors, checked runtime errors, and exceptions. Unchecked and checked runtime errors are not expected user errors, such as failing to open a file. Runtime errors are breaches of the contract between clients and implementations, and are program bugs from which there is no recovery. Exceptions are conditions that, while possible, rarely occur. Programs may be able to recover from exceptions. Running out of memory is an example. Exceptions are described in detail in Chapter 4.

An unchecked runtime error is a breach of contract that implementations do not guarantee to detect. If an unchecked runtime error occurs, execution might continue, but with unpredictable and perhaps unrepeatable results. Good interfaces avoid unchecked runtime errors when possible, but must specify those that can occur. Arith, for example, must specify that division by zero is an unchecked runtime error. Arith could check for division by zero, but leaves it as an unchecked runtime error so that its functions mimic the behavior of C's built-in division operators, whose behavior on division by zero is undefined. Making division by zero a checked runtime error is a reasonable alternative.
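The distinction can be made concrete with division by zero. In the sketch below, which is not part of Arith, div_unchecked mirrors the choice Arith makes and div_checked shows the alternative mentioned above; both names are hypothetical, and the standard assert macro stands in for whatever checking mechanism an interface adopts.

```c
#include <assert.h>

/* Unchecked: a zero divisor is the client's bug and is not detected
   here; the behavior of x/0 is undefined in C. */
int div_unchecked(int x, int y) {
    return x/y;
}

/* Checked: the implementation guarantees to detect the breach of
   contract. assert halts the program when y is zero, unless NDEBUG
   is defined when the file is compiled. */
int div_checked(int x, int y) {
    assert(y != 0);
    return x/y;
}
```

Both functions return the same values for valid arguments; they differ only in what a client can rely on when it violates the contract.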

A checked runtime error is a breach of contract that implementations guarantee to detect. These errors announce a client's failure to adhere to its part of the contract; it's the client's responsibility to avoid them. The Stack interface specifies three checked runtime errors:

1. passing a null Stack_T to any routine in this interface;
2. passing a null pointer to a Stack_T to Stack_free; or
3. passing an empty stack to Stack_pop.

Interfaces may specify exceptions and the conditions under which they are raised. As explained in Chapter 4, clients can handle exceptions and take corrective action. An unhandled exception is treated as a checked runtime error. Interfaces usually list the exceptions they raise and those raised by any interface they import. For example, the Stack interface imports the Mem interface, which it uses to allocate space, so it specifies that Stack_new and Stack_push can raise Mem_Failed. Most of the interfaces in this book specify similar checked runtime errors and exceptions.

With these additions to the Stack interface, we can proceed to its implementation:

⟨stack.c⟩≡
    #include <stddef.h>
    #include "assert.h"
    #include "mem.h"
    #include "stack.h"

    #define T Stack_T

    ⟨types 25⟩
    ⟨functions 26⟩

The #define directive reinstantiates T as an abbreviation for Stack_T. The implementation reveals the innards of a Stack_T, which is a structure with a field that points to a linked list of the pointers on the stack and a count of the number of these pointers.

⟨types 25⟩≡
    struct T {
        int count;
        struct elem {
            void *x;
            struct elem *link;
        } *head;
    };

Stack_new allocates and initializes a new T:

⟨functions 26⟩≡
    T Stack_new(void) {
        T stk;

        NEW(stk);
        stk->count = 0;
        stk->head = NULL;
        return stk;
    }

NEW is an allocation macro from the Mem interface. NEW(p) allocates an instance of the structure pointed to by p, so its use in Stack_new allocates a new Stack_T structure.

Stack_empty returns one if the count field is 0 and zero otherwise:

⟨functions 26⟩+≡
    int Stack_empty(T stk) {
        assert(stk);
        return stk->count == 0;
    }

assert(stk) implements the checked runtime error that forbids a null T to be passed to any function in Stack. assert(e) is an assertion that e is nonzero for any expression e. It does nothing if e is nonzero, and halts program execution otherwise. assert is part of the standard library, but Chapter 4's Assert interface defines its own assert with similar semantics, and provides for graceful program termination. assert is used for all checked runtime errors.

Stack_push and Stack_pop add and remove elements from the head of the linked list emanating from stk->head:

⟨functions 26⟩+≡
    void Stack_push(T stk, void *x) {
        struct elem *t;

        assert(stk);
        NEW(t);
        t->x = x;
        t->link = stk->head;
        stk->head = t;
        stk->count++;
    }

    void *Stack_pop(T stk) {
        void *x;
        struct elem *t;

        assert(stk);
        assert(stk->count > 0);
        t = stk->head;
        stk->head = t->link;
        stk->count--;
        x = t->x;
        FREE(t);
        return x;
    }

FREE is Mem's deallocation macro; it deallocates the space pointed to by its pointer argument, then sets the argument to the null pointer for the same reasons that Stack_free does: to help avoid dangling pointers. Stack_free also calls FREE:

⟨functions 26⟩+≡
    void Stack_free(T *stk) {
        struct elem *t, *u;

        assert(stk && *stk);
        for (t = (*stk)->head; t; t = u) {
            u = t->link;
            FREE(t);
        }
        FREE(*stk);
    }

This implementation reveals one unchecked runtime error that all ADT interfaces in this book suffer and that thus goes unspecified. There's no way to guarantee that the Stack_Ts passed to Stack_push, Stack_pop, Stack_empty, and the Stack_T* passed to Stack_free are valid Stack_Ts returned by Stack_new. Exercise 2.3 explores partial solutions to this problem.

There are two more unchecked runtime errors whose effects can be more subtle. Many of the ADTs in this book traffic in void pointers; that is, they store and return void pointers. It is an unchecked runtime error to store a function pointer (a pointer to a function) in any such ADT. A void pointer is a generic pointer; a variable of type void * can hold any pointer to an object, which includes the predefined types, structures, and pointers. Function pointers are different, however. While many C compilers permit assignments of function pointers to void pointers, there's no guarantee that a void pointer can hold a function pointer.

Any object pointer can travel through a void pointer without loss of information. For example, after executing

    S *p, *q; void *t;
    …
    t = p; q = t;

p and q will be equal, for any nonfunction type S. Void pointers must not, however, be used to subvert the type system. For example, after executing

    S *p; D *q; void *t;
    …
    t = p; q = t;

there's no guarantee that q will be equal to p, or, depending on the alignment constraints for the types S and D, that q will be a valid pointer to an object of type D. In Standard C, void pointers and char pointers have the same size and representation. But other pointers might be smaller or have different representations. Thus, it is an unchecked runtime error to store a pointer to S in an ADT but retrieve it into a variable of type D, where S and D are different object types.

It is tempting to declare an opaque pointer argument const when an ADT function doesn't modify the referent. For example, Stack_empty might be written as follows.

    int Stack_empty(const T stk) {
        assert(stk);
        return stk->count == 0;
    }

This use of const is incorrect. The intent here is to declare stk to be a "pointer to a constant struct T," because Stack_empty doesn't modify *stk. But the declaration const T stk declares stk to be a "constant pointer to a struct T": the typedef for T wraps the struct T * in a single type, and this entire type is the operand of const. const T stk is useless to both Stack_empty and its callers, because all scalars, including pointers, are passed by value in C. Stack_empty can't change the value of a caller's actual argument, with or without the const qualifier.

This problem could be avoided by using struct T * in place of T:

    int Stack_empty(const struct T *stk) {
        assert(stk);
        return stk->count == 0;
    }

This usage illustrates why const should not be used for pointers to ADTs: const reveals something about the implementation and thus constrains the possibilities. Using const isn't a problem for this implementation of Stack, but it precludes other, equally viable, alternatives. Suppose an implementation delayed deallocating the stack elements in hope of reusing them, but deallocated them when Stack_empty was called. That implementation of Stack_empty needs to modify *stk, but it can't because stk is declared const. None of the ADTs in this book use const.
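The two readings of const can be confirmed with a compiler. The sketch below declares the structure in the open purely for the demonstration, and the function names touch_through_const_handle and count_of are hypothetical; neither belongs to the Stack interface.

```c
#include <stddef.h>

struct Stack_T { int count; };
typedef struct Stack_T *Stack_T;

/* const applies to the pointer stk itself, not to the structure it
   points to, so this assignment through stk compiles and takes effect. */
void touch_through_const_handle(const Stack_T stk) {
    stk->count = 42;
    /* stk = NULL;   would be rejected: the pointer itself is const */
}

/* Here const qualifies the referent, so an assignment to stk->count
   inside this function would not compile. */
int count_of(const struct Stack_T *stk) {
    return stk->count;
}
```

Calling touch_through_const_handle on a stack whose count is 0 leaves the count at 42, which shows that const T offers callers no protection for the structure behind the handle.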

2.5 Efficiency

Most of the implementations for the interfaces in this book use algorithms and data structures for which the average-case running times are no more than linear in N, the size of their inputs, and most can handle large inputs. Interfaces that cannot deal with large inputs or for which performance might be an important consideration specify performance criteria. Implementations must meet these criteria and clients can expect performance as good as but no better than these criteria specify.

All the interfaces in this book use simple but efficient algorithms. More complicated algorithms and data structures may have better performance when N is large, but it is usually small. Most implementations stick to basic data structures such as arrays, linked lists, hash tables, and trees, and combinations of these.

All but a few of the ADTs in this book use opaque pointers, so functions such as Stack_empty are used to access fields hidden by the implementations. The performance impact on real applications due to the overhead of calling functions instead of accessing the fields directly is almost always negligible. The improvements in reliability and in the opportunities for catching runtime errors are considerable and outweigh the slight costs in performance.

If objective measurements show that performance improvements are really necessary, they should be made without changing the interface, by defining macros, for example. When this approach is not possible, it's better to create a new interface that states its performance benefits rather than changing an existing interface, which invalidates all of its clients.

Further Reading

The importance of libraries of procedures and functions has been recognized since the 1950s. Parnas (1972) is a classic paper on how to divide programs into modules. This paper is over two decades old, yet it still addresses issues that face programmers today.

C programmers use interfaces daily: the C library is a collection of 15 interfaces. The standard I/O interface, stdio.h, defines an ADT, FILE, and operations on pointers to FILEs. Plauger (1992) gives a detailed description of these 15 interfaces and suitable implementations in much the same way that this book tours a set of interfaces and implementations.

Modula-3 is a relatively new language that has linguistic support for separating interfaces from implementations, and it originates the interface-based terminology used in this book (Nelson 1991). The notions of unchecked and checked runtime errors, and the T notation for ADTs, are taken from Modula-3. Harbison (1992) is a textbook introduction to Modula-3. Horning et al. (1993) describe the core interfaces in their Modula-3 system. Some of the interfaces in this book are adapted from those interfaces. The text by Roberts (1995) uses interface-based design as the organizing principle for teaching introductory computer science.

The importance of assertions is widely recognized, and some languages, such as Modula-3 and Eiffel (Meyer 1992), have assertion mechanisms built into the language. Maguire (1993) devotes an entire chapter to using assertions in C programs.

Programmers familiar with object-oriented programming may argue that most of the ADTs in this book can be rendered, perhaps better, as objects in object-oriented programming languages, such as C++ (Ellis and Stroustrup 1990) and Modula-3. Budd (1991) is a tutorial introduction to the object-oriented programming methodology and to some object-oriented programming languages, including C++. The principles of interface design illustrated in this book apply equally well to object-oriented languages. Rewriting the ADTs in this book in C++, for example, is a useful exercise for programmers making the switch from C to C++.

The C++ Standard Template Library (the STL) provides ADTs similar to those described in this book. STL makes good use of C++ templates to instantiate ADTs for specific types (Musser and Saini 1996). For example, STL provides a template for a vector datatype that can be used to instantiate vectors of ints, strings, and so on. STL also provides a suite of functions that manipulate template-generated types.

Exercises

2.1 A preprocessor macro and conditional compilation directives, such as #if, could have been used to specify how division truncates in Arith_div and Arith_mod. Explain why the explicit test -13/5 == -2 is a better way to implement this test.

2.2 The -13/5 == -2 test used in Arith_div and Arith_mod works as long as the compiler used to compile arith.c does arithmetic the same way as Arith_div and Arith_mod do when they are called. This condition might not hold, for example, if arith.c were compiled by a cross-compiler that runs on machine X and generates code for machine Y. Without using conditional compilation directives, fix arith.c so that such cross compilations produce code that is guaranteed to work.

2.3 Like all ADTs in this book, the Stack interface omits the specification "it is an unchecked runtime error to pass a foreign Stack_T to any routine in this interface." A foreign Stack_T is one that was not manufactured by Stack_new. Revise stack.c so that it can check for some occurrences of this error. One approach, for example, is to add a field to the Stack_T structure that holds a bit pattern unique to Stack_Ts returned by Stack_new.

2.4 It's often possible to detect certain invalid pointers. For example, a nonnull pointer is invalid if it specifies an address outside the client's address space, and pointers are often subject to alignment restrictions; for example, on some systems a pointer to a double must be a multiple of eight. Devise a system-specific macro isBadPtr(p) that is one when p is an invalid pointer so that occurrences of assert(ptr) can be replaced with assertions like assert(!isBadPtr(ptr)).

2.5 There are many viable interfaces for stacks. Design and implement some alternatives to the Stack interface. For example, one alternative is to specify a maximum size as an argument to Stack_new.

3 ATOMS

An atom is a pointer to a unique, immutable sequence of zero or more arbitrary bytes. Most atoms are pointers to null-terminated strings, but a pointer to any sequence of bytes can be an atom. There is only a single occurrence of any atom, which is why it’s called an atom. Two atoms are identical if they point to the same location. Comparing two byte sequences for equality by simply comparing pointers is one of the advantages of atoms. Another advantage is that using atoms saves space because there’s only one occurrence of each sequence.

Atoms are often used as keys in data structures that are indexed by sequences of arbitrary bytes instead of by integers. The tables and sets described in Chapters 8 and 9 are examples.

3.1 Interface

The Atom interface is simple:

    ⟨atom.h⟩≡
    #ifndef ATOM_INCLUDED
    #define ATOM_INCLUDED

    extern       int   Atom_length(const char *str);
    extern const char *Atom_new   (const char *str, int len);
    extern const char *Atom_string(const char *str);
    extern const char *Atom_int   (long n);

    #endif
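The pointer-equality property described above can be seen in isolation with a deliberately naive interner. The following is only a sketch under stated assumptions: toy_intern and struct toy_entry are hypothetical names, the table is a plain linked list searched with strcmp, and it handles only null-terminated strings. It is not the Atom implementation, which uses a hash table.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy interner (not the Atom implementation):
   a linear list holding one copy of each distinct string. */
struct toy_entry {
    struct toy_entry *link;
    char *str;
};

static struct toy_entry *toy_list = NULL;

/* Returns the single stored copy of str, adding it on first use. */
const char *toy_intern(const char *str) {
    struct toy_entry *p;
    size_t n;

    assert(str);
    for (p = toy_list; p; p = p->link)
        if (strcmp(p->str, str) == 0)
            return p->str;      /* already present: same pointer */
    n = strlen(str) + 1;
    p = malloc(sizeof *p);
    assert(p);
    p->str = malloc(n);
    assert(p->str);
    memcpy(p->str, str, n);     /* store a private copy */
    p->link = toy_list;
    toy_list = p;
    return p->str;
}
```

Two calls with equal strings, even ones held in different buffers, return the same pointer, so later equality tests reduce to comparing pointers with ==.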

Atom_new accepts a pointer to a sequence of bytes and the number of bytes in that sequence. It adds a copy of the sequence to the table of atoms, if necessary, and returns the atom, which is a pointer to the copy of the sequence in the atom table. Atom_new never returns the null pointer. Once an atom is created, it exists for the duration of the client’s execution. An atom is always terminated with a null character, which Atom_new adds when necessary.

Atom_string is similar to Atom_new; it caters to the common use of character strings as atoms. It accepts a null-terminated string, adds a copy of that string to the atom table, if necessary, and returns the atom. Atom_int returns the atom for the string representation of the long integer n, another common usage. Finally, Atom_length returns the length of its atom argument.

It is a checked runtime error to pass a null pointer to any function in this interface, to pass a negative len to Atom_new, or to pass a pointer that is not an atom to Atom_length. It is an unchecked runtime error to modify the bytes pointed to by an atom. Atom_length can take time to execute proportional to the number of atoms. Atom_new, Atom_string, and Atom_int can each raise the exception Mem_Failed.

3.2 Implementation

The implementation of Atom maintains the atom table. Atom_new, Atom_string, and Atom_int search the atom table and possibly add new elements to it, and Atom_length just searches it.

    ⟨atom.c⟩≡
    ⟨includes 34⟩
    ⟨macros 37⟩
    ⟨data 36⟩
    ⟨functions 35⟩

    ⟨includes 34⟩≡
    #include "atom.h"

Atom_string and Atom_int can be implemented without knowing the representation details of the atom table. Atom_string, for example, just calls Atom_new:

    ⟨functions 35⟩≡
    const char *Atom_string(const char *str) {
        assert(str);
        return Atom_new(str, strlen(str));
    }

    ⟨includes 34⟩+≡
    #include <string.h>
    #include "assert.h"

Atom_int first converts its argument to a string, then calls Atom_new:

    ⟨functions 35⟩+≡
    const char *Atom_int(long n) {
        char str[43];
        char *s = str + sizeof str;
        unsigned long m;

        if (n == LONG_MIN)
            m = LONG_MAX + 1UL;
        else if (n < 0)
            m = -n;
        else
            m = n;
        do
            *--s = m%10 + '0';
        while ((m /= 10) > 0);
        if (n < 0)
            *--s = '-';
        return Atom_new(s, (str + sizeof str) - s);
    }

    ⟨includes 34⟩+≡
    #include <limits.h>

Atom_int must cope with the asymmetrical range of two’s-complement numbers and with the ambiguities of C’s division and modulus operators. Unsigned division and modulus are well defined, so Atom_int can avoid the ambiguities of the signed operators by using unsigned arithmetic.
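The conversion inside Atom_int can be exercised on its own, without an atom table. The sketch below assumes a hypothetical helper, long_to_decimal, that fills the tail of a caller-supplied buffer and returns a pointer to the first character. Unlike Atom_int, it null-terminates the result for convenience, so it asks for one byte more than the 43 used above.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes the decimal form of n at the end of buf
   and returns a pointer to its first character. The magnitude is held
   in an unsigned long so that LONG_MIN needs no special arithmetic
   beyond the initial test. */
char *long_to_decimal(long n, char *buf, size_t bufsize) {
    char *s = buf + bufsize;
    unsigned long m;

    assert(buf && bufsize >= 44);   /* 43 characters plus '\0' */
    if (n == LONG_MIN)
        m = LONG_MAX + 1UL;         /* |LONG_MIN| does not fit in a long */
    else if (n < 0)
        m = -n;
    else
        m = n;
    *--s = '\0';
    do                              /* digits are produced right to left */
        *--s = m%10 + '0';
    while ((m /= 10) > 0);
    if (n < 0)
        *--s = '-';
    return s;
}
```

A call such as printf("%s\n", long_to_decimal(LONG_MIN, buf, sizeof buf)) with a 44-byte buf should print the same text as printf("%ld\n", LONG_MIN).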

The absolute value of the most negative signed long integer cannot be represented, because there is one more negative number than positive number in two’s-complement systems. Atom_int thus starts by testing for this single anomaly before assigning the absolute value of its argument to the unsigned long integer m. The value of LONG_MAX resides in the standard header limits.h.

The loop forms the decimal string representation of m from right to left; it computes the rightmost digit, divides m by 10, and continues until m is zero. As each digit is computed, it’s stored at --s, which marches s backward in str. If n is negative, a minus sign is stored at the beginning of the string.

When the conversion is done, s points to the desired string, and this string has &str[43] - s characters. str has 43 characters, which is enough to hold the decimal representation of any integer on any conceivable machine. Suppose, for example, that longs are 128 bits. The string representation of any 128-bit signed integer in octal (base 8) fits in 128/3 + 1 = 43 characters. The decimal representation can take no more digits than the octal representation, so 43 characters are enough.

The 43 in the definition of str is an example of a “magic number,” and it’s usually better style to define a symbolic name for such values to ensure that the same value is used everywhere. Here, however, the value appears only once, and sizeof is used whenever the value is used. Defining a symbolic name might make the code easier to read, but it will also make the code longer and clutter the name space. In this book, a symbolic name is defined only when the value appears more than once, or when it is part of an interface. The length of the hash table buckets below, 2,048, is another example of this convention.

A hash table is the obvious data structure for the atom table. The hash table is an array of pointers to lists of entries, each of which holds one atom:

    ⟨data 36⟩≡
    static struct atom {
        struct atom *link;
        int len;
        char *str;
    } *buckets[2048];

The linked list emanating from buckets[i] holds those atoms that hash to i. An entry’s link field points to the next entry on the list, the len field holds the length of the sequence, and the str field points to the

sequence itself. For example, on a little endian computer with 32-bit words and 8-bit characters, Atom_string("an atom") allocates the struct atom shown in Figure 3.1, where the underscore character (_) denotes a space. Each entry is just large enough to hold its sequence. Figure 3.2 shows the overall structure of the hash table.

Atom_new computes a hash number for the sequence given by str[0..len-1] (or the empty sequence, if len is zero), reduces this hash number modulo the number of elements in buckets, and searches the list pointed to by that element of buckets. If it finds that str[0..len-1] is already in the table, it simply returns the atom:

    ⟨functions 35⟩+≡
    const char *Atom_new(const char *str, int len) {
        unsigned long h;
        int i;
        struct atom *p;

        assert(str);
        assert(len >= 0);
        ⟨h ← hash str[0..len-1] 39⟩
        h %= NELEMS(buckets);
        for (p = buckets[h]; p; p = p->link)
            if (len == p->len) {
                for (i = 0; i < len && p->str[i] == str[i]; )
                    i++;
                if (i == len)
                    return p->str;
            }
        ⟨allocate a new entry 39⟩
        return p->str;
    }

    ⟨macros 37⟩≡
    #define NELEMS(x) ((sizeof (x))/(sizeof ((x)[0])))

The definition of NELEMS illustrates a common C idiom: The number of elements in an array is the size of the array divided by the size of each element. sizeof is a compile-time operator, so this computation applies only to arrays whose size is known at compile time. As this definition illustrates, macro parameters are italicized to highlight where they are used in the macro body.
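A short illustration of the NELEMS idiom follows. The array demo_primes and the function count_demo_primes are hypothetical names chosen for this sketch; the macro is the one defined above.

```c
#include <assert.h>
#include <stddef.h>

#define NELEMS(x) ((sizeof (x))/(sizeof ((x)[0])))

/* The initializer fixes the array's size, so it is known at compile time. */
static int demo_primes[] = { 2, 3, 5, 7, 11 };

/* NELEMS works here because demo_primes is a true array in this scope.
   Applied to a pointer (for example, an array passed as a function
   argument), sizeof would give the size of the pointer instead, and
   the quotient would not be the element count. */
size_t count_demo_primes(void) {
    return NELEMS(demo_primes);
}
```

Adding or removing initializers changes the count without touching any other code, which is why the idiom pairs well with arrays such as buckets.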

[Figure 3.1: Little endian layout of a struct atom for "an atom", showing the link, len (7), and str fields followed by the bytes of the sequence]

[Figure 3.2: Hash table structure, showing buckets[0] through buckets[2047] and the lists hanging off them]

If str[0..len-1] isn’t in the table, Atom_new adds it by allocating a struct atom and enough additional space to hold the sequence, copying str[0..len-1] into the additional space and linking the new entry onto the beginning of the list emanating from buckets[h]. The entry could be appended to the end of the list, but adding it at the front of the list is simpler.

    ⟨allocate a new entry 39⟩≡
    p = ALLOC(sizeof (*p) + len + 1);
    p->len = len;
    p->str = (char *)(p + 1);
    if (len > 0)
        memcpy(p->str, str, len);
    p->str[len] = '\0';
    p->link = buckets[h];
    buckets[h] = p;

    ⟨includes 34⟩+≡
    #include "mem.h"

ALLOC is Mem’s primary allocation function, and it mimics the standard library function malloc: its argument is the number of bytes needed. Atom_new cannot use Mem’s NEW, which is illustrated in Stack_push, because the number of bytes depends on len; NEW applies only when the number of bytes is known at compile time. The call to ALLOC above allocates the space for both the atom structure and for the sequence, and the sequence is stored in the immediately succeeding bytes.

Hashing the sequence passed to Atom_new involves computing an unsigned number to represent the sequence. Ideally, these hash numbers should be distributed uniformly over the range zero to NELEMS(buckets)−1 for N sequences. If they are so distributed, each list in buckets will have N/NELEMS(buckets) elements, and the average time to search for a sequence will be N/(2•NELEMS(buckets)). If N is less than, say, 2•NELEMS(buckets), the search time is essentially a constant. Hashing is a well-studied subject, and there are many good hash functions. Atom_new uses a simple table-lookup algorithm:

    ⟨h ← hash str[0..len-1] 39⟩≡
    for (h = 0, i = 0; i < len; i++)
        h = (h<<1) + scatter[(unsigned char)str[i]];
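The single-allocation layout used by the entry-allocation chunk can be tried outside the Atom module. This sketch substitutes the standard malloc for Mem’s ALLOC, so it returns NULL on failure instead of raising Mem_Failed; struct demo_entry and demo_entry_new are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct demo_entry {
    struct demo_entry *link;
    int len;
    char *str;
};

/* Hypothetical helper: one block holds the header followed by
   len + 1 bytes for the sequence and its terminating null. */
struct demo_entry *demo_entry_new(const char *bytes, int len) {
    struct demo_entry *p;

    assert(bytes && len >= 0);
    p = malloc(sizeof (*p) + len + 1);
    if (p == NULL)
        return NULL;
    p->link = NULL;
    p->len = len;
    p->str = (char *)(p + 1);   /* first byte past the header */
    if (len > 0)
        memcpy(p->str, bytes, len);
    p->str[len] = '\0';
    return p;
}
```

Because the header and the bytes share one block, a single free(p) releases both, and p->str never needs to be freed separately.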

scatter is a 256-entry array that maps bytes to random numbers, which were generated by calling the standard library function rand. Experience shows that this simple approach helps to more uniformly distribute the hash values. Casting str[i] to an unsigned character avoids C’s ambiguity about “plain” characters: they can be signed or unsigned. Without the cast, values of str[i] that exceed 127 would yield negative indices on machines that use signed characters.

    ⟨data 36⟩+≡
    static unsigned long scatter[] = {
    2078917053, 69050267, 200158423, 1353778505, 593537720, 1639096332,
    1561889936, 1047146551, 1668583807, 1261483377, 1198042020, 985847834,
    1098183682, 1703199216, 907175364, 433832350, 157584038, 304920680,
    980071615, 1612420674, 316088629, 382361684, 1504556512, 973726663,
    1623219634, 1405390230, 1956764878, 404219416, 139971187, 1788268214,
    1347315106, 1884751139, 618454448, 125310639, 143302914, 48433401,
    438263339, 1513900641, 299641085, 1723401032, 1363416242, 1276257488,
    1011597961, 744509763, 1834290522, 1783300476, 400011959, 1802120822,
    884508252, 121093252, 958076904, 2102252735, 1746481261, 1002804590,
    946827723, 89017443, 774132919, 643279273, 23330161, 202956538,
    1968401469, 1506717157, 433233439, 1489680608, 154523136, 124757630,
    1440466707, 2013649480, 1504871315, 885626931, 2018585307, 1134594751,
    1965443329, 1530777140, 1479089144, 1851737163, 766270699, 523904750,
    1315461275, 306401572, 247038362, 1762719719, 348303940, 1247304975,
    1008956512, 2082550272, 1733966678, 457882831, 2042227746, 1605135681,
    1871172724, 994817018, 2055041154, 485614972, 923108080, 2118956479,
    302572379, 1953210302, 899131941, 1891386329, 59253759, 1859353594,
    216161028, 109190086, 367393385, 1435821048, 1510320440, 1745777927,
    1651524391, 306246424, 1337551289, 488944891, 1925613800, 2109864544,
    755120366, 1237390611, 834307717, 1830418225, 704035832, 813528929,
    706686964, 836935336, 1532354806, 1364585325, 1069844923, 1027100827,
    1642837685, 946006977, 1215013716, 392083579, 1609787317, 1213147837,
    275680090, 573714703, 2081522508, 259412182, 1800004220, 266772940,
    567072918, 478464352, 1828531389, 1272929208, 1568675693, 208787970,
    1415743291, 928306929, 1893464764, 1734544947, 1303742040, 1583583926,
    1771058762, 1010757900, 269676381, 148144545, 871926821, 471560540,
    336563455, 1839739926, 618906479, 1117546963, 1680673954, 2002600785,
    1482824219, 876213618, 1509024645, 1953439621, 1169907872, 1498661368,
    1447372094, 1640123668, 1049400181, 1099951567, 1238619545, 755253631,
    1853748387, 1982435068, 1300134328, 1902249868, 579587572, 1037746818,
    1961288571, 1441966234, 45248011, 1636505764, 1785335569, 36406206,
    484902984, 513183458, 719930310, 1193650546, 2143346068, 13672163,
    536647531, 228223074, 147857043, 1306570817, 262925046, 244413420,
    275676551, 2108097238, 139978006, 36125855, 2026501127, 1686379655,
    1699573055, 1884137923, 48964196, 1549759737, 265012701, 2136819798,
    1236198663, 790369079, 552005290, 2018281851, 359743094, 1820959944,
    1601294739, 934220434, 1354150250, 1893762099, 92778659, 1889425288,
    1896806882, 1621212876, 1865626297, 267062626, 176384212, 188942202,
    365326415, 1301613820, 1947861263, 776289160, 672987810, 1404522304,
    489389012, 593586330, 281341425, 1856406685, 1735424165, 2116758626,
    5816381, 264348929, 509027654, 1730298077, 294184214, 273227984,
    286900147, 1975249606, 894834024, 1136476375, 1843084537, 53392249,
    313561074, 503211273, 360187215, 1602280572
    };

Atom_length can’t hash its argument because it doesn’t know its length. But the argument must be an atom, so Atom_length can simply scream through the lists in buckets comparing pointers. If it finds the atom, it returns the atom’s length:

    ⟨functions 35⟩+≡
    int Atom_length(const char *str) {
        struct atom *p;
        int i;

        assert(str);
        for (i = 0; i < NELEMS(buckets); i++)
            for (p = buckets[i]; p; p = p->link)
                if (p->str == str)
                    return p->len;
        assert(0);
        return 0;
    }

assert(0) implements the checked runtime error that Atom_length must be called only with an atom, not just a pointer to a string. assert(0) is also used to signal conditions that are not supposed to occur, so-called “can’t-happen” conditions.
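Returning to the hash computation, the table-lookup scheme can be tried without typing in the 256 constants. The sketch below is an assumption-laden stand-in: demo_scatter is filled at run time by rand with a fixed seed rather than holding the values listed above, and demo_hash, demo_init, and NBUCKETS are hypothetical names. It keeps the same shift-and-add loop and the same unsigned char cast.

```c
#include <assert.h>
#include <stdlib.h>

#define NBUCKETS 2048

static unsigned long demo_scatter[256];
static int demo_ready = 0;

/* Fills the table once; the fixed seed makes runs repeatable. */
static void demo_init(void) {
    int i;

    srand(1);
    for (i = 0; i < 256; i++)
        demo_scatter[i] = (unsigned long)rand();
    demo_ready = 1;
}

/* Hashes bytes[0..len-1] and reduces the result to a bucket index.
   The cast keeps bytes above 127 from producing negative indices
   on machines where plain char is signed. */
unsigned long demo_hash(const char *bytes, int len) {
    unsigned long h;
    int i;

    assert(bytes && len >= 0);
    if (!demo_ready)
        demo_init();
    for (h = 0, i = 0; i < len; i++)
        h = (h<<1) + demo_scatter[(unsigned char)bytes[i]];
    return h % NBUCKETS;
}
```

The values returned depend on the local rand implementation, so they will not match those produced by the scatter table above; only the structure of the computation is the same.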

Further Reading

Atoms have long been used in LISP, which is the source of their name, and in string-manipulation languages, such as SNOBOL4, which implemented strings almost exactly as described in this chapter (Griswold 1972). The C compiler lcc (Fraser and Hanson 1995) has a module that is similar to Atom and is the predecessor to Atom’s implementation. lcc stores the strings for all identifiers and constants that appear in the source program in a single table, and never deallocates them. Doing so never consumes too much storage because the number of distinct strings in C programs is remarkably small regardless of the size of the source programs.

Sedgewick (1990) and Knuth (1973b) describe hashing in detail and give guidelines for writing good hash functions. The hash function used in Atom (and in lcc) was suggested by Hans Boehm.

Exercises

3.1 Most texts recommend using a prime number for the size of buckets. Using a prime and a good hash function usually gives a better distribution of the lengths of the lists hanging off of buckets. Atom uses a power of two, which is sometimes explicitly cited as a bad choice. Write a program to generate or read, say, 10,000 typical strings and measure Atom_new’s speed and the distribution of the lengths of the lists. Then change buckets so that it has 2,039 entries (the largest prime less than 2,048), and repeat the measurements. Does using a prime help? How much does your conclusion depend on your specific machine?

3.2 Scour the literature for better hash functions; likely sources are Knuth (1973b), similar texts on algorithms and data structures and the papers they cite, and texts on compilers, such as Aho, Sethi, and Ullman (1986). Try these functions and measure their benefits.

3.3 Explain why Atom_new doesn’t use the standard C library function strncmp to compare sequences.

3.4 Here’s another way to declare the atom structure:

    struct atom {
        struct atom *link;
        int len;
        char str[1];
    };

A struct atom for a string of len bytes is allocated by ALLOC(sizeof (*p) + len), which allocates space for the link and len fields, and a str field long enough to hold len + 1 bytes. This approach avoids the time and space required for the extra indirection induced by declaring str to be a pointer. Unfortunately, this “trick” violates the C standard, because clients access the bytes beyond str[0], and the effect of these accesses is undefined. Implement this approach and measure the cost of the indirection. Are the savings worth violating the standard?

3.5 Atom_new compares the len field of struct atoms with the length of the incoming sequence to avoid comparing sequences of different lengths. If the hash numbers (not the indices into buckets) for each atom were also stored in struct atoms, they could be compared, too. Implement this “improvement” and measure the benefits. Is it worthwhile?

3.6 Atom_length is slow. Revise Atom’s implementation so that Atom_length’s running time is approximately the same as that of Atom_new.

3.7 The Atom interface evolved to its present form because its functions were the ones that clients used most often. There are other functions and designs that might be useful, which this exercise and those that follow explore. Implement

    extern void Atom_init(int hint);

where hint estimates the number of atoms the client expects to create. What checked runtime errors would you add to constrain when Atom_init could be called?

3.8 There are several functions to deallocate atoms that extensions to the Atom interface might provide. For example, the functions

    extern void Atom_free (const char *str);
    extern void Atom_reset(void);

could deallocate the atom given by str and deallocate all atoms, respectively. Implement these functions. Don’t forget to specify and implement appropriate checked runtime errors.

3.9 Some clients start execution by installing a bunch of strings as atoms for later use. Implement

    extern void Atom_vload(const char *str, ...);
    extern void Atom_aload(const char *strs[]);

Atom_vload installs the strings given in the variable length argument list up to a null pointer, and Atom_aload does the same for a null-terminated array of pointers to strings.

3.10 Copying the strings can be avoided if the client promises not to deallocate them, which is trivially true for string constants. Implement

    extern const char *Atom_add(const char *str, int len);

which works like Atom_new but doesn’t make a copy of the sequence. If you provide Atom_add and Atom_free (and Atom_reset from Exercise 3.8), what checked runtime errors must be specified and implemented?

4 EXCEPTIONS AND ASSERTIONS

Three kinds of errors occur in programs: user errors, runtime errors, and exceptions. User errors are expected because they’re likely to occur as the result of erroneous user input. Examples include naming nonexistent files, specifying badly formed numbers in spreadsheets, and presenting source programs with syntax errors to compilers. Programs must plan for and deal with such errors. Usually, functions that must cope with user errors return error codes; the errors are a normal part of the computation.

The checked runtime errors described in previous chapters are at the other end of the error spectrum. They are not user errors. They are never expected and always indicate program bugs. Thus, there is no way to recover from these kinds of errors; the application must be terminated gracefully. The implementations in this book use assertions to catch these kinds of errors. Handling assertions is described in Section 4.3. Assertions always cause the program to halt, perhaps in a way that depends on the machine or the application.

Exceptions occupy the middle ground between user errors and program bugs. An exception is an error that may be rare and perhaps unexpected, but from which recovery may be possible. Some exceptions mirror the capabilities of the machine; examples are arithmetic overflow and underflow and stack overflow. Other exceptions indicate conditions detected by the operating system, perhaps initiated by the user, such as hitting an “interrupt” key or getting a write error while writing a file. These kinds of exceptions are often delivered by signals in UNIX systems

and processed by signal handlers. Exceptions may also occur when limited resources are exhausted, such as when an application runs out of memory, or a user specifies a spreadsheet that’s too big.

Exceptions don’t happen often, so functions in which they might occur don’t usually return error codes; this would clutter the code for the rare cases and obscure the common cases. Applications raise exceptions, which are handled by recovery code, if recovery is possible. The scope of an exception is dynamic: when an exception is raised, it is handled by the handler that was most recently instantiated. Transferring control to a handler is like a nonlocal goto: the handler may have been instantiated in a routine far from the one in which the exception was raised.

Some languages have built-in facilities for instantiating handlers and raising exceptions. In C, the standard library functions setjmp and longjmp form the basis for building a structured exception facility. The short story is that setjmp instantiates a handler and longjmp raises an exception. An example illustrates the long story.

Suppose the function allocate calls malloc to allocate n bytes, and returns the pointer returned by malloc. If, however, malloc returns the null pointer, which indicates that the space requested cannot be allocated, allocate wants to raise the Allocate_Failed exception. The exception itself is declared as a jmp_buf, which is in the standard header setjmp.h:

    #include <setjmp.h>

    int Allocation_handled = 0;
    jmp_buf Allocate_Failed;

Allocation_handled is zero unless a handler has been instantiated, and allocate checks Allocation_handled before raising the exception:

    void *allocate(unsigned n) {
        void *new = malloc(n);

        if (new)
            return new;
        if (Allocation_handled)
            longjmp(Allocate_Failed, 1);
        assert(0);
    }
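The way setjmp and longjmp cooperate can also be seen in a smaller, self-contained form. The following sketch is not part of the allocate example; demo_handler, demo_raise, and demo_try are hypothetical names, and the only behavior relied on is that of the standard setjmp and longjmp.

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf demo_handler;    /* plays the role of the exception */

/* "Raises" by jumping back to the most recent setjmp(demo_handler). */
static void demo_raise(void) {
    longjmp(demo_handler, 1);
}

/* Returns 0 when the guarded code completes normally and 1 when
   control came back through longjmp. */
int demo_try(int should_fail) {
    if (setjmp(demo_handler) != 0)
        return 1;       /* second return, caused by longjmp */
    if (should_fail)
        demo_raise();   /* does not return here */
    return 0;           /* first return path: nothing was raised */
}
```

demo_try(0) takes the normal path, while demo_try(1) reaches the first return statement only because longjmp makes the earlier setjmp call return a second time with a nonzero value.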

allocate uses an assertion to implement a checked runtime error when allocation fails and no handler has been instantiated.

A handler is instantiated by calling setjmp(Allocate_Failed), which returns an integer. The interesting feature of setjmp is that it can return twice. The call to setjmp returns zero. The call to longjmp in allocate causes the second return of the value given by longjmp’s second argument, which is one in the example above. Thus, a client handles an exception by testing the value returned by setjmp:

    char *buf;
    Allocation_handled = 1;
    if (setjmp(Allocate_Failed)) {
        fprintf(stderr, "couldn't allocate the buffer\n");
        exit(EXIT_FAILURE);
    }
    buf = allocate(4096);
    Allocation_handled = 0;

When setjmp returns zero, execution continues with the call to allocate. If the allocation fails, the longjmp in allocate causes setjmp to return again, this time with the value one, so execution continues with the calls to fprintf and exit.

This example doesn’t cope with nested handlers, which would occur if the code above called, say, makebuffer, which itself instantiates a handler and calls allocate. Nested handlers must be provided because clients can’t know about the handlers instantiated by an implementation for its own purposes. Also, the Allocation_handled flag is awkward; failing to set it or clear it at the right times causes chaos. The Except interface, described in the next section, handles these omissions.

4.1 Interface

The Except interface wraps the setjmp/longjmp facility in a set of macros and functions that collaborate to provide a structured exception facility. It isn’t perfect, but it avoids the errors outlined above, and the macros clearly identify where exceptions are used.

An exception is a global or static variable of type Except_T:

    ⟨except.h⟩≡
    #ifndef EXCEPT_INCLUDED

    #define EXCEPT_INCLUDED
    #include <setjmp.h>

    #define T Except_T
    typedef struct T {
        const char *reason;
    } T;

    ⟨exported types 53⟩
    ⟨exported variables 53⟩
    ⟨exported functions 48⟩
    ⟨exported macros 48⟩

    #undef T
    #endif

Except_T structures have only one field, which can be initialized to a string that describes the exception. This string is printed when an unhandled exception occurs.

Exception handlers manipulate the addresses of exceptions. Exceptions must be global or static variables so that their addresses identify them uniquely. It is an unchecked runtime error to declare an exception as a local variable or as a parameter.

An exception e is raised by the RAISE macro or by the function Except_raise:

    ⟨exported macros 48⟩≡
    #define RAISE(e) Except_raise(&(e), __FILE__, __LINE__)

    ⟨exported functions 48⟩≡
    void Except_raise(const T *e, const char *file, int line);

It is a checked runtime error to pass a null e to Except_raise.

Handlers are instantiated by the TRY-EXCEPT and TRY-FINALLY statements, which are implemented with macros. These statements handle nested exceptions and manage exception-state data. The syntax of the TRY-EXCEPT statement is

    TRY
        S
    EXCEPT(e1)

which ends at the corresponding END_TRY. All rights reserved. Frank Liu Copyright © 1997 by David R. void *allocate(unsigned n) { void *new = malloc(n). which allocate raises if malloc returns the null pointer: Except_T Allocate_Failed = { "Allocation failed" }. and execution continues after the END_TRY. The ELSE clause is optional.. e 2 .com. If S raises an exception that is not one of e 1 – e n. . the handlers are dismantled. the handlers are dismantled. the statements following ELSE are executed. C Interfaces and Implementations: Techniques for Creating Reusabl Prepared for frliu@microsoft. The handlers are dismantled. and the exception is passed to the handlers established by the previously executed TRY-EXCEPT or TRY-FINALLY statement. This download file is made available for personal use only and is subject to the Terms of Service. and execution continues after the END_TRY. TRY introduces a new scope. TRY-END_TRY is syntactically equivalent to a statement. the execution of S is interrupted and control transfers immediately to the statements following the relevant EXCEPT clause.INTERFACE 49 S1 EXCEPT( e 2 ) S2 … EXCEPT( e n ) Sn ELSE S0 END_TRY The TRY-EXCEPT statement establishes handlers for exceptions named e 1 . Any other use requires prior written consent from the copyright owner. RAISE(Allocate_Failed). Rewriting the example at the end of the previous section illustrates the use of these macros. Hanson.... Licensed by Frank Liu 1740749 C Interfaces and Implementations: Techniques for Creating Reusable Software. the handlers are dismantled and execution continues at the statement after the END_TRY. Unauthorized use. . the handler statements S i in the EXCEPT clause are executed. if (new) return new. If S raises an exception that is not handled by one of the S i . and executes the statements S. reproduction and/or distribution are strictly prohibited and violate applicable laws. If no exceptions are raised by S. Allocate_Failed becomes an exception. e n. 
If S raises an exception e where e is one of e 1 – e n .

END_TRY. char *buf. END_TRY. int i = 0. RAISE(e). printf("%d\n". Automatic variables that are changed in S must be declared volatile. All rights reserved. the fragment static Except_T e. C Interfaces and Implementations: Techniques for Creating Reusabl Prepared for frliu@microsoft. Hanson. exit(EXIT_FAILURE). reproduction and/or distribution are strictly prohibited and violate applicable laws. The syntax of the TRY-FINALLY statement is C Interfaces and Implementations: Techniques for Creating Reusable Software. if S changes an automatic variable. can print 0 or 1. Any other use requires prior written consent from the copyright owner. depending on the implementation-dependent details of setjmp and longjmp. so Standard C’s caveats about the use of these functions apply to TRYEXCEPT statements.com. For example. EXCEPT(e) . Frank Liu Copyright © 1997 by David R. the change may not survive if an exception causes execution to continue in any of the handler statements S i or after the closing END_TRY. EXCEPT(Allocate_Failed) fprintf(stderr.. This download file is made available for personal use only and is subject to the Terms of Service. TRY-EXCEPT statements are implemented with setjmp and longjmp. "couldn't allocate the buffer\n"). } If the client code wants to handle this exception. it calls allocate from within a TRY-EXCEPT statement: extern Except_T Allocate_Failed. TRY buf = allocate(4096). changing the declaration for i to volatile int i = 0. . causes the example above to print 1. for example. Specifically. Unauthorized use. i).50 EXCEPTIONS AND ASSERTIONS assert(0). TRY i++.

This download file is made available for personal use only and is subject to the Terms of Service. … FINALLY fclose(fp). the exception that caused its execution is reraised so that it can be handled by a previously instantiated handler. TRY buf = allocate(4096).com. END_TRY. C Interfaces and Implementations: Techniques for Creating Reusabl Prepared for frliu@microsoft. Hanson. \ Except_frame. One purpose of the TRY-FINALLY statement is to give clients an opportunity to “clean up” when an exception occurs. . Any other use requires prior written consent from the copyright owner.line) The TRY-FINALLY statement is equivalent to TRY S ELSE S1 RERAISE. char *buf. S1 Note that S 1 is executed whether S raises an exception or not. All rights reserved. Except_frame. After S 1 is executed.. Unauthorized use.exception. S 1 is executed and execution continues at the statement after the END_TRY. If S raises an exception. Handlers can reraise exceptions explicitly with the RERAISE macro: ¢exported macros 48²+≡ #define RERAISE Except_raise(Except_frame. Note that S 1 is executed in both cases.INTERFACE 51 TRY S FINALLY S1 END_TRY If no exceptions are raised by S. the execution of S is interrupted and control transfers immediately to S 1. reproduction and/or distribution are strictly prohibited and violate applicable laws.file. FILE *fp = fopen(…). For example. END_TRY. C Interfaces and Implementations: Techniques for Creating Reusable Software. Frank Liu Copyright © 1997 by David R.

They suffice for most applications because exceptions should be used sparingly — only a handful in a large application. C Interfaces and Implementations: Techniques for Creating Reusabl Prepared for frliu@microsoft. All rights reserved. Any other use requires prior written consent from the copyright owner. .0) default: return The RETURN macro is used instead of return statements inside TRY statements. END_TRY The final macro in the interface is ¢exported macros 48²+≡ #define RETURN switch (¢pop 56². It is a unchecked runtime error to execute the C return statement inside a TRY-EXCEPT or TRY-FINALLY statement. Hanson. This download file is made available for personal use only and is subject to the Terms of Service. The macros in the Except interface are admittedly crude and somewhat brittle. it is handled by the previously instantiated handler.com. If any of the statements in a TRY-EXCEPT or TRY-FINALLY must do a return.52 EXCEPTIONS AND ASSERTIONS closes the file opened on fp whether allocation fails or succeeds. Unauthorized use. Frank Liu Copyright © 1997 by David R. If allocation does fail. The degenerate statement TRY S END_TRY is equivalent to TRY S FINALLY . Their unchecked runtime errors are particularly troublesome. and can be particularly difficult bugs to find. If exceptions proliferate.. This switch statement is used in this macro so that both RETURN and RETURN e expand into one syntactically correct C statement. C Interfaces and Implementations: Techniques for Creating Reusable Software. If S 1 in a TRY-FINALLY statement or any of the handlers in a TRYEXCEPT statement raises an exception. another handler must deal with Allocate_Failed. they must do so with this macro instead of with the usual C return statement. The details of ¢pop 56² are described in the next section. reproduction and/or distribution are strictly prohibited and violate applicable laws. it’s usually a sign of more serious design errors.

4.2 Implementation

The macros and functions in the Except interface collaborate to maintain a stack of structures that record the exception state and the instantiated handlers. The env field of this structure is a jmp_buf, which is used by setjmp and longjmp; this stack thus handles nested exceptions.

⟨exported types 53⟩≡
    typedef struct Except_Frame Except_Frame;
    struct Except_Frame {
        Except_Frame *prev;
        jmp_buf env;
        const char *file;
        int line;
        const T *exception;
    };

⟨exported variables 53⟩≡
    extern Except_Frame *Except_stack;

Except_stack points to the top exception frame on the exception stack, and the prev field of each frame points to its predecessor. As suggested by the definition of RERAISE in the previous section, raising an exception stores the address of the exception in the exception field, and stores the exception coordinates (the file and line number where the exception was raised) in the file and line fields.

The TRY clause pushes a new Except_Frame onto the exception stack and calls setjmp. Except_raise, which is called by RAISE and RERAISE, fills in the exception, file, and line fields in the top frame, pops the Except_Frame off the exception stack, and calls longjmp. EXCEPT clauses test the exception field of this frame to determine which handler applies. The FINALLY clause executes its clean-up code and reraises the exception stored in the popped frame.

If an exception occurs and control reaches an END_TRY clause without handling it, the exception is reraised.

The macros TRY, EXCEPT, ELSE, FINALLY, and END_TRY collaborate to translate a TRY-EXCEPT statement into a statement of the form

    do {
        create and push an Except_Frame
        if (first return from setjmp) {
            S
        } else if (exception is e1) {
            S1
        …
        } else if (exception is en) {
            Sn
        } else {
            S0
        }
        if (an exception occurred and wasn’t handled)
            RERAISE;
    } while (0)

The do-while statement makes the TRY-EXCEPT syntactically equivalent to a C statement so that it can be used like any other C statement. It can, for example, be used as the consequent of an if statement. Figure 4.1 shows the code generated for the general TRY-EXCEPT statement. The shaded boxes highlight the code resulting from the expansion of the TRY and END_TRY macros, boxes surround the code from the EXCEPT macro, and the double-lined box surrounds the ELSE code. Figure 4.2 shows the expansion of the TRY-FINALLY statement; the box surrounds the FINALLY code.

The space for an Except_Frame is allocated simply by declaring a local variable of that type inside the compound statement in the body of the do-while begun by TRY:

⟨exported macros 48⟩+≡
    #define TRY do { \
        volatile int Except_flag; \
        Except_Frame Except_frame; \
        ⟨push 56⟩ \
        Except_flag = setjmp(Except_frame.env); \
        if (Except_flag == Except_entered) {

There are four states within a TRY statement, as suggested by the following enumeration identifiers.

⟨exported types 53⟩+≡
    enum { Except_entered=0, Except_raised,
           Except_handled,   Except_finalized };

    do {
        volatile int Except_flag;
        Except_Frame Except_frame;
        Except_frame.prev = Except_stack;
        Except_stack = &Except_frame;
        Except_flag = setjmp(Except_frame.env);
        if (Except_flag == Except_entered) {
            S
            if (Except_flag == Except_entered)
                Except_stack = Except_stack->prev;
        } else if (Except_frame.exception == &(e1)) {
            Except_flag = Except_handled;
            S1
            if (Except_flag == Except_entered)
                Except_stack = Except_stack->prev;
        } else if (Except_frame.exception == &(e2)) {
            Except_flag = Except_handled;
            S2
            if (Except_flag == Except_entered)
                Except_stack = Except_stack->prev;
        } …
        } else if (Except_frame.exception == &(en)) {
            Except_flag = Except_handled;
            Sn
            if (Except_flag == Except_entered)
                Except_stack = Except_stack->prev;
        } else {
            Except_flag = Except_handled;
            S0
            if (Except_flag == Except_entered)
                Except_stack = Except_stack->prev;
        }
        if (Except_flag == Except_raised)
            Except_raise(Except_frame.exception,
                Except_frame.file, Except_frame.line);
    } while (0)

Figure 4.1 Expansion of the TRY-EXCEPT statement

The first return from setjmp sets Except_flag to Except_entered, which indicates that a TRY statement has been entered and an exception frame has been pushed onto the exception stack. Except_entered must be zero, because the initial call to setjmp returns zero; subsequent returns from setjmp set it to Except_raised, which indicates that an exception occurred. Handlers set Except_flag to Except_handled to indicate that they’ve handled the exception.

    do {
        volatile int Except_flag;
        Except_Frame Except_frame;
        Except_frame.prev = Except_stack;
        Except_stack = &Except_frame;
        Except_flag = setjmp(Except_frame.env);
        if (Except_flag == Except_entered) {
            S
            if (Except_flag == Except_entered)
                Except_stack = Except_stack->prev;
        } {
            if (Except_flag == Except_entered)
                Except_flag = Except_finalized;
            S1
            if (Except_flag == Except_entered)
                Except_stack = Except_stack->prev;
        }
        if (Except_flag == Except_raised)
            Except_raise(Except_frame.exception,
                Except_frame.file, Except_frame.line);
    } while (0)

Figure 4.2 Expansion of the TRY-FINALLY statement

The Except_Frame is pushed onto the exception stack by adding it to the head of the linked list of Except_Frame structures pointed to by Except_stack, and the top frame is popped by removing it from that list:

⟨push 56⟩≡
    Except_frame.prev = Except_stack; \
    Except_stack = &Except_frame;

⟨pop 56⟩≡
    Except_stack = Except_stack->prev

The EXCEPT clauses become the else-if statements shown in Figure 4.1.

⟨exported macros 48⟩+≡
    #define EXCEPT(e) \
        ⟨pop if this chunk follows S 57⟩ \
        } else if (Except_frame.exception == &(e)) { \
            Except_flag = Except_handled;

Using macros for exceptions leads to some contorted code, as the chunk ⟨pop if this chunk follows S 57⟩ illustrates. This chunk, which appears before the else-if in the definition of EXCEPT above, pops the exception stack only in the first EXCEPT clause. If no exception occurs while executing S, Except_flag remains Except_entered, so when control reaches the if statement, the exception stack is popped. The second and subsequent EXCEPT clauses follow handlers in which Except_flag has been changed to Except_handled. For these, the exception stack has already been popped, and the if statement in ⟨pop if this chunk follows S 57⟩ protects against popping it again.

⟨pop if this chunk follows S 57⟩≡
    if (Except_flag == Except_entered) ⟨pop 56⟩;

The ELSE clause is like an EXCEPT clause, but the else-if is just an else:

⟨exported macros 48⟩+≡
    #define ELSE \
        ⟨pop if this chunk follows S 57⟩ \
        } else { \
            Except_flag = Except_handled;

Similarly, the FINALLY clause is like an ELSE clause without the else: Control falls into the clean-up code.

⟨exported macros 48⟩+≡
    #define FINALLY \
        ⟨pop if this chunk follows S 57⟩ \
        } { \
            if (Except_flag == Except_entered) \
                Except_flag = Except_finalized;

Except_flag is changed from Except_entered to Except_finalized here to indicate that an exception did not occur but that a FINALLY clause did appear. If an exception occurred, Except_flag is left at Except_raised so that it can be reraised after the clean-up code has been executed. The exception is reraised by testing whether Except_flag is equal to Except_raised in the expansion for END_TRY.

If an exception did not occur, Except_flag will be Except_entered or Except_finalized:

⟨exported macros 48⟩+≡
    #define END_TRY \
        ⟨pop if this chunk follows S 57⟩ \
        } if (Except_flag == Except_raised) RERAISE; \
    } while (0)

The implementation of Except_raise in except.c is the last piece of the puzzle:

⟨except.c⟩≡
    #include <stdlib.h>
    #include <stdio.h>
    #include "assert.h"
    #include "except.h"
    #define T Except_T

    Except_Frame *Except_stack = NULL;

    void Except_raise(const T *e, const char *file,
        int line) {
        Except_Frame *p = Except_stack;

        assert(e);
        if (p == NULL) {
            ⟨announce an uncaught exception 59⟩
        }
        p->exception = e;
        p->file = file;
        p->line = line;
        ⟨pop 56⟩;
        longjmp(p->env, Except_raised);
    }

If there is an Except_Frame at the top of the exception stack, Except_raise fills in the exception, file, and line fields, pops the exception stack, and calls longjmp. The corresponding call to setjmp will return Except_raised, Except_raised will be assigned to Except_flag in the TRY-EXCEPT or TRY-FINALLY statement, and the appropriate handler will be executed. Except_raise pops the exception stack so that if an exception occurs in one of the handlers, it will be handled by the TRY-EXCEPT statement whose exception frame is now exposed at the top of the exception stack.

If the exception stack is empty, there’s no handler, so Except_raise has little choice but to announce the unhandled exception and halt:

⟨announce an uncaught exception 59⟩≡
    fprintf(stderr, "Uncaught exception");
    if (e->reason)
        fprintf(stderr, " %s", e->reason);
    else
        fprintf(stderr, " at 0x%p", e);
    if (file && line > 0)
        fprintf(stderr, " raised at %s:%d\n", file, line);
    fprintf(stderr, "aborting...\n");
    fflush(stderr);
    abort();

abort is the standard C library function that aborts execution, sometimes with machine-dependent side effects. It might, for example, start a debugger or simply write a dump of memory.

4.3 Assertions

The standard requires that header assert.h define assert(e) as a macro that provides diagnostic information. assert(e) evaluates e and, if e is zero, writes diagnostic information on the standard error and aborts execution by calling the standard library function abort. The diagnostic information includes the assertion that failed (the text of e) and the coordinates (the file and line number) at which the assert(e) appears. The format of this information is implementation-defined. assert(0) is a good way to signal conditions that “can’t happen.” Alternatively, assertions like

    assert(!"ptr==NULL -- can't happen")

display more meaningful diagnostics.

assert.h also uses, but does not define, the macro NDEBUG. If NDEBUG is defined, then assert(e) must be equivalent to the vacuous expression ((void)0). Thus, programmers can turn off assertions by defining NDEBUG and recompiling. Since e might not be executed, it’s important that it never be an essential computation that has side effects, such as an assignment.

assert(e) is an expression, so most versions of assert.h are logically equivalent to

    #undef assert
    #ifdef NDEBUG
    #define assert(e) ((void)0)
    #else
    extern void assert(int e);
    #define assert(e) ((void)((e)|| \
        (fprintf(stderr, "%s:%d: Assertion failed: %s\n", \
        __FILE__, (int)__LINE__, #e), abort(), 0)))
    #endif

(A “real” version of assert.h differs from this one because it’s not allowed to include stdio.h in order to use fprintf and stderr.) An expression like e1 || e2 usually appears in conditional contexts, such as if statements, but it can also appear alone as a statement. When it does, the effect is equivalent to the statement

    if (!(e1)) e2;

The definition of assert uses e1 || e2 because assert(e) must expand to an expression, not a statement. e2 is a comma expression whose result is a value, which is required by the || operator, and the entire expression is cast to void because the standard stipulates that assert(e) returns no value. In a Standard C preprocessor, the locution #e generates a string literal whose contents are the characters in the text for the expression e.

The Assert interface defines assert(e) as specified by the standard, except that an assertion failure raises the exception Assert_Failed instead of aborting execution, and does not provide the text of the assertion e:

⟨assert.h⟩≡
    #undef assert
    #ifdef NDEBUG
    #define assert(e) ((void)0)
    #else
    #include "except.h"
    extern void assert(int e);
    #define assert(e) ((void)((e)||(RAISE(Assert_Failed),0)))
    #endif

⟨exported variables 53⟩+≡
    extern const Except_T Assert_Failed;

Assert mimics the standard’s definitions so that the two assert.h headers can be used interchangeably, which is why Assert_Failed appears in except.h. The implementation of this interface is trivial:

⟨assert.c⟩≡
    #include "assert.h"

    const Except_T Assert_Failed = { "Assertion failed" };

    void (assert)(int e) {
        assert(e);
    }

The parentheses around the name assert in the function definition suppress expansion of the macro assert and thus define the function, as required by the interface.

If clients don’t handle Assert_Failed, then an assertion failure causes the program to abort with a message like

    Uncaught exception Assertion failed raised at stmt.c:201
    aborting...

which is functionally equivalent to the diagnostics issued by machine-specific versions of assert.h.

Packaging assertions so that they raise exceptions when they fail helps solve the dilemma about what to do with assertions in production programs. Some programmers advise against leaving assertions in production programs, and this advice is supported by the standard’s use of NDEBUG in assert.h. The two reasons most often cited for omitting assertions are efficiency and the possibility of cryptic diagnostics.

Assertions do take time, so removing them can only make programs faster. The difference in execution time with and without assertions can be measured, however, and that difference is usually tiny. Removing assertions for efficiency reasons is like making any other change to improve execution time: The change should be made only when objective measurements support it.

When measurements do show that an assertion is too costly, it’s sometimes possible to move the assertion to reduce its cost without losing its benefit. For example, suppose h contains an assertion that costs too much, that both f and g call h, and that measurements show most of the time is due to the call from g, which calls h from within a loop. Careful analysis may reveal that the assertion in h can be moved to both f and g, and placed before the loop in g.

The more serious problem with assertions is that they can cause diagnostics, such as the assertion-failure diagnostic above, that will mystify users. But omitting assertions replaces these diagnostics with a greater evil. When an assertion fails, the program is wrong. If it continues, it does so with unpredictable results and will most likely crash. Messages like

    General protection fault at 3F60:40EA

or

    Segmentation fault -- core dumped

are no better than the assertion-failed diagnostic shown above. Worse, a program that continues past the point where an assertion failure would have stopped it may corrupt user data; for example, an editor may destroy a user’s files. This behavior is inexcusable.

The problem with cryptic assertion-failure diagnostics can be handled with a TRY-EXCEPT statement at the top level of the production version of the program that catches all uncaught exceptions and issues a more helpful diagnostic. For example:

    #include <stdlib.h>
    #include <stdio.h>
    #include "except.h"

    int main(int argc, char *argv[]) {
        TRY
            edit(argc, argv);
        ELSE
            fprintf(stderr,
            "An internal error has occurred from which there is "
            "no recovery.\nPlease report this error to "
            "Technical Support at 800-777-1234.\nNote the "
            "following message, which will help our support "
            "staff\nfind the cause of this error.\n\n");
            RERAISE;
        END_TRY;
        return EXIT_SUCCESS;
    }

When an uncaught exception occurs, this handler precedes the cryptic diagnostic with instructions for reporting the bug. For an assertion failure, it prints

    An internal error has occurred from which there is no recovery.
    Please report this error to Technical Support at 800-777-1234.
    Note the following message, which will help our support staff
    find the cause of this error.

    Uncaught exception Assertion failed raised at stmt.c:201
    aborting...

Further Reading

Several languages have built-in exception mechanisms; examples include Ada, Modula-3 (Nelson 1991), Eiffel (Meyer 1992), and C++ (Ellis and Stroustrup 1990). Except’s TRY-EXCEPT statement is modeled after Modula-3’s TRY-EXCEPT statement.

Several exception mechanisms have been proposed for C; they all provide facilities similar to the TRY-EXCEPT statement, sometimes with variations in syntax and semantics. Roberts (1989) describes an interface for an exception facility that is equivalent to the one provided by Except. His implementation is similar, but it’s more efficient when an exception is raised. Except_raise calls longjmp to transfer to a handler. If that handler doesn’t handle the exception, Except_raise is called again, and so is longjmp. If the handler for the exception is N exception frames down the exception stack, Except_raise and longjmp are called N times. Roberts’s implementation makes one call to the appropriate handler or to the first FINALLY clause. To do this, it must place an upper bound on the number of exception handlers in a TRY-EXCEPT statement. Some C compilers, like Microsoft’s, provide structured exception facilities as language extensions.

Some languages have built-in assertion mechanisms; Eiffel is an example. Most languages use facilities similar to C’s assert macro or other compiler directives to specify assertions. For example, Digital’s Modula-3 compiler recognizes comments of the form <*ASSERT expression*> as compiler pragmas that specify assertions. Maguire (1993) devotes an entire chapter to using assertions in C programs.

Exercises

4.1 What’s the effect of a statement that has both EXCEPT and FINALLY clauses? These are statements of the form

    TRY
        S
    EXCEPT(e1)
        S1
    …
    EXCEPT(en)
        Sn
    FINALLY
        S0
    END_TRY

4.2 Change the Except interface and implementation so that Except_raise makes only one call to longjmp to reach the appropriate handler or FINALLY clause, as described above and implemented by Roberts (1989).

4.3 UNIX systems use signals to announce some exceptional conditions, such as floating overflow and when a user strikes an interrupt key. Study the UNIX signal repertoire and design and implement an interface for a signal handler that turns signals into exceptions.

4.4 Some systems print a stack trace when a program aborts. This shows the state of the procedure-call stack when the program aborted, and it may include procedure names and arguments.

Change Except_raise to print a stack trace when it announces an uncaught exception. Depending on the calling conventions on your computer, you may be able to print the procedure names and the line numbers of the calls. For example, the trace might look like this:

    Uncaught exception Assertion failed raised in whilestmt() at stmt.c:201
    called from statement() at stmt.c:63
    called from compound() at decl.c:122
    called from funcdefn() at decl.c:890
    called from decl() at decl.c:95
    called from program() at decl.c:788
    called from main() at main.c:34
    aborting...

4.5 On some systems, a program can invoke a debugger on itself when it has detected an error. This facility is particularly useful during development, when assertion failures may be common. If your system supports this facility, change Except_raise to start the debugger instead of calling abort after it announces an uncaught exception. Try to make your implementation work in production programs; that is, make it figure out at runtime whether or not to invoke the debugger.

4.6 If you have access to a C compiler, like lcc (Fraser and Hanson 1995), modify it to support exceptions, TRY statements, and RAISE and RERAISE expressions with the syntax and semantics described in this chapter, without using setjmp and longjmp. You will need to implement a mechanism similar to setjmp and longjmp, but it can be specialized for exception handling. For example, it’s usually possible to instantiate the handlers with only a few instructions. Warning: This exercise is a large project.


5 MEMORY MANAGEMENT

All nontrivial C programs allocate memory at runtime. The standard C library provides four memory-management routines: malloc, calloc, realloc, and free. The Mem interface repackages these routines as a set of macros and routines that are less prone to error and that provide a few additional capabilities.

Unfortunately, memory-management bugs are common in C programs, and they are often difficult to diagnose and fix. For example, the fragment

    p = malloc(nbytes);
    …
    free(p);

calls malloc to allocate a block of nbytes of memory, assigns the address of the first byte of that block to p, uses p and the block it points to, and frees the block. After the call to free, p holds a dangling pointer, that is, a pointer that refers to memory that logically does not exist. Subsequently dereferencing p is an error, although if the block hasn’t been reallocated for another purpose, the error might go undetected. This behavior is what makes these kinds of access errors hard to diagnose: when the error is detected, it may manifest itself at a place and time far away from the origin of the error.

The fragment

    p = malloc(nbytes);
    …
    free(p);

    …
    free(p);

illustrates another error: deallocating free memory. This error usually corrupts the data structures used by the memory-management functions, but it may go undetected until a subsequent call to one of those functions.

Another error is deallocating memory that wasn’t allocated by malloc, calloc, or realloc. For example, the intent of

    char buf[20], *p;
    if (n >= sizeof buf)
        p = malloc(n);
    else
        p = buf;
    …
    free(p);

is to avoid allocation when n is less than the size of buf, but the code erroneously calls free even when p points to buf. Again, this error usually corrupts the memory-management data structures and isn’t detected until later.

Finally, the function

    void itoa(int n, char *buf, int size) {
        char *p = malloc(43);

        sprintf(p, "%d", n);
        if (strlen(p) >= size - 1) {
            while (--size > 0)
                *buf++ = '*';
            *buf = '\0';
        } else
            strcpy(buf, p);
    }

fills buf[0..size-1] with the decimal representation of the integer n or with asterisks if that representation takes more than size-1 characters. This code looks robust, but it contains at least two errors. First, malloc returns the null pointer if the allocation fails, and the code fails to test for this condition. Second, the code creates a memory leak: it doesn’t

deallocate the memory it allocates. The program will slowly consume memory each time itoa is called. If itoa is called often, the program will eventually run out of memory and fail. Also, itoa works correctly when size is less than two, but it does so by setting buf[0] to the null character. Perhaps a better design would be to insist that size exceed two and to enforce that constraint with a checked runtime error.

The macros and routines in the Mem interface offer some protection from these kinds of memory-management errors. They don’t eliminate all such errors, however. For example, they can’t guard against dereferencing corrupt pointers or using pointers to local variables that have gone out of scope. C novices often commit the latter error; this apparently simpler version of itoa is an example:

    char *itoa(int n) {
        char buf[43];

        sprintf(buf, "%d", n);
        return buf;
    }

itoa returns the address of its local array buf, but once itoa returns, buf no longer exists.

5.1 Interface

The Mem interface exports exceptions, routines, and macros:

⟨mem.h⟩≡
    #ifndef MEM_INCLUDED
    #define MEM_INCLUDED
    #include "except.h"

    ⟨exported exceptions 70⟩
    ⟨exported functions 70⟩
    ⟨exported macros 70⟩

    #endif

Mem’s allocation functions are similar to those in the standard C library, but they don’t accept zero sizes and never return null pointers:

⟨exported exceptions 70⟩≡
    extern const Except_T Mem_Failed;

⟨exported functions 70⟩≡
    extern void *Mem_alloc (long nbytes,
        const char *file, int line);
    extern void *Mem_calloc(long count, long nbytes,
        const char *file, int line);

Mem_alloc allocates a block of at least nbytes and returns a pointer to the first byte. The block is aligned on an addressing boundary that is suitable for the data with the strictest alignment requirement. The contents of the block are uninitialized. It is a checked runtime error for nbytes to be nonpositive.

Mem_calloc allocates a block large enough to hold an array of count elements each of size nbytes, and returns a pointer to the first element. The block is aligned as for Mem_alloc, and is initialized to zeros. The null pointer and 0.0 are not necessarily represented by zeros, so Mem_calloc may not initialize them correctly. It is a checked runtime error for count or nbytes to be nonpositive.

The last two arguments to Mem_alloc and Mem_calloc are the file name and line number of the location of the call. These are supplied by the following macros, which are the usual way to invoke these functions.

⟨exported macros 70⟩≡
    #define ALLOC(nbytes) \
        Mem_alloc((nbytes), __FILE__, __LINE__)
    #define CALLOC(count, nbytes) \
        Mem_calloc((count), (nbytes), __FILE__, __LINE__)

If Mem_alloc or Mem_calloc cannot allocate the memory requested, they raise Mem_Failed and pass file and line to Except_raise so that the exception reports the location of the call. If file is the null pointer, Mem_alloc and Mem_calloc supply the locations within their implementations that raise Mem_Failed.

Many allocations have the form

    struct T *p;
    p = Mem_alloc(sizeof (struct T));

which allocates a block for an instance of the structure T and returns a pointer to that block. A better version of this idiom is

    p = Mem_alloc(sizeof *p);

Using sizeof *p instead of sizeof (struct T) works for any pointer type, except void pointers, and sizeof *p is independent of the pointer’s referent type. If the type of p is changed, this allocation remains correct, but the one with sizeof (struct T) must be changed to reflect the change in p’s type. That is,

    p = Mem_alloc(sizeof (struct T));

is correct only if p is really a pointer to a struct T. If p is changed to a pointer to another structure and the call isn’t updated, the call may allocate too much memory, which wastes space, or too little memory, which is disastrous, because the client may scribble on unallocated storage.

This allocation idiom is so common that Mem provides macros that encapsulate both the allocation and the assignment:

⟨exported macros 70⟩+≡
    #define NEW(p) ((p) = ALLOC((long)sizeof *(p)))
    #define NEW0(p) ((p) = CALLOC(1, (long)sizeof *(p)))

NEW(p) allocates an uninitialized block to hold *p and sets p to the address of that block. NEW0(p) does the same, but also clears the block. NEW is provided on the assumption that most clients initialize a block immediately after allocating it. The argument to the compile-time operator sizeof is used only for its type; it is not evaluated at runtime. So NEW and NEW0 evaluate p exactly once, and it’s safe to use an expression that has side effects as an actual argument to either macro, such as NEW(a[i++]).

malloc and calloc take arguments of type size_t; sizeof yields a constant of type size_t. The type size_t is an unsigned integral type capable of representing the size of the largest object that can be declared, and it’s used in the standard library wherever object sizes are specified. In practice, size_t is either unsigned int or unsigned long. Mem_alloc and Mem_calloc take integer arguments to avoid errors when negative numbers are passed to unsigned arguments. For example,

    int n = -1;
    …
    p = malloc(n);

is clearly an error, but many implementations of malloc won’t catch the error because when −1 is converted to a size_t, it usually winds up as a very large unsigned value.

Memory is deallocated by Mem_free:

⟨exported functions 70⟩+≡
    extern void Mem_free(void *ptr,
        const char *file, int line);

⟨exported macros 70⟩+≡
    #define FREE(ptr) ((void)(Mem_free((ptr), \
        __FILE__, __LINE__), (ptr) = 0))

Mem_free takes a pointer to the block to be deallocated. If ptr is nonnull, Mem_free deallocates that block; if ptr is null, Mem_free has no effect. The FREE macro also takes a pointer to a block, calls Mem_free to deallocate the block, and sets ptr to the null pointer, which, as mentioned in Section 2.4, helps avoid dangling pointers. Since ptr is null after its referent has been deallocated by FREE, a subsequent dereference will usually cause the program to crash with some kind of addressing error. This definite error is better than the unpredictable behavior that dereferencing a dangling pointer can cause. Note that FREE evaluates ptr more than once.

As detailed in the sections that follow, there are two implementations that export the Mem interface. The checking implementation implements checked runtime errors to help catch access errors like those described in the previous section. In that implementation, it is a checked runtime error to pass Mem_free a nonnull ptr that was not returned by a previous call to Mem_alloc, Mem_calloc, or Mem_resize, or a ptr that has already been passed to Mem_free or Mem_resize. The values of Mem_free’s file and line arguments are used to report these checked runtime errors.

In the production implementation, however, these access errors are unchecked runtime errors.

The function

⟨exported functions 70⟩+≡
    extern void *Mem_resize(void *ptr, long nbytes,
        const char *file, int line);

⟨exported macros 70⟩+≡
    #define RESIZE(ptr, nbytes) ((ptr) = Mem_resize((ptr), \
        (nbytes), __FILE__, __LINE__))

changes the size of the block allocated by a previous call to Mem_alloc, Mem_calloc, or Mem_resize. Like Mem_free, the first argument to Mem_resize is the pointer that holds the address of the block whose size is to be changed. Mem_resize expands or contracts the block so that it holds at least nbytes of memory, suitably aligned, and returns a pointer to the resized block. Mem_resize may move the block in order to change its size, so Mem_resize is logically equivalent to allocating a new block, copying some or all of the data from ptr to the new block, and deallocating ptr. If Mem_resize cannot allocate the new block, it raises Mem_Failed, with file and line as the exception coordinates. The macro RESIZE changes ptr to point at the new block, a common use of Mem_resize. Note that RESIZE evaluates ptr more than once.

If nbytes exceeds the size of the block pointed to by ptr, the excess bytes are uninitialized. Otherwise, nbytes beginning at ptr are copied to the new block.

It is a checked runtime error to pass Mem_resize a null ptr, and for nbytes to be nonpositive. In the checking implementation, it is a checked runtime error to pass Mem_resize a ptr that was not returned by a previous call to Mem_alloc, Mem_calloc, or Mem_resize, and to pass it one that has already been passed to Mem_free or Mem_resize. In the production implementation, these access errors are unchecked runtime errors.

The functions in the Mem interface can be used in addition to the standard C library functions malloc, calloc, realloc, and free. That is, a program can use both sets of allocation functions. The access errors reported as checked runtime errors by the checking implementation apply only to memory managed by that implementation. Only one implementation of the Mem interface may be used in any given program.

5.2 Production Implementation

In the production implementation, the routines encapsulate calls to the memory-management functions in the standard library in the safer package specified by the Mem interface:

⟨mem.c⟩≡
    #include <stdlib.h>
    #include <stddef.h>
    #include "assert.h"
    #include "except.h"
    #include "mem.h"

    ⟨data 74⟩
    ⟨functions 74⟩

For example, Mem_alloc calls malloc and raises Mem_Failed when malloc returns the null pointer:

⟨functions 74⟩≡
    void *Mem_alloc(long nbytes, const char *file, int line){
        void *ptr;

        assert(nbytes > 0);
        ptr = malloc(nbytes);
        if (ptr == NULL)
            ⟨raise Mem_Failed 74⟩
        return ptr;
    }

⟨raise Mem_Failed 74⟩≡
    {
        if (file == NULL)
            RAISE(Mem_Failed);
        else
            Except_raise(&Mem_Failed, file, line);
    }

⟨data 74⟩≡
    const Except_T Mem_Failed = { "Allocation Failed" };

If a client doesn’t handle Mem_Failed, Except_raise will give the caller’s coordinates, which are passed to Mem_alloc, when it reports the unhandled exception. For example:

    Uncaught exception Allocation Failed raised @parse.c:431
    aborting...
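The ideas introduced so far (macros that pass the call site, the sizeof *p idiom, a wrapper that never returns the null pointer, and a free macro that clears its argument) can be tried out without the Except machinery. The following is only a minimal sketch under stated assumptions, not the book's code: the names sketch_alloc, sketch_free, SKETCH_ALLOC, SKETCH_NEW, and SKETCH_FREE are hypothetical, and printing a message and calling abort stands in for raising Mem_Failed.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for Mem_alloc: rejects nonpositive sizes and
   never returns NULL. Aborting replaces raising Mem_Failed. */
static void *sketch_alloc(long nbytes, const char *file, int line) {
    void *ptr;

    assert(nbytes > 0);
    ptr = malloc((size_t)nbytes);
    if (ptr == NULL) {
        fprintf(stderr, "Allocation failed at %s:%d\n", file, line);
        abort();
    }
    return ptr;
}

/* Hypothetical stand-in for the production Mem_free: a null pointer
   is accepted and ignored. */
static void sketch_free(void *ptr) {
    if (ptr)
        free(ptr);
}

/* The macros supply the call site, as ALLOC does in the interface. */
#define SKETCH_ALLOC(nbytes) \
    sketch_alloc((nbytes), __FILE__, __LINE__)

/* sizeof *(p) is not evaluated, so p is evaluated exactly once. */
#define SKETCH_NEW(p) ((p) = SKETCH_ALLOC((long)sizeof *(p)))

/* Frees the block and clears the pointer; evaluates p more than once. */
#define SKETCH_FREE(p) ((void)(sketch_free(p), (p) = 0))
```

A caller would write SKETCH_NEW(p) for any non-void pointer p and later SKETCH_FREE(p); a second SKETCH_FREE(p) is then harmless because p is already null.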

Similarly, Mem_calloc encapsulates a call to calloc:

⟨functions 74⟩+≡
    void *Mem_calloc(long count, long nbytes,
        const char *file, int line) {
        void *ptr;

        assert(count > 0);
        assert(nbytes > 0);
        ptr = calloc(count, nbytes);
        if (ptr == NULL)
            ⟨raise Mem_Failed 74⟩
        return ptr;
    }

When either count or nbytes is zero, calloc’s behavior is implementation defined. The Mem interface specifies what happens in these cases, which is one of its advantages and helps avoid bugs.

Mem_free just calls free:

⟨functions 74⟩+≡
    void Mem_free(void *ptr, const char *file, int line) {
        if (ptr)
            free(ptr);
    }

The standard permits null pointers to be passed to free, but Mem_free doesn’t pass them, because old implementations of free may not accept null pointers.

Mem_resize has a much simpler specification than does realloc, which is reflected in its simpler implementation:

⟨functions 74⟩+≡
    void *Mem_resize(void *ptr, long nbytes,
        const char *file, int line) {
        assert(ptr);
        assert(nbytes > 0);
        ptr = realloc(ptr, nbytes);
        if (ptr == NULL)
            ⟨raise Mem_Failed 74⟩

        return ptr;
    }

Mem_resize’s only purpose is to change the size of an existing block. realloc does this, too, but it also frees a block when nbytes is zero and allocates a block when ptr is the null pointer. These additional capabilities, which are only loosely related to changing the size of an existing block, invite bugs.

5.3 Checking Implementation

The functions exported by the checking implementation of the Mem interface catch the kinds of access errors described at the beginning of this chapter and report them as checked runtime errors.

⟨memchk.c⟩≡
    #include <stdlib.h>
    #include <string.h>
    #include "assert.h"
    #include "except.h"
    #include "mem.h"

    ⟨checking types 80⟩
    ⟨checking macros 79⟩
    ⟨data 74⟩
    ⟨checking data 77⟩
    ⟨checking functions 79⟩

Mem_free and Mem_resize can detect access errors if Mem_alloc, Mem_calloc, and Mem_resize never return the same address twice and if they remember all of the addresses they do return and which ones refer to allocated memory. Abstractly, these functions maintain a set S whose elements are the pairs (α, allocated) or (α, free), where α is the address returned by an allocation. The value free indicates that the address α does not refer to allocated memory; that is, it has been deallocated explicitly, and the value allocated indicates that α points to allocated memory.

Mem_alloc and Mem_calloc add the pair (ptr, allocated) to S, where ptr is their return value, and they guarantee that neither (ptr, allocated) nor (ptr, free) was in S before the addition. Mem_free(ptr) is legal if ptr

is null or if (ptr, allocated) is in S. If ptr is nonnull and (ptr, allocated) is in S, Mem_free deallocates the block at ptr and changes the entry in S to (ptr, free). Similarly, Mem_resize(ptr, nbytes, …) is legal only if (ptr, allocated) is in S. If so, Mem_resize calls Mem_alloc to allocate a new block, copies the contents of the old one to the new one, and calls Mem_free to deallocate the old one; these calls make the appropriate changes to S.

The condition that the allocation functions never return the same address twice can be implemented by never deallocating anything. This approach wastes space, and it’s easy to do better: never deallocate the byte at an address previously returned by an allocation function. S can be implemented by keeping a table of the addresses of these bytes.

This scheme can be implemented by writing a memory allocator that sits on top of the standard library functions. This allocator maintains a hash table of block descriptors:

⟨checking data 77⟩≡
    static struct descriptor {
        struct descriptor *free;
        struct descriptor *link;
        const void *ptr;
        long size;
        const char *file;
        int line;
    } *htab[2048];

ptr is the address of the block, which is allocated elsewhere as described below, and size is the size of the block. file and line are the block’s allocation coordinates, that is, the source coordinates passed to the function that allocated the block. These values aren’t used, but they’re stored in descriptors so that debuggers can print them during a debugging session.

The link fields form a list of descriptors for blocks that hash to the same index in htab, which is an array of pointers to descriptors. These descriptors also form a list of free blocks; the head of this list is the dummy descriptor

⟨checking data 77⟩+≡
    static struct descriptor freelist = { &freelist };

and the list is threaded through the free fields of the descriptors. This list is circular: freelist is the last descriptor on the list and its free field points to the first descriptor. At any given time, htab holds descriptors for all of the blocks, both free and allocated, and the free blocks are on freelist. Thus, the descriptor’s free field is null if the block is allocated and nonnull if it’s free, and htab implements S. Figure 5.1 shows these data structures at one point in time. The space associated with each descriptor structure appears behind it. Shaded spaces are allocated; clear spaces are free, solid lines emanate from link fields, and the dotted lines show the free list.

[Figure 5.1: htab and freelist structures]

Given an address, find searches for its descriptor. It returns either a pointer to the descriptor or the null pointer:

⟨checking functions 79⟩≡
    static struct descriptor *find(const void *ptr) {
        struct descriptor *bp = htab[hash(ptr, htab)];

        while (bp && bp->ptr != ptr)
            bp = bp->link;
        return bp;
    }

⟨checking macros 79⟩≡
    #define hash(p, t) (((unsigned long)(p)>>3) & \
        (sizeof (t)/sizeof ((t)[0])-1))

The hash macro treats the address as a bit pattern, shifts it right three bits, and reduces it modulo the size of htab. find is enough to write a version of Mem_free in which access errors are checked runtime errors:

⟨checking functions 79⟩+≡
    void Mem_free(void *ptr, const char *file, int line) {
        if (ptr) {
            struct descriptor *bp;
            ⟨set bp if ptr is valid 79⟩
            bp->free = freelist.free;
            freelist.free = bp;
        }
    }

If ptr is nonnull and is a valid address, the block is deallocated by appending it to the free list for possible reuse by a subsequent call to Mem_alloc. A pointer is valid if it points to an allocated block:

⟨set bp if ptr is valid 79⟩≡
    if (((unsigned long)ptr)%(sizeof (union align)) != 0
    || (bp = find(ptr)) == NULL || bp->free)
        Except_raise(&Assert_Failed, file, line);
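How the descriptor table turns a double free or a free of an unknown address into a detectable condition can be seen in isolation. The following is a hedged sketch rather than the book's implementation: track and checked_free are hypothetical helpers, the descriptor is pared down to three fields, uintptr_t replaces the unsigned long cast, the alignment test is omitted, and a return value of -1 stands in for raising Assert_Failed.

```c
#include <stddef.h>
#include <stdint.h>

/* Pared-down descriptor: only the fields this sketch needs. */
struct descriptor {
    struct descriptor *free;  /* nonnull once the block is on the free list */
    struct descriptor *link;  /* next descriptor in the same hash bucket */
    const void *ptr;          /* address of the block being tracked */
};

static struct descriptor *htab[2048];
static struct descriptor freelist = { &freelist, NULL, NULL };

#define NELEMS(t) (sizeof (t) / sizeof ((t)[0]))
/* Drop the low three bits, then mask to the table size (a power of two). */
#define HASH(p) (((uintptr_t)(p) >> 3) & (NELEMS(htab) - 1))

/* Hypothetical helper: records ptr as an allocated block. */
static void track(struct descriptor *d, const void *ptr) {
    size_t h = HASH(ptr);

    d->ptr = ptr;
    d->free = NULL;
    d->link = htab[h];
    htab[h] = d;
}

/* Walks one hash chain; returns the descriptor or NULL. */
static struct descriptor *find(const void *ptr) {
    struct descriptor *bp = htab[HASH(ptr)];

    while (bp && bp->ptr != ptr)
        bp = bp->link;
    return bp;
}

/* Returns 0 on success and -1 where the checking Mem_free would raise
   Assert_Failed: unknown address or block already freed. */
static int checked_free(const void *ptr) {
    struct descriptor *bp;

    if (ptr == NULL)
        return 0;
    bp = find(ptr);
    if (bp == NULL || bp->free)
        return -1;
    bp->free = freelist.free;   /* push onto the circular free list */
    freelist.free = bp;
    return 0;
}
```

Because a freed descriptor stays in htab with a nonnull free field, a second checked_free on the same address is rejected instead of corrupting the list.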

The test ((unsigned long)ptr)%(sizeof (union align)) != 0 avoids calls to find for those addresses that aren’t multiples of the strictest alignment and thus cannot possibly be valid block pointers.

As shown below, Mem_alloc always returns pointers that are aligned on an address that is a multiple of the size of the following union.

⟨checking types 80⟩≡
    union align {
        int i;
        long l;
        long *lp;
        void *p;
        void (*fp)(void);
        float f;
        double d;
        long double ld;
    };

This alignment ensures that any type of data can be stored in the blocks returned by Mem_alloc. If the ptr passed to Mem_free isn’t so aligned, it can’t possibly be in htab and is thus invalid.

Mem_resize catches access errors by making the same check, and then calls Mem_free, Mem_alloc, and the library function memcpy:

⟨checking functions 79⟩+≡
    void *Mem_resize(void *ptr, long nbytes,
        const char *file, int line) {
        struct descriptor *bp;
        void *newptr;

        assert(ptr);
        assert(nbytes > 0);
        ⟨set bp if ptr is valid 79⟩
        newptr = Mem_alloc(nbytes, file, line);
        memcpy(newptr, ptr,
            nbytes < bp->size ? nbytes : bp->size);
        Mem_free(ptr, file, line);
        return newptr;
    }
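The role of union align is easier to see with the arithmetic written out. The sketch below is illustrative only: round_up_to_align and is_candidate_block_address are hypothetical helper names, and uintptr_t is used in place of the unsigned long cast. It shows rounding a request up to a multiple of sizeof (union align) and the cheap modulus test that rejects addresses that could not have come from such an allocator.

```c
#include <stdint.h>

/* The union from the text: its size serves as the allocation granule. */
union align {
    int i;
    long l;
    long *lp;
    void *p;
    void (*fp)(void);
    float f;
    double d;
    long double ld;
};

/* Rounds nbytes up to the next multiple of sizeof (union align). */
static long round_up_to_align(long nbytes) {
    long a = (long)sizeof (union align);

    return ((nbytes + a - 1) / a) * a;
}

/* Nonzero if ptr is a multiple of sizeof (union align); addresses that
   fail this test need not be looked up in the descriptor table. */
static int is_candidate_block_address(const void *ptr) {
    return (uintptr_t)ptr % sizeof (union align) == 0;
}
```

If every chunk starts at a multiple of the granule and every request is rounded to the granule, every returned address passes the modulus test, so a failing address can be rejected without a table lookup.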

Likewise, Mem_calloc can be implemented by calling Mem_alloc and the library function memset:

⟨checking functions 79⟩+≡
    void *Mem_calloc(long count, long nbytes,
        const char *file, int line) {
        void *ptr;

        assert(count > 0);
        assert(nbytes > 0);
        ptr = Mem_alloc(count*nbytes, file, line);
        memset(ptr, '\0', count*nbytes);
        return ptr;
    }

All that remains is to allocate the descriptors themselves and the code for Mem_alloc. One way to do both tasks with one allocation is to allocate a block large enough to hold a descriptor and the storage requested by a call to Mem_alloc. This approach has two drawbacks. First, it complicates carving up a block of free storage to satisfy several smaller requests, because each request needs its own descriptor. Second, it makes the descriptors vulnerable to corruption by writes through pointers or indices that stray just outside of allocated blocks.

Allocating descriptors separately decouples their allocations from those done by Mem_alloc and reduces, but does not eliminate, the chances that they will be corrupted. dalloc allocates, initializes, and returns one descriptor, doling it out of the 512-descriptor chunks obtained from malloc:

⟨checking functions 79⟩+≡
    static struct descriptor *dalloc(void *ptr, long size,
        const char *file, int line) {
        static struct descriptor *avail;
        static int nleft;

        if (nleft <= 0) {
            ⟨allocate descriptors 82⟩
            nleft = NDESCRIPTORS;
        }
        avail->ptr = ptr;
        avail->size = size;

        avail->file = file;
        avail->line = line;
        avail->free = avail->link = NULL;
        nleft--;
        return avail++;
    }

⟨checking macros 79⟩+≡
    #define NDESCRIPTORS 512

The call to malloc might return the null pointer, which dalloc passes back to its caller.

⟨allocate descriptors 82⟩≡
    avail = malloc(NDESCRIPTORS*sizeof (*avail));
    if (avail == NULL)
        return NULL;

As shown below, Mem_alloc raises Mem_Failed when dalloc returns the null pointer.

Mem_alloc allocates a block of memory using the first-fit algorithm, one of many memory-allocation algorithms. It searches freelist for the first free block that is large enough to satisfy the request and divides that block to fill the request. If freelist doesn’t contain a suitable block, Mem_alloc calls malloc to allocate a chunk of memory that’s larger than nbytes, adds this chunk onto the free list, and tries again. Since the new chunk is larger than nbytes, it is used to fill the request the second time around. Here’s the code:

⟨checking functions 79⟩+≡
    void *Mem_alloc(long nbytes, const char *file, int line){
        struct descriptor *bp;
        void *ptr;

        assert(nbytes > 0);
        ⟨round nbytes up to an alignment boundary 83⟩
        for (bp = freelist.free; bp; bp = bp->free) {
            if (bp->size > nbytes) {
                ⟨use the end of the block at bp->ptr 83⟩
            }
            if (bp == &freelist) {

                struct descriptor *newptr;
                ⟨newptr ← a block of size NALLOC + nbytes 84⟩
                newptr->free = freelist.free;
                freelist.free = newptr;
            }
        }
        assert(0);
        return NULL;
    }

Mem_alloc starts by rounding nbytes up so that every pointer it returns is a multiple of the size of the union align:

⟨round nbytes up to an alignment boundary 83⟩≡
    nbytes = ((nbytes + sizeof (union align) - 1)/
        (sizeof (union align)))*(sizeof (union align));

freelist.free points to the beginning of the free list, which is where the for loop starts. The first free block whose size exceeds nbytes is used to fill the request. The nbytes at the end of this free block are carved off, and the address of that block is returned after its descriptor is created, initialized, and added to htab:

⟨use the end of the block at bp->ptr 83⟩≡
    bp->size -= nbytes;
    ptr = (char *)bp->ptr + bp->size;
    if ((bp = dalloc(ptr, nbytes, file, line)) != NULL) {
        unsigned h = hash(ptr, htab);
        bp->link = htab[h];
        htab[h] = bp;
        return ptr;
    } else
        ⟨raise Mem_Failed 74⟩

Figure 5.2 shows the effect of this chunk: on the left is a descriptor that points to some free space before it’s carved up. On the right, the allocated space is shaded and a new descriptor points to it. Notice that the new descriptor’s free list link is null.

The test bp->size > nbytes guarantees that the value of bp->ptr is never reused. Large free blocks are divided to fill smaller requests until they’re reduced to sizeof (union align) bytes, after which bp->size

never exceeds nbytes. The first sizeof (union align) bytes of each chunk are never allocated.

Figure 5.2 Allocating the tail of a free block

If bp reaches freelist, the list does not hold a block whose size exceeds nbytes. In that case, a new chunk of size

⟨checking macros 79⟩+≡
#define NALLOC ((4096 + sizeof (union align) - 1)/ \
    (sizeof (union align)))*(sizeof (union align))

plus nbytes is added to the beginning of the free list; it will be visited on the next iteration of the for loop and will be used to fill the request. This new chunk has a descriptor just as if it had been previously allocated and freed:

⟨newptr ← a block of size NALLOC + nbytes 84⟩≡
    if ((ptr = malloc(nbytes + NALLOC)) == NULL
    ||  (newptr = dalloc(ptr, nbytes + NALLOC,
            __FILE__, __LINE__)) == NULL)
        ⟨raise Mem_Failed 74⟩
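
The two arithmetic steps in Mem_alloc, rounding the request and carving it off the tail of a free block, can be tried in isolation. The following is a minimal sketch, not the checking implementation: the reduced union align, struct block, round_up, and carve_tail are hypothetical names introduced only for illustration, and descriptors, the hash table, and the free list are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed stand-in for union align; the real union lists more members. */
union align {
    long l;
    double d;
    void *p;
    void (*fp)(void);
};

/* Round nbytes up to the next multiple of sizeof (union align). */
static long round_up(long nbytes) {
    long a = (long)sizeof (union align);

    assert(nbytes > 0);
    return ((nbytes + a - 1)/a)*a;
}

/* Hypothetical free-block record: base address and remaining size. */
struct block {
    char *ptr;
    long size;
};

/* Carve nbytes (already rounded) off the tail of a free block.
   The request is refused unless the block is strictly larger than
   nbytes, so the block's own base address is never handed out. */
static void *carve_tail(struct block *bp, long nbytes) {
    if (bp->size <= nbytes)
        return NULL;
    bp->size -= nbytes;
    return bp->ptr + bp->size;
}
```

Because carve_tail refuses a request equal to the remaining size, a block’s base address is never returned to a caller; this mirrors the test bp->size > nbytes.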

Further Reading

One of the purposes of Mem is to improve the interface to the standard C allocation functions. Maguire (1993) gives a critique of these functions and describes a similar repackaging.

Memory-allocation bugs so pervade C programs that entire companies are devoted to building and selling tools that help diagnose and fix such bugs. One of the best is Purify (Hastings and Joyce 1992), which detects almost all kinds of access errors, including those described in Section 5.3. Purify checks every load and store instruction; since it does so by editing object code, it can be used even when source code is unavailable, such as for proprietary libraries. Instrumenting source code to catch access errors is at the other end of the implementation spectrum; for example, Austin, Breach, and Sohi (1994) describe a system in which “safe” pointers carry enough information to catch a wide range of access errors. LCLint (Evans 1996) has many of the features of tools like PC-Lint and can detect many potential memory-allocation errors at compile time.

Knuth (1973a) surveys all of the important memory-allocation algorithms, and explains why first fit is usually better than, for example, best fit, which looks for the free block whose size is closest to the request. The first-fit algorithm used in Mem_alloc is similar to the one described in Section 8.7 of Kernighan and Ritchie (1988).

There are endless variations on most memory-management algorithms, usually designed to improve performance for specific applications or allocation patterns. Quick fit (Weinstock and Wulf 1988) is one of the most widely used. It capitalizes on the observation that many applications allocate blocks of only a few different sizes. Quick fit keeps N free lists, one for each of the N most frequently requested sizes. Allocating a block of one of these sizes simply removes the first block from the appropriate list, and freeing a block adds it to the appropriate list. When the lists are empty or the request is for an odd size, an alternate algorithm, such as first fit, is used.

Grunwald and Zorn (1993) describe a system that generates implementations of malloc and free tuned for a specific application. They first run the application with versions of malloc and free that collect statistics about block sizes, frequency of allocation versus deallocation, and so forth. They then feed these data to a program that generates source code for versions of malloc and free customized for the application. These versions often use quick fit with a small, application-specific set of block sizes.
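
The quick-fit idea described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the algorithm as published: the three block sizes, the names qf_alloc and qf_free, and the requirement that the caller pass the block size to qf_free are all hypothetical, and plain malloc and free stand in for the alternate algorithm.

```c
#include <assert.h>
#include <stdlib.h>

#define NSIZES 3    /* hypothetical: the N most frequently requested sizes */

static const size_t sizes[NSIZES] = { 16, 32, 64 };

/* A free block's first bytes hold the link to the next free block. */
struct freeblock {
    struct freeblock *next;
};

static struct freeblock *lists[NSIZES];   /* one free list per size */

/* Return the index of the list for size n, or -1 for an odd size. */
static int size_class(size_t n) {
    for (int i = 0; i < NSIZES; i++)
        if (sizes[i] == n)
            return i;
    return -1;
}

/* Take the first block from the matching list; otherwise fall back. */
static void *qf_alloc(size_t n) {
    int i = size_class(n);

    if (i >= 0 && lists[i] != NULL) {
        struct freeblock *b = lists[i];
        lists[i] = b->next;
        return b;
    }
    return malloc(n);   /* stands in for the alternate algorithm */
}

/* Put a block of a known size back on its list; odd sizes are freed. */
static void qf_free(void *p, size_t n) {
    int i = size_class(n);

    if (p == NULL)
        return;
    if (i >= 0) {
        struct freeblock *b = p;
        b->next = lists[i];
        lists[i] = b;
    } else
        free(p);
}
```

Both operations on a listed size are a constant number of pointer assignments, which is the source of quick fit’s speed; the cost is that blocks on one list are never used for requests of another size.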

Exercises

5.1 Maguire (1993) advocates initializing uninitialized memory to some distinctive bit pattern to help diagnose bugs that are caused by accessing uninitialized memory. What are the properties of a good bit pattern? Propose a suitable bit pattern and change the checking implementation of Mem_alloc to use it. Try to find an application where this change catches a bug.

5.2 Once a free block is whittled down to sizeof (union align) bytes in the chunk ⟨use the end of the block at bp->ptr 83⟩, it can never satisfy a request yet remains in the free list. Change this code to remove such blocks. Can you find an application for which measurements can detect the effect of this improvement?

5.3 Most implementations of first fit, such as the one in Section 8.7 of Kernighan and Ritchie (1988), combine adjacent free blocks to form larger free blocks. The checking implementation of Mem_alloc doesn’t combine adjacent free blocks because it may not return the same address twice. Devise an algorithm for Mem_alloc that can combine adjacent free blocks without returning the same address twice.

5.4 Some programmers might argue that raising Assert_Failure in Mem_free is a draconian reaction to an access error because execution can continue if the erroneous call is simply logged and then ignored. Implement

    extern void Mem_log(FILE *log);

If Mem_log is passed a nonnull file pointer, it announces access errors by writing messages to log instead of by raising Assert_Failure. These messages can record the coordinates of the erroneous call and the allocation coordinates. For example, when Mem_free is called with a pointer to a block that has already been freed, it might write

    ** freeing free memory
    Mem_free(0x6418) called from parse.c:461
    This block is 48 bytes long and was allocated from sym.c:123

Similarly, when Mem_resize is called with a bad pointer, it might report

    ** resizing unallocated memory
    Mem_resize(0xf7fff930,640) called from types.c:1101

Permit Mem_log(NULL) to turn off logging and reinstate assertion failure for access errors.

5.5 The checking implementation has all of the information it needs to report potential memory leaks. As described on page 68, a memory leak is an allocated block that is not referenced by any pointer and thus cannot be deallocated. Leaks cause programs to run out of memory eventually. They aren’t a problem for programs that run for only a short time, but they’re a serious problem for long-running programs, such as user interfaces and servers. Implement

    extern void Mem_leak(void apply(void *ptr, long size,
        const char *file, int line, void *cl), void *cl);

which calls the function pointed to by apply for every allocated block; ptr is the location of the block, size is its allocated size, and file and line are its allocation coordinates. Clients can pass an application-specific pointer, cl, to Mem_leak, and this pointer is passed along to apply as its last argument. Mem_leak doesn’t know what cl is for, but presumably apply does. Together, apply and cl are called a closure: They specify an operation and some context-specific data for that operation. For example,

    void inuse(void *ptr, long size,
        const char *file, int line, void *cl) {
        FILE *log = cl;

        fprintf(log, "** memory in use at %p\n", ptr);
        fprintf(log, "This block is %ld bytes long "
            "and was allocated from %s:%d\n",
            size, file, line);
    }

writes messages like

    ** memory in use at 0x13428
    This block is 32 bytes long and was allocated from gen.c:23

to the log file described in the previous exercise. inuse is called by passing it and the file pointer for the log file to Mem_leak:

    Mem_leak(inuse, log);
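
The closure idiom, a function paired with an untyped context pointer, is independent of Mem_leak and can be tried on its own. The sketch below is an assumption-laden illustration, not a solution to the exercise: struct blockinfo, for_each_block, and add_size are hypothetical names, and the blocks come from an ordinary array rather than from the checking implementation’s tables.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record of one allocated block and its coordinates. */
struct blockinfo {
    void *ptr;
    long size;
    const char *file;
    int line;
};

/* Call apply for every block in the table, passing cl through
   untouched; this function never looks at what cl points to. */
static void for_each_block(const struct blockinfo *blocks, size_t n,
    void apply(void *ptr, long size,
        const char *file, int line, void *cl),
    void *cl) {
    for (size_t i = 0; i < n; i++)
        apply(blocks[i].ptr, blocks[i].size,
            blocks[i].file, blocks[i].line, cl);
}

/* An apply function whose context is a running total of bytes. */
static void add_size(void *ptr, long size,
    const char *file, int line, void *cl) {
    long *total = cl;

    (void)ptr; (void)file; (void)line;   /* unused in this example */
    *total += size;
}
```

Passing add_size with the address of a long computes the number of bytes in use; passing a function like inuse with a FILE pointer would write a report instead. The traversal code is the same in both cases, which is the point of the closure.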

Chapter 6. More Memory Management

Most implementations of malloc and free use memory-management algorithms that are necessarily based on the sizes of objects. The first-fit algorithm used in the previous chapter is an example. In some applications, deallocations are grouped and occur at the same time. Graphical user interfaces are an example. Space for scroll bars, buttons, and so forth, is allocated when a window is created, and deallocated when the window is destroyed. A compiler is another example. lcc, for example, allocates memory as it compiles a function and deallocates all of that memory at once when it finishes compiling the function.

Memory-management algorithms based on the lifetimes of objects are often better for these kinds of applications. Stack-based allocation is an example of this class of allocation algorithms, but it can be used only if object lifetimes are nested, which often is not the case.

This chapter describes a memory-management interface and an implementation that uses arena-based algorithms, which allocate memory from an arena and deallocate entire arenas at once. Calling malloc requires a subsequent call to free. As discussed in the previous chapter, it’s easy to forget to call free or, worse, to deallocate an object that has already been deallocated, or one that shouldn’t be deallocated.

With the arena-based allocator, there’s no obligation to call free for every call to malloc; there’s only a single call that deallocates all the memory allocated in an arena since the last deallocation. Both allocation and deallocation are more efficient, and there are no storage leaks. But
the most important benefit of this scheme is that it simplifies code. Applicative algorithms allocate new data structures instead of changing existing ones. The arena-based allocator encourages simple applicative algorithms in place of algorithms that might be more space-efficient but are always more complex because they must remember when to call free.

There are two disadvantages of the arena-based scheme: It can use more memory, and it can create dangling pointers. If an object is allocated in the wrong arena and that arena is deallocated before the program is done with the object, the program will reference either unallocated memory or memory that has been reused for another, perhaps unrelated, arena. It’s also possible to allocate objects in an arena that isn’t deallocated as early as expected, which creates a storage leak. In practice, however, arena management is so easy that these problems rarely occur.

6.1 Interface

The Arena interface specifies two exceptions and functions that manage arenas and allocate memory from them:

⟨arena.h⟩≡
#ifndef ARENA_INCLUDED
#define ARENA_INCLUDED
#include "except.h"

#define T Arena_T
typedef struct T *T;

extern const Except_T Arena_NewFailed;
extern const Except_T Arena_Failed;

⟨exported functions 91⟩

#undef T
#endif

Arenas are created and destroyed by

⟨exported functions 91⟩≡
extern T    Arena_new    (void);
extern void Arena_dispose(T *ap);

Arena_new creates a new arena and returns an opaque pointer to the newly created arena. These pointers are passed to the other functions to specify an arena. If Arena_new cannot allocate the arena, it raises the exception Arena_NewFailed. Arena_dispose frees the memory associated with the arena *ap, disposes of the arena itself, and clears *ap. It is a checked runtime error to pass a null ap or *ap to Arena_dispose.

The allocation functions Arena_alloc and Arena_calloc are like the functions with similar names in the Mem interface, except they allocate memory from an arena.

⟨exported functions 91⟩+≡
extern void *Arena_alloc (T arena, long nbytes,
    const char *file, int line);
extern void *Arena_calloc(T arena, long count,
    long nbytes, const char *file, int line);
extern void  Arena_free  (T arena);

Arena_alloc allocates a block of at least nbytes in arena and returns a pointer to the first byte. The block is aligned on an addressing boundary that is suitable for the data with the strictest alignment requirement. The contents of the block are uninitialized. Arena_calloc allocates a block large enough to hold an array of count elements, each of size nbytes, in arena, and returns a pointer to the first byte. The block is aligned as for Arena_alloc, and is initialized to zeros. It is a checked runtime error for count or nbytes to be nonpositive.

The last two arguments to Arena_alloc and Arena_calloc are the file name and the line number of the location of the call. If Arena_alloc and Arena_calloc cannot allocate the memory requested, they raise Arena_Failed and pass file and line to Except_raise so that the exception reports the location of the call. If file is the null pointer, they supply the source locations within their implementations that raise Arena_Failed.

Arena_free deallocates all the storage in arena, which amounts to deallocating everything that has been allocated in arena since arena was created or since the last call to Arena_free for that arena.
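
To make the allocate-many, free-once pattern concrete without depending on the Except and Assert interfaces, here is a minimal stand-in sketch. It is not the Arena implementation: struct mini_arena, mini_alloc, and mini_free are hypothetical names, the union align is reduced, failures return NULL instead of raising Arena_Failed, and no list of free chunks is kept.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical chunk header: each chunk links to the one before it. */
struct chunk {
    struct chunk *prev;   /* previously allocated chunk, or NULL */
    char *avail;          /* first free byte in this chunk */
    char *limit;          /* one past the last byte of this chunk */
};

/* Reduced stand-ins for the alignment unions (assumptions). */
union align { long l; double d; long double ld; void *p; void (*fp)(void); };
union header { struct chunk c; union align a; };

struct mini_arena { struct chunk *head; };

/* Bump-allocate nbytes; get a new chunk when the current one is full.
   Returns NULL if malloc fails. */
static void *mini_alloc(struct mini_arena *arena, long nbytes) {
    long a = (long)sizeof (union align);

    assert(arena && nbytes > 0);
    nbytes = ((nbytes + a - 1)/a)*a;
    if (arena->head == NULL
    || nbytes > arena->head->limit - arena->head->avail) {
        /* A fresh chunk always holds the request plus 10K of slack,
           so an if suffices here. */
        long m = (long)sizeof (union header) + nbytes + 10*1024;
        struct chunk *c = malloc((size_t)m);

        if (c == NULL)
            return NULL;
        c->prev = arena->head;
        c->avail = (char *)((union header *)c + 1);
        c->limit = (char *)c + m;
        arena->head = c;
    }
    arena->head->avail += nbytes;
    return arena->head->avail - nbytes;
}

/* Release every chunk at once; individual blocks are never freed. */
static void mini_free(struct mini_arena *arena) {
    assert(arena);
    while (arena->head) {
        struct chunk *prev = arena->head->prev;
        free(arena->head);
        arena->head = prev;
    }
}
```

A client would call mini_alloc any number of times while building a data structure and then call mini_free once when the whole structure is no longer needed, which is the usage the interface above is designed for.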

It is a checked runtime error to pass a null T to any routine in this interface. The routines in this interface can be used with those in the Mem interface and with other allocators based on malloc and free.

6.2 Implementation

⟨arena.c⟩≡
#include <stdlib.h>
#include <string.h>
#include "assert.h"
#include "except.h"
#include "arena.h"
#define T Arena_T

const Except_T Arena_NewFailed =
    { "Arena Creation Failed" };
const Except_T Arena_Failed =
    { "Arena Allocation Failed" };

⟨macros 98⟩
⟨types 92⟩
⟨data 96⟩
⟨functions 93⟩

An arena describes a chunk of memory:

⟨types 92⟩≡
struct T {
    T prev;
    char *avail;
    char *limit;
};

The prev field points to the head of the chunk, which begins with an arena structure as described below, and the limit field points just past the end of the chunk. The avail field points to the chunk’s first free location; the space beginning at avail and up to limit is available for allocation.

To allocate N bytes when N does not exceed limit-avail, avail is incremented by N and its previous value is returned. If N exceeds
limit-avail, a new chunk is allocated by calling malloc; the current value of *arena is “pushed” by storing it at the beginning of the new chunk, the fields of arena are initialized so they describe the new chunk, and allocation proceeds.

The arena structure thus heads a linked list of chunks in which the links are the prev fields in copies of the arena structures that begin each chunk. Figure 6.1 shows the state of an arena after three chunks have been allocated. The shading denotes allocated space; chunks can vary in size and may end with unallocated space if allocations don’t exactly fill the chunks.

Figure 6.1 An arena with three chunks (fields: prev, avail, limit)

Arena_new allocates and returns an arena structure with its fields set to null pointers, which denotes an empty arena:

⟨functions 93⟩≡
T Arena_new(void) {
    T arena = malloc(sizeof (*arena));
    if (arena == NULL)
        RAISE(Arena_NewFailed);
    arena->prev = NULL;
    arena->limit = arena->avail = NULL;
    return arena;
}

Arena_dispose calls Arena_free to deallocate the chunks in the arena; it then frees the arena structure itself and clears the pointer to the arena:

⟨functions 93⟩+≡
void Arena_dispose(T *ap) {
    assert(ap && *ap);
    Arena_free(*ap);
    free(*ap);
    *ap = NULL;
}

Arena uses malloc and free instead of, say, Mem_alloc and Mem_free, so that it’s independent of other allocators.

Most allocations are trivial: They round the request amount up to the proper alignment boundary, increment the avail pointer by the amount of the rounded request, and return the previous value.

⟨functions 93⟩+≡
void *Arena_alloc(T arena, long nbytes,
    const char *file, int line) {
    assert(arena);
    assert(nbytes > 0);
    ⟨round nbytes up to an alignment boundary 95⟩
    while (nbytes > arena->limit - arena->avail) {
        ⟨get a new chunk 95⟩
    }
    arena->avail += nbytes;
    return arena->avail - nbytes;
}

As in the checking implementation of the Mem interface, the size of the union

⟨types 92⟩+≡
union align {
    int i;
    long l;
    long *lp;
    void *p;
    void (*fp)(void);
    float f;
    double d;
    long double ld;
};

gives the minimum alignment on the host machine. Its fields are those that are most likely to have the strictest alignment requirements, and it is used to round up nbytes:

⟨round nbytes up to an alignment boundary 95⟩≡
    nbytes = ((nbytes + sizeof (union align) - 1)/
        (sizeof (union align)))*(sizeof (union align));

For most calls, nbytes does not exceed arena->limit - arena->avail; that is, the chunk has at least nbytes of free space, so the body of the while loop in Arena_alloc above is not executed. If the request cannot be satisfied from the current chunk, a new chunk must be allocated. This wastes the free space at the end of the current chunk, which is illustrated in the second chunk on the list shown in Figure 6.1.

After a new chunk is allocated, the current value of *arena is saved at the beginning of the new chunk, and arena’s fields are initialized so that allocation can continue:

⟨get a new chunk 95⟩≡
    T ptr;
    char *limit;
    ⟨ptr ← a new chunk 96⟩
    *ptr = *arena;
    arena->avail = (char *)((union header *)ptr + 1);
    arena->limit = limit;
    arena->prev  = ptr;

⟨types 92⟩+≡
union header {
    struct T b;
    union align a;
};

The structure assignment *ptr = *arena pushes *arena by saving it at the beginning of the new chunk. The union header ensures that arena->avail is set to a properly aligned address for the first allocation in this new chunk.

As shown below, Arena_free keeps a few free chunks on a free list emanating from freechunks to reduce the number of times it must call malloc. This list is threaded through the prev fields of the chunks’ initial arena structures, and the limit fields of those structures point just past the ends of their chunks. nfree is the number of chunks on the list. Arena_alloc gets a free chunk from this list or by calling malloc, and it sets the local variable limit for use in ⟨get a new chunk 95⟩ above:

⟨data 96⟩+≡
static T freechunks;
static int nfree;

⟨ptr ← a new chunk 96⟩≡
    if ((ptr = freechunks) != NULL) {
        freechunks = freechunks->prev;
        nfree--;
        limit = ptr->limit;
    } else {
        long m = sizeof (union header) + nbytes + 10*1024;
        ptr = malloc(m);
        if (ptr == NULL)
            ⟨raise Arena_Failed 96⟩
        limit = (char *)ptr + m;
    }

If a new chunk must be allocated, one is requested that is large enough to hold an arena structure plus nbytes, and has 10K bytes of available space left over. If malloc returns null, allocation fails and Arena_alloc raises Arena_Failed:

⟨raise Arena_Failed 96⟩≡
    {
        if (file == NULL)
            RAISE(Arena_Failed);
        else
            Except_raise(&Arena_Failed, file, line);
    }

Once arena points to the new chunk, the while loop in Arena_alloc tries the allocation again. It still might fail: If the new chunk came from freechunks, it might be too small to fill the request, which is why there’s a while loop instead of an if statement.

Arena_calloc simply calls Arena_alloc:

⟨functions 93⟩+≡
void *Arena_calloc(T arena, long count, long nbytes,
    const char *file, int line) {
    void *ptr;

    assert(count > 0);
    ptr = Arena_alloc(arena, count*nbytes, file, line);
    memset(ptr, '\0', count*nbytes);
    return ptr;
}

An arena is deallocated by adding its chunks to the list of free chunks, which also restores *arena to its initial state as the list is traversed.

⟨functions 93⟩+≡
void Arena_free(T arena) {
    assert(arena);
    while (arena->prev) {
        struct T tmp = *arena->prev;
        ⟨free the chunk described by arena 98⟩
        *arena = tmp;
    }
    assert(arena->limit == NULL);
    assert(arena->avail == NULL);
}

The structure assignment to tmp copies to tmp all of the fields of the arena structure pointed to by arena->prev. This assignment and the assignment *arena = tmp thus “pop” the stack of arena structures
formed by the list of chunks. Once the entire list is traversed, all of the fields of arena should be null.

freechunks accumulates free chunks from all arenas and thus could get large. The length of the list isn’t a problem, but the free storage it holds might be. Chunks on freechunks look like allocated memory to other allocators and thus might make calls to malloc fail, for example. To avoid tying up too much storage, Arena_free keeps no more than

⟨macros 98⟩≡
#define THRESHOLD 10

free chunks on freechunks. Once nfree reaches THRESHOLD, subsequent chunks are deallocated by calling free:

⟨free the chunk described by arena 98⟩≡
    if (nfree < THRESHOLD) {
        arena->prev->prev = freechunks;
        freechunks = arena->prev;
        nfree++;
        freechunks->limit = arena->limit;
    } else
        free(arena->prev);

In Figure 6.2, the chunk on the left is to be deallocated. When nfree is less than THRESHOLD, the chunk is added to freechunks. The deallocated chunk is shown on the right, and the dotted lines depict the pointers planted by the three assignments in the code above.

Further Reading

Arena-based allocators are also known as pool allocators, and have been described several times. Arena’s allocator (Hanson 1990) was originally developed for use in lcc (Fraser and Hanson 1995). lcc’s allocator is slightly simpler than Arena’s: Its arenas are allocated statically, and its deallocator doesn’t call free. In its initial versions, allocation was done by macros that manipulated arena structures directly and called a function only when a new chunk was needed.

Barrett and Zorn (1993) describe how to choose the appropriate arena automatically. Their experiments suggest that the execution path to an allocation site is a good predictor of the lifetime of the block allocated at
that site. This information includes the call chain and the address of the allocation site, and it is used to choose one of several application-specific arenas.

Figure 6.2 Deallocating a chunk when nfree < THRESHOLD

Vmalloc (Vo 1996) is a more general allocator that can be used to implement both the Mem and Arena interfaces. Vmalloc permits clients to organize memory into regions and to provide functions that manage the memory in each region. The Vmalloc library includes an implementation of the malloc interface that provides memory checks similar to those done by Mem’s checking implementation, and these checks can be controlled by setting environment variables.

Arena-based allocation collapses many explicit deallocations into one. Garbage collectors go one step further: They avoid all explicit deallocations. In languages with garbage collectors, programmers can almost ignore storage allocation, and storage allocation bugs can’t occur. The advantages of this property are hard to overstate.

With a garbage collector, space is reclaimed automatically as necessary, usually when an allocation request can’t be filled. A garbage collector finds all blocks that are referenced by program variables, and all blocks that are referenced by fields in these blocks, and so on. These are the accessible blocks; the rest are inaccessible and can be reused. There is a large body of literature on garbage collection: Appel (1991) is a brief survey that emphasizes recent algorithms, and Knuth (1973a) and Cohen (1981) cover the older algorithms in more depth.

To find accessible blocks, most collectors must know which variables point to blocks and which fields in blocks point to other blocks. Collectors are usually used in languages that have enough compile-time or runtime data to supply the necessary information. Examples include LISP, Icon, SmallTalk, ML, and Modula-3. Conservative collectors (Boehm and Weiser 1988) can deal with languages that don’t provide enough type information, such as C and C++. They assume that any properly aligned bit pattern that looks like a pointer is one and that the block it points to is accessible. A conservative collector thus identifies some inaccessible blocks as accessible and therefore busy, but has no choice but to overestimate the set of accessible blocks. Despite this apparent handicap, conservative collectors work surprisingly well in some programs (Zorn 1993).

Exercises

6.1 Arena_alloc looks only in the chunk described by arena. If there’s not enough free space in that chunk, it allocates a new chunk even if there is enough space in some other chunk further down the list. Change Arena_alloc so that it allocates space in an existing chunk if there’s one that has enough space, and measure the resulting benefits. Can you find an application whose memory use is reduced significantly by this change?

6.2 When Arena_alloc needs a new chunk, it takes the first one on the free list, if there is one. A better choice would be to find the largest free chunk that satisfies the request, allocating a new one only if freechunks doesn’t hold a suitable chunk. Keeping track of the largest chunk in freechunks would avoid fruitless traversals in this scheme. With this change, the while loop in Arena_alloc could be replaced with an if statement. Implement this scheme and measure its benefits. Does it make Arena_alloc noticeably slower? Does it use memory more efficiently?

6.3 Setting THRESHOLD to 10 means that the free list will never hold more than about 100K bytes of memory, since Arena_alloc allocates chunks of at least 10K bytes. Devise a way for Arena_alloc and Arena_free to monitor allocation and deallocation patterns and to compute THRESHOLD dynamically based on these patterns. The goal is to keep the free list as small as possible and to minimize the number of calls to malloc.

6.4 Explain why the Arena interface doesn’t support the function

    void *Arena_resize(void **ptr, long nbytes,
        const char *file, int line)

which, like Mem_resize, would change the size of the block pointed to by ptr to nbytes and return a pointer to the resized block, which would reside in the same arena (but not necessarily the same chunk) as the block given by ptr. How would you change the implementation to support this function? What checked runtime errors would you support?

6.5 In a stack allocator, an allocation pushes the new space onto the top of a specified stack and returns the address of its first byte. Marking a stack returns a value that encodes the current height of that stack, and deallocation pops a stack back to a previously marked height. Design and implement an interface for a stack allocator. What checked runtime errors can you provide that will catch deallocation errors? Examples of such errors are deallocating at a point higher than the current top of a stack, or deallocating at a point that has already been deallocated and subsequently reallocated.

6.6 One problem with having more than one memory-allocation interface is that other interfaces must choose between them without knowing the best one for a particular application. Design and implement a single interface that supports both kinds of allocators. This interface might, for example, provide an allocation function that is like Mem_alloc but that operates in an “allocation environment,” which can be changed by other functions. This environment would specify memory-management details, such as which allocator and which arena to use. Other functions might, for example, push the current environment on an internal stack and establish a new environment, and pop the stack to reestablish a previous environment. Investigate these and other variations in your design.
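The conservative assumption described under Further Reading above (any properly aligned bit pattern that looks like a pointer is treated as one) can be made concrete with a small sketch. It is not taken from any real collector. The block table and the names Block, find_block, mark_from_words, and demo_conservative_mark are assumptions made for illustration, and the sketch also assumes that a word pointing anywhere inside a block counts as a reference to it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor for one allocated block. */
typedef struct {
    char  *base;    /* first byte of the block */
    size_t size;    /* size of the block in bytes */
    int    marked;  /* nonzero once the block is judged accessible */
} Block;

/* Return the block whose address range contains word, or NULL. */
Block *find_block(Block *blocks, size_t nblocks, uintptr_t word) {
    size_t i;

    for (i = 0; i < nblocks; i++) {
        uintptr_t lo = (uintptr_t)blocks[i].base;
        if (word >= lo && word - lo < blocks[i].size)
            return &blocks[i];
    }
    return NULL;
}

/* Scan an array of aligned words. Every word that falls inside a
   block is assumed to be a pointer to it, whether or not the program
   really uses it as one. Returns the number of newly marked blocks. */
size_t mark_from_words(const uintptr_t *words, size_t nwords,
                       Block *blocks, size_t nblocks) {
    size_t i, newly = 0;

    for (i = 0; i < nwords; i++) {
        Block *b = find_block(blocks, nblocks, words[i]);
        if (b && !b->marked) {
            b->marked = 1;
            newly++;
        }
    }
    return newly;
}

/* Two blocks and three words: two words refer to the first block,
   one word matches nothing, so exactly one block is marked. */
int demo_conservative_mark(void) {
    static char heap0[64], heap1[64];
    Block blocks[2] = { { heap0, sizeof heap0, 0 },
                        { heap1, sizeof heap1, 0 } };
    uintptr_t words[3];
    size_t newly;

    words[0] = (uintptr_t)(heap0 + 8);
    words[1] = 0;
    words[2] = (uintptr_t)heap0;
    newly = mark_from_words(words, 3, blocks, 2);
    assert(blocks[0].marked && !blocks[1].marked);
    return (int)newly;
}
```

An integer that happens to equal an address inside heap1 would mark that block too, which is the overestimate the text describes.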

7 LISTS

A list is a sequence of zero or more pointers. A list with zero pointers is an empty list. The number of pointers in a list is its length. Almost every nontrivial application uses lists in some form. Lists so pervade programs that some languages provide them as built-in types; LISP, Scheme, and ML are the best known examples.

Lists are easy to implement, so programmers usually reimplement them for each application at hand, and there’s no widely accepted standard interface for lists, although most application-specific interfaces have many similarities. The List abstract data type described below provides many of the facilities found in most of these application-specific interfaces. Sequences, described in Chapter 11, are another way to represent lists.

7.1 Interface

The complete List interface is

⟨list.h⟩≡
    #ifndef LIST_INCLUDED
    #define LIST_INCLUDED

    #define T List_T
    typedef struct T *T;

    struct T {
        T rest;

        void *first;
    };

    extern T      List_append (T list, T tail);
    extern T      List_copy   (T list);
    extern T      List_list   (void *x, ...);
    extern T      List_pop    (T list, void **x);
    extern T      List_push   (T list, void *x);
    extern T      List_reverse(T list);
    extern int    List_length (T list);
    extern void   List_free   (T *list);
    extern void   List_map    (T list,
        void apply(void **x, void *cl), void *cl);
    extern void **List_toArray(T list, void *end);

    #undef T
    #endif

A List_T is a pointer to a struct List_T. Most ADTs hide the representation details of their types. List reveals these details because for this particular ADT, the complications induced by the alternatives outweigh the benefits of doing so.

List_Ts have a trivial representation; it’s hard to imagine many other representations whose implementations would offer advantages significant enough to justify hiding the fact that list elements are structures with two fields. The exercises explore some of the few alternatives.

Revealing List_T’s representation simplifies the interface and its use in several ways. For example, variables of type struct List_T can be defined and initialized statically, which is useful for building lists at compile time, and avoids allocations. Likewise, other structures can have struct List_Ts embedded in them. A null List_T is an empty list, which is its natural representation, and functions aren’t needed to access the first and rest fields.

All routines in this interface accept a null T for any list argument and interpret it as the empty list.

List_list creates and returns a list. It’s called with N nonnull pointers followed by one null pointer, and it creates a list with N nodes whose first fields hold the N nonnull pointers and whose Nth rest field is null. For example, the assignments

    List_T p1, p2;

    p1 = List_list(NULL);
    p2 = List_list("Atom", "Mem", "Arena", "List", NULL);

return the empty list and a list with four nodes holding the pointers to the strings Atom, Mem, Arena, and List. List_list can raise Mem_Failed.

List_list assumes the pointers passed in the variable part of its argument list are void pointers. There’s no prototype to provide the necessary implicit conversions, so programmers must provide casts when passing other than char pointers and void pointers as the second and subsequent arguments. For example, to build a list of four one-element lists that hold the strings Atom, Mem, Arena, and List, the correct call is

    p = List_list(List_list("Atom", NULL),
        (void *)List_list("Mem", NULL),
        (void *)List_list("Arena", NULL),
        (void *)List_list("List", NULL), NULL);

It is an unchecked runtime error to omit the casts shown in this example. Such casts are one of the pitfalls of variable length argument lists.

List_push(T list, void *x) adds a new node that holds x to the beginning of list, and returns the new list. List_push can raise Mem_Failed. List_push is another way to create a new list; for example,

    p2 = List_push(NULL, "List");
    p2 = List_push(p2, "Arena");
    p2 = List_push(p2, "Mem");
    p2 = List_push(p2, "Atom");

creates the same list as the assignment to p2 above.

Given a nonempty list, List_pop(T list, void **x) assigns the first field of the first node to *x, if x is nonnull, removes and deallocates the first node, and returns the resulting list. Given an empty list, List_pop simply returns it and does not change *x.

List_append(T list, T tail) appends one list to another: It assigns tail to the last rest field in list. If list is null, it returns tail. Thus,

    p2 = List_append(p2, List_list("Except", NULL));

sets p2 to the five-element list formed by appending the one-element list holding Except to the four-element list created above.

List_reverse reverses the order of the nodes in its list argument and returns the resulting list. For example,

    p2 = List_reverse(p2);

returns a list that holds Except, List, Arena, Mem, and Atom.

Most of the routines described so far are destructive, or nonapplicative: they may change the lists passed to them and return the resulting lists. List_copy is an applicative function: It makes and returns a copy of its argument. Thus, after executing

    List_T p3 = List_reverse(List_copy(p2));

p3 is the list Atom, Mem, Arena, List, and Except; p2 remains unchanged. List_copy can raise Mem_Failed.

List_length returns the number of nodes in its argument.

List_free takes a pointer to a T. If *list is nonnull, List_free deallocates all of the nodes on *list and sets it to the null pointer. If *list is null, List_free has no effect. It is a checked runtime error to pass a null pointer to List_free.

List_map calls the function pointed to by apply for every node in list. Clients can pass an application-specific pointer, cl, to List_map, and this pointer is passed along to *apply as its second argument. For each node in list, *apply is called with a pointer to its first field and with cl. Since *apply is called with pointers to the first fields, it can change them. Taken together, apply and cl are called a closure or callback: They specify an operation and some context-specific data for that operation. For example, given

    void mkatom(void **x, void *cl) {
        char **str = (char **)x;
        FILE *fp = cl;

        *str = (char *)Atom_string(*str);
        fprintf(fp, "%s\n", *str);
    }

the call List_map(p3, mkatom, stderr) replaces the strings in p3 with equivalent atoms and prints

    Atom
    Mem
    Arena
    List
    Except

on the error output. Another example is

    void applyFree(void **ptr, void *cl) {
        FREE(*ptr);
    }

which can be used to deallocate the space pointed to by the first fields of a list before the list itself is deallocated. For example:

    List_T names;
    …
    List_map(names, applyFree, NULL);
    List_free(&names);

frees the data in the list names and then frees the nodes themselves. It is an unchecked runtime error for apply to change list.

Given a list with N values, List_toArray(T list, void *end) creates an array in which elements zero through N-1 hold the N values from the first fields of the list and the Nth element holds the value of end, which is often the null pointer. List_toArray returns a pointer to the first element of this array. For example, the elements of p3 can be printed in sorted order by

    int i;
    char **array = (char **)List_toArray(p3, NULL);

    qsort((void **)array, List_length(p3), sizeof (*array),
        (int (*)(const void *, const void *))compare);
    for (i = 0; array[i]; i++)
        printf("%s\n", array[i]);
    FREE(array);

As suggested by this example, clients must deallocate the array returned by List_toArray. If the list is empty, List_toArray returns a one-element array. List_toArray can raise Mem_Failed. compare and its use with the standard library function qsort are described on page 123.
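The remark in this section that variables of type struct List_T can be defined and initialized statically can be illustrated with a short sketch. The structure declaration repeats the one in list.h so the fragment compiles alone. The array compile_time_nodes and the helpers node_count and list_contains are hypothetical names introduced only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the declarations in list.h. */
typedef struct List_T *List_T;
struct List_T {
    List_T rest;
    void *first;
};

/* Three nodes built at compile time, with no allocation.
   Each rest field points at the next array element, and the
   last rest field is null. */
struct List_T compile_time_nodes[3] = {
    { &compile_time_nodes[1], "Atom"  },
    { &compile_time_nodes[2], "Mem"   },
    { NULL,                   "Arena" }
};

/* Count the nodes by following the rest fields directly. */
int node_count(List_T list) {
    int n = 0;

    for ( ; list; list = list->rest)
        n++;
    return n;
}

/* Return 1 if some first field points to a string equal to s. */
int list_contains(List_T list, const char *s) {
    for ( ; list; list = list->rest)
        if (strcmp(list->first, s) == 0)
            return 1;
    return 0;
}
```

A list built this way can be handed to the nondestructive parts of the interface, such as List_length, List_map, or List_toArray. Because its nodes were never allocated, it should not be passed to List_pop or List_free, which deallocate nodes.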

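The closure convention used by List_map, in which apply receives a pointer to each first field together with the client pointer cl, can be tried in isolation with the sketch below. It does not use the real List, Mem, or Atom implementations: demo_push, demo_map, and demo_free are simplified stand-ins built on malloc and free, and struct Stats, tally, and demo_total_length are hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the declarations in list.h. */
typedef struct List_T *List_T;
struct List_T {
    List_T rest;
    void *first;
};

/* Simplified stand-in for List_push, using malloc instead of NEW. */
List_T demo_push(List_T list, void *x) {
    List_T p = malloc(sizeof *p);

    assert(p);
    p->first = x;
    p->rest = list;
    return p;
}

/* Simplified stand-in for List_map. */
void demo_map(List_T list, void apply(void **x, void *cl), void *cl) {
    assert(apply);
    for ( ; list; list = list->rest)
        apply(&list->first, cl);
}

/* Simplified stand-in for List_free. */
void demo_free(List_T *list) {
    List_T next;

    assert(list);
    for ( ; *list; *list = next) {
        next = (*list)->rest;
        free(*list);
    }
}

/* Context data that the client passes through cl. */
struct Stats {
    int count;      /* number of nodes visited */
    size_t total;   /* sum of the string lengths */
};

/* The operation half of the closure: update the Stats behind cl. */
void tally(void **x, void *cl) {
    struct Stats *s = cl;

    s->count++;
    s->total += strlen(*x);
}

/* Build the list Atom, Mem, Arena, List and total the lengths. */
size_t demo_total_length(void) {
    struct Stats stats = { 0, 0 };
    List_T list = NULL;

    list = demo_push(list, "List");
    list = demo_push(list, "Arena");
    list = demo_push(list, "Mem");
    list = demo_push(list, "Atom");
    demo_map(list, tally, &stats);
    demo_free(&list);
    assert(stats.count == 4);
    return stats.total;
}
```

Passing the accumulator through cl keeps tally free of global state, so the same function can serve several lists at once.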
7.2 Implementation

⟨list.c⟩≡
    #include <stdarg.h>
    #include <stddef.h>
    #include "assert.h"
    #include "mem.h"
    #include "list.h"

    #define T List_T

    ⟨functions 108⟩

List_push is the simplest of the List functions. It allocates one node, initializes it, and returns a pointer to it:

⟨functions 108⟩≡
    T List_push(T list, void *x) {
        T p;

        NEW(p);
        p->first = x;
        p->rest = list;
        return p;
    }

The other list-creation function, List_list, is more complicated because it must cope with a variable number of arguments and must append a new node to the evolving list for each nonnull pointer argument. To do so, it uses a pointer to the pointer to which the new node should be assigned:

⟨functions 108⟩+≡
    T List_list(void *x, ...) {
        va_list ap;
        T list, *p = &list;

        va_start(ap, x);
        for ( ; x; x = va_arg(ap, void *)) {
            NEW(*p);

            (*p)->first = x;
            p = &(*p)->rest;
        }
        *p = NULL;
        va_end(ap);
        return list;
    }

p starts by pointing to list, so a pointer to the first node is assigned to list. Thereafter, p points to the rest field of the last node on the list, so an assignment to *p appends a node to the list. The following figure shows the effect of the initialization of p and of the statements in the body of the for loop as List_list builds a three-node list.

[Figure: list, and the successive positions of p as the three nodes are appended.]

Each trip through the loop assigns the next pointer argument to x and breaks when it hits the first null-pointer argument, which might be the initial value of x. This idiom ensures that List_list(NULL) returns the empty list, a null pointer.

List_list’s use of pointers to pointers (List_T *s) is typical of many list-manipulation algorithms. It uses one succinct mechanism to deal with two conditions: the initial node in a possibly empty list, and the interior nodes of a nonempty list. List_append illustrates another use of this idiom:

⟨functions 108⟩+≡
    T List_append(T list, T tail) {
        T *p = &list;

        while (*p)
            p = &(*p)->rest;
        *p = tail;
        return list;
    }

List_append walks p down list until it points to the null pointer at the end of the list to which tail should be assigned. If list itself is the null pointer, p ends up pointing to list, which has the desired effect of appending tail to the empty list.

List_copy is the last of the List functions that uses the pointer-to-pointer idiom:

⟨functions 108⟩+≡
    T List_copy(T list) {
        T head, *p = &head;

        for ( ; list; list = list->rest) {
            NEW(*p);
            (*p)->first = list->first;
            p = &(*p)->rest;
        }
        *p = NULL;
        return head;
    }

Pointers to pointers don’t simplify List_pop or List_reverse, so the perhaps more obvious implementations suffice. List_pop removes the first node in a nonempty list and returns the new list, or simply returns an empty list:

⟨functions 108⟩+≡
    T List_pop(T list, void **x) {
        if (list) {
            T head = list->rest;
            if (x)
                *x = list->first;

            FREE(list);
            return head;
        } else
            return list;
    }

If x is nonnull, *x is assigned the contents of the first field of the first node before that node is discarded. Notice that List_pop must save list->rest before deallocating the node pointed to by list.

List_reverse walks two pointers, list and next, down the list once and uses them to reverse the list in place as it goes; head always points to the first node of the reversed list:

⟨functions 108⟩+≡
    T List_reverse(T list) {
        T head = NULL, next;

        for ( ; list; list = next) {
            next = list->rest;
            list->rest = head;
            head = list;
        }
        return head;
    }

The following figure depicts the situation at each loop iteration just after the first statement in the loop body, the assignment to next, is executed for the third element in the list.

[Figure: head, next, and list just after the assignment to next for the third element.]

next points to the successor of list or is null if list points to the last node, and head points to the reversed list, which begins with the predecessor of list or is null if list points to the first node. The second and third statements in the loop body push the node pointed to by list onto the front of head, and the increment expression list = next advances list to its successor, at which point the list is:

[Figure: head, next, and list after list has advanced to its successor.]

next is advanced again the next time through the loop body.

List_length walks down list counting its nodes, and List_free walks down list deallocating each node:

⟨functions 108⟩+≡
    int List_length(T list) {
        int n;

        for (n = 0; list; list = list->rest)
            n++;
        return n;
    }

    void List_free(T *list) {
        T next;

        assert(list);
        for ( ; *list; *list = next) {
            next = (*list)->rest;
            FREE(*list);

        }
    }

List_map sounds complicated, but it’s trivial, because the closure function does all the work. List_map simply walks down list calling the closure function with a pointer to each node’s first field and with the client-specific pointer cl:

⟨functions 108⟩+≡
    void List_map(T list,
        void apply(void **x, void *cl), void *cl) {
        assert(apply);
        for ( ; list; list = list->rest)
            apply(&list->first, cl);
    }

List_toArray allocates an N+1-element array to hold the pointers in an N-element list, and copies the pointers into the array:

⟨functions 108⟩+≡
    void **List_toArray(T list, void *end) {
        int i, n = List_length(list);
        void **array = ALLOC((n + 1)*sizeof (*array));

        for (i = 0; i < n; i++) {
            array[i] = list->first;
            list = list->rest;
        }
        array[i] = end;
        return array;
    }

Allocating a one-element array for an empty list may seem a waste, but doing so means that List_toArray always returns a nonnull pointer to an array, so clients never need to check for null pointers.

Further Reading

Knuth (1973a) describes all of the important algorithms for manipulating singly linked lists, like those provided by List, and for manipulating

doubly linked lists, which are provided by Ring (described in Chapter 12).

Lists are used for everything in list-manipulation languages like LISP and Scheme, and in functional languages like ML (Ullman 1994). Abelson and Sussman (1985) is one of the many textbooks that show how lists can be used to conquer almost any problem; it uses Scheme.

Exercises

7.1 Design a list ADT that hides the representation of lists and does not use null pointers for empty lists. Design the interface first, then do an implementation. One approach is to make List_T an opaque pointer that points to a list head, which holds a pointer to the list itself or to both the first and last node of the list. The list head could also hold the length of the list.

7.2 Rewrite List_list, List_append, and List_copy without using pointers to pointers.

7.3 Rewrite List_reverse using pointers to pointers.

7.4 List_append, which is one of the most frequently used list operations in many applications, must walk down to the end of the list, so it takes O(N) time for N-element lists. Circularly linked lists are another representation for singly linked lists. The free list in the checking implementation of the Mem interface is an example of a circularly linked list. The rest field of the last node in a circularly linked list points to the first node, and the list itself is represented by a pointer to the last node. Thus, both the first and last node can be reached in constant time, and appending to a circularly linked list can be done in constant time. Design an interface for a list ADT that uses circularly linked lists. Experiment with interfaces that both hide and reveal this representation.
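The pointer-to-pointer idiom that Section 7.2 uses in List_list, List_append, and List_copy also suits operations that unlink nodes. The sketch below applies it to a function that is not part of the List interface: remove_all is a hypothetical name, and demo_push is a simplified stand-in for List_push that uses malloc, so the fragment compiles without list.h or mem.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for the declarations in list.h. */
typedef struct List_T *List_T;
struct List_T {
    List_T rest;
    void *first;
};

/* Simplified stand-in for List_push, using malloc instead of NEW. */
List_T demo_push(List_T list, void *x) {
    List_T p = malloc(sizeof *p);

    assert(p);
    p->first = x;
    p->rest = list;
    return p;
}

/* Hypothetical operation: delete every node whose first field is x.
   p points either to list itself or to the rest field of the previous
   node, so one assignment unlinks a node wherever it sits. */
List_T remove_all(List_T list, void *x) {
    List_T *p = &list;

    while (*p) {
        if ((*p)->first == x) {
            List_T dead = *p;
            *p = dead->rest;    /* unlink; p stays where it is */
            free(dead);
        } else
            p = &(*p)->rest;    /* keep this node and advance */
    }
    return list;
}

/* Build the list a, b, a, c, remove both a nodes, and return the
   remaining length, or -1 if the survivors are not b followed by c. */
int demo_remove_all(void) {
    static int a, b, c;
    List_T list = NULL, next;
    int n = 0, ok;

    list = demo_push(list, &c);
    list = demo_push(list, &a);
    list = demo_push(list, &b);
    list = demo_push(list, &a);
    list = remove_all(list, &a);
    ok = list && list->first == &b
        && list->rest && list->rest->first == &c;
    for ( ; list; list = next) {
        next = list->rest;
        free(list);
        n++;
    }
    return ok ? n : -1;
}
```

As in List_append, no special case is needed for the first node or for the empty list, because the initial value of p is the address of list.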

8 TABLES

An associative table is a set of key-value pairs. It’s like an array except that the indices can be values of any type. Many applications use tables. Compilers, for example, maintain symbol tables, which map names to sets of attributes for those names. Some window systems maintain tables that map window titles into some kind of window-related data structures. Document-preparation systems use tables to represent indices: For example, the index might be a table in which the keys are one-character strings (one for each section of the index) and the values are other tables in which the keys are the strings for the index entries themselves and the values are lists of page numbers.

Tables have many uses, and the examples alone could fill a chapter. The Table interface is designed so that it can be used for many of these uses. It maintains key-value pairs, but it never inspects keys themselves; only clients inspect keys via functions passed to routines in Table. Section 8.2 describes a typical Table client, a program that prints the number of occurrences of words in its input. This program, wf, also uses the Atom and Mem interfaces.

8.1 Interface

Table represents an associative table with an opaque pointer type:

⟨table.h⟩≡
    #ifndef TABLE_INCLUDED
    #define TABLE_INCLUDED

    #define T Table_T
    typedef struct T *T;

    ⟨exported functions 116⟩

    #undef T
    #endif

The exported functions allocate and deallocate Table_Ts, add and remove key-value pairs from those tables, and visit the key-value pairs in them. It is a checked runtime error to pass a null Table_T or null key to any function in this interface.

Table_Ts are allocated and deallocated by

⟨exported functions 116⟩≡
    extern T    Table_new (int hint,
        int cmp(const void *x, const void *y),
        unsigned hash(const void *key));
    extern void Table_free(T *table);

Table_new’s first argument, hint, is an estimate of the number of entries that the new table is expected to hold. All tables can hold an arbitrary number of entries regardless of the value of hint, but accurate values of hint may improve performance. It is a checked runtime error for hint to be negative. The functions cmp and hash manipulate client-specific keys. Given two keys, x and y, cmp(x,y) must return an integer less than zero, equal to zero, or greater than zero, if, respectively, x is less than y, x equals y, or x is greater than y. The standard library function strcmp is an example of a comparison function suitable for keys that are strings. hash must return a hash number for key; if cmp(x,y) returns zero, then hash(x) must be equal to hash(y). Each table can have its own hash and cmp functions.

Atoms are often used as keys, so if hash is the null function pointer, the keys in the new table are assumed to be atoms and the implementation of Table provides a suitable hash function. Similarly, if cmp is the null function pointer, keys are assumed to be atoms, and two keys x and y are equal if x = y.

Table_new can raise Mem_Failed.

Table_new’s arguments (a size hint, a hash function, and a comparison function) provide more information than most implementations need. For example, the hash table implementation described in Section

8.3 needs a comparison function that tests only for equality, and implementations that use trees don’t need the hint or the hash function. This complexity is the price of a design that permits multiple implementations, and this feature is one of the reasons designing good interfaces is difficult.

Table_free deallocates *table and sets it to the null pointer. It is a checked runtime error for table or *table to be null. Table_free does not deallocate the keys or values; see Table_map.

The functions

⟨exported functions 116⟩+≡
    extern int   Table_length(T table);
    extern void *Table_put   (T table, const void *key,
        void *value);
    extern void *Table_get   (T table, const void *key);
    extern void *Table_remove(T table, const void *key);

return the number of keys in a table, add a new key-value pair or change the value of an existing pair, fetch the value associated with a key, and remove a key-value pair.

Table_length returns the number of key-value pairs in table.

Table_put adds the key-value pair given by key and value to table. If table already holds key, value overwrites the previous value, and Table_put returns the previous value. Otherwise, key and value are added to table, which grows by one entry, and Table_put returns the null pointer. Table_put can raise Mem_Failed.

Table_get searches table for key and, if it’s found, returns its associated value. If table doesn’t hold key, Table_get returns the null pointer. Notice that returning the null pointer is ambiguous if table holds null pointer values.

Table_remove searches table for key and, if it’s found, removes the key-value pair from table, which thus shrinks by one entry, and returns the removed value. If table doesn’t hold key, Table_remove has no effect on table and returns the null pointer.

The functions

⟨exported functions 116⟩+≡
    extern void   Table_map    (T table,
        void apply(const void *key, void **value, void *cl),
        void *cl);
    extern void **Table_toArray(T table, void *end);

deallocates the values in table and then table itself. and clients must deallocate the array it returns. The last even-numbered element. For each pair in table. it can change them. apply is called with its key. Table_toArray can raise Mem_Failed. 8. Since apply is called with pointers to the values. } deallocates just the values. assuming the keys are atoms. The keys and values alternate. static void vfree(const void *key. void *cl) { FREE(*value). a pointer to its value. The order of the key-value pairs in the array is unspecified.118 TABLES visit the key-value pairs and collect them into an array. Hanson. reproduction and/or distribution are strictly prohibited and violate applicable laws. Table_toArray builds an array with 2N+1 elements and returns a pointer to the first element. Table_map calls the function pointed to by apply for every key-value pair in table in an unspecified order. Table_free(&table).2 Example: Word Frequencies wf lists the number of times each word appears in a list of named files or in the standard input if no files are specified. For example.com. so Table_map(table. Frank Liu Copyright © 1997 by David R. with keys appearing in the even-numbered elements and their associated values in the following odd-numbered elements. Unauthorized use. It is a checked runtime error for apply to change the contents of table by calling Table_put or Table_remove. Table_map can also be used to deallocate keys or values before deallocating the table. Given a table with N key-value pairs. For example: C Interfaces and Implementations: Techniques for Creating Reusable Software. NULL).2 illustrates the use of Table_toArray. Any other use requires prior written consent from the copyright owner. vfree. which is often the null pointer. at index 2N. to Table_map and this pointer is passed along to apply at each call. This download file is made available for personal use only and is subject to the Terms of Service. . All rights reserved. 
The program described in Section 8. apply and cl specify a closure: Clients can pass an application-specific pointer. is assigned end. cl. C Interfaces and Implementations: Techniques for Creating Reusabl Prepared for frliu@microsoft.. void **value. and cl.

Words of this form are recognized by getword. char *buf. it returns zero.c: 1 allocation 7 assert 12 book 1 stdlib 9 void . More generally.EXAMPLE: WORD FREQUENCIES 119 % wf table.c: 3 apply 7 array 13 assert 9 binding 18 book 2 break 10 buckets . it starts with a character for which first returns a nonzero value followed by characters for Licensed by Frank Liu 1740749 C Interfaces and Implementations: Techniques for Creating Reusable Software. int rest(int c)).size-1]. Any other use requires prior written consent from the copyright owner.com. All rights reserved.. A word is a contiguous sequence of characters. Unauthorized use.. This download file is made available for personal use only and is subject to the Terms of Service. C Interfaces and Implementations: Techniques for Creating Reusabl Prepared for frliu@microsoft.h²≡ #include <stdio. and case doesn’t matter. the words in each file are listed in alphabetical order and are preceded by the number of times they appear in the file. which is a generalization of double’s getword described in Section 1.. stores it as a null-terminated string in buf[0.1. When it reaches the end of file without consuming a word.. For wf.c table. a word begins with a character in a first set followed by zero or more characters in a rest set. int first(int c).. getword consumes the next word in the file opened on fp. Hanson. reproduction and/or distribution are strictly prohibited and violate applicable laws. 4 y mem. and returns one.. As this output shows.c mem. . int size. a word is a letter followed by zero more letters or underscores. It’s used enough in this book to be packaged separately in its own interface: ¢getword. Frank Liu Copyright © 1997 by David R. The functions first and rest test a character for membership in first and rest.h> extern int getword(FILE *fp.

c²≡ #include #include #include #include #include <ctype. } C Interfaces and Implementations: Techniques for Creating Reusable Software. reproduction and/or distribution are strictly prohibited and violate applicable laws. break. } ¢store c in buf if it fits 120²≡ { if (i < size . All rights reserved.1) buf[i++] = c.h> "assert. ¢getword. This download file is made available for personal use only and is subject to the Terms of Service. char *buf. the excess characters are discarded.h> <string. size must exceed one.h" "getword. Frank Liu Copyright © 1997 by David R. else buf[size-1] = '\0'. for ( . int rest(int c)) { int i = 0. c != EOF && rest(c). } for ( .h" int getword(FILE *fp.h> <stdio. Any other use requires prior written consent from the copyright owner. If a word is longer than size-2 characters.com. and rest must be nonnull. C Interfaces and Implementations: Techniques for Creating Reusabl Prepared for frliu@microsoft. c = getc(fp)) if (first(c)) { ¢store c in buf if it fits 120² c = getc(fp).. c = getc(fp)) ¢store c in buf if it fits 120² if (i < size) buf[i] = '\0'. int size.120 TABLES which rest returns nonzero values. fp). assert(fp && buf && size > 1 && first && rest). if (c != EOF) ungetc(c. return i > 0. buf. int first(int c). c != EOF. c. c = getc(fp). first. and fp. Hanson. . Unauthorized use.
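The first/rest scanning rule can also be tried without a file. The following is a minimal sketch under stated assumptions: scan_word, is_first, and is_rest are hypothetical names that do not appear in the book, and the function reads from a string rather than a FILE, so it is an illustration of the rule rather than a replacement for getword.

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical helper, not part of the getword interface:
   scans the next word from the string *sp instead of a FILE. */
int scan_word(const char **sp, char *buf, int size,
              int first(int c), int rest(int c)) {
    const char *s;
    int i = 0;

    assert(sp && *sp && buf && size > 1 && first && rest);
    s = *sp;
    /* Skip characters that cannot start a word. */
    while (*s != '\0' && !first((unsigned char)*s))
        s++;
    if (*s != '\0') {
        /* The starting character is stored without consulting rest. */
        buf[i++] = *s++;
        /* Consume rest characters; keep only those that fit. */
        for ( ; *s != '\0' && rest((unsigned char)*s); s++)
            if (i < size - 1)
                buf[i++] = *s;
    }
    buf[i] = '\0';
    *sp = s;    /* the next call resumes here */
    return i > 0;
}

/* Example predicates: a letter followed by letters or underscores. */
int is_first(int c) { return isalpha(c); }
int is_rest(int c)  { return isalpha(c) || c == '_'; }
```

Calling scan_word repeatedly on the same string pointer yields one word per call and returns zero once no word remains, which mirrors how a client loops on getword.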

This version of getword is a bit more complex than the version in double because this one must work when a character is in first but not in rest. When first returns nonzero, that character is stored in buf and only subsequent characters are passed to rest.

wf's main function processes its arguments, which name files. main opens each file and calls wf with the file pointer and file name:

⟨wf functions 121⟩≡
    int main(int argc, char *argv[]) {
        int i;

        for (i = 1; i < argc; i++) {
            FILE *fp = fopen(argv[i], "r");
            if (fp == NULL) {
                fprintf(stderr, "%s: can't open '%s' (%s)\n",
                    argv[0], argv[i], strerror(errno));
                return EXIT_FAILURE;
            } else {
                wf(argv[i], fp);
                fclose(fp);
            }
        }
        if (argc == 1) wf(NULL, stdin);
        return EXIT_SUCCESS;
    }

⟨wf includes 121⟩≡
    #include <stdio.h>
    #include <stdlib.h>
    #include <errno.h>

If there are no arguments, main calls wf with a null file name and the file pointer for the standard input. The null file name tells wf not to print the name of the file.

wf uses a table to store the words and their counts. Each word is folded to lowercase, converted to an atom, and used as a key. Using atoms lets wf use the defaults for the table's hash and comparison functions. Values are pointers, but wf needs to associate an integer count with each key. It thus allocates space for a counter and stores a pointer to this space in the table.
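The idea of keeping an integer behind a pointer-valued slot can be sketched in isolation. This is an illustration only: bump is a hypothetical helper, the slot stands in for the value a table holds for one key, and plain malloc is assumed in place of the Mem interface.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: *slot plays the role of the void * value that a
   table stores for one key. Returns the count after the update. */
int bump(void **slot) {
    int *count;

    assert(slot);
    count = *slot;
    if (count == NULL) {
        /* First occurrence: allocate the counter and store its address. */
        count = malloc(sizeof *count);
        assert(count);
        *count = 1;
        *slot = count;
    } else
        (*count)++;    /* increments the integer, not the pointer */
    return *count;
}
```

The caller owns the counter and must free it when the slot is discarded, just as a table client must free values before freeing the table.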

⟨wf functions 121⟩+≡
    void wf(char *name, FILE *fp) {
        Table_T table = Table_new(0, NULL, NULL);
        char buf[128];

        while (getword(fp, buf, sizeof buf, first, rest)) {
            const char *word;
            int i, *count;
            for (i = 0; buf[i] != '\0'; i++)
                buf[i] = tolower(buf[i]);
            word = Atom_string(buf);
            count = Table_get(table, word);
            if (count)
                (*count)++;
            else {
                NEW(count);
                *count = 1;
                Table_put(table, word, count);
            }
        }
        if (name)
            printf("%s:\n", name);
        { ⟨print the words 123⟩ }
        ⟨deallocate the entries and table 124⟩
    }

⟨wf includes 121⟩+≡
    #include <ctype.h>
    #include "atom.h"
    #include "table.h"
    #include "mem.h"
    #include "getword.h"

⟨wf prototypes 122⟩≡
    void wf(char *, FILE *);

count is a pointer to an integer. If Table_get returns null, the word isn't in table, so wf allocates space for the counter, initializes it to one to account for this first occurrence of the word, and adds it to the table. When Table_get returns a nonnull pointer, the expression (*count)++ increments the integer pointed to by that pointer. This expression is much different than *count++, which would increment count instead of the integer it points to.

Membership in first and rest is tested by functions of the same names that use the predicates defined in the standard header ctype.h:

⟨wf functions 121⟩+≡
    int first(int c) {
        return isalpha(c);
    }

    int rest(int c) {
        return isalpha(c) || c == '_';
    }

⟨wf prototypes 122⟩+≡
    int first(int c);
    int rest (int c);

Once wf has read all of the words, it must sort and print them. qsort, the standard C library sorting function, sorts an array, so wf can sort the array returned by Table_toArray if it tells qsort that key-value pairs in the array should be treated as single elements. It can then print the words and their counts by walking down the array:

⟨print the words 123⟩≡
    int i;
    void **array = Table_toArray(table, NULL);
    qsort(array, Table_length(table), 2*sizeof (*array), compare);
    for (i = 0; array[i]; i += 2)
        printf("%d\t%s\n", *(int *)array[i+1], (char *)array[i]);
    FREE(array);

qsort takes four arguments: the array, the number of elements, the size of each element in bytes, and a function that's called to compare two elements. To treat each of the N key-value pairs as a single element, wf tells qsort that there are N elements and that each takes the space occupied by two pointers.

qsort calls the comparison function with pointers to the elements. Each element is itself two pointers, one to the word and one to the count, so the comparison function is called with two pointers to pointers to characters. For instance, when assert from mem.c is compared with book, the arguments x and y are

[Figure: x points to the pair (assert, 7); y points to the pair (book, 12)]

The comparison function can compare the words by calling strcmp:

⟨wf functions 121⟩+≡
    int compare(const void *x, const void *y) {
        return strcmp(*(char **)x, *(char **)y);
    }

⟨wf includes 121⟩+≡
    #include <string.h>

⟨wf prototypes 122⟩+≡
    int compare(const void *x, const void *y);

The wf function is called for each file-name argument, so, to save space, it should deallocate the table and the counts before it returns. A call to Table_map deallocates the counts, and Table_free deallocates the table itself.

⟨deallocate the entries and table 124⟩≡
    Table_map(table, vfree, NULL);
    Table_free(&table);

⟨wf functions 121⟩+≡
    void vfree(const void *key, void **count, void *cl) {
        FREE(*count);
    }

⟨wf prototypes 122⟩+≡
    void vfree(const void *, void **, void *);

The keys aren't deallocated because they're atoms, and so must not be. Besides, some of them are likely to appear in subsequent files.

Collecting the various wf.c fragments forms the program wf:

⟨wf.c⟩≡
    ⟨wf includes 121⟩
    ⟨wf prototypes 122⟩
    ⟨wf functions 121⟩

8.3 Implementation

⟨table.c⟩≡
    #include <limits.h>
    #include <stddef.h>
    #include "mem.h"
    #include "assert.h"
    #include "table.h"

    #define T Table_T

    ⟨types 125⟩
    ⟨static functions 127⟩
    ⟨functions 126⟩

Hash tables are one of the obvious data structures for representing associative tables (trees are the other one; see Exercise 8.2). Each Table_T is thus a pointer to a structure that holds a hash table of bindings, which carry the key-value pairs:

⟨types 125⟩≡
    struct T {
        ⟨fields 126⟩
        struct binding {
            struct binding *link;
            const void *key;
            void *value;
        } **buckets;
    };

buckets points to an array with the appropriate number of elements. The cmp and hash functions are associated with a particular table, so they are also stored in the structure along with the number of elements in buckets:

⟨fields 126⟩≡
    int size;
    int (*cmp)(const void *x, const void *y);
    unsigned (*hash)(const void *key);

Table_new uses its hint argument to choose a prime for the size of buckets, and it saves either cmp and hash or pointers to static functions for comparing and hashing atoms:

⟨functions 126⟩≡
    T Table_new(int hint,
        int cmp(const void *x, const void *y),
        unsigned hash(const void *key)) {
        T table;
        int i;
        static int primes[] = { 509, 509, 1021, 2053, 4093,
            8191, 16381, 32771, 65521, INT_MAX };

        assert(hint >= 0);
        for (i = 1; primes[i] < hint; i++)
            ;
        table = ALLOC(sizeof (*table) +
            primes[i-1]*sizeof (table->buckets[0]));
        table->cmp = cmp ? cmp : cmpatom;
        table->hash = hash ? hash : hashatom;
        table->size = primes[i-1];
        table->buckets = (struct binding **)(table + 1);
        for (i = 0; i < table->size; i++)
            table->buckets[i] = NULL;
        table->length = 0;
        table->timestamp = 0;
        return table;
    }

The for loop sets i to the index of the first element in primes that is equal to or exceeds hint, and primes[i-1] gives the number of elements in buckets. Notice that the loop starts at index 1. Mem's ALLOC allocates the structure and space for buckets. Table uses a prime for the size of its hash table because it has no control over how hash numbers for keys are computed. The values in primes are the primes nearest 2^n for n from 9 to 16, which yield a wide range of hash-table sizes. Atom uses a simpler algorithm because it also computes the hash numbers.

If cmp or hash is the null function pointer, the functions

⟨static functions 127⟩≡
    static int cmpatom(const void *x, const void *y) {
        return x != y;
    }

    static unsigned hashatom(const void *key) {
        return (unsigned long)key>>2;
    }

are used instead. Since atoms x and y are equal if x = y, cmpatom returns zero when x = y and one otherwise. This particular implementation of Table tests keys for equality only, so cmpatom doesn't need to test the relative order of x and y. An atom is an address and this address itself can be used as a hash number; it is shifted right two bits because it's likely that each atom starts on a word boundary, so the rightmost two bits are probably zero.

Each element in buckets heads a linked list of binding structures that hold a key, its associated value, and a pointer to the next binding structure on the list. Figure 8.1 gives an example. All of the keys in each list have the same hash number.

[Figure 8.1 Table layout: each binding holds link, key, and value fields]

Table_get finds a binding by hashing its key, taking it modulo the number of elements in buckets, and searching the list for a key equal to key. It calls the table's hash and cmp functions.

⟨functions 126⟩+≡
    void *Table_get(T table, const void *key) {
        int i;
        struct binding *p;

        assert(table);
        assert(key);
        ⟨search table for key 128⟩
        return p ? p->value : NULL;
    }

⟨search table for key 128⟩≡
    i = (*table->hash)(key)%table->size;
    for (p = table->buckets[i]; p; p = p->link)
        if ((*table->cmp)(key, p->key) == 0)
            break;

This for loop terminates when it finds the key, and it thus leaves p pointing to the binding of interest. Otherwise, p ends up null.
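The chain search can be exercised on its own with a much smaller table. The sketch below is an assumption-laden illustration, not the Table implementation: it uses one global table with a fixed 509 buckets, compares keys by address only, calls malloc directly, and the names hashaddr, find, sketch_get, and sketch_put are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define NBUCKETS 509    /* a prime bucket count, fixed for this sketch */

struct binding {
    struct binding *link;
    const void *key;
    void *value;
};

static struct binding *buckets[NBUCKETS];   /* all chains start empty */

/* Hash an address; the low two bits are assumed to be mostly zero. */
static unsigned hashaddr(const void *key) {
    return (unsigned)((uintptr_t)key >> 2);
}

/* Walk one chain; returns the binding for key, or NULL if absent. */
static struct binding *find(const void *key) {
    struct binding *p;

    for (p = buckets[hashaddr(key) % NBUCKETS]; p; p = p->link)
        if (p->key == key)
            break;
    return p;
}

void *sketch_get(const void *key) {
    struct binding *p;

    assert(key);
    p = find(key);
    return p ? p->value : NULL;
}

/* Returns the previous value, or NULL if key was not present. */
void *sketch_put(const void *key, void *value) {
    struct binding *p;
    void *prev = NULL;

    assert(key);
    p = find(key);
    if (p == NULL) {
        unsigned i = hashaddr(key) % NBUCKETS;
        p = malloc(sizeof *p);
        assert(p);
        p->key = key;
        p->link = buckets[i];   /* new binding goes on the front */
        buckets[i] = p;
    } else
        prev = p->value;
    p->value = value;
    return prev;
}
```

Because keys are compared by address, two distinct objects with equal contents count as different keys here, which is the same convention the default atom functions rely on.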

Table_put is similar; it searches for a key and, if it finds it, changes the associated value. If Table_put doesn't find the key, it allocates and initializes a new binding, and adds that binding to the front of the appropriate list hanging off of buckets. It could link the new binding in anywhere on the list, but adding to the front of the list is the easiest and most efficient alternative.

⟨functions 126⟩+≡
    void *Table_put(T table, const void *key, void *value) {
        int i;
        struct binding *p;
        void *prev;

        assert(table);
        assert(key);
        ⟨search table for key 128⟩
        if (p == NULL) {
            NEW(p);
            p->key = key;
            p->link = table->buckets[i];
            table->buckets[i] = p;
            table->length++;
            prev = NULL;
        } else
            prev = p->value;
        p->value = value;
        table->timestamp++;
        return prev;
    }

Table_put increments two per-table counters:

⟨fields 126⟩+≡
    int length;
    unsigned timestamp;

length is the number of bindings in the table; it's returned by Table_length:

⟨functions 126⟩+≡
    int Table_length(T table) {
        assert(table);
        return table->length;
    }

A table's timestamp is incremented every time the table is changed by Table_put or Table_remove. timestamp is used to implement the checked runtime error that Table_map must enforce: the table can't be changed while Table_map is visiting its bindings. Table_map saves the value of timestamp upon entry. After each call to apply, it asserts that the table's timestamp is still equal to this saved value.

⟨functions 126⟩+≡
    void Table_map(T table,
        void apply(const void *key, void **value, void *cl),
        void *cl) {
        int i;
        unsigned stamp;
        struct binding *p;

        assert(table);
        assert(apply);
        stamp = table->timestamp;
        for (i = 0; i < table->size; i++)
            for (p = table->buckets[i]; p; p = p->link) {
                apply(p->key, &p->value, cl);
                assert(table->timestamp == stamp);
            }
    }

Table_remove also searches for a key, but does so by using a pointer to a pointer to a binding so that it can remove the binding for the key if it finds it:

⟨functions 126⟩+≡
    void *Table_remove(T table, const void *key) {
        int i;
        struct binding **pp;

        assert(table);
        assert(key);
        table->timestamp++;
        i = (*table->hash)(key)%table->size;
        for (pp = &table->buckets[i]; *pp; pp = &(*pp)->link)
            if ((*table->cmp)(key, (*pp)->key) == 0) {
                struct binding *p = *pp;
                void *value = p->value;
                *pp = p->link;
                FREE(p);
                table->length--;
                return value;
            }
        return NULL;
    }

The for loop is functionally equivalent to the one in ⟨search table for key 128⟩, except that pp points to the pointer to the binding for each key. pp starts by pointing to table->buckets[i] and follows along the list, pointing to the link field of the kth binding in the list when the k+1st binding is examined, as depicted below.

[Figure: pp points to the link field that points to the binding p]

If *pp holds key, the binding can be unlinked from the list by setting *pp to (*pp)->link; p holds the value of *pp. If Table_remove finds the key, it also decrements the table's length.

Table_toArray is similar to List_toArray. It allocates an array to hold the key-value pairs followed by a terminating end pointer, and fills in the array by visiting each binding in table:

⟨functions 126⟩+≡
    void **Table_toArray(T table, void *end) {
        int i, j = 0;
        void **array;
        struct binding *p;

        assert(table);
        array = ALLOC((2*table->length + 1)*sizeof (*array));
        for (i = 0; i < table->size; i++)
            for (p = table->buckets[i]; p; p = p->link) {
                array[j++] = (void *)p->key;
                array[j++] = p->value;
            }
        array[j] = end;
        return array;
    }

p->key must be cast from const void * to void * because the array is not declared const. The order of the key-value pairs in the array is arbitrary.

Table_free must deallocate the binding structures and the Table_T structure itself. The former step is needed only if the table isn't empty:

⟨functions 126⟩+≡
    void Table_free(T *table) {
        assert(table && *table);
        if ((*table)->length > 0) {
            int i;
            struct binding *p, *q;
            for (i = 0; i < (*table)->size; i++)
                for (p = (*table)->buckets[i]; p; p = q) {
                    q = p->link;
                    FREE(p);
                }
        }
        FREE(*table);
    }

Further Reading

Tables are so useful that many programming languages use them as built-in data types. AWK (Aho, Kernighan, and Weinberger 1988) is a recent example, but tables appeared in SNOBOL4 (Griswold 1972), which predates AWK, and in SNOBOL4's successor, Icon (Griswold and Griswold 1990). Tables in SNOBOL4 and Icon can be indexed by and can hold values of any type, but AWK tables (which are called arrays) can be indexed by and hold only strings and numbers. Table's implementation uses some of the same techniques used to implement tables in Icon (Griswold and Griswold 1986).

PostScript (Adobe Systems 1990), a page-description language, also has tables, which it calls dictionaries. PostScript tables can be indexed only by "names," which are PostScript's rendition of atoms, but can hold values of any type.

Tables also appear in object-oriented languages, either as built-in types or in libraries. The foundation libraries in both SmallTalk and Objective-C include dictionaries, which are much like the tables exported by Table. These kinds of objects are often called container objects because they hold collections of other objects.

Table's implementation uses fixed-size hash tables. As long as the load factor, which is the number of table entries divided by the number of elements in the hash table, is reasonably small, keys can be found by looking at only a few entries. Performance suffers, however, when the load factor gets too high. The load factor can be kept within reasonable bounds by expanding the hash table whenever the load factor exceeds, say, five. Exercise 8.5 explores an effective but naive implementation of dynamic hash tables, which expands the hash table and rehashes all the existing entries. Larson (1988) describes, in great detail, a more sophisticated approach in which the hash table is expanded (or contracted) incrementally, one hash chain at a time. Larson's approach eliminates the need for hint, and it can save storage because all tables can start small.

Exercises

8.1 There are many viable alternatives for associative-table ADTs. For example, in earlier versions of Table, Table_get returned pointers to the values instead of returning the values themselves, so clients could change them. In one design, Table_put always added a new binding to the table even if the key was already present, effectively "hiding" a previous binding with the same key, and Table_remove removed only the most recent binding. Table_map, however, visited all bindings in the table. Discuss the pros and cons of these and other alternatives. Design and implement a different table ADT.

8.2 The Table interface is designed so that other data structures can be used to implement tables. The comparison function reveals the relative order of two keys to admit implementations that use trees, for example. Reimplement Table using binary search trees or red-black trees. See Sedgewick (1990) for details about these data structures.

8.3 The order in which Table_map and Table_toArray visit the bindings in a table is unspecified. Suppose the interface were amended so that Table_map visited the bindings in the order they were added to the table and Table_toArray returned an array with the bindings in the same order. Implement this amendment. What are the practical advantages of this behavior?

8.4 Suppose the interface stipulated that Table_map and Table_toArray visited the bindings in sorted order. This stipulation would complicate the implementation of Table, but would simplify clients like wf that sort the table's bindings anyway. Discuss the merits of this proposal and implement it. Hint: In the current implementation, the average-case running time of Table_put is constant and that of Table_get is nearly so. What are the average-case running times of Table_put and Table_get in your revised implementation?

8.5 Once buckets is allocated, it's never expanded or contracted. Revise the Table implementation so that it uses a heuristic to adjust the size of buckets periodically as pairs are added and removed. Devise a test program that tests the effectiveness of your heuristics, and measure its benefit.

8.6 Implement the linear dynamic-hashing algorithm described in Larson (1988), and compare its performance with your solution to the previous exercise.

8.7 Revise wf.c to measure how much space is lost because atoms are never deallocated.

8.8 Change wf.c's compare function so that it sorts the array in decreasing order of count values.

8.9 Change wf.c so that it prints the output for each file argument in alphabetical order of file names. With this change, the counts for mem.c would appear before those for table.c in the example shown at the beginning of Section 8.2.


9 SETS

A set is an unordered collection of distinct members. The basic operations on a set are testing for membership, adding members, and removing members. Other operations include set union, intersection, difference, and symmetric difference. Given two sets s and t, the union s + t is a set that contains everything in s and everything in t; the intersection s ∗ t is the set whose members appear in both s and t; the difference s − t is the set whose members appear in s but not in t; and the symmetric difference, often written as s / t, is the set whose members appear in only one of s or t.

Sets are usually described in terms of a universe, the set of all possible members. For example, sets of characters are usually associated with the universe consisting of the 256 eight-bit character codes. When a universe U is specified, it's possible to form the complement of a set s, which is U − s.

The sets provided by the Set interface do not rely on universes. The interface exports functions that manipulate set members, but never inspect them directly. Like the Table interface, the Set interface is designed so that clients provide functions to inspect the properties of the members in specific sets.

Applications use sets much the way they use tables. Indeed, the sets provided by Set are like tables: set members are the keys and the values associated with the keys are ignored.
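The set operations named above are easy to check on a small universe. The following sketch is an illustration only and is unrelated to how Set is implemented: it assumes a universe of the integers 0 through 31, stores one bit per member, and uses hypothetical names (bitset, bs_add, and so on).

```c
#include <assert.h>

/* Illustration only: a set over the universe {0, ..., 31}, one bit per
   member. The Set interface itself does not rely on universes. */
typedef unsigned long bitset;      /* unsigned long has at least 32 bits */

#define UNIVERSE 0xFFFFFFFFUL      /* every member present */

bitset bs_add(bitset s, int m) {
    assert(m >= 0 && m < 32);
    return s | (1UL << m);
}

int bs_member(bitset s, int m) {
    assert(m >= 0 && m < 32);
    return (int)((s >> m) & 1UL);
}

bitset bs_union(bitset s, bitset t) { return s | t; }    /* s + t */
bitset bs_inter(bitset s, bitset t) { return s & t; }    /* s * t */
bitset bs_minus(bitset s, bitset t) { return s & ~t; }   /* s - t */
bitset bs_diff(bitset s, bitset t)  { return s ^ t; }    /* s / t */
bitset bs_complement(bitset s)      { return UNIVERSE & ~s; }  /* U - s */
```

With s = {1, 2, 3} and t = {3, 4}, the union is {1, 2, 3, 4}, the intersection is {3}, the difference s − t is {1, 2}, and the symmetric difference is {1, 2, 4}.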

9.1 Interface

⟨set.h⟩≡
    #ifndef SET_INCLUDED
    #define SET_INCLUDED

    #define T Set_T
    typedef struct T *T;

    ⟨exported functions 138⟩

    #undef T
    #endif

The functions exported by Set fall into four groups: allocation and deallocation, basic set operations, set traversal, and operations that accept set operands and return new sets, such as set union. The functions in the first three groups are similar to those in the Table interface.

Set_Ts are allocated and deallocated by

⟨exported functions 138⟩≡
    extern T    Set_new (int hint,
        int cmp(const void *x, const void *y),
        unsigned hash(const void *x));
    extern void Set_free(T *set);

Set_new allocates, initializes, and returns a new T. hint is an estimate of the number of members the set is expected to contain; accurate values of hint may improve performance, but any nonnegative value is acceptable. cmp and hash are used to compare two members and to map members onto unsigned integers. Given two members x and y, cmp(x,y) must return an integer less than zero, equal to zero, or greater than zero, if, respectively, x is less than y, x equals y, or x is greater than y. If cmp(x,y) is zero, then only one of x or y will appear in a set, and hash(x) must be equal to hash(y). Set_new can raise Mem_Failed.

If cmp is the null function pointer, the members are assumed to be atoms; two members x and y are assumed identical if x = y. Likewise, if hash is the null function pointer, Set_new provides a hash function suitable for atoms.
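The requirement that hash(x) equal hash(y) whenever cmp(x,y) is zero is easy to violate when cmp ignores some property of the members. As a hedged illustration, suppose a client wants a set of C strings in which letter case does not matter. The functions cmp_nocase and hash_nocase below are hypothetical client code, not part of Set; the point is that the hash function must ignore case in the same way the comparison does.

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical client comparison: orders strings without regard to case.
   Returns <0, 0, or >0 as the interface requires. */
int cmp_nocase(const void *x, const void *y) {
    const unsigned char *a = x, *b = y;
    while (*a != '\0' && tolower(*a) == tolower(*b)) {
        a++;
        b++;
    }
    return tolower(*a) - tolower(*b);
}

/* Hypothetical client hash: folds case before mixing each character, so
   strings that compare equal under cmp_nocase get equal hash numbers. */
unsigned hash_nocase(const void *x) {
    const unsigned char *s = x;
    unsigned h = 0;
    for ( ; *s != '\0'; s++)
        h = h*31 + (unsigned)tolower(*s);
    return h;
}
```

A hash function that mixed the raw characters instead would send "File" and "FILE" to different chains, and a set built with it could end up holding both.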

Set_free deallocates *set and assigns it the null pointer. Set_free does not deallocate the members; Set_map can be used for that. It is a checked runtime error to pass a null set or *set to Set_free.

The basic operations are provided by the functions

⟨exported functions 138⟩+≡
    extern int   Set_length(T set);
    extern int   Set_member(T set, const void *member);
    extern void  Set_put   (T set, const void *member);
    extern void *Set_remove(T set, const void *member);

Set_length returns set's cardinality, or the number of members it contains. Set_member returns one if member is in set and zero if it is not. Set_put adds member to set, unless it is already there; Set_put can raise Mem_Failed. Set_remove removes member from set if set contains member, and returns the member removed (which might be a different pointer than member). Otherwise, Set_remove does nothing and returns null. It is a checked runtime error to pass a null set or member to any of these routines.

The following functions visit all the members in a set.

⟨exported functions 138⟩+≡
    extern void   Set_map    (T set,
        void apply(const void *member, void *cl), void *cl);
    extern void **Set_toArray(T set, void *end);

Set_map calls apply for each member of set. It passes the member and the client-specific pointer cl to apply. It does not otherwise inspect cl. Notice that unlike in Table_map, apply cannot change the members stored in set. It is a checked runtime error to pass a null apply or set to Set_map, and for apply to change set by calling Set_put or Set_remove.

Set_toArray returns a pointer to an N+1-element array that holds the N elements of set in an arbitrary order. The value of end, which is often the null pointer, is assigned to the N+1st element of the array. Set_toArray can raise Mem_Failed. Clients must arrange to deallocate the returned array. It is a checked runtime error to pass a null set to Set_toArray.

The functions

⟨exported functions 138⟩+≡
    extern T Set_union(T s, T t);
    extern T Set_inter(T s, T t);
    extern T Set_minus(T s, T t);
    extern T Set_diff (T s, T t);

perform the four set operations described at the beginning of this chapter. Set_union returns s + t, Set_inter returns s ∗ t, Set_minus returns s − t, and Set_diff returns s / t. All four create and return new Ts and can raise Mem_Failed. These functions interpret a null s or t as the empty set, but they always return a new, nonnull T. Thus, Set_union(s, NULL) returns a copy of s. For each of these functions, it is a checked runtime error for both s and t to be null and, when both s and t are nonnull, for them to have different comparison and hash functions. That is, s and t must have been created by calls to Set_new that specified the same comparison and hash functions.

9.2 Example: Cross-Reference Listings

xref prints cross-reference lists of the identifiers in its input files, which helps, for example, to find all of the uses of specific identifiers in a program's source files. For example,

    % xref xref.c getword.c
    ...
    FILE    getword.c: 6
            xref.c: 18 43 72
    ...
    c       getword.c: 7 8 9 10 11 16 19 22 27 34 35
            xref.c: 141 142 144 147 148
    ...

says that FILE is used on line 6 in getword.c and on lines 18, 43, and 72 in xref.c. Similarly, c appears on 11 different lines in getword.c and on 5 lines in xref.c. A line number is listed only once, even if the identifier appears more than once on that line. The output lists the files and line numbers in sorted order.

If there are no program arguments, xref emits a cross-reference list of the identifiers in the standard input, omitting the file names shown in the sample output above:

    % cat xref.c getword.c | xref
    ...
    FILE 18 43 72 157
    ...
    c 141 142 144 147 148 158 159 160 161 162 167 170 173 178 185 186
    ...

xref's implementation shows how sets and tables can be used together. It builds a table indexed by identifiers in which each associated value is another table indexed by file name. The values in this table are sets of pointers to integers, which hold the line numbers. Figure 9.1 depicts this structure and shows the details for the identifier FILE as described after the first display above. The value associated with FILE in the single top-level table (which is the value of identifiers in the code below) is a second-level Table_T with two keys: atoms for getword.c and xref.c. The values associated with these keys are Set_Ts that hold pointers to the line numbers on which FILE appears. There is a second-level table for each identifier in the top-level table, and a set for each key-value pair in each second-level table.

[Figure 9.1 Cross-reference list data structures. The table identifiers is indexed by identifiers (a Table_T); the entry for FILE leads to a table indexed by file names (a Table_T) with keys getword.c and xref.c; these lead to sets of pointers to ints (Set_Ts) holding 6 and 18, 43, 72, respectively.]

⟨xref.c⟩≡
    ⟨xref includes 142⟩
    ⟨xref prototypes 143⟩
    ⟨xref data 146⟩
    ⟨xref functions 142⟩

xref's main function is much like wf's: It creates the table of identifiers, then processes its file-name arguments. It opens each file and calls the function xref with the file name, the file pointer, and the identifier table. If there are no arguments, it calls xref with a null file name, the file pointer for the standard input, and the identifier table:

⟨xref functions 142⟩≡
    int main(int argc, char *argv[]) {
        int i;
        Table_T identifiers = Table_new(0, NULL, NULL);

        for (i = 1; i < argc; i++) {
            FILE *fp = fopen(argv[i], "r");
            if (fp == NULL) {
                fprintf(stderr, "%s: can't open '%s' (%s)\n",
                    argv[0], argv[i], strerror(errno));
                return EXIT_FAILURE;
            } else {
                xref(argv[i], fp, identifiers);
                fclose(fp);
            }
        }
        if (argc == 1) xref(NULL, stdin, identifiers);
        ⟨print the identifiers 143⟩
        return EXIT_SUCCESS;
    }

⟨xref includes 142⟩≡
    #include <stdio.h>
    #include <stdlib.h>
    #include <errno.h>
    #include "table.h"

xref builds a complicated data structure, and it's easier to understand how it is built if you first examine how its contents are printed, which you can do by navigating the components in the data structure. Writing separate chunks or functions for each component helps you to understand the details of this voyage.

The first step builds an array of the identifiers and their values, sorts the array on the identifiers, and then walks down the array calling another function, print, to deal with the values. This step is much like wf's chunk ⟨print the words 123⟩.

⟨print the identifiers 143⟩≡
    {
        int i;
        void **array = Table_toArray(identifiers, NULL);
        qsort(array, Table_length(identifiers),
            2*sizeof (*array), compare);
        for (i = 0; array[i]; i += 2) {
            printf("%s", (char *)array[i]);
            print(array[i+1]);
        }
        FREE(array);
    }

The keys in identifiers are atoms, so compare, the comparison function passed to the standard library function qsort, is identical to the compare used in wf and uses strcmp to compare pairs of identifiers (page 123 explains qsort's arguments):

⟨xref functions 142⟩+≡
    int compare(const void *x, const void *y) {
        return strcmp(*(char **)x, *(char **)y);
    }

⟨xref includes 142⟩+≡
    #include <string.h>

⟨xref prototypes 143⟩≡
    int compare(const void *x, const void *y);

Each value in identifiers is another table, which is passed to print. The keys in this table are atoms for the file names, so they can be captured in an array, sorted, and traversed by code similar to that used above.
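The call to qsort above treats each key-value pair as a single element by passing 2*sizeof (*array) as the element size, so values travel with their keys during the sort. The sketch below tries that idea on an ordinary array without the Table interface; compare_keys and sort_pairs are hypothetical names introduced only for this illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort passes pointers to elements. Each "element" here is a pair of
   adjacent pointers, and the key is the first pointer of the pair. */
int compare_keys(const void *x, const void *y) {
    return strcmp(*(char *const *)x, *(char *const *)y);
}

/* Sorts npairs pairs laid out as key0, value0, key1, value1, ...
   Any terminator after the last pair is left where it is. */
void sort_pairs(void **array, size_t npairs) {
    qsort(array, npairs, 2*sizeof (*array), compare_keys);
}
```

Because the count passed to qsort is the number of pairs, not the number of pointers, a trailing null terminator such as the one Table_toArray appends is never moved.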

⟨xref functions 142⟩+≡
    void print(Table_T files) {
        int i;
        void **array = Table_toArray(files, NULL);

        qsort(array, Table_length(files), 2*sizeof (*array),
            compare);
        for (i = 0; array[i]; i += 2) {
            if (*(char *)array[i] != '\0')
                printf("\t%s:", (char *)array[i]);
            ⟨print the line numbers in the set array[i+1] 144⟩
            printf("\n");
        }
        FREE(array);
    }

⟨xref prototypes 143⟩+≡
    void print(Table_T);

print can use compare because the keys are just strings. If there are no file name arguments, each of the tables passed to print has only one entry, and the key is a zero-length atom. print uses this convention to avoid printing the file name before emitting the list of line numbers.

Each value in the tables passed to print is a set of line numbers. Because Set implements sets of pointers, xref represents line numbers by pointers to integers and adds these pointers to the sets. To print them, it calls Set_toArray to build and return a null-terminated array of pointers to integers; it then sorts the array and prints the integers:

⟨print the line numbers in the set array[i+1] 144⟩≡
    {
        int j;
        void **lines = Set_toArray(array[i+1], NULL);
        qsort(lines, Set_length(array[i+1]), sizeof (*lines),
            cmpint);
        for (j = 0; lines[j]; j++)
            printf(" %d", *(int *)lines[j]);
        FREE(lines);
    }
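Sorting an array of pointers by the values they point to needs a comparison function that dereferences twice, because qsort hands it pointers to the array elements and the elements are themselves pointers. A minimal sketch of that step, using a hand-built null-terminated array in place of the one Set_toArray would return, follows. The names cmp_int_ptrs and sort_lines are hypothetical and exist only in this example.

```c
#include <assert.h>
#include <stdlib.h>

/* x and y point to array elements; each element is a pointer to an int.
   Comparing with < and > rather than subtracting avoids overflow when
   the two values are far apart. */
int cmp_int_ptrs(const void *x, const void *y) {
    int a = **(int *const *)x, b = **(int *const *)y;
    return (a > b) - (a < b);
}

/* Sorts a null-terminated array of pointers to ints by the ints' values
   and returns the number of pointers that precede the terminator. */
size_t sort_lines(void **lines) {
    size_t n = 0;
    while (lines[n] != NULL)
        n++;
    qsort(lines, n, sizeof (*lines), cmp_int_ptrs);
    return n;
}
```

Only the pointers are rearranged; the integers they point to stay where they are.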

cmpint is like compare, but it takes two pointers to pointers to integers and compares the integers:

⟨xref functions 142⟩+≡
    int cmpint(const void *x, const void *y) {
        if (**(int **)x < **(int **)y)
            return -1;
        else if (**(int **)x > **(int **)y)
            return +1;
        else
            return 0;
    }

⟨xref prototypes 143⟩+≡
    int cmpint(const void *x, const void *y);

xref builds the data structure printed by the code just discussed. It uses getword to read the identifiers in its input. For each identifier, it walks down the data structure to the appropriate set and adds the current line number to the set:

⟨xref functions 142⟩+≡
    void xref(const char *name, FILE *fp,
        Table_T identifiers){
        char buf[128];

        if (name == NULL)
            name = "";
        name = Atom_string(name);
        linenum = 1;
        while (getword(fp, buf, sizeof buf, first, rest)) {
            Set_T set;
            Table_T files;
            const char *id = Atom_string(buf);
            ⟨files ← file table in identifiers associated with id 147⟩
            ⟨set ← set in files associated with name 147⟩
            ⟨add linenum to set, if necessary 148⟩
        }
    }

⟨xref includes 142⟩+≡
    #include "atom.h"
    #include "set.h"
    #include "mem.h"
    #include "getword.h"

⟨xref prototypes 143⟩+≡
    void xref(const char *, FILE *, Table_T);

linenum is a global variable that is incremented whenever first trips over a new-line character; first is the function passed to getword to identify the initial character in an identifier:

⟨xref data 146⟩≡
    int linenum;

⟨xref functions 142⟩+≡
    int first(int c) {
        if (c == '\n')
            linenum++;
        return isalpha(c) || c == '_';
    }

    int rest(int c) {
        return isalpha(c) || c == '_' || isdigit(c);
    }

⟨xref includes 142⟩+≡
    #include <ctype.h>

⟨xref prototypes 143⟩+≡
    int first(int c);
    int rest (int c);

getword and the first and rest functions passed to it are described starting on page 119.

The code that navigates through the tables to the appropriate set must cope with missing components. For example, an identifier won't have an entry in identifiers when it is encountered for the first time, so the code creates the file table and adds the identifier–file table pair to identifiers on the fly:

⟨files ← file table in identifiers associated with id 147⟩≡
    files = Table_get(identifiers, id);
    if (files == NULL) {
        files = Table_new(0, NULL, NULL);
        Table_put(identifiers, id, files);
    }

Likewise, there's no set of line numbers on the first occurrence of an identifier in a new file, so a new set is created and added to the files table when it is first needed:

⟨set ← set in files associated with name 147⟩≡
    set = Table_get(files, name);
    if (set == NULL) {
        set = Set_new(0, intcmp, inthash);
        Table_put(files, name, set);
    }

The sets are sets of pointers to integers; intcmp and inthash compare and hash the integers. intcmp is like cmpint, above, but its arguments are the pointers in the set, so it can call cmpint. The integer itself can be used as its own hash number:

⟨xref functions 142⟩+≡
    int intcmp(const void *x, const void *y) {
        return cmpint(&x, &y);
    }

    unsigned inthash(const void *x) {
        return *(int *)x;
    }

⟨xref prototypes 143⟩+≡
    int      intcmp (const void *x, const void *y);
    unsigned inthash(const void *x);
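Because the comparison and hash functions look through the pointers, membership is decided by the integers and not by the addresses: two different int objects that hold the same value count as the same member. The sketch below restates this with stand-in functions so that it compiles on its own. cmp_pointee, hash_pointee, and same_member are hypothetical names, and cmp_pointee compares the integers directly rather than going through cmpint.

```c
#include <assert.h>

/* Compares the ints that two members point to. */
int cmp_pointee(const void *x, const void *y) {
    int a = *(const int *)x, b = *(const int *)y;
    return (a > b) - (a < b);
}

/* The pointed-to int serves as its own hash number. */
unsigned hash_pointee(const void *x) {
    return (unsigned)*(const int *)x;
}

/* Hypothetical helper: nonzero if a set that uses the two functions
   above would treat *x and *y as one member. Equal members must also
   have equal hash numbers, so both conditions are checked. */
int same_member(const int *x, const int *y) {
    return cmp_pointee(x, y) == 0 && hash_pointee(x) == hash_pointee(y);
}
```

One consequence is that a client can test for membership with a pointer to any int holding the value of interest; it need not be the pointer that was originally added.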

By the time control reaches ⟨add linenum to set, if necessary 148⟩, set is the set into which the current line number should be inserted. This could be done with the code:

    int *p;
    NEW(p);
    *p = linenum;
    Set_put(set, p);

But if set already holds linenum, this code creates a memory leak, because the pointer to the newly allocated space won't be added to the table. This leak can be avoided by allocating the space only when linenum isn't in set:

⟨add linenum to set, if necessary 148⟩≡
    {
        int *p = &linenum;
        if (!Set_member(set, p)) {
            NEW(p);
            *p = linenum;
            Set_put(set, p);
        }
    }

9.3 Implementation

The implementation of Set is much like the implementation of Table. It represents sets with hash tables and uses the comparison and hash functions to locate members in these tables. The exercises explore some of the viable alternatives to this implementation and to the Table implementation.

⟨set.c⟩≡
    #include <limits.h>
    #include <stddef.h>
    #include "mem.h"
    #include "assert.h"
    #include "arith.h"
    #include "set.h"

    #define T Set_T

    ⟨types 149⟩
    ⟨static functions 150⟩
    ⟨functions 149⟩

A Set_T is a hash table in which the chains hold the members:

⟨types 149⟩≡
    struct T {
        int length;
        unsigned timestamp;
        int (*cmp)(const void *x, const void *y);
        unsigned (*hash)(const void *x);
        int size;
        struct member {
            struct member *link;
            const void *member;
        } **buckets;
    };

length is the number of members in the set; timestamp is used to implement the checked runtime error in Set_map that forbids apply from changing the set, and cmp and hash hold the comparison and hash functions.

Like Table_new, Set_new computes the appropriate number of elements for the buckets array, stores that number in the size field, and allocates the space for a struct T and the buckets array:

⟨functions 149⟩≡
    T Set_new(int hint,
        int cmp(const void *x, const void *y),
        unsigned hash(const void *x)) {
        T set;
        int i;
        static int primes[] = { 509, 509, 1021, 2053, 4093,
            8191, 16381, 32771, 65521, INT_MAX };

        assert(hint >= 0);
        for (i = 1; primes[i] < hint; i++)
            ;
        set = ALLOC(sizeof (*set) +
            primes[i-1]*sizeof (set->buckets[0]));
        set->size = primes[i-1];
        set->cmp  = cmp  ?  cmp : cmpatom;
        set->hash = hash ? hash : hashatom;
        set->buckets = (struct member **)(set + 1);
        for (i = 0; i < set->size; i++)
            set->buckets[i] = NULL;
        set->length = 0;
        set->timestamp = 0;
        return set;
    }

Set_new uses hint to choose one of the values in primes for the number of elements in buckets (see page 127). If the members are atoms, which is indicated by null function pointers for either cmp or hash, Set_new uses the following comparison and hash functions, which are the same ones used by Table_new.

⟨static functions 150⟩≡
    static int cmpatom(const void *x, const void *y) {
        return x != y;
    }

    static unsigned hashatom(const void *x) {
        return (unsigned long)x>>2;
    }

9.3.1 Member Operations

Testing for membership is like looking up a key in a table: hash the potential member and search the appropriate list emanating from buckets:

⟨functions 149⟩+≡
    int Set_member(T set, const void *member) {
        int i;
        struct member *p;

        assert(set);
        assert(member);
        ⟨search set for member 151⟩
        return p != NULL;
    }

⟨search set for member 151⟩≡
    i = (*set->hash)(member)%set->size;
    for (p = set->buckets[i]; p; p = p->link)
        if ((*set->cmp)(member, p->member) == 0)
            break;

p is nonnull if the search succeeds and null otherwise, so testing p determines Set_member's outcome.

Adding a new member is similar: search the set for the member, and add it if the search fails.

⟨functions 149⟩+≡
    void Set_put(T set, const void *member) {
        int i;
        struct member *p;

        assert(set);
        assert(member);
        ⟨search set for member 151⟩
        if (p == NULL) {
            ⟨add member to set 151⟩
        } else
            p->member = member;
        set->timestamp++;
    }

⟨add member to set 151⟩≡
    NEW(p);
    p->member = member;
    p->link = set->buckets[i];
    set->buckets[i] = p;
    set->length++;

timestamp is used in Set_map to enforce its checked runtime error.

Set_remove deletes a member by walking a pointer to a pointer to a member structure, pp, down the appropriate hash chain until *pp is null or (*pp)->member is the member of interest, in which case the assignment *pp = (*pp)->link below removes the structure from the chain.
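The chain manipulations used by the member operations (searching a chain, pushing a new structure onto its front, and unlinking through a pointer to a pointer) can be tried in isolation. The following is a minimal sketch and not the Set implementation: it assumes a fixed number of chains, stores ints instead of client pointers, and calls malloc and free directly instead of using Mem. The names intset, search, intset_member, intset_put, and intset_remove are hypothetical.

```c
#include <stdlib.h>

#define NBUCKETS 7   /* assumed fixed size; Set picks a prime from hint */

struct member {
    struct member *link;
    int value;
};

struct intset {
    int length;
    struct member *buckets[NBUCKETS];
};

/* Returns the structure on value's chain that holds value, or NULL. */
static struct member *search(struct intset *set, int value) {
    struct member *p;
    for (p = set->buckets[(unsigned)value%NBUCKETS]; p; p = p->link)
        if (p->value == value)
            break;
    return p;
}

int intset_member(struct intset *set, int value) {
    return search(set, value) != NULL;
}

/* Pushes value onto the front of its chain unless it is already there. */
void intset_put(struct intset *set, int value) {
    if (search(set, value) == NULL) {
        unsigned i = (unsigned)value%NBUCKETS;
        struct member *p = malloc(sizeof (*p));
        if (p == NULL)
            abort();
        p->value = value;
        p->link = set->buckets[i];
        set->buckets[i] = p;
        set->length++;
    }
}

/* Removes value if present and returns 1; otherwise returns 0. pp points
   at whichever pointer leads to the current structure: the bucket itself
   at first, then a link field. Assigning through pp therefore unlinks the
   first structure and interior structures with the same statement. */
int intset_remove(struct intset *set, int value) {
    struct member **pp;
    for (pp = &set->buckets[(unsigned)value%NBUCKETS]; *pp; pp = &(*pp)->link)
        if ((*pp)->value == value) {
            struct member *p = *pp;
            *pp = p->link;
            free(p);
            set->length--;
            return 1;
        }
    return 0;
}
```

Without the pointer to a pointer, removal would need a trailing "previous" pointer and a separate case for the first structure on a chain.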

⟨functions 149⟩+≡
    void *Set_remove(T set, const void *member) {
        int i;
        struct member **pp;

        assert(set);
        assert(member);
        set->timestamp++;
        i = (*set->hash)(member)%set->size;
        for (pp = &set->buckets[i]; *pp; pp = &(*pp)->link)
            if ((*set->cmp)(member, (*pp)->member) == 0) {
                struct member *p = *pp;
                *pp = p->link;
                member = p->member;
                FREE(p);
                set->length--;
                return (void *)member;
            }
        return NULL;
    }

Walking pp down the hash chain is the same idiom used in Table_remove; see page 130.

Set_remove and Set_put keep track of the number of members in the set by decrementing and incrementing its length field, which Set_length returns:

⟨functions 149⟩+≡
    int Set_length(T set) {
        assert(set);
        return set->length;
    }

If the set is nonempty, Set_free must first walk the hash chains deallocating the member structures before it can deallocate the set itself and clear *set.

⟨functions 149⟩+≡
    void Set_free(T *set) {
        assert(set && *set);
        if ((*set)->length > 0) {
            int i;
            struct member *p, *q;
            for (i = 0; i < (*set)->size; i++)
                for (p = (*set)->buckets[i]; p; p = q) {
                    q = p->link;
                    FREE(p);
                }
        }
        FREE(*set);
    }

Set_map is almost identical to Table_map: It traverses the hash chains calling apply for each member.

⟨functions 149⟩+≡
    void Set_map(T set,
        void apply(const void *member, void *cl), void *cl) {
        int i;
        unsigned stamp;
        struct member *p;

        assert(set);
        assert(apply);
        stamp = set->timestamp;
        for (i = 0; i < set->size; i++)
            for (p = set->buckets[i]; p; p = p->link) {
                apply(p->member, cl);
                assert(set->timestamp == stamp);
            }
    }

One difference is that Set_map passes each member, not a pointer to each member, to apply, so apply can't change the pointers in the set. It can, however, use a cast to change the values these members point to, which could modify the set's semantics.

Set_toArray is simpler than Table_toArray; like List_toArray, it allocates an array and just copies the members into it:

⟨functions 149⟩+≡
    void **Set_toArray(T set, void *end) {
        int i, j = 0;
        void **array;
        struct member *p;

        assert(set);
        array = ALLOC((set->length + 1)*sizeof (*array));
        for (i = 0; i < set->size; i++)
            for (p = set->buckets[i]; p; p = p->link)
                array[j++] = (void *)p->member;
        array[j] = end;
        return array;
    }

p->member must be cast from const void * to void * because the array is not declared const.

9.3.2 Set Operations

All four set operations have similar implementations. s + t, for example, is implemented by adding each element of s and t to a new set, which can be done by making a copy of s then adding each member of t to the copy, if it's not already in that set:

⟨functions 149⟩+≡
    T Set_union(T s, T t) {
        if (s == NULL) {
            assert(t);
            return copy(t, t->size);
        } else if (t == NULL)
            return copy(s, s->size);
        else {
            T set = copy(s, Arith_max(s->size, t->size));
            assert(s->cmp == t->cmp && s->hash == t->hash);
            { ⟨for each member q in t 154⟩
                Set_put(set, q->member);
            }
            return set;
        }
    }

⟨for each member q in t 154⟩≡
    int i;
    struct member *q;
    for (i = 0; i < t->size; i++)
        for (q = t->buckets[i]; q; q = q->link)

The internal function copy returns a copy of its argument, which must be nonnull.

⟨static functions 150⟩+≡
    static T copy(T t, int hint) {
        T set;

        assert(t);
        set = Set_new(hint, t->cmp, t->hash);
        { ⟨for each member q in t 154⟩
            ⟨add q->member to set 155⟩
        }
        return set;
    }

⟨add q->member to set 155⟩≡
    {
        struct member *p;
        const void *member = q->member;
        int i = (*set->hash)(member)%set->size;
        ⟨add member to set 151⟩
    }

Set_union and copy both have access to privileged information: they know the representation for sets and can thus specify the size of the hash table for a new set by passing the appropriate hint to Set_new. Set_union supplies a hint when it makes a copy of s; it uses the size of the larger hash table in s or t because the resulting set will have at least as many members as Set_union's largest argument. copy could call Set_put to add each member to the copy, but it uses ⟨add q->member to set 155⟩, which does the addition directly, to avoid Set_put's fruitless search.

Intersection, s ∗ t, creates a new set whose hash table has the size of the table in s or t, whichever is smaller, and adds members to the new set only if they appear in both s and t:

{ ¢for each member q in t 154² if (Set_member(s. t->size). s->cmp. else if (s->length < t->length) return Set_inter(t.. t->size). } else if (t == NULL) return Set_new(s->size. t->hash). Unauthorized use. assert(s->cmp == t->cmp && s->hash == t->hash). return Set_new(t->size. s->hash). This download file is made available for personal use only and is subject to the Terms of Service. This causes the for loop in the last else clause to walk through the smaller set. else { T set = Set_new(Arith_min(s->size. All rights reserved. Frank Liu Copyright © 1997 by David R. { ¢for each member q in t 154² C Interfaces and Implementations: Techniques for Creating Reusable Software. t->cmp. Hanson. assert(s->cmp == t->cmp && s->hash == t->hash). q->member)) ¢add q->member to set 155² } return set. s->cmp. Set_inter calls itself with s and t swapped. else { T set = Set_new(Arith_min(s->size. reproduction and/or distribution are strictly prohibited and violate applicable laws. T t) { if (s == NULL) { assert(t). The code below switches the names of the arguments so that it can use the chunk ¢for each member q in t 154² to sequence through s: ¢functions 149²+≡ T Set_minus(T t. s->hash). s − t. s->hash). s->cmp. . s). Any other use requires prior written consent from the copyright owner. creates a new set and adds to it the members from s that do not appear in t. return Set_new(s->size. C Interfaces and Implementations: Techniques for Creating Reusabl Prepared for frliu@microsoft. } else if (s == NULL) return copy(t.156 SETS ¢functions 149²+≡ T Set_inter(T s. Difference. T s) { if (t == NULL){ assert(s). } } If s has fewer members than t. t->size). s->hash). s->cmp.com.

                if (!Set_member(s, q->member))
                    ⟨add q->member to set 155⟩
            }
            return set;
        }
    }

Symmetric difference, s / t, is the set whose elements appear in either s or t but not both. If s or t is the empty set, then s / t is t or s. Otherwise, s / t is equivalent to (s − t) + (t − s), which can be done by making a pass over s, adding to the new set each member that's not in t, then a pass over t, adding to the new set each member that's not in s. The chunk ⟨for each member q in t 154⟩ can be used for both passes by swapping the values of s and t between passes:

⟨functions 149⟩+≡
    T Set_diff(T s, T t) {
        if (s == NULL) {
            assert(t);
            return copy(t, t->size);
        } else if (t == NULL)
            return copy(s, s->size);
        else {
            T set = Set_new(Arith_min(s->size, t->size),
                s->cmp, s->hash);
            assert(s->cmp == t->cmp && s->hash == t->hash);
            { ⟨for each member q in t 154⟩
                if (!Set_member(s, q->member))
                    ⟨add q->member to set 155⟩
            }
            { T u = t; t = s; s = u; }
            { ⟨for each member q in t 154⟩
                if (!Set_member(s, q->member))
                    ⟨add q->member to set 155⟩
            }
            return set;
        }
    }

More efficient implementations of these four operations are possible; some of them are explored in the exercises. A special case, which might

be important for some applications, is when the hash tables in s and t are the same size; see Exercise 9.7.

Further Reading

The sets exported by Set are modeled on the sets in Icon (Griswold and Griswold 1990), and the implementation is similar to Icon's (Griswold and Griswold 1986). Icon is one of the few languages that have sets as a built-in data type. Sets are the central data type in SETL, and most of its operators and control structures are designed to manipulate sets.

Bit vectors are often used to represent sets with fixed, small universes. Chapter 13 describes an interface that uses this approach.

Exercises

9.1 Implement Set using Table.

9.2 Implement Table using Set.

9.3 The implementations of Set and Table have much in common. Design and implement a third interface that distills their common properties. The purpose of this interface is to support the implementations of ADTs like sets and tables. Reimplement Set and Table using your new interface.

9.4 Design an interface for bags. A bag is like a set but members can appear more than once; for example, { 1 2 3 } is a set of integers, and { 1 1 2 2 3 } is a bag of integers. Implement your interface using the support interface designed in the previous exercise.

9.5 copy makes a copy of its set argument one member at a time. Since it knows the number of members in the copy, it could allocate all of the member structures at once and then dole them out to the appropriate hash chains as it fills in the copy. Implement this scheme and measure its benefits.

9.6 Some of the set operations might be made more efficient by storing the hash numbers in the member structures so that hash is

called only once for each member, and the comparison functions are called only when the hash numbers are equal. Analyze the expected savings of this improvement and, if it looks worthwhile, implement it and measure the results.

9.7 When s and t have the same number of buckets, s + t is equal to the union of the subsets whose members are those on the same hash chain. That is, each hash chain in s + t is the union of the elements in the corresponding hash chains of s and t. This occurs frequently because many applications specify the same hint whenever they call Set_new. Change the implementations of s + t, s ∗ t, s − t, and s / t to detect this case and use the appropriate simpler, more efficient implementation.

9.8 If an identifier appears on several consecutive lines, xref emits each line number. For example:

    c getword.c: 7 8 9 10 11 16 19 22 27 34 35

Modify xref.c so that it replaces two or more consecutive line numbers by a line range:

    c getword.c: 7-11 16 19 22 27 34-35

9.9 xref allocates a lot of memory, but deallocates only the arrays created by Table_toArray. Change xref so that it eventually deallocates everything it allocates (except the atoms, of course). It's easiest to do so incrementally as the data structure is being printed. Use the solution to Exercise 5.5 to check that you've deallocated everything.

9.10 Explain why cmpint and intcmp use explicit comparisons to compare integers instead of returning the result of subtracting them. That is, what's wrong with the following, apparently much simpler, version of cmpint?

    int cmpint(const void *x, const void *y) {
        return **(int **)x - **(int **)y;
    }
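As a hint toward Exercise 9.10, the sketch below uses wider arithmetic to check whether the difference of two int values is representable as an int, and shows a three-way comparison written with explicit tests. It assumes that long long is wider than int, which holds on common platforms but is not guaranteed by the C standard. The names difference_fits and compare_ints are hypothetical helpers for illustration, not functions from this book.

```c
#include <limits.h>

/* Returns 1 when a - b is representable as an int, else 0.
   Assumption: long long is wider than int on this platform. */
int difference_fits(int a, int b) {
    long long d = (long long)a - (long long)b;
    return d >= INT_MIN && d <= INT_MAX;
}

/* Three-way comparison using explicit tests: returns -1, 0, or +1. */
int compare_ints(int a, int b) {
    if (a < b)
        return -1;
    else if (a > b)
        return +1;
    else
        return 0;
}
```

For small operands such as 7 and 11 the difference fits, so subtraction and explicit comparison agree; the interesting inputs are pairs of large magnitude and opposite sign.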


10 DYNAMIC ARRAYS

An array is a homogeneous sequence of values in which the elements in the sequence are associated one-to-one with indices in a contiguous range. Arrays in some form appear as built-in data types in virtually all programming languages. In some languages, like C, all array indices have the same lower bounds, and in other languages, like Modula-3, each array can have its own bounds. In C, all arrays have indices that start at zero.

Array sizes are specified at either compile time or runtime. The sizes of static arrays are known at compile time. In C, for example, declared arrays must have sizes known at compile time; that is, in the declaration int a[n], n must be a constant expression. A static array may be allocated at runtime; for example, local arrays are allocated at runtime when the function in which they appear is called, but their sizes are known at compile time.

The arrays returned by functions like Table_toArray are dynamic arrays because space for them is allocated by calling malloc or an equivalent allocation function. So, their sizes can be determined at runtime. Some languages, such as Modula-3, have linguistic support for dynamic arrays. In C, however, they must be constructed explicitly as illustrated by functions like Table_toArray.

The various toArray functions show just how useful dynamic arrays are; the Array ADT described in this chapter provides a similar but more general facility. It exports functions that allocate and deallocate dynamic arrays, access them with bounds checks, and expand or contract them to hold more or fewer elements.

This chapter also describes the ArrayRep interface. It reveals the representation for dynamic arrays for those few clients that need more efficient access to the array elements. Together, Array and ArrayRep illustrate a two-level interface or a layered interface. Array specifies a high-level view of an array ADT, and ArrayRep specifies another, more detailed view of the ADT at a lower level. The advantage of this organization is that importing ArrayRep clearly identifies those clients that depend on the representation of dynamic arrays. Changes to the representation thus affect only them, not the presumably much larger population of clients that import Array.

10.1 Interfaces

The Array ADT

⟨array.h⟩≡
    #ifndef ARRAY_INCLUDED
    #define ARRAY_INCLUDED

    #define T Array_T
    typedef struct T *T;

    ⟨exported functions 162⟩

    #undef T
    #endif

exports functions that operate on an array of N elements accessed by indices zero through N−1. Each element in a particular array is a fixed size, but different arrays can have elements of different sizes. Array_Ts are allocated and deallocated by

⟨exported functions 162⟩≡
    extern T Array_new (int length, int size);
    extern void Array_free(T *array);

Array_new allocates, initializes, and returns a new array of length elements with bounds zero through length−1, unless length is zero, in which case the array has no elements. Each element occupies size bytes. The bytes in each element are initialized to zero. size must include any padding that may be required for alignment, so that the actual array can be created by allocating length•size bytes when length is positive. It

is a checked runtime error for length to be negative or for size to be nonpositive, and Array_new can raise Mem_Failed.

Array_free deallocates and clears *array. It is a checked runtime error for array or *array to be null.

Unlike most of the other ADTs in this book, which build structures of void pointers, the Array interface places no restrictions on the values of the elements; each element is just a sequence of size bytes. The rationale for this design is that Array_Ts are used most often to build other ADTs; the sequences described in Chapter 11 are an example.

The functions

⟨exported functions 162⟩+≡
    extern int Array_length(T array);
    extern int Array_size (T array);

return the number of elements in array and their size.

Array elements are accessed by

⟨exported functions 162⟩+≡
    extern void *Array_get(T array, int i);
    extern void *Array_put(T array, int i, void *elem);

Array_get returns a pointer to element number i; it's analogous to &a[i] when a is a declared C array. Clients access the value of the element by dereferencing the pointer returned by Array_get. Array_put overwrites the value of element i with the new element pointed to by elem. Unlike Table_put, Array_put returns elem. It can't return the previous value of element i because the elements are not necessarily pointers, and they can be any number of bytes long.

It is a checked runtime error for i to be greater than or equal to the length of array, or for elem to be null. It is an unchecked runtime error to call Array_get and then change the size of array via Array_resize before dereferencing the pointer returned by Array_get. It is also an unchecked runtime error for the storage beginning at elem to overlap in any way with the storage of array's ith element.

⟨exported functions 162⟩+≡
    extern void Array_resize(T array, int length);
    extern T Array_copy (T array, int length);

Array_resize changes the size of array so that it holds length elements, expanding or contracting it as necessary. If length exceeds the current length of the array, the new elements are initialized to zero. Calling Array_resize invalidates any values returned by previous calls to Array_get. Array_copy is similar, but returns a copy of array that holds its first length elements. If length exceeds the number of elements in array, the excess elements in the copy are initialized to zero. Array_resize and Array_copy can raise Mem_Failed.

Array has no functions like Table_map and Table_toArray because Array_get provides the machinery necessary to perform the equivalent operations.

It is a checked runtime error to pass a null T to any function in this interface.

The ArrayRep interface reveals that an Array_T is represented by a pointer to a descriptor: a structure whose fields give the number of elements in the array, the size of the elements, and a pointer to the storage for the array.

⟨arrayrep.h⟩≡
    #ifndef ARRAYREP_INCLUDED
    #define ARRAYREP_INCLUDED

    #define T Array_T

    struct T {
        int length;
        int size;
        char *array;
    };

    extern void ArrayRep_init(T array,
        int length, int size, void *ary);

    #undef T
    #endif

Figure 10.1 shows the descriptor for an array of 100 integers returned by Array_new(100, sizeof (int)) on a machine with four-byte integers. If the array has no elements, the array field is null. Array descriptors are sometimes called dope vectors.
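To make the descriptor idea concrete, here is a minimal, self-contained sketch in standard C. It is not the Array implementation: it uses malloc and calloc directly rather than the Mem interface, and the names struct desc, desc_new, desc_elem, and desc_free are hypothetical stand-ins chosen for illustration. It shows the three fields and how an element's address can be computed from the base address, the index, and the element size.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an array descriptor ("dope vector"):
   element count, element size in bytes, and the storage itself. */
struct desc {
    int length;
    int size;
    char *array;   /* NULL when length is zero */
};

/* Allocate a descriptor plus zero-filled storage for length elements.
   Returns NULL if allocation fails. */
struct desc *desc_new(int length, int size) {
    struct desc *d;

    assert(length >= 0 && size > 0);
    d = malloc(sizeof *d);
    if (d == NULL)
        return NULL;
    d->length = length;
    d->size = size;
    d->array = NULL;
    if (length > 0) {
        d->array = calloc((size_t)length, (size_t)size);
        if (d->array == NULL) {
            free(d);
            return NULL;
        }
    }
    return d;
}

/* Address of element i: base address plus i times the element size. */
void *desc_elem(struct desc *d, int i) {
    assert(d && i >= 0 && i < d->length);
    return d->array + (size_t)i * (size_t)d->size;
}

/* Release the storage and then the descriptor; NULL is accepted. */
void desc_free(struct desc *d) {
    if (d) {
        free(d->array);
        free(d);
    }
}
```

With this sketch, desc_new(100, sizeof (int)) would yield a descriptor whose length is 100 and whose size is sizeof (int), and a client would store into element 7 with *(int *)desc_elem(d, 7) = 42.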

[Figure 10.1: The Array_T created by Array_new(100, sizeof (int)). The descriptor holds length = 100, size = 4, and array, which points to the storage for elements 0 through 99.]

Clients of ArrayRep may read the fields of a descriptor but may not write them; writing them is an unchecked runtime error. ArrayRep guarantees that if array is a T and i is nonnegative and less than array->length, array->array + i*array->size is the address of element i.

ArrayRep also exports ArrayRep_init, which initializes the fields in the Array_T structure pointed to by array to the values of the arguments length, size, and ary. This function is provided so that clients can initialize Array_Ts they've embedded in other structures. It is a checked runtime error for array to be null, for size to be nonpositive, for length to be nonzero while ary is null, or for length to be nonpositive while ary is nonnull. It is an unchecked runtime error to initialize a T structure by means other than calling ArrayRep_init.

10.2 Implementation

A single implementation exports both the Array and ArrayRep interfaces:

⟨array.c⟩≡
    #include <stdlib.h>
    #include <string.h>
    #include "assert.h"
    #include "array.h"
    #include "arrayrep.h"
    #include "mem.h"

    #define T Array_T

    ⟨functions 166⟩

Array_new allocates space for a descriptor and for the array itself if length is positive, and calls ArrayRep_init to initialize the descriptor's fields:

⟨functions 166⟩≡
    T Array_new(int length, int size) {
        T array;

        NEW(array);
        if (length > 0)
            ArrayRep_init(array, length, size,
                CALLOC(length, size));
        else
            ArrayRep_init(array, length, size, NULL);
        return array;
    }

ArrayRep_init is the only valid way to initialize the fields of descriptors; clients that allocate descriptors by other means must call ArrayRep_init to initialize them.

⟨functions 166⟩+≡
    void ArrayRep_init(T array, int length, int size,
        void *ary) {
        assert(array);
        assert(ary && length>0 || length==0 && ary==NULL);
        assert(size > 0);
        array->length = length;
        array->size = size;

        if (length > 0)
            array->array = ary;
        else
            array->array = NULL;
    }

Calling ArrayRep_init to initialize a T structure helps reduce coupling: These calls clearly identify clients that allocate descriptors themselves and thus depend on the representation. It's possible to add fields without affecting these clients as long as ArrayRep_init doesn't change. This scenario would occur, for example, if a field for an identifying serial number were added to the T structure, and this field were initialized automatically by ArrayRep_init.

Array_free deallocates the array itself and the T structure, and clears its argument:

⟨functions 166⟩+≡
    void Array_free(T *array) {
        assert(array && *array);
        FREE((*array)->array);
        FREE(*array);
    }

Array_free doesn't have to check if (*array)->array is null because FREE accepts null pointers.

Array_get and Array_put fetch and store elements in an Array_T:

⟨functions 166⟩+≡
    void *Array_get(T array, int i) {
        assert(array);
        assert(i >= 0 && i < array->length);
        return array->array + i*array->size;
    }

    void *Array_put(T array, int i, void *elem) {
        assert(array);
        assert(i >= 0 && i < array->length);
        assert(elem);
        memcpy(array->array + i*array->size, elem,
            array->size);

        return elem;
    }

Notice that Array_put returns its third argument, not the address of the array element into which those bytes were just stored.

Array_length and Array_size return the similarly named descriptor fields:

⟨functions 166⟩+≡
    int Array_length(T array) {
        assert(array);
        return array->length;
    }

    int Array_size(T array) {
        assert(array);
        return array->size;
    }

Clients of ArrayRep may access these fields directly from the descriptor.

Array_resize calls Mem's RESIZE to change the number of elements in the array, and changes the array's length field accordingly.

⟨functions 166⟩+≡
    void Array_resize(T array, int length) {
        assert(array);
        assert(length >= 0);
        if (length == 0)
            FREE(array->array);
        else if (array->length == 0)
            array->array = ALLOC(length*array->size);
        else
            RESIZE(array->array, length*array->size);
        array->length = length;
    }

Unlike with Mem's RESIZE, a new length of zero is legal, in which case the array is deallocated, and henceforth the descriptor describes an empty dynamic array.

Array_copy is much like Array_resize, except that it copies array's descriptor and part or all of its array:

⟨functions 166⟩+≡
    T Array_copy(T array, int length) {
        T copy;

        assert(array);
        assert(length >= 0);
        copy = Array_new(length, array->size);
        if (copy->length >= array->length
        && array->length > 0)
            memcpy(copy->array, array->array,
                array->length*array->size);
        else if (array->length > copy->length
        && copy->length > 0)
            memcpy(copy->array, array->array,
                copy->length*array->size);
        return copy;
    }

Further Reading

Some languages support variants of dynamic arrays. Modula-3 (Nelson 1991), for example, permits arrays with arbitrary bounds to be created during execution, but they can't be expanded or contracted. Lists in Icon (Griswold and Griswold 1990) are like dynamic arrays that can be expanded or contracted by adding or deleting elements from either end; these are much like the sequences described in the next chapter. Icon also supports fetching sublists from a list and replacing a sublist with a list of a different size.

Exercises

10.1 Design and implement an ADT that provides dynamic arrays of pointers. It should provide "safe" access to the elements of these arrays via functions similar in spirit to the functions provided by Table. Use Array or ArrayRep in your implementation.

10.2 Design an ADT for dynamic matrices (arrays with two dimensions) and implement it using Array. Can you generalize your design to arrays of N dimensions?

10.3 Design and implement an ADT for sparse dynamic arrays, that is, arrays in which most of the elements are zero. Your design should accept an array-specific value for zero and the implementation should store only those elements that are not equal to zero.

10.4 Add the function

    extern void Array_reshape(T array, int length,
        int size);

to the Array interface and its implementation. Array_reshape changes the number of elements in array and the size of each element to length and size, respectively. Like Array_resize, the reshaped array retains the first length elements of the original array; if length exceeds the original length, the excess elements are set to zero. The ith element in array becomes the ith element in the reshaped array. If size is less than the original size, each element is truncated; if size exceeds the original size, the excess bytes are set to zero.

11 SEQUENCES

A sequence holds N values associated with the integer indices zero through N−1 when N is positive. An empty sequence holds no values. Like arrays, values in a sequence may be accessed by indexing; they can also be added to or removed from either end of a sequence. Sequences expand automatically as necessary to accommodate their contents. Values are pointers.

Sequences are one of the most useful ADTs in this book. Despite their relatively simple specification, they can be used as arrays, lists, stacks, queues, and deques, and they often subsume the facilities of separate ADTs for these data structures. A sequence can be viewed as a more abstract version of the dynamic array described in the previous chapter. A sequence hides bookkeeping and resizing details in its implementation.

11.1 Interface

A sequence is an instance of the opaque pointer type defined in the Seq interface:

⟨seq.h⟩≡
    #ifndef SEQ_INCLUDED
    #define SEQ_INCLUDED

    #define T Seq_T
    typedef struct T *T;

    ⟨exported functions 172⟩

    #undef T
    #endif

It is a checked runtime error to pass a null T to any routine in this interface.

Sequences are created by the functions

⟨exported functions 172⟩≡
    extern T Seq_new(int hint);
    extern T Seq_seq(void *x, ...);

Seq_new creates and returns an empty sequence. hint is an estimate of the maximum number of values the new sequence will hold. If that number is unknown, a hint of zero creates a small sequence. Sequences expand as necessary to hold their contents regardless of the value of hint. It is a checked runtime error for hint to be negative.

Seq_seq creates and returns a sequence whose values are initialized to its nonnull pointer arguments. The argument list is terminated by the first null pointer. Thus

    Seq_T names;
    …
    names = Seq_seq("C", "ML", "C++", "Icon", "AWK", NULL);

creates a sequence with five values and assigns it to names. The values in the argument list are associated with the indices zero through four. The pointers passed in the variable part of Seq_seq's argument list are assumed to be void pointers, so programmers must provide casts when passing other than char or void pointers; see page 105. Seq_new and Seq_seq can raise Mem_Failed.

⟨exported functions 172⟩+≡
    extern void Seq_free(T *seq);

deallocates the sequence *seq and clears *seq. It is a checked runtime error for seq or *seq to be null pointers.

⟨exported functions 172⟩+≡
    extern int Seq_length(T seq);

returns the number of values in the sequence seq.

The values in an N-value sequence are associated with the integer indices zero through N−1. These values are accessed by the functions

⟨exported functions 172⟩+≡
    extern void *Seq_get(T seq, int i);
    extern void *Seq_put(T seq, int i, void *x);

Seq_get returns the ith value in seq. Seq_put changes the ith value to x and returns the previous value. It is a checked runtime error for i to be equal to or greater than N. Seq_get and Seq_put access the ith value in constant time.

A sequence is expanded by adding values to either end:

⟨exported functions 172⟩+≡
    extern void *Seq_addlo(T seq, void *x);
    extern void *Seq_addhi(T seq, void *x);

Seq_addlo adds x to the low end of seq and returns x. Adding a value to the beginning of a sequence increments both the indices of the existing values and the length of the sequence by one. Seq_addhi adds x to the high end of seq and returns x. Adding a value to the end of a sequence increments the length of the sequence by one. Seq_addlo and Seq_addhi can raise Mem_Failed.

Similarly, a sequence is contracted by removing values from either end:

⟨exported functions 172⟩+≡
    extern void *Seq_remlo(T seq);
    extern void *Seq_remhi(T seq);

Seq_remlo removes and returns the value at the low end of seq. Removing the value at the beginning of a sequence decrements both the indices of the remaining values and the length of the sequence by one. Seq_remhi removes and returns the value at the high end of seq. Removing the value at the end of a sequence decrements the length of the sequence by one. It is a checked runtime error to pass an empty sequence to Seq_remlo or Seq_remhi.
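The way indices shift when values are added or removed at either end can be illustrated with a small, self-contained sketch. It is not the Seq implementation: struct ring and the ring_ functions are hypothetical names, the capacity is fixed at eight slots for brevity, and a full buffer is caught by assert rather than handled by expansion.

```c
#include <assert.h>
#include <stddef.h>

#define RING_CAP 8   /* fixed capacity; a real sequence would expand */

struct ring {
    void *slot[RING_CAP];
    int head;     /* slot holding the value at index 0 */
    int length;   /* number of values currently held */
};

void ring_init(struct ring *r) {
    r->head = 0;
    r->length = 0;
}

/* Value at index i, counting from the low end. */
void *ring_get(const struct ring *r, int i) {
    assert(i >= 0 && i < r->length);
    return r->slot[(r->head + i) % RING_CAP];
}

/* Add at the high end: existing indices are unchanged. */
void *ring_addhi(struct ring *r, void *x) {
    assert(r->length < RING_CAP);
    r->slot[(r->head + r->length) % RING_CAP] = x;
    r->length++;
    return x;
}

/* Add at the low end: every existing index grows by one. */
void *ring_addlo(struct ring *r, void *x) {
    assert(r->length < RING_CAP);
    r->head = (r->head + RING_CAP - 1) % RING_CAP;
    r->slot[r->head] = x;
    r->length++;
    return x;
}

/* Remove from the low end: every remaining index shrinks by one. */
void *ring_remlo(struct ring *r) {
    void *x;

    assert(r->length > 0);
    x = r->slot[r->head];
    r->head = (r->head + 1) % RING_CAP;
    r->length--;
    return x;
}

/* Remove from the high end: remaining indices are unchanged. */
void *ring_remhi(struct ring *r) {
    assert(r->length > 0);
    r->length--;
    return r->slot[(r->head + r->length) % RING_CAP];
}
```

Used as a stack, only ring_addhi and ring_remhi are needed; used as a queue, ring_addhi pairs with ring_remlo. In both cases ring_get still gives indexed access to whatever the structure currently holds.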

11.2 Implementation

As suggested at the beginning of this chapter, a sequence is a high-level abstraction of a dynamic array. Its representation thus includes a dynamic array (not a pointer to an Array_T, but an Array_T structure itself) and its implementation imports both Array and ArrayRep:

⟨seq.c⟩≡
    #include <stdlib.h>
    #include <stdarg.h>
    #include <string.h>
    #include "assert.h"
    #include "seq.h"
    #include "array.h"
    #include "arrayrep.h"
    #include "mem.h"

    #define T Seq_T

    struct T {
        struct Array_T array;
        int length;
        int head;
    };

    ⟨static functions 179⟩
    ⟨functions 175⟩

The length field holds the number of values in the sequence and the array field holds the array in which the values are stored. This array always has at least length elements, but some of them are unused when length is less than array.length. The array is used as a circular buffer to hold the sequence values. The zeroth value of the sequence is stored in element number head of the array, and successive values are stored in successive elements modulo the array size. That is, if the ith value in the sequence is stored in element number array.length-1, the i+1st value is stored in element zero of the array. Figure 11.1 shows one way in which a seven-value sequence can be stored in a 16-element array. The box on the left is the Seq_T with its embedded Array_T, shown lightly

shaded. The box on the right is the array; its shading shows the elements occupied by values in the sequence.

[Figure 11.1 A 16-element sequence. The diagram labels the Array_T fields length, size, and array, and the Seq_T fields length and head.]

As detailed below, values are added to the beginning of the sequence by decrementing head modulo the array size, and they are removed from the beginning by incrementing head modulo the array size. A sequence always has an array even when it's empty.

A new sequence is created by allocating a dynamic array that can hold hint pointers, or 16 pointers if hint is zero:

⟨functions 175⟩≡
    T Seq_new(int hint) {
        T seq;

        assert(hint >= 0);
        NEW0(seq);
        if (hint == 0)
            hint = 16;
        ArrayRep_init(&seq->array, hint, sizeof (void *),
            ALLOC(hint*sizeof (void *)));
        return seq;
    }

Using NEW0 initializes the length and head fields to zero. Seq_seq calls Seq_new to create an empty sequence, then crawls through its arguments calling Seq_addhi to append each one to the new sequence:

⟨functions 175⟩+≡
    T Seq_seq(void *x, ...) {
        va_list ap;
        T seq = Seq_new(0);

        va_start(ap, x);
        for ( ; x; x = va_arg(ap, void *))
            Seq_addhi(seq, x);
        va_end(ap);
        return seq;
    }

Seq_seq uses the macros for handling variable length argument lists much as List_list does; see page 108.

Deallocating a sequence can be done by Array_free, which deallocates the array and its descriptor:

⟨functions 175⟩+≡
    void Seq_free(T *seq) {
        assert(seq && *seq);
        assert((void *)*seq == (void *)&(*seq)->array);
        Array_free((Array_T *)seq);
    }

The call to Array_free works only because the address of *seq is equal to &(*seq)->array as asserted in the code. That is, the Array_T structure must be the first field of the Seq_T structure so that the pointer returned by NEW0 in Seq_new is a pointer both to a Seq_T and to an Array_T.

Seq_length simply returns the sequence's length field:

⟨functions 175⟩+≡
    int Seq_length(T seq) {
        assert(seq);
        return seq->length;
    }

The ith value in a sequence is stored in the (head + i) mod array.length element of its array. A type cast makes it possible to index the array directly:

⟨seq[i] 177⟩≡
    ((void **)seq->array.array)[
        (seq->head + i)%seq->array.length]

Seq_get simply returns this array element, and Seq_put sets it to x:

⟨functions 175⟩+≡
    void *Seq_get(T seq, int i) {
        assert(seq);
        assert(i >= 0 && i < seq->length);
        return ⟨seq[i] 177⟩;
    }

    void *Seq_put(T seq, int i, void *x) {
        void *prev;

        assert(seq);
        assert(i >= 0 && i < seq->length);
        prev = ⟨seq[i] 177⟩;
        ⟨seq[i] 177⟩ = x;
        return prev;
    }

Seq_remlo and Seq_remhi remove values from a sequence. Seq_remhi is the simpler of the two because it just decrements the length field and returns the value indexed by the new value of length:

⟨functions 175⟩+≡
    void *Seq_remhi(T seq) {
        int i;

        assert(seq);
        assert(seq->length > 0);
        i = --seq->length;
        return ⟨seq[i] 177⟩;
    }

Seq_remlo is slightly more complicated because it must return the value indexed by head (which is the value at index zero in the sequence), increment head modulo the array size, and decrement length:

⟨functions 175⟩+≡
    void *Seq_remlo(T seq) {
        int i = 0;
        void *x;

        assert(seq);
        assert(seq->length > 0);
        x = ⟨seq[i] 177⟩;
        seq->head = (seq->head + 1)%seq->array.length;
        --seq->length;
        return x;
    }

Seq_addlo and Seq_addhi add values to a sequence and thus must cope with the possibility that its array is full, which occurs when length is equal to array.length. When this condition occurs, both functions call expand to enlarge the array; it does this by calling Array_resize. Seq_addhi is again the simpler of the two functions because, after checking for expansion, it stores the new value at the index given by length and increments length:

⟨functions 175⟩+≡
    void *Seq_addhi(T seq, void *x) {
        int i;

        assert(seq);
        if (seq->length == seq->array.length)
            expand(seq);
        i = seq->length++;
        return ⟨seq[i] 177⟩ = x;
    }

Seq_addlo also checks for expansion, but then decrements head modulo the array size and stores x in the array element indexed by the new value of head, which is the value at index zero in the sequence:

⟨functions 175⟩+≡
    void *Seq_addlo(T seq, void *x) {
        int i = 0;

        assert(seq);
        if (seq->length == seq->array.length)
            expand(seq);
        if (--seq->head < 0)
            seq->head = seq->array.length - 1;
        seq->length++;
        return ⟨seq[i] 177⟩ = x;
    }

Alternatively, Seq_addlo could decrement seq->head with

    seq->head = Arith_mod(seq->head - 1, seq->array.length);

expand encapsulates a call to Array_resize that doubles the size of a sequence's array:

⟨static functions 179⟩≡
    static void expand(T seq) {
        int n = seq->array.length;

        Array_resize(&seq->array, 2*n);
        if (seq->head > 0)
            ⟨slide tail down 179⟩
    }

As this code suggests, expand must also cope with the use of the array as a circular buffer. Unless head just happens to be zero, the elements at the tail end of the original array (from head down) must be moved to the end of the enlarged array to open up the middle, as illustrated in Figure 11.2, and head must be adjusted accordingly:

⟨slide tail down 179⟩≡
    {
        void **old = &((void **)seq->array.array)[seq->head];
        memcpy(old+n, old, (n - seq->head)*sizeof (void *));
        seq->head += n;
    }
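The circular-buffer bookkeeping described above can be tried outside the library. The following is a minimal sketch, not the Seq implementation itself: the names ring_buf and rb_* are hypothetical, and malloc and realloc stand in for the Mem, Array, and ArrayRep modules. It assumes the same layout as the text, with head naming the slot that holds index zero and the tail slid to the end of the buffer after doubling.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for Seq_T: a growable circular buffer of pointers.
   None of these names belong to the Seq interface. */
struct ring_buf {
    void **buf;   /* storage, cap elements */
    int cap;      /* plays the role of array.length */
    int head;     /* slot that holds sequence index 0 */
    int length;   /* number of values currently stored */
};

/* Map sequence index i to its slot, as the seq[i] chunk does. */
void **rb_slot(struct ring_buf *rb, int i) {
    return &rb->buf[(rb->head + i) % rb->cap];
}

void rb_init(struct ring_buf *rb, int hint) {
    assert(hint >= 0);
    if (hint == 0)
        hint = 16;
    rb->buf = malloc(hint * sizeof (void *));
    assert(rb->buf);
    rb->cap = hint;
    rb->head = rb->length = 0;
}

/* Double the storage, then move the tail (slots head..n-1) to the end
   of the enlarged buffer so the free space opens up in the middle. */
void rb_expand(struct ring_buf *rb) {
    int n = rb->cap;
    void **p = realloc(rb->buf, 2 * n * sizeof (void *));
    assert(p);
    rb->buf = p;
    rb->cap = 2 * n;
    if (rb->head > 0) {
        void **old = &rb->buf[rb->head];
        memcpy(old + n, old, (n - rb->head) * sizeof (void *));
        rb->head += n;
    }
}

void *rb_addhi(struct ring_buf *rb, void *x) {
    if (rb->length == rb->cap)
        rb_expand(rb);
    return *rb_slot(rb, rb->length++) = x;
}

void *rb_addlo(struct ring_buf *rb, void *x) {
    if (rb->length == rb->cap)
        rb_expand(rb);
    if (--rb->head < 0)            /* decrement head modulo cap */
        rb->head = rb->cap - 1;
    rb->length++;
    return *rb_slot(rb, 0) = x;
}

void *rb_remlo(struct ring_buf *rb) {
    void *x;
    assert(rb->length > 0);
    x = *rb_slot(rb, 0);
    rb->head = (rb->head + 1) % rb->cap;
    rb->length--;
    return x;
}

void *rb_remhi(struct ring_buf *rb) {
    assert(rb->length > 0);
    return *rb_slot(rb, --rb->length);
}
```

Starting from a two-slot buffer, one rb_addhi followed by one rb_addlo wraps head around to slot 1; the next addition finds the buffer full, doubles it to four slots, and moves the single tail element from slot 1 to slot 3, which mirrors the slide in ⟨slide tail down 179⟩.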

. Unauthorized use. which takes time proportional to i/M. Use this representation to build a new implementation for Seq and develop some test programs to measure its performance.180 SEQUENCES old old+n Figure 11. Hanson. Suppose that an access C Interfaces and Implementations: Techniques for Creating Reusable Software. All rights reserved.1 explores the Icon implementation. say. The implementation described in this chapter is also similar to the DEC implementation. The disadvantage of this representation is that the chunks must be traversed to access the ith value. Exercises 11. Any other use requires prior written consent from the copyright owner. This download file is made available for personal use only and is subject to the Terms of Service. .com. reproduction and/or distribution are strictly prohibited and violate applicable laws. but the names of the operations are taken from the Sequence interface in the library that accompanies the DEC implementation of Modula-3 (Horning. This representation avoids the use of Array_resize because new chunks can be added to either end of the list as necessary to satisfy calls to Seq_addlo and Seq_addhi. 1993). et al. Frank Liu Copyright © 1997 by David R.2 Expanding a sequence Further Reading Sequences are nearly identical to lists in Icon (Griswold and Griswold 1990).1 Icon implements lists — its version of sequences — with a doubly linked list of chunks where each chunk holds. M values. C Interfaces and Implementations: Techniques for Creating Reusabl Prepared for frliu@microsoft. Exercise 11.

to value i is almost always followed by an access to value i−1 or i+1; can you modify your implementation to make this case run in constant time?

11.2 Devise an implementation for Seq that doesn't use Array_resize. For example, when the original array of N elements fills up, it could be converted to an array of pointers to arrays, each of which holds, say, 2N elements, so the converted sequence can hold 2N² values, each of which can be accessed in constant time. Each of the 2N-element arrays in this "edge-vector" representation can be allocated lazily, that is, only after a value is stored in it. If N is 1,024, the converted sequence can hold over two million elements.

11.3 Suppose you forbid Seq_addlo and Seq_remlo; devise an implementation that allocates space incrementally but can access any element in logarithmic time. Hint: Skip lists (Pugh 1990).

11.4 Sequences are expanded but never contracted. Modify Seq_remlo and Seq_remhi so that they contract a sequence whenever more than half of its array is unused, that is, when seq->length becomes less than seq->array.length/2. When is this modification a bad idea? Hint: thrashing.

11.5 Implement xref again using sequences instead of sets to hold the line numbers. Since the files are read sequentially, you won't have to sort the line numbers because they will appear in the sequences in increasing order.

11.6 Rewrite Seq_free so that the assertion it now uses is unnecessary. Be careful: you cannot use Array_free.

12 RINGS

A ring is much like a sequence: It holds N values associated with the integer indices zero through N−1 when N is positive. An empty ring holds no values. Values are pointers. Like the values in a sequence, values in a ring may be accessed by indexing.

Unlike a sequence, however, values can be added to a ring anywhere, and any value in a ring can be removed. In addition, the values can be renumbered: "rotating" a ring left decrements the index of each value by one modulo the length of the ring; rotating it right increments the indices by one modulo the ring length. The price for the flexibility of adding values to and removing values from arbitrary locations in a ring is that accessing the ith value is not guaranteed to take constant time.

12.1 Interface

As suggested by its name, a ring is an abstraction of a doubly linked list, but the Ring ADT reveals only that a ring is an instance of an opaque pointer type:

⟨ring.h⟩≡
    #ifndef RING_INCLUDED
    #define RING_INCLUDED

    #define T Ring_T
    typedef struct T *T;

    ⟨exported functions 184⟩

    #undef T
    #endif

It is a checked runtime error to pass a null T to any routine in this interface.

Rings are created by the functions that parallel similar functions in the Seq interface:

⟨exported functions 184⟩≡
    extern T Ring_new (void);
    extern T Ring_ring(void *x, ...);

Ring_new creates and returns an empty ring. Ring_ring creates and returns a ring whose values are initialized to its nonnull pointer arguments. The argument list is terminated by the first null pointer argument. Thus

    Ring_T names;
    ...
    names = Ring_ring("Lists", "Tables", "Sets", "Sequences", "Rings", NULL);

creates a ring with the five values shown, and assigns it to names. The values in the argument list are associated with the indices zero through four. The pointers passed in the variable part of the ring's argument list are assumed to be void pointers, so programmers must provide casts when passing other than char or void pointers; see page 105. Ring_new and Ring_ring can raise Mem_Failed.

⟨exported functions 184⟩+≡
    extern void Ring_free (T *ring);
    extern int Ring_length(T ring);

Ring_free deallocates the ring in *ring and clears *ring. It is a checked runtime error for ring or *ring to be null pointers. Ring_length returns the number of values in ring.

The values in a ring of length N are associated with the integer indices zero through N−1. These values are accessed by the functions

⟨exported functions 184⟩+≡
    extern void *Ring_get(T ring, int i);
    extern void *Ring_put(T ring, int i, void *x);

Ring_get returns the ith value in ring. Ring_put changes the ith value in ring to x and returns the previous value. It is a checked runtime error for i to be equal to or greater than N.

Values may be added anywhere in a ring by

⟨exported functions 184⟩+≡
    extern void *Ring_add(T ring, int pos, void *x);

Ring_add adds x to ring at position pos and returns x. The positions in a ring with N values specify locations between values as depicted in the following diagram, which shows a five-element ring holding the integers zero through four.

     1     2     3     4     5     6
        0     1     2     3     4
    -5    -4    -3    -2    -1     0

The middle row of numbers are the indices, the top row are the positive positions, and the bottom row are the nonpositive positions. The nonpositive positions specify locations from the end of the ring without knowing its length. The positions zero and one are also valid for empty rings. Ring_add accepts either form of position. It is a checked runtime error to specify a nonexistent position, which includes the positive positions that exceed one plus the length of the ring and the negative positions whose absolute values exceed the length of the ring.

Adding a new value increments both the indices of the values to its right and the length of the ring by one. Ring_add can raise Mem_Failed.

The functions

⟨exported functions 184⟩+≡
    extern void *Ring_addlo(T ring, void *x);
    extern void *Ring_addhi(T ring, void *x);

are equivalent to their similarly named counterparts in the Seq interface. Ring_addlo is equivalent to Ring_add(ring, 1, x), and Ring_addhi is equivalent to Ring_add(ring, 0, x). Ring_addlo and Ring_addhi can raise Mem_Failed.

The function

⟨exported functions 184⟩+≡
    extern void *Ring_remove(T ring, int i);

removes and returns the ith value in ring. Removing a value decrements the indices of the remaining values to its right by one and the length of the ring by one. It is a checked runtime error for i to be equal to or exceed the length of ring.

Like the Seq functions with similar names, the functions

⟨exported functions 184⟩+≡
    extern void *Ring_remlo(T ring);
    extern void *Ring_remhi(T ring);

remove and return the value at the low or high end of ring. Ring_remlo is equivalent to Ring_remove(ring, 0), and Ring_remhi is equivalent to Ring_remove(ring, Ring_length(ring) - 1). It is a checked runtime error to pass an empty ring to Ring_remlo or Ring_remhi.

The name "ring" comes from the function

⟨exported functions 184⟩+≡
    extern void Ring_rotate(T ring, int n);

which renumbers the values in ring by "rotating" it left or right. If n is positive, ring is rotated to the right (clockwise) n values, and the indices of each value are incremented by n modulo the length of ring. Rotating an eight-value ring that holds the strings A through H three places to the right is illustrated by the following diagram; the arrows point to the first element.

[Diagram: the eight-value ring holding A through H, shown before and after the rotation.]

If n is negative, ring is rotated to the left (counterclockwise) n values and the indices of each value are decremented by n modulo the length of ring. If n modulo the length of the ring is zero, Ring_rotate has no effect. It is a checked runtime error for the absolute value of n to exceed the length of ring.

12.2 Implementation

The implementation represents a ring as a structure with two fields:

⟨ring.c⟩≡
    #include <stdlib.h>
    #include <stdarg.h>
    #include <string.h>
    #include "assert.h"
    #include "ring.h"
    #include "mem.h"

    #define T Ring_T

    struct T {
        struct node {
            struct node *llink, *rlink;
            void *value;
        } *head;
        int length;
    };

    ⟨functions 188⟩

The head field points to a doubly linked list of node structures in which the value fields hold the values in the ring. head points to the value associated with index zero; successive values are in the nodes linked by the rlink fields, and each node's llink field points to its predecessor. Figure 12.1 shows the structures for a ring with six values. The dotted lines emanate from the llink fields and go counterclockwise, and the solid lines emanate from the rlink fields and go clockwise.

An empty ring has a zero length field and a null head field, which is what Ring_new returns:

[Figure 12.1 A six-element ring. The diagram labels the head and length fields of the ring and the llink, rlink, and value fields of each node.]

⟨functions 188⟩≡
    T Ring_new(void) {
        T ring;

        NEW0(ring);
        ring->head = NULL;
        return ring;
    }

Ring_ring creates an empty ring, then calls Ring_addhi to append each of its pointer arguments up to but not including the first null pointer:

⟨functions 188⟩+≡
    T Ring_ring(void *x, ...) {
        va_list ap;
        T ring = Ring_new();

        va_start(ap, x);
        for ( ; x; x = va_arg(ap, void *))
            Ring_addhi(ring, x);
        va_end(ap);
        return ring;
    }
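The argument-walking idiom in Ring_ring can be isolated from the ring itself. The sketch below is an illustration only: collect_args is a hypothetical helper, not part of the Ring interface, and it stores the arguments in a caller-supplied array instead of calling Ring_addhi. It assumes, as the interface does, that every variable argument is read back as a void pointer.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical helper: copy the pointer arguments up to, but not
   including, the first null pointer into out[], storing at most max
   of them. Returns the number of pointers stored. */
int collect_args(void **out, int max, void *x, ...) {
    va_list ap;
    int n = 0;

    va_start(ap, x);
    /* x holds the current argument; fetch the next one only after
       the current one has been stored. */
    for ( ; x != NULL && n < max; x = va_arg(ap, void *))
        out[n++] = x;
    va_end(ap);
    return n;
}
```

A call such as collect_args(out, 8, a, b, (void *)NULL) stores a and b and returns 2. Writing the terminator as (void *)NULL is a cautious choice: the C standard permits NULL to be defined as a plain integer constant, and an int passed through the variable part of an argument list is not guaranteed to read back correctly as a void pointer. The same reasoning is behind the interface's advice to cast pointers other than char or void pointers.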

Deallocating a ring first deallocates the node structures, then deallocates the ring header. It doesn't matter in which order the nodes are deallocated, so Ring_free just follows the rlink pointers.

⟨functions 188⟩+≡
    void Ring_free(T *ring) {
        struct node *p, *q;

        assert(ring && *ring);
        if ((p = (*ring)->head) != NULL) {
            int n = (*ring)->length;
            for ( ; n-- > 0; p = q) {
                q = p->rlink;
                FREE(p);
            }
        }
        FREE(*ring);
    }

The function

⟨functions 188⟩+≡
    int Ring_length(T ring) {
        assert(ring);
        return ring->length;
    }

returns the number of values in a ring.

Ring_get and Ring_put must both find the ith value in a ring. Doing this amounts to traversing the list to the ith node structure, which is accomplished by the following chunk.

⟨q ← ith node 189⟩≡
    {
        int n;
        q = ring->head;
        if (i <= ring->length/2)
            for (n = i; n-- > 0; )
                q = q->rlink;
        else
            for (n = ring->length - i; n-- > 0; )

                q = q->llink;
    }

This code takes the shortest route to the ith node: If i does not exceed one-half the ring's length, the first for loop goes clockwise via the rlink pointers to the desired node. Otherwise, the second for loop goes counterclockwise via the llink pointers. In Figure 12.1, for example, values 0 through 3 are reached by going right, and values 4 and 5 are reached by going left.

Given this chunk, the two access functions are easy:

⟨functions 188⟩+≡
    void *Ring_get(T ring, int i) {
        struct node *q;

        assert(ring);
        assert(i >= 0 && i < ring->length);
        ⟨q ← ith node 189⟩
        return q->value;
    }

    void *Ring_put(T ring, int i, void *x) {
        struct node *q;
        void *prev;

        assert(ring);
        assert(i >= 0 && i < ring->length);
        ⟨q ← ith node 189⟩
        prev = q->value;
        q->value = x;
        return prev;
    }

The functions that add values to a ring must allocate a node, initialize it, and insert it into its proper place in the doubly linked list. They must also cope with adding a node to an empty ring. Ring_addhi is the easiest one of these functions: It adds a new node to the left of the node pointed to by head, as shown in Figure 12.2. Shading distinguishes the new node, and the heavier lines in the righthand figure indicate which links are changed. Here's the code:

⟨functions 188⟩+≡
    void *Ring_addhi(T ring, void *x) {
        struct node *p, *q;

        assert(ring);
        NEW(p);
        if ((q = ring->head) != NULL)
            ⟨insert p to the left of q 191⟩
        else
            ⟨make p ring's only value 191⟩
        ring->length++;
        return p->value = x;
    }

Adding a value to an empty ring is easy: ring->head points to the new node, and the node's links point to the node itself.

⟨make p ring's only value 191⟩≡
    ring->head = p->llink = p->rlink = p;

[Figure 12.2 Inserting a new node to the left of head]

As suggested in Figure 12.2, Ring_addhi aims q at the first node in the ring and inserts the new node to its left. This insertion involves initializing the links of the new node and redirecting q's llink and q's predecessor's rlink:

⟨insert p to the left of q 191⟩≡
    {
        p->llink = q->llink;
        q->llink->rlink = p;

        p->rlink = q;
        q->llink = p;
    }

The second through fifth diagrams in Figure 12.3's sequence illustrate the individual effect of these four statements. At each step, heavy arcs show the new links. It's instructive to redraw this sequence when q points to the only node in the doubly linked list.

[Figure 12.3 Inserting a new node to the left of q]
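One way to work through the single-node case without redrawing the figure is to run the four assignments on a node that links to itself. The sketch below assumes the node layout from ring.c; insert_left is a hypothetical wrapper around the chunk's four statements and does not appear in the library.

```c
#include <assert.h>
#include <stddef.h>

/* Same shape as the node structure in ring.c. */
struct node {
    struct node *llink, *rlink;
    void *value;
};

/* Hypothetical wrapper: link p into the circular list immediately to
   the left of q. q must already be linked, possibly only to itself. */
void insert_left(struct node *p, struct node *q) {
    p->llink = q->llink;      /* p's predecessor is q's old predecessor */
    q->llink->rlink = p;      /* old predecessor now points right to p  */
    p->rlink = q;             /* p's successor is q                     */
    q->llink = p;             /* q now points left to p                 */
}
```

When q is the only node, q->llink is q itself, so the second statement sets q->rlink to p and the result is a two-node ring in which each node is both the predecessor and the successor of the other. The order of the statements matters: the second one must run before q->llink is overwritten by the fourth.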

Ring_addlo is almost as easy, but the new node becomes the first node in the ring. This transformation can be accomplished by calling Ring_addhi then rotating the ring one value to the right, which is done by setting head to its predecessor:

⟨functions 188⟩+≡
    void *Ring_addlo(T ring, void *x) {
        assert(ring);
        Ring_addhi(ring, x);
        ring->head = ring->head->llink;
        return x;
    }

Ring_add is the most complicated of the three functions that add values to a ring because it deals with the arbitrary positions described in the previous section, which include adding values to either end of the ring. These cases can be handled by letting Ring_addlo and Ring_addhi deal with additions at the ends, which incidentally takes care of the empty ring case, and, for the other cases, converts a position to the index of the value to the right of the position and adds the new node to its left, as above.

⟨functions 188⟩+≡
    void *Ring_add(T ring, int pos, void *x) {
        assert(ring);
        assert(pos >= -ring->length && pos<=ring->length+1);
        if (pos == 1 || pos == -ring->length)
            return Ring_addlo(ring, x);
        else if (pos == 0 || pos == ring->length + 1)
            return Ring_addhi(ring, x);
        else {
            struct node *p, *q;
            int i = pos < 0 ? pos + ring->length : pos - 1;
            ⟨q ← ith node 189⟩
            NEW(p);
            ⟨insert p to the left of q 191⟩
            ring->length++;
            return p->value = x;
        }
    }
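The position arithmetic can be checked on its own against the five-element diagram in the interface section. The helper below is a sketch under one stated assumption: pos_to_index is a hypothetical name, and it uses pos <= 0 where Ring_add's interior case uses pos < 0, so that position zero maps to the slot just past the last value.

```c
#include <assert.h>

/* Hypothetical helper, not part of the Ring interface: convert a
   position (positive or nonpositive) in a ring of the given length to
   the index the new value will occupy, 0 through length inclusive. */
int pos_to_index(int pos, int length) {
    assert(length >= 0);
    assert(pos >= -length && pos <= length + 1);   /* nonexistent positions are errors */
    return pos <= 0 ? pos + length : pos - 1;
}
```

For a ring of length five, positions 1 through 6 and positions -5 through 0 both map to indices 0 through 5, which matches the top and bottom rows of the diagram. In Ring_add above, the two end positions never reach this arithmetic, because they are dispatched to Ring_addlo and Ring_addhi first.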

} If i is zero. . Ring_remove is the most general of the three functions: It finds the ith node and removes it from the doubly linked list: ¢functions 188²+≡ void *Ring_remove(T ring. This download file is made available for personal use only and is subject to the Terms of Service. Unauthorized use. The initialization of i handles the positions that correspond to the indices one through ring->length . ¢q ← ith node 189² if (i == 0) ring->head = ring->head->rlink. The third statement in ¢delete node q 194² frees the node. reproduction and/or distribution are strictly prohibited and violate applicable laws. the only one is when the last value in a ring is removed. x = q->value. The affected links are shown with heavy arcs. assert(ring->length > 0). The three functions that remove values are easier than those that add values because there are fewer boundary conditions. assert(i >= 0 && i < ring->length). ¢delete node q 194² return x. q->rlink->llink = q->llink.194 RINGS The first two if statements cover positions that specify the ends of the ring.. All rights reserved. Ring_remove deletes the first node and thus must redirect head to the new first node. if (--ring->length == 0) ring->head = NULL. and the last two statements decrement ring’s C Interfaces and Implementations: Techniques for Creating Reusable Software. int i) { void *x. deleting one requires only two: ¢delete node q 194²≡ q->llink->rlink = q->rlink. struct node *q. Adding a node involves four pointer assignments.com.4 illustrate the individual effect of the two statements at the beginning of this chunk. Any other use requires prior written consent from the copyright owner.1. C Interfaces and Implementations: Techniques for Creating Reusabl Prepared for frliu@microsoft. Hanson. Frank Liu Copyright © 1997 by David R. FREE(q). The second and third diagrams in Figure 12. assert(ring).

length and clear its head pointer if its last node was just deleted. Again, it's instructive to draw the sequence for deleting a node from one- and two-node lists.

[Figure 12.4 Deleting node q]

Ring_remhi is similar, but finding the doomed node is easier:

⟨functions 188⟩+≡
    void *Ring_remhi(T ring) {
        void *x;
        struct node *q;

        assert(ring);
        assert(ring->length > 0);
        q = ring->head->llink;
        x = q->value;
        ⟨delete node q 194⟩
        return x;
    }

As shown above, Ring_addlo is implemented by calling Ring_addhi and changing ring's head to point to its predecessor. The symmetric idiom

implements Ring_remlo: Change ring’s head to point to its successor and call Ring_remhi.

⟨functions 188⟩+≡
    void *Ring_remlo(T ring) {
        assert(ring);
        assert(ring->length > 0);
        ring->head = ring->head->rlink;
        return Ring_remhi(ring);
    }

The last operation rotates a ring. If n is positive, an N-value ring is rotated clockwise, which means that the value with index n modulo N becomes its new head. If n is negative, the ring is rotated counterclockwise, which means its head moves to the value with index n + N.

⟨functions 188⟩+≡
    void Ring_rotate(T ring, int n) {
        struct node *q;
        int i;

        assert(ring);
        assert(n >= -ring->length && n <= ring->length);
        if (n >= 0)
            i = n%ring->length;
        else
            i = n + ring->length;
        ⟨q ← ith node 189⟩
        ring->head = q;
    }

Using ⟨q ← ith node 189⟩ here ensures that the rotation takes the shortest route.

Further Reading

Both Knuth (1973a) and Sedgewick (1990) cover the algorithms for manipulating doubly linked lists in detail.

Some of the operations provided in Icon for removing and adding values to a list are similar to those provided by Ring. Exercise 12.4 explores

the Icon implementation. The scheme used in Ring_add for specifying positions is from Icon.

Exercises

12.1 Rewrite the loop in Ring_free to eliminate the variable n; use the list structure to determine when the loop ends.

12.2 Inspect the implementation of Ring_rotate carefully. Explain why the consequent of the second if statement must be written as i = n + ring->length.

12.3 The call Ring_get(ring, i) is often followed closely by another call, such as Ring_get(ring, i + 1). Modify the implementation so that a ring remembers its most recently accessed index and the corresponding node, and use this information to avoid the loops in ⟨q ← ith node 189⟩ when possible. Don’t forget to update this information when values are added or removed. Devise a test program for which measurements demonstrate the benefits of your improvement.

12.4 Icon implements lists, which are similar to rings, as doubly linked lists of arrays that each hold N values. These arrays are used as circular buffers, like the arrays in the Seq implementation. Finding the ith value walks down approximately i/N arrays in the ring’s list and then computes the index into that array for the ith value. Adding a value either adds it to a vacant slot in an existing array or adds a new array. Removing a value vacates a slot in an array and, if it is the last one occupied in that array, removes the array from the list and deallocates it. This representation is more complicated than the one described in this chapter, but it performs better for large rings. Reimplement rings using this representation, and measure the performance of both implementations. How big must rings become before the improvement can be detected?
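As a recap of the removal and rotation code in this chapter, the following is a minimal, self-contained sketch rather than the book's implementation. The names struct ring, ring_addhi, ring_node, ring_get, ring_remove, and ring_rotate are hypothetical stand-ins for Ring_T and its functions, malloc and free replace the NEW and FREE macros, and ring_rotate adds an assertion that the ring is nonempty (an assumption made here so that the modulo is always defined).

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the chapter's node and ring structures. */
struct node {
    struct node *llink, *rlink;
    void *value;
};

struct ring {
    struct node *head;
    int length;
};

/* Append a value at the high end, that is, to the left of head. */
static void ring_addhi(struct ring *r, void *x) {
    struct node *p = malloc(sizeof *p);
    assert(p);
    p->value = x;
    if (r->head == NULL) {
        r->head = p->llink = p->rlink = p;
    } else {
        struct node *q = r->head;
        p->llink = q->llink;
        q->llink->rlink = p;
        p->rlink = q;
        q->llink = p;
    }
    r->length++;
}

/* Walk to node i from head, taking the shorter direction. */
static struct node *ring_node(struct ring *r, int i) {
    struct node *q = r->head;
    int n;
    if (i <= r->length/2)
        for (n = i; n-- > 0; )
            q = q->rlink;
    else
        for (n = r->length - i; n-- > 0; )
            q = q->llink;
    return q;
}

static void *ring_get(struct ring *r, int i) {
    assert(r && i >= 0 && i < r->length);
    return ring_node(r, i)->value;
}

/* Remove node i; two pointer assignments unlink it from the list. */
static void *ring_remove(struct ring *r, int i) {
    struct node *q;
    void *x;
    assert(r && r->length > 0);
    assert(i >= 0 && i < r->length);
    q = ring_node(r, i);
    if (i == 0)
        r->head = r->head->rlink;   /* redirect head to the new first node */
    x = q->value;
    q->llink->rlink = q->rlink;
    q->rlink->llink = q->llink;
    free(q);
    if (--r->length == 0)
        r->head = NULL;             /* the last value was just removed */
    return x;
}

/* Rotate so that the value with index n (or n + length) becomes head. */
static void ring_rotate(struct ring *r, int n) {
    int i;
    assert(r && r->length > 0);     /* added assumption: nonempty ring */
    assert(n >= -r->length && n <= r->length);
    i = n >= 0 ? n % r->length : n + r->length;
    r->head = ring_node(r, i);
}
```

For example, after adding 10, 20, 30, and 40 with ring_addhi, removing index 1 leaves 10, 30, 40; rotating by 1 then makes 30 the head, and rotating by −1 restores 10 as the head.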


13 BIT VECTORS

The sets described in Chapter 9 can hold arbitrary elements because the elements are manipulated only by functions supplied by clients. Sets of integers are less flexible, but they’re used often enough to warrant a separate ADT. The Bit interface exports functions that manipulate bit vectors, which can be used to represent sets of integers from zero to N−1. For example, 256-bit vectors can be used to represent sets of characters efficiently.

Bit provides most of the set-manipulation functions provided by Set, and also a few functions that are specific to bit vectors. Unlike the sets provided by Set, the sets represented by bit vectors have a well-defined universe, which is the set of integers in the range zero to N−1. Thus, Bit can provide functions that Set cannot, such as the complement of a set.

13.1 Interface

The name “bit vector” reveals that the representation for sets of integers is essentially a sequence of bits. Nevertheless, the Bit interface exports only an opaque type that represents a bit vector:

⟨bit.h⟩≡
    #ifndef BIT_INCLUDED
    #define BIT_INCLUDED

    #define T Bit_T
    typedef struct T *T;

    ⟨exported functions 200⟩

    #undef T
    #endif

The length of a bit vector is fixed when the vector is created by Bit_new:

⟨exported functions 200⟩≡
    extern T   Bit_new   (int length);
    extern int Bit_length(T set);
    extern int Bit_count (T set);

Bit_new creates a new vector of length bits and sets all the bits to zero. The vector represents the integers zero through length−1, inclusive. It is a checked runtime error for length to be negative. Bit_new can raise Mem_Failed.

Bit_length returns the number of bits in set, and Bit_count returns the number of ones in set.

It is a checked runtime error to pass a null T to any routine in this interface, except for Bit_union, Bit_inter, Bit_minus, and Bit_diff.

⟨exported functions 200⟩+≡
    extern void Bit_free(T *set);

frees *set and clears *set. It is a checked runtime error for set or *set to be null.

Individual elements of a set (bits in its vector) are manipulated by the functions

⟨exported functions 200⟩+≡
    extern int Bit_get(T set, int n);
    extern int Bit_put(T set, int n, int bit);

Bit_get returns bit n and thus tests whether n is in set; that is, Bit_get returns one if bit n in set is one and zero otherwise. Bit_put sets bit n to bit and returns the previous value of that bit. It is a checked runtime error for n to be negative or to be equal to or greater than the length of set, or for bit to be other than zero or one.

The functions above manipulate individual bits in a set; the functions

⟨exported functions 200⟩+≡
    extern void Bit_clear(T set, int lo, int hi);
    extern void Bit_set  (T set, int lo, int hi);
    extern void Bit_not  (T set, int lo, int hi);

manipulate contiguous sequences of bits in a set, that is, subsets of a set. Bit_clear clears bits lo through hi inclusive, Bit_set sets bits lo through hi inclusive, and Bit_not complements bits lo through hi. It is a checked runtime error for lo to exceed hi, and for lo or hi to be negative or to be equal to or greater than the length of set.

⟨exported functions 200⟩+≡
    extern int Bit_lt (T s, T t);
    extern int Bit_eq (T s, T t);
    extern int Bit_leq(T s, T t);

Bit_lt returns one if s ⊂ t and zero otherwise. If s ⊂ t, s is a proper subset of t. Bit_eq returns one if s = t and zero otherwise. Bit_leq returns one if s ⊆ t and zero otherwise. For all three functions, it is a checked runtime error for s and t to have different lengths.

The function

⟨exported functions 200⟩+≡
    extern void Bit_map(T set,
        void apply(int n, int bit, void *cl), void *cl);

calls apply for each bit in set, beginning at bit zero. n is the bit number, which is between zero and one less than the length of the set, bit is the value of bit n, and cl is supplied by the client. Unlike the function passed to Table_map, apply may change set. If the call to apply for bit n changes bit k where k > n, the change will be seen by a subsequent call to apply, because Bit_map must process the bits in place. To do otherwise would require Bit_map to make a copy of the vector before processing its bits.

The following functions implement the four standard set operations, which are described in Chapter 9. Each function returns a new set whose value is the result of the operation.

⟨exported functions 200⟩+≡
    extern T Bit_union(T s, T t);
    extern T Bit_inter(T s, T t);

    extern T Bit_minus(T s, T t);
    extern T Bit_diff (T s, T t);

Bit_union returns the union of s and t, denoted s + t, which is the inclusive OR of the two bit vectors. Bit_inter returns the intersection of s and t, s ∗ t, which is the logical AND of the two bit vectors. Bit_minus returns the difference of s and t, s − t, which is the logical AND of s and the complement of t. Bit_diff returns the symmetric difference of s and t, s / t, which is the exclusive OR of the two bit vectors.

These four functions accept null pointers for either s or t, but not for both, and interpret them as empty sets. Bit_union(s, NULL) thus returns a copy of s. These functions always return a nonnull T. It is a checked runtime error for both s and t to be null, and for s and t to have different lengths. These functions can raise Mem_Failed.

13.2 Implementation

A Bit_T is a pointer to a structure that carries the length of the bit vector and the vector itself:

⟨bit.c⟩≡
    #include <stdarg.h>
    #include <string.h>
    #include "assert.h"
    #include "bit.h"
    #include "mem.h"

    #define T Bit_T

    struct T {
        int length;
        unsigned char *bytes;
        unsigned long *words;
    };

    ⟨macros 203⟩
    ⟨static data 207⟩
    ⟨static functions 212⟩
    ⟨functions 203⟩
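To make the four set operations of Section 13.1 concrete before turning to the representation, here is a minimal sketch that treats a single unsigned long as a small bit vector. The helper names set_union, set_inter, set_minus, and set_diff are hypothetical and are not part of the Bit interface; an empty set is modeled as 0UL, which plays the role that a null T plays in the interface.

```c
#include <assert.h>

/* Hypothetical helpers: each treats one unsigned long as a small set
   in which bit n is one exactly when integer n is a member. */

/* s + t: inclusive OR */
static unsigned long set_union(unsigned long s, unsigned long t) {
    return s | t;
}

/* s * t: logical AND */
static unsigned long set_inter(unsigned long s, unsigned long t) {
    return s & t;
}

/* s - t: AND of s with the complement of t */
static unsigned long set_minus(unsigned long s, unsigned long t) {
    return s & ~t;
}

/* s / t: exclusive OR (symmetric difference) */
static unsigned long set_diff(unsigned long s, unsigned long t) {
    return s ^ t;
}
```

With s = {0, 1, 2} (0x7) and t = {2, 3} (0xC), the union is 0xF, the intersection is 0x4, the difference s − t is 0x3, and the symmetric difference is 0xB. The union of s with the empty set is s itself, which mirrors Bit_union(s, NULL) returning a copy of s.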

The length field gives the number of bits in the vector, and bytes points to at least ⌈length/8⌉ bytes. The bits are accessed by indexing bytes; bytes[i] refers to the byte holding bits 8•i through 8•i + 7, where 8•i is the least significant bit in the byte. Notice that this convention uses only eight bits of each character; on machines where characters have more than eight bits, the excess bits go unused.

It’s possible to store the bits in an array of, say, unsigned longs, if all the operations that access individual bits, like Bit_get, use the same convention for accessing the bits. Bit uses an array of characters to permit table-driven implementations of Bit_count, Bit_set, Bit_clear, and Bit_not.

Some operations, like Bit_union, manipulate all the bits in parallel. For these operations, the vectors are accessed BPW bits at a time via words, where

⟨macros 203⟩≡
    #define BPW (8*sizeof (unsigned long))

words must point to an integral number of unsigned longs; nwords computes the number of unsigned longs needed for a bit vector of length bits:

⟨macros 203⟩+≡
    #define nwords(len) ((((len) + BPW - 1)&(~(BPW-1)))/BPW)

Bit_new uses nwords when it allocates a new T:

⟨functions 203⟩≡
    T Bit_new(int length) {
        T set;

        assert(length >= 0);
        NEW(set);
        if (length > 0)
            set->words = CALLOC(nwords(length),
                sizeof (unsigned long));
        else
            set->words = NULL;
        set->bytes = (unsigned char *)set->words;
        set->length = length;

        return set;
    }

Bit_new may allocate as many as sizeof (unsigned long) − 1 excess bytes. These excess bytes must be zero in order for the functions below to work properly.

Bit_free deallocates the set and clears its argument, and Bit_length returns the length field.

⟨functions 203⟩+≡
    void Bit_free(T *set) {
        assert(set && *set);
        FREE((*set)->words);
        FREE(*set);
    }

    int Bit_length(T set) {
        assert(set);
        return set->length;
    }

13.2.1 Member Operations

Bit_count returns the number of members in a set, that is, the number of one bits in the set. It could simply walk through the set and test every bit, but it’s just as easy to use the two halves of a byte (its two four-bit “nibbles”) as indices into a table that gives the number of one bits for each of the 16 possible nibbles:

⟨functions 203⟩+≡
    int Bit_count(T set) {
        int length = 0, n;
        static char count[] = {
            0,1,1,2,1,2,2,3,1,2,2,3,2,3,3,4 };

        assert(set);
        for (n = nbytes(set->length); --n >= 0; ) {
            unsigned char c = set->bytes[n];
            length += count[c&0xF] + count[c>>4];
        }

        return length;
    }

⟨macros 203⟩+≡
    #define nbytes(len) ((((len) + 8 - 1)&(~(8-1)))/8)

nbytes computes ⌈len/8⌉, and it’s used in operations that sequence through vectors one bit at a time. Each iteration of the loop counts the number of bits in byte n of the set by adding to length the sum of the number of bits in the byte’s two four-bit nibbles. This loop may access some extraneous bits, but since Bit_new initializes them to zeros, they can’t corrupt the result.

Bit n is bit number n%8 in byte n/8, where the bit numbers in a byte start at zero and increase from the right to the left; that is, the least significant bit is bit zero and the most significant bit is bit seven. Bit_get returns the value of bit n by shifting byte n/8 to the right n%8 bits and returning only the rightmost bit:

⟨functions 203⟩+≡
    int Bit_get(T set, int n) {
        assert(set);
        assert(0 <= n && n < set->length);
        return ⟨bit n in set 205⟩;
    }

⟨bit n in set 205⟩≡
    ((set->bytes[n/8]>>(n%8))&1)

Bit_put uses a similar idiom to set bit n: When bit is one, Bit_put shifts a one left by n%8 bits and ORs that result into byte n/8.

⟨functions 203⟩+≡
    int Bit_put(T set, int n, int bit) {
        int prev;

        assert(set);
        assert(bit == 0 || bit == 1);
        assert(0 <= n && n < set->length);
        prev = ⟨bit n in set 205⟩;
        if (bit == 1)
            set->bytes[n/8] |= 1<<(n%8);

        else
            set->bytes[n/8] &= ~(1<<(n%8));
        return prev;
    }

As shown, Bit_put clears bit n by forming a mask in which bit n%8 is zero and all the other bits are one; it then ANDs this mask into byte n/8.

Bit_set, Bit_clear, and Bit_not all use similar techniques to set, clear, and complement a range of bits in a set, but they’re more complicated because they must cope with ranges that straddle byte boundaries. For example, if set has 60 bits,

    Bit_set(set, 3, 54)

sets bits three through seven in the first byte, all of the bits in bytes one through five, and bits zero through six in byte six, where byte numbers start at zero. These three regions appear, right to left, in the three shades in the following figure.

[Figure: the 60-bit vector drawn as bytes 7 6 5 4 3 2 1 0, right to left, with the three regions shaded]

The four most significant bits of byte seven aren’t used and thus are always zero.

The code for Bit_set reflects these three regions:

⟨functions 203⟩+≡
    void Bit_set(T set, int lo, int hi) {
        ⟨check set, lo, and hi 206⟩
        if (lo/8 < hi/8) {
            ⟨set the most significant bits in byte lo/8 207⟩
            ⟨set all the bits in bytes lo/8+1..hi/8-1 207⟩
            ⟨set the least significant bits in byte hi/8 207⟩
        } else
            ⟨set bits lo%8..hi%8 in byte lo/8 208⟩
    }

⟨check set, lo, and hi 206⟩≡
    assert(set);

    assert(0 <= lo && hi < set->length);
    assert(lo <= hi);

When lo and hi refer to bits in different bytes, the number of bits that get set in byte lo/8 depends on lo%8: If lo%8 is zero, all of the bits get set; if it’s seven, only the most significant bit is set. These and the other possibilities can be stored in a table of masks indexed by lo%8:

⟨static data 207⟩≡
    unsigned char msbmask[] = {
        0xFF, 0xFE, 0xFC, 0xF8,
        0xF0, 0xE0, 0xC0, 0x80
    };

ORing msbmask[lo%8] into byte lo/8 sets the appropriate bits:

⟨set the most significant bits in byte lo/8 207⟩≡
    set->bytes[lo/8] |= msbmask[lo%8];

In the second region, all of the bits in each byte get set to one:

⟨set all the bits in bytes lo/8+1..hi/8-1 207⟩≡
    {
        int i;
        for (i = lo/8+1; i < hi/8; i++)
            set->bytes[i] = 0xFF;
    }

hi%8 determines which bits in the byte hi/8 get set: If hi%8 is zero, only the least significant bit is set; if it’s seven, all of the bits are set. Again, hi%8 can be used as an index into a table to select the appropriate mask to OR into byte hi/8:

⟨set the least significant bits in byte hi/8 207⟩≡
    set->bytes[hi/8] |= lsbmask[hi%8];

⟨static data 207⟩+≡
    unsigned char lsbmask[] = {
        0x01, 0x03, 0x07, 0x0F,
        0x1F, 0x3F, 0x7F, 0xFF
    };
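The three regions and the two mask tables can be exercised on a bare byte array. The following is a minimal sketch, not the chunk-based code itself: range_set is a hypothetical helper that takes the byte array and its length in bits directly instead of a Bit_T, and its else branch handles the same-byte case by ANDing the two masks.

```c
#include <assert.h>

static const unsigned char msbmask[] = {
    0xFF, 0xFE, 0xFC, 0xF8, 0xF0, 0xE0, 0xC0, 0x80
};
static const unsigned char lsbmask[] = {
    0x01, 0x03, 0x07, 0x0F, 0x1F, 0x3F, 0x7F, 0xFF
};

/* Hypothetical helper: set bits lo..hi (inclusive) in a raw byte array
   that holds nbits bits, using the same three regions as Bit_set. */
static void range_set(unsigned char *bytes, int nbits, int lo, int hi) {
    assert(bytes);
    assert(0 <= lo && hi < nbits);
    assert(lo <= hi);
    if (lo/8 < hi/8) {
        int i;
        bytes[lo/8] |= msbmask[lo%8];      /* most significant bits of byte lo/8 */
        for (i = lo/8 + 1; i < hi/8; i++)  /* whole bytes in between */
            bytes[i] = 0xFF;
        bytes[hi/8] |= lsbmask[hi%8];      /* least significant bits of byte hi/8 */
    } else
        /* lo and hi fall in the same byte: the masks overlap in lo%8..hi%8 */
        bytes[lo/8] |= msbmask[lo%8] & lsbmask[hi%8];
}
```

Applied to a zeroed 60-bit vector, range_set(b, 60, 3, 54) leaves 0xF8 in byte zero, 0xFF in bytes one through five, 0x7F in byte six, and zero in byte seven, matching the regions described for Bit_set(set, 3, 54).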

When lo and hi refer to bits in the same byte, the masks given by msbmask[lo%8] and lsbmask[hi%8] can be combined to set the appropriate bits. For example,

    Bit_set(set, 9, 13)

sets bits one through five in the second byte of set; this can be done by ORing in the mask 0x3E, which is the AND of msbmask[1] and lsbmask[5]. In general, the two masks overlap in just those bits that should be set, so the code for this case is:

⟨set bits lo%8..hi%8 in byte lo/8 208⟩≡
    set->bytes[lo/8] |= ⟨mask for bits lo%8..hi%8 208⟩;

⟨mask for bits lo%8..hi%8 208⟩≡
    (msbmask[lo%8]&lsbmask[hi%8])

Bit_clear and Bit_not are similar to Bit_set, and use msbmask and lsbmask in similar ways. For Bit_clear, msbmask and lsbmask provide the complements of the masks that are ANDed with bytes lo/8 and hi/8, respectively:

⟨functions 203⟩+≡
    void Bit_clear(T set, int lo, int hi) {
        ⟨check set, lo, and hi 206⟩
        if (lo/8 < hi/8) {
            int i;
            set->bytes[lo/8] &= ~msbmask[lo%8];
            for (i = lo/8+1; i < hi/8; i++)
                set->bytes[i] = 0;
            set->bytes[hi/8] &= ~lsbmask[hi%8];
        } else
            set->bytes[lo/8] &= ~⟨mask for bits lo%8..hi%8 208⟩;
    }

Bit_not must flip bits lo through hi, which it does by using an exclusive OR with masks to cover the appropriate bits:

⟨functions 203⟩+≡
    void Bit_not(T set, int lo, int hi) {
        ⟨check set, lo, and hi 206⟩

        if (lo/8 < hi/8) {
            int i;
            set->bytes[lo/8] ^= msbmask[lo%8];
            for (i = lo/8+1; i < hi/8; i++)
                set->bytes[i] ^= 0xFF;
            set->bytes[hi/8] ^= lsbmask[hi%8];
        } else
            set->bytes[lo/8] ^= ⟨mask for bits lo%8..hi%8 208⟩;
    }

Bit_map calls apply for every bit in a set. It passes the bit number, its value, and a client-supplied pointer.

⟨functions 203⟩+≡
    void Bit_map(T set,
        void apply(int n, int bit, void *cl), void *cl) {
        int n;

        assert(set);
        for (n = 0; n < set->length; n++)
            apply(n, ⟨bit n in set 205⟩, cl);
    }

As shown, Bit_map delivers the bits using the same numbering that is implicit in Bit_get and the other Bit functions that take bit numbers as arguments. The value of n/8 changes only every eight bits, so it’s tempting to copy each byte from set->bytes[n/8] to a temporary variable, then dole out each bit by shifting it and masking. But this improvement violates the interface, which stipulates that if apply changes a bit that it hasn’t yet seen, it will see the new value in a subsequent call.

13.2.2 Comparisons

Bit_eq compares sets s and t and returns one if they’re equal and zero if they’re not. This can be done by comparing the corresponding unsigned longs in s and t, and the loop can quit as soon as it’s known that s ≠ t:

⟨functions 203⟩+≡
    int Bit_eq(T s, T t) {
        int i;

        assert(s && t);
        assert(s->length == t->length);
        for (i = nwords(s->length); --i >= 0; )
            if (s->words[i] != t->words[i])
                return 0;
        return 1;
    }

Bit_leq compares sets s and t and determines whether s is equal to t or a proper subset of t. s ⊆ t if, for every one bit in s, the corresponding bit in t is one. In terms of sets, s ⊆ t if the intersection of s and the complement of t is empty. Thus, s ⊆ t if s&~t is equal to zero; this relationship holds for each unsigned long in s and t, too. If, for all i, s->words[i] ⊆ t->words[i], then s ⊆ t. Bit_leq uses this property to stop comparing as soon as the outcome is known:

⟨functions 203⟩+≡
    int Bit_leq(T s, T t) {
        int i;

        assert(s && t);
        assert(s->length == t->length);
        for (i = nwords(s->length); --i >= 0; )
            if ((s->words[i]&~t->words[i]) != 0)
                return 0;
        return 1;
    }

Bit_lt returns one if s is a proper subset of t; s ⊂ t if s ⊆ t and s ≠ t, which can be checked by ensuring that every s->words[i]&~t->words[i] is equal to zero and that at least one s->words[i] is not equal to the corresponding t->words[i]:

⟨functions 203⟩+≡
    int Bit_lt(T s, T t) {
        int i, lt = 0;

        assert(s && t);
        assert(s->length == t->length);
        for (i = nwords(s->length); --i >= 0; )
            if ((s->words[i]&~t->words[i]) != 0)

                return 0;
            else if (s->words[i] != t->words[i])
                lt |= 1;
        return lt;
    }

13.2.3 Set Operations

The functions that implement the set operations s + t, s ∗ t, s − t, and s / t can manipulate their operands one long integer at a time, because their functions are independent of bit numbers. These functions also interpret a null T as an empty set, but one of s or t must be nonnull in order to determine the length of the result. These functions have similar implementations, but they differ in three ways: in the result when s and t refer to the same set, in how they handle null arguments, and in how they form the result for two nonempty sets. The similarities are captured by the setop macro:

⟨macros 203⟩+≡
    #define setop(sequal, snull, tnull, op) \
        if (s == t) { assert(s); return sequal; } \
        else if (s == NULL) { assert(t); return snull; } \
        else if (t == NULL) return tnull; \
        else { \
            int i; T set; \
            assert(s->length == t->length); \
            set = Bit_new(s->length); \
            for (i = nwords(s->length); --i >= 0; ) \
                set->words[i] = s->words[i] op t->words[i]; \
            return set; }

Bit_union typifies these functions:

⟨functions 203⟩+≡
    T Bit_union(T s, T t) {
        setop(copy(t), copy(t), copy(s), |)
    }

If s and t refer to the same set, the result is a copy of the set. If either s or t is null, the result is a copy of the other set, which must be nonnull.

Otherwise, the result is a set whose unsigned longs are the bitwise OR of the unsigned longs in s and t. The private function copy duplicates its argument set by allocating a new set of the same length and copying the bits from its argument:

⟨static functions 212⟩≡
    static T copy(T t) {
        T set;

        assert(t);
        set = Bit_new(t->length);
        if (t->length > 0)
            memcpy(set->bytes, t->bytes, nbytes(t->length));
        return set;
    }

Bit_inter returns an empty set if either of its arguments is null; otherwise, it returns a set that is the bitwise AND of its operands:

⟨functions 203⟩+≡
    T Bit_inter(T s, T t) {
        setop(copy(t),
            Bit_new(t->length), Bit_new(s->length), &)
    }

If s is null, s − t is the empty set, but if t is null, s − t is equal to s. When s and t are the same Bit_T, s − t is the empty set. If both s and t are nonnull, s − t is the bitwise AND of s and the complement of t.

⟨functions 203⟩+≡
    T Bit_minus(T s, T t) {
        setop(Bit_new(s->length),
            Bit_new(t->length), copy(s), & ~)
    }

setop’s fourth argument, & ~, causes the body of the loop to be

    set->words[i] = s->words[i] & ~t->words[i];

Bit_diff implements symmetric difference, s / t, which is the bitwise exclusive OR of s and t. When s is null, s / t is equal to t and vice versa.

⟨functions 203⟩+≡
    T Bit_diff(T s, T t) {
        setop(Bit_new(s->length), copy(t), copy(s), ^)
    }

As shown, s / t is the empty set when s and t refer to the same Bit_T.

Further Reading

Briggs and Torczon (1993) describe a set representation that’s designed specifically for large, sparse sets and that can initialize those sets in constant time. Gimpel (1974) introduced the spatially multiplexed sets described in Exercise 13.5.

Exercises

13.1 In sparse sets, most of the bits are zero. Revise the implementation of Bit so that it saves space for sparse sets by, for example, not storing long runs of zeros.

13.2 Design an interface that supports the sparse sets described by Briggs and Torczon (1993), and implement your interface.

13.3 Bit_set uses the loop

    for (i = lo/8+1; i < hi/8; i++)
        set->bytes[i] = 0xFF;

to set all of the bits in bytes lo/8+1 through hi/8−1. Bit_clear and Bit_not have similar loops. Revise these loops to clear, set, and complement unsigned longs instead of bytes, when possible. Be careful about alignment constraints. Can you find an application where this change yields a measurable improvement in execution time?

13.4 Suppose the Bit functions kept track of the number of one bits in a set. What Bit functions could be simplified or improved? Implement this scheme and devise a test program that quantifies the

possible speed-up. Characterize under what conditions the benefit is worth the cost.

13.5 In a spatially multiplexed set, the bits are stored one word apart. On a computer with 32-bit ints, an array of N unsigned ints can hold 32 N-bit sets. Each one-bit column of the array is one set. A 32-bit mask with only bit i set identifies the set in column i. An advantage of this representation is that some operations can be done in constant time by manipulating only these masks. The union of two sets, for example, is a set whose mask is the union of the operands’ masks. Many N-bit sets can share an N-word array; allocating a new set allocates one of the free columns in the array, or allocates a new array if there are no free columns. This property can save space, but it complicates storage management considerably, because the implementation must keep track of the N-word arrays that have free columns for any value of N. Reimplement Bit using this representation; if you’re forced to change the interface, design a new one.
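The column-and-mask idea in Exercise 13.5 may be easier to picture with a small sketch. This is only an illustration of the representation under stated assumptions, not a solution to the exercise: the universe size N, the shared array words, and the helpers mux_get and mux_put are all hypothetical, uint32_t stands in for a 32-bit unsigned int, and column allocation is left out entirely.

```c
#include <assert.h>
#include <stdint.h>

#define N 16   /* hypothetical universe size: the integers 0..N-1 */

/* One shared array: bit i of every word belongs to the set in column i. */
static uint32_t words[N];

/* A set is named by a mask. Membership of n is a test of word n
   against that mask. */
static int mux_get(uint32_t mask, int n) {
    assert(0 <= n && n < N);
    return (words[n] & mask) != 0;
}

/* Add n to (bit == 1) or remove n from (bit == 0) the columns in mask. */
static void mux_put(uint32_t mask, int n, int bit) {
    assert(0 <= n && n < N);
    assert(bit == 0 || bit == 1);
    if (bit)
        words[n] |= mask;
    else
        words[n] &= ~mask;
}
```

If a and b are single-bit masks for two columns, the mask a | b names their union without touching the array: mux_get(a | b, n) is one exactly when n is in either set. In this sketch such a combined mask is best treated as a read-only view, because mux_put with a multi-bit mask writes to every column it covers.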

14 FORMATTING

The standard C library functions printf, fprintf, and vprintf format data for output, and sprintf and vsprintf format data into strings. These functions are called with a format string and a list of arguments whose values are to be formatted. Formatting is controlled by conversion specifiers of the form %c embedded in the format string; the ith occurrence of %c describes how to format the ith argument in the list of arguments that follow the format string. The other characters are copied verbatim. For example, if name is the string Array and count is 8,

    sprintf(buf, "The %s interface has %d functions\n",
        name, count)

fills buf with the string "The Array interface has 8 functions\n", where \n denotes a new-line character, as usual. The conversion specifiers can also include width, precision, and padding specifications. For example, using %06d instead of %d in the format string above would fill buf with "The Array interface has 000008 functions\n".

While undoubtedly useful, these functions have at least four shortcomings. First, the set of conversion specifiers is fixed; there's no way to provide client-specific codes. Second, the formatted result can be printed or stored only in a string; there's no way to specify a client-specific output routine. The third and most dangerous shortcoming is that sprintf and vsprintf can attempt to store more characters in the output string than it can hold; there's no way to specify the size of the output string. Finally, there is no type-checking for the arguments passed in the variable part of the argument list. The Fmt interface fixes the first three of these shortcomings.

14.1 Interface

The Fmt interface exports 11 functions, one type, one variable, and one exception:

⟨fmt.h⟩≡
    #ifndef FMT_INCLUDED
    #define FMT_INCLUDED
    #include <stdarg.h>
    #include <stdio.h>
    #include "except.h"

    #define T Fmt_T
    typedef void (*T)(int code, va_list *app,
        int put(int c, void *cl), void *cl,
        unsigned char flags[256], int width, int precision);

    extern char *Fmt_flags;
    extern const Except_T Fmt_Overflow;

    ⟨exported functions 216⟩

    #undef T
    #endif

Technically, Fmt isn't an abstract data type, but it does export a type, Fmt_T, that defines the type of the format conversion functions associated with each formatting code, as detailed below.

14.1.1 Formatting Functions

The two primary formatting functions are:

⟨exported functions 216⟩≡
    extern void Fmt_fmt (int put(int c, void *cl), void *cl,
        const char *fmt, ...);

    extern void Fmt_vfmt(int put(int c, void *cl), void *cl,
        const char *fmt, va_list ap);

Fmt_fmt formats its fourth and subsequent arguments according to the format string given by its third argument, fmt, and calls put(c, cl) to emit each formatted character c; c is treated as an unsigned char, so the value passed to put is never negative. Fmt_vfmt formats the arguments pointed to by ap according to the format string given by fmt, just as Fmt_fmt does, as described below.

The argument cl may point to client-supplied data, and is simply passed along uninterpreted to the client's put function. The put function returns an integer, usually its argument. The Fmt functions don't use this capability, but this design permits the standard I/O function fputc to be used as a put function on some machines when a FILE* is passed as cl. For example,

    Fmt_fmt((int (*)(int, void *))fputc, stdout,
        "The %s interface has %d functions\n", name, count)

prints

    The Array interface has 8 functions

on the standard output when name is Array and count is 8. The cast is necessary because fputc has type int (*)(int, FILE*) and put has type int (*)(int, void *). This usage is correct only where a FILE pointer has the same representation as a void pointer.

The syntax diagram shown in Figure 14.1 defines the syntax of conversion specifiers. The characters in a conversion specifier define a path through this diagram, and valid specifiers traverse a path from start to finish. A specifier begins with a % and is followed by optional flag characters, whose interpretation depends on the format code; an optional field width, period, and precision; and concludes with a single-character format code, denoted by C in Figure 14.1. The valid flag characters are those that appear in the string pointed to by Fmt_flags; they usually specify justification, padding, and truncation. It is a checked runtime error for a flag character to appear more than 255 times in one specifier. If an asterisk appears for the field width or precision, the next argument is assumed to be an integer and is used for the width or precision. Thus, one specifier can consume zero or more arguments, depending on the appearance of asterisks and on the specific conversion function associated with the format code. It is a checked runtime error for a width or precision to specify a value equal to INT_MIN, the most negative integer.

[Figure 14.1 Conversion-specifier syntax. A specification is '%', optional flags, an optional number, an optional '.' followed by an optional number, and a format code C. A number is either '*' or one or more digits.]

The precise interpretations of the flags, width, and precision depend on the conversion functions associated with the conversion specifiers. The functions called are those registered at the time of the call to Fmt_fmt.

The default conversion specifiers and their associated conversion functions are a subset of those for printf and related functions in the standard I/O library. The initial value of Fmt_flags points to the string "-+ 0", whose characters are thus the valid flag characters. A - causes the converted string to be left-justified in the given field width; otherwise, it's right-justified. A + causes the result of a signed conversion to start with a - or + sign. A space causes the result of a signed conversion to begin with a space if it's positive. A 0 causes a numeric conversion to be padded to the field width with leading zeros; otherwise blanks are used. A negative width is treated as a - flag plus the corresponding positive width. A negative precision is treated as if no precision were given.

The default conversion specifiers are summarized in Table 14.1. These are a subset of those defined in the standard C library.

The functions

⟨exported functions 216⟩+≡
    extern void Fmt_print (const char *fmt, ...);
    extern void Fmt_fprint(FILE *stream,
        const char *fmt, ...);
    extern int Fmt_sfmt (char *buf, int size,
        const char *fmt, ...);
    extern int Fmt_vsfmt(char *buf, int size,
        const char *fmt, va_list ap);

are similar to the C library functions printf, fprintf, sprintf, and vsprintf.

Fmt_fprint formats its third and subsequent arguments according to the format string given by fmt and writes the formatted output to the indicated stream. Fmt_print writes its formatted output to the standard output.

Fmt_sfmt formats its fourth and subsequent arguments according to the format string given by fmt, and stores the formatted output as a null-terminated string in buf[0..size-1]. Fmt_vsfmt is similar, but takes its arguments from the variable length argument-list pointer ap. Both functions return the number of characters stored into buf, not counting the terminating null character. Fmt_sfmt and Fmt_vsfmt raise Fmt_Overflow if they emit more than size characters, including the terminating null character. It is a checked runtime error for size to be nonpositive.

The two functions

⟨exported functions 216⟩+≡
    extern char *Fmt_string (const char *fmt, ...);
    extern char *Fmt_vstring(const char *fmt, va_list ap);

are like Fmt_sfmt and Fmt_vsfmt, except that they allocate strings large enough to hold the formatted results and return these strings. Clients are responsible for deallocating them. Fmt_string and Fmt_vstring can raise Mem_Failed.

It is a checked runtime error to pass a null put, buf, or fmt to any of the formatting functions described above.

14.1.2 Conversion Functions

Each format character C is associated with a conversion function. These associations can be changed by calling

⟨exported functions 216⟩+≡
    extern T Fmt_register(int code, T cvt);
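Because the registration function returns a T, a client can save the old association and put it back later. The following is a minimal, self-contained sketch of that override-and-restore idiom. It does not use Fmt itself: conv_fn is a hypothetical one-argument stand-in for Fmt_T, and conv_table, conv_register, and with_override are illustrative names only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for Fmt_T, which takes seven
   arguments in the real interface. */
typedef int (*conv_fn)(int value);

static conv_fn conv_table[256];      /* indexed by format code */

/* Install fn for code and return whatever was installed before. */
static conv_fn conv_register(int code, conv_fn fn) {
    conv_fn old;

    assert(code > 0 && code < 256);
    old = conv_table[code];
    conv_table[code] = fn;
    return old;
}

static int twice(int v)  { return 2 * v; }
static int negate(int v) { return -v; }

/* Apply code's conversion under a temporary override, then restore
   the association that was in effect on entry. */
static int with_override(int code, conv_fn tmp, int value) {
    conv_fn saved = conv_register(code, tmp);
    int result = conv_table[code](value);

    conv_register(code, saved);
    return result;
}
```

For instance, after conv_register('D', twice), the call with_override('D', negate, 21) applies negate once and leaves twice installed for 'D' when it returns.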

Table 14.1 Default conversion specifiers

Each entry gives the conversion specifier, the argument type in parentheses, and a description.

c (int)
    The argument is interpreted as an unsigned character and is emitted.

d (int)
    The argument is converted to its signed decimal representation. The precision, if given, specifies the minimum number of digits; leading zeros are added, if necessary. The default precision is one. If both the - and 0 flags appear, or if a precision is given, the 0 flag is ignored. If both the + and space flags appear, the space flag is ignored. If the argument and the precision are zero, there are no characters in the converted result.

o u x (unsigned)
    The argument is converted to its unsigned representation in octal (o), decimal (u), or hexadecimal (x). For x, the letters abcdef are used for the digits whose values exceed 9. The flags and precision are interpreted as for d.

f (double)
    The argument is converted to its decimal representation with the form x.y. The precision gives the number of digits to the right of the decimal point; the default is 6. If the precision is given explicitly as 0, the decimal point is omitted. When a decimal point appears, x has at least one digit. It is a checked runtime error for the precision to exceed 99. The flags are interpreted as for d.

e (double)
    The argument is converted to its decimal representation with the form x.ye±p. x is always one digit and p is always two digits. The flags and precision are interpreted as for d.

g (double)
    The argument is converted to its decimal representation as for f or e depending on its value. The precision gives the number of significant digits; the default is one. The result has the form x.ye±p if p is less than -4 or p is greater than or equal to the precision; otherwise, the result has the form x.y. There are no trailing zeros in y, and the decimal point is omitted when y is zero. It is a checked runtime error for the precision to exceed 99.

p (void *)
    The argument is converted to the hexadecimal representation of its value as for u. The flags and precision are interpreted as for d.

s (char *)
    Successive characters from the argument are emitted until a null character is encountered or the number of characters given by an explicit precision have been emitted. All flags except - are ignored.
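Since the default specifiers are described as a subset of printf's, the flag rules given for d can be tried out with the standard library alone. The sketch below assumes that Fmt's defaults agree with standard C for these particular cases; it exercises snprintf, not Fmt, and the helper names formats_as and check_d_rules are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format one int with the standard library and compare the result. */
static int formats_as(const char *fmt, int value, const char *expected) {
    char buf[32];

    snprintf(buf, sizeof buf, fmt, value);
    return strcmp(buf, expected) == 0;
}

/* Returns 1 if standard C follows the rules listed for d in Table 14.1. */
static int check_d_rules(void) {
    return formats_as("[%5d]",    42, "[   42]")   /* right-justified by default */
        && formats_as("[%-5d]",   42, "[42   ]")   /* - left-justifies */
        && formats_as("[%05d]",   42, "[00042]")   /* 0 pads with zeros */
        && formats_as("[%-05d]",  42, "[42   ]")   /* - and 0: 0 is ignored */
        && formats_as("[%05.3d]", 42, "[  042]")   /* precision given: 0 is ignored */
        && formats_as("[%+ d]",   42, "[+42]")     /* + and space: space is ignored */
        && formats_as("[%.0d]",    0, "[]");       /* zero value, zero precision */
}
```

Some compilers warn about the non-literal format string passed to snprintf; the warning is harmless here because every format comes from the fixed list in check_d_rules.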

Fmt_register installs cvt as the conversion function for the format character given by code, and returns a pointer to the previous function. Clients may thus override conversion functions temporarily, and then restore the previous function. It is a checked runtime error for code to be less than one or more than 255. It is also a checked runtime error for a format string to use a conversion specifier that has no associated conversion function.

Many conversion functions are variations on the functions used for the %d and %s conversion specifiers. Fmt exports two utility functions used by its internal conversion functions for numerics and strings.

⟨exported functions 216⟩+≡
    extern void Fmt_putd(const char *str, int len,
        int put(int c, void *cl), void *cl,
        unsigned char flags[256], int width, int precision);
    extern void Fmt_puts(const char *str, int len,
        int put(int c, void *cl), void *cl,
        unsigned char flags[256], int width, int precision);

Fmt_putd assumes that str[0..len-1] holds the string representation of a signed number, and emits the string according to the conversions specified by flags, width, and precision as described for %d in Table 14.1. Similarly, Fmt_puts emits str[0..len-1] according to the conversions specified by flags, width, and precision as described for %s. It is a checked runtime error to pass a null str, a negative len, a null flags, or a null put to Fmt_putd or Fmt_puts.

Fmt_putd and Fmt_puts are not themselves conversion functions, but they can be called by conversion functions. They are most useful when writing client-specific conversion functions, as illustrated below.

The type Fmt_T defines the signature of a conversion function: the types of its arguments and its return type. A conversion function is called with seven arguments. The first two are the format code and a pointer to the variable-length argument-list pointer that must be used to access the data to be formatted. The third and fourth arguments are the client's output function and associated data. The last three arguments are the flags, field width, and precision. The flags are given by a character array of 256 elements; the ith element is equal to the number of times the flag character i appears in the conversion specifier. width and precision are equal to INT_MIN when they are not given explicitly.

A conversion function must use expressions like

    va_arg(*app, type)

to fetch the arguments that are to be formatted according to the code with which the conversion function is associated. type is the expected type of argument. This expression fetches the argument's value, then increments *app so that it points to the next argument. It is an unchecked runtime error for a conversion function to increment *app incorrectly.

Fmt's private conversion function for the code %s illustrates how to write conversion functions, and how to use Fmt_puts. The specifier %s is like printf's %s: Its function emits characters from the string until it encounters a null character, or until it has emitted the number of characters given using an optional precision. The - flag or a negative width specifies left-justification. The conversion function uses va_arg to fetch the argument from the variable length argument list and calls Fmt_puts:

⟨conversion functions 222⟩≡
    static void cvt_s(int code, va_list *app,
        int put(int c, void *cl), void *cl,
        unsigned char flags[], int width, int precision) {
        char *str = va_arg(*app, char *);

        assert(str);
        Fmt_puts(str, strlen(str), put, cl, flags,
            width, precision);
    }

Fmt_puts interprets flags, width, and precision and emits the string accordingly:

⟨functions 222⟩≡
    void Fmt_puts(const char *str, int len,
        int put(int c, void *cl), void *cl,
        unsigned char flags[], int width, int precision) {
        assert(str);
        assert(len >= 0);
        assert(flags);
        ⟨normalize width and flags 223⟩
        if (precision >= 0 && precision < len)
            len = precision;
        if (!flags['-'])
            pad(width - len, ' ');
        ⟨emit str[0..len-1] 223⟩
        if ( flags['-'])
            pad(width - len, ' ');
    }

⟨emit str[0..len-1] 223⟩≡
    {
        int i;
        for (i = 0; i < len; i++)
            put((unsigned char)*str++, cl);
    }

The cast to unsigned char ensures that the values passed to put are always small, positive integers as stipulated in Fmt's specification.

width and precision are equal to INT_MIN when the width or precision are omitted. This interface provides the flexibility needed for client-specific conversion functions to use all combinations of explicit and omitted widths and precisions, as well as repeated flags. But the default conversions don't need this generality; they all treat an omitted width as an explicit width of zero, a negative width as the - flag along with the corresponding positive width, a negative precision as an omitted precision, and repeated occurrences of a flag as one occurrence. If there is an explicit precision, the 0 flag is ignored, and, as shown above, at most precision characters from str are emitted.

⟨normalize width and flags 223⟩≡
    ⟨normalize width 223⟩
    ⟨normalize flags 224⟩

⟨normalize width 223⟩≡
    if (width == INT_MIN)
        width = 0;
    if (width < 0) {
        flags['-'] = 1;
        width = -width;
    }
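The truncate-then-justify logic for strings can be tried in isolation. The sketch below is a simplified, self-contained variant rather than Fmt_puts itself: it replaces the flags array with a single left argument, assumes width is already non-negative and that a negative precision means "omitted", and collects output in a small fixed buffer that silently drops overflow. The names sink, sink_put, put_padded, and padded are all hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical client data: a fixed buffer that silently drops overflow. */
struct sink { char data[32]; int used; };

static int sink_put(int c, void *cl) {
    struct sink *s = cl;

    if (s->used < (int)sizeof s->data - 1)
        s->data[s->used++] = (char)c;
    s->data[s->used] = '\0';
    return c;
}

/* Emit str[0..len-1], truncated to precision when precision >= 0, and
   padded with blanks to width; left selects left-justification. */
static void put_padded(const char *str, int len,
    int put(int c, void *cl), void *cl,
    int left, int width, int precision) {
    int i;

    if (precision >= 0 && precision < len)
        len = precision;
    if (!left)
        for (i = width - len; i > 0; i--)
            put(' ', cl);
    for (i = 0; i < len; i++)
        put((unsigned char)str[i], cl);
    if (left)
        for (i = width - len; i > 0; i--)
            put(' ', cl);
}

/* Convenience wrapper: reset the sink, format, and return its contents. */
static const char *padded(struct sink *s, const char *str,
    int left, int width, int precision) {
    s->used = 0;
    s->data[0] = '\0';
    put_padded(str, (int)strlen(str), sink_put, s, left, width, precision);
    return s->data;
}
```

Note that the padding is computed after the precision has shortened len, so a truncated string is still justified within the full field width.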

⟨normalize flags 224⟩≡
    if (precision >= 0)
        flags['0'] = 0;

As the calls to pad suggest, width - len spaces must be emitted to justify the output correctly:

⟨macros 224⟩≡
    #define pad(n,c) do { int nn = (n); \
        while (nn-- > 0) \
            put((c), cl); } while (0)

pad is a macro because it needs access to put and cl.

The next section describes the implementation of the other default conversion functions.

14.2 Implementation

The implementation of Fmt consists of the functions defined in the interface, the conversion functions associated with the default conversion specifiers, and the table that maps conversion specifiers to conversion functions.

⟨fmt.c⟩≡
    #include <stdarg.h>
    #include <stdlib.h>
    #include <stdio.h>
    #include <string.h>
    #include <limits.h>
    #include <float.h>
    #include <ctype.h>
    #include <math.h>
    #include "assert.h"
    #include "except.h"
    #include "fmt.h"
    #include "mem.h"
    #define T Fmt_T

    ⟨types 226⟩
    ⟨macros 224⟩
    ⟨conversion functions 222⟩
    ⟨data 225⟩
    ⟨static functions 225⟩
    ⟨functions 222⟩

⟨data 225⟩≡
    const Except_T Fmt_Overflow = { "Formatting Overflow" };

14.2.1 Formatting Functions

Fmt_vfmt is the heart of the implementation, because all of the other interface functions call it to do the actual formatting. Fmt_fmt is the simplest example; it initializes a va_list pointer to the variable part of its argument list and calls Fmt_vfmt:

⟨functions 222⟩+≡
    void Fmt_fmt(int put(int c, void *), void *cl,
        const char *fmt, ...) {
        va_list ap;

        va_start(ap, fmt);
        Fmt_vfmt(put, cl, fmt, ap);
        va_end(ap);
    }

Fmt_print and Fmt_fprint call Fmt_vfmt with outc as the put function and with the stream for the standard output or the given stream as the associated data:

⟨static functions 225⟩≡
    static int outc(int c, void *cl) {
        FILE *f = cl;

        return putc(c, f);
    }

⟨functions 222⟩+≡
    void Fmt_print(const char *fmt, ...) {
        va_list ap;

        va_start(ap, fmt);
        Fmt_vfmt(outc, stdout, fmt, ap);
        va_end(ap);
    }

    void Fmt_fprint(FILE *stream, const char *fmt, ...) {
        va_list ap;

        va_start(ap, fmt);
        Fmt_vfmt(outc, stream, fmt, ap);
        va_end(ap);
    }

Fmt_sfmt calls Fmt_vsfmt:

⟨functions 222⟩+≡
    int Fmt_sfmt(char *buf, int size,
        const char *fmt, ...) {
        va_list ap;
        int len;

        va_start(ap, fmt);
        len = Fmt_vsfmt(buf, size, fmt, ap);
        va_end(ap);
        return len;
    }

Fmt_vsfmt calls Fmt_vfmt with a put function and with a pointer to a structure that keeps track of the string being formatted into buf and of how many characters it can hold:

⟨types 226⟩≡
    struct buf {
        char *buf;
        char *bp;
        int size;
    };

buf and size are copies of Fmt_vsfmt's similarly named parameters, and bp points to the location in buf where the next formatted character is to be stored. Fmt_vsfmt initializes a local instance of this structure and passes a pointer to it to Fmt_vfmt:

⟨functions 222⟩+≡
    int Fmt_vsfmt(char *buf, int size,
        const char *fmt, va_list ap) {
        struct buf cl;

        assert(buf);
        assert(size > 0);
        assert(fmt);
        cl.buf = cl.bp = buf;
        cl.size = size;
        Fmt_vfmt(insert, &cl, fmt, ap);
        insert(0, &cl);
        return cl.bp - cl.buf - 1;
    }

The call to Fmt_vfmt above calls the private function insert with each character to be emitted and with a pointer to Fmt_vsfmt's local buf structure. insert checks that there's room for the character, deposits it at the location given by the bp field, and increments the bp field:

⟨static functions 225⟩+≡
    static int insert(int c, void *cl) {
        struct buf *p = cl;

        if (p->bp >= p->buf + p->size)
            RAISE(Fmt_Overflow);
        *p->bp++ = c;
        return c;
    }

Fmt_string and Fmt_vstring work the same way, except that they use a different put function. Fmt_string calls Fmt_vstring:

⟨functions 222⟩+≡
    char *Fmt_string(const char *fmt, ...) {
        char *str;
        va_list ap;

        assert(fmt);
        va_start(ap, fmt);
        str = Fmt_vstring(fmt, ap);
        va_end(ap);
        return str;
    }

Fmt_vstring initializes a buf structure to a string that can hold 256 characters, and passes a pointer to this structure to Fmt_vfmt:

⟨functions 222⟩+≡
    char *Fmt_vstring(const char *fmt, va_list ap) {
        struct buf cl;

        assert(fmt);
        cl.size = 256;
        cl.buf = cl.bp = ALLOC(cl.size);
        Fmt_vfmt(append, &cl, fmt, ap);
        append(0, &cl);
        return RESIZE(cl.buf, cl.bp - cl.buf);
    }

append is like Fmt_vsfmt's put, except that it doubles the size of the string on the fly, when necessary, to hold the formatted characters.

⟨static functions 225⟩+≡
    static int append(int c, void *cl) {
        struct buf *p = cl;

        if (p->bp >= p->buf + p->size) {
            RESIZE(p->buf, 2*p->size);
            p->bp = p->buf + p->size;
            p->size *= 2;
        }
        *p->bp++ = c;
        return c;
    }

When Fmt_vstring is finished, the string pointed to by the buf field might be too long, which is why Fmt_vstring calls RESIZE to deallocate the excess characters.

The buck stops at Fmt_vfmt. It interprets the format string and, for each formatting specifier, calls the appropriate conversion function. For the other characters in the format string, it calls the put function:

⟨functions 222⟩+≡
    void Fmt_vfmt(int put(int c, void *cl), void *cl,
        const char *fmt, va_list ap) {
        assert(put);
        assert(fmt);
        while (*fmt)
            if (*fmt != '%' || *++fmt == '%')
                put((unsigned char)*fmt++, cl);
            else
                ⟨format an argument 229⟩
    }

Most of the work in ⟨format an argument 229⟩ goes into consuming the flags, field width, and precision, and into dealing with the possibility that the conversion specifier doesn't have a corresponding conversion function. In the chunk below, width gives the field width, and precision gives the precision.

⟨format an argument 229⟩≡
    {
        unsigned char c, flags[256];
        int width = INT_MIN, precision = INT_MIN;

        memset(flags, '\0', sizeof flags);
        ⟨get optional flags 230⟩
        ⟨get optional field width 231⟩
        ⟨get optional precision 232⟩
        c = *fmt++;
        assert(cvt[c]);
        (*cvt[c])(c, &ap, put, cl, flags, width, precision);
    }

cvt is an array of pointers to conversion functions, and it's indexed by a format character. Declaring c to be an unsigned char in the chunk above is necessary to ensure that *fmt is interpreted as an integer in the range 0 to 255.

cvt is initialized to the conversion functions for the default conversion specifiers, assuming the ASCII collating sequence:

⟨data 225⟩+≡
    static T cvt[256] = {
     /*   0-  7 */ 0,     0, 0,     0,     0,     0,     0,     0,
     /*   8- 15 */ 0,     0, 0,     0,     0,     0,     0,     0,
     /*  16- 23 */ 0,     0, 0,     0,     0,     0,     0,     0,
     /*  24- 31 */ 0,     0, 0,     0,     0,     0,     0,     0,
     /*  32- 39 */ 0,     0, 0,     0,     0,     0,     0,     0,
     /*  40- 47 */ 0,     0, 0,     0,     0,     0,     0,     0,
     /*  48- 55 */ 0,     0, 0,     0,     0,     0,     0,     0,
     /*  56- 63 */ 0,     0, 0,     0,     0,     0,     0,     0,
     /*  64- 71 */ 0,     0, 0,     0,     0,     0,     0,     0,
     /*  72- 79 */ 0,     0, 0,     0,     0,     0,     0,     0,
     /*  80- 87 */ 0,     0, 0,     0,     0,     0,     0,     0,
     /*  88- 95 */ 0,     0, 0,     0,     0,     0,     0,     0,
     /*  96-103 */ 0,     0, 0,     cvt_c, cvt_d, cvt_f, cvt_f, cvt_f,
     /* 104-111 */ 0,     0, 0,     0,     0,     0,     0,     cvt_o,
     /* 112-119 */ cvt_p, 0, 0,     cvt_s, 0,     cvt_u, 0,     0,
     /* 120-127 */ cvt_x, 0, 0,     0,     0,     0,     0,     0
    };

Fmt_register installs a new conversion function by storing a pointer to it in the appropriate element of cvt. It returns the previous value of that element:

⟨functions 222⟩+≡
    T Fmt_register(int code, T newcvt) {
        T old;

        assert(0 < code
            && code < (int)(sizeof (cvt)/sizeof (cvt[0])));
        old = cvt[code];
        cvt[code] = newcvt;
        return old;
    }

The chunks that scan the conversion specifier follow the syntax shown in Figure 14.1, incrementing fmt as they go. The first one consumes the flags:

⟨data 225⟩+≡
    char *Fmt_flags = "-+ 0";

⟨get optional flags 230⟩≡
    if (Fmt_flags) {
        unsigned char c = *fmt;
        for ( ; c && strchr(Fmt_flags, c); c = *++fmt) {
            assert(flags[c] < 255);
            flags[c]++;
        }
    }

Next comes the field width:

⟨get optional field width 231⟩≡
    if (*fmt == '*' || isdigit(*fmt)) {
        int n;
        ⟨n ← next argument or scan digits 231⟩
        width = n;
    }

An asterisk can appear for the width or precision, in which case the next integer argument provides their values.

⟨n ← next argument or scan digits 231⟩≡
    if (*fmt == '*') {
        n = va_arg(ap, int);
        assert(n != INT_MIN);
        fmt++;
    } else
        for (n = 0; isdigit(*fmt); fmt++) {
            int d = *fmt - '0';
            assert(n <= (INT_MAX - d)/10);
            n = 10*n + d;
        }

As this code suggests, when an argument specifies a width or precision, it must not specify INT_MIN, which is reserved as the default value. When a width or precision is given explicitly, it must not exceed INT_MAX, which is equivalent to the constraint 10·n + d ≤ INT_MAX; that is, 10·n + d doesn't overflow. This test must be made without actually causing overflow, which is why the constraint is rearranged in the assertion above.

A period announces an approaching optional precision:

⟨get optional precision 232⟩≡
    if (*fmt == '.' && (*++fmt == '*' || isdigit(*fmt))) {
        int n;
        ⟨n ← next argument or scan digits 231⟩
        precision = n;
    }

Notice that a period not followed by an asterisk or a digit is consumed and is interpreted as an explicitly omitted precision.

14.2.2 Conversion Functions

cvt_s, the conversion function for %s, is shown on page 222. cvt_d is the conversion function for %d, and it is typical of the functions that format numbers. It fetches the integer argument, converts it to an unsigned integer, and generates the appropriate string in a local buffer, most significant digit first. It then calls Fmt_putd to emit the string.

⟨conversion functions 222⟩+≡
    static void cvt_d(int code, va_list *app,
        int put(int c, void *cl), void *cl,
        unsigned char flags[], int width, int precision) {
        int val = va_arg(*app, int);
        unsigned m;
        ⟨declare buf and p, initialize p 233⟩

        if (val == INT_MIN)
            m = INT_MAX + 1U;
        else if (val < 0)
            m = -val;
        else
            m = val;
        do
            *--p = m%10 + '0';
        while ((m /= 10) > 0);
        if (val < 0)
            *--p = '-';
        Fmt_putd(p, (buf + sizeof buf) - p, put, cl, flags,
            width, precision);
    }
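The two overflow precautions described so far, testing 10·n + d against INT_MAX before computing it and never negating INT_MIN, can be tried in isolation. The following is a minimal sketch, not part of Fmt: scan_digits and magnitude are hypothetical helpers introduced only for illustration, and scan_digits reports overflow with a return value of -1 instead of an assertion failure.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>

/* Hypothetical helper (not part of Fmt): scans the leading decimal
 * digits of s into *out. Returns the number of characters consumed,
 * or -1 if the value would exceed INT_MAX. */
int scan_digits(const char *s, int *out) {
    int n = 0, used = 0;

    assert(s && out);
    while (isdigit((unsigned char)s[used])) {
        int d = s[used] - '0';
        /* Rearranged form of 10*n + d <= INT_MAX, so the test
         * itself cannot overflow. */
        if (n > (INT_MAX - d)/10)
            return -1;
        n = 10*n + d;
        used++;
    }
    *out = n;
    return used;
}

/* Hypothetical helper (not part of Fmt): returns |val| as an unsigned
 * value without ever evaluating -INT_MIN in signed arithmetic. */
unsigned magnitude(int val) {
    if (val == INT_MIN)
        return INT_MAX + 1U;
    return val < 0 ? (unsigned)-val : (unsigned)val;
}
```

For instance, scan_digits("123abc", &n) consumes three characters and sets n to 123, while a 20-digit string of nines is rejected before any multiplication can overflow.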

⟨declare buf and p, initialize p 233⟩≡
    char buf[43];
    char *p = buf + sizeof buf;

cvt_d does unsigned arithmetic for the same reasons that Atom_int does; see Section 3.2, which also explains why buf has 43 characters.

⟨functions 222⟩+≡
    void Fmt_putd(const char *str, int len,
        int put(int c, void *cl), void *cl,
        unsigned char flags[], int width, int precision) {
        int sign;

        assert(str);
        assert(len >= 0);
        assert(flags);
        ⟨normalize width and flags 223⟩
        ⟨compute the sign 233⟩
        { ⟨emit str justified in width 234⟩ }
    }

Fmt_putd must emit the string in str as specified by flags, width, and precision. If a precision is given, it specifies the minimum number of digits that must appear. That many digits must be emitted, which may require adding leading zeros.

Fmt_putd first determines whether or not a sign or leading space is needed, then sets sign to that character:

⟨compute the sign 233⟩≡
    if (len > 0 && (*str == '-' || *str == '+')) {
        sign = *str++;
        len--;
    } else if (flags['+'])
        sign = '+';
    else if (flags[' '])
        sign = ' ';
    else
        sign = 0;

The order of the if statements in ⟨compute the sign 233⟩ implements the rule that a + flag takes precedence over a space flag. The length of the

converted result, n, depends on the precision, the value converted, and the sign:

⟨emit str justified in width 234⟩≡
    int n;
    if (precision < 0)
        precision = 1;
    if (len < precision)
        n = precision;
    else if (precision == 0 && len == 1 && str[0] == '0')
        n = 0;
    else
        n = len;
    if (sign)
        n++;

n is assigned the number of characters that will be emitted, and this code handles the special case when a value of zero is converted with a precision of zero, in which case, no characters from the converted result are emitted.

Fmt_putd can now emit the sign, if the output is to be left-justified; or it can emit the sign and the padding, if the output is to be right-justified with leading zeros; or it can emit the padding and the sign, if the output is to be right-justified with spaces.

⟨emit str justified in width 234⟩+≡
    if (flags['-']) {
        ⟨emit the sign 234⟩
    } else if (flags['0']) {
        ⟨emit the sign 234⟩
        pad(width - n, '0');
    } else {
        pad(width - n, ' ');
        ⟨emit the sign 234⟩
    }

⟨emit the sign 234⟩≡
    if (sign)
        put(sign, cl);
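To see how precision, the digit count, and the sign interact, the computation of n can be lifted into a small standalone function. This is only a sketch under that assumption: emitted_length and padding_needed are hypothetical names, not functions in Fmt, and sign is passed in as 0 or a sign character in the way ⟨compute the sign 233⟩ leaves it.

```c
#include <assert.h>

/* Hypothetical helper (not part of Fmt): the number of characters
 * emitted for a digit string of length len, mirroring the computation
 * of n. A negative precision stands for "not given". */
int emitted_length(const char *digits, int len, int precision, int sign) {
    int n;

    assert(digits && len >= 0);
    if (precision < 0)
        precision = 1;
    if (len < precision)
        n = precision;          /* leading zeros fill the gap */
    else if (precision == 0 && len == 1 && digits[0] == '0')
        n = 0;                  /* zero with precision zero: no digits */
    else
        n = len;
    if (sign)
        n++;                    /* room for '-', '+', or ' ' */
    return n;
}

/* Hypothetical helper: padding characters required to fill width. */
int padding_needed(int width, int n) {
    return width > n ? width - n : 0;
}
```

With these definitions, the digits 42 with precision 5 and a minus sign occupy six characters, so a field of width 8 needs two padding characters.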

Fmt_putd can finally emit the converted result, including the leading zeros, if dictated by the precision, and the padding, if the output is left-justified:

⟨emit str justified in width 234⟩+≡
    pad(precision - len, '0');
    ⟨emit str[0..len-1] 223⟩
    if (flags['-'])
        pad(width - n, ' ');

cvt_u is simpler than cvt_d, but it can use all of Fmt_putd's machinery for emitting the converted result. It emits the decimal representation for the next unsigned integer:

⟨conversion functions 222⟩+≡
    static void cvt_u(int code, va_list *app,
        int put(int c, void *cl), void *cl,
        unsigned char flags[], int width, int precision) {
        unsigned m = va_arg(*app, unsigned);
        ⟨declare buf and p, initialize p 233⟩

        do
            *--p = m%10 + '0';
        while ((m /= 10) > 0);
        Fmt_putd(p, (buf + sizeof buf) - p, put, cl, flags,
            width, precision);
    }

The octal and hexadecimal conversions are like the unsigned decimal conversions, except that the output bases are different, which simplifies the conversions themselves.

⟨conversion functions 222⟩+≡
    static void cvt_o(int code, va_list *app,
        int put(int c, void *cl), void *cl,
        unsigned char flags[], int width, int precision) {
        unsigned m = va_arg(*app, unsigned);
        ⟨declare buf and p, initialize p 233⟩

        do
            *--p = (m&0x7) + '0';

        while ((m >>= 3) != 0);
        Fmt_putd(p, (buf + sizeof buf) - p, put, cl, flags,
            width, precision);
    }

    static void cvt_x(int code, va_list *app,
        int put(int c, void *cl), void *cl,
        unsigned char flags[], int width, int precision) {
        unsigned m = va_arg(*app, unsigned);
        ⟨declare buf and p, initialize p 233⟩

        ⟨emit m in hexadecimal 236⟩
    }

⟨emit m in hexadecimal 236⟩≡
    do
        *--p = "0123456789abcdef"[m&0xf];
    while ((m >>= 4) != 0);
    Fmt_putd(p, (buf + sizeof buf) - p, put, cl, flags,
        width, precision);

cvt_p emits a pointer as a hexadecimal number. The precision and all flags except - are ignored. The argument is interpreted as a pointer, and it's converted to an unsigned long in which to do the conversion, because an unsigned might not be big enough to hold a pointer.

⟨conversion functions 222⟩+≡
    static void cvt_p(int code, va_list *app,
        int put(int c, void *cl), void *cl,
        unsigned char flags[], int width, int precision) {
        unsigned long m = (unsigned long)va_arg(*app, void*);
        ⟨declare buf and p, initialize p 233⟩

        precision = INT_MIN;
        ⟨emit m in hexadecimal 236⟩
    }

cvt_c is the conversion function associated with %c; it formats a single character, left- or right-justified in width characters. It ignores the precision and the other flags.

⟨conversion functions 222⟩+≡
    static void cvt_c(int code, va_list *app,
        int put(int c, void *cl), void *cl,
        unsigned char flags[], int width, int precision) {
        ⟨normalize width 223⟩
        if (!flags['-'])
            pad(width - 1, ' ');
        put((unsigned char)va_arg(*app, int), cl);
        if ( flags['-'])
            pad(width - 1, ' ');
    }

cvt_c fetches an integer instead of a character because character arguments passed in the variable part of an argument list suffer the default argument promotions, and are thus converted to and passed as integers. cvt_c converts the resulting integer to an unsigned char so that signed, unsigned, and plain characters are all emitted the same way.

Converting a floating-point value to its decimal representation accurately is surprisingly difficult to do in a machine-independent way. Machine-dependent algorithms are faster and more accurate, so the conversion function associated with the e, f, and g conversion specifiers uses

⟨format a double argument into buf 237⟩≡
    {
        static char fmt[] = "%.dd?";
        assert(precision <= 99);
        fmt[4] = code;
        fmt[3] = precision%10 + '0';
        fmt[2] = (precision/10)%10 + '0';
        sprintf(buf, fmt, va_arg(*app, double));
    }

to format the floating-point argument into buf; it then emits buf. The difference between the floating-point conversion specifiers is in how they format the various parts of a floating-point value. The longest output comes from the specifier %.99f, which may require DBL_MAX_10_EXP+1+1+99+1 characters. DBL_MAX_10_EXP and DBL_MAX are defined in the standard header file float.h. DBL_MAX is the largest value that can be represented as a double, and DBL_MAX_10_EXP is log₁₀ DBL_MAX; that is, it's the largest decimal exponent that can be represented by a double. For 64-bit doubles in IEEE 754 format, DBL_MAX is 1.797693×10³⁰⁸ and DBL_MAX_10_EXP is 308. Thus, if DBL_MAX is converted with the conversion specifier %.99f, the result may have DBL_MAX_10_EXP+1 digits before the decimal point, a decimal point, 99 digits after the decimal point, and a terminating null character. Limiting the precision to 99 limits the size of the buffer needed to hold the converted result, and makes the buffer's maximum size known at compile time. The converted results from the other conversion specifiers, %e and %g, take fewer characters than the result for %f. cvt_f handles all three codes:

⟨conversion functions 222⟩+≡
    static void cvt_f(int code, va_list *app,
        int put(int c, void *cl), void *cl,
        unsigned char flags[], int width, int precision) {
        char buf[DBL_MAX_10_EXP+1+1+99+1];

        if (precision < 0)
            precision = 6;
        if (code == 'g' && precision == 0)
            precision = 1;
        ⟨format a double argument into buf 237⟩
        Fmt_putd(buf, strlen(buf), put, cl, flags,
            width, precision);
    }

The assignments to fmt[2] and fmt[3] assume the ASCII collating sequence.

Further Reading

Plauger (1992) describes the implementation of the C library's printf family of output functions, including low-level code for converting strings to floating-point values and vice versa. His code also shows how to implement the other printf-style formatting flags and codes.

Section 4.8 in Hennessy and Patterson (1994) describes the IEEE 754 floating-point standard and the implementation of floating-point addition and multiplication. Goldberg (1991) surveys the properties of floating-point arithmetic that most concern programmers.

Floating-point conversions have been implemented many times, but it's easy to botch these conversions by making them inaccurate or too slow. The litmus test for these conversions is if, given a floating-point

value x, the output conversion produces a string from which the input conversion recreates a y that is bitwise identical to x. Clinger (1990) describes how to do the input conversion accurately, and shows that, for some x, this conversion requires arithmetic of arbitrary precision. Steele and White (1990) describe how to do an accurate output conversion.

Exercises

14.1 Fmt_vstring uses RESIZE to deallocate the unused portion of the string that it returns. Devise a way to do this deallocation only when it pays; that is, when the space deallocated is worth the effort it takes to deallocate it.

14.2 Use the algorithms described in Steele and White (1990) to implement the e, f, and g conversions.

14.3 Write a conversion function that takes the conversion specifier from the next integer argument and associates it with the character @. For example,

    Fmt_string("The offending value is %@\n", x.format, x.value);

would format x.value according to the format code carried along in x.format.

14.4 Write a conversion function for emitting the elements in a Bit_T as a sequence of integers in which a run of ones is emitted as a range; for example, 1 32–45 68 70–71.


15 LOW-LEVEL STRINGS

C is not a string-processing language per se, but it does include facilities for manipulating arrays of characters, which are commonly called strings. By convention, an N-character string is an array of N+1 characters in which the last character is the null character; that is, it has the value zero.

The language itself has only two features that help process strings. Pointers to characters can be used to traverse character arrays, and string literals can be used to initialize arrays of characters. For example,

    char msg[] = "File not found";

is shorthand for

    char msg[] = { 'F', 'i', 'l', 'e', ' ', 'n', 'o', 't',
        ' ', 'f', 'o', 'u', 'n', 'd', '\0' };

Incidentally, character constants, like 'F', are ints, not chars, which explains why sizeof 'F' is equal to sizeof (int).

String literals can also stand for arrays initialized to the given characters. For example,

    char *msg = "File not found";

is equivalent to

    static char t376[] = "File not found";
    char *msg = t376;

where t376 is an internal name generated by the compiler. A string literal can be used anywhere the name of a read-only array can be used. For example, Fmt's cvt_x uses a string literal in an expression:

    do
        *--p = "0123456789abcdef"[m&0xf];
    while ((m >>= 4) != 0);

The assignment is equivalent to the more verbose

    {
        static char digits[] = "0123456789abcdef";
        *--p = digits[m&0xf];
    }

where digits is a compiler-generated name.

The C library includes a suite of functions that manipulate null-terminated strings. These functions, defined in the standard header string.h, copy, search, scan, compare, and transform strings. strcat is typical:

    char *strcat(char *dst, const char *src)

It appends src to the end of dst; that is, it copies characters up to and including the null character from src to successive elements in dst beginning at the element in dst that holds the null character.

strcat illustrates the two drawbacks of the functions defined in string.h. First, a client must allocate the space for the result, such as dst in strcat. Second, and most important, all of the functions are unsafe: none of them can check to see whether the result string is large enough. If dst isn't big enough to hold the additional characters from src, strcat will scribble on unallocated storage or storage used for something else. Some of the functions, like strncat, take additional arguments that limit the number of characters copied to their results, which helps, but allocation errors can still occur.

The functions in the Str interface described in this chapter avoid these drawbacks and provide a convenient way to manipulate substrings of their string arguments. These functions are safer than those in string.h because most of the Str functions allocate the space for their results. The cost associated with these allocations is the price for safety.
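The trade-off can be made concrete with a concatenation routine that sizes and allocates its own result, so there is no destination buffer for the caller to get wrong. This is a minimal sketch and not the Str implementation: concat_alloc is a hypothetical name, and it assumes plain malloc with a null return on failure rather than the allocation interface the Str functions use.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of Str): returns a newly allocated
 * string holding a followed by b, or NULL if malloc fails. The
 * caller must free the result. */
char *concat_alloc(const char *a, const char *b) {
    size_t la, lb;
    char *result;

    assert(a && b);
    la = strlen(a);
    lb = strlen(b);
    result = malloc(la + lb + 1);       /* +1 for the null character */
    if (result == NULL)
        return NULL;
    memcpy(result, a, la);
    memcpy(result + la, b, lb + 1);     /* copies b's null character too */
    return result;
}
```

A call such as concat_alloc("face", " plant") yields a fresh string that the caller later passes to free; the price is one allocation per call.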

These allocations are often needed anyway, because clients of the string.h functions must allocate the results when their sizes depend on the outcomes of computations. As for the string.h functions, clients of the Str functions must still deallocate the results. The Text interface described in the next chapter exports another set of string-manipulation functions that avoid some of the allocation overhead of the Str functions.

15.1 Interface

⟨str.h⟩≡
    #ifndef STR_INCLUDED
    #define STR_INCLUDED
    #include <stdarg.h>

    ⟨exported functions 244⟩

    #undef T
    #endif

All of the string arguments to the functions in the Str interface are given by a pointer to a null-terminated array of characters and positions. Like Ring positions, string positions identify locations between characters including the position after the last nonnull character. Positive positions specify the location from the left end of a string; position one is the location to the left of the first character. Nonpositive positions specify positions from the right end of the string; position zero is the location to the right of the last character. For example, the following diagram shows the positions in the string Interface.

     1   2   3   4   5   6   7   8   9  10
     | I | n | t | e | r | f | a | c | e |
    -9  -8  -7  -6  -5  -4  -3  -2  -1   0

Two positions i and j in the string s specify the substring between them, denoted by s[i:j]. If s points to the string Interface, for example, s[-4:0] is the substring face. These positions can be given in either order: s[0:-4] also specifies face. Substrings can be null; s[3:3] and s[3:-7] both specify the null substring between the n and t in Interface. s[i:i+1] is always the character to the right of i for any valid position i, except the rightmost position.

Character indices are another way to specify substrings and may seem more natural, but they have disadvantages. Order is important when specifying substrings with indices. For example, the indices in the string Interface run from zero to nine inclusive. If substrings are specified with two indices where the substring starts after the first index and ends before the second one, s[1..6] specifies the substring terf. But this convention must permit the index of the null character in order to specify the substring face with s[4..9], and it cannot specify the leading null substring. Changing this convention so that a substring ends after the second index makes it impossible to specify a null substring. Other conventions that use negative indices could be used, but they're more cumbersome than positions.

Positions are better than character indices because they avoid these confusing boundary cases. And nonpositive positions can be used to access the tail of a string without knowing its length.

Str exports functions that create and return null-terminated strings, and that return information about strings and positions in them. The functions that create strings are:

⟨exported functions 244⟩≡
    extern char *Str_sub(const char *s, int i, int j);
    extern char *Str_dup(const char *s, int i, int j, int n);
    extern char *Str_cat(const char *s1, int i1, int j1,
        const char *s2, int i2, int j2);
    extern char *Str_catv   (const char *s, ...);
    extern char *Str_reverse(const char *s, int i, int j);
    extern char *Str_map    (const char *s, int i, int j,
        const char *from, const char *to);

All of these functions allocate the space for their results, and they all can raise Mem_Failed. It is a checked runtime error to pass a null string pointer to any function in this interface, except as detailed below for Str_catv and Str_map.
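One way to check an intuition about the s[i:j] notation is to convert both positions to zero-based indices, put them in order, and copy the characters in between. The sketch below does that under stated assumptions: pos_to_index and substring are hypothetical names rather than the Str functions, allocation uses plain malloc, and invalid positions yield NULL where the interface would treat them as checked runtime errors.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: converts a position in a string of length len
 * to the zero-based index of the character to its right. */
static int pos_to_index(int pos, int len) {
    return pos <= 0 ? pos + len : pos - 1;
}

/* Hypothetical helper (not Str_sub itself): returns a newly allocated
 * copy of s[i:j], accepting the positions in either order. Returns
 * NULL for invalid positions or if malloc fails. */
char *substring(const char *s, int i, int j) {
    int len, lo, hi;
    char *result;

    assert(s);
    len = (int)strlen(s);
    lo = pos_to_index(i, len);
    hi = pos_to_index(j, len);
    if (lo > hi) {              /* positions may come in either order */
        int t = lo; lo = hi; hi = t;
    }
    if (lo < 0 || hi > len)
        return NULL;
    result = malloc((size_t)(hi - lo) + 1);
    if (result == NULL)
        return NULL;
    memcpy(result, s + lo, (size_t)(hi - lo));
    result[hi - lo] = '\0';
    return result;
}
```

With this sketch, substring("Interface", -4, 0) and substring("Interface", 0, -4) both produce face, and substring("Interface", 3, -7) produces the null substring between the n and the t.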

Str_sub returns s[i:j], the substring of s between the positions i and j. For example, the calls

    Str_sub("Interface", 6, 10)
    Str_sub("Interface", 6, 0)
    Str_sub("Interface", -4, 10)
    Str_sub("Interface", -4, 0)

all return face. The positions can be given in either order. It is a checked runtime error to pass an i and j that do not specify a substring in s to any function in this interface.

Str_dup returns a string with n copies of s[i:j]. It is a checked runtime error for n to be negative. Str_dup is often used to copy a string; for example, Str_dup("Interface", 1, 0, 1) returns a copy of Interface. Note the use of the positions 1 and 0 to specify all of Interface.

Str_cat returns the concatenation of s1[i1:j1] and s2[i2:j2]; that is, a string consisting of the characters from s1[i1:j1] followed by the characters from s2[i2:j2]. Str_catv is similar; it takes zero or more triples that each specify a string and two positions, and returns the concatenation of these substrings. The argument list is terminated by a null pointer. For example,

    Str_catv("Interface", -4, 0, " plant", 1, 0, NULL)

returns the string face plant.

Str_reverse returns the string consisting of the characters from s[i:j] in the opposite order in which they appear in s.

Str_map returns a string consisting of the characters from s[i:j] mapped according to the values given by from and to. Each character from s[i:j] that appears in from is mapped to the corresponding character in to. Characters that do not appear in from are mapped to themselves. For example,

    Str_map(s, 1, 0, "ABCDEFGHIJKLMNOPQRSTUVWXYZ",
        "abcdefghijklmnopqrstuvwxyz")

returns a copy of s in which uppercase characters are replaced by their lowercase equivalents.

If both from and to are null, the mapping specified by the most recent call to Str_map is used. If s is null, i and j are ignored, from and to are used only to establish the default mapping, and Str_map returns null.
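A from/to mapping of this kind is commonly realized with a 256-entry translation table that starts out as the identity. The following sketch illustrates that idea only: map_chars is a hypothetical name, it works on a whole string instead of s[i:j], it allocates with plain malloc, and it does not remember the mapping between calls the way Str_map is described as doing.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not Str_map itself): returns a newly allocated
 * copy of s in which every character that appears in from is replaced
 * by the corresponding character in to. from and to must have the
 * same length. Returns NULL if malloc fails. */
char *map_chars(const char *s, const char *from, const char *to) {
    unsigned char table[256];
    size_t k, len;
    char *result;

    assert(s && from && to);
    assert(strlen(from) == strlen(to));
    for (k = 0; k < 256; k++)
        table[k] = (unsigned char)k;    /* unmapped characters map to themselves */
    for (k = 0; from[k] != '\0'; k++)
        table[(unsigned char)from[k]] = (unsigned char)to[k];
    len = strlen(s);
    result = malloc(len + 1);
    if (result == NULL)
        return NULL;
    for (k = 0; k < len; k++)
        result[k] = (char)table[(unsigned char)s[k]];
    result[len] = '\0';
    return result;
}
```

Passing the uppercase alphabet as from and the lowercase alphabet as to turns Hello, World into hello, world and leaves the comma and the space untouched.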

The following are checked runtime errors: for only one of the from or to pointers to be null; for nonnull from and to strings to specify strings of different lengths; for all of s, from, and to to be null; and for both from and to be null on the first call to Str_map.

The remaining functions in the Str interface return information about strings or positions in strings; none allocate space.

⟨exported functions 244⟩+≡
    extern int Str_pos(const char *s, int i);
    extern int Str_len(const char *s, int i, int j);
    extern int Str_cmp(const char *s1, int i1, int j1,
        const char *s2, int i2, int j2);

Str_pos returns the positive position corresponding to s[i:i]. A positive position can always be converted to an index by subtracting one, so Str_pos is often used when an index is needed. For example, if s points to the string Interface,

    printf("%s\n", &s[Str_pos(s, -4)-1])

prints face.

Str_len returns the number of characters in s[i:j].

Str_cmp returns a value that is less than zero, equal to zero, or greater than zero if s1[i1:j1] is lexically less than, equal to, or greater than s2[i2:j2].

The following functions search strings for characters and other strings. When the search succeeds, these functions return positive positions that reflect the result of the search; when the search fails, they return zero. Functions with names that include _r search from the right ends of their argument strings; the others search from the left ends.

⟨exported functions 244⟩+≡
    extern int Str_chr  (const char *s, int i, int j, int c);
    extern int Str_rchr (const char *s, int i, int j, int c);
    extern int Str_upto (const char *s, int i, int j,
        const char *set);
    extern int Str_rupto(const char *s, int i, int j,
        const char *set);
    extern int Str_find (const char *s, int i, int j,
        const char *str);

    extern int Str_rfind(const char *s, int i, int j,
        const char *str);

Str_chr and Str_rchr return the position in s before the leftmost or rightmost occurrence of the character c in s[i:j], or zero if c doesn't appear in s[i:j].

Str_upto and Str_rupto return the position in s before the leftmost or rightmost occurrence in s[i:j] of any character in set, or zero if none of the characters in set appear in s[i:j]. It is a checked runtime error to pass a null set to these functions.

Str_find and Str_rfind return the position in s before the leftmost or rightmost occurrence of str in s[i:j], or zero if str doesn't appear in s[i:j]. It is a checked runtime error to pass a null str to these functions.

The functions

⟨exported functions 244⟩+≡
    extern int Str_any   (const char *s, int i,
        const char *set);
    extern int Str_many  (const char *s, int i, int j,
        const char *set);
    extern int Str_rmany (const char *s, int i, int j,
        const char *set);
    extern int Str_match (const char *s, int i, int j,
        const char *str);
    extern int Str_rmatch(const char *s, int i, int j,
        const char *str);

step over substrings; they return the positive positions that follow or precede the matched substrings.

Str_any returns the positive position in s after the character s[i:i+1] if that character appears in set, or zero if s[i:i+1] doesn't appear in set.

Str_many returns the positive position in s after a contiguous sequence of one or more characters from set at the beginning of s[i:j], or zero if s[i:j] doesn't begin with a character from set.

Str_rmany returns the positive position in s before a contiguous sequence of one or more characters from set at the end of s[i:j], or zero if s[i:j] doesn't end with a character from set.

It is a checked runtime error to pass a null set to Str_any, Str_many, or Str_rmany.
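The convention that a successful search returns a positive position and a failed search returns zero can be tried out with the standard library's strchr and strrchr. This sketch assumes the search covers the whole string (the equivalent of positions 1 and 0); pos_chr and pos_rchr are hypothetical names, not the Str functions.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: the positive position before the leftmost
 * occurrence of c in s, or zero if c doesn't appear. */
int pos_chr(const char *s, int c) {
    const char *p;

    assert(s);
    /* strchr would find the terminating null character; exclude it. */
    p = (c != '\0') ? strchr(s, c) : NULL;
    return p ? (int)(p - s) + 1 : 0;
}

/* Hypothetical helper: the positive position before the rightmost
 * occurrence of c in s, or zero if c doesn't appear. */
int pos_rchr(const char *s, int c) {
    const char *p;

    assert(s);
    p = (c != '\0') ? strrchr(s, c) : NULL;
    return p ? (int)(p - s) + 1 : 0;
}
```

In the string Interface, the first e sits to the right of position 4 and the last e to the right of position 9, so the two functions return 4 and 9 for 'e'; both return zero for a character such as 'z' that doesn't appear.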

Str_match returns the positive position in s after the occurrence of str at the beginning of s[i:j], or zero if s[i:j] doesn't begin with str. Str_rmatch returns the positive position in s before the occurrence of str at the end of s[i:j], or zero if s[i:j] doesn't end with str. It is a checked runtime error to pass a null str to Str_match or Str_rmatch.

Str_rchr, Str_rupto, and Str_rfind search from the right ends of their argument strings, but return positions to the left of the characters or strings they seek. For example, the calls

    Str_find ("The rain in Spain", 1, 0, "rain")
    Str_rfind("The rain in Spain", 1, 0, "rain")

both return 5, because rain appears only once in their first arguments. The calls

    Str_find ("The rain in Spain", 1, 0, "in")
    Str_rfind("The rain in Spain", 1, 0, "in")

return 7 and 16, respectively, because in appears three times.

Str_many and Str_match step right and return the positions after the characters they step over. Str_rmany and Str_rmatch step left; they return the positions before the characters. For example,

    Str_sub(name, 1, Str_rmany(name, 1, 0, " \t"))

returns a copy of name without its trailing blanks and tabs, if there are any. The function basename shows another typical use of these conventions. basename accepts a UNIX-style path name and returns the file name without its leading directories or a specific trailing suffix, as illustrated by the following examples.

    basename("/usr/jenny/main.c", 1, 0, ".c")        main
    basename("../src/main.c", 1, 0, "")               main.c
    basename("main.c", 1, 0, "c")                     main.
    basename("main.c", 1, 0, ".obj")                  main.c
    basename("examples/wfmain.c", 1, 0, "main.c")     wf

basename uses Str_rchr to find the rightmost slash and Str_rmatch to isolate the suffix.

    char *basename(char *path, int i, int j, const char *suffix) {
        i = Str_rchr(path, i, j, '/');
        j = Str_rmatch(path, i + 1, 0, suffix);
        return Str_dup(path, i + 1, j, 1);
    }

The value returned by Str_rchr, which is assigned to i, is the position before the rightmost slash, if there is one, or zero. In either case, the file name starts at position i + 1. Str_rmatch examines the file name and returns the position before the suffix or after the file name. Again, in either case, j is set to the position after the file name. Str_dup returns the substring in path between i + 1 and j.

The function

⟨exported functions 244⟩+≡
    extern void Str_fmt(int code, va_list *app,
        int put(int c, void *cl), void *cl,
        unsigned char flags[], int width, int precision);

is a conversion function that can be used with the formatting functions in the Fmt interface to format substrings. It consumes three arguments (a string pointer and two positions) and formats the substring in the style specified by Fmt’s %s format. It is a checked runtime error for the string pointer, app, or flags to be null.

For example, if Str_fmt is associated with the format code S by

    Fmt_register('S', Str_fmt)

then

    Fmt_print("%10S\n", "Interface", -4, 0)

prints the line ______face, where _ denotes a space.

15.2 Example: Printing Identifiers

A program that prints the C keywords and identifiers in its input illustrates the use of Str_fmt, as well as the use of the functions that examine strings for characters or other strings.

⟨ids.c⟩≡
    #include <stdlib.h>
    #include <stdio.h>
    #include "fmt.h"
    #include "str.h"

    int main(int argc, char *argv[]) {
        char line[512];
        static char set[] = "0123456789_"
            "abcdefghijklmnopqrstuvwxyz"
            "ABCDEFGHIJKLMNOPQRSTUVWXYZ";

        Fmt_register('S', Str_fmt);
        while (fgets(line, sizeof line, stdin) != NULL) {
            int i = 1, j;
            while ((i = Str_upto(line, i, 0, &set[10])) > 0) {
                j = Str_many(line, i, 0, set);
                Fmt_print("%S\n", line, i, j);
                i = j;
            }
        }
        return EXIT_SUCCESS;
    }

The inner while loop scans line[i:0] for the next identifier, beginning with i equal to one. Str_upto returns the position in line of the next underscore or letter in line[i:0], and that position is assigned to i. Str_many returns the position after a run of digits, underscores, and letters. Thus, i and j identify the next identifier, and Fmt_print prints it with Str_fmt, which is associated with the format code S. Assigning j to i causes the next iteration of the while loop to look for the next identifier. When line holds the declaration for main above, the values of i and j passed to Fmt_print are as shown below.

    line:  int main(int argc, char *argv[]) {

    identifier    i     j
    int           1     4
    main          5     9
    int          10    13
    argc         14    18
    char         20    24
    argv         26    30

There are no allocations in this program. Using positions often avoids allocations in these kinds of applications.

15.3 Implementation

⟨str.c⟩≡
    #include <string.h>
    #include <limits.h>
    #include "assert.h"
    #include "fmt.h"
    #include "str.h"
    #include "mem.h"

    ⟨macros 251⟩
    ⟨functions 252⟩

The implementation must deal with converting positions to indices and vice versa, because the functions use indices to access the actual characters. The index of the character to the right of the positive position i is i − 1. The index of the character to the right of a nonpositive position i is i + len, where len is the number of characters in the string. The macro

⟨macros 251⟩≡
    #define idx(i, len) ((i) <= 0 ? (i) + (len) : (i) - 1)

encapsulates these definitions; given a position i in a string of length len, idx(i, len) is the index of the character to the right of i.

The Str functions convert their position arguments to indices, and then use these indices to access the string. The convert macro encapsulates the steps in this conversion:

⟨macros 251⟩+≡
    #define convert(s, i, j) do { int len; \
        assert(s); len = strlen(s); \
        i = idx(i, len); j = idx(j, len); \
        if (i > j) { int t = i; i = j; j = t; } \
        assert(i >= 0 && j <= len); } while (0)
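As a quick check of the position arithmetic, the following is a minimal sketch that restates the two macros as ordinary functions. The names pos_to_index and pos_range are hypothetical and are not part of the Str interface; unlike convert, the sketch reports an out-of-range position by returning 0 rather than by failing an assertion.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: the same arithmetic as the idx macro, written
   as a function so that each argument is evaluated exactly once. */
static int pos_to_index(int pos, int len) {
    return pos <= 0 ? pos + len : pos - 1;
}

/* Hypothetical helper: converts positions i and j in s to indices
   *lo <= *hi. Returns 1 on success, or 0 if a position is out of
   range (the case in which convert would fail its assertion). */
static int pos_range(const char *s, int i, int j, int *lo, int *hi) {
    int len = (int)strlen(s);

    i = pos_to_index(i, len);
    j = pos_to_index(j, len);
    if (i > j) { int t = i; i = j; j = t; }   /* order the pair */
    if (i < 0 || j > len)
        return 0;
    *lo = i;
    *hi = j;
    return 1;
}
```

For the nine-character string "Interface", positions -4 and 0 convert to indices 5 and 9, so hi - lo is 4, the length of "face".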

The positions i and j are converted to indices in the range zero to the length of s, and they’re swapped, if necessary, so that i never exceeds j. The concluding assertion enforces the checked runtime error that i and j specify valid positions in s. Once converted, j − i is the length of the specified substring.

15.3.1 String Operations

Str_sub illustrates the typical use of convert.

⟨functions 252⟩≡
    char *Str_sub(const char *s, int i, int j) {
        char *str, *p;

        convert(s, i, j);
        p = str = ALLOC(j - i + 1);
        while (i < j)
            *p++ = s[i++];
        *p = '\0';
        return str;
    }

The position that specifies the end of the substring is converted to the index of the character that follows the substring, which might be the terminating null character. Thus, j − i is the length of the desired substring, which, counting the null character, needs j − i + 1 bytes of storage. Str_sub and some of the other Str functions can be written using the string routines in the standard C library, like strncpy; see Exercise 15.2.

Str_dup allocates space for n copies of s[i:j] plus a terminating null character, and then copies s[i:j] n times, provided s[i:j] is nonempty.

⟨functions 252⟩+≡
    char *Str_dup(const char *s, int i, int j, int n) {
        int k;
        char *str, *p;

        assert(n >= 0);
        convert(s, i, j);
        p = str = ALLOC(n*(j - i) + 1);
        if (j - i > 0)
            while (n-- > 0)
                for (k = i; k < j; k++)
                    *p++ = s[k];
        *p = '\0';
        return str;
    }

Str_reverse is like Str_sub, except that it copies the characters backward:

⟨functions 252⟩+≡
    char *Str_reverse(const char *s, int i, int j) {
        char *str, *p;

        convert(s, i, j);
        p = str = ALLOC(j - i + 1);
        while (j > i)
            *p++ = s[--j];
        *p = '\0';
        return str;
    }

Str_cat could just call Str_catv, but it’s used enough to warrant its own tailor-made implementation:

⟨functions 252⟩+≡
    char *Str_cat(const char *s1, int i1, int j1,
        const char *s2, int i2, int j2) {
        char *str, *p;

        convert(s1, i1, j1);
        convert(s2, i2, j2);
        p = str = ALLOC(j1 - i1 + j2 - i2 + 1);
        while (i1 < j1)
            *p++ = s1[i1++];
        while (i2 < j2)
            *p++ = s2[i2++];
        *p = '\0';
        return str;
    }

Str_catv is a bit more complicated, because it must make two passes over its variable number of arguments:

⟨functions 252⟩+≡
    char *Str_catv(const char *s, ...) {
        char *str, *p;
        const char *save = s;
        int i, j, len = 0;
        va_list ap;

        va_start(ap, s);
        ⟨len ← the length of the result 254⟩
        va_end(ap);
        p = str = ALLOC(len + 1);
        s = save;
        va_start(ap, s);
        ⟨copy each s[i:j] to p, increment p 255⟩
        va_end(ap);
        *p = '\0';
        return str;
    }

The first pass computes the length of the result by summing the lengths of the argument substrings. After the space for the result is allocated, the second pass appends the substring given by each triple to the result. The first pass computes the length of each substring by converting the positions to indices, which give the length:

⟨len ← the length of the result 254⟩≡
    while (s) {
        i = va_arg(ap, int);
        j = va_arg(ap, int);
        convert(s, i, j);
        len += j - i;
        s = va_arg(ap, const char *);
    }

The second pass is almost identical: The only difference is that the assignment to len is replaced with a loop that copies the substring:

⟨copy each s[i:j] to p, increment p 255⟩≡
    while (s) {
        i = va_arg(ap, int);
        j = va_arg(ap, int);
        convert(s, i, j);
        while (i < j)
            *p++ = s[i++];
        s = va_arg(ap, const char *);
    }

Str_map builds an array map in which map[c] is the mapping for c as specified by from and to. The characters in s[i:j] are mapped and copied into a new string by using them as indices into map:

⟨map s[i:j] into a new string 255⟩≡
    char *str, *p;

    convert(s, i, j);
    p = str = ALLOC(j - i + 1);
    while (i < j)
        *p++ = map[(unsigned char)s[i++]];
    *p = '\0';

The cast prevents characters whose values exceed 127 from being sign-extended to negative indices.

map is built by initializing it so that map[c] is equal to c; that is, each character is mapped to itself. Then the characters in from are used to index the elements in map to which the corresponding characters in to are assigned:

⟨rebuild map 255⟩≡
    unsigned c;

    for (c = 0; c < sizeof map; c++)
        map[c] = c;
    while (*from && *to)
        map[(unsigned char)*from++] = *to++;
    assert(*from == 0 && *to == 0);

The assertion above implements the checked runtime error that the lengths of from and to must be equal. Str_map uses this chunk when both from and to are nonnull, and it uses ⟨map s[i:j] into a new string 255⟩ when s is nonnull:

⟨functions 252⟩+≡
    char *Str_map(const char *s, int i, int j,
        const char *from, const char *to) {
        static char map[256] = { 0 };

        if (from && to) {
            ⟨rebuild map 255⟩
        } else {
            assert(from == NULL && to == NULL && s);
            assert(map['a']);
        }
        if (s) {
            ⟨map s[i:j] into a new string 255⟩
            return str;
        } else
            return NULL;
    }

Initially, all of the elements of map are zero. There’s no way to specify a null character in to, so the assertion that map['a'] is nonzero implements the checked runtime error that the first call to Str_map must not have null from and to pointers.

The positive position to the left of the character with index i is i + 1. Str_pos uses this property to return the positive position corresponding to the arbitrary position i in s. It converts i to an index, validates it, and converts it back to a positive position, which it returns.

⟨functions 252⟩+≡
    int Str_pos(const char *s, int i) {
        int len;

        assert(s);
        len = strlen(s);
        i = idx(i, len);
        assert(i >= 0 && i <= len);
        return i + 1;
    }

Str_len returns the length of the substring s[i:j] by converting i and j to indices and returning the number of characters between them:

⟨functions 252⟩+≡
    int Str_len(const char *s, int i, int j) {
        convert(s, i, j);
        return j - i;
    }

The implementation of Str_cmp is straightforward but tedious, because it involves some bookkeeping:

⟨functions 252⟩+≡
    int Str_cmp(const char *s1, int i1, int j1,
        const char *s2, int i2, int j2) {
        ⟨string compare 257⟩
    }

Str_cmp starts by converting i1 and j1 to indices in s1, and i2 and j2 to indices in s2:

⟨string compare 257⟩≡
    convert(s1, i1, j1);
    convert(s2, i2, j2);

Next, s1 and s2 are adjusted so that each points directly to its first character.

⟨string compare 257⟩+≡
    s1 += i1;
    s2 += i2;

The shorter of s1[i1:j1] and s2[i2:j2] determines how many characters will be compared, which is done by calling strncmp.

⟨string compare 257⟩+≡
    if (j1 - i1 < j2 - i2) {
        int cond = strncmp(s1, s2, j1 - i1);
        return cond == 0 ? -1 : cond;
    } else if (j1 - i1 > j2 - i2) {
        int cond = strncmp(s1, s2, j2 - i2);
        return cond == 0 ? +1 : cond;
    } else
        return strncmp(s1, s2, j1 - i1);

When s1[i1:j1] is shorter than s2[i2:j2] and strncmp returns zero, s1[i1:j1] is equal to a prefix of s2[i2:j2] and is thus less than s2[i2:j2]. The second if statement handles the opposite case, and the else clause applies when the lengths of the arguments are equal.

The standard stipulates that strncmp (and memcmp) must treat the characters in s1 and s2 as unsigned characters, which gives a well-defined result when character values greater than 127 appear in s1 or s2. For example, strncmp("\344", "\127", 1) must return a positive value, but some implementations of strncmp incorrectly compare “plain” characters, which may be signed or unsigned. For these implementations, strncmp("\344", "\127", 1) may return a negative value. Some implementations of memcmp produce the same error.

15.3.2 Analyzing Strings

The remaining functions inspect substrings from the left to the right or vice versa for occurrences of characters or other strings. They all return a positive position if the search succeeds, and zero otherwise. Str_chr is typical:

⟨functions 252⟩+≡
    int Str_chr(const char *s, int i, int j, int c) {
        convert(s, i, j);
        for ( ; i < j; i++)
            if (s[i] == c)
                return i + 1;
        return 0;
    }

Str_rchr is similar, but starts its search from the right end of s[i:j]:

⟨functions 252⟩+≡
    int Str_rchr(const char *s, int i, int j, int c) {
        convert(s, i, j);
        while (j > i)
            if (s[--j] == c)
                return j + 1;
        return 0;
    }

Both functions return the positive position to the left of the occurrence of c, when c appears in s[i:j]. Str_upto and Str_rupto are similar to Str_chr and Str_rchr, except that they look for an occurrence in s[i:j] of any one of the characters in a set:

⟨functions 252⟩+≡
    int Str_upto(const char *s, int i, int j,
        const char *set) {
        assert(set);
        convert(s, i, j);
        for ( ; i < j; i++)
            if (strchr(set, s[i]))
                return i + 1;
        return 0;
    }

    int Str_rupto(const char *s, int i, int j,
        const char *set) {
        assert(set);
        convert(s, i, j);
        while (j > i)
            if (strchr(set, s[--j]))
                return j + 1;
        return 0;
    }

Str_find searches for the occurrence of a string in s[i:j]. Its implementation treats search strings of length zero or one as special cases.

⟨functions 252⟩+≡
    int Str_find(const char *s, int i, int j,
        const char *str) {
        int len;

        convert(s, i, j);
        assert(str);
        len = strlen(str);
        if (len == 0)
            return i + 1;
        else if (len == 1) {
            for ( ; i < j; i++)
                if (s[i] == *str)
                    return i + 1;
        } else
            for ( ; i + len <= j; i++)
                if (⟨s[i...] ≡ str[0..len-1] 260⟩)
                    return i + 1;
        return 0;
    }

If str has no characters, the search always succeeds. If str has only one character, Str_find is equivalent to Str_chr. In the general case, Str_find looks for str in s[i:j], but it must be careful not to accept a match that extends past the end of the substring:

⟨s[i...] ≡ str[0..len-1] 260⟩≡
    (strncmp(&s[i], str, len) == 0)

Str_rfind has the same three cases, but must cope with comparing strings backward.

⟨functions 252⟩+≡
    int Str_rfind(const char *s, int i, int j,
        const char *str) {
        int len;

        convert(s, i, j);
        assert(str);
        len = strlen(str);
        if (len == 0)
            return j + 1;
        else if (len == 1) {
            while (j > i)
                if (s[--j] == *str)
                    return j + 1;
        } else
            for ( ; j - len >= i; j--)
                if (strncmp(&s[j-len], str, len) == 0)
                    return j - len + 1;
        return 0;
    }

Str_rfind must be careful not to accept a match that extends past the beginning of the substring.

Str_any and its cousins don’t search for characters or strings; they simply step over them if they appear at the beginning or end of the substring in question. Str_any returns Str_pos(s,i) + 1 if s[i:i+1] is a character in set:

⟨functions 252⟩+≡
    int Str_any(const char *s, int i, const char *set) {
        int len;

        assert(s);
        assert(set);
        len = strlen(s);
        i = idx(i, len);
        assert(i >= 0 && i <= len);
        if (i < len && strchr(set, s[i]))
            return i + 2;
        return 0;
    }

If the test succeeds, the index i + 1 is converted to a positive position by adding one, which explains why Str_any returns i + 2.

Str_many steps over a run of one or more characters in set that occur at the beginning of s[i:j]:

⟨functions 252⟩+≡
    int Str_many(const char *s, int i, int j,
        const char *set) {
        assert(set);
        convert(s, i, j);
        if (i < j && strchr(set, s[i])) {
            do
                i++;
            while (i < j && strchr(set, s[i]));
            return i + 1;
        }
        return 0;
    }

Str_rmany backs up over a run of one or more characters in set that occur at the end of s[i:j]:

⟨functions 252⟩+≡
    int Str_rmany(const char *s, int i, int j,
        const char *set) {
        assert(set);
        convert(s, i, j);
        if (j > i && strchr(set, s[j-1])) {
            do
                --j;
            while (j >= i && strchr(set, s[j]));
            return j + 2;
        }
        return 0;
    }

When the do-while loop terminates, j is equal to i − 1 or is the index of a character that is not in set. In the first case, Str_rmany must return i + 1; in the second case, it must return the position to the right of the character s[j]. The value j + 2 is the correct one in both cases.

Str_match returns Str_pos(s,i) + strlen(str) if str occurs at the beginning of s[i:j]. Like Str_find, search strings with lengths zero or one get special treatment:

⟨functions 252⟩+≡
    int Str_match(const char *s, int i, int j,
        const char *str) {
        int len;

        convert(s, i, j);
        assert(str);
        len = strlen(str);
        if (len == 0)
            return i + 1;
        else if (len == 1) {
            if (i < j && s[i] == *str)
                return i + 2;
        } else if (i + len <= j && ⟨s[i...] ≡ str[0..len-1] 260⟩)
            return i + len + 1;
        return 0;
    }

The general case must be careful not to consider a match that extends past the end of s[i:j]. Similar situations occur in Str_rmatch, which must avoid a match that extends past the beginning of s[i:j], and which can treat search strings with lengths zero or one as special cases.

⟨functions 252⟩+≡
    int Str_rmatch(const char *s, int i, int j,
        const char *str) {
        int len;

        convert(s, i, j);
        assert(str);
        len = strlen(str);
        if (len == 0)
            return j + 1;
        else if (len == 1) {
            if (j > i && s[j-1] == *str)
                return j;
        } else if (j - len >= i
        && strncmp(&s[j-len], str, len) == 0)
            return j - len + 1;
        return 0;
    }

15.3.3 Conversion Functions

The last function is Str_fmt, which is a conversion function as specified in the Fmt interface. The calling sequence for conversion functions is described on page 221. The flags, width, and precision arguments dictate how the string is to be formatted. The important feature of Str_fmt is that it consumes three arguments from the variable part of the argument list passed to one of the Fmt functions. These three arguments specify the string and two positions within that string. These positions give the length of the substring, which, along with flags, width, and precision, determines how the substring is emitted. Str_fmt lets Fmt_puts interpret these values and emit the string:

⟨functions 252⟩+≡
    void Str_fmt(int code, va_list *app,
        int put(int c, void *cl), void *cl,
        unsigned char flags[], int width, int precision) {
        char *s;
        int i, j;

        assert(app && flags);
        s = va_arg(*app, char *);
        i = va_arg(*app, int);
        j = va_arg(*app, int);
        convert(s, i, j);
        Fmt_puts(s + i, j - i, put, cl, flags,
            width, precision);
    }

Further Reading

Plauger (1992) gives a brief critique of the functions defined in string.h, and shows how to implement them. Roberts (1995) describes a simple string interface that is similar to Str and based on string.h.

The design of the Str interface is lifted almost verbatim from the string-manipulation facilities in the Icon programming language (Griswold and Griswold 1990). Using positions instead of indices and using nonpositive positions to specify locations relative to the ends of strings originated with Icon. Str’s functions are modeled after Icon’s similarly named string functions. The Icon functions are more powerful because they use Icon’s goal-directed evaluation mechanism. For example, Icon’s find function can return the positions of all the occurrences of one string in another as dictated by the context in which it is called. Icon also has a string-scanning facility that, with goal-directed evaluation, is a powerful pattern-matching capability.

Str_map can be used to implement a surprisingly varied number of string transformations. For example, if s is a seven-character string,

    Str_map("abcdefg", 1, 0, "gfedcba", s)

returns the reverse of s. Griswold (1980) explores such uses of mappings.
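To see why that mapping reverses s, it helps to trace it: the from string "gfedcba" pairs g with s[0], f with s[1], and so on down to a with s[6], so mapping "abcdefg" emits s[6], s[5], ..., s[0]. The following is a minimal sketch of the trick with a one-shot mapping function. The names map_string and reverse7 are hypothetical; unlike Str_map, map_string keeps no static table between calls, maps the whole string rather than a substring, and returns NULL instead of failing an assertion.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical one-shot variant of Str_map: returns a malloc'd copy
   of s in which each character that appears in from is replaced by
   the corresponding character of to. Returns NULL if from and to
   differ in length or if malloc fails. */
static char *map_string(const char *s, const char *from, const char *to) {
    unsigned char map[256];
    size_t n = strlen(s), k;
    unsigned c;
    char *str;

    if (strlen(from) != strlen(to))
        return NULL;
    for (c = 0; c < sizeof map; c++)      /* identity mapping */
        map[c] = (unsigned char)c;
    while (*from)
        map[(unsigned char)*from++] = (unsigned char)*to++;
    str = malloc(n + 1);
    if (str == NULL)
        return NULL;
    for (k = 0; k < n; k++)
        str[k] = (char)map[(unsigned char)s[k]];
    str[n] = '\0';
    return str;
}

/* Reverses a seven-character string by mapping "abcdefg" with
   from = "gfedcba" and to = s; returns NULL for other lengths. */
static char *reverse7(const char *s) {
    if (strlen(s) != 7)
        return NULL;
    return map_string("abcdefg", "gfedcba", s);
}
```

The same function handles ordinary translations as well; for instance, mapping "Hello" with from "lo" and to "01" replaces each l with 0 and the o with 1.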

Exercises

15.1 Extend ids.c so that it recognizes and ignores C comments, string literals, and keywords. Generalize your extended version to accept command-line arguments to specify additional identifiers that are to be ignored.

15.2 The Str implementation could use the string and memory functions in the standard C library, like strncpy and memcpy, to copy strings. For example, Str_sub could be written as follows.

    char *Str_sub(const char *s, int i, int j) {
        char *str;

        convert(s, i, j);
        str = strncpy(ALLOC(j - i + 1), s + i, j - i);
        str[j - i] = '\0';
        return str;
    }

Some C compilers recognize calls to the string.h functions and generate in-line code that may be much faster than the corresponding loops in C. Highly optimized assembly-language implementations are also usually faster. Reimplement Str using the string.h functions where possible; measure the results using a specific C compiler on a specific machine, then characterize the improvements for each function in terms of the lengths of their string arguments.

15.3 Design and implement a function that searches a substring for a pattern specified by a regular expression, like those supported in AWK and described in Aho, Kernighan, and Weinberger (1988). This function needs to return two values: the position at which the match begins and its length.

15.4 Icon has an extensive string-scanning facility. Its ? operator establishes a scanning environment that supplies a string and a position in this string. String functions like find can be invoked with only one argument, in which case they operate on the string and the position in the current scanning environment. Study Icon’s string-scanning facility, described in Griswold and Griswold (1990), and design and implement an interface that provides similar functionality.

15.5 string.h defines the function

    char *strtok(char *s, const char *set);

which splits s into tokens separated by characters in set. The string s is split into tokens by calling strtok repeatedly. s is passed only on the first call, and strtok searches for the first character that is not in set, overwrites that character with a null character, and returns s. Subsequent calls, which have the form strtok(NULL, set), cause strtok to continue from where it left off and search for the first character that is in set, overwrite that character with a null character, and return a pointer to the beginning of the token. set can be different on each call. When a search fails, strtok returns null. Extend the Str interface with a function that provides similar capabilities but does not modify its argument. Can you improve on strtok’s design?

15.6 The Str functions always allocate space for their results, and these allocations might be unnecessary in some applications. Suppose the functions accepted an optional destination, and allocated space only if the destination was the null pointer. For example,

    char *Str_dup(char *dst, int size,
        const char *s, int i, int j, int n);

would store its result in dst[0..size-1] and return dst, if dst were nonnull; otherwise, it would allocate space for its result, as the current version does. Design an interface based on this approach. Be sure to specify what happens when size is too small. Compare your design with the Str interface. Which is simpler? Which is less prone to error?

15.7 Here’s another proposal for avoiding allocations in the Str functions. Suppose the function

    void Str_result(char *dst, int size);

posts dst as the “result string” for the next call to a Str function. If the result string is nonnull, the Str functions store their results in dst[0..size-1] and clear the result string pointer. If the result string is null, they allocate space for their results, as usual. Discuss the pros and cons of this proposal.

C Interfaces and Implementations: Techniques for Creating Reusable Software. Any other use requires prior written consent from the copyright owner.. Hanson. C Interfaces and Implementations: Techniques for Creating Reusabl Prepared for frliu@microsoft. Unauthorized use.com. Frank Liu Copyright © 1997 by David R. This download file is made available for personal use only and is subject to the Terms of Service. All rights reserved. reproduction and/or distribution are strictly prohibited and violate applicable laws. .

16 HIGH-LEVEL STRINGS

The functions exported by the Str interface described in the previous chapter augment the conventions for handling strings in C. By convention, strings are arrays of characters in which the last character is null. While this representation is adequate for many applications, it does have two significant disadvantages. First, finding the length of a string requires searching the string for its terminating null character, so computing the length takes time proportional to the length of the string. Second, the functions in the Str interface and some of those in the standard library assume that strings can be changed, so either they or their callers must allocate space for string results; in applications that do not modify strings, many of these allocations are unnecessary.

The Text interface described in this chapter uses a slightly different representation for strings that addresses both of these disadvantages. Lengths are computed in constant time, because they’re carried along with the string, and allocations occur only when necessary. The strings provided by Text are immutable (that is, they cannot be changed in place), and they can contain embedded null characters. Text provides functions for converting between its string representation and C-style strings; these conversions are the price for Text’s improvements.

16.1 Interface

The Text interface represents a string by a two-element descriptor, which gives the length of the string and points to its first character:

⟨exported types 270⟩≡
    typedef struct T {
        int len;
        const char *str;
    } T;

⟨text.h⟩≡
    #ifndef TEXT_INCLUDED
    #define TEXT_INCLUDED
    #include <stdarg.h>
    #define T Text_T
    ⟨exported types 270⟩
    ⟨exported data 274⟩
    ⟨exported functions 271⟩
    #undef T
    #endif

The string pointed to by the str field is not terminated with a null character. Strings pointed to by Text_Ts may contain any character, including the null character. Text reveals the representation of descriptors so that clients may access the fields directly. Given a Text_T s, s.len gives the length of the string, and the actual characters are accessed by s.str[0..s.len-1].

Clients can read the fields of a Text_T and the characters in the string it points to, but they must not change the fields or the characters in the string, except via functions in this interface, or in Text_Ts they initialize, or in those returned by Text_box. It is an unchecked runtime error to change the string described by a Text_T. It is also a checked runtime error to pass a Text_T with a negative len field or a null str field to any function in this interface.

Text exports functions that pass and return descriptors by value; that is, descriptors themselves are passed to and returned by functions, instead of passing pointers to descriptors. As a result, none of the Text functions allocate descriptors.

When necessary, some Text functions do allocate space for the strings themselves. This string space is managed completely by Text; except as described below, clients must never deallocate strings. Deallocating
strings by external means, such as calling free or Mem_free, is an unchecked runtime error.

The functions

⟨exported functions 271⟩≡
    extern T Text_put(const char *str);
    extern char *Text_get(char *str, int size, T s);
    extern T Text_box(const char *str, int len);

convert between descriptors and C-style strings. Text_put copies the null-terminated string str into the string space and returns a descriptor for the new string. Text_put can raise Mem_Failed. It is a checked runtime error for str to be null.

Text_get copies the string described by s into str[0..size-2], appends a null character, and returns str. It is a checked runtime error for size to be less than s.len+1. If str is null, Text_get ignores size, calls Mem_alloc to allocate s.len+1 bytes, copies s.str into that space, appends a null character, and returns a pointer to the beginning of the allocated space. When str is null, Text_get can raise Mem_Failed.

Clients call Text_box to build descriptors for constant strings or for strings that they allocate. It “boxes” str and len in a descriptor and returns the descriptor. For example,

    static char editmsg[] = "Last edited by: ";
    …
    Text_T msg = Text_box(editmsg, sizeof (editmsg) - 1);

assigns to msg a Text_T for "Last edited by: ". Note that the second argument to Text_box omits the null character at the end of editmsg. If this character is not omitted, it will be treated as part of the string described by msg. It is a checked runtime error for str to be null or for len to be negative.

Many of the Text functions accept string positions, which are defined as in Str. Positions identify locations between characters, including before the first character and after the last one. Positive positions identify positions from the left of the string beginning with the first character, and nonpositive positions identify positions from the right of the
string. For example, the following figure from Chapter 15 shows the positions in the string Interface.

     1   2   3   4   5   6   7   8   9   10
       I   n   t   e   r   f   a   c   e
    –9  –8  –7  –6  –5  –4  –3  –2  –1   0

The function

⟨exported functions 271⟩+≡
    extern T Text_sub(T s, int i, int j);

returns a descriptor for the substring of s between positions i and j. The positions i and j can be given in either order. For example, if

    Text_T s = Text_put("Interface");

the expressions

    Text_sub(s, 6, 10)
    Text_sub(s, 0, -4)
    Text_sub(s, 10, -4)
    Text_sub(s, 6, 0)

all return descriptors for the substring face.

Since clients don’t change the characters in a string, and strings don’t need to be terminated with a null character, s and the value returned may thus share the characters in the actual string, and Text_sub does no allocation. Text_sub simply returns a Text_T in which the str field points to the first character of the substring of s and the len field is the length of the substring. Clients must not count on s and the return value sharing the same string, however, because Text may give empty strings and one-character strings special treatment.

Most of the functions exported by Text are similar to those exported by Str, but many of them don’t accept position arguments because Text_sub provides the same capability at little cost.
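The position rules and the allocation-free substring can be tried out in isolation. The following is a minimal sketch, not the Text implementation: Desc, pos_to_index, and desc_sub are hypothetical names invented for this illustration, and Desc merely mimics the two-field layout of a Text_T.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for Text_T: a length plus a pointer to
   characters that need not be null-terminated. */
typedef struct {
    int len;
    const char *str;
} Desc;

/* Convert a position to a 0-based index. Positive positions count
   from the left starting at 1; nonpositive positions count from the
   right, with 0 denoting the position after the last character. */
static int pos_to_index(int pos, int len) {
    return pos <= 0 ? pos + len : pos - 1;
}

/* Return a descriptor for the characters between positions i and j,
   which may be given in either order. No characters are copied; the
   result points into the same storage as s. */
static Desc desc_sub(Desc s, int i, int j) {
    Desc t;
    assert(s.len >= 0 && s.str);
    i = pos_to_index(i, s.len);
    j = pos_to_index(j, s.len);
    if (i > j) { int tmp = i; i = j; j = tmp; }
    assert(i >= 0 && j <= s.len);
    t.len = j - i;
    t.str = s.str + i;
    return t;
}
```

With a descriptor of length 9 for "Interface", desc_sub applied to the position pairs (6, 10), (0, -4), (10, -4), and (6, 0) yields the same four characters each time, and in this sketch the result always points at s.str + 5.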

The function

⟨exported functions 271⟩+≡
    extern int Text_pos(T s, int i);

returns the positive position in s corresponding to the arbitrary position i. For example, if s is assigned Interface as shown above,

    Text_pos(s, -4)

returns 6. It is a checked runtime error for i in Text_pos or for i or j in Text_sub to specify a nonexistent position in s.

The functions

⟨exported functions 271⟩+≡
    extern T Text_cat (T s1, T s2);
    extern T Text_dup (T s, int n);
    extern T Text_reverse(T s);

concatenate, duplicate, and reverse strings; all can raise Mem_Failed. Text_cat returns a descriptor for the string that’s the result of concatenating s1 and s2; if either s1 or s2 describes the empty string, the other argument is returned. Also, Text_cat makes a new copy of s1 and s2 only when necessary. Text_dup returns a descriptor for the string that’s the result of concatenating n copies of s; it is a checked runtime error for n to be negative. Text_reverse returns a string that holds the characters from s in the opposite order.

⟨exported functions 271⟩+≡
    extern T Text_map(T s, const T *from, const T *to);

returns the outcome of mapping s according to the strings pointed to by from and to as follows. For each character in s that appears in from, the corresponding character in to appears in the result string. If a character in s doesn’t appear in from, that character itself appears unchanged in the output. For example,

    Text_map(s, &Text_ucase, &Text_lcase)
returns a copy of s in which uppercase letters have been folded to their lowercase counterparts. Text_ucase and Text_lcase are examples of the predefined descriptors exported by Text. The complete list is:

⟨exported data 274⟩≡
    extern const T Text_cset;
    extern const T Text_ascii;
    extern const T Text_ucase;
    extern const T Text_lcase;
    extern const T Text_digits;
    extern const T Text_null;

Text_cset is a string consisting of all 256 eight-bit characters, Text_ascii holds the 128 ASCII characters, Text_ucase is the string ABCDEFGHIJKLMNOPQRSTUVWXYZ, Text_lcase is the string abcdefghijklmnopqrstuvwxyz, Text_digits is 0123456789, and Text_null is the empty string. Clients can form other common strings by taking substrings of these.

Text_map remembers the most recent nonnull from and to values, and uses these values if from and to are both null. It is a checked runtime error for only one of from or to to be null, or for from->len to be different than to->len when from and to are both nonnull. Text_map can raise Mem_Failed.

Strings are compared by

⟨exported functions 271⟩+≡
    extern int Text_cmp(T s1, T s2);

which returns a value that’s less than zero, equal to zero, or greater than zero if, respectively, s1 is lexically less than s2, s1 is equal to s2, or s1 is greater than s2.

Text exports a set of string-analysis functions that are nearly identical to those exported by Str. These functions, described below, do accept positions in the string to be examined, because these positions usually encode the state of the analysis. In the descriptions that follow, s[i:j] denotes the substring of s between positions i and j, and s[i] denotes the character to the right of position i. The following functions look for occurrences of single characters or sets of characters; in all cases, it is a checked runtime error for i or j to specify nonexistent positions.

⟨exported functions 271⟩+≡
    extern int Text_chr  (T s, int i, int j, int c);
    extern int Text_rchr (T s, int i, int j, int c);
    extern int Text_upto (T s, int i, int j, T set);
    extern int Text_rupto(T s, int i, int j, T set);
    extern int Text_any  (T s, int i, T set);
    extern int Text_many (T s, int i, int j, T set);
    extern int Text_rmany(T s, int i, int j, T set);

Text_chr returns the positive position to the left of the leftmost occurrence of c in s[i:j], and Text_rchr returns the positive position to the left of the rightmost occurrence of c in s[i:j]. Both functions return zero if c doesn’t appear in s[i:j].

Text_upto returns the positive position to the left of the leftmost occurrence of any character from set in s[i:j], and Text_rupto returns the positive position to the left of the rightmost occurrence of any character from set in s[i:j]. Both functions return zero if none of the characters from set appear in s[i:j].

Text_any returns Text_pos(s, i) + 1 if s[i] appears in set, and zero otherwise. If s[i:j] begins with a character from set, Text_many returns the positive position following a contiguous nonempty sequence of characters from set; otherwise, it returns zero. If s[i:j] ends with a character from set, Text_rmany returns the positive position before a nonempty sequence of characters from set; otherwise Text_rmany returns zero.

The remaining analysis functions look for occurrences of strings.

⟨exported functions 271⟩+≡
    extern int Text_find  (T s, int i, int j, T str);
    extern int Text_rfind (T s, int i, int j, T str);
    extern int Text_match (T s, int i, int j, T str);
    extern int Text_rmatch(T s, int i, int j, T str);

Text_find returns the positive position to the left of the leftmost occurrence of str in s[i:j], and Text_rfind returns the positive position to the left of the rightmost occurrence of str in s[i:j]. If str doesn’t appear in s[i:j], both functions return zero.

Text_match returns Text_pos(s, i) + str.len if s[i:j] begins with str, and zero otherwise. Text_rmatch returns Text_pos(s, j) − str.len if s[i:j] ends with str, and zero otherwise.

The function
⟨exported functions 271⟩+≡
    extern void Text_fmt(int code, va_list *app,
        int put(int c, void *cl), void *cl,
        unsigned char flags[], int width, int precision);

can be used with the Fmt interface as a conversion function. It consumes a pointer to a Text_T and formats its string according to the optional flags, width, and precision arguments in the same way that the printf code %s formats its string argument. A pointer to a Text_T is used because, in Standard C, passing small structures in the variable part of a variable length argument list may not be portable. It is a checked runtime error for the pointer to the Text_T to be null, or for app or flags to be null.

Text gives clients some limited control over its allocation of the string space, which is where it stores the actual strings for the results of the functions described above that return descriptors. Specifically, the following functions manage that space as a stack.

⟨exported types 270⟩+≡
    typedef struct Text_save_T *Text_save_T;

⟨exported functions 271⟩+≡
    extern Text_save_T Text_save(void);
    extern void Text_restore(Text_save_T *save);

Text_save returns a value of the opaque pointer type Text_save_T that encodes the “top” of the string space. This value can later be passed to Text_restore to deallocate that portion of the string space that was allocated since the Text_save_T value was created. If h is a value of type Text_save_T, calling Text_restore(&h) invalidates all descriptors and all Text_save_T values that were created after h. It is an unchecked runtime error to use these values. It is a checked runtime error to pass a null Text_save_T to Text_restore. Text_save can raise Mem_Failed.

16.2 Implementation

The implementation of Text is much like the implementation of Str, but the Text functions can take advantage of several important special cases, as detailed below.

⟨text.c⟩≡
    #include <string.h>
    #include <limits.h>
    #include "assert.h"
    #include "fmt.h"
    #include "text.h"
    #include "mem.h"

    #define T Text_T
    ⟨macros 278⟩
    ⟨types 287⟩
    ⟨data 277⟩
    ⟨static functions 286⟩
    ⟨functions 278⟩

The constant descriptors all point to one string consisting of all 256 characters:

⟨data 277⟩≡
    static char cset[] =
        "\000\001\002\003\004\005\006\007\010\011\012\013\014\015\016\017"
        "\020\021\022\023\024\025\026\027\030\031\032\033\034\035\036\037"
        "\040\041\042\043\044\045\046\047\050\051\052\053\054\055\056\057"
        "\060\061\062\063\064\065\066\067\070\071\072\073\074\075\076\077"
        "\100\101\102\103\104\105\106\107\110\111\112\113\114\115\116\117"
        "\120\121\122\123\124\125\126\127\130\131\132\133\134\135\136\137"
        "\140\141\142\143\144\145\146\147\150\151\152\153\154\155\156\157"
        "\160\161\162\163\164\165\166\167\170\171\172\173\174\175\176\177"
        "\200\201\202\203\204\205\206\207\210\211\212\213\214\215\216\217"
        "\220\221\222\223\224\225\226\227\230\231\232\233\234\235\236\237"
        "\240\241\242\243\244\245\246\247\250\251\252\253\254\255\256\257"
        "\260\261\262\263\264\265\266\267\270\271\272\273\274\275\276\277"
        "\300\301\302\303\304\305\306\307\310\311\312\313\314\315\316\317"
        "\320\321\322\323\324\325\326\327\330\331\332\333\334\335\336\337"
        "\340\341\342\343\344\345\346\347\350\351\352\353\354\355\356\357"
        "\360\361\362\363\364\365\366\367\370\371\372\373\374\375\376\377"
        ;
    const T Text_cset   = { 256, cset };
    const T Text_ascii  = { 128, cset };
    const T Text_ucase  = {  26, cset + 'A' };
    const T Text_lcase  = {  26, cset + 'a' };
    const T Text_digits = {  10, cset + '0' };
    const T Text_null   = {   0, cset };
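The idea that every predefined descriptor is just a window onto one shared 256-character table can be sketched on its own. This is an illustration under stated assumptions rather than the code above: Desc, table, table_init, and table_slice are hypothetical names, the table is filled at run time instead of by a string literal, and an ASCII-compatible character set is assumed so that the letters and digits are contiguous.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for Text_T. */
typedef struct {
    int len;
    const char *str;
} Desc;

/* One shared table in which the character with code k is stored
   at index k. */
static char table[256];

/* Fill the table; must be called once before table_slice is used. */
static void table_init(void) {
    int k;
    for (k = 0; k < 256; k++)
        table[k] = (char)k;
}

/* Build a descriptor for the len characters whose codes start at
   first. Nothing is copied; the descriptor points into the table. */
static Desc table_slice(int first, int len) {
    Desc d;
    assert(first >= 0 && len >= 0 && first + len <= 256);
    d.len = len;
    d.str = table + first;
    return d;
}
```

After table_init(), table_slice('A', 26) describes the uppercase letters and table_slice('0', 10) the digits, which mirrors how clients can form other common strings by taking substrings of the predefined ones, for example table_slice('a', 6) for the lowercase hexadecimal letters.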

The Text functions accept positions, but convert them to indices of the character to the right of the position in order to access the characters in the string. A positive position is converted to an index by subtracting one, and a nonpositive position is converted to an index by adding the length of the string:

⟨macros 278⟩≡
    #define idx(i, len) ((i) <= 0 ? (i) + (len) : (i) - 1)

An index is converted to a positive position by adding one, as illustrated by the implementation of Text_pos, which converts its position argument to an index, then converts the index back to a positive position.

⟨functions 278⟩≡
    int Text_pos(T s, int i) {
        assert(s.len >= 0 && s.str);
        i = idx(i, s.len);
        assert(i >= 0 && i <= s.len);
        return i + 1;
    }

The first assertion in Text_pos implements the checked runtime error that all Text_Ts must have nonnegative len fields and nonnull str fields. The second assertion is the checked runtime error that the position i (now an index) corresponds to a valid position in s. If s has N characters, the valid indices are zero through N−1, but the valid positions are one through N+1, which is why the second assertion accepts indices as large as N.

Text_box and Text_sub both build and return new descriptors.

⟨functions 278⟩+≡
    T Text_box(const char *str, int len) {
        T text;

        assert(str);
        assert(len >= 0);
        text.str = str;
        text.len = len;
        return text;
    }

Text_sub is similar, but it must convert its position arguments to indices so that it can compute the length of the result:

⟨functions 278⟩+≡
    T Text_sub(T s, int i, int j) {
        T text;

        ⟨convert i and j to indices in 0..s.len 279⟩
        text.len = j - i;
        text.str = s.str + i;
        return text;
    }

As shown, there are j − i characters between i and j, after they’ve been converted from positions to indices. The code for that conversion also swaps i and j so that i always specifies the index of the leftmost character.

⟨convert i and j to indices in 0..s.len 279⟩≡
    assert(s.len >= 0 && s.str);
    i = idx(i, s.len);
    j = idx(j, s.len);
    if (i > j) { int t = i; i = j; j = t; }
    assert(i >= 0 && j <= s.len);

The position to the right of the last character is converted to the index of a nonexistent character, and the assertions accept such positions. ⟨convert i and j to indices in 0..s.len 279⟩ is used only when these indices are not used to fetch or store a character. Text_sub, for example, uses them only to compute a starting position and length. Other Text functions use the resulting values of i and j only after they’ve checked that i and j are valid indices.

Text_put and Text_get copy strings in and out of the string space. Text implements its own allocation function, char *alloc(int len), to allocate len bytes of string space for several reasons. First, alloc avoids the block headers used in general-purpose allocators, so that it can arrange for strings to be adjacent. This leads to several important optimizations for Text_dup and Text_cat. Second, alloc can ignore alignment, because there are no alignment restrictions for characters. Finally, alloc must cooperate with Text_save and Text_restore. alloc is described starting on page 286, along with Text_save and Text_restore.

Text_put is typical of the few Text functions that allocate string space. It calls alloc to allocate the space required, copies its argument string into that space, and returns the appropriate descriptor:

⟨functions 278⟩+≡
    T Text_put(const char *str) {
        T text;

        assert(str);
        text.len = strlen(str);
        text.str = memcpy(alloc(text.len), str, text.len);
        return text;
    }

Text_put calls memcpy instead of strcpy because it must not append a null character to text.str.

Text_get does just the reverse: It copies a string from the string space to a C-style string. If the pointer to the C-style string is null, Text_get calls Mem’s general-purpose allocator to allocate space for the string and its terminating null character:

⟨functions 278⟩+≡
    char *Text_get(char *str, int size, T s) {
        assert(s.len >= 0 && s.str);
        if (str == NULL)
            str = ALLOC(s.len + 1);
        else
            assert(size >= s.len + 1);
        memcpy(str, s.str, s.len);
        str[s.len] = '\0';
        return str;
    }

Text_get calls memcpy instead of strncpy because it must copy null characters that appear in s.
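The difference between memcpy and strncpy matters as soon as a described string contains an embedded null character. The sketch below illustrates the point with the standard library only; Desc and desc_to_cstr are hypothetical names, and malloc stands in for Mem’s ALLOC, so allocation failure is reported by returning NULL rather than by raising Mem_Failed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for Text_T. */
typedef struct {
    int len;
    const char *str;
} Desc;

/* Copy the described characters, including any embedded nulls, into
   a freshly allocated buffer and append a terminating null.
   Returns NULL if malloc fails; the caller frees the result. */
static char *desc_to_cstr(Desc s) {
    char *buf;
    assert(s.len >= 0 && s.str);
    buf = malloc((size_t)s.len + 1);
    if (buf == NULL)
        return NULL;
    memcpy(buf, s.str, (size_t)s.len);  /* copies all s.len bytes */
    buf[s.len] = '\0';
    return buf;
}
```

For a five-character descriptor holding 'a', 'b', a null, 'c', 'd', the buffer returned by desc_to_cstr holds all five characters followed by the terminator, although strlen on it reports 2. A strncpy of the same five bytes would stop copying at the embedded null and pad the rest of the destination with nulls, losing 'c' and 'd'.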

16.2.1 String Operations

Text_dup makes n copies of its Text_T argument s.

⟨functions 278⟩+≡
    T Text_dup(T s, int n) {
        assert(s.len >= 0 && s.str);
        assert(n >= 0);
        ⟨Text_dup 281⟩
    }

There are several important special cases in which allocation of n copies of s can be avoided. For example, if s is the null string or n is zero, Text_dup returns the null string; if n is one, Text_dup can just return s:

⟨Text_dup 281⟩≡
    if (n == 0 || s.len == 0)
        return Text_null;
    if (n == 1)
        return s;

If s has been created recently, s.str might lie at the end of the string space; that is, s.str + s.len might be equal to the address of the next free byte. If so, only n − 1 copies of s are needed, because the original s can serve as the first duplicate. The macro isatend(s, n), defined on page 286, checks whether s.str is at the end of the string space, and whether there’s space for at least n characters.

⟨Text_dup 281⟩+≡
    {
        T text;
        char *p;

        text.len = n*s.len;
        if (isatend(s, text.len - s.len)) {
            text.str = s.str;
            p = alloc(text.len - s.len);
            n--;
        } else
            text.str = p = alloc(text.len);
        for ( ; n-- > 0; p += s.len)
            memcpy(p, s.str, s.len);
        return text;
    }

Text_cat returns the concatenation of two strings, s1 and s2.

⟨functions 278⟩+≡
    T Text_cat(T s1, T s2) {
        assert(s1.len >= 0 && s1.str);
        assert(s2.len >= 0 && s2.str);
        ⟨Text_cat 282⟩
    }

As for Text_dup, there are several important special cases that avoid allocations. First, if either s1 or s2 is the null string, Text_cat can simply return the other descriptor:

⟨Text_cat 282⟩≡
    if (s1.len == 0)
        return s2;
    if (s2.len == 0)
        return s1;

s1 and s2 might already be adjacent, in which case Text_cat can return a descriptor for the combined result:

⟨Text_cat 282⟩+≡
    if (s1.str + s1.len == s2.str) {
        s1.len += s2.len;
        return s1;
    }

If s1 lies at the end of the string space, then only s2 needs to be copied; otherwise, both strings must be copied:

⟨Text_cat 282⟩+≡
    {
        T text;

        text.len = s1.len + s2.len;
        if (isatend(s1, s2.len)) {
            text.str = s1.str;
            memcpy(alloc(s2.len), s2.str, s2.len);
        } else {
            char *p;
            text.str = p = alloc(s1.len + s2.len);
            memcpy(p, s1.str, s1.len);
            memcpy(p + s1.len, s2.str, s2.len);
        }
        return text;
    }

Text_reverse, which returns a copy of its argument s with its characters in the opposite order, has only two important special cases: when s is the null string and when it has only one character:

⟨functions 278⟩+≡
    T Text_reverse(T s) {
        assert(s.len >= 0 && s.str);
        if (s.len == 0)
            return Text_null;
        else if (s.len == 1)
            return s;
        else {
            T text;
            char *p;
            int i = s.len;

            text.len = s.len;
            text.str = p = alloc(s.len);
            while (--i >= 0)
                *p++ = s.str[i];
            return text;
        }
    }

The implementation of Text_map is similar to the implementation of Str_map. First, it uses the from and to strings to build an array that maps characters; given an input character c, map[c] is the character that appears in the output string. map is initialized so that map[k] is equal to k, then the characters in from are used to index the elements in map that are to be mapped to the corresponding characters in to:

⟨rebuild map 283⟩≡
    int k;

C Interfaces and Implementations: Techniques for Creating Reusable Software. C Interfaces and Implementations: Techniques for Creating Reusabl
Prepared for frliu@microsoft.com, Frank Liu Copyright © 1997 by David R. Hanson.. This download file is made available for personal use only and is subject to the Terms of Service. Any other use requires prior written consent from the copyright owner. Unauthorized use, reproduction and/or distribution are strictly prohibited and violate applicable laws. All rights reserved.

284

HIGH-LEVEL STRINGS

for (k = 0; k < (int)sizeof map; k++) map[k] = k; assert(from->len == to->len); for (k = 0; k < from->len; k++) map[(unsigned char)from->str[k]] = to->str[k]; inited = 1; The inited flag is set to one after map has been initialized, and inited is used to implement the checked runtime error that the first call to Text_map must specify nonnull from and to strings: ¢functions 278²+≡ T Text_map(T s, const T *from, const T *to) { static char map[256]; static int inited = 0; assert(s.len >= 0 && s.str); if (from && to) { ¢rebuild map 283² } else { assert(from == NULL && to == NULL); assert(inited); } if (s.len == 0) return Text_null; else { T text; int i; char *p; text.len = s.len; text.str = p = alloc(s.len); for (i = 0; i < s.len; i++) *p++ = map[(unsigned char)s.str[i]]; return text; } } Str_map doesn’t need the inited flag because it’s impossible to map a character to the null character with Str_map; asserting that map['a'] is nonzero was enough to implement the checked runtime error (see

C Interfaces and Implementations: Techniques for Creating Reusable Software. C Interfaces and Implementations: Techniques for Creating Reusabl
Prepared for frliu@microsoft.com, Frank Liu Copyright © 1997 by David R. Hanson.. This download file is made available for personal use only and is subject to the Terms of Service. Any other use requires prior written consent from the copyright owner. Unauthorized use, reproduction and/or distribution are strictly prohibited and violate applicable laws. All rights reserved.

IMPLEMENTATION

285

page 256). Text_map, however, permits all possible mappings, and thus cannot use a value in map to implement the check. Text_cmp compares two strings s1 and s2 and returns a value that’s less than zero, equal to zero, or greater than zero when s1 is less than, equal to, or greater than s2, respectively. The important special case is when s1 and s2 point to the same string, in which case the shorter string is less than the longer one. Likewise, when one of the strings is a prefix of the other, the shorter one is less. ¢functions 278²+≡ int Text_cmp(T s1, T s2) { assert(s1.len >= 0 && s1.str); assert(s2.len >= 0 && s2.str); if (s1.str == s2.str) return s1.len - s2.len; else if (s1.len < s2.len) { int cond = memcmp(s1.str, s2.str, s1.len); return cond == 0 ? -1 : cond; } else if (s1.len > s2.len) { int cond = memcmp(s1.str, s2.str, s2.len); return cond == 0 ? +1 : cond; } else return memcmp(s1.str, s2.str, s1.len); }
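The prefix rule that Text_cmp applies can be tried out on plain byte ranges. The following is only a sketch under stated assumptions: the name cmp_bytes and its (pointer, length) parameters are invented for this example and are not part of the Text interface, and the same-pointer shortcut is left out.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not part of Text): order two byte ranges by
 * their common prefix first, then by length. '\0' is ordinary data. */
int cmp_bytes(const char *a, int alen, const char *b, int blen) {
    int n, cond;

    assert(a && b && alen >= 0 && blen >= 0);
    n = alen < blen ? alen : blen;   /* length of the shared prefix */
    cond = memcmp(a, b, n);          /* compare only the shared bytes */
    if (cond != 0)
        return cond;                 /* first differing byte decides */
    if (alen < blen)
        return -1;                   /* a is a proper prefix of b */
    if (alen > blen)
        return +1;                   /* b is a proper prefix of a */
    return 0;                        /* same length, same bytes */
}
```

Because the lengths are explicit, a range such as "a\0b" of length three compares on all three bytes, which a strcmp-based comparison would not do.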

16.2.2 Memory Management
Text implements its own memory allocator so that it can take advantage of adjacent strings in Text_dup and Text_cat. Since the string space holds only characters, Text’s allocator can also avoid block headers and alignment issues, which saves space. The allocator is a simple variant of the arena allocator described in Chapter 6. The string space is like a single arena in which the allocated chunks appear in the list emanating from head:

⟨data 277⟩+≡
    static struct chunk {
        struct chunk *link;
        char *avail;
        char *limit;
    } head = { NULL, NULL, NULL }, *current = &head;


The limit field points to the byte one past the end of the chunk, avail points to the first free byte, and link points to the next chunk, all of which is free. current points to the “current” chunk, which is the one in which allocations are made. The definition above initializes current to point to a zero-length chunk; the first allocation appends a new chunk to head. alloc allocates len bytes from the current chunk, or allocates a new chunk of at least 10K bytes:

⟨static functions 286⟩≡
    static char *alloc(int len) {
        assert(len >= 0);
        if (current->avail + len > current->limit) {
            current = current->link =
                ALLOC(sizeof (*current) + 10*1024 + len);
            current->avail = (char *)(current + 1);
            current->limit = current->avail + 10*1024 + len;
            current->link = NULL;
        }
        current->avail += len;
        return current->avail - len;
    }

current->avail is the address of the free byte at the end of the string space. A Text_T s appears at the end of the string space if s.str + s.len is equal to current->avail. The macro isatend is thus

⟨macros 278⟩+≡
    #define isatend(s, n) ((s).str+(s).len == current->avail\
        && current->avail + (n) <= current->limit)

Text_dup and Text_cat can take advantage of strings that appear at the end of the string space only when there’s enough free space in that chunk to satisfy the request, which explains isatend’s second parameter.

Text_save and Text_restore give clients a way to save and restore the location of the end of the string space, which is given by the values of current and current->avail. Text_save returns an opaque pointer to an instance of

⟨types 287⟩≡
    struct Text_save_T {
        struct chunk *current;
        char *avail;
    };

which carries the values of current and current->avail.

⟨functions 278⟩+≡
    Text_save_T Text_save(void) {
        Text_save_T save;

        NEW(save);
        save->current = current;
        save->avail = current->avail;
        alloc(1);
        return save;
    }

Text_save calls alloc(1) to create a “hole” in the string space so that isatend will fail for any string allocated before the hole. Thus, it’s impossible for a string to straddle the end of the string space that’s returned to the client.

Text_restore restores the values of current and current->avail, deallocates the Text_save_T structure and clears *save, and deallocates all of the chunks that follow the current one.

⟨functions 278⟩+≡
    void Text_restore(Text_save_T *save) {
        struct chunk *p, *q;

        assert(save && *save);
        current = (*save)->current;
        current->avail = (*save)->avail;
        FREE(*save);
        for (p = current->link; p; p = q) {
            q = p->link;
            FREE(p);
        }
        current->link = NULL;
    }
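The effect of the one-byte hole can be seen in a single-chunk model of the string space. What follows is a simplified sketch, not the implementation above: the fixed array space and the names space_alloc, space_isatend, space_save, and space_restore are assumptions made for this example, and the chunk list and the Text_save_T structure are omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical single-chunk string space; the real allocator chains
 * chunks obtained from ALLOC instead of using a fixed array. */
static char space[256];
static char *avail = space;                     /* first free byte */
static char *const limit = space + sizeof space;

/* Bump-allocate len bytes; returns NULL when the chunk is full. */
char *space_alloc(int len) {
    assert(len >= 0);
    if (len > limit - avail)
        return NULL;
    avail += len;
    return avail - len;
}

/* Nonzero if the len-byte string at str ends exactly at the free
 * pointer and n more bytes would still fit (compare isatend). */
int space_isatend(const char *str, int len, int n) {
    return str + len == avail && n <= limit - avail;
}

/* Remember the end of the string space, then burn one byte so that no
 * earlier string can be extended in place across the saved mark. */
char *space_save(void) {
    char *mark = avail;
    space_alloc(1);
    return mark;
}

/* Discard everything allocated since the matching space_save. */
void space_restore(char *mark) {
    assert(mark >= space && mark <= avail);
    avail = mark;
}
```

In this model, a string that sat at the end of the space before space_save no longer satisfies space_isatend afterwards, so an in-place extension cannot grow past the mark; after space_restore the string is at the end again.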

16.2.3 Analyzing Strings

The remaining functions exported by Text inspect strings; none of them allocate new ones. Text_chr looks for the leftmost occurrence of a character in s[i:j]:

⟨functions 278⟩+≡
    int Text_chr(T s, int i, int j, int c) {
        ⟨convert i and j to indices in 0..s.len 279⟩
        for ( ; i < j; i++)
            if (s.str[i] == c)
                return i + 1;
        return 0;
    }

If s.str[i] is equal to c, i + 1 is the position to the left of that character in s. Text_rchr is similar, but looks for the rightmost occurrence of c:

⟨functions 278⟩+≡
    int Text_rchr(T s, int i, int j, int c) {
        ⟨convert i and j to indices in 0..s.len 279⟩
        while (j > i)
            if (s.str[--j] == c)
                return j + 1;
        return 0;
    }

Text_upto and Text_rupto are like Text_chr and Text_rchr, except that they search for occurrences of any character in a set of characters, which is specified with a Text_T:

⟨functions 278⟩+≡
    int Text_upto(T s, int i, int j, T set) {
        assert(set.len >= 0 && set.str);
        ⟨convert i and j to indices in 0..s.len 279⟩
        for ( ; i < j; i++)
            if (memchr(set.str, s.str[i], set.len))
                return i + 1;
        return 0;
    }

    int Text_rupto(T s, int i, int j, T set) {
        assert(set.len >= 0 && set.str);
        ⟨convert i and j to indices in 0..s.len 279⟩
        while (j > i)
            if (memchr(set.str, s.str[--j], set.len))
                return j + 1;
        return 0;
    }

Str_upto and Str_rupto use the C library function strchr to check whether a character in s appears in set. The Text functions can’t use strchr because both s and set might contain null characters, so they use memchr, which doesn’t interpret null characters as string terminators.

Text_find and Text_rfind, which find occurrences of strings in s[i:j], have a similar problem: The Str variants of these functions use strncmp to compare substrings, but the Text functions must use memcmp, which copes with null characters. Text_find uses memcmp as it searches for the leftmost occurrence in s[i:j] of the string given by str. The cases that merit special attention are when str is the null string or when it has only one character.

⟨functions 278⟩+≡
    int Text_find(T s, int i, int j, T str) {
        assert(str.len >= 0 && str.str);
        ⟨convert i and j to indices in 0..s.len 279⟩
        if (str.len == 0)
            return i + 1;
        else if (str.len == 1) {
            for ( ; i < j; i++)
                if (s.str[i] == *str.str)
                    return i + 1;
        } else
            for ( ; i + str.len <= j; i++)
                if (equal(s, i, str))
                    return i + 1;
        return 0;
    }
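The difference between the str and mem families of library functions is easy to check directly. The two helpers in this sketch, in_set and same_bytes, are hypothetical names introduced only for illustration; they wrap the standard memchr and memcmp calls with explicit lengths.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: nonzero if byte c occurs in the len-byte set.
 * memchr is told the length, so '\0' is an ordinary member. */
int in_set(const char *set, int len, int c) {
    assert(set && len >= 0);
    return memchr(set, c, len) != NULL;
}

/* Hypothetical helper: nonzero if the len bytes at a and b are equal,
 * including any '\0' bytes among them. */
int same_bytes(const char *a, const char *b, int len) {
    assert(a && b && len >= 0);
    return memcmp(a, b, len) == 0;
}
```

With the three-byte set "a\0b", in_set finds 'b', while strchr stops at the embedded null character and reports that 'b' is absent. In the other direction, strchr always "finds" the null character at the terminator, even in a set that does not contain one, and strncmp treats "x\0y" and "x\0z" as equal because it stops comparing at the null character.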

len >= i. s. j .len)) return i + 2. else if (str. C Interfaces and Implementations: Techniques for Creating Reusabl Prepared for frliu@microsoft. } Text_any steps over the character to the right of position i in s.len.str.len && memchr(set. str)) return j . Any other use requires prior written consent from the copyright owner.len + 1. int i. set.str). This download file is made available for personal use only and is subject to the Terms of Service.str. s. Text_find must not inspect characters beyond the substring s[i:j].s.str). i) + 1.str[i]. and returns Text_pos(s. return 0.len). } else for ( . int j. but it searches for the rightmost occurrence of str. j .str[i].len >= 0 && set. which is the reason for the termination condition in the for loop. All rights reserved. ¢convert i and j to indices in 0.len >= 0 && str.str) return j + 1. assert(i >= 0 && i <= s. ¢functions 278²+≡ int Text_any(T s..len 279² if (str.len >= 0 && s. t) \ (memcmp(&(s). if that character appears in set. (t).str. ¢functions 278²+≡ int Text_rfind(T s..str.len == 1) { while (j > i) if (s. assert(set. } C Interfaces and Implementations: Techniques for Creating Reusable Software.str. Hanson. int i.str[--j] == *str. T str) { assert(str. j--) if (equal(s.len) == 0) In the general case. reproduction and/or distribution are strictly prohibited and violate applicable laws. return 0. Unauthorized use.290 HIGH-LEVEL STRINGS ¢macros 278²+≡ #define equal(s. . T set) { assert(s. (t). i = idx(i. and it avoids inspecting characters that appear before s[i:j].str). i. Text_rfind is like Text_find. Frank Liu Copyright © 1997 by David R.len).com. if (i < s.len == 0) return j + 1.

T set) { assert(set.s. } return 0.len 279² if (i < j && memchr(set.str. C Interfaces and Implementations: Techniques for Creating Reusabl Prepared for frliu@microsoft. T set) { assert(set.str.len)). In the first case. int i.str[i].len)). } Text_rmany steps left over the run of one or more characters from set that appear at the end of s[i:j]: ¢functions 278²+≡ int Text_rmany(T s. set. so i + 2 is the position after s[i].str[j]. .str).com. s.len 279² if (j > i && memchr(set. set. int j. s.len)) { do --j. } The do-while loop ends when j is the index of a character that’s not in set or when j is equal to i − 1. Unauthorized use. return j + 2.str). s. This download file is made available for personal use only and is subject to the Terms of Service.IMPLEMENTATION 291 When s[i] is in set. int j. They step over a run of one or more characters given by a set and return the position to the left of the first character that’s not in the set.str. while (j >= i && memchr(set. } return 0. set.. int i. Text_many and Text_rmany are often called after Text_upto and Text_rupto.len >= 0 && set.str. ¢convert i and j to indices in 0. s.. Text_any returns i + 2 because i + 1 is the position of s[i]. reproduction and/or distribution are strictly prohibited and violate applicable laws. set.str[j-1]. while (i < j && memchr(set. Any other use requires prior written consent from the copyright owner.len)) { do i++.len >= 0 && set.. ¢convert i and j to indices in 0. return i + 1. All rights reserved. Text_many steps over the run that appears at the beginning of s[i:j]: ¢functions 278²+≡ int Text_many(T s. Hanson.str[i]. j + 2 is the position to C Interfaces and Implementations: Techniques for Creating Reusable Software. Frank Liu Copyright © 1997 by David R.s.

.str.len == 0) return i + 1. C Interfaces and Implementations: Techniques for Creating Reusabl Prepared for frliu@microsoft. j . As with Text_find. This download file is made available for personal use only and is subject to the Terms of Service.str[i] == *str. ¢functions 278²+≡ int Text_rmatch(T s. T str) { assert(str. i. } Text_rmatch is like Text_match. ¢convert i and j to indices in 0. int j.len 279² if (str.s. In the second case.str[j-1] == *str.str. and it’s careful not to examine characters before s[i:j].str) return i + 2. return 0.str). T str) { assert(str. ¢functions 278²+≡ int Text_match(T s. Text_match must not inspect characters outside of s[i:j].len == 1) { if (i < j && s.len >= 0 && str. else if (str. str)) return i + str.len 279² if (str..len.str) return j.len + 1.len <= j && equal(s. but it returns the position before the string in str if s[i:j] ends with that string.len == 1) { if (j > i && s. Unauthorized use. reproduction and/or distribution are strictly prohibited and violate applicable laws. Text_match steps over an occurrence of a string given by str.len >= 0 && str.292 HIGH-LEVEL STRINGS the right of the offending character and thus to the left of the run of characters in set. the condition in the third if statement below ensures that only characters in s[i:j] are examined. j + 2 is to the left of s[i:j]. else if (str. C Interfaces and Implementations: Techniques for Creating Reusable Software. ¢convert i and j to indices in 0.len == 0) return j + 1.str)..s. } else if (j .str. str)) return j . if s[i:j] begins with that string. Text_match’s important special cases are when str is the null string and when str has only one character. which consists entirely of characters in set. Hanson. int j. } else if (i + str. Frank Liu Copyright © 1997 by David R.len + 1. All rights reserved. int i.len >= i && equal(s. .com. Any other use requires prior written consent from the copyright owner. int i.

Hanson. width. and precision specifications for Text_Ts in the same way as printf does for C strings. Unauthorized use. s->len. some C implementations cannot reliably pass two-word structures by value in a variable length argument list.2.FURTHER READING 293 return 0. flags. T*). int precision) { T *s. C Interfaces and Implementations: Techniques for Creating Reusabl Prepared for frliu@microsoft. which interprets the flags. cl. ¢functions 278²+≡ void Text_fmt(int code. Passing a pointer to a Text_T avoids these problems in all implementations. put. and have built-in features that are similar to the functions exported by Text.4 Conversion Functions The last function is Text_fmt. s = va_arg(*app. not a Text_T. int width. Frank Liu Copyright © 1997 by David R.com. Fmt_puts(s->str. assert(app && flags). typically two words. Both of these languages are general-purpose. } Unlike all the other functions in the Text interface. and it’s impossible to distinguish two-word structures from doubles in variable length argument lists in a portable way. va_list *app. precision). Text_Ts are small. Text_fmt is used to print Text_Ts in the same style as printf’s %s format.. assert(s && s->len >= 0 && s->str). void *cl). width. Further Reading Text_Ts are similar in both their semantics and implementation to strings in SNOBOL4 (Griswold 1972) and Icon (Griswold and Griswold 1990). All rights reserved. string-processing languages. which is a format-conversion function for use with the functions exported by the Fmt interface. void *cl. . So. Any other use requires prior written consent from the copyright owner. int put(int c. Text_fmt consumes a pointer to a Text_T. unsigned char flags[]. This download file is made available for personal use only and is subject to the Terms of Service. } 16. C Interfaces and Implementations: Techniques for Creating Reusable Software. It just calls Fmt_puts. reproduction and/or distribution are strictly prohibited and violate applicable laws.

Similar techniques for representing and manipulating strings have long been used in compilers and other applications that analyze strings; the XPL compiler generator (McKeeman, Horning, and Wortman 1970) is an early example. In systems in which all the Text_Ts are known, garbage collection can be used to manage the string space. Icon uses XPL’s garbage-collection algorithm to reclaim string space that’s not referenced by any of the known Text_Ts (Hanson 1980). It compacts the strings for the known Text_Ts by copying them to the beginning of the string space.

Hansen (1992) describes a completely different representation for strings in which a substring descriptor carries enough information to retrieve the larger string of which it is a part. This representation makes it possible, among other things, to extend a string to either the left or the right.

“Ropes” are another representation in which a string is represented by a tree of substrings (Boehm, Atkinson, and Plass 1995). The characters in a rope can be traversed in linear time, just like those in a Text_T or in a C string, but the substring operation takes logarithmic time. Concatenation, however, is much faster: Concatenating two ropes takes constant time. Another useful feature is that a rope can be described by a function for generating the ith character.

Exercises

16.1 Rewrite ids.c, described in Section 15.2, using the Text functions.

16.2 Text_save and Text_restore aren’t very robust. For example, the following sequence is erroneous, but the error is not detected.

    Text_save_T x, y;
    x = Text_save();
    …
    y = Text_save();
    …
    Text_restore(&x);
    …
    Text_restore(&y);

After the call to Text_restore(&x), y is invalid, because it describes a string-space location after x. Revise the implementation of Text so that this error is a checked runtime error.

16.3 Text_save and Text_restore permit only stacklike allocation. Garbage collection would be better, but requires that all accessible Text_Ts be known. Design an extended version of the Text interface that includes a function to “register” a Text_T, and a function Text_compact that uses the scheme described in Hanson (1980) to compact the strings referenced by all the registered Text_Ts into the beginning of the string space, thereby reclaiming the space occupied by unregistered Text_Ts.

16.4 Extend the functions that search for strings, like Text_find and Text_match, to accept Text_Ts that specify regular expressions instead of just strings. Kernighan and Plauger (1976) describe regular expressions and the implementation of an automaton that matches them.

16.5 Design an interface and an implementation based on the substring model described in Hansen (1992).


17 EXTENDED-PRECISION ARITHMETIC

A computer with 32-bit integers can represent the signed integers from −2,147,483,648 to +2,147,483,647 (using a two’s-complement representation) and the unsigned integers from zero to 4,294,967,295. These ranges are large enough for many (perhaps most) applications, but some applications need larger ranges. Integers represent every integral value in a relatively compact range. Floating-point numbers represent relatively few values in a huge range. Floating-point numbers can be used when approximations to the exact values are acceptable, such as in many scientific applications, but floating-point numbers cannot be used when all of the integer values in a large range are required.

This chapter describes a low-level interface, XP, that exports functions for arithmetic operations on extended integers of fixed precision. The values that can be represented are limited only by the available memory. This interface is designed to serve higher-level interfaces like those in the next two chapters. These higher-level interfaces are designed for use in applications that need integer values in a potentially huge range.

17.1 Interface

An n-digit unsigned integer x is represented by the polynomial

$$x = x_{n-1}b^{n-1} + x_{n-2}b^{n-2} + \cdots + x_1 b^1 + x_0$$

where b is the base and $0 \le x_i < b$. On a computer with 32-bit unsigned integers, n is 32, b is 2, and each coefficient $x_i$ is represented by one of the 32 bits. This representation can be generalized to represent an unsigned integer in any base. If, for example, b is 10, then each $x_i$ is a number between zero and nine inclusive, and x can be represented by an array. The number 2,147,483,647 could be represented by the array

    unsigned char x[] = { 7, 4, 6, 3, 8, 4, 7, 4, 1, 2 };

where x[i] holds $x_i$. The digits $x_i$ appear in x, least significant digit first, which is the most convenient order for implementing the arithmetic operations.

Choosing a larger base may save memory, because the larger the base, the larger the digits. For example, if b is $2^{16}$ = 65,536, each digit is a number between zero and 65,535 inclusive, and it takes only two digits (four bytes) to represent 2,147,483,647:

    unsigned short x[] = { 65535, 32767 };

and the 64-digit number

    349052951084765949147849619903898133417764638493387843990820577

is represented by the 14-element (28-byte) array

    { 38625, 9033, 28867, 3500, 30620, 54807, 4503, 60627,
      34909, 43799, 33017, 28372, 31785, 8 }.

If b is $2^k$ and k is the size in bits of one of the predefined unsigned integer types in C, then a smaller base can be used without wasting space. Perhaps more important, large bases complicate the implementation of some of the arithmetic operations. As detailed below, these complications can be avoided if an unsigned long integer can hold $b^3 - 1$. XP uses $b = 2^8$ and stores each digit in an unsigned character, because Standard C guarantees that an unsigned long has at least 32 bits, which holds at least three bytes, so an unsigned long can hold $b^3 - 1 = 2^{24} - 1$. With $b = 2^8$, it takes four bytes to represent the value 2,147,483,647:

    unsigned char x[] = { 255, 255, 255, 127 };

and 27 bytes to represent the 64-digit number shown above:

    { 225, 150, 73, 35, 195, 112, 172, 13, 156, 119, 23, 214,
      151, 17, 211, 236, 93, 136, 23, 171, 249, 128, 212, 110,
      41, 124, 8 };
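The least-significant-digit-first layout with $b = 2^8$ can be checked on values that fit in an unsigned long. The sketch below is illustrative only; to_base256 and from_base256 are hypothetical names, not functions exported by XP, and the code assumes 8-bit unsigned characters.

```c
#include <assert.h>

/* Hypothetical helper: store u in z[0..n-1] as base-256 digits, least
 * significant digit first, and return the part of u that did not fit. */
unsigned long to_base256(int n, unsigned char *z, unsigned long u) {
    int i;

    assert(n > 0 && z);
    for (i = 0; i < n; i++) {
        z[i] = u % 256;      /* digit i is the coefficient of 256^i */
        u /= 256;
    }
    return u;
}

/* Hypothetical helper: evaluate the polynomial by Horner's rule,
 * starting from the most significant digit. The result wraps around
 * if the value does not fit in an unsigned long. */
unsigned long from_base256(int n, const unsigned char *x) {
    unsigned long u = 0;
    int i;

    assert(n > 0 && x);
    for (i = n - 1; i >= 0; i--)
        u = 256 * u + x[i];
    return u;
}
```

Converting 2,147,483,647 with n equal to four yields the digits 255, 255, 255, 127, which is the four-byte array shown in the text.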

The XP interface reveals these representation details:

⟨xp.h⟩≡
    #ifndef XP_INCLUDED
    #define XP_INCLUDED

    #define T XP_T
    typedef unsigned char *T;

    ⟨exported functions 299⟩

    #undef T
    #endif

That is, an XP_T is an array of unsigned characters that holds the digits of an n-digit number in base $2^8$, least significant digit first. The XP functions described below take n as an input argument and XP_Ts as input and output arguments, and these arrays must be large enough to accommodate n digits. It is an unchecked runtime error to pass to any function in this interface a null XP_T, an XP_T that is too small, or a nonpositive length.

XP is a dangerous interface, because it omits most checked runtime errors. There are two reasons for this design. XP’s intended clients are higher-level interfaces that presumably specify and implement the checked runtime errors necessary to avoid errors. Second, XP’s interface is as simple as possible so that some of the functions can be implemented in assembly language, if performance considerations necessitate. This latter consideration is why none of the XP functions do allocations.

The functions

⟨exported functions 299⟩≡
    extern int XP_add(int n, T z, T x, T y, int carry);
    extern int XP_sub(int n, T z, T x, T y, int borrow);

implement z = x + y + carry and z = x − y − borrow. Here and below, x, y, and z denote the integer values represented by the arrays x, y, and z, which are assumed to have n digits. carry and borrow must be zero or one. XP_add sets z[0..n-1] to the n-digit sum x + y + carry, and returns the carry-out of the most significant digit. XP_sub sets z[0..n-1] to the n-digit difference x − y − borrow and returns the borrow-out of the most significant digit. Thus, if XP_add returns one, x + y + carry doesn’t fit in n digits, and if XP_sub returns one, y > x. For just these two functions, it is not an error for any of x, y, or z to be the same XP_T.
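The carry and borrow conventions described for XP_add and XP_sub can be sketched with the usual digit-by-digit method. This is not the XP implementation, which this section does not show; add_digits and sub_digits are hypothetical names, and the code assumes 8-bit unsigned characters.

```c
#include <assert.h>

/* Hypothetical sketch of n-digit addition in base 2^8; digits are
 * stored least significant first. Returns the carry-out (0 or 1). */
int add_digits(int n, unsigned char *z, const unsigned char *x,
               const unsigned char *y, int carry) {
    int i;

    assert(n > 0 && z && x && y && (carry == 0 || carry == 1));
    for (i = 0; i < n; i++) {
        carry += x[i] + y[i];    /* at most 255 + 255 + 1 = 511 */
        z[i] = carry % 256;      /* low eight bits are digit i */
        carry /= 256;            /* 0 or 1 for the next digit */
    }
    return carry;
}

/* Companion sketch for subtraction; returns the borrow-out (0 or 1). */
int sub_digits(int n, unsigned char *z, const unsigned char *x,
               const unsigned char *y, int borrow) {
    int i;

    assert(n > 0 && z && x && y && (borrow == 0 || borrow == 1));
    for (i = 0; i < n; i++) {
        int d = (x[i] + 256) - borrow - y[i];   /* always in 0..511 */
        z[i] = d % 256;
        borrow = 1 - d / 256;                   /* 1 if we had to borrow */
    }
    return borrow;
}
```

Each loop reads x[i] and y[i] before it writes z[i], which is why passing the same array for z and for x or y is harmless in this sketch.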

⟨exported functions 299⟩+≡
    extern int XP_mul(T z, int n, T x, int m, T y);

implements z = z + x•y, where x has n digits and y has m digits. z must be large enough to hold n+m digits: XP_mul adds the n+m-digit product x•y to z. When z is initialized to zero, XP_mul sets z[0..n+m-1] to x•y. XP_mul returns the carry-out of the most significant digit of the augmented n+m-digit product. It is an unchecked runtime error for z to be the same XP_T as x or y.

XP_mul illustrates where the const qualifier might help identify input and output parameters and document these kinds of runtime errors. The declaration

    extern int XP_mul(T z, int n, const unsigned char *x,
        int m, const unsigned char *y);

makes it explicit that XP_mul reads x and y and writes z, and thus implies that z should not be the same as x or y. const T cannot be used for x and y, because it means “constant pointer to an unsigned char” instead of the intended “pointer to a constant unsigned char” (see page 29). Exercise 19.5 explores some other definitions of T that work correctly with const.

The const qualifier does not prevent the same XP_T from being passed as x and z (or y and z), however, because an unsigned char * can be passed to a const unsigned char *. But this use of const does permit a const unsigned char * to be passed as x and y; in XP’s declaration for XP_mul above, type casts must be used to pass these values. The meager benefits of const don’t outweigh its verbosity in XP.

The function

⟨exported functions 299⟩+≡
    extern int XP_div(int n, T q, T x, int m, T y, T r, T tmp);

implements division: It computes q = x/y and r = x mod y; q and x have n digits, and r and y have m digits. If y is zero, XP_div returns zero and leaves q and r unchanged; otherwise, it returns one. It is an unchecked runtime error for q or r to be one of x or y, for q and r to be the same XP_T, or for tmp to be too small. tmp must be able to hold at least n+m+2 digits.
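The remark that const T means “constant pointer to an unsigned char” can be confirmed with a few lines of C. In this sketch the typedef mirrors the one in xp.h, while clobber and first_digit are hypothetical functions written only to show which operations the compiler accepts.

```c
#include <assert.h>

typedef unsigned char *T;    /* same shape as XP_T */

/* "const T x" makes the pointer x itself constant, so writing through
 * it still compiles; only an assignment to x would be rejected. */
void clobber(const T x) {
    x[0] = 42;
}

/* Spelling the type out puts const on the digits instead; a write such
 * as "x[0] = 42" inside this function would be a compile-time error. */
int first_digit(const unsigned char *x) {
    return x[0];
}
```

A caller of clobber gets no protection from the const in its declaration: the digit array it passes is modified.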

The functions

⟨exported functions 299⟩+≡
    extern int XP_sum     (int n, T z, T x, int y);
    extern int XP_diff    (int n, T z, T x, int y);
    extern int XP_product (int n, T z, T x, int y);
    extern int XP_quotient(int n, T z, T x, int y);

implement addition, subtraction, multiplication, and division of an n-digit XP_T by a single base $2^8$ digit y. XP_sum sets z[0..n-1] to x + y and returns the carry-out of the most significant digit. XP_diff sets z[0..n-1] to x − y and returns the borrow-out of the most significant digit. For XP_sum and XP_diff, y must be positive and must not exceed the base, $2^8$.

XP_product sets z[0..n-1] to x•y and returns the carry-out of the most significant digit; the carry can be as large as $2^8 - 1$. XP_quotient sets z[0..n-1] to x/y and returns the remainder, x mod y; the remainder can be as large as y − 1. For XP_product and XP_quotient, y must not exceed $2^8 - 1$.

⟨exported functions 299⟩+≡
    extern int XP_neg(int n, T z, T x, int carry);

sets z[0..n-1] to ~x + carry and returns the carry-out of the most significant digit. When carry is zero, XP_neg implements one’s-complement negation; when carry is one, XP_neg implements a two’s-complement negation.

XP_Ts are compared by

⟨exported functions 299⟩+≡
    extern int XP_cmp(int n, T x, T y);

which returns a value less than zero, equal to zero, or greater than zero if, respectively, x < y, x = y, or x > y.

⟨exported functions 299⟩+≡
    extern void XP_lshift(int n, T z, int m, T x, int s, int fill);
    extern void XP_rshift(int n, T z, int m, T x, int s, int fill);

which assign to z the value of x shifted left or right s bits, where z has n digits and x has m digits. When n exceeds m, the bits in the missing digits at the most significant end of x are treated as if they were equal to zero for a left shift and equal to fill for a right shift. The vacated bits are filled with fill, which must be equal to zero or one. When fill is zero, XP_rshift implements a logical right shift; when fill is one, XP_rshift can be used to implement an arithmetic right shift.

⟨exported functions 299⟩+≡
    extern int           XP_length (int n, T x);
    extern unsigned long XP_fromint(int n, T z, unsigned long u);
    extern unsigned long XP_toint  (int n, T x);

XP_length returns the number of digits in x; that is, it returns the index plus one of the most significant nonzero digit in x[0..n-1]. XP_fromint sets z[0..n-1] to u mod 2^(8n) and returns u/2^(8n); that is, the bits in u that don’t fit in z. XP_toint returns x mod (ULONG_MAX+1); that is, the least significant 8•sizeof (unsigned long) bits of x.

The remaining XP functions convert between strings and XP_Ts.

⟨exported functions 299⟩+≡
    extern int   XP_fromstr(int n, T z, const char *str, int base, char **end);
    extern char *XP_tostr  (char *str, int size, int base, int n, T x);

XP_fromstr is like strtoul in the C library; it interprets the string in str as an unsigned integer in base. It ignores leading white space, and accepts one or more digits in base. For bases between 11 and 36, XP_fromstr interprets either lowercase or uppercase letters as digits greater than nine. It is a checked runtime error for base to be less than two or more than 36.

The n-digit XP_T z accumulates the integer specified in str using the usual multiplicative algorithm:

    for (p = str; *p is a digit; p++)
        z ← base•z + *p’s value

z is not initialized to zero; clients must initialize z properly. XP_fromstr returns the first nonzero carry-out of the multiplication base•z, or zero otherwise. Thus, XP_fromstr returns nonzero if the number does not fit in z. If end is nonnull, *end is assigned the pointer to the character that terminated XP_fromstr’s interpretation because either the multiplication overflowed or a nondigit was scanned. If the characters in str do not specify an integer in base, XP_fromstr returns zero and sets *end to str, if end is nonnull. It is a checked runtime error for str to be null.

XP_tostr fills str with a null-terminated string that is the character representation of x in base, and returns str. x is set to zero. Uppercase letters are used for digits that exceed nine when base exceeds 10. It is a checked runtime error for base to be less than two or more than 36. It is also a checked runtime error for str to be null or for size to be too small; that is, for the character representation of x plus a null character to require more than size characters.

17.2 Implementation

⟨xp.c⟩≡
    #include <ctype.h>
    #include <string.h>
    #include "assert.h"
    #include "xp.h"
    #define T XP_T
    #define BASE (1<<8)
    ⟨data 320⟩
    ⟨functions 304⟩

XP_fromint and XP_toint illustrate the kinds of arithmetic manipulations the XP functions must perform. XP_fromint initializes an XP_T so that it is equal to an unsigned long value:

⟨functions 304⟩≡
    unsigned long XP_fromint(int n, T z, unsigned long u) {
        int i = 0;
        do
            z[i++] = u%BASE;
        while ((u /= BASE) > 0 && i < n);
        for ( ; i < n; i++)
            z[i] = 0;
        return u;
    }

The u%BASE is not strictly necessary, because the assignment to z[i] does the modulus implicitly. All of the arithmetic XP functions do these kinds of explicit operations to help document the algorithms they use. Since the base is a constant power of two, most compilers will convert multiplication, division, or modulus by the base to the equivalent left shift, right shift, or logical and.

XP_toint is the inverse of XP_fromint: It returns the least significant 8•sizeof (unsigned long) bits of an XP_T as an unsigned long.

⟨functions 304⟩+≡
    unsigned long XP_toint(int n, T x) {
        unsigned long u = 0;
        int i = (int)sizeof u;
        if (i > n)
            i = n;
        while (--i >= 0)
            u = BASE*u + x[i];
        return u;
    }

A nonzero, n-digit XP_T has fewer than n significant digits when it has one or more leading zeros. XP_length returns the number of significant digits, not counting the leading zeros:

⟨functions 304⟩+≡
    int XP_length(int n, T x) {
        while (n > 1 && x[n-1] == 0)
            n--;

        return n;
    }

17.2.1 Addition and Subtraction

The algorithms for implementing addition and subtraction are systematic renditions of the pencil-and-paper techniques from grade school. An example in base 10 best illustrates the addition z = x + y:

    carry      1   0   1   0
    x          9   4   2   8
    y      +       7   3   2
    S         10  11  06  10

Addition proceeds from the least significant to most significant digit, and, in this example, the initial value of the carry is zero. Each step forms the sum S = carry + x_i + y_i; z_i is S mod b, and the new carry is S/b, where b is the base (10 in this example). The small numbers in the top row are the carry values, and the two-digit numbers in the bottom row are the values of S. In this example, the carry-out is one, because the sum doesn’t fit in four digits. XP_add implements exactly this algorithm, and returns the final value of the carry:

⟨functions 304⟩+≡
    int XP_add(int n, T z, T x, T y, int carry) {
        int i;
        for (i = 0; i < n; i++) {
            carry += x[i] + y[i];
            z[i] = carry%BASE;
            carry /= BASE;
        }
        return carry;
    }

At each iteration, carry holds the single-digit sum S momentarily, then it holds just the carry. Each digit is a number between zero and b − 1, and the carry can be zero or one, so (b − 1) + (b − 1) + 1 = 2b − 1 = 511 is the largest value of a single-digit sum, which easily fits in an int.
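The carry propagation described above can be tried on a small case. The following is a minimal sketch, not the XP source: sketch_add mirrors the loop in XP_add but is a hypothetical stand-alone function, and sketch_add16 is an illustrative helper that splits two 16-bit values into two base-256 digits, adds them, and re-attaches the carry-out above the two result digits.

```c
#include <assert.h>

#define SKETCH_BASE (1<<8)   /* same base as XP's BASE; the name is hypothetical */

/* z = x + y + carry over n base-256 digits, least significant digit first.
   Returns the carry-out of the most significant digit. */
int sketch_add(int n, unsigned char *z, const unsigned char *x,
               const unsigned char *y, int carry) {
    int i;
    for (i = 0; i < n; i++) {
        carry += x[i] + y[i];                         /* S = carry + x_i + y_i <= 511 */
        z[i] = (unsigned char)(carry % SKETCH_BASE);  /* z_i = S mod b */
        carry /= SKETCH_BASE;                         /* new carry = S / b */
    }
    return carry;
}

/* Adds two values of at most 16 bits as two-digit numbers and returns the
   two result digits with the carry-out placed in bit 16. */
unsigned long sketch_add16(unsigned a, unsigned b) {
    unsigned char x[2], y[2], z[2];
    int carry;
    x[0] = (unsigned char)(a % 256); x[1] = (unsigned char)((a / 256) % 256);
    y[0] = (unsigned char)(b % 256); y[1] = (unsigned char)((b / 256) % 256);
    carry = sketch_add(2, z, x, y, 0);
    return ((unsigned long)carry << 16) | ((unsigned long)z[1] << 8) | z[0];
}
```

With these definitions, sketch_add16(0xFFFF, 1) leaves both digits zero and produces a carry-out of one, which is the two-digit analogue of the base-10 example in which the sum does not fit in four digits.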

Subtraction, z = x − y, is similar to addition:

    borrow  0  1   1   0   0
    x          9   4   2   8
    y      −       7   3   2
    D         18  06  09  16

Subtraction proceeds from the least significant to most significant digit, and, in this example, the initial value of the borrow is zero. Each step forms the difference D = x_i + b − borrow − y_i; z_i is D mod b, and the new borrow is 1 − D/b. The small numbers in the top row are the borrow values, and the two-digit numbers in the bottom row are the values of D.

⟨functions 304⟩+≡
    int XP_sub(int n, T z, T x, T y, int borrow) {
        int i;
        for (i = 0; i < n; i++) {
            int d = (x[i] + BASE) - borrow - y[i];
            z[i] = d%BASE;
            borrow = 1 - d/BASE;
        }
        return borrow;
    }

D is at most (b − 1) + b − 0 − 0 = 2b − 1 = 511, which fits in an int. If the final borrow is nonzero, then x is less than y.

Single-digit addition and subtraction are simpler than the more general functions, and they use the second operand as the carry or borrow:

⟨functions 304⟩+≡
    int XP_sum(int n, T z, T x, int y) {
        int i;
        for (i = 0; i < n; i++) {
            y += x[i];
            z[i] = y%BASE;
            y /= BASE;
        }
        return y;
    }

    int XP_diff(int n, T z, T x, int y) {
        int i;
        for (i = 0; i < n; i++) {
            int d = (x[i] + BASE) - y;
            z[i] = d%BASE;
            y = 1 - d/BASE;
        }
        return y;
    }

XP_neg is like single-digit addition, but x’s digits are complemented before the addition:

⟨functions 304⟩+≡
    int XP_neg(int n, T z, T x, int carry) {
        int i;
        for (i = 0; i < n; i++) {
            carry += (unsigned char)~x[i];
            z[i] = carry%BASE;
            carry /= BASE;
        }
        return carry;
    }

The cast ensures that ~x[i] is less than b.

17.2.2 Multiplication

If x has n digits and y has m digits, z = x•y forms m partial products each with n digits, and the sum of these m partial products forms a result with n+m digits. The following example illustrates this process for n = 4 and m = 3 when the initial value of z is zero:

                7 3 2
        ×     9 4 2 8
        -------------
              5 8 5 6
            1 4 6 4
          2 9 2 8
      + 6 5 8 8
        -------------
        6 9 0 1 2 9 6

The partial products do not have to be computed explicitly; each one can be added to z as the digits in the product are computed. For example, the digits in the first partial product, 8•732, are computed from the least significant to most significant digit. The ith digit of this partial product is added to the ith digit of z along with the normal carry computation used in addition. The ith digit of the second partial product, 2•732, is added to the i+1st digit of z. In general, when the partial product involving x_i is computed, the digits are added to z beginning at its ith digit.

⟨functions 304⟩+≡
    int XP_mul(T z, int n, T x, int m, T y) {
        int i, j, carryout = 0;
        for (i = 0; i < n; i++) {
            unsigned carry = 0;
            for (j = 0; j < m; j++) {
                carry += x[i]*y[j] + z[i+j];
                z[i+j] = carry%BASE;
                carry /= BASE;
            }
            for ( ; j < n + m - i; j++) {
                carry += z[i+j];
                z[i+j] = carry%BASE;
                carry /= BASE;
            }
            carryout |= carry;
        }
        return carryout;
    }
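The base-10 example can be replayed with the same loop structure. This is a sketch under stated assumptions: sketch_mul is a hypothetical copy of the XP_mul loops with the base passed as a parameter (XP itself fixes the base at 2^8), and sketch_mul_demo is an illustrative helper that multiplies 9,428 by 732 digit by digit and packs the seven result digits back into a long.

```c
#include <assert.h>

/* z[0..n+m-1] += x*y in the given base, digits least significant first.
   Returns nonzero if a carry spills out of the most significant end of z. */
int sketch_mul(unsigned base, unsigned char *z, int n, const unsigned char *x,
               int m, const unsigned char *y) {
    int i, j, carryout = 0;
    for (i = 0; i < n; i++) {
        unsigned carry = 0;
        for (j = 0; j < m; j++) {            /* add partial product x_i * y at digit i */
            carry += x[i]*y[j] + z[i+j];
            z[i+j] = (unsigned char)(carry % base);
            carry /= base;
        }
        for ( ; j < n + m - i; j++) {        /* ripple the carry through the rest of z */
            carry += z[i+j];
            z[i+j] = (unsigned char)(carry % base);
            carry /= base;
        }
        carryout |= carry;
    }
    return carryout;
}

/* 9428 * 732 in base 10 with z initially zero; returns the product, or -1
   if the sketch reports a carry-out. */
long sketch_mul_demo(void) {
    unsigned char x[4] = { 8, 2, 4, 9 };     /* 9428, least significant digit first */
    unsigned char y[3] = { 2, 3, 7 };        /* 732 */
    unsigned char z[7] = { 0 };
    long value = 0;
    int i;
    if (sketch_mul(10, z, 4, x, 3, y) != 0)
        return -1;
    for (i = 6; i >= 0; i--)
        value = value*10 + z[i];
    return value;
}
```

The four partial products are never stored; each digit of each one is folded into z as soon as it is formed, which is the point of the paragraph above.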

As the digits from the partial products are added to z in the first nested for loop, the carry can be as large as b − 1, so the sum, stored in carry, can be as large as (b − 1)(b − 1) + (b − 1) + (b − 1) = b^2 − 1 = 65,535 once the digit z[i+j] is counted, which fits in an unsigned. After adding a partial product to z, the second nested for loop adds the carry to the remaining digits in z, and records the carry that spills out of the most significant end of z for this addition. If this carry is ever equal to one, the carry-out of z + x•y is one.

Single-digit multiplication is equivalent to calling XP_mul, with m equal to one and z initialized to zero:

⟨functions 304⟩+≡
    int XP_product(int n, T z, T x, int y) {
        int i;
        unsigned carry = 0;
        for (i = 0; i < n; i++) {
            carry += x[i]*y;
            z[i] = carry%BASE;
            carry /= BASE;
        }
        return carry;
    }

17.2.3 Division and Comparison

Division is the most complicated of the arithmetic functions. There are several algorithms that may be used, each with their pros and cons. Perhaps the easiest algorithm to understand is the one that is derived from the following mathematical rules to compute q = x/y and r = x mod y.

    if x < y then q ← 0, r ← x
    else
        q′ ← x/2y, r′ ← x mod 2y
        if r′ < y then q ← 2q′, r ← r′
        else q ← 2q′ + 1, r ← r′ − y

The intermediate computations involving q′ and r′ must be done using XP_Ts, of course. The problem with this recursive algorithm is the allocations for q′ and r′. There can be as many as lg x (log base 2) of these allocations, because lg x is the maximum recursion depth. The XP interface forbids these implicit allocations.
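The recursive rules are easier to follow when they are first tried on ordinary machine integers. The following is only an illustrative sketch: sketch_rdiv and its two wrappers are hypothetical names, the operands are unsigned longs rather than XP_Ts, and the early return for x − y < y is an added guard that keeps 2*y from overflowing; it gives the same result the rules give when x/2y is zero.

```c
#include <assert.h>

/* Computes q = x/y and r = x mod y using only comparison, doubling,
   and subtraction, following the recursive rules. y must be nonzero. */
void sketch_rdiv(unsigned long x, unsigned long y,
                 unsigned long *q, unsigned long *r) {
    unsigned long q2, r2;
    assert(y > 0);
    if (x < y) {                    /* q <- 0, r <- x */
        *q = 0;
        *r = x;
        return;
    }
    if (x - y < y) {                /* y <= x < 2y, so q' = 0 and r' = x >= y */
        *q = 1;
        *r = x - y;
        return;
    }
    sketch_rdiv(x, 2*y, &q2, &r2);  /* q' <- x/2y, r' <- x mod 2y */
    if (r2 < y) {
        *q = 2*q2;
        *r = r2;
    } else {
        *q = 2*q2 + 1;
        *r = r2 - y;
    }
}

/* Convenience wrappers that return just the quotient or the remainder. */
unsigned long sketch_rdiv_q(unsigned long x, unsigned long y) {
    unsigned long q, r;
    sketch_rdiv(x, y, &q, &r);
    return q;
}

unsigned long sketch_rdiv_r(unsigned long x, unsigned long y) {
    unsigned long q, r;
    sketch_rdiv(x, y, &q, &r);
    return r;
}
```

Here q2 and r2 live on the C stack, one pair per recursion level. With XP_Ts, each level would instead need its own n-digit temporaries, which is the allocation problem noted above.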

XP_div uses an efficient iterative algorithm for the general case when x ≥ y and y has at least two significant digits; it uses much simpler algorithms for the easier cases when x < y and when y has only one digit.

⟨functions 304⟩+≡
    int XP_div(int n, T q, T x, int m, T y, T r, T tmp) {
        int nx = n, my = m;
        n = XP_length(n, x);
        m = XP_length(m, y);
        if (m == 1) {
            ⟨single-digit division 311⟩
        } else if (m > n) {
            memset(q, '\0', nx);
            memcpy(r, x, n);
            memset(r + n, '\0', my - n);
        } else {
            ⟨long division 312⟩
        }
        return 1;
    }

XP_div checks for single-digit division first because that case handles division by zero.

Single-digit division is easy, because the quotient digits can be computed using ordinary unsigned integer division in C. Division proceeds from the most significant to the least significant digit, and the initial value of the carry is zero. Dividing 9,428 by 7 in base 10 illustrates the steps:

    quotient digits q_i       1    3    4    6
    partial dividends R      09   24   32   48
    divisor y_0 = 7; final carry (remainder) = 6

At each step, the partial dividend R = carry•b + x_i, the quotient digit q_i = R/y_0 and the new carry is R mod y_0. The carry values are the small leading digits of the partial dividends above. The final value of the carry is the remainder. This operation is exactly what is implemented by XP_quotient, which returns the remainder:

⟨functions 304⟩+≡
    int XP_quotient(int n, T z, T x, int y) {

        int i;
        unsigned carry = 0;
        for (i = n - 1; i >= 0; i--) {
            carry = carry*BASE + x[i];
            z[i] = carry/y;
            carry %= y;
        }
        return carry;
    }

R, the value assigned to carry in XP_quotient, can be as large as (b − 1)b + (b − 1) = b^2 − 1 = 65,535, which fits in an unsigned.

In XP_div, the call to XP_quotient returns r’s least significant digit, so the rest must be set to zero explicitly:

⟨single-digit division 311⟩≡
    if (y[0] == 0)
        return 0;
    r[0] = XP_quotient(nx, q, x, y[0]);
    memset(r + 1, '\0', my - 1);

In the general case, an n-digit dividend is divided by an m-digit divisor, where n ≥ m and m > 1. Dividing 615,367 by 296 in base 10 illustrates the process. The dividend is extended with a leading zero so that n exceeds m:

                    2 0 7 8
        296 ) 0 6 1 5 3 6 7
              0 5 9 2
                0 2 3 3
                0 0 0 0
                  2 3 3 6
                  2 0 7 2
                    2 6 4 7
                    2 3 6 8
                      2 7 9

Computing each quotient digit, q_k, efficiently is the crux of the long division problem because that computation involves m-digit operands.
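The single-digit case described above, dividing 9,428 by 7, can be checked mechanically. This sketch is illustrative rather than the XP code itself: sketch_quotient repeats the XP_quotient loop with the base as an explicit parameter, which is an assumption made so that the base-10 example can be used directly, and sketch_quotient_demo packs the quotient digits and the remainder into one number for inspection.

```c
#include <assert.h>

/* z = x/y for an n-digit x and a single digit y in the given base, digits
   least significant first. Returns the remainder. y must be nonzero. */
int sketch_quotient(unsigned base, int n, unsigned char *z,
                    const unsigned char *x, unsigned y) {
    int i;
    unsigned carry = 0;
    for (i = n - 1; i >= 0; i--) {        /* most significant digit first */
        carry = carry*base + x[i];        /* partial dividend R = carry*b + x_i */
        z[i] = (unsigned char)(carry / y);/* quotient digit q_i = R / y */
        carry %= y;                       /* new carry = R mod y */
    }
    return (int)carry;
}

/* Divides 9428 by 7 in base 10 and returns quotient*10 + remainder. */
long sketch_quotient_demo(void) {
    unsigned char x[4] = { 8, 2, 4, 9 };  /* 9428, least significant digit first */
    unsigned char z[4];
    int r = sketch_quotient(10, 4, z, x, 7);
    long q = z[3]*1000L + z[2]*100L + z[1]*10L + z[0];
    return q*10 + r;
}
```

The loop visits the partial dividends 09, 24, 32, and 48 in turn, producing the quotient 1,346 and the remainder 6.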

Assuming for the moment that we know how to compute the quotient digits, the following pseudocode outlines an implementation of long division.

    rem ← x with a leading zero
    for (k = n - m; k >= 0; k--) {
        compute qk
        dq ← y•qk
        q[k] ← qk
        rem ← rem − dq•b^k
    }
    r ← rem

rem starts equal to x with a leading zero. The loop computes the n−m+1 quotient digits, most significant digit first, by dividing the m-digit divisor into the m+1-digit prefix of rem. At the end of each iteration, rem is reduced by subtracting the product of qk and y, which shortens rem by one digit. For the example above, n = 6, m = 3, and the loop body is executed four times, for k = 6 − 3 = 3, 2, 1, and 0. The table below lists the values of k, rem, qk, and dq for each iteration. The brackets in the second column identify the prefix of rem that is divided by y, which is 296. The last row shows the final value of rem, which is the remainder.

    k    rem          qk    dq
    3    [0615]367    2     0592
    2    [0233]67     0     0000
    1    [2336]7      7     2072
    0    [2647]       8     2368
         279

XP_div needs space to hold the digits for the two temporaries rem and dq; it needs n+1 bytes for rem and m+1 bytes for dq, which is why tmp must be at least n+m+2 bytes long. Fleshing out the pseudocode above, the chunk for long division becomes

⟨long division 312⟩≡
    int k;
    unsigned char *rem = tmp, *dq = tmp + n + 1;
    assert(2 <= m && m <= n);

    memcpy(rem, x, n);
    rem[n] = 0;
    for (k = n - m; k >= 0; k--) {
        int qk;
        ⟨compute qk, dq ← y•qk 314⟩
        q[k] = qk;
        ⟨rem ← rem − dq•b^k 315⟩
    }
    memcpy(r, rem, m);
    ⟨fill out q and r with 0s 313⟩

tmp[0..n] holds the n+1 digits for rem, and tmp[n+1..n+1+m] holds dq’s m+1 digits. rem always has k+m+1 digits in tmp[0..k+m]. This code computes an n−m+1-digit quotient and an m-digit remainder; the remaining digits in q and r must be set to zero:

⟨fill out q and r with 0s 313⟩≡
    {
        int i;
        for (i = n-m+1; i < nx; i++)
            q[i] = 0;
        for (i = m; i < my; i++)
            r[i] = 0;
    }

All that remains is computing the quotient digits. A simple but unsuitable approach starts with qk equal to b−1 and decrements it while y•qk exceeds the m+1-digit prefix of rem:

    qk = BASE-1;
    dq ← y•qk;
    while (rem[k..k+m] < dq) {
        qk--;
        dq ← y•qk;
    }

This approach is too slow: The loop might take b−1 iterations, and each iteration requires an m-digit multiplication and an m+1-digit comparison. A better approach is to estimate qk more accurately using normal integer arithmetic, and then adjust it when the estimate is wrong. It turns out that dividing the three-digit prefix of rem by the two-digit prefix of y gives an estimate of qk that is either correct or just one too large. Thus, the loop above is replaced by a single test:

⟨compute qk, dq ← y•qk 314⟩≡
    {
        int i;
        assert(2 <= m && m <= k+m && k+m <= n);
        ⟨qk ← y[m-2..m-1]/rem[k+m-2..k+m] 314⟩
        dq[m] = XP_product(m, dq, y, qk);
        for (i = m; i > 0; i--)
            if (rem[i+k] != dq[i])
                break;
        if (rem[i+k] < dq[i])
            dq[m] = XP_product(m, dq, y, --qk);
    }

XP_product, shown above, computes y[0..m-1]•qk, assigns the result to dq, and returns the final carry, which is dq’s final digit. The for loop compares rem[k..k+m] with dq one digit at a time. If dq exceeds the m+1-digit prefix of rem, qk is one too large, so it is decremented and dq is recomputed.

Estimating qk can be done with normal integer division:

⟨qk ← y[m-2..m-1]/rem[k+m-2..k+m] 314⟩≡
    {
        int km = k + m;
        unsigned long y2 = y[m-1]*BASE + y[m-2];
        unsigned long r3 = rem[km]*(BASE*BASE) + rem[km-1]*BASE + rem[km-2];
        qk = r3/y2;
        if (qk >= BASE)
            qk = BASE - 1;
    }

r3 can be as large as (b − 1)b^2 + (b − 1)b + (b − 1) = b^3 − 1 = 16,777,215, which fits in an unsigned long. This computation is what constrains the choice of BASE. An unsigned long can hold values less than 2^32, which dictates that b^3 − 1 < 2^32, so BASE must be less than 2^10.6666 and thus cannot exceed 1,625. 256 is the largest power of two that does not exceed 1,625 and is also the size of a built-in type.
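The claim that the estimate is either correct or one too large can be seen on the 615,367 ÷ 296 example. The functions below are a minimal sketch with hypothetical names: sketch_estimate forms the three-digit prefix of rem and the two-digit prefix of y in an arbitrary base, and sketch_adjust applies the single corrective test to whole numbers instead of digit arrays. It assumes the leading divisor digit y1 is nonzero, which XP_div guarantees by trimming y with XP_length.

```c
#include <assert.h>

/* Estimates a quotient digit from the three leading digits of the current
   remainder prefix (r2 r1 r0) and the two leading digits of the divisor
   (y1 y0). Assumes y1 != 0. The estimate is clamped to base-1. */
unsigned sketch_estimate(unsigned base, unsigned r2, unsigned r1, unsigned r0,
                         unsigned y1, unsigned y0) {
    unsigned long r3 = (unsigned long)r2*base*base + (unsigned long)r1*base + r0;
    unsigned long y2 = (unsigned long)y1*base + y0;
    unsigned long qk = r3/y2;
    return qk >= base ? base - 1 : (unsigned)qk;
}

/* The single test that replaces the decrementing loop: if y*qk exceeds the
   (m+1)-digit prefix of rem, the estimate was one too large. */
unsigned sketch_adjust(unsigned qk, unsigned long prefix, unsigned long y) {
    return y*qk > prefix ? qk - 1 : qk;
}
```

For k = 3 the prefix 0615 yields 61/29 = 2, which is already correct. For k = 1 the prefix 2336 yields 233/29 = 8, and because 8•296 = 2,368 exceeds 2,336, the test lowers it to 7. For k = 0 the prefix 2647 yields 264/29 = 9, which the test lowers to 8.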

The final piece in the long-division puzzle is to subtract dq from the m+1-digit prefix of rem, which reduces rem and shortens it by one digit. This subtraction can be done by conceptually shifting dq left by k digits and subtracting that from rem. XP_sub, shown above, can be used to do this subtraction by passing it pointers to the appropriate digits in rem:

⟨rem ← rem − dq•b^k 315⟩≡
    {
        int borrow;
        assert(0 <= k && k <= k+m);
        borrow = XP_sub(m + 1, &rem[k], &rem[k], dq, 0);
        assert(borrow == 0);
    }

The code in ⟨compute qk, dq ← y•qk 314⟩ shows that two multidigit numbers are compared by comparing their digits, most significant digit first. XP_cmp does exactly this for its two XP_T arguments:

⟨functions 304⟩+≡
    int XP_cmp(int n, T x, T y) {
        int i = n - 1;
        while (i > 0 && x[i] == y[i])
            i--;
        return x[i] - y[i];
    }

17.2.4 Shifting

Two functions in XP’s implementation shift XP_Ts left and right by a specified number of bits. A shift of s bits is done in two steps: The first step shifts 8•(s/8) bits by moving a byte at a time, and the second step shifts the remaining s mod 8 bits. fill is set to a byte of all ones or zeroes so that it can be used to fill a byte at a time.

⟨functions 304⟩+≡
    void XP_lshift(int n, T z, int m, T x, int s, int fill) {
        fill = fill ? 0xFF : 0;
        ⟨shift left by s/8 bytes 316⟩
        s %= 8;

        if (s > 0)
            ⟨shift z left by s bits 317⟩
    }

These steps are illustrated by the following figure, which shows what happens when a six-digit XP_T with 44 ones is shifted left by 13 bits into an eight-digit XP_T; the light shading on the right identifies the vacated bits, which are set to fill.

    [Figure: left shift in two steps, labeled “13/8 bytes” and “13%8 bits”]

Shifting left s/8 bytes can be summarized by the following assignments.

    z[m+(s/8)..n-1] ← 0
    z[s/8..m+(s/8)-1] ← x[0..m-1]
    z[0..(s/8)-1] ← fill.

The first assignment clears the digits in z that don’t appear in x shifted left by s/8 bytes. In the second assignment, x_i is copied to z_(i+s/8), most significant byte first, and the third assignment sets z’s s/8 least significant bytes to the fill. Each of these assignments involves a loop, and the initialization code handles the case when n is less than m:

⟨shift left by s/8 bytes 316⟩≡
    {
        int i, j = n - 1;
        if (n > m)
            i = m - 1;
        else
            i = n - s/8 - 1;
        for ( ; j >= m + s/8; j--)

            z[j] = 0;
        for ( ; i >= 0; i--, j--)
            z[j] = x[i];
        for ( ; j >= 0; j--)
            z[j] = fill;
    }

In the second step, s has been reduced to the number of bits to shift. This shift is equivalent to multiplying z by 2^s, then setting the s least significant bits of z to the fill:

⟨shift z left by s bits 317⟩≡
    {
        XP_product(n, z, z, 1<<s);
        z[0] |= fill>>(8-s);
    }

fill is either zero or 0xFF, so fill>>(8-s) forms s fill bits in the least significant bits of a byte.

A similar two-step process is used for shifting right: The first step shifts s/8 bytes to the right, and the second step shifts the remaining s mod 8 bits.

⟨functions 304⟩+≡
    void XP_rshift(int n, T z, int m, T x, int s, int fill) {
        fill = fill ? 0xFF : 0;
        ⟨shift right by s/8 bytes 318⟩
        s %= 8;
        if (s > 0)
            ⟨shift z right by s bits 318⟩
    }

Shifting a six-digit XP_T with 44 ones right by 13 bits into an eight-digit XP_T illustrates the steps in right shift in the following figure; again the light shading on the left identifies the vacated and excess bits, which are set to fill.

. Hanson.m-1] z[m-(s/8). reproduction and/or distribution are strictly prohibited and violate applicable laws. j++) z[j] = fill.318 EXTENDED-PRECISION ARITHMETIC 13/8 bytes 13%8 bits The three assignments for right shift are z[0. All rights reserved. i++.n-1] ← fill. z. The first assignment copies x i to z i – s ⁄ 8. } The second step shifts z right by s bits. The second and third assignments can.. j = 0. be done in the same loop: ¢shift right by s/8 bytes 318²≡ { int i.. for ( .com.m-1] ← fill z[m. i < m && j < n. j++) z[j] = x[i]. Any other use requires prior written consent from the copyright owner. j < n. Unauthorized use. and the third sets the digits in z that don’t appear in x to fill. Frank Liu Copyright © 1997 by David R. of course. This download file is made available for personal use only and is subject to the Terms of Service. which is equivalent to dividing z s by 2 : ¢shift z right by s bits 318²≡ { XP_quotient(n. starting with byte s/8. least significant byte first.m-(s/8)-1] ← x[s/8. 1<<s). C Interfaces and Implementations: Techniques for Creating Reusabl Prepared for frliu@microsoft.. z.. . z[n-1] |= fill<<(8-s). } C Interfaces and Implementations: Techniques for Creating Reusable Software. for (i = s/8. The second assignment sets the vacated bytes to the fill.

The expression fill<<(8-s) forms s fill bits in the most significant bits of a byte, which is then OR’ed into z’s most significant byte.

17.2.5 String Conversions

The last two XP functions convert XP_Ts to strings and vice versa. XP_fromstr converts a string to an XP_T; it accepts optional white space followed by one or more digits in the specified base, which must be from two to 36 inclusive. For bases that exceed 10, letters specify the digits that exceed nine. XP_fromstr stops scanning its string argument when it encounters an illegal character or the null character, or when the carry-out from the multiplication is nonzero.

⟨functions 304⟩+≡
    int XP_fromstr(int n, T z, const char *str, int base, char **end) {
        const char *p = str;
        assert(p);
        assert(base >= 2 && base <= 36);
        ⟨skip white space 320⟩
        if (⟨*p is a digit in base 320⟩) {
            int carry;
            for ( ; ⟨*p is a digit in base 320⟩; p++) {
                carry = XP_product(n, z, z, base);
                if (carry)
                    break;
                XP_sum(n, z, z, map[*p-'0']);
            }
            if (end)
                *end = (char *)p;
            return carry;
        } else {
            if (end)
                *end = (char *)str;
            return 0;
        }
    }
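The multiplicative scan can be sketched without the rest of XP. The code below is an illustration under stated assumptions, not XP_fromstr: sketch_parse and sketch_digit are hypothetical names, the digit value is computed with isdigit and toupper instead of the map table (which assumes contiguous letters, as in ASCII), z is zeroed by the sketch itself although XP_fromstr leaves that to clients, and the multiply and add are fused into one pass, so z is not preserved when the number overflows.

```c
#include <assert.h>
#include <ctype.h>

/* Value of c as a digit, or 36 if c is not a digit in any supported base.
   Assumes the letters are contiguous, as they are in ASCII. */
int sketch_digit(int c) {
    if (isdigit(c))
        return c - '0';
    if (isalpha(c))
        return toupper(c) - 'A' + 10;
    return 36;
}

/* Accumulates the number in str into the n base-256 digits of z using
   z <- base*z + digit. Returns nonzero if the number does not fit. */
int sketch_parse(int n, unsigned char *z, const char *str, int base) {
    const unsigned char *p = (const unsigned char *)str;
    int i;
    for (i = 0; i < n; i++)
        z[i] = 0;
    while (*p && isspace(*p))                  /* skip leading white space */
        p++;
    for ( ; *p && sketch_digit(*p) < base; p++) {
        unsigned carry = (unsigned)sketch_digit(*p);
        for (i = 0; i < n; i++) {              /* one pass: multiply and add */
            carry += z[i]*(unsigned)base;
            z[i] = (unsigned char)(carry % 256);
            carry /= 256;
        }
        if (carry)
            return (int)carry;
    }
    return 0;
}

/* Parses str into four digits and repacks them; returns 0 on overflow. */
unsigned long sketch_parse32(const char *str, int base) {
    unsigned char z[4];
    if (sketch_parse(4, z, str, base) != 0)
        return 0;
    return ((unsigned long)z[3] << 24) | ((unsigned long)z[2] << 16)
         | ((unsigned long)z[1] << 8) | z[0];
}
```

Scanning stops at the first character that is not a digit in base, so sketch_parse32("12x", 10) yields 12, in the same way that XP_fromstr reports the terminating character through *end.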

If end is nonnull, XP_fromstr sets *end to the pointer to the character that terminated the scan.

⟨skip white space 320⟩≡
    while (*p && isspace(*p))
        p++;

⟨data 320⟩≡
    static char map[] = {
         0,  1,  2,  3,  4,  5,  6,  7,  8,  9,
        36, 36, 36, 36, 36, 36, 36,
        10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22,
        23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35,
        36, 36, 36, 36, 36, 36,
        10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22,
        23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35
    };

If c is a digit character, map[c-'0'] is the corresponding digit value; for example, map['F'-'0'] is 15. map[c-'0'] is 36 for those few invalid digit characters that lie between 0 and z in the ASCII collating sequence. This value is chosen so that c is a digit in base if map[c-'0'] is less than base. Thus, XP_fromstr tests whether *p is a digit character with

⟨*p is a digit in base 320⟩≡
    (*p && isalnum(*p) && map[*p-'0'] < base)

XP_tostr uses the usual algorithm for computing the string representation of x, which peels off the digits last one first, but XP_tostr uses the XP functions to do the arithmetic.

⟨functions 304⟩+≡
    char *XP_tostr(char *str, int size, int base, int n, T x) {
        int i = 0;
        assert(str);
        assert(base >= 2 && base <= 36);
        do {
            int r = XP_quotient(n, x, x, base);

so XP_tostr concludes by reversing them: ¢reverse str 321²≡ { int j. Brinch-Hansen also shows how to avoid correcting qk most of time by scaling the operands. str[j] = str[i]. reproduction and/or distribution are strictly prohibited and violate applicable laws. j++) { char c = str[j].. j < --i.3 in Knuth (1981) describe the classical algorithms for implementing the arithmetic operations. which includes the proof that the estimated quotient digit is off by at most one. while (n > 1 && x[n-1] == 0) n--. This download file is made available for personal use only and is subject to the Terms of Service. for (j = 0. but can avoid most of the second calls to product when qk must be decremented. Chapter 4 in Hennessy and Patterson (1994) and Section 4. All rights reserved. } The digits end up in str backward. str[i++] = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"[r]. ¢reverse str 321² return str. C Interfaces and Implementations: Techniques for Creating Reusable Software. Unauthorized use. str[i] = '\0'. Any other use requires prior written consent from the copyright owner.FURTHER READING 321 assert(i < size). Hanson. assert(i < size). } while (n > 1 || x[0] != 0). } } Further Reading Most of the arithmetic functions in XP are straightforward implementations of the algorithms everyone learned in grade school. Knuth (1981) nicely summarizes the long history of these algorithms. str[i] = c. . Division is difficult because of the constraints imposed in computing the quotient digits. Frank Liu Copyright © 1997 by David R. Scaling costs an extra single-digit multiplication and division.com. The algorithm used in XP_div is taken from BrinchHansen (1994). C Interfaces and Implementations: Techniques for Creating Reusabl Prepared for frliu@microsoft.

Exercises

17.1 Implement the recursive division algorithm and compare its execution time and space performance with the Brinch-Hansen algorithm used in XP_div. Are there any conditions under which the recursive algorithm is preferable?

17.2 Implement the "shift and subtract" division algorithm described in Chapter 4 of Hennessy and Patterson (1994) and compare its performance with the Brinch-Hansen algorithm used in XP_div.

17.3 Most of the XP functions take time proportional to the number of digits in their operands. Representing XP_Ts in base 2^16 would thus make these functions run twice as fast. Division, however, presents a problem, because (2^16)^3 − 1 = 281,474,976,710,655. This exceeds ULONG_MAX on most 32-bit computers, and normal C integer arithmetic can't be used to estimate the quotient digits in a portable fashion. Devise a way around this problem, implement XP using base 2^16, and measure the benefits. Is the added complexity of division worth the benefits?

17.4 Do Exercise 17.3 for base 2^32.

17.5 Extended-precision arithmetic in larger bases, like 2^32, is often easier to implement in assembly language, because many machines have double-precision instructions and it's usually easy to capture carries and borrows. Assembly-language implementations are invariably faster, too. Reimplement XP in assembly language on your favorite computer and quantify its speed improvements.

17.6 Implement an XP function that generates random numbers, uniformly distributed in a specified range.

18 ARBITRARY-PRECISION ARITHMETIC

This chapter describes the AP interface, which provides signed integers of arbitrary precision and arithmetic operations on them. That is, unlike XP_Ts, the integers provided by AP can be negative or positive, and they can have an arbitrary number of digits. The values that can be represented are limited only by the available memory. These integers can be used in applications that need integer values in a potentially huge range. For example, some mutual-fund companies track share prices to the nearest centicent (1/10,000 of a dollar) and thus might do all computations in centicents. A 32-bit unsigned integer can represent only $429,496.7295, which is only a tiny fraction of the billions of dollars held by some funds.

AP uses XP, of course, but AP is a high-level interface: It reveals only an opaque type that represents arbitrary-precision signed integers. AP exports functions to allocate and deallocate these integers, and to perform the usual arithmetic operations on them. It also implements the checked runtime errors that XP omits. Most applications should use AP or the MP interface described in the next chapter.

18.1 Interface

The AP interface hides the representation of an arbitrary-precision signed integer behind an opaque pointer type:

⟨ap.h⟩≡
    #ifndef AP_INCLUDED
    #define AP_INCLUDED
    #include <stdarg.h>

    #define T AP_T
    typedef struct T *T;

    ⟨exported functions 324⟩

    #undef T
    #endif

It is a checked runtime error to pass a null AP_T to any function in this interface, except where noted below.

AP_Ts are created by

⟨exported functions 324⟩≡
    extern T AP_new (long int n);
    extern T AP_fromstr(const char *str, int base,
        char **end);

AP_new creates a new AP_T, initializes it to the value of n, and returns it. AP_fromstr creates a new AP_T, initializes it to the value specified by str and base, and returns it. AP_new and AP_fromstr can raise Mem_Failed.

AP_fromstr is like strtol in the C library; it interprets the string in str as an integer in base. It ignores leading white space, and accepts an optional sign followed by one or more digits in base. For bases between 11 and 36, AP_fromstr interprets either lowercase or uppercase letters as digits greater than nine. It is a checked runtime error for base to be less than two or more than 36.

If end is nonnull, *end is assigned the pointer to the character that terminated AP_fromstr's interpretation. If the characters in str do not specify an integer in base, AP_fromstr returns null and sets *end to str, if end is nonnull. It is a checked runtime error for str to be null.

The functions

⟨exported functions 324⟩+≡
    extern long int AP_toint(T x);
    extern char *AP_tostr(char *str, int size,
        int base, T x);
    extern void AP_fmt(int code, va_list *app,
        int put(int c, void *cl), void *cl,
        unsigned char flags[], int width, int precision);

extract and print the integers represented by AP_Ts. AP_toint returns a long int with the same sign as x and a magnitude equal to |x| mod (LONG_MAX+1), where LONG_MAX is the largest positive long int. If x is LONG_MIN, which is -LONG_MAX-1 on two's-complement machines, AP_toint returns -((LONG_MAX+1) mod (LONG_MAX+1)), which is zero.

AP_tostr fills str with a null-terminated string that is the character representation of x in base, and returns str. Uppercase letters are used for digits that exceed nine when base exceeds 10. It is a checked runtime error for base to be less than two or more than 36.

If str is nonnull, AP_tostr fills str up to size characters. It is a checked runtime error for size to be too small, that is, for the character representation of x plus a null character to require more than size characters. If str is null, size is ignored; AP_tostr allocates a string large enough to hold the representation of x, and returns that string. It is the client's responsibility to deallocate the string. When str is null, AP_tostr can raise Mem_Failed.

AP_fmt can be used with the functions in the Fmt interface as a conversion function to format AP_Ts. It consumes an AP_T and formats it according to the optional flags, width, and precision in the same way that the printf specifier %d formats its integer argument. AP_fmt can raise Mem_Failed. It is a checked runtime error for app or flags to be null.

AP_Ts are deallocated by

⟨exported functions 324⟩+≡
    extern void AP_free(T *z);

AP_free deallocates *z and sets *z to null. It is a checked runtime error for z or *z to be null.

The following functions perform arithmetic operations on AP_Ts. Each returns an AP_T for the result, and each can raise Mem_Failed.

⟨exported functions 324⟩+≡
    extern T AP_neg(T x);
    extern T AP_add(T x, T y);
    extern T AP_sub(T x, T y);

    extern T AP_mul(T x, T y);
    extern T AP_div(T x, T y);
    extern T AP_mod(T x, T y);
    extern T AP_pow(T x, T y, T p);

AP_neg returns −x, AP_add returns x + y, AP_sub returns x − y, and AP_mul returns x•y. Here and below, x and y denote the integer values represented by x and y. AP_div returns x/y, and AP_mod returns x mod y. Division truncates to the left: toward minus infinity when one of x or y is negative and toward zero otherwise, so a nonzero remainder always has the same sign as y. More precisely, the quotient, q, is the maximum integer that does not exceed the real number w such that w•y = x, and the remainder is defined to be x − y•q. This definition is identical to the one implemented by the Arith interface described in Chapter 2. For AP_div and AP_mod, it is a checked runtime error for y to be zero.

AP_pow returns x^y when p is null. When p is nonnull, AP_pow returns (x^y) mod p. It is a checked runtime error for y to be negative, or for p to be nonnull and less than two.

The convenience functions

⟨exported functions 324⟩+≡
    extern T AP_addi(T x, long int y);
    extern T AP_subi(T x, long int y);
    extern T AP_muli(T x, long int y);
    extern T AP_divi(T x, long int y);
    extern long AP_modi(T x, long int y);

are similar to the functions described above but take a long int for y. For example, AP_addi(x, y) is equivalent to AP_add(x, AP_new(y)). The rules regarding division and modulus are the same as for AP_div and AP_mod. Each of these functions can raise Mem_Failed.

AP_Ts can be shifted with the functions

⟨exported functions 324⟩+≡
    extern T AP_lshift(T x, int s);
    extern T AP_rshift(T x, int s);

AP_lshift returns an AP_T equal to x shifted left by s bits, which is equivalent to multiplying x by 2^s. AP_rshift returns an AP_T equal to x shifted right by s bits, which is equivalent to dividing x by 2^s. The values returned by both functions have the same sign as x, unless the shift values are zero, and the vacated bits are set to zero. It is a checked runtime error for s to be negative, and the shift functions can raise Mem_Failed.

AP_Ts are compared by

⟨exported functions 324⟩+≡
    extern int AP_cmp (T x, T y);
    extern int AP_cmpi(T x, long int y);

Both functions return an integer less than zero, equal to zero, or greater than zero, if, respectively, x < y, x = y, or x > y.

18.2 Example: A Calculator

A calculator that does arbitrary-precision computations illustrates the use of the AP interface. The implementation of the AP interface, described in the next section, illustrates the use of the XP interface.

The calculator, calc, uses Polish suffix notation: Values are pushed onto a stack, and operators pop their operands from the stack and push their results. A value is one or more consecutive decimal digits, and the operators are as follows.

    ~    negation
    +    addition
    -    subtraction
    *    multiplication
    /    division
    %    remainder
    ^    exponentiation
    d    duplicate the value at the top of the stack
    p    print the value at the top of the stack
    f    print all the values on the stack from the top down
    q    quit

White-space characters separate values but are otherwise ignored; other characters are announced as unrecognized operators. The size of the stack is limited only by available memory, but a diagnostic announces stack underflow.
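To make the suffix-notation discipline concrete before the real program appears, here is a deliberately reduced sketch. It is not calc: it assumes plain long values instead of AP_Ts (so it can overflow, which is exactly what AP avoids), a fixed-size stack, and only the operators ~, d, +, -, and *. The names sketch_eval and SKETCH_MAX are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <ctype.h>

#define SKETCH_MAX 64   /* hypothetical fixed stack depth; calc has no such limit */

/* Evaluates a suffix expression held in s. Returns 0 and stores the value
   at the top of the stack in *result, or returns -1 on stack underflow,
   stack overflow, or an unrecognized operator. */
int sketch_eval(const char *s, long *result) {
    long stack[SKETCH_MAX];
    int sp = 0;

    assert(s && result);
    while (*s) {
        unsigned char c = (unsigned char)*s;
        if (isspace(c)) {                 /* white space only separates values */
            s++;
            continue;
        }
        if (isdigit(c)) {                 /* a value: a run of decimal digits */
            long v = 0;
            while (isdigit((unsigned char)*s))
                v = v * 10 + (*s++ - '0');
            if (sp == SKETCH_MAX)
                return -1;
            stack[sp++] = v;
            continue;
        }
        s++;
        if (c == '~') {                   /* negation: one operand */
            if (sp < 1)
                return -1;
            stack[sp-1] = -stack[sp-1];
        } else if (c == 'd') {            /* duplicate the top value */
            if (sp < 1 || sp == SKETCH_MAX)
                return -1;
            stack[sp] = stack[sp-1];
            sp++;
        } else if (c == '+' || c == '-' || c == '*') {
            long x, y;
            if (sp < 2)
                return -1;                /* stack underflow */
            y = stack[--sp];              /* y is popped first, then x */
            x = stack[--sp];
            stack[sp++] = c == '+' ? x + y : c == '-' ? x - y : x * y;
        } else
            return -1;                    /* unrecognized operator */
    }
    if (sp < 1)
        return -1;
    *result = stack[sp-1];
    return 0;
}
```

For example, the input "3 4 + 5 *" leaves 35 on top of the stack, and "10 3 -" leaves 7, which shows that the second operand is the one popped first.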

calc is a simple program that has three main tasks: interpreting the input, computing values, and managing a stack.

⟨calc.c⟩≡
    #include <ctype.h>
    #include <stdio.h>
    #include <string.h>
    #include <stdlib.h>
    #include "stack.h"
    #include "ap.h"
    #include "fmt.h"

    ⟨calc data 328⟩
    ⟨calc functions 328⟩

As the inclusion of stack.h suggests, calc uses the stack interface described in Chapter 2 for its stack:

⟨calc data 328⟩≡
    Stack_T sp;

⟨initialization 328⟩≡
    sp = Stack_new();

calc must not call Stack_pop when sp is empty, so it wraps all pop operations in a function that checks for underflow:

⟨calc functions 328⟩≡
    AP_T pop(void) {
        if (!Stack_empty(sp))
            return Stack_pop(sp);
        else {
            Fmt_fprint(stderr, "?stack underflow\n");
            return AP_new(0);
        }
    }

Always returning an AP_T, even when the stack is empty, simplifies error-checking elsewhere in calc.

The main loop in calc reads the next "token" (a value or an operator) and switches on it:

⟨calc functions 328⟩+≡
    int main(int argc, char *argv[]) {
        int c;

        ⟨initialization 328⟩
        while ((c = getchar()) != EOF)
            switch (c) {
            ⟨cases 329⟩
            default:
                if (isprint(c))
                    Fmt_fprint(stderr, "?'%c'", c);
                else
                    Fmt_fprint(stderr, "?'\\%03o'", c);
                Fmt_fprint(stderr, " is unimplemented\n");
                break;
            }
        ⟨clean up and exit 329⟩
    }

⟨clean up and exit 329⟩≡
    ⟨clear the stack 333⟩
    Stack_free(&sp);
    return EXIT_SUCCESS;

An input character is either white space, the first digit of a value, an operator, or something else, which is an error as shown in the default case above. White space is simply ignored:

⟨cases 329⟩≡
    case ' ': case '\t': case '\n': case '\f': case '\r':
        break;

A digit is the beginning of a value; calc gathers up the digits that follow the first one into a buffer, and uses AP_fromstr to convert the run of digits to an AP_T:

⟨cases 329⟩+≡
    case '0': case '1': case '2': case '3': case '4':
    case '5': case '6': case '7': case '8': case '9': {
        char buf[512];
        ⟨gather up digits into buf 333⟩
        Stack_push(sp, AP_fromstr(buf, 10, NULL));
        break;
    }

Each operator pops zero or more operands from the stack and pushes zero or more results. Addition is typical:

⟨cases 329⟩+≡
    case '+': {
        ⟨pop x and y off the stack 330⟩
        Stack_push(sp, AP_add(x, y));
        ⟨free x and y 330⟩
        break;
    }

⟨pop x and y off the stack 330⟩≡
    AP_T y = pop(), x = pop();

⟨free x and y 330⟩≡
    AP_free(&x);
    AP_free(&y);

It is easy to make the error of having two or more copies of one AP_T on the stack, which makes it impossible to know which AP_Ts should be freed. The code above shows the simple protocol that avoids this problem: The only "permanent" AP_Ts are those on the stack; all others are freed by calling AP_free.

Subtraction and multiplication are similar in form to addition:

⟨cases 329⟩+≡
    case '-': {
        ⟨pop x and y off the stack 330⟩
        Stack_push(sp, AP_sub(x, y));
        ⟨free x and y 330⟩
        break;
    }
    case '*': {
        ⟨pop x and y off the stack 330⟩
        Stack_push(sp, AP_mul(x, y));
        ⟨free x and y 330⟩
        break;
    }

Division and remainder are also simple, but they must guard against a zero divisor.

⟨cases 329⟩+≡
    case '/': {
        ⟨pop x and y off the stack 330⟩
        if (AP_cmpi(y, 0) == 0) {
            Fmt_fprint(stderr, "?/ by 0\n");
            Stack_push(sp, AP_new(0));
        } else
            Stack_push(sp, AP_div(x, y));
        ⟨free x and y 330⟩
        break;
    }
    case '%': {
        ⟨pop x and y off the stack 330⟩
        if (AP_cmpi(y, 0) == 0) {
            Fmt_fprint(stderr, "?%% by 0\n");
            Stack_push(sp, AP_new(0));
        } else
            Stack_push(sp, AP_mod(x, y));
        ⟨free x and y 330⟩
        break;
    }

Exponentiation must guard against a nonpositive power:

⟨cases 329⟩+≡
    case '^': {
        ⟨pop x and y off the stack 330⟩
        if (AP_cmpi(y, 0) <= 0) {
            Fmt_fprint(stderr, "?nonpositive power\n");
            Stack_push(sp, AP_new(0));
        } else
            Stack_push(sp, AP_pow(x, y, NULL));
        ⟨free x and y 330⟩
        break;
    }
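The / and % cases hand their work to AP_div and AP_mod, whose interface specifies that division truncates to the left. The sketch below shows that rule on plain long values, assuming C99 semantics for the built-in / and % operators (truncation toward zero). The names sketch_div and sketch_mod are hypothetical; this illustrates the definition only and is not how AP computes its quotients. As in calc, the caller is expected to rule out a zero divisor first.

```c
#include <assert.h>

/* Quotient rounded toward minus infinity, following the interface's
   definition of division. Assumes C99, where x / y truncates toward zero. */
long sketch_div(long x, long y) {
    long q;

    assert(y != 0);              /* the caller must guard against a zero divisor */
    q = x / y;
    /* When the signs differ and the division is inexact, truncation toward
       zero rounded up, so step down to the next lower integer. */
    if (x % y != 0 && (x < 0) != (y < 0))
        q--;
    return q;
}

/* Remainder defined as x - y*q, with q from sketch_div. */
long sketch_mod(long x, long y) {
    return x - y * sketch_div(x, y);
}
```

With these definitions, −13 divided by 5 gives quotient −3 and remainder 2, while 13 divided by −5 gives quotient −3 and remainder −2.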

Duplicating the value at the top of the stack is accomplished by popping it off the stack, so that underflow is detected, and pushing the value and a copy of the value. The only way to copy an AP_T is to add zero to it.

⟨cases 329⟩+≡
    case 'd': {
        AP_T x = pop();
        Stack_push(sp, x);
        Stack_push(sp, AP_addi(x, 0));
        break;
    }

Printing an AP_T is accomplished by associating AP_fmt with a format code and using that code in a format string passed to a Fmt function such as Fmt_print; calc uses D.

⟨initialization 328⟩+≡
    Fmt_register('D', AP_fmt);

⟨cases 329⟩+≡
    case 'p': {
        AP_T x = pop();
        Fmt_print("%D\n", x);
        Stack_push(sp, x);
        break;
    }

Printing all the values on the stack reveals a weakness in the Stack interface: There's no way to access the values under the top one, or to tell how many values are on the stack. A better stack interface might include functions like Table_length and Table_map; without them, calc must create a temporary stack, pour the contents of the main stack onto the temporary stack, printing the values as it goes, and then pour the values from the temporary stack back onto the main stack.

⟨cases 329⟩+≡
    case 'f':
        if (!Stack_empty(sp)) {
            Stack_T tmp = Stack_new();
            while (!Stack_empty(sp)) {
                AP_T x = pop();
                Fmt_print("%D\n", x);
                Stack_push(tmp, x);
            }
            while (!Stack_empty(tmp))
                Stack_push(sp, Stack_pop(tmp));
            Stack_free(&tmp);
        }
        break;

The remaining cases negate values, clear the stack, and quit:

⟨cases 329⟩+≡
    case '~': {
        AP_T x = pop();
        Stack_push(sp, AP_neg(x));
        AP_free(&x);
        break;
    }
    case 'c': ⟨clear the stack 333⟩ break;
    case 'q': ⟨clean up and exit 329⟩

⟨clear the stack 333⟩≡
    while (!Stack_empty(sp)) {
        AP_T x = Stack_pop(sp);
        AP_free(&x);
    }

calc deallocates the stacked AP_Ts as it clears the stack to avoid creating objects that are unreachable and whose storage could never be deallocated.

The final chunk of calc reads a run of one or more digits into buf:

⟨gather up digits into buf 333⟩≡
    {
        int i = 0;
        for ( ; c != EOF && isdigit(c); c = getchar(), i++)
            if (i < (int)sizeof (buf) - 1)
                buf[i] = c;
        if (i > (int)sizeof (buf) - 1) {
            i = (int)sizeof (buf) - 1;
            Fmt_fprint(stderr,
                "?integer constant exceeds %d digits\n", i);
        }
        buf[i] = 0;
        if (c != EOF)
            ungetc(c, stdin);
    }

As this code shows, calc announces excessively long numbers and truncates them.

18.3 Implementation

The implementation of the AP interface illustrates a typical use of the XP interface. AP uses a sign-magnitude representation for signed numbers: An AP_T points to a structure that carries the sign of the number and its absolute value as an XP_T:

⟨ap.c⟩≡
    #include <ctype.h>
    #include <limits.h>
    #include <stdlib.h>
    #include <string.h>
    #include "assert.h"
    #include "ap.h"
    #include "fmt.h"
    #include "xp.h"
    #include "mem.h"

    #define T AP_T

    struct T {
        int sign;
        int ndigits;
        int size;
        XP_T digits;
    };

    ⟨macros 337⟩
    ⟨prototypes 336⟩

    ⟨static functions 335⟩
    ⟨functions 335⟩

sign is either 1 or −1. size is the number of digits allocated and pointed to by digits; it can exceed ndigits, which is the number of digits in use. That is, an AP_T represents the number given by the XP_T in digits[0..ndigits-1]. AP_Ts are always normalized: Their most significant digit is nonzero, unless the value is zero. Thus, ndigits is often less than size. Figure 18.1 shows the layout of an 11-digit AP_T that is equal to 751,702,468,129 on a little endian computer with 32-bit words and 8-bit characters. The unused elements of the digits array are shaded.

[Figure 18.1 Little endian layout for an AP_T equal to 751,702,468,129. Fields: sign = 1, ndigits = 5, size = 11; digits holds 33, 102, 245, 4, 175 (least significant digit first), followed by six unused, shaded elements.]

AP_Ts are allocated by

⟨functions 335⟩≡
    T AP_new(long int n) {
        return set(mk(sizeof (long int)), n);
    }

which calls the static function mk to do the actual allocation; mk allocates an AP_T capable of holding size digits and initializes it to zero.

⟨static functions 335⟩≡
    static T mk(int size) {
        T z = CALLOC(1, sizeof (*z) + size);
        assert(size > 0);
        z->sign = 1;
        z->size = size;
        z->ndigits = 1;
        z->digits = (XP_T)(z + 1);
        return z;
    }

There are two representations for zero in a sign-magnitude representation; by convention, AP uses only the positive representation, as the code in mk suggests.

AP_new calls the static function set to initialize an AP_T to the value of a long int, and, as usual, set handles the most negative long int as a special case:

⟨static functions 335⟩+≡
    static T set(T z, long int n) {
        if (n == LONG_MIN)
            XP_fromint(z->size, z->digits, LONG_MAX + 1UL);
        else if (n < 0)
            XP_fromint(z->size, z->digits, -n);
        else
            XP_fromint(z->size, z->digits, n);
        z->sign = n < 0 ? -1 : 1;
        return normalize(z, z->size);
    }

The assignment to z->sign is the idiom that ensures that the sign value is either 1 or −1, and that the sign of zero is one. An XP_T is unnormalized, because its most significant digit can be zero. When an AP function forms an XP_T that might be unnormalized, it calls normalize to fix it by computing the correct ndigits field:

⟨static functions 335⟩+≡
    static T normalize(T z, int n) {
        z->ndigits = XP_length(n, z->digits);
        return z;
    }

⟨prototypes 336⟩≡
    static T normalize(T z, int n);
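The allocation and normalization steps can be exercised in isolation. The sketch below is a simplified stand-in, not the AP code: it assumes 8-bit bytes, uses the standard calloc instead of the Mem interface (so it returns NULL rather than raising Mem_Failed), and computes the magnitude of a negative long with unsigned arithmetic rather than with the special case for LONG_MIN that set uses. All names beginning with sketch_ are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

struct sketch_ap {
    int sign;               /* 1 or -1; zero is always stored with sign 1 */
    int ndigits;            /* digits in use */
    int size;               /* digits allocated */
    unsigned char *digits;  /* base-2^8 digits, least significant first */
};

/* One allocation holds both the structure and its digit array; the digits
   start immediately after the structure, as in mk. */
static struct sketch_ap *sketch_mk(int size) {
    struct sketch_ap *z;

    assert(size > 0);
    z = calloc(1, sizeof *z + (size_t)size);
    if (z == NULL)
        return NULL;
    z->sign = 1;
    z->size = size;
    z->ndigits = 1;
    z->digits = (unsigned char *)(z + 1);
    return z;
}

/* Number of digits in x[0..n-1] once leading zero digits are ignored. */
static int sketch_length(int n, const unsigned char *x) {
    while (n > 1 && x[n-1] == 0)
        n--;
    return n;
}

/* Builds a normalized sign-magnitude value from a long. */
struct sketch_ap *sketch_new(long n) {
    struct sketch_ap *z = sketch_mk((int)sizeof (long));
    unsigned long u;
    int i;

    if (z == NULL)
        return NULL;
    /* Unsigned negation yields |n| even for LONG_MIN, where -n would overflow. */
    u = n < 0 ? 0UL - (unsigned long)n : (unsigned long)n;
    for (i = 0; i < z->size; i++) {
        z->digits[i] = (unsigned char)(u & 0xFF);
        u >>= 8;
    }
    z->sign = n < 0 ? -1 : 1;
    z->ndigits = sketch_length(z->size, z->digits);
    return z;
}
```

Because the structure and its digits share one block, a single call to free releases both, which mirrors the reasoning behind a single FREE in the real implementation. For −300, the sketch stores sign −1 and the two digits 44 and 1.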

An AP_T is deallocated by

⟨functions 335⟩+≡
    void AP_free(T *z) {
        assert(z && *z);
        FREE(*z);
    }

AP_new is the only way to allocate an AP_T, so it is safe for AP_free to "know" that the space for the structure and the digit array were allocated with a single allocation.

18.3.1 Negation and Multiplication

Negation is the easiest arithmetic operation to implement, and it illustrates a recurring problem with a sign-magnitude representation:

⟨functions 335⟩+≡
    T AP_neg(T x) {
        T z;

        assert(x);
        z = mk(x->ndigits);
        memcpy(z->digits, x->digits, x->ndigits);
        z->ndigits = x->ndigits;
        z->sign = iszero(z) ? 1 : -x->sign;
        return z;
    }

⟨macros 337⟩≡
    #define iszero(x) ((x)->ndigits==1 && (x)->digits[0]==0)

Negating x simply copies the value and flips the sign, except when the value is zero. The macro iszero takes advantage of the constraint that AP_Ts are normalized: The value zero has only one digit.

The magnitude of x•y is |x|•|y|, and it might have as many digits as the sum of the number of digits in x and y. The result is positive when x and y have the same sign or when x or y is zero, and negative otherwise. A sign is −1 or 1, so the comparison

⟨x and y have the same sign 338⟩≡
    ((x->sign^y->sign) == 0)

is true when x and y have the same sign and false otherwise. AP_mul calls XP_mul to compute |x|•|y| and computes the sign itself:

⟨functions 335⟩+≡
    T AP_mul(T x, T y) {
        T z;

        assert(x);
        assert(y);
        z = mk(x->ndigits + y->ndigits);
        XP_mul(z->digits, x->ndigits, x->digits, y->ndigits,
            y->digits);
        normalize(z, z->size);
        z->sign = iszero(z)
            || ⟨x and y have the same sign 338⟩ ? 1 : -1;
        return z;
    }

Recall that XP_mul computes z = z + x•y, and that mk initializes z to both a normalized and an unnormalized zero.

18.3.2 Addition and Subtraction

Addition is more complicated, because it may require subtraction, depending on the signs and values of x and y. The following table summarizes the cases.

                y < 0                            y ≥ 0
    x < 0       −(|x| + |y|)                     y − |x|       if y ≥ |x|
                                                 −(|x| − y)    if y < |x|
    x ≥ 0       x − |y|       if x > |y|         x + y
                −(|y| − x)    if x ≤ |y|

|x| + |y| is equivalent to x + y when x and y are nonnegative, so the cases on the diagonal can both be handled by computing |x| + |y| and setting the sign to the sign of x. The result may have one more digit than the longer of x and y.

⟨functions 335⟩+≡
    T AP_add(T x, T y) {
        T z;

        assert(x);
        assert(y);
        if (⟨x and y have the same sign 338⟩) {
            z = add(mk(maxdigits(x,y) + 1), x, y);
            z->sign = iszero(z) ? 1 : x->sign;
        } else
            ⟨set z to x+y when x and y have different signs 340⟩
        return z;
    }

⟨macros 337⟩+≡
    #define maxdigits(x,y) ((x)->ndigits > (y)->ndigits ? \
        (x)->ndigits : (y)->ndigits)

add calls XP_add to do the actual addition:

⟨static functions 335⟩+≡
    static T add(T z, T x, T y) {
        int n = y->ndigits;

        if (x->ndigits < n)
            return add(z, y, x);
        else if (x->ndigits > n) {
            int carry = XP_add(n, z->digits, x->digits,
                y->digits, 0);
            z->digits[z->size-1] = XP_sum(x->ndigits - n,
                &z->digits[n], &x->digits[n], carry);
        } else
            z->digits[n] = XP_add(n, z->digits, x->digits,
                y->digits, 0);
        return normalize(z, z->size);
    }
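The shape of add, in which the longer operand comes first, the shared digits are summed, the carry is propagated through the remaining digits, and a possible extra digit is stored at the end, can be seen in a self-contained form. This is only a sketch that folds the work of XP_add, XP_sum, and normalize into one hypothetical function, sketch_mag_add, operating on base-2^8 digit arrays stored least significant digit first. It is not the AP code.

```c
#include <assert.h>

/* Adds the nx-digit magnitude x to the ny-digit magnitude y. z must have
   room for max(nx, ny) + 1 digits. Returns the number of digits in the
   normalized sum. */
int sketch_mag_add(unsigned char *z, int nx, const unsigned char *x,
                   int ny, const unsigned char *y) {
    unsigned carry = 0;
    int i, n;

    assert(z && x && y && nx > 0 && ny > 0);
    if (nx < ny)                        /* make x the longer operand */
        return sketch_mag_add(z, ny, y, nx, x);
    for (i = 0; i < ny; i++) {          /* digits that both operands have */
        carry += (unsigned)x[i] + y[i];
        z[i] = (unsigned char)(carry & 0xFF);
        carry >>= 8;
    }
    for ( ; i < nx; i++) {              /* remaining digits of x plus the carry */
        carry += x[i];
        z[i] = (unsigned char)(carry & 0xFF);
        carry >>= 8;
    }
    z[nx] = (unsigned char)carry;       /* the possible extra digit */
    for (n = nx + 1; n > 1 && z[n-1] == 0; n--)
        ;                               /* drop leading zero digits */
    return n;
}
```

Adding the two-digit value 65,535 to the one-digit value 1 produces the three digits 0, 0, 1, which is the case that needs the extra digit. Adding 1 and 2 produces a single digit after normalization.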

The first test in add ensures that x is the longer operand. If x is longer than y, XP_add computes the n-digit sum in z->digits[0..n-1] and returns the carry. The sum of this carry and x->digits[n..x->ndigits-1] becomes z->digits[n..z->size-1]. If x and y have the same number of digits, XP_add computes the n-digit sum as in the previous case, and the carry is z's most significant digit.

The other addition cases can be simplified, too. When x < 0, y ≥ 0, and |x| > |y|, the magnitude of x + y is |x| − |y|, and the sign is negative. When x ≥ 0, y < 0, and |x| > |y|, the magnitude of x + y is also |x| − |y|, but the sign is positive. In both cases, the sign of the result is the same as the sign of x. sub, described below, does the subtraction, and cmp compares |x| and |y|. The result may have as many digits as x.

⟨set z to x+y when x and y have different signs 340⟩≡
    if (cmp(x, y) > 0) {
        z = sub(mk(x->ndigits), x, y);
        z->sign = iszero(z) ? 1 : x->sign;
    }

When x < 0, y ≥ 0, and |x| ≤ |y|, the magnitude of x + y is |y| − |x|, and the sign is positive. When x ≥ 0, y < 0, and |x| ≤ |y|, the magnitude of x + y is also |y| − |x|, but the sign is negative. In both of these cases, the sign of the result is the opposite of the sign of x, and it may have as many digits as y.

⟨set z to x+y when x and y have different signs 340⟩+≡
    else {
        z = sub(mk(y->ndigits), y, x);
        z->sign = iszero(z) ? 1 : -x->sign;
    }

Subtraction benefits from a similar analysis. The following table lays out the cases.

            y < 0                           y ≥ 0
    x < 0   −(|x| − |y|)  if |x| > |y|      −(|x| + |y|)
            |y| − |x|     if |x| ≤ |y|
    x ≥ 0   |x| + |y|                       |x| − |y|     if |x| > |y|
                                            −(|y| − |x|)  if |x| ≤ |y|

Here, the off-diagonal cases are the easy ones, and both can be handled by computing |x| + |y| and setting the sign of the result to the sign of x:

⟨functions 335⟩+≡
T AP_sub(T x, T y) {
    T z;

    assert(x);
    assert(y);
    if (!⟨x and y have the same sign 338⟩) {
        z = add(mk(maxdigits(x,y) + 1), x, y);
        z->sign = iszero(z) ? 1 : x->sign;
    } else
        ⟨set z to x−y when x and y have the same sign 341⟩
    return z;
}

The diagonal cases depend on the relative values of x and y. When |x| > |y|, the magnitude of x − y is |x| − |y| and the sign is the same as the sign of x; when |x| ≤ |y|, the magnitude of x − y is |y| − |x| and the sign is the opposite of the sign of x.

⟨set z to x−y when x and y have the same sign 341⟩≡
    if (cmp(x, y) > 0) {
        z = sub(mk(x->ndigits), x, y);
        z->sign = iszero(z) ? 1 : x->sign;
    } else {
        z = sub(mk(y->ndigits), y, x);
        z->sign = iszero(z) ? 1 : -x->sign;
    }

Like add, sub calls the XP functions to implement subtraction; y never exceeds x.

⟨static functions 335⟩+≡
static T sub(T z, T x, T y) {
    int borrow, n = y->ndigits;

    borrow = XP_sub(n, z->digits, x->digits, y->digits, 0);
    if (x->ndigits > n)

        borrow = XP_diff(x->ndigits - n, &z->digits[n],
            &x->digits[n], borrow);
    assert(borrow == 0);
    return normalize(z, z->size);
}

When x is longer than y, the call to XP_sub computes the n-digit difference in z->digits[0..n-1] and returns the borrow. The difference between this borrow and x->digits[n..x->ndigits-1] becomes z->digits[n..z->size-1], and the final borrow is zero because |x| ≥ |y| in all calls to sub. If x and y have the same number of digits, XP_sub computes the n-digit difference as in the previous case, but there is no borrow to propagate.

18.3.3 Division

Division is like multiplication, but is complicated by the truncation rules. When x and y have the same sign, the quotient is |x|/|y| and is positive, and the remainder is |x| mod |y|. When x and y have different signs, the quotient is negative; its magnitude is |x|/|y| when |x| mod |y| is zero and |x|/|y| + 1 when |x| mod |y| is nonzero. The remainder is |x| mod |y| when that value is zero and |y| − (|x| mod |y|) when |x| mod |y| is nonzero. The remainder is thus always positive. The quotient and remainder might have as many digits as x and y, respectively.

⟨functions 335⟩+≡
T AP_div(T x, T y) {
    T q, r;

    ⟨q ← x/y, r ← x mod y 343⟩
    if (!⟨x and y have the same sign 338⟩ && !iszero(r)) {
        int carry = XP_sum(q->size, q->digits, q->digits, 1);
        assert(carry == 0);
        normalize(q, q->size);
    }
    AP_free(&r);
    return q;
}
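The truncation rules for division can be tried out on ordinary longs. The sketch below applies the rules exactly as stated above to the magnitudes |x| and |y|, so its remainder is never negative. The type DivResult and the function ap_style_divmod are hypothetical names chosen for this illustration, and the sketch assumes y is nonzero and that the results fit in a long.

```c
/* Hypothetical result pair for the sketch. */
typedef struct { long quot; long rem; } DivResult;

/* Quotient and remainder computed from magnitudes, following the
   rules in the text. Assumes y != 0. */
static DivResult ap_style_divmod(long x, long y) {
    unsigned long ax = x < 0 ? 0UL - (unsigned long)x : (unsigned long)x;
    unsigned long ay = y < 0 ? 0UL - (unsigned long)y : (unsigned long)y;
    unsigned long q = ax / ay;   /* |x| / |y|   */
    unsigned long r = ax % ay;   /* |x| mod |y| */
    DivResult d;

    if ((x < 0) == (y < 0)) {    /* same sign: quotient is positive */
        d.quot = (long)q;
        d.rem = (long)r;
    } else {                     /* different signs: quotient is negative */
        if (r != 0) {
            q += 1;              /* magnitude grows by one ...          */
            r = ay - r;          /* ... and remainder becomes |y| - r   */
        }
        d.quot = -(long)q;
        d.rem = (long)r;
    }
    return d;
}
```

With x = -7 and y = 2, this gives a quotient of -4 and a remainder of 1, whereas C's built-in / and % truncate toward zero and give -3 and -1.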

⟨q ← x/y, r ← x mod y 343⟩≡
    assert(x);
    assert(y);
    assert(!iszero(y));
    q = mk(x->ndigits);
    r = mk(y->ndigits);
    {
        XP_T tmp = ALLOC(x->ndigits + y->ndigits + 2);
        XP_div(x->ndigits, q->digits, x->digits,
            y->ndigits, y->digits, r->digits, tmp);
        FREE(tmp);
    }
    normalize(q, q->size);
    normalize(r, r->size);
    q->sign = iszero(q)
        || ⟨x and y have the same sign 338⟩ ? 1 : -1;

AP_div doesn't bother adjusting the remainder when x and y have different signs because it discards the remainder. AP_mod does just the opposite: It adjusts only the remainder and discards the quotient.

⟨functions 335⟩+≡
T AP_mod(T x, T y) {
    T q, r;

    ⟨q ← x/y, r ← x mod y 343⟩
    if (!⟨x and y have the same sign 338⟩ && !iszero(r)) {
        int borrow = XP_sub(r->size, r->digits,
            y->digits, r->digits, 0);
        assert(borrow == 0);
        normalize(r, r->size);
    }
    AP_free(&q);
    return r;
}

18.3.4 Exponentiation

AP_pow returns x^y when p, the third argument, is null. When p is nonnull, AP_pow returns (x^y) mod p.

⟨functions 335⟩+≡
T AP_pow(T x, T y, T p) {
    T z;

    assert(x);
    assert(y);
    assert(y->sign == 1);
    assert(!p || p->sign==1 && !iszero(p) && !isone(p));
    ⟨special cases 344⟩
    if (p)
        ⟨z ← x^y mod p 346⟩
    else
        ⟨z ← x^y 345⟩
    return z;
}

⟨macros 337⟩+≡
#define isone(x) ((x)->ndigits==1 && (x)->digits[0]==1)

To compute z = x^y, it's tempting to set z to one and multiply it by x, y times. The problem is that if y is big, say, 200 decimal digits, this approach takes much longer than the age of the universe. Mathematical rules help simplify the computation:

    z = (x^(y/2))^2 = (x^(y/2))(x^(y/2))         if y is even
    z = x • x^(y−1) = (x^(y/2))(x^(y/2)) x       otherwise

These rules permit x^y to be computed by calling AP_pow recursively and multiplying and squaring the result. The depth of the recursion (and hence the number of operations) is proportional to lg y. The recursion bottoms out when x or y is zero or one, because 0^y = 0, 1^y = 1, x^0 = 1, and x^1 = x. The first three of these special cases are handled by

⟨special cases 344⟩≡
    if (iszero(x))
        return AP_new(0);
    if (iszero(y))
        return AP_new(1);
    if (isone(x))
        return AP_new(⟨y is even 345⟩ ? 1 : x->sign);
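The squaring rules are easy to see at work on fixed-width unsigned integers. What follows is a hedged sketch, not the AP code: ipow is a hypothetical function, it tests the special cases in the same order as the chunk above (so a zero base yields zero even when the exponent is zero), and it overflows silently once the result no longer fits in an unsigned long long.

```c
/* Hypothetical fixed-width analogue of the recursive rules for x^y. */
static unsigned long long ipow(unsigned long long x, unsigned long long y) {
    unsigned long long t;

    if (x == 0) return 0;   /* 0^y = 0 (checked first, as in the text) */
    if (y == 0) return 1;   /* x^0 = 1 */
    if (x == 1) return 1;   /* 1^y = 1 */
    if (y == 1) return x;   /* x^1 = x */

    t = ipow(x, y >> 1);    /* x^(y/2), with y/2 rounded down */
    t = t * t;              /* (x^(y/2))(x^(y/2)) */
    if (y & 1)              /* y is odd: one more factor of x */
        t *= x;
    return t;
}
```

Computing ipow(3, 13) makes only three recursive calls (for exponents 6, 3, and 1) instead of performing twelve successive multiplications by 3.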

⟨y is even 345⟩≡
    (((y)->digits[0]&1) == 0)

The recursive case implements the fourth special case as well as the two cases described by the equation above:

⟨z ← x^y 345⟩≡
    if (isone(y))
        z = AP_addi(x, 0);
    else {
        T y2 = AP_rshift(y, 1), t = AP_pow(x, y2, NULL);
        z = AP_mul(t, t);
        AP_free(&y2);
        AP_free(&t);
        if (!⟨y is even 345⟩) {
            z = AP_mul(x, t = z);
            AP_free(&t);
        }
    }

y is positive, so shifting it right one bit computes y/2. The intermediate results (y/2, x^(y/2), and (x^(y/2))(x^(y/2))) are deallocated to avoid creating unreachable storage.

When p is nonnull, AP_pow computes x^y mod p. When p > 1, we can't actually compute x^y because it might be too big; for example, if x is 10 decimal digits and y is 200, x^y has more digits than atoms in the universe; x^y mod p, however, is a much smaller number. The following mathematical rule about modular multiplication can be used to avoid numbers that are too big:

    (x•y) mod p = ((x mod p)•(y mod p)) mod p.

AP_mod and the static function mulmod collaborate to implement this rule. mulmod uses AP_mod and AP_mul to implement x•y mod p, taking care to deallocate the temporary product x•y.

⟨static functions 335⟩+≡
static T mulmod(T x, T y, T p) {
    T z, xy = AP_mul(x, y);

    z = AP_mod(xy, p);
    AP_free(&xy);
    return z;
}

The AP_pow code when p is nonnull is nearly identical to the easier case when p is null, except that mulmod is called for the multiplications, p is passed to the recursive call to AP_pow, and x is reduced mod p when y is odd.

⟨z ← x^y mod p 346⟩≡
    if (isone(y))
        z = AP_mod(x, p);
    else {
        T y2 = AP_rshift(y, 1), t = AP_pow(x, y2, p);
        z = mulmod(t, t, p);
        AP_free(&y2);
        AP_free(&t);
        if (!⟨y is even 345⟩) {
            z = mulmod(y2 = AP_mod(x, p), t = z, p);
            AP_free(&y2);
            AP_free(&t);
        }
    }

18.3.5 Comparisons

The outcome of comparing x and y depends on their signs and magnitudes. AP_cmp returns a value less than zero, equal to zero, or greater than zero when x < y, x = y, or x > y. When x and y have different signs, AP_cmp can simply return the sign of x; otherwise, it must compare their magnitudes:

⟨functions 335⟩+≡
int AP_cmp(T x, T y) {
    assert(x);
    assert(y);
    if (!⟨x and y have the same sign 338⟩)
        return x->sign;
    else if (x->sign == 1)
        return cmp(x, y);

    else
        return cmp(y, x);
}

When x and y are positive, x < y if |x| < |y|, and so on. When x and y are negative, however, x < y if |x| > |y|, which is why the arguments are reversed in the second call to cmp. XP_cmp does the actual comparison, after cmp checks for operands of different lengths:

⟨static functions 335⟩+≡
static int cmp(T x, T y) {
    if (x->ndigits != y->ndigits)
        return x->ndigits - y->ndigits;
    else
        return XP_cmp(x->ndigits, x->digits, y->digits);
}

⟨prototypes 336⟩+≡
static int cmp(T x, T y);

18.3.6 Convenience Functions

The six convenience functions take an AP_T as their first argument and a signed long as their second. Each initializes a temporary AP_T by passing the long to set, then calls the more general operation. AP_addi illustrates this approach:

⟨functions 335⟩+≡
T AP_addi(T x, long int y) {
    ⟨declare and initialize t 347⟩
    return AP_add(x, set(&t, y));
}

⟨declare and initialize t 347⟩≡
    unsigned char d[sizeof (unsigned long)];
    struct T t;
    t.size = sizeof d;
    t.digits = d;

The second chunk above allocates the temporary AP_T and its associated digits array on the stack by declaring the appropriate locals. This is possible because the size of the digits array is bounded by the number of bytes in an unsigned long.

Four of the remaining convenience functions have the same pattern:

⟨functions 335⟩+≡
T AP_subi(T x, long int y) {
    ⟨declare and initialize t 347⟩
    return AP_sub(x, set(&t, y));
}

T AP_muli(T x, long int y) {
    ⟨declare and initialize t 347⟩
    return AP_mul(x, set(&t, y));
}

T AP_divi(T x, long int y) {
    ⟨declare and initialize t 347⟩
    return AP_div(x, set(&t, y));
}

int AP_cmpi(T x, long int y) {
    ⟨declare and initialize t 347⟩
    return AP_cmp(x, set(&t, y));
}

AP_modi is the oddball, because it returns a long instead of an AP_T or int, and because it must discard the AP_T returned by AP_mod.

⟨functions 335⟩+≡
long int AP_modi(T x, long int y) {
    long int rem;
    T r;

    ⟨declare and initialize t 347⟩
    r = AP_mod(x, set(&t, y));
    rem = XP_toint(r->ndigits, r->digits);
    AP_free(&r);
    return rem;
}

18.3.7 Shifting

The two shift functions call their XP relatives to shift their operands. For AP_lshift, the result has ⌈s/8⌉ more digits than the operand and the same sign as the operand.

⟨functions 335⟩+≡
T AP_lshift(T x, int s) {
    T z;

    assert(x);
    assert(s >= 0);
    z = mk(x->ndigits + ((s+7)&~7)/8);
    XP_lshift(z->size, z->digits, x->ndigits, x->digits, s, 0);
    z->sign = x->sign;
    return normalize(z, z->size);
}

For AP_rshift, the result has ⌊s/8⌋ fewer bytes, and it is possible that the result is zero, in which case its sign must be positive.

T AP_rshift(T x, int s) {
    assert(x);
    assert(s >= 0);
    if (s >= 8*x->ndigits)
        return AP_new(0);
    else {
        T z = mk(x->ndigits - s/8);
        XP_rshift(z->size, z->digits, x->ndigits, x->digits, s, 0);
        normalize(z, z->size);
        z->sign = iszero(z) ? 1 : x->sign;
        return z;
    }
}

The if statement handles the case when s specifies a shift amount greater than or equal to the number of bits in x.
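The size arithmetic in the two shift functions can be checked separately from the shifting itself. This is a small sketch with hypothetical helper names (lshift_size and rshift_size are not part of AP); it reproduces only the digit counts passed to mk, with one digit standing for the zero result that AP_rshift returns when every bit is shifted out.

```c
/* Digits allocated for x << s: ndigits plus ceil(s/8),
   computed by rounding s up to a multiple of 8. */
static int lshift_size(int ndigits, int s) {
    return ndigits + ((s + 7) & ~7) / 8;
}

/* Digits allocated for x >> s: ndigits minus floor(s/8),
   or a single digit when all bits are shifted out. */
static int rshift_size(int ndigits, int s) {
    if (s >= 8 * ndigits)
        return 1;           /* the result is zero */
    return ndigits - s / 8;
}
```

Shifting a 4-digit value left by 9 bits therefore reserves 6 digits, and shifting it right by 12 bits reserves 3; normalize then trims any leading zero digits.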

18.3.8 String and Integer Conversions

AP_toint(x) returns a long int with the same sign as x and with a magnitude equal to |x| mod (LONG_MAX+1).

⟨functions 335⟩+≡
long int AP_toint(T x) {
    unsigned long u;

    assert(x);
    u = XP_toint(x->ndigits, x->digits)%(LONG_MAX + 1UL);
    if (x->sign == -1)
        return -(long)u;
    else
        return (long)u;
}

The rest of the AP functions convert AP_Ts to strings and vice versa. AP_fromstr converts a string to an AP_T; it accepts a signed number with the following syntax.

    number = { white } [ - | + ] { white } digit { digit }

where white denotes a white-space character and digit is a digit character in the specified base, which must be from two to 36 inclusive. For bases that exceed 10, letters specify the digits that exceed nine. AP_fromstr calls XP_fromstr, and it stops scanning its string argument when it encounters an illegal character or the null character.

⟨functions 335⟩+≡
T AP_fromstr(const char *str, int base, char **end) {
    T z;
    const char *p = str;
    char *endp, sign = '\0';
    int carry;

    assert(p);
    assert(base >= 2 && base <= 36);
    while (*p && isspace(*p))
        p++;
    if (*p == '-' || *p == '+')

        sign = *p++;
    ⟨z ← 0 351⟩
    carry = XP_fromstr(z->size, z->digits, p, base, &endp);
    assert(carry == 0);
    normalize(z, z->size);
    if (endp == p) {
        endp = (char *)str;
        z = AP_new(0);
    } else
        z->sign = iszero(z) || sign != '-' ? 1 : -1;
    if (end)
        *end = (char *)endp;
    return z;
}

AP_fromstr passes the address of endp to XP_fromstr because it needs to know what terminated the scan so it can check for illegal inputs. If end is nonnull, AP_fromstr sets *end to endp.

The number of bits in z is n•lg base where n is the number of digits in the string, and thus z's XP_T must have a digits array of at least m = (n•lg base)/8 bytes. Suppose that base is 2^k; then m = n•lg(2^k)/8 = k•n/8. Thus, if we choose k so that 2^k is the smallest power of two equal to or greater than base, z needs ⌈k•n/8⌉ digits. k is a conservative estimate of the number of bits each digit in base represents. For example, when base is 10, each digit carries lg 10 ≈ 3.32 bits, and k is four. k ranges from one, when base is two, to six, when base is 36.

⟨z ← 0 351⟩≡
    {
        const char *start;
        int k, n = 0;
        for ( ; *p == '0' && p[1] == '0'; p++)
            ;
        start = p;
        for ( ; ⟨*p is a digit in base 352⟩; p++)
            n++;
        for (k = 1; (1<<k) < base; k++)
            ;
        z = mk(((k*n + 7)&~7)/8);

        p = start;
    }

⟨*p is a digit in base 352⟩≡
    (  '0' <= *p && *p <= '9' && *p < '0' + base
    || 'a' <= *p && *p <= 'z' && *p < 'a' + base - 10
    || 'A' <= *p && *p <= 'Z' && *p < 'A' + base - 10)

The first for loop in ⟨z ← 0 351⟩ skips over leading zeros.

AP_tostr can use a similar trick to approximate the number of characters, n, needed for the string representation of x in base. The number of digits in x's digits array is m = (n•lg base)/8. If we choose k so that 2^k is the largest power of two less than or equal to base, then m = n•lg(2^k)/8 = k•n/8, and n is ⌈8•m/k⌉ plus one for the terminating null character. Here, k underestimates the number of bits each digit in base represents so that n will be a conservative estimate of the number of digits required. For example, when base is 10, the digits in x each yield 8/lg 10 ≈ 2.41 decimal digits, and k is three, so space for ⌈8/3⌉ = 3 decimal digits is allocated for each digit in x. k ranges from five, when base is 36, to one, when base is two.

⟨size ← number of characters in str 352⟩≡
    {
        int k;
        for (k = 5; (1<<k) > base; k--)
            ;
        size = (8*x->ndigits)/k + 1 + 1;
        if (x->sign == -1)
            size++;
    }

AP_tostr lets XP_tostr compute the string representation of x:

⟨functions 335⟩+≡
char *AP_tostr(char *str, int size, int base, T x) {
    XP_T q;

    assert(x);
    assert(base >= 2 && base <= 36);
    assert(str == NULL || size > 1);
    if (str == NULL) {

        ⟨size ← number of characters in str 352⟩
        str = ALLOC(size);
    }
    q = ALLOC(x->ndigits);
    memcpy(q, x->digits, x->ndigits);
    if (x->sign == -1) {
        str[0] = '-';
        XP_tostr(str + 1, size - 1, base, x->ndigits, q);
    } else
        XP_tostr(str, size, base, x->ndigits, q);
    FREE(q);
    return str;
}

The last AP function is AP_fmt, which is a Fmt-style conversion function for printing AP_Ts. It uses AP_tostr to format the value in decimal and calls Fmt_putd to emit it.

⟨functions 335⟩+≡
void AP_fmt(int code, va_list *app,
    int put(int c, void *cl), void *cl,
    unsigned char flags[], int width, int precision) {
    T x;
    char *buf;

    assert(app && flags);
    x = va_arg(*app, T);
    assert(x);
    buf = AP_tostr(NULL, 0, 10, x);
    Fmt_putd(buf, strlen(buf), put, cl, flags, width, precision);
    FREE(buf);
}

Further Reading

AP_Ts are similar to the "bignums" in some programming languages. Recent versions of Icon, for example, have only one integer type, but use arbitrary-precision arithmetic as necessary to represent the values computed. Programmers don't need to distinguish between machine integers and arbitrary-precision integers.

Facilities for arbitrary-precision arithmetic are often provided as a standard library or package. LISP systems have long included bignum packages, for example, and there's a similar package for ML.

Most symbolic arithmetic systems do arbitrary-precision arithmetic, because that's their purpose. Mathematica (Wolfram 1988), for example, provides integers of arbitrary length and rationals in which the numerator and denominator are both arbitrary-length integers. Maple V (Char et al. 1992), another symbolic computation system, has similar facilities.

Exercises

18.1 AP_div and AP_mod allocate and deallocate temporary space every time they're called. Revise them so that they share tmp, allocate it once, keep track of its size, and expand it when necessary.

18.2 The recursive algorithm used in AP_pow is equivalent to the familiar iterative algorithm that computes z = x^y by repeatedly squaring and multiplying (see Section 4.6.3 of Knuth 1981):

    z ← x, u ← 1
    while y > 1 do
        if y is odd then u ← u•z
        z ← z^2
        y ← y/2
    z ← u•z

Iteration is usually faster than recursion, but the real advantage of this approach is that it allocates less space for intermediate values. Reimplement AP_pow using this algorithm and measure the time and space improvements. How large do x and y have to be before this algorithm is noticeably better than the recursive one?

18.3 Implement AP_ceil(AP_T x, AP_T y) and AP_floor(AP_T x, AP_T y), which return the ceiling and floor of x/y. Be sure to specify what happens when x and y have different signs.

18.4 The AP interface is "noisy": there are lots of parameters and it is easy to confuse the input and output parameters. Design and

implement a new interface that uses a Seq_T as a stack from which the functions fetch operands and store results. Focus on making the interface as clean as possible, but don't omit important functionality.

18.5 Implement an AP function that generates random numbers, uniformly distributed in a specified range.

18.6 Design an interface whose functions do arithmetic modulo n for an arbitrary n, and thus accept and return values in the set of integers from zero to n−1. Be careful about division: It's defined only when this set is a finite field, which is when n is a prime.

18.7 Multiplying two n-digit numbers takes time proportional to n^2 (see page 308). A. Karatsuba showed (in 1962) how to multiply in time proportional to n^1.58 (see Section 4.3 in Geddes, Czapor, and Labahn 1992 and Section 4.3.3 in Knuth 1981). An n-digit number x can be split into a sum of its most significant and least significant n/2 bits; that is, x = aB^(n/2) + b. Thus, the product xy can be written as

    xy = (aB^(n/2) + b)(cB^(n/2) + d) = acB^n + (ad + bc)B^(n/2) + bd,

which takes four multiplications and one addition. The coefficient of the middle term can be rewritten as

    ad + bc = ac + bd + (a − b)(d − c).

The product xy thus requires only three multiplications (ac, bd, and (a − b)(d − c)), two subtractions, and two additions. When n is large, saving one n/2-digit multiplication reduces the execution time of multiplication at the expense of space for the intermediate values. Implement a recursive version of AP_mul that uses Karatsuba's algorithm, and determine for what value of n it is noticeably faster than the naive algorithm. Use XP_mul for the intermediate computations.
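The three-multiplication identity in Exercise 18.7 can be sanity-checked on machine words before tackling the recursive AP version. The sketch below is only a one-level illustration under stated assumptions: karatsuba32 is a hypothetical function, the base B is taken to be 2^16, and each operand is treated as a two-digit number so that B^(n/2) = B. It is not a solution to the exercise, which asks for a recursive version built on XP_mul.

```c
#include <stdint.h>

/* One level of Karatsuba on 32-bit operands split into 16-bit halves.
   Signed 64-bit intermediates are used because (a - b) and (d - c)
   may be negative. */
static uint64_t karatsuba32(uint32_t x, uint32_t y) {
    int64_t a = x >> 16, b = x & 0xFFFF;    /* x = a*B + b */
    int64_t c = y >> 16, d = y & 0xFFFF;    /* y = c*B + d */
    int64_t ac = a * c;                     /* multiplication 1 */
    int64_t bd = b * d;                     /* multiplication 2 */
    int64_t mid = ac + bd + (a - b) * (d - c);  /* multiplication 3; equals ad + bc */

    /* xy = ac*B^2 + (ad + bc)*B + bd */
    return ((uint64_t)ac << 32) + ((uint64_t)mid << 16) + (uint64_t)bd;
}
```

The result can be compared against the direct product (uint64_t)x * y for any pair of inputs; the two agree because the middle coefficient is algebraically equal to ad + bc.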


19 MULTIPLE-PRECISION ARITHMETIC

The last of the three arithmetic interfaces, MP, exports functions that implement multiple-precision arithmetic on unsigned and two's-complement integers. Like XP, MP reveals its representation for n-bit integers, and the MP functions operate on integers of a given size. Unlike XP, the lengths of MP's integers are given in bits, and MP's functions implement both signed and unsigned arithmetic. Like the AP functions, the MP functions enforce the usual suite of checked runtime errors.

MP is intended for applications that need extended-precision arithmetic, but want finer control over allocations, need both unsigned and signed operations, or must mimic two's-complement n-bit arithmetic. Examples include compilers and applications that use encryption. Some modern encryption algorithms involve manipulating fixed-precision integers with hundreds of digits.

Some compilers must use multiple-precision integers. A cross-compiler runs on platform X and generates code for platform Y. If Y has integers bigger than X, the compiler can use MP to manipulate Y-sized integers. Also, compilers must use multiple-precision arithmetic to convert floating-point constants to the closest floating-point values they specify.

19.1 Interface

The MP interface is large (49 functions and two exceptions) because it exports a complete suite of arithmetic functions on n-bit signed and unsigned integers.

⟨mp.h⟩≡
#ifndef MP_INCLUDED
#define MP_INCLUDED
#include <stdarg.h>
#include <stddef.h>
#include "except.h"

#define T MP_T
typedef unsigned char *T;

⟨exported exceptions 359⟩
⟨exported functions 358⟩

#undef T
#endif

Like XP, MP reveals that an n-bit integer is represented by ⌈n/8⌉ bytes, stored least significant byte first. MP uses the two's-complement representation for signed integers; bit n − 1 is the sign bit.

Unlike the XP functions, the MP functions implement the usual checked runtime errors; for example, it is a checked runtime error to pass a null MP_T to any function in this interface. However, it is an unchecked runtime error to pass an MP_T that is too small to hold an n-bit integer.

MP is initialized automatically to do arithmetic on 32-bit integers. Calling

⟨exported functions 358⟩≡
extern int MP_set(int n);

changes MP so that subsequent calls do n-bit arithmetic. MP_set returns the previous size. It is a checked runtime error for n to be less than two. Once initialized, most applications use only one size of extended integer. For example, a cross-compiler might manipulate constants using 128-bit

reproduction and/or distribution are strictly prohibited and violate applicable laws. Omitting n is the obvious simplification. which allocates an MP_T of the appropriate size. MP_new and MP_fromintu raise MP_Overflow n when u exceeds 2 – 1. MP_set(8). MP_fromintu(z. MP_fromint. All rights reserved. and MP_fromint raises MP_Overflow when v is n–1 n–1 less than – 2 or exceeds 2 – 1. Frank Liu Copyright © 1997 by David R. and MP_fromintu raise exported exceptions 359 ≡ extern const Except_T MP_Overflow. unsigned long u). 0xFFF). For example. and returns it. long v). it simplifies the use of the other MP functions. Clients can use a TRY-EXCEPT statement to ignore the exception when that is the appropriate action. One of those is exported functions 358 +≡ extern T MP_new(unsigned long u). Any other use requires prior written consent from the copyright owner. C Interfaces and Implementations: Techniques for Creating Reusable Software. but a more important simplification is that there are no restrictions on the source and destination arguments: The same MP_T can always appear as both a source and a destination. All of the MP functions compute their results before they raise an exception. C Interfaces and Implementations: Techniques for Creating Reusabl Prepared for frliu@microsoft.INTERFACE 359 arithmetic. Hanson. Unauthorized use. This download file is made available for personal use only and is subject to the Terms of Service. z = MP_new(0). For example. exported functions 358 +≡ extern T MP_fromint (T z. MP_T z. initializes it to u. Licensed by Frank Liu 1740749 set z to v or u and return z. extern T MP_fromintu(T z. The extraneous bits are simply discarded. Eliminating these restrictions is possible because the temporary space needed by some of the functions depends only on n and thus can be allocated once by MP_set.com. but only four of the other 48 MP functions do allocations. This design also avoids allocations. if u or v don’t fit in n bits. MP_new. . 
This design caters to these kinds of applications. sets z to 0xFF and raises MP_Overflow. MP_set can raise Mem_Failed.. and simplifies their argument lists as well.

    MP_T z;
    MP_set(8);
    z = MP_new(0);
    TRY
        MP_fromintu(z, 0xFFF);
    EXCEPT(MP_Overflow) ;
    END_TRY;

sets z to 0xFF and discards the overflow exception. This convention does not apply to

⟨exported functions 358⟩+≡
    extern long MP_toint (T x);
    extern unsigned long MP_tointu(T x);

which return the value of x as a signed or unsigned long. These functions raise MP_Overflow when x doesn’t fit in the return type, and there’s no way to capture the result when an exception occurs. Clients can use

⟨exported functions 358⟩+≡
    extern T MP_cvt (int m, T z, T x);
    extern T MP_cvtu(int m, T z, T x);

to convert x to an MP_T of the appropriate size. MP_cvt and MP_cvtu convert x to an m-bit signed or unsigned MP_T in z and return z. They raise MP_Overflow when x doesn’t fit in the m-bit destination, but they set z before doing so. Thus,

    unsigned char z[sizeof (unsigned)];
    TRY
        MP_cvtu(8*sizeof (unsigned), z, x);
    EXCEPT(MP_Overflow) ;
    END_TRY;

sets z to the least significant 8•sizeof (unsigned) bits from x regardless of the size of x.

When m exceeds the number of bits in x, MP_cvtu extends the result with zeros, and MP_cvt extends the result with x’s sign bit. It is a checked runtime error for m to be less than two, and it is an unchecked runtime error for z to be too small to hold an m-bit integer.

The arithmetic functions are

⟨exported functions 358⟩+≡
    extern T MP_addu(T z, T x, T y);
    extern T MP_subu(T z, T x, T y);
    extern T MP_mulu(T z, T x, T y);
    extern T MP_divu(T z, T x, T y);
    extern T MP_modu(T z, T x, T y);

    extern T MP_add (T z, T x, T y);
    extern T MP_sub (T z, T x, T y);
    extern T MP_mul (T z, T x, T y);
    extern T MP_div (T z, T x, T y);
    extern T MP_mod (T z, T x, T y);

    extern T MP_neg (T z, T x);

Those with names ending in u do unsigned arithmetic; the others do two’s-complement signed arithmetic. Overflow semantics are the only difference between the unsigned and signed operations, as detailed below. MP_add, MP_sub, MP_mul, MP_div, and MP_mod and their unsigned counterparts compute z = x + y, z = x − y, z = x•y, z = x/y, and z = x mod y, respectively, and return z. In these formulas, x, y, and z denote the values of the corresponding arguments. MP_neg sets z to the negative of x and returns z. If x and y have different signs, MP_div and MP_mod truncate toward minus infinity; thus x mod y is always positive.

MP_div, MP_divu, MP_mod, and MP_modu raise

⟨exported exceptions 359⟩+≡
    extern const Except_T MP_Dividebyzero;

when y is zero. All these functions, except MP_divu and MP_modu, raise MP_Overflow if the result does not fit. MP_subu raises MP_Overflow when x < y, and MP_sub raises MP_Overflow when x and y have different signs and the sign of the result is different from x’s sign.

⟨exported functions 358⟩+≡
    extern T MP_mul2u(T z, T x, T y);
    extern T MP_mul2 (T z, T x, T y);

return double-length products: They both compute z = x•y, where z has 2n bits, and return z. Thus, the result cannot overflow. It is an unchecked runtime error for z to be too small to hold 2n bits. Note that since z must accommodate 2n bits, it cannot be allocated by MP_new.

The convenience functions accept an immediate unsigned long or long for their second operand:

⟨exported functions 358⟩+≡
    extern T MP_addui(T z, T x, unsigned long y);
    extern T MP_subui(T z, T x, unsigned long y);
    extern T MP_mului(T z, T x, unsigned long y);
    extern T MP_divui(T z, T x, unsigned long y);

    extern T MP_addi (T z, T x, long y);
    extern T MP_subi (T z, T x, long y);
    extern T MP_muli (T z, T x, long y);
    extern T MP_divi (T z, T x, long y);

    extern unsigned long MP_modui(T x, unsigned long y);
    extern long MP_modi (T x, long y);

These functions are equivalent to their more general counterparts when their second operands are initialized to y, and they raise similar exceptions. For example,

    MP_T z, x;
    long y;
    MP_muli(z, x, y);

is equivalent to

    MP_T z, x;
    long y;
    {
        MP_T t = MP_new(0);
        int overflow = 0;
        TRY
            MP_fromint(t, y);
        EXCEPT(MP_Overflow)
            overflow = 1;
        END_TRY;
        MP_mul(z, x, t);
        if (overflow)
            RAISE(MP_Overflow);
    }

The convenience functions do no allocations, however. Notice that these convenience functions, including MP_divui and MP_modui, raise MP_Overflow if y is too big, but they do so after computing z.

⟨exported functions 358⟩+≡
    extern int MP_cmp  (T x, T y);
    extern int MP_cmpi (T x, long y);
    extern int MP_cmpu (T x, T y);
    extern int MP_cmpui(T x, unsigned long y);

compare x and y and return a value less than zero, equal to zero, or greater than zero, if, respectively, x < y, x = y, or x > y. MP_cmpi and MP_cmpui don’t insist that y fit in an MP_T; they simply compare x and y.

The following functions treat their input MP_Ts as strings of n bits:

⟨exported functions 358⟩+≡
    extern T MP_and (T z, T x, T y);
    extern T MP_or  (T z, T x, T y);
    extern T MP_xor (T z, T x, T y);
    extern T MP_not (T z, T x);

    extern T MP_andi(T z, T x, unsigned long y);
    extern T MP_ori (T z, T x, unsigned long y);
    extern T MP_xori(T z, T x, unsigned long y);

MP_and, MP_or, MP_xor and their immediate counterparts set z to the bitwise AND, inclusive OR, and exclusive OR of x and y and return z. MP_not sets z to the one’s complement of x and returns z. These functions never raise exceptions, and the convenience variants ignore the overflow that would usually occur when y is too big. For example,

    MP_T z, x;
    unsigned long y;
    MP_andi(z, x, y);

is equivalent to

    MP_T z, x;
    unsigned long y;
    {
        MP_T t = MP_new(0);
        TRY
            MP_fromintu(t, y);
        EXCEPT(MP_Overflow) ;
        END_TRY;
        MP_and(z, x, t);
    }

None of these functions do any allocations, however.

The three shift functions

⟨exported functions 358⟩+≡
    extern T MP_lshift(T z, T x, int s);
    extern T MP_rshift(T z, T x, int s);
    extern T MP_ashift(T z, T x, int s);

implement logical and arithmetic shifts. MP_lshift sets z to x shifted left s bits, and MP_rshift sets z to x shifted right s bits. Both functions fill the vacated bits with zeros and return z. MP_ashift is like MP_rshift, but the vacated bits are filled with x’s sign bit. It is a checked runtime error for s to be negative.

The following functions convert between MP_Ts and strings.

⟨exported functions 358⟩+≡
    extern T MP_fromstr(T z, const char *str,
        int base, char **end);
    extern char *MP_tostr (char *str, int size,
        int base, T x);
    extern void MP_fmt (int code, va_list *app,
        int put(int c, void *cl), void *cl,
        unsigned char flags[], int width, int precision);
    extern void MP_fmtu (int code, va_list *app,
        int put(int c, void *cl), void *cl,
        unsigned char flags[], int width, int precision);

MP_fromstr interprets the string in str as an unsigned integer in base, sets z to that integer, and returns z. It ignores leading white space, and consumes one or more digits in base. For bases greater than 10, the lowercase and uppercase letters specify the digits beyond nine. MP_fromstr is like strtoul: If end is nonnull, MP_fromstr sets *end to the address of the character that terminated the scan. If str does not specify a valid integer, MP_fromstr sets *end to str, if end is nonnull, and returns null. MP_fromstr raises MP_Overflow if the string in str specifies an integer that is too big. It is a checked runtime error for str to be null, or for base to be less than two or more than 36.

MP_tostr fills str[0..size-1] with a null-terminated string representing x in base, and returns str. If str is null, MP_tostr ignores size and allocates the necessary string; it is the client’s responsibility to deallocate the string. It is a checked runtime error for str to be nonnull while size is too small to hold the null-terminated result, or for base to be less than two or more than 36. When str is null, MP_tostr can raise Mem_Failed.

MP_fmt and MP_fmtu are Fmt-style conversion functions for printing MP_Ts. Both consume an MP_T and a base; MP_fmt converts the signed MP_T to a string using the same conventions as printf’s %d conversion, and MP_fmtu converts the unsigned MP_T using conventions of printf’s %u conversion. Both functions can raise Mem_Failed. It is a checked runtime error for app or flags to be null.

19.2 Example: Another Calculator

mpcalc is like calc, except that it does signed and unsigned computations on n-bit integers. It illustrates the use of the MP interface. Like calc, mpcalc uses Polish suffix notation: Values are pushed onto a stack, and operators pop their operands from the stack and push their results. A value is one or more consecutive digits in the current input base, and the operators are as follows.

    ~   negation                &   AND
    +   addition                |   inclusive OR
    -   subtraction             ^   exclusive OR
    *   multiplication          <   left shift
    /   division                >   right shift
    %   remainder               !   not
    i   set the input base      o   set the output base
    k   set the precision       c   clear the stack
    d   duplicate the value at the top of the stack
    p   print the value at the top of the stack
    f   print all the values on the stack from the top down
    q   quit

White-space characters separate values but are otherwise ignored; other characters are announced as unrecognized operators. The size of the stack is limited only by available memory, but a diagnostic announces stack underflow.

The command nk, where n is at least two, specifies the size of the integers manipulated by mpcalc; the default is 32. The stack must be empty when the k operator is executed. The i and o operators specify the input and output bases; the defaults are both 10. When the input base exceeds 10, the leading digit of a value must be between zero and nine inclusive.

If the output base is two, eight, or 16, the + - * / and % operators do unsigned arithmetic, and the p and f operators print unsigned values. For all other bases, + - * / and % do signed arithmetic, and p and f print signed values. The ~ operator always does signed arithmetic, and the & | ^ ! < and > operators always interpret their operands as unsigned numbers.

mpcalc announces overflow and division by zero when they occur. For overflow, the result in this case is the n least significant bits of the value. For division by zero, the result is zero.

The overall structure of mpcalc is much like that of calc: It interprets the input, computes values, and manages a stack.

⟨mpcalc.c⟩≡
    #include <ctype.h>
    #include <stdio.h>
    #include <string.h>
    #include <stdlib.h>
    #include <limits.h>
    #include "mem.h"
    #include "seq.h"
    #include "fmt.h"
    #include "mp.h"

    ⟨mpcalc data 367⟩
    ⟨mpcalc functions 367⟩

As the inclusion of seq.h suggests, mpcalc uses a sequence for its stack:

⟨mpcalc data 367⟩≡
    Seq_T sp;

⟨initialization 367⟩≡
    sp = Seq_new(0);

Values are pushed by calling Seq_addhi, and they’re popped by calling Seq_remhi. mpcalc must not call Seq_remhi when the sequence is empty, so it wraps all pop operations in a function that checks for underflow:

⟨mpcalc functions 367⟩≡
    MP_T pop(void) {
        if (Seq_length(sp) > 0)
            return Seq_remhi(sp);
        else {
            Fmt_fprint(stderr, "?stack underflow\n");
            return MP_new(0);
        }
    }

Like calc’s pop, mpcalc’s pop always returns an MP_T, even when the stack is empty, because this simplifies error-checking.

Dealing with MP’s exceptions makes mpcalc’s main loop a bit more complicated than calc’s. Like calc’s main loop, mpcalc’s reads the next value or operator and switches on it. But it also sets up some MP_Ts for operands and results, and it uses a TRY-EXCEPT statement to catch the exceptions.

⟨mpcalc functions 367⟩+≡
    int main(int argc, char *argv[]) {
        int c;

        ⟨initialization 367⟩
        while ((c = getchar()) != EOF) {
            volatile MP_T x = NULL, y = NULL, z = NULL;
            TRY
                switch (c) {
                ⟨cases 368⟩
                }
            EXCEPT(MP_Overflow)
                Fmt_fprint(stderr, "?overflow\n");
            EXCEPT(MP_Dividebyzero)
                Fmt_fprint(stderr, "?divide by 0\n");
            END_TRY;
            if (z)
                Seq_addhi(sp, z);
            FREE(x);
            FREE(y);
        }
        ⟨clean up and exit 368⟩
    }

⟨clean up and exit 368⟩≡
    ⟨clear the stack 368⟩
    Seq_free(&sp);
    return EXIT_SUCCESS;

x and y are used for operands, and z is used for the result. If x and y are nonnull after switching on an operator, they hold operands that were popped from the stack and thus must be deallocated. If z is nonnull, it holds the result, which must be pushed. This approach permits the TRY-EXCEPT statement to appear only once, instead of around the code for each operator.

An input character is either white space, the first digit of a value, an operator, or something else, which is an error. Here are the easy cases:

⟨cases 368⟩≡
    default:
        if (isprint(c))
            Fmt_fprint(stderr, "?'%c'", c);
        else
            Fmt_fprint(stderr, "?'\\%03o'", c);
        Fmt_fprint(stderr, " is unimplemented\n");
        break;
    case ' ': case '\t': case '\n': case '\f': case '\r':
        break;
    case 'c': ⟨clear the stack 368⟩ break;
    case 'q': ⟨clean up and exit 368⟩

⟨clear the stack 368⟩≡
    while (Seq_length(sp) > 0) {
        MP_T x = Seq_remhi(sp);
        FREE(x);
    }

A digit identifies the beginning of a value; mpcalc gathers up the digits and calls MP_fromstr to convert them to an MP_T. ibase is the current input base.

⟨cases 368⟩+≡
    case '0': case '1': case '2': case '3': case '4':
    case '5': case '6': case '7': case '8': case '9': {
        char buf[512];
        z = MP_new(0);
        ⟨gather up digits into buf 369⟩
        MP_fromstr(z, buf, ibase, NULL);
        break;
    }

⟨gather up digits into buf 369⟩≡
    {
        int i = 0;
        for ( ; ⟨c is a digit in ibase 369⟩; c = getchar(), i++)
            if (i < (int)sizeof (buf) - 1)
                buf[i] = c;
        if (i > (int)sizeof (buf) - 1) {
            i = (int)sizeof (buf) - 1;
            Fmt_fprint(stderr,
                "?integer constant exceeds %d digits\n", i);
        }
        buf[i] = '\0';
        if (c != EOF)
            ungetc(c, stdin);
    }

Excessively long values are announced and truncated. A character is a digit in ibase if

⟨c is a digit in ibase 369⟩≡
    strchr(&"zyxwvutsrqponmlkjihgfedcba9876543210"[36-ibase],
        tolower(c))

is nonnull.

The cases for most of the arithmetic operators have the same form:

⟨cases 368⟩+≡
    case '+': ⟨pop x & y, set z 370⟩ (*f->add)(z, x, y); break;
    case '-': ⟨pop x & y, set z 370⟩ (*f->sub)(z, x, y); break;
    case '*': ⟨pop x & y, set z 370⟩ (*f->mul)(z, x, y); break;
    case '/': ⟨pop x & y, set z 370⟩ (*f->div)(z, x, y); break;
    case '%': ⟨pop x & y, set z 370⟩ (*f->mod)(z, x, y); break;
    case '&': ⟨pop x & y, set z 370⟩ MP_and(z, x, y); break;
    case '|': ⟨pop x & y, set z 370⟩ MP_or (z, x, y); break;
    case '^': ⟨pop x & y, set z 370⟩ MP_xor(z, x, y); break;

    case '!': z = pop(); MP_not(z, z); break;
    case '~': z = pop(); MP_neg(z, z); break;

⟨pop x & y, set z 370⟩≡
    y = pop(); x = pop();
    z = MP_new(0);

f points to a structure that holds pointers to functions for those operations that depend on whether mpcalc is doing signed or unsigned arithmetic.

⟨mpcalc data 367⟩+≡
    int ibase = 10;
    int obase = 10;
    struct {
        const char *fmt;
        MP_T (*add)(MP_T, MP_T, MP_T);
        MP_T (*sub)(MP_T, MP_T, MP_T);
        MP_T (*mul)(MP_T, MP_T, MP_T);
        MP_T (*div)(MP_T, MP_T, MP_T);
        MP_T (*mod)(MP_T, MP_T, MP_T);
    } s = { "%D\n",
            MP_add, MP_sub, MP_mul, MP_div, MP_mod },
      u = { "%U\n",
            MP_addu, MP_subu, MP_mulu, MP_divu, MP_modu },
      *f = &s;
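The digit test in ⟨c is a digit in ibase⟩ is compact enough to be worth exercising on its own. The sketch below wraps the same strchr idiom in a standalone function; is_digit_in_base is a hypothetical name, and the explicit rejection of EOF and '\0' is an added assumption (strchr matches the string terminator when asked to find '\0'), not something mpcalc itself does.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Nonzero if c is a digit in the given base, 2 <= base <= 36.
 * The table lists the digits from 'z' down to '0'; starting the
 * search at offset 36-base leaves exactly the `base` smallest digits. */
int is_digit_in_base(int c, int base) {
    static const char digits[] = "zyxwvutsrqponmlkjihgfedcba9876543210";

    assert(base >= 2 && base <= 36);
    if (c == EOF || c == '\0')   /* strchr would match the terminator */
        return 0;
    return strchr(&digits[36 - base], tolower(c)) != NULL;
}
```

For base 8 the search starts at "76543210", so '7' is accepted and '8' is rejected; for base 16 it starts at "fedcba9876543210", so both 'f' and 'F' are accepted and 'g' is not.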

obase is the output base. Initially, the bases are both 10, and f points to s, which holds pointers to the MP functions for signed arithmetic. The i operator changes ibase, the o operator changes obase, and both operators reaim f at either u or s:

⟨cases 368⟩+≡
    case 'i': case 'o': {
        long n;
        x = pop();
        n = MP_toint(x);
        if (n < 2 || n > 36)
            Fmt_fprint(stderr, "?%d is an illegal base\n", n);
        else if (c == 'i')
            ibase = n;
        else
            obase = n;
        if (obase == 2 || obase == 8 || obase == 16)
            f = &u;
        else
            f = &s;
        break;
    }

The base isn’t changed if x can’t be converted to a long (that is, if MP_toint raises MP_Overflow), or if the resulting integer isn’t a legal base.

The s and u structures also hold a Fmt-style format string that is used to print MP_Ts. mpcalc registers MP_fmt with %D and MP_fmtu with %U:

⟨initialization 367⟩+≡
    Fmt_register('D', MP_fmt);
    Fmt_register('U', MP_fmtu);

f->fmt thus accesses the appropriate format string, which the p and f operators use to print MP_Ts. Note that p pops its operand into z; the code in the main loop pushes that value back onto the stack.

⟨cases 368⟩+≡
    case 'p':
        Fmt_print(f->fmt, z = pop(), obase);
        break;

    case 'f': {
        int n = Seq_length(sp);
        while (--n >= 0)
            Fmt_print(f->fmt, Seq_get(sp, n), obase);
        break;
    }

Compare the code for f with calc’s code on page 332; it’s easy to print all of the values on the stack when it’s represented with a Seq_T.

The shift operators guard against illegal shift amounts, and shift their operand in place:

⟨cases 368⟩+≡
    case '<': { ⟨get s & z 372⟩; MP_lshift(z, z, s); break; }
    case '>': { ⟨get s & z 372⟩; MP_rshift(z, z, s); break; }

⟨get s & z 372⟩≡
    long s;
    y = pop();
    z = pop();
    s = MP_toint(y);
    if (s < 0 || s > INT_MAX) {
        Fmt_fprint(stderr,
            "?%d is an illegal shift amount\n", s);
        break;
    }

If MP_toint raises MP_Overflow, or s is negative or exceeds the largest int, the operand, z, is simply pushed back onto the stack.

The remaining cases are for the k and d operators:

⟨cases 368⟩+≡
    case 'k': {
        long n;
        x = pop();
        n = MP_toint(x);
        if (n < 2 || n > INT_MAX)
            Fmt_fprint(stderr,
                "?%d is an illegal precision\n", n);
        else if (Seq_length(sp) > 0)
            Fmt_fprint(stderr, "?nonempty stack\n");
        else
            MP_set(n);
        break;
    }
    case 'd': {
        MP_T x = pop();
        z = MP_new(0);
        Seq_addhi(sp, x);
        MP_addui(z, x, 0);
        break;
    }

Again, setting z causes that value to be pushed by the code in the main loop.

19.3 Implementation

⟨mp.c⟩≡
    #include <ctype.h>
    #include <string.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <limits.h>
    #include "assert.h"
    #include "fmt.h"
    #include "mem.h"
    #include "xp.h"
    #include "mp.h"

    #define T MP_T

    ⟨macros 374⟩
    ⟨data 373⟩
    ⟨static functions 389⟩
    ⟨functions 374⟩

⟨data 373⟩≡
    const Except_T MP_Dividebyzero = { "Division by zero" };
    const Except_T MP_Overflow     = { "Overflow" };

XP represents an n-bit number as ⌈n/8⌉ = (n − 1)/8 + 1 bytes, least significant byte first (n is always positive). The following figure shows how MP interprets these bytes. The least significant byte is on the right, and addresses increase to the left.

[Figure: the bytes of an n-bit integer, from byte (n−1)/8 on the left through byte (n−1)/8−1, ..., byte 1, to byte 0 on the right; bit n−1 lies in the leftmost byte, at the position given by shift.]

The sign bit is bit n − 1, that is, bit (n − 1) mod 8 in byte (n − 1)/8. Given n, MP computes three values of interest in addition to saving n as nbits: nbytes, the number of bytes required to hold n bits; shift, the number of bits the most significant byte must be shifted right to isolate the sign bit; and msb, a mask of shift+1 ones, which is used to detect overflow. When n is 32, these values are:

⟨data 373⟩+≡
    static int nbits = 32;
    static int nbytes = (32-1)/8 + 1;
    static int shift = (32-1)%8;
    static unsigned char msb = 0xFF;

As suggested above, MP uses nbytes and shift to access the sign bit:

⟨macros 374⟩≡
    #define sign(x) ((x)[nbytes-1]>>shift)

These values are changed by MP_set:

⟨functions 374⟩≡
    int MP_set(int n) {
        int prev = nbits;

        assert(n > 1);
        ⟨initialize 375⟩
        return prev;
    }
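The three derived values are easy to check for sizes that are not multiples of eight. The following sketch recomputes them from the formulas in the text; struct layout, layout_for, and sign_of are hypothetical names introduced for illustration, and sign_of masks its result with 1, which the sign macro above does not need to do if the unused high bits of the top byte are already zero.

```c
#include <assert.h>

/* Hypothetical record of the three values MP derives from n. */
struct layout {
    int nbytes;           /* bytes needed to hold n bits */
    int shift;            /* position of the sign bit in the top byte */
    unsigned char msb;    /* mask of shift+1 ones for the top byte */
};

/* Compute the layout for n-bit integers, n > 1. */
struct layout layout_for(int n) {
    struct layout l;

    assert(n > 1);
    l.nbytes = (n - 1)/8 + 1;
    l.shift  = (n - 1)%8;
    l.msb    = (unsigned char)((1u << (l.shift + 1)) - 1);
    return l;
}

/* Sign bit of an n-bit integer stored least significant byte first. */
int sign_of(const unsigned char *x, struct layout l) {
    return (x[l.nbytes - 1] >> l.shift) & 1;
}
```

For n = 12, for instance, this gives two bytes, a shift of 3, and a mask of 0x0F: only the low four bits of the most significant byte belong to the integer, and bit 3 of that byte is the sign bit.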

Hanson. tmp[3] = tmp[0] + 3*nbytes. then check whether the result exceeds nbits bits. The allocation is thus done once in MP_set instead of repeatedly in the arithmetic functions. Any other use requires prior written consent from the copyright owner. . Most of the MP functions call XP functions to do the actual arithmetic on nbyte numbers. or when n doesn’t exceed 128. ones is defined this way because it is used for other values of n besides the values passed to MP_set. C Interfaces and Implementations: Techniques for Creating Reusable Software. initialize 375 +≡ if (tmp[0] != temp) FREE(tmp[0]). tmp[2] = tmp[0] + 2*nbytes. MP_new and MP_fromintu illustrate this strategy. C Interfaces and Implementations: Techniques for Creating Reusabl Prepared for frliu@microsoft. This download file is made available for personal use only and is subject to the Terms of Service. complementing it yields (n − 1) mod 8 + 1 ones in the least significant bits. Unauthorized use. temp+3*16}. temp+1*16. All rights reserved.IMPLEMENTATION 375 initialize 375 nbits = nbytes = shift = msb = ≡ n. data 373 +≡ static unsigned char temp[16 + 16 + 16 + 2*16+2]. MP_set can use the statically allocated temp when nbytes doesn’t exceed 16. reproduction and/or distribution are strictly prohibited and violate applicable laws.. temp+2*16. (n-1)/8 + 1. macros 374 +≡ #define ones(n) (~(~0UL<<(((n)-1)%8+1))) Shifting ~0 left (n-1)%8+1 bits forms a mask of ones followed by (n . it must allocate space for the temporary. MP_set allocates enough space for one 2•nbyte+2 temporary and three nbyte temporaries. static T tmp[] = {temp. Frank Liu Copyright © 1997 by David R. temp is necessary because MP must be initialized as if MP_set(32) had been executed. Otherwise.com. if (nbytes <= 16) tmp[0] = temp. like MP_div. else tmp[0] = ALLOC(3*nbytes + 2*nbytes + 2). MP_set also allocates some temporary space for use in the arithmetic functions. (n-1)%8.1) mod 8 + 1 zeros. tmp[1] = tmp[0] + 1*nbytes. ones(n).

because it depends on the operation involved. z. Unauthorized use. Frank Liu Copyright © 1997 by David R. . Any other use requires prior written consent from the copyright owner. u).com. carry |= z[nbytes-1]&~msb.376 MULTIPLE-PRECISION ARITHMETIC functions 374 +≡ T MP_new(unsigned long u) { return MP_fromintu(ALLOC(nbytes). The test for unsigned overflow simply tests carry: test for unsigned overflow 376 ≡ if (carry) RAISE(MP_Overflow). all of the MP functions must set their results before raising an exception. MP_set has arranged for msb to hold a mask of shift+1 ones. This download file is made available for personal use only and is subject to the Terms of Service. Notice that MP_fromintu sets z before testing for overflow. u doesn’t fit in nbytes. MP_fromintu must ensure that the 8-(shift+1) most significant bits in z’s most significant byte are zeros. unsigned long u) { unsigned long carry. set z to u 376 test for unsigned overflow return z. functions 374 +≡ T MP_fromint(T z. which are OR’ed into carry before they’re discarded. All rights reserved. } T MP_fromintu(T z. set z to v 377 376 C Interfaces and Implementations: Techniques for Creating Reusable Software. but it might not fit in nbits bits. z[nbytes-1] &= msb. Testing for signed overflow is a bit more complicated. so ~msb isolates the desired bits. If carry is zero. assert(z). u). MP_fromint illustrates an easy case. as specified by the interface. If XP_fromint returns a nonzero carry. long v) { assert(z). reproduction and/or distribution are strictly prohibited and violate applicable laws. } set z to u 376 ≡ carry = XP_fromint(nbytes. u fits in nbytes.. C Interfaces and Implementations: Techniques for Creating Reusabl Prepared for frliu@microsoft. Hanson.

        if (⟨v is too big 377⟩)
            RAISE(MP_Overflow);
        return z;
    }

First, MP_fromint initializes z to the value of v, taking care to pass only positive values to XP_fromint:

⟨set z to v 377⟩≡
    if (v == LONG_MIN) {
        XP_fromint(nbytes, z, LONG_MAX + 1UL);
        XP_neg(nbytes, z, z, 1);
    } else if (v < 0) {
        XP_fromint(nbytes, z, -v);
        XP_neg(nbytes, z, z, 1);
    } else
        XP_fromint(nbytes, z, v);
    z[nbytes-1] &= msb;

The first two if clauses handle negative values: z is set to the absolute value of v, and then to its two's complement, which is accomplished by passing a one as the fourth argument to XP_neg. MP_fromint must treat the most negative integer specially, because it can't negate it. If v is negative, z's most significant bits will be ones, and the excess bits must be discarded. Many of the MP functions use the z[nbytes-1] &= msb idiom shown above to discard the excess bits in z's most significant byte.

For MP_fromint, signed overflow occurs when nbits is less than the number of bits in a long and v is outside z's range.

⟨v is too big 377⟩≡
    (nbits < 8*(int)sizeof (v) &&
    (v < -(1L<<(nbits-1)) || v >= (1L<<(nbits-1))))

The two shift expressions compute the most negative nbits-long signed integer and the value one greater than the most positive nbits-long signed integer.

19.3.1 Conversions

MP_toint and MP_cvt illustrate another instance of checking for signed overflow:

⟨functions 374⟩+≡
    long MP_toint(T x) {
        unsigned char d[sizeof (unsigned long)];

        assert(x);
        MP_cvt(8*sizeof d, d, x);
        return XP_toint(sizeof d, d);
    }

MP_cvt raises MP_Overflow if d can't hold x; if d can hold x, XP_toint returns the desired value.

MP_cvt does both kinds of conversions: It converts an MP_T to an MP_T with either fewer or more bits.

⟨functions 374⟩+≡
    T MP_cvt(int m, T z, T x) {
        int fill, i, mbytes = (m - 1)/8 + 1;

        assert(m > 1);
        ⟨checked runtime errors for unary functions 378⟩
        fill = sign(x) ? 0xFF : 0;
        if (m < nbits) {
            ⟨narrow signed x 379⟩
        } else {
            ⟨widen signed x 379⟩
        }
        return z;
    }

⟨checked runtime errors for unary functions 378⟩≡
    assert(x); assert(z);

If m is less than nbits, MP_cvt "narrows" the value of x and assigns it to z. This case must check for signed overflow. x fits in m bits if bits m through nbits−1 in x are either all zeros or all ones; that is, if the excess bits in x are equal to the sign bit of x when it's treated as an m-bit integer. In the chunk below, fill is FF if x is negative and zero otherwise, so x[i]^fill should be zero if the bits x[m..nbits-1] are all ones or all zeros.

⟨narrow signed x 379⟩≡
    int carry = (x[mbytes-1]^fill)&~ones(m);
    for (i = mbytes; i < nbytes; i++)
        carry |= x[i]^fill;
    memcpy(z, x, mbytes);
    z[mbytes-1] &= ones(m);
    if (carry)
        RAISE(MP_Overflow);

If x is in range, carry will be zero; otherwise, some of carry's bits will be ones. The initial assignment to carry ignores the bits that will be part of z's nonsign bits.

If m is at least nbits, MP_cvt "widens" the value of x and assigns it to z. Overflow cannot occur in this case, but MP_cvt must propagate x's sign bit, which is given by fill.

⟨widen signed x 379⟩≡
    memcpy(z, x, nbytes);
    z[nbytes-1] |= fill&~msb;
    for (i = nbytes; i < mbytes; i++)
        z[i] = fill;
    z[mbytes-1] &= ones(m);

MP_tointu uses a similar approach: It calls MP_cvtu to convert x to an MP_T with the number of bits in an unsigned long, then calls XP_toint to return the value.

⟨functions 374⟩+≡
    unsigned long MP_tointu(T x) {
        unsigned char d[sizeof (unsigned long)];

        assert(x);
        MP_cvtu(8*sizeof d, d, x);
        return XP_toint(sizeof d, d);
    }

Again, MP_cvtu either narrows or widens the value of x and assigns it to z.

⟨functions 374⟩+≡
    T MP_cvtu(int m, T z, T x) {
        int i, mbytes = (m - 1)/8 + 1;

        assert(m > 1);
        ⟨checked runtime errors for unary functions 378⟩
        if (m < nbits) {
            ⟨narrow unsigned x 380⟩
        } else {
            ⟨widen unsigned x 380⟩
        }
        return z;
    }

When m is less than nbits, overflow occurs if any of x's bits m through nbits−1 are ones, which is checked with code that is similar to, but simpler than, the code in MP_cvt:

⟨narrow unsigned x 380⟩≡
    int carry = x[mbytes-1]&~ones(m);
    for (i = mbytes; i < nbytes; i++)
        carry |= x[i];
    memcpy(z, x, mbytes);
    z[mbytes-1] &= ones(m);
    ⟨test for unsigned overflow 376⟩

When m is at least nbits, overflow cannot occur, and the excess bits in z are set to zeros:

⟨widen unsigned x 380⟩≡
    memcpy(z, x, nbytes);
    for (i = nbytes; i < mbytes; i++)
        z[i] = 0;

19.3.2 Unsigned Arithmetic

As the code for MP_cvtu and MP_cvt suggests, the unsigned arithmetic functions are easier to implement than their signed counterparts, because they don't need to handle signs and testing for overflow is simpler. Unsigned addition illustrates an easy case; XP_add does all the work.

⟨functions 374⟩+≡
    T MP_addu(T z, T x, T y) {
        int carry;

        ⟨checked runtime errors for binary functions 381⟩
        carry = XP_add(nbytes, z, x, y, 0);
        carry |= z[nbytes-1]&~msb;
        z[nbytes-1] &= msb;
        ⟨test for unsigned overflow 376⟩
        return z;
    }

⟨checked runtime errors for binary functions 381⟩≡
    assert(x); assert(y); assert(z);

Subtraction is just as easy, but MP_Overflow is raised when there's an outstanding borrow:

⟨functions 374⟩+≡
    T MP_subu(T z, T x, T y) {
        int borrow;

        ⟨checked runtime errors for binary functions 381⟩
        borrow = XP_sub(nbytes, z, x, y, 0);
        borrow |= z[nbytes-1]&~msb;
        z[nbytes-1] &= msb;
        ⟨test for unsigned underflow 381⟩
        return z;
    }

⟨test for unsigned underflow 381⟩≡
    if (borrow)
        RAISE(MP_Overflow);

MP_mul2u is the simplest of the multiplication functions, because overflow cannot occur.

⟨functions 374⟩+≡
    T MP_mul2u(T z, T x, T y) {
        ⟨checked runtime errors for binary functions 381⟩
        memset(tmp[3], '\0', 2*nbytes);

        XP_mul(tmp[3], nbytes, x, nbytes, y);
        memcpy(z, tmp[3], (2*nbits - 1)/8 + 1);
        return z;
    }

MP_mul2u computes the result into tmp[3] and copies tmp[3] to z so that x or y can be used as z, which would not work if MP_mul2u computed the result directly into z. Allocating the temporary space in MP_set thus not only isolates the allocations, but avoids restrictions on x and y.

MP_mulu also calls XP_mul to compute a double-length result in tmp[3], and then narrows that result to nbits and assigns it to z.

⟨functions 374⟩+≡
    T MP_mulu(T z, T x, T y) {
        ⟨checked runtime errors for binary functions 381⟩
        memset(tmp[3], '\0', 2*nbytes);
        XP_mul(tmp[3], nbytes, x, nbytes, y);
        memcpy(z, tmp[3], nbytes);
        z[nbytes-1] &= msb;
        ⟨test for unsigned multiplication overflow 382⟩
        return z;
    }

The product overflows if any of the bits nbits through 2•nbits−1 in tmp[3] are ones. This condition can be tested much the way the similar condition in MP_cvtu is tested:

⟨test for unsigned multiplication overflow 382⟩≡
    {
        int i;
        if (tmp[3][nbytes-1]&~msb)
            RAISE(MP_Overflow);
        for (i = 0; i < nbytes; i++)
            if (tmp[3][i+nbytes] != 0)
                RAISE(MP_Overflow);
    }

MP_divu avoids XP_div's restrictions on its arguments by copying y to a temporary:

⟨functions 374⟩+≡
    T MP_divu(T z, T x, T y) {
        ⟨checked runtime errors for binary functions 381⟩
        ⟨copy y to a temporary 383⟩
        if (!XP_div(nbytes, z, x, nbytes, y, tmp[2], tmp[3]))
            RAISE(MP_Dividebyzero);
        return z;
    }

⟨copy y to a temporary 383⟩≡
    {
        memcpy(tmp[1], y, nbytes);
        y = tmp[1];
    }

tmp[2] holds the remainder, which is discarded. tmp[1] holds y, and y is reaimed at tmp[1]. tmp[3] is the 2•nbyte+2 temporary needed by XP_div. MP_modu is similar, but it uses tmp[2] to hold the quotient:

⟨functions 374⟩+≡
    T MP_modu(T z, T x, T y) {
        ⟨checked runtime errors for binary functions 381⟩
        ⟨copy y to a temporary 383⟩
        if (!XP_div(nbytes, tmp[2], x, nbytes, y, z, tmp[3]))
            RAISE(MP_Dividebyzero);
        return z;
    }

19.3.3 Signed Arithmetic

AP's sign-magnitude representation forces AP_add to consider the signs of x and y. The properties of the two's-complement representation permit MP_add to avoid this case analysis and simply call XP_add regardless of the signs of x and y. Thus, signed addition is nearly identical to unsigned addition; the only important difference is the test for overflow.

⟨functions 374⟩+≡
    T MP_add(T z, T x, T y) {
        int sx, sy;


        ⟨checked runtime errors for binary functions 381⟩
        sx = sign(x);
        sy = sign(y);
        XP_add(nbytes, z, x, y, 0);
        z[nbytes-1] &= msb;
        ⟨test for signed overflow 384⟩
        return z;
    }

Overflow occurs in addition when x and y have the same signs. When the sum overflows, its sign is different from that of x and y:

⟨test for signed overflow 384⟩≡
    if (sx == sy && sy != sign(z))
        RAISE(MP_Overflow);

Signed subtraction has the same form as addition, but the test for overflow is different.

⟨functions 374⟩+≡
    T MP_sub(T z, T x, T y) {
        int sx, sy;

        ⟨checked runtime errors for binary functions 381⟩
        sx = sign(x);
        sy = sign(y);
        XP_sub(nbytes, z, x, y, 0);
        z[nbytes-1] &= msb;
        ⟨test for signed underflow 384⟩
        return z;
    }

For subtraction, underflow occurs when x and y have different signs. When x is positive and y is negative, the result should be positive; when x is negative and y is positive, the result should be negative. Thus, if x and y have different signs, and the result has the same sign as y, underflow has occurred.

⟨test for signed underflow 384⟩≡
    if (sx != sy && sy == sign(z))
        RAISE(MP_Overflow);
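The two sign tests above can be tried on a small scale. The sketch below is a toy model rather than MP code: it assumes an nbits-wide two's-complement pattern held in the low bits of an unsigned, with 2 <= nbits <= 15, and the names low_mask, sign_of, toy_add, and toy_sub are hypothetical. Like the MP functions, each helper stores its result before reporting overflow.

```c
#include <assert.h>

/* Toy model (an assumption, not the MP representation): an nbits-wide
   two's-complement pattern lives in the low bits of an unsigned. */
static unsigned low_mask(int nbits) { return (1u << nbits) - 1; }
static int sign_of(unsigned v, int nbits) { return (int)((v >> (nbits - 1)) & 1u); }

/* Stores x + y in *z, then returns 1 if the signed sum overflowed. */
static int toy_add(unsigned *z, unsigned x, unsigned y, int nbits) {
    int sx, sy;

    assert(z && nbits >= 2 && nbits <= 15);
    sx = sign_of(x, nbits);
    sy = sign_of(y, nbits);
    *z = (x + y) & low_mask(nbits);         /* discard the excess bits */
    return sx == sy && sy != sign_of(*z, nbits);
}

/* Stores x - y in *z, then returns 1 if the signed difference underflowed. */
static int toy_sub(unsigned *z, unsigned x, unsigned y, int nbits) {
    int sx, sy;

    assert(z && nbits >= 2 && nbits <= 15);
    sx = sign_of(x, nbits);
    sy = sign_of(y, nbits);
    *z = (x - y) & low_mask(nbits);         /* unsigned wraparound, then mask */
    return sx != sy && sy == sign_of(*z, nbits);
}
```

With nbits equal to 8, adding 1 to 0x7F (127) stores 0x80 and reports overflow, and subtracting 1 from 0x80 (−128) stores 0x7F and reports underflow; in both cases the result is available even though the operation failed.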

Negating x is equivalent to subtracting it from zero: Overflow can occur only when x is negative, and when the result overflows, it's still negative.

⟨functions 374⟩+≡
    T MP_neg(T z, T x) {
        int sx;

        ⟨checked runtime errors for unary functions 378⟩
        sx = sign(x);
        XP_neg(nbytes, z, x, 1);
        z[nbytes-1] &= msb;
        if (sx && sx == sign(z))
            RAISE(MP_Overflow);
        return z;
    }

MP_neg must clear the excess bits in z's most significant byte because they will be ones when x is positive.

The easiest way to implement signed multiplication is to negate negative operands, do an unsigned multiplication, and negate the result when the operands have different signs. For MP_mul2, overflow can never occur because it computes a double-length result, and the details are easy to fill in:

⟨functions 374⟩+≡
    T MP_mul2(T z, T x, T y) {
        int sx, sy;

        ⟨checked runtime errors for binary functions 381⟩
        ⟨tmp[3] ← x•y 385⟩
        if (sx != sy)
            XP_neg((2*nbits - 1)/8 + 1, z, tmp[3], 1);
        else
            memcpy(z, tmp[3], (2*nbits - 1)/8 + 1);
        return z;
    }

⟨tmp[3] ← x•y 385⟩≡
    sx = sign(x);
    sy = sign(y);

    ⟨if x < 0, negate x 386⟩
    ⟨if y < 0, negate y 386⟩
    memset(tmp[3], '\0', 2*nbytes);
    XP_mul(tmp[3], nbytes, x, nbytes, y);

The product has 2•nbits bits, which needs only (2•nbits − 1)/8 + 1 bytes of z. x and y are negated, when necessary, by forming the negated values in an appropriate temporary, and reaiming x or y at that temporary.

⟨if x < 0, negate x 386⟩≡
    if (sx) {
        XP_neg(nbytes, tmp[0], x, 1);
        x = tmp[0];
        x[nbytes-1] &= msb;
    }

⟨if y < 0, negate y 386⟩≡
    if (sy) {
        XP_neg(nbytes, tmp[1], y, 1);
        y = tmp[1];
        y[nbytes-1] &= msb;
    }

By convention, x and y are negated or copied, when necessary, into tmp[0] and tmp[1] by the MP functions.

MP_mul is similar to MP_mul2, but only the least significant nbits of the 2•nbit result are copied to z, and overflow occurs when the result doesn't fit in nbits, or when the operands have the same signs and the result is negative.

⟨functions 374⟩+≡
    T MP_mul(T z, T x, T y) {
        int sx, sy;

        ⟨checked runtime errors for binary functions 381⟩
        ⟨tmp[3] ← x•y 385⟩
        if (sx != sy)
            XP_neg(nbytes, z, tmp[3], 1);
        else
            memcpy(z, tmp[3], nbytes);
        z[nbytes-1] &= msb;

        ⟨test for unsigned multiplication overflow 382⟩
        if (sx == sy && sign(z))
            RAISE(MP_Overflow);
        return z;
    }

Signed division is much like unsigned division when the operands have the same signs, because both the quotient and the remainder are nonnegative. Overflow occurs only when the dividend is the most negative n-bit value and the divisor is −1; in this case, the quotient will be negative.

⟨functions 374⟩+≡
    T MP_div(T z, T x, T y) {
        int sx, sy;

        ⟨checked runtime errors for binary functions 381⟩
        sx = sign(x);
        sy = sign(y);
        ⟨if x < 0, negate x 386⟩
        ⟨if y < 0, negate y 386⟩ else ⟨copy y to a temporary 383⟩
        if (!XP_div(nbytes, z, x, nbytes, y, tmp[2], tmp[3]))
            RAISE(MP_Dividebyzero);
        if (sx != sy) {
            ⟨adjust the quotient 388⟩
        } else if (sx && sign(z))
            RAISE(MP_Overflow);
        return z;
    }

MP_div either negates y into its temporary or copies it there, because y and z might be the same MP_T, and it uses tmp[2] to hold the remainder.

The complicated case for signed division and modulus is when the operands have different signs. In this case, the quotient is negative but must be truncated toward minus infinity, and the remainder is positive. The required adjustments are the same ones that AP_div and AP_mod do: The quotient is negated and, if the remainder is nonzero, the quotient is decremented. Also, if the unsigned remainder is nonzero, y minus that remainder is the correct value.
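The adjustment rule can be traced with ordinary C integers. The following sketch is illustrative only and is not taken from MP: floor_div_pos is a hypothetical helper, and it deliberately restricts itself to a positive divisor (and excludes LONG_MIN) so that "y minus that remainder" is unambiguous and the negation cannot overflow.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: divides x by a positive y, truncating toward minus
   infinity, and stores the nonnegative remainder in *rem. */
static long floor_div_pos(long x, long y, long *rem) {
    long q, r;

    assert(y > 0 && x != LONG_MIN && rem != NULL);
    if (x >= 0) {               /* same signs: plain division is already correct */
        *rem = x % y;
        return x / y;
    }
    q = (-x) / y;               /* divide the magnitudes, as XP_div would */
    r = (-x) % y;
    q = -q;                     /* signs differ: negate the quotient */
    if (r != 0) {
        q -= 1;                 /* truncate toward minus infinity */
        r = y - r;              /* y minus the unsigned remainder */
    }
    *rem = r;
    return q;
}
```

For x = −7 and y = 2, the magnitudes give a quotient of 3 and a remainder of 1; negating and decrementing yields −4, and the adjusted remainder is 2 − 1 = 1, which satisfies 2·(−4) + 1 = −7.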

⟨adjust the quotient 388⟩≡
    XP_neg(nbytes, z, z, 1);
    if (!iszero(tmp[2]))
        XP_diff(nbytes, z, z, 1);
    z[nbytes-1] &= msb;

⟨macros 374⟩+≡
    #define iszero(x) (XP_length(nbytes,(x))==1 && (x)[0]==0)

MP_div doesn't bother adjusting the remainder, because it's discarded. MP_mod does just the opposite: It adjusts only the remainder, and uses tmp[2] to hold the quotient.

⟨functions 374⟩+≡
    T MP_mod(T z, T x, T y) {
        int sx, sy;

        ⟨checked runtime errors for binary functions 381⟩
        sx = sign(x);
        sy = sign(y);
        ⟨if x < 0, negate x 386⟩
        ⟨if y < 0, negate y 386⟩ else ⟨copy y to a temporary 383⟩
        if (!XP_div(nbytes, tmp[2], x, nbytes, y, z, tmp[3]))
            RAISE(MP_Dividebyzero);
        if (sx != sy) {
            if (!iszero(z))
                XP_sub(nbytes, z, y, z, 0);
        } else if (sx && sign(tmp[2]))
            RAISE(MP_Overflow);
        return z;
    }

19.3.4 Convenience Functions

The arithmetic convenience functions take a long or unsigned long immediate operand, convert it to an MP_T, if necessary, and perform the corresponding arithmetic operation. When y is a single digit in base 2^8, these functions can use the single-digit functions exported by XP. But there are two opportunities for overflow: y might be too big, and the operation itself might overflow. If y is too big, these functions must complete the operation and the assignment to z before raising an exception. MP_addui illustrates the approach used by all the convenience functions:

⟨functions 374⟩+≡
    T MP_addui(T z, T x, unsigned long y) {
        ⟨checked runtime errors for unary functions 378⟩
        if (y < BASE) {
            int carry = XP_sum(nbytes, z, x, y);
            carry |= z[nbytes-1]&~msb;
            z[nbytes-1] &= msb;
            ⟨test for unsigned overflow 376⟩
        } else if (applyu(MP_addu, z, x, y))
            RAISE(MP_Overflow);
        return z;
    }

⟨macros 374⟩+≡
    #define BASE (1<<8)

If y is one digit, XP_sum can compute x + y. This code also detects overflow when nbits is less than eight and y is too big, because the sum will be too big for any value of x. Otherwise, MP_addui calls applyu to convert y to an MP_T and to apply the more general function MP_addu. applyu returns a one if y is too big, but only after it computes z:

⟨static functions 389⟩≡
    static int applyu(T op(T, T, T), T z, T x,
        unsigned long u) {
        unsigned long carry;

        { T z = tmp[2]; ⟨set z to u 376⟩ }
        op(z, x, tmp[2]);
        return carry != 0;
    }

applyu uses the code from MP_fromintu to convert the unsigned long operand into tmp[2]. It saves the carry from the conversion because the conversion might overflow. It then calls the function specified by its first argument, and returns one if the saved carry is nonzero, or zero otherwise. The function op might raise an exception, too, but only after it sets z.
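The ordering that applyu enforces (convert, remember the carry, compute z, and only then report) can be seen in a much smaller setting. The sketch below is a toy under stated assumptions, not the MP code: values are 12-bit patterns held in an unsigned, and NBITS, MASK, toy_addu, and toy_applyu are hypothetical names.

```c
#include <assert.h>

/* Toy stand-ins (assumptions, not the MP types): a value is a 12-bit
   pattern held in the low bits of an unsigned. */
enum { NBITS = 12 };
#define MASK ((1u << NBITS) - 1)

/* Plays the role of the general operation, such as MP_addu: it always
   stores a result in *z. */
static unsigned toy_addu(unsigned *z, unsigned x, unsigned y) {
    *z = (x + y) & MASK;
    return *z;
}

/* Converts the immediate u, applies op, and returns 1 if u was too big. */
static int toy_applyu(unsigned op(unsigned *, unsigned, unsigned),
                      unsigned *z, unsigned x, unsigned long u) {
    unsigned long carry = u >> NBITS;       /* bits of u that do not fit */
    unsigned tmp = (unsigned)(u & MASK);    /* converted operand, like tmp[2] */

    assert(z);
    op(z, x, tmp);                          /* compute z first ... */
    return carry != 0;                      /* ... then report the saved carry */
}
```

Calling toy_applyu(toy_addu, &z, 1, 0x1003UL) returns 1 because 0x1003 does not fit in 12 bits, yet z has already been set to 4, which mirrors the requirement that results be stored before an exception is raised.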

The convenience functions for unsigned subtraction and multiplication are similar. When y is less than 2^8, MP_subui calls XP_diff. When y is too big, x − y underflows for all x, so MP_subui doesn't need to check whether y is too big before calling XP_diff. MP_mului calls XP_product, but MP_mului must explicitly check whether y is too big when nbits is less than eight, because XP_product won't catch that error when x is zero. This check is made after computing z.

⟨functions 374⟩+≡
    T MP_subui(T z, T x, unsigned long y) {
        ⟨checked runtime errors for unary functions 378⟩
        if (y < BASE) {
            int borrow = XP_diff(nbytes, z, x, y);
            borrow |= z[nbytes-1]&~msb;
            z[nbytes-1] &= msb;
            ⟨test for unsigned underflow 381⟩
        } else if (applyu(MP_subu, z, x, y))
            RAISE(MP_Overflow);
        return z;
    }

    T MP_mului(T z, T x, unsigned long y) {
        ⟨checked runtime errors for unary functions 378⟩
        if (y < BASE) {
            int carry = XP_product(nbytes, z, x, y);
            carry |= z[nbytes-1]&~msb;
            z[nbytes-1] &= msb;
            ⟨test for unsigned overflow 376⟩
            ⟨check if unsigned y is too big 390⟩
        } else if (applyu(MP_mulu, z, x, y))
            RAISE(MP_Overflow);
        return z;
    }

⟨check if unsigned y is too big 390⟩≡
    if (nbits < 8 && y >= (1U<<nbits))
        RAISE(MP_Overflow);
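The need for the explicit check can be seen with a single-byte model. This is a sketch under stated assumptions, not MP code: toy_mului is a hypothetical function that models one byte with 2 <= nbits < 8 and an immediate operand below 256.

```c
#include <assert.h>

/* Hypothetical single-byte model: x is an nbits-wide value, y is a
   one-digit immediate. Stores the product in *z and returns 1 on overflow. */
static int toy_mului(unsigned char *z, unsigned char x, unsigned long y, int nbits) {
    unsigned msb, product, carry;

    assert(z && nbits >= 2 && nbits < 8 && y < 256);
    msb = (1u << nbits) - 1;                /* mask of the nbits valid bits */
    product = x * (unsigned)y;
    carry = product >> 8;                   /* carry out of the single byte */
    carry |= (product & 0xFF) & ~msb;       /* excess bits in the top byte */
    *z = (unsigned char)(product & msb);    /* result is stored first */
    if (carry)
        return 1;                           /* the product itself overflowed */
    return y >= (1UL << nbits);             /* explicit check; catches x == 0 */
}
```

With nbits equal to 4, multiplying 0 by 200 produces no carry at all, so only the final comparison notices that 200 cannot be represented in four bits.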

MP_divui and MP_modui use XP_quotient, but they must test for a zero divisor themselves (because XP_quotient accepts only nonzero, single-digit divisors), and they must test for overflow when nbits is less than eight and y is too big.

⟨functions 374⟩+≡
    T MP_divui(T z, T x, unsigned long y) {
        ⟨checked runtime errors for unary functions 378⟩
        if (y == 0)
            RAISE(MP_Dividebyzero);
        else if (y < BASE) {
            XP_quotient(nbytes, z, x, y);
            ⟨check if unsigned y is too big 390⟩
        } else if (applyu(MP_divu, z, x, y))
            RAISE(MP_Overflow);
        return z;
    }

MP_modui calls XP_quotient, too, but only to compute the remainder. It discards the quotient computed into tmp[2]:

⟨functions 374⟩+≡
    unsigned long MP_modui(T x, unsigned long y) {
        assert(x);
        if (y == 0)
            RAISE(MP_Dividebyzero);
        else if (y < BASE) {
            int r = XP_quotient(nbytes, tmp[2], x, y);
            ⟨check if unsigned y is too big 390⟩
            return r;
        } else if (applyu(MP_modu, tmp[2], x, y))
            RAISE(MP_Overflow);
        return XP_toint(nbytes, tmp[2]);
    }

The signed arithmetic convenience functions use the same approach, but call a different apply function, which uses MP_fromint's code to convert a long to a signed MP_T in tmp[2], calls the desired function, and returns one if the immediate operand is too big, or zero otherwise.

⟨static functions 389⟩+≡
    static int apply(T op(T, T, T), T z, T x, long v) {
        { T z = tmp[2]; ⟨set z to v 377⟩ }
        op(z, x, tmp[2]);
        return ⟨v is too big 377⟩;
    }

When |y| is less than 2^8, the signed convenience functions have a bit more work to do than their unsigned counterparts, because they must deal with signed operands. The analysis is similar to that done by the AP functions (see page 338), but MP's two's-complement representation simplifies the details. The single-digit XP functions take only positive single-digit operands, so the signed convenience functions must use the signs of their operands to determine which function to call. Here are the cases for addition.

             y < 0                        y ≥ 0
    x < 0    −(|x| + |y|) = x − |y|       −(|x| − |y|) = x + |y|
    x ≥ 0    |x| − |y| = x − |y|          |x| + |y| = x + |y|

When y is negative, x + y is equal to x − |y| for any x, so MP_addi can use XP_diff to compute the sum; it can use XP_sum when y is nonnegative.

⟨functions 374⟩+≡
    T MP_addi(T z, T x, long y) {
        ⟨checked runtime errors for unary functions 378⟩
        if (-BASE < y && y < BASE) {
            int sx = sign(x), sy = y < 0;
            if (sy)
                XP_diff(nbytes, z, x, -y);
            else
                XP_sum (nbytes, z, x, y);
            z[nbytes-1] &= msb;
            ⟨test for signed overflow 384⟩
            ⟨check if signed y is too big 393⟩
        } else if (apply(MP_add, z, x, y))
            RAISE(MP_Overflow);
        return z;
    }
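The case table for addition can be checked numerically with ordinary long values. The sketch below is only a verification aid, not MP code; add_by_cases and add_with_digit are hypothetical names, and the operands are assumed small enough that no long arithmetic overflows.

```c
#include <assert.h>
#include <stdlib.h>

/* Left-hand sides of the table: the sum expressed through |x| and |y|. */
static long add_by_cases(long x, long y) {
    long ax = labs(x), ay = labs(y);

    if (x < 0 && y < 0) return -(ax + ay);
    if (x < 0)          return -(ax - ay);
    if (y < 0)          return ax - ay;
    return ax + ay;
}

/* Right-hand sides of the table: subtract |y| when y is negative (the role
   XP_diff plays in MP_addi), otherwise add |y| (the role of XP_sum). */
static long add_with_digit(long x, long y) {
    assert(-256 < y && y < 256);    /* single-digit immediate, as in MP_addi */
    return y < 0 ? x - labs(y) : x + labs(y);
}
```

Both functions agree with x + y in all four quadrants, which is why the sign of x never has to be consulted when choosing between the sum and the difference.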

⟨check if signed y is too big 393⟩≡
    if (nbits < 8 &&
    (y < -(1<<(nbits-1)) || y >= (1<<(nbits-1))))
        RAISE(MP_Overflow);

The cases for signed subtraction are just the opposite of those for addition (see page 340 for AP_sub's case):

             y < 0                        y ≥ 0
    x < 0    −(|x| − |y|) = x + |y|       −(|x| + |y|) = x − |y|
    x ≥ 0    |x| + |y| = x + |y|          |x| − |y| = x − |y|

So, MP_subi calls XP_sum to add |y| to any x when y is negative, and calls XP_diff when y is nonnegative.

⟨functions 374⟩+≡
    T MP_subi(T z, T x, long y) {
        ⟨checked runtime errors for unary functions 378⟩
        if (-BASE < y && y < BASE) {
            int sx = sign(x), sy = y < 0;
            if (sy)
                XP_sum (nbytes, z, x, -y);
            else
                XP_diff(nbytes, z, x, y);
            z[nbytes-1] &= msb;
            ⟨test for signed underflow 384⟩
            ⟨check if signed y is too big 393⟩
        } else if (apply(MP_sub, z, x, y))
            RAISE(MP_Overflow);
        return z;
    }

MP_muli uses MP_mul's strategy: It negates negative operands, computes the product by calling XP_product, and negates the product when the operands have different signs.

⟨functions 374⟩+≡
    T MP_muli(T z, T x, long y) {
        ⟨checked runtime errors for unary functions 378⟩

        if (-BASE < y && y < BASE) {
            int sx = sign(x), sy = y < 0;
            ⟨if x < 0, negate x 386⟩
            XP_product(nbytes, z, x, sy ? -y : y);
            if (sx != sy)
                XP_neg(nbytes, z, z, 1);
            z[nbytes-1] &= msb;
            if (sx == sy && sign(z))
                RAISE(MP_Overflow);
            ⟨check if signed y is too big 393⟩
        } else if (apply(MP_mul, z, x, y))
            RAISE(MP_Overflow);
        return z;
    }

MP_divi and MP_modi must check for a zero divisor, because they call XP_quotient to compute the quotient and remainder. MP_divi discards the remainder, and MP_modi discards the quotient:

⟨functions 374⟩+≡
    T MP_divi(T z, T x, long y) {
        ⟨checked runtime errors for unary functions 378⟩
        if (y == 0)
            RAISE(MP_Dividebyzero);
        else if (-BASE < y && y < BASE) {
            int r;
            ⟨z ← x/y, r ← x mod y 395⟩
            ⟨check if signed y is too big 393⟩
        } else if (apply(MP_div, z, x, y))
            RAISE(MP_Overflow);
        return z;
    }

    long MP_modi(T x, long y) {
        assert(x);
        if (y == 0)
            RAISE(MP_Dividebyzero);
        else if (-BASE < y && y < BASE) {
            T z = tmp[2];
            int r;
            ⟨z ← x/y, r ← x mod y 395⟩

            ⟨check if signed y is too big 393⟩
            return r;
        } else if (apply(MP_mod, tmp[2], x, y))
            RAISE(MP_Overflow);
        return MP_toint(tmp[2]);
    }

MP_modi calls MP_toint instead of XP_toint to ensure that the sign is extended properly.

The chunk common to both MP_divi and MP_modi computes the quotient and the remainder, and adjusts the quotient and remainder when x and y have different signs and the remainder is nonzero.

⟨z ← x/y, r ← x mod y 395⟩≡
    int sx = sign(x), sy = y < 0;
    ⟨if x < 0, negate x 386⟩
    r = XP_quotient(nbytes, z, x, sy ? -y : y);
    if (sx != sy) {
        XP_neg(nbytes, z, z, 1);
        if (r != 0) {
            XP_diff(nbytes, z, z, 1);
            r = y - r;
        }
        z[nbytes-1] &= msb;
    } else if (sx && sign(z))
        RAISE(MP_Overflow);

19.3.5 Comparisons and Logical Operations

Unsigned comparison is easy: MP_cmpu can just call XP_cmp.

⟨functions 374⟩+≡
    int MP_cmpu(T x, T y) {
        assert(x);
        assert(y);
        return XP_cmp(nbytes, x, y);
    }

When x and y have different signs, MP_cmp(x,y) simply returns the difference of the signs of y and x:

⟨functions⟩+≡
int MP_cmp(T x, T y) {
    int sx, sy;

    assert(x);
    assert(y);
    sx = sign(x);
    sy = sign(y);
    if (sx != sy)
        return sy - sx;
    else
        return XP_cmp(nbytes, x, y);
}

When x and y have the same signs, MP_cmp can treat them as unsigned numbers and call XP_cmp to compare them.

The comparison convenience functions can't use applyu and apply, because they compute integer results, and because they don't insist that their long or unsigned long operands fit in an MP_T. These functions simply compare an MP_T with an immediate value; when the value is too big, that will be reflected in the outcome of the comparison. When an unsigned long has at least nbits bits, MP_cmpui converts the MP_T to an unsigned long and uses the usual C comparisons. Otherwise, it converts the immediate value to an MP_T in tmp[2] and calls XP_cmp.

⟨functions⟩+≡
int MP_cmpui(T x, unsigned long y) {
    assert(x);
    if ((int)sizeof y >= nbytes) {
        unsigned long v = XP_toint(nbytes, x);
        ⟨return −1, 0, +1, if v < y, v = y, v > y⟩
    } else {
        XP_fromint(nbytes, tmp[2], y);
        return XP_cmp(nbytes, x, tmp[2]);
    }
}

⟨return −1, 0, +1, if v < y, v = y, v > y⟩≡
    if (v < y)
        return -1;
    else if (v > y)
        return 1;
    else
        return 0;

MP_cmpui doesn't have to check for overflow after it calls XP_fromint, because that call is made only when y has fewer bits than an MP_T.

MP_cmpi can avoid comparisons altogether when x and y have different signs. Otherwise, it uses MP_cmpui's approach: If the immediate value has at least as many bits as an MP_T, the comparison can be done with C comparisons.

⟨functions⟩+≡
int MP_cmpi(T x, long y) {
    int sx, sy = y < 0;

    assert(x);
    sx = sign(x);
    if (sx != sy)
        return sy - sx;
    else if ((int)sizeof y >= nbytes) {
        long v = MP_toint(x);
        ⟨return −1, 0, +1, if v < y, v = y, v > y⟩
    } else {
        MP_fromint(tmp[2], y);
        return XP_cmp(nbytes, x, tmp[2]);
    }
}

When x and y have the same signs and y has fewer bits than an MP_T, MP_cmpi can safely convert y to an MP_T in tmp[2], and then call XP_cmp to compare x and tmp[2]. MP_cmpi calls MP_fromint instead of XP_fromint in order to handle negative values of y correctly.

The binary logical functions (MP_and, MP_or, and MP_xor) are the easiest MP functions to implement because each byte of the result is a bitwise operation on the corresponding bytes in the operands:

⟨macros⟩+≡
#define bitop(op) \
    int i; assert(z); assert(x); assert(y); \
    for (i = 0; i < nbytes; i++) z[i] = x[i] op y[i]; \
    return z
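As a standalone illustration of the byte-wise technique that bitop captures, the following sketch applies a chosen bitwise operator to corresponding bytes of two operands. It is not the book's code: the names bit_op and bytes_bitop are hypothetical, the length is passed explicitly instead of using MP's nbytes, and there is no masking of the most significant byte.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical selector for the operator; MP's bitop macro instead
 * pastes the operator token directly into the loop body. */
enum bit_op { BIT_AND, BIT_OR, BIT_XOR };

/* Sets z[i] = x[i] op y[i] for each of the n bytes and returns z.
 * z may alias x or y, because each byte is read before it is written. */
unsigned char *bytes_bitop(enum bit_op op, size_t n, unsigned char *z,
    const unsigned char *x, const unsigned char *y) {
    size_t i;

    assert(z);
    assert(x);
    assert(y);
    for (i = 0; i < n; i++)
        switch (op) {
        case BIT_AND: z[i] = x[i] & y[i]; break;
        case BIT_OR:  z[i] = x[i] | y[i]; break;
        case BIT_XOR: z[i] = x[i] ^ y[i]; break;
        }
    return z;
}
```

A macro such as bitop avoids the per-byte switch at the cost of generating one function body per operator; the sketch trades that for a single ordinary function.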

⟨functions⟩+≡
T MP_and(T z, T x, T y) { bitop(&); }
T MP_or (T z, T x, T y) { bitop(|); }
T MP_xor(T z, T x, T y) { bitop(^); }

MP_not is the oddball that doesn't fit bitop's pattern:

⟨functions⟩+≡
T MP_not(T z, T x) {
    int i;

    ⟨checked runtime errors for unary functions⟩
    for (i = 0; i < nbytes; i++)
        z[i] = ~x[i];
    z[nbytes-1] &= msb;
    return z;
}

There's little to be gained from writing special-case code for single-digit operands to the three logical convenience functions, and immediate operands to these functions don't cause an exception. applyu can still be used; its return value is simply ignored.

⟨macros⟩+≡
#define bitopi(op) assert(z); assert(x); \
    applyu(op, z, x, y); \
    return z

⟨functions⟩+≡
T MP_andi(T z, T x, unsigned long y) { bitopi(MP_and); }
T MP_ori (T z, T x, unsigned long y) { bitopi(MP_or); }
T MP_xori(T z, T x, unsigned long y) { bitopi(MP_xor); }

The three shift functions call XP_lshift or XP_rshift, after enforcing their checked runtime errors, and after checking for the easy case when s exceeds or is equal to nbits, in which case the result is all zeroes or all ones. MP_ashift fills with ones when x is negative and thus implements an arithmetic right shift.

⟨macros⟩+≡
#define shft(fill, op) \
    assert(x); assert(z); assert(s >= 0); \
    if (s >= nbits) memset(z, fill, nbytes); \
    else op(nbytes, z, nbytes, x, s, fill); \
    z[nbytes-1] &= msb; \
    return z

⟨functions⟩+≡
T MP_lshift(T z, T x, int s) { shft(0, XP_lshift); }
T MP_rshift(T z, T x, int s) { shft(0, XP_rshift); }
T MP_ashift(T z, T x, int s) { shft(sign(x), XP_rshift); }

19.3.6 String Conversions

The last four functions convert between strings and MP_Ts. MP_fromstr is like strtoul; it interprets the string as an unsigned number in a base between two and 36, inclusive. Letters specify the digits above nine in bases that exceed 10.

⟨functions⟩+≡
T MP_fromstr(T z, const char *str, int base, char **end) {
    int carry;

    assert(z);
    memset(z, '\0', nbytes);
    carry = XP_fromstr(nbytes, z, str, base, end);
    carry |= z[nbytes-1]&~msb;
    z[nbytes-1] &= msb;
    ⟨test for unsigned overflow⟩
    return z;
}

XP_fromstr does the conversion and sets *end to the address of the character that terminated the conversion, if end is nonnull. z is initialized to zero because XP_fromstr adds the converted value to z.

MP_tostr performs the opposite conversion: It takes an MP_T and fills a string with the string representation of the MP_T's value in a base between two and 36, inclusive.

⟨functions⟩+≡
char *MP_tostr(char *str, int size, int base, T x) {
    assert(x);
    assert(base >= 2 && base <= 36);
    assert(str == NULL || size > 1);
    if (str == NULL) {
        ⟨size ← number of characters to represent x in base⟩
        str = ALLOC(size);
    }
    memcpy(tmp[1], x, nbytes);
    XP_tostr(str, size, base, nbytes, tmp[1]);
    return str;
}

If str is null, MP_tostr allocates a string long enough to hold x's representation in base. MP_tostr uses AP_tostr's trick for computing the size of the string: str must have at least ⌈nbits/k⌉ characters, where k is chosen so that 2^k is the largest power of two less than or equal to base (see page 352), plus one for the terminating null character.

⟨size ← number of characters to represent x in base⟩≡
    {
        int k;
        for (k = 5; (1<<k) > base; k--)
            ;
        size = nbits/k + 1 + 1;
    }

The Fmt-style conversion functions format an unsigned or signed MP_T. Each consumes two arguments: an MP_T and a base between two and 36, inclusive. MP_fmtu calls MP_tostr to convert its MP_T, and calls Fmt_putd to emit the converted result. Recall that Fmt_putd emits a number in the style of printf's %d conversion.

⟨functions⟩+≡
void MP_fmtu(int code, va_list *app,
    int put(int c, void *cl), void *cl,
    unsigned char flags[], int width, int precision) {
    T x;
    char *buf;

    assert(app && flags);
    x = va_arg(*app, T);
    assert(x);
    buf = MP_tostr(NULL, 0, va_arg(*app, int), x);
    Fmt_putd(buf, strlen(buf), put, cl, flags, width, precision);
    FREE(buf);
}

MP_fmt has a bit more work to do, because it interprets an MP_T as a signed number, but MP_tostr accepts only unsigned MP_Ts. Thus, MP_fmt itself allocates the buffer, so that it can include a leading sign, if necessary.

⟨functions⟩+≡
void MP_fmt(int code, va_list *app,
    int put(int c, void *cl), void *cl,
    unsigned char flags[], int width, int precision) {
    T x;
    int base, size, sx;
    char *buf;

    assert(app && flags);
    x = va_arg(*app, T);
    assert(x);
    base = va_arg(*app, int);
    assert(base >= 2 && base <= 36);
    sx = sign(x);
    ⟨if x < 0, negate x⟩
    ⟨size ← number of characters to represent x in base⟩
    buf = ALLOC(size+1);
    if (sx) {
        buf[0] = '-';
        MP_tostr(buf + 1, size, base, x);
    } else
        MP_tostr(buf, size + 1, base, x);
    Fmt_putd(buf, strlen(buf), put, cl, flags, width, precision);
    FREE(buf);
}
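The size computation used by MP_tostr and MP_fmt can be tried in isolation. The sketch below wraps the same loop in a function; the name digits_buffer_size is hypothetical, and nbits is passed as a parameter here, whereas in MP it is a file-scope variable.

```c
#include <assert.h>

/*
 * Hypothetical helper, not part of the MP interface: returns a buffer
 * size, including the terminating null character, that is large enough
 * to hold any nbits-bit unsigned value written in the given base.
 */
int digits_buffer_size(int nbits, int base) {
    int k;

    assert(nbits > 0);
    assert(base >= 2 && base <= 36);
    /* Find the largest k with 2^k <= base; base <= 36 < 64, so k <= 5. */
    for (k = 5; (1 << k) > base; k--)
        ;
    /* nbits/k + 1 is at least ceil(nbits/k); the final 1 is for '\0'. */
    return nbits / k + 1 + 1;
}
```

For a 32-bit value this yields 34 bytes in base 2 and 12 bytes in base 10. The estimate can be generous because each digit is assumed to carry only k bits, even when the base is not a power of two.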

Further Reading

Multiple-precision arithmetic is often used in compilers, and sometimes it must be used. For example, Clinger (1990) shows that converting floating-point literals to their corresponding IEEE floating-point representations sometimes requires multiple-precision arithmetic to achieve the best accuracy.

Schneier (1996) is a comprehensive survey of cryptography. This book is practical, and includes C implementations for some of the algorithms it describes. It also has an extensive bibliography that is a good starting point for deeper investigations.

As shown on page 308, multiplying two n-digit numbers takes time proportional to n^2. Section 20.6 in Press et al. (1992) shows how to use the fast Fourier transform to implement multiplication in time proportional to n lg n lg lg n. It also implements x/y by computing the reciprocal 1/y and multiplying it by x. This approach requires multiple-precision numbers with fractional parts.

Exercises

19.1 The MP functions do a lot of unnecessary work when nbits is a multiple of eight. Can you revise the MP implementation to avoid this work when nbits mod 8 is zero? Implement your scheme and measure its benefits, or costs.

19.2 For many applications, once chosen, nbits never changes. Implement a program generator that, given a specific value for nbits, generates an interface and an implementation for nbits-bit arithmetic, MP_nbits, that is otherwise identical to MP.

19.3 Design and implement an interface for arithmetic on fixed-point, multiple-precision numbers; that is, numbers with a whole part and a fractional part. Clients should be able to specify the number of digits in both parts. Be sure to specify the details of rounding. Section 20.6 in Press et al. (1992) includes some useful algorithms for this exercise.

19.4 Design and implement an interface for arithmetic on floating-point numbers in which clients can specify the number of bits in the exponent and in the significand. Read Goldberg (1991) before attempting this exercise.

19.5 The XP and MP functions do not use const-qualified parameters for the reasons detailed on page 300. There are, however, other definitions for XP_T and MP_T that work correctly with const. For example, if T is defined by

    typedef unsigned char T[];

then "const T" means "array of constant unsigned char," and, for example, MP_add could be declared by

    unsigned char *MP_add(T z, const T x, const T y);

In MP_add, x and y have type "pointer to constant unsigned char," because array types in formal parameters "decay" to the corresponding pointer types. Of course, const doesn't prevent accidental aliasing, because the same array may be passed to both z and x. This declaration for MP_add illustrates the disadvantage of defining T as an array type: T cannot be used as a return type, and clients cannot declare variables of type T. This kind of array type is useful only for parameters. This problem can be avoided by defining T as a typedef for unsigned char:

    typedef unsigned char T;

With this definition, the declaration for MP_add can be either of:

    T *MP_add(T z[], const T x[], const T y[]);
    T *MP_add(T *z, const T *x, const T *y);

Reimplement XP and its clients, AP, MP, calc, and mpcalc, using both these definitions for T. Compare the readability of the results with the originals.
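The first definition in Exercise 19.5 can be explored with a small sketch before attempting the full reimplementation. The names Bytes, NBYTES, and bytes_add below are hypothetical stand-ins rather than the XP or MP code, and the sketch assumes 8-bit bytes stored least significant byte first.

```c
#include <assert.h>

/* Hypothetical fixed length; MP computes nbytes at run time. */
enum { NBYTES = 2 };

/* Array typedef: usable for parameters, but not as a return type and
 * not for declaring variables, because the array type is incomplete. */
typedef unsigned char Bytes[];

/* In a parameter list, "const Bytes x" is adjusted to
 * "const unsigned char *x", so an assignment such as x[i] = 0
 * is rejected at compile time. */
unsigned char *bytes_add(Bytes z, const Bytes x, const Bytes y) {
    unsigned carry = 0;
    int i;

    assert(z);
    assert(x);
    assert(y);
    for (i = 0; i < NBYTES; i++) {
        unsigned sum = x[i] + y[i] + carry;
        z[i] = (unsigned char)(sum & 0xFF);  /* low eight bits */
        carry = sum >> 8;                    /* carry into the next byte */
    }
    return z;
}
```

Callers must declare their operands as ordinary unsigned char arrays, which is the inconvenience the exercise points out. Passing the same array as z and x still compiles, so const offers no protection against aliasing.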


20 THREADS

The typical C program is a sequential, or single-threaded, program. That is, there is one locus of control in the program. A program's location counter gives the address of each instruction as it is executed. Most of the time, the location is advanced sequentially, one instruction at a time. Occasionally, a jump or call instruction causes the location counter to change to the jump destination or to the address of the function called. The values of the location counter trace out a path through the program that describes its execution, and this path looks like a thread through the program.

A concurrent or multithreaded program has more than one thread, and, in the most general case, these threads are all executing at the same time, at least conceptually. This concurrent execution is what makes writing multithreaded applications so much more complicated than writing single-threaded applications, because the threads can interact with one another in potentially nondeterministic ways. The three interfaces in this chapter export functions to create and manage threads, to synchronize the actions of cooperating threads, and to communicate among threads.

Threads are useful for applications that have inherent concurrent activities. Graphical user interfaces are a prime example; keyboard inputs, mouse movements and clicks, and display output all occur simultaneously. In multithreaded systems, a thread can be dedicated to each of these activities without concern for the others. This approach helps simplify the implementation of a user interface because each of these threads can be designed and written as if it were a sequential program, except in the few places where they must communicate or synchronize with other threads.

On multiprocessor computers, threads can improve performance in applications that can be decomposed naturally into relatively independent subtasks. Each subtask is run in a separate thread, and they all run concurrently and thus finish sooner than if the subtasks were done sequentially. Section 20.2 describes a sorting program that uses this approach.

Threads can also help structure sequential programs because they have state: A thread includes enough associated information for it to be stopped, and then subsequently resumed where it left off. A typical UNIX C compiler, for example, consists of a separate preprocessor, a compiler proper, and an assembler. The preprocessor reads the source code, includes headers and expands macros, and emits the resulting source; the compiler reads and parses the expanded source, generates code, and emits assembly language; and the assembler reads the assembly language and emits object code. These phases usually communicate with one another by reading and writing temporary files. With threads, each phase could run as a separate thread in a single application, eliminating the temporary files and the overhead of reading, writing, and deleting them. The compiler itself might also use separate threads for the lexical analyzer and for the parser. Section 20.2 illustrates this use of threads in a pipeline that computes prime numbers.

Some systems were not designed for multithreaded applications, which limits the usefulness of threads. For example, most UNIX systems have blocking I/O primitives. That is, when a thread issues a read request, the UNIX process and all the threads in it wait for that request to be filled. On these systems, threads cannot overlap useful computation with I/O. Similar comments apply to signal handling. Most UNIX systems associate signals and signal handlers with the process, not with the individual threads in the process.

Thread systems support either user-level or kernel-level threads, or perhaps both. User-level threads are implemented completely in user mode, without help from the operating system. User-level thread packages often have some of the drawbacks described above. On the bright side, user-level threads can be very efficient. The Thread interface described in the next section provides user-level threads.

Kernel-level threads use operating system facilities to provide, for example, nonblocking I/O and per-thread signal handling. Newer operating systems have kernel-level threads and use them to provide thread interfaces. Some operations in these interfaces require system calls, however, which usually cost more than similar operations in user-level threads.

Even on systems with kernel-level threads, standard libraries may not be reentrant or thread-safe. A reentrant function changes only locals and parameters. A function that changes global variables or uses static variables to hold intermediate results is nonreentrant. Typical implementations of some of the functions in the standard C library are nonreentrant. If two or more activations of a nonreentrant function exist at the same time, they can modify these intermediate values in unpredictable ways. In a single-threaded program, multiple activations can exist simultaneously because of direct and indirect recursion. In a multithreaded program, multiple activations occur because different threads can call the same function simultaneously. Two threads calling a nonreentrant function at the same time will thus modify the same storage with undefined results.

A thread-safe function uses synchronization mechanisms to manage access to shared data, and thus may or may not be reentrant. A thread-safe function can be called by more than one thread simultaneously without concern for synchronization. This makes them easier for multithreaded clients to use, but comes with the cost of the synchronization, even for single-threaded clients.

Standard C doesn't require that the library functions be reentrant or thread-safe, so programmers must assume the worst and use synchronization primitives to ensure that only one thread at a time executes a nonreentrant library function.

Most of the functions in this book are not thread-safe, but are reentrant. A few, like Text_map, are nonreentrant, and multithreaded clients must make their own synchronization arrangements, as explained below. For example, if several threads share a Table_T, they must ensure that only one of them at a time calls the functions in the Table interface with that Table_T.

Some thread interfaces are designed for both user-level and kernel-level threads. The Open Software Foundation's Distributed Computing Environment, DCE for short, is available on most variants of UNIX, OpenVMS, OS/2, Windows NT, and Windows 95. Typically, DCE threads use kernel-level threads when the host operating system supports them; otherwise, DCE threads are implemented as user-level threads. With more than 50 functions, the DCE thread interface is considerably larger than the three interfaces in this chapter combined, but the DCE interface does more. For example, its implementations support thread-level signals and protect calls to the standard library functions with appropriate synchronization.
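To make the earlier distinction between reentrant and nonreentrant functions concrete, here is a minimal sketch. Both functions and their names are hypothetical and appear nowhere in the book's interfaces. The first keeps its result in static storage, so every activation shares one buffer; the second changes only its parameters and locals.

```c
#include <assert.h>
#include <stdio.h>

/* Nonreentrant: each call overwrites the same static buffer, so a
 * second activation destroys the result of the first. */
const char *label_static(int id) {
    static char buf[32];

    snprintf(buf, sizeof buf, "item-%d", id);
    return buf;
}

/* Reentrant: the caller supplies the storage, so concurrent or
 * recursive activations cannot interfere with one another. */
char *label_reentrant(int id, char *buf, size_t size) {
    assert(buf != NULL && size > 0);
    snprintf(buf, size, "item-%d", id);
    return buf;
}
```

Even in a single thread, calling label_static twice leaves both returned pointers referring to the second result. With several threads the two writes can also interleave, which is why callers of such functions need their own synchronization.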

Sun Microsystems's Solaris 2 operating system has a two-level thread facility. Kernel-level threads are called lightweight processes, or LWPs. Every UNIX "heavyweight" process has at least one LWP, and Solaris runs a UNIX process by running one or more of its LWPs. Kernel support for LWPs includes nonblocking I/O and per-LWP signals. User-level threads are provided by an interface similar to, but larger than, Thread, and its implementation runs user-level threads on LWPs. One LWP can service one or more user-level threads. Solaris multiplexes the processors between the LWPs, which multiplex themselves between user-level threads.

The POSIX (Portable Operating Systems Interface) thread interface, pthreads for short, is emerging as the leading standard thread interface. Most vendors now offer a pthreads implementation, perhaps based on their own thread interfaces. For example, Sun Microsystems uses Solaris 2 LWPs to implement pthreads. The pthreads facilities are a superset of those exported by Thread and Sem. The larger POSIX interface handles per-thread signals, includes several synchronization mechanisms, and specifies which standard C library functions must be thread-safe.

20.1 Interfaces

Each of the three interfaces in this chapter is small. They're divided into separate interfaces because each has a related but distinct purpose.

In theory, all running threads execute concurrently, but in practice, there are usually more threads than real processors. The processors are thus multiplexed between the running threads according to a scheduling policy. With nonpreemptive scheduling, a running thread may execute a function that causes it to become blocked or to otherwise relinquish its processor. With preemptive scheduling, a running thread gives up its processor implicitly. This policy is usually implemented with a clock interrupt, which periodically interrupts the running thread and gives its processor to another running thread. A quantum is the amount of time a running thread runs before it is preempted, at which point a context switch suspends the current thread and resumes another (perhaps the same) running thread. Context switches also occur with nonpreemptive scheduling when a running thread blocks. The Thread interface uses preemption when its implementation supports it.

Atomic actions execute without preemption. A thread that starts executing an atomic action will complete that action without interruption by another thread. Most of the functions described in this chapter must be atomic so that their results and effects are predictable. If a thread calls an atomic function, the call is executed without interruption. Atomic functions may block, however; the synchronization functions in the Sem interface are examples.

As the last two paragraphs show, concurrent programming comes with its own jargon, and different terms are often used for the same concepts. For example, threads may be called lightweight processes, tasks, subtasks, or microtasks; synchronization mechanisms may be called events, condition variables, synchronizing resources, and messages.

20.1.1 Threads

The Thread interface exports an exception and the functions that support thread creation.

⟨thread.h⟩≡
#ifndef THREAD_INCLUDED
#define THREAD_INCLUDED
#include "except.h"

#define T Thread_T
typedef struct T *T;

extern const Except_T Thread_Failed;
extern const Except_T Thread_Alerted;

extern int  Thread_init (int preempt, ...);
extern T    Thread_new  (int apply(void *), void *args, int nbytes, ...);
extern void Thread_exit (int code);
extern void Thread_alert(T t);
extern T    Thread_self (void);
extern int  Thread_join (T t);
extern void Thread_pause(void);

#undef T
#endif

Calls to all of the functions in this interface are atomic.

Thread_init initializes the thread system, and must be called before any of the other functions. It is a checked runtime error to call any other function in this interface, or in the Sem and Chan interfaces, before calling Thread_init, or to call Thread_init more than once.

If preempt is zero, Thread_init initializes the thread system to support only nonpreemptive scheduling, and returns one. If preempt is one, the thread system is initialized for preemptive scheduling. If preemption is supported, Thread_init returns one. Otherwise, the system is initialized for nonpreemptive scheduling, and Thread_init returns zero.

Typical clients initialize the thread system in main. For example, for a client that needs preemption, main usually has the following form.

    int main(int argc, char *argv[]) {
        int preempt;
        …
        preempt = Thread_init(1, NULL);
        assert(preempt == 1);
        …
        Thread_exit(EXIT_SUCCESS);
        return EXIT_SUCCESS;
    }

Thread_init may also accept additional implementation-dependent arguments, often specified with name-value pairs. For example,

    preempt = Thread_init(1, "priorities", 4, NULL);

might initialize the thread system with four priority levels, for implementations that support priorities. Unknown optional arguments are usually ignored. Implementations that use this approach usually expect a null pointer as the terminating argument.

As the code template above suggests, threads must terminate execution by calling Thread_exit. The integer argument is an exit code, much like the one passed to the standard library's exit function. This value is made available to other threads that may be waiting for the calling thread's demise, as explained below. If there is only one thread in the system, calling Thread_exit is equivalent to calling exit.

Thread_new creates a new thread and returns its thread handle, which is an opaque pointer. Thread handles are passed to Thread_join and Thread_alert, and are returned by Thread_self. The new thread runs independently of the thread that created it. When the new thread begins execution, it executes the equivalent of

    void *p = ALLOC(nbytes);
    memcpy(p, args, nbytes);
    Thread_exit(apply(p));

That is, apply is called with a copy of the nbytes bytes pointed to by args, which is presumed to point to argument data for the new thread. args is often a pointer to a structure whose fields hold apply's arguments, and nbytes is the size of that structure. If args is nonnull and nbytes is zero, the new thread executes the equivalent of

    Thread_exit(apply(args));

That is, args is passed to apply unmodified. If args is null, the new thread executes the equivalent of

    Thread_exit(apply(NULL));

If args is null, nbytes is ignored.

Thread creation is synchronous: Thread_new returns after the new thread has been created and has received its argument, but perhaps before the new thread begins execution.

Like Thread_init, Thread_new may take additional implementation-specific arguments, often specified as name-value pairs. An example is

    Thread_T t;
    t = Thread_new(apply, args, nbytes, "priority", 2, NULL);

which creates a new thread with priority two. As this example suggests, optional arguments should be terminated with a null pointer.

The new thread starts execution with an empty exception stack: It does not inherit the exception state set up by TRY-EXCEPT statements in the calling thread. Exceptions are thread-specific; TRY-EXCEPT statements executed in one thread cannot affect the exceptions in another one.

It is a checked runtime error for apply to be null, or for args to be nonnull and nbytes to be negative. Thread_new raises Thread_Failed if it cannot create the new thread because of resource limitations. For example, implementations may limit the number of

threads that can exist simultaneously; when this limit is exceeded, Thread_new raises Thread_Failed.

Thread_exit(code) terminates execution of the calling thread. Threads waiting for the termination of the calling thread (by virtue of Thread_join) are resumed, and the value of code is returned as the result of calling Thread_join in each of the resumed threads. When the last thread calls Thread_exit, the entire program terminates by calling exit(code).

Thread_self returns the thread handle of the calling thread.

Thread_pause causes the calling thread to relinquish the processor to another thread that’s ready to run, if there is one. Thread_pause is used primarily in nonpreemptive scheduling; there’s no need to call Thread_pause with preemptive scheduling.

Threads have three states: running, blocked, and dead. A new thread begins as running. If it calls Thread_join, it becomes blocked, waiting for another thread to terminate. When a thread calls Thread_exit, it becomes dead. Threads may also become blocked when they call a communications function exported by Chan or a synchronization function exported by Sem. It is a checked runtime error for there to be no running threads.

Thread_join(t) causes the calling thread to suspend execution until thread t terminates by calling Thread_exit. When thread t terminates, the calling thread is resumed, and Thread_join returns the integer that was passed to Thread_exit by t. If t names a nonexistent thread, Thread_join returns -1 immediately. As a special case, the call Thread_join(NULL) waits for all threads to terminate, including those that might be created by other threads. In this case, Thread_join returns zero. It is a checked runtime error for a nonnull t to name the calling thread, or for more than one thread to specify a null t. Thread_join can raise Thread_Alerted.

Thread_alert(t) sets t’s “alert-pending” flag. If t is blocked, Thread_alert makes t runnable, and arranges for it to clear its alert-pending flag and to raise Thread_Alerted the next time it runs. If t is already running, Thread_alert arranges for t to clear its flag and to raise Thread_Alerted the next time it calls Thread_join or a blocking communications or synchronization function. It is a checked runtime error to pass to Thread_alert a null handle or a handle of a nonexistent thread.

There is no way to terminate a running thread; threads must terminate themselves, either by calling Thread_exit or by responding to Thread_Alerted. If a thread doesn’t catch Thread_Alerted, the entire

program will terminate with an uncaught exception error. The most common response to an alert is to terminate the thread, which can be accomplished with apply functions that have the following general form.

    int apply(void *p) {
        TRY
            …
        EXCEPT(Thread_Alerted)
            Thread_exit(EXIT_FAILURE);
        END_TRY;
        Thread_exit(EXIT_SUCCESS);
    }

The TRY-EXCEPT statement must be executed by the thread itself. Code like

    Thread_T t;
    TRY
        t = Thread_new(…);
    EXCEPT(Thread_Alerted)
        Thread_exit(EXIT_FAILURE);
    END_TRY;
    Thread_exit(EXIT_SUCCESS);

is incorrect, because the TRY-EXCEPT applies to the calling thread, not to the new thread.

20.1.2 General Semaphores

General, or counting, semaphores are low-level synchronization primitives. Abstractly, a semaphore is a protected integer that can be incremented and decremented atomically. The two operations on a semaphore s are wait and signal. signal(s) is logically equivalent to incrementing it atomically. wait(s) waits for s to become positive, then decrements it atomically:

    while (s <= 0)
        ;
    s = s - 1;

Of course, actual implementations block the calling thread; they don’t loop as this explanation suggests.

The Sem interface wraps the counter in a structure, and exports an initialization function and the two synchronization functions:

    ⟨sem.h⟩≡
    #ifndef SEM_INCLUDED
    #define SEM_INCLUDED

    #define T Sem_T
    typedef struct T {
        int count;
        void *queue;
    } T;

    ⟨exported macros 416⟩

    extern void Sem_init  (T *s, int count);
    extern T   *Sem_new   (int count);
    extern void Sem_wait  (T *s);
    extern void Sem_signal(T *s);

    #undef T
    #endif

A semaphore is a pointer to an instance of a Sem_T structure. This interface reveals the innards of Sem_Ts, but only so that they can be allocated statically or embedded in other structures. Clients must treat Sem_T as an opaque type and access fields of Sem_T values only via the functions in this interface; it is an unchecked runtime error to access the fields of a Sem_T directly. It is a checked runtime error to pass a null Sem_T pointer to any function in this interface.

Sem_init accepts a pointer to a Sem_T and an initial value for its counter; it then initializes the semaphore’s data structures and sets its counter to the specified initial value. Once initialized, a pointer to the Sem_T can be passed to the two synchronization functions. It is an unchecked runtime error to call Sem_init on the same semaphore more than once.

Sem_new is the atomic equivalent of

    Sem_T *s;
    NEW(s);
    Sem_init(s, count);

Sem_new can raise Mem_Failed.

Sem_wait accepts a pointer to a Sem_T, waits for its counter to become positive, decrements the counter by one, and returns. This operation is atomic. If the calling thread’s alert-pending flag is set, Sem_wait raises Thread_Alerted immediately and does not decrement the counter. If the alert-pending flag is set while the thread is blocked, the thread stops waiting and raises Thread_Alerted without decrementing the counter. It is a checked runtime error to call Sem_wait before calling Thread_init.

Sem_signal accepts a pointer to a Sem_T and increments its counter atomically. If other threads are waiting for the counter to become positive and the Sem_signal operation causes it to become positive, one of those threads will complete its call to Sem_wait. It is a checked runtime error to call Sem_signal before calling Thread_init.

It is an unchecked runtime error to pass an uninitialized semaphore to Sem_wait or to Sem_signal.

The queuing implicit in the Sem_wait and Sem_signal operations is first-in, first-out, and it’s fair. That is, a thread t blocked on a semaphore s will be resumed before other threads that call Sem_wait(&s) after t.

A binary semaphore, or mutex, is a general semaphore whose counter is zero or one. A mutex is used for mutual exclusion. For example,

    Sem_T mutex;
    Sem_init(&mutex, 1);
    …
    Sem_wait(&mutex);
    statements
    Sem_signal(&mutex);

creates and initializes a binary semaphore, and uses it to ensure that only one thread at a time executes statements, which is an example of a critical region. This idiom is so common that Sem exports macros for it that implement a LOCK-END_LOCK statement with the syntax:

    LOCK(mutex)
        statements
    END_LOCK

where mutex is a binary semaphore initialized to one. The LOCK statement helps avoid the common and disastrous errors of omitting the call to Sem_signal at the end of a critical region, and of calling Sem_signal with the wrong semaphore.

    ⟨exported macros 416⟩≡
    #define LOCK(mutex) do { Sem_T *_yymutex = &(mutex); \
        Sem_wait(_yymutex);
    #define END_LOCK Sem_signal(_yymutex); } while (0)

If statements can raise an exception, then LOCK-END_LOCK must not be used, because if an exception occurs, the mutex will not be released. In this case, the proper idiom is

    TRY
        Sem_wait(&mutex);
        statements
    FINALLY
        Sem_signal(&mutex);
    END_TRY;

The FINALLY clause ensures that the mutex is released whether or not an exception occurred. A reasonable alternative is to incorporate this idiom in the definitions for LOCK and END_LOCK, but then every use of LOCK-END_LOCK incurs the overhead of the TRY-FINALLY statement.

Mutexes are often embedded in ADTs to make accessing them thread-safe. For example,

    typedef struct {
        Sem_T mutex;
        Table_T table;
    } Protected_Table_T;

associates a mutex with a table. The code

    Protected_Table_T tab;
    tab.table = Table_new(…);
    Sem_init(&tab.mutex, 1);

creates a protected table, and

    LOCK(tab.mutex)
        value = Table_get(tab.table, key);
    END_LOCK;

fetches the value associated with key atomically. Notice that LOCK takes the mutex, not its address. Since Table_put can raise Mem_Failed, additions to tab should be made with code like

    TRY
        Sem_wait(&tab.mutex);
        Table_put(tab.table, key, value);
    FINALLY
        Sem_signal(&tab.mutex);
    END_TRY;

20.1.3 Synchronous Communication Channels

The Chan interface provides synchronous communication channels that can be used to pass data between threads.

    ⟨chan.h⟩≡
    #ifndef CHAN_INCLUDED
    #define CHAN_INCLUDED

    #define T Chan_T
    typedef struct T *T;

    extern T   Chan_new    (void);
    extern int Chan_send   (T c, const void *ptr, int size);
    extern int Chan_receive(T c,       void *ptr, int size);

    #undef T
    #endif

Chan_new creates, initializes, and returns a new channel, which is a pointer. Chan_new can raise Mem_Failed.

Chan_send accepts a channel, a pointer to a buffer that holds the data to be sent, and the number of bytes that buffer holds. The calling thread waits until another thread calls Chan_receive with the same channel; when this rendezvous occurs, the data is copied from the sender to the receiver and both calls return. Chan_send returns the number of bytes accepted by the receiver.

Chan_receive accepts a channel, a pointer to a buffer that is to receive the data, and the maximum number of bytes that buffer can hold. The caller waits until another thread calls Chan_send with the same channel; when this rendezvous occurs, the data is copied to the receiver from the sender, and both calls return. If the sender supplies more than size bytes, the excess bytes are discarded. Chan_receive returns the number of bytes accepted.

Chan_send and Chan_receive both accept a size of zero. It is a checked runtime error to pass a null Chan_T, a null ptr, or a negative size to either function.

If the calling thread’s alert-pending flag is set, Chan_send and Chan_receive raise Thread_Alerted immediately. If the alert-pending flag is set while the thread is blocked, the thread stops waiting and raises Thread_Alerted. In this case, the data may or may not have been transmitted.

It is a checked runtime error to call any function in this interface before calling Thread_init.

20.2 Examples

The three programs in this section illustrate simple uses of threads and channels, and the use of semaphores for mutual exclusion. Chan’s implementation, detailed in the next section, is an example of the use of semaphores for synchronization.

20.2.1 Sorting Concurrently

With preemption, threads execute concurrently, at least conceptually. A group of cooperating threads can work on independent parts of a problem. On a system with multiple processors, this approach uses concurrency to reduce overall execution time. Of course, on a single-processor system, the program will actually run a bit slower, because of the overhead of switching between threads. This approach does, however, illustrate the use of the Thread interface.

Sorting is a problem that can be easily decomposed into independent subparts. sort generates a specified number of random integers, sorts them concurrently, and checks that the result is sorted:

    ⟨sort.c⟩≡
    #include <stdlib.h>
    #include <stdio.h>
    #include <time.h>
    #include "assert.h"
    #include "fmt.h"
    #include "thread.h"
    #include "mem.h"

    ⟨sort types 421⟩
    ⟨sort data 422⟩
    ⟨sort functions 420⟩

    int main(int argc, char *argv[]) {
        int i, n = 100000, *x, preempt;

        preempt = Thread_init(1, NULL);
        assert(preempt == 1);
        if (argc >= 2)
            n = atoi(argv[1]);
        x = CALLOC(n, sizeof (int));
        srand(time(NULL));
        for (i = 0; i < n; i++)
            x[i] = rand();
        sort(x, n, argc, argv);
        for (i = 1; i < n; i++)
            if (x[i] < x[i-1])
                break;
        assert(i == n);
        Thread_exit(EXIT_SUCCESS);
        return EXIT_SUCCESS;
    }

time, srand, and rand are standard C library functions. time returns some integral encoding of the calendar time, which srand uses to set the

n . int n.1). void quick(int a[]. Subsequent calls to rand return the numbers in this sequence.. ub).. ¢sort functions 420²≡ int partition(int a[]. All rights reserved. quick(a. Frank Liu Copyright © 1997 by David R. k. } } t = a[k].j] so that all the values in a[i. The recursion bottoms out when the subarrays are empty. 0. t. all the values in a[k+1. Unauthorized use. The textbook implementation partitions the array into two subarrays separated by a “pivot” value. j) arbitrarily picks a[i] as the pivot value. The function sort is an implementation of quicksort. int lb. while (a[i] < v && i < j) i++. quick(a. while (a[j] > v ) j--. i. v..1). a[i] = a[j]. k . int j) { int v. a[k] = a[j]. while (i < j) { i++. and then calls itself recursively to sort each subarray. ub).j] are greater than v. Hanson. if (i < j) { t = a[i]. int argc. reproduction and/or distribution are strictly prohibited and violate applicable laws. This download file is made available for personal use only and is subject to the Terms of Service. } C Interfaces and Implementations: Techniques for Creating Reusable Software. lb. } partition(a. k + 1. j--. return j. int i. lb. char *argv[]) { quick(x. k = i. and a[k] holds v..n-1] with n random numbers. C Interfaces and Implementations: Techniques for Creating Reusabl Prepared for frliu@microsoft. v = a[k].. Any other use requires prior written consent from the copyright owner.k-1] are less than or equal to the pivot. a[j] = t. int ub) { if (lb < ub) { int k = partition(a.com. } } void sort(int *x. . It rearranges a[i.420 THREADS seed for generating a sequence of pseudo-random numbers. j++. a[j] = t. sort begins by filling x[0.

sizeof *p. lb.k-1] is sorted by ¢quick 421²≡ p->lb = lb. ¢sort functions 420²+≡ int quick(void *cl) { struct args *p = cl. t. and partition returns k. a[k+1. Similarly. where cutoff gives the minimum number of elements required to sort the subarray in a separate thread. . t = Thread_new(quick. k . All rights reserved.1. NULL). }. This download file is made available for personal use only and is subject to the Terms of Service. but only if there are enough elements in the subarray to make it worthwhile.. p. a[lb. ¢quick 421² } return EXIT_SUCCESS. int lb = p->lb.. ub. } else quick(p). First. ub = p->ub. The recursive calls to quick can be executed concurrently by separate threads. quick’s arguments must be packaged in a structure so that quick can be passed to Thread_new: ¢sort types 421²≡ struct args { int *a.com. ub). p->ub = k . int lb.EXAMPLES 421 The last exchange in partition leaves v in a[k].. } The recursive calls are executed in a separate thread.%d\n". Frank Liu Copyright © 1997 by David R. For example. C Interfaces and Implementations: Techniques for Creating Reusabl Prepared for frliu@microsoft. Unauthorized use.lb > cutoff) { Thread_T t. lb. Hanson.ub] is sorted by C Interfaces and Implementations: Techniques for Creating Reusable Software. reproduction and/or distribution are strictly prohibited and violate applicable laws. if (lb < ub) { int k = partition(p->a. Any other use requires prior written consent from the copyright owner.1). Fmt_print("thread %p sorted %d. if (k ..

    ⟨quick 421⟩+≡
    p->lb = k + 1;
    p->ub = ub;
    if (ub - k > cutoff) {
        Thread_T t;
        t = Thread_new(quick, p, sizeof *p, NULL);
        Fmt_print("thread %p sorted %d..%d\n", t, k + 1, ub);
    } else
        quick(p);

sort makes the initial call to quick, which spawns many threads as the sort progresses. sort then calls Thread_join to wait for all of these threads to terminate:

    ⟨sort data 422⟩≡
    int cutoff = 10000;

    ⟨sort functions 420⟩+≡
    void sort(int *x, int n, int argc, char *argv[]) {
        struct args args;

        if (argc >= 3)
            cutoff = atoi(argv[2]);
        args.a = x;
        args.lb = 0;
        args.ub = n - 1;
        quick(&args);
        Thread_join(NULL);
    }

Executing sort with the default values of n and cutoff, 100,000 and 10,000, spawns 18 threads:

    % sort
    thread 69f08 sorted 0..51162
    thread 6dfe0 sorted 51164..99999
    thread 72028 sorted 51164..73326
    thread 76070 sorted 73328..99999
    thread 6dfe0 sorted 51593..73326
    thread 72028 sorted 73328..91415
    thread 7a0b8 sorted 51593..69678

Unauthorized use.67132 7931. Access must be limited to a critical region in which only one thread at a time is permitted. Hanson. ¢spin.2 Critical Regions Any data that can be accessed by more than one thread in a preemptive system must be protected.h" #define NBUMP 30000 ¢spin types 425² ¢spin functions 424² int n. so the number of threads created and the traces printed by quick will be different for each execution. sort has an important bug: It fails to protect the calls to Fmt_print in quick.EXAMPLES 423 thread thread thread thread thread thread thread thread thread thread thread 7e100 82148 69f08 7e100 6dfe0 69f08 6dfe0 72028 69f08 6dfe0 76070 sorted sorted sorted sorted sorted sorted sorted sorted sorted sorted sorted 73328.h> "assert..26140 26142. All rights reserved. Fmt_print is not guaranteed to be reentrant. 20. C Interfaces and Implementations: Techniques for Creating Reusable Software..83741 3280.51162 14687. There’s no guarantee that printf or any other library routine will work correctly if it’s interrupted and later resumed..h> <stdlib.51162 15696.h" "thread. C Interfaces and Implementations: Techniques for Creating Reusabl Prepared for frliu@microsoft.51162 73328.. Any other use requires prior written consent from the copyright owner.h" "sem. and many of the routines in the C library are nonreentrant.37814 Different executions sort different values. reproduction and/or distribution are strictly prohibited and violate applicable laws.. spin is a simple example of the right way and wrong way to access shared data.37814 37816.37814 15696.83614 51593. This download file is made available for personal use only and is subject to the Terms of Service.com.h" "fmt...2..51162 14687. ... Frank Liu Copyright © 1997 by David R.c²≡ #include #include #include #include #include #include <stdio...

    int main(int argc, char *argv[]) {
        int m = 5, preempt;

        preempt = Thread_init(1, NULL);
        assert(preempt == 1);
        if (argc >= 2)
            m = atoi(argv[1]);
        n = 0;
        ⟨increment n unsafely 424⟩
        Fmt_print("%d == %d\n", n, NBUMP*m);
        n = 0;
        ⟨increment n safely 425⟩
        Fmt_print("%d == %d\n", n, NBUMP*m);
        Thread_exit(EXIT_SUCCESS);
        return EXIT_SUCCESS;
    }

spin spawns m threads that each increment n NBUMP times. The first m threads don’t ensure that n is incremented atomically:

    ⟨increment n unsafely 424⟩≡
    {
        int i;

        for (i = 0; i < m; i++)
            Thread_new(unsafe, &n, 0, NULL);
        Thread_join(NULL);
    }

main fires off m threads, each of which calls unsafe with a pointer to n. Because nbytes is zero, &n is passed to unsafe unmodified:

    ⟨spin functions 424⟩≡
    int unsafe(void *cl) {
        int i, *ip = cl;

        for (i = 0; i < NBUMP; i++)
            *ip = *ip + 1;
        return EXIT_SUCCESS;
    }

unsafe is wrong because the execution of *ip = *ip + 1 might be interrupted. If it’s interrupted just after *ip is fetched, and other threads increment *ip, the value assigned to *ip will be incorrect.

Each of the second m threads calls safe:

    ⟨spin types 425⟩≡
    struct args {
        Sem_T *mutex;
        int *ip;
    };

    ⟨spin functions 424⟩+≡
    int safe(void *cl) {
        struct args *p = cl;
        int i;

        for (i = 0; i < NBUMP; i++)
            LOCK(*p->mutex)
                *p->ip = *p->ip + 1;
            END_LOCK;
        return EXIT_SUCCESS;
    }

safe ensures that only one thread at a time executes the critical region, which is the statement *p->ip = *p->ip + 1. main initializes one binary semaphore that all the threads use to enter the critical region in safe:

    ⟨increment n safely 425⟩≡
    {
        int i;
        struct args args;
        Sem_T mutex;

        Sem_init(&mutex, 1);
        args.mutex = &mutex;
        args.ip = &n;
        for (i = 0; i < m; i++)
            Thread_new(safe, &args, sizeof args, NULL);
        Thread_join(NULL);
    }
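The lost update that makes unsafe wrong can be reproduced deterministically without real threads. The sketch below is a single-threaded model, not the spin program: struct worker, step_fetch, step_store, and simulate_increments are hypothetical names, and "preemption" is simulated by ordering the fetch and store steps of two workers by hand. It assumes the increment compiles to a separate fetch and store, which is the situation the text describes.

```c
/* One simulated thread: the value it last fetched from the shared counter. */
struct worker {
    int reg;
};

/* First half of "*ip = *ip + 1": fetch the shared value into a register. */
static void step_fetch(struct worker *w, const int *ip) {
    w->reg = *ip;
}

/* Second half: store the incremented register back into the shared counter. */
static void step_store(const struct worker *w, int *ip) {
    *ip = w->reg + 1;
}

/*
 * Perform nbump increments on each of two simulated threads and return
 * the final counter. If preempt_mid is nonzero, the first worker is
 * "interrupted" between its fetch and its store every time, as can
 * happen in unsafe; otherwise each fetch-store pair runs to completion,
 * as it does inside a critical region.
 */
static int simulate_increments(int nbump, int preempt_mid) {
    int n = 0, i;
    struct worker a = { 0 }, b = { 0 };

    for (i = 0; i < nbump; i++) {
        if (preempt_mid) {
            step_fetch(&a, &n);     /* a fetches n, then is preempted   */
            step_fetch(&b, &n);     /* b fetches the same value         */
            step_store(&b, &n);     /* b stores n + 1                   */
            step_store(&a, &n);     /* a stores n + 1 again: one lost   */
        } else {
            step_fetch(&a, &n);     /* a's critical region              */
            step_store(&a, &n);
            step_fetch(&b, &n);     /* b's critical region              */
            step_store(&b, &n);
        }
    }
    return n;
}
```

With nbump set to 30000, the uninterrupted schedule yields 60000, while the schedule that preempts between every fetch and store yields only 30000. Real preemption falls somewhere between these two extremes, which is why the totals reported for unsafe vary from run to run.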

Preemption can occur at any time, so each execution of spin can produce different results for the threads that use unsafe:

    % spin
    87102 == 150000
    150000 == 150000
    % spin
    148864 == 150000
    150000 == 150000

20.2.3 Generating Primes

The last example illustrates a pipeline implemented by communication channels. sieve N computes and prints the prime numbers less than or equal to N. For example:

    % sieve 100
    2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97

sieve is an implementation of the well-known Sieve of Eratosthenes for computing primes, in which each “sieve” is a thread that discards multiples of its primes. Channels connect these threads to form a pipeline, as depicted in Figure 20.1. The source thread (the white box) generates two followed by the odd integers, and fires them down the pipe. The filters (the light gray boxes) between the source and the sink (the dark gray box) discard numbers that are multiples of their primes, and pass the others down the pipe. The sink also filters out its primes, but if a number gets by the sink’s filter, it is a prime.

[Figure 20.1 A prime-number sieve: source, filter (2 3 5 7 11), filter (13 17 19 23 29), sink (31 37)]

Each box in Figure 20.1 is a thread, the numbers in each box are the primes associated with that

thread, and the lines between the boxes that form the pipeline are channels.

There are n primes attached to the sink and to each filter. When the sink has accumulated n primes (5 in Figure 20.1), it spawns a fresh copy of itself and turns itself into a filter. Figure 20.2 shows how the sieve expands as it computes the primes up to and including 100.

After sieve initializes the thread system, it creates threads for the source and for the sink, connects them with a new channel, and exits:

    ⟨sieve.c⟩≡
    #include <stdio.h>
    #include <stdlib.h>
    #include "assert.h"
    #include "fmt.h"
    #include "thread.h"
    #include "chan.h"

    struct args {
        Chan_T c;
        int n, last;
    };

    ⟨sieve functions 429⟩

    int main(int argc, char *argv[]) {
        struct args args;

        Thread_init(1, NULL);
        args.c = Chan_new();
        Thread_new(source, &args, sizeof args, NULL);
        args.n    = argc > 2 ? atoi(argv[2]) : 5;
        args.last = argc > 1 ? atoi(argv[1]) : 1000;
        Thread_new(sink,   &args, sizeof args, NULL);
        Thread_exit(EXIT_SUCCESS);
        return EXIT_SUCCESS;
    }

source emits integers on its “output” channel, which is passed in the c field of the args structure, which is the only field source needs:

[Figure 20.2 Evolution of the sieve for the primes up to 100. Successive snapshots of the pipeline between source and sink hold the primes 2 3 5 7 11; then 2 through 29; 2 through 47; 2 through 71; 2 through 97; and finally 2 through 101.]

    ⟨sieve functions 429⟩≡
    int source(void *cl) {
        struct args *p = cl;
        int i = 2;

        if (Chan_send(p->c, &i, sizeof i))
            for (i = 3; Chan_send(p->c, &i, sizeof i); )
                i += 2;
        return EXIT_SUCCESS;
    }

source sends two and the succeeding odd integers as long as a receiver accepts them. Once the sink has printed all the primes, it reads zero bytes from its input channel, which signals its upstream filter that the job is done, and terminates. Each filter does likewise, until source hears that its receiver read zero bytes, at which point it terminates.

A filter reads integers from its input channel and writes potential primes to its output channel, until the thread consuming the potential primes has had its fill:

    ⟨sieve functions 429⟩+≡
    void filter(int primes[], Chan_T input, Chan_T output) {
        int j, x;

        for (;;) {
            Chan_receive(input, &x, sizeof x);
            ⟨x is a multiple of primes[0...] 429⟩
            if (primes[j] == 0)
                if (Chan_send(output, &x, sizeof x) == 0)
                    break;
        }
        Chan_receive(input, &x, 0);
    }

primes[0..n-1] hold the primes associated with a filter. This array is terminated with a zero, so the search loop zips down primes until it either determines that x is not a prime or bumps into the terminator:

    ⟨x is a multiple of primes[0...] 429⟩≡
    for (j = 0; primes[j] != 0 && x%primes[j] != 0; j++)
        ;

As suggested by the code above, the search fails when it ends at the terminating zero. In this case, x might be a prime, so it is sent down the output channel to another filter or to the sink.

All of the action is in the sink; the c field of args holds the sink’s input channel, the n field gives the number of primes per filter, and the last field holds N, which gives the range of the primes desired. sink initializes its primes array and listens to its input:

    ⟨sieve functions 429⟩+≡
    int sink(void *cl) {
        struct args *p = cl;
        Chan_T input = p->c;
        int i = 0, j, x, primes[256];

        primes[0] = 0;
        for (;;) {
            Chan_receive(input, &x, sizeof x);
            ⟨x is a multiple of primes[0...] 429⟩
            if (primes[j] == 0) {
                ⟨x is prime 430⟩
            }
        }
        Fmt_print("\n");
        Chan_receive(input, &x, 0);
        return EXIT_SUCCESS;
    }

If x isn’t a multiple of one of the nonzero values in primes, then it is a prime, and sink prints it and adds it to primes.

    ⟨x is prime 430⟩≡
    if (x > p->last)
        break;
    Fmt_print(" %d", x);
    primes[i++] = x;
    primes[i] = 0;
    if (i == p->n)
        ⟨spawn a new sink and call filter 431⟩

When x exceeds p->last, all of the desired primes have been printed, and sink can terminate. Before doing so, it waits for one more integer

from its input channel, but reads zero bytes, which signals the upstream threads that the computation is complete.

After sink accumulates n primes, it clones itself and becomes a filter, which requires a new channel:

⟨spawn a new sink and call filter 431⟩≡
    {
        p->c = Chan_new();
        Thread_new(sink, p, sizeof *p, NULL);
        filter(primes, input, p->c);
        return EXIT_SUCCESS;
    }

The new channel becomes the clone's input channel and the filter's output channel. The sink's input channel is the filter's input channel. When filter returns, its thread exits.

All of the switching between threads in sieve occurs in Chan_send and Chan_receive, and there's always at least one thread ready to run. Thus, sieve works with either preemptive or nonpreemptive scheduling, and it's a simple example of using threads primarily for structuring an application. Nonpreemptive threads are often called coroutines.

20.3 Implementations

The Chan implementation can be built entirely on top of the Sem implementation, so it's machine-independent. Sem is machine-independent, too, but it depends on the innards of the Thread implementation, so Thread also implements Sem. A uniprocessor Thread implementation can be made largely independent of both the host machine and its operating system. As detailed below, machine and operating-system dependencies creep into the code for only context switching and preemption.

20.3.1 Synchronous Communication Channels

A Chan_T is a pointer to a structure that holds three semaphores, a pointer to the message, and a byte count:

⟨chan.c⟩≡
    #include <string.h>
    #include "assert.h"

    #include "mem.h"
    #include "chan.h"
    #include "sem.h"

    #define T Chan_T
    struct T {
        const void *ptr;
        int *size;
        Sem_T send, recv, sync;
    };

    ⟨chan functions 432⟩

When a new channel is created, the ptr and size fields are undefined, and the counters for the semaphores send, recv, and sync are initialized to one, zero, and zero, respectively:

⟨chan functions 432⟩≡
    T Chan_new(void) {
        T c;

        NEW(c);
        Sem_init(&c->send, 1);
        Sem_init(&c->recv, 0);
        Sem_init(&c->sync, 0);
        return c;
    }

The send and recv semaphores control access to ptr and size, and the sync semaphore ensures that the message transmission is synchronous as specified by the Chan interface. A thread sends a message by filling in the ptr and size fields, but only when it is safe to do so. send is one when a sender can set ptr and size, and zero otherwise (for example, before a receiver takes the message). Similarly, recv is one when ptr and size hold valid pointers to a message and its size, and zero otherwise (for example, before a sender has set ptr and size). send and recv oscillate: send is one when recv is zero and vice versa. sync is one when a receiver has successfully copied a message into its private buffer. Chan_send sends a message by waiting on send, filling in ptr and size, signalling recv, and waiting on sync:

⟨chan functions 432⟩+≡
    int Chan_send(Chan_T c, const void *ptr, int size) {
        assert(c);
        assert(ptr);
        assert(size >= 0);
        Sem_wait(&c->send);
        c->ptr = ptr;
        c->size = &size;
        Sem_signal(&c->recv);
        Sem_wait(&c->sync);
        return size;
    }

c->size holds a pointer to the byte count so that the receiver can modify that count, thereby notifying the sender of how many bytes were transmitted. Chan_receive performs the three steps that complement those done by Chan_send. Chan_receive receives a message by waiting on recv, copying the message into its argument buffer and modifying the byte count, and signalling sync then send:

⟨chan functions 432⟩+≡
    int Chan_receive(Chan_T c, void *ptr, int size) {
        int n;

        assert(c);
        assert(ptr);
        assert(size >= 0);
        Sem_wait(&c->recv);
        n = *c->size;
        if (size < n)
            n = size;
        *c->size = n;
        if (n > 0)
            memcpy(ptr, c->ptr, n);
        Sem_signal(&c->sync);
        Sem_signal(&c->send);
        return n;
    }

n is the number of bytes actually received, which might be zero. This code handles all three cases: when the sender's size exceeds the

receiver's size, when the two sizes are equal, and when the receiver's size exceeds the sender's size.

20.3.2 Threads

The Thread implementation, thread.c, implements the Thread and Sem interfaces:

⟨thread.c⟩≡
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include </usr/include/signal.h>
    #include <sys/time.h>
    #include "assert.h"
    #include "mem.h"
    #include "thread.h"
    #include "sem.h"

    void _MONITOR(void) {}
    extern void _ENDMONITOR(void);

    #define T Thread_T
    ⟨macros 436⟩
    ⟨types 435⟩
    ⟨data 435⟩
    ⟨prototypes 439⟩
    ⟨static functions 436⟩
    ⟨thread functions 438⟩
    #undef T

    #define T Sem_T
    ⟨sem functions 457⟩
    #undef T

The vacuous function _MONITOR and the external function _ENDMONITOR are used only for their addresses. As described below, these addresses encompass critical sections, that is, thread code that must not be interrupted. A little of this code is written in assembly language, and _ENDMONITOR is defined at the end of the assembly language file so that the critical section includes this assembly code. Its name starts with an underscore

because that's the convention for implementation-defined assembly language names used here.

A thread handle is an opaque pointer to a Thread_T structure, which carries all of the information necessary to determine the state of the thread. This structure is often called a thread control block.

⟨types 435⟩≡
    struct T {
        unsigned long *sp;    /* must be first */
        ⟨fields 435⟩
    };

The initial fields hold machine- and operating system–dependent values. These fields appear first in Thread_T structures because they're accessed by assembly-language code. Placing them first makes these fields easier to access, and new fields can be added without changing existing assembly-language code. Only one field, sp, which holds the thread's stack pointer, is needed on most machines.

Most thread manipulations revolve around putting threads on queues and removing them from queues. The Thread and Sem interfaces are designed to maintain a simple invariant: A thread is on no queue or it is on exactly one queue. This design makes it possible to avoid allocating any space for queue entries. Instead of using, say, Seq_Ts to represent queues, queues are represented by circularly linked lists of Thread_T structures. The ready queue, which holds running threads that do not have processors, is an example:

⟨data 435⟩≡
    static T ready = NULL;

⟨fields 435⟩≡
    T link;
    T *inqueue;

Figure 20.3 shows three threads on the ready queue in the order A, B, and C. ready points to the last thread in the queue, C, and the queue is linked through the link fields. The inqueue field of each Thread_T structure points to the queue variable (here, ready) and is used to remove a thread from a queue. A queue is empty when the queue variable is null, as suggested by ready's initial value and tested by the macro

[Figure 20.3 Three threads in the ready queue]

⟨macros 436⟩≡
    #define isempty(q) ((q) == NULL)

If a thread t is on a queue, then t->link and t->inqueue are nonnull; otherwise both fields are null. The queue functions below use assertions involving the link and inqueue fields to ensure that the invariant mentioned above holds. For example, put appends a thread to an empty or nonempty queue:

⟨static functions 436⟩≡
    static void put(T t, T *q) {
        assert(t);
        assert(t->inqueue == NULL && t->link == NULL);
        if (*q) {
            t->link = (*q)->link;
            (*q)->link = t;
        } else
            t->link = t;
        *q = t;
        t->inqueue = q;
    }
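To see why pointing the queue variable at the last element is convenient, the following sketch rebuilds put on a stand-in structure. struct node, its name field, and queue_order are assumptions made for illustration and are not part of the Thread implementation; only the body of put mirrors the code above.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for Thread_T: only the two queue fields plus a label. */
struct node {
    char name;
    struct node *link;      /* next node on the circular list */
    struct node **inqueue;  /* address of the queue variable, or NULL */
};

/* Same logic as put above: append t and make the queue point at it. */
void put(struct node *t, struct node **q) {
    assert(t);
    assert(t->inqueue == NULL && t->link == NULL);
    if (*q) {
        t->link = (*q)->link;   /* new tail links to the head */
        (*q)->link = t;         /* old tail links to the new tail */
    } else
        t->link = t;            /* single node links to itself */
    *q = t;
    t->inqueue = q;
}

/* Hypothetical helper: copies the names from head to tail into buf
   (NUL-terminated) and returns how many names were stored. */
int queue_order(struct node *q, char *buf, int cap) {
    int n = 0;

    assert(buf != NULL && cap > 0);
    if (q != NULL) {
        struct node *p = q->link;   /* the head follows the tail */
        for (;;) {
            if (n < cap - 1)
                buf[n++] = p->name;
            if (p == q)
                break;
            p = p->link;
        }
    }
    buf[n] = '\0';
    return n;
}
```

After appending A, B, and C to an empty queue, the queue variable points at C and C's link field points at A, so both the head and the tail are reachable in constant time from a single pointer.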

Thus, put(t, &ready) appends t to the ready queue. put takes the address of the queue variable so that it can modify it: After calling put(t, &q), q equals t and t->inqueue equals &q.

get removes the first element from a given queue:

⟨static functions 436⟩+≡
    static T get(T *q) {
        T t;

        assert(!isempty(*q));
        t = (*q)->link;
        if (t == *q)
            *q = NULL;
        else
            (*q)->link = t->link;
        assert(t->inqueue == q);
        t->link = NULL;
        t->inqueue = NULL;
        return t;
    }

The code uses the inqueue field to ensure that the thread was indeed in q, and it clears the link and inqueue fields to mark the thread as not being in any queue.

The third and last queue function removes a queued thread from the queue in which it appears:

⟨static functions 436⟩+≡
    static void delete(T t, T *q) {
        T p;

        assert(t->link && t->inqueue == q);
        assert(!isempty(*q));
        for (p = *q; p->link != t; p = p->link)
            ;
        if (p == t)
            *q = NULL;
        else {
            p->link = t->link;
            if (*q == t)
                *q = p;

        }
        t->link = NULL;
        t->inqueue = NULL;
    }

The first assertion ensures that t is in q, and the second ensures that the queue is nonempty, which it must be since t is in it. The if statement handles the case in which t is the only thread on q.

Thread_init creates the "root" thread (the Thread_T structure for the root thread is allocated statically):

⟨thread functions 438⟩≡
    int Thread_init(int preempt, ...) {
        assert(preempt == 0 || preempt == 1);
        assert(current == NULL);
        root.handle = &root;
        current = &root;
        nthreads = 1;
        if (preempt) {
            ⟨initialize preemptive scheduling 454⟩
        }
        return 1;
    }

⟨data 435⟩+≡
    static T current;
    static int nthreads;
    static struct Thread_T root;

⟨fields 435⟩+≡
    T handle;

current is the thread that currently holds the processor, and nthreads is the number of existing threads. Thread_new increments nthreads and Thread_exit decrements it. The handle field simply points to the thread handle and helps check the validity of handles: t identifies an existing thread only if t is equal to t->handle.

If current is null, Thread_init has not been called, so testing for a null current, as shown above, implements the checked runtime error that Thread_init must be called only once. Checking for a nonnull current in the other Thread and Sem functions implements the checked

runtime error that Thread_init must be called before any other Thread, Sem, or Chan function. An example is Thread_self, which simply returns current:

⟨thread functions 438⟩+≡
    T Thread_self(void) {
        assert(current);
        return current;
    }

Switching between threads requires some machine-dependent code, because each thread has its own stack and exception state, for example. There are numerous possible designs for the context-switch primitives, all of which are relatively simple because they're written in whole or in part in assembly language. The Thread implementation uses the single, implementation-specific primitive,

⟨prototypes 439⟩≡
    extern void _swtch(T from, T to);

which switches contexts from thread from to thread to, where from and to are pointers to Thread_T structures. _swtch is like setjmp and longjmp: When thread A calls _swtch, control transfers to, say, thread B. When B calls _swtch to resume A, A's call to _swtch returns. Thus, A and B treat _swtch as just another function call. This simple design also takes advantage of the machine's calling sequence, which, for example, helps save A's state when it switches to B. The only disadvantage is that a new thread must be created with a state that looks as if the thread called _swtch, because the first time it runs will be as a result of a return from _swtch.

_swtch is called in only one place, the static function run:

⟨static functions 436⟩+≡
    static void run(void) {
        T t = current;

        current = get(&ready);
        t->estack = Except_stack;
        Except_stack = current->estack;
        _swtch(t, current);
    }
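The idea that a context switch looks like an ordinary function call to both parties can be tried without assembly language on systems that provide the POSIX ucontext functions. The sketch below is an analogy only, not the book's _swtch: it assumes getcontext, makecontext, and swapcontext are available (they are obsolescent in POSIX but present in glibc), and the names worker, note, and switch_demo are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <string.h>
#include <ucontext.h>

static ucontext_t main_ctx, worker_ctx;
static char worker_stack[64 * 1024];   /* private stack for the worker */
static char trace[16];                 /* records the order of execution */

static void note(char c) {
    size_t n = strlen(trace);

    assert(n + 1 < sizeof trace);
    trace[n] = c;
    trace[n + 1] = '\0';
}

static void worker(void) {
    note('b');
    swapcontext(&worker_ctx, &main_ctx);   /* returns when the worker is resumed */
    note('d');
    /* returning from worker resumes uc_link, which is main_ctx */
}

/* Hypothetical driver: ping-pongs between two contexts. */
const char *switch_demo(void) {
    trace[0] = '\0';
    getcontext(&worker_ctx);
    worker_ctx.uc_stack.ss_sp = worker_stack;
    worker_ctx.uc_stack.ss_size = sizeof worker_stack;
    worker_ctx.uc_link = &main_ctx;
    makecontext(&worker_ctx, worker, 0);

    note('a');
    swapcontext(&main_ctx, &worker_ctx);   /* worker runs up to its own swap */
    note('c');
    swapcontext(&main_ctx, &worker_ctx);   /* worker continues after its swap */
    note('e');
    return trace;
}
```

Each call to swapcontext returns only when some other context switches back, which is the same contract that run relies on for _swtch. Note that makecontext also has to fabricate an initial state for the worker, the counterpart of the requirement that a new thread look as if it had called _swtch.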

⟨fields 435⟩+≡
    Except_Frame *estack;

run switches from the currently executing thread to the thread at the head of the ready queue. It dequeues the leading thread from ready, sets current, and switches to the new thread. The estack field holds the pointer to the exception frame at the top of a thread's exception stack, and run takes care of updating Except's global Except_stack, which is described on page 53.

All of the Thread and Sem functions that can cause a context switch call run, and they put the current thread on ready or another appropriate queue before calling run. Thread_pause is the simplest example: It puts current on ready and calls run.

⟨thread functions 438⟩+≡
    void Thread_pause(void) {
        assert(current);
        put(current, &ready);
        run();
    }

If there's only one running thread, Thread_pause puts it on ready and run removes it and switches to it. Thus, _swtch(t, t) must work properly.

Figure 20.4 depicts the context switches between threads A, B, and C that execute the following calls, assuming that A holds the processor initially and ready holds B and C in that order.

    A                 B                 C
    Thread_pause()    Thread_pause()    Thread_pause()
    Thread_join(C)    Thread_exit(0)    Thread_exit(0)
    Thread_exit(0)

The vertical solid arrows in Figure 20.4 show when each thread has the processor, and the horizontal dashed arrows are the context switches. The ready queue is shown in brackets beside the solid arrows. The Thread function and the call to _swtch it causes appear under each context switch. When A calls Thread_pause, it's added to ready, and B is removed and gets the processor. While B is running, ready holds C A. When B calls Thread_pause, C is removed from ready and gets the processor,

and ready holds A B. After C calls Thread_pause, ready again holds B C while A is executing. When A calls Thread_join(C), it blocks on C's termination, so the processor is given to B, the leading thread in ready. At this point, ready holds only C, because A is in a queue associated with C. When B calls Thread_exit, run switches to C and ready becomes empty. C terminates by calling Thread_exit, which causes A to be put back in ready as a result of C's termination. Thus, when Thread_exit calls run, A gets the processor. A's call to Thread_exit does not cause a context switch, however: A is the only thread in the system, so Thread_exit calls exit.

Figure 20.4 Context-switching between three threads

    running   ready    call              result
    A         [B C]    Thread_pause()    _swtch(A,B)
    B         [C A]    Thread_pause()    _swtch(B,C)
    C         [A B]    Thread_pause()    _swtch(C,A)
    A         [B C]    Thread_join(C)    _swtch(A,B)
    B         [C]      Thread_exit(0)    _swtch(B,C)
    C         []       Thread_exit(0)    _swtch(C,A)
    A         []       Thread_exit(0)    exit(0)

Deadlock occurs when ready is empty and run is called; that is, there are no running threads. Deadlock is a checked runtime error, and it's detected in get when it's called with an empty ready queue.
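The schedule in Figure 20.4 can be replayed with a small single-threaded model. This is a toy simulation under stated assumptions, not the Thread implementation: threads are reduced to fixed scripts of operations, the ready queue is a plain array used as a FIFO, only joins on C are modelled, and all names (simulate, scripts, joiner_on_c) are hypothetical.

```c
#include <assert.h>
#include <string.h>

enum op { PAUSE, JOIN_C, EXIT };
enum { A, B, C, NTHREADS };

/* The calls each thread makes, in order (see the table of calls above). */
static const enum op scripts[NTHREADS][3] = {
    { PAUSE, JOIN_C, EXIT },   /* A */
    { PAUSE, EXIT,   EXIT },   /* B (third entry never reached) */
    { PAUSE, EXIT,   EXIT },   /* C (third entry never reached) */
};

/* Returns the order in which threads hold the processor, e.g. "AB...". */
const char *simulate(void) {
    static char order[16];
    int ready[8], head = 0, tail = 0;   /* FIFO of thread ids */
    int pc[NTHREADS] = { 0, 0, 0 };     /* next script entry per thread */
    int joiner_on_c = -1;               /* thread blocked in Thread_join(C) */
    int current = A, alive = NTHREADS, n = 0;

    memset(order, 0, sizeof order);
    ready[tail++] = B;                  /* initial ready queue: B C */
    ready[tail++] = C;
    for (;;) {
        order[n++] = (char)('A' + current);
        switch (scripts[current][pc[current]++]) {
        case PAUSE:                     /* put current on ready */
            ready[tail++] = current;
            break;
        case JOIN_C:                    /* block on C's join queue */
            joiner_on_c = current;
            break;
        case EXIT:                      /* wake a joiner, then leave */
            if (current == C && joiner_on_c >= 0) {
                ready[tail++] = joiner_on_c;
                joiner_on_c = -1;
            }
            if (--alive == 0)
                return order;           /* last thread: exit(0) */
            break;
        }
        assert(head < tail);            /* an empty ready queue is deadlock */
        current = ready[head++];        /* what run does via get(&ready) */
    }
}
```

The assertion inside the loop plays the role of the check in get: if a script left every thread blocked, the model would fail there instead of spinning.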

Thread_join and Thread_exit illustrate the queue manipulations involving "join queues" and the ready queue. There are two flavors of Thread_join: Thread_join(t) waits for thread t to terminate and returns t's exit code, the value t passed to Thread_exit. t must not be the calling thread. Thread_join(NULL) waits for all threads to terminate and returns zero; only one thread can call Thread_join(NULL).

⟨thread functions 438⟩+≡
    int Thread_join(T t) {
        assert(current && t != current);
        testalert();
        if (t) {
            ⟨wait for thread t to terminate 442⟩
        } else {
            ⟨wait for all threads to terminate 443⟩
            return 0;
        }
    }

As described below, testalert raises Thread_Alerted if the calling thread has been alerted. When t is nonnull and refers to an existing thread, the calling thread puts itself on t's join queue to wait for its demise; otherwise, Thread_join returns −1 immediately.

⟨wait for thread t to terminate 442⟩≡
    if (t->handle == t) {
        put(current, &t->join);
        run();
        testalert();
        return current->code;
    } else
        return -1;

⟨fields 435⟩+≡
    int code;
    T join;

As shown below, Thread_exit clears the handle field when a thread terminates. t is an existing thread only if t->handle is equal to t. When t terminates, Thread_exit stores its argument in the code field of each of the Thread_Ts in t->join as it moves those threads to the ready queue.

Thus, when those threads execute again, that exit code is readily available, and it is returned by Thread_join in each resumed thread.

When t is null, the calling thread is put on join0, which holds the one and only thread waiting for all others to terminate:

⟨wait for all threads to terminate 443⟩≡
    assert(isempty(join0));
    if (nthreads > 1) {
        put(current, &join0);
        run();
        testalert();
    }

⟨data 435⟩+≡
    static T join0;

The next time the calling thread runs, it will be the only existing thread. This code also handles the case when the calling thread is the only thread in the system, which occurs when nthreads is equal to one.

Thread_exit has numerous jobs to do: It must deallocate the resources associated with the calling thread, resume the threads waiting for the calling thread to terminate and arrange for them to get the exit code, and check whether the calling thread is the second to last or last thread in the system.

⟨thread functions 438⟩+≡
    void Thread_exit(int code) {
        assert(current);
        release();
        if (current != &root) {
            current->next = freelist;
            freelist = current;
        }
        current->handle = NULL;
        ⟨resume threads waiting for current's termination 444⟩
        ⟨run another thread or exit 444⟩
    }

⟨fields 435⟩+≡
    T next;

⟨data 435⟩+≡
    static T freelist;

The call to release and the code that appends current to freelist collaborate to deallocate the calling thread's resources, as detailed below. If the calling thread is the root thread, its storage must not be deallocated, because that storage is allocated statically.

Clearing the handle field marks the thread as nonexistent, and those threads waiting for its demise can now be resumed:

⟨resume threads waiting for current's termination 444⟩≡
    while (!isempty(current->join)) {
        T t = get(&current->join);
        t->code = code;
        put(t, &ready);
    }

The calling thread's exit code is copied to the code field in the Thread_T structures of the waiting threads so that current can be deallocated.

If only two threads exist and one of them is in join0, that waiting thread can now be resumed.

⟨resume threads waiting for current's termination 444⟩+≡
    if (!isempty(join0) && nthreads == 2) {
        assert(isempty(ready));
        put(get(&join0), &ready);
    }

The assertion helps detect errors in maintaining nthreads and ready: If join0 is nonempty and nthreads is two, ready must be empty, because one of the two existing threads is in join0 and the other one is executing Thread_exit.

Thread_exit concludes by decrementing nthreads and either calling the library function exit or running another thread:

⟨run another thread or exit 444⟩≡
    if (--nthreads == 0)
        exit(code);
    else
        run();
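The collaboration between freelist and release is a form of deferred deallocation: an object that is still in use is queued, and a later call frees it. The sketch below shows the pattern on its own, assuming plain malloc and free instead of Mem's ALLOC and FREE; struct block, retire, release_all, and the two counters are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for a Thread_T whose storage cannot be freed yet. */
struct block {
    struct block *next;
};

static struct block *freelist;   /* blocks waiting to be freed */
static int freed;                /* how many blocks have been freed */

/* Queue a block for later deallocation (what Thread_exit does). */
void retire(struct block *b) {
    assert(b != NULL);
    b->next = freelist;
    freelist = b;
}

/* Free everything queued so far (what release does). */
void release_all(void) {
    struct block *b;

    while ((b = freelist) != NULL) {
        freelist = b->next;
        free(b);
        freed++;
    }
}

int freed_count(void) { return freed; }

int pending_count(void) {
    int n = 0;
    struct block *b;

    for (b = freelist; b != NULL; b = b->next)
        n++;
    return n;
}
```

Nothing is freed at the moment a block is retired; the storage stays valid until the next call to release_all, which is what lets an exiting thread keep running on its own stack until the switch to another thread is complete.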

Thread_alert marks a thread as "alerted" by setting a flag in its Thread_T structure and removing it from the queue, if it's in one.

⟨thread functions 438⟩+≡
    void Thread_alert(T t) {
        assert(current);
        assert(t && t->handle == t);
        t->alerted = 1;
        if (t->inqueue) {
            delete(t, t->inqueue);
            put(t, &ready);
        }
    }

⟨fields 435⟩+≡
    int alerted;

Thread_alert itself cannot raise Thread_Alerted, because the calling thread has a different state than t. Threads must raise Thread_Alerted and deal with it themselves, which is the purpose of testalert:

⟨static functions 436⟩+≡
    static void testalert(void) {
        if (current->alerted) {
            current->alerted = 0;
            RAISE(Thread_Alerted);
        }
    }

⟨data 435⟩+≡
    const Except_T Thread_Alerted = { "Thread alerted" };

testalert is called whenever a thread is about to block, or when it is resumed after being blocked. The former case is illustrated by the call to testalert at the beginning of Thread_join on page 442. The latter case always occurs after a call to run, and it's illustrated by the calls to testalert in the chunks ⟨wait for thread t to terminate 442⟩ and ⟨wait for all threads to terminate 443⟩. Similar usage appears in Sem_wait and Sem_signal; see page 458.

20.3.3 Thread Creation and Context-Switching

The last Thread function is Thread_new. Some of Thread_new is machine-dependent, because it interacts with _swtch, but most of it is nearly machine-independent. Thread_new has four tasks: allocate the resources for a new thread, initialize the new thread's state so that it can be started by a return from _swtch, increment nthreads, and append the new thread to ready.

⟨thread functions 438⟩+≡
    T Thread_new(int apply(void *), void *args, int nbytes, ...) {
        T t;

        assert(current);
        assert(apply);
        assert(args && nbytes >= 0 || args == NULL);
        if (args == NULL)
            nbytes = 0;
        ⟨allocate resources for a new thread 446⟩
        t->handle = t;
        ⟨initialize t's state 449⟩
        nthreads++;
        put(t, &ready);
        return t;
    }

In this uniprocessor implementation of Thread, the only resources a thread needs are the Thread_T structure and a stack. The Thread_T structure and a 16K byte stack are allocated with a single call to Mem's ALLOC:

⟨allocate resources for a new thread 446⟩≡
    {
        int stacksize = (16*1024+sizeof (*t)+nbytes+15)&~15;
        release();
        ⟨begin critical region 447⟩
        TRY
            t = ALLOC(stacksize);
            memset(t, '\0', sizeof *t);
        EXCEPT(Mem_Failed)

            t = NULL;
        END_TRY;
        ⟨end critical region 447⟩
        if (t == NULL)
            RAISE(Thread_Failed);
        ⟨initialize t's stack pointer 448⟩
    }

⟨data 435⟩+≡
    const Except_T Thread_Failed = { "Thread creation failed" };

This code is complex because it must maintain several invariants, the most important of which is that a call to a Thread function must not be interrupted. Two mechanisms collaborate to maintain this invariant: one deals with interrupts that occur when control is in a Thread function, and is described below. The other mechanism handles interrupts when control is in a routine that is called by a Thread function, which is illustrated by the calls to ALLOC and memset. These kinds of calls are bracketed by chunks that identify critical regions by incrementing and decrementing the value of critical:

⟨begin critical region 447⟩≡
    do { critical++;

⟨end critical region 447⟩≡
    critical--; } while (0);

⟨data 435⟩+≡
    static int critical;

As shown on page 455, interrupts that occur when critical is nonzero are ignored. Thread_new must catch Mem_Failed itself, and raise its exception, Thread_Failed, after it has completed the critical section. If it didn't catch the exception, control would pass to the caller's exception handler, with critical set to a positive value that would never be decremented.

Thread_new assumes that stacks grow toward lower addresses, and it initializes the sp field as depicted in Figure 20.5; the shaded box at the top is the Thread_T structure, and the ones at the bottom are the copy of args and the initial frames, as described below.
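The effect of the critical counter can be imitated with an ordinary POSIX signal. The following sketch is an analogy under stated assumptions, not the preemption code on page 455: it uses SIGUSR1 delivered synchronously with raise, installs the handler with sigaction, and the names on_signal and critical_demo are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t critical;   /* nonzero inside a critical region */
static volatile sig_atomic_t handled;    /* signals acted upon */
static volatile sig_atomic_t ignored;    /* signals dropped because critical != 0 */

static void on_signal(int sig) {
    (void)sig;
    if (critical)
        ignored++;     /* inside a critical region: do nothing else */
    else
        handled++;     /* a real scheduler would switch threads here */
}

/* Hypothetical driver: returns handled*10 + ignored. */
int critical_demo(void) {
    struct sigaction sa, old;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGUSR1, &sa, &old);
    handled = 0;
    ignored = 0;

    raise(SIGUSR1);                 /* outside a critical region */
    do { critical++;                /* same shape as the begin chunk */
        raise(SIGUSR1);             /* arrives while critical is nonzero */
    critical--; } while (0);        /* same shape as the end chunk */
    raise(SIGUSR1);                 /* outside again */

    sigaction(SIGUSR1, &old, NULL); /* restore the previous disposition */
    return handled * 10 + ignored;
}
```

Splitting the do and while (0) across the two chunks means that an unmatched begin or end fails to compile. The counter also shows why Thread_new catches Mem_Failed itself: leaving the region by any other route would skip the decrement, and every later signal would be ignored.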

[Figure 20.5 Allocation of a Thread_T structure and a stack]

⟨initialize t's stack pointer 448⟩≡
    t->sp = (void *)((char *)t + stacksize);
    while (((unsigned long)t->sp)&15)
        t->sp--;

As suggested by the assignment to stacksize and by this chunk, Thread_new initializes the stack pointer so that it's aligned on a 16-byte boundary, which accommodates most platforms. Most machines require either a four-byte or eight-byte stack alignment, but the DEC ALPHA requires a 16-byte alignment.

Thread_new starts by calling release, which Thread_exit also calls. Thread_exit can't deallocate the current thread's stack because it's using it. So it adds the thread handle to freelist, and delays the deallocation until the next call to release:

⟨static functions 436⟩+≡
    static void release(void) {
        T t;


¢begin critical region 447² while ((t = freelist) != NULL) { freelist = t->next; FREE(t); } ¢end critical region 447² } release is more general than necessary: freelist has only one element, because release is called by both Thread_exit and Thread_new. If only Thread_new had called release, dead Thread_Ts could accumulate on freelist. release uses a critical section because it calls Mem’s FREE. Next, Thread_new initializes the new thread’s stack so that it holds a copy of the nbytes bytes starting at args, and the frames needed to make it appear as if the thread had called _swtch. This latter initialization is machine-dependent: ¢initialize t’s state 449²≡ if (nbytes > 0) { t->sp -= ((nbytes + 15U)&~15)/sizeof (*t->sp); ¢begin critical region 447² memcpy(t->sp, args, nbytes); ¢end critical region 447² args = t->sp; } #if alpha { ¢initialize an ALPHA stack 463² } #elif mips { ¢initialize a MIPS stack 461² } #elif sparc { ¢initialize a SPARC stack 452² } #else Unsupported platform #endif The bottom of the stack shown in Figure 20.5 depicts the result of these initializations: The darker shading identifies the machine-dependent frames and the lighter shading is the copy of args. thread.c and swtch.s are the only modules in this book that use conditional compilation.
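The 16-byte rounding used for stacksize and for the copy of args is easy to check separately. The sketch below is illustrative only; round_up16, align_down16, and slots_for_args are hypothetical helper names, and align_down16 masks the address directly, which gives the same result as the decrementing loop in the chunk above only when the starting address is already a multiple of the slot size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rounds n up to the next multiple of 16, the effect of (n + 15) & ~15. */
size_t round_up16(size_t n) {
    return (n + 15u) & ~(size_t)15;
}

/* Moves an address down to the nearest 16-byte boundary at or below it. */
uintptr_t align_down16(uintptr_t p) {
    return p & ~(uintptr_t)15;
}

/* Number of stack slots of slot_size bytes reserved for an nbytes copy,
   mirroring ((nbytes + 15U)&~15)/sizeof (*t->sp). */
size_t slots_for_args(size_t nbytes, size_t slot_size) {
    return round_up16(nbytes) / slot_size;
}
```

For example, a 20-byte argument block is given 32 bytes, which is eight slots when a slot is four bytes wide and four slots when it is eight bytes wide, so the stack pointer stays on a 16-byte boundary in either case.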


The stack initialization is easier to understand after digesting an assembly-language implementation of _swtch:

⟨swtch.s⟩≡
    #if alpha
    ⟨ALPHA swtch 462⟩
    ⟨ALPHA startup 463⟩
    #elif sparc
    ⟨SPARC swtch 450⟩
    ⟨SPARC startup 452⟩
    #elif mips
    ⟨MIPS swtch 460⟩
    ⟨MIPS startup 461⟩
    #else
    Unsupported platform
    #endif

_swtch(from, to) must save from’s state, restore to’s state, and continue executing to by returning from to’s most recent call to _swtch. Calling conventions save much of the state, because they usually dictate that the values of some registers must be saved across calls, and that some machine-state information, such as condition codes, is not saved. So _swtch saves only the state it needs that is not preserved by the calling conventions (the return address, for example), and it can save these values on the calling thread’s stack.

The SPARC _swtch is perhaps the easiest one because the SPARC calling convention saves all of the registers by giving each function its own “register window”; the only registers it must preserve are the frame pointer and the return address.

⟨SPARC swtch 450⟩≡
                .global __swtch
                .align  4
                .proc   4
     1 __swtch: save    %sp,-(8+64),%sp
     2          st      %fp,[%sp+64+0]      ! save from’s frame pointer
     3          st      %i7,[%sp+64+4]      ! save from’s return address
     4          ta      3                   ! flush from’s registers
     5          st      %sp,[%i0]           ! save from’s stack pointer
     6          ld      [%i1],%sp           ! load to’s stack pointer
     7          ld      [%sp+64+0],%fp      ! restore to’s frame pointer
     8          ld      [%sp+64+4],%i7      ! restore to’s return address
     9          ret                         ! continue execution of to
    10          restore

The line numbers above identify the nonboilerplate lines for the explanation below, and they are not part of the assembly-language code. By convention, assembly-language names are prefixed with an underscore, so _swtch is known as __swtch in assembly language on the SPARC.

Figure 20.6 shows the layout of a frame for _swtch; all SPARC frames have at least 64 bytes at the top of the frame into which the operating system stores the function’s register window, when necessary. The other two words in _swtch’s 72-byte frame hold the saved frame pointer and the return address.

Figure 20.6 Layout of a stack frame for _swtch (64 bytes = 16 words at %sp, the saved frame pointer at %sp+64, and the return address at %sp+68)

Line 1 in _swtch allocates a stack frame for _swtch. Lines 2 and 3 save from’s frame pointer (%fp) and return address (%i7) at the seventeenth and eighteenth 32-bit words in the new frame (at offsets 64 and 68). Line 4 makes a system call to “flush” from’s register windows to the stack, which is necessary in order to continue execution with to’s register windows. This call is unfortunate: one of the presumed advantages of user-level threads is that context-switching does not require kernel intervention. On the SPARC, however, only the kernel can flush the register windows.

Line 5 saves from’s stack pointer in the sp field of its Thread_T structure. This instruction shows why that field is first: This code is independent of the size of a Thread_T and the locations of the other fields. Line 6, which is italicized in the printed listing, is the actual context switch. This instruction loads to’s stack pointer into %sp, the stack pointer register. Henceforth, _swtch is executing on to’s stack. Lines 7 and 8 restore to’s frame pointer and return address, because %sp now points at the top of to’s stack. Lines 9 and 10 form the normal function return sequence, and control continues at the address saved the last time to called _swtch.

Thread_new must create a frame for _swtch so that some other call to _swtch can return properly and start execution of the new thread, and this execution must call apply. Figure 20.7 shows what Thread_new builds: The frame for _swtch is on the top of the stack, and the frame under it is for the following startup code.

⟨SPARC startup 452⟩≡
                .global __start
                .align  4
                .proc   4
     1 __start: ld      [%sp+64+4],%o0
     2          ld      [%sp+64],%o1
     3          call    %o1; nop
     4          call    _Thread_exit; nop
     5          unimp   0
                .global __ENDMONITOR
       __ENDMONITOR:

The return address in the _swtch frame points to _start, and the startup frame holds apply and args, as shown in Figure 20.7. On the first return from _swtch, control lands at _start (which is __start in the assembly code). Line 1 in the startup code loads args into %o0, which, following the SPARC calling conventions, is used to pass the first argument. Line 2 loads the address of apply into %o1, which is otherwise unused, and line 3 makes an indirect call to apply. If apply returns, its exit code will be in %o0, and thus that value will be passed to Thread_exit, which never returns. Line 5 should never be executed, and will cause a fault if it is. _ENDMONITOR is explained below.

Figure 20.7 Startup and initial _swtch frames on the SPARC (from %sp: the _swtch frame, 16 words followed by fp and _start-8; then the startup frame, 16 words followed by apply, args, and 8 words)

The 15 lines of assembly language in _swtch and _start are all that’s necessary on the SPARC; initializing the stack for a new thread as shown in Figure 20.7 can be done entirely in C. The two frames are built bottom-up, as follows.

⟨initialize a SPARC stack 452⟩≡
     1  int i; void *fp; extern void _start(void);
     2  for (i = 0; i < 8; i++)
     3      *--t->sp = 0;
     4  *--t->sp = (unsigned long)args;
     5  *--t->sp = (unsigned long)apply;
     6  t->sp -= 64/4;
     7  fp = t->sp;
     8  *--t->sp = (unsigned long)_start - 8;
     9  *--t->sp = (unsigned long)fp;
    10  t->sp -= 64/4;

Lines 2 and 3 create the eight words at the bottom of the startup frame. Lines 4 and 5 push the value of args and apply onto the stack, and line 6 allocates the 64 bytes at the top of the startup frame. The stack pointer at this point is the frame pointer that must be restored by _swtch, so line 7 saves this value in fp. Line 8 pushes the return address, which is the saved value of %i7. The return address is eight bytes before _start because the SPARC ret instruction adds eight to the address in %i7 when it returns. Line 9 pushes the saved value of %fp, and line 10 concludes with the 64 bytes at the top of the _swtch frame.

If apply is a function that takes a variable number of arguments, its entry sequence stores the values in %o0 through %o5 into the stack at offsets 64 through 88 in its caller’s frame, that is, in the startup frame. Lines 2 and 3 allocate this space and an additional eight bytes so that the stack pointer remains aligned on an eight-byte boundary, as dictated by the SPARC hardware.

The MIPS and ALPHA versions of _swtch and _start appear in Section 20.3.6.
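The bottom-up construction of the two SPARC frames can be checked on any machine with a small model. The sketch below is not the book’s code: ModelStack and model_build_frames are hypothetical names, an array of 32-bit words stands in for the real stack, plain integers stand in for the addresses of args, apply, and _start, and the saved frame pointer is recorded as an array index rather than as a machine address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { STACK_WORDS = 64, WINDOW_WORDS = 64/4 };

/* Hypothetical model: 32-bit words stand in for SPARC stack slots. */
typedef struct {
    uint32_t words[STACK_WORDS];
    uint32_t *sp;                 /* grows toward lower indices */
} ModelStack;

/* Builds the startup frame and the initial _swtch frame in the same
   order as the ten numbered lines; returns the saved frame pointer. */
uint32_t *model_build_frames(ModelStack *s, uint32_t args,
                             uint32_t apply, uint32_t start) {
    uint32_t *fp;
    int i;

    s->sp = s->words + STACK_WORDS;
    for (i = 0; i < 8; i++)       /* eight words at the bottom */
        *--s->sp = 0;
    *--s->sp = args;              /* startup frame: args, then apply */
    *--s->sp = apply;
    s->sp -= WINDOW_WORDS;        /* 64 bytes for the register window */
    fp = s->sp;                   /* value _swtch restores into %fp */
    *--s->sp = start - 8;         /* return address; ret adds eight */
    *--s->sp = (uint32_t)(fp - s->words);  /* saved fp, as an index */
    s->sp -= WINDOW_WORDS;        /* 64 bytes at the top of the _swtch frame */
    return fp;
}
```

In the model, words 16 and 17 above the final stack pointer correspond to byte offsets 64 and 68, where lines 7 and 8 of _swtch expect the saved frame pointer and return address; the same two offsets from fp hold apply and args, which is where the startup code looks for them.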

20.3.4 Preemption
Preemption is equivalent to periodic, implicit calls to Thread_pause. The UNIX-dependent implementation of preemption in Thread arranges for a “virtual” timer interrupt every 50 milliseconds, and the interrupt handler executes code equivalent to Thread_pause. The timer is virtual because it ticks only while the process is executing. Thread_init uses the UNIX signal facility to initialize timer interrupts. The first step associates the interrupt handler with the virtual timer signal, SIGVTALRM:

⟨initialize preemptive scheduling 454⟩≡
    {
        struct sigaction sa;
        memset(&sa, '\0', sizeof sa);
        sa.sa_handler = (void (*)())interrupt;
        if (sigaction(SIGVTALRM, &sa, NULL) < 0)
            return 0;
    }

A sigaction structure has three fields: sa_handler is the address of the function that’s to be called when the SIGVTALRM signal occurs, sa_mask is a signal set that specifies other signals that should be blocked while an interrupt is being handled in addition to SIGVTALRM, and sa_flags provides signal-specific options. Thread_init sets sa_handler to interrupt, described below, and clears the other fields.

The sigaction function is the POSIX standard function for associating handlers with signals. The POSIX standard is supported by most UNIX variants and by some other operating systems, such as Windows NT. The three arguments give the symbolic name for the signal number, a pointer to the sigaction structure that modifies the action of that signal, and a pointer to another sigaction structure that’s filled in with the previous action for the signal. When the third argument is null, information about the previous action is not returned. The sigaction function returns zero when the signal’s action has been modified as specified by the second argument; it returns −1 otherwise. Thread_init returns zero when sigaction returns −1, to indicate that the thread system cannot support preemptive scheduling.

Once the signal handler is in place, the virtual timer is initialized:

⟨initialize preemptive scheduling 454⟩+≡
    {
        struct itimerval it;
        it.it_value.tv_sec     = 0;
        it.it_value.tv_usec    = 50000;
        it.it_interval.tv_sec  = 0;
        it.it_interval.tv_usec = 50000;
        if (setitimer(ITIMER_VIRTUAL, &it, NULL) < 0)
            return 0;
    }

The it_value field in an itimerval structure specifies the amount of time in seconds (tv_sec) and microseconds (tv_usec) to the next timer interrupt. The values in the it_interval field are used to reset the it_value field when the timer expires. Thread_init arranges for the timer interrupt to occur every 50 milliseconds, that is, every 50,000 microseconds.

The setitimer function is much like the sigaction function: Its first argument specifies which timer’s action is to be affected (there’s also a timer for real time), the second argument is a pointer to the itimerval structure that holds the new timer values, and the third argument is a pointer to the itimerval structure that gets the previous timer values, or null if the previous values are not needed. setitimer returns zero when the timer is set successfully, and returns −1 otherwise.

The signal handler, interrupt, is called when the virtual timer expires. When the interrupt is dismissed, which occurs when interrupt returns, the timer begins anew. interrupt executes the equivalent of Thread_pause, unless the current thread is in a critical region or is somewhere in a Thread or Sem function.

⟨static functions 436⟩+≡
    static int interrupt(int sig, int code,
        struct sigcontext *scp) {
        if (critical ||
            scp->sc_pc >= (unsigned long)_MONITOR
         && scp->sc_pc <= (unsigned long)_ENDMONITOR)
            return 0;
        put(current, &ready);
        sigsetmask(scp->sc_mask);
        run();
        return 0;
    }

The sig argument carries the signal number, and code supplies additional data for some signals. The scp argument is a pointer to a sigcontext structure that, among other values, contains the location counter at the time of the interrupt in the sc_pc field. thread.c begins with the vacuous function _MONITOR, and the assembly-language code in swtch.s ends with a definition for the global symbol _ENDMONITOR. If the object files are loaded into the program so that the object code for swtch.s follows the object code for thread.c, then the interrupted thread is executing a Thread or Sem function if its location counter is between _MONITOR and _ENDMONITOR. Thus, if critical is nonzero, or scp->sc_pc is between _MONITOR and _ENDMONITOR, interrupt returns and thus ignores this timer interrupt. Otherwise, interrupt puts the current thread on ready and runs another one.

The call to sigsetmask restores the signals disabled by the interrupt, which are given in the signal set scp->sc_mask; this set usually holds the SIGVTALRM signal only. This call is necessary because the next thread to run may not have been suspended by an interrupt. Suppose, for example, that thread A calls Thread_pause explicitly, and execution continues with thread B. When a timer interrupt occurs, control lands in interrupt with SIGVTALRM signals disabled. B reenables SIGVTALRM, and gives up the processor to A. If the call to sigsetmask were omitted, A would be resumed with SIGVTALRM disabled, because A was suspended by Thread_pause, not by interrupt. When the next timer interrupt occurs, A is suspended and B continues. In this case, calling sigsetmask is redundant, because B dismisses the interrupt, which restores the signal mask. A flag in the Thread_T structure could be used to avoid the unnecessary calls to sigsetmask.
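sigsetmask is a BSD interface; where only the POSIX calls are available, sigprocmask can manipulate the same per-process signal mask. The following is a minimal sketch, not part of Thread: the helper names vtalrm_blocked and set_vtalrm_blocked are hypothetical, and the code shows only how blocking and unblocking SIGVTALRM can be done and observed from ordinary code, not how a handler would use it.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Returns 1 if SIGVTALRM is currently blocked, 0 if it is not,
   and -1 if the mask cannot be read. */
int vtalrm_blocked(void) {
    sigset_t cur;

    /* A null second argument leaves the mask unchanged and only
       reports the current mask through the third argument. */
    if (sigprocmask(SIG_BLOCK, NULL, &cur) < 0)
        return -1;
    return sigismember(&cur, SIGVTALRM);
}

/* Blocks SIGVTALRM when block is nonzero and unblocks it otherwise;
   returns 0 on success and -1 on failure, like sigprocmask. */
int set_vtalrm_blocked(int block) {
    sigset_t set;

    sigemptyset(&set);
    sigaddset(&set, SIGVTALRM);
    return sigprocmask(block ? SIG_BLOCK : SIG_UNBLOCK, &set, NULL);
}
```

A handler that switches to a thread suspended by Thread_pause would need the effect of set_vtalrm_blocked(0) before resuming it, which is the role sigsetmask plays in interrupt.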
The second and succeeding arguments to interrupt handlers are system-dependent. Most UNIX variants support the code and scp arguments shown above, but other POSIX-compliant systems may supply different arguments to handlers.
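On systems that follow POSIX, the handler form with defined arguments is the three-argument one selected by the SA_SIGINFO flag. The sketch below, which is an illustration and not the Thread implementation, repeats the two initialization steps with that form: on_tick, start_virtual_timer, spin_until_ticks, and stop_virtual_timer are hypothetical names, the handler only counts ticks instead of switching threads, and the test for the location counter between _MONITOR and _ENDMONITOR is omitted because it is machine-dependent.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>

static volatile sig_atomic_t critical;  /* nonzero inside a critical region */
static volatile sig_atomic_t ticks;     /* ticks that were not ignored */

/* POSIX three-argument handler, selected by SA_SIGINFO. */
static void on_tick(int sig, siginfo_t *info, void *context) {
    (void)sig; (void)info; (void)context;
    if (critical)
        return;                 /* ignore ticks inside a critical region */
    ticks++;                    /* a thread package would switch threads here */
}

/* Installs the handler and starts a virtual timer with the given period,
   which must be less than one second. Returns 1 on success, 0 on failure. */
int start_virtual_timer(long period_usec) {
    struct sigaction sa;
    struct itimerval it;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_tick;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGVTALRM, &sa, NULL) < 0)
        return 0;
    it.it_value.tv_sec = 0;
    it.it_value.tv_usec = period_usec;
    it.it_interval = it.it_value;   /* reload value for later ticks */
    if (setitimer(ITIMER_VIRTUAL, &it, NULL) < 0)
        return 0;
    return 1;
}

/* Consumes processor time until at least n ticks have been counted;
   a virtual timer advances only while the process is executing. */
void spin_until_ticks(int n) {
    while (ticks < n)
        ;
}

/* Disarms the timer; returns 1 on success and 0 on failure. */
int stop_virtual_timer(void) {
    struct itimerval off;

    memset(&off, 0, sizeof off);
    return setitimer(ITIMER_VIRTUAL, &off, NULL) == 0;
}
```

Calling start_virtual_timer(50000) requests the 50-millisecond period described above; the busy loop in spin_until_ticks is needed in a demonstration because a process that sleeps accumulates no virtual time.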

20.3.5 General Semaphores
Creating and initializing semaphores are the easy two of the four Sem functions:

⟨sem functions 457⟩≡
    T *Sem_new(int count) {
        T *s;

        NEW(s);
        Sem_init(s, count);
        return s;
    }

    void Sem_init(T *s, int count) {
        assert(current);
        assert(s);
        s->count = count;
        s->queue = NULL;
    }

Sem_wait and Sem_signal are short, but it is tricky to write implementations that are both correct and fair. The semaphore operations are semantically equivalent to:

    Sem_wait(s):    while (s->count <= 0)
                        ;
                    --s->count;

    Sem_signal(s):  ++s->count;

These semantics lead to the concise and correct, but unfair, implementations shown below; these implementations also ignore alerts and checked runtime errors.

    void Sem_wait(T *s) {
        while (s->count <= 0) {
            put(current, &s->queue);
            run();
        }
        --s->count;
    }

    void Sem_signal(T *s) {
        if (++s->count > 0 && !isempty(s->queue))
            put(get(&s->queue), &ready);
    }

These implementations are unfair because they permit “starvation.” Assume s is initialized to one and threads A and B both execute

    for (;;) {
        Sem_wait(s);
        …
        Sem_signal(s);
    }

Suppose A is in the critical region denoted by the ellipsis, and B is in s->queue. When A calls Sem_signal, B is moved to the ready queue. If B executes next, its call to Sem_wait will return and B will enter the critical region. But A might call Sem_wait first and grab the critical region. If A is preempted inside the region, B resumes but finds that s->count is zero, and is moved back onto s->queue. Without some intervention, B could cycle between ready and s->queue indefinitely, and more threads competing for s make starvation more likely.

One solution is to ensure that when a thread is moved from s->queue to ready, it’s guaranteed to get the semaphore. This scheme can be implemented by moving a thread from s->queue to ready when s->count is about to be incremented from zero to one, but not actually incrementing it. Similarly, s->count isn’t decremented when a blocked thread is resumed inside of Sem_wait.

⟨sem functions 457⟩+≡
    void Sem_wait(T *s) {
        assert(current);
        assert(s);
        testalert();
        if (s->count <= 0) {
            put(current, (Thread_T *)&s->queue);
            run();
            testalert();
        } else
            --s->count;
    }

    void Sem_signal(T *s) {
        assert(current);
        assert(s);
        if (s->count == 0 && !isempty(s->queue)) {
            Thread_T t = get((Thread_T *)&s->queue);
            assert(!t->alerted);
            put(t, &ready);
        } else
            ++s->count;
    }

When s->count is zero and thread C is moved to the ready queue, C is guaranteed to get the semaphore, because other threads that call Sem_wait block since s->count is zero. For general semaphores, however, C may not get the semaphore first: If D calls Sem_signal before C runs again, that opens the door for another thread to get the semaphore before C, though C will get it too.

Alerts make Sem_wait hard to understand. If a thread blocked on s is alerted, its call to run in Sem_wait returns with its alerted flag set. In this case, the thread was moved to ready by Thread_alert, not by Sem_signal, so its resumption is unrelated to the value of s->count. The thread must leave s undisturbed, clear its alerted flag, and raise Thread_Alerted.

20.3.6 Context-Switching on the MIPS and ALPHA
The MIPS and ALPHA versions of _swtch and _start are similar in design to the SPARC versions but the details are different. The MIPS version of _swtch appears below. The frame size is 88 bytes. The store instructions through the sw $31,48+36($sp) save the “callee-saved” floating-point and integer registers; register 31 holds the return address. The instruction that loads $sp, which is italicized in the printed listing, switches contexts by loading to’s stack pointer, and the load instructions that follow restore to’s callee-saved registers.

⟨MIPS swtch 460⟩≡
            .text
            .globl  _swtch
            .align  2
            .ent    _swtch
            .set    reorder
    _swtch: .frame  $sp,88,$31
            subu    $sp,88
            .fmask  0xfff00000,-48
            s.d     $f20,0($sp)
            s.d     $f22,8($sp)
            s.d     $f24,16($sp)
            s.d     $f26,24($sp)
            s.d     $f28,32($sp)
            s.d     $f30,40($sp)
            .mask   0xc0ff0000,-4
            sw      $16,48+0($sp)
            sw      $17,48+4($sp)
            sw      $18,48+8($sp)
            sw      $19,48+12($sp)
            sw      $20,48+16($sp)
            sw      $21,48+20($sp)
            sw      $22,48+24($sp)
            sw      $23,48+28($sp)
            sw      $30,48+32($sp)
            sw      $31,48+36($sp)
            sw      $sp,0($4)
            lw      $sp,0($5)
            l.d     $f20,0($sp)
            l.d     $f22,8($sp)
            l.d     $f24,16($sp)
            l.d     $f26,24($sp)
            l.d     $f28,32($sp)
            l.d     $f30,40($sp)
            lw      $16,48+0($sp)
            lw      $17,48+4($sp)
            lw      $18,48+8($sp)
            lw      $19,48+12($sp)
            lw      $20,48+16($sp)
            lw      $21,48+20($sp)
            lw      $22,48+24($sp)
            lw      $23,48+28($sp)
            lw      $30,48+32($sp)
            lw      $31,48+36($sp)
            addu    $sp,88
            j       $31

Here’s the MIPS startup code:

⟨MIPS startup 461⟩≡
            .globl  _start
    _start: move    $4,$23      # register 23 holds args
            move    $25,$30     # register 30 holds apply
            jal     $25
            move    $4,$2       # Thread_exit(apply(p))
            move    $25,$21     # register 21 holds Thread_exit
            jal     $25
            syscall
            .end    _swtch
            .globl  _ENDMONITOR
    _ENDMONITOR:

This code collaborates with the MIPS-dependent portion of Thread_new, which arranges for Thread_exit, args, and apply to appear in registers 21, 23, and 30, respectively, by storing them in the right places in the frame. apply’s first argument is passed in register 4, and apply returns its result in register 2. The startup code doesn’t need a frame, so Thread_new builds only a _swtch frame, but it does allocate four words on the stack under that frame in case apply takes a variable number of arguments.

⟨initialize a MIPS stack 461⟩≡
    extern void _start(void);
    t->sp -= 16/4;
    t->sp -= 88/4;
    t->sp[(48+20)/4] = (unsigned long)Thread_exit;
    t->sp[(48+28)/4] = (unsigned long)args;
    t->sp[(48+32)/4] = (unsigned long)apply;
    t->sp[(48+36)/4] = (unsigned long)_start;

The address of Thread_exit is passed in register 21 because the MIPS startup code must be position-independent. The startup code copies the address of args to register 4 and the addresses of apply and Thread_exit to register 25 before the calls (the jal instructions) because that’s what is demanded by the MIPS position-independent calling sequence.

The ALPHA chunks are similar to the corresponding MIPS chunks.

⟨ALPHA swtch 462⟩≡
            .globl  _swtch
            .ent    _swtch
    _swtch: lda     $sp,-112($sp)       # allocate _swtch's frame
            .frame  $sp,112,$26
            .fmask  0x3f0000,-112
            stt     $f21,0($sp)         # save from’s registers
            stt     $f20,8($sp)
            stt     $f19,16($sp)
            stt     $f18,24($sp)
            stt     $f17,32($sp)
            stt     $f16,40($sp)
            .mask   0x400fe00,-64
            stq     $26,48+0($sp)
            stq     $15,48+8($sp)
            stq     $14,48+16($sp)
            stq     $13,48+24($sp)
            stq     $12,48+32($sp)
            stq     $11,48+40($sp)
            stq     $10,48+48($sp)
            stq     $9,48+56($sp)
            .prologue 0
            stq     $sp,0($16)          # save from’s stack pointer
            ldq     $sp,0($17)          # restore to’s stack pointer
            ldt     $f21,0($sp)         # restore to’s registers
            ldt     $f20,8($sp)
            ldt     $f19,16($sp)
            ldt     $f18,24($sp)
            ldt     $f17,32($sp)
            ldt     $f16,40($sp)
            ldq     $26,48+0($sp)
            ldq     $15,48+8($sp)
            ldq     $14,48+16($sp)
            ldq     $13,48+24($sp)
            ldq     $12,48+32($sp)
            ldq     $11,48+40($sp)
            ldq     $10,48+48($sp)
            ldq     $9,48+56($sp)
            lda     $sp,112($sp)        # deallocate frame
            ret     $31,($26)
            .end    _swtch

⟨ALPHA startup 463⟩≡
            .globl  _start
            .ent    _start
    _start: .frame  $sp,0,$26
            .mask   0x0,0
            .prologue 0
            mov     $14,$16             # register 14 holds args
            mov     $15,$27             # register 15 holds apply
            jsr     $26,($27)           # call apply
            ldgp    $26,0($26)          # reload the global pointer
            mov     $0,$16              # Thread_exit(apply(args))
            mov     $13,$27             # register 13 has Thread_exit
            jsr     $26,($27)
            call_pal 0
            .end    _start
            .globl  _ENDMONITOR
    _ENDMONITOR:

⟨initialize an ALPHA stack 463⟩≡
    extern void _start(void);
    t->sp -= 112/8;
    t->sp[(48+24)/8] = (unsigned long)Thread_exit;
    t->sp[(48+16)/8] = (unsigned long)args;
    t->sp[(48+ 8)/8] = (unsigned long)apply;
    t->sp[(48+ 0)/8] = (unsigned long)_start;

Further Reading
Andrews (1991) is a comprehensive text about concurrent programming. It describes most of the problems specific to programming concurrent systems and their solutions, including synchronization mechanisms, message-passing systems, and remote procedure calls. It also describes features designed specifically for concurrent programming in four programming languages.

Thread is based on Modula-3’s thread interface, which is derived from experience with the Modula-2+ thread facilities at Digital’s Systems Research Center (SRC). Chapter 4 in Nelson (1991), by Andrew Birrell, is a guide to programming with threads; anyone who writes thread-based applications will benefit from this article. The thread facilities in most modern operating systems are based in some way on the SRC interfaces.

Tanenbaum (1995) surveys the design issues for user-level and kernel-level threads and outlines their implementations. His case studies describe the thread packages in three operating systems (Amoeba, Chorus, and Mach), and threads in the Open Software Foundation’s Distributed Computing Environment. Originally, DCE, as this environment is known, ran on the OSF/1 variant of UNIX, but it is now available for most operating systems, including OpenVMS, OS/2, Windows NT, and Windows 95.

POSIX threads (Institute for Electrical and Electronic Engineers 1995) and Solaris 2 threads are described in detail by Kleiman, Shah, and Smaalders (1996). This practically oriented book includes a chapter on the interaction of threads and libraries, numerous examples using threads to parallelize algorithms, including sorting, and thread-safe implementations for lists, queues, and hash tables.

sieve is adapted from a similar example that McIlroy (1968) used to illustrate programming with coroutines, which are like nonpreemptive threads. Coroutines appear in several languages, sometimes under different names. Icon’s coexpressions are an example (Wampler and Griswold 1983). Marlin (1980) surveys many of the original coroutine proposals and describes model implementations in Pascal variants.

Channels are based on CSP, communicating sequential processes (Hoare 1978). Threads and channels also appear in Newsqueak, an applicative concurrent language. Channels in CSP and Newsqueak are more powerful than those provided by Chan, because both languages have facilities to wait nondeterministically on more than one channel. Pike (1990) tours the highlights of the implementation of an interpreter for Newsqueak, and describes using random numbers to vary the preemption frequency, which makes thread scheduling nondeterministic (but fair). McIlroy (1990) details a Newsqueak program that manipulates power series by treating them as data streams; his approach is similar in spirit to sieve.

Newsqueak has been used to implement window systems, which exemplify the kinds of interactive applications that benefit the most

Most of the NeWS window system itself is written in its variant of PostScript.1 Binary semaphores — usually called locks or mutexes — are the most prevalent type of semaphore. Frank Liu Copyright © 1997 by David R. Quantify the level of message activity that an applica- C Interfaces and Implementations: Techniques for Creating Reusable Software. it describes the differences among the UNIX variants and the POSIX standard. Extend your implementation of locks in the previous exercise to detect these kinds of simple deadlocks.3 Reimplement the Chan interface in thread.EXERCISES 465 from threads.. which includes extensions for nonpreemptive threads. there’s no stack. and use the internal queue and thread functions directly instead of the semaphore functions. Unauthorized use. Exercises 20. imperative languages. The NeWS window system (Gosling. Rosenthal.com. because activations can outlive their callers. The functional language Concurrent ML (Reppy 1997) supports threads and synchronous channels much like those provided by Chan. which describes a similar but slightly different interface for UNIX threads. C Interfaces and Implementations: Techniques for Creating Reusabl Prepared for frliu@microsoft. Design a suitable representation for channels. All rights reserved. and Arden 1989) is another example of a window system written in a language with threads. for example. These threads are deadlocked: A can’t continue until B unlocks y. and B can’t continue until A unlocks x. In Standard ML. which renders text and images. Be careful about alerts.2 Suppose thread A locks x and then attempts to lock y. and B locks y and then attempts to lock x. This download file is made available for personal use only and is subject to the Terms of Service. Devise a test suite that measures the benefits of this presumably more efficient implementation. Concurrent ML is implemented entirely in Standard ML. reproduction and/or distribution are strictly prohibited and violate applicable laws. 20. 
20. . Hanson. The heart of the NeWS system is a PostScript interpreter. It is often easier to implement threads in nonimperative languages than in stack-based.c without using semaphores. Using _MONITOR and _ENDMONITOR functions to delimit the code in the Thread and Sem implementation is from Cormack (1988). Chapter 10 in Stevens (1992) is a comprehensive treatment of signals and signal-handling procedures. Be careful about alerts. As a result. Any other use requires prior written consent from the copyright owner. Design a separate interface for locks whose implementation is simpler than the one for general semaphores. so no special arrangements are needed to support threads.

Your design should permit messages to outlive their sending threads. for a thread to send a message and then exit before that message is received. When a thread that has called alertsleep is alerted. C Interfaces and Implementations: Techniques for Creating Reusable Software. Hanson. Alerts don’t affect threads blocked on a condition variable. and messages are buffered until they’re received. That is.7 Devise a way to make the Thread and Sem functions atomic without using _MONITOR and _ENDMONITOR. You’ll need a critical flag for each thread. wakeup(c) causes one or more threads waiting on c to resume execution.com. that is.5 Modula-3 supports condition variables. Design and implement an interface that supports condition variables. broadcast(c) is like wakeup(c). when one thread calls fgetc. The calling thread must have m locked. and the assembly-language code will need to modify this flag. Any other use requires prior written consent from the copyright owner. use your locks from Exercise 20.466 THREADS tion must have for this revised implementation to make a measurable difference in runtime. for example. Hints: A single global critical flag isn’t enough. 20. buffered communications — an interthread message facility in which the sender doesn’t wait for the message to be received. other threads can execute while that thread waits for input..4 Design and implement an interface for asynchronous. Be careful — it is incredibly easy to make subtle errors using this approach. reproduction and/or distribution are strictly prohibited and violate applicable laws. but all threads sleeping on c resume execution. This download file is made available for personal use only and is subject to the Terms of Service. because it must cope with storage management for the buffered messages and with more error conditions. Asynchronous communication is more complicated than Chan’s synchronous communication. unless they called alertsleep instead of sleep. 
one of those relocks m and returns from its call to sleep.6 If your system supports nonblocking I/O system calls.1. use them to build a thread-safe implementation of C’s standard I/O library. The atomic operation sleep(m. providing a way for a thread to determine whether a message has been received. it locks m and raises Thread_Alerted. 20. Frank Liu Copyright © 1997 by David R. 20. A condition variable c is associated with a lock m. 20. All rights reserved. C Interfaces and Implementations: Techniques for Creating Reusabl Prepared for frliu@microsoft. Unauthorized use. . for example. c) causes the calling thread to unlock m and wait on c.

20.8 Extend Thread_new so that it accepts optional arguments that specify the stack size. For example,
t = Thread_new(…, "stacksize", 4096, NULL);
would create a thread with a 4K byte stack.

20.9 Add support for a small number of priorities to Thread’s implementation as suggested in Section 20.1. Modify Thread_init and Thread_new so they accept priority specifications as optional arguments. Tanenbaum (1995) describes how to implement a fair scheduling policy that supports priorities.

20.10 DCE supports templates, which are essentially associative tables of thread attributes. When a thread is created with DCE’s pthread_create, a template supplies attributes such as stack size and priority. Templates avoid repeating the same arguments in thread-creation calls, and let thread attributes be specified other than at the creation site. Design a template facility for Thread using Table_Ts, and revise Thread_new so that it accepts a template as one of its optional arguments.

20.11 Implement Thread and Sem on a multiprocessor with shared memory, such as a Sequent. This implementation is more complicated than the implementation detailed in Section 20.3 because threads really do execute concurrently on a multiprocessor. Implementing atomic operations will require some form of low-level spin locks that ensure exclusive access to short critical regions that access shared data structures, like those in the Thread and Sem functions.

20.12 Implement Thread, Sem, and Chan on a massively parallel processor (MPP) with many processors, like the Cray T3D, which is composed of 2^n DEC ALPHA processors. On MPPs, each processor has its own memory, and there’s some low-level mechanism (usually implemented in hardware) for one processor to access the memory of another. One of the challenges in this exercise is deciding how to map the shared-memory model favored by the Thread, Sem, and Chan interfaces onto the distributed-memory model provided by MPPs.

and Smaalders 1996).15 Implement Thread. Modify a C compiler to use this approach. This approach not only simplifies thread creation. and Chan using DCE threads. too. 20. Any other use requires prior written consent from the copyright owner. 20. allocate the stack in chunks. Shah. Unauthorized use. Be sure to specify what system-dependent optional parameters your implementation of Thread_new accepts.13 Implement Thread. on the fly. You’ll have to recompile any libraries you use. too. The exit sequence unlinks and deallocates a chunk when its last frame is removed.468 THREADS 20. and Chan using LWPs on Solaris 2. 20. The function entry sequence allocates the frame in the current chunk. C Interfaces and Implementations: Techniques for Creating Reusable Software. Sem.17 If you have access to a C compiler for the SPARC. reproduction and/or distribution are strictly prohibited and violate applicable laws. .14 Implement Thread. All rights reserved. providing optional parameters for Thread_new as necessary. such as lcc (Fraser and Hanson 1995). This download file is made available for personal use only and is subject to the Terms of Service. you’ll need to recompile any libraries you use. this exercise. modify the compiler so that it doesn’t use the SPARC register windows. Hanson.16 Implement Thread. otherwise. A few systems. Warning: This exercise is a large project. 20. Frank Liu Copyright © 1997 by David R.18 Thread_new must allocate a stack because most compilation systems assume that a contiguous stack has already been allocated when a program begins execution. if it fits. and Chan using Microsoft’s Win32 threads interface (see Richter 1995). Sem. and measure its benefits.com. Sem. 20. it allocates a new chunk of sufficient size and links it to the current chunk. C Interfaces and Implementations: Techniques for Creating Reusabl Prepared for frliu@microsoft. Sem. but also checks for stack overflow automatically. which eliminates the ta 3 system call in _swtch. 
As with the previous exercise. is a large project. Measure the resulting improvements in runtime. such as the Cray-2.. and Chan using POSIX threads (see Kleiman.
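The deadlock in Exercise 20.2 can be pictured as a cycle in a wait-for chain: A waits for y, y is owned by B, and B waits for x, which A owns. The following is only a sketch of that idea, not a solution to the exercise. The types struct thread and struct lock and the function would_deadlock are hypothetical names invented for illustration and are not part of the Thread or Sem interfaces; the sketch also assumes that the check runs before every blocking attempt, so no cycle exists before the call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for illustration only; not part of Thread or Sem. */
struct lock;

struct thread {
    struct lock *waiting_for;   /* lock this thread is blocked on, or NULL */
};

struct lock {
    struct thread *owner;       /* thread currently holding the lock, or NULL */
};

/*
 * Returns 1 if letting self block on wanted would close a cycle in the
 * wait-for chain, that is, if following owner -> waiting_for -> owner ...
 * from wanted leads back to self. Returns 0 otherwise.
 *
 * Assumption: no cycle exists before the call, so the walk terminates.
 */
int would_deadlock(const struct thread *self, const struct lock *wanted)
{
    const struct lock *l = wanted;

    while (l != NULL && l->owner != NULL) {
        if (l->owner == self)
            return 1;                   /* chain leads back to the caller */
        l = l->owner->waiting_for;      /* follow what the owner waits for */
    }
    return 0;                           /* chain ends at a runnable thread */
}
```

With A holding x and B holding y, the check reports no problem when A first blocks on y, and reports a deadlock when B then tries to block on x. A real lock implementation would have to run such a check atomically with respect to other lock operations.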

APPENDIX
INTERFACE SUMMARY

Interface summaries are listed below in alphabetical order; the subsections name each interface and its primary type, if it has one. The notation “T is opaque X_T” indicates that interface X exports an opaque pointer type X_T, abbreviated as T in the descriptions. The representation for X_T is given, if the interface reveals its primary type.

The summary for each interface lists, in alphabetical order, the exported variables, excluding exceptions, followed by the exported functions. The prototype for each function is followed by the exceptions it can raise and a concise description. The abbreviations c.r.e. and u.r.e. stand for checked and unchecked runtime error(s).

The following table summarizes the interfaces by category and gives the pages on which the summaries begin.

Fundamentals     ADTs              Strings      Arithmetic    Threads
Arena   471      Array     472     Atom  474    AP   470      Chan    476
Arith   472      ArrayRep  473     Fmt   477    MP   480      Sem     484
Assert  474      Bit       474     Str   487    XP   494      Thread  493
Except  476      List      478     Text  491
Mem     479      Ring      483
                 Seq       485
                 Set       486
                 Stack     487
                 Table     490

AP    T is opaque AP_T

It is a c.r.e. to pass a null T to any AP function.

T AP_add(T x, T y)    Mem_Failed
T AP_addi(T x, long int y)    Mem_Failed
  return the sum x + y.

int AP_cmp(T x, T y)
int AP_cmpi(T x, long int y)
  return an int <0, =0, or >0 if x<y, x=y, or x>y.

T AP_div(T x, T y)    Mem_Failed
T AP_divi(T x, long int y)    Mem_Failed
  return the quotient x/y; see Arith_div. It is a c.r.e. for y=0.

void AP_fmt(int code, va_list *app, int put(int c, void *cl), void *cl, unsigned char flags[], int width, int precision)    Mem_Failed
  a Fmt conversion function: consumes a T and formats it like printf’s %d. It is a c.r.e. for app or flags to be null.

void AP_free(T *z)
  deallocates and clears *z. It is a c.r.e. for z or *z to be null.

T AP_fromstr(const char *str, int base, char **end)    Mem_Failed
  interprets str as an integer in base and returns the resulting T. Ignores leading white space and accepts an optional sign followed by one or more digits in base. For 10<base≤36, lowercase or uppercase letters are interpreted as digits greater than 9. If end≠null, *end points to the character in str that terminated the scan. If str does not specify an integer in base, AP_fromstr returns null and sets *end to str, if end is nonnull. It is a c.r.e. for str=null or for base<2 or base>36.

T AP_lshift(T x, int s)    Mem_Failed
  returns x shifted left by s bits; vacated bits are filled with zeros, and the result has the same sign as x. It is a c.r.e. for s<0.

T AP_mod(T x, T y)    Mem_Failed
long AP_modi(T x, long int y)    Mem_Failed
  return x mod y; see Arith_mod. It is a c.r.e. for y=0.

T AP_mul(T x, T y)    Mem_Failed
T AP_muli(T x, long int y)    Mem_Failed
  return the product x•y.

T AP_neg(T x)    Mem_Failed
  returns −x.

T AP_new(long int n)    Mem_Failed
  allocates and returns a new T initialized to n.

T AP_pow(T x, T y, T p)    Mem_Failed
  returns x^y mod p. If p=null, returns x^y. It is a c.r.e. for y<0 or for a nonnull p<2.

T AP_rshift(T x, int s)    Mem_Failed
  returns x shifted right by s bits; vacated bits are filled with zeros, and the result has the same sign as x. It is a c.r.e. for s<0.

T AP_sub(T x, T y)    Mem_Failed
T AP_subi(T x, long int y)    Mem_Failed
  return the difference x − y.

long int AP_toint(T x)
  returns a long with same sign as x and magnitude |x| mod (LONG_MAX+1).

char *AP_tostr(char *str, int size, int base, T x)    Mem_Failed
  fills str[0..size-1] with the character representation of x in base and returns str. If str=null, AP_tostr allocates it. Uppercase letters are used for digits that exceed 9 when base>10. It is a c.r.e. for a nonnull str to be too small or for base<2 or base>36.

Arena    T is opaque Arena_T

It is a c.r.e. to pass nbytes≤0 or a null T to any Arena function.

void *Arena_alloc(T arena, long nbytes, const char *file, int line)    Arena_Failed
  allocates nbytes bytes in arena and returns a pointer to the first byte. The bytes are uninitialized. If Arena_alloc raises Arena_Failed, file and line are reported as the offending source coordinates.

void *Arena_calloc(T arena, long count, long nbytes, const char *file, int line)    Arena_Failed
  allocates space in arena for an array of count elements, each occupying nbytes, and returns a pointer to the first element. It is a c.r.e. for count≤0. The elements are uninitialized. If Arena_calloc raises Arena_Failed, file and line are reported as the offending source coordinates.

void Arena_dispose(T *ap)
  deallocates all of the space in *ap, deallocates the arena itself, and clears *ap. It is a c.r.e. for ap or *ap to be null.

void Arena_free(T arena)
  deallocates all of the space in arena, that is, all of the space allocated since the last call to Arena_free.

T Arena_new(void)    Arena_NewFailed
  allocates, initializes, and returns a new arena.

Arith

int Arith_ceiling(int x, int y)
  returns the least integer not less than the real quotient of x/y. It is a u.r.e. for y=0.

int Arith_div(int x, int y)
  returns x/y, the maximum integer that does not exceed the real number z such that z•y = x. Truncates toward −∞; e.g., Arith_div(−13, 5) returns −3. It is a u.r.e. for y=0.

int Arith_floor(int x, int y)
  returns the greatest integer not exceeding the real quotient of x/y. It is a u.r.e. for y=0.

int Arith_max(int x, int y)
  returns max(x, y).

int Arith_min(int x, int y)
  returns min(x, y).

int Arith_mod(int x, int y)
  returns x − y•Arith_div(x, y); e.g., Arith_mod(−13, 5) returns 2. It is a u.r.e. for y=0.

Array    T is opaque Array_T

Array indices run from zero to N−1, where N is the length of the array. The empty array has no elements. It is a c.r.e. to pass a null T to any Array function.

T Array_copy(T array, int length)    Mem_Failed
  creates and returns a new array that holds the initial length elements from array. If length exceeds the length of array, the excess elements are cleared.

void Array_free(T *array)
  deallocates and clears *array. It is a c.r.e. for array or *array to be null.

void *Array_get(T array, int i)
  returns a pointer to the ith element in array. It is a c.r.e. for i<0 or i≥N, where N is the length of array.

int Array_length(T array)
  returns the number of elements in array.

T Array_new(int length, int size)    Mem_Failed
  allocates, initializes, and returns a new array of length elements each of size bytes. The elements are cleared. It is a c.r.e. for length<0 or size≤0.

void *Array_put(T array, int i, void *elem)
  copies Array_size(array) bytes from elem into the ith element in array and returns elem. It is a c.r.e. for elem=null or for i<0 or i≥N, where N is the length of array.

void Array_resize(T array, int length)    Mem_Failed
  changes the number of elements in array to length. If length exceeds the original length, the excess elements are cleared. It is a c.r.e. for length<0.

int Array_size(T array)
  returns the size in bytes of the elements in array.

ArrayRep    T is Array_T

typedef struct T { int length; int size; char *array; } *T;

It is a u.r.e. to change the fields in a T.

void ArrayRep_init(T array, int length, int size, void *ary)
  initializes the fields in array to the values of length, size, and ary. It is a c.r.e. for length≠0 and ary=null, length=0 and ary≠null, or size≤0. It is a u.r.e. to initialize a T by other means.

Assert

assert(e)
  raises Assert_Failed if e is zero. Syntactically, assert(e) is an expression. If NDEBUG is defined when assert.h is included, assertions are disabled.

Atom

It is a c.r.e. to pass a null str to any Atom function. It is a u.r.e. to modify an atom.

const char *Atom_int(long n)    Mem_Failed
  returns the atom for the decimal string representation of n.

int Atom_length(const char *str)
  returns the length of the atom str. It is a c.r.e. for str not to be an atom.

const char *Atom_new(const char *str, int len)    Mem_Failed
  returns the atom for str[0..len-1], creating one if necessary. It is a c.r.e. for len<0.

const char *Atom_string(const char *str)    Mem_Failed
  returns Atom_new(str, strlen(str)).

Bit    T is opaque Bit_T

The bits in a bit vector are numbered zero to N−1, where N is the length of the vector. It is a c.r.e. to pass a null T to any Bit function, except for Bit_union, Bit_inter, Bit_minus, and Bit_diff.

void Bit_clear(T set, int lo, int hi)
  clears bits lo..hi in set. It is a c.r.e. for lo>hi, or for lo<0 or lo≥N where N is the length of set; likewise for hi.

int Bit_count(T set)
  returns the number of ones in set.

T Bit_diff(T s, T t)    Mem_Failed
  returns the symmetric difference s / t: the exclusive OR of s and t. If s=null or t=null, it denotes the empty set. It is a c.r.e. for s=null and t=null, or for s and t to have different lengths.

int Bit_eq(T s, T t)
  returns 1 if s = t and zero otherwise. It is a c.r.e. for s and t to have different lengths.

void Bit_free(T *set)
  deallocates and clears *set. It is a c.r.e. for set or *set to be null.

int Bit_get(T set, int n)
  returns bit n. It is a c.r.e. for n<0 or n≥N where N is the length of set.

T Bit_inter(T s, T t)    Mem_Failed
  returns s ∩ t: the logical AND of s and t. See Bit_diff for c.r.e.

int Bit_length(T set)
  returns the length of set.

int Bit_leq(T s, T t)
  returns 1 if s ⊆ t and zero otherwise. See Bit_eq for c.r.e.

int Bit_lt(T s, T t)
  returns 1 if s ⊂ t and zero otherwise. See Bit_eq for c.r.e.

void Bit_map(T set, void apply(int n, int bit, void *cl), void *cl)
  calls apply(n, bit, cl) for each bit in set from zero to N−1, where N is the length of set. Changes to set by apply affect subsequent values of bit.

T Bit_minus(T s, T t)    Mem_Failed
  returns s − t: the logical AND of s and ~t. See Bit_diff for c.r.e.

T Bit_new(int length)    Mem_Failed
  creates and returns a new bit vector of length zeros. It is a c.r.e. for length<0.

void Bit_not(T set, int lo, int hi)
  complements bits lo..hi in set. See Bit_clear for c.r.e.

int Bit_put(T set, int n, int bit)
  sets bit n to bit and returns the previous value of bit n. It is a c.r.e. for bit<0 or bit>1, or for n<0 or n≥N, where N is the length of set.

void Bit_set(T set, int lo, int hi)
  sets bits lo..hi in set. See Bit_clear for c.r.e.

T Bit_union(T s, T t)    Mem_Failed
  returns s ∪ t: the inclusive OR of s and t. See Bit_diff for c.r.e.
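The subset tests described for Bit_leq (s ⊆ t) and Bit_lt (s ⊂ t) reduce to word-at-a-time logical operations: s is a subset of t exactly when s AND ~t is zero in every word. The sketch below illustrates that reduction on plain arrays of unsigned long. It assumes both vectors occupy the same number of words, and the names words_leq and words_lt are hypothetical; this is not the Bit implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if every bit set in s is also set in t (s is a subset of t). */
int words_leq(const unsigned long *s, const unsigned long *t, size_t nwords)
{
    size_t i;

    for (i = 0; i < nwords; i++)
        if ((s[i] & ~t[i]) != 0)    /* some bit is in s but not in t */
            return 0;
    return 1;
}

/* Returns 1 if s is a proper subset of t: a subset of t that differs from t. */
int words_lt(const unsigned long *s, const unsigned long *t, size_t nwords)
{
    size_t i;
    int differ = 0;

    for (i = 0; i < nwords; i++) {
        if ((s[i] & ~t[i]) != 0)
            return 0;               /* not a subset at all */
        if (s[i] != t[i])
            differ = 1;             /* t has at least one extra bit */
    }
    return differ;
}
```

For example, with one-word vectors, {0,2} is both a subset and a proper subset of {0,1,2}, while a vector compared with itself is a subset but not a proper subset.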

Chan    T is opaque Chan_T

It is a c.r.e. to pass a null T to any Chan function, or to call any Chan function before calling Thread_init.

T Chan_new(void)    Mem_Failed
  creates, initializes, and returns a new channel.

int Chan_receive(T c, void *ptr, int size)    Thread_Alerted
  waits for a corresponding Chan_send, then copies up to size bytes from the sender to ptr, and returns the number copied. It is a c.r.e. for ptr=null or size<0.

int Chan_send(T c, const void *ptr, int size)    Thread_Alerted
  waits for a corresponding Chan_receive, then copies up to size bytes from ptr to the receiver, and returns the number copied. See Chan_receive for c.r.e.

Except    T is Except_T

typedef struct T { char *reason; } T;

The syntax of TRY statements is as follows; S and e denote statements and exceptions. The ELSE clause is optional.

TRY S EXCEPT(e_1) S_1 … EXCEPT(e_n) S_n ELSE S_0 END_TRY
TRY S FINALLY S_1 END_TRY

void Except_raise(const T *e, const char *file, int line)
  raises exception *e at source coordinate file and line. It is a c.r.e. for e=null. Uncaught exceptions cause program termination.

RAISE(e)
  raises e.

RERAISE
  reraises the exception that caused execution of a handler.

RETURN
RETURN expression
  is a return statement used within TRY statements. It is a u.r.e. to use a C return statement in TRY statements.
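The TRY and EXCEPT forms above describe a nonlocal transfer of control from the point where an exception is raised to a handler. In standard C, that kind of transfer can be built from setjmp and longjmp. The sketch below shows the mechanism with a single, global handler. It is an illustration under simplifying assumptions, not the Except implementation: exception_t, raise_exc, Bad_Input, and checked are hypothetical names, and a real implementation would need a stack of handlers so that TRY statements can nest.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical stand-in for an exception type; exceptions are
   identified by address. */
typedef struct { const char *reason; } exception_t;

static jmp_buf handler;                     /* the one active handler */
static const exception_t *raised = NULL;    /* exception in flight */

static const exception_t Bad_Input = { "bad input" };

/* Records the exception and jumps back to the most recent setjmp. */
static void raise_exc(const exception_t *e)
{
    raised = e;
    longjmp(handler, 1);
}

/* Returns n, or -1 if the guarded code raised Bad_Input. */
int checked(int n)
{
    if (setjmp(handler) == 0) {             /* plays the role of TRY */
        if (n < 0)
            raise_exc(&Bad_Input);
        return n;
    } else if (raised == &Bad_Input) {      /* plays the role of EXCEPT */
        return -1;
    }
    return -2;  /* no matching handler; a fuller design would reraise */
}
```

The early return inside the guarded branch is harmless here because the sketch keeps no handler stack; with a handler stack, leaving the guarded code without popping the handler would corrupt it, which is the kind of bookkeeping the RETURN macro exists to handle.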

Fmt    T is Fmt_T

typedef void (*T)(int code, va_list *app, int put(int c, void *cl), void *cl, unsigned char flags[256], int width, int precision)
  defines the type of a conversion function, which is called by the Fmt functions when the associated conversion specifier appears in a format string. Here and below, put(c, cl) is called to emit each formatted character c. Table 14.1 (page 220) summarizes the initial set of conversion specifiers.

It is a c.r.e. to pass a null put, buf, or fmt to any Fmt function, or for a format string to use a conversion specifier that has no associated conversion function.

char *Fmt_flags = "-+ 0"
  points to the flag characters that can appear in conversion specifiers.

void Fmt_fmt(int put(int c, void *cl), void *cl, const char *fmt, ...)
  formats and emits the “…” arguments according to the format string fmt.

void Fmt_fprint(FILE *stream, const char *fmt, ...)
void Fmt_print(const char *fmt, ...)
  format and emit the “…” arguments according to fmt; Fmt_fprint writes to stream, and Fmt_print writes to stdout.

void Fmt_putd(const char *str, int len, int put(int c, void *cl), void *cl, unsigned char flags[256], int width, int precision)
void Fmt_puts(const char *str, int len, int put(int c, void *cl), void *cl, unsigned char flags[256], int width, int precision)
  format and emit the converted numeric (Fmt_putd) or string (Fmt_puts) in str[0..len-1] according to Fmt’s defaults (see Table 14.1, page 220) and the values of flags, width, and precision. It is a c.r.e. for str=null, len<0, or flags=null.

T Fmt_register(int code, T cvt)
  associates cvt with the format character code, and returns the previous conversion function. It is a c.r.e. for code<0 or code>255.

int Fmt_sfmt(char *buf, int size, const char *fmt, ...)    Fmt_Overflow
  formats the “…” arguments into buf[0..size-1] according to fmt, appends a null character, and returns the length of buf. It is a c.r.e. for size≤0. Raises Fmt_Overflow if more than size−1 characters are emitted.

char *Fmt_string(const char *fmt, ...)
  formats the “…” arguments into a null-terminated string according to fmt and returns that string.

void Fmt_vfmt(int put(int c, void *cl), void *cl, const char *fmt, va_list ap)
  See Fmt_fmt; takes arguments from the list ap.

int Fmt_vsfmt(char *buf, int size, const char *fmt, va_list ap)    Fmt_Overflow
  See Fmt_sfmt; takes arguments from the list ap.

char *Fmt_vstring(const char *fmt, va_list ap)
  See Fmt_string; takes arguments from the list ap.

List    T is List_T

typedef struct T *T;
struct T { T rest; void *first; };

All List functions accept a null T for any list argument and interpret it as the empty list.

T List_append(T list, T tail)
  appends tail to list and returns list. If list=null, List_append returns tail.

T List_copy(T list)    Mem_Failed
  creates and returns a top-level copy of list.

void List_free(T *list)
  deallocates and clears *list. It is a c.r.e. for list=null.

int List_length(T list)
  returns the number of elements in list.

T List_list(void *x, ...)    Mem_Failed
  creates and returns a list whose elements are the “…” arguments up to the first null pointer.

void List_map(T list, void apply(void **x, void *cl), void *cl)
  calls apply(&p->first, cl) for each element p in list. It is a u.r.e. for apply to change list.

T List_pop(T list, void **x)
  assigns list->first to *x, if x is nonnull, deallocates list, and returns list->rest. If list=null, List_pop returns null and does not change *x.

T List_push(T list, void *x)    Mem_Failed
  adds a new element that holds x onto the front of list and returns the new list.

T List_reverse(T list)
  reverses the elements in list in place and returns the reversed list.

void **List_toArray(T list, void *end)    Mem_Failed
  creates an N+1-element array of the N elements in list and returns a pointer to its first element. The Nth element in the array is end.

Mem

It is a c.r.e. to pass nbytes≤0 to any Mem function or macro.

ALLOC(nbytes)    Mem_Failed
  allocates nbytes bytes and returns a pointer to the first byte. The bytes are uninitialized. See Mem_alloc.

CALLOC(count, nbytes)    Mem_Failed
  allocates space for an array of count elements, each occupying nbytes bytes, and returns a pointer to the first element. It is a c.r.e. for count≤0. The elements are cleared. See Mem_calloc.

FREE(ptr)
  deallocates ptr, if ptr is nonnull, and clears ptr. ptr is evaluated more than once. See Mem_free.

void *Mem_alloc(long nbytes, const char *file, int line)    Mem_Failed
  allocates nbytes bytes and returns a pointer to the first byte. The bytes are uninitialized. If Mem_alloc raises Mem_Failed, file and line are reported as the offending source coordinates.

void *Mem_calloc(long count, long nbytes, const char *file, int line)    Mem_Failed
allocates space for an array of count elements, each occupying nbytes bytes, and returns a pointer to the first element. It is a c.r.e. for count≤0. The elements are cleared, which does not necessarily initialize pointers to null or floating-point values to 0.0. If Mem_calloc raises Mem_Failed, file and line are reported as the offending source coordinates.

void Mem_free(void *ptr, const char *file, int line)
deallocates ptr, if ptr is nonnull. It is a u.r.e. for ptr to be a pointer that was not returned by a previous call to a Mem allocation function. Implementations may use file and line to report memory-usage errors.

void *Mem_resize(void *ptr, long nbytes, const char *file, int line)    Mem_Failed
changes the size of the block at ptr to hold nbytes bytes, and returns a pointer to the first byte of the new block. If nbytes exceeds the size of the original block, the excess bytes are uninitialized. If nbytes is less than the size of the original block, only nbytes of its bytes appear in the new block. If Mem_resize raises Mem_Failed, file and line are reported as the offending source coordinates. It is a c.r.e. for ptr=null, and it is a u.r.e. for ptr to be a pointer that was not returned by a previous call to a Mem allocation function.

NEW(p)     Mem_Failed
NEW0(p)    Mem_Failed
allocate a block large enough to hold *p, set p to the address of the block, and return that address. NEW0 clears the bytes, and NEW leaves them uninitialized. Both macros evaluate p once.

RESIZE(ptr, nbytes)    Mem_Failed
changes the size of the block at ptr to hold nbytes bytes, reaims ptr at the resized block, and returns the address of the block. ptr is evaluated more than once. See Mem_resize.

MP
typedef unsigned char *T

T is MP_T

MP functions do n-bit signed and unsigned arithmetic, where n is initially 32 and can be changed by MP_set. Function names that end in u or ui do unsigned arithmetic; others do signed arithmetic. MP functions compute their results before raising MP_Overflow or MP_DivideByZero. It is a c.r.e. to pass a null T to any MP function. It is a u.r.e. to pass a T that is too small to any MP function.

T MP_add(T z, T x, T y)                   MP_Overflow
T MP_addi(T z, T x, long y)               MP_Overflow
T MP_addu(T z, T x, T y)                  MP_Overflow
T MP_addui(T z, T x, unsigned long y)     MP_Overflow
set z to x + y and return z.

T MP_and(T z, T x, T y)
T MP_andi(T z, T x, unsigned long y)
set z to x AND y and return z.

T MP_ashift(T z, T x, int s)
sets z to x shifted right by s bits and returns z. Vacated bits are filled with x’s sign bit. It is a c.r.e. for s<0.

int MP_cmp(T x, T y)
int MP_cmpi(T x, long y)
int MP_cmpu(T x, T y)
int MP_cmpui(T x, unsigned long y)
return an int <0, =0, or >0 if x<y, x=y, or x>y.

T MP_cvt(int m, T z, T x)     MP_Overflow
T MP_cvtu(int m, T z, T x)    MP_Overflow
narrow or widen x to an m-bit signed or unsigned integer in z and return z. It is a c.r.e. for m<2.

T MP_div(T z, T x, T y)                   MP_Overflow, MP_DivideByZero
T MP_divi(T z, T x, long y)               MP_Overflow, MP_DivideByZero
T MP_divu(T z, T x, T y)                  MP_DivideByZero
T MP_divui(T z, T x, unsigned long y)     MP_Overflow, MP_DivideByZero
set z to x/y and return z. The signed functions truncate toward −∞; see Arith_div.

void MP_fmt(int code, va_list *app, int put(int c, void *cl), void *cl, unsigned char flags[], int width, int precision)
void MP_fmtu(int code, va_list *app, int put(int c, void *cl), void *cl, unsigned char flags[], int width, int precision)
are Fmt conversion functions. They consume a T and a base b and format it like printf’s %d and %u. It is a c.r.e. for b<2 or b>36, and for app or flags to be null.

T MP_fromint(T z, long v)                 MP_Overflow
T MP_fromintu(T z, unsigned long u)       MP_Overflow
set z to v or u and return z.

T MP_fromstr(T z, const char *str, int base, char **end)    MP_Overflow
interprets str as an integer in base, sets z to that integer, and returns z. See AP_fromstr.

T MP_lshift(T z, T x, int s)
sets z to x shifted left by s bits and returns z. Vacated bits are filled with zeros. It is a c.r.e. for s<0.

T MP_mod(T z, T x, T y)                   MP_Overflow, MP_DivideByZero
sets z to x mod y and returns z. Truncates toward −∞; see Arith_mod.

long MP_modi(T x, long y)                 MP_Overflow, MP_DivideByZero
returns x mod y. Truncates toward −∞; see Arith_mod.

T MP_modu(T z, T x, T y)                  MP_DivideByZero
sets z to x mod y and returns z.

unsigned long MP_modui(T x, unsigned long y)    MP_Overflow, MP_DivideByZero
returns x mod y.

T MP_mul(T z, T x, T y)                   MP_Overflow
sets z to x•y and returns z.

T MP_mul2(T z, T x, T y)                  MP_Overflow
T MP_mul2u(T z, T x, T y)                 MP_Overflow
set z to the double-length result of x•y, which has 2n bits, and return z.

T MP_muli(T z, T x, long y)               MP_Overflow
T MP_mulu(T z, T x, T y)                  MP_Overflow
T MP_mului(T z, T x, unsigned long y)     MP_Overflow
set z to x•y and return z.

T MP_neg(T z, T x)                        MP_Overflow
sets z to −x and returns z.

T MP_new(unsigned long u)                 Mem_Failed, MP_Overflow
creates and returns a T initialized to u.

T MP_not(T z, T x)
sets z to ~x and returns z.

T MP_or(T z, T x, T y)
T MP_ori(T z, T x, unsigned long y)
set z to x OR y and return z.

T MP_rshift(T z, T x, int s)
sets z to x shifted right by s bits and returns z. Vacated bits are filled with zeros. It is a c.r.e. for s<0.
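The signed MP division and modulus entries say that results truncate toward −∞, which differs from C's built-in / and % operators (C99 truncates toward zero). The following is a minimal sketch of that rounding rule on plain long values, not the MP or Arith implementation; floor_div and floor_mod are hypothetical helper names introduced only for illustration.

```c
#include <assert.h>

/* Hypothetical helper: quotient rounded toward minus infinity.
   Assumes C99 semantics, where x / y truncates toward zero. */
long floor_div(long x, long y) {
    assert(y != 0);                      /* division by zero is an error */
    long q = x / y;                      /* truncated quotient */
    /* If there is a remainder and the operands have opposite signs,
       the truncated quotient is one too large. */
    if (x % y != 0 && (x < 0) != (y < 0))
        q--;
    return q;
}

/* Hypothetical helper: remainder that pairs with floor_div,
   so that x == y * floor_div(x, y) + floor_mod(x, y). */
long floor_mod(long x, long y) {
    assert(y != 0);
    return x - y * floor_div(x, y);
}
```

With this rule, a nonzero remainder always has the sign of the divisor. The sketch ignores the overflow case (the most negative long divided by −1), which is the kind of condition the MP functions report with MP_Overflow.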

int MP_set(int n)                         Mem_Failed
resets MP to do n-bit arithmetic. It is a c.r.e. for n<2.

T MP_sub(T z, T x, T y)                   MP_Overflow
T MP_subi(T z, T x, long y)               MP_Overflow
T MP_subu(T z, T x, T y)                  MP_Overflow
T MP_subui(T z, T x, unsigned long y)     MP_Overflow
set z to x − y and return z.

long int MP_toint(T x)                    MP_Overflow
unsigned long MP_tointu(T x)              MP_Overflow
return x as a long int or unsigned long.

char *MP_tostr(char *str, int size, int base, T x)    Mem_Failed
fills str[0..size-1] with a null-terminated string representing x in base, and returns str. If str=null, MP_tostr ignores size and allocates the string. See AP_tostr.

T MP_xor(T z, T x, T y)
T MP_xori(T z, T x, unsigned long y)
set z to x XOR y and return z.

Ring

T is opaque Ring_T

Ring indices run from zero to N−1, where N is the length of the ring. The empty ring has no elements. Pointers can be added or removed anywhere; rings expand automatically. Rotating a ring changes its origin. It is a c.r.e. to pass a null T to any Ring function.

void *Ring_add(T ring, int pos, void *x)    Mem_Failed
inserts x at position pos in ring and returns x. Positions identify points between elements; see Str. It is a c.r.e. for pos < −N or pos > N+1, where N is the length of ring.

void *Ring_addhi(T ring, void *x)         Mem_Failed
void *Ring_addlo(T ring, void *x)         Mem_Failed
adds x to the high (index N−1) or low (index 0) end of ring and returns x.

void Ring_free(T *ring)
deallocates and clears *ring. It is a c.r.e. for ring or *ring to be null.

int Ring_length(T ring)
returns the number of elements in ring.

void *Ring_get(T ring, int i)
returns the ith element in ring. It is a c.r.e. for i<0 or i≥N, where N is the length of ring.

T Ring_new(void)                          Mem_Failed
creates and returns an empty ring.

void *Ring_put(T ring, int i, void *x)    Mem_Failed
changes the ith element in ring to x and returns the previous value. See Ring_get for c.r.e.

void *Ring_remhi(T ring)
void *Ring_remlo(T ring)
removes and returns the element at the high end (index N−1) or low end (index 0) of ring. It is a c.r.e. for ring to be empty.

void *Ring_remove(T ring, int i)
removes and returns element i from ring. It is a c.r.e. for i<0 or i≥N, where N is the length of ring.

T Ring_ring(void *x, ...)                 Mem_Failed
creates and returns a ring whose elements are the “…” arguments up to the first null pointer.

void Ring_rotate(T ring, int n)
rotates the origin of ring n elements left (n<0) or right (n≥0). It is a c.r.e. for |n|>N, where N is the length of ring.

Sem

T is opaque Sem_T
typedef struct T { int count; void *queue; } T;

It is a u.r.e. to read or write the fields in a T directly, or to pass an uninitialized T to any Sem function. It is a c.r.e. to pass a null T to any Sem function, or to call any Sem function before calling Thread_init.

The syntax of the LOCK statement is as follows; S and m denote statements and a T.

LOCK(m) S END_LOCK

m is locked, statements S are executed, and m is unlocked. LOCK can raise Thread_Alerted.

void Sem_init(T *s, int count)
sets s->count to count. It is a u.r.e. to call Sem_init more than once on the same T.

Sem_T *Sem_new(int count)                 Mem_Failed
creates and returns a T with its count field initialized to count.

void Sem_signal(T *s)                     Thread_Alerted
increments s->count.

void Sem_wait(T *s)                       Thread_Alerted
waits until s->count>0, then decrements s->count.

Seq

T is opaque Seq_T

Sequence indices run from zero to N−1, where N is the length of the sequence. The empty sequence has no elements. Pointers can be added or removed from the low end (index zero) or the high end (index N−1); sequences expand automatically. It is a c.r.e. to pass a null T to any Seq function.

void *Seq_addhi(T seq, void *x)           Mem_Failed
void *Seq_addlo(T seq, void *x)           Mem_Failed
adds x to the high or low end of seq and returns x.

void Seq_free(T *seq)
deallocates and clears *seq. It is a c.r.e. for seq or *seq to be null.

void *Seq_get(T seq, int i)
returns the ith element in seq. It is a c.r.e. for i<0 or i≥N, where N is the length of seq.

int Seq_length(T seq)
returns the number of elements in seq.

T Seq_new(int hint)                       Mem_Failed
creates and returns an empty sequence. hint is an estimate of the maximum size of the sequence. It is a c.r.e. for hint<0.

void *Seq_put(T seq, int i, void *x)
changes the ith element in seq to x and returns the previous value. See Seq_get for c.r.e.

void *Seq_remhi(T seq)
void *Seq_remlo(T seq)
remove and return the element at the high or low end of seq. It is a c.r.e. for seq to be empty.

T Seq_seq(void *x, ...)                   Mem_Failed
creates and returns a sequence whose elements are the “…” arguments up to the first null pointer.

Set

T is opaque Set_T

It is a c.r.e. to pass a null member or T to any Set function, except for Set_diff, Set_inter, Set_minus, and Set_union, which interpret a null T as the empty set.

T Set_diff(T s, T t)                      Mem_Failed
returns the symmetric difference s / t: a set whose members appear in only one of s or t. It is a c.r.e. for both s=null and t=null, or for nonnull s and t to have different cmp and hash functions.

void Set_free(T *set)
deallocates and clears *set. It is a c.r.e. for set or *set to be null.

T Set_inter(T s, T t)                     Mem_Failed
returns s ∩ t: a set whose members appear in s and t. See Set_diff for c.r.e.

int Set_length(T set)
returns the number of elements in set.

void Set_map(T set, void apply(const void *member, void *cl), void *cl)
calls apply(member, cl) for each member ∈ set. It is a c.r.e. for apply to change set.

int Set_member(T set, const void *member)
returns one if member ∈ set and zero otherwise.

T Set_minus(T s, T t)                     Mem_Failed
returns the difference s − t: a set whose members appear in s but not in t. See Set_diff for c.r.e.

T Set_new(int hint, int cmp(const void *x, const void *y), unsigned hash(const void *x))    Mem_Failed
creates, initializes, and returns an empty set. See Table_new for an explanation of hint, cmp, and hash.

void Set_put(T set, const void *member)   Mem_Failed
adds member to set, if necessary.

void *Set_remove(T set, const void *member)
removes member from set, if member ∈ set, and returns the removed member; otherwise, Set_remove returns null.

void **Set_toArray(T set, void *end)      Mem_Failed
creates an N+1-element array that holds the N members in set in an unspecified order and returns a pointer to the first element. Element N is end.

T Set_union(T s, T t)                     Mem_Failed
returns s ∪ t: a set whose members appear in s or t. See Set_diff for c.r.e.

Stack

T is opaque Stack_T

It is a c.r.e. to pass a null T to any Stack function.

int Stack_empty(T stk)
returns one if stk is empty and zero otherwise.

void Stack_free(T *stk)
deallocates and clears *stk. It is a c.r.e. for stk or *stk to be null.

T Stack_new(void)                         Mem_Failed
returns a new, empty T.

void *Stack_pop(T stk)
pops and returns the top element on stk. It is a c.r.e. for stk to be empty.

void Stack_push(T stk, void *x)           Mem_Failed
pushes x onto stk.

Str

The Str functions manipulate null-terminated strings. Positions identify points between characters; for example, the positions in STRING are:

   1   2   3   4   5   6   7
     S   T   R   I   N   G
  −6  −5  −4  −3  −2  −1   0

Any two positions can be given in either order. Str functions that create strings allocate space for their results. In the descriptions below, s[i:j] denotes the substring of s between positions i and j. It is a c.r.e. to pass a nonexistent position or a null character pointer to any Str function, except as specified for Str_catv and Str_map.

int Str_any(const char *s, int i, const char *set)
returns the positive position in s after s[i:i+1] if that character appears in set, or zero otherwise. It is a c.r.e. for set=null.

char *Str_cat(const char *s1, int i1, int j1, const char *s2, int i2, int j2)    Mem_Failed
returns s1[i1:j1] concatenated with s2[i2:j2].

char *Str_catv(const char *s, ...)        Mem_Failed
returns a string consisting of the triples in “…” up to a null pointer. Each triple specifies an s[i:j].

int Str_chr(const char *s, int i, int j, int c)
returns the position in s before the leftmost occurrence of c in s[i:j], or zero otherwise.

int Str_cmp(const char *s1, int i1, int j1, const char *s2, int i2, int j2)
returns an integer <0, =0, or >0 if s1[i1:j1]<s2[i2:j2], s1[i1:j1]=s2[i2:j2], or s1[i1:j1]>s2[i2:j2].

char *Str_dup(const char *s, int i, int j, int n)    Mem_Failed
returns n copies of s[i:j]. It is a c.r.e. for n<0.

int Str_find(const char *s, int i, int j, const char *str)
returns the position in s before the leftmost occurrence of str in s[i:j], or zero otherwise. It is a c.r.e. for str=null.

void Str_fmt(int code, va_list *app, int put(int c, void *cl), void *cl, unsigned char flags[], int width, int precision)
is a Fmt conversion function. It consumes three arguments (a string and two positions) and formats the substring in the style of printf’s %s. It is a c.r.e. for app or flags to be null.

int Str_len(const char *s, int i, int j)
returns the length of s[i:j].

int Str_many(const char *s, int i, int j, const char *set)
returns the positive position in s after a nonempty run of characters from set at the beginning of s[i:j], or zero otherwise. It is a c.r.e. for set=null.

char *Str_map(const char *s, int i, int j, const char *from, const char *to)    Mem_Failed
returns the string obtained from mapping the characters in s[i:j] according to from and to. Each character from s[i:j] that appears in from is mapped to the corresponding character in to. Characters that do not appear in from map to themselves. If from=null and to=null, their previous values are used. If s=null, from and to establish a default mapping. It is a c.r.e. for only one of from or to to be null, for strlen(from)≠strlen(to), for s, from, and to to all be null, or for from=null and to=null on the first call.

int Str_match(const char *s, int i, int j, const char *str)
returns the positive position in s if s[i:j] starts with str, or zero otherwise. It is a c.r.e. for str=null.

int Str_pos(const char *s, int i)
returns the positive position corresponding to s[i:i]; subtracting one yields the index of s[i:i+1].

int Str_rchr(const char *s, int i, int j, int c)
is the rightmost variant of Str_chr.

char *Str_reverse(const char *s, int i, int j)    Mem_Failed
returns a copy of s[i:j] with the characters in the opposite order.

int Str_rfind(const char *s, int i, int j, const char *str)
is the rightmost variant of Str_find.

int Str_rmany(const char *s, int i, int j, const char *set)
returns the positive position in s before a nonempty run of characters from set at the end of s[i:j], or zero otherwise. It is a c.r.e. for set=null.

int Str_rmatch(const char *s, int i, int j, const char *str)
returns the positive position in s before str if s[i:j] ends with str, or zero otherwise. It is a c.r.e. for str=null.

int Str_rupto(const char *s, int i, int j, const char *set)
is the rightmost variant of Str_upto.

char *Str_sub(const char *s, int i, int j)    Mem_Failed
returns s[i:j].

int Str_upto(const char *s, int i, int j, const char *set)
returns the position in s before the leftmost occurrence in s[i:j] of any character in set, or zero otherwise. It is a c.r.e. for set=null.
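The Str position convention (positive positions counted from the left starting at 1, nonpositive positions counted from the right ending at 0) can be sketched in a few lines. This is an illustrative reading of the summary, not the library's code: pos_to_index, positive_pos, and sub_len are hypothetical names, and the assert stands in for the checked runtime error on nonexistent positions.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: map a position to the 0-based index of the
   character immediately to its right. For a string of length len,
   valid positions are 1..len+1 and -len..0. */
int pos_to_index(const char *s, int pos) {
    int len;
    assert(s != NULL);
    len = (int)strlen(s);
    assert(pos >= -len && pos <= len + 1);   /* nonexistent position */
    return pos > 0 ? pos - 1 : pos + len;
}

/* Hypothetical helper: the positive position that names the same point,
   in the spirit of Str_pos (index plus one). */
int positive_pos(const char *s, int pos) {
    return pos_to_index(s, pos) + 1;
}

/* Hypothetical helper: length of s[i:j]; the two positions may be
   given in either order. */
int sub_len(const char *s, int i, int j) {
    int a = pos_to_index(s, i);
    int b = pos_to_index(s, j);
    return a < b ? b - a : a - b;
}
```

For the string STRING, positions 1 and −6 both name the point before S, and positions 7 and 0 both name the point after G, so sub_len gives 6 for either pairing of those end points.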

Table

T is opaque Table_T

It is a c.r.e. to pass a null T or a null key to any Table function.

void Table_free(T *table)
deallocates and clears *table. It is a c.r.e. for table or *table to be null.

void *Table_get(T table, const void *key)
returns the value associated with key in table, or null if table does not hold key.

int Table_length(T table)
returns the number of key-value pairs in table.

void Table_map(T table, void apply(const void *key, void **value, void *cl), void *cl)
calls apply(key, &value, cl) for each key-value pair in table in an unspecified order. It is a c.r.e. for apply to change table.

T Table_new(int hint, int cmp(const void *x, const void *y), unsigned hash(const void *key))    Mem_Failed
creates, initializes, and returns a new, empty table that can hold an arbitrary number of key-value pairs. hint is an estimate of the number of such pairs expected. It is a c.r.e. for hint<0. cmp and hash are functions for comparing and hashing keys. For keys x and y, cmp(x,y) must return an int <0, =0, or >0 if x<y, x=y, or x>y. If cmp(x,y) returns zero, then hash(x) must equal hash(y). If cmp=null or hash=null, Table_new uses a function suitable for Atom_T keys.

void *Table_put(T table, const void *key, void *value)    Mem_Failed
changes the value associated with key in table to value and returns the previous value, or adds key and value if table does not hold key, and returns null.

void *Table_remove(T table, const void *key)
removes the key-value pair from table and returns the removed value. If table does not hold key, Table_remove has no effect and returns null.

void **Table_toArray(T table, void *end)  Mem_Failed
creates a 2N+1-element array that holds the N key-value pairs in table in an unspecified order and returns a pointer to the first element. The keys appear in the even-numbered array elements and the corresponding values appear in the following odd-numbered elements; element 2N is end.

Text

T is Text_T
typedef struct T { int len; const char *str; } T;
typedef struct Text_save_T *Text_save_T;

A T is a descriptor; clients can read the fields of a descriptor, but it is a u.r.e. to write them. Text functions accept and return descriptors by value; it is a c.r.e. to pass a descriptor with str=null or len<0 to any Text function.

Text manages the memory for its immutable strings; it is a u.r.e. to write this string space or deallocate it by external means. Strings in string space can contain null characters, so are not terminated by them.

Some Text functions accept positions, which identify points between characters; see Str. In the descriptions below, s[i:j] denotes the substring in s between positions i and j.

const T Text_cset   = { 256, "\000\001…\376\377" }
const T Text_ascii  = { 128, "\000\001…\176\177" }
const T Text_ucase  = { 26, "ABCDEFGHIJKLMNOPQRSTUVWXYZ" }
const T Text_lcase  = { 26, "abcdefghijklmnopqrstuvwxyz" }
const T Text_digits = { 10, "0123456789" }
const T Text_null   = { 0, "" }
are static descriptors initialized as shown.

int Text_any(T s, int i, T set)
returns the positive position in s after s[i:i+1] if that character appears in set, or zero otherwise.

T Text_box(const char *str, int len)
builds and returns a descriptor for the client-allocated string str of length len. It is a c.r.e. for str=null or len<0.

T Text_cat(T s1, T s2)                    Mem_Failed
returns s1 concatenated with s2.

int Text_chr(T s, int i, int j, int c)
See Str_chr.

int Text_cmp(T s1, T s2)
returns an int <0, =0, or >0 if s1<s2, s1=s2, or s1>s2.

T Text_dup(T s, int n)                    Mem_Failed
returns n copies of s. It is a c.r.e. for n<0.

int Text_find(T s, int i, int j, T str)
See Str_find.

void Text_fmt(int code, va_list *app, int put(int c, void *cl), void *cl, unsigned char flags[], int width, int precision)
is a Fmt conversion function. It consumes a pointer to a descriptor and formats the string in the style of printf’s %s. It is a c.r.e. for the descriptor pointer, app, or flags to be null.

char *Text_get(char *str, int size, T s)
copies s.str[0..s.len-1] to str[0..size-1], appends a null, and returns str. If str=null, Text_get allocates the space. It is a c.r.e. for str≠null and size<s.len+1.

int Text_many(T s, int i, int j, T set)
See Str_many.

T Text_map(T s, const T *from, const T *to)    Mem_Failed
returns the string obtained from mapping the characters in s according to from and to; see Str_map. If from=null and to=null, their previous values are used. It is a c.r.e. for only one of from or to to be null, or for from->len≠to->len.

int Text_match(T s, int i, int j, T str)
See Str_match.

int Text_pos(T s, int i)
See Str_pos.

T Text_put(const char *str)               Mem_Failed
copies the null-terminated str into string space and returns its descriptor. It is a c.r.e. for str=null.

int Text_rchr(T s, int i, int j, int c)
See Str_rchr.

void Text_restore(Text_save_T *save)
pops the string space to the point denoted by save. It is a c.r.e. for save=null. It is a u.r.e. to use other Text_save_T values that denote locations higher than save after calling Text_restore.

T Text_reverse(T s)                       Mem_Failed
returns a copy of s with the characters in the opposite order.

int Text_rfind(T s, int i, int j, T str)
See Str_rfind.

int Text_rmany(T s, int i, int j, T set)
See Str_rmany.

int Text_rmatch(T s, int i, int j, T str)
See Str_rmatch.

int Text_rupto(T s, int i, int j, T set)
See Str_rupto.

Text_save_T Text_save(void)               Mem_Failed
returns an opaque pointer that encodes the current top of the string space.

T Text_sub(T s, int i, int j)
returns s[i:j].

int Text_upto(T s, int i, int j, T set)
See Str_upto.

Thread

T is opaque Thread_T

It is a c.r.e. to call any Thread function before calling Thread_init.

void Thread_alert(T t)
sets t’s alert-pending flag and makes t runnable. The next time t runs, or calls a blocking Thread, Sem, or Chan primitive, it clears its flag and raises Thread_Alerted. It is a c.r.e. for t=null or to name a nonexistent thread.

void Thread_exit(int code)
terminates the calling thread and passes code to any threads waiting for the calling thread to terminate. When the last thread calls Thread_exit, the program terminates with exit(code).

int Thread_init(int preempt, ...)
initializes the Thread for nonpreemptive (preempt=0) or preemptive (preempt=1) scheduling and returns preempt or zero if preempt=1 and preemptive scheduling is not supported. Thread_init may accept additional implementation-defined parameters; the argument list must be terminated with a null. It is a c.r.e. to call Thread_init more than once.

int Thread_join(T t)                      Thread_Alerted
suspends the calling thread until thread t terminates. When t terminates, Thread_join returns t’s exit code. If t=null, the calling thread waits for all other threads to terminate, and then returns zero. It is a c.r.e. for t to name the calling thread or for more than one thread to pass a null t.

T Thread_new(int apply(void *), void *args, int nbytes, ...)    Thread_Failed
creates, initializes, and starts a new thread, and returns its handle. If nbytes=0, the new thread executes Thread_exit(apply(args)); otherwise, it executes Thread_exit(apply(p)), where p points to a copy of the nbytes block starting at args. The new thread starts with its own empty exception stack. Thread_new may accept additional implementation-defined parameters; the argument list must be terminated with a null. It is a c.r.e. for apply=null, or for args=null and nbytes<0.

void Thread_pause(void)
relinquishes the processor to another thread, perhaps the caller.

T Thread_self(void)
returns the calling thread’s handle.

XP
typedef unsigned char *T;

T is XP_T

An extended-precision unsigned integer is represented in base 2^8 by an array of n digits, least significant digit first. Most XP functions take n as an argument along with source and destination Ts; it is a u.r.e. for n<1 or for n not to be the length of the corresponding Ts. It is a u.r.e. to pass a null T or a T that is too small to any XP function.

int XP_add(int n, T z, T x, T y, int carry)
sets z[0..n-1] to x + y + carry and returns the carry out of z[n-1]. carry must be zero or one.

int XP_cmp(int n, T x, T y)
returns an int <0, =0, or >0 if x<y, x=y, or x>y.

int XP_diff(int n, T z, T x, int y)
sets z[0..n-1] to x − y, where y is a single digit, and returns the borrow into z[n-1]. It is a u.r.e. for y>2^8.

int XP_div(int n, T q, T x, int m, T y, T r, T tmp)
sets q[0..n-1] to x[0..n-1]/y[0..m-1], r[0..m-1] to x[0..n-1] mod y[0..m-1], and returns one, if y≠0. If y=0, XP_div returns zero and leaves q and r unchanged. tmp must hold at least n+m+2 digits. It is a u.r.e. for q or r to be one of x or y, for q and r to be the same T, or for tmp to be too small.


unsigned long XP_fromint(int n, T z, unsigned long u)
sets z[0..n-1] to u mod 2^(8n) and returns u/2^(8n).

int XP_fromstr(int n, T z, const char *str, int base, char **end)
interprets str as an unsigned integer in base using z[0..n-1] as the initial value in the conversion, and returns the first nonzero carry-out of the conversion step. If end≠null, *end points to the character in str that terminated the scan or produced a nonzero carry. See AP_fromstr.

int XP_length(int n, T x)
returns the length of x; that is, the index plus one of the most significant nonzero digit in x[0..n-1].

void XP_lshift(int n, T z, int m, T x, int s, int fill)
sets z[0..n-1] to x[0..m-1] shifted left by s bits, and fills the vacated bits with fill, which must be zero or one. It is a u.r.e. for s<0.

int XP_mul(T z, int n, T x, int m, T y)
adds x[0..n-1]•y[0..m-1] to z[0..n+m-1] and returns the carry-out of z[n+m-1]. If z=0, XP_mul computes x•y. It is a u.r.e. for z to be the same T as x or y.

int XP_neg(int n, T z, T x, int carry)
sets z[0..n-1] to ~x + carry, where carry is zero or one, and returns the carry-out of z[n-1].

int XP_product(int n, T z, T x, int y)
sets z[0..n-1] to x•y, where y is a single digit, and returns the carry-out of z[n-1]. It is a u.r.e. for y≥2^8.

int XP_quotient(int n, T z, T x, int y)
sets z[0..n-1] to x/y, where y is a single digit, and returns x mod y. It is a u.r.e. for y=0 or y≥2^8.

void XP_rshift(int n, T z, int m, T x, int s, int fill)
shifts right; see XP_lshift. If n>m, the excess bits are treated as if they were equal to fill.

int XP_sub(int n, T z, T x, T y, int borrow)
sets z[0..n-1] to x − y − borrow and returns the borrow into z[n-1]. borrow must be zero or one.

int XP_sum(int n, T z, T x, int y)
sets z[0..n-1] to x + y, where y is a single digit, and returns the carry-out of z[n-1]. It is a u.r.e. for y>2^8.

unsigned long XP_toint(int n, T x)
returns x mod (ULONG_MAX+1).
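The XP representation (n base-2^8 digits, least significant digit first) and the carry conventions of XP_add and XP_fromint can be sketched as follows. This is a hedged illustration written from the summary alone, not the book's implementation: the names xp_add_sketch and xp_fromint_sketch are hypothetical, the code assumes 8-bit unsigned char, and the asserts check conditions that the real interface treats as unchecked runtime errors.

```c
#include <assert.h>

/* Hypothetical sketch in the spirit of XP_add: z[0..n-1] = x + y + carry,
   digits in base 2^8, least significant first. Returns the carry out. */
int xp_add_sketch(int n, unsigned char *z, const unsigned char *x,
                  const unsigned char *y, int carry) {
    int i;
    assert(n >= 1);
    assert(carry == 0 || carry == 1);
    for (i = 0; i < n; i++) {
        int sum = x[i] + y[i] + carry;   /* at most 255 + 255 + 1 */
        z[i] = (unsigned char)(sum & 0xFF);
        carry = sum >> 8;                /* 0 or 1 */
    }
    return carry;
}

/* Hypothetical sketch in the spirit of XP_fromint: stores u mod 2^(8n)
   in z[0..n-1] and returns u / 2^(8n), the part that did not fit. */
unsigned long xp_fromint_sketch(int n, unsigned char *z, unsigned long u) {
    int i;
    assert(n >= 1);
    for (i = 0; i < n; i++) {
        z[i] = (unsigned char)(u & 0xFF);
        u >>= 8;
    }
    return u;
}
```

Because the least significant digit comes first, the carry propagates in increasing index order, and the value returned by the addition is exactly what a caller would feed into the next, more significant block of digits.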
