Lecture Notes in Computer Science

Edited by G. Goos, J. Hartmanis and J. van Leeuwen

1750

Springer
Berlin
Heidelberg
New York
Barcelona
Hong Kong
London
Milan
Paris
Singapore
Tokyo

Donald E. Knuth

MMIXware
A RISC Computer
for the Third Millennium

Springer

Series Editors
Gerhard Goos, Karlsruhe University, Germany
Juris Hartmanis, Cornell University, NY, USA
Jan van Leeuwen, Utrecht University, The Netherlands
Author
Donald E. Knuth
Computer Science Department
Stanford University
Stanford, CA 94305-9045 USA
Cataloging-in-Publication Data
Die Deutsche Bibliothek - CIP-Einheitsaufnahme
Knuth, Donald E.:

MMIXware : a RISC computer for the third millennium / Donald E. Knuth - Berlin ;
Heidelberg ; New York ; Barcelona ; Hong Kong ; London ; Milan ; Paris ;
Singapore ; Tokyo : Springer, 1999
(Lecture notes in computer science ; 1750)
ISBN 3-540-66938-8
CR Subject Classification (1998): C.1, C.5, D.4.8, B.3.2
ISSN 0302-9743
ISBN 3-540-66938-8 Springer-Verlag Berlin Heidelberg New York

The following copyright notice is included in all files of the MMIXware package:
© 1999 Donald E. Knuth
This file may be freely copied and distributed, provided that no changes whatsoever are
made. All users are asked to help keep the MMIXware files consistent and "uncorrupted,"
identical everywhere in the world. Changes are permissible only if the modified file is
given a new name, different from the names of existing files in the MMIXware package,
and only if the modified file is clearly identified as not being part of that package. (The
CWEB system has a "change file" facility by which users can easily make minor alterations
without modifying the master source files in any way. Everybody is supposed to use
change files instead of changing the files.) The author has tried his best to produce
correct and useful programs, in order to help promote computer science research, but
no warranty of any kind should be assumed.
Usage of those files in derived works is otherwise unrestricted.
All portions of the present book that are not distributed as part of the MMIXware files are
copyright © 1999 by Springer-Verlag. All rights for those portions (including the special
indexes) are reserved.
Printed in Germany
Typesetting: Camera-ready by author
SPIN: 10750063
06/3142 - 5 4 3 2 1 0

Printed on acid-free paper

README
MMIX is a computer intended to illustrate machine-level aspects of programming.
In my books The Art of Computer Programming, it replaces MIX, the 1960s-style
machine that formerly played such a role. MMIX's so-called RISC ("Reduced Instruction Set Computer") architecture is much better able to represent the computers
being built at the turn of the millennium.
I strove to design MMIX so that its machine language would be simple, elegant, and
easy to learn. At the same time I was careful to include all of the complexities needed
to achieve high performance in practice, so that MMIX could in principle be built and
even perhaps be competitive with some of the fastest general-purpose computers in
the marketplace. I hope that MMIX will therefore prove to be a useful vehicle for
people who are studying how to improve compilers and operating systems, and that
other authors will like MMIX well enough to make use of it in their own textbooks.
My goal in this work is to provide a clean, complete, and well-documented "machine-independent machine" that people all over the world will be able to use as a testbed
for long-term research projects of lasting value, even as real computers continue to
change rapidly.
This book is a collection of programs that make MMIX a virtual reality. One of the
programs is an assembler, MMIXAL, which converts MMIX symbolic files to MMIX object
files. There also are two simulators, which execute the programs in given object files.
The first simulator, called MMIX-SIM or simply MMIX, executes a program one instruction
at a time and allows convenient debugging. The second simulator, MMMIX,
simulates a high-performance pipeline in which many aspects of the computation are
overlapped in time. MMMIX is in fact a highly configurable "meta-simulator," capable
of simulating an enormous variety of different kinds of pipelines with any number
of functional units and with many possible strategies for caching, virtual address
translation, branch prediction, super-scalar instruction issue, etc., etc.
The programs in this book are somewhat primitive, because they all are based on
a simple terminal interface: Users type commands and the computer types out a
reply. Still, these programs are adequate to provide a basis for future developments.
I'm hoping that at least one reader of this book will discover how much fun MMIX
programming can be and will be motivated to create a nice graphical interface, so
that other people will more easily be able to join in the fun. I don't have the time or
talent to construct a good GUI myself, but I've tried to write the programs in such a
way that modifications and enhancements will be easy to make.
The latest versions of all these programs can be downloaded from MMIX's home page
http://www-cs-faculty.stanford.edu/~knuth/mmix-news.html
in a file called mmix.tar.gz. The programs are copyrighted, but anyone can use them without charge. Furthermore I explicitly allow anybody to copy and modify the programs in any way they like, provided only that the computer files are given different names whenever they have been changed. Nobody but me is allowed to make a correction or addition to the copyrighted file mmixal.w, for example, unless the corrected file is identified by some other name (possibly `turbo-mmixal.w' or `mmixal++.w', etc.).

The programs are all written in CWEB, a language that combines C with TeX in such a way that standard preprocessors can easily convert mmixal.w into a compilable file mmixal.c or a documentation file mmixal.tex. CWEB also includes a "change file" mechanism by which people can easily customize a master source file like mmixal.w without changing the master file in any way. (See http://www-cs-faculty.stanford.edu/~knuth/cweb.html for complete information about CWEB, including installation instructions for the related software.) Readers of the present book who are unfamiliar with CWEB might want to refer to the notes on "How to read CWEB programs" that appear on pages 70-73 of my book The Stanford GraphBase (New York: ACM Press, 1993), but the general ideas are almost self-explanatory so I decided not to reprint those notes here.
During the next several years, as I write Volume 4 of The Art of Computer Programming, I plan to prepare updates to Volumes 1-3 whenever Volume 4 needs to refer to new material that belongs more properly in earlier volumes. These updates, called "fascicles," will be available on the Internet via http://www-cs-faculty.stanford.edu/~knuth/taocp.html and they will also be published in hardcopy form. The first such fascicle is already finished and available for downloading; it is a tutorial introduction to MMIX and the MMIX assembly language. Everybody who is seriously interested in MMIX should read that First Fascicle, preferably before reading the programs in the present book.
I've tried to make the MMIXware programs interesting to read as well as useful. Indeed, the MMIX-PIPE program, which is the chief component of the MMMIX meta-simulator, is one of the most instructive programs I've ever had the pleasure of writing. But I don't expect a great number of people to study every part of this book closely, or even to study every part of MMIX-PIPE. The main purpose of this book is to provide a complete documentation of the MMIX computer and its assembly language. Many details about MMIX were too "picky" or too system-oriented to be appropriate for the First Fascicle, but every detail about MMIX can be found in the present book.
After the MMIXware programs have been installed on a UNIX-like system, they are typically used as follows. First a user program is written in assembly language and put into a file, say foo.mms. (The suffix .mms stands for "MMIX symbolic.") Then the command
    mmixal foo.mms
will translate it into an object file, foo.mmo. Alternatively, a command such as
    mmixal -l foo.lst foo.mms
could be used; this would produce a listing file, foo.lst, in addition to foo.mmo. The listing file, when printed, would show the contents of foo.mms together with the assembled machine language instructions.

Once an object file like foo.mmo exists, it can be run on the simple simulator by issuing a command such as mmix foo (or mmix foo.mmo). Many options are also possible. For example, mmix -s foo will print running time statistics when the program ends; mmix -P foo will print a profile that shows exactly how often each instruction was executed; mmix -v foo will give "verbose" details about everything the simulator did; mmix -t2 foo will trace each instruction the first two times it is performed; etc. Also mmix -i foo will run the simulator in interactive mode, obeying various online commands by which the user can watch exactly what is happening when key parts of the program are reached. The command mmix foo bar will run the simulator as if MMIX itself were running the command `foo bar' with a rudimentary operating system; any number of command-line arguments can follow the name of the program being simulated.
The MMMIX meta-simulator can also be applied to the same program, although a bit more preparation is necessary. First the command
    mmix -Dfoo.mmb foo bar
will dump out a binary file foo.mmb containing the information needed to load `foo bar' into MMIX's memory. Then a command like
    mmmix plain.mmconfig foo.mmb
will invoke the meta-simulator with a "plain" pipeline configuration. The meta-simulator always runs interactively, using the prompt `mmmix>' when it wants instructions about what to do next. Users can type `?' in response to this prompt if they want to be reminded about what the simulator can do. Typical responses are `vff' (run verbosely), `v0' (run quietly), `p' (show the pipeline), `g255' (show global register 255), `D' (show the D-cache), `b200' (pause when location #200 is fetched), `1000' (run 1000 cycles), etc. Some familiarity with MMIX-PIPE is necessary to understand the meta-simulator's reports of its activity, but users of mmmix are assumed to be able to extract high-level information from a mass of low-level details. (This talent, after all, is the hallmark of a computer scientist.)
The programs in this book appear in alphabetical order:
MMIX explains everything about the MMIX architecture.
MMIX-ARITH contains subroutines for 64-bit fixed and floating point arithmetic, using only 32-bit fixed point arithmetic.
MMIX-CONFIG processes configuration files for MMMIX.
MMIX-IO contains subroutines for the primitive input/output operations of a rudimentary operating system.
MMIX-MEM handles memory references of MMMIX in special cases associated with memory-mapped input/output.
MMIX-PIPE does the hard work of pipeline simulation.
MMIX-SIM is the program for the non-pipelined simulator.
MMIXAL is the assembly program.
MMMIX is the driver program for the meta-simulator.
MMOTYPE is a utility program that translates an MMIX object file into human-readable form.
The first of these, MMIX, is not actually a program, although it has been formatted as a CWEB document; it is a complete definition of MMIX, including the details of features that are used only by the operating system. It should be read first, and MMIX-PIPE last, but the other programs can be read in any order. (Actually MMIXAL or MMIX-SIM should probably be read next after MMIX. The program MMIX-SIM is the line-at-a-time simulator that is known simply as mmix after it has been compiled.)
Mini-indexes have been provided on each right-hand page of this book so that the programs can be read essentially as hypertext. Every identifier that is used on a two-page spread but defined on some other page is listed in the mini-index. For example, a mini-index entry such as `oplus : octa ( ), MMIX-ARITH §5' means that the identifier oplus denotes a function defined in section §5 of the MMIX-ARITH module, returning a value of type octa. A master index to all uses of all identifiers appears at the end of this book.
I've tried to make this book error-free, but I must have blundered at least once. Therefore I shall use part of Internet page http://www-cs-faculty.stanford.edu/~knuth/mmixware.html as a bulletin board for posting all errors that are known to me. The first person who finds a hitherto unreported error will be entitled, as usual, to a reward of $2.56, gratefully paid. Happy hacking!

Donald E. Knuth
Cambridge, Massachusetts
17 October 1999

CONTENTS

README (a preface)
CONTENTS (a table of contents)
MMIX (a definition)
MMIX-ARITH (a library)
MMIX-CONFIG (a part of MMMIX)
MMIX-IO (a library)
MMIX-MEM (a triviality)
MMIX-PIPE (a part of MMMIX)
MMIX-SIM (a simulator)
MMIXAL (an assembler)
MMMIX (a meta-simulator)
MMOTYPE (a utility program)
Master Index (a table of references)

MMIX

1. Introduction to MMIX. Thirty-eight years have passed since the MIX computer was designed, and computer architecture has been converging during those years towards a rather different style of machine. Therefore it is time to replace MIX with a new computer that contains even less saturated fat than its predecessor.
Exercise 1.3.1-25 in the third edition of Fundamental Algorithms speaks of an extended MIX called MixMaster, which is upward compatible with the old version. But MixMaster itself is hopelessly obsolete; although it allows for several gigabytes of memory, we can't even use it with ASCII code to get lowercase letters. And ouch, the standard subroutine calling convention of MIX is irrevocably based on self-modifying code! Decimal arithmetic and self-modifying code were popular in 1962, but they sure have disappeared quickly as machines have gotten bigger and faster. A completely new design is called for, based on the principles of RISC architecture as expounded in Computer Architecture by Hennessy and Patterson (Morgan Kaufmann, 1996).
So here is MMIX, a computer that will totally replace MIX in the "ultimate" editions of The Art of Computer Programming, Volumes 1-3, and in the first editions of the remaining volumes. I must confess that I can hardly wait to own a computer like this.
How do you pronounce MMIX? I've been saying "em-mix" to myself, because the first `M' represents a new millennium. Therefore I use the article "an" instead of "a" before the name MMIX in English phrases like "an MMIX simulator."
Incidentally, the Dictionary of American Regional English 3 (1996) lists "mommix" as a common dialect word used both as a noun and a verb; to mommix something means to botch it, to bollix it. Only time will tell whether I have mommixed the definition of MMIX.

2. The original MIX computer could be operated without an operating system; you could bootstrap it with punched cards or paper tape and do everything yourself. But nowadays such power is no longer in the hands of ordinary users. The MMIX hardware, like all other computing machines made today, relies on an operating system to get jobs started in their own address spaces and to provide I/O capabilities.
Whenever anybody has asked if I will be writing about operating systems, my reply has always been "Nix." Therefore the name of MMIX's operating system, NNIX, will come as no surprise. From time to time I will necessarily have to refer to things that NNIX does for its users, but I am unable to build NNIX myself. Life is too short. It would be wonderful if some expert in operating system design became inspired to write a book that explains exactly how to construct a nice, clean NNIX kernel for an MMIX chip.

3. I am deeply grateful to the many people who have helped me shape the behavior of MMIX. In particular, John Hennessy and (especially) Dick Sites have made significant contributions.

D.E. Knuth: MMIXware, LNCS 1750, pp. 2-61, 1999. © Springer-Verlag Berlin Heidelberg 1999

4. A programmer's introduction to MMIX appears in "Fascicle 1," a booklet containing tutorial material that will ultimately appear in the fourth edition of The Art of Computer Programming. The description in the following sections is rather different, because we are concerned about a complete implementation, including all of the features used by the operating system and invisible to normal programs. Here it is important to emphasize exceptional cases that were glossed over in the tutorial, and to consider nitpicky details about things that might go wrong.

5. MMIX basics. MMIX is a 64-bit RISC machine with at least 256 general-purpose registers and a 64-bit address space. Every instruction is four bytes long and has the form

    OP X Y Z

The 256 possible OP codes fall into a dozen or so easily remembered categories; an instruction usually means, "Set register X to the result of Y OP Z." For example, 32 1 2 3 sets register 1 to the sum of registers 2 and 3. A few instructions combine the Y and Z bytes into a 16-bit YZ field; two of the jump instructions use a 24-bit XYZ field. But the three bytes X, Y, Z usually have three-pronged significance independent of each other.
Instructions are usually represented in a symbolic form corresponding to the MMIX assembly language, in which each operation code has a mnemonic name. For example, operation 32 is ADD, and the instruction above might be written `ADD $1,$2,$3'; a dollar sign `$' symbolizes a register number. In general, the instruction ADD $X,$Y,$Z is the operation of setting $X = $Y + $Z. An assembly language instruction with two commas has three operand fields X, Y, Z; an instruction with one comma has two operand fields X, YZ; an instruction with no comma has one operand field, XYZ; an instruction with no operands has X = Y = Z = 0.
Most instructions have two forms, one in which the Z field stands for register $Z, and one in which Z is an unsigned "immediate" constant. Thus, for example, the command `ADD $X,$Y,$Z' has a counterpart `ADD $X,$Y,Z', which sets $X = $Y + Z. Immediate constants are always nonnegative. In the descriptions below we will introduce such pairs of instructions by writing just `ADD $X,$Y,$Z|Z' instead of naming both cases explicitly.
The operation code for ADD $X,$Y,$Z is 32, but the operation code for ADD $X,$Y,Z is 33. The MMIX assembler chooses the correct code by noting whether the third argument is a register number or not.
Register numbers and constants can be given symbolic names; for example, the assembly language instruction `x IS $1' makes x an abbreviation for register number 1. Similarly, `FIVE IS 5' makes FIVE an abbreviation for the constant 5. After these abbreviations have been specified, the instruction ADD x,x,FIVE increases $1 by 5, using opcode 33, while the instruction ADD x,x,x doubles $1 using opcode 32.
Symbolic names that stand for register numbers conventionally begin with a lowercase letter, while names that stand for constants conventionally begin with an uppercase letter. This convention is not actually enforced by the assembler, but it tends to reduce a programmer's confusion.

6. A nybble is a 4-bit quantity, often used to denote a decimal or hexadecimal digit. A byte is an 8-bit quantity, often used to denote an alphanumeric character in ASCII code. The Unicode standard extends ASCII to essentially all the world's languages by using 16-bit-wide characters called wydes. (Weight watchers know that two nybbles make one byte, but two bytes make one wyde.) In the discussion below we use the term tetrabyte or "tetra" for a 4-byte quantity, and the similar term octabyte or "octa" for an 8-byte quantity. Thus, a tetra is two wydes, an octa is two tetras; an octabyte has 64 bits. Each MMIX register can be thought of as containing one octabyte, or two tetras, or four wydes, or eight bytes, or sixteen nybbles.
When bytes, wydes, tetras, and octas represent numbers they are said to be either signed or unsigned. An unsigned byte is a number between 0 and 2^8 − 1 = 255 inclusive; an unsigned wyde lies, similarly, between 0 and 2^16 − 1 = 65535; an unsigned tetra lies between 0 and 2^32 − 1 = 4,294,967,295; an unsigned octa lies between 0 and 2^64 − 1 = 18,446,744,073,709,551,615. Their signed counterparts use the conventions of two's complement notation, by subtracting respectively 2^8, 2^16, 2^32, or 2^64 times the most significant bit. Thus, the unsigned bytes 128 through 255 are regarded as the numbers −128 through −1 when they are evaluated as signed bytes; a signed byte therefore lies between −128 and +127, inclusive. A signed wyde is a number between −32768 and +32767; a signed tetra lies between −2,147,483,648 and +2,147,483,647; a signed octa lies between −9,223,372,036,854,775,808 and +9,223,372,036,854,775,807.
The virtual memory of MMIX is an array M of 2^64 bytes. If k is any unsigned octabyte, M[k] is a 1-byte quantity. MMIX machines do not actually have such vast memories, but programmers can act as if 2^64 bytes are indeed present, because MMIX provides address translation mechanisms by which an operating system can maintain this illusion.
We use the notation M_{2^t}[k] to stand for a number consisting of 2^t consecutive bytes starting at location k ∧ (2^64 − 2^t). (The notation k ∧ (2^64 − 2^t) means that the least significant t bits of k are set to 0, and only the least 64 bits of the resulting address are retained. Similarly, the notation k ∨ (2^t − 1) means that the least significant t bits of k are set to 1.) All accesses to 2^t-byte quantities by MMIX are aligned, in the sense that the first byte is a multiple of 2^t.
Addressing is always "big-endian." In other words, the most significant (leftmost) byte of M_{2^t}[k] is M_1[k ∧ (2^64 − 2^t)] and the least significant (rightmost) byte is M_1[k ∨ (2^t − 1)]. We use the notation s(M_{2^t}[k]) when we want to regard this 2^t-byte number as a signed integer. Formally speaking, if l = 2^t,

\[ s(M_l[k]) = \bigl(M_1[k \wedge (-l)]\; M_1[k \wedge (-l)+1]\; \ldots\; M_1[k \vee (l-1)]\bigr)_{256} \;-\; 2^{8l}\,\bigl[M_1[k \wedge (-l)] \ge 128\bigr]. \]
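
The notation of section 6 can be tried out with a minimal Python sketch. It is not part of MMIXware: the function names are hypothetical, memory is modeled as a sparse dictionary, and bytes that were never written are assumed to read as zero.

```python
MASK64 = (1 << 64) - 1  # keep only the least 64 bits of an address

def aligned_address(k, t):
    """k AND (2^64 - 2^t): the t least significant bits of k are set to 0."""
    return k & ((1 << 64) - (1 << t)) & MASK64

def load_unsigned(memory, k, t):
    """M_{2^t}[k]: 2^t consecutive bytes read big-endian as an unsigned number."""
    start = aligned_address(k, t)
    value = 0
    for offset in range(1 << t):
        # Assumption of this sketch: unwritten bytes read as zero.
        value = (value << 8) | memory.get(start + offset, 0)
    return value

def load_signed(memory, k, t):
    """s(M_{2^t}[k]): subtract 2^(8l) when the leading byte is >= 128."""
    size = 1 << t
    value = load_unsigned(memory, k, t)
    if memory.get(aligned_address(k, t), 0) >= 128:
        value -= 1 << (8 * size)
    return value

# Four bytes at locations 0x1000..0x1003 (hypothetical test data).
memory = {0x1000: 0xFF, 0x1001: 0xFE, 0x1002: 0x00, 0x1003: 0x01}
```

Because the low-order bits of k are ignored, `load_unsigned(memory, 0x1003, 2)` and `load_unsigned(memory, 0x1000, 2)` read the same tetrabyte.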

7. Loading and storing. Several instructions can be used to get information from memory into registers. For example, the "load tetra unsigned" instruction LDTU $1,$4,$5 puts the four bytes M4[$4 + $5] into register 1 as an unsigned integer; the most significant four bytes of register 1 are set to zero. The similar instruction LDT $1,$4,$5, "load tetra," sets $1 to the signed integer s(M4[$4 + $5]). (Instructions generally treat numbers as signed unless the operation code specifically calls them unsigned.) In the signed case, the most significant four bytes of the register will be copies of the most significant bit of the tetrabyte loaded; thus they will be all 0s or all 1s, depending on whether the number is ≥ 0 or < 0.
• LDB $X,$Y,$Z|Z `load byte'.
Byte s(M[$Y + $Z]) or s(M[$Y + Z]) is loaded into register X as a signed number between −128 and +127, inclusive.
• LDBU $X,$Y,$Z|Z `load byte unsigned'.
Byte M[$Y + $Z] or M[$Y + Z] is loaded into register X as an unsigned number between 0 and 255, inclusive.
• LDW $X,$Y,$Z|Z `load wyde'.
Bytes s(M2[$Y + $Z]) or s(M2[$Y + Z]) are loaded into register X as a signed number between −32768 and +32767, inclusive. As mentioned above, our notation M2[k] implies that the least significant bit of the address $Y + $Z or $Y + Z is ignored and assumed to be 0.
• LDWU $X,$Y,$Z|Z `load wyde unsigned'.
Bytes M2[$Y + $Z] or M2[$Y + Z] are loaded into register X as an unsigned number between 0 and 65535, inclusive.
• LDT $X,$Y,$Z|Z `load tetra'.
Bytes s(M4[$Y + $Z]) or s(M4[$Y + Z]) are loaded into register X as a signed number between −2,147,483,648 and +2,147,483,647, inclusive. As mentioned above, our notation M4[k] implies that the two least significant bits of the address $Y + $Z or $Y + Z are ignored and assumed to be 0.
• LDTU $X,$Y,$Z|Z `load tetra unsigned'.
Bytes M4[$Y + $Z] or M4[$Y + Z] are loaded into register X as an unsigned number between 0 and 4,294,967,295, inclusive.
• LDO $X,$Y,$Z|Z `load octa'.
Bytes M8[$Y + $Z] or M8[$Y + Z] are loaded into register X. As mentioned above, our notation M8[k] implies that the three least significant bits of the address $Y + $Z or $Y + Z are ignored and assumed to be 0.
• LDOU $X,$Y,$Z|Z `load octa unsigned'.
Bytes M8[$Y + $Z] or M8[$Y + Z] are loaded into register X. There is in fact no difference between the behavior of LDOU and LDO, since an octabyte can be regarded as either signed or unsigned. LDOU is included in MMIX just for completeness and consistency, in spite of the fact that a foolish consistency is the hobgoblin of little minds. (Niklaus Wirth made a strong plea for such consistency in his early critique of System/360; see JACM 15 (1967), 37-74.)
• LDHT $X,$Y,$Z|Z `load high tetra'.
Bytes M4[$Y + $Z] or M4[$Y + Z] are loaded into the most significant half of register X, and the least significant half is cleared to zero. (One use of "high tetra arithmetic" is to detect overflow easily when tetrabytes are added or subtracted.)
• LDA $X,$Y,$Z|Z `load address'.
The address $Y + $Z or $Y + Z is loaded into register X. This instruction is simply another name for the ADDU instruction discussed below; it can be used when the programmer is thinking of memory addresses instead of numbers. The MMIX assembler converts LDA into the same OP-code as ADDU.

8. Another family of instructions goes the other way, storing registers into memory. For example, the "store octa immediate" command STO $3,$2,17 puts the current contents of register 3 into M8[$2 + 17].
• STB $X,$Y,$Z|Z `store byte'.
The least significant byte of register X is stored into byte M[$Y + $Z] or M[$Y + Z]. An integer overflow exception occurs if $X is not between −128 and +127. (We will discuss overflow and other kinds of exceptions later.)
• STBU $X,$Y,$Z|Z `store byte unsigned'.
The least significant byte of register X is stored into byte M[$Y + $Z] or M[$Y + Z]. STBU instructions are the same as STB instructions, except that no test for overflow is made.
• STW $X,$Y,$Z|Z `store wyde'.
The two least significant bytes of register X are stored into bytes M2[$Y + $Z] or M2[$Y + Z]. An integer overflow exception occurs if $X is not between −32768 and +32767.
• STWU $X,$Y,$Z|Z `store wyde unsigned'.
The two least significant bytes of register X are stored into bytes M2[$Y + $Z] or M2[$Y + Z]. STWU instructions are the same as STW instructions, except that no test for overflow is made.
• STT $X,$Y,$Z|Z `store tetra'.
The four least significant bytes of register X are stored into bytes M4[$Y + $Z] or M4[$Y + Z]. An integer overflow exception occurs if $X is not between −2,147,483,648 and +2,147,483,647.
• STTU $X,$Y,$Z|Z `store tetra unsigned'.
The four least significant bytes of register X are stored into bytes M4[$Y + $Z] or M4[$Y + Z]. STTU instructions are the same as STT instructions, except that no test for overflow is made.
• STO $X,$Y,$Z|Z `store octa'.
Register X is stored into bytes M8[$Y + $Z] or M8[$Y + Z].
• STOU $X,$Y,$Z|Z `store octa unsigned'.
Identical to STO $X,$Y,$Z|Z.
• STCO X,$Y,$Z|Z `store constant octabyte'.
An octabyte whose value is the unsigned byte X is stored into M8[$Y + $Z] or M8[$Y + Z].
• STHT $X,$Y,$Z|Z `store high tetra'.
The most significant four bytes of register X are stored into M4[$Y + $Z] or M4[$Y + Z].
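
The effect of the load and store families on a 64-bit register can be illustrated with a short Python sketch. It is only a model under stated assumptions, not the MMIX-SIM code: registers are unsigned 64-bit integers, memory is a dictionary of bytes, all function names are hypothetical, and a checked store reports overflow as a returned flag because the passage above does not say what happens after the exception is raised.

```python
MASK64 = (1 << 64) - 1

def to_signed64(x):
    """Interpret a 64-bit register value as a two's complement integer."""
    return x - (1 << 64) if x >> 63 else x

def load(memory, address, size, signed=False):
    """LDB/LDW/LDT/LDO (signed=True) or LDBU/LDWU/LDTU/LDOU (signed=False)."""
    start = address & ~(size - 1) & MASK64  # low-order address bits are ignored
    raw = int.from_bytes(bytes(memory.get(start + i, 0) for i in range(size)), "big")
    if signed and raw >> (8 * size - 1):
        raw -= 1 << (8 * size)              # sign bit set: value is negative
    return raw & MASK64                     # upper register bytes become all 1s or all 0s

def load_high_tetra(memory, address):
    """LDHT: tetra goes to the most significant half; the low half is zero."""
    return load(memory, address, 4) << 32

def store(memory, address, size, register, check_overflow=False):
    """STB/STW/STT (check_overflow=True) or the unsigned variants.

    Returns True when the signed register value does not fit in `size` bytes.
    Storing the low-order bytes anyway is an assumption of this sketch.
    """
    overflow = False
    if check_overflow:
        limit = 1 << (8 * size - 1)
        overflow = not -limit <= to_signed64(register) < limit
    start = address & ~(size - 1) & MASK64
    low = register & ((1 << (8 * size)) - 1)
    for i, byte in enumerate(low.to_bytes(size, "big")):
        memory[start + i] = byte
    return overflow

memory = {}
minus_two = -2 & MASK64                     # register holding the signed value -2
tetra_overflow = store(memory, 0x2003, 4, minus_two, check_overflow=True)
byte_overflow = store(memory, 0x3000, 1, 200, check_overflow=True)
```

Here the tetra store succeeds without overflow, while storing 200 as a signed byte is flagged because 200 exceeds +127.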

9. Adding and subtracting. Once numbers are in registers, we can compute with them. Let's consider addition and subtraction first.
• ADD $X,$Y,$Z|Z `add'.
The sum $Y + $Z or $Y + Z is placed into register X using signed, two's complement arithmetic. An integer overflow exception occurs if the sum is ≥ 2^63 or < −2^63. (We will discuss overflow and other kinds of exceptions later.)
• ADDU $X,$Y,$Z|Z `add unsigned'.
The sum ($Y + $Z) mod 2^64 or ($Y + Z) mod 2^64 is placed into register X. These instructions are the same as ADD $X,$Y,$Z|Z commands except that no test for overflow is made. (Overflow could be detected if desired by using the command CMPU ovflo,$X,$Y after addition, where CMPU means "compare unsigned"; see below.)
• 2ADDU $X,$Y,$Z|Z `times 2 and add unsigned'.
The sum (2$Y + $Z) mod 2^64 or (2$Y + Z) mod 2^64 is placed into register X.
• 4ADDU $X,$Y,$Z|Z `times 4 and add unsigned'.
The sum (4$Y + $Z) mod 2^64 or (4$Y + Z) mod 2^64 is placed into register X.
• 8ADDU $X,$Y,$Z|Z `times 8 and add unsigned'.
The sum (8$Y + $Z) mod 2^64 or (8$Y + Z) mod 2^64 is placed into register X.
• 16ADDU $X,$Y,$Z|Z `times 16 and add unsigned'.
The sum (16$Y + $Z) mod 2^64 or (16$Y + Z) mod 2^64 is placed into register X.
• SUB $X,$Y,$Z|Z `subtract'.
The difference $Y − $Z or $Y − Z is placed into register X using signed, two's complement arithmetic. An integer overflow exception occurs if the difference is ≥ 2^63 or < −2^63.
• SUBU $X,$Y,$Z|Z `subtract unsigned'.
The difference ($Y − $Z) mod 2^64 or ($Y − Z) mod 2^64 is placed into register X. These two instructions are the same as SUB $X,$Y,$Z|Z except that no test for overflow is made.
• NEG $X,Y,$Z|Z `negate'.
The value Y − $Z or Y − Z is placed into register X using signed, two's complement arithmetic. An integer overflow exception occurs if the result is greater than 2^63 − 1. (Notice that in this case MMIX works with the "immediate" constant Y, not register Y. NEG commands are analogous to the immediate variants of other commands, because they save us from having to put one-byte constants into a register. When Y = 0, overflow occurs if and only if $Z = −2^63. The instruction NEG $X,1,2 has exactly the same effect as NEG $X,0,1.)
• NEGU $X,Y,$Z|Z `negate unsigned'.
The value (Y − $Z) mod 2^64 or (Y − Z) mod 2^64 is placed into register X. NEGU instructions are the same as NEG instructions, except that no test for overflow is made.
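
The arithmetic rules of section 9 can be checked with a minimal Python sketch. The function names are hypothetical, registers are modeled as unsigned 64-bit integers, and the signed operations return an (result, overflow) pair; returning the wrapped result when overflow occurs is an assumption of the sketch, since the text above only says that an exception occurs.

```python
MASK64 = (1 << 64) - 1
MIN64, MAX64 = -(1 << 63), (1 << 63) - 1

def to_signed64(x):
    """Interpret a 64-bit register value as a two's complement integer."""
    return x - (1 << 64) if x >> 63 else x

def add(y, z):
    """ADD: signed sum; overflow if the sum is >= 2^63 or < -2^63."""
    total = to_signed64(y) + to_signed64(z)
    return total & MASK64, not MIN64 <= total <= MAX64

def sub(y, z):
    """SUB: signed difference with the same overflow test as ADD."""
    total = to_signed64(y) - to_signed64(z)
    return total & MASK64, not MIN64 <= total <= MAX64

def addu(y, z, scale=1):
    """ADDU (scale=1) and 2ADDU, 4ADDU, 8ADDU, 16ADDU: (scale*y + z) mod 2^64."""
    return (scale * y + z) & MASK64

def subu(y, z):
    """SUBU: (y - z) mod 2^64, with no overflow test."""
    return (y - z) & MASK64

def neg(y_constant, z):
    """NEG: immediate constant Y minus z; overflow if the result exceeds 2^63 - 1."""
    total = y_constant - to_signed64(z)
    return total & MASK64, total > MAX64

def unsigned_sum_wrapped(x, y):
    """The CMPU test after x = addu(y, z): the sum wrapped exactly when x < y."""
    return x < y
```

For example, `addu(MASK64, 1)` wraps around to 0 and `unsigned_sum_wrapped(0, MASK64)` reports the wraparound, which is the check the CMPU remark describes.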

MMIX: BIT FIDDLING 10 10. Bit .

Before looking at multiplication and division, which take longer than addition and
subtraction, let's look at some of the other things that MMIX can do fast. There are
eighteen instructions for bitwise logical operations on unsigned numbers.

AND $X,$Y,$Z|Z `bitwise and'.
Each bit of register Y is logically anded with the corresponding bit of register Z or
of the constant Z, and the result is placed in register X. In other words, a bit of
register X is set to 1 if and only if the corresponding bits of the operands are both 1;
in symbols, $X = $Y ∧ $Z or $X = $Y ∧ Z. This means in particular that AND $X,$Y,Z
always zeroes out the seven most significant bytes of register X, because 0s are
prefixed to the constant byte Z.

OR $X,$Y,$Z|Z `bitwise or'.
Each bit of register Y is logically ored with the corresponding bit of register Z or
of the constant Z, and the result is placed in register X. In other words, a bit of
register X is set to 0 if and only if the corresponding bits of the operands are both 0;
in symbols, $X = $Y ∨ $Z or $X = $Y ∨ Z.
In the special case Z = 0, the immediate variant of this command simply copies
register Y to register X. The MMIX assembler allows us to write `SET $X,$Y' as a
convenient abbreviation for `OR $X,$Y,0'.

XOR $X,$Y,$Z|Z `bitwise exclusive or'.
Each bit of register Y is logically xored with the corresponding bit of register Z or
of the constant Z, and the result is placed in register X. In other words, a bit of
register X is set to 0 if and only if the corresponding bits of the operands are equal;
in symbols, $X = $Y ⊕ $Z or $X = $Y ⊕ Z.

ANDN $X,$Y,$Z|Z `bitwise and-not'.
Each bit of register Y is logically anded with the complement of the corresponding
bit of register Z or of the constant Z, and the result is placed in register X. In other
words, a bit of register X is set to 1 if and only if the corresponding bit of register Y
is 1 and the other corresponding bit is 0; in symbols, $X = $Y \ $Z or $X = $Y \ Z.
(This is the logical difference operation; if the operands are bit strings representing
sets, we are computing the elements that lie in one set but not the other.)

ORN $X,$Y,$Z|Z `bitwise or-not'.
Each bit of register Y is logically ored with the complement of the corresponding
bit of register Z or of the constant Z, and the result is placed in register X. In other
words, a bit of register X is set to 1 if and only if the corresponding bit of register Y
is greater than or equal to the other corresponding bit; in symbols, $X = $Y ∨ ¬$Z or
$X = $Y ∨ ¬Z. (This is the complement of $Z \ $Y or Z \ $Y.)

NAND $X,$Y,$Z|Z `bitwise not and'.
Each bit of register Y is logically anded with the corresponding bit of register Z or
of the constant Z, and the complement of the result is placed in register X. In other
words, a bit of register X is set to 0 if and only if the corresponding bits of the
operands are both 1; in symbols, $X = ¬($Y ∧ $Z) or $X = ¬($Y ∧ Z).

NOR $X,$Y,$Z|Z `bitwise not or'.
Each bit of register Y is logically ored with the corresponding bit of register Z or
of the constant Z, and the complement of the result is placed in register X. In other
words, a bit of register X is set to 1 if and only if the corresponding bits of the
operands are both 0; in symbols, $X = ¬($Y ∨ $Z) or $X = ¬($Y ∨ Z).

NXOR $X,$Y,$Z|Z `bitwise not exclusive or'.
Each bit of register Y is logically xored with the corresponding bit of register Z or
of the constant Z, and the complement of the result is placed in register X. In other
words, a bit of register X is set to 1 if and only if the corresponding bits of the
operands are equal; in symbols, $X = ¬($Y ⊕ $Z) or $X = ¬($Y ⊕ Z).

MUX $X,$Y,$Z|Z `bitwise multiplex'.
For each bit position j, the jth bit of register X is set either to bit j of register Y
or to bit j of the other operand $Z or Z, depending on whether bit j of the special
mask register rM is 1 or 0: if M_j then Y_j else Z_j. In symbols,
$X = ($Y ∧ rM) ∨ ($Z ∧ ¬rM) or $X = ($Y ∧ rM) ∨ (Z ∧ ¬rM). (MMIX has several such
special registers, associated with instructions that need more than two inputs or
produce more than one output.)
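The less familiar members of this family (ANDN, ORN, NXOR, MUX) are easy to check on small values. Here is a minimal Python sketch, assuming a register is modeled as a nonnegative integer below 2^64; the function names and the constant MASK64 are illustrative only and are not part of MMIX or its assembler.

```python
MASK64 = (1 << 64) - 1  # assumption: a register is an integer in [0, 2^64)

def andn(y, z):
    """ANDN: 1 exactly where y has a 1 and z has a 0 (set difference)."""
    return y & ~z & MASK64

def orn(y, z):
    """ORN: 1 exactly where the bit of y is >= the bit of z."""
    return (y | ~z) & MASK64

def nor(y, z):
    """NOR: complement of the bitwise or."""
    return ~(y | z) & MASK64

def nxor(y, z):
    """NXOR: 1 exactly where the bits of y and z are equal."""
    return ~(y ^ z) & MASK64

def mux(y, z, rM):
    """MUX: bit j comes from y where rM has a 1, from z where rM has a 0."""
    return (y & rM) | (z & ~rM & MASK64)
```

For instance, `mux(0x1111111111111111, 0x2222222222222222, 0xFFFFFFFF00000000)` selects the upper tetra from the first operand and the lower tetra from the second, and `orn(y, z)` always equals the complement of `andn(z, y)`, as the parenthetical remark about ORN says.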

11. Besides the eighteen bitwise operations, MMIX can also perform unsigned bytewise
and biggerwise operations that are somewhat more exotic.

BDIF $X,$Y,$Z|Z `byte difference'.
For each byte position j, the jth byte of register X is set to byte j of register Y
minus byte j of the other operand $Z or Z, unless that difference is negative; in the
latter case, byte j of $X is set to zero.

WDIF $X,$Y,$Z|Z `wyde difference'.
For each wyde position j, the jth wyde of register X is set to wyde j of register Y
minus wyde j of the other operand $Z or Z, unless that difference is negative; in the
latter case, wyde j of $X is set to zero.

TDIF $X,$Y,$Z|Z `tetra difference'.
For each tetra position j, the jth tetra of register X is set to tetra j of register Y
minus tetra j of the other operand $Z or Z, unless that difference is negative; in the
latter case, tetra j of $X is set to zero.

ODIF $X,$Y,$Z|Z `octa difference'.
Register X is set to register Y minus the other operand $Z or Z, unless $Z or Z
exceeds register Y; in the latter case, $X is set to zero. The operands are treated as
unsigned integers.

The BDIF and WDIF commands are useful in applications to graphics or video; TDIF
and ODIF are also present for reasons of consistency. For example, if a and b are
registers containing 8-byte quantities, their bytewise maxima c and bytewise minima d
are computed by
   BDIF x,a,b; ADDU c,x,b; SUBU d,a,x;
similarly, the individual "pixel differences" e, namely the absolute values of the
differences of corresponding bytes, are computed by
   BDIF x,a,b; BDIF y,b,a; OR e,x,y.
To add individual bytes of a and b while clipping all sums to 255 if they don't fit in a
single byte, one can say
   NOR acomp,a,0; BDIF x,acomp,b; NOR clippedsums,x,0;
in other words, complement a, apply BDIF, and complement the result. The operations
can also be used to construct efficient operations on strings of bytes or wydes.

Exercise: Implement a "nybble difference" instruction that operates in a similar way
on sixteen nybbles at a time.
Answer: AND x,a,m; AND y,b,m; ANDN xx,a,m; ANDN yy,b,m; BDIF x,x,y; BDIF xx,xx,yy;
OR ans,x,xx where register m contains the mask #0f0f0f0f0f0f0f0f.
(The ANDN operation can be regarded as a "bit difference" instruction that operates
in a similar way on 64 bits at a time.)
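The three BDIF idioms above (bytewise maximum and minimum, pixel differences, and clipped sums) can be checked with a small Python model. This is only a sketch under the assumption that a register is an integer below 2^64; the helper names are hypothetical and simply mirror the instruction sequences quoted in the text.

```python
MASK64 = (1 << 64) - 1  # assumption: a register is an integer in [0, 2^64)

def bdif(y, z):
    """BDIF: per byte, y - z if that is nonnegative, otherwise 0."""
    result = 0
    for shift in range(0, 64, 8):
        yb = (y >> shift) & 0xFF
        zb = (z >> shift) & 0xFF
        result |= max(yb - zb, 0) << shift
    return result

def bytewise_max(a, b):
    """BDIF x,a,b; ADDU c,x,b"""
    return (bdif(a, b) + b) & MASK64

def bytewise_min(a, b):
    """BDIF x,a,b; SUBU d,a,x"""
    return (a - bdif(a, b)) & MASK64

def pixel_diff(a, b):
    """BDIF x,a,b; BDIF y,b,a; OR e,x,y"""
    return bdif(a, b) | bdif(b, a)

def clipped_sum(a, b):
    """NOR acomp,a,0; BDIF x,acomp,b; NOR clippedsums,x,0"""
    acomp = ~a & MASK64
    return ~bdif(acomp, b) & MASK64
```

The clipped sum works because, byte by byte, 255 − max(255 − a − b, 0) equals min(a + b, 255). No carries cross byte boundaries in the ADDU and SUBU steps, since each byte of x is at most the corresponding byte of a.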

12. Three more pairs of bit-fiddling instructions round out the collection of exotics.

SADD $X,$Y,$Z|Z `sideways add'.
Each bit of register Y is logically anded with the complement of the corresponding
bit of register Z or of the constant Z, and the number of 1 bits in the result is placed
in register X. In other words, register X is set to the number of bit positions in which
register Y has a 1 and the other operand has a 0; in symbols, $X = ν($Y \ $Z) or
$X = ν($Y \ Z). When the second operand is zero this operation is sometimes called
"population counting," because it counts the number of 1s in register Y.

MOR $X,$Y,$Z|Z `multiple or'.
Suppose the 64 bits of register Y are indexed as
   y00 y01 ... y07 y10 y11 ... y17 ... y70 y71 ... y77;
in other words, yij is the jth bit of the ith byte, if we number the bits and bytes from
0 to 7 in big-endian fashion from left to right. Let the bits of the other operand, $Z
or Z, be indexed similarly:
   z00 z01 ... z07 z10 z11 ... z17 ... z70 z71 ... z77.
The MOR operation replaces each bit xij of register X by the bit
   y0j zi0 ∨ y1j zi1 ∨ ... ∨ y7j zi7.
Thus, for example, if register Z contains the constant #0102040810204080, MOR reverses
the order of the bytes in register Y, converting between little-endian and big-endian
addressing. (The ith byte of $X depends on the bytes of $Y as specified by the ith
byte of $Z or Z. If we regard 64-bit words as 8 × 8 Boolean matrices, with one byte
per column, this operation computes the Boolean product $X = $Y $Z or $X = $Y Z.
Alternatively, if we regard 64-bit words as 8 × 8 matrices with one byte per row, MOR
computes the Boolean product $X = $Z $Y or $X = Z $Y with operands in the opposite
order. The immediate form MOR $X,$Y,Z always sets the leading seven bytes of
register X to zero; the other byte is set to the bitwise or of whatever bytes of
register Y are specified by the immediate operand Z.)

Exercise: Explain how to compute a mask m that is #ff in byte positions where a
exceeds b, #00 in all other bytes. Answer: BDIF x,a,b; MOR m,minusone,x; here
minusone is a register consisting of all 1s. (Moreover, if we AND this result with
#8040201008040201, then MOR with Z = 255, we get a one-byte encoding of m.)

MXOR $X,$Y,$Z|Z `multiple exclusive or'.
This operation is like the Boolean multiplication just discussed, but exclusive or is
used to combine the bits. Thus we obtain a matrix product over the field of two
elements instead of a Boolean matrix product. This operation can be used to construct
hash functions, among many other things. (The hash functions aren't bad, but they
are not "universal" in the sense of exercise 6.4–72.)
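The index gymnastics of MOR and MXOR become clearer when written as a loop: byte i of the result is the or (or the exclusive or) of those bytes of $Y that are selected by the 1 bits of byte i of the other operand. The following Python sketch is an illustration under stated assumptions, not the simulator's code: registers are integers below 2^64, the function names are hypothetical, and bytes and bits are numbered from the right, which describes the same operation as the left-to-right numbering of the text because both numberings are reversed together.

```python
MASK64 = (1 << 64) - 1  # assumption: a register is an integer in [0, 2^64)

def sadd(y, z):
    """SADD: number of positions where y has a 1 and z has a 0."""
    return bin(y & ~z & MASK64).count("1")

def mor(y, z, xor=False):
    """MOR (or MXOR when xor=True), with bytes and bits numbered from the right."""
    x = 0
    for i in range(8):                      # byte i of z selects bytes of y
        selector = (z >> (8 * i)) & 0xFF
        acc = 0
        for k in range(8):
            if (selector >> k) & 1:         # bit k of the selector picks byte k of y
                ybyte = (y >> (8 * k)) & 0xFF
                acc = acc ^ ybyte if xor else acc | ybyte
        x |= acc << (8 * i)
    return x

def mxor(y, z):
    return mor(y, z, xor=True)

def exceeds_mask(a, b):
    """The exercise: #ff where a byte of a exceeds the byte of b (BDIF, then MOR)."""
    x = 0
    for shift in range(0, 64, 8):           # BDIF x,a,b
        diff = ((a >> shift) & 0xFF) - ((b >> shift) & 0xFF)
        x |= max(diff, 0) << shift
    return mor(MASK64, x)                   # MOR m,minusone,x
```

With this model, `mor(0x0123456789abcdef, 0x0102040810204080)` yields the byte-reversed value, and the constant #8040201008040201 acts as an identity matrix.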

13. Sixteen "immediate wyde" instructions are available for the common case that a
16-bit constant is needed. In this case the Y and Z fields of the instruction are
regarded as a single 16-bit unsigned number YZ.

SETH $X,YZ `set to high wyde'; SETMH $X,YZ `set to medium high wyde';
SETML $X,YZ `set to medium low wyde'; SETL $X,YZ `set to low wyde'.
The 16-bit unsigned number YZ is shifted left by either 48 or 32 or 16 or 0 bits,
respectively, and placed into register X. Thus, for example, SETML inserts a given value
into the second-least-significant wyde of register X and sets the other three wydes
to zero.

INCH $X,YZ `increase by high wyde'; INCMH $X,YZ `increase by medium high wyde';
INCML $X,YZ `increase by medium low wyde'; INCL $X,YZ `increase by low wyde'.
The 16-bit unsigned number YZ is shifted left by either 48 or 32 or 16 or 0 bits,
respectively, and added to register X, ignoring overflow; the result is placed back
into register X.
If YZ is the hexadecimal constant #8000, the command INCH $X,YZ complements
the most significant bit of register X. We will see below that this can be used to
negate a floating point number.

ORH $X,YZ `bitwise or with high wyde'; ORMH $X,YZ `bitwise or with medium high wyde';
ORML $X,YZ `bitwise or with medium low wyde'; ORL $X,YZ `bitwise or with low wyde'.
The 16-bit unsigned number YZ is shifted left by either 48 or 32 or 16 or 0 bits,
respectively, and ored with register X; the result is placed back into register X.
Notice that any desired 4-wyde constant GH IJ KL MN can be inserted into a register
with a sequence of four instructions such as
   SETH $X,GH; INCMH $X,IJ; INCML $X,KL; INCL $X,MN;
any of these INC instructions could also be replaced by OR.

ANDNH $X,YZ `bitwise and-not high wyde'; ANDNMH $X,YZ `bitwise and-not medium high wyde';
ANDNML $X,YZ `bitwise and-not medium low wyde'; ANDNL $X,YZ `bitwise and-not low wyde'.
The 16-bit unsigned number YZ is shifted left by either 48 or 32 or 16 or 0 bits,
respectively, then complemented and anded with register X; the result is placed back
into register X.
If YZ is the hexadecimal constant #8000, the command ANDNH $X,YZ forces the most
significant bit of register X to be 0. This can be used to compute the absolute value
of a floating point number.

14. MMIX knows several ways to shift a register left or right by any number of bits.

SL $X,$Y,$Z|Z `shift left'.
The bits of register Y are shifted left by $Z or Z places, and 0s are shifted in from
the right; the result is placed in register X. Register Y is treated as a signed number,
but the second operand is treated as an unsigned number. The effect is the same as
multiplication by 2^$Z or by 2^Z; an integer overflow exception occurs if the result is
≥ 2^63 or < −2^63. In particular, if the second operand is 64 or more, register X will
become entirely zero, and integer overflow will be signaled unless register Y was zero.

SLU $X,$Y,$Z|Z `shift left unsigned'.
The bits of register Y are shifted left by $Z or Z places, and 0s are shifted in from
the right; the result is placed in register X. Both operands are treated as unsigned
numbers. The SLU instructions are equivalent to SL, except that no test for overflow
is made.

SR $X,$Y,$Z|Z `shift right'.
The bits of register Y are shifted right by $Z or Z places, and copies of the leftmost
bit (the sign bit) are shifted in from the left; the result is placed in register X.
Register Y is treated as a signed number, but the second operand is treated as an
unsigned number. The effect is the same as division by 2^$Z or by 2^Z and rounding
down. In particular, if the second operand is 64 or more, register X will become zero
if $Y was nonnegative, −1 if $Y was negative.

SRU $X,$Y,$Z|Z `shift right unsigned'.
The bits of register Y are shifted right by $Z or Z places, and 0s are shifted in from
the left; the result is placed in register X. Both operands are treated as unsigned
numbers. The effect is the same as unsigned division of a 64-bit number by 2^$Z or
by 2^Z; if the second operand is 64 or more, register X will become entirely zero.
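The difference between the signed and unsigned shifts shows up only at the edges: SL reports overflow when the exact product does not fit, and SR rounds toward minus infinity. Here is a small Python sketch of the four shift instructions, assuming registers are integers below 2^64 and that overflow is reported as a returned flag rather than as an exception; the names are illustrative.

```python
MASK64 = (1 << 64) - 1  # assumption: a register is an integer in [0, 2^64)

def to_signed(u):
    """Reinterpret a 64-bit pattern as a two's complement integer."""
    return u - (1 << 64) if u >> 63 else u

def sl(y, z):
    """SL: return (result, overflow flag); y is signed, z is unsigned."""
    if z >= 64:
        return 0, y != 0
    exact = to_signed(y) << z               # same as multiplying by 2^z
    overflow = not (-(1 << 63) <= exact < (1 << 63))
    return exact & MASK64, overflow

def slu(y, z):
    """SLU: like SL, but with no overflow test."""
    return (y << z) & MASK64 if z < 64 else 0

def sr(y, z):
    """SR: arithmetic shift; shifting by 63 already gives 0 or -1."""
    return (to_signed(y) >> min(z, 63)) & MASK64

def sru(y, z):
    """SRU: logical shift, zeros enter from the left."""
    return y >> z if z < 64 else 0
```

For example, shifting the pattern for −7 right by one place gives −4 with SR (division by 2, rounded down) but a large positive number with SRU.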

15. Comparisons. Arithmetic and logical operations are nice, but computer programs
also need to compare numbers and to change the course of a calculation depending on
what they find. MMIX has four comparison instructions to facilitate such decision-making.

CMP $X,$Y,$Z|Z `compare'.
Register X is set to −1 if register Y is less than register Z or less than the unsigned
immediate value Z, using the conventions of signed arithmetic; it is set to 0 if
register Y is equal to register Z or equal to the unsigned immediate value Z; otherwise
it is set to 1. In symbols, $X = [$Y > $Z] − [$Y < $Z] or $X = [$Y > Z] − [$Y < Z].

CMPU $X,$Y,$Z|Z `compare unsigned'.
Register X is set to −1 if register Y is less than register Z or less than the unsigned
immediate value Z, using the conventions of unsigned arithmetic; it is set to 0 if
register Y is equal to register Z or equal to the unsigned immediate value Z; otherwise
it is set to 1. In symbols, $X = [$Y > $Z] − [$Y < $Z] or $X = [$Y > Z] − [$Y < Z].

16. There also are 32 conditional instructions, which choose quickly between two
alternative courses of action.

CSN $X,$Y,$Z|Z `conditionally set if negative'.
If register Y is negative (namely if its most significant bit is 1), register X is set
to the contents of register Z or to the unsigned immediate value Z. Otherwise nothing
happens.

CSZ $X,$Y,$Z|Z `conditionally set if zero'.
CSP $X,$Y,$Z|Z `conditionally set if positive'.
CSOD $X,$Y,$Z|Z `conditionally set if odd'.
CSNN $X,$Y,$Z|Z `conditionally set if nonnegative'.
CSNZ $X,$Y,$Z|Z `conditionally set if nonzero'.
CSNP $X,$Y,$Z|Z `conditionally set if nonpositive'.
CSEV $X,$Y,$Z|Z `conditionally set if even'.
These instructions are entirely analogous to CSN, except that register X changes
only if register Y is respectively zero, positive, odd, nonnegative, nonzero,
nonpositive, or even.

ZSN $X,$Y,$Z|Z `zero or set if negative'.
If register Y is negative (namely if its most significant bit is 1), register X is set
to the contents of register Z or to the unsigned immediate value Z. Otherwise
register X is set to zero.

ZSZ $X,$Y,$Z|Z `zero or set if zero'.
ZSP $X,$Y,$Z|Z `zero or set if positive'.
ZSOD $X,$Y,$Z|Z `zero or set if odd'.
ZSNN $X,$Y,$Z|Z `zero or set if nonnegative'.
ZSNZ $X,$Y,$Z|Z `zero or set if nonzero'.
ZSNP $X,$Y,$Z|Z `zero or set if nonpositive'.
ZSEV $X,$Y,$Z|Z `zero or set if even'.
These instructions are entirely analogous to ZSN, except that $X is set to $Z or Z if
register Y is respectively zero, positive, odd, nonnegative, nonzero, nonpositive, or
even; otherwise $X is set to zero.

Notice that the two instructions CMPU r,s,0 and ZSNZ r,s,1 have the same effect. So
do the two instructions CSNP r,s,0 and ZSP r,s,r. So do AND r,s,1 and ZSOD r,s,1.
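The three equivalences noted at the end of this section can be confirmed mechanically. Below is a minimal Python model of the comparison and conditional-set instructions; it assumes registers are integers below 2^64 and uses hypothetical helper names, with the condition suffix passed as a string.

```python
MASK64 = (1 << 64) - 1  # assumption: a register is an integer in [0, 2^64)

def to_signed(u):
    return u - (1 << 64) if u >> 63 else u

def cmp_signed(y, z):
    """CMP: [y > z] - [y < z] on signed values, stored as a 64-bit pattern."""
    ys, zs = to_signed(y), to_signed(z)
    return ((ys > zs) - (ys < zs)) & MASK64

def cmpu(y, z):
    """CMPU: the same formula on unsigned values."""
    return ((y > z) - (y < z)) & MASK64

# The eight conditions, applied to the signed value of register Y.
CONDITIONS = {
    "N": lambda v: v < 0,   "NN": lambda v: v >= 0,
    "Z": lambda v: v == 0,  "NZ": lambda v: v != 0,
    "P": lambda v: v > 0,   "NP": lambda v: v <= 0,
    "OD": lambda v: v % 2 == 1, "EV": lambda v: v % 2 == 0,
}

def cs(cond, x, y, z):
    """CSxx: the new value of register X (unchanged if the condition fails)."""
    return z if CONDITIONS[cond](to_signed(y)) else x

def zs(cond, y, z):
    """ZSxx: z if the condition holds, otherwise zero."""
    return z if CONDITIONS[cond](to_signed(y)) else 0
```

With these definitions, `cmpu(s, 0) == zs("NZ", s, 1)`, `cs("NP", r, s, 0) == zs("P", s, r)`, and `(s & 1) == zs("OD", s, 1)` hold for every 64-bit r and s.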

17. Branches and jumps. MMIX ordinarily executes instructions in sequence, proceeding
from an instruction in tetrabyte M4[λ] to the instruction in M4[λ + 4]. But there are
several ways to interrupt the normal flow of control, most of which use the Y and Z
fields of an instruction as a combined 16-bit YZ field. For example, BNZ $3,@+4000
(branch if nonzero) is typical: It means that control should skip ahead 1000
instructions to the command that appears 4000 bytes after the BNZ, if register 3 is
not equal to zero.

There are eight branch-forward instructions, corresponding to the eight conditions
in the CS and ZS commands that we discussed earlier. And there are eight similar
branch-backward instructions; for example, BOD $2,@-4000 (branch if odd) takes control
to the instruction that appears 4000 bytes before this BOD command, if register 2 is
odd. The numeric OP-code when branching backward is one greater than the OP-code
when branching forward; the assembler takes care of this automatically, just as it
takes care of changing ADD from 32 to 33 when necessary.

Since branches are relative to the current location, the MMIX assembler treats
branch instructions in a special way. Suppose a programmer writes `BNZ $3,Case5',
where Case5 is the address of an instruction in location l. If this instruction
appears in location λ, the assembler first computes the displacement δ = ⌊(l − λ)/4⌋.
Then if δ is nonnegative, the quantity δ is placed in the YZ field of a BNZ command,
and it should be less than 2^16; if δ is negative, the quantity 2^16 + δ is placed in
the YZ field of a BNZ command with OP-code increased by 1, and δ should not be less
than −2^16.

The symbol @ used in our examples of BNZ and BOD above is interpreted by the
assembler as an abbreviation for "the location of the current instruction." In the
following notes we will define pairs of branch commands by writing, for example,
`BNZ $X,@+4*YZ[-262144]'; this stands for a branch-forward command that branches to
the current location plus four times YZ, as well as for a branch-backward command
that branches to the current location plus four times (YZ − 65536).

BN $X,@+4*YZ[-262144] `branch if negative'.
BZ $X,@+4*YZ[-262144] `branch if zero'.
BP $X,@+4*YZ[-262144] `branch if positive'.
BOD $X,@+4*YZ[-262144] `branch if odd'.
BNN $X,@+4*YZ[-262144] `branch if nonnegative'.
BNZ $X,@+4*YZ[-262144] `branch if nonzero'.
BNP $X,@+4*YZ[-262144] `branch if nonpositive'.
BEV $X,@+4*YZ[-262144] `branch if even'.
If register X is respectively negative, zero, positive, odd, nonnegative, nonzero,
nonpositive, or even, and if this instruction appears in memory location λ, the next
instruction is taken from memory location λ + 4YZ (branching forward) or
λ + 4(YZ − 2^16) (branching backward). Thus one can go from location λ to any location
between λ − 262,144 and λ + 262,140, inclusive.

Sixteen additional branch instructions called probable branches are also provided.
They have exactly the same meaning as ordinary branch instructions; for example,
PBOD $2,@-4000 and BOD $2,@-4000 both go backward 4000 bytes if register 2 is odd.
But they differ in running time: On some implementations of MMIX, a branch instruction
takes longer when the branch is taken, while a probable branch takes longer when the
branch is not taken. Thus programmers should use a B instruction when they think
branching is relatively unlikely, but they should use PB when they expect branching
to occur more often than not. Here is a list of the probable branch commands, for
completeness:

PBN $X,@+4*YZ[-262144] `probable branch if negative'.
PBZ $X,@+4*YZ[-262144] `probable branch if zero'.
PBP $X,@+4*YZ[-262144] `probable branch if positive'.
PBOD $X,@+4*YZ[-262144] `probable branch if odd'.
PBNN $X,@+4*YZ[-262144] `probable branch if nonnegative'.
PBNZ $X,@+4*YZ[-262144] `probable branch if nonzero'.
PBNP $X,@+4*YZ[-262144] `probable branch if nonpositive'.
PBEV $X,@+4*YZ[-262144] `probable branch if even'.

18. Locations that are relative to the current instruction can be transformed into
absolute locations with GETA commands.

GETA $X,@+4*YZ[-262144] `get address'.
The value λ + 4YZ or λ + 4(YZ − 2^16) is placed in register X. (The assembly language
conventions of branch instructions apply; for example, we can write `GETA $X,Addr'.)

19. MMIX also has unconditional jump instructions, which change the location of the
next instruction no matter what.

JMP @+4*XYZ[-67108864] `jump'.
A JMP command treats bytes X, Y, and Z as an unsigned 24-bit integer XYZ. It allows
a program to transfer control from location λ to any location between λ − 67,108,864
and λ + 67,108,860 inclusive, using relative addressing as in the B and PB commands.

GO $X,$Y,$Z|Z `go to location'.
MMIX takes its next instruction from location $Y + $Z or $Y + Z, and continues from
there. Register X is set equal to λ + 4, the location of the instruction that would
ordinarily have been executed next. (GO is similar to a jump, but it is not relative
to the current location. Since GO has the same format as a load or store instruction,
a loading routine can treat program labels with the same mechanism that is used to
treat references to data.)

An old-fashioned type of subroutine linkage can be implemented by saying either
`GO r,subloc,0' or `GETA r,@+8; JMP Sub' to enter a subroutine, then `GO r,r,0' to
return. But subroutines are normally entered with the instructions PUSHJ or PUSHGO.

The two least significant bits of the address in a GO command are essentially
ignored. They will, however, appear in the value of λ returned by GETA instructions,
and in the return-jump register rJ after PUSHJ or PUSHGO instructions are performed,
and in the where-interrupted register at the time of an interrupt. Therefore they
could be used to send some kind of signal to a subroutine or (less likely) to an
interrupt handler.
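The assembler's rule for relative branches in section 17 (compute the displacement δ, then choose the forward or the backward form) can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, addresses are plain integers, and the "backward" flag stands for "OP-code increased by 1" so that no particular OP-code value needs to be assumed.

```python
def encode_branch(target, here):
    """Return (backward, YZ) for a branch at location `here` to `target`.

    backward is 0 for the branch-forward OP-code and 1 for the
    branch-backward OP-code (the forward OP-code plus one).
    """
    delta = (target - here) // 4            # floor((l - lambda) / 4)
    if delta >= 0:
        if delta >= 1 << 16:
            raise ValueError("branch target too far forward")
        return 0, delta
    if delta < -(1 << 16):
        raise ValueError("branch target too far backward")
    return 1, (1 << 16) + delta

def branch_target(here, backward, yz):
    """Inverse direction: the location the machine branches to."""
    return here + 4 * (yz - (1 << 16)) if backward else here + 4 * yz
```

Thus `BNZ $3,@+4000` receives YZ = 1000 in the forward form, while `BOD $2,@-4000` receives YZ = 65536 − 1000 = 64536 in the backward form. The extreme values of YZ give the range from λ − 262,144 to λ + 262,140 stated in the text.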

20. Multiplication and division. Now for some instructions that make MMIX work harder.

MUL $X,$Y,$Z|Z `multiply'.
The signed product of the number in register Y by either the number in register Z or
the unsigned byte Z replaces the contents of register X. An integer overflow exception
can occur, as with ADD or SUB, if the result is less than −2^63 or greater than
2^63 − 1. (Immediate multiplication by powers of 2 can be done more rapidly with the
SL instruction.)

MULU $X,$Y,$Z|Z `multiply unsigned'.
The lower 64 bits of the unsigned 128-bit product of register Y and either register Z
or Z are placed in register X, and the upper 64 bits are placed in the special himult
register rH. (Immediate multiplication by powers of 2 can be done more rapidly with
the SLU instruction, if the upper half is not needed. Furthermore, an instruction like
4ADDU $X,$Y,$Y is faster than MULU $X,$Y,5.)

DIV $X,$Y,$Z|Z `divide'.
The signed quotient of the number in register Y divided by either the number in
register Z or the unsigned byte Z replaces the contents of register X, and the signed
remainder is placed in the special remainder register rR. An integer divide check
exception occurs if the divisor is zero; in that case $X is set to zero and rR is set
to $Y. An integer overflow exception occurs if the number −2^63 is divided by −1;
otherwise integer overflow is impossible. The quotient of y divided by z is defined to
be ⌊y/z⌋, and the remainder is defined to be y − ⌊y/z⌋z (also written y mod z). Thus,
the remainder is either zero or has the sign of the divisor. Dividing by z = 2^t gives
exactly the same quotient as shifting right t via the SR command, and exactly the same
remainder as anding with z − 1 via the AND command. Division of a positive 63-bit
number by a positive constant can be accomplished more quickly by computing the upper
half of a suitable unsigned product and shifting it right appropriately.

DIVU $X,$Y,$Z|Z `divide unsigned'.
The unsigned 128-bit number obtained by prefixing the special dividend register rD
to the contents of register Y is divided either by the unsigned number in register Z
or by the unsigned byte Z, and the quotient is placed in register X. The remainder is
placed in the remainder register rR. However, if rD is greater than or equal to the
divisor (and in particular if the divisor is zero), then $X is set to rD and rR is set
to $Y. (Unsigned arithmetic never signals an exceptional condition, even when dividing
by zero.) If rD is zero, unsigned division by z = 2^t gives exactly the same quotient
as shifting right t via the SRU command, and exactly the same remainder as anding with
z − 1 via the AND command. Section 4.3.1 of Seminumerical Algorithms explains how to
use unsigned division to obtain the quotient and remainder of extremely large numbers.
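The floor-based definition of DIV and the special cases of DIVU are easy to get wrong, so a small executable model may help. The Python sketch below assumes registers are integers below 2^64 and returns the special registers and the exception as extra results; the names and the exception strings are illustrative, and the quotient stored in the overflow case (−2^63 divided by −1) is simply wrapped to 64 bits, which is an assumption rather than something stated above.

```python
MASK64 = (1 << 64) - 1  # assumption: a register is an integer in [0, 2^64)

def to_signed(u):
    return u - (1 << 64) if u >> 63 else u

def mulu(y, z):
    """MULU: return (X, rH), the low and high halves of the 128-bit product."""
    product = y * z
    return product & MASK64, product >> 64

def div(y, z):
    """DIV: return (X, rR, exception) with quotient floor(y/z)."""
    ys, zs = to_signed(y), to_signed(z)
    if zs == 0:
        return 0, y, "divide check"
    q, r = divmod(ys, zs)      # Python floors, so r is 0 or has the sign of zs
    exception = "overflow" if q == 1 << 63 else None
    return q & MASK64, r & MASK64, exception   # assumption: quotient wraps

def divu(rD, y, z):
    """DIVU: return (X, rR) for the 128-bit dividend rD:y and divisor z."""
    if rD >= z:                # includes z == 0; no exception is signaled
        return rD, y
    return divmod((rD << 64) | y, z)
```

For example, dividing −7 by 2 gives quotient −4 and remainder 1, while dividing 7 by −2 gives quotient −4 and remainder −1, so the remainder indeed takes the sign of the divisor.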

21. Floating point computations. Floating point arithmetic conforming to the famous
IEEE/ANSI Standard 754 is provided for arbitrary 64-bit numbers. The IEEE standard
refers to such numbers as "double format" quantities, but MMIX calls them simply
floating point numbers because 64-bit quantities are the norm.

A positive floating point number has 53 bits of precision and can range from
approximately 10^−308 to 10^308. "Denormal numbers" between 10^−324 and 10^−308 can
also be represented, but with fewer bits of precision. Floating point numbers can be
infinite, and they satisfy such identities as 1.0/∞ = +0.0, −2.8 × ∞ = −∞. Floating
point quantities can also be "Not-a-Numbers" or NaNs, which are further classified
into signaling NaNs and quiet NaNs.

Five kinds of exceptions can occur during floating point computations, and they
each have code letters: Floating overflow (O) or underflow (U); floating divide by
zero (Z); floating inexact (X); and floating invalid (I). For example, the
multiplication of sufficiently small integers causes no exceptions, and the division
of 91.0 by 13.0 is also exception-free, but the division 1.0/3.0 is inexact. The
multiplication of extremely large or extremely small floating point numbers is inexact
and it also causes overflow or underflow. Invalid results occur when taking the square
root of a negative number; mathematicians can remember the I exception by relating it
to the square root of −1.0. Invalid results also occur when trying to convert infinity
or a quiet NaN to a fixed-point integer, or when any signaling NaN is encountered, or
when mathematically undefined operations like ∞ − ∞ or 0/0 are requested.

(Programmers can be sure that they have not erroneously used uninitialized floating
point data if they initialize all their variables to signaling NaN values.)

Four different rounding modes for inexact results are available: round to nearest
(and to even in case of ties); round off (toward zero); round up (toward +∞); or round
down (toward −∞). MMIX has a special arithmetic status register rA that specifies the
current rounding mode and the user's current preferences for exception handling.

IEEE standard arithmetic provides an excellent foundation for scientific calculations,
and it will be thoroughly explained in the fourth edition of Seminumerical Algorithms,
Section 4.2. For our present purposes, we need not study all the details; but we do
need to specify MMIX's behavior with respect to several things that are not completely
defined by the standard. For example, the IEEE standard does not define the
representations of NaNs.

When an octabyte represents a floating point number in MMIX's registers, the
leftmost bit is the sign; then come 11 bits for an exponent e; and the remaining 52
bits are the fraction part f. We regard e as an integer between 0 and
(11111111111)_2 = 2047, and we regard f as a fraction between 0 and
(.111...1)_2 = 1 − 2^−52. Each octabyte has the following significance:

   ±0.0,                  if e = f = 0 (zero);
   ±2^(−1022) f,          if e = 0 and f > 0 (denormal);
   ±2^(e−1023) (1 + f),   if 0 < e < 2047 (normal);
   ±∞,                    if e = 2047 and f = 0 (infinite);
   ±NaN(f),               if e = 2047 and 0 < f < 1/2 (signaling NaN);
   ±NaN(f),               if e = 2047 and f ≥ 1/2 (quiet NaN).

Notice that +0.0 is distinguished from −0.0; this fact is important for interval
arithmetic.

Exercise: What 64 bits represent the floating point number 1.0? Answer: We want
e = 1023 and f = 0, so the answer is #3ff0000000000000.

Exercise: What is the largest finite floating point number? Answer: We want e = 2046
and f = 1 − 2^−52, so the answer is #7fefffffffffffff = 2^1024 − 2^971.
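The table of significances and the two exercises can be verified with exact rational arithmetic. The following Python sketch splits an octabyte into sign, e, and f and classifies it according to the six cases above; the function names are hypothetical, and an octabyte is assumed to be an integer below 2^64.

```python
from fractions import Fraction

def fields(octa):
    """Split an octabyte into (sign bit, exponent e, fraction f)."""
    sign = octa >> 63
    e = (octa >> 52) & 0x7FF
    f = Fraction(octa & ((1 << 52) - 1), 1 << 52)
    return sign, e, f

def kind(octa):
    """Classify an octabyte by the six cases of the table."""
    _, e, f = fields(octa)
    if e == 0:
        return "zero" if f == 0 else "denormal"
    if e < 2047:
        return "normal"
    if f == 0:
        return "infinite"
    return "signaling NaN" if f < Fraction(1, 2) else "quiet NaN"

def value(octa):
    """Exact value of a finite floating point number, as a Fraction."""
    sign, e, f = fields(octa)
    if e == 2047:
        raise ValueError("infinite or NaN")
    magnitude = Fraction(2) ** -1022 * f if e == 0 else Fraction(2) ** (e - 1023) * (1 + f)
    return -magnitude if sign else magnitude
```

Here `value(0x3ff0000000000000)` is exactly 1, and `value(0x7fefffffffffffff)` is exactly 2^1024 − 2^971, in agreement with the answers to the exercises.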

22. The seven IEEE floating point arithmetic operations (addition, subtraction,
multiplication, division, remainder, square root, and nearest-integer) all share common
features, called the standard floating point conventions in the discussion below: The
operation is performed on floating point numbers found in two registers, $Y and $Z,
except that square root and integerization ignore $Y since they involve only one
operand. If neither input operand is a NaN, we first determine the exact result, then
round it using the current rounding mode found in special register rA. Infinite results
are exact and need no rounding. A floating overflow exception occurs if the rounded
result is finite but needs an exponent greater than 2046. A floating underflow
exception occurs if the rounded result needs an exponent less than 1 and either (i) the
unrounded result cannot be represented exactly as a denormal number or (ii) the
"floating underflow trip" is enabled in rA. (Trips are discussed below.) NaNs are
treated specially as follows: If either $Y or $Z is a signaling NaN, an invalid
exception occurs and the NaN is quieted by adding 1/2 to its fraction part. Then if $Z
is a quiet NaN, the result is set to $Z; otherwise if $Y is a quiet NaN, the result is
set to $Y.

FADD $X,$Y,$Z `floating add'.
The floating point sum $Y + $Z is computed by the standard floating point conventions
just described, and placed in register X. An invalid exception occurs if the sum is
(+∞) + (−∞) or (−∞) + (+∞); in that case the result is NaN(1/2) with the sign of $Z.
If the sum is exactly zero and the current mode is not rounding-down, the result is
+0.0 except that (−0.0) + (−0.0) = −0.0. If the sum is exactly zero and the current
mode is rounding-down, the result is −0.0 except that (+0.0) + (+0.0) = +0.0. These
rules for signed zeros turn out to be useful when doing interval arithmetic: If the
lower bound of an interval is +0.0 or if the upper bound is −0.0, the interval does not
contain zero, so the numbers in the interval have a known sign.

Floating point underflow cannot occur unless the U-trip has been enabled, because any
underflowing result of floating point addition can be represented exactly as a
denormal number.

Silly but instructive exercise: Find all pairs of numbers ($Y, $Z) such that the
commands FADD $X,$Y,$Z and ADDU $X,$Y,$Z both produce the same result in $X (although
FADD may cause floating exceptions). Answer: Of course $Y or $Z could be zero, if the
other one is not a signaling NaN. Or one could be signaling and the other
#0008000000000000. Other possibilities occur when they are both positive and less than
#0010000000000001; or when one operand is #0000000000000001 and the other is an odd
number between #0020000000000001 and #002ffffffffffffd inclusive (rounding to nearest).
And still more surprising possibilities exist, such as
#7f6001b4c67bc809 + #ff5ffb6a4534a3f7. All eight families of solutions will be revealed
some day in the fourth edition of Seminumerical Algorithms.

FSUB $X,$Y,$Z `floating subtract'.
This instruction is equivalent to FADD, but with the sign of $Z negated unless $Z is
a NaN.

FMUL $X,$Y,$Z `floating multiply'.
The floating point product $Y × $Z is computed by the standard floating point
conventions, and placed in register X. An invalid exception occurs if the product is
(±0.0) × (±∞) or (±∞) × (±0.0); in that case the result is ±NaN(1/2). No exception
occurs for the product (±∞) × (±∞). If neither $Y nor $Z is a NaN, the sign of the
result is the product of the signs of $Y and $Z.

FDIV $X,$Y,$Z `floating divide'.
The floating point quotient $Y/$Z is computed by the standard floating point
conventions, and placed in $X. A floating divide by zero exception occurs if the
quotient is (normal or denormal)/(±0.0). An invalid exception occurs if the quotient is
(±0.0)/(±0.0) or (±∞)/(±∞); in that case the result is ±NaN(1/2). No exception occurs
for the quotient (±∞)/(±0.0). If neither $Y nor $Z is a NaN, the sign of the result is
the product of the signs of $Y and $Z.

If a floating point number in register X is known to have an exponent between 2
and 2046, the instruction INCH $X,#fff0 will divide it by 2.0.

FREM $X,$Y,$Z `floating remainder'.
The floating point remainder $Y rem $Z is computed by the standard floating point
conventions, and placed in register X. (The IEEE standard defines the remainder to be
$Y − n × $Z, where n is the nearest integer to $Y/$Z, and n is an even integer in case
of ties. This is not the same as the remainder $Y mod $Z computed by DIV or DIVU.) A
zero remainder has the sign of $Y. An invalid exception occurs if $Y is infinite
and/or $Z is zero; in that case the result is NaN(1/2) with the sign of $Y.

FSQRT $X,$Z `floating square root'.
The floating point square root √$Z is computed by the standard floating point
conventions, and placed in register X. An invalid exception occurs if $Z is a negative
number (either infinite, normal, or denormal); in that case the result is −NaN(1/2). No
exception occurs when taking the square root of −0.0 or +∞. In all cases the sign of
the result is the sign of $Z.

FINT $X,$Z `floating integer'.
The floating point number in register Z is rounded (if necessary) to a floating point
integer, using the current rounding mode, and placed in register $X. Infinite values
and quiet NaNs are not changed; signaling NaNs are treated as in the standard
conventions. Floating point overflow and underflow exceptions cannot occur.

The Y field of FSQRT and FINT can be used to specify a special rounding mode, as
explained below.
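Two remarks in this section lend themselves to a quick experiment: the trick of halving a number with INCH $X,#fff0 (along with the sign tricks using #8000 mentioned earlier for INCH and ANDNH), and the difference between the IEEE remainder and y mod z. The Python sketch below relies on the host's IEEE double format through the standard `struct` module and on `math.remainder`, which computes the IEEE-style remainder; the helper names `bits`, `as_float`, `inch`, and `andnh` are illustrative.

```python
import math
import struct

MASK64 = (1 << 64) - 1  # assumption: a register is an integer in [0, 2^64)

def bits(x):
    """The octabyte that represents the double x."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]

def as_float(octa):
    """The double represented by an octabyte."""
    return struct.unpack(">d", struct.pack(">Q", octa))[0]

def inch(x, yz):
    """INCH: add YZ shifted left 48 bits, ignoring overflow."""
    return (x + (yz << 48)) & MASK64

def andnh(x, yz):
    """ANDNH: clear the bits of the high wyde that are set in YZ."""
    return x & ~(yz << 48) & MASK64

halved = as_float(inch(bits(6.0), 0xFFF0))      # exponent field decreases by 1
negated = as_float(inch(bits(1.5), 0x8000))     # sign bit is complemented
absolute = as_float(andnh(bits(-2.25), 0x8000)) # sign bit is forced to 0

ieee_rem = math.remainder(5.0, 3.0)   # n = 2 is nearest to 5/3, so 5 - 6
floor_mod = 5 % 3                     # the DIV-style remainder
```

Adding #fff0 to the high wyde is the same as subtracting #0010 from it modulo 2^16, which lowers the exponent by one; that is why the exponent must be at least 2 for the result to stay normal.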


23. Besides doing arithmetic, we need to compare floating point numbers with each
other, taking proper account of NaNs and the fact that −0.0 should be considered
equal to +0.0. The following instructions are analogous to the comparison operators
CMP and CMPU that we have used for integers.

FCMP $X,$Y,$Z `floating compare'.
Register X is set to −1 if $Y < $Z according to the conventions of floating point
arithmetic, or to 1 if $Y > $Z according to those conventions. Otherwise it is set
to 0. An invalid exception occurs if either $Y or $Z is a NaN; in such cases the result
is zero.
FEQL $X,$Y,$Z ` oating equal to'.
Register X is set to 1 if $Y = $Z according to the conventions of oating point
arithmetic. Otherwise it is set to 0. The result is zero if either $Y or $Z is a NaN,
even if a NaN is being compared with itself. However, no invalid exception occurs,
not even when $Y or $Z is a signaling NaN. (Perhaps MMIX di ers slightly from the
IEEE standard in this regard, but programmers sometimes need to look at signaling
NaNs without encountering side e ects. Programmers who insist on raising an invalid
exception whenever a signaling NaN is compared for oating equality should issue the
instructions FSUB $X,$Y,$Y; FSUB $X,$Z,$Z just before saying FEQL $X,$Y,$Z .)
Suppose w, x, y, and z are unsigned 64-bit integers with w < x < 263  y < z .
Thus, the leftmost bits of w and x are 0, while the leftmost bits of y and z are 1. Then
we have w < x < y < z when these numbers are considered as unsigned integers, but
y < z < w < x when they are considered as signed integers, because y and z are
negative. Furthermore, we have z < y  w < x when these same 64-bit quantities are
considered to be oating point numbers, assuming that no NaNs are present, because
the leftmost bit of a oating point number represents its sign and the remaining bits
represent its magnitude. The case y = w occurs in oating point comparison if and
only if y is the representation of 0:0 and w is the representation of +0:0. 
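
The three orderings of the same 64-bit patterns can be checked with a small Python sketch. It is an illustration only; the helper names as_signed and as_float and the particular bit patterns are assumptions chosen for the example.

```python
import struct

def as_signed(bits: int) -> int:
    """Interpret a 64-bit pattern as a two's complement integer."""
    return bits - (1 << 64) if bits >> 63 else bits

def as_float(bits: int) -> float:
    """Interpret a 64-bit pattern as an IEEE double."""
    return struct.unpack(">d", struct.pack(">Q", bits))[0]

# Example patterns with w < x < 2**63 <= y < z as unsigned integers.
w, x = 0x3FF0000000000000, 0x4000000000000000   # +1.0 and +2.0
y, z = 0xBFF0000000000000, 0xC000000000000000   # -1.0 and -2.0

unsigned_order = w < x < y < z
signed_order = as_signed(y) < as_signed(z) < as_signed(w) < as_signed(x)
float_order = as_float(z) < as_float(y) <= as_float(w) < as_float(x)

# The patterns of -0.0 and +0.0 differ, but the values compare equal.
zeros_equal = as_float(1 << 63) == as_float(0)
```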
FUN $X,$Y,$Z `floating unordered'.
Register X is set to 1 if $Y and $Z are unordered according to the conventions of floating point arithmetic (namely, if either one is a NaN); otherwise register X is set to 0. No invalid exception occurs, not even when $Y or $Z is a signaling NaN.

The IEEE standard discusses 26 different possible relations on floating point numbers; MMIX implements 14 of them with single instructions, followed by a branch (or by a ZS to make a "pure" 0 or 1 result); all 26 can be evaluated with a sequence of at most four MMIX commands and a subsequent branch. The hardest case to handle is `?>=' (unordered or greater or equal, to be computed without exceptions), for which the following sequence makes $X ≥ 0 if and only if $Y ?>= $Z:

      FUN   $255,$Y,$Z
      BP    $255,1F      % skip ahead if unordered
      FCMP  $X,$Y,$Z     % $X=[$Y>$Z]-[$Y<$Z]; no exceptions will arise
   1H CSNZ  $X,$255,1    % $X=1 if unordered
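
The control flow of this four-instruction sequence can be traced with a Python model. This is a rough sketch that ignores exceptions and signaling NaNs; the function names fun, fcmp and unordered_or_ge are hypothetical.

```python
import math

def fun(y: float, z: float) -> int:
    """Model of FUN: 1 if either operand is a NaN, else 0."""
    return 1 if (math.isnan(y) or math.isnan(z)) else 0

def fcmp(y: float, z: float) -> int:
    """Model of FCMP for ordered operands: [y>z] - [y<z]."""
    return (y > z) - (y < z)

def unordered_or_ge(y: float, z: float) -> int:
    """Return the final $X of the sequence; it is >= 0 iff y ?>= z."""
    x = None                 # $X holds an arbitrary old value at first
    r255 = fun(y, z)         # FUN  $255,$Y,$Z
    if not r255 > 0:         # BP   $255,1F skips the FCMP if unordered
        x = fcmp(y, z)       # FCMP $X,$Y,$Z
    if r255 != 0:            # 1H CSNZ $X,$255,1
        x = 1
    return x
```

The FCMP is reached only when both operands are ordered, which is why the sequence cannot raise an invalid exception.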


24. Exercise: Suppose MMIX had no FINT instruction. Explain how to obtain the equivalent of FINT $X,$Z using other instructions. Your program should do the proper thing with respect to NaNs and exceptions. (For example, it should cause an invalid exception if and only if $Z is a signaling NaN; it should cause an inexact exception only if $Z needs to be rounded to another value.)

Answer: (The assembler prefixes hexadecimal constants by #.)

      SETH  $0,#4330     % $0=2^52
      SET   $1,$Z        % $1=$Z
      ANDNH $1,#8000     % $1=abs($Z)
      ANDN  $2,$Z,$1     % $2=signbit($Z)
      FUN   $3,$Z,$Z     % $3=[$Z is a NaN]
      BNZ   $3,1F        % skip ahead if $Z is a NaN
      FCMP  $3,$1,$0     % $3=[abs($Z)>2^52]-[abs($Z)<2^52]
      CSNN  $0,$3,0      % set $0=0 if $3>=0
      OR    $0,$2,$0     % attach sign of $Z to $0
   1H FADD  $1,$Z,$0     % $1=$Z+$0
      FSUB  $X,$1,$0     % $X=$1-$0

This program handles most cases of interest by adding and subtracting ±2^52 using floating point arithmetic. It would be incorrect to do this in all cases; for example, such addition/subtraction might fail to give the correct answer when $Z is a small negative quantity (if rounding toward zero), or when $Z is a number like 2^105 + 2^53 (if rounding to nearest).
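
The add-and-subtract idea can be tried in any language whose floating point arithmetic rounds to nearest, ties to even. The Python sketch below is an illustration under that assumption and covers only the round-to-nearest mode. The function name fint_nearest is hypothetical, and the final copysign step (which keeps the sign of a zero result) is an addition that does not appear in the MMIX program above.

```python
import math

TWO52 = 2.0 ** 52   # doubles of magnitude >= 2**52 are already integers

def fint_nearest(z: float) -> float:
    """Round z to an integral double by adding and subtracting 2**52
    with the sign of z attached (round-to-nearest mode only)."""
    if math.isnan(z) or abs(z) >= TWO52:
        return z                      # NaN, infinity, or already integral
    c = math.copysign(TWO52, z)       # attach the sign of z to 2**52
    r = (z + c) - c                   # the sum is rounded to an integer
    return math.copysign(r, z)        # keep the sign of z on a zero result

samples = [0.5, 1.5, 2.5, -2.5, 0.3, -0.3, 123456.75, 1e300]
results = [fint_nearest(v) for v in samples]
```

Python's built-in round also rounds ties to even, so it serves as an independent check on these samples.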


25. MMIX goes beyond the IEEE standard to define additional relations between floating point numbers, as suggested by the theory in Section 4.2.2 of Seminumerical Algorithms. Given a nonnegative number ε, each normal floating point number u = (f, e) has a neighborhood

   N_ε(u) = {x | |x − u| ≤ 2^(e−1022) ε};

we also define N_ε(0) = {0}, N_ε(u) = {x | |x − u| ≤ 2^(−1021) ε} if u is denormal; N_ε(±∞) = {±∞} if ε < 1, N_ε(±∞) = {everything except ∓∞} if 1 ≤ ε < 2, N_ε(±∞) = {everything} if ε ≥ 2. Then we write

   u ≺ v (ε), if u < N_ε(v) and N_ε(u) < v;
   u ∼ v (ε), if u ∈ N_ε(v) or v ∈ N_ε(u);
   u ≈ v (ε), if u ∈ N_ε(v) and v ∈ N_ε(u);
   u ≻ v (ε), if u > N_ε(v) and N_ε(u) > v.

FCMPE $X,$Y,$Z `floating compare (with respect to epsilon)'.
Register X is set to −1 if $Y ≺ $Z (rE) according to the conventions of Seminumerical Algorithms as stated above; it is set to 1 if $Y ≻ $Z (rE) according to those conventions; otherwise it is set to 0. Here rE is a floating point number in the special epsilon register, which is used only by the floating point comparison operations FCMPE, FEQLE, and FUNE. An invalid exception occurs, and the result is zero, if any of $Y, $Z, or rE are NaN, or if rE is negative. If no such exception occurs, exactly one of the three conditions $Y ≺ $Z, $Y ∼ $Z, $Y ≻ $Z holds with respect to rE.

FEQLE $X,$Y,$Z `floating equivalent (with respect to epsilon)'.
Register X is set to 1 if $Y ≈ $Z (rE) according to the conventions of Seminumerical Algorithms as stated above; otherwise it is set to 0. An invalid exception occurs, and the result is zero, if any of $Y, $Z, or rE are NaN, or if rE is negative. Notice that the relation $Y ≈ $Z computed by FEQLE is stronger than the relation $Y ∼ $Z computed by FCMPE.

FUNE $X,$Y,$Z `floating unordered (with respect to epsilon)'.
Register X is set to 1 if $Y, $Z, or rE are exceptional as discussed for FCMPE and FEQLE; otherwise it is set to 0. No exceptions occur, even if $Y, $Z, or rE is a signaling NaN.

Exercise: What floating point numbers does FCMPE regard as ∼ 0.0 with respect to ε = 1/2, when no exceptions arise? Answer: Zero, denormal numbers, and normal numbers with f = 0. (The numbers similar to zero with respect to ε are zero, denormal numbers with f ≤ 2ε, normal numbers with f ≤ 2ε − 1, and ±∞ if ε ≥ 1.)
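
For finite operands the neighborhood definitions translate directly into exact rational arithmetic. The following Python sketch is an illustration only: it ignores infinities, NaNs and exceptions, and the names radius, fcmpe and feqle are hypothetical models rather than MMIXware routines. It uses the fact that math.frexp returns an exponent k with |u| = m·2^k and 1/2 ≤ m < 1, so that k = e − 1022 for a normal number.

```python
import math
from fractions import Fraction

MIN_NORMAL = 2.0 ** -1022   # smallest positive normal double

def radius(u: float, eps: Fraction) -> Fraction:
    """Half-width of the neighborhood N_eps(u) for finite u."""
    if u == 0.0:
        return Fraction(0)                      # N(0) = {0}
    if abs(u) < MIN_NORMAL:
        return Fraction(2) ** -1021 * eps       # denormal case
    _, k = math.frexp(u)                        # k = e - 1022
    return Fraction(2) ** k * eps

def fcmpe(y: float, z: float, eps) -> int:
    """-1 if y is definitely less than z, 1 if definitely greater, else 0."""
    eps = Fraction(eps)
    bound = max(radius(y, eps), radius(z, eps))
    d = Fraction(z) - Fraction(y)
    if d > bound:
        return -1
    if -d > bound:
        return 1
    return 0

def feqle(y: float, z: float, eps) -> int:
    """1 if each operand lies in the other's neighborhood, else 0."""
    eps = Fraction(eps)
    d = abs(Fraction(z) - Fraction(y))
    return 1 if d <= min(radius(y, eps), radius(z, eps)) else 0
```

With ε = 1/4 the values 1.0 and 2.0 are similar (fcmpe gives 0) but not equivalent (feqle gives 0), which shows that ≈ is the stronger relation.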
26. The IEEE standard also defines 32-bit floating point quantities, which it calls "single format" numbers. MMIX calls them short floats, and converts between 32-bit and 64-bit forms when such numbers are loaded from memory or stored from memory. A short float consists of a sign bit followed by an 8-bit exponent and a 23-bit fraction. After it has been loaded into one of MMIX's registers, its 52-bit fraction part will have 29 trailing zero bits, and its exponent e will be one of the 256 values 0, (01110000001)_2 = 897, (01110000010)_2 = 898, ..., (10001111110)_2 = 1150, or 2047, unless it was denormal; a denormal short float loads into a normal number with 874 ≤ e ≤ 896.


LDSF $X,$Y,$Z|Z `load short float'.
Register X is set to the 64-bit floating point number corresponding to the 32-bit floating point number represented by M4[$Y + $Z] or M4[$Y + Z]. No arithmetic exceptions occur, not even if a signaling NaN is loaded.

STSF $X,$Y,$Z|Z `store short float'.
The value obtained by rounding register X to a 32-bit floating point number is placed in M4[$Y + $Z] or M4[$Y + Z]. Rounding is done with the current rounding mode, in a manner exactly analogous to the standard conventions for rounding 64-bit results, except that the precision and exponent range are limited. In particular, floating overflow, underflow, and inexact exceptions might occur; a signaling NaN will trigger an invalid exception and it will become quiet. The fraction part of a NaN is truncated if necessary to a multiple of 2^−23, by ignoring the least significant 29 bits.

If we load any two short floats and operate on them once with either FADD, FSUB, FMUL, FDIV, FREM, FSQRT, or FINT, and if we then store the result as a short float, we obtain the results required by the IEEE standard for single format arithmetic, because the double format can be shown to have enough precision to avoid any problems of "double rounding." But programmers are usually better off sticking to 64-bit arithmetic unless they have a strong reason to emulate the precise behavior of a 32-bit computer; 32 bits do not offer much precision.
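
The exponent ranges quoted for loaded short floats can be checked by widening 32-bit patterns in software. The Python sketch below is an illustration only; it relies on the struct module's conversion from single to double format, and the names load_short_float and exponent are hypothetical. NaN patterns are deliberately not exercised.

```python
import struct

def load_short_float(bits32: int) -> int:
    """Return the 64-bit pattern of the double equal to a 32-bit float."""
    (value,) = struct.unpack(">f", struct.pack(">I", bits32))
    (bits64,) = struct.unpack(">Q", struct.pack(">d", value))
    return bits64

def exponent(bits64: int) -> int:
    """Extract the 11-bit exponent field e of a double."""
    return (bits64 >> 52) & 0x7FF

one = load_short_float(0x3F800000)              # 1.0 as a short float
smallest_denormal = load_short_float(0x00000001)
largest_denormal = load_short_float(0x007FFFFF)
smallest_normal = load_short_float(0x00800000)
largest_normal = load_short_float(0x7F7FFFFF)
```

Every widened pattern has 29 trailing zero bits in its fraction, and the denormal short floats land on normal doubles with exponents 874 through 896.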

27. Of course we need to be able to go back and forth between integers and floating point values.

FIX $X,$Z `convert floating to fixed'.
The floating point number in register $Z is converted to an integer as with the FINT instruction, and the resulting integer (mod 2^64) is placed in register X. An invalid exception occurs if $Z is infinite or a NaN; in that case $X is simply set equal to $Z. A float-to-fix exception occurs if the result is less than −2^63 or greater than 2^63 − 1.

FIXU $X,$Z `convert floating to fixed unsigned'.
This instruction is identical to FIX except that no float-to-fix exception occurs.

FLOT $X,$Z|Z `convert fixed to floating'.
The integer in $Z or the immediate constant Z is converted to the nearest floating point value (using the current rounding mode) and placed in register X. A floating inexact exception occurs if rounding is necessary.

FLOTU $X,$Z|Z `convert fixed to floating unsigned'.
FLOTU is like FLOT, but $Z is treated as an unsigned integer.

SFLOT $X,$Z|Z `convert fixed to short float'; SFLOTU $X,$Z|Z `convert fixed to short float unsigned'.
The SFLOT instructions are like the FLOT instructions, except that they round to a floating point number whose fraction part is a multiple of 2^−23. (Thus, the resulting value will not be changed by a "store short float" instruction.) Such conversions appear in MMIX's repertoire only to establish complete conformance with the IEEE standard; a programmer needs them only when emulating a 32-bit machine.

28. Since the variants of FIX and FLOT involve only one input operand ($Z or Z), their Y field is normally zero. A programmer can, however, force the mode of rounding used with these commands by setting

   Y = 1, ROUND_OFF;
   Y = 2, ROUND_UP;
   Y = 3, ROUND_DOWN;
   Y = 4, ROUND_NEAR.

For example, the instruction FLOTU $X,ROUND_OFF,$Z will set the exponent e of register X to 1086 − l if $Z is a nonzero quantity with l leading zero bits. Thus we can count leading zeros by continuing with SETL $0,1086; SR $X,$X,52; SUB $X,$0,$X; CSZ $X,$Z,64.

The Y field can also be used in the same way to specify any desired rounding mode in the other floating point instructions that have only a single operand, namely FSQRT and FINT. An illegal instruction interrupt occurs if Y exceeds 4 in any of these commands.
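
The leading-zero count based on FLOTU with ROUND_OFF can be imitated step by step. The Python sketch below is an illustration only; flotu_round_off_bits is a hypothetical model of the conversion that truncates to 53 significant bits before converting, so that the result is rounded toward zero.

```python
import struct

def flotu_round_off_bits(z: int) -> int:
    """64-bit pattern of the unsigned integer z converted to a double,
    rounding toward zero (hypothetical model of FLOTU with ROUND_OFF)."""
    n = z.bit_length()
    if n > 53:
        z = (z >> (n - 53)) << (n - 53)   # drop low bits: round toward zero
    return struct.unpack(">Q", struct.pack(">d", float(z)))[0]

def leading_zeros(z: int) -> int:
    """Count leading zero bits of a 64-bit unsigned integer."""
    x = flotu_round_off_bits(z)   # FLOTU $X,ROUND_OFF,$Z
    t = 1086                      # SETL  $0,1086
    x >>= 52                      # SR    $X,$X,52  (x is now the exponent e)
    x = t - x                     # SUB   $X,$0,$X
    if z == 0:                    # CSZ   $X,$Z,64
        x = 64
    return x
```

Rounding toward zero matters here: with round to nearest, an input such as 2^64 − 1 would round up to 2^64 and the exponent would be one too large.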

29. Subroutine linkage. MMIX has several special operations designed to facilitate the process of calling and implementing subroutines. The key notion is the idea of a hardware-supported register stack, which can coexist with a software-supported stack of variables that are not maintained in registers. From a programmer's standpoint, MMIX maintains a potentially unbounded list S[0], S[1], ..., S[τ − 1] of octabytes holding the contents of registers that are temporarily inaccessible; initially τ = 0. When a subroutine is entered, registers can be "pushed" on to the end of this list, increasing τ; when the subroutine has finished its execution, the registers are "popped" off again and τ decreases.

Our discussion so far has treated all 256 registers $0, $1, ..., $255 as if they were alike. But in fact, MMIX maintains two internal one-byte counters L and G, where 0 ≤ L ≤ G < 256, with the property that

   registers 0, 1, ..., L − 1 are "local";
   registers L, L + 1, ..., G − 1 are "marginal";
   registers G, G + 1, ..., 255 are "global."

A marginal register is zero when its value is read.

The G counter is normally set to a fixed value once and for all when a program is loaded, thereby defining the number of program variables that will live entirely in registers rather than in memory during the course of execution. A programmer may, however, change G dynamically using the PUT instruction described below.

The L counter starts at 0. If an instruction places a value into a register that is currently marginal, namely a register x such that L ≤ x < G, the value of L will increase to x + 1, and any newly local registers will be zero. For example, if L = 10 and G = 200, the instruction ADD $5,$15,1 would simply set $5 to 1. But the instruction ADD $15,$5,$200 would set $10, $11, ..., $14 to zero, $15 to $5 + $200, and L to 16. (The process of clearing registers and increasing L might take quite a few machine cycles in the worst case. We will see later that MMIX is able to take care of any high-priority interrupts that might occur during this time.)

PUSHJ $X,@+4*YZ[-262144] `push registers and jump'; PUSHGO $X,$Y,$Z|Z `push registers and go'.
Suppose first that X < L. Register X is set equal to the number X, then registers 0, 1, ..., X are pushed onto the register stack as described below. If this instruction is in location λ, the value λ + 4 is placed into the special return-jump register rJ. Then control jumps to instruction λ + 4YZ or λ + 4YZ − 262144 or $Y + $Z or $Y + Z, as in a JMP or GO command.

Pushing the first X + 1 registers onto the stack means essentially that we set

   S[τ] ← $0, S[τ + 1] ← $1, ..., S[τ + X] ← $X, τ ← τ + X + 1;
   $0 ← $(X + 1), ..., $(L − X − 2) ← $(L − 1), L ← L − X − 1.

For example, if X = 1 and L = 5, the current contents of $0 and the number 1 are placed on the register stack, where they will be temporarily inaccessible. Then control jumps to a subroutine with L reduced to 3; the registers that we had been calling $2, $3, and $4 appear as $0, $1, and $2 to the subroutine.

If X ≥ L the actions are similar, except that all of the local registers $0, ..., $(L − 1) are placed on the register stack followed by the number L, and L is reset to zero. In particular, the instruction PUSHGO $255,$Y,$Z pushes all the local registers onto the stack and sets L to zero, regardless of the previous value of L.

We will see later that MMIX is able to achieve the effect of pushing and renaming local registers without actually doing very much work at all.

POP X,YZ `pop registers and return from subroutine'.
This command preserves X of the current local registers, undoes the effect of the most recent PUSHJ or PUSHGO, and jumps to the instruction in M4[4YZ + rJ]. If X > 0, the value of $(X − 1) goes into the "hole" position where PUSHJ or PUSHGO stored the number of registers previously pushed.

The formal details of POP are slightly complicated, but we will see that they make sense: If X > L, we first replace X by L + 1. Then we essentially set

   x ← S[τ − 1] mod 256, S[τ − 1] ← $(X − 1), L ← min(x + X, G),
   $(L − 1) ← $(L − x − 2), ..., $(x + 1) ← $0,
   $x ← S[τ − 1], ..., $0 ← S[τ − x − 1], τ ← τ − x − 1;

here x is the effective value of the X field on the previous PUSHJ or PUSHGO. (If x > G, these formulas don't make sense as written; we actually set $j ← S[τ − x − 1 + j] for L > j ≥ 0 in that rare case.) The operating system should arrange things so that a memory-protection interrupt will occur if a program does more pops than pushes.

Suppose, for example, that a subroutine has three input parameters ($0, $1, $2) and produces two outputs ($0, $1). If the subroutine does not call any other subroutines, it can simply end with POP 2,0, because rJ will contain the return address. Otherwise it should begin by saving rJ, for example with the instruction GET $4,rJ if it will be using local registers $0 through $3, and it should use PUSHJ $5 or PUSHGO $5 when calling sub-subroutines; finally it should PUT rJ,$4 before saying POP 2,0. To call the subroutine from another routine that has, say, 6 local registers, we would put the input arguments into $7, $8, and $9, then issue the command PUSHGO $6,base,Subr; in due time the outputs of the subroutine will appear in $7 and $6.

Notice that the push and pop commands make use of a one-place "hole" in the register stack, between the registers that are pushed down and the registers that remain local. (The hole is position $6 in the example just considered.) MMIX needs this hole position to remember the number of registers that are pushed down. A subroutine with no outputs ends with POP 0,0 and the hole disappears (becomes marginal). A subroutine with one output $0 ends with POP 1,0 and the hole gets the former value of $0. A subroutine with two outputs ($0, $1) ends with POP 2,0 and the hole gets the former value of $1; in this case, therefore, the relative order of the two outputs has been switched on the register stack. If a subroutine has, say, five outputs ($0, ..., $4), it ends with POP 5,0 and $4 goes into the hole position, where it is followed by ($0, $1, $2, $3). MMIX makes this curious permutation in the case of multiple outputs because the hole is most easily plugged by moving one value down (namely $4) instead of by sliding each of five values down in the stack.

These conventions for parameter passing are admittedly a bit confusing in the general case, and I suppose people who use them extensively might someday find themselves talking about "the infamous MMIX register shuffle." However, there is good use for subroutines that convert a sequence of register contents like (x, a, b, c) into (f, a, b, c) where f is a function of a, b, and c but not x. Moreover, PUSHGO and POP can be implemented with great efficiency, and subroutine linkage tends to be a significant bottleneck when other conventions are used.

Information about a subroutine's calling conventions needs to be communicated to a debugger. That can readily be done at the same time as we inform the debugger about the symbolic names of addresses in memory.

A subroutine that uses 50 local registers will not function properly if it is called by a program that sets G less than 50. MMIX does not allow the value of G to become less than 32. Therefore any subroutine that avoids global registers and uses at most 32 local registers can be sure to work properly regardless of the current value of G.

The rules stated above imply that a PUSHJ or PUSHGO instruction with X = 255 pushes all of the currently defined local registers onto the stack and sets L to zero. This makes G local registers available for use by the subroutine jumped to. If that subroutine later returns with POP 0,0, the former value of L and the former contents of $0, ..., $(L − 1) will be restored.

A POP instruction with X = 255 preserves all the local registers as outputs of the subroutine (provided that the total doesn't exceed G after popping), and puts zero into the hole. The best policy, however, is almost always to use POP with a small value of X, and in general to keep the value of L as small as possible by decreasing it when registers are no longer active. A smaller value of L means that MMIX can change context more easily when switching from one process to another.
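
The push and pop rules can be followed more easily with a small executable model. The Python sketch below is an illustration only, with several simplifying assumptions: it models the local registers and the stack S but not global registers, rJ, or the actual jump, and it ignores the rare case x > G. The class name RegisterStack and its method names are hypothetical.

```python
class RegisterStack:
    """Toy model of MMIX local registers and the register stack S."""

    def __init__(self, G: int = 32):
        self.G = G
        self.local = []      # contents of $0 .. $(L-1)
        self.S = []          # S[0] .. S[tau-1]

    @property
    def L(self) -> int:
        return len(self.local)

    def get(self, x: int) -> int:
        """Read $x; a marginal register reads as zero."""
        return self.local[x] if x < self.L else 0

    def put(self, x: int, value: int) -> None:
        """Write $x; writing a marginal register raises L to x + 1."""
        assert x < self.G, "global registers are not modelled"
        while self.L <= x:
            self.local.append(0)      # newly local registers are zero
        self.local[x] = value

    def push(self, X: int) -> None:
        """Register effect of PUSHJ $X or PUSHGO $X."""
        if X < self.L:
            self.local[X] = X                     # the hole holds the count X
            self.S.extend(self.local[:X + 1])
            self.local = self.local[X + 1:]       # renumber the rest from $0
        else:
            self.S.extend(self.local)
            self.S.append(self.L)                 # the hole holds the count L
            self.local = []

    def pop(self, X: int) -> None:
        """Register effect of POP X,YZ."""
        L = self.L
        if X > L:
            X = L + 1
        x = self.S[-1] % 256                      # number of registers pushed
        hole = self.local[X - 1] if 0 < X <= L else 0
        kept = self.local[:X - 1] if X > 0 else []
        restored = self.S[-(x + 1):-1]            # former $0 .. $(x-1)
        del self.S[-(x + 1):]
        new = restored + ([hole] + kept if X > 0 else [])
        self.local = new[:self.G]                 # L = min(x + X, G)
```

The calling example from the text then runs as follows: a caller with 6 local registers places arguments in $7, $8, $9, pushes with X = 6, and after POP 2,0 finds the outputs in $7 and $6.

```python
rs = RegisterStack(G=200)
for i in range(6):
    rs.put(i, 10 + i)            # caller's own local registers $0..$5
rs.put(7, 1); rs.put(8, 2); rs.put(9, 3)   # three arguments
rs.push(6)                       # PUSHGO $6,base,Subr
args_seen = list(rs.local)       # the subroutine sees them as $0, $1, $2
rs.put(0, 100); rs.put(1, 101)   # two outputs in $0 and $1
rs.pop(2)                        # POP 2,0
```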

30. System considerations. High-performance implementations of MMIX gain speed by keeping caches of instructions and data that are likely to be needed as computation proceeds. [See M. V. Wilkes, IEEE Transactions EC-14 (1965), 270-271; J. S. Liptay, IBM System J. 7 (1968), 15-21.] Careful programmers can make the computer run even faster by giving hints about how to maintain such caches.

LDUNC $X,$Y,$Z|Z `load octa uncached'.
These instructions, which have the same meaning as LDO, also inform the computer that the loaded octabyte (and its neighbors in a cache block) will probably not be read or written in the near future.

STUNC $X,$Y,$Z|Z `store octa uncached'.
These instructions, which have the same meaning as STO, also inform the computer that the stored octabyte (and its neighbors in a cache block) will probably not be read or written in the near future.

PRELD X,$Y,$Z|Z `preload data'.
These instructions have no effect on registers or memory, but they inform the computer that many of the X + 1 bytes M[$Y + $Z] through M[$Y + $Z + X], or M[$Y + Z] through M[$Y + Z + X], will probably be loaded and/or stored in the near future. No protection failure occurs if the memory is not accessible.

PREGO X,$Y,$Z|Z `prefetch to go'.
These instructions have no effect on registers or memory, but they inform the computer that many of the X + 1 bytes M[$Y + $Z] through M[$Y + $Z + X], or M[$Y + Z] through M[$Y + Z + X], will probably be used as instructions in the near future. No protection failure occurs if the memory is not accessible.

PREST X,$Y,$Z|Z `prestore data'.
These instructions have no effect on registers or memory if the computer has no data cache. But when such a cache exists, they inform the computer that all of the X + 1 bytes M[$Y + $Z] through M[$Y + $Z + X], or M[$Y + Z] through M[$Y + Z + X], will definitely be stored in the near future before they are loaded. (Therefore it is permissible for the machine to ignore the present contents of those bytes. Also, if those bytes are being shared by several processors, the current processor should try to acquire exclusive access.) No protection failure occurs if the memory is not accessible.

SYNCD X,$Y,$Z|Z `synchronize data'.
When executed from nonnegative locations, these instructions have no effect on registers or memory if neither a write buffer nor a "write back" data cache are present. But when such a buffer or cache exists, they force the computer to make sure that all data for the X + 1 bytes M[$Y + $Z] through M[$Y + $Z + X], or M[$Y + Z] through M[$Y + Z + X], will be present in memory. (Otherwise the result of a previous store instruction might appear only in the cache; the computer is being told that now is the time to write the information back, if it hasn't already been written. A program can use this feature before outputting directly from memory.) No protection failure occurs if the memory is not accessible. The action is similar when SYNCD is executed from a negative address, but in this case the specified bytes are also removed from the data cache (and from a secondary cache, if present). The operating system can use this feature when a page of virtual memory is being swapped out, or when data is input directly into memory.

SYNCID X,$Y,$Z|Z `synchronize instructions and data'.
When executed from nonnegative locations these instructions have no effect on registers or memory if the computer has no instruction cache separate from a data cache. But when such a cache exists, they force the computer to make sure that the X + 1 bytes M[$Y + $Z] through M[$Y + $Z + X], or M[$Y + Z] through M[$Y + Z + X], will be interpreted correctly if used as instructions before they are next modified. (Generally speaking, an MMIX program is not expected to store anything in memory locations that are also being used as instructions. Therefore MMIX's instruction cache is allowed to become inconsistent with respect to its data cache. Programmers who insist on executing instructions that have been fabricated dynamically, for example when setting a breakpoint for debugging, must first SYNCID those instructions in order to guarantee that the intended results will be obtained.) A SYNCID command might be implemented in several ways; for example, the machine might update its instruction cache to agree with its data cache. A simpler solution, which is good enough because the need for SYNCID ought to be rare, removes instructions in the specified range from the instruction cache, if present, so that they will have to be fetched from memory the next time they are needed; in this case the machine also carries out the effect of a SYNCD command. No protection failure occurs if the memory is not accessible.

The behavior is more drastic, but faster, when SYNCID is executed from a negative location. Then all bytes in the specified range are simply removed from all caches, and the memory corresponding to any "dirty" cache blocks involving such bytes is not brought up to date. An operating system can use this version of the command when pages of virtual memory are being discarded (for example, when a program is being terminated).

31. MMIX is designed to work not only on a single processor but also in situations where several processors share a common memory. The following commands are useful for efficient operation in such circumstances.

CSWAP $X,$Y,$Z|Z `compare and swap octabytes'.
If the octabyte M8[$Y + $Z] or M8[$Y + Z] is equal to the contents of the special prediction register rP, it is replaced in memory with the contents of register X, and register X is set equal to 1. Otherwise the octabyte in memory replaces rP and register X is set to zero. This is an atomic (indivisible, uninterruptible) operation, useful for interprocess communication when independent computers are sharing the same memory.

The compare-and-swap operation was introduced by IBM in late models of the System/370 architecture, and it soon spread to several other machines. Significant ways to use it are discussed, for example, in section 7.2.3 of Harold Stone's High-Performance Computer Architecture (Reading, Massachusetts: Addison-Wesley, 1987), and in sections 8.2 and 8.3 of Transaction Processing by Jim Gray and Andreas Reuter (San Francisco: Morgan Kaufmann, 1993).

SYNC XYZ `synchronize'.
If XYZ = 0, the machine drains its pipeline (that is, it stalls until all preceding instructions have completed their activity). If XYZ = 1, the machine controls its actions less drastically, in such a way that all store instructions preceding this SYNC will be completed before all store instructions after it. If XYZ = 2, the machine controls its actions in such a way that all load instructions preceding this SYNC will be completed before all load instructions after it. If XYZ = 3, the machine controls its actions in such a way that all load or store instructions preceding this SYNC will be completed before all load or store instructions after it. If XYZ = 4, the machine goes into a power-saver mode, in which instructions may be executed more slowly (or not at all) until some kind of "wake-up" signal is received. If XYZ = 5, the machine empties its write buffer and cleans its data caches, if any (including a possible secondary cache); the caches retain their data, but the cache contents also appear in memory. If XYZ = 6, the machine clears its virtual address translation caches (see below). If XYZ = 7, the machine clears its instruction and data caches, discarding any information in the data caches that wasn't previously in memory. ("Clearing" is stronger than "cleaning"; a clear cache remembers nothing. Clearing is also faster, because it simply obliterates everything.) If XYZ > 7, an illegal instruction interrupt occurs.

Of course no SYNC is necessary between a command that loads from or stores into memory and a subsequent command that loads from or stores into exactly the same location. However, SYNC might be necessary in certain cases even on a one-processor system, because input/output processes take place in parallel with ordinary computation.

The cases XYZ > 3 are privileged, in the sense that only the operating system can use them. More precisely, if a SYNC command is encountered with XYZ = 4 or XYZ = 5 or XYZ = 6 or XYZ = 7, a "privileged instruction interrupt" occurs unless that interrupt is currently disabled. Only the operating system can disable interrupts (see below).
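
The effect of CSWAP on memory, rP and register X, as described earlier in this section, can be modelled in a few lines. The Python sketch below is an illustration only: a lock stands in for the hardware's atomicity, memory is a dictionary of octabytes, and the class name SharedMemory and its cswap method are hypothetical.

```python
import threading

class SharedMemory:
    """Toy model of octabyte memory shared by several processors."""

    def __init__(self):
        self.octa = {}                  # address -> 64-bit value
        self._lock = threading.Lock()   # stands in for hardware atomicity

    def cswap(self, addr: int, rP: int, x: int):
        """Model of CSWAP; returns the pair (new rP, new $X)."""
        with self._lock:
            current = self.octa.get(addr, 0)
            if current == rP:
                self.octa[addr] = x     # memory gets the old contents of $X
                return rP, 1            # success: $X is set to 1
            return current, 0           # failure: rP gets the memory value

mem = SharedMemory()
mem.octa[8] = 0
first = mem.cswap(8, rP=0, x=42)    # succeeds: memory held the predicted 0
second = mem.cswap(8, rP=0, x=7)    # fails: memory now holds 42
```

A process that wants to update a shared octabyte can retry with the rP value returned by a failed attempt until the swap succeeds.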

32. Trips and traps. Special register rA records the current status information about arithmetic exceptions. Its least significant byte contains eight "event" bits called DVWIOUZX from left to right, where D stands for integer divide check, V for integer overflow, W for float-to-fix overflow, I for invalid operation, O for floating overflow, U for floating underflow, Z for floating division by zero, and X for floating inexact. The next least significant byte of rA contains eight "enable" bits with the same names DVWIOUZX and the same meanings. When an exceptional condition occurs, there are two cases: If the corresponding enable bit is 0, the corresponding event bit is set to 1. But if the corresponding enable bit is 1, MMIX interrupts its current instruction stream and executes a special "exception handler." Thus, the event bits record exceptions that have not been "tripped."

Floating point overflow always causes two exceptions, O and X. (The strictest interpretation of the IEEE standard would raise exception X on overflow only if floating overflow is not enabled, but MMIX always considers an overflowed result to be inexact.) Floating point underflow always causes both U and X when underflow is not enabled, and it might cause both U and X when underflow is enabled. If both enable bits are set to 1 in such cases, the overflow or underflow handler is called and the inexact handler is ignored. All other types of exceptions arise one at a time, so there is no ambiguity about which exception handler should be invoked unless exceptions are raised by "ropcode 2" (see below); in general the first enabled exception in the list DVWIOUZX takes precedence.

What about the six high-order bytes of the status register rA? At present, only two of those 48 bits are defined; the others must be zero for compatibility with possible future extensions. The two bits corresponding to 2^17 and 2^16 in rA specify a rounding mode, as follows: 00 means round to nearest (the default); 01 means round off (toward zero); 10 means round up (toward positive infinity); and 11 means round down (toward negative infinity).

33. The execution of MMIX programs can be interrupted in several ways. We have just seen that arithmetic exceptions will cause interrupts if they are enabled; so will illegal or privileged instructions, or instructions that are emulated in software instead of provided by the hardware. Input/output operations or external timers are another common source of interrupts; the operating system knows how to deal with all gadgets that might be hooked up to an MMIX processor chip. Interrupts occur also when memory accesses fail, for example if memory is nonexistent or protected. Power failures that force the machine to use its backup battery power in order to keep running in an emergency, or hardware failures like parity errors, all must be handled as gracefully as possible.

Users can also force interrupts to happen by giving explicit TRAP or TRIP instructions:

TRAP X,Y,Z `trap'; TRIP X,Y,Z `trip'.
Both of these instructions interrupt processing and transfer control to a handler. The difference between them is that TRAP is handled by the operating system but TRIP is handled by the user. More precisely, the X, Y, and Z fields of TRAP have special significance predefined by the operating system kernel. For example, a system call (say an I/O command, or a command to allocate more memory) might be invoked by certain settings of X, Y, and Z. The X, Y, and Z fields of TRIP, on the other hand, are definable by users for their own applications, and users also define their own handlers. "Trip handler" programs invoked by TRIP are interruptible, but interrupts are normally inhibited while a TRAP is being serviced. Specific details about the precise actions of TRIP and TRAP appear below, together with the description of another command called RESUME that returns control from a handler to the interrupted program.

Only two variants of TRAP are predefined by the MMIX architecture: If XYZ = 0 in a TRAP command, a user process should terminate. If XYZ = 1, the operating system should provide default action for cases in which the user has not provided any handler for a particular kind of interrupt (see below).

A few additional variants of TRAP are predefined in the rudimentary operating system used with MMIX simulators. These variants, which allow simple input/output operations to be done, all have X = 0, and the Y field is a small positive constant. For example, Y = 1 invokes the Fopen routine, which opens a file. (See the program MMIX-SIM for full details.)
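
The layout of the arithmetic status register rA described in section 32 (event bits, enable bits, and rounding mode) can be decoded mechanically. The Python sketch below is an illustration only; the function name decode_rA and the rounding mode labels are hypothetical.

```python
EVENT_NAMES = "DVWIOUZX"     # bit names from left (0x80) to right (0x01)
ROUNDING = {0: "nearest", 1: "off", 2: "up", 3: "down"}

def decode_rA(rA: int) -> dict:
    """Split an rA value into event bits, enable bits and rounding mode."""
    def names(byte: int) -> str:
        return "".join(n for i, n in enumerate(EVENT_NAMES)
                       if byte & (0x80 >> i))
    return {
        "events": names(rA & 0xFF),            # least significant byte
        "enables": names((rA >> 8) & 0xFF),    # next least significant byte
        "rounding": ROUNDING[(rA >> 16) & 3],  # bits 2**17 and 2**16
    }

# Round up, divide check enabled, an untripped inexact event recorded.
status = decode_rA((0b10 << 16) | (0x80 << 8) | 0x01)
```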

34. Non-catastrophic interrupts in MMIX are always precise, in the sense that all legal instructions before a certain point have effectively been executed, and no instructions after that point have yet been executed. The current instruction, which may or may not have been completed at the time of interrupt and which may or may not need to be resumed after the interrupt has been serviced, is put into the special execution register rX, and its operands (if any) are placed in special registers rY and rZ. The address of the following instruction is placed in the special where-interrupted register rW. The instruction in rX may not be the same as the instruction in location rW − 4; for example, it may be an instruction that branched or jumped to rW. It might also be an instruction inserted internally by the MMIX processor. (For example, the computer silently inserts an internal instruction that increases L before an instruction like ADD $9,$1,$0 if L is currently less than 10. If an interrupt occurs between the inserted instruction and the ADD, the instruction in rX will say ADD, because an internal instruction retains the identity of the actual command that spawned it; but rW will point to the real ADD command.)

When an instruction has the normal meaning "set $X to the result of $Y op $Z" or "set $X to the result of $Y op Z," special registers rY and rZ will relate in the obvious way to the Y and Z operands of the instruction; but this is not always the case. For example, after an interrupted store instruction, the first operand rY will hold the virtual memory address ($Y plus either $Z or Z), and the second operand rZ will be the octabyte to be stored in memory (including bytes that have not changed, in cases like STB). In other cases the actual contents of rY and rZ are defined by each implementation of MMIX, and programmers should not rely on their significance.

Some instructions take an unpredictable and possibly long amount of time, so it may be necessary to interrupt them in progress. For example, the FREM instruction (floating point remainder) is extremely difficult to compute rapidly if its first operand has an exponent of 2046 and its second operand has an exponent of 1. In such cases the rY and rZ registers saved during an interrupt show the current state of the computation, not necessarily the original values of the operands. The value of rY rem rZ will still be the desired remainder, but rY may well have been reduced to a number that has an exponent closer to the exponent of rZ. After the interrupt has been processed, the remainder computation will continue where it left off. (Alternatively, an operation like FREM or even FADD might be implemented in software instead of hardware, as we will see later.)

Another example arises with an instruction like PREST (prestore), which can specify prestoring up to 256 bytes. An implementation of MMIX might choose to prestore only 32 or 64 bytes at a time, depending on the cache block size; then it can change the contents of rX to reflect the unfinished part of a partially completed PREST command.

Commands that decrease G, pop the stack, save the current context, or unsave an old context also are interruptible. Register rX is used to communicate information about partial completion in such a way that the interruption will be essentially "invisible" after a program is resumed.

35. Three kinds of interruption are possible: trips, forced traps, and dynamic traps. We will discuss each of these in turn.

A TRIP instruction puts itself into the right half of the execution register rX, and sets the 32 bits of the left half to #80000000. (Therefore rX is negative; this fact will tell the RESUME command not to TRIP again.) The special registers rY and rZ are set to the contents of the registers specified by the Y and Z fields of the TRIP command, namely $Y and $Z. Then $255 is placed into the special bootstrap register rB, and $255 is set to zero. MMIX now takes its next instruction from virtual memory address 0.

Arithmetic exceptions interrupt the computation in essentially the same way as TRIP, if they are enabled. The only difference is that their handlers begin at the respective addresses 16, 32, 48, 64, 80, 96, 112, and 128, for exception bits D, V, W, I, O, U, Z, and X of rA; registers rY and rZ are set to the operands of the interrupted instruction as explained earlier.

A 16-byte block of memory is more than enough for a sequence of commands like PUSHJ 255,Handler; GET $255,rB; RESUME which will invoke a user's handler. And if the user does not choose to provide a custom-designed handler, the operating system provides a default handler via the instructions TRAP 1; GET $255,rB; RESUME.

A trip handler might simply record the fact that tripping occurred. But the handler for an arithmetic interrupt might want to change the default result of a computation. In such cases, the handler should place the desired substitute result into rZ, and it should change the most significant byte of rX from #80 to #02. This will have the desired effect, because of the rules of RESUME explained below, unless the exception occurred on a command like STB or STSF. (A bit more work is needed to alter the effect of a command that stores into memory.)

Instructions in negative virtual locations do not invoke trip handlers, either for TRIP or for arithmetic exceptions. Such instructions are reserved for the operating system, as we will see.

36. A TRAP instruction interrupts the computation essentially like TRIP, but with the following modifications: (i) $255 is set to the contents of the special "trap address register" rT, not zero; (ii) the interrupt mask register rK is cleared to zero, thereby inhibiting interrupts; (iii) control jumps to virtual memory address rT, not zero; (iv) information is placed in a separate set of special registers rBB, rWW, rXX, rYY, and rZZ, instead of rB, rW, rX, rY, and rZ. (These special registers are needed because a trap might occur while processing a TRIP.)

Another kind of forced trap occurs on implementations of MMIX that emulate certain instructions in software rather than in hardware. Such instructions cause a TRAP even though their opcode is something else like FREM or FADD or DIV. The trap handler can tell what instruction to emulate by looking at the opcode, which appears in rXX. In such cases the lefthand half of rXX is set to #02000000; the handler emulating FADD, say, should compute the floating point sum of rYY and rZZ and place the result in rZZ. A subsequent RESUME 1 will then place the value of rZZ in the proper register.

Implementations of MMIX might also emulate the process of virtual-address-to-physical-address translation described below, instead of providing for page table calculations in hardware. Then if, say, a LDB instruction does not know the physical memory address corresponding to a specified virtual address, it will cause a forced trap with the left half of rXX set to #03000000 and with rYY set to the virtual address in question. The trap handler should place the physical page address into rZZ; then RESUME 1 will complete the LDB.

37. The third and final kind of interrupt is called a dynamic trap. Such interruptions occur when one or more of the 64 bits in the special interrupt request register rQ have been set to 1, and when at least one corresponding bit of the special interrupt mask register rK is also equal to 1. The bit positions of rQ and rK have the general form

    bits:     24                 8         24                  8
    field:    low-priority I/O | program | high-priority I/O | machine

where the 8-bit "program" bits are called rwxnkbsp and have the following meanings:

    r bit: instruction tries to load from a page without read permission;
    w bit: instruction tries to store to a page without write permission;
    x bit: instruction appears in a page without execute permission;
    n bit: instruction refers to a negative virtual address;
    k bit: instruction is privileged, for use by the "kernel" only;
    b bit: instruction breaks the rules of MMIX;
    s bit: instruction violates security (see below);
    p bit: instruction comes from a privileged (negative) virtual address.

Negative addresses are for the use of the operating system only; a security violation occurs if an instruction in a nonnegative address is executed without the rwxnkbsp bits of rK all set to 1. (In such cases the s bits of both rQ and rK are set to 1.)

The eight "machine" bits of rQ and rK represent the most urgent kinds of interrupts. The rightmost bit stands for power failure, the next for memory parity error, the next for nonexistent memory, the next for rebooting, etc. Interrupts that need especially quick service, like requests from a high-speed network, also are allocated bit positions near the right end. Low priority I/O devices like keyboards are assigned to bits at the left. The allocation of input/output devices to bit positions will differ from implementation to implementation, depending on what devices are available.

Once rQ ∧ rK becomes nonzero, the machine waits briefly until it can give a precise interrupt. Then it proceeds as with a forced trap, except that it uses the special "dynamic trap address register" rTT instead of rT. The trap handler that begins at location rTT can figure out the reason for interrupt by examining rQ ∧ rK. (For example, after the instructions GET $0,rQ; GET $1,rK; AND $0,$0,$1; SUBU $1,$0,1; XOR $2,$0,$1; ANDN $1,$0,$1; SADD $2,$2,0 the highest-priority offending bit will be in $1 and its position will be in $2.)

If the interrupted instruction contributed 1s to any of the rwxnkbsp bits of rQ, the corresponding bits are set to 1 also in rX. A dynamic trap handler might be able to use this information (although it should service higher-priority interrupts first if the right half of rQ ∧ rK is nonzero).

The rules of MMIX are rigged so that only the operating system can execute instructions with interrupts suppressed. Therefore the operating system can in fact use instructions that would interrupt an ordinary program. Control of register rK turns out to be the ultimate privilege, and in a sense the only important one.

An instruction that causes a dynamic trap is usually executed before the interruption occurs. However, an instruction that traps with bits x, k, or b does nothing; a load instruction that traps with r or n loads zero; a store instruction that traps with any of rwxnkbsp stores nothing.
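The bit-isolation example in the dynamic-trap discussion can be traced outside MMIX. The following Python sketch is only an illustration: it assumes the sequence reads GET $0,rQ; GET $1,rK; AND $0,$0,$1; SUBU $1,$0,1; XOR $2,$0,$1; ANDN $1,$0,$1; SADD $2,$2,0, and the function name `highest_priority_interrupt` is hypothetical. Under that assumption the position left in $2 counts from 1 at the rightmost bit.

```python
MASK64 = (1 << 64) - 1  # MMIX registers hold 64 bits

def highest_priority_interrupt(rQ, rK):
    """Return ($1, $2) as left by the assumed seven-instruction sequence."""
    r0 = rQ & rK & MASK64        # AND $0,$0,$1   requests that are enabled
    r1 = (r0 - 1) & MASK64       # SUBU $1,$0,1   borrow through trailing zeros
    r2 = r0 ^ r1                 # XOR $2,$0,$1   lowest 1 bit plus all bits below it
    r1 = r0 & ~r1 & MASK64       # ANDN $1,$0,$1  isolate the lowest 1 bit
    r2 = bin(r2).count("1")      # SADD $2,$2,0   sideways addition (population count)
    return r1, r2

bit, position = highest_priority_interrupt(0b101000, 0b111000)
```

The rightmost enabled request wins because the text assigns the most urgent interrupts to the bits at the right end of rQ and rK.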

38. After a trip handler or trap handler has done its thing, it generally invokes the following command.

• RESUME Z 'resume after interrupt'; the X and Y fields must be zero.

If the Z field of this instruction is zero, MMIX will use the information found in special registers rW, rX, rY, and rZ to restart an interrupted computation. If the execution register rX is negative, it will be ignored and instructions will be executed starting at virtual address rW; otherwise the instruction in the right half of the execution register will be inserted into the program as if it had appeared in location rW − 4, subject to certain modifications that we will explain momentarily, and the next instruction will come from rW.

If the Z field of RESUME is 1 and if this instruction appears in a negative location, registers rWW, rXX, rYY, and rZZ are used instead of rW, rX, rY, and rZ. Also, just before resuming the computation, mask register rK is set to $255 and $255 is set to rBB. (Only the operating system gets to use this feature.)

An interrupt handler within the operating system might choose to allow itself to be interrupted. In such cases it should save the contents of rBB, rWW, rXX, rYY, and rZZ on some kind of stack, before making rK nonzero. Then, before resuming whatever caused the base level interrupt, it must again disable all interrupts; this can be done with TRAP, because the trap handler can tell from the virtual address in rWW that it has been invoked by the operating system. Once rK is again zero, the contents of rBB, rWW, rXX, rYY, and rZZ are restored from the stack, the outer level interrupt mask is placed in $255, and RESUME 1 finishes the job.

Values of Z greater than 1 are reserved for possible later definition. Therefore they cause an illegal instruction interrupt (that is, they set the 'b' bit of rQ) in the present version of MMIX.

If the execution register rX is nonnegative, its leftmost byte controls the way its righthand half will be inserted into the program. Let's call this byte the "ropcode." A ropcode of 0 simply inserts the instruction into the execution stream; a ropcode of 1 is similar, but it substitutes rY and rZ for the two operands, assuming that this makes sense for the operation considered.

Ropcode 2 inserts a command that sets $X to rZ, where X is the second byte in the right half of rX. This ropcode is normally used with forced-trap emulations, so that the result of an emulated instruction is placed into the correct register. It also uses the third-from-left byte of rX to raise any or all of the arithmetic exceptions DVWIOUZX, at the same time as rZ is being placed in $X. Emulated instructions and explicit TRAP commands can therefore cause overflow, say, just as ordinary instructions can. (Such new exceptions may, of course, spawn a trip interrupt, if any of the corresponding bits are enabled in rA.)

Finally, ropcode 3 is the same as ropcode 0, except that it also tells MMIX to treat rZ as the page table entry for the virtual address rY. (See the discussion of virtual address translation below.) Ropcodes greater than 3 are not permitted; moreover, only RESUME 1 is allowed to use ropcode 3.

The ropcode rules in the previous paragraphs should of course be understood to involve rWW, rXX, rYY, and rZZ instead of rW, rX, rY, and rZ when the ropcode is seen by RESUME 1. Thus, in particular, ropcode 3 always applies to rYY and rZZ, never to rY and rZ.

Special restrictions must hold if resumption is to work properly: Ropcodes 0 and 3 must not insert a RESUME instruction; ropcode 1 must insert a "normal" instruction, namely one whose opcode begins with one of the hexadecimal digits #0, #1, #2, #3, #6, #7, #C, #D, or #E. (See the opcode chart below.) Some implementations may also allow ropcode 1 with SYNCD[I] and SYNCID[I], so that those instructions can conveniently be interrupted. Moreover, the destination register $X used with ropcode 1 or 2 must not be marginal. All of these restrictions hold automatically in normal use; they are relevant only if the programmer tries to do something tricky.

Notice that the slightly tricky sequence LDA $0,Loc; PUT rW,$0; LDTU $1,Inst; PUT rX,$1; RESUME will execute an almost arbitrary instruction Inst as if it had been in location Loc − 4, and then will jump to location Loc (assuming that Inst doesn't branch elsewhere).
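The fields of rX that RESUME consults can be made concrete with a small decoder. This is a sketch under the byte positions stated in the text (sign bit, leftmost "ropcode" byte, third-from-left exception byte, instruction in the right half); the names `decode_rX` and `set_ropcode` are hypothetical and not part of MMIXware.

```python
MASK64 = (1 << 64) - 1

def decode_rX(rX):
    """Split a 64-bit execution register value into the fields RESUME uses."""
    rX &= MASK64
    negative = bool(rX >> 63)            # negative rX: RESUME ignores the instruction
    inst = rX & 0xFFFFFFFF               # right half: the instruction to insert
    return {
        "negative": negative,
        "ropcode": None if negative else rX >> 56,   # leftmost byte
        "exceptions": (rX >> 40) & 0xFF,             # third-from-left byte (used by ropcode 2)
        "inst": inst,
        "opcode": inst >> 24,
        "X": (inst >> 16) & 0xFF,                    # second byte of the right half
    }

def set_ropcode(rX, ropcode):
    """Replace the leftmost byte of rX, as a handler does when changing #80 to #02."""
    return (rX & ~(0xFF << 56) & MASK64) | (ropcode << 56)

# A TRIP-style value: left half #80000000, right half ADD $1,$2,$3 (opcode #20).
tripped = 0x8000000020010203
resumable = set_ropcode(tripped, 2)      # ask RESUME to set $1 to rZ
```

After the substitution, `decode_rX(resumable)` reports ropcode 2 and X = 1, which is the situation described for a trip handler that wants to replace the default result.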

39. Special registers. Quite a few special registers have been mentioned so far, and MMIX actually has even more. It is time now to enumerate them all, together with their internal code numbers:

    rA, arithmetic status register [21];
    rB, bootstrap register (trip) [0];
    rC, cycle counter [8];
    rD, dividend register [1];
    rE, epsilon register [2];
    rF, failure location register [22];
    rG, global threshold register [19];
    rH, himult register [3];
    rI, interval counter [12];
    rJ, return-jump register [4];
    rK, interrupt mask register [15];
    rL, local threshold register [20];
    rM, multiplex mask register [5];
    rN, serial number [9];
    rO, register stack offset [10];
    rP, prediction register [23];
    rQ, interrupt request register [16];
    rR, remainder register [6];
    rS, register stack pointer [11];
    rT, trap address register [13];
    rU, usage counter [17];
    rV, virtual translation register [18];
    rW, where-interrupted register (trip) [24];
    rX, execution register (trip) [25];
    rY, Y operand (trip) [26];
    rZ, Z operand (trip) [27];
    rBB, bootstrap register (trap) [7];
    rTT, dynamic trap address register [14];
    rWW, where-interrupted register (trap) [28];
    rXX, execution register (trap) [29];
    rYY, Y operand (trap) [30];
    rZZ, Z operand (trap) [31].

In this list rG and rL are what we have been calling simply G and L; rC, rF, rI, rN, rO, rS, rU, and rV have not been mentioned before.

40. The cycle counter rC advances by 1 on every "clock pulse" of the MMIX processor. Thus if MMIX is running at 500 MHz, the cycle counter increases every 2 nanoseconds. There is no need to worry about rC overflowing: Even if it were to increase once every nanosecond, it wouldn't reach 2^64 until more than 584.55 years have gone by.

The interval counter rI is similar, but it decreases by 1 on each cycle, and causes an interval interrupt when it reaches zero. Such interrupts can be extremely useful for "continuous profiling" as a means of studying the empirical running time of programs; see Jennifer M. Anderson, Lance M. Berc, Jeffrey Dean, Sanjay Ghemawat, Monika R. Henzinger, Shun-Tak A. Leung, Richard L. Sites, Mark T. Vandevoorde, Carl A. Waldspurger, and William E. Weihl, ACM Transactions on Computer Systems 15 (1997), 357–390. The interval interrupt is achieved by setting the leftmost bit of the "machine" byte of rQ equal to 1; this is the eighth-least-significant bit.

The usage counter rU consists of three fields (up, um, uc), called the usage pattern up, the usage mask um, and the usage count uc. The most significant byte of rU is the usage pattern; the next most significant byte is the usage mask; and the remaining 48 bits are the usage count. Whenever an instruction whose OP ∧ um = up has been executed, the value of uc increases by 1 (mod 2^48). Thus, for example, the OP-code chart below implies that all instructions are counted if up = um = 0; all loads and stores are counted together with GO and PUSHGO if up = (10000000)_2 and um = (11000000)_2; and all floating point instructions are counted together with fixed point multiplications and divisions if up = 0 and um = (11100000)_2.
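The matching rule OP ∧ um = up is easy to try out. The sketch below is illustrative only: the function names are hypothetical, the opcode values (LDB #80, GO #9E, PUSHGO #BE, FADD #04, MUL #18, ADD #20) are taken from the MMIX opcode chart, and the special treatment of instructions in negative locations is deliberately left out.

```python
UC_BITS = 48

def unpack_rU(rU):
    """Return (up, um, uc): pattern byte, mask byte, and 48-bit count."""
    return (rU >> 56) & 0xFF, (rU >> 48) & 0xFF, rU & ((1 << UC_BITS) - 1)

def is_counted(op, up, um):
    """An executed instruction is counted when its opcode, masked by um, equals up."""
    return (op & um) == up

def count_instruction(rU, op):
    """Return the new rU after an instruction with opcode op has been executed."""
    up, um, uc = unpack_rU(rU)
    if is_counted(op, up, um):
        uc = (uc + 1) % (1 << UC_BITS)      # the count wraps around mod 2^48
    return (up << 56) | (um << 48) | uc

LDB, GO, PUSHGO, FADD, MUL, ADD = 0x80, 0x9E, 0xBE, 0x04, 0x18, 0x20

# up = (10000000)_2, um = (11000000)_2: loads, stores, GO, and PUSHGO.
rU = (0b10000000 << 56) | (0b11000000 << 48)
for op in (LDB, ADD, GO, PUSHGO, FADD):
    rU = count_instruction(rU, op)
```

With this setting only LDB, GO, and PUSHGO in the loop advance the count, so `unpack_rU(rU)[2]` ends at 3.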

Similarly, fixed point multiplications and divisions alone are counted if up = (00011000)_2 and um = (11111000)_2, and completed subroutine calls are counted if up = POP and um = (11111111)_2. Instructions in negative locations, which belong to the operating system, are exceptional: They are included in the usage count only if the leading bit of uc is 1.

Incidentally, the 64-bit counters rC and rI can be implemented rather cheaply with only two levels of logic, using an old trick called "carry-save addition" [see, for example, G. Metze and J. E. Robertson, Proc. International Conf. Information Processing (Paris: 1959), 389–396]. One nice embodiment of this idea is to represent a binary number x in a redundant form as the difference x′ − x″ of two binary numbers. Any two such numbers can be added without carry propagation as follows: Let

    f(x, y, z) = (x ∧ y) ∨ (x ∧ z) ∨ (y ∧ z),    g(x, y, z) = x ⊕ y ⊕ z.

Then it is easy to check that x − y + z = 2f(x, ȳ, z) − g(x, y, z), where ȳ denotes the complement of y; we need only verify this in the eight cases when x, y, and z are 0 or 1. Thus we can subtract 1 from a counter x′ − x″ by setting

    (x′, x″) ← (f(x′, x̄″, −1) ≪ 1, g(x′, x″, −1));

we can add 1 by setting

    (x′, x″) ← (g(x′, x″, −1), f(x″, x̄′, −1) ≪ 1).

The result is zero if and only if x′ = x″. We need not actually compute the difference x′ − x″ until we need to examine the register. The computation of f(x, y, z) and g(x, y, z) is particularly simple in the special cases z = 0 and z = 1. A similar trick works for rU, but extra care is needed in that case because several instructions might finish at the same time. (Thanks to Frank Yellin for his improvements to this paragraph.)

41. The special serial number register rN is permanently set to the time this particular instance of MMIX was created (measured as the number of seconds since 00:00:00 Greenwich mean time on 1 January 1970), in its five least significant bytes. The three most significant bytes are permanently set to the version number of the MMIX architecture that is being implemented together with two additional bytes that modify the version number. This quantity serves as an essentially unique identification number for each copy of MMIX.

Version 1.0.0 of the architecture is described in the present document. Version 1.0.1 is similar, but simplified to avoid the complications of pipelines and operating systems. Other versions may become necessary in the future.

42. The register stack offset rO and register stack pointer rS are especially interesting, because they are used to implement MMIX's register stack S[0], S[1], S[2], ....

The operating system initializes a register stack by assigning a large area of virtual memory to each running process, beginning at an address like #6000000000000000. If this starting address is σ, stack entry S[k] will go into the octabyte M8[σ + 8k]. Stack underflow will be detected because the process does not have permission to read from M[σ − 1]. Stack overflow will be detected because something will give out (either the user's budget or the user's patience or the user's swap space) long before 2^61 bytes of virtual memory are filled by a register stack.

The MMIX hardware maintains the register stack by having two banks of 64-bit general-purpose registers, one for globals and one for locals. The global registers g[32], g[33], ..., g[255] are used for register numbers that are ≥ G in MMIX commands; recall that G is always 32 or more. The local registers come from another array that contains 2^n registers for some n where 8 ≤ n ≤ 10; for simplicity of exposition we will assume that there are exactly 512 local registers, but there may be only 256 or there may be 1024.

The local register slots l[0], l[1], ..., l[511] act as a cyclic buffer with addresses that wrap around mod 512, so that l[512] = l[0], l[513] = l[1], etc. This buffer is divided into three parts by three pointers, which we will call α, β, and γ.

Registers l[α], l[α + 1], ..., l[β − 1] are what program instructions currently call $0, $1, ..., $(L − 1); registers l[β], l[β + 1], ..., l[γ − 1] are currently unused; and registers l[γ], l[γ + 1], ..., l[α − 1] contain items of the register stack that have been pushed down but not yet stored in memory. Special register rS holds the virtual memory address where l[γ] will be stored, if necessary. Special register rO holds the address where l[α] will be stored; this always equals 8τ plus the address of S[0]. We can deduce the values of α, β, and γ from the contents of rL, rO, and rS, because

    α = (rO/8) mod 512,    β = (α + rL) mod 512,    and    γ = (rS/8) mod 512.

To maintain this situation we need to make sure that the pointers α, β, and γ never move past each other. A PUSHJ or PUSHGO operation simply advances α toward β, so it is very simple. The first part of a POP operation, which moves β toward α, is also very simple. But the next part of a POP requires α to move downward, and memory accesses might be required. MMIX will decrease rS by 8 (thereby decreasing γ by 1) and set l[γ] ← M8[rS], one or more times if necessary, to keep α from decreasing past γ. Similarly, the operation of increasing L may cause MMIX to set M8[rS] ← l[γ] and increase rS by 8 (thereby increasing γ by 1) one or more times, to keep β from increasing past γ. If many registers need to be loaded or stored at once, these operations are interruptible.

[A somewhat similar scheme was introduced by David R. Ditzel and H. R. McLellan in SIGPLAN Notices 17, 4 (April 1982), 48–56, and incorporated in the so-called CRISP architecture developed at AT&T Bell Labs. An even more similar scheme was adopted in the late 1980s by Advanced Micro Devices, in the processors of their Am29000 series, a family of computers whose instructions have essentially the format 'OP X Y Z' used by MMIX.]

Limited versions of MMIX, having fewer registers, can also be envisioned. For example, we might have only 32 local registers l[0], l[1], ..., l[31] and only 32 global registers g[224], g[225], ..., g[255]. Such a machine could run any MMIX program that maintains the inequalities L < 32 and G ≥ 224.
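The relation between rO, rS, rL and the three ring pointers can be written out directly. This sketch assumes a ring of 512 local registers as in the exposition above; the function names are hypothetical, and the count of pushed-but-unstored registers is derived here from rO − rS rather than stated in the text.

```python
RING = 512  # number of local register slots assumed in the exposition

def ring_pointers(rO, rS, rL, ring=RING):
    """Return (alpha, beta, gamma) as defined for the local register ring."""
    alpha = (rO // 8) % ring
    beta = (alpha + rL) % ring
    gamma = (rS // 8) % ring
    return alpha, beta, gamma

def region_sizes(rO, rS, rL, ring=RING):
    """Sizes of the three parts: current locals, unused slots, pushed-down slots."""
    pushed = (rO - rS) // 8          # l[gamma] .. l[alpha-1], not yet in memory
    unused = ring - rL - pushed      # l[beta] .. l[gamma-1]
    return rL, unused, pushed

# Example: a stack based at #6000000000000000 with 600 octabytes below $0,
# of which the lowest 590 have already been written to memory.
base = 0x6000000000000000
rO, rS, rL = base + 8 * 600, base + 8 * 590, 5
alpha, beta, gamma = ring_pointers(rO, rS, rL)
```

In this example α = 88, β = 93, and γ = 78, so ten pushed-down registers still live only in the ring and 497 slots are free.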

43. Access to MMIX's special registers is obtained via the GET and PUT commands.

• GET $X,Z 'get from special register'; the Y field must be zero.

Register X is set to the contents of the special register identified by its code number Z, using the code numbers listed earlier. An illegal instruction interrupt occurs if Z ≥ 32.

Every special register is readable; MMIX does not keep secrets from an inquisitive user. But of course only the operating system is allowed to change registers like rK and rQ (the interrupt mask and request registers). And not even the operating system is allowed to change rC (the cycle counter) or rN (the serial number) or the stack pointers rO and rS.

• PUT X,$Z|Z 'put into special register'; the Y field must be zero.

The special register identified by X is set to the contents of register Z or to the unsigned byte Z itself, if permissible. Some changes are, however, impermissible: Bits of rA that are always zero must remain zero; the leading seven bytes of rG and rL must remain zero, and rL must not exceed rG; special registers 8–11 (namely rC, rN, rO, and rS) must not change; special registers 12–18 (namely rI, rK, rQ, rT, rU, rV, and rTT) can be changed only if the privilege bit of rK is zero; and certain bits of rQ (depending on available hardware) might not allow software to change them from 0 to 1. Moreover, any bits of rQ that have changed from 0 to 1 since the most recent GET x,rQ will remain 1 after PUT rQ,z. The PUT command will not increase rL; it sets rL to the minimum of the current value and the new value. (A program should say SETL $99,0 instead of PUT rL,100 when rL is known to be less than 100.)

Impermissible PUT commands cause an illegal instruction interrupt, or (in the case of rI, rK, rQ, rT, rU, rV, and rTT) a privileged operation interrupt.

• SAVE $X,0 'save process state'; UNSAVE 0,$Z 'restore process state'; the Y field must be 0, and so must the Z field of SAVE, the X field of UNSAVE.

The SAVE instruction stores all registers and special registers that might affect the computation of the currently running process. First the current local registers $0, $1, ..., $(L − 1) are pushed down as in PUSHGO $255, and L is set to zero. Then the current global registers $G, $(G + 1), ..., $255 are placed above them in the register stack. Finally rB, rD, rE, rH, rJ, rM, rR, rP, rW, rX, rY, and rZ are placed at the very top, followed by registers rG and rA packed into eight bytes:

    bits:     8    24   32
    field:    rG | 0  | rA

The address of the topmost octabyte is then placed in register X, which must be a global register. (This instruction is interruptible. If an interrupt occurs while the registers are being saved, we will have α = β = γ in the ring of local registers; thus rO will equal rS and rL will be zero. The interrupt handler essentially has a new register stack, starting on top of the partially saved context.) Immediately after a SAVE the values of rO and rS are equal to the location of the first byte following the stack just saved. The current register stack is effectively empty at this point; thus one shouldn't do a POP until this context or some other context has been unsaved.

The UNSAVE instruction goes the other way, restoring all the registers when given an address in register Z that was returned by a previous SAVE. Immediately after an UNSAVE the values of rO and rS will be equal. Like SAVE, this instruction is interruptible.

The operating system uses SAVE and UNSAVE to switch context between different processes. It can also use UNSAVE to establish suitable initial values of rO and rS. But a user program that knows what it is doing can in fact allocate its own register stack or stacks and do its own process switching.

Caution: UNSAVE is destructive, in the sense that a program can't reliably UNSAVE twice from the same saved context. Once an UNSAVE has been done, further operations are likely to change the memory record of what was saved. Moreover, an interrupt during the middle of an UNSAVE may have already clobbered some of the data in memory before the UNSAVE has completely finished, although the data will appear properly in all registers.
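The final octabyte written by SAVE packs rG and rA as an 8-bit field, 24 zero bits, and a 32-bit field. A minimal sketch of that packing follows; the function names are hypothetical, and the example values for rG and rA are arbitrary.

```python
def pack_rG_rA(rG, rA):
    """Build the topmost octabyte of a saved context: rG | 24 zero bits | rA."""
    if not 0 <= rG <= 0xFF:
        raise ValueError("rG must fit in one byte")
    if not 0 <= rA <= 0xFFFFFFFF:
        raise ValueError("rA must fit in four bytes")
    return (rG << 56) | rA

def unpack_rG_rA(octa):
    """Recover (rG, rA) from the topmost octabyte, as UNSAVE must."""
    return octa >> 56, octa & 0xFFFFFFFF

top = pack_rG_rA(0xE0, 0x0000FF00)
```

Here `top` is #E00000000000FF00, and `unpack_rG_rA(top)` returns the two original values.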

44. Virtual and physical addresses. Virtual 64-bit addresses are converted to physical addresses in a manner governed by the special virtual translation register rV. Thus M[A] really refers to m[τ(A)], where m is the physical memory array and τ(A) is determined by the physical mapping function τ. The details of this conversion are rather technical and of interest mainly to the operating system, but two simple rules are important to ordinary users:

• Negative addresses are mapped directly to physical addresses, by simply suppressing the sign bit:

    τ(A) = A + 2^63 = A ∧ #7fffffffffffffff,    if A < 0.

All accesses to negative addresses are privileged, for use by the operating system only. (Thus, for example, the trap addresses in rT and rTT should be negative, because they are addresses inside the operating system.) Moreover, all physical addresses ≥ 2^48 are intended for use by memory-mapped I/O devices; values read from or written to such locations are never placed in a cache.

• Nonnegative addresses belong to four segments, depending on whether the three leading bits are 000, 001, 010, or 011. These 2^61-byte segments are traditionally used for a program's text, data, dynamic memory, and register stack, respectively, but such conventions are not mandatory. There are four mappings τ0, τ1, τ2, and τ3 of 61-bit addresses into 48-bit physical memory space, one for each segment:

    τ(A) = τ_i(A mod 2^61) with i = ⌊A/2^61⌋,    if 0 ≤ A < 2^63.

In general, the machine is able to access smaller addresses of a segment more efficiently than larger addresses. Thus a programmer should let each segment grow upward from zero, trying to keep any of the 61-bit addresses from becoming larger than necessary, although arbitrary addresses are legal.

45. Now it's time for the technical details of virtual address translation. The mappings τ0, τ1, τ2, and τ3 are defined by the following rules.

(1) The first two bytes of rV are four nybbles called b1, b2, b3, b4; we also define b0 = 0. Segment i has at most 1024^(b_(i+1) − b_i) pages. In particular, segment i must have at most one page when b_i = b_(i+1), and it must be entirely empty if b_i > b_(i+1).

(2) The next byte of rV, s, specifies the current page size, which is 2^s bytes. We must have s ≥ 13 (hence at least 8K bytes per page). Values of s larger than, say, 20 or so are of use only in rather large programs that will reside in main memory for long periods of time, because memory protection and swapping are applied to entire pages. The maximum legal value of s is 48.

(3) The remaining five bytes of rV are a 27-bit root location r, a 10-bit address space number n, and a 3-bit function field f:

    bits:     4    4    4    4    8   27   10   3
    rV  =     b1 | b2 | b3 | b4 | s | r  | n  | f

Normally f = 0; if f = 1, virtual address translation will be done by software instead of hardware, and the b1, b2, b3, b4, and r fields of rV will be ignored by the hardware.


(Values of f > 1 are reserved for possible future use; if f > 1 when MMIX tries to
translate an address, a memory-protection failure will occur.)
(4) Each page has an 8-byte page table entry (PTE), which looks like this:

    bits:     16   48−s   s−13   10   3
    PTE =     x  | a    | y    | n  | p


Here x and y are ignored (thus they are usable for any purpose by the operating
system); a is the physical address of byte 0 on the page; and n is the address space
number (which must match the number in rV). The final three bits are the protection
bits p_r p_w p_x; the user needs p_r = 1 to load from this page, p_w = 1 to store on this
page, and p_x = 1 to execute instructions on this page. If n fails to match the number
in rV, or if the appropriate protection bit is zero, a memory-protection fault occurs.
Page table entries should be writable only by the operating system. The 16 ignored
bits of x imply that physical memory size is limited to 2^48 bytes (namely 256 large
terabytes); that should be enough capacity for awhile, if not for the entire new
millennium.
(5) A given 61-bit address A belongs to page ⌊A/2^s⌋ of its segment, and

    τ_i(A) = 2^s a + (A mod 2^s)

if a is the address in the PTE for page ⌊A/2^s⌋ of segment i.
(6) Suppose ⌊A/2^s⌋ = (a4 a3 a2 a1 a0)_1024 in the radix-1024 number system. In the
common case a4 = a3 = a2 = a1 = 0, the PTE is simply the octabyte
m8[2^13 (r + b_i) + 8 a0]; this rule defines the mapping for the first 1024 pages.
The next million or so pages are accessed through an auxiliary page table pointer

    bits:     1   50   10   3
    PTP =     1 | c  | n  | q

in m8[2^13 (r + b_i + 1) + 8 a1]; here the sign must be 1 and the n-field must match rV, but
the q bits are ignored. The desired PTE for page (a1 a0)_1024 is then in m8[2^13 c + 8 a0].
The next billion or so pages, namely the pages (a2 a1 a0)_1024 with a2 ≠ 0, are accessed
similarly, through an auxiliary PTP at level two; and so on.


Notice that if b3 = b4, there is just one page in segment 3, and its PTE appears all
alone in physical location 2^13 (r + b3). Otherwise the PTEs appear in 1024-octabyte
blocks. We usually have 0 < b1 < b2 < b3 < b4, but the null case b1 = b2 = b3 = b4 = 0
is worthy of mention: In this special case there is only one page, and the segment bits
of a virtual address are ignored; the other 61 − s bits of each virtual address must be
zero.
If s = 13, b1 = 3, b2 = 2, b3 = 1, and b4 = 0, there are at most 2^30 pages of 8K
bytes each, all belonging to segment 0. This is essentially the virtual memory setup
in the Alpha 21064 computers with DIGITAL UNIX.
I know these rules look extremely complicated, and I sincerely wish I could have
found an alternative that would be both simple and efficient in practice. I tried
various schemes based on hashing, but came to the conclusion that "trie" methods
such as those described here are better for this application. Indeed, the page tables in
most contemporary computers are based on very similar ideas, but with significantly
smaller virtual addresses and without the shortcut for small page numbers. I tried
also to find formats for rV and the page tables that would match byte boundaries in
a more friendly way, but the corresponding page sizes did not work well. Fortunately
these grungy details are almost always completely hidden from ordinary users.


46. Of course MMIX can't afford to perform a lengthy calculation of physical addresses every time it accesses memory. The machine therefore maintains a translation
cache (TC), which contains the translations of recently accessed pages. (In fact, there
usually are two such caches, one for instructions and one for data.) A TC holds a set
of 64-bit translation keys

    field:          0     i      v       0      n      0
    width (bits):   1     2    61-s    s-13    10      3

associated with 38-bit translations

    field:           a       0      p
    width (bits):  48-s    s-13     3

representing the relevant parts of the PTE for page v of segment i. Different processes typically have different values of n, and possibly also different values of s. The
operating system needs a way to keep such caches up to date when pages are being allocated, moved, swapped, or recycled. The operating system also likes to know which
pages have been recently used. The LDVTS instructions facilitate such operations:

• LDVTS $X,$Y,$Z|Z `load virtual translation status'.
The sum $Y + $Z or $Y + Z should have the form of a translation cache key as above,
except that the rightmost three bits need not be zero. If this key is present in a TC,
the rightmost three bits replace the current protection code p; however, if p is thereby
set to zero, the key is removed from the TC. Register X is set to 0 if the key was
not present in any translation cache, or to 1 if the key was present in the TC for
instructions, or to 2 if the key was present in the TC for data, or to 3 if the key was
present in both. This instruction is for the operating system only.
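
As a rough illustration of the key format described above, the following C sketch builds a translation key from a virtual address and decodes the status value that LDVTS leaves in register X. The function names are hypothetical, and the sketch assumes only the field layout stated in the text (sign bit zero, segment and page bits kept, the low s bits cleared, n placed in bits 12 to 3).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: form a TC key from a nonnegative virtual address,
   the page-size exponent s (13..48), and the address space number n. */
static uint64_t tc_key(uint64_t vaddr, int s, unsigned n)
{
    uint64_t page_mask;
    assert(s >= 13 && s <= 48);
    assert((vaddr >> 63) == 0 && n < 1024);
    page_mask = ~(((uint64_t)1 << s) - 1);    /* clears the low s bits */
    return (vaddr & page_mask) | ((uint64_t)n << 3);
}

/* Decode the result of LDVTS: bit 0 means "present in the instruction TC",
   bit 1 means "present in the data TC". */
static int in_instruction_tc(unsigned x) { return (x & 1) != 0; }
static int in_data_tc(unsigned x)        { return (x & 2) != 0; }
```

An operating system would add the desired protection bits (0 to 7) to such a key before issuing LDVTS; that step is left out of the sketch.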


47. We mentioned earlier that cheap versions of MMIX might calculate the physical
addresses with software instead of hardware, using forced traps when the operating
system needs to do page table calculations. Here is some code that could be used
for such purposes; it defines the translation process precisely, given a nonnegative
virtual address in register rYY. First we must unpack the fields of rV and compute the
relevant base addresses for PTEs and PTPs:

          GET    virt,rYY
          GET    $7,rV           % $7=(virtual translation register)
          AND    $1,$7,#7        % $1=rightmost three bits
          BNZ    $1,Fail         % those bits should be zero
          SRU    $1,virt,61      % $1=i (segment number of virtual address)
          SLU    $1,$1,2
          NEG    $1,52,$1        % $1=52-4i
          SRU    $1,$7,$1
          SLU    $2,$1,4
          SETL   $0,#f000
          AND    $1,$1,$0        % $1=b[i]<<12
          AND    $2,$2,$0        % $2=b[i+1]<<12
          SLU    $3,$7,24
          SRU    $3,$3,37
          SLU    $3,$3,13        % $3=(r field of rV)
          ORH    $3,#8000        % make $3 a physical address
          2ADDU  base,$1,$3      % base=address of first page table
          2ADDU  limit,$2,$3     % limit=address after last page table
          SRU    s,$7,40
          AND    s,s,#ff         % s=(s field of rV)
          CMP    $0,s,13
          BN     $0,Fail         % s must be 13 or more
          CMP    $0,s,49
          BNN    $0,Fail         % s must be 48 or less
          SETH   mask,#8000
          ORL    mask,#1ff8      % mask=(sign bit and n field)
          ORH    $7,#8000        % set sign bit for PTP validation below
          SRU    $0,virt,s       % $0=a4a3a2a1a0 (page number of virt)
          ZSZ    $1,$0,1         % $1=[page number is zero]
          ADD    limit,limit,$1  % increase limit if page number is zero

The next part of the routine finds the "digits" of the page number (a4 a3 a2 a1 a0)_1024,
from right to left:

          OR     $5,base,0
          SRU    $1,$0,10
          PBZ    $1,1F
          AND    $0,$0,#3ff
          INCL   base,#2000
          OR     $5,base,0
          SRU    $2,$1,10
          PBZ    $2,2F
          AND    $1,$1,#3ff
          INCL   base,#2000
          OR     $5,base,0
          SRU    $3,$2,10
          PBZ    $3,3F
          AND    $2,$2,#3ff
          INCL   base,#2000
          OR     $5,base,0
          SRU    $4,$3,10
          PBZ    $4,4F
          AND    $3,$3,#3ff
          INCL   base,#2000

Then the process cascades back through PTPs.

          OR     $5,base,0
          8ADDU  $6,$4,base
          LDO    base,$6,0
          XOR    $6,base,$7
          AND    $6,$6,mask
          BNZ    $6,Fail
          ANDNL  base,#1fff      % remove low 13 bits of PTP
    4H    8ADDU  $6,$3,base
          LDO    base,$6,0
          XOR    $6,base,$7
          AND    $6,$6,mask
          BNZ    $6,Fail
          ANDNL  base,#1fff
    3H    8ADDU  $6,$2,base
          LDO    base,$6,0
          XOR    $6,base,$7
          AND    $6,$6,mask
          BNZ    $6,Fail
          ANDNL  base,#1fff
    2H    8ADDU  $6,$1,base
          LDO    base,$6,0
          XOR    $6,base,$7
          AND    $6,$6,mask
          BNZ    $6,Fail
          ANDNL  base,#1fff

Finally we obtain the PTE and communicate it to the machine. If errors have been
detected, we set the translation to zero; actually any translation with permission bits
zero would have the same effect.

    1H    8ADDU  $6,$0,base
          LDO    base,$6,0       % base=PTE
          XOR    $6,base,$7
          ANDN   $6,$6,#7
          SLU    $6,$6,51
          BNZ    $6,Fail         % branch if n doesn't match
          CMP    $6,$5,limit
          BN     $6,Ready        % did we run off the end of the page table?
    Fail  SETL   base,0          % errors lead to PTE of zero
    Ready PUT    rZZ,base
          LDO    $255,IntMask    % load the desired setting of rK
          RESUME 1               % now the machine will digest the translation

All loads and stores in this program deal with negative virtual addresses. This
effectively shuts off memory mapping and makes the page tables inaccessible to the user.
The program assumes that the ropcode in rXX is 3 (which it is when a forced trap is
triggered by the need for virtual translation).
The translation from virtual pages to physical pages need not actually follow the
rules for PTPs and PTEs; any other mapping could be substituted by operating systems
with special needs. But people usually want compatibility between different
implementations whenever possible. The only parts of rV that MMIX really needs are the
s field, which defines page sizes, and the n field, which keeps TC entries of one process
from being confused with the TC entries of another.

48. We have now described all of MMIX's special registers, except one: The special
failure location register rF is set to a physical memory address when a parity error or
other memory fault occurs. (The instruction leading to this error will probably be long
gone before such a fault is detected; for example, the machine might be trying to write
old data from a cache in order to make room for new data. Thus there is generally no
connection between the current virtual program location rW and the physical location of
a memory error. But knowledge of the latter location can still be useful for hardware
repair, or when an operating system is booting up.)

49. The complete instruction set. One additional instruction proves to be useful.

• SWYM X,Y,Z `sympathize with your machinery'.
This command lubricates the disk drives, fans, magnetic tape drives, laser printers,
scanners, and any other mechanical equipment hooked up to MMIX, if necessary. Fields
X, Y, and Z are ignored.
The SWYM command was originally included in MMIX's repertoire because machines
occasionally need grease to keep in shape, just as human beings occasionally need to
swim or do some other kind of exercise in order to maintain good muscle tone. But in
fact, SWYM has turned out to be a "no-op," an instruction that does nothing at all; the
hypothetical manufacturers of our hypothetical machine have pointed out that modern
computer equipment is already well oiled and sealed for permanent use. Even so, a no-op
instruction provides a good way for software to send signals to the hardware, for such
things as scheduling the way instructions are issued on superscalar superpipelined
buzzword-compliant machines. Software programs can also use no-ops to communicate
with other programs like symbolic debuggers.
When a forced trap computes the translation rZZ of a virtual address rYY, ropcode 3
of RESUME 1 will put (rYY, rZZ) into the TC for instructions if the opcode in rXX is
SWYM; otherwise (rYY, rZZ) will be put into the TC for data.

50. The running time of MMIX programs depends to a great extent on changes in
technology. MMIX is a mythical machine, but its mythical hardware exists in cheap,
slow versions as well as in costly high-performance models. Details of running time
usually depend on things like the amount of main memory available to implement virtual
memory, as well as the sizes of caches and other buffers.
For practical purposes, the running time of an MMIX program can often be estimated
satisfactorily by assigning a fixed cost to each operation, based on the approximate
running time that would be obtained on a high-performance machine with lots of main
memory; so that's what we will do. Each operation will be assumed to take an integer
number of υ, where υ (pronounced "oops") is a unit that represents the clock cycle time
in a pipelined implementation. The value of υ will probably decrease from year to year,
but I'll keep calling it υ. The running time will also depend on the number of memory
references or mems that a program uses; this is the number of load and store
instructions. For example, each LDO (load octa) instruction will be assumed to cost
µ + υ, where µ is the average cost of a memory reference. The total running time of a
program might be reported as, say, 35µ + 1000υ, meaning 35 mems plus 1000 oops. The
ratio µ/υ will probably increase with time, so mem-counting is likely to become
increasingly important. [See the discussion of mems in The Stanford GraphBase (New
York: ACM Press, 1994).]
Integer addition, subtraction, and comparison all take just 1υ. The same is true for
SET, GET, PUT, SYNC, and SWYM instructions, as well as bitwise logical operations,
shifts, relative jumps, comparisons, conditional assignments, and correctly predicted
branches-not-taken or probable-branches-taken. Mispredicted branches or probable
branches cost 3υ, and so do the POP and GO commands. Integer multiplication takes 10υ;
integer division weighs in at 60υ. TRAP, TRIP, and RESUME cost 5υ each.
Most floating point operations have a nominal running time of 4υ, although the
comparison operators FCMP, FEQL, and FUN need only 1υ. FDIV and FSQRT cost 40υ each.
The actual running time of floating point computations will vary depending on the
operands; for example, the machine might need one extra υ for each denormal input or
output, and it might slow down greatly when trips are enabled. The FREM instruction
might typically cost (3 + δ)υ, where δ is the amount by which the exponent of the first
operand exceeds the exponent of the second (or zero, if this amount is negative). A
floating point operation might take only 1υ if at least one of its operands is zero,
infinity, or NaN. However, the fixed values stated at the beginning of this paragraph
will be used for all seat-of-the-pants estimates of running time, since we want to keep
the estimates as simple as possible without making them terribly out of line.
All load and store operations will be assumed to cost µ + υ, except that CSWAP costs
2µ + 2υ. (This applies to all OP codes that begin with #8, #9, #A, and #B, except
#98–#9F and #B8–#BF. It's best to keep the rules simple, because µ is just an
approximate device for estimating average memory cost.) SAVE and UNSAVE are charged
20µ + υ.
Of course we must remember that these numbers are very rough. We have not included
the cost of fetching instructions from memory. Furthermore, an integer multiplication or
division might have an effective cost of only 1υ, if the result is not needed while other
numbers are being calculated. Only a detailed simulation can be expected to be truly
realistic.
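
The fixed-cost rules of this section can be tabulated in a few lines of C. The sketch below is only an illustration: the type `mmix_cost`, the enumeration `op_class`, and the grouping of instructions into classes are assumptions made for the example, while the numeric costs are the ones stated in the text.

```c
#include <assert.h>

/* A running-time estimate: `mems` memory references plus `oops` cycles. */
typedef struct { long mems, oops; } mmix_cost;

/* Hypothetical instruction classes, grouped by the costs given above. */
enum op_class {
    OP_SIMPLE,            /* add, subtract, compare, shifts, SET, ... : 1 oop */
    OP_LOAD_STORE,        /* one mem plus one oop */
    OP_CSWAP,             /* two mems plus two oops */
    OP_MUL,               /* 10 oops */
    OP_DIV,               /* 60 oops */
    OP_FLOAT,             /* nominal floating point: 4 oops */
    OP_FDIV_FSQRT,        /* 40 oops */
    OP_TRAP_TRIP_RESUME,  /* 5 oops */
    OP_POP_GO,            /* 3 oops, like a mispredicted branch */
    OP_SAVE_UNSAVE        /* 20 mems plus one oop */
};

static mmix_cost cost_of(enum op_class c)
{
    mmix_cost k = {0, 1};
    switch (c) {
    case OP_LOAD_STORE:       k.mems = 1;               break;
    case OP_CSWAP:            k.mems = 2;  k.oops = 2;  break;
    case OP_MUL:              k.oops = 10;              break;
    case OP_DIV:              k.oops = 60;              break;
    case OP_FLOAT:            k.oops = 4;               break;
    case OP_FDIV_FSQRT:       k.oops = 40;              break;
    case OP_TRAP_TRIP_RESUME: k.oops = 5;               break;
    case OP_POP_GO:           k.oops = 3;               break;
    case OP_SAVE_UNSAVE:      k.mems = 20;              break;
    default:                  assert(c == OP_SIMPLE);   break;
    }
    return k;
}

static mmix_cost add_cost(mmix_cost a, mmix_cost b)
{
    a.mems += b.mems;
    a.oops += b.oops;
    return a;
}
```

Summing 35 loads and 965 simple operations with these helpers gives 35 mems plus 1000 oops, the kind of total mentioned in the text.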

51. If you think that MMIX has plenty of operation codes, you are right; we have now
described them all. Here is a chart that shows their numeric values:

          #0  #1            #2  #3            #4  #5            #6  #7
    #0x   TRAP FCMP         FUN FEQL          FADD FIX          FSUB FIXU          #0x
          FLOT[I]           FLOTU[I]          SFLOT[I]          SFLOTU[I]
    #1x   FMUL FCMPE        FUNE FEQLE        FDIV FSQRT        FREM FINT          #1x
          MUL[I]            MULU[I]           DIV[I]            DIVU[I]
    #2x   ADD[I]            ADDU[I]           SUB[I]            SUBU[I]            #2x
          2ADDU[I]          4ADDU[I]          8ADDU[I]          16ADDU[I]
    #3x   CMP[I]            CMPU[I]           NEG[I]            NEGU[I]            #3x
          SL[I]             SLU[I]            SR[I]             SRU[I]
    #4x   BN[B]             BZ[B]             BP[B]             BOD[B]             #4x
          BNN[B]            BNZ[B]            BNP[B]            BEV[B]
    #5x   PBN[B]            PBZ[B]            PBP[B]            PBOD[B]            #5x
          PBNN[B]           PBNZ[B]           PBNP[B]           PBEV[B]
    #6x   CSN[I]            CSZ[I]            CSP[I]            CSOD[I]            #6x
          CSNN[I]           CSNZ[I]           CSNP[I]           CSEV[I]
    #7x   ZSN[I]            ZSZ[I]            ZSP[I]            ZSOD[I]            #7x
          ZSNN[I]           ZSNZ[I]           ZSNP[I]           ZSEV[I]
    #8x   LDB[I]            LDBU[I]           LDW[I]            LDWU[I]            #8x
          LDT[I]            LDTU[I]           LDO[I]            LDOU[I]
    #9x   LDSF[I]           LDHT[I]           CSWAP[I]          LDUNC[I]           #9x
          LDVTS[I]          PRELD[I]          PREGO[I]          GO[I]
    #Ax   STB[I]            STBU[I]           STW[I]            STWU[I]            #Ax
          STT[I]            STTU[I]           STO[I]            STOU[I]
    #Bx   STSF[I]           STHT[I]           STCO[I]           STUNC[I]           #Bx
          SYNCD[I]          PREST[I]          SYNCID[I]         PUSHGO[I]
    #Cx   OR[I]             ORN[I]            NOR[I]            XOR[I]             #Cx
          AND[I]            ANDN[I]           NAND[I]           NXOR[I]
    #Dx   BDIF[I]           WDIF[I]           TDIF[I]           ODIF[I]            #Dx
          MUX[I]            SADD[I]           MOR[I]            MXOR[I]
    #Ex   SETH SETMH        SETML SETL        INCH INCMH        INCML INCL         #Ex
          ORH ORMH          ORML ORL          ANDNH ANDNMH      ANDNML ANDNL
    #Fx   JMP[B]            PUSHJ[B]          GETA[B]           PUT[I]             #Fx
          POP RESUME        SAVE UNSAVE       SYNC SWYM         GET TRIP
          #8  #9            #A  #B            #C  #D            #E  #F

The notation `[I]' indicates an operation with an "immediate" variant in which the Z
field denotes a constant instead of a register number. Similarly, `[B]' indicates an
operation with a "backward" variant in which a relative address has a negative
displacement. Simulators and other programs that need to present MMIX instructions in
symbolic form will say that opcode #20 is ADD while opcode #21 is ADDI; they will say
that #F2 is PUSHJ while #F3 is PUSHJB. But the MMIX assembler uses only the forms ADD
and PUSHJ, not ADDI or PUSHJB.
To read this chart, use the hexadecimal digits at the top, bottom, left, and right. For
example, operation code A9 in hexadecimal notation appears in the lower part of the #Ax
row and in the #1/#9 column; it is STTI, `store tetrabyte immediate'.
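
The chart-reading rule can be expressed as a small C sketch. The structure `chart_pos` and the function names are hypothetical; the sketch only restates the rule given above (high hex digit selects the row, bit 3 of the low digit selects the lower half-row, and for entries marked [I] or [B] the odd opcode is the immediate or backward variant).

```c
#include <assert.h>

typedef struct {
    unsigned row;      /* high hex digit: selects the #0x .. #Fx row      */
    unsigned column;   /* 0..7: column within the upper or lower half-row */
    int lower_half;    /* nonzero when the low hex digit is #8 .. #F      */
} chart_pos;

/* Locate an opcode in the chart. */
static chart_pos chart_position(unsigned opcode)
{
    chart_pos p;
    assert(opcode < 256);
    p.row = opcode >> 4;
    p.column = opcode & 7;
    p.lower_half = (opcode & 8) != 0;
    return p;
}

/* For chart entries marked [I] or [B] only: an odd opcode is the
   immediate (or backward) variant, e.g. #21 is ADDI and #F3 is PUSHJB. */
static int is_variant_form(unsigned opcode)
{
    assert(opcode < 256);
    return (opcode & 1) != 0;
}
```

Applied to opcode #A9, the position is row #A, lower half, column #1/#9, and the opcode is odd, which agrees with the reading STTI in the example above.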

MMIX-ARITH

1. Introduction. The subroutines below are used to simulate 64-bit MMIX arithmetic
on an old-fashioned 32-bit computer, like the one the author had when he wrote MMIXAL
and the first MMIX simulators in 1998 and 1999. All operations are fabricated from
32-bit arithmetic, including a full implementation of the IEEE floating point standard,
assuming only that the C compiler has a 32-bit unsigned integer type.
Some day 64-bit machines will be commonplace and the awkward manipulations of the
present program will look quite archaic. Interested readers who have such computers
will be able to convert the code to a pure 64-bit form without difficulty, thereby
obtaining much faster and simpler routines. Meanwhile, however, we can simulate the
future and hope for continued progress.
This program module has a simple structure, intended to make it suitable for loading
with MMIX simulators and assemblers.

    ⟨Stuff for C preprocessor 2⟩
    typedef enum { false, true } bool;
    ⟨Tetrabyte and octabyte type definitions 3⟩
    ⟨Other type definitions 36⟩
    ⟨Global variables 4⟩
    ⟨Subroutines 5⟩

D.E. Knuth: MMIXware, LNCS 1750, pp. 62-109, 1999.
© Springer-Verlag Berlin Heidelberg 1999

2. Subroutines of this program are declared first with a prototype, as in ANSI C, then
with an old-style C function definition. Here are some preprocessor commands that make
this work correctly with both new-style and old-style compilers.

⟨Stuff for C preprocessor 2⟩ ≡
    #ifdef __STDC__
    #define ARGS(list) list
    #else
    #define ARGS(list) ()
    #endif
This code is used in section 1.

3. The definition of type tetra should be changed, if necessary, so that it represents
an unsigned 32-bit integer.

⟨Tetrabyte and octabyte type definitions 3⟩ ≡
    typedef unsigned int tetra;   /* for systems conforming to the LP-64 data model */
    typedef struct { tetra h, l; } octa;   /* two tetrabytes makes one octabyte */
This code is used in section 1.

4.
    #define sign_bit ((unsigned) 0x80000000)

⟨Global variables 4⟩ ≡
    octa zero_octa;                        /* zero_octa.h = zero_octa.l = 0 */
    octa neg_one = { -1, -1 };             /* neg_one.h = neg_one.l = -1 */
    octa inf_octa = { 0x7ff00000, 0 };     /* floating point +infinity */
    octa standard_NaN = { 0x7ff80000, 0 }; /* floating point NaN(.5) */
    octa aux;                              /* auxiliary output of a subroutine */
    bool overflow;                         /* set by certain subroutines for signed arithmetic */
See also sections 9, 30, 32, 69, and 75.
This code is used in section 1.

5. It's easy to add and subtract octabytes, if we aren't terribly worried about speed.

⟨Subroutines 5⟩ ≡
    octa oplus ARGS((octa, octa));
    octa oplus(y, z)   /* compute y + z */
      octa y, z;
    { octa x;
      x.h = y.h + z.h; x.l = y.l + z.l;
      if (x.l < y.l) x.h++;
      return x;
    }

    octa ominus ARGS((octa, octa));
    octa ominus(y, z)   /* compute y - z */
      octa y, z;
    { octa x;
      x.h = y.h - z.h; x.l = y.l - z.l;
      if (x.l > y.l) x.h--;
      return x;
    }
See also sections 6, 7, 8, 12, 13, 24, 25, 26, 27, 28, 29, 31, 34, 37, 38, 39, 40, 41, 44,
46, 50, 54, 60, 61, 62, 68, 82, 85, 86, 88, 89, 91, and 93.
This code is used in section 1.

6. In the following subroutine, delta is a signed quantity that is assumed to fit in a
signed tetrabyte.

⟨Subroutines 5⟩ +≡
    octa incr ARGS((octa, int));
    octa incr(y, delta)   /* compute y + delta */
      octa y;
      int delta;
    { octa x;
      x.h = y.h; x.l = y.l + delta;
      if (delta >= 0) {
        if (x.l < y.l) x.h++;
      } else if (x.l > y.l) x.h--;
      return x;
    }

7. Left and right shifts are only a bit more difficult.

⟨Subroutines 5⟩ +≡
    octa shift_left ARGS((octa, int));
    octa shift_left(y, s)   /* shift left by s bits, where 0 <= s <= 64 */
      octa y;
      int s;
    {
      while (s >= 32) y.h = y.l, y.l = 0, s -= 32;
      if (s) { register tetra yhl = y.h << s, ylh = y.l >> (32 - s);
        y.h = yhl + ylh; y.l <<= s;
      }
      return y;
    }

    octa shift_right ARGS((octa, int, int));
    octa shift_right(y, s, u)   /* shift right, arithmetically if u = 0 */
      octa y;
      int s, u;
    {
      while (s >= 32) y.l = y.h, y.h = (u ? 0 : -(y.h >> 31)), s -= 32;
      if (s) { register tetra yhl = y.h << (32 - s), ylh = y.l >> s;
        y.h = (u ? 0 : (-(y.h >> 31)) << (32 - s)) + (y.h >> s); y.l = yhl + ylh;
      }
      return y;
    }

8. Multiplication. We need to multiply two unsigned 64-bit integers, obtaining an
unsigned 128-bit product. It is easy to do this on a 32-bit machine by using Algorithm
4.3.1M of Seminumerical Algorithms, with b = 2^16.
The following subroutine returns the lower half of the product, and puts the upper
half into a global octabyte called aux.

⟨Subroutines 5⟩ +≡
    octa omult ARGS((octa, octa));
    octa omult(y, z)
      octa y, z;
    { register int i, j, k;
      tetra u[4], v[4], w[8];
      register tetra t;
      octa acc;
      ⟨Unpack the multiplier and multiplicand to u and v 10⟩;
      for (j = 0; j < 4; j++) w[j] = 0;
      for (j = 0; j < 4; j++)
        if (!v[j]) w[j + 4] = 0;
        else {
          for (i = k = 0; i < 4; i++) {
            t = u[i] * v[j] + w[i + j] + k;
            w[i + j] = t & 0xffff, k = t >> 16;
          }
          w[j + 4] = k;
        }
      ⟨Pack w into the outputs aux and acc 11⟩;
      return acc;
    }

9. ⟨Global variables 4⟩ +≡
    extern octa aux;        /* secondary output of subroutines with multiple outputs */
    extern bool overflow;

10. ⟨Unpack the multiplier and multiplicand to u and v 10⟩ ≡
    u[3] = y.h >> 16, u[2] = y.h & 0xffff, u[1] = y.l >> 16, u[0] = y.l & 0xffff;
    v[3] = z.h >> 16, v[2] = z.h & 0xffff, v[1] = z.l >> 16, v[0] = z.l & 0xffff;
This code is used in section 8.

11. ⟨Pack w into the outputs aux and acc 11⟩ ≡
    aux.h = (w[7] << 16) + w[6], aux.l = (w[5] << 16) + w[4];
    acc.h = (w[3] << 16) + w[2], acc.l = (w[1] << 16) + w[0];
This code is used in section 8.
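
The same digit-by-digit scheme can be tried out on a machine with 64-bit integers. The sketch below is an illustration in the spirit of the multiplication routine above, not a transcription of it: the name `mul_64x64` and the pointer parameter `hi` are assumptions of the example, and the zero-digit shortcut is omitted for brevity.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply two unsigned 64-bit numbers using radix-2^16 digits.
   Returns the low 64 bits of the product; *hi receives the high 64 bits. */
static uint64_t mul_64x64(uint64_t y, uint64_t z, uint64_t *hi)
{
    uint32_t u[4], v[4], w[8] = {0};
    uint32_t t, k;
    int i, j;
    assert(hi != 0);
    for (i = 0; i < 4; i++) {
        u[i] = (uint32_t)(y >> (16 * i)) & 0xffff;
        v[i] = (uint32_t)(z >> (16 * i)) & 0xffff;
    }
    for (j = 0; j < 4; j++) {
        for (i = 0, k = 0; i < 4; i++) {
            /* at most 0xffff*0xffff + 0xffff + 0xffff = 2^32 - 1: no overflow */
            t = u[i] * v[j] + w[i + j] + k;
            w[i + j] = t & 0xffff;
            k = t >> 16;
        }
        w[j + 4] = k;
    }
    *hi = ((uint64_t)w[7] << 48) | ((uint64_t)w[6] << 32)
        | ((uint64_t)w[5] << 16) | w[4];
    return ((uint64_t)w[3] << 48) | ((uint64_t)w[2] << 32)
         | ((uint64_t)w[1] << 16) | w[0];
}
```

A convenient check is the square of 2^64 − 1, which equals 2^128 − 2^65 + 1, so the high half should be 2^64 − 2 and the low half 1.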

12. Signed multiplication has the same lower half product as unsigned multiplication.
The signed upper half product is obtained with at most two further subtractions, after
which the result has overflowed if and only if the upper half is unequal to 64 copies of
the sign bit in the lower half.

⟨Subroutines 5⟩ +≡
    octa signed_omult ARGS((octa, octa));
    octa signed_omult(y, z)
      octa y, z;
    { octa acc;
      acc = omult(y, z);
      if (y.h & sign_bit) aux = ominus(aux, z);
      if (z.h & sign_bit) aux = ominus(aux, y);
      /* aux.h must equal aux.l, must be all zeros or all ones,
         and must agree with the sign bit of acc */
      overflow = (aux.h != aux.l || (aux.h ^ (aux.h >> 1) ^ (acc.h & sign_bit)));
      return acc;
    }

13. Division. Long division of an unsigned 128-bit integer by an unsigned 64-bit
integer is, of course, one of the most challenging routines needed for MMIX arithmetic.
The following program, based on Algorithm 4.3.1D of Seminumerical Algorithms, computes
octabytes q and r such that (2^64 x + y) = qz + r and 0 <= r < z, given octabytes x, y,
and z, assuming that x < z. (If x >= z, it simply sets q = x and r = y.) The quotient q
is returned by the subroutine; the remainder r is stored in aux.

⟨Subroutines 5⟩ +≡
    octa odiv ARGS((octa, octa, octa));
    octa odiv(x, y, z)
      octa x, y, z;
    { register int i, j, k, n, d;
      tetra u[8], v[4], q[4], mask, qhat, rhat, vh, vmh;
      register tetra t;
      octa acc;
      ⟨Check that x < z; otherwise give trivial answer 14⟩;
      ⟨Unpack the dividend and divisor to u and v 15⟩;
      ⟨Determine the number of significant places n in the divisor v 16⟩;
      ⟨Normalize the divisor 17⟩;
      for (j = 3; j >= 0; j--) ⟨Determine the quotient digit q[j] 20⟩;
      ⟨Unnormalize the remainder 18⟩;
      ⟨Pack q and u to acc and aux 19⟩;
      return acc;
    }

14. ⟨Check that x < z; otherwise give trivial answer 14⟩ ≡
    if (x.h > z.h || (x.h == z.h && x.l >= z.l)) {
      aux = y; return x;
    }
This code is used in section 13.

15. ⟨Unpack the dividend and divisor to u and v 15⟩ ≡
    u[7] = x.h >> 16, u[6] = x.h & 0xffff, u[5] = x.l >> 16, u[4] = x.l & 0xffff;
    u[3] = y.h >> 16, u[2] = y.h & 0xffff, u[1] = y.l >> 16, u[0] = y.l & 0xffff;
    v[3] = z.h >> 16, v[2] = z.h & 0xffff, v[1] = z.l >> 16, v[0] = z.l & 0xffff;
This code is used in section 13.

16. ⟨Determine the number of significant places n in the divisor v 16⟩ ≡
    for (n = 4; v[n - 1] == 0; n--) ;
This code is used in section 13.

17. We shift u and v left by d places, where d is chosen to make 2^15 <= v[n-1] < 2^16.

⟨Normalize the divisor 17⟩ ≡
    vh = v[n - 1];
    for (d = 0; vh < 0x8000; d++, vh <<= 1) ;
    for (j = k = 0; j < n + 4; j++) {
      t = (u[j] << d) + k;
      u[j] = t & 0xffff, k = t >> 16;
    }
    for (j = k = 0; j < n; j++) {
      t = (v[j] << d) + k;
      v[j] = t & 0xffff, k = t >> 16;
    }
    vh = v[n - 1];
    vmh = (n > 1 ? v[n - 2] : 0);
This code is used in section 13.

18. ⟨Unnormalize the remainder 18⟩ ≡
    mask = (1 << d) - 1;
    for (j = 3; j >= n; j--) u[j] = 0;
    for (k = 0; j >= 0; j--) {
      t = (k << 16) + u[j];
      u[j] = t >> d, k = t & mask;
    }
This code is used in section 13.

19. ⟨Pack q and u to acc and aux 19⟩ ≡
    acc.h = (q[3] << 16) + q[2], acc.l = (q[1] << 16) + q[0];
    aux.h = (u[3] << 16) + u[2], aux.l = (u[1] << 16) + u[0];
This code is used in section 13.

20. ⟨Determine the quotient digit q[j] 20⟩ ≡
    {
      ⟨Find the trial quotient, qhat 21⟩;
      ⟨Subtract b^j qhat v from u 22⟩;
      ⟨If the result was negative, decrease qhat by 1 23⟩;
      q[j] = qhat;
    }
This code is used in section 13.

21. ⟨Find the trial quotient, qhat 21⟩ ≡
    t = (u[j + n] << 16) + u[j + n - 1];
    qhat = t / vh, rhat = t - vh * qhat;
    /* the test n > 1 keeps u[j + n - 2] in bounds; when n = 1 no adjustment is needed */
    if (n > 1)
      while (qhat == 0x10000 || qhat * vmh > (rhat << 16) + u[j + n - 2]) {
        qhat--, rhat += vh;
        if (rhat >= 0x10000) break;
      }
This code is used in section 20.

22. After this step, u[j + n] will either equal k or k − 1. The true value of u would be
obtained by subtracting k from u[j + n]; but we don't have to fuss over u[j + n], because
it won't be examined later.

⟨Subtract b^j qhat v from u 22⟩ ≡
    for (i = k = 0; i < n; i++) {
      t = u[i + j] + 0xffff0000 - k - qhat * v[i];
      u[i + j] = t & 0xffff, k = 0xffff - (t >> 16);
    }
This code is used in section 20.

23. The correction here occurs only rarely, but it can be necessary; for example, when
dividing the number #7fff800100000000 by #800080020005.

⟨If the result was negative, decrease qhat by 1 23⟩ ≡
    if (u[j + n] != k) {
      qhat--;
      for (i = k = 0; i < n; i++) {
        t = u[i + j] + v[i] + k;
        u[i + j] = t & 0xffff, k = t >> 16;
      }
    }
This code is used in section 20.

24. Signed division can be reduced to unsigned division in a tedious but
straightforward manner. We assume that the divisor isn't zero.

⟨Subroutines 5⟩ +≡
    octa signed_odiv ARGS((octa, octa));
    octa signed_odiv(y, z)
      octa y, z;
    { octa yy, zz, q;
      register int sy, sz;
      if (y.h & sign_bit) sy = 2, yy = ominus(zero_octa, y);
      else sy = 0, yy = y;
      if (z.h & sign_bit) sz = 1, zz = ominus(zero_octa, z);
      else sz = 0, zz = z;
      q = odiv(zero_octa, yy, zz);
      overflow = false;
      switch (sy + sz) {
      case 2 + 1: aux = ominus(zero_octa, aux);
        if (q.h == sign_bit) overflow = true;
      case 0 + 0: return q;
      case 2 + 0: if (aux.h || aux.l) aux = ominus(zz, aux);
        goto negate_q;
      case 0 + 1: if (aux.h || aux.l) aux = ominus(aux, zz);
      negate_q: if (aux.h || aux.l) return ominus(neg_one, q);
        else return ominus(zero_octa, q);
      }
    }
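
The case analysis above yields a quotient rounded toward minus infinity and a remainder that takes the sign of the divisor. On a machine with native 64-bit integers the same convention can be sketched as follows; the name `floor_div` is hypothetical, and the sketch assumes C99 semantics, in which `/` truncates toward zero.

```c
#include <assert.h>
#include <stdint.h>

/* Floor-style signed division: returns floor(y / z) and stores in *rem
   a remainder that is zero or has the same sign as the divisor z. */
static int64_t floor_div(int64_t y, int64_t z, int64_t *rem)
{
    int64_t q, r;
    assert(z != 0);                          /* divisor assumed nonzero */
    assert(!(y == INT64_MIN && z == -1));    /* the one overflow case */
    q = y / z;                               /* C99: truncates toward zero */
    r = y % z;
    if (r != 0 && ((r < 0) != (z < 0))) {    /* signs differ: adjust */
        q -= 1;
        r += z;
    }
    *rem = r;
    return q;
}
```

For example, dividing −7 by 2 gives quotient −4 and remainder 1, which corresponds to the `case 2 + 0` branch: the unsigned quotient 3 becomes −1 − 3 and the unsigned remainder 1 becomes 2 − 1.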

25. Bit fiddling. The bitwise operators of MMIX are fairly easy to implement directly,
but three of them occur often enough to deserve packaging as subroutines.

⟨Subroutines 5⟩ +≡
    octa oand ARGS((octa, octa));
    octa oand(y, z)   /* compute y AND z */
      octa y, z;
    { octa x;
      x.h = y.h & z.h; x.l = y.l & z.l;
      return x;
    }

    octa oandn ARGS((octa, octa));
    octa oandn(y, z)   /* compute y AND (NOT z) */
      octa y, z;
    { octa x;
      x.h = y.h & ~z.h; x.l = y.l & ~z.l;
      return x;
    }

    octa oxor ARGS((octa, octa));
    octa oxor(y, z)   /* compute y XOR z */
      octa y, z;
    { octa x;
      x.h = y.h ^ z.h; x.l = y.l ^ z.l;
      return x;
    }

26. Here's a fun way to count the number of bits in a tetrabyte. [This classical trick is
called the "Gillies–Miller method for sideways addition" in The Preparation of
Programs for an Electronic Digital Computer by Wilkes, Wheeler, and Gill, second
edition (Reading, Mass.: Addison–Wesley, 1957), 191–193.]

⟨Subroutines 5⟩ +≡
    int count_bits ARGS((tetra));
    int count_bits(x)
      tetra x;
    {
      register int xx = x;
      xx = (xx & 0x55555555) + ((xx >> 1) & 0x55555555);
      xx = (xx & 0x33333333) + ((xx >> 2) & 0x33333333);
      xx = (xx & 0x0f0f0f0f) + ((xx >> 4) & 0x0f0f0f0f);
      xx = (xx & 0x00ff00ff) + ((xx >> 8) & 0x00ff00ff);
      return (xx & 0x0000ffff) + (xx >> 16);
    }
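
Sideways addition is easy to cross-check against a bit-at-a-time loop. The sketch below uses `uint32_t` rather than the tetra type, and the function names are chosen for the example only.

```c
#include <assert.h>
#include <stdint.h>

/* Sideways addition: sum adjacent 1-bit fields, then 2-bit fields,
   then 4-bit, 8-bit, and finally 16-bit fields. */
static int count_bits32(uint32_t x)
{
    x = (x & 0x55555555) + ((x >> 1) & 0x55555555);
    x = (x & 0x33333333) + ((x >> 2) & 0x33333333);
    x = (x & 0x0f0f0f0f) + ((x >> 4) & 0x0f0f0f0f);
    x = (x & 0x00ff00ff) + ((x >> 8) & 0x00ff00ff);
    return (int)((x & 0x0000ffff) + (x >> 16));
}

/* Reference version: examine one bit per iteration. */
static int count_bits_slow(uint32_t x)
{
    int n = 0;
    while (x) { n += (int)(x & 1); x >>= 1; }
    return n;
}

/* Confirm that both versions agree on a given input. */
static void check_count_bits(uint32_t x)
{
    assert(count_bits32(x) == count_bits_slow(x));
}
```

Using an unsigned type throughout avoids any dependence on how the compiler shifts negative values.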

27. To compute the nonnegative byte differences of two given tetrabytes, we can carry
out the following 20-step branchless computation:

⟨Subroutines 5⟩ +≡
    tetra byte_diff ARGS((tetra, tetra));
    tetra byte_diff(y, z)
      tetra y, z;
    {
      register tetra d = (y & 0x00ff00ff) + 0x01000100 - (z & 0x00ff00ff);
      register tetra m = d & 0x01000100;
      register tetra x = d & (m - (m >> 8));
      d = ((y >> 8) & 0x00ff00ff) + 0x01000100 - ((z >> 8) & 0x00ff00ff);
      m = d & 0x01000100;
      return x + ((d & (m - (m >> 8))) << 8);
    }

28. To compute the nonnegative wyde differences of two tetrabytes, another trick leads
to a 15-step branchless computation. (Research problem: Can count_bits, byte_diff,
or wyde_diff be done with fewer operations?)

⟨Subroutines 5⟩ +≡
    tetra wyde_diff ARGS((tetra, tetra));
    tetra wyde_diff(y, z)
      tetra y, z;
    {
      register tetra a = ((y >> 16) - (z >> 16)) & 0x10000;
      register tetra b = ((y & 0xffff) - (z & 0xffff)) & 0x10000;
      return y - (z ^ ((y ^ z) & (b - a - (b >> 16))));
    }

29. The last bitwise subroutine we need is the most interesting: It implements MMIX's
MOR and MXOR operations.

⟨Subroutines 5⟩ +≡
    octa bool_mult ARGS((octa, octa, bool));
    octa bool_mult(y, z, xor)
      octa y, z;   /* the operands */
      bool xor;    /* do we do xor instead of or? */
    {
      octa o, x;
      register tetra a, b, c;
      register int k;
      for (k = 0, o = y, x = zero_octa; o.h || o.l; k++, o = shift_right(o, 8, 1))
        if (o.l & 0xff) {
          a = ((z.h >> k) & 0x01010101) * 0xff;
          b = ((z.l >> k) & 0x01010101) * 0xff;
          c = (o.l & 0xff) * 0x01010101;
          if (xor) x.h ^= a & c, x.l ^= b & c;
          else x.h |= a & c, x.l |= b & c;
        }
      return x;
    }
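
The branchless byte-difference computation can be compared with a plain byte-by-byte loop. The sketch below restates the computation with `uint32_t` and adds a reference version; the names `byte_diff32` and `byte_diff_ref` are chosen for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Branchless version: each byte of the result is max(0, y_byte - z_byte). */
static uint32_t byte_diff32(uint32_t y, uint32_t z)
{
    /* bytes 0 and 2: add 0x100 to each 16-bit lane so no borrow crosses lanes */
    uint32_t d = (y & 0x00ff00ff) + 0x01000100 - (z & 0x00ff00ff);
    uint32_t m = d & 0x01000100;        /* bit 8 of a lane is set iff y_byte >= z_byte */
    uint32_t x = d & (m - (m >> 8));    /* 0xff mask for lanes with y_byte >= z_byte */
    /* bytes 1 and 3: same trick after shifting right by 8 */
    d = ((y >> 8) & 0x00ff00ff) + 0x01000100 - ((z >> 8) & 0x00ff00ff);
    m = d & 0x01000100;
    return x + ((d & (m - (m >> 8))) << 8);
}

/* Reference version: handle the four bytes one at a time. */
static uint32_t byte_diff_ref(uint32_t y, uint32_t z)
{
    uint32_t result = 0;
    int k;
    for (k = 0; k < 32; k += 8) {
        uint32_t yb = (y >> k) & 0xff, zb = (z >> k) & 0xff;
        if (yb > zb) result |= (yb - zb) << k;
    }
    return result;
}

/* Confirm that both versions agree on a given pair of inputs. */
static void check_byte_diff(uint32_t y, uint32_t z)
{
    assert(byte_diff32(y, z) == byte_diff_ref(y, z));
}
```

With y = #10ff0580 and z = #20010a7f the byte differences are 0, #fe, 0, and 1, so both versions return #00fe0001.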

30. Floating point packing and unpacking. Standard IEEE floating binary numbers
pack a sign, exponent, and fraction into a tetrabyte or octabyte. In this section we
consider basic subroutines that convert between IEEE format and the separate unpacked
components.

    #define ROUND_OFF 1
    #define ROUND_UP 2
    #define ROUND_DOWN 3
    #define ROUND_NEAR 4

⟨Global variables 4⟩ +≡
    int cur_round;   /* the current rounding mode */

31. The fpack routine takes an octabyte f, a raw exponent e, and a sign s, and packs
them into the floating binary number that corresponds to ±2^(e−1076) f, using a given
rounding mode. The value of f should satisfy 2^54 <= f <= 2^55.
Thus, for example, the floating binary number +1.0 = #3ff0000000000000 is obtained
when f = 2^54, e = #3fe, and s = '+'. The raw exponent e is usually one less than the
final exponent value; the leading bit of f is essentially added to the exponent. (This
trick works nicely for denormal numbers, when e < 0, or in cases where the value of f is
rounded upwards to 2^55.)
Exceptional events are noted by oring appropriate bits into the global variable
exceptions. Special considerations apply to underflow, which is not fully specified by
Section 7.4 of the IEEE standard: Implementations of the standard are free to choose
between two definitions of "tininess" and two definitions of "accuracy loss." MMIX
determines tininess after rounding, hence a result with e < 0 is not necessarily tiny;
MMIX treats accuracy loss as equivalent to inexactness. Thus, a result underflows if and
only if it is tiny and either (i) it is inexact or (ii) the underflow trap is enabled. The
fpack routine sets U_BIT in exceptions if and only if the result is tiny, X_BIT if and
only if the result is inexact.

    #define X_BIT (1 << 8)    /* floating inexact */
    #define Z_BIT (1 << 9)    /* floating division by zero */
    #define U_BIT (1 << 10)   /* floating underflow */
    #define O_BIT (1 << 11)   /* floating overflow */
    #define I_BIT (1 << 12)   /* floating invalid operation */
    #define W_BIT (1 << 13)   /* float-to-fix overflow */
    #define V_BIT (1 << 14)   /* integer overflow */
    #define D_BIT (1 << 15)   /* integer divide check */
    #define E_BIT (1 << 18)   /* external (dynamic) trap bit */

⟨Subroutines 5⟩ +≡
    octa fpack ARGS((octa, int, char, int));
    octa fpack(f, e, s, r)
      octa f;   /* the normalized fraction part */
      int e;    /* the raw exponent */
      char s;   /* the sign */
      int r;    /* the rounding mode */
    {
      octa o;
      if (e > 0x7fd) e = 0x7ff, o = zero_octa;
      else {
        if (e < 0) {
          if (e < -54) o.h = 0, o.l = 1;
          else { octa oo;
            o = shift_right(f, -e, 1);
            oo = shift_left(o, -e);
            if (oo.l != f.l || oo.h != f.h) o.l |= 1;   /* sticky bit */
          }
          e = 0;
        } else o = f;
      }
      ⟨Round and return the result 33⟩;
    }

32. ⟨Global variables 4⟩ +≡
  int exceptions;   /* bits possibly destined for rA */

33. Everything falls together so nicely here, it's almost too good to be true!

⟨Round and return the result 33⟩ ≡
  if (o.l & 3) exceptions |= X_BIT;
  switch (r) {
  case ROUND_DOWN: if (s == '-') o = incr(o, 3); break;
  case ROUND_UP: if (s != '-') o = incr(o, 3);
  case ROUND_OFF: break;
  case ROUND_NEAR: o = incr(o, o.l & 4 ? 2 : 1); break;
  }
  o = shift_right(o, 2, 1);
  o.h += e << 20;
  if (o.h >= 0x7ff00000) exceptions |= O_BIT + X_BIT;   /* overflow */
  else if (o.h < 0x100000) exceptions |= U_BIT;         /* tininess */
  if (s == '-') o.h |= sign_bit;
  return o;

This code is used in section 31.
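The round-to-nearest path of fpack can be tried out in isolation. The following is only a sketch, not the MMIXware code: it assumes a 64-bit uint64_t in place of the octa struct, handles only ROUND_NEAR with 0 ≤ e ≤ #7fd, and the names pack_near, flags, and X_INEXACT are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define X_INEXACT (1 << 8)   /* same bit position as X_BIT in the text */

int flags;                   /* hypothetical stand-in for `exceptions` */

/* Pack fraction f (2^54 <= f <= 2^55), raw exponent e, and a sign into
   an IEEE double bit pattern, rounding to nearest with ties to even.
   Overflow and the denormal case e < 0 are deliberately left out. */
uint64_t pack_near(uint64_t f, int e, int negative)
{
    uint64_t o = f;
    assert(e >= 0 && e <= 0x7fd);
    assert(f >= (1ULL << 54) && f <= (1ULL << 55));
    if (o & 3) flags |= X_INEXACT;   /* nonzero guard bits will be lost */
    o += (o & 4) ? 2 : 1;            /* a tie goes to the even neighbor */
    o >>= 2;
    o += (uint64_t)e << 52;          /* the leading bit of f adds 1 to e */
    if (negative) o |= 1ULL << 63;
    return o;
}
```

With f = 2^54 and e = #3fe this yields #3ff0000000000000, the example given in section 31; the carry out of the fraction is what turns the raw exponent into the final one.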


34. Similarly, sfpack packs a short float, from inputs having the same conventions as fpack.

⟨Subroutines 5⟩ +≡
  tetra sfpack ARGS((octa, int, char, int));
  tetra sfpack(f, e, s, r)
    octa f;   /* the fraction part */
    int e;    /* the raw exponent */
    char s;   /* the sign */
    int r;    /* the rounding mode */
  {
    register tetra o;
    if (e > 0x47d) e = 0x47f, o = 0;
    else {
      o = shift_left(f, 3).h;
      if (f.l & 0x1fffffff) o |= 1;
      if (e < 0x380) {
        if (e < 0x380 - 25) o = 1;
        else { register tetra o0, oo;
          o0 = o;
          o = o >> (0x380 - e);
          oo = o << (0x380 - e);
          if (oo != o0) o |= 1;   /* sticky bit */
        }
        e = 0x380;
      }
    }
    ⟨Round and return the short result 35⟩;
  }

35. ⟨Round and return the short result 35⟩ ≡
  if (o & 3) exceptions |= X_BIT;
  switch (r) {
  case ROUND_DOWN: if (s == '-') o += 3; break;
  case ROUND_UP: if (s != '-') o += 3;
  case ROUND_OFF: break;
  case ROUND_NEAR: o += (o & 4 ? 2 : 1); break;
  }
  o = o >> 2;
  o += (e - 0x380) << 23;
  if (o >= 0x7f800000) exceptions |= O_BIT + X_BIT;   /* overflow */
  else if (o < 0x100000) exceptions |= U_BIT;         /* tininess */
  if (s == '-') o |= sign_bit;
  return o;

This code is used in section 34.

36. The funpack routine is, roughly speaking, the opposite of fpack. It takes a given floating point number x and separates out its fraction part f, exponent e, and sign s. It clears exceptions to zero. It returns the type of value found: zro, num, inf, or nan. When it returns num, it will have set f, e, and s to the values from which fpack would produce the original number x without exceptions.

#define zero_exponent (-1000)   /* zero is assumed to have this exponent */

⟨Other type definitions 36⟩ ≡
  typedef enum { zro, num, inf, nan } ftype;

See also section 59.
This code is used in section 1.

37. ⟨Subroutines 5⟩ +≡
  ftype funpack ARGS((octa, octa *, int *, char *));
  ftype funpack(x, f, e, s)
    octa x;    /* the given floating point value */
    octa *f;   /* address where the fraction part should be stored */
    int *e;    /* address where the exponent part should be stored */
    char *s;   /* address where the sign should be stored */
  {
    register int ee;
    exceptions = 0;
    *s = (x.h & sign_bit ? '-' : '+');
    *f = shift_left(x, 2);
    f->h &= 0x3fffff;
    ee = (x.h >> 20) & 0x7ff;
    if (ee) {
      *e = ee - 1;
      f->h |= 0x400000;
      return (ee < 0x7ff ? num : f->h == 0x400000 && !f->l ? inf : nan);
    }
    if (!x.l && !f->h) {
      *e = zero_exponent; return zro;
    }
    do { ee--; *f = shift_left(*f, 1); } while (!(f->h & 0x400000));
    *e = ee; return num;
  }
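To see what funpack delivers for a few bit patterns, here is a sketch of the same decomposition on a plain uint64_t. It is an illustration under assumptions, not the package's code: the names unpack, kind, and K_ZERO through K_NAN are hypothetical, and exceptions are not modeled.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { K_ZERO, K_NUM, K_INF, K_NAN } kind;   /* mirrors zro, num, inf, nan */

/* Split an IEEE double bit pattern into a 55-bit fraction, raw exponent,
   and sign, following the conventions described for funpack. */
kind unpack(uint64_t x, uint64_t *f, int *e, char *s)
{
    int ee = (int)((x >> 52) & 0x7ff);
    assert(f && e && s);
    *s = (x >> 63) ? '-' : '+';
    *f = (x << 2) & ((1ULL << 54) - 1);   /* 52 fraction bits, two guard zeros */
    if (ee) {
        *e = ee - 1;
        *f |= 1ULL << 54;                 /* make the hidden bit explicit */
        if (ee < 0x7ff) return K_NUM;
        return *f == (1ULL << 54) ? K_INF : K_NAN;
    }
    if (*f == 0) { *e = -1000; return K_ZERO; }
    do { ee--; *f <<= 1; } while (!(*f & (1ULL << 54)));   /* normalize a denormal */
    *e = ee;
    return K_NUM;
}
```

The smallest denormal, bit pattern 1, comes back as f = 2^54 with e = −52, which agrees with the value 2^(e−1076) f = 2^(−1074).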

38. ⟨Subroutines 5⟩ +≡
  ftype sfunpack ARGS((tetra, octa *, int *, char *));
  ftype sfunpack(x, f, e, s)
    tetra x;   /* the given floating point value */
    octa *f;   /* address where the fraction part should be stored */
    int *e;    /* address where the exponent part should be stored */
    char *s;   /* address where the sign should be stored */
  {
    register int ee;
    exceptions = 0;
    *s = (x & sign_bit ? '-' : '+');
    f->h = (x >> 1) & 0x3fffff, f->l = x << 31;
    ee = (x >> 23) & 0xff;
    if (ee) {
      *e = ee + 0x380 - 1;
      f->h |= 0x400000;
      return (ee < 0xff ? num : (x & 0x7fffffff) == 0x7f800000 ? inf : nan);
    }
    if (!(x & 0x7fffffff)) {
      *e = zero_exponent; return zro;
    }
    do { ee--; *f = shift_left(*f, 1); } while (!(f->h & 0x400000));
    *e = ee + 0x380; return num;
  }

39. Since MMIX downplays 32-bit operations, it uses sfpack and sfunpack only when loading and storing short floats, or when converting from fixed point to floating point.

⟨Subroutines 5⟩ +≡
  octa load_sf ARGS((tetra));
  octa load_sf(z)
    tetra z;   /* 32 bits to be loaded into a 64-bit register */
  {
    octa f, x; int e; char s; ftype t;
    t = sfunpack(z, &f, &e, &s);
    switch (t) {
    case zro: x = zero_octa; break;
    case num: return fpack(f, e, s, ROUND_OFF);
    case inf: x = inf_octa; break;
    case nan: x = shift_right(f, 2, 1); x.h |= 0x7ff00000; break;
    }
    if (s == '-') x.h |= sign_bit;
    return x;
  }

40. ⟨Subroutines 5⟩ +≡
  tetra store_sf ARGS((octa));
  tetra store_sf(x)
    octa x;   /* 64 bits to be loaded into a 32-bit word */
  {
    octa f; tetra z; int e; char s; ftype t;
    t = funpack(x, &f, &e, &s);
    switch (t) {
    case zro: z = 0; break;
    case num: return sfpack(f, e, s, cur_round);
    case inf: z = 0x7f800000; break;
    case nan: if (!(f.h & 0x200000)) {
        f.h |= 0x200000; exceptions |= I_BIT;   /* NaN was signaling */
      }
      z = 0x7f800000 | (f.h << 1) | (f.l >> 31); break;
    }
    if (s == '-') z |= sign_bit;
    return z;
  }
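The nan cases of load_sf and store_sf amount to moving the NaN payload between a 23-bit and a 52-bit fraction field. A sketch of just that bit shuffling follows, assuming uint32_t and uint64_t words; widen_nan, narrow_nan, and invalid_seen are hypothetical names, and the other value types are not handled.

```c
#include <assert.h>
#include <stdint.h>

int invalid_seen;   /* hypothetical stand-in for I_BIT in `exceptions` */

/* Widen a short-float NaN: the 23 payload bits land at the top of the
   52-bit fraction field; signaling NaNs stay signaling. */
uint64_t widen_nan(uint32_t z)
{
    uint64_t sign = (uint64_t)(z >> 31) << 63;
    uint64_t payload = (uint64_t)(z & 0x7fffff) << 29;
    assert((z & 0x7f800000u) == 0x7f800000u && (z & 0x7fffff) != 0);
    return sign | 0x7ff0000000000000ULL | payload;
}

/* Narrow a NaN to 32 bits: a signaling NaN is made quiet and reported,
   then the low 29 payload bits are dropped. */
uint32_t narrow_nan(uint64_t x)
{
    uint64_t frac = x & 0xfffffffffffffULL;
    assert(((x >> 52) & 0x7ff) == 0x7ff && frac != 0);
    if (!(frac & (1ULL << 51))) {
        frac |= 1ULL << 51;
        invalid_seen = 1;
    }
    return ((uint32_t)(x >> 63) << 31) | 0x7f800000u | (uint32_t)(frac >> 29);
}
```

Quieting before truncation matters: a signaling NaN whose only payload bits are in the low 29 positions would otherwise turn into an infinity.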

41. Floating multiplication and division. The hardest fixed point operations were multiplication and division; but these two operations are the easiest to implement in floating point arithmetic, once their fixed point counterparts are available.

⟨Subroutines 5⟩ +≡
  octa fmult ARGS((octa, octa));
  octa fmult(y, z)
    octa y, z;
  {
    ftype yt, zt;
    int ye, ze;
    char ys, zs;
    octa x, xf, yf, zf;
    register int xe;
    register char xs;
    yt = funpack(y, &yf, &ye, &ys);
    zt = funpack(z, &zf, &ze, &zs);
    xs = ys + zs - '+';   /* will be '-' when the result is negative */
    switch (4 * yt + zt) {
    ⟨The usual NaN cases 42⟩;
    case 4 * zro + zro: case 4 * zro + num: case 4 * num + zro: x = zero_octa; break;
    case 4 * num + inf: case 4 * inf + num: case 4 * inf + inf: x = inf_octa; break;
    case 4 * zro + inf: case 4 * inf + zro: x = standard_NaN;
      exceptions |= I_BIT; break;
    case 4 * num + num: ⟨Multiply nonzero numbers and return 43⟩;
    }
    if (xs == '-') x.h |= sign_bit;
    return x;
  }

42. ⟨The usual NaN cases 42⟩ ≡
  case 4 * nan + nan: if (!(y.h & 0x80000)) exceptions |= I_BIT;   /* y is signaling */
  case 4 * zro + nan: case 4 * num + nan: case 4 * inf + nan:
    if (!(z.h & 0x80000)) exceptions |= I_BIT, z.h |= 0x80000;
    return z;
  case 4 * nan + zro: case 4 * nan + num: case 4 * nan + inf:
    if (!(y.h & 0x80000)) exceptions |= I_BIT, y.h |= 0x80000;
    return y;

This code is used in sections 41, 44, 46, and 93.

43. ⟨Multiply nonzero numbers and return 43⟩ ≡
  xe = ye + ze - 0x3fd;   /* the raw exponent */
  x = omult(yf, shift_left(zf, 9));
  if (aux.h >= 0x400000) xf = aux;
  else xf = shift_left(aux, 1), xe--;
  if (x.h || x.l) xf.l |= 1;   /* adjust the sticky bit */
  return fpack(xf, xe, xs, cur_round);

This code is used in section 41.
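The fraction arithmetic of section 43 can be checked by hand on small cases. The sketch below assumes uint64_t fractions with 2^54 ≤ yf, zf < 2^55 and builds the 128-bit product from 32-bit pieces; mul64 and mul_fractions are hypothetical names standing in for omult with its aux register, and the final call to fpack is omitted.

```c
#include <assert.h>
#include <stdint.h>

/* 64x64 -> 128-bit unsigned product, assembled from 32-bit halves. */
static void mul64(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo)
{
    uint64_t a0 = a & 0xffffffffu, a1 = a >> 32;
    uint64_t b0 = b & 0xffffffffu, b1 = b >> 32;
    uint64_t p00 = a0 * b0, p01 = a0 * b1, p10 = a1 * b0, p11 = a1 * b1;
    uint64_t mid = (p00 >> 32) + (p01 & 0xffffffffu) + (p10 & 0xffffffffu);
    *lo = (p00 & 0xffffffffu) | (mid << 32);
    *hi = p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32);
}

/* Multiply two unpacked fractions; the result is again a fraction in
   [2^54, 2^55) with a sticky low bit, plus the raw exponent. */
void mul_fractions(uint64_t yf, int ye, uint64_t zf, int ze,
                   uint64_t *xf, int *xe)
{
    uint64_t hi, lo;
    assert((yf >> 54) == 1 && (zf >> 54) == 1);
    mul64(yf, zf << 9, &hi, &lo);     /* hi lies in [2^53, 2^55) */
    *xe = ye + ze - 0x3fd;
    if (hi >= (1ULL << 54)) *xf = hi;
    else { *xf = hi << 1; (*xe)--; }  /* renormalize */
    if (lo) *xf |= 1;                 /* any lost bit makes the result sticky */
}
```

For 1.5 × 1.5 the fractions are 3·2^53 each and the result is 9·2^51 with raw exponent #3ff, that is 2^(1023−1076) · 9·2^51 = 2.25.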

44. ⟨Subroutines 5⟩ +≡
  octa fdivide ARGS((octa, octa));
  octa fdivide(y, z)
    octa y, z;
  {
    ftype yt, zt;
    int ye, ze;
    char ys, zs;
    octa x, xf, yf, zf;
    register int xe;
    register char xs;
    yt = funpack(y, &yf, &ye, &ys);
    zt = funpack(z, &zf, &ze, &zs);
    xs = ys + zs - '+';   /* will be '-' when the result is negative */
    switch (4 * yt + zt) {
    ⟨The usual NaN cases 42⟩;
    case 4 * zro + inf: case 4 * zro + num: case 4 * num + inf: x = zero_octa; break;
    case 4 * num + zro: exceptions |= Z_BIT;
    case 4 * inf + num: case 4 * inf + zro: x = inf_octa; break;
    case 4 * zro + zro: case 4 * inf + inf: x = standard_NaN;
      exceptions |= I_BIT; break;
    case 4 * num + num: ⟨Divide nonzero numbers and return 45⟩;
    }
    if (xs == '-') x.h |= sign_bit;
    return x;
  }

45. ⟨Divide nonzero numbers and return 45⟩ ≡
  xe = ye - ze + 0x3fd;   /* the raw exponent */
  xf = odiv(yf, zero_octa, shift_left(zf, 9));
  if (xf.h >= 0x800000) {
    aux.l |= xf.l & 1;
    xf = shift_right(xf, 1, 1);
    xe++;
  }
  if (aux.h || aux.l) xf.l |= 1;   /* adjust the sticky bit */
  return fpack(xf, xe, xs, cur_round);

This code is used in section 44.


46. Floating addition and subtraction. Now for the bread-and-butter operation, the sum of two floating point numbers. It is not terribly difficult, but many cases need to be handled carefully.

⟨Subroutines 5⟩ +≡
  octa fplus ARGS((octa, octa));
  octa fplus(y, z)
    octa y, z;
  {
    ftype yt, zt;
    int ye, ze;
    char ys, zs;
    octa x, xf, yf, zf;
    register int xe, d;
    register char xs;
    yt = funpack(y, &yf, &ye, &ys);
    zt = funpack(z, &zf, &ze, &zs);
    switch (4 * yt + zt) {
    ⟨The usual NaN cases 42⟩;
    case 4 * zro + num: return fpack(zf, ze, zs, ROUND_OFF); break;   /* may underflow */
    case 4 * num + zro: return fpack(yf, ye, ys, ROUND_OFF); break;   /* may underflow */
    case 4 * inf + inf: if (ys != zs) {
        exceptions |= I_BIT; x = standard_NaN; xs = zs; break;
      }
    case 4 * num + inf: case 4 * zro + inf: x = inf_octa; xs = zs; break;
    case 4 * inf + num: case 4 * inf + zro: x = inf_octa; xs = ys; break;
    case 4 * num + num: if (y.h != (z.h ^ 0x80000000) || y.l != z.l)
        ⟨Add nonzero numbers and return 47⟩;
    case 4 * zro + zro: x = zero_octa;
      xs = (ys == zs ? ys : cur_round == ROUND_DOWN ? '-' : '+'); break;
    }
    if (xs == '-') x.h |= sign_bit;
    return x;
  }

47. ⟨Add nonzero numbers and return 47⟩ ≡
  { octa o, oo;
    if (ye < ze || (ye == ze && (yf.h < zf.h || (yf.h == zf.h && yf.l < zf.l))))
      ⟨Exchange y with z 48⟩;
    d = ye - ze;
    xs = ys, xe = ye;
    if (d) ⟨Adjust for difference in exponents 49⟩;
    if (ys == zs) {
      xf = oplus(yf, zf);
      if (xf.h >= 0x800000) xe++, xf = shift_right(xf, 1, 1);
    } else {
      xf = ominus(yf, zf);
      if (xf.h >= 0x800000) xe++, d = xf.l & 1, xf = shift_right(xf, 1, 1), xf.l |= d;
      else while (xf.h < 0x400000) xe--, xf = shift_left(xf, 1);
    }
    return fpack(xf, xe, xs, cur_round);
  }

This code is used in section 46.
48. ⟨Exchange y with z 48⟩ ≡
  {
    o = yf, yf = zf, zf = o;
    d = ye, ye = ze, ze = d;
    d = ys, ys = zs, zs = d;
  }

This code is used in sections 47 and 51.
49. Proper rounding requires two bits to the right of the fraction delivered to fpack. The first is the true next bit of the result; the other is a "sticky" bit, which is nonzero if any further bits of the true result are nonzero. Sticky rounding to an integer takes x into the number ⌊x/2⌋ + ⌈x/2⌉.

Some subtleties need to be observed here, in order to prevent the sticky bit from being shifted left. If we did not shift yf left 1 before shifting zf to the right, an incorrect answer would be obtained in certain cases; for example, if yf = 2^54, zf = 2^54 + 2^53 − 1, d = 52.

⟨Adjust for difference in exponents 49⟩ ≡
  {
    if (d <= 2) zf = shift_right(zf, d, 1);    /* exact result */
    else if (d > 53) zf.h = 0, zf.l = 1;       /* tricky but OK */
    else {
      if (ys != zs) d--, xe--, yf = shift_left(yf, 1);
      o = zf;
      zf = shift_right(o, d, 1);
      oo = shift_left(zf, d);
      if (oo.l != o.l || oo.h != o.h) zf.l |= 1;
    }
  }

This code is used in section 47.
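The example yf = 2^54, zf = 2^54 + 2^53 − 1, d = 52 can be worked through numerically. The following sketch assumes uint64_t fractions and round-to-nearest; shift_right_sticky, round_near, normalize, diff_naive, and diff_careful are hypothetical helper names, and only the subtraction case is modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Shift right by d, or-ing a 1 into bit 0 if any nonzero bit was lost. */
uint64_t shift_right_sticky(uint64_t x, int d)
{
    uint64_t q;
    assert(d > 0 && d < 64);
    q = x >> d;
    if ((q << d) != x) q |= 1;
    return q;
}

/* Drop the two guard bits, rounding to nearest with ties to even. */
static uint64_t round_near(uint64_t o) { return (o + ((o & 4) ? 2 : 1)) >> 2; }

/* Shift left until the fraction is back in [2^54, 2^55). */
static uint64_t normalize(uint64_t x)
{
    assert(x != 0);
    while (x < (1ULL << 54)) x <<= 1;
    return x;
}

/* Align zf first and normalize afterwards: the sticky bit moves left. */
uint64_t diff_naive(uint64_t yf, uint64_t zf, int d)
{
    return round_near(normalize(yf - shift_right_sticky(zf, d)));
}

/* Shift yf left 1 before aligning zf, as section 49 prescribes. */
uint64_t diff_careful(uint64_t yf, uint64_t zf, int d)
{
    return round_near(normalize((yf << 1) - shift_right_sticky(zf, d - 1)));
}
```

The true difference is 2^54 − 5.99…, whose normalized significand rounds to 2^53 − 3. The careful version returns exactly that, while the naive version returns 2^53 − 2, because its sticky bit ends up in the position of a genuine result bit.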


50. The comparison of floating point numbers with respect to ε shares some of the characteristics of floating point addition/subtraction. In some ways it is simpler, and in other ways it is more difficult; we might as well deal with it now.

Subroutine fepscomp(y, z, e, s) returns 2 if y, z, or e is a NaN or e is negative. It returns 1 if s = 0 and y ≈ z (e) or if s ≠ 0 and y ∼ z (e), as defined in Section 4.2.2 of Seminumerical Algorithms; otherwise it returns 0.

⟨Subroutines 5⟩ +≡
  int fepscomp ARGS((octa, octa, octa, int));
  int fepscomp(y, z, e, s)
    octa y, z, e;   /* the operands */
    int s;          /* test similarity? */
  {
    octa yf, zf, ef, o, oo;
    int ye, ze, ee;
    char ys, zs, es;
    register int yt, zt, et, d;
    et = funpack(e, &ef, &ee, &es);
    if (es == '-') return 2;
    switch (et) {
    case nan: return 2;
    case inf: ee = 10000;
    case num: case zro: break;
    }
    yt = funpack(y, &yf, &ye, &ys);
    zt = funpack(z, &zf, &ze, &zs);
    switch (4 * yt + zt) {
    case 4 * nan + nan: case 4 * nan + inf: case 4 * nan + num: case 4 * nan + zro:
    case 4 * inf + nan: case 4 * num + nan: case 4 * zro + nan: return 2;
    case 4 * inf + inf: return (ys == zs || ee >= 1023);
    case 4 * inf + num: case 4 * inf + zro: case 4 * num + inf: case 4 * zro + inf:
      return (s && ee >= 1022);
    case 4 * zro + zro: return 1;
    case 4 * zro + num: case 4 * num + zro: if (!s) return 0;
    case 4 * num + num: break;
    }
    ⟨Compare two numbers with respect to epsilon and return 51⟩;
  }

51. The relation y ≈ z (ε) reduces to y ∼ z (ε/2^d), if d is the difference between the larger and smaller exponents of y and z.

⟨Compare two numbers with respect to epsilon and return 51⟩ ≡
  ⟨Undenormalize y and z, if they are denormal 52⟩;
  if (ye < ze || (ye == ze && (yf.h < zf.h || (yf.h == zf.h && yf.l < zf.l))))
    ⟨Exchange y with z 48⟩;
  if (ze == zero_exponent) ze = ye;
  d = ye - ze;
  if (!s) ee -= d;
  if (ee >= 1023) return 1;
  ⟨Compute the difference of fraction parts, o 53⟩;
  if (!o.h && !o.l) return 1;
  if (ee < 968) return 0;   /* if y ≠ z and ε < 2^(−54), y ≁ z */
  if (ee >= 1021) ef = shift_left(ef, ee - 1021);
  else ef = shift_right(ef, 1021 - ee, 1);
  return o.h < ef.h || (o.h == ef.h && o.l <= ef.l);

This code is used in section 50.
52. ⟨Undenormalize y and z, if they are denormal 52⟩ ≡
  if (ye < 0) yf = shift_left(y, 2), ye = 0;
  if (ze < 0) zf = shift_left(z, 2), ze = 0;

This code is used in section 51.

53. When d > 2, the difference of fraction parts might not fit exactly in an octabyte; in that case the numbers are not similar unless ε ≥ 3/8, and we replace the difference by the ceiling of the true result. When ε < 1/8, our program essentially replaces 2^55 ε by ⌊2^55 ε⌋. These truncations are not needed simultaneously. Therefore the logic is justified by the facts that, if n is an integer, we have x ≤ n if and only if ⌈x⌉ ≤ n; n ≤ x if and only if n ≤ ⌊x⌋. (Notice that the concept of "sticky bit" is not appropriate here.)

⟨Compute the difference of fraction parts, o 53⟩ ≡
  if (d > 54) o = zero_octa, oo = zf;
  else o = shift_right(zf, d, 1), oo = shift_left(o, d);
  if (oo.h != zf.h || oo.l != zf.l) {   /* truncated result, hence d > 2 */
    if (ee < 1020) return 0;            /* difference is too large for similarity */
    o = incr(o, ys == zs ? 0 : 1);      /* adjust for ceiling */
  }
  o = (ys == zs ? ominus(yf, o) : oplus(yf, o));

This code is used in section 51.


54. Floating point output conversion. The print_float routine converts an octabyte to a floating decimal representation that will be input as precisely the same value.

⟨Subroutines 5⟩ +≡
  static void bignum_times_ten ARGS((bignum *));
  static void bignum_dec ARGS((bignum *, bignum *, tetra));
  static int bignum_compare ARGS((bignum *, bignum *));
  void print_float ARGS((octa));
  void print_float(x)
    octa x;
  {
    ⟨Local variables for print_float 56⟩;
    if (x.h & sign_bit) printf("-");
    ⟨Extract the exponent e and determine the fraction interval [f .. g] or (f .. g) 55⟩;
    ⟨Store f and g as multiprecise integers 63⟩;
    ⟨Compute the significant digits s and decimal exponent e 64⟩;
    ⟨Print the significant digits with proper context 67⟩;
  }

55. One way to visualize the problem being solved here is to consider the vastly simpler case in which there are only 2-bit exponents and 2-bit fractions. Then the sixteen possible 4-bit combinations have the following interpretations:

  0000  [0 .. 0.125]
  0001  (0.125 .. 0.375)
  0010  [0.375 .. 0.625]
  0011  (0.625 .. 0.875)
  0100  [0.875 .. 1.125]
  0101  (1.125 .. 1.375)
  0110  [1.375 .. 1.625]
  0111  (1.625 .. 1.875)
  1000  [1.875 .. 2.25]
  1001  (2.25 .. 2.75)
  1010  [2.75 .. 3.25]
  1011  (3.25 .. 3.75)
  1100  [3.75 .. ∞]
  1101  NaN(0 .. 0.375)
  1110  NaN[0.375 .. 0.625]
  1111  NaN(0.625 .. 1)

Notice that the interval is closed, [f .. g], when the fraction part is even; it is open, (f .. g), when the fraction part is odd. The printed outputs for these sixteen values, if we actually were dealing with such short exponents and fractions, would be 0., .2, .5, .7, 1., 1.2, 1.5, 1.7, 2., 2.5, 3., 3.5, Inf, NaN.2, NaN, NaN.8, respectively.

⟨Extract the exponent e and determine the fraction interval [f .. g] or (f .. g) 55⟩ ≡
  f = shift_left(x, 1);
  e = f.h >> 21;
  f.h &= 0x1fffff;
  if (!f.h && !f.l) ⟨Handle the special case when the fraction part is zero 57⟩
  else {
    g = incr(f, 1);
    f = incr(f, -1);
    if (!e) e = 1;   /* denormal */
    else if (e == 0x7ff) {
      printf("NaN");
      if (g.h == 0x100000 && g.l == 1) return;   /* the "standard" NaN */
      e = 0x3ff;   /* extreme NaNs come out OK even without adjusting f or g */
    } else f.h |= 0x200000, g.h |= 0x200000;
  }

This code is used in section 54.

56. ⟨Local variables for print_float 56⟩ ≡
  octa f, g;            /* lower and upper bounds on the fraction part */
  register int e;       /* exponent part */
  register int j, k;    /* all purpose indices */

See also section 66.
This code is used in section 54.

57. The transition points between exponents correspond to powers of 2. At such points the interval extends only half as far to the left of that power of 2 as it does to the right. For example, in the 4-bit minifloat numbers considered above, case 1000 corresponds to the interval [1.875 .. 2.25].

⟨Handle the special case when the fraction part is zero 57⟩ ≡
  {
    if (!e) {
      printf("0."); return;
    }
    if (e == 0x7ff) {
      printf("Inf"); return;
    }
    e--;
    f.h = 0x3fffff, f.l = 0xffffffff;
    g.h = 0x400000, g.l = 2;
  }

This code is used in section 55.

58. We want to find the "simplest" value in the interval corresponding to the given number, in the sense that it has fewest significant digits when expressed in decimal notation. Thus, for example, if the floating point number can be described by a relatively short string such as '.1' or '37e100', we want to discover that representation.

The basic idea is to generate the decimal representations of the two endpoints of the interval, outputting the leading digits where both endpoints agree, then making a final decision at the first place where they disagree.

The "simplest" value is not always unique. For example, in the case of 4-bit minifloat numbers we could represent the bit pattern 0001 as either .2 or .3, and we could represent 1001 in five equally short ways: 2.3 or 2.4 or 2.5 or 2.6 or 2.7. The algorithm below tries to choose the middle possibility in such cases.

[A solution to the analogous problem for fixed-point representations, without the additional complication of round-to-even, was used by the author in the program for TEX; see Beauty is Our Business (Springer, 1990), 233–242.]

Suppose we are given two fractions f and g, where 0 ≤ f < g < 1, and we want to compute the shortest decimal in the closed interval [f .. g]. If f = 0, we are done. Otherwise let 10f = d + f′ and 10g = e + g′, where 0 ≤ f′ < 1 and 0 ≤ g′ < 1. If d < e, we can terminate by outputting any of the digits d + 1, ..., e; otherwise we output the common digit d = e, and repeat the process on the fractions 0 ≤ f′ < g′ < 1. A similar procedure works with respect to the open interval (f .. g).

59. The program below carries out the stated algorithm by using multiprecision arithmetic on 77-place integers with 28 bits each. This choice facilitates multiplication by 10, and allows us to deal with the whole range of floating binary numbers using fixed point arithmetic. We keep track of the leading and trailing digit positions so that trivial operations on zeros are avoided.

If f points to a bignum, its radix-2^28 digits are f->dat[0] through f->dat[76], from most significant to least significant. We assume that all digit positions are zero unless they lie in the subarray between indices f->a and f->b, inclusive. Furthermore, both f->dat[f->a] and f->dat[f->b] are nonzero, unless f->a = f->b = bignum_prec − 1.

The bignum data type can be used with any radix less than 2^32; we will use it later with radix 10^9. The dat array is made large enough to accommodate both applications.

#define bignum_prec 157   /* would be 77 if we cared only about print_float */

⟨Other type definitions 36⟩ +≡
  typedef struct {
    int a;                    /* index of the most significant digit */
    int b;                    /* index of the least significant digit; must be ≥ a */
    tetra dat[bignum_prec];   /* the digits; undefined except between a and b */
  } bignum;
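The closed-interval procedure of section 58 is easy to try on small binary fractions. Here is a sketch under simplifying assumptions: both endpoints are given as numerators a and b over a common denominator 2^t with t ≤ 59, so that ordinary 64-bit arithmetic suffices, and the middle digit is chosen when several would do. The name shortest_closed and its interface are hypothetical; the open-interval variant is not covered.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Write into out the digits (after the decimal point) of a shortest
   decimal fraction lying in the closed interval [a/2^t .. b/2^t]. */
void shortest_closed(uint64_t a, uint64_t b, int t, char *out, size_t cap)
{
    uint64_t mask;
    size_t n = 0;
    assert(t > 0 && t <= 59 && a < b && (b >> t) == 0 && cap > 0);
    mask = (1ULL << t) - 1;
    while (a != 0 && n + 1 < cap) {          /* if f = 0, we are done */
        unsigned d, e;
        a *= 10; b *= 10;                    /* 10f = d + f', 10g = e + g' */
        d = (unsigned)(a >> t); e = (unsigned)(b >> t);
        a &= mask; b &= mask;
        if (d < e) {                         /* any of d+1, ..., e will do */
            out[n++] = (char)('0' + ((d + 1 + e) >> 1));
            break;
        }
        out[n++] = (char)('0' + d);          /* common digit; keep going */
    }
    out[n] = '\0';
}
```

For the minifloat pattern 0010, whose interval is [0.375 .. 0.625], the call shortest_closed(3, 5, 3, buf, sizeof buf) leaves "5" in buf, matching the printed output .5 listed in section 55.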

60. Here, for example, is how we go from f to 10f, assuming that overflow will not occur and that the radix is 2^28:

⟨Subroutines 5⟩ +≡
  static void bignum_times_ten(f)
    bignum *f;
  {
    register tetra *p, *q; register tetra x, carry;
    for (p = &f->dat[f->b], q = &f->dat[f->a], carry = 0; p >= q; p--) {
      x = *p * 10 + carry;
      *p = x & 0xfffffff;
      carry = x >> 28;
    }
    *p = carry;
    if (carry) f->a--;
    if (f->dat[f->b] == 0 && f->b > f->a) f->b--;
  }

61. And here is how we test whether f < g, f = g, or f > g, using any radix whatever:

⟨Subroutines 5⟩ +≡
  static int bignum_compare(f, g)
    bignum *f, *g;
  {
    register tetra *p, *pp, *q, *qq;
    if (f->a != g->a) return f->a > g->a ? -1 : 1;
    pp = &f->dat[f->b], qq = &g->dat[g->b];
    for (p = &f->dat[f->a], q = &g->dat[g->a]; p <= pp; p++, q++) {
      if (*p != *q) return *p < *q ? -1 : 1;
      if (q == qq) return p < pp;
    }
    return -1;
  }
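The reason 28-bit digits make multiplication by 10 easy is that a digit times ten plus a carry of at most 9 still fits in 32 bits. A sketch on a bare array, without the a and b bookkeeping of the bignum type, might look as follows; times_ten is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Multiply an n-digit radix-2^28 number by ten in place; dat[0] is the
   most significant digit. Returns the carry out of the top digit. */
uint32_t times_ten(uint32_t *dat, size_t n)
{
    uint32_t carry = 0;
    size_t i;
    for (i = n; i-- > 0; ) {
        uint32_t x;
        assert(dat[i] < (1u << 28));
        x = dat[i] * 10 + carry;   /* at most (2^28 - 1)*10 + 9 < 2^32 */
        dat[i] = x & 0xfffffff;
        carry = x >> 28;
    }
    return carry;
}
```

Starting from the two digits {0, #fffffff}, one call leaves {9, #ffffff6} and returns a carry of 0.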

62. The following subroutine subtracts g from f, assuming that f ≥ g > 0 and using a given radix.

⟨Subroutines 5⟩ +≡
  static void bignum_dec(f, g, r)
    bignum *f, *g;
    tetra r;   /* the radix */
  {
    register tetra *p, *q, *qq;
    register int x, borrow;
    while (g->b > f->b) f->dat[++f->b] = 0;
    qq = &g->dat[g->a];
    for (p = &f->dat[g->b], q = &g->dat[g->b], borrow = 0; q >= qq; p--, q--) {
      x = *p - *q - borrow;
      if (x >= 0) borrow = 0, *p = x;
      else borrow = 1, *p = x + r;
    }
    for (; borrow; p--)
      if (*p) borrow = 0, *p = *p - 1;
      else *p = r - 1;
    while (f->dat[f->a] == 0) {
      if (f->a == f->b) {   /* the result is zero */
        f->a = f->b = bignum_prec - 1, f->dat[bignum_prec - 1] = 0;
        return;
      }
      f->a++;
    }
    while (f->dat[f->b] == 0) f->b--;
  }

63. Armed with these subroutines, we are ready to solve the problem. The first task is to put the numbers into bignum form. If the exponent is e, the number destined for digit dat[k] will consist of the rightmost 28 bits of the given fraction after it has been shifted right c − e − 28k bits, for some constant c. We choose c so that, when e has its maximum value #7ff, the leading digit will go into position dat[1], and so that when the number to be printed is exactly 1 the integer part of g will also be exactly 1.

#define magic_offset 2112   /* the constant c that makes it work */
#define origin 37           /* the radix point follows dat[37] */

⟨Store f and g as multiprecise integers 63⟩ ≡
  k = (magic_offset - e) / 28;
  ff.dat[k - 1] = shift_right(f, magic_offset + 28 - e - 28 * k, 1).l & 0xfffffff;
  gg.dat[k - 1] = shift_right(g, magic_offset + 28 - e - 28 * k, 1).l & 0xfffffff;
  ff.dat[k] = shift_right(f, magic_offset - e - 28 * k, 1).l & 0xfffffff;
  gg.dat[k] = shift_right(g, magic_offset - e - 28 * k, 1).l & 0xfffffff;
  ff.dat[k + 1] = shift_left(f, e + 28 * k - (magic_offset - 28)).l & 0xfffffff;
  gg.dat[k + 1] = shift_left(g, e + 28 * k - (magic_offset - 28)).l & 0xfffffff;
  ff.a = (ff.dat[k - 1] ? k - 1 : k);
  ff.b = (ff.dat[k + 1] ? k + 1 : k);
  gg.a = (gg.dat[k - 1] ? k - 1 : k);
  gg.b = (gg.dat[k + 1] ? k + 1 : k);

This code is used in section 54.

64. If e is sufficiently small, the fractions f and g will be less than 1, and we can use the stated algorithm directly. Of course, if e is extremely small, a lot of leading zeros need to be lopped off; in the worst case, we may have to multiply f and g by 10 more than 300 times. But hey, we don't need to do that extremely often, and computers are pretty fast nowadays.

In the small-exponent case, the computation always terminates before f becomes zero, because the interval endpoints are fractions with denominator 2^t for some t > 50.

The invariant relations ff.dat[ff.a] ≠ 0 and gg.dat[gg.a] ≠ 0 are not maintained by the computation here, when ff.a = origin or gg.a = origin. But no harm is done, because bignum_compare is not used.

⟨Compute the significant digits s and decimal exponent e 64⟩ ≡
  if (e > 0x401) ⟨Compute the significant digits in the large-exponent case 65⟩
  else {   /* if e ≤ #401 we have gg.a ≥ origin and gg.dat[origin] ≤ 8 */
    if (ff.a > origin) ff.dat[origin] = 0;
    for (e = 1, p = s; gg.a > origin || ff.dat[origin] == gg.dat[origin]; ) {
      if (gg.a > origin) e--;
      else *p++ = ff.dat[origin] + '0', ff.dat[origin] = 0, gg.dat[origin] = 0;
      bignum_times_ten(&ff);
      bignum_times_ten(&gg);
    }
    *p++ = ((ff.dat[origin] + 1 + gg.dat[origin]) >> 1) + '0';   /* the middle digit */
  }
  *p = '\0';   /* terminate the string s */

This code is used in section 54.

65. When e is large, we use the stated algorithm by considering f and g to be fractions whose denominator is a power of 10.

An interesting case arises when the number to be converted is #44ada56a4b0835bf, since the interval turns out to be

  (69999999999999991611392 .. 70000000000000000000000).

If this were a closed interval, we could simply give the answer 7e22; but the number 7e22 actually corresponds to #44ada56a4b0835c0 because of the round-to-even rule. Therefore the correct answer is, say, 6.9999999999999995e22. This example shows that we need a slightly different strategy in the case of open intervals; we cannot simply look at the first position in which the endpoints have different decimal digits. Therefore we change the invariant relation to 0 ≤ f < g ≤ 1, when open intervals are involved, and we do not terminate the process when f = 0 or g = 1.

⟨Compute the significant digits in the large-exponent case 65⟩ ≡
  { register int open = x.l & 1;
    tt.dat[origin] = 10;
    tt.a = tt.b = origin;
    for (e = 1; bignum_compare(&gg, &tt) >= open; e++)
      bignum_times_ten(&tt);
    p = s;
    while (1) {
      bignum_times_ten(&ff);
      bignum_times_ten(&gg);
      for (j = '0'; bignum_compare(&ff, &tt) >= 0; j++)
        bignum_dec(&ff, &tt, 0x10000000), bignum_dec(&gg, &tt, 0x10000000);
      if (bignum_compare(&gg, &tt) >= open) break;
      *p++ = j;
      if (ff.a == bignum_prec - 1 && !open)
        goto done;   /* f = 0 in a closed interval */
    }
    for (k = j; bignum_compare(&gg, &tt) >= open; k++) bignum_dec(&gg, &tt, 0x10000000);
    *p++ = (j + 1 + k) >> 1;   /* the middle digit */
  done: ;
  }

This code is used in section 64.

66. The length of string s will be at most 17. For if f and g agree to 17 places, we have g/f < 1 + 10^(−16); but the ratio g/f is always ≥ (1 + 2^(−52) + 2^(−53))/(1 + 2^(−52) − 2^(−53)) > 1 + 2 × 10^(−16).

⟨Local variables for print_float 56⟩ +≡
  bignum ff, gg;   /* fractions or numerators of fractions */
  bignum tt;       /* power of ten (used as the denominator) */
  char s[18];
  register char *p;
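The 7e22 example can be observed on a typical modern system. The sketch below is an illustration only and rests on two assumptions that the text does not make: that double is the IEEE 754 binary64 format, and that the C library's strtod rounds correctly, with ties going to the even neighbor. The names double_bits and parse_bits are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Return the bit pattern of a double as a 64-bit integer. */
static uint64_t double_bits(double d)
{
    uint64_t u;
    assert(sizeof u == sizeof d);
    memcpy(&u, &d, sizeof u);
    return u;
}

/* Convert decimal text with the library routine and expose the bits. */
uint64_t parse_bits(const char *text)
{
    return double_bits(strtod(text, NULL));
}
```

Under those assumptions, parse_bits("7e22") gives #44ada56a4b0835c0, while parse_bits("6.9999999999999995e22") gives #44ada56a4b0835bf, the number whose open interval excludes its own upper endpoint 7e22.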

67. At this point the significant digits are in string s, and s[0] ≠ '0'. If we put a decimal point at the left of s, the result should be multiplied by 10^e.

We prefer the output '300.' to the form '3e2', and we prefer '.03' to '3e-2'. In general, the output will use an explicit exponent only if the alternative would take more than 18 characters.

⟨Print the significant digits with proper context 67⟩ ≡
  if (e > 17 || e < (int) strlen(s) - 17)
    printf("%c%s%se%d", s[0], (s[1] ? "." : ""), s + 1, e - 1);
  else if (e < 0) printf(".%0*d%s", -e, 0, s);
  else if (strlen(s) >= e) printf("%.*s.%s", e, s, s + e);
  else printf("%s%0*d.", s, e - (int) strlen(s), 0);

This code is used in section 54.
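The four output shapes of section 67 are easiest to compare side by side. The following sketch reuses the same format strings but writes into a caller-supplied buffer with snprintf instead of printing; format_digits and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format digit string s, understood as .s times 10^e, into out. */
void format_digits(const char *s, int e, char *out, size_t cap)
{
    int len = (int)strlen(s);
    assert(len >= 1 && len <= 17 && s[0] != '0');
    if (e > 17 || e < len - 17)              /* explicit exponent */
        snprintf(out, cap, "%c%s%se%d", s[0], s[1] ? "." : "", s + 1, e - 1);
    else if (e < 0)                          /* zeros after the point */
        snprintf(out, cap, ".%0*d%s", -e, 0, s);
    else if (len >= e)                       /* point inside or after s */
        snprintf(out, cap, "%.*s.%s", e, s, s + e);
    else                                     /* zeros before the point */
        snprintf(out, cap, "%s%0*d.", s, e - len, 0);
}
```

With s = "3" the exponents e = 3 and e = −1 produce '300.' and '.03', the two preferred forms mentioned above, and s = "7" with e = 23 produces '7e22'.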

68. Floating point input conversion. Going the other way, we want to be able to convert a given decimal number into its floating binary equivalent. The following syntax is supported:

    ⟨digit⟩ → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
    ⟨digit string⟩ → ⟨digit⟩ | ⟨digit string⟩⟨digit⟩
    ⟨decimal string⟩ → ⟨digit string⟩. | .⟨digit string⟩ | ⟨digit string⟩.⟨digit string⟩
    ⟨optional sign⟩ → ⟨empty⟩ | + | -
    ⟨exponent⟩ → e⟨optional sign⟩⟨digit string⟩
    ⟨optional exponent⟩ → ⟨empty⟩ | ⟨exponent⟩
    ⟨floating magnitude⟩ → ⟨digit string⟩⟨exponent⟩ | ⟨decimal string⟩⟨optional exponent⟩ | Inf | NaN | NaN.⟨digit string⟩
    ⟨floating constant⟩ → ⟨optional sign⟩⟨floating magnitude⟩
    ⟨decimal constant⟩ → ⟨optional sign⟩⟨digit string⟩

For example, `-3.' is the floating constant #c008000000000000; `1e3' and `1000' are both equivalent to #408f400000000000; `NaN' and `+NaN.5' are both equivalent to #7ff8000000000000.

The scan_const routine looks at a given string and finds the longest initial substring that matches the syntax of either ⟨decimal constant⟩ or ⟨floating constant⟩. It puts the corresponding value into the global octabyte variable val; it also puts the position of the first unscanned character in the global pointer variable next_char. It returns 1 if a floating constant was found, 0 if a decimal constant was found, -1 if nothing was found. A decimal constant that doesn't fit in an octabyte is computed modulo 2^64.

⟨Subroutines 5⟩ +≡
    static void bignum_double ARGS((bignum *));
    int scan_const ARGS((char *));

    int scan_const(s)
        char *s;
    {
        ⟨Local variables for scan_const 70⟩;
        val.h = val.l = 0;
        p = s;
        if (*p == '+' || *p == '-') sign = *p++; else sign = '+';
        if (strncmp(p, "NaN", 3) == 0) NaN = true, p += 3;
        else NaN = false;
        if ((isdigit(*p) && !NaN) || (*p == '.' && isdigit(*(p + 1))))
            ⟨Scan a number and return 73⟩;
        if (NaN) ⟨Return the standard NaN 71⟩;
        if (strncmp(p, "Inf", 3) == 0) ⟨Return infinity 72⟩;
    no_const_found: next_char = s; return -1;
    }

69. ⟨Global variables 4⟩ +≡
    octa val;          /* value returned by scan_const */
    char *next_char;   /* pointer returned by scan_const */

70. ⟨Local variables for scan_const 70⟩ ≡
    register char *p, *q;   /* for string manipulations */
    register bool NaN;      /* are we processing a NaN? */
    int sign;               /* '+' or '-' */

See also sections 76 and 81.
This code is used in section 68.

71. ⟨Return the standard NaN 71⟩ ≡
    {
        next_char = p;
        val.h = 0x600000, exp = 0x3fe;
        goto packit;
    }

This code is used in section 68.

72. ⟨Return infinity 72⟩ ≡
    {
        next_char = p + 3;
        goto make_it_infinite;
    }

This code is used in section 68.
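The example bit patterns quoted for section 68 can be cross-checked on an ordinary host. The sketch below does not use scan_const; it relies on the C library's strtod (which accepts a different, larger syntax) and assumes the host's double is IEEE 754 binary64. The helper name double_bits_of is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: parse text with the C library's strtod and return
   the 64-bit pattern of the resulting double. Assumes IEEE 754 binary64;
   this is a cross-check, not the scan_const routine of the text. */
uint64_t double_bits_of(const char *text)
{
    double d = strtod(text, NULL);
    uint64_t bits;
    assert(sizeof d == sizeof bits);
    memcpy(&bits, &d, sizeof bits);   /* copy the object representation */
    return bits;
}
```

NaN inputs are deliberately left out of this check, because the payload that a C library produces for "NaN" is implementation-defined.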

73. We saw above that a string of at most 17 digits is enough to characterize a floating point number, for purposes of output. But a much longer buffer for digits is needed when we're doing input. For example, consider the borderline quantity (1 + 2^-53)/2^1022; its decimal expansion, when written out exactly, is a number with more than 750 significant digits: 2.2250738585...8125e-308. If any one of those digits is increased, or if additional nonzero digits are added as in 2.2250738585...81250000001e-308, the rounded value is supposed to change from #0010000000000000 to #0010000000000001.

We assume here that the user prefers a perfectly correct answer to a speedy almost-correct one, so we implement the most general case.

⟨Scan a number and return 73⟩ ≡
    {
        for (q = buf0, dec_pt = (char *) 0; isdigit(*p); p++) {
            val = oplus(val, shift_left(val, 2));        /* multiply by 5 */
            val = incr(shift_left(val, 1), *p - '0');
            if (q > buf0 || *p != '0')
                if (q < buf_max) *q++ = *p;
                else if (*(q - 1) == '0') *(q - 1) = *p;
        }
        if (NaN) *q++ = '1';
        if (*p == '.') ⟨Scan a fraction part 74⟩;
        next_char = p;
        if (*p == 'e') ⟨Scan an exponent 77⟩
        else exp = 0;
        if (dec_pt) ⟨Return a floating point constant 78⟩;
        if (sign == '-') val = ominus(zero_octa, val);
        return 0;
    }

This code is used in section 68.

74. ⟨Scan a fraction part 74⟩ ≡
    {
        dec_pt = q;
        p++;
        for (zeros = 0; isdigit(*p); p++)
            if (*p == '0' && q == buf0) zeros++;
            else if (q < buf_max) *q++ = *p;
            else if (*(q - 1) == '0') *(q - 1) = *p;
    }

This code is used in section 73.

75. The buffer needs room for eight digits of padding at the left, followed by up to 1022 + 53 - 307 significant digits, followed by a "sticky" digit at position buf_max - 1, and eight more digits of padding.

    #define buf0 (buf + 8)
    #define buf_max (buf + 777)

⟨Global variables 4⟩ +≡
    static char buf[785] = "00000000";   /* where we put significant input digits */

76. ⟨Local variables for scan_const 70⟩ +≡
    register char *dec_pt;   /* position of decimal point in buf */
    register int exp;        /* scanned exponent; later used for raw binary exponent */
    register int zeros;      /* leading zeros removed after decimal point */

77. Here we don't advance next_char and force a decimal point until we know that a syntactically correct exponent exists.

The code here will convert extra-large inputs like `9e+9999999999999999' into ∞ and extra-small inputs into zero. Strange inputs like `-00.0e9999999' must also be accommodated.

⟨Scan an exponent 77⟩ ≡
    {
        register char exp_sign;
        p++;
        if (*p == '+' || *p == '-') exp_sign = *p++; else exp_sign = '+';
        if (isdigit(*p)) {
            for (exp = *p++ - '0'; isdigit(*p); p++)
                if (exp < 1000) exp = 10 * exp + *p - '0';
            if (!dec_pt) dec_pt = q, zeros = 0;
            if (exp_sign == '-') exp = -exp;
            next_char = p;
        }
    }

This code is used in section 73.

78. ⟨Return a floating point constant 78⟩ ≡
    {
        ⟨Move the digits from buf to ff 79⟩;
        ⟨Determine the binary fraction and binary exponent 83⟩;
    packit: ⟨Pack and round the answer 84⟩;
        return 1;
    }

This code is used in section 73.

79. Now we get ready to compute the binary fraction bits, by putting the scanned input digits into a multiprecision fixed-point accumulator ff that spans the full necessary range. After this step, the number that we want to convert to floating binary will appear in ff.dat[ff.a], ff.dat[ff.a + 1], ..., ff.dat[ff.b]. The radix-10^9 digit in ff[36 - k] is understood to be multiplied by 10^(9k), for 36 ≥ k ≥ -120.

⟨Move the digits from buf to ff 79⟩ ≡
    x = buf + 341 + zeros - dec_pt - exp;
    if (q == buf0 || x >= 1413) {
    make_it_zero: exp = -99999; goto packit;
    }
    if (x < 0) {
    make_it_infinite: exp = 99999; goto packit;
    }
    ff.a = x / 9;
    for (p = q; p < q + 8; p++) *p = '0';                    /* pad with trailing zeros */
    q = q - 1 - (q + 341 + zeros - dec_pt - exp) % 9;         /* compute stopping place in buf */
    for (p = buf0 - x % 9, k = ff.a; p <= q && k <= 156; p += 9, k++)
        ⟨Put the 9-digit number *p ... *(p + 8) into ff.dat[k] 80⟩;
    ff.b = k - 1;
    for (x = 0; p <= q; p += 9)
        if (strncmp(p, "000000000", 9) != 0) x = 1;
    ff.dat[156] += x;      /* nonzero digits that fall off the right are sticky */
    while (ff.dat[ff.b] == 0) ff.b--;

This code is used in section 78.

80. ⟨Put the 9-digit number *p ... *(p + 8) into ff.dat[k] 80⟩ ≡
    {
        for (x = *p - '0', pp = p + 1; pp < p + 9; pp++) x = 10 * x + *pp - '0';
        ff.dat[k] = x;
    }

This code is used in section 79.

81. ⟨Local variables for scan_const 70⟩ +≡
    register int k, x;
    register char *pp;
    bignum ff, tt;

82. Here's a subroutine that is dual to bignum_times_ten. It changes f to 2f, assuming that overflow will not occur and that the radix is 10^9.

⟨Subroutines 5⟩ +≡
    static void bignum_double(f)
        bignum *f;
    {
        register tetra *p, *q;
        register int x, carry;
        for (p = &f->dat[f->b], q = &f->dat[f->a], carry = 0; p >= q; p--) {
            x = *p + *p + carry;
            if (x >= 1000000000) carry = 1, *p = x - 1000000000;
            else carry = 0, *p = x;
        }
        *p = carry;
        if (carry) f->a--;
        if (f->dat[f->b] == 0 && f->b > f->a) f->b--;
    }

83. ⟨Determine the binary fraction and binary exponent 83⟩ ≡
    val = zero_octa;
    if (ff.a > 36) {
        for (exp = 0x3fe; ff.a > 36; exp--) bignum_double(&ff);
        for (k = 54; k; k--) {
            if (ff.dat[36]) {
                if (k >= 32) val.h |= 1 << (k - 32); else val.l |= 1 << k;
                ff.dat[36] = 0;
                if (ff.b == 36) break;               /* break if ff now zero */
            }
            bignum_double(&ff);
        }
    } else {
        tt.a = tt.b = 36, tt.dat[36] = 2;
        for (exp = 0x3fe; bignum_compare(&ff, &tt) >= 0; exp++) bignum_double(&tt);
        for (k = 54; k; k--) {
            bignum_double(&ff);
            if (bignum_compare(&ff, &tt) >= 0) {
                if (k >= 32) val.h |= 1 << (k - 32); else val.l |= 1 << k;
                bignum_dec(&ff, &tt, 1000000000);
                if (ff.a == bignum_prec - 1) break;  /* break if ff now zero */
            }
        }
    }
    if (k == 0) val.l |= 1;    /* add sticky bit if ff nonzero */

This code is used in section 78.
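The doubling step on radix-10^9 digits can be exercised separately from the bignum structure. This is a simplified sketch, not the bignum_double routine itself: it works on a plain array with the most significant digit first, and the name double_radix_1e9 and the returned carry are assumptions for illustration (the routine in the text instead extends the number by one digit to the left).

```c
#include <assert.h>
#include <stdint.h>

#define RADIX 1000000000u

/* Hypothetical helper: double the n-digit radix-10^9 number d[0..n-1]
   in place, where d[0] is the most significant digit. Returns the carry
   out of d[0] instead of widening the number. */
unsigned double_radix_1e9(uint32_t *d, int n)
{
    unsigned carry = 0;
    for (int i = n - 1; i >= 0; i--) {
        assert(d[i] < RADIX);
        uint32_t x = d[i] + d[i] + carry;   /* at most 1999999999 < 2^32 */
        if (x >= RADIX) { carry = 1; d[i] = x - RADIX; }
        else            { carry = 0; d[i] = x; }
    }
    return carry;
}
```

Because each digit is below 10^9, the doubled digit plus carry always fits in 32 bits, which is why a single conditional subtraction per digit suffices.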

84. We need to be careful that the input `NaN.999999999999999999999' doesn't get rounded up; it is supposed to yield #7fffffffffffffff.

Although the input `NaN.0' is illegal, strictly speaking, we silently convert it to #7ff0000000000001, a number that would be output as `NaN.0000000000000002'.

⟨Pack and round the answer 84⟩ ≡
    val = fpack(val, exp, sign, ROUND_NEAR);
    if (NaN) {
        if ((val.h & 0x7fffffff) == 0x40000000) val.h |= 0x7fffffff, val.l = 0xffffffff;
        else if ((val.h & 0x7fffffff) == 0x3ff00000 && !val.l) val.h |= 0x40000000, val.l = 1;
        else val.h |= 0x40000000;
    }

This code is used in section 78.

85. Floating point remainders. In this section we implement the remainder of the floating point operations, one of which happens to be the operation of taking the remainder.

The easiest task remaining is to compare two floating point quantities. Routine fcomp returns -1 if y < z, 0 if y = z, +1 if y > z, and +2 if y and z are unordered.

⟨Subroutines 5⟩ +≡
    int fcomp ARGS((octa, octa));

    int fcomp(y, z)
        octa y, z;
    {
        ftype yt, zt;
        int ye, ze;
        char ys, zs;
        octa yf, zf;
        register int x;
        yt = funpack(y, &yf, &ye, &ys);
        zt = funpack(z, &zf, &ze, &zs);
        switch (4 * yt + zt) {
        case 4 * nan + nan: case 4 * zro + nan: case 4 * num + nan: case 4 * inf + nan:
        case 4 * nan + zro: case 4 * nan + num: case 4 * nan + inf: return 2;
        case 4 * zro + zro: return 0;
        case 4 * zro + num: case 4 * num + zro: case 4 * zro + inf: case 4 * inf + zro:
        case 4 * num + num: case 4 * num + inf: case 4 * inf + num: case 4 * inf + inf:
            if (ys != zs) x = 1;
            else if (y.h > z.h) x = 1;
            else if (y.h < z.h) x = -1;
            else if (y.l > z.l) x = 1;
            else if (y.l < z.l) x = -1;
            else return 0;
            break;
        }
        return (ys == '-' ? -x : x);
    }
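The comparison logic of fcomp can be illustrated on raw 64-bit patterns. The sketch below is an assumption-laden restatement, not the routine from the text: it takes uint64_t operands instead of octa, skips funpack, and the name fcomp_bits is hypothetical. It relies on the fact that, for operands of equal sign, comparing magnitudes as unsigned integers orders IEEE 754 binary64 values correctly.

```c
#include <assert.h>
#include <stdint.h>

#define SIGN_BIT 0x8000000000000000ULL
#define INF_BITS 0x7ff0000000000000ULL

/* Hypothetical helper: compare two binary64 bit patterns.
   Returns -1, 0, +1, or +2 (unordered), like fcomp in the text. */
int fcomp_bits(uint64_t y, uint64_t z)
{
    uint64_t ym = y & ~SIGN_BIT, zm = z & ~SIGN_BIT;   /* magnitudes */
    int yneg = (y & SIGN_BIT) != 0, zneg = (z & SIGN_BIT) != 0;
    int x;
    if (ym > INF_BITS || zm > INF_BITS) return 2;      /* a NaN is unordered */
    if (ym == 0 && zm == 0) return 0;                  /* +0 equals -0 */
    if (yneg != zneg) x = 1;                           /* sign of y decides */
    else if (ym > zm) x = 1;
    else if (ym < zm) x = -1;
    else return 0;
    return yneg ? -x : x;                              /* flip for negative y */
}
```

As in the original, the provisional result x describes the magnitudes and is negated at the end when y is negative.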

86. Several MMIX operations act on a single floating point number and accept an arbitrary rounding mode. For example, consider the operation of rounding to the nearest floating point integer:

⟨Subroutines 5⟩ +≡
    octa fintegerize ARGS((octa, int));

    octa fintegerize(z, r)
        octa z;   /* the operand */
        int r;    /* the rounding mode */
    {
        ftype zt;
        int ze;
        char zs;
        octa xf, zf;
        zt = funpack(z, &zf, &ze, &zs);
        if (!r) r = cur_round;
        switch (zt) {
        case nan: if (!(z.h & 0x80000)) { exceptions |= I_BIT; z.h |= 0x80000; }
        case inf: case zro: return z;
        case num: ⟨Integerize and return 87⟩;
        }
    }

87. ⟨Integerize and return 87⟩ ≡
    if (ze >= 1074) return fpack(zf, ze, zs, ROUND_OFF);   /* already an integer */
    if (ze <= 1020) xf.h = 0, xf.l = 1;
    else {
        octa oo;
        xf = shift_right(zf, 1074 - ze, 1);
        oo = shift_left(xf, 1074 - ze);
        if (oo.l != zf.l || oo.h != zf.h) xf.l |= 1;        /* sticky bit */
    }
    switch (r) {
    case ROUND_DOWN: if (zs == '-') xf = incr(xf, 3); break;
    case ROUND_UP: if (zs != '-') xf = incr(xf, 3);
    case ROUND_OFF: break;
    case ROUND_NEAR: xf = incr(xf, xf.l & 4 ? 2 : 1); break;
    }
    xf.l &= 0xfffffffc;
    if (ze >= 1022) return fpack(shift_left(xf, 1074 - ze), ze, zs, ROUND_OFF);
    if (xf.l) xf.h = 0x3ff00000, xf.l = 0;
    if (zs == '-') xf.h |= sign_bit;
    return xf;

This code is used in section 86.

88. To convert floating point to fixed point, we use fixit.

⟨Subroutines 5⟩ +≡
    octa fixit ARGS((octa, int));

    octa fixit(z, r)
        octa z;   /* the operand */
        int r;    /* the rounding mode */
    {
        ftype zt;
        int ze;
        char zs;
        octa zf, o;
        zt = funpack(z, &zf, &ze, &zs);
        if (!r) r = cur_round;
        switch (zt) {
        case nan: case inf: exceptions |= I_BIT; return z;
        case zro: return zero_octa;
        case num: if (funpack(fintegerize(z, r), &zf, &ze, &zs) == zro) return zero_octa;
            if (ze <= 1076) o = shift_right(zf, 1076 - ze, 1);
            else {
                if (ze > 1085 || (ze == 1085 && (zf.h > 0x400000 ||
                        (zf.h == 0x400000 && (zf.l || zs != '-')))))
                    exceptions |= W_BIT;
                if (ze >= 1140) return zero_octa;
                o = shift_left(zf, ze - 1076);
            }
            return (zs == '-' ? ominus(zero_octa, o) : o);
        }
    }

89. Going the other way, we can specify not only a rounding mode but whether the given fixed point octabyte is signed or unsigned, and whether the result should be rounded to short precision.

⟨Subroutines 5⟩ +≡
    octa floatit ARGS((octa, int, int, int));

    octa floatit(z, r, u, p)
        octa z;   /* octabyte to float */
        int r;    /* rounding mode */
        int u;    /* unsigned? */
        int p;    /* short precision? */
    {
        int e;
        char s;
        register int t;
        exceptions = 0;
        if (!z.h && !z.l) return zero_octa;
        if (!r) r = cur_round;
        if (!u && (z.h & sign_bit)) s = '-', z = ominus(zero_octa, z); else s = '+';
        e = 1076;
        while (z.h < 0x400000) e--, z = shift_left(z, 1);
        while (z.h >= 0x800000) {
            e++;
            t = z.l & 1;
            z = shift_right(z, 1, 1);
            z.l |= t;
        }
        if (p) ⟨Convert to short float 90⟩;
        return fpack(z, e, s, r);
    }

90. ⟨Convert to short float 90⟩ ≡
    {
        register int ex;
        register tetra t;
        t = sfpack(z, e, s, r);
        ex = exceptions;
        sfunpack(t, &z, &e, &s);
        exceptions = ex;
    }

This code is used in section 89.

91. The square root operation is more interesting.

⟨Subroutines 5⟩ +≡
    octa froot ARGS((octa, int));

    octa froot(z, r)
        octa z;   /* the operand */
        int r;    /* the rounding mode */
    {
        ftype zt;
        int ze;
        char zs;
        octa x, xf, rf, zf;
        register int xe, k;
        if (!r) r = cur_round;
        zt = funpack(z, &zf, &ze, &zs);
        if (zs == '-' && zt != zro) exceptions |= I_BIT, x = standard_NaN;
        else switch (zt) {
        case nan: if (!(z.h & 0x80000)) exceptions |= I_BIT, z.h |= 0x80000;
            return z;
        case inf: case zro: x = z; break;
        case num: ⟨Take the square root and return 92⟩;
        }
        if (zs == '-') x.h |= sign_bit;
        return x;
    }

92. The square root can be found by an adaptation of the old pencil-and-paper method. If n = ⌊√s⌋, where s is an integer, we have s = n^2 + r where 0 ≤ r ≤ 2n; this invariant can be maintained if we replace s by 4s + (0, 1, 2, 3) and n by 2n + (0, 1). The following code implements this idea with 2n in xf and r in rf. (It could easily be made to run about twice as fast.)

⟨Take the square root and return 92⟩ ≡
    xf.h = 0, xf.l = 2;
    xe = (ze + 0x3fe) >> 1;
    if (ze & 1) zf = shift_left(zf, 1);
    rf.h = 0, rf.l = (zf.h >> 22) - 1;
    for (k = 53; k; k--) {
        rf = shift_left(rf, 2); xf = shift_left(xf, 1);
        if (k >= 43) rf = incr(rf, (zf.h >> (2 * (k - 43))) & 3);
        else if (k >= 27) rf = incr(rf, (zf.l >> (2 * (k - 27))) & 3);
        if ((rf.l > xf.l && rf.h >= xf.h) || rf.h > xf.h) {
            xf.l++; rf = ominus(rf, xf); xf.l++;
        }
    }
    if (rf.h || rf.l) xf.l++;   /* sticky bit */
    return fpack(xf, xe, '+', r);

This code is used in section 91.
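The pencil-and-paper invariant of section 92 is easiest to see on plain integers. The following sketch is an illustration under stated assumptions, not the floating point code above: it keeps n itself (the text keeps 2n in xf), works on a native 64-bit operand, and the name isqrt64 is hypothetical. Each step brings in two more bits of s, replaces n by 2n, and decides whether the new low bit of n is 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: floor of the square root of s, maintaining
   prefix(s) = n*n + r with 0 <= r <= 2n after every step.
   If rem is non-null, the final remainder r is stored there. */
uint32_t isqrt64(uint64_t s, uint64_t *rem)
{
    uint64_t n = 0, r = 0;
    for (int k = 31; k >= 0; k--) {
        r = (r << 2) | ((s >> (2 * k)) & 3);   /* s becomes 4s + (0,1,2,3) */
        n <<= 1;                               /* n becomes 2n */
        uint64_t trial = (n << 1) | 1;         /* (n+1)^2 - n^2 = 2n + 1 */
        if (r >= trial) { r -= trial; n |= 1; }
    }
    if (rem != NULL) *rem = r;
    return (uint32_t) n;
}
```

A nonzero final remainder plays the role of the sticky bit in the text: it records that the root is inexact.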

93. And finally, the genuine floating point remainder. Subroutine fremstep either calculates y rem z or reduces y to a smaller number having the same remainder with respect to z. In the latter case the E_BIT is set in exceptions. A third parameter, delta, gives a decrease in exponent that is acceptable for incomplete results; if delta is sufficiently large, say 2500, the correct result will always be obtained in one step of fremstep.

⟨Subroutines 5⟩ +≡
    octa fremstep ARGS((octa, octa, int));

    octa fremstep(y, z, delta)
        octa y, z;
        int delta;
    {
        ftype yt, zt;
        int ye, ze;
        char xs, ys, zs;
        octa x, xf, yf, zf;
        register int xe, thresh, odd;
        yt = funpack(y, &yf, &ye, &ys);
        zt = funpack(z, &zf, &ze, &zs);
        switch (4 * yt + zt) {
        ⟨The usual NaN cases 42⟩;
        case 4 * zro + zro: case 4 * num + zro: case 4 * inf + zro:
        case 4 * inf + num: case 4 * inf + inf: x = standard_NaN;
            exceptions |= I_BIT; break;
        case 4 * zro + num: case 4 * zro + inf: case 4 * num + inf: return y;
        case 4 * num + num: ⟨Remainderize nonzero numbers and return 94⟩;
        zero_out: x = zero_octa;
        }
        if (ys == '-') x.h |= sign_bit;
        return x;
    }

94. If there's a huge difference in exponents and the remainder is nonzero, this computation will take a long time. One could compute (2^n y) rem z much more quickly for large n by using O(log n) multiplications modulo z, but the floating remainder operation isn't important enough to justify such expensive hardware.

Results of floating remainder are always exact, so the rounding mode is immaterial.

⟨Remainderize nonzero numbers and return 94⟩ ≡
    odd = 0;   /* will be 1 if we've subtracted an odd multiple of z from y */
    thresh = ye - delta;
    if (thresh < ze) thresh = ze;
    while (ye >= thresh)
        ⟨Reduce (ye, yf) by a multiple of zf; goto zero_out if the remainder is zero, goto try_complement if appropriate 95⟩;
    if (ye >= ze) {
        exceptions |= E_BIT; return fpack(yf, ye, ys, ROUND_OFF);
    }
    if (ye < ze - 1) return fpack(yf, ye, ys, ROUND_OFF);
    yf = shift_right(yf, 1, 1);
    try_complement: xf = ominus(zf, yf), xe = ze, xs = '+' + '-' - ys;
    if (xf.h > yf.h || (xf.h == yf.h && (xf.l > yf.l || (xf.l == yf.l && !odd))))
        xf = yf, xs = ys;
    while (xf.h < 0x400000) xe--, xf = shift_left(xf, 1);
    return fpack(xf, xe, xs, ROUND_OFF);

This code is used in section 93.

95. Here we are careful not to change the sign of y, because a remainder of 0 is supposed to inherit the original sign of y.

⟨Reduce (ye, yf) by a multiple of zf; goto zero_out if the remainder is zero, goto try_complement if appropriate 95⟩ ≡
    {
        if (yf.h == zf.h && yf.l == zf.l) goto zero_out;
        if (yf.h < zf.h || (yf.h == zf.h && yf.l < zf.l)) {
            if (ye == ze) goto try_complement;
            ye--, yf = shift_left(yf, 1);
        }
        yf = ominus(yf, zf);
        if (ye == ze) odd = 1;
        while (yf.h < 0x400000) ye--, yf = shift_left(yf, 1);
    }

This code is used in section 94.
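The remark in section 94 about running time can be made concrete with an integer analogue. This is only a sketch of the shift-and-subtract idea, not fremstep: it computes the ordinary nonnegative remainder of y·2^n modulo z for integers (the IEEE remainder in the text may also be negative), and the name shifted_rem is an assumption. The loop runs once per binary place, which is why a large exponent difference is expensive.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: (y * 2^n) mod z, computed one binary place at a
   time without ever forming y * 2^n. Requires 0 < z < 2^63. */
uint64_t shifted_rem(uint64_t y, unsigned n, uint64_t z)
{
    assert(z > 0 && z < (UINT64_C(1) << 63));   /* doubling cannot overflow */
    uint64_t r = y % z;
    for (unsigned i = 0; i < n; i++) {
        r <<= 1;                 /* one more binary place of y */
        if (r >= z) r -= z;      /* r < 2z, so one subtraction suffices */
    }
    return r;
}
```

The loop performs n iterations; the O(log n) alternative mentioned in the text would replace it with repeated squaring modulo z.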

96. Names of the sections.

⟨Add nonzero numbers and return 47⟩ Used in section 46.
⟨Adjust for difference in exponents 49⟩ Used in section 47.
⟨Check that x < z; otherwise give trivial answer 14⟩ Used in section 13.
⟨Compare two numbers with respect to epsilon and return 51⟩ Used in section 50.
⟨Compute the difference of fraction parts, o 53⟩ Used in section 51.
⟨Compute the significant digits s and decimal exponent e 64⟩ Used in section 54.
⟨Compute the significant digits in the large-exponent case 65⟩ Used in section 64.
⟨Convert to short float 90⟩ Used in section 89.
⟨Determine the binary fraction and binary exponent 83⟩ Used in section 78.
⟨Determine the number of significant places n in the divisor v 16⟩ Used in section 13.
⟨Determine the quotient digit q[j] 20⟩ Used in section 13.
⟨Divide nonzero numbers and return 45⟩ Used in section 44.
⟨Exchange y with z 48⟩ Used in sections 47 and 51.
⟨Extract the exponent e and determine the fraction interval [f .. g] or (f .. g) 55⟩ Used in section 54.
⟨Find the trial quotient, q̂ 21⟩ Used in section 20.
⟨Global variables 4, 9, 30, 32, 69, 75⟩ Used in section 1.
⟨Handle the special case when the fraction part is zero 57⟩ Used in section 55.
⟨If the result was negative, decrease q̂ by 1 23⟩ Used in section 20.
⟨Integerize and return 87⟩ Used in section 86.
⟨Local variables for print_float 56, 66⟩ Used in section 54.
⟨Local variables for scan_const 70, 76, 81⟩ Used in section 68.
⟨Move the digits from buf to ff 79⟩ Used in section 78.
⟨Multiply nonzero numbers and return 43⟩ Used in section 41.
⟨Normalize the divisor 17⟩ Used in section 13.
⟨Other type definitions 36, 59⟩ Used in section 1.
⟨Pack and round the answer 84⟩ Used in section 78.
⟨Pack q and u to acc and aux 19⟩ Used in section 13.
⟨Pack w into the outputs aux and acc 11⟩ Used in section 8.
⟨Print the significant digits with proper context 67⟩ Used in section 54.
⟨Put the 9-digit number *p ... *(p + 8) into ff.dat[k] 80⟩ Used in section 79.
⟨Reduce (ye, yf) by a multiple of zf; goto zero_out if the remainder is zero, goto try_complement if appropriate 95⟩ Used in section 94.
⟨Remainderize nonzero numbers and return 94⟩ Used in section 93.
⟨Return a floating point constant 78⟩ Used in section 73.
⟨Return infinity 72⟩ Used in section 68.
⟨Return the standard NaN 71⟩ Used in section 68.
⟨Round and return the result 33⟩ Used in section 31.
⟨Round and return the short result 35⟩ Used in section 34.
⟨Scan a fraction part 74⟩ Used in section 73.
⟨Scan a number and return 73⟩ Used in section 68.
⟨Scan an exponent 77⟩ Used in section 73.
⟨Store f and g as multiprecise integers 63⟩ Used in section 54.
⟨Stuff for C preprocessor 2⟩ Used in section 1.
⟨Subroutines 5, 6, 7, 8, 12, 13, 24, 25, 26, 27, 28, 29, 31, 34, 37, 38, 39, 40, 41, 44, 46, 50, 54, 60, 61, 62, 68, 82, 85, 86, 88, 89, 91, 93⟩ Used in section 1.
⟨Subtract b^j q̂ v from u 22⟩ Used in section 20.
⟨Take the square root and return 92⟩ Used in section 91.
⟨Tetrabyte and octabyte type definitions 3⟩ Used in section 1.
⟨The usual NaN cases 42⟩ Used in sections 41, 44, 46, and 93.
⟨Undenormalize y and z, if they are denormal 52⟩ Used in section 51.
⟨Unnormalize the remainder 18⟩ Used in section 13.
⟨Unpack the dividend and divisor to u and v 15⟩ Used in section 13.
⟨Unpack the multiplier and multiplicand to u and v 10⟩ Used in section 8.

MMIX-CONFIG

1. Input format. Configuration files allow this simulator to adapt itself to infinitely many possible combinations of hardware features. The purpose of the present module is to read a configuration file, check it for validity, and set up the relevant data structures.

All data in a configuration file consists simply of tokens separated by one or more units of white space, where a "token" is any sequence of nonspace characters that doesn't contain a percent sign. Percent signs and anything following them on a line are ignored; this convention allows a user to include comments in the file. Here's a simple (but weird) example:

    % Silly configuration
    writebuffer 200
    memaddresstime 100
    Dcache associativity 4 lru
    Dcache blocksize 1024
    unit ODD 5555555555555555555555555555555555555555555555555555555555555555
    unit EVEN aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
    div 40 30 20 % three-stage divide

It means that (1) the write buffer has capacity for 200 octabytes; (2) the memory bus takes 100 cycles to process an address; (3) there's a D-cache, in which each set has 4 blocks and the replacement policy is least-recently-used; (4) each block in the D-cache has 1024 bytes; (5) there are two functional units, one for all the odd-numbered opcodes and one for all the rest; (6) the division instructions take three pipeline stages, spending 40 cycles in the first stage, 30 in the second, and 20 in the last; (7) all other parameters have default values.

2. Four kinds of specifications can appear in a configuration file, according to the following syntax:

    ⟨specification⟩ → ⟨PV spec⟩ | ⟨cache spec⟩ | ⟨pipe spec⟩ | ⟨functional spec⟩
    ⟨PV spec⟩ → ⟨parameter⟩⟨decimal value⟩
    ⟨cache spec⟩ → ⟨cache name⟩⟨cache parameter⟩⟨decimal value⟩⟨policy⟩
    ⟨pipe spec⟩ → ⟨operation⟩⟨pipeline times⟩
    ⟨functional spec⟩ → unit ⟨name⟩⟨64 hexadecimal digits⟩

D.E. Knuth: MMIXware, LNCS 1750, pp. 110-136, 1999. © Springer-Verlag Berlin Heidelberg 1999

3. A ⟨PV spec⟩ simply assigns a given value to a given parameter. The possibilities for ⟨parameter⟩ are as follows:

• fetchbuffer (default 4), maximum instructions in the fetch buffer; must be ≥ 1.
• writebuffer (default 2), maximum octabytes in the write buffer; must be ≥ 1.
• reorderbuffer (default 5), maximum instructions issued but not committed; must be ≥ 1.
• renameregs (default 5), maximum partial results in the reorder buffer; must be ≥ 1.
• memslots (default 2), maximum store instructions in the reorder buffer; must be ≥ 1.
• localregs (default 256), number of local registers in ring; must be 256, 512, or 1024.
• fetchmax (default 2), maximum instructions fetched per cycle; must be ≥ 1.
• dispatchmax (default 1), maximum instructions issued per cycle; must be ≥ 1.
• peekahead (default 1), maximum lookahead for jumps per cycle.
• commitmax (default 1), maximum instructions committed per cycle; must be ≥ 1.
• fremmax (default 1), maximum reductions in FREM computation per cycle; must be ≥ 1.
• denin (default 1), extra cycles taken if a floating point input is denormal.
• denout (default 1), extra cycles taken if a floating point result is denormal.
• writeholdingtime (default 0), minimum number of cycles for data to remain in the write buffer.
• memaddresstime (default 20), cycles to process memory address; must be ≥ 1.
• memreadtime (default 20), cycles to read one memory busload; must be ≥ 1.
• memwritetime (default 20), cycles to write one memory busload; must be ≥ 1.
• membusbytes (default 8), number of bytes per memory busload; must be a power of 2 that is 8 or more.
• branchpredictbits (default 0), number of bits in each branch prediction table entry; must be ≤ 8.
• branchaddressbits (default 0), number of bits in instruction address used to index the branch prediction table.
• branchhistorybits (default 0), number of bits in branch history used to index the branch prediction table.
• branchdualbits (default 0), number of bits of instruction-address-xor-branch-history used to index the branch prediction table.
• hardwarepagetable (default 1), is zero if page table calculations must be emulated by the operating system.
• disablesecurity (default 0), is 1 if the hot-seat security checks are turned off. This option is used only for testing purposes; it means that the `s' interrupt will not occur, and the `p' interrupt will be signaled only when going from a nonnegative location to a negative one.
• memchunksmax (default 1000), maximum number of 2^16-byte chunks of simulated memory; must be ≥ 1.
• hashprime (default 2009), prime number used to address simulated memory; must exceed memchunksmax, preferably by a factor of about 2.

The values of memchunksmax and hashprime affect only the speed of the simulator, not its results (unless a very huge program is being simulated). The stated defaults for memchunksmax and hashprime should be adequate for almost all applications.
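The token convention of the configuration format (white space separates tokens, and a percent sign ends the useful part of a line) can be sketched in a few lines of C. This is an illustration only, not the reader implemented in MMIX-CONFIG; the function name config_tokens and its in-place interface are assumptions.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: split one configuration line into tokens.
   The line is modified in place; tokens[] receives pointers into it.
   Returns the number of tokens found (at most max_tokens). */
int config_tokens(char *line, char *tokens[], int max_tokens)
{
    int n = 0;
    char *pct, *tok;
    assert(line != NULL && tokens != NULL);
    pct = strchr(line, '%');
    if (pct != NULL) *pct = '\0';       /* a percent sign starts a comment */
    for (tok = strtok(line, " \t\r\n"); tok != NULL && n < max_tokens;
         tok = strtok(NULL, " \t\r\n"))
        tokens[n++] = tok;              /* tokens are maximal nonspace runs */
    return n;
}
```

Applied to the last line of the example file, `div 40 30 20 % three-stage divide`, the sketch yields the four tokens `div`, `40`, `30`, `20`; a line consisting only of a comment yields none.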

A h cache spec i assigns a given value to a parameter a ecting one of .MMIX-CONFIG: INPUT FORMAT 112 4.

number of processes that can simultaneous query the cache.  ports (default 1). The granularity must be 8 if writeallocate is 0." used to remember which items of data have changed since they were read from memory. number of bytes per cache block. which remembers all recently written data. which holds blocks removed from the main cache sets. The h policy i parameter should be nonempty only on cache speci.  copyouttime (default 1). must be  1. number of cycles to move a cache block from the cache proper to its output bu er. number of cache blocks in the victim bu er. (A cache with associativity 1 is said to be \direct-mapped.")  granularity (default 8). must be a power of 2 and at least 8. must be a power of 2. at least equal to the granularity. is 1 in a \write-back" cache. must be  1. number of cycles to move a cache block from its input bu er into the cache proper. must be zero or a power of 2.  writeback (default 0). must be a power of 2. The blocksize of ITcache and DTcache must be 8. number of sets of cache blocks. which doesn't make space for newly written data that fails to hit an existing cache block. which holds dirty data as long as possible. must be  1. must be a power of 2. is 1 in a \write-allocate" cache. number of cycles to query the cache.  writeallocate (default 0).ve possible caches: h cache spec i ! h cache name ih cache parameter ih decimal value ih policy i h cache name i ! ITcache j DTcache j Icache j Dcache j Scache h policy i ! h empty i j random j serial j pseudolru j lru The possibilities for h cache parameter i are as follows:  associativity (default 1).)  copyintime (default 1). (Hits in the S-cache actually require twice the accesstime. number of bytes per \dirty bit. (A cache with set size 1 is said to be \fully associative.  setsize (default 1). and at most equal to 8192.  victimsize (default 0). is 0 in a \write-around" cache. number of cache blocks per cache set. must be  1.")  blocksize (default 8). is 0 in a \write-through" cache.  
accesstime (default 1). which cleans all data as soon as possible. once to query the tag and once to transmit the data.

cations for parameters associativity and victimsize . If no replacement policy is speci.

All four policies are equivalent when the associativity or victimsize is 1; pseudolru is equivalent to lru when the associativity or victimsize is 2; random is the default.

The granularity, writeback, writeallocate, and copyouttime parameters affect the performance only of the D-cache and S-cache; the other three caches are read-only, so they never need to write their data.

The ports parameter affects the performance of the D-cache and DT-cache, and (if the PREGO command is used) the performance of the I-cache and IT-cache. The S-cache accommodates only one process at a time, regardless of the number of specified ports.

Only the translation caches (the IT-cache and DT-cache) are present by default. But if any specifications are given for, say, an I-cache, all of the unspecified I-cache parameters take their default values.

The existence of an S-cache (secondary cache) implies the existence of both I-cache and D-cache (primary caches for instructions and data). The block size of the secondary cache must not be less than the block size of the primary caches. The secondary cache must have the same granularity as the D-cache.

5. A ⟨pipe spec⟩ governs the execution time of potentially slow operations.

    ⟨pipe spec⟩ → ⟨operation⟩⟨pipeline times⟩
    ⟨pipeline times⟩ → ⟨decimal value⟩ | ⟨pipeline times⟩⟨decimal value⟩

Here the ⟨operation⟩ is one of the following:

• mul0 through mul8 (default 10); the values for mulj refer to products in which the second operand is less than 2^(8j), where j is as small as possible. Thus, for example, mul1 applies to nonzero one-byte multipliers.
• div (default 60); this applies to integer division, signed and unsigned.
• sh (default 1); this applies to left and right shifts, signed and unsigned.
• mux (default 1); the multiplex operator.
• sadd (default 1); the sideways addition operator.
• mor (default 1); the boolean matrix multiplication operators MOR and MXOR.
• fadd (default 4); floating point addition and subtraction.
• fmul (default 4); floating point multiplication.
• fdiv (default 40); floating point division.
• fsqrt (default 40); floating point square root.
• fint (default 4); floating point integerization.
• fix (default 2); conversion from floating to fixed, signed and unsigned.
• flot (default 2); conversion from fixed to floating, signed and unsigned.
• feps (default 4); floating comparison with respect to epsilon.

In each case one can specify a sequence of pipeline stages, with a positive number of cycles to be spent in each stage. For example, a specification like `fmul 3 1' would say that a functional unit that supports FMUL takes a total of four cycles to compute the floating point product in two stages; it can start working on a second product after three cycles have gone by.

If a floating point operation has a denormal input, denin is added to the time for the first stage. If a floating point operation has a denormal result, denout is added to the time for the last stage.
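The timing arithmetic of a pipe spec can be checked with a small sketch. This is not part of MMIXware: the helper names total_latency and issue_interval are invented for illustration, and the code assumes only what the text says, namely that stage times add up, that denin is added to the first stage, and that denout is added to the last.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: total cycles for one operation whose pipeline
   stages take times[0..n-1] cycles.  Per the text, denin is added to
   the first stage and denout to the last stage. */
static int total_latency(const int *times, size_t n, int denin, int denout)
{
  size_t i;
  int sum = 0;
  assert(times != NULL && n > 0);
  for (i = 0; i < n; i++) sum += times[i];
  return sum + denin + denout;
}

/* Hypothetical helper: cycles before the first stage of the unit is
   free to start another operation (denormal penalties ignored). */
static int issue_interval(const int *times, size_t n)
{
  assert(times != NULL && n > 0);
  return times[0];
}
```

For `fmul 3 1' the stage times are {3, 1}, so total_latency gives four cycles and issue_interval gives three, matching the example in the text.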

6. The fourth and final kind of specification defines a functional unit:

    ⟨functional spec⟩ → unit ⟨name⟩⟨64 hexadecimal digits⟩

The symbolic name should be at most fifteen characters long. The 64 hexadecimal digits contain 256 bits, with `1' for each supported opcode; the most significant (leftmost) bit is for opcode 0 (TRAP), and the least significant bit is for opcode 255 (TRIP).

For example, we can define a load/store unit (which handles register/memory operations), a multiplication unit (which handles fixed and floating point multiplication), a boolean unit (which handles only bitwise operations), and a more general arithmetic-logical unit, as follows:

    unit LSU 00000000000000000000000000000000fffffffcfffffffc0000000000000000
    unit MUL 000080f000000000000000000000000000000000000000000000000000000000
    unit BIT 000000000000000000000000000000000000000000000000ffff00ff00ff0000
    unit ALU f0000000ffffffffffffffffffffffff0000000300000003ffffffffffffffff

The order in which units are specified is important, because MMIX's dispatcher will try to match each instruction with the first functional unit that supports its opcode. Therefore it is best to list more specialized units (like the BIT unit in this example) before more general ones; this lets the specialized units have first chance at the instructions they can handle.

There can be any number of functional units, having possibly identical specifications. One should, however, give each unit a unique name (e.g., ALU1 and ALU2 if there are two arithmetic-logical units), since these names are used in diagnostic messages.

Opcodes that aren't supported by any specified unit will cause an emulation trap.

7. Full details about the significance of all these parameters can be found in the mmix-pipe module, which defines and discusses the data structures that need to be configured and initialized.

Of course the specifications in a configuration file needn't make any sense, nor need they be practically achievable. We could, for example, specify a unit that handles only the two opcodes NXOR and DIVUI; we could specify 1-cycle division but pipelined 100-cycle shifts, or 1-cycle memory access but 100-cycle cache access. We could create a thousand rename registers and issue a hundred instructions per cycle, etc. Some combinations of parameters are clearly ridiculous.

But there remain a huge number of possibilities of interest, especially as technology continues to evolve. By experimenting with configurations that are extreme by present-day standards, we can see how much might be gained if the corresponding hardware could be built economically.

8. Basic input/output. Let's get ready to program the MMIX_config subroutine by building some simple infrastructure. First we need some macros to print error messages.

    #define errprint0(f) fprintf(stderr, f)
    #define errprint1(f, a) fprintf(stderr, f, a)
    #define errprint2(f, a, b) fprintf(stderr, f, a, b)
    #define errprint3(f, a, b, c) fprintf(stderr, f, a, b, c)
    #define panic(x) { x; errprint0("!\n"); exit(-1); }

9. And we need a place to look at the input.

    #define BUF_SIZE 100   /* we don't need long lines */

    ⟨Global variables 9⟩ ≡
      FILE *config_file;           /* input comes from here */
      char buffer[BUF_SIZE];       /* input lines go here */
      char token[BUF_SIZE];        /* and tokens are copied to here */
      char *buf_pointer = buffer;  /* this is our current position */
      bool token_prescanned;       /* does token contain the next token already? */

See also sections 15 and 28.
This code is used in section 38.

10. The get_token routine copies the next token of input into the token buffer. After the input has ended, a final `end' is appended.

    ⟨Subroutines 10⟩ ≡
      static void get_token ARGS((void));
      static void get_token()  /* set token to the next token of the configuration file */
      {
        register char *p, *q;
        if (token_prescanned) {
          token_prescanned = false;
          return;
        }
        while (1) {  /* scan past white space */
          if (*buf_pointer == '\0' || *buf_pointer == '\n' || *buf_pointer == '%') {
            if (!fgets(buffer, BUF_SIZE, config_file)) {
              strcpy(token, "end");
              return;
            }
            if (strlen(buffer) == BUF_SIZE - 1 && buffer[BUF_SIZE - 2] != '\n')
              panic(errprint1("config file line too long: `%s...'", buffer));
            buf_pointer = buffer;
          } else if (!isspace(*buf_pointer)) break;
          else buf_pointer++;
        }
        for (p = buf_pointer, q = token; !isspace(*p) && *p != '%'; p++, q++) *q = *p;
        buf_pointer = p;
        *q = '\0';
        return;
      }

See also sections 11, 16, 22, 23, 30, and 31.
This code is used in section 38.

11. The get_int routine is called when we wish to input a decimal value. It returns -1 if the next token isn't a valid decimal integer.

    ⟨Subroutines 10⟩ +≡
      static int get_int ARGS((void));
      static int get_int()
      {
        int v;
        get_token();
        if (sscanf(token, "%d", &v) != 1) return -1;
        return v;
      }
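The scanning rule used by get_token (skip white space, treat `%' as the start of a comment that runs to the end of the line, then copy characters up to the next white space or `%') can be tried out on an in-memory string. The following is only a sketch under those assumptions, not the MMIXware routine: the name next_token, the string-based interface, and the silent truncation of overlong tokens are all choices made for this illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Sketch: copy the next token of *pos into tok (capacity cap), skipping
   white space and '%' comments that run to the end of the line.
   Returns 1 if a token was found, 0 at the end of the text. */
static int next_token(const char **pos, char *tok, size_t cap)
{
  const char *p;
  size_t k = 0;
  assert(pos != NULL && *pos != NULL && tok != NULL && cap > 0);
  p = *pos;
  for (;;) {
    while (*p != '\0' && isspace((unsigned char)*p)) p++;
    if (*p != '%') break;
    while (*p != '\0' && *p != '\n') p++;   /* discard the comment */
  }
  while (*p != '\0' && !isspace((unsigned char)*p) && *p != '%') {
    if (k + 1 < cap) tok[k++] = *p;          /* truncate overlong tokens */
    p++;
  }
  tok[k] = '\0';
  *pos = p;
  return k > 0;
}
```

Calling next_token repeatedly on the text "fetchbuffer 4 % comment" followed by a second line "Icache blocksize 16" yields the five tokens fetchbuffer, 4, Icache, blocksize, 16, and then reports the end of input.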

12. A simple data structure makes it fairly easy to deal with parameter/value specifications.

    ⟨Type definitions 12⟩ ≡
      typedef struct {
        char name[20];       /* symbolic name */
        int *v;              /* internal name */
        int defval;          /* default value */
        int minval, maxval;  /* minimum and maximum legal values */
        bool power_of_two;   /* must it be a power of two? */
      } pv_spec;

See also sections 13 and 14.
This code is used in section 38.

13. Cache parameters are a bit more difficult, but still not bad.

    ⟨Type definitions 12⟩ +≡
      typedef enum {
        assoc, blksz, setsz, gran, vctsz, wrb, wra, acctm, citm, cotm, prts
      } c_param;

      typedef struct {
        char name[20];       /* symbolic name */
        c_param v;           /* internal code */
        int defval;          /* default value */
        int minval, maxval;  /* minimum and maximum legal values */
        bool power_of_two;   /* must it be a power of two? */
      } cpv_spec;

14. Operation codes are the easiest of all.

    ⟨Type definitions 12⟩ +≡
      typedef struct {
        char name[8];        /* symbolic name */
        internal_opcode v;   /* internal code */
        int defval;          /* default value */
      } op_spec;

15. Most of the parameters are external variables that are declared in the header file mmix-pipe.h; but some are private to this module. Here we define the main tables used below.

    ⟨Global variables 9⟩ +≡
      int fetch_buf_size, write_buf_size, reorder_buf_size, mem_bus_bytes, hardware_PT;
      int max_cycs = 60;
      pv_spec PV[] = {
        {"fetchbuffer", &fetch_buf_size, 4, 1, INT_MAX, false},
        {"writebuffer", &write_buf_size, 2, 1, INT_MAX, false},
        {"reorderbuffer", &reorder_buf_size, 5, 1, INT_MAX, false},
        {"renameregs", &max_rename_regs, 5, 1, INT_MAX, false},
        {"memslots", &max_mem_slots, 2, 1, INT_MAX, false},
        {"localregs", &lring_size, 256, 256, 1024, true},
        {"fetchmax", &fetch_max, 2, 1, INT_MAX, false},
        {"dispatchmax", &dispatch_max, 1, 1, INT_MAX, false},
        {"peekahead", &peekahead, 1, 0, INT_MAX, false},
        {"commitmax", &commit_max, 1, 1, INT_MAX, false},
        {"fremmax", &frem_max, 1, 1, INT_MAX, false},
        {"denin", &denin_penalty, 1, 0, INT_MAX, false},
        {"denout", &denout_penalty, 1, 0, INT_MAX, false},
        {"writeholdingtime", &holding_time, 0, 0, INT_MAX, false},
        {"memaddresstime", &mem_addr_time, 20, 1, INT_MAX, false},
        {"memreadtime", &mem_read_time, 20, 1, INT_MAX, false},
        {"memwritetime", &mem_write_time, 20, 1, INT_MAX, false},
        {"membusbytes", &mem_bus_bytes, 8, 8, INT_MAX, true},
        {"branchpredictbits", &bp_n, 0, 0, 8, false},
        {"branchaddressbits", &bp_a, 0, 0, 32, false},
        {"branchhistorybits", &bp_b, 0, 0, 32, false},
        {"branchdualbits", &bp_c, 0, 0, 32, false},
        {"hardwarepagetable", &hardware_PT, 1, 0, 1, false},
        {"disablesecurity", (int *) &security_disabled, 0, 0, 1, false},
        {"memchunksmax", &mem_chunks_max, 1000, 1, INT_MAX, false},
        {"hashprime", &hash_prime, 2009, 2, INT_MAX, false}};

      cpv_spec CPV[] = {
        {"associativity", assoc, 1, 1, INT_MAX, true},
        {"blocksize", blksz, 8, 8, 8192, true},
        {"setsize", setsz, 1, 1, INT_MAX, true},
        {"granularity", gran, 8, 8, 8192, true},
        {"victimsize", vctsz, 0, 0, INT_MAX, true},
        {"writeback", wrb, 0, 0, 1, false},
        {"writeallocate", wra, 0, 0, 1, false},
        {"accesstime", acctm, 1, 1, INT_MAX, false},
        {"copyintime", citm, 1, 1, INT_MAX, false},
        {"copyouttime", cotm, 1, 1, INT_MAX, false},
        {"ports", prts, 1, 1, INT_MAX, false}};

      op_spec OP[] = {
        {"mul0", mul0, 10}, {"mul1", mul1, 10}, {"mul2", mul2, 10},
        {"mul3", mul3, 10}, {"mul4", mul4, 10}, {"mul5", mul5, 10},
        {"mul6", mul6, 10}, {"mul7", mul7, 10}, {"mul8", mul8, 10},
        {"div", div, 60}, {"sh", sh, 1}, {"mux", mux, 1},
        {"sadd", sadd, 1}, {"mor", mor, 1},
        {"fadd", fadd, 4}, {"fmul", fmul, 4}, {"fdiv", fdiv, 40},
        {"fsqrt", fsqrt, 40}, {"fint", fint, 4},
        {"fix", fix, 2}, {"flot", flot, 2}, {"feps", feps, 4}};

      int PV_size, CPV_size, OP_size;  /* the number of entries in PV, CPV, OP */

16. The new_cache routine creates a cache structure with default values. (These default values are "hard-wired" into the program, not actually read from the CPV table.)

    ⟨Subroutines 10⟩ +≡
      static cache *new_cache ARGS((char *));
      static cache *new_cache(name)
        char *name;
      {
        register cache *c = (cache *) calloc(1, sizeof(cache));
        if (!c) panic(errprint1("Can't allocate %s", name));
        c->aa = 1;          /* default associativity, should equal CPV[0].defval */
        c->bb = 8;          /* default blocksize */
        c->cc = 1;          /* default setsize */
        c->gg = 8;          /* default granularity */
        c->vv = 0;          /* default victimsize */
        c->repl = random;   /* default replacement policy */
        c->vrepl = random;  /* default victim replacement policy */
        c->mode = 0;        /* default mode is write-through and write-around */
        c->access_time = c->copy_in_time = c->copy_out_time = 1;
        c->filler.ctl = &(c->filler_ctl);
        c->filler_ctl.ptr_a = (void *) c;
        c->filler_ctl.go.o.l = 4;
        c->flusher.ctl = &(c->flusher_ctl);
        c->flusher_ctl.ptr_a = (void *) c;
        c->flusher_ctl.go.o.l = 4;
        c->ports = 1;
        c->name = name;
        return c;
      }

17. ⟨Initialize to defaults 17⟩ ≡
      PV_size = (sizeof PV) / sizeof(pv_spec);
      CPV_size = (sizeof CPV) / sizeof(cpv_spec);
      OP_size = (sizeof OP) / sizeof(op_spec);
      ITcache = new_cache("ITcache");
      DTcache = new_cache("DTcache");
      Icache = Dcache = Scache = NULL;
      for (j = 0; j < PV_size; j++) *(PV[j].v) = PV[j].defval;
      for (j = 0; j < OP_size; j++) {
        pipe_seq[OP[j].v][0] = OP[j].defval;
        pipe_seq[OP[j].v][1] = 0;  /* one stage */
      }

This code is used in section 38.

18. Reading the specs. Before we're ready to process the configuration file, we need to count the number of functional units, so that we know how much space to allocate for them.

A special background unit is always provided, just to make sure that TRAP and TRIP instructions are handled by somebody.

    ⟨Count and allocate the functional units 18⟩ ≡
      funit_count = 0;
      while (strcmp(token, "end") != 0) {
        get_token();
        if (strcmp(token, "unit") == 0) {
          funit_count++;
          get_token();
          get_token();  /* a unit might be named unit or end */
        }
      }
      funit = (func *) calloc(funit_count + 1, sizeof(func));
      if (!funit) panic(errprint0("Can't allocate the functional units"));
      strcpy(funit[funit_count].name, "%%");
      funit[funit_count].ops[0] = 0x80000000;  /* TRAP */
      funit[funit_count].ops[7] = 0x1;         /* TRIP */

This code is used in section 38.
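The counting pass skips the two tokens after each `unit' keyword, so that a unit whose name happens to be `unit' or `end' is not miscounted. Here is a minimal sketch of that idea over an already tokenized input; the function count_units and its array-of-strings interface are assumptions made for this example, not part of the program.

```c
#include <assert.h>
#include <string.h>

/* Sketch: count functional-unit specs in a token list.  After each
   "unit" keyword the next two tokens (the name and the 64 hex digits)
   are skipped without being examined, so they cannot be mistaken for
   keywords. */
static int count_units(const char **tokens, int ntokens)
{
  int i, count = 0;
  assert(tokens != NULL && ntokens >= 0);
  for (i = 0; i < ntokens; i++)
    if (strcmp(tokens[i], "unit") == 0) {
      count++;
      i += 2;   /* skip the name and the hex digits */
    }
  return count;
}
```

With the token list fetchmax, 2, unit, unit, 00, unit, end, ff (hex strings abbreviated here), the sketch reports two units: one named `unit' and one named `end'.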

19. Now we can read the specifications and obey them. This program doesn't bother to be very tolerant of errors, nor does it try to be very efficient.

Incidentally, the specifications don't have to be broken into individual lines in any meaningful way. We simply read them token by token.

    ⟨Record all the specs 19⟩ ≡
      rewind(config_file);
      funit_count = 0;
      token[0] = '\0';
      while (strcmp(token, "end") != 0) {
        get_token();
        if (strcmp(token, "end") == 0) break;
        ⟨If token is a parameter name, process a PV spec 20⟩;
        ⟨If token is a cache name, process a cache spec 21⟩;
        ⟨If token is an operation name, process a pipe spec 24⟩;
        if (strcmp(token, "unit") == 0) ⟨Process a functional spec 25⟩;
        panic(errprint1("Configuration syntax error: Specification can't start with `%s'", token));
      }

This code is used in section 38.

20. ⟨If token is a parameter name, process a PV spec 20⟩ ≡
      for (j = 0; j < PV_size; j++)
        if (strcmp(token, PV[j].name) == 0) {
          n = get_int();
          if (n < PV[j].minval)
            panic(errprint2("Configuration error: %s must be >= %d", PV[j].name, PV[j].minval));
          if (n > PV[j].maxval)
            panic(errprint2("Configuration error: %s must be <= %d", PV[j].name, PV[j].maxval));
          if (PV[j].power_of_two && (n & (n - 1)))
            panic(errprint1("Configuration error: %s must be a power of 2", PV[j].name));
          *(PV[j].v) = n;
          break;
        }
      if (j < PV_size) continue;

This code is used in section 19.
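The table-driven checking of a parameter value (range test followed by a power-of-two test) can be isolated in a few lines. The sketch below is an illustration only: demo_spec, check_value, find_spec, and the numeric error codes are hypothetical names introduced here, whereas the real program calls panic on the first violation.

```c
#include <assert.h>
#include <string.h>

/* Illustrative stand-in for a parameter table entry (names are ours). */
typedef struct {
  const char *name;
  int minval, maxval;
  int power_of_two;   /* nonzero if the value must be a power of 2 */
} demo_spec;

/* Returns 0 if n is acceptable for spec s, else a nonzero error code:
   1 = too small, 2 = too large, 3 = not a power of two. */
static int check_value(const demo_spec *s, int n)
{
  assert(s != NULL);
  if (n < s->minval) return 1;
  if (n > s->maxval) return 2;
  if (s->power_of_two && (n & (n - 1))) return 3;
  return 0;
}

/* Linear search by name; returns the index, or -1 if absent. */
static int find_spec(const demo_spec *tab, int size, const char *name)
{
  int j;
  assert(tab != NULL && name != NULL);
  for (j = 0; j < size; j++)
    if (strcmp(name, tab[j].name) == 0) return j;
  return -1;
}
```

Note that the test n & (n - 1) also accepts n = 0, so a parameter whose minimum is 0 can be marked as a power of two and still take the value 0.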

21. ⟨If token is a cache name, process a cache spec 21⟩ ≡
      if (strcmp(token, "ITcache") == 0) {
        pcs(ITcache); continue;
      } else if (strcmp(token, "DTcache") == 0) {
        pcs(DTcache); continue;
      } else if (strcmp(token, "Icache") == 0) {
        if (!Icache) Icache = new_cache("Icache");
        pcs(Icache); continue;
      } else if (strcmp(token, "Dcache") == 0) {
        if (!Dcache) Dcache = new_cache("Dcache");
        pcs(Dcache); continue;
      } else if (strcmp(token, "Scache") == 0) {
        if (!Icache) Icache = new_cache("Icache");
        if (!Dcache) Dcache = new_cache("Dcache");
        if (!Scache) Scache = new_cache("Scache");
        pcs(Scache); continue;
      }

This code is used in section 19.

22. ⟨Subroutines 10⟩ +≡
      static void ppol ARGS((replace_policy *));
      static void ppol(rr)  /* subroutine to scan for a replacement policy */
        replace_policy *rr;
      {
        get_token();
        if (strcmp(token, "random") == 0) *rr = random;
        else if (strcmp(token, "serial") == 0) *rr = serial;
        else if (strcmp(token, "pseudolru") == 0) *rr = pseudo_lru;
        else if (strcmp(token, "lru") == 0) *rr = lru;
        else token_prescanned = true;  /* oops, we should rescan that token */
      }

23. ⟨Subroutines 10⟩ +≡
      static void pcs ARGS((cache *));
      static void pcs(c)  /* subroutine to process a cache spec */
        cache *c;
      {
        register int j, n;
        get_token();
        for (j = 0; j < CPV_size; j++)
          if (strcmp(token, CPV[j].name) == 0) break;
        if (j == CPV_size)
          panic(errprint1("Configuration syntax error: `%s' isn't a cache parameter name", token));
        n = get_int();
        if (n < CPV[j].minval)
          panic(errprint2("Configuration error: %s must be >= %d", CPV[j].name, CPV[j].minval));
        if (n > CPV[j].maxval)
          panic(errprint2("Configuration error: %s must be <= %d", CPV[j].name, CPV[j].maxval));
        if (CPV[j].power_of_two && (n & (n - 1)))
          panic(errprint1("Configuration error: %s must be power of 2", CPV[j].name));
        switch (CPV[j].v) {
        case assoc: c->aa = n; ppol(&(c->repl)); break;
        case blksz: c->bb = n; break;
        case setsz: c->cc = n; break;
        case gran: c->gg = n; break;
        case vctsz: c->vv = n; ppol(&(c->vrepl)); break;
        case wrb: c->mode = (c->mode & ~WRITE_BACK) + n * WRITE_BACK; break;
        case wra: c->mode = (c->mode & ~WRITE_ALLOC) + n * WRITE_ALLOC; break;
        case acctm: if (n > max_cycs) max_cycs = n;
          c->access_time = n; break;
        case citm: if (n > max_cycs) max_cycs = n;
          c->copy_in_time = n; break;
        case cotm: if (n > max_cycs) max_cycs = n;
          c->copy_out_time = n; break;
        case prts: c->ports = n; break;
        }
      }

24. ⟨If token is an operation name, process a pipe spec 24⟩ ≡
      for (j = 0; j < OP_size; j++)
        if (strcmp(token, OP[j].name) == 0) {
          for (i = 0; ; i++) {
            n = get_int();
            if (n < 0) break;
            if (n == 0)
              panic(errprint0("Configuration error: Pipeline cycles must be positive"));
            if (n > 255)
              panic(errprint0("Configuration error: Pipeline cycles must be <= 255"));
            if (n > max_cycs) max_cycs = n;
            if (i >= pipe_limit)
              panic(errprint1("Configuration error: More than %d pipeline stages", pipe_limit));
            pipe_seq[OP[j].v][i] = n;
          }
          token_prescanned = true;
          break;
        }
      if (j < OP_size) continue;

This code is used in section 19.
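The writeback and writeallocate cases of pcs update a single bit of the cache mode while preserving the other bit. A minimal sketch of that update follows; set_flag and the DEMO_ constants are names invented for this example, with the values 1 and 2 that mmix-pipe gives to WRITE_BACK and WRITE_ALLOC.

```c
#include <assert.h>

#define DEMO_WRITE_BACK  1   /* same value as WRITE_BACK in mmix-pipe */
#define DEMO_WRITE_ALLOC 2   /* same value as WRITE_ALLOC in mmix-pipe */

/* Replace one flag bit of mode by the 0/1 value n, leaving the other
   bits alone: first clear the bit, then add it back if n is 1. */
static int set_flag(int mode, int flag, int n)
{
  assert(n == 0 || n == 1);
  return (mode & ~flag) + n * flag;
}
```

Starting from mode 0 (write-through and write-around), set_flag(0, DEMO_WRITE_BACK, 1) gives 1, and a further set_flag(1, DEMO_WRITE_ALLOC, 1) gives 3; repeating a setting leaves the mode unchanged.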

25. ⟨Process a functional spec 25⟩ ≡
      {
        get_token();
        if (strlen(token) > 15)
          panic(errprint1("Configuration error: `%s' is more than 15 characters long", token));
        strcpy(funit[funit_count].name, token);
        get_token();
        if (strlen(token) != 64)
          panic(errprint1("Configuration error: unit %s doesn't have 64 hex digit specs", funit[funit_count].name));
        for (i = j = n = 0; j < 64; j++) {
          if (token[j] >= '0' && token[j] <= '9') n = (n << 4) + (token[j] - '0');
          else if (token[j] >= 'a' && token[j] <= 'f') n = (n << 4) + (token[j] - 'a' + 10);
          else if (token[j] >= 'A' && token[j] <= 'F') n = (n << 4) + (token[j] - 'A' + 10);
          else panic(errprint1("Configuration error: `%c' is not a hex digit", token[j]));
          if ((j & 0x7) == 0x7) funit[funit_count].ops[i++] = n, n = 0;
        }
        funit_count++;
        continue;
      }

This code is used in section 19.
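The conversion of 64 hexadecimal digits into eight 32-bit words, with opcode 0 in the leftmost bit, can be exercised on its own. The following is a sketch rather than the program's code: parse_unit_mask and supports are hypothetical helpers, errors are reported by a return code instead of panic, and uint32_t stands in for the tetra type.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Sketch: convert exactly 64 hex digits into eight 32-bit words,
   most significant digit first.  Returns 0 on success, -1 on bad input. */
static int parse_unit_mask(const char *digits, uint32_t ops[8])
{
  int i, j;
  uint32_t n = 0;
  assert(digits != NULL && ops != NULL);
  if (strlen(digits) != 64) return -1;
  for (i = j = 0; j < 64; j++) {
    char ch = digits[j];
    if (ch >= '0' && ch <= '9') n = (n << 4) + (uint32_t)(ch - '0');
    else if (ch >= 'a' && ch <= 'f') n = (n << 4) + (uint32_t)(ch - 'a' + 10);
    else if (ch >= 'A' && ch <= 'F') n = (n << 4) + (uint32_t)(ch - 'A' + 10);
    else return -1;
    if ((j & 7) == 7) { ops[i++] = n; n = 0; }  /* eight digits per word */
  }
  return 0;
}

/* Bit test: is opcode op (0..255) marked as supported? */
static int supports(const uint32_t ops[8], int op)
{
  assert(op >= 0 && op < 256);
  return (int)((ops[op >> 5] >> (31 - (op & 31))) & 1u);
}
```

Applied to the BIT unit of section 6, the sketch reports that opcode #c0 (OR) and opcode #e8 (ORH) are supported while opcode #20 (ADD) is not.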

26. Checking and allocating. The battle is only half over when we've absorbed all the data of the configuration file. We still must check for interactions between different quantities, and we must allocate space for cache blocks, coroutines, etc.

One of the most difficult tasks facing us is to determine the maximum number of pipeline stages needed by each functional unit. Let's tackle that first.

    ⟨Allocate coroutines in each functional unit 26⟩ ≡
      ⟨Build table of pipeline stages needed for each opcode 27⟩;
      for (j = 0; j <= funit_count; j++) {
        ⟨Determine the number of stages, n, needed by funit[j] 29⟩;
        funit[j].k = n;
        funit[j].co = (coroutine *) calloc(n, sizeof(coroutine));
        for (i = 0; i < n; i++) {
          funit[j].co[i].name = funit[j].name;
          funit[j].co[i].stage = i + 1;
        }
      }

This code is used in section 38.

27. ⟨Build table of pipeline stages needed for each opcode 27⟩ ≡
      for (j = div; j <= max_pipe_op; j++) int_stages[j] = strlen(pipe_seq[j]);
      for ( ; j <= max_real_command; j++) int_stages[j] = 1;
      for (j = mul0, n = 0; j <= mul8; j++)
        if (strlen(pipe_seq[j]) > n) n = strlen(pipe_seq[j]);
      int_stages[mul] = n;
      int_stages[ld] = int_stages[st] = int_stages[frem] = 2;
      for (j = 0; j < 256; j++) stages[j] = int_stages[int_op[j]];

This code is used in section 26.

28. The int_op conversion table is similar to the internal_op array of the MMIX_pipe routine, but it replaces divu by div, fsub by fadd, etc.

    ⟨Global variables 9⟩ +≡
      internal_opcode int_op[256] = {
        trap, fcmp, funeq, funeq, fadd, fix, fadd, fix,
        flot, flot, flot, flot, flot, flot, flot, flot,
        fmul, feps, feps, feps, fdiv, fsqrt, frem, fint,
        mul, mul, mul, mul, div, div, div, div,
        add, add, addu, addu, sub, sub, subu, subu,
        addu, addu, addu, addu, addu, addu, addu, addu,
        cmp, cmp, cmpu, cmpu, sub, sub, subu, subu,
        sh, sh, sh, sh, sh, sh, sh, sh,
        br, br, br, br, br, br, br, br,
        br, br, br, br, br, br, br, br,
        pbr, pbr, pbr, pbr, pbr, pbr, pbr, pbr,
        pbr, pbr, pbr, pbr, pbr, pbr, pbr, pbr,
        cset, cset, cset, cset, cset, cset, cset, cset,
        cset, cset, cset, cset, cset, cset, cset, cset,
        zset, zset, zset, zset, zset, zset, zset, zset,
        zset, zset, zset, zset, zset, zset, zset, zset,
        ld, ld, ld, ld, ld, ld, ld, ld,
        ld, ld, ld, ld, ld, ld, ld, ld,
        ld, ld, ld, ld, ld, ld, ld, ld,
        ld, ld, ld, ld, prego, prego, go, go,
        st, st, st, st, st, st, st, st,
        st, st, st, st, st, st, st, st,
        st, st, st, st, st, st, st, st,
        st, st, st, st, st, st, pushgo, pushgo,
        or, or, orn, orn, nor, nor, xor, xor,
        and, and, andn, andn, nand, nand, nxor, nxor,
        bdif, bdif, wdif, wdif, tdif, tdif, odif, odif,
        mux, mux, sadd, sadd, mor, mor, mor, mor,
        set, set, set, set, addu, addu, addu, addu,
        or, or, or, or, andn, andn, andn, andn,
        noop, noop, pushj, pushj, set, set, put, put,
        pop, resume, save, unsave, sync, noop, get, trip};
      int int_stages[max_real_command + 1];  /* stages as function of internal_opcode */
      int stages[256];                       /* stages as function of mmix_opcode */

29. ⟨Determine the number of stages, n, needed by funit[j] 29⟩ ≡
      for (i = n = 0; i < 256; i++)
        if (((funit[j].ops[i >> 5] << (i & 0x1f)) & 0x80000000) && stages[i] > n)
          n = stages[i];
      if (n == 0)
        panic(errprint1("Configuration error: unit %s doesn't do anything", funit[j].name));

This code is used in section 26.

30. The next hardest thing on our agenda is to set up the cache structure fields that depend on the parameters. For example, although we have defined the parameter in the bb field (the block size), we also need to compute the b field (log of the block size), and we must create the cache blocks themselves.

    ⟨Subroutines 10⟩ +≡
      static int lg ARGS((int));
      static int lg(n)  /* compute binary logarithm */
        int n;
      {
        register int j, l;
        for (j = n, l = 0; j; j >>= 1) l++;
        return l - 1;
      }
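A short sketch may help to see what the logarithm fields are used for. It repeats the binary-logarithm loop and then forms a tag mask from the block size and set count, assuming the mask is the negative of 1 << (b + c), which is how we read the tagmask assignment in the alloc_cache routine. The names demo_lg and demo_tagmask are ours.

```c
#include <assert.h>

/* Binary logarithm as in the text: floor(log2 n) for n > 0, -1 for n = 0. */
static int demo_lg(int n)
{
  int l = 0;
  assert(n >= 0);
  for (; n; n >>= 1) l++;
  return l - 1;
}

/* Mask that clears the block-offset and set-index bits of an address,
   given block size bb and set count cc (both powers of two). */
static int demo_tagmask(int bb, int cc)
{
  int shift = demo_lg(bb) + demo_lg(cc);
  assert(shift >= 0 && shift < 31);
  return -(1 << shift);
}
```

With 64-byte blocks and 16 sets, the low 10 bits of an address select the byte and the set, so masking the address #12345 leaves the tag bits #12000.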

31. ⟨Subroutines 10⟩ +≡
  static void alloc_cache ARGS((cache *, char *));
  static void alloc_cache(c, name)
    cache *c;
    char *name;
  {
    register int j, k;
    if (c->bb < c->gg)
      panic(errprint1("Configuration error: blocksize of %s is less than granularity", name));
    if (name[1] == 'T' && c->bb != 8)
      panic(errprint1("Configuration error: blocksize of %s must be 8", name));
    c->a = lg(c->aa);
    c->b = lg(c->bb);
    c->c = lg(c->cc);
    c->g = lg(c->gg);
    c->v = lg(c->vv);
    c->tagmask = -(1 << (c->b + c->c));
    if (c->a + c->b + c->c >= 32)
      panic(errprint1("Configuration error: %s has >= 4 gigabytes of data", name));
    if (c->gg != 8 && !(c->mode & WRITE_ALLOC))
      panic(errprint2("Configuration error: %s does write-around with granularity %d",
                      name, c->gg));
    ⟨Allocate the cache sets for cache c 32⟩;
    if (c->vv) ⟨Allocate the victim cache for cache c 33⟩;
    c->inbuf.dirty = (char *) calloc(c->bb >> c->g, sizeof(char));
    if (!c->inbuf.dirty)
      panic(errprint1("Can't allocate dirty bits for inbuffer of %s", name));
    c->inbuf.data = (octa *) calloc(c->bb >> 3, sizeof(octa));
    if (!c->inbuf.data)
      panic(errprint1("Can't allocate data for inbuffer of %s", name));
    c->outbuf.dirty = (char *) calloc(c->bb >> c->g, sizeof(char));
    if (!c->outbuf.dirty)
      panic(errprint1("Can't allocate dirty bits for outbuffer of %s", name));
    c->outbuf.data = (octa *) calloc(c->bb >> 3, sizeof(octa));
    if (!c->outbuf.data)
      panic(errprint1("Can't allocate data for outbuffer of %s", name));
    if (name[0] != 'S') ⟨Allocate reader coroutines for cache c 34⟩;
  }

32. #define sign_bit 0x80000000

⟨Allocate the cache sets for cache c 32⟩ ≡
  c->set = (cacheset *) calloc(c->cc, sizeof(cacheset));
  if (!c->set) panic(errprint1("Can't allocate cache sets for %s", name));
  for (j = 0; j < c->cc; j++) {
    c->set[j] = (cacheblock *) calloc(c->aa, sizeof(cacheblock));
    if (!c->set[j])
      panic(errprint2("Can't allocate cache blocks for set %d of %s", j, name));
    for (k = 0; k < c->aa; k++) {
      c->set[j][k].tag.h = sign_bit;   /* invalid tag */
      c->set[j][k].dirty = (char *) calloc(c->bb >> c->g, sizeof(char));
      if (!c->set[j][k].dirty)
        panic(errprint3("Can't allocate dirty bits for block %d of set %d of %s",
                        k, j, name));
      c->set[j][k].data = (octa *) calloc(c->bb >> 3, sizeof(octa));
      if (!c->set[j][k].data)
        panic(errprint3("Can't allocate data for block %d of set %d of %s",
                        k, j, name));
    }
  }
This code is used in section 31.

33. ⟨Allocate the victim cache for cache c 33⟩ ≡
  {
    c->victim = (cacheblock *) calloc(c->vv, sizeof(cacheblock));
    if (!c->victim)
      panic(errprint1("Can't allocate blocks for victim cache of %s", name));
    for (k = 0; k < c->vv; k++) {
      c->victim[k].tag.h = sign_bit;   /* invalid tag */
      c->victim[k].dirty = (char *) calloc(c->bb >> c->g, sizeof(char));
      if (!c->victim[k].dirty)
        panic(errprint2("Can't allocate dirty bits for block %d of victim cache of %s",
                        k, name));
      c->victim[k].data = (octa *) calloc(c->bb >> 3, sizeof(octa));
      if (!c->victim[k].data)
        panic(errprint2("Can't allocate data for block %d of victim cache of %s",
                        k, name));
    }
  }
This code is used in section 31.
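The derived fields that alloc_cache computes in section 31 can be tried out in isolation. The following is only a sketch: lg_sketch stands in for the lg subroutine (assumed here to return the floor of the binary logarithm), and the geometry struct and derive_geometry function are hypothetical names, not part of MMIX-CONFIG.

```c
#include <stdint.h>

/* Floor of the binary logarithm; returns -1 when n <= 0.
   (An assumed stand-in for the lg() subroutine called by alloc_cache.) */
static int lg_sketch(int n)
{
    int l = -1;
    while (n > 0) { n >>= 1; l++; }
    return l;
}

/* Hypothetical container for the derived fields; not the simulator's cache struct. */
typedef struct {
    int a, b, c;        /* lg of associativity, block size, and number of sets */
    uint32_t tagmask;   /* ones in every bit above the low b+c address bits */
} geometry;

/* Returns 0 on success, -1 if a parameter is nonpositive or if the cache
   would hold 4 gigabytes or more (the check made in section 31). */
static int derive_geometry(int aa, int bb, int cc, geometry *g)
{
    g->a = lg_sketch(aa);
    g->b = lg_sketch(bb);
    g->c = lg_sketch(cc);
    if (g->a < 0 || g->b < 0 || g->c < 0) return -1;
    if (g->a + g->b + g->c >= 32) return -1;
    /* Same bit pattern as -(1 << (b + c)) in 32-bit two's complement. */
    g->tagmask = ~(((uint32_t)1 << (g->b + g->c)) - 1);
    return 0;
}
```

For a 2-way cache with 64-byte blocks and 256 sets, the sketch yields a = 1, b = 6, c = 8, so the tag mask clears the low 14 address bits.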

34. ⟨Allocate reader coroutines for cache c 34⟩ ≡
  {
    c->reader = (coroutine *) calloc(c->ports, sizeof(coroutine));
    if (!c->reader) panic(errprint1("Can't allocate readers for %s", name));
    for (j = 0; j < c->ports; j++) {
      c->reader[j].stage = vanish;
      c->reader[j].name = (name[0] == 'D' ? (name[1] == 'T' ? "DTreader" : "Dreader")
                                          : (name[1] == 'T' ? "ITreader" : "Ireader"));
    }
  }
This code is used in section 31.

35. ⟨Allocate the caches 35⟩ ≡
  alloc_cache(ITcache, "ITcache");
  ITcache->filler.name = "ITfiller"; ITcache->filler.stage = fill_from_virt;
  alloc_cache(DTcache, "DTcache");
  DTcache->filler.name = "DTfiller"; DTcache->filler.stage = fill_from_virt;
  if (Icache) {
    alloc_cache(Icache, "Icache");
    Icache->filler.name = "Ifiller"; Icache->filler.stage = fill_from_mem;
  }
  if (Dcache) {
    alloc_cache(Dcache, "Dcache");
    Dcache->filler.name = "Dfiller"; Dcache->filler.stage = fill_from_mem;
    Dcache->flusher.name = "Dflusher"; Dcache->flusher.stage = flush_to_mem;
  }
  if (Scache) {
    alloc_cache(Scache, "Scache");
    if (Scache->bb < Icache->bb)
      panic(errprint0("Configuration error: Scache blocks smaller than Icache blocks"));
    if (Scache->bb < Dcache->bb)
      panic(errprint0("Configuration error: Scache blocks smaller than Dcache blocks"));
    if (Scache->gg != Dcache->gg)
      panic(errprint0("Configuration error: Scache granularity differs from the Dcache"));
    Icache->filler.stage = fill_from_S;
    Dcache->filler.stage = fill_from_S; Dcache->flusher.stage = flush_to_S;
    Scache->filler.name = "Sfiller"; Scache->filler.stage = fill_from_mem;
    Scache->flusher.name = "Sflusher"; Scache->flusher.stage = flush_to_mem;
  }
This code is used in section 38.

36. Now we are nearly done. The only nontrivial task remaining is to allocate the ring of queues for coroutine scheduling; for this we need to determine the maximum waiting time that will occur between scheduler and schedulee.

⟨Allocate the scheduling queue 36⟩ ≡
  bus_words = mem_bus_bytes >> 3;
  j = (mem_read_time < mem_write_time ? mem_write_time : mem_read_time);
  n = 1;
  if (Scache && Scache->bb > n) n = Scache->bb;
  if (Icache && Icache->bb > n) n = Icache->bb;
  if (Dcache && Dcache->bb > n) n = Dcache->bb;
  n = mem_addr_time + ((int) (n + bus_words - 1) / (bus_words)) * j;
  if (n > max_cycs) max_cycs = n;   /* now max_cycs bounds the waiting time */
  ring_size = max_cycs + 1;
  ring = (coroutine *) calloc(ring_size, sizeof(coroutine));
  if (!ring) panic(errprint0("Can't allocate the scheduling ring"));
  {
    register coroutine *p;
    for (p = ring; p < ring + ring_size; p++) {
      p->name = "";   /* header nodes are nameless */
      p->stage = max_stage;
    }
  }
This code is used in section 38.
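The waiting-time bound of section 36 is plain integer arithmetic, so it can be checked by hand or with a few lines of C. The sketch below mirrors that arithmetic under stated assumptions: the mem_params struct and both function names are hypothetical, max_block_bytes stands for the largest bb among the caches present (or 1 if there are none), and mem_bus_bytes is assumed to be at least 8 so that bus_words is nonzero.

```c
/* Hypothetical parameter bundle; field names echo the configuration variables. */
typedef struct {
    int mem_bus_bytes;
    int mem_addr_time, mem_read_time, mem_write_time;
    int max_block_bytes;   /* largest bb among the caches present, or 1 if none */
} mem_params;

/* Bound on the cycles a memory transfer can take, computed as in section 36. */
static int worst_case_wait(const mem_params *m)
{
    int bus_words = m->mem_bus_bytes >> 3;
    int j = m->mem_read_time < m->mem_write_time ? m->mem_write_time
                                                 : m->mem_read_time;
    int n = m->max_block_bytes;
    return m->mem_addr_time + ((n + bus_words - 1) / bus_words) * j;
}

/* The ring needs one slot per possible delay 0..max_cycs, where max_cycs is
   raised to the memory bound if that bound is larger. */
static int ring_size_for(const mem_params *m, int max_cycs)
{
    int n = worst_case_wait(m);
    if (n > max_cycs) max_cycs = n;
    return max_cycs + 1;
}
```

With an 8-byte bus, address time 20, read time 20, write time 30, and 64-byte blocks, the bound is 20 + 64 × 30 = 1940 cycles, so a ring of 1941 headers suffices unless some other delay already exceeds that.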


37. ⟨Touch up last-minute trivia 37⟩ ≡
  if (hash_prime <= mem_chunks_max)
    panic(errprint0("Configuration error: hashprime must exceed memchunksmax"));
  mem_hash = (chunknode *) calloc(hash_prime + 1, sizeof(chunknode));
  if (!mem_hash) panic(errprint0("Can't allocate the hash table"));
  mem_hash[0].chunk = (octa *) calloc(1 << 13, sizeof(octa));
  if (!mem_hash[0].chunk) panic(errprint0("Can't allocate chunk 0"));
  mem_hash[hash_prime].chunk = (octa *) calloc(1 << 13, sizeof(octa));
  if (!mem_hash[hash_prime].chunk) panic(errprint0("Can't allocate 0 chunk"));
  mem_chunks = 1;
  fetch_bot = (fetch *) calloc(fetch_buf_size + 1, sizeof(fetch));
  if (!fetch_bot) panic(errprint0("Can't allocate the fetch buffer"));
  fetch_top = fetch_bot + fetch_buf_size;
  reorder_bot = (control *) calloc(reorder_buf_size + 1, sizeof(control));
  if (!reorder_bot) panic(errprint0("Can't allocate the reorder buffer"));
  reorder_top = reorder_bot + reorder_buf_size;
  wbuf_bot = (write_node *) calloc(write_buf_size + 1, sizeof(write_node));
  if (!wbuf_bot) panic(errprint0("Can't allocate the write buffer"));
  wbuf_top = wbuf_bot + write_buf_size;
  if (bp_n == 0) bp_table = NULL;
  else {   /* a branch prediction table is desired */
    if (bp_a + bp_b + bp_c >= 32)
      panic(errprint0("Configuration error: Branch table has >= 4 gigabytes of data"));
    bp_table = (char *) calloc(1 << (bp_a + bp_b + bp_c), sizeof(char));
    if (!bp_table) panic(errprint0("Can't allocate the branch table"));
  }
  l = (specnode *) calloc(lring_size, sizeof(specnode));
  if (!l) panic(errprint0("Can't allocate local registers"));
  j = bus_words;
  if (Icache && Icache->bb > j) j = Icache->bb;
  fetched = (octa *) calloc(j, sizeof(octa));
  if (!fetched) panic(errprint0("Can't allocate prefetch buffer"));
  dispatch_stat = (int *) calloc(dispatch_max + 1, sizeof(int));
  if (!dispatch_stat) panic(errprint0("Can't allocate dispatch counts"));
  no_hardware_PT = 1 - hardware_PT;
This code is used in section 38.

38. Putting it all together. Here then is the desired configuration subroutine.

  #include <stdio.h>    /* fopen, fgets, sscanf, rewind */
  #include <stdlib.h>   /* calloc, exit */
  #include <ctype.h>    /* isspace */
  #include <string.h>   /* strcpy, strlen, strcmp */
  #include <limits.h>   /* INT_MAX */
  #include "mmix-pipe.h"
  ⟨Type definitions 12⟩
  ⟨Global variables 9⟩
  ⟨Subroutines 10⟩
  void MMIX_config(filename)
    char *filename;
  {
    register int i, j, n;
    config_file = fopen(filename, "r");
    if (!config_file)
      panic(errprint1("Can't open configuration file %s", filename));
    ⟨Initialize to defaults 17⟩;
    ⟨Count and allocate the functional units 18⟩;
    ⟨Record all the specs 19⟩;
    ⟨Allocate coroutines in each functional unit 26⟩;
    ⟨Allocate the caches 35⟩;
    ⟨Allocate the scheduling queue 36⟩;
    ⟨Touch up last-minute trivia 37⟩;
  }

39. Names of the sections.

⟨Allocate coroutines in each functional unit 26⟩ Used in section 38.
⟨Allocate reader coroutines for cache c 34⟩ Used in section 31.
⟨Allocate the cache sets for cache c 32⟩ Used in section 31.
⟨Allocate the caches 35⟩ Used in section 38.
⟨Allocate the scheduling queue 36⟩ Used in section 38.
⟨Allocate the victim cache for cache c 33⟩ Used in section 31.
⟨Build table of pipeline stages needed for each opcode 27⟩ Used in section 26.
⟨Count and allocate the functional units 18⟩ Used in section 38.
⟨Determine the number of stages, n, needed by funit[j] 29⟩ Used in section 26.
⟨Global variables 9, 15, 28⟩ Used in section 38.
⟨If token is a cache name, process a cache spec 21⟩ Used in section 19.
⟨If token is a parameter name, process a PV spec 20⟩ Used in section 19.
⟨If token is an operation name, process a pipe spec 24⟩ Used in section 19.
⟨Initialize to defaults 17⟩ Used in section 38.
⟨Process a functional spec 25⟩ Used in section 19.
⟨Record all the specs 19⟩ Used in section 38.
⟨Subroutines 10, 11, 16, 22, 23, 30, 31⟩ Used in section 38.
⟨Touch up last-minute trivia 37⟩ Used in section 38.
⟨Type definitions 12, 13, 14⟩ Used in section 38.

MMIX-IO

D.E. Knuth: MMIXware, LNCS 1750, pp. 138-147, 1999. © Springer-Verlag Berlin Heidelberg 1999

1. Introduction. This program module contains brute-force implementations of the ten input/output primitives defined at the beginning of MMIX-SIM. The subroutines are grouped here as a separate package, because they are intended to be loaded with the pipeline simulator as well as with the simple simulator.

  ⟨Preprocessor macros 2⟩
  ⟨Type definitions 3⟩
  ⟨External subroutines 4⟩
  ⟨Global variables 6⟩
  ⟨Subroutines 7⟩

2. Of course we include standard C library routines, and we set things up to accommodate older versions of C.

⟨Preprocessor macros 2⟩ ≡
  #include <stdio.h>
  #include <stdlib.h>
  #ifdef __STDC__
  #define ARGS(list) list
  #else
  #define ARGS(list) ()
  #endif
  #ifndef FILENAME_MAX
  #define FILENAME_MAX 256
  #endif
  #ifndef SEEK_SET
  #define SEEK_SET 0
  #endif
  #ifndef SEEK_END
  #define SEEK_END 2
  #endif
This code is used in section 1.

3. The unsigned 32-bit type tetra must agree with its definition in the simulators.

⟨Type definitions 3⟩ ≡
  typedef unsigned int tetra;
  typedef struct { tetra h, l; } octa;   /* two tetrabytes makes one octabyte */
See also section 5.
This code is used in section 1.

4. Three basic subroutines are used to get strings from the simulated memory and to put strings into that memory. These subroutines are defined appropriately in each simulator. We also use a few subroutines and constants defined in MMIX-ARITH.

⟨External subroutines 4⟩ ≡
  extern char stdin_chr ARGS((void));
  extern int mmgetchars ARGS((char *buf, int size, octa addr, int stop));
  extern void mmputchars ARGS((unsigned char *buf, int size, octa addr));
  extern octa oplus ARGS((octa, octa));
  extern octa ominus ARGS((octa, octa));
  extern octa incr ARGS((octa, int));
  extern octa zero_octa;   /* zero_octa.h = zero_octa.l = 0 */
  extern octa neg_one;     /* neg_one.h = neg_one.l = -1 */
This code is used in section 1.

5. Each possible handle has a file pointer and a current mode.

⟨Type definitions 3⟩ +≡
  typedef struct {
    FILE *fp;   /* file pointer */
    int mode;   /* [read OK] + 2[write OK] + 4[binary] + 8[readwrite] */
  } sim_file_info;

6. ⟨Global variables 6⟩ ≡
  sim_file_info sfile[256];
See also sections 9 and 24.
This code is used in section 1.

7. The first three handles are initially open.

⟨Subroutines 7⟩ ≡
  void mmix_io_init ARGS((void));
  void mmix_io_init()
  {
    sfile[0].fp = stdin, sfile[0].mode = 1;
    sfile[1].fp = stdout, sfile[1].mode = 2;
    sfile[2].fp = stderr, sfile[2].mode = 2;
  }
See also sections 8, 10, 11, 12, 14, 16, 18, 19, 20, 21, 22, and 23.
This code is used in section 1.

8. The only tricky thing about these routines is that we want to protect the standard input, output, and error streams from being preempted.

⟨Subroutines 7⟩ +≡
  octa mmix_fopen ARGS((unsigned char, octa, octa));
  octa mmix_fopen(handle, name, mode)
    unsigned char handle;
    octa name, mode;
  {
    char name_buf[FILENAME_MAX];
    FILE *tmp;
    if (mode.h || mode.l > 4) goto abort;
    if (mmgetchars(name_buf, FILENAME_MAX, name, 0) == FILENAME_MAX) goto abort;
    if (sfile[handle].mode != 0 && handle > 2) fclose(sfile[handle].fp);
    sfile[handle].fp = fopen(name_buf, mode_string[mode.l]);
    if (!sfile[handle].fp) goto abort;
    sfile[handle].mode = mode_code[mode.l];
    return zero_octa;   /* success */
  abort:
    sfile[handle].mode = 0;
    return neg_one;     /* failure */
  }

9. ⟨Global variables 6⟩ +≡
  char *mode_string[] = {"r", "w", "rb", "wb", "w+b"};
  int mode_code[] = {0x1, 0x2, 0x5, 0x6, 0xf};

10. If the simulator is being used interactively, we can avoid competition for stdin by substituting another file.

⟨Subroutines 7⟩ +≡
  void mmix_fake_stdin ARGS((FILE *));
  void mmix_fake_stdin(f)
    FILE *f;
  {
    sfile[0].fp = f;   /* f should be open in mode "r" */
  }

11. ⟨Subroutines 7⟩ +≡
  octa mmix_fclose ARGS((unsigned char));
  octa mmix_fclose(handle)
    unsigned char handle;
  {
    if (sfile[handle].mode == 0) return neg_one;
    if (handle > 2 && fclose(sfile[handle].fp) != 0) return neg_one;
    sfile[handle].mode = 0;
    return zero_octa;   /* success */
  }
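The two parallel tables of section 9 pair each MMIX mode number with a C fopen mode string and with the bit flags described in section 5. A minimal sketch of that pairing follows; the enum constants, the _sketch table names, and the helper functions are hypothetical names introduced only for illustration.

```c
#include <stddef.h>

/* Names for the mode bits of section 5; the names themselves are assumptions. */
enum { MODE_READ = 0x1, MODE_WRITE = 0x2, MODE_BINARY = 0x4, MODE_READWRITE = 0x8 };

/* Copies of the tables in section 9, indexed by MMIX mode number 0..4. */
static const char *const mode_string_sketch[] = { "r", "w", "rb", "wb", "w+b" };
static const int mode_code_sketch[] = { 0x1, 0x2, 0x5, 0x6, 0xf };

/* Returns the fopen() mode string for an MMIX mode number, or NULL if the
   number is out of range; on success *code receives the matching bit flags. */
static const char *lookup_mode(unsigned mode, int *code)
{
    if (mode > 4) return NULL;
    *code = mode_code_sketch[mode];
    return mode_string_sketch[mode];
}

/* Handles 0, 1, and 2 are the standard streams, which are never really closed. */
static int may_close(unsigned char handle)
{
    return handle > 2;
}
```

Mode 2, for instance, maps to "rb" with flags MODE_READ | MODE_BINARY, and mode 4 maps to "w+b" with all four bits set. The binary bit matters later because mmix_fseek and mmix_ftell refuse to act on a handle without it.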

12. ⟨Subroutines 7⟩ +≡
  octa mmix_fread ARGS((unsigned char, octa, octa));
  octa mmix_fread(handle, buffer, size)
    unsigned char handle;
    octa buffer, size;
  {
    register unsigned char *buf;
    register int n;
    octa o;
    o = neg_one;
    if (!(sfile[handle].mode & 0x1)) goto done;
    if (sfile[handle].mode & 0x8) sfile[handle].mode &= ~0x2;
    if (size.h) goto done;
    buf = (unsigned char *) calloc(size.l, sizeof(char));
    if (!buf) goto done;
    ⟨Read n ≤ size.l characters into buf 13⟩;
    mmputchars(buf, n, buffer);
    free(buf);
    o.h = 0, o.l = n;
  done:
    return ominus(o, size);
  }

13. ⟨Read n ≤ size.l characters into buf 13⟩ ≡
  if (sfile[handle].fp == stdin) {
    register unsigned char *p;
    for (p = buf, n = size.l; p < buf + n; p++) *p = stdin_chr();
  } else {
    clearerr(sfile[handle].fp);
    n = fread(buf, 1, size.l, sfile[handle].fp);
    if (ferror(sfile[handle].fp)) goto done;
  }
This code is used in section 12.

14. ⟨Subroutines 7⟩ +≡
  octa mmix_fgets ARGS((unsigned char, octa, octa));
  octa mmix_fgets(handle, buffer, size)
    unsigned char handle;
    octa buffer, size;
  {
    char buf[256];
    register int n, s;
    register char *p;
    octa o;
    int eof = 0;
    if (!(sfile[handle].mode & 0x1)) return neg_one;
    if (!size.l && !size.h) return neg_one;
    if (sfile[handle].mode & 0x8) sfile[handle].mode &= ~0x2;
    size = incr(size, -1);
    o = zero_octa;
    while (1) {
      ⟨Read n < 256 characters into buf 15⟩;
      mmputchars(buf, n + 1, buffer);
      o = incr(o, n);
      size = incr(size, -n);
      if ((n && buf[n - 1] == '\n') || (!size.l && !size.h) || eof) return o;
      buffer = incr(buffer, n);
    }
  }

15. ⟨Read n < 256 characters into buf 15⟩ ≡
  s = 255;
  if (size.l < s && !size.h) s = size.l;
  if (sfile[handle].fp == stdin)
    for (p = buf, n = 0; n < s; ) {
      *p = stdin_chr();
      n++;
      if (*p++ == '\n') break;
    }
  else {
    if (!fgets(buf, s + 1, sfile[handle].fp)) return neg_one;
    eof = feof(sfile[handle].fp);
    for (p = buf, n = 0; n < s; ) {
      if (!*p && eof) break;
      n++;
      if (*p++ == '\n') break;
    }
  }
  *p = '\0';
This code is used in section 14.

16. The routines that deal with wyde characters might need to be changed on a system that is little-endian; the author wishes good luck to whoever has to do this. MMIX is always big-endian, but external files prepared on random operating systems might be backwards.

⟨Subroutines 7⟩ +≡
  octa mmix_fgetws ARGS((unsigned char, octa, octa));
  octa mmix_fgetws(handle, buffer, size)
    unsigned char handle;
    octa buffer, size;
  {
    char buf[256];
    register int n, s;
    register char *p;
    octa o;
    int eof = 0;   /* initialized so that the test below is well defined */
    if (!(sfile[handle].mode & 0x1)) return neg_one;
    if (!size.l && !size.h) return neg_one;
    if (sfile[handle].mode & 0x8) sfile[handle].mode &= ~0x2;
    buffer.l &= -2;
    size = incr(size, -1);
    o = zero_octa;
    while (1) {
      ⟨Read n < 128 wyde characters into buf 17⟩;
      mmputchars(buf, 2 * n + 2, buffer);
      o = incr(o, n);
      size = incr(size, -n);
      if ((n && buf[2 * n - 1] == '\n' && buf[2 * n - 2] == 0)
          || (!size.l && !size.h) || eof) return o;
      buffer = incr(buffer, 2 * n);
    }
  }

17. ⟨Read n < 128 wyde characters into buf 17⟩ ≡
  s = 127;
  if (size.l < s && !size.h) s = size.l;
  if (sfile[handle].fp == stdin)
    for (p = buf, n = 0; n < s; ) {
      *p++ = stdin_chr(); *p++ = stdin_chr();
      n++;
      if (*(p - 1) == '\n' && *(p - 2) == 0) break;
    }
  else for (p = buf, n = 0; n < s; ) {
    if (fread(p, 1, 2, sfile[handle].fp) != 2) {
      eof = feof(sfile[handle].fp);
      if (!eof) return neg_one;
      break;
    }
    n++, p += 2;
    if (*(p - 1) == '\n' && *(p - 2) == 0) break;
  }
  *p = *(p + 1) = '\0';
This code is used in section 16.

18. ⟨Subroutines 7⟩ +≡
  octa mmix_fwrite ARGS((unsigned char, octa, octa));
  octa mmix_fwrite(handle, buffer, size)
    unsigned char handle;
    octa buffer, size;
  {
    char buf[256];
    register int n;
    if (!(sfile[handle].mode & 0x2)) return ominus(zero_octa, size);
    if (sfile[handle].mode & 0x8) sfile[handle].mode &= ~0x1;
    while (1) {
      if (size.h || size.l >= 256) n = mmgetchars(buf, 256, buffer, -1);
      else n = mmgetchars(buf, size.l, buffer, -1);
      size = incr(size, -n);
      if (fwrite(buf, 1, n, sfile[handle].fp) != n) return ominus(zero_octa, size);
      fflush(sfile[handle].fp);
      if (!size.l && !size.h) return zero_octa;
      buffer = incr(buffer, n);
    }
  }

19. ⟨Subroutines 7⟩ +≡
  octa mmix_fputs ARGS((unsigned char, octa));
  octa mmix_fputs(handle, string)
    unsigned char handle;
    octa string;
  {
    char buf[256];
    register int n;
    octa o;
    o = zero_octa;
    if (!(sfile[handle].mode & 0x2)) return neg_one;
    if (sfile[handle].mode & 0x8) sfile[handle].mode &= ~0x1;
    while (1) {
      n = mmgetchars(buf, 256, string, 0);
      if (fwrite(buf, 1, n, sfile[handle].fp) != n) return neg_one;
      o = incr(o, n);
      if (n < 256) {
        fflush(sfile[handle].fp);
        return o;
      }
      string = incr(string, n);
    }
  }

20. ⟨Subroutines 7⟩ +≡
  octa mmix_fputws ARGS((unsigned char, octa));
  octa mmix_fputws(handle, string)
    unsigned char handle;
    octa string;
  {
    char buf[256];
    register int n;
    octa o;
    o = zero_octa;
    if (!(sfile[handle].mode & 0x2)) return neg_one;
    while (1) {
      n = mmgetchars(buf, 256, string, 1);
      if (fwrite(buf, 1, n, sfile[handle].fp) != n) return neg_one;
      o = incr(o, n >> 1);
      if (n < 256) {
        fflush(sfile[handle].fp);
        return o;
      }
      string = incr(string, n);
    }
  }
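The remark in section 16 about backwards files concerns byte order within each wyde. The sketch below shows one way to treat wydes byte by byte, so that the result does not depend on the host's own byte order, and to repair a buffer whose wydes were written low byte first. All four function names are hypothetical; nothing here is part of MMIX-IO.

```c
/* Assemble a 16-bit wyde from two bytes stored most significant byte first. */
static unsigned load_wyde_be(const unsigned char *p)
{
    return ((unsigned)p[0] << 8) | (unsigned)p[1];
}

/* Store the low 16 bits of w as two bytes, most significant byte first. */
static void store_wyde_be(unsigned char *p, unsigned w)
{
    p[0] = (unsigned char)((w >> 8) & 0xff);
    p[1] = (unsigned char)(w & 0xff);
}

/* Exchange the two bytes of each of the n wydes in buf, in place; this
   converts a buffer written low byte first into big-endian order. */
static void swap_wydes(unsigned char *buf, int n)
{
    int i;
    for (i = 0; i < n; i++) {
        unsigned char t = buf[2 * i];
        buf[2 * i] = buf[2 * i + 1];
        buf[2 * i + 1] = t;
    }
}

/* Number of wydes up to and including the first wyde newline (0x000a),
   or n if the first n wydes contain no newline. */
static int wyde_line_length(const unsigned char *buf, int n)
{
    int i;
    for (i = 0; i < n; i++)
        if (load_wyde_be(buf + 2 * i) == 0x000a) return i + 1;
    return n;
}
```

The newline test corresponds to the condition in section 17, where a line ends when the second byte of a wyde is '\n' and the first byte is zero.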

21. #define sign_bit ((unsigned) 0x80000000)

⟨Subroutines 7⟩ +≡
  octa mmix_fseek ARGS((unsigned char, octa));
  octa mmix_fseek(handle, offset)
    unsigned char handle;
    octa offset;
  {
    if (!(sfile[handle].mode & 0x4)) return neg_one;
    if (sfile[handle].mode & 0x8) sfile[handle].mode = 0xf;
    if (offset.h & sign_bit) {
      if (offset.h != 0xffffffff || !(offset.l & sign_bit)) return neg_one;
      if (fseek(sfile[handle].fp, (int) offset.l + 1, SEEK_END) != 0) return neg_one;
    } else {
      if (offset.h || (offset.l & sign_bit)) return neg_one;
      if (fseek(sfile[handle].fp, (int) offset.l, SEEK_SET) != 0) return neg_one;
    }
    return zero_octa;
  }

22. ⟨Subroutines 7⟩ +≡
  octa mmix_ftell ARGS((unsigned char));
  octa mmix_ftell(handle)
    unsigned char handle;
  {
    register long x;
    octa o;
    if (!(sfile[handle].mode & 0x4)) return neg_one;
    x = ftell(sfile[handle].fp);
    if (x < 0) return neg_one;
    o.h = 0, o.l = x;
    return o;
  }

23. One last subroutine belongs here, just in case the user has modified the standard error handle.

⟨Subroutines 7⟩ +≡
  void print_trip_warning ARGS((int, octa));
  void print_trip_warning(n, loc)
    int n;
    octa loc;
  {
    if (sfile[2].mode & 0x2)
      fprintf(sfile[2].fp, "Warning: %s at location %08x%08x\n",
              trip_warning[n], loc.h, loc.l);
  }

24. ⟨Global variables 6⟩ +≡
  char *trip_warning[] = {
    "TRIP",
    "integer divide check",
    "integer overflow",
    "float-to-fix overflow",
    "invalid floating point operation",
    "floating point overflow",
    "floating point underflow",
    "floating point division by zero",
    "floating point inexact"};

25. Names of the sections.

⟨External subroutines 4⟩ Used in section 1.
⟨Global variables 6, 9, 24⟩ Used in section 1.
⟨Preprocessor macros 2⟩ Used in section 1.
⟨Read n < 128 wyde characters into buf 17⟩ Used in section 16.
⟨Read n < 256 characters into buf 15⟩ Used in section 14.
⟨Read n ≤ size.l characters into buf 13⟩ Used in section 12.
⟨Subroutines 7, 8, 10, 11, 12, 14, 16, 18, 19, 20, 21, 22, 23⟩ Used in section 1.
⟨Type definitions 3, 5⟩ Used in section 1.
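The offset convention used by mmix_fseek in section 21 (a nonnegative offset counts from the beginning of the file, while a negative offset k means k + 1 bytes relative to the end) can be sketched separately from the file machinery. In the sketch below the octa_sketch type, SIGN_BIT, and decode_seek are hypothetical names; the arithmetic avoids casting a large unsigned value to int, which is why it looks slightly different from the original.

```c
#include <stdio.h>
#include <stdint.h>

typedef struct { uint32_t h, l; } octa_sketch;   /* high and low tetrabytes */

#define SIGN_BIT 0x80000000u

/* Translate a 64-bit signed offset, given as two tetrabytes, into an origin
   and position for the C library's fseek. Returns 0 on success, or -1 if
   the offset does not fit in 32 signed bits. */
static int decode_seek(octa_sketch offset, int *whence, long *pos)
{
    if (offset.h & SIGN_BIT) {   /* negative offset: measured from the end */
        if (offset.h != 0xffffffffu || !(offset.l & SIGN_BIT)) return -1;
        *whence = SEEK_END;
        /* offset + 1, where offset = -(0xffffffff - l) - 1 */
        *pos = -(long)(0xffffffffu - offset.l);
    } else {                     /* nonnegative offset: from the beginning */
        if (offset.h || (offset.l & SIGN_BIT)) return -1;
        *whence = SEEK_SET;
        *pos = (long)offset.l;
    }
    return 0;
}
```

An offset of -1 therefore positions the file exactly at its end, and an offset of -5 positions it four bytes before the end.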

MMIX-MEM

D.E. Knuth: MMIXware, LNCS 1750, pp. 148-149, 1999. © Springer-Verlag Berlin Heidelberg 1999

1. Memory-mapped input and output. This module supplies procedures for reading and writing MMIX memory addresses that exceed 48 bits. Such addresses are used by the operating system for input and output, so they require special treatment. At present only dummy versions of these routines are implemented. Users who need nontrivial versions of spec_read and/or spec_write should prepare their own and link them with the rest of the simulator.

  #include <stdio.h>
  #include "mmix-pipe.h"     /* header file for all modules */
  extern octa read_hex();    /* found in the main program module */
  static char buf[20];

2. If the interactive_read_bit of the verbose control is set, the user is supposed to supply values dynamically. Otherwise zero is read.

  octa spec_read ARGS((octa));
  octa spec_read(addr)
    octa addr;
  {
    octa val;
    if (verbose & interactive_read_bit) {
      printf("** Read from loc %08x%08x: ", addr.h, addr.l);
      fgets(buf, 20, stdin);
      val = read_hex(buf);
    } else val.l = val.h = 0;
    if (verbose & show_spec_bit)
      printf("   (spec_read %08x%08x from %08x%08x at time %d)\n",
             val.h, val.l, addr.h, addr.l, ticks.l);
    return val;
  }

3. The default spec_write just reports its arguments, without actually writing anything.

  void spec_write ARGS((octa, octa));
  void spec_write(addr, val)
    octa addr, val;
  {
    if (verbose & show_spec_bit)
      printf("   (spec_write %08x%08x to %08x%08x at time %d)\n",
             val.h, val.l, addr.h, addr.l, ticks.l);
  }
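A user who follows the advice of section 1 and writes nontrivial replacements might start from something like the sketch below. It is purely illustrative and self-contained: it does not include mmix-pipe.h, and the octa_sketch type, the four-register "device", and every function name are assumptions made for the example rather than anything defined by the simulator.

```c
#include <stdint.h>

typedef struct { uint32_t h, l; } octa_sketch;   /* high and low tetrabytes */

#define NUM_PORTS 4   /* hypothetical device with four octabyte registers */
static octa_sketch port[NUM_PORTS];   /* zero-initialized by the C runtime */

/* An address exceeds 48 bits when any of its top 16 bits is nonzero. */
static int is_special(octa_sketch addr)
{
    return (addr.h >> 16) != 0;
}

/* Select a register from address bits 3 and 4 (one octabyte per register). */
static unsigned port_index(octa_sketch addr)
{
    return (addr.l >> 3) & (NUM_PORTS - 1);
}

/* Reads return whatever was last written to the selected register. */
static octa_sketch sketch_spec_read(octa_sketch addr)
{
    return port[port_index(addr)];
}

static void sketch_spec_write(octa_sketch addr, octa_sketch val)
{
    port[port_index(addr)] = val;
}
```

Real replacements would keep the names and K&R-style declarations of spec_read and spec_write shown above so that the rest of the simulator links against them unchanged.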

MMIX-PIPE

D.E. Knuth: MMIXware, LNCS 1750, pp. 150-331, 1999. © Springer-Verlag Berlin Heidelberg 1999

1. Introduction. This program is the heart of the meta-simulator for the ultraconfigurable MMIX pipeline: It defines the MMIX_run routine, which does most of the work. Another routine, MMIX_init, is also defined here, and so is a header file called mmix-pipe.h. The header file is used by the main routine and by other routines like MMIX_config, which are compiled separately.

Readers of this program should be familiar with the explanation of MMIX architecture as presented in the main program module for MMMIX.

A lot of subtle things can happen when instructions are executed in parallel. Therefore this simulator ranks among the most interesting and instructive programs in the author's experience. The author has tried his best to make everything correct ... but the chances for error are great. Anyone who discovers a bug is therefore urged to report it as soon as possible to knuth-bug@cs.stanford.edu; then the program will be as useful as possible. Rewards will be paid to bug-finders! (Except for bugs in version 0.)

It sort of boggles the mind when one realizes that the present program might someday be translated by a C compiler for MMIX and used to simulate itself.

2. This high-performance prototype of MMIX achieves its efficiency by means of "pipelining," a technique of overlapping that is explained for the related DLX computer in Chapter 3 of Hennessy & Patterson's book Computer Architecture (second edition). Other techniques such as "dynamic scheduling" and "multiple issue," explained in Chapter 4 of that book, are used too.

One good way to visualize the procedure is to imagine that somebody has organized a high-tech car repair shop according to similar principles. There are eight independent functional units, which we can think of as eight groups of auto mechanics, each specializing in a particular task; each group has its own workspace with room to deal with one car at a time. Group F (the "fetch" group) is in charge of rounding up customers and getting them to enter the assembly-line garage in an orderly fashion. Group D (the "decode and dispatch" group) does the initial vehicle inspection and writes up an order that explains what kind of servicing is required. The vehicles go next to one of the four "execution" groups: Group X handles routine maintenance, while groups XF, XM, and XD are specialists in more complex tasks that tend to take longer. (The XF people are good at floating the points, while the XM and XD groups are experts in multilink suspensions and differentials.) When the relevant X group has finished its work, cars drive to M station, where they send or receive messages and possibly pay money to members of the "memory" group. Finally all necessary parts are installed by members of group W, the "write" group, and the car leaves the shop. Everything is tightly organized so that in most cases the cars move in synchronized fashion from station to station, at regular 100-nanocentury intervals.

In a similar way, most MMIX instructions can be handled in a five-stage pipeline, F-D-X-M-W, with X replaced by XF for floating-point addition or conversion, or by XM for multiplication, or by XD for division or square root. Each stage ideally takes one clock cycle, although XF, XM, and (especially) XD are slower. If the instructions enter in a suitable pattern, we might see one instruction being fetched, another being decoded, and up to four being executed, while another is accessing memory, and yet another is finishing up by writing new information into registers; all this is going on simultaneously during one clock cycle. Pipelining with eight separate stages might therefore make the machine run up to 8 times as fast as it could if each instruction were being dealt with individually and without overlap. (Well, perfect speedup turns out to be impossible, because of the shared M and W stages; the theory of knapsack programming, to be discussed in Section 7.7 of The Art of Computer Programming, tells us that the maximal achievable speedup is at most 8 − 1/p − 1/q − 1/r when XF, XM, and XD have delays bounded by p, q, and r cycles. But we can achieve a factor of more than 7 if we are very lucky.)

Consider, for example, the ADD instruction. This instruction enters the computer's processing unit in F stage, taking only one clock cycle if it is in the cache of instructions recently seen. Then the D stage recognizes the command as an ADD and acquires the current values of $Y and $Z; meanwhile, of course, another instruction is being fetched by F. On the next clock cycle, the X stage adds the values together. This prepares the way for the M stage to watch for overflow and to get ready for any exceptional action that might be needed with respect to the settings of special register rA. Finally, on the fifth clock cycle, the sum is either written into $X or the trip handler for integer overflow is invoked. Although this process has taken five clock cycles (that is, 5υ), the net increase in running time has been only 1υ.

Of course congestion can occur, inside a computer as in a repair shop. For example, auto parts might not be readily available; or a car might have to sit in D station while waiting to move to XM, thereby blocking somebody else from moving from F to D. Sometimes there won't necessarily be a steady stream of customers. In such cases the employees in some parts of the shop will occasionally be idle. But we assume that they always do their jobs as fast as possible, given the sequence of customers that they encounter. With a clever person setting up appointments (translation: with a clever programmer and/or compiler arranging MMIX instructions), the organization can often be expected to run at nearly peak capacity.

In fact, this program is designed for experiments with many kinds of pipelines, potentially using additional functional units (such as several independent X groups), and potentially fetching, dispatching, and executing several nonconflicting instructions simultaneously. Such complications make this program more difficult than a simple pipeline simulator would be, but they also make it a lot more instructive because we can get a better understanding of the issues involved if we are required to treat them in greater generality.

MMIX con.

g : void ( ). . MMIX init : void ( ). x10. x10. MMIX-CONFIG x38. MMIX run : void ( ).
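The arithmetic behind the pipelining claims of section 2 can be checked with a small sketch. It is not part of MMIX-PIPE: the function names `ideal_pipeline_cycles` and `speedup_bound` are invented for this illustration, and the first function assumes an idealized pipeline in which one instruction enters per cycle and nothing ever stalls.

```c
/* Illustrative only; these helpers do not appear in MMIX-PIPE. */

/* Cycles needed by an ideal k-stage pipeline to complete n instructions,
   assuming one instruction enters per cycle and no stage ever stalls. */
long ideal_pipeline_cycles(long n, int k)
{
  return n > 0 ? n + k - 1 : 0;
}

/* The upper bound 8 - 1/p - 1/q - 1/r quoted in the text, where p, q, r
   bound the delays (in cycles) of the XF, XM, and XD units. */
double speedup_bound(int p, int q, int r)
{
  return 8.0 - 1.0 / p - 1.0 / q - 1.0 / r;
}
```

With k = 5, a single ADD occupies five cycles, yet each additional instruction adds only one cycle to the total, which is the "net increase of 1υ" mentioned above. The bound exceeds 7 only when 1/p + 1/q + 1/r is less than 1.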

3. Here's the overall structure of the present program module.

    #include <stdio.h>
    #include <stdlib.h>
    #include <math.h>
    #include "abstime.h"
    ⟨Header definitions 6⟩
    ⟨Type definitions 11⟩
    ⟨Global variables 20⟩
    ⟨External variables 4⟩
    ⟨Internal prototypes 13⟩
    ⟨External prototypes 9⟩
    ⟨Subroutines 14⟩
    ⟨External routines 10⟩

4. The identifier Extern is used in MMIX-PIPE to declare variables that are accessed in other modules. Actually all appearances of 'Extern' are defined to be blank here, but 'Extern' will become 'extern' in the header file.

    #define Extern    /* blank for us, extern for them */
    format Extern extern

⟨External variables 4⟩ ≡
    Extern int verbose;    /* controls the level of diagnostic output */
See also sections 29, 59, 60, 66, 69, 77, 86, 98, 115, 136, 150, 168, 207, 211, 214, 242, 247, 284, and 349.
This code is used in sections 3 and 5.

5. The header file repeats the basic definitions and declarations.

⟨mmix-pipe.h 5⟩ ≡
    #define Extern extern
    ⟨Header definitions 6⟩
    ⟨Type definitions 11⟩
    ⟨External variables 4⟩
    ⟨External prototypes 9⟩

6. Subroutines of this program are declared first with a prototype, as in ANSI C, then with an old-style C function definition. The following preprocessor commands make this work correctly with both new-style and old-style compilers.

⟨Header definitions 6⟩ ≡
    #ifdef __STDC__
    #define ARGS(list) list
    #else
    #define ARGS(list) ()
    #endif
See also sections 7, 8, 52, 57, 87, 129, and 166.
This code is used in sections 3 and 5.
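To see how the ARGS convention of section 6 behaves, here is a minimal sketch. The macro is copied from the text; the function `add_two` is hypothetical and exists only for the demonstration. The definition below uses the ANSI style rather than the old-style definitions of MMIX-PIPE, on the assumption that a current compiler may no longer accept old-style definitions.

```c
/* The macro from section 6: with a standard compiler the parenthesized
   parameter list survives; with a pre-ANSI compiler it collapses to (). */
#ifdef __STDC__
#define ARGS(list) list
#else
#define ARGS(list) ()
#endif

/* Hypothetical example function (not in MMIX-PIPE). Note the doubled
   parentheses: the inner pair makes the whole list a single macro argument. */
int add_two ARGS((int a, int b));

int add_two(int a, int b)
{
  return a + b;
}
```

Under a standard compiler the declaration expands to `int add_two (int a, int b);`, so calls are type-checked against a full prototype.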

7. Some of the names that are natural for this program are in conflict with library names on at least one of the host computers in the author's tests. So we bypass the library names here.

⟨Header definitions 6⟩ +≡
    #define random my_random
    #define fsqrt my_fsqrt
    #define div my_div

8. The amount of verbosity depends on the following bit codes.

⟨Header definitions 6⟩ +≡
    #define issue_bit (1 << 0)    /* show control blocks when issued, deissued, committed */
    #define pipe_bit (1 << 1)    /* show the pipeline and locks on every cycle */
    #define coroutine_bit (1 << 2)    /* show the coroutines when started on every cycle */
    #define schedule_bit (1 << 3)    /* show the coroutines when scheduled */
    #define uninit_mem_bit (1 << 4)    /* complain when reading from an uninitialized chunk of memory */
    #define interactive_read_bit (1 << 5)    /* prompt user when reading from I/O location */
    #define show_spec_bit (1 << 6)    /* display special read/write transactions as they happen */
    #define show_pred_bit (1 << 7)    /* display branch prediction details */
    #define show_wholecache_bit (1 << 8)    /* display cache blocks even when their key tag is invalid */

9. The MMIX_init() routine should be called exactly once, after MMIX_config() has done its work but before the simulator starts to execute any programs. Then MMIX_run() can be called as often as the user likes.

⟨External prototypes 9⟩ ≡
    Extern void MMIX_init ARGS((void));
    Extern void MMIX_run ARGS((int cycs, octa breakpoint));
See also sections 38, 161, 175, 178, 180, 209, 212, and 252.
This code is used in sections 3 and 5.

10. ⟨External routines 10⟩ ≡
    void MMIX_init()
    {
      register int i, j;
      ⟨Initialize everything 22⟩;
    }

    void MMIX_run(cycs, breakpoint)
      int cycs;
      octa breakpoint;
    {
      ⟨Local variables 12⟩;
      while (cycs) {
        if (verbose & (issue_bit | pipe_bit | coroutine_bit | schedule_bit))
          printf("*** Cycle %d\n", ticks.l);
        ⟨Perform one machine cycle 64⟩;
        if (verbose & pipe_bit) {
          print_pipe(); print_locks();
        }
        if (breakpoint_hit || halted) {
          if (breakpoint_hit)
            printf("Breakpoint instruction fetched at time %d\n", ticks.l - 1);
          if (halted) printf("Halted at time %d\n", ticks.l - 1);
          break;
        }
        cycs--;
      }
     cease: ;
    }
See also sections 39, 162, 176, 179, 181, 210, 213, and 253.
This code is used in section 3.

11. ⟨Type definitions 11⟩ ≡
    typedef enum { false, true, wow } bool;    /* slightly extended booleans */
See also sections 17, 23, 37, 40, 44, 68, 76, 164, 167, 206, 246, and 371.
This code is used in sections 3 and 5.

12. ⟨Local variables 12⟩ ≡
    register int i, j, m;
    bool breakpoint_hit = false;
    bool halted = false;
See also sections 124 and 258.
This code is used in section 10.
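The verbosity codes of section 8 are independent bits, so a caller combines them with bitwise OR and the simulator tests them with bitwise AND, as MMIX_run does before printing its "*** Cycle" banner. The sketch below copies four of the bit definitions from the text; the helper `cycle_banner_wanted` is a hypothetical name introduced here to isolate that test.

```c
/* Bit codes as given in section 8 of the text. */
#define issue_bit     (1 << 0)
#define pipe_bit      (1 << 1)
#define coroutine_bit (1 << 2)
#define schedule_bit  (1 << 3)
#define show_pred_bit (1 << 7)

/* Hypothetical helper: the same condition MMIX_run evaluates before it
   prints the cycle banner. Returns 1 if any of the four bits is set. */
int cycle_banner_wanted(int verbose)
{
  return (verbose & (issue_bit | pipe_bit | coroutine_bit | schedule_bit)) != 0;
}
```

For instance, a setting of `pipe_bit | show_pred_bit` requests both the per-cycle pipeline display and branch prediction details, and it also triggers the banner because `pipe_bit` is among the four tested bits.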

13. Error messages that abort this program are called panic messages. The macro called confusion will never be needed unless this program is internally inconsistent.

    #define errprint0(f) fprintf(stderr, f)
    #define errprint1(f, a) fprintf(stderr, f, a)
    #define errprint2(f, a, b) fprintf(stderr, f, a, b)
    #define panic(x) { errprint0("Panic: "); x; errprint0("!\n"); expire(); }
    #define confusion(m) errprint1("This can't happen: %s", m)

⟨Internal prototypes 13⟩ ≡
    static void expire ARGS((void));
See also sections 18, 24, 27, 30, 32, 34, 42, 45, 55, 62, 72, 90, 92, 94, 96, 156, 158, 169, 171, 173, 182, 184, 186, 188, 190, 192, 195, 198, 200, 202, 204, 240, 250, 254, and 377.
This code is used in section 3.

14. ⟨Subroutines 14⟩ ≡
    static void expire()    /* the last gasp before dying */
    {
      if (ticks.h) errprint2("(Clock time is %dH+%d.)\n", ticks.h, ticks.l);
      else errprint1("(Clock time is %d.)\n", ticks.l);
      exit(-2);
    }
See also sections 19, 21, 25, 28, 31, 33, 35, 43, 46, 56, 63, 73, 91, 93, 95, 97, 157, 159, 170, 172, 174, 183, 185, 187, 189, 191, 193, 196, 199, 201, 203, 205, 208, 241, 251, 255, 378, 379, 381, 384, and 387.
This code is used in section 3.

15. The data structures of this program are not precisely equivalent to logical gates that could be implemented directly in silicon; we will use data structures and algorithms appropriate to the C programming language. For example, we'll use pointers and arrays, instead of buses and ports and latches. However, the net effect of our data structures and algorithms is intended to be equivalent to the net effect of a silicon implementation. The methods used below are essentially equivalent to those used in real machines today, except that diagnostic facilities are added so that we can readily watch what is happening.

Each functional unit in the MMIX pipeline is programmed here as a coroutine in C. At every clock cycle, we will call on each active coroutine to do one phase of its operation; in terms of the repair-station analogy described in the main program, this corresponds to getting each group of auto mechanics to do one unit of operation on a car. The coroutines are performed sequentially, although a real pipeline would have them act in parallel. We will not "cheat" by letting one coroutine access a value early in its cycle that another one computes late in its cycle, unless computer hardware could "cheat" in an equivalent way.

16. Low-level routines. Where should we begin? It is tempting to start with a global view of the simulator and then to break it down into component parts. But that task is too daunting, because there are so many unknowns about what basic ingredients ought to be combined when we construct the larger components. So let us look first at the primitive operations on which the superstructure will be built. Once we have created some infrastructure, we'll be able to proceed with confidence to the larger tasks ahead.

17. This program for the 64-bit MMIX architecture is based on 32-bit integer arithmetic, because nearly every computer available to the author at the time of writing (1998-1999) was limited in that way. Details of the basic arithmetic appear in a separate program module called MMIX-ARITH, because the same routines are needed also for the assembler and for the non-pipelined simulator. The definition of type tetra should be changed, if necessary, to conform with the definitions found there.

⟨Type definitions 11⟩ +≡
    typedef unsigned int tetra;    /* for systems conforming to the LP-64 data model */
    typedef struct { tetra h, l; } octa;    /* two tetrabytes makes one octabyte */

18. ⟨Internal prototypes 13⟩ +≡
    static void print_octa ARGS((octa));

19. ⟨Subroutines 14⟩ +≡
    static void print_octa(o)
      octa o;
    {
      if (o.h) printf("%x%08x", o.h, o.l);
      else printf("%x", o.l);
    }

20. ⟨Global variables 20⟩ ≡
    extern octa zero_octa;    /* zero_octa.h = zero_octa.l = 0 */
    extern octa neg_one;    /* neg_one.h = neg_one.l = −1 */
    extern octa aux;    /* auxiliary output of a subroutine */
    extern bool overflow;    /* set by certain subroutines for signed arithmetic */
    extern int exceptions;    /* bits set by floating point operations */
    extern int cur_round;    /* the current rounding mode */
See also sections 36, 41, 48, 50, 51, 53, 54, 65, 70, 78, 83, 88, 99, 107, 127, 148, 154, 194, 230, 235, 238, 248, 285, 303, 305, 315, 374, 376, and 388.
This code is used in section 3.

21. Most of the subroutines in MMIX-ARITH return an octabyte as a function of two octabytes; for example, oplus(y, z) returns the sum of octabytes y and z. Multiplication returns the high half of a product in the global variable aux; division returns the remainder in aux.

⟨Subroutines 14⟩ +≡
    extern octa oplus ARGS((octa y, octa z));    /* unsigned y + z */
    extern octa ominus ARGS((octa y, octa z));    /* unsigned y − z */
    extern octa incr ARGS((octa y, int delta));    /* unsigned y + δ (δ is signed) */
    extern octa oand ARGS((octa y, octa z));    /* y ∧ z */
    extern octa oandn ARGS((octa y, octa z));    /* y ∧ ~z */
    extern octa shift_left ARGS((octa y, int s));    /* y << s, 0 ≤ s ≤ 64 */
    extern octa shift_right ARGS((octa y, int s, int uns));    /* y >> s, signed if !uns */
    extern octa omult ARGS((octa y, octa z));    /* unsigned (aux, x) = y × z */
    extern octa signed_omult ARGS((octa y, octa z));    /* signed x = y × z, setting overflow */
    extern octa odiv ARGS((octa x, octa y, octa z));    /* unsigned (x, y)/z; aux = (x, y) mod z */
    extern octa signed_odiv ARGS((octa y, octa z));    /* signed y/z, when z ≠ 0; aux = y mod z */
    extern int count_bits ARGS((tetra z));    /* x = ν(z) */
    extern tetra byte_diff ARGS((tetra y, tetra z));    /* half of BDIF */
    extern tetra wyde_diff ARGS((tetra y, tetra z));    /* half of WDIF */
    extern octa bool_mult ARGS((octa y, octa z, bool xor));    /* MOR or MXOR */
    extern octa load_sf ARGS((tetra z));    /* load short float */
    extern tetra store_sf ARGS((octa x));    /* store short float */
    extern octa fplus ARGS((octa y, octa z));    /* floating point x = y ⊕ z */
    extern octa fmult ARGS((octa y, octa z));    /* floating point x = y ⊗ z */
    extern octa fdivide ARGS((octa y, octa z));    /* floating point x = y ⊘ z */
    extern octa froot ARGS((octa, int));    /* floating point x = √z */
    extern octa fremstep ARGS((octa y, octa z, int delta));    /* floating point x rem z = y rem z */
    extern octa fintegerize ARGS((octa z, int mode));    /* floating point x = round(z) */
    extern int fcomp ARGS((octa y, octa z));    /* −1, 0, 1, or 2 if y < z, y = z, y > z, y ∥ z */
    extern int fepscomp ARGS((octa y, octa z, octa eps, int sim));    /* x = sim ? [y ∼ z (ε)] : [y ≈ z (ε)] */
    extern octa floatit ARGS((octa z, int mode, int unsgnd, int shrt));    /* fix to float */
    extern octa fixit ARGS((octa z, int mode));    /* float to fix */

22. We had better check that our 32-bit assumption holds.

⟨Initialize everything 22⟩ ≡
    if (shift_left(neg_one, 1).h != 0xffffffff)
      panic(errprint0("Incorrect implementation of type tetra"));
See also sections 26, 61, 71, 79, 89, 116, 128, 153, 231, 236, 249, and 286.
This code is used in section 10.
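The two-tetrabyte representation of section 17 can be illustrated with a sketch of 64-bit addition carried out on 32-bit halves. This is not the MMIX-ARITH code: `sketch_oplus` and `sketch_format_octa` are names invented here, and the sketch assumes `uint32_t` from `<stdint.h>` in place of the text's `unsigned int`, so that the 32-bit property checked in section 22 holds by construction.

```c
#include <stdint.h>
#include <stdio.h>

typedef uint32_t tetra;               /* assumed exactly 32 bits wide */
typedef struct { tetra h, l; } octa;  /* high and low halves, as in section 17 */

/* Unsigned 64-bit sum y + z on 32-bit halves (illustrative stand-in for oplus). */
octa sketch_oplus(octa y, octa z)
{
  octa x;
  x.l = y.l + z.l;                /* wraps modulo 2^32 */
  x.h = y.h + z.h + (x.l < y.l);  /* a wrapped low half means a carry */
  return x;
}

/* Formats an octabyte the way print_octa does, but into a buffer:
   the low half is zero-padded only when the high half is nonzero.
   Returns the number of characters written (as snprintf does). */
int sketch_format_octa(char *buf, size_t n, octa o)
{
  if (o.h) return snprintf(buf, n, "%x%08x", (unsigned) o.h, (unsigned) o.l);
  return snprintf(buf, n, "%x", (unsigned) o.l);
}
```

Adding 1 to the octabyte with h = 0 and l = 0xffffffff yields h = 1 and l = 0, which formats as the nine-digit hexadecimal string 100000000.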

23. Coroutines. As stated earlier, this program can be regarded as a system of interacting coroutines. Coroutines (sometimes called threads) are more or less independent processes that share and pass data and control back and forth. They correspond to the individual workers in an organization. We don't need the full power of recursive coroutines, in which new threads are spawned dynamically and have independent stacks for computation; we are, after all, simulating a fixed piece of hardware. The total number of coroutines we deal with is established once and for all by the MMIX_config routine, and each coroutine has a fixed amount of local data.

The simulation operates one clock tick at a time, by executing all coroutines scheduled for time t before advancing to time t + 1. The coroutines at time t may decide to become dormant or they may reschedule themselves and/or other coroutines for future times.

Each coroutine has a symbolic name for diagnostic purposes (e.g., ALU1); a nonnegative stage number (e.g., 2 for the second stage of a pipeline); a pointer to the next coroutine scheduled at the same time (or NULL if the coroutine is unscheduled); a pointer to a lock variable (or NULL if no lock is currently relevant); and a reference to a control block containing the data to be processed.

⟨Type definitions 11⟩ +≡
    typedef struct coroutine_struct {
      char *name;    /* symbolic identification of a coroutine */
      int stage;    /* its rank */
      struct coroutine_struct *next;    /* its successor */
      struct coroutine_struct **lockloc;    /* what it might be locking */
      struct control_struct *ctl;    /* its data */
    } coroutine;

24. ⟨Internal prototypes 13⟩ +≡
    static void print_coroutine_id ARGS((coroutine *));
    static void errprint_coroutine_id ARGS((coroutine *));

25. ⟨Subroutines 14⟩ +≡
    static void print_coroutine_id(c)
      coroutine *c;
    {
      if (c) printf("%s:%d", c->name, c->stage);
      else printf("??");
    }

    static void errprint_coroutine_id(c)
      coroutine *c;
    {
      if (c) errprint2("%s:%d", c->name, c->stage);
      else errprint0("??");
    }

26. Coroutine control is masterminded by a ring of queues, one each for times t, t + 1, ..., t + ring_size − 1, when t is the current clock time.

All scheduling is first-come-first-served, except that coroutines with higher stage numbers have priority. We want to process the later stages of a pipeline first, in this sequential implementation, for the same reason that a car must drive from M station into W station before another car can enter M station.

Each queue is a circular list of coroutine nodes, linked together by their next fields. A list head h with stage = max_stage comes at the end and the beginning of the queue. (All stage numbers of legitimate coroutines are less than max_stage.) The queued items are h->next, h->next->next, etc., from back to front, and we have c->stage ≤ c->next->stage unless c = h.

Initially all queues are empty.

⟨Initialize everything 22⟩ +≡
    {
      register coroutine *p;
      for (p = ring; p < ring + ring_size; p++) p->next = p;
    }

27. To schedule a coroutine c with positive delay d < ring_size, we call the function schedule(c, d, s). (The s parameter is used only if scheduling is being logged; it does not affect the computation, but we will generally set s to the state at which the scheduled coroutine will begin.)

⟨Internal prototypes 13⟩ +≡
    static void schedule ARGS((coroutine *, int, int));

28. ⟨Subroutines 14⟩ +≡
    static void schedule(c, d, s)
      coroutine *c;
      int d, s;
    {
      register int tt = (cur_time + d) % ring_size;
      register coroutine *p = &ring[tt];    /* start at the list head */
      if (d <= 0 || d >= ring_size)    /* do a sanity check */
        panic(confusion("Scheduling "); errprint_coroutine_id(c);
          errprint1(" with delay %d", d));
      while (p->next->stage < c->stage) p = p->next;
      c->next = p->next;
      p->next = c;
      if (verbose & schedule_bit) {
        printf(" scheduling "); print_coroutine_id(c);
        printf(" at time %d, state %d\n", ticks.l + d, s);
      }
    }

29. ⟨External variables 4⟩ +≡
    Extern int ring_size;    /* set by MMIX_config, must be sufficiently large */
    Extern coroutine *ring;
    Extern int cur_time;

30. The all-important ctl field of a coroutine, which contains the data being manipulated, will be explained below. One of its key components is the state field, which helps to specify the next actions the coroutine will perform. When we schedule a coroutine for a new task, we often want it to begin in state 0.

⟨Internal prototypes 13⟩ +≡
    static void startup ARGS((coroutine *, int));

31. ⟨Subroutines 14⟩ +≡
    static void startup(c, d)
      coroutine *c;
      int d;
    {
      c->ctl->state = 0;
      schedule(c, d, 0);
    }
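The ring-of-queues discipline of sections 26-28 can be tried in isolation with the following sketch. It is a simplified stand-in, not the simulator's code: the type `node`, the constants `RING_SIZE` and `MAX_STAGE`, and the functions `ring_init` and `sketch_schedule` are assumptions made for this illustration, and the sketch reports a bad delay by returning −1 instead of panicking.

```c
#include <stddef.h>

#define RING_SIZE 8   /* assumed ring size for the demonstration */
#define MAX_STAGE 99  /* stage number reserved for list heads */

typedef struct node {
  const char *name;
  int stage;
  struct node *next;
} node;

node ring[RING_SIZE];  /* one circular queue head per future time slot */
int cur_time;          /* current slot, always in [0, RING_SIZE) */

/* Make every queue empty: each head points to itself. */
void ring_init(void)
{
  int i;
  for (i = 0; i < RING_SIZE; i++) {
    ring[i].name = "head";
    ring[i].stage = MAX_STAGE;
    ring[i].next = &ring[i];
  }
  cur_time = 0;
}

/* Insert c into the queue for time cur_time + d, keeping stages in
   nondecreasing order after the head. Returns 0 on success, -1 if the
   delay is out of range. */
int sketch_schedule(node *c, int d)
{
  node *p;
  if (d <= 0 || d >= RING_SIZE) return -1;
  p = &ring[(cur_time + d) % RING_SIZE];
  while (p->next->stage < c->stage) p = p->next;
  c->next = p->next;
  p->next = c;
  return 0;
}
```

A newly scheduled node is placed in front of existing nodes of equal stage. Since the queue is later processed from back to front, nodes with higher stages run first and equal-stage nodes run in order of arrival.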

32. The following routine removes a coroutine from whatever queue it's in. The case c->next = c is also permitted; such a self-loop can occur when a coroutine goes to sleep and expects to be awakened (that is, scheduled) by another coroutine. Sleeping coroutines have important data in their ctl field; they are therefore quite different from unscheduled or "unemployed" coroutines, which have c->next = NULL. An unemployed coroutine is not assumed to have any valid data in its ctl field.

⟨Internal prototypes 13⟩ +≡
    static void unschedule ARGS((coroutine *));

33. ⟨Subroutines 14⟩ +≡
    static void unschedule(c)
      coroutine *c;
    {
      register coroutine *p;
      if (c->next) {
        for (p = c; p->next != c; p = p->next) ;
        p->next = c->next;
        c->next = NULL;
        if (verbose & schedule_bit) {
          printf(" unscheduling "); print_coroutine_id(c); printf("\n");
        }
      }
    }

34. When it is time to process all coroutines that have queued up for a particular time t, we empty the queue called ring[t] and link its items in the opposite order (from front to back). The following subroutine uses the well known algorithm discussed in exercise 2.2.3-7 of The Art of Computer Programming.

⟨Internal prototypes 13⟩ +≡
    static coroutine *queuelist ARGS((int));

35. ⟨Subroutines 14⟩ +≡
    static coroutine *queuelist(t)
      int t;
    {
      register coroutine *p, *q = &sentinel, *r;
      for (p = ring[t].next; p != &ring[t]; p = r) {
        r = p->next;
        p->next = q;
        q = p;
      }
      ring[t].next = &ring[t];
      sentinel.next = q;
      return q;
    }

36. ⟨Global variables 20⟩ +≡
    coroutine sentinel;    /* dummy coroutine at origin of circular list */
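The in-place list reversal performed by queuelist in section 35 is easy to trace on a three-element queue. The sketch below reproduces the same pointer manipulation on a reduced node type; `qnode`, `sentinel_node`, and `sketch_queuelist` are hypothetical names, and the queue head is passed directly instead of being looked up in the ring.

```c
#include <stddef.h>

typedef struct qnode {
  int stage;
  struct qnode *next;
} qnode;

qnode sentinel_node;  /* dummy node that closes the reversed list */

/* Empty the circular queue whose head is given and return its items
   linked in the opposite order; the reversed list ends at the sentinel,
   whose next field points back to the first item returned. */
qnode *sketch_queuelist(qnode *head)
{
  qnode *p, *q = &sentinel_node, *r;
  for (p = head->next; p != head; p = r) {
    r = p->next;   /* remember the rest of the old list */
    p->next = q;   /* redirect this node toward what came before it */
    q = p;
  }
  head->next = head;       /* the queue is now empty */
  sentinel_node.next = q;
  return q;
}
```

If the queue is head → C → A → B → head, the function returns B, and the reversed chain is B → A → C → sentinel → B. An empty queue returns the sentinel itself, whose next field then points to itself.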

37. Coroutines often start working on tasks that are speculative, in the sense that we want certain results to be ready if they prove to be useful; we understand that speculative computations might not actually be needed. Therefore a coroutine might need to be aborted before it has finished its work.

All coroutines must be written in such a way that important data structures remain intact even when the coroutine is abruptly terminated. In particular, we need to be sure that "locks" on shared resources are restored to an unlocked state when a coroutine holding the lock is aborted.

A lockvar variable is NULL when it is unlocked; otherwise it points to the coroutine responsible for unlocking it.

    #define set_lock(c, l) { l = c; (c)->lockloc = &(l); }
    #define release_lock(c, l) { l = NULL; (c)->lockloc = NULL; }

⟨Type definitions 11⟩ +≡
    typedef coroutine *lockvar;

38. ⟨External prototypes 9⟩ +≡
    Extern void print_locks ARGS((void));

39. ⟨External routines 10⟩ +≡
    void print_locks()
    {
      print_cache_locks(ITcache);
      print_cache_locks(DTcache);
      print_cache_locks(Icache);
      print_cache_locks(Dcache);
      print_cache_locks(Scache);
      if (mem_lock) printf("mem locked by %s:%d\n", mem_lock->name, mem_lock->stage);
      if (dispatch_lock)
        printf("dispatch locked by %s:%d\n", dispatch_lock->name, dispatch_lock->stage);
      if (wbuf_lock)
        printf("head of write buffer locked by %s:%d\n", wbuf_lock->name, wbuf_lock->stage);
      if (clean_lock)
        printf("cleaner locked by %s:%d\n", clean_lock->name, clean_lock->stage);
      if (speed_lock)
        printf("write buffer flush locked by %s:%d\n", speed_lock->name, speed_lock->stage);
    }
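The point of the lockloc back-pointer in section 37 is that an aborted coroutine's lock can be cleared without knowing which lock it held. The following sketch shows the idea on a reduced type. The two macros are copied from the text; the type `worker` and the function `sketch_abort` are assumptions introduced here, and the abort logic is only one plausible way to use lockloc, not necessarily what the simulator does.

```c
#include <stddef.h>

typedef struct worker {
  const char *name;
  struct worker **lockloc;  /* address of the lock variable it holds, if any */
} worker;

typedef worker *lockvar;    /* NULL when unlocked, else the responsible worker */

/* Macros as given in section 37 of the text. */
#define set_lock(c, l) { l = c; (c)->lockloc = &(l); }
#define release_lock(c, l) { l = NULL; (c)->lockloc = NULL; }

/* Hypothetical cleanup for an abruptly terminated worker: follow the
   back-pointer to unlock whatever resource it was holding. */
void sketch_abort(worker *c)
{
  if (c->lockloc) {
    *(c->lockloc) = NULL;
    c->lockloc = NULL;
  }
}
```

Because the macros expand to brace-enclosed blocks, an invocation followed by a semicolon is safe as a plain statement but would need care directly before an `else`.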

40. Many of the quantities we deal with are speculative values that might not yet have been certified as part of the "real" calculation; in fact, they might not yet have been calculated.

A spec consists of a 64-bit quantity o and a pointer p to a specnode. The value o is meaningful only if the pointer p is NULL; otherwise p points to a source of further information.

A specnode is a 64-bit quantity o together with links to other specnodes that are above it or below it in a doubly linked list. An additional known bit tells whether the o field has been calculated. There also is a 64-bit addr field, to identify the list and give further information. A specnode list keeps track of speculative values related to a specific register or to all of main memory; we will discuss such lists in detail later.

⟨Type definitions 11⟩ +≡
    typedef struct {
      octa o;
      struct specnode_struct *p;
    } spec;

    typedef struct specnode_struct {
      octa o;
      bool known;
      octa addr;
      struct specnode_struct *up, *down;
    } specnode;

41. ⟨Global variables 20⟩ +≡
    spec zero_spec;    /* zero_spec.o.h = zero_spec.o.l = 0 and zero_spec.p = NULL */

42. ⟨Internal prototypes 13⟩ +≡
    static void print_spec ARGS((spec));

43. ⟨Subroutines 14⟩ +≡
    static void print_spec(s)
      spec s;
    {
      if (!s.p) print_octa(s.o);
      else {
        printf(">"); print_specnode_id(s.p->addr);
      }
    }

    static void print_specnode(s)
      specnode s;
    {
      if (s.known) {
        print_octa(s.o); printf("!");
      } else if (s.o.h || s.o.l) {
        print_octa(s.o); printf("?");
      } else printf("?");
      print_specnode_id(s.addr);
    }
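The relationship between a spec and the specnode it points to, as described in section 40, can be made concrete with a sketch of how a pending input might be resolved. The two struct layouts follow the text, with `int` substituted for the text's `bool` and `uint32_t` for `tetra`; the function `sketch_resolve` is a hypothetical helper, not a routine of MMIX-PIPE.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { uint32_t h, l; } octa;

typedef struct specnode_struct {
  octa o;                             /* the speculative value */
  int known;                          /* has o been calculated? */
  octa addr;                          /* identifies the list */
  struct specnode_struct *up, *down;  /* neighbors in the doubly linked list */
} specnode;

typedef struct {
  octa o;        /* meaningful only when p is NULL */
  specnode *p;   /* source of further information, or NULL */
} spec;

/* Hypothetical helper: try to turn a pending spec into a definite value.
   Returns 1 if s->o is now valid, 0 if the producer has not finished. */
int sketch_resolve(spec *s)
{
  if (!s->p) return 1;     /* already definite */
  if (s->p->known) {
    s->o = s->p->o;        /* copy the computed value */
    s->p = NULL;           /* and stop depending on the producer */
    return 1;
  }
  return 0;
}
```

A consumer would call such a helper once per cycle and stall while it returns 0, which corresponds to the "should we stall until b.p is NULL?" questions recorded in the control block of section 44.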

and rl . A control record contains the original location of an instruction. and its four bytes OP X Y Z. which are specnode records called x. Each group of employees updates the work order as the car moves through the shop. a. The inputs to a STO command are $Y. there is one \output." and the . The analog of an automobile in our simulator is a block of data called control . and rD. An instruction has up to four inputs. b and ra . z.) For example. it also has up to three outputs. which refer to MMIX 's internal registers rA and rL. which represents all the relevant facts about an MMIX instruction.MMIX-PIPE: COROUTINES 166 44. which are spec records called y. $Z. and $X. (We usually don't mention the special input ra or the special output rl . the main inputs to a DIVU command are $Y. $Z. the outputs are the quotient $X and the remainder rR. We can think of it as the work order attached to a car's windshield.

if any. Each control block also points to the coroutine that owns it. And it has various other .eld x:addr will be set to the physical address of the memory location corresponding to virtual address $Y + $Z.

for example. we have already mentioned the state .elds that contain other tidbits of information.

which often governs a coroutine's actions. The i .eld.

for example. is generally used together with state to switch between alternative computational steps. which contains an internal operation code number. If. the op .eld.

We shall de.eld is SUB or SUBI or NEG or NEGI . the internal opcode i will be simply sub .

ne all the .

The go . An actual hardware implementation of MMIX wouldn't need all the information we are putting into a control block.elds of control records now and discuss them later. Some of that information would typically be latched between stages of a pipeline. by counting how many registers of that kind would be in use if we were mimicking low-level hardware details more precisely." We simulate rename registers only indirectly. other portions would probably appear in so-called \rename registers.

eld is a specnode for convenience in programming. although we use only its known and o sub.

elds. It generally contains the address of the subsequent instruction. h Type de.

yy . ra .nitions 11 i + h Declare mmix opcode and internal opcode 47 i typedef struct control struct f octa loc . = inputs = specnode x. = the original instruction bytes = spec y. = does x correspond to a memory write? = bool mem x . = does x correspond to a rename register? = bool ren x . = should rU be increased? = = should we stall until b:p  ? = bool need b . a. int state . unsigned char xx . = does a correspond to a rename register? = bool ren a . . rl . zz . = a coroutine whose ctl this is = = internal opcode = internal opcode i. go . = virtual address where an instruction originated = mmix opcode op . = does rl correspond to a new value of rL? = bool set l . = internal mindset = bool usage . b. z. = outputs = coroutine owner . = should we stall until ra :p  ? = bool need ra .

x40. g control . denout . x17. p: specnode . x40. x40.167 MMIX-PIPE: COROUTINES bool interim . x23. mmix opcode = enum . x40. x49. ctl : control . specnode = struct . unsigned int interrupt . internal opcode = enum . x47. = execution time penalties for denormal handling = = speculative rO and rS before this instruction = octa cur O . octa = struct . = does this instruction generate an interrupt? = = generic pointers for miscellaneous use = void ptr a . ptr b . = does this instruction need to be reissued on interrupt? = = arithmetic exceptions for event bits of rA = unsigned int arith exc . x23. ptr c . cur S . x11. known : bool . coroutine = struct . x 49. x40. . spec = struct . sub = 31. = history bits for use in branch prediction = int denin . bool = enum. x40. o: octa . unsigned int hist . addr : octa .

c~ state ). f octa default go . print bits (c~ arith exc  8). g g if (verbose & show pred bit ) printf (" hist=%x" . c~ zz . 4). if (c~ need ra ) f printf (" rA=" ). print spec (c~ b). print bits (c~ interrupt ). (c~ y:o:h _ c~ y:o:l _ c~ y:p) f printf (" y=" ). if (c~ go :o:l 6= default go :l _ c~ go :o:h 6= default go :h) f printf (" ->" ). default go = incr (c~ loc . h Subroutines 14 i + ARGS ((control )). . c~ op . printf (": %02x%02x%02x%02x(%s)" . print spec (c~ ra ). c~ hist ). g (c~ z:o:h _ c~ z:o:l _ c~ z:p) f printf (" z=" ).MMIX-PIPE: 168 COROUTINES 45. print octa (c~ x:o). if (c~ need b ) printf ("*" ). if (c~ loc :h _ c~ loc :l _ c~ op _ c~ xx _ c~ yy _ c~ zz _ c~ owner ) f print octa (c~ loc ). g (c~ b:o:h _ c~ b:o:l _ c~ b:p _ c~ need b ) f printf (" b=" ). print specnode (c~ x). printf ("%c" . print spec (c~ z). g else if (c~ x:o:h _ c~ x:o:l) f printf (" x=" ). static void print control block (c) control c. print octa (c~ go :o). g if (c~ arith exc ) printf (" exc=" ). g if (c~ interrupt ) f printf (" int=" ). h Internal prototypes 13 i + static void print control block 46. g if (c~ ren a ) f printf (" a=" ). (c~ interim ) printf ("+" ). g if if if if if g (c~ usage ) printf ("*" ). c~ x:known ? '!' : '?' ). print specnode (c~ rl ). print specnode (c~ a). print spec (c~ y). g if (c~ ren x _ c~ mem x ) f printf (" x=" ). c~ yy . internal op name [c~ i]). c~ xx . printf (" state=%d" . g if (c~ set l ) f printf (" rL=" ).


47. Lists. Here is a (boring) list of all the MMIX opcodes, in order.

⟨Declare mmix_opcode and internal_opcode 47⟩ ≡
  typedef enum {
    TRAP, FCMP, FUN, FEQL, FADD, FIX, FSUB, FIXU,
    FLOT, FLOTI, FLOTU, FLOTUI, SFLOT, SFLOTI, SFLOTU, SFLOTUI,
    FMUL, FCMPE, FUNE, FEQLE, FDIV, FSQRT, FREM, FINT,
    MUL, MULI, MULU, MULUI, DIV, DIVI, DIVU, DIVUI,
    ADD, ADDI, ADDU, ADDUI, SUB, SUBI, SUBU, SUBUI,
    IIADDU, IIADDUI, IVADDU, IVADDUI, VIIIADDU, VIIIADDUI, XVIADDU, XVIADDUI,
    CMP, CMPI, CMPU, CMPUI, NEG, NEGI, NEGU, NEGUI,
    SL, SLI, SLU, SLUI, SR, SRI, SRU, SRUI,
    BN, BNB, BZ, BZB, BP, BPB, BOD, BODB,
    BNN, BNNB, BNZ, BNZB, BNP, BNPB, BEV, BEVB,
    PBN, PBNB, PBZ, PBZB, PBP, PBPB, PBOD, PBODB,
    PBNN, PBNNB, PBNZ, PBNZB, PBNP, PBNPB, PBEV, PBEVB,
    CSN, CSNI, CSZ, CSZI, CSP, CSPI, CSOD, CSODI,
    CSNN, CSNNI, CSNZ, CSNZI, CSNP, CSNPI, CSEV, CSEVI,
    ZSN, ZSNI, ZSZ, ZSZI, ZSP, ZSPI, ZSOD, ZSODI,
    ZSNN, ZSNNI, ZSNZ, ZSNZI, ZSNP, ZSNPI, ZSEV, ZSEVI,
    LDB, LDBI, LDBU, LDBUI, LDW, LDWI, LDWU, LDWUI,
    LDT, LDTI, LDTU, LDTUI, LDO, LDOI, LDOU, LDOUI,
    LDSF, LDSFI, LDHT, LDHTI, CSWAP, CSWAPI, LDUNC, LDUNCI,
    LDVTS, LDVTSI, PRELD, PRELDI, PREGO, PREGOI, GO, GOI,
    STB, STBI, STBU, STBUI, STW, STWI, STWU, STWUI,
    STT, STTI, STTU, STTUI, STO, STOI, STOU, STOUI,
    STSF, STSFI, STHT, STHTI, STCO, STCOI, STUNC, STUNCI,
    SYNCD, SYNCDI, PREST, PRESTI, SYNCID, SYNCIDI, PUSHGO, PUSHGOI,
    OR, ORI, ORN, ORNI, NOR, NORI, XOR, XORI,
    AND, ANDI, ANDN, ANDNI, NAND, NANDI, NXOR, NXORI,
    BDIF, BDIFI, WDIF, WDIFI, TDIF, TDIFI, ODIF, ODIFI,
    MUX, MUXI, SADD, SADDI, MOR, MORI, MXOR, MXORI,
    SETH, SETMH, SETML, SETL, INCH, INCMH, INCML, INCL,
    ORH, ORMH, ORML, ORL, ANDNH, ANDNMH, ANDNML, ANDNL,
    JMP, JMPB, PUSHJ, PUSHJB, GETA, GETAB, PUT, PUTI,
    POP, RESUME, SAVE, UNSAVE, SYNC, SWYM, GET, TRIP
  } mmix_opcode;

See also section 49.
This code is used in section 44.

48. ⟨Global variables 20⟩ +≡
  char *opcode_name[] = {
    "TRAP", "FCMP", "FUN", "FEQL", "FADD", "FIX", "FSUB", "FIXU",
    "FLOT", "FLOTI", "FLOTU", "FLOTUI", "SFLOT", "SFLOTI", "SFLOTU", "SFLOTUI",
    "FMUL", "FCMPE", "FUNE", "FEQLE", "FDIV", "FSQRT", "FREM", "FINT",
    "MUL", "MULI", "MULU", "MULUI", "DIV", "DIVI", "DIVU", "DIVUI",
    "ADD", "ADDI", "ADDU", "ADDUI", "SUB", "SUBI", "SUBU", "SUBUI",
    "2ADDU", "2ADDUI", "4ADDU", "4ADDUI", "8ADDU", "8ADDUI", "16ADDU", "16ADDUI",
    "CMP", "CMPI", "CMPU", "CMPUI", "NEG", "NEGI", "NEGU", "NEGUI",
    "SL", "SLI", "SLU", "SLUI", "SR", "SRI", "SRU", "SRUI",
    "BN", "BNB", "BZ", "BZB", "BP", "BPB", "BOD", "BODB",
    "BNN", "BNNB", "BNZ", "BNZB", "BNP", "BNPB", "BEV", "BEVB",
    "PBN", "PBNB", "PBZ", "PBZB", "PBP", "PBPB", "PBOD", "PBODB",
    "PBNN", "PBNNB", "PBNZ", "PBNZB", "PBNP", "PBNPB", "PBEV", "PBEVB",
    "CSN", "CSNI", "CSZ", "CSZI", "CSP", "CSPI", "CSOD", "CSODI",
    "CSNN", "CSNNI", "CSNZ", "CSNZI", "CSNP", "CSNPI", "CSEV", "CSEVI",
    "ZSN", "ZSNI", "ZSZ", "ZSZI", "ZSP", "ZSPI", "ZSOD", "ZSODI",
    "ZSNN", "ZSNNI", "ZSNZ", "ZSNZI", "ZSNP", "ZSNPI", "ZSEV", "ZSEVI",
    "LDB", "LDBI", "LDBU", "LDBUI", "LDW", "LDWI", "LDWU", "LDWUI",
    "LDT", "LDTI", "LDTU", "LDTUI", "LDO", "LDOI", "LDOU", "LDOUI",
    "LDSF", "LDSFI", "LDHT", "LDHTI", "CSWAP", "CSWAPI", "LDUNC", "LDUNCI",
    "LDVTS", "LDVTSI", "PRELD", "PRELDI", "PREGO", "PREGOI", "GO", "GOI",
    "STB", "STBI", "STBU", "STBUI", "STW", "STWI", "STWU", "STWUI",
    "STT", "STTI", "STTU", "STTUI", "STO", "STOI", "STOU", "STOUI",
    "STSF", "STSFI", "STHT", "STHTI", "STCO", "STCOI", "STUNC", "STUNCI",
    "SYNCD", "SYNCDI", "PREST", "PRESTI", "SYNCID", "SYNCIDI", "PUSHGO", "PUSHGOI",
    "OR", "ORI", "ORN", "ORNI", "NOR", "NORI", "XOR", "XORI",
    "AND", "ANDI", "ANDN", "ANDNI", "NAND", "NANDI", "NXOR", "NXORI",
    "BDIF", "BDIFI", "WDIF", "WDIFI", "TDIF", "TDIFI", "ODIF", "ODIFI",
    "MUX", "MUXI", "SADD", "SADDI", "MOR", "MORI", "MXOR", "MXORI",
    "SETH", "SETMH", "SETML", "SETL", "INCH", "INCMH", "INCML", "INCL",
    "ORH", "ORMH", "ORML", "ORL", "ANDNH", "ANDNMH", "ANDNML", "ANDNL",
    "JMP", "JMPB", "PUSHJ", "PUSHJB", "GETA", "GETAB", "PUT", "PUTI",
    "POP", "RESUME", "SAVE", "UNSAVE", "SYNC", "SWYM", "GET", "TRIP"};

49. And here is a (likewise boring) list of all the internal opcodes. The smallest numbers, less than or equal to max_pipe_op, correspond to operations for which arbitrary pipeline delays can be configured with MMIX_config. The largest numbers, greater than max_real_command, correspond to internally generated operations that have no official OP code; for example, there are internal operations to shift the γ pointer in the register stack, and to compute page table entries.

⟨Declare mmix_opcode and internal_opcode 47⟩ +≡
  #define max_pipe_op feps
  #define max_real_command trip
  typedef enum {
    mul0,      /* multiplication by zero */
    mul1, mul2, mul3, mul4, mul5, mul6, mul7, mul8,
               /* multiplication by 1–8, 9–16, ..., 57–64 bits */
    div,       /* DIV[U][I] */
    sh,        /* S[L,R][U][I] */
    mux,       /* MUX[I] */
    sadd,      /* SADD[I] */
    mor,       /* M[X]OR[I] */
    fadd,      /* FADD, FSUB */
    fmul,      /* FMUL */
    fdiv,      /* FDIV */
    fsqrt,     /* FSQRT */
    fint,      /* FINT */
    fix,       /* FIX[U] */
    flot,      /* [S]FLOT[U][I] */
    feps,      /* FCMPE, FUNE, FEQLE */
    fcmp,      /* FCMP */
    funeq,     /* FUN, FEQL */
    fsub,      /* FSUB */
    frem,      /* FREM */
    mul,       /* MUL[I] */
    mulu,      /* MULU[I] */
    divu,      /* DIVU[I] */
    add,       /* ADD[I] */
    addu,      /* [2,4,8,16,]ADDU[I], INC[M][H,L] */
    sub,       /* SUB[I], NEG[I] */
    subu,      /* SUBU[I], NEGU[I] */
    set,       /* SET[M][H,L], GETA[B] */
    or,        /* OR[I], OR[M][H,L] */
    orn,       /* ORN[I] */
    nor,       /* NOR[I] */
    and,       /* AND[I] */
    andn,      /* ANDN[I], ANDN[M][H,L] */
    nand,      /* NAND[I] */
    xor,       /* XOR[I] */
    nxor,      /* NXOR[I] */
    shlu,      /* SLU[I] */
    shru,      /* SRU[I] */
    shl,       /* SL[I] */
    shr,       /* SR[I] */
    cmp,       /* CMP[I] */
    cmpu,      /* CMPU[I] */
    bdif,      /* BDIF[I] */
    wdif,      /* WDIF[I] */
    tdif,      /* TDIF[I] */
    odif,      /* ODIF[I] */
    zset,      /* ZS[N][N,Z,P][I], ZSEV[I], ZSOD[I] */
    cset,      /* CS[N][N,Z,P][I], CSEV[I], CSOD[I] */
    get,       /* GET */
    put,       /* PUT[I] */
    ld,        /* LD[B,W,T,O][U][I], LDHT[I], LDSF[I] */
    ldptp,     /* load page table pointer */
    ldpte,     /* load page table entry */
    ldunc,     /* LDUNC[I] */
    ldvts,     /* LDVTS[I] */
    preld,     /* PRELD[I] */
    prest,     /* PREST[I] */
    st,        /* STO[U][I], STCO[I], STUNC[I] */
    syncd,     /* SYNCD[I] */
    syncid,    /* SYNCID[I] */
    pst,       /* ST[B,W,T][U][I], STHT[I] */
    stunc,     /* STUNC[I], in write buffer */
    cswap,     /* CSWAP[I] */
    br,        /* B[N][N,Z,P][B] */
    pbr,       /* PB[N][N,Z,P][B] */
    pushj,     /* PUSHJ[B] */
    go,        /* GO[I] */
    prego,     /* PREGO[I] */
    pushgo,    /* PUSHGO[I] */
    pop,       /* POP */
    resume,    /* RESUME */
    save,      /* SAVE */
    unsave,    /* UNSAVE */
    sync,      /* SYNC */
    jmp,       /* JMP[B] */
    noop,      /* SWYM */
    trap,      /* TRAP */
    trip,      /* TRIP */
    incgamma,  /* increase γ pointer */
    decgamma,  /* decrease γ pointer */
    incrl,     /* increase rL and β */
    sav,       /* intermediate stage of SAVE */
    unsav,     /* intermediate stage of UNSAVE */
    resum      /* intermediate stage of RESUME */
  } internal_opcode;

50. ⟨Global variables 20⟩ +≡
  char *internal_op_name[] = {
    "mul0", "mul1", "mul2", "mul3", "mul4", "mul5", "mul6", "mul7", "mul8",
    "div", "sh", "mux", "sadd", "mor",
    "fadd", "fmul", "fdiv", "fsqrt", "fint", "fix", "flot", "feps",
    "fcmp", "funeq", "fsub", "frem",
    "mul", "mulu", "divu", "add", "addu", "sub", "subu", "set",
    "or", "orn", "nor", "and", "andn", "nand", "xor", "nxor",
    "shlu", "shru", "shl", "shr", "cmp", "cmpu",
    "bdif", "wdif", "tdif", "odif", "zset", "cset", "get", "put",
    "ld", "ldptp", "ldpte", "ldunc", "ldvts", "preld", "prest",
    "st", "syncd", "syncid", "pst", "stunc", "cswap",
    "br", "pbr", "pushj", "go", "prego", "pushgo", "pop", "resume",
    "save", "unsave", "sync", "jmp", "noop", "trap", "trip",
    "incgamma", "decgamma", "incrl", "sav", "unsav", "resum"};

51. We need a table to convert the external opcodes to internal ones.

⟨Global variables 20⟩ +≡
  internal_opcode internal_op[256] = {
    trap, fcmp, funeq, funeq, fadd, fix, fsub, fix,
    flot, flot, flot, flot, flot, flot, flot, flot,
    fmul, feps, feps, feps, fdiv, fsqrt, frem, fint,
    mul, mul, mulu, mulu, div, div, divu, divu,
    add, add, addu, addu, sub, sub, subu, subu,
    addu, addu, addu, addu, addu, addu, addu, addu,
    cmp, cmp, cmpu, cmpu, sub, sub, subu, subu,
    shl, shl, shlu, shlu, shr, shr, shru, shru,
    br, br, br, br, br, br, br, br,
    br, br, br, br, br, br, br, br,
    pbr, pbr, pbr, pbr, pbr, pbr, pbr, pbr,
    pbr, pbr, pbr, pbr, pbr, pbr, pbr, pbr,
    cset, cset, cset, cset, cset, cset, cset, cset,
    cset, cset, cset, cset, cset, cset, cset, cset,
    zset, zset, zset, zset, zset, zset, zset, zset,
    zset, zset, zset, zset, zset, zset, zset, zset,
    ld, ld, ld, ld, ld, ld, ld, ld,
    ld, ld, ld, ld, ld, ld, ld, ld,
    ld, ld, ld, ld, cswap, cswap, ldunc, ldunc,
    ldvts, ldvts, preld, preld, prego, prego, go, go,
    pst, pst, pst, pst, pst, pst, pst, pst,
    pst, pst, pst, pst, st, st, st, st,
    pst, pst, pst, pst, st, st, st, st,
    syncd, syncd, prest, prest, syncid, syncid, pushgo, pushgo,
    or, or, orn, orn, nor, nor, xor, xor,
    and, and, andn, andn, nand, nand, nxor, nxor,
    bdif, bdif, wdif, wdif, tdif, tdif, odif, odif,
    mux, mux, sadd, sadd, mor, mor, mor, mor,
    set, set, set, set, addu, addu, addu, addu,
    or, or, or, or, andn, andn, andn, andn,
    jmp, jmp, pushj, pushj, set, set, put, put,
    pop, resume, save, unsave, sync, noop, get, trip};
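The conversion table is indexed by the first byte of an instruction. As a minimal sketch of how an instruction word breaks into the four bytes that print_control_block displays as op, xx, yy, and zz, the following helper may be useful. The type insn_fields and the function split_instruction are hypothetical names introduced here for illustration; they are not part of MMIX-PIPE.

```c
#include <stdint.h>

/* Hypothetical helper type (not part of MMIX-PIPE): the four bytes of
   an instruction word, most significant byte first. */
typedef struct {
    unsigned char op, xx, yy, zz;
} insn_fields;

/* Split a 32-bit MMIX instruction into its OP, X, Y, and Z bytes. */
static insn_fields split_instruction(uint32_t inst)
{
    insn_fields f;
    f.op = (unsigned char)(inst >> 24);          /* index into internal_op */
    f.xx = (unsigned char)((inst >> 16) & 0xff);
    f.yy = (unsigned char)((inst >> 8) & 0xff);
    f.zz = (unsigned char)(inst & 0xff);
    return f;
}
```

For example, ADD $1,$2,$3 is the word 0x20010203: ADD is entry number 0x20 of the opcode list, and internal_op[0x20] is add.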


52. While we're into boring lists, we might as well define all the special register numbers, together with an inverse table for use in diagnostic outputs. These codes have been designed so that special registers 0–7 are unencumbered, 8–11 can't be PUT by anybody, 12–18 can't be PUT by the user. Pipeline delays might occur when GET is applied to special registers 21–31 or when PUT is applied to special registers 15–20. The SAVE and UNSAVE commands store and restore special registers 0–6 and 23–27.

⟨Header definitions 6⟩ +≡
  #define rA 21   /* arithmetic status register */
  #define rB 0    /* bootstrap register (trip) */
  #define rC 8    /* cycle counter */
  #define rD 1    /* dividend register */
  #define rE 2    /* epsilon register */
  #define rF 22   /* failure location register */
  #define rG 19   /* global threshold register */
  #define rH 3    /* himult register */
  #define rI 12   /* interval counter */
  #define rJ 4    /* return-jump register */
  #define rK 15   /* interrupt mask register */
  #define rL 20   /* local threshold register */
  #define rM 5    /* multiplex mask register */
  #define rN 9    /* serial number */
  #define rO 10   /* register stack offset */
  #define rP 23   /* prediction register */
  #define rQ 16   /* interrupt request register */
  #define rR 6    /* remainder register */
  #define rS 11   /* register stack pointer */
  #define rT 13   /* trap address register */
  #define rU 17   /* usage counter */
  #define rV 18   /* virtual translation register */
  #define rW 24   /* where-interrupted register (trip) */
  #define rX 25   /* execution register (trip) */
  #define rY 26   /* Y operand (trip) */
  #define rZ 27   /* Z operand (trip) */
  #define rBB 7   /* bootstrap register (trap) */
  #define rTT 14  /* dynamic trap address register */
  #define rWW 28  /* where-interrupted register (trap) */
  #define rXX 29  /* execution register (trap) */
  #define rYY 30  /* Y operand (trap) */
  #define rZZ 31  /* Z operand (trap) */

53. ⟨Global variables 20⟩ +≡
  char *special_name[32] = {
    "rB", "rD", "rE", "rH", "rJ", "rM", "rR", "rBB",
    "rC", "rN", "rO", "rS", "rI", "rT", "rTT", "rK",
    "rQ", "rU", "rV", "rG", "rL", "rA", "rF", "rP",
    "rW", "rX", "rY", "rZ", "rWW", "rXX", "rYY", "rZZ"};

54. Here are the bit codes that affect trips and traps. The first eight cases also apply to the upper half of rQ; the next eight apply to rA.

  #define P_BIT (1 << 0)    /* instruction in privileged location */
  #define S_BIT (1 << 1)    /* security violation */
  #define B_BIT (1 << 2)    /* instruction breaks the rules */
  #define K_BIT (1 << 3)    /* instruction for kernel only */
  #define N_BIT (1 << 4)    /* virtual translation bypassed */
  #define PX_BIT (1 << 5)   /* permission lacking to execute from page */
  #define PW_BIT (1 << 6)   /* permission lacking to write on page */
  #define PR_BIT (1 << 7)   /* permission lacking to read from page */
  #define PROT_OFFSET 5     /* distance from PR_BIT to protection code position */
  #define X_BIT (1 << 8)    /* floating inexact */
  #define Z_BIT (1 << 9)    /* floating division by zero */
  #define U_BIT (1 << 10)   /* floating underflow */
  #define O_BIT (1 << 11)   /* floating overflow */
  #define I_BIT (1 << 12)   /* floating invalid operation */
  #define W_BIT (1 << 13)   /* float-to-fix overflow */
  #define V_BIT (1 << 14)   /* integer overflow */
  #define D_BIT (1 << 15)   /* integer divide check */
  #define H_BIT (1 << 16)   /* trip handler bit */
  #define F_BIT (1 << 17)   /* forced trap bit */
  #define E_BIT (1 << 18)   /* external (dynamic) trap bit */

⟨Global variables 20⟩ +≡
  char bit_code_map[] = "EFHDVWIOUZXrwxnkbsp";

55. ⟨Internal prototypes 13⟩ +≡
  static void print_bits ARGS((int));

56. ⟨Subroutines 14⟩ +≡
  static void print_bits(x)
    int x;
  {
    register int b, j;
    for (j = 0, b = E_BIT; (x & (b + b - 1)) && b; j++, b >>= 1)
      if (x & b) printf("%c", bit_code_map[j]);
  }

57. The lower half of rQ holds external interrupts of highest priority. Most of them are implementation-dependent, but a few are defined in general.

⟨Header definitions 6⟩ +≡
  #define POWER_FAILURE (1 << 0)        /* try to shut down calmly and quickly */
  #define PARITY_ERROR (1 << 1)         /* try to save the file systems */
  #define NONEXISTENT_MEMORY (1 << 2)   /* a memory address can't be used */
  #define REBOOT_SIGNAL (1 << 4)        /* it's time to start over */
  #define INTERVAL_TIMEOUT (1 << 7)     /* the timer register, rI, has reached zero */
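To see how the loop in print_bits pairs each bit code with a letter of bit_code_map, here is a sketch that runs the same loop but collects the letters in a caller-supplied buffer instead of printing them. The function name bits_to_string and the buffer interface are assumptions made for this illustration; the simulator itself only prints.

```c
#include <assert.h>
#include <stddef.h>

#define E_BIT (1 << 18)   /* highest bit code, as in section 54 */

static const char bit_code_map[] = "EFHDVWIOUZXrwxnkbsp";

/* Hypothetical variant of print_bits: store the letters for the bits
   set in x into out (room for at least 20 chars is assumed) and
   return the number of letters stored. */
static size_t bits_to_string(int x, char *out)
{
    size_t n = 0;
    int j, b;
    assert(out != NULL);
    /* Scan from E_BIT downward; stop as soon as no lower bits remain. */
    for (j = 0, b = E_BIT; (x & (b + b - 1)) && b; j++, b >>= 1)
        if (x & b) out[n++] = bit_code_map[j];
    out[n] = '\0';
    return n;
}
```

With this sketch, the code 1 << 14 (V_BIT) yields "V", and an arith_exc value of 0x40 shifted left by 8, as print_control_block does before calling print_bits, yields the same letter.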

58. Dynamic speculation. Now that we understand some basic low-level structures, we're ready to look at the larger picture.

This simulator is based on the idea of "dynamic scheduling with register renaming," as introduced in the 1960s by R. M. Tomasulo [IBM Journal of Research and Development 11 (1967), 25–33]. Moreover, the dynamic scheduling method is extended here to "speculative execution," as implemented in several processors of the 1990s and described in section 4.6 of Hennessy and Patterson's Computer Architecture, second edition (1995). The essential idea is to keep track of the pipeline contents by recording all dependencies between unfinished computations in a queue called the reorder buffer. An entry in the reorder buffer might, for example, correspond to an instruction that adds together two numbers whose values are still being computed; those numbers have been allocated space in earlier positions of the reorder buffer. The addition will take place as soon as both of its operands are known, but the sum won't be written immediately into the destination register. It will stay in the reorder buffer until reaching the hot seat at the front of the queue. Finally, the addition leaves the hot seat and is said to be committed.

Some instructions in the reorder buffer may in fact be executed only on speculation, meaning that they won't really be called for unless a prior branch instruction has the predicted outcome. Indeed, we can say that all instructions not yet in the hot seat are being executed speculatively, because an external interrupt might occur at any time and change the entire course of computation. Organizing the pipeline as a reorder buffer allows us to look ahead and keep busy computing values that have a good chance of being needed later, instead of waiting for slow instructions or slow memory references to be completed.

The reorder buffer is in fact a queue of control records, conceptually forming part of a circle of such records inside the simulator, corresponding to all instructions that have been dispatched or issued but not yet committed, in strict program order.

The best way to get an understanding of speculative execution is perhaps to imagine that the reorder buffer is large enough to hold hundreds of instructions in various stages of execution, and to think of an implementation of MMIX that has dozens of functional units (more than would ever actually be built into a chip). Then one can readily visualize the kinds of control structures and checks that must be made to ensure correct execution. Without such a broad viewpoint, a programmer or hardware designer will be inclined to think only of the simple cases and to devise algorithms that lack the proper generality. Thus we have a somewhat paradoxical situation in which a difficult general problem turns out to be easier to solve than its simpler special cases, because it enforces clarity of thinking.

Instructions that have completed execution and have not yet been committed are analogous to cars that have gone through our hypothetical repair shop and are waiting for their owners to pick them up. However, all analogies break down, and the world of automobiles does not have a natural counterpart for the notion of speculative execution. That notion corresponds roughly to situations in which people are led to believe that their cars need a new piece of equipment, but they suddenly change their mind once they see the price tag, and they insist on having the equipment removed even after it has been partially or completely installed.

Speculatively executed instructions might make no sense: They might divide by zero or refer to protected memory areas, etc. Such anomalies are not considered catastrophic or even exceptional until the instruction reaches the hot seat.

The person who designs a computer with speculative execution is an optimist, who has faith that the vast majority of the machine's predictions will come true. The person who designs a reliable implementation of such a computer is a pessimist, who understands that all predictions might come to naught. The pessimist does, however, take pains to optimize the cases that do turn out well.
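The addition example can be made concrete with a deliberately tiny model. The sketch below is not the simulator's data structure: the type entry and the functions try_add and commit_in_order are hypothetical, and a plain array in program order stands in for the ring of control records. It shows only the two rules stated above: an addition executes once both operands are known, and results leave the queue strictly from the front.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, much-simplified reorder-buffer entry. */
typedef struct entry {
    bool known;                /* has the result been computed? */
    uint64_t value;            /* the result, valid once known is true */
    const struct entry *src1;  /* earlier entries producing the operands */
    const struct entry *src2;
} entry;

/* Perform the addition if both operands are known; otherwise stall.
   Returns true if the entry's value is known afterwards. */
static bool try_add(entry *e)
{
    assert(e != NULL && e->src1 != NULL && e->src2 != NULL);
    if (e->known) return true;
    if (!e->src1->known || !e->src2->known) return false;
    e->value = e->src1->value + e->src2->value;
    e->known = true;
    return true;
}

/* Count how many entries could be committed now, given q[0..n-1] in
   program order with q[0] in the hot seat: finished entries behind an
   unfinished one must wait. */
static size_t commit_in_order(const entry *q, size_t n)
{
    size_t k = 0;
    while (k < n && q[k].known) k++;
    return k;
}
```

In this model a finished instruction that sits behind an unfinished one contributes nothing to the count, which is the repair-shop situation of a car waiting for its owner.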

59. Let's consider what happens to a single instruction, say ADD $1,$2,$3, as it travels through the pipeline in a normal situation. The first time this instruction is encountered, it is placed into the I-cache (that is, the instruction cache), so that we won't have to access memory when we need to perform it again. We will assume for simplicity in this discussion that each I-cache access takes one clock cycle, although other possibilities are allowed by MMIX_config.

Suppose the simulated machine fetches the example ADD instruction at time 1000. Fetching is done by a coroutine whose stage number is 0. A cache block typically contains 8 or 16 instructions. The fetch unit of our machine is able to fetch up to fetch_max instructions on each clock cycle and place them in the fetch buffer, provided that there is room in the buffer and that all the instructions belong to the same cache block.

The dispatch unit of our simulator is able to issue up to dispatch_max instructions on each clock cycle and move them from the fetch buffer to the reorder buffer, provided that functional units are available for those instructions and there is room in the reorder buffer. A functional unit that handles ADD is usually called an ALU (arithmetic logic unit), and our simulated machine might have several of them. If they aren't all stalled in stage 1 of their pipelines, and if the reorder buffer isn't full, and if the machine isn't in the process of deissuing instructions that were mispredicted, and if fewer than dispatch_max instructions are ahead of the ADD in the fetch buffer, and if all such prior instructions can be issued without using up all the free ALUs, our ADD instruction will be issued at time 1001. (In fact, all of these conditions are usually true.)

We assume that L > 3, so that $1, $2, and $3 are local registers. For simplicity we'll assume in fact that the register stack is empty, so that the ADD instruction is supposed to set l[1] ← l[2] + l[3]. The operands l[2] and l[3] might not be known at time 1001; they are spec values, which might point to specnode entries in the reorder buffer for previous instructions whose destinations are l[2] and l[3]. The dispatcher fills the next available control block of the reorder buffer with information for the ADD, containing appropriate spec values corresponding to l[2] and l[3] in its y and z fields. The x field of this control block will be inserted into a doubly linked list of specnode records, corresponding to l[1] and to all instructions in the reorder buffer that have l[1] as a destination. The boolean value x.known will be set to false, meaning that this speculative value still needs to be computed. Subsequent instructions that need l[1] as a source will point to x, if they are issued before the sum x.o has been computed. Double linking is used in the specnode list because the ADD instruction might be cancelled before it is finally committed; thus deletions might occur at either end of the list for l[1].

At time 1002, the ALU handling the ADD will stall if its inputs y and z are not both known (namely if y.p ≠ Λ or z.p ≠ Λ). In fact, it will also stall if its third input rA is not known; the current speculative value of rA, except for its event bits, is represented in the ra field of the control block, and we must have ra.p ≡ Λ. In such a case the ALU will look to see if the spec values pointed to by y.p and/or z.p and/or ra.p become defined on this clock cycle, and it will update its own input values accordingly.

But let's assume that y, z, and ra are already known at time 1002. Then x.o will be set to y.o + z.o and x.known will become true. This will make the result destined for l[1] available to be used in other commands at time 1003.

If no overflow occurs when adding y.o to z.o, the interrupt and arith_exc fields of the control block for ADD are set to zero. But when overflow does occur (shudder), there are two cases, based on the V-enable bit of rA, which is found in field b.o of the control block. If this bit is 0, the V-bit of the arith_exc field in the control block is set to 1; the arith_exc field will be ORed into rA when the ADD instruction is eventually committed.

But if the V-enable bit is 1, the trip handler should be called, interrupting the normal sequence. In such a case, the interrupt field of the control block is set to specify a trip, and the fetcher and dispatcher are told to forget what they have been doing; all instructions following the ADD in the reorder buffer must now be deissued. The virtual starting address of the overflow trip handler, namely location 32, is hastily passed to the fetch routine, and instructions will be fetched from that location as soon as possible. (Of course the overflow and the trip handler are still speculative until the ADD instruction is committed. Other exceptional conditions might cause the ADD itself to be terminated before it gets to the hot seat. But the pipeline keeps charging ahead, always trying to guess the most probable outcome.)

The commission unit of this simulator is able to commit and/or deissue up to commit_max instructions on each clock cycle. With luck, fewer than commit_max instructions will be ahead of our ADD instruction at time 1003, and they will all be completed normally. Then l[1] can be set to x.o, and the event bits of rA can be updated from arith_exc, and the ADD command can pass through the hot seat and out of the reorder buffer.

⟨External variables 4⟩ +≡
  Extern int fetch_max, dispatch_max, peekahead, commit_max;
    /* limits on instructions that can be handled per clock cycle */

60. The instruction currently occupying the hot seat is the only issued-but-not-yet-committed instruction that is guaranteed to be truly essential to the machine's computation. All other instructions in the reorder buffer are being executed on speculation; if they prove to be needed, well and good, but we might want to jettison them all if, say, an external interrupt occurs.

Thus all instructions that change the global state in complicated ways (like LDVTS, which changes the virtual address translation caches) are performed only when they reach the hot seat. Fortunately the vast majority of instructions are sufficiently simple that we can deal with them more efficiently while other computations are taking place.

In this implementation the reorder buffer is simply housed in an array of control records. The first array element is reorder_bot, and the last is reorder_top. Variable hot points to the control block in the hot seat, and hot − 1 to its predecessor, etc. Variable cool points to the next control block that will be filled in the reorder buffer. If hot ≡ cool the reorder buffer is empty; otherwise it contains the control records hot, hot − 1, ..., cool + 1, except of course that we wrap around from reorder_bot to reorder_top when moving down in the buffer.

⟨External variables 4⟩ +≡
  Extern control *reorder_bot, *reorder_top;
    /* least and greatest entries in the ring containing the reorder buffer */
  Extern control *hot, *cool;   /* front and rear of the reorder buffer */
  Extern control *old_hot;      /* value of hot at beginning of cycle */
  Extern int deissues;          /* the number of instructions that need to be deissued */

61. ⟨Initialize everything 22⟩ +≡
  hot = cool = reorder_top;
  deissues = 0;

62. ⟨Internal prototypes 13⟩ +≡
  static void print_reorder_buffer ARGS((void));

63. ⟨Subroutines 14⟩ +≡
  static void print_reorder_buffer()
  {
    printf("Reorder buffer");
    if (hot == cool) printf(" (empty)\n");
    else {
      register control *p;
      if (deissues) printf(" (%d to be deissued)", deissues);
      if (doing_interrupt) printf(" (interrupt state %d)", doing_interrupt);
      printf(":\n");
      for (p = hot; p != cool; p = (p == reorder_bot ? reorder_top : p - 1)) {
        print_control_block(p);
        if (p->owner) {
          printf(" "); print_coroutine_id(p->owner);
        }
        printf("\n");
      }
    }
    printf(" %d available rename register%s, %d memory slot%s\n",
           rename_regs, rename_regs != 1 ? "s" : "",
           mem_slots, mem_slots != 1 ? "s" : "");
  }
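The downward traversal with wraparound that print_reorder_buffer uses can be isolated in a few lines. In this sketch the struct control is a hypothetical one-field stand-in for the real record, and reorder_count is a name invented for the illustration; only the loop header is taken from the program.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the simulator's control record. */
typedef struct { int id; } control;

/* Count the entries hot, hot-1, ..., cool+1 in the ring whose least
   and greatest elements are bot and top. */
static size_t reorder_count(const control *bot, const control *top,
                            const control *hot, const control *cool)
{
    size_t n = 0;
    const control *p;
    assert(bot <= hot && hot <= top && bot <= cool && cool <= top);
    /* Step downward, wrapping from the bottom of the array to the top. */
    for (p = hot; p != cool; p = (p == bot ? top : p - 1)) n++;
    return n;
}
```

With an array of eight records, hot at index 2 and cool at index 6, the loop visits indices 2, 1, 0, and 7. Because hot ≡ cool means empty, a ring of n records can hold at most n − 1 instructions under this convention.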

64. Here is an overview of what happens on each clock cycle.

⟨Perform one machine cycle 64⟩ ≡
  {
    ⟨Check for external interrupt 314⟩;
    dispatch_count = 0;
    old_hot = hot;     /* remember the hot seat position at beginning of cycle */
    old_tail = tail;   /* remember the fetch buffer contents at beginning of cycle */
    suppress_dispatch = (deissues || dispatch_lock);
    if (doing_interrupt) ⟨Perform one cycle of the interrupt preparations 318⟩
    else ⟨Commit and/or deissue up to commit_max instructions 67⟩;
    ⟨Execute all coroutines scheduled for the current time 125⟩;
    if (!suppress_dispatch) ⟨Dispatch one cycle's worth of instructions 74⟩;
    ticks = incr(ticks, 1);   /* and the beat moves on */
    dispatch_stat[dispatch_count]++;
  }

This code is used in section 10.

65. ⟨Global variables 20⟩ +≡
  int dispatch_count;       /* how many dispatched on this cycle */
  bool suppress_dispatch;   /* should dispatching be bypassed? */
  int doing_interrupt;      /* how many cycles of interrupt preparations remain */
  lockvar dispatch_lock;    /* lock to prevent instruction issues */

66. ⟨External variables 4⟩ +≡
  Extern int *dispatch_stat;       /* how often did we dispatch 0, 1, ... instructions? */
  Extern bool security_disabled;   /* omit security checks for testing purposes? */

67. ⟨Commit and/or deissue up to commit_max instructions 67⟩ ≡
  {
    for (m = commit_max; m > 0 && deissues > 0; m--)
      ⟨Deissue the coolest instruction 145⟩;
    for ( ; m > 0; m--) {
      if (hot == cool) break;     /* reorder buffer is empty */
      if (!security_disabled) ⟨Check for security violation, break if so 149⟩;
      if (hot->owner) break;      /* hot seat instruction isn't finished */
      ⟨Commit the hottest instruction, or break if it's not ready 146⟩;
      i = hot->i;
      if (hot == reorder_bot) hot = reorder_top;
      else hot--;
      if (i == resum) break;      /* allow the resumed instruction to see the new rK */
    }
  }

This code is used in section 64.

68. The dispatch stage. It would be nice to present the parts of this simulator by dealing with the fetching, dispatching, executing, and committing stages in that order. After all, instructions are first fetched, then dispatched, then executed, and finally committed. However, the fetch stage depends heavily on difficult questions of memory management that are best deferred until we have looked at the simpler parts of simulation. Therefore we will take our initial plunge into the details of this program by looking first at the dispatch phase, assuming that instructions have somehow appeared magically in the fetch buffer.

The fetch buffer, like the circular priority queue of all coroutines and the circular queue used for the reorder buffer, lives in an array that is best regarded as a ring of elements. The elements are structures of type fetch, which have five fields: a 32-bit inst, which is an MMIX instruction; a 64-bit loc, which is the virtual address of that instruction; an interrupt field, which is nonzero if, for example, the protection bits in the relevant page table entry for this address do not permit execution access; a boolean noted field, which becomes true after the dispatch unit has peeked at the instruction to see whether it is a jump or probable branch; and a hist field, which records the recent branch history. (The least significant bits of hist correspond to the most recent branches.)

⟨Type definitions 11⟩ +≡
  typedef struct {
    octa loc;               /* virtual address of instruction */
    tetra inst;             /* the instruction itself */
    unsigned int interrupt; /* bit codes that might cause interruption */
    bool noted;             /* have we peeked at this instruction? */
    unsigned int hist;      /* if we peeked, this was the peek_hist */
  } fetch;

69. The oldest and youngest entries in the fetch buffer are pointed to by head and tail, just as the oldest and youngest entries in the reorder buffer are called hot and cool. The fetch coroutine will be adding entries at the tail position, which starts at old_tail when a cycle begins, in parallel with the actions simulated by the dispatcher. Therefore the dispatcher is allowed to look only at instructions in head, head − 1, ..., old_tail + 1, although a few more recently fetched instructions will usually be present in the fetch buffer by the time this part of the program is executed.

⟨External variables 4⟩ +≡
  Extern fetch *fetch_bot, *fetch_top; /* least and greatest entries in the ring containing the fetch buffer */
  Extern fetch *head, *tail;           /* front and rear of the fetch buffer */
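The ring discipline described above can be sketched in isolation. This is a minimal illustration and not part of MMIX-PIPE: the names ring_entry, RING_SIZE, ring_prev, and ring_count are assumptions made for the example. Only the convention is taken from the text, namely that the buffer is walked from its head toward lower addresses and wraps from the bottom entry back to the top.

```c
#include <stddef.h>

#define RING_SIZE 8 /* hypothetical capacity, for illustration only */

typedef struct {
  unsigned int inst; /* stand-in for the five fields of a fetch entry */
} ring_entry;

static ring_entry ring[RING_SIZE];
static ring_entry *const ring_bot = &ring[0];
static ring_entry *const ring_top = &ring[RING_SIZE - 1];

/* Step to the next younger position, wrapping from bottom to top. */
static ring_entry *ring_prev(ring_entry *p)
{
  return p == ring_bot ? ring_top : p - 1;
}

/* Count the entries head, head-1, ..., limit+1 (in the ring sense),
   i.e. those a dispatcher could examine when limit plays the role
   of old_tail. */
static int ring_count(ring_entry *head, ring_entry *limit)
{
  int n = 0;
  ring_entry *p;
  for (p = head; p != limit; p = ring_prev(p)) n++;
  return n;
}
```

An empty buffer is the case head == limit, for which ring_count returns 0; that matches the initialization head = tail = fetch_top in section 71.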

70. ⟨Global variables 20⟩ +≡
  fetch *old_tail; /* rear of the fetch buffer available on the current cycle */

71. #define UNKNOWN_SPEC ((specnode *) 1)

⟨Initialize everything 22⟩ +≡
  head = tail = fetch_top;
  inst_ptr.p = UNKNOWN_SPEC;

72. ⟨Internal prototypes 13⟩ +≡
  static void print_fetch_buffer ARGS((void));

73. ⟨Subroutines 14⟩ +≡
  static void print_fetch_buffer()
  {
    register fetch *p;
    printf("Fetch buffer");
    if (head == tail) printf(" (empty)\n");
    else {
      if (resuming) printf(" (resumption state %d)", resuming);
      printf(":\n");
      for (p = head; p != tail; p = (p == fetch_bot ? fetch_top : p - 1)) {
        print_octa(p->loc);
        printf(": %08x(%s)", p->inst, opcode_name[p->inst >> 24]);
        if (p->interrupt) print_bits(p->interrupt);
        if (p->noted) printf("*");
        printf("\n");
      }
    }
    printf("Instruction pointer is ");
    if (inst_ptr.p == NULL) print_octa(inst_ptr.o);
    else {
      printf("waiting for ");
      if (inst_ptr.p == UNKNOWN_SPEC) printf("dispatch");
      else if (inst_ptr.p->addr.h == -1)
        print_coroutine_id(((control *) inst_ptr.p->up)->owner);
      else print_specnode_id(inst_ptr.p->addr);
    }
    printf("\n");
  }

74. The best way to understand the dispatching process is once again to "think big," by imagining a huge fetch buffer and the potential ability to issue dozens of instructions per cycle, although the actual numbers are typically quite small.

If the fetch buffer is not empty after dispatch_max instructions have been dispatched, the dispatcher also looks at up to peekahead further instructions to see if they are jumps or other commands that change the flow of control. Much of this action would happen in parallel on a real machine, but our simulator works sequentially.

In the following program, true_head records the head of the fetch buffer as instructions are actually dispatched, while head refers to the position currently being examined (possibly peeking into the future).

If the fetch buffer is empty at the beginning of the current clock cycle, a "dispatch bypass" allows the dispatcher to issue the first instruction that enters the fetch buffer on this cycle. Otherwise the dispatcher is restricted to previously fetched instructions.

⟨Dispatch one cycle's worth of instructions 74⟩ ≡
  {
    register fetch *true_head, *new_head;
    true_head = head;
    if (head == old_tail && head != tail)
      old_tail = (head == fetch_bot ? fetch_top : head - 1);
    peek_hist = cool_hist;
    for (j = 0; j < dispatch_max + peekahead; j++)
      ⟨Look at the head instruction, and try to dispatch it if j < dispatch_max 75⟩;
    head = true_head;
  }
This code is used in section 64.

75. ⟨Look at the head instruction, and try to dispatch it if j < dispatch_max 75⟩ ≡
  {
    register mmix_opcode op;
    register int yz, f;
    register bool freeze_dispatch = false;
    register func *u = NULL;
    if (head == old_tail) break; /* fetch buffer empty */
    if (head == fetch_bot) new_head = fetch_top;
    else new_head = head - 1;
    op = head->inst >> 24;
    yz = head->inst & 0xffff;
    ⟨Determine the flags, f, and the internal opcode, i 80⟩;
    ⟨Install default fields in the cool block 100⟩;
    if (f & rel_addr_bit) ⟨Convert relative address to absolute address 84⟩;
    if (head->noted) peek_hist = head->hist;
    else ⟨Redirect the fetch if control changes at this inst 85⟩;
    if (j >= dispatch_max || dispatch_lock || nullifying) {
      head = new_head;
      continue; /* can't dispatch, but can peek ahead */
    }
    if (cool == reorder_bot) new_cool = reorder_top;
    else new_cool = cool - 1;
    ⟨Dispatch an instruction to the cool block if possible, otherwise goto stall 101⟩;
    ⟨Assign a functional unit if available, otherwise goto stall 82⟩;
    ⟨Check for sufficient rename registers and memory slots, or goto stall 111⟩;
    if ((op & 0xe0) == 0x40) ⟨Record the result of branch prediction 152⟩;
    ⟨Issue the cool instruction 81⟩;
    cool = new_cool;
    cool_O = new_O;
    cool_S = new_S;
    cool_hist = peek_hist;
    continue;
  stall: ⟨Undo data structures set prematurely in the cool block and break 123⟩;
  }
This code is used in section 74.

76. An instruction can be dispatched only if a functional unit is available to handle it. A functional unit consists of a 256-bit vector that specifies a subset of MMIX's opcodes, and an array of coroutines for the pipeline stages. There are k coroutines in the array, where k is the maximum number of stages needed by any of the opcodes supported.

⟨Type definitions 11⟩ +≡
  typedef struct func_struct {
    char name[16]; /* symbolic designation */
    tetra ops[8];  /* big-endian bitmap for the opcodes supported */
    int k;         /* number of pipeline stages */
    coroutine *co; /* pointer to the first of k consecutive coroutines */
  } func;

77. ⟨External variables 4⟩ +≡
  Extern func *funit;     /* pointer to array of functional units */
  Extern int funit_count; /* the number of functional units */

78. It is convenient to have a 256-bit vector of all the supported opcodes, because we need to shut off a lot of special actions when an opcode is not supported.

⟨Global variables 20⟩ +≡
  control *new_cool; /* the reorder position following cool */
  int resuming;      /* set nonzero if resuming an interrupted instruction */
  tetra support[8];  /* big-endian bitmap for all opcodes supported */

79. ⟨Initialize everything 22⟩ +≡
  {
    register func *u;
    for (u = funit; u <= funit + funit_count; u++)
      for (i = 0; i < 8; i++) support[i] |= u->ops[i];
  }

80. #define sign_bit ((unsigned) 0x80000000)

⟨Determine the flags, f, and the internal opcode, i 80⟩ ≡
  if (!(support[op >> 5] & (sign_bit >> (op & 31)))) {
    /* oops, this opcode isn't supported by any function unit */
    f = flags[TRAP], i = trap;
  } else f = flags[op], i = internal_op[op];
  if (i == trip && (head->loc.h & sign_bit)) f = 0, i = noop;
This code is used in section 75.
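The big-endian bitmap test used in section 80 can be tried out on its own. The sketch below is illustrative only: the helper names bitmap_set, bitmap_test, and bitmap_merge are assumptions, and uint32_t stands in for the simulator's tetra type. The indexing expression (word op >> 5, bit sign_bit >> (op & 31)) is the one that appears in the text.

```c
#include <stdint.h>

#define SIGN_BIT ((uint32_t)0x80000000u)

/* Mark opcode op (0..255) as supported in a big-endian 256-bit bitmap. */
static void bitmap_set(uint32_t ops[8], unsigned op)
{
  ops[op >> 5] |= SIGN_BIT >> (op & 31);
}

/* Nonzero if opcode op is present in the bitmap. */
static int bitmap_test(const uint32_t ops[8], unsigned op)
{
  return (ops[op >> 5] & (SIGN_BIT >> (op & 31))) != 0;
}

/* OR one unit's bitmap into the union of all supported opcodes,
   as the initialization loop of section 79 does for every unit. */
static void bitmap_merge(uint32_t support[8], const uint32_t ops[8])
{
  int i;
  for (i = 0; i < 8; i++) support[i] |= ops[i];
}
```

"Big-endian" here means that opcode 0 occupies the most significant bit of word 0, so the bitmap reads left to right in opcode order when printed in hexadecimal.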

81. ⟨Issue the cool instruction 81⟩ ≡
  if (cool->interim) {
    cool->usage = false;
    if (cool->op == SAVE) ⟨Get ready for the next step of SAVE 341⟩
    else if (cool->op == UNSAVE) ⟨Get ready for the next step of UNSAVE 335⟩
    else if (cool->i == preld || cool->i == prest)
      ⟨Get ready for the next step of PRELD or PREST 228⟩
    else if (cool->i == prego) ⟨Get ready for the next step of PREGO 229⟩
  } else if (cool->i <= max_real_command) {
    if ((flags[cool->op] & ctl_change_bit) || cool->i == pbr)
      if (inst_ptr.p == NULL && (inst_ptr.o.h & sign_bit)
          && !(cool->loc.h & sign_bit) && cool->i != trap)
        cool->interrupt |= P_BIT; /* jumping from nonnegative to negative */
    true_head = head = new_head;  /* delete instruction from fetch buffer */
    resuming = 0;
  }
  if (freeze_dispatch) set_lock(u->co, dispatch_lock);
  cool->owner = u->co;
  u->co->ctl = cool;
  startup(u->co, 1); /* schedule execution of the new inst */
  if (verbose & issue_bit) {
    printf("Issuing ");
    print_control_block(cool);
    printf(" ");
    print_coroutine_id(u->co);
    printf("\n");
  }
  dispatch_count++;
This code is used in section 75.

82. We assign the first functional unit that supports op and is totally unoccupied, if possible; otherwise we assign the first functional unit that supports op and has stage 1 unoccupied.

⟨Assign a functional unit if available, otherwise goto stall 82⟩ ≡
  {
    register int t = op >> 5, b = sign_bit >> (op & 31);
    if (cool->i == trap && op != TRAP) { /* opcode needs to be emulated */
      u = funit + funit_count;           /* this unit supports just TRIP and TRAP */
      goto unit_found;
    }
    for (u = funit; u <= funit + funit_count; u++)
      if (u->ops[t] & b) {
        for (i = 0; i < u->k; i++)
          if (u->co[i].next) goto unit_busy;
        goto unit_found;
      unit_busy: ;
      }
    for (u = funit; u < funit + funit_count; u++)
      if ((u->ops[t] & b) && (u->co->next == NULL)) goto unit_found;
    goto stall; /* all units for this op are busy */
  }
  unit_found:
This code is used in section 75.
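The two-pass selection policy of section 82 can be sketched without the coroutine machinery. The following is a simplified model, not the simulator's code: the type unit, the bound MAX_STAGES, the busy flags, and the function names are all assumptions for illustration, and the special TRIP/TRAP unit is left out.

```c
#include <stdint.h>

#define MAX_STAGES 4 /* hypothetical bound, for illustration only */

typedef struct {
  uint32_t ops[8];      /* big-endian bitmap of supported opcodes */
  int k;                /* number of pipeline stages */
  int busy[MAX_STAGES]; /* nonzero if the stage is occupied */
} unit;

static int supports(const unit *u, unsigned op)
{
  return (u->ops[op >> 5] & (UINT32_C(0x80000000) >> (op & 31))) != 0;
}

/* Return the index of the chosen unit, or -1 if the caller must stall. */
static int choose_unit(const unit *units, int n, unsigned op)
{
  int u, i;
  for (u = 0; u < n; u++) { /* pass 1: a totally unoccupied unit */
    if (!supports(&units[u], op)) continue;
    for (i = 0; i < units[u].k; i++)
      if (units[u].busy[i]) break;
    if (i == units[u].k) return u;
  }
  for (u = 0; u < n; u++)   /* pass 2: stage 1 unoccupied is enough */
    if (supports(&units[u], op) && !units[u].busy[0]) return u;
  return -1;
}
```

Preferring a completely idle unit in the first pass spreads work across units before any pipeline is asked to overlap instructions.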

83. The flags table records special properties of each operation code in binary notation: #1 means Z is an immediate value, #2 means rZ is a source operand, #4 means Y is an immediate value, #8 means rY is a source operand, #10 means rX is a source operand, #20 means rX is a destination, #40 means YZ is part of a relative address, #80 means the control changes at this point.

#define X_is_dest_bit 0x20
#define rel_addr_bit 0x40
#define ctl_change_bit 0x80

# 2a. # 50. # 29. # 2a. # 1a. # 50. # 8a. # 09. # 30. # 50. # 2a. # 29. # 09. # 02. # 3a. # 26. # 2a. # 19. # 2a. # 2a. # 19. # 19. # 50. # 50. # 50. : : : = = JMP . # 00. # 0a. : : : = = SYNCD . # 2a. # 50. # 2a. # 3a. # 29. # 2a. # 29. # 30. # 29. # 3a. # 50. # 29. # 1a. : : : = = LDSF . # 29. # 26. # 39. # 2a. # 39. # 2a. # 29. # 80. # 39. # 2a. # 3a. # 0a. # 29. # 29. # 29. : : : = = SETH . # 29. # 20. # 29. : : : = = POP . : : : = = AND . # 29. : : : = = MUX . # 2a. # 3a. # 1a. # 30. # 3a. # 19. # 0a. # 50. # 29. # 1a. # 29. # 2a. # 29. # 26. # 29. # 25. # 1a. : : : = = LDT . : : : = = TRAP . # 29. : : : = = PBN . # 29. # 30. # 29. # 60. # 26. : : : = = LDB . # 2a. = FLOT . # 26. # 30. : : : = = CSN . # 25. # 29. # c0. # 2a. : : : = = ORH . # 19. # 2a. # 0a. # 19. # 89. # 1a. # 2a. # 09. # 50. : : : = = BN . : : : = = STSF . : : : = = PBNN . # 30. : : : = = ZSN . # 2a. # 26. : : : = = CSNN . # 29. # 20. # 09. # 50. # 2a. # 50. # 80. # 29. # c0. # 50. # 29. # 50. # 50. # 50. # 2a. # 2a. # 3a. # 20. : : : = = SL . # 50. # 30. # c0. # 30. # 2a. # 2a. # 39. # 26. # 30. # 1a. : : : = = FMUL . # 29. # 2a. # 29. # 2a. # 30. # 29. # 2a. # 30. # 19. # 1a. # 2a. # 2a. # 29. # 29. # 50. # 50. # 2a. # 2a. # 2a. # 29. : : : = = OR . # 29. : : : = = STB . # 50. # 0a. # 2a. # 1a. # 29. : : : = = 2ADDU . # 2a. # 2a. # 2a. # 39. # 50.ne ctl change bit # 80 h Global variables 20 i + unsigned char ags [256] = f# 8a. : : : = = CMP . # 29. # 25. # 2a. # a9. # 19. # 60. : : : = = ZSNN . # 2a. # 02. # 29. # 2a. # 29. : : : = = BNN . : : : = . # 2a. # 26. # 2a. # 19. # 2a. # 01. # 09. # 20. # 2a. # 2a. # 50. # 25. # 39. # 29. # 26. # 2a. # 29. # 39. # 29. # 25. # 19. # 00. # 50. # 2a. # 50. # 29. # 09. # 1a. # 29. # 29. # 50. # 2a. # 29. # 1a. : : : = = ADD . # 30. # 0a. # 19. # 8ag. # 29. # 3a. # 19. # 29. # 50. # 01. # 50. # 29. # 50. : : : = = STT . # 2a. # 20. # 50. # 2a. # 50. # c0. # 2a. : : : = = MUL . # 29. # 2a. # 2a. # 39. # 29. : : : = = BDIF . # 2a. # 2a. # 25. # 29. 
# 2a. # 2a. # 2a. # 50. # 29. # 50. # 29. # 2a. # 26. # 2a. # 2a. # 29. # 1a. # aa. # 2a. # 2a. # 2a. # 2a. : : : = = LDVTS . # 29. # 2a. # 29.
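To make the bit assignments of section 83 concrete, here is a small decoder for a single flags byte. It is a sketch under stated assumptions: only the three names X_IS_DEST_BIT, REL_ADDR_BIT, and CTL_CHANGE_BIT mirror macros defined in the text; the remaining macro names and the helper functions are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical names for the five bits the text leaves unnamed. */
#define Z_IS_IMMED_BIT  0x01
#define Z_IS_SOURCE_BIT 0x02
#define Y_IS_IMMED_BIT  0x04
#define Y_IS_SOURCE_BIT 0x08
#define X_IS_SOURCE_BIT 0x10
/* These three mirror X_is_dest_bit, rel_addr_bit, ctl_change_bit. */
#define X_IS_DEST_BIT   0x20
#define REL_ADDR_BIT    0x40
#define CTL_CHANGE_BIT  0x80

/* Number of register source operands implied by a flags byte. */
static int source_count(unsigned char f)
{
  return ((f & Z_IS_SOURCE_BIT) != 0) + ((f & Y_IS_SOURCE_BIT) != 0)
       + ((f & X_IS_SOURCE_BIT) != 0);
}

static bool writes_x(unsigned char f) { return (f & X_IS_DEST_BIT) != 0; }
static bool is_relative(unsigned char f) { return (f & REL_ADDR_BIT) != 0; }
static bool changes_control(unsigned char f) { return (f & CTL_CHANGE_BIT) != 0; }
```

Read this way, the frequent table value #2a describes an instruction with rX as destination and rY, rZ as sources; #29 replaces the rZ source by an immediate Z; and #50 describes an instruction that reads rX and carries a relative address.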

84. ⟨Convert relative address to absolute address 84⟩ ≡
  {
    if (i == jmp) yz = head->inst & 0xffffff;
    if (op & 1) yz -= (i == jmp ? 0x1000000 : 0x10000);
    cool->y.o = incr(head->loc, 4), cool->y.p = NULL;
    cool->z.o = incr(head->loc, yz << 2), cool->z.p = NULL;
  }
This code is used in section 75.

85. The location of the next instruction to be fetched is in a spec variable called inst_ptr. A slightly tricky optimization of the POP instruction is made in the common case that the speculative value of rJ is known.

⟨Redirect the fetch if control changes at this inst 85⟩ ≡
  {
    register int predicted = 0;
    if ((op & 0xe0) == 0x40) ⟨Predict a branch outcome 151⟩;
    head->noted = true;
    head->hist = peek_hist;
    if (predicted || (f & ctl_change_bit)
        || (i == syncid && !(cool->loc.h & sign_bit))) {
      old_tail = tail = new_head; /* discard all remaining fetches */
      ⟨Restart the fetch coroutine 287⟩;
      switch (i) {
      case jmp: case br: case pbr: case pushj:
        inst_ptr = cool->z;
        break;
      case pop:
        if (g[rJ].up->known && j < dispatch_max && !dispatch_lock && !nullifying) {
          inst_ptr.o = incr(g[rJ].up->o, yz << 2), inst_ptr.p = NULL;
          break;
        } /* otherwise fall through, will wait on cool->go */
      case go: case pushgo: case trap: case resume: case syncid:
        inst_ptr.p = UNKNOWN_SPEC;
        break;
      case trip:
        inst_ptr = zero_spec;
        break;
      }
    }
  }
This code is used in section 75.
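The arithmetic of section 84 is easy to check with concrete numbers. The sketch below is an illustration, not the simulator's routine: it uses a plain 64-bit integer where MMIX-PIPE uses the two-word octa type with incr, and the function name branch_target and the parameter is_jmp are assumptions. An odd opcode denotes the backward form of the instruction, so the unsigned field is reduced by 2^24 (for the 24-bit JMP field) or 2^16 before being scaled by 4.

```c
#include <stdint.h>

/* Target address of a relative-addressed instruction located at loc.
   inst is the 32-bit instruction word; is_jmp selects the 24-bit field. */
static uint64_t branch_target(uint64_t loc, uint32_t inst, int is_jmp)
{
  int32_t off = is_jmp ? (int32_t)(inst & 0xffffff)
                       : (int32_t)(inst & 0xffff);
  if ((inst >> 24) & 1) /* odd opcode: backward variant */
    off -= is_jmp ? 0x1000000 : 0x10000;
  /* Offsets count tetrabytes; unsigned addition wraps modulo 2^64. */
  return loc + (uint64_t)((int64_t)off * 4);
}
```

For example, the word #f0000002 (opcode #f0, JMP) at location #100 yields #108, while #f1ffffff (opcode #f1, the backward JMP) at the same location yields #fc.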

86. The dispatcher works with cool instructions and puts them into the reorder buffer, where they gradually get warmer and warmer. At any given time the simulated machine is in two main states, the "hot state" corresponding to instructions that have been committed and the "cool state" corresponding to all the speculative changes currently being considered. Intermediate instructions, between hot and cool, have intermediate temperatures.

A machine register like l[101] or g[250] is represented by a specnode whose o field is the current hot value of the register. If the up and down fields of this specnode point to the node itself, the hot and cool values of the register are identical. Otherwise up and down are pointers to the coolest and hottest ends of a doubly linked list of specnodes, representing intermediate speculative values (sometimes called "rename registers"). The rename registers are implemented as the x or a specnodes inside control blocks, for speculative instructions that use this register as a destination. Speculative instructions that use the register as a source operand point to the next-hottest specnode on the list, until the value becomes known. The doubly linked list of specnodes is an input-restricted deque: A node is inserted at the cool end when the dispatcher issues an instruction with this register as destination; a node is removed from the cool end if an instruction needs to be deissued; a node is removed from the hot end when an instruction is committed.

The special registers rA, rB, ... occupy the same array as the global registers g[32], g[33], ... . For example, rB is internally the same as g[0], because rB = 0.

⟨External variables 4⟩ +≡
  Extern specnode g[256];                   /* global registers and special registers */
  Extern specnode *l;                       /* the ring of local registers */
  Extern int lring_size;                    /* the number of on-chip local registers (must be a power of 2) */
  Extern int max_rename_regs, max_mem_slots; /* capacity of reorder buffer */
  Extern int rename_regs, mem_slots;        /* currently unused capacity */

87. ⟨Header definitions 6⟩ +≡
  #define ticks g[rC].o /* the internal clock */

88. ⟨Global variables 20⟩ +≡
  int lring_mask; /* for calculations modulo lring_size */

89. The addr fields in the specnode lists for registers are used to identify that register in diagnostic messages. Such addresses are negative; memory addresses are positive.

All registers are initially zero except rG, which is initially 255, and rN, which has a constant value identifying the time of compilation. (The macro ABSTIME is defined externally in the file abstime.h, which should have just been created by ABSTIME; ABSTIME is a trivial program that computes the value of the standard library function time(NULL). We assume that this number, which is the number of seconds in the "UNIX epoch," is less than 2^32. Beware: Our assumption will fail in February of 2106.)

#define VERSION 1       /* version of the MMIX architecture that we support */
#define SUBVERSION 0    /* secondary byte of version number */
#define SUBSUBVERSION 0 /* further qualification to version number */

⟨Initialize everything 22⟩ +≡
  rename_regs = max_rename_regs;
  mem_slots = max_mem_slots;
  lring_mask = lring_size - 1;
  for (j = 0; j < 256; j++) {
    g[j].addr.h = sign_bit, g[j].addr.l = j, g[j].known = true;
    g[j].up = g[j].down = &g[j];
  }
  g[rG].o.l = 255;
  g[rN].o.h = (VERSION << 24) + (SUBVERSION << 16) + (SUBSUBVERSION << 8);
  g[rN].o.l = ABSTIME; /* see comment and warning above */
  for (j = 0; j < lring_size; j++) {
    l[j].addr.h = sign_bit, l[j].addr.l = 256 + j, l[j].known = true;
    l[j].up = l[j].down = &l[j];
  }

90. ⟨Internal prototypes 13⟩ +≡
  static void print_specnode_id ARGS((octa));

91. ⟨Subroutines 14⟩ +≡
  static void print_specnode_id(a)
    octa a;
  {
    if (a.h == sign_bit) {
      if (a.l < 32) printf(special_name[a.l]);
      else if (a.l < 256) printf("g[%d]", a.l);
      else printf("l[%d]", a.l - 256);
    } else if (a.h != -1) {
      printf("m[");
      print_octa(a);
      printf("]");
    }
  }

92. The specval subroutine produces a spec corresponding to the currently coolest value of a given local or global register.

⟨Internal prototypes 13⟩ +≡
  static spec specval ARGS((specnode *));

93. ⟨Subroutines 14⟩ +≡
  static spec specval(r)
    specnode *r;
  {
    spec res;
    if (r->up->known) res.o = r->up->o, res.p = NULL;
    else res.p = r->up;
    return res;
  }

94. The spec_install subroutine introduces a new speculative value at the cool end of a given doubly linked list.

⟨Internal prototypes 13⟩ +≡
  static void spec_install ARGS((specnode *, specnode *));

95. ⟨Subroutines 14⟩ +≡
  static void spec_install(r, t) /* insert t into list r */
    specnode *r, *t;
  {
    t->up = r->up;
    t->up->down = t;
    r->up = t;
    t->down = r;
    t->addr = r->addr;
  }

96. Conversely, spec_rem takes such a value out.

⟨Internal prototypes 13⟩ +≡
  static void spec_rem ARGS((specnode *));

97. ⟨Subroutines 14⟩ +≡
  static void spec_rem(t) /* remove t from its list */
    specnode *t;
  {
    register specnode *u = t->up, *d = t->down;
    u->down = d;
    d->up = u;
  }

98. Some special registers are so central to MMIX's operation, they are carried along with each control block in the reorder buffer instead of being treated as source and destination registers of each instruction. For example, the register stack pointers rO and rS are treated in this way. The normal specnodes for rO and rS, namely g[rO] and g[rS], are not actually used; the cool values are called cool_O and cool_S. (Actually cool_O and cool_S correspond to the register values divided by 8, since rO and rS are always multiples of 8.)

The arithmetic status register, rA, is also treated specially. Its event bits are kept up to date only at the "hot" end, by accumulating values of arith_exc; an instruction to GET the value of rA will be executed only in the hot seat. The other bits of rA, which are needed to control trip handlers and floating point rounding, are treated in the normal way.

⟨External variables 4⟩ +≡
  Extern octa cool_O, cool_S; /* values of rO, rS before the cool instruction */

99. ⟨Global variables 20⟩ +≡
  int cool_L, cool_G;                /* values of rL and rG before the cool instruction */
  unsigned int cool_hist, peek_hist; /* history bits for branch prediction */
  octa new_O, new_S;                 /* values of rO, rS after cool */
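The input-restricted deque of sections 86 and 92 through 97 can be exercised with a stripped-down node type. This is a minimal sketch, not the simulator's specnode: the type node and the names reg_init, install, uninstall, and coolest are assumptions, and the addr field is omitted. The pointer updates follow spec_install and spec_rem as described in the text.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct node {
  uint64_t o;             /* value, meaningful when known is nonzero */
  int known;
  struct node *up, *down; /* for a register header: coolest and hottest ends */
} node;

/* A register with no pending speculative values points to itself. */
static void reg_init(node *r, uint64_t v)
{
  r->o = v;
  r->known = 1;
  r->up = r->down = r;
}

/* Insert t at the cool end of the list headed by r (cf. spec_install). */
static void install(node *r, node *t)
{
  t->up = r->up;
  t->up->down = t;
  r->up = t;
  t->down = r;
}

/* Unlink t from whichever end it occupies (cf. spec_rem). */
static void uninstall(node *t)
{
  node *u = t->up, *d = t->down;
  u->down = d;
  d->up = u;
}

/* The node a new source operand would consult (cf. specval). */
static const node *coolest(const node *r)
{
  return r->up;
}
```

Because the header node takes part in the circular list, the same uninstall routine serves both for committing (removal at the hot end) and for deissuing (removal at the cool end), and an empty list is simply a header that points to itself.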

100. ⟨Install default fields in the cool block 100⟩ ≡
  cool->op = op;
  cool->i = i;
  cool->xx = (head->inst >> 16) & 0xff;
  cool->yy = (head->inst >> 8) & 0xff;
  cool->zz = (head->inst) & 0xff;
  cool->loc = head->loc;
  cool->y = cool->z = cool->b = cool->ra = zero_spec;
  cool->x.o = cool->a.o = cool->rl.o = zero_octa;
  cool->x.known = false;
  cool->x.up = NULL;
  cool->a.known = false;
  cool->a.up = NULL;
  cool->rl.known = true;
  cool->rl.up = NULL;
  cool->need_b = cool->need_ra = cool->ren_x = cool->mem_x = cool->ren_a = cool->set_l = false;
  cool->arith_exc = cool->denin = cool->denout = 0;
  if ((head->loc.h & sign_bit) && !(g[rU].o.h & 0x8000)) cool->usage = false;
  else cool->usage = ((op & (g[rU].o.h >> 16)) == g[rU].o.h >> 24 ? true : false);
  new_O = cool->cur_O = cool_O;
  new_S = cool->cur_S = cool_S;
  cool->interrupt = head->interrupt;
  cool->hist = peek_hist;
  cool->go.o = incr(cool->loc, 4);
  cool->go.known = false, cool->go.addr.h = -1, cool->go.up = (specnode *) cool;
  cool->interim = false;
This code is used in section 75.

101. ⟨Dispatch an instruction to the cool block if possible, otherwise goto stall 101⟩ ≡
  if (new_cool == hot) goto stall; /* reorder buffer is full */
  ⟨Make sure cool_L and cool_G are up to date 102⟩;
  ⟨Install the operand fields of the cool block 103⟩;
  if (f & X_is_dest_bit)
    ⟨Install register X as the destination, or insert an internal command and goto dispatch_done if X is marginal 110⟩;
  switch (i) {
    ⟨Special cases of instruction dispatch 117⟩
  default: break;
  }
  dispatch_done:
This code is used in section 75.

102. The UNSAVE operation begins by loading register rG from memory. We don't really need to know the value of rG until twelve other registers have been unsaved, so we aren't fussy about it here.

⟨Make sure cool_L and cool_G are up to date 102⟩ ≡
  if (!g[rL].up->known) goto stall;
  cool_L = g[rL].up->o.l;
  if (!g[rG].up->known && !(op == UNSAVE && cool->xx == 1)) goto stall;
  cool_G = g[rG].up->o.l;
This code is used in section 101.

103. ⟨Install the operand fields of the cool block 103⟩ ≡
  if (resuming) ⟨Insert special operands when resuming an interrupted operation 324⟩
  else {
    if (f & 0x10) ⟨Set cool->b from register X 106⟩
    if (third_operand[op] && (cool->i != trap))
      ⟨Set cool->b and/or cool->ra from special register 108⟩;
    if (f & 0x1) cool->z.o.l = cool->zz;
    else if (f & 0x2) ⟨Set cool->z from register Z 104⟩
    else if ((op & 0xf0) == 0xe0) ⟨Set cool->z as an immediate wyde 109⟩;
    if (f & 0x4) cool->y.o.l = cool->yy;
    else if (f & 0x8) ⟨Set cool->y from register Y 105⟩
  }
This code is used in section 101.

104. ⟨Set cool->z from register Z 104⟩ ≡
  {
    if (cool->zz >= cool_G) cool->z = specval(&g[cool->zz]);
    else if (cool->zz < cool_L)
      cool->z = specval(&l[(cool_O.l + cool->zz) & lring_mask]);
  }
This code is used in section 103.

105. ⟨Set cool->y from register Y 105⟩ ≡
  {
    if (cool->yy >= cool_G) cool->y = specval(&g[cool->yy]);
    else if (cool->yy < cool_L)
      cool->y = specval(&l[(cool_O.l + cool->yy) & lring_mask]);
  }
This code is used in section 103.
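The three-way test that sections 104 and 105 apply to a register number can be isolated as follows. The enum reg_class and the function names are assumptions made for this sketch; the comparisons against G and L, and the masked ring index, are taken from the code above. A register number that is neither global nor local is "marginal," and the code above then leaves the operand at its default value of zero.

```c
enum reg_class { REG_GLOBAL, REG_LOCAL, REG_MARGINAL };

/* Classify register number x given the current values L and G. */
static enum reg_class classify(unsigned x, unsigned L, unsigned G)
{
  if (x >= G) return REG_GLOBAL;
  if (x < L) return REG_LOCAL;
  return REG_MARGINAL;
}

/* Position of local register x in the ring, where O is the register
   stack offset (rO divided by 8) and ring_size is a power of 2. */
static unsigned ring_index(unsigned O, unsigned x, unsigned ring_size)
{
  return (O + x) & (ring_size - 1);
}
```

The masking step is why lring_size must be a power of 2: it replaces a modulo operation by a single AND with lring_mask.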

rA . 0. If an operation requires a special register as third operand. = AND . 0. 0. = PBN . 0. 0. 0. 0. 0. 0. = 2ADDU . 0. 0. 0. 0. : : : = rA .MMIX-PIPE: THE DISPATCH STAGE 200 106. 0. 0. 0. 0. 0. 0. 0. 0. : : : = 0. rA . rA . 0. : : : = 0. rA . 0. 0. rE . : : : = 0. 0. rE . 0. 0. 0. 0. : : : = 0. 0. 0. 0. : : : = rA . 0. 0. 0. : : : = 0. 0. rA . 0. rA . : : : = rM . 0. = FLOT . 0. pbr = if (f & rel addr bit ) cool~ need b = true . 0. : : : = 0. 0. 0. = STB . 0. rA . 0. rA . 0. 0. 0. = POP . : : : = 0. 0. 0. rA . : : : = rA . 0. 0. = ZSN . : : : = rA . 0. : : : = 0. rA . 0. : : : = 0. : : : = 0. 0. 0. rA . 0. rD . rA . = LDT . = STSF . 0. 0. : : : = 0. else if (cool~ xx < cool L ) cool~ b = specval (&l[(cool O :l + cool~ xx ) & lring mask ]). 0. 0. = CMP . rA . 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. : : : = 0. 0. 0. g This code is used in section 103. 0. 0. 0. 107. = ADD . : : : = 0. 0. 0. rA . 0. 255g. rA . 0. 0. 0. = br . 0. 0. = BDIF . = CSN . 0. : : : = 0. rD . 0. 0. 0. 0. 0. 0. 0. = LDB . : : : = . 0. = TRAP . 0. : : : = rA . 0. = SYNCD . 0. 0. 0. 0. 0. 0. 0. 0. 0. = SL . = PBNN . rA . 0. rA . 0. 0. 0. = ZSNN . rA . 0. : : : = 0. 0. 0. rA . = MUL . rA . : : : = rA . 0. = LDVTS . 0. = SETH . 0. rE . = LDSF . 0. : : : = 0. 0. rM . 0. = OR . 0. 0. 0. = FMUL . 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. h Global variables 20 i + unsigned char third operand [256] = f 0. 0. 0. 0. rA . 0. 0. 0. 0. rA . rA . rA . 0. 0. that register is listed in the third operand table. 0. 0. rA . 0. 0. rA . : : : = rA . 0. 0. 0. = MUX . rA . : : : = 0. = BNN . : : : = 0. 0. 0. 0. 0. : : : = rJ . 0. 0. = JMP . 0. : : : = rA . 0. 0. 0. 0. 0. 0. rA . = BN . 0. rA . : : : = 0. 0. 0. = ORH . 0. 0. 0. 0. = STT . 0. : : : = 0. 0. = CSNN . 0. 0. rA . : : : = 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. h Set cool~ b from register X 106 i  f if (cool~ xx  cool G ) cool~ b = specval (&g[cool~ xx ]). 0. 0.

108. The cool->b field is busy in operations like STB or STSF, which need rA. So we use cool->ra instead, when rA is needed.

⟨Set cool->b and/or cool->ra from special register 108⟩ ≡
  {
    if (third_operand[op] == rA || third_operand[op] == rE)
      cool->need_ra = true, cool->ra = specval(&g[rA]);
    if (third_operand[op] != rA)
      cool->need_b = true, cool->b = specval(&g[third_operand[op]]);
  }
This code is used in section 103.

109. ⟨Set cool->z as an immediate wyde 109⟩ ≡
  {
    switch (op & 3) {
    case 0: cool->z.o.h = yz << 16; break;
    case 1: cool->z.o.h = yz; break;
    case 2: cool->z.o.l = yz << 16; break;
    case 3: cool->z.o.l = yz; break;
    }
    if (i != set) { /* register X should also be the Y operand */
      cool->y = cool->b;
      cool->b = zero_spec;
    }
  }
This code is used in section 103.
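The four cases of section 109 place a 16-bit immediate into one of the four wyde positions of a 64-bit operand. A compact way to see this, using a single 64-bit integer instead of the simulator's two-word octa, is sketched below; the function name wyde_operand is an assumption for illustration.

```c
#include <stdint.h>

/* Operand produced by an immediate-wyde instruction: the low two bits
   of the opcode select which 16-bit position of the octabyte receives
   yz (0 = highest wyde, 3 = lowest wyde). */
static uint64_t wyde_operand(unsigned op, unsigned yz)
{
  return (uint64_t)(yz & 0xffff) << (16 * (3 - (op & 3)));
}
```

With opcode #e0 (SETH) the immediate lands in bits 48 to 63, which corresponds to case 0 above (the high tetrabyte shifted left by 16); with an opcode whose low bits are 3 it lands in bits 0 to 15.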

110.
⟨ Install register X as the destination, or insert an internal command and goto dispatch_done if X is marginal 110 ⟩ ≡
  {
    if (cool->xx >= cool_G)
      cool->ren_x = true, spec_install(&g[cool->xx], &cool->x);
    else if (cool->xx < cool_L)
      cool->ren_x = true,
      spec_install(&l[(cool_O.l + cool->xx) & lring_mask], &cool->x);
    else {   /* we need to increase L before issuing head->inst */
    increase_L:
      if (((cool_S.l - cool_O.l) & lring_mask) == cool_L && cool_L != 0)
        ⟨ Insert an instruction to advance gamma 113 ⟩
      else ⟨ Insert an instruction to advance beta and L 112 ⟩;
    }
  }
This code is used in section 101.

111.
⟨ Check for sufficient rename registers and memory slots, or goto stall 111 ⟩ ≡
  if (rename_regs < cool->ren_x + cool->ren_a) goto stall;
  if (cool->mem_x)
    if (mem_slots) mem_slots--;
    else goto stall;
  rename_regs -= cool->ren_x + cool->ren_a;
This code is used in section 75.

112. The incrl instruction advances $\beta$ and rL by 1 at a time when we know that $\beta \ne \gamma$, in the ring of local registers.

⟨ Insert an instruction to advance beta and L 112 ⟩ ≡
  {
    cool->i = incrl;
    spec_install(&l[(cool_O.l + cool_L) & lring_mask], &cool->x);
    cool->need_b = cool->need_ra = false;
    cool->y = cool->z = zero_spec;
    cool->x.known = true;   /* cool->x.o = zero_octa */
    spec_install(&g[rL], &cool->rl);
    cool->rl.o.l = cool_L + 1;
    cool->ren_x = cool->set_l = true;
    op = SETH;   /* this instruction to be handled by the simplest units */
    cool->interim = true;
    goto dispatch_done;
  }
This code is used in section 110.

113. The incgamma instruction advances $\gamma$ and rS by storing an octabyte from the local register ring to virtual memory location cool_S << 3.

⟨ Insert an instruction to advance gamma 113 ⟩ ≡
  {
    cool->need_b = cool->need_ra = false;
    cool->i = incgamma;
    new_S = incr(cool_S, 1);
    cool->b = specval(&l[cool_S.l & lring_mask]);
    cool->y.p = NULL, cool->y.o = shift_left(cool_S, 3);
    cool->z = zero_spec;
    cool->mem_x = true, spec_install(&mem, &cool->x);
    op = STOU;   /* this instruction needs to be handled by load/store unit */
    cool->interim = true;
    goto dispatch_done;
  }
This code is used in sections 110, 119, and 337.

114. The decgamma instruction decreases $\gamma$ and rS by loading an octabyte from virtual memory location (cool_S - 1) << 3 into the local register ring.

⟨ Insert an instruction to decrease gamma 114 ⟩ ≡
  {
    cool->i = decgamma;
    new_S = incr(cool_S, -1);
    cool->z = cool->b = zero_spec;
    cool->need_b = false;
    cool->y.p = NULL, cool->y.o = shift_left(new_S, 3);
    cool->ren_x = true, spec_install(&l[new_S.l & lring_mask], &cool->x);
    op = LDOU;   /* this instruction needs to be handled by load/store unit */
    cool->interim = true;
    cool->ptr_a = (void *) mem.up;
    goto dispatch_done;
  }
This code is used in section 120.
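The decision made in section 110 depends only on X, L, G, and the ring pointers rO and rS. Here is a minimal sketch of that classification, assuming plain integers instead of the simulator's octa and specnode types; the enum and the function names classify_x and local_slot are illustrative assumptions, not names from MMIX-PIPE.

```c
#include <stdint.h>

/* Hypothetical classification of destination register X. */
enum x_kind { X_GLOBAL, X_LOCAL, X_ADVANCE_GAMMA, X_ADVANCE_BETA };

/* O and S stand for the low tetrabytes of cool_O and cool_S. */
enum x_kind classify_x(unsigned xx, unsigned L, unsigned G,
                       uint32_t O, uint32_t S, uint32_t lring_mask)
{
    if (xx >= G) return X_GLOBAL;          /* lives in g[xx] */
    if (xx < L) return X_LOCAL;            /* lives in the local ring */
    /* X is marginal: L must grow. If the ring is full, a register must
       first be spilled to memory (advance gamma). */
    if (((S - O) & lring_mask) == L && L != 0) return X_ADVANCE_GAMMA;
    return X_ADVANCE_BETA;
}

/* Ring position of local register xx, as in l[(cool_O.l + xx) & lring_mask]. */
uint32_t local_slot(uint32_t O, unsigned xx, uint32_t lring_mask)
{
    return (O + xx) & lring_mask;
}
```

The subtraction S - O is done in unsigned arithmetic, so it wraps around just as the masked ring index does.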

115. Storing into memory requires a doubly linked data list of specnodes like the lists we use for local and global registers. In this case the head of the list is called mem, and the addr fields are physical addresses in memory.

⟨ External variables 4 ⟩ +≡
  Extern specnode mem;

116. The addr field of a memory specnode is all 1s until the physical address has been computed.

⟨ Initialize everything 22 ⟩ +≡
  mem.addr.h = mem.addr.l = -1;
  mem.up = mem.down = &mem;

117. The CSWAP operation is treated as a partial store, with $X as a secondary output. Partial store (pst) commands read an octabyte from memory before they write it.

⟨ Special cases of instruction dispatch 117 ⟩ ≡
  case cswap: cool->ren_a = true;
    spec_install(cool->xx >= cool_G ? &g[cool->xx]
                 : &l[(cool_O.l + cool->xx) & lring_mask], &cool->a);
    cool->i = pst;
  case st: if ((op & 0xfe) == STCO) cool->b.o.l = cool->xx;
  case pst:
    cool->mem_x = true, spec_install(&mem, &cool->x); break;
  case ld: case ldunc: cool->ptr_a = (void *) mem.up; break;
See also sections 118, 119, 120, 121, 122, 227, 312, 322, 332, 337, 347, and 355.
This code is used in section 101.

118. When new data is PUT into special registers 15–20 (namely rK, rQ, rU, rV, rG, or rL) it can affect many things. Therefore we stop issuing further instructions until such PUTs are committed. Moreover, we will see later that such drastic PUTs defer execution until they reach the hot seat.

⟨ Special cases of instruction dispatch 117 ⟩ +≡
  case put: if (cool->yy != 0 || cool->xx >= 32) goto illegal_inst;
    if (cool->xx >= 8) {
      if (cool->xx <= 11) goto illegal_inst;
      if (cool->xx <= 18 && !(cool->loc.h & sign_bit)) goto privileged_inst;
    }
    if (cool->xx >= 15 && cool->xx <= 20) freeze_dispatch = true;
    cool->ren_x = true, spec_install(&g[cool->xx], &cool->x); break;
  case get: if (cool->yy || cool->zz >= 32) goto illegal_inst;
    if (cool->zz == rO) cool->z.o = shift_left(cool_O, 3);
    else if (cool->zz == rS) cool->z.o = shift_left(cool_S, 3);
    else cool->z = specval(&g[cool->zz]);
    break;
  illegal_inst: cool->interrupt |= B_BIT; goto noop_inst;
  case ldvts: if (cool->loc.h & sign_bit) break;
  privileged_inst: cool->interrupt |= K_BIT;
  noop_inst: cool->i = noop; break;
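The legality tests applied to PUT in section 118 can be isolated as a pure function. This is a sketch under stated assumptions: the enum, the function name check_put, and the negative_loc flag (standing for cool->loc.h & sign_bit) are hypothetical, and the real dispatcher sets interrupt bits and jumps to labels instead of returning a code.

```c
#include <stdbool.h>

/* Hypothetical result codes; MMIX-PIPE sets B_BIT or K_BIT instead. */
enum put_result { PUT_OK, PUT_ILLEGAL, PUT_PRIVILEGED };

/* xx and yy are the X and Y fields of the PUT instruction; negative_loc is
   true when the instruction lies in a negative (operating system) location.
   *freeze reports whether dispatching must stop until the PUT is committed. */
enum put_result check_put(unsigned xx, unsigned yy, bool negative_loc,
                          bool *freeze)
{
    *freeze = false;
    if (yy != 0 || xx >= 32) return PUT_ILLEGAL;
    if (xx >= 8) {
        if (xx <= 11) return PUT_ILLEGAL;            /* registers 8..11 */
        if (xx <= 18 && !negative_loc) return PUT_PRIVILEGED;
    }
    if (xx >= 15 && xx <= 20) *freeze = true;        /* rK through rL */
    return PUT_OK;
}
```

A user-mode PUT to rL (register 20) is therefore legal but freezes dispatch, while a user-mode PUT to rQ (register 16) is rejected as privileged.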

119. A PUSHGO instruction with $\rm X \ge L$ causes L to increase momentarily by 1, even if $\rm L = G$. But the value of L will be decreased before the PUSHGO is complete, so it will never actually exceed G. Moreover, we needn't insert an incrl command.

⟨ Special cases of instruction dispatch 117 ⟩ +≡
  case pushgo: inst_ptr.p = &cool->go;
  case pushj: {
      register int x = cool->xx;
      if (x >= cool_L) {
        if (((cool_S.l - cool_O.l) & lring_mask) == cool_L && cool_L != 0)
          ⟨ Insert an instruction to advance gamma 113 ⟩
        x = cool_L;
        cool_L++;
      }
      cool->ren_x = true, spec_install(&l[(cool_O.l + x) & lring_mask], &cool->x);
      cool->x.known = true, cool->x.o.h = 0, cool->x.o.l = x;
      cool->ren_a = true, spec_install(&g[rJ], &cool->a);
      cool->a.known = true, cool->a.o = incr(cool->loc, 4);
      cool->set_l = true, spec_install(&g[rL], &cool->rl);
      cool->rl.o.l = cool_L - x - 1;
      new_O = incr(cool_O, x + 1);
    }
    break;
  case syncid: if (cool->loc.h & sign_bit) break;
  case go: inst_ptr.p = &cool->go; break;

120. We need to know the topmost "hidden" element of the register stack when a POP instruction is dispatched. This element is usually present in the local register ring, unless $\gamma = \alpha$.

Once it is known, let x be its least significant byte. We will be decreasing rO by x + 1, so we may have to decrease $\gamma$ repeatedly in order to maintain the condition $\rm rS \le rO$.

⟨ Special cases of instruction dispatch 117 ⟩ +≡
  case pop: if (cool->xx && cool_L >= cool->xx)
      cool->y = specval(&l[(cool_O.l + cool->xx - 1) & lring_mask]);
  pop_unsave: if (cool_S.l == cool_O.l)
      ⟨ Insert an instruction to decrease gamma 114 ⟩;
    {
      register tetra x;
      register int new_L;
      register specnode *p = l[(cool_O.l - 1) & lring_mask].up;
      if (p->known) x = (p->o.l) & 0xff;
      else goto stall;
      if ((tetra) (cool_O.l - cool_S.l) <= x)
        ⟨ Insert an instruction to decrease gamma 114 ⟩;
      new_O = incr(cool_O, -x - 1);
      if (cool->i == pop) new_L = x + (cool->xx <= cool_L ? cool->xx : cool_L + 1);
      else new_L = x;
      if (new_L > cool_G) new_L = cool_G;
      if (x < new_L)
        cool->ren_x = true, spec_install(&l[(cool_O.l - 1) & lring_mask], &cool->x);
      cool->set_l = true, spec_install(&g[rL], &cool->rl);
      cool->rl.o.l = new_L;
      if (cool->i == pop) {
        cool->z.o.l = yz << 2;
        if (inst_ptr.p == UNKNOWN_SPEC && new_head == tail) inst_ptr.p = &cool->go;
      }
      break;
    }

121.
⟨ Special cases of instruction dispatch 117 ⟩ +≡
  case mulu: cool->ren_a = true, spec_install(&g[rH], &cool->a); break;
  case div: case divu: cool->ren_a = true, spec_install(&g[rR], &cool->a); break;

122. It's tempting to say that we could avoid taking up space in the reorder buffer when no operation needs to be done. A JMP instruction qualifies as a no-op in this sense, because the change of control occurs before the execution stage. However, even a no-op might have to be counted in the usage register rU, so it might get into the execution stage for that reason. A no-op can also cause a protection interrupt, if it appears in a negative location. Even more importantly, a program might get into a loop that consists entirely of jumps and no-ops; then we wouldn't be able to interrupt it, because the interruption mechanism needs to find the current location in the reorder buffer! At least one functional unit therefore needs to provide explicit support for JMP, JMPB, and SWYM.

The SWYM instruction with F_BIT set is a special case: This is a request from the fetch coroutine for an update to the IT-cache, when the page table method isn't implemented in hardware.

⟨ Special cases of instruction dispatch 117 ⟩ +≡
  case noop: if (cool->interrupt & F_BIT) {
      cool->go.o = cool->y.o = cool->loc;
      inst_ptr = specval(&g[rT]);
    }
    break;

123.
⟨ Undo data structures set prematurely in the cool block and break 123 ⟩ ≡
  if (cool->ren_x || cool->mem_x) spec_rem(&cool->x);
  if (cool->ren_a) spec_rem(&cool->a);
  if (cool->set_l) spec_rem(&cool->rl);
  if (inst_ptr.p == &cool->go) inst_ptr.p = UNKNOWN_SPEC;
  break;
This code is used in section 75.
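The arithmetic that POP performs on L in section 120 is easy to check in isolation. The sketch below assumes plain unsigned integers for x, X, L, G and for the low tetrabytes of rO and rS; the function names pop_new_L and needs_decgamma are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* New value of L after POP X, where x is the low byte of the topmost hidden
   stack element: the x restored registers plus the returned values, capped
   at G. */
unsigned pop_new_L(unsigned x, unsigned xx, unsigned L, unsigned G)
{
    unsigned new_L = x + (xx <= L ? xx : L + 1);
    if (new_L > G) new_L = G;
    return new_L;
}

/* True when fewer than x+1 stacked registers are still in the ring, so that
   a decgamma must be inserted before rO can drop by x+1. */
bool needs_decgamma(uint32_t O, uint32_t S, uint32_t x)
{
    return (uint32_t)(O - S) <= x;
}
```

For example, with x = 3, X = 1, and L = 2 the caller ends up with L = 4: three restored locals plus one returned value.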

124. The execution stages. MMIX's raison d'être is its ability to execute instructions. So now we want to simulate the behavior of its functional units.

Each coroutine scheduled for action at the current tick of the clock has a stage number corresponding to a particular subset of the MMIX hardware. For example, the coroutines with stage = 2 are the second stages in the pipelines of the functional units. A coroutine with stage = 0 works in the fetch unit. Several artificially large stage numbers are used to control special coroutines that do things like write data from buffers into memory. In this program the current coroutine of interest is called self; hence self->stage is the current stage number of interest. Another key variable, self->ctl, is called data; this is the control block being operated on by the current coroutine. We typically are simulating an operation in which data->x is being computed as a function of data->y and data->z. The data record has many fields, as described earlier when we defined control structures; for example, data->owner is the same as self, during the execution stage, if it is nonnull.

This part of the simulator is written as if each functional unit is able to handle all 256 operations. In practice, of course, a functional unit tends to be much more specialized; the actual specialization is governed by the dispatcher, which issues an instruction only to a functional unit that supports it. Once an instruction has been dispatched, however, we can simulate it most easily if we imagine that its functional unit is universal.

Coroutines with higher stage numbers are processed first. The three most important variables that govern a coroutine's behavior, once self->stage is given, are the external operation code data->op, the internal operation code data->i, and the value of data->state. We typically have data->state = 0 when a coroutine is first fired up.

⟨ Local variables 12 ⟩ +≡
  register coroutine *self;   /* the current coroutine being executed */
  register control *data;     /* the control block of the current coroutine */

125. When a coroutine has done all it wants to on a single cycle, it says goto done. It will not be scheduled to do any further work unless the schedule routine has been called since it began execution. The wait macro is a convenient way to say "Please schedule me to resume again at the current data->state" after a specified time; for example, wait(1) will restart a coroutine on the next clock tick.

  #define wait(t) { schedule(self, t, data->state); goto done; }
  #define pass_after(t) schedule(self + 1, t, data->state)
  #define sleep { self->next = self; goto done; }   /* wait forever */
  #define awaken(c, t) schedule(c, t, c->ctl->state)

⟨ Execute all coroutines scheduled for the current time 125 ⟩ ≡
  cur_time++;
  if (cur_time == ring_size) cur_time = 0;
  for (self = queuelist(cur_time); self != &sentinel; self = sentinel.next) {
    sentinel.next = self->next;
    self->next = NULL;   /* unschedule this coroutine */
    data = self->ctl;
    if (verbose & coroutine_bit) {
      printf(" running ");
      print_coroutine_id(self);
      printf(" ");
      print_control_block(data);
      printf("\n");
    }
    switch (self->stage) {
    case 0: ⟨ Simulate an action of the fetch coroutine 288 ⟩;
    case 1: ⟨ Simulate the first stage of an execution pipeline 130 ⟩;
    default: ⟨ Simulate later stages of an execution pipeline 135 ⟩;
    ⟨ Cases for control of special coroutines 126 ⟩;
    }
  terminate: if (self->lockloc) *(self->lockloc) = NULL, self->lockloc = NULL;
  done: ;
  }
This code is used in section 64.

126. A special coroutine whose stage number is vanish simply goes away at its scheduled time.

⟨ Cases for control of special coroutines 126 ⟩ ≡
  case vanish: goto terminate;
See also sections 215, 217, 222, 224, 232, 237, and 257.
This code is used in section 125.

127.
⟨ Global variables 20 ⟩ +≡
  coroutine mem_locker;   /* trivial coroutine that vanishes */
  coroutine Dlocker;      /* another */
  control vanish_ctl;     /* such coroutines share a common control block */
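The scheduling discipline of section 125 (a circular ring of time slots, with higher stage numbers run first within a slot) can be illustrated with a much smaller data structure. The following sketch is an assumption-laden simplification: it uses NULL-terminated singly linked lists where MMIX-PIPE uses a sentinel, it drops the state argument of schedule, and the names coro, RING_SIZE, and advance are hypothetical.

```c
#include <stddef.h>

#define RING_SIZE 8   /* hypothetical; delays must satisfy 0 < t < RING_SIZE */

typedef struct coro {
    int stage;            /* which part of the hardware this coroutine models */
    struct coro *next;    /* next coroutine scheduled for the same tick */
} coro;

static coro *ring[RING_SIZE];   /* one queue per future clock tick */
static int cur_time;            /* index of the current tick */

/* Queue c to run t ticks from now, keeping each queue sorted so that
   coroutines with higher stage numbers come first. */
void schedule(coro *c, int t)
{
    coro **p = &ring[(cur_time + t) % RING_SIZE];
    while (*p && (*p)->stage >= c->stage) p = &(*p)->next;
    c->next = *p;
    *p = c;
}

/* Advance the clock by one tick and detach the queue for the new tick. */
coro *advance(void)
{
    coro *q;
    cur_time = (cur_time + 1) % RING_SIZE;
    q = ring[cur_time];
    ring[cur_time] = NULL;
    return q;
}
```

Running later stages first means that a stage which passes its data along has already been processed by the time the earlier stage checks whether its successor is free.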

128.
⟨ Initialize everything 22 ⟩ +≡
  mem_locker.name = "Locker";
  mem_locker.ctl = &vanish_ctl;
  mem_locker.stage = vanish;
  Dlocker.name = "Dlocker";
  Dlocker.ctl = &vanish_ctl;
  Dlocker.stage = vanish;
  vanish_ctl.go.o.l = 4;
  for (j = 0; j < DTcache->ports; j++) DTcache->reader[j].ctl = &vanish_ctl;
  if (Dcache) for (j = 0; j < Dcache->ports; j++) Dcache->reader[j].ctl = &vanish_ctl;
  for (j = 0; j < ITcache->ports; j++) ITcache->reader[j].ctl = &vanish_ctl;
  if (Icache) for (j = 0; j < Icache->ports; j++) Icache->reader[j].ctl = &vanish_ctl;

129. Here is a list of the stage numbers for special coroutines to be defined below.

⟨ Header definitions 6 ⟩ +≡
  #define max_stage 99         /* exceeds all stage numbers */
  #define vanish 98            /* special coroutine that just goes away */
  #define flush_to_mem 97      /* coroutine for flushing from a cache to memory */
  #define flush_to_S 96        /* coroutine for flushing from a cache to the S-cache */
  #define fill_from_mem 95     /* coroutine for filling a cache from memory */
  #define fill_from_S 94       /* coroutine for filling a cache from the S-cache */
  #define fill_from_virt 93    /* coroutine for filling a translation cache */
  #define write_from_wbuf 92   /* coroutine for emptying the write buffer */
  #define cleanup 91           /* coroutine for cleaning the caches */

130. At the very beginning of stage 1, a functional unit will stall if necessary until its operands are available. As soon as the operands are all present, the state is set nonzero and execution proper begins.

⟨ Simulate the first stage of an execution pipeline 130 ⟩ ≡
  switch1: switch (data->state) {
    case 0: ⟨ Wait for input data if necessary; set state = 1 if it's there 131 ⟩;
    case 1: ⟨ Begin execution of an operation 132 ⟩;
    case 2: ⟨ Pass data to the next stage of the pipeline 134 ⟩;
    case 3: ⟨ Finish execution of an operation 144 ⟩;
    ⟨ Special cases for states in the first stage 266 ⟩;
  }
This code is used in section 125.

131. If some of our input data has been computed by another coroutine on the current cycle, we grab it now but wait for the next cycle. (An actual machine wouldn't have latched the data until then.)

⟨ Wait for input data if necessary; set state = 1 if it's there 131 ⟩ ≡
  j = 0;
  if (data->y.p) {
    j++;
    if (data->y.p->known) data->y.o = data->y.p->o, data->y.p = NULL;
    else j += 10;
  }
  if (data->z.p) {
    j++;
    if (data->z.p->known) data->z.o = data->z.p->o, data->z.p = NULL;
    else j += 10;
  }
  if (data->b.p) {
    if (data->need_b) j++;
    if (data->b.p->known) data->b.o = data->b.p->o, data->b.p = NULL;
    else if (data->need_b) j += 10;
  }
  if (data->ra.p) {
    if (data->need_ra) j++;
    if (data->ra.p->known) data->ra.o = data->ra.p->o, data->ra.p = NULL;
    else if (data->need_ra) j += 10;
  }
  if (j < 10) data->state = 1;
  if (j) wait(1);   /* otherwise we fall through to case 1 */
This code is used in section 130.
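The counter j in section 131 encodes two facts at once: whether any operand was still pending at the start of the cycle (j nonzero) and whether any required operand is still unknown (j at least 10). A minimal sketch of that encoding follows, assuming a simplified operand record; the struct operand and the function check_inputs are hypothetical, and the copying of values out of the specnodes is omitted.

```c
#include <stdbool.h>

/* Hypothetical summary of one operand of a control block. */
typedef struct {
    bool pending;   /* the spec still points at a producer (p != NULL) */
    bool known;     /* the producer's value is already known */
    bool needed;    /* only consulted for the b and ra operands */
} operand;

/* Returns the next state (1 = ready to execute, 0 = keep waiting) and sets
   *must_wait when the coroutine has to be rescheduled for the next tick. */
int check_inputs(const operand *y, const operand *z,
                 const operand *b, const operand *ra, bool *must_wait)
{
    int j = 0;
    if (y->pending) { j++; if (!y->known) j += 10; }
    if (z->pending) { j++; if (!z->known) j += 10; }
    if (b->pending && b->needed) { j++; if (!b->known) j += 10; }
    if (ra->pending && ra->needed) { j++; if (!ra->known) j += 10; }
    *must_wait = (j != 0);
    return j < 10 ? 1 : 0;
}
```

An operand that became known only during the current cycle therefore costs one tick of delay even though the state advances to 1.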

132. Simple register-to-register instructions like ADD are assumed to take just one cycle, but others like FADD almost certainly require more time. This simulator can be configured so that FADD might take, say, four pipeline stages of one cycle each (1 + 1 + 1 + 1), or two pipeline stages of two cycles each (2 + 2), or a single unpipelined stage lasting four cycles (4), etc. In any case the simulator computes the results now, for simplicity, placing them in data->x and possibly also in data->a and/or data->interrupt. The results will not be officially made known until the proper time.

⟨ Begin execution of an operation 132 ⟩ ≡
  switch (data->i) {
    ⟨ Cases to compute the results of register-to-register operation 137 ⟩;
    ⟨ Cases to compute the virtual address of a memory operation 265 ⟩;
    ⟨ Cases for stage 1 execution 155 ⟩;
  }
  ⟨ Set things up so that the results become known when they should 133 ⟩;
This code is used in section 130.

133. If the internal opcode data->i is max_pipe_op or less, a special pipeline sequence like 1 + 1 + 1 + 1 or 2 + 2 or 15 + 10, etc., has been configured. Otherwise we assume that the pipeline sequence is simply 1.

Suppose the pipeline sequence is $t_1 + t_2 + \cdots + t_k$. Each $t_j$ is positive and less than 256, so we represent the sequence as a string pipe_seq[data->i] of unsigned "characters," terminated by 0. Given such a string, we want to do the following: Wait $(t_1 - 1)$ cycles and pass data to stage 2; wait $t_2$ cycles and pass data to stage 3; ...; wait $t_{k-1}$ cycles and pass data to stage $k$; wait $t_k$ cycles and make the results known.

The value of denin is added to $t_1$; the value of denout is added to $t_k$.

⟨ Set things up so that the results become known when they should 133 ⟩ ≡
  data->state = 3;
  if (data->i <= max_pipe_op) {
    register unsigned char *s = pipe_seq[data->i];
    j = s[0] + data->denin;
    if (s[1]) data->state = 2;   /* more than one stage */
    else j += data->denout;
    if (j > 1) wait(j - 1);
  }
  goto switch1;
This code is used in section 132.

134. When we're in stage $j$, the coroutine for stage $j + 1$ of the same functional unit is self + 1.

⟨ Pass data to the next stage of the pipeline 134 ⟩ ≡
  pass_data: if ((self + 1)->next) wait(1);   /* stall if the next stage is occupied */
  {
    register unsigned char *s = pipe_seq[data->i];
    j = s[self->stage];
    if (s[self->stage + 1] == 0)
      j += data->denout, data->state = 3;   /* the next stage is the last */
    pass_after(j);
  }
  passit: (self + 1)->ctl = data;
  data->owner = self + 1;
  goto done;
This code is used in section 130.

135.
⟨ Simulate later stages of an execution pipeline 135 ⟩ ≡
  switch2: if (data->b.p && data->b.p->known)
    data->b.o = data->b.p->o, data->b.p = NULL;
  switch (data->state) {
    case 0: panic(confusion("switch2"));
    case 1: ⟨ Begin execution of a stage-two operation 351 ⟩;
    case 2: goto pass_data;
    case 3: goto fin_ex;
    ⟨ Special cases for states in later stages 272 ⟩;
  }
This code is used in section 125.

136. The default pipeline times use only one stage; they can be overridden by MMIX_config. The total number of stages supported by this simulator is limited to 90, since it must never interfere with the stage numbers for special coroutines defined below. (The author doesn't feel guilty about making this restriction.)

⟨ External variables 4 ⟩ +≡
  #define pipe_limit 90
  Extern unsigned char pipe_seq[max_pipe_op + 1][pipe_limit + 1];

137. The simplest of all register-to-register operations is set, which occurs for commands like SETH as well as for commands like GETA. (We might as well start with the easy cases and work our way up.)

⟨ Cases to compute the results of register-to-register operation 137 ⟩ ≡
  case set: data->x.o = data->z.o; break;
See also sections 138, 139, 140, 141, 142, 143, 343, 344, 345, 346, 348, and 350.
This code is used in section 132.
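The zero-terminated encoding of a pipeline sequence described in section 133 lends itself to a couple of one-line helpers. The sketch below only totals the configured stage times together with the denin and denout penalties; it makes no claim about the exact tick on which the simulator marks results as known, and the function names are hypothetical.

```c
/* Number of pipeline stages in a 0-terminated sequence such as {2, 2, 0}. */
int stage_count(const unsigned char *seq)
{
    int k = 0;
    while (seq[k]) k++;
    return k;
}

/* Sum t_1 + ... + t_k, with denin added to the first stage and denout to
   the last, as stated in the text. */
int total_stage_time(const unsigned char *seq, int denin, int denout)
{
    int total = denin + denout;
    for (; *seq; seq++) total += *seq;
    return total;
}
```

With the sequence 15 + 10 and one penalty cycle at each end, the total comes to 27 cycles.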

138. Here are the basic boolean operations, which account for 24 of MMIX's 256 opcodes.

⟨ Cases to compute the results of register-to-register operation 137 ⟩ +≡
  case or: data->x.o.h = data->y.o.h | data->z.o.h;
    data->x.o.l = data->y.o.l | data->z.o.l; break;
  case orn: data->x.o.h = data->y.o.h | ~data->z.o.h;
    data->x.o.l = data->y.o.l | ~data->z.o.l; break;
  case nor: data->x.o.h = ~(data->y.o.h | data->z.o.h);
    data->x.o.l = ~(data->y.o.l | data->z.o.l); break;
  case and: data->x.o.h = data->y.o.h & data->z.o.h;
    data->x.o.l = data->y.o.l & data->z.o.l; break;
  case andn: data->x.o.h = data->y.o.h & ~data->z.o.h;
    data->x.o.l = data->y.o.l & ~data->z.o.l; break;
  case nand: data->x.o.h = ~(data->y.o.h & data->z.o.h);
    data->x.o.l = ~(data->y.o.l & data->z.o.l); break;
  case xor: data->x.o.h = data->y.o.h ^ data->z.o.h;
    data->x.o.l = data->y.o.l ^ data->z.o.l; break;
  case nxor: data->x.o.h = data->y.o.h ^ ~data->z.o.h;
    data->x.o.l = data->y.o.l ^ ~data->z.o.l; break;

139. The implementation of ADDU is only slightly more difficult. It would be trivial except for the fact that internal opcode addu is used not only for the ADDU[I] and INC[M][H,L] operations, in which we simply want to add data->y.o to data->z.o, but also for operations like 4ADDU.

⟨ Cases to compute the results of register-to-register operation 137 ⟩ +≡
  case addu: data->x.o = oplus((data->op & 0xf8) == 0x28
        ? shift_left(data->y.o, 1 + ((data->op >> 1) & 0x3)) : data->y.o,
      data->z.o);
    break;
  case subu: data->x.o = ominus(data->y.o, data->z.o); break;

140. Signed addition and subtraction produce the same results as their unsigned counterparts, but overflow must also be detected. Overflow occurs when adding y to z if and only if y and z have the same sign but their sum has a different sign. Overflow occurs in the calculation x = y - z if and only if it occurs in the calculation y = x + z.

⟨ Cases to compute the results of register-to-register operation 137 ⟩ +≡
  case add: data->x.o = oplus(data->y.o, data->z.o);
    if (((data->y.o.h ^ data->z.o.h) & sign_bit) == 0 &&
        ((data->y.o.h ^ data->x.o.h) & sign_bit) != 0) data->interrupt |= V_BIT;
    break;
  case sub: data->x.o = ominus(data->y.o, data->z.o);
    if (((data->x.o.h ^ data->z.o.h) & sign_bit) == 0 &&
        ((data->y.o.h ^ data->x.o.h) & sign_bit) != 0) data->interrupt |= V_BIT;
    break;

141. The shift commands might take more than one cycle, or they might even be pipelined, if the default value of pipe_seq[sh] is changed. But we compute shifts all at once here, because other parts of the simulator will take care of the pipeline timing. (Notice that shlu is changed to sh, for this reason. Similar changes to the internal op codes are made for other operators below.)

  #define shift_amt (data->z.o.h || data->z.o.l >= 64 ? 64 : data->z.o.l)

⟨ Cases to compute the results of register-to-register operation 137 ⟩ +≡
  case shlu: data->x.o = shift_left(data->y.o, shift_amt); data->i = sh; break;
  case shl: data->x.o = shift_left(data->y.o, shift_amt); data->i = sh;
    {
      octa tmpo;
      tmpo = shift_right(data->x.o, shift_amt, 0);
      if (tmpo.h != data->y.o.h || tmpo.l != data->y.o.l) data->interrupt |= V_BIT;
    }
    break;
  case shru: data->x.o = shift_right(data->y.o, shift_amt, 1); data->i = sh; break;
  case shr: data->x.o = shift_right(data->y.o, shift_amt, 0); data->i = sh; break;

142. The MUX operation has three operands, namely data->y, data->z, and data->b; the third operand is the current (speculative) value of rM, the special mask register. Otherwise MUX is unexceptional.

⟨ Cases to compute the results of register-to-register operation 137 ⟩ +≡
  case mux: data->x.o.h = (data->y.o.h & data->b.o.h) + (data->z.o.h & ~data->b.o.h);
    data->x.o.l = (data->y.o.l & data->b.o.l) + (data->z.o.l & ~data->b.o.l);
    break;
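The overflow rules of section 140, the scaling rule of section 139, and the MUX formula of section 142 can all be tried out on 64-bit integers. The sketch below assumes uint64_t in place of the simulator's two-tetrabyte octa, so the sign bit is bit 63; the function names are hypothetical and not part of MMIX-PIPE or MMIX-ARITH.

```c
#include <stdbool.h>
#include <stdint.h>

/* Signed overflow on y + z: the operands agree in sign but the sum does not. */
bool add_overflows(uint64_t y, uint64_t z, uint64_t *sum)
{
    uint64_t x = y + z;   /* unsigned arithmetic wraps modulo 2^64 */
    *sum = x;
    return ((y ^ z) >> 63) == 0 && ((y ^ x) >> 63) != 0;
}

/* Signed overflow on x = y - z, tested as overflow of y = x + z. */
bool sub_overflows(uint64_t y, uint64_t z, uint64_t *diff)
{
    uint64_t x = y - z;
    *diff = x;
    return ((x ^ z) >> 63) == 0 && ((y ^ x) >> 63) != 0;
}

/* Left shift applied to Y by the addu case: 1..4 for 2ADDU..16ADDU
   (opcodes 0x28..0x2f), and 0 for plain ADDU and the INC family. */
unsigned addu_scale_shift(unsigned op)
{
    return (op & 0xf8) == 0x28 ? 1 + ((op >> 1) & 3) : 0;
}

/* MUX: take bits of y where the mask is 1 and bits of z where it is 0. */
uint64_t mux(uint64_t y, uint64_t z, uint64_t mask)
{
    return (y & mask) | (z & ~mask);
}
```

Because the two masked terms in MUX never have a 1 in the same bit position, combining them with | here gives the same result as the + used in the simulator.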

143. Comparisons are a breeze.

⟨ Cases to compute the results of register-to-register operation 137 ⟩ +≡
  case cmp: if ((data->y.o.h & sign_bit) > (data->z.o.h & sign_bit)) goto cmp_neg;
    if ((data->y.o.h & sign_bit) < (data->z.o.h & sign_bit)) goto cmp_pos;
  case cmpu: if (data->y.o.h < data->z.o.h) goto cmp_neg;
    if (data->y.o.h > data->z.o.h) goto cmp_pos;
    if (data->y.o.l < data->z.o.l) goto cmp_neg;
    if (data->y.o.l > data->z.o.l) goto cmp_pos;
  cmp_zero: break;                      /* data->x is zero */
  cmp_pos: data->x.o.l = 1; break;      /* data->x.o.h is zero */
  cmp_neg: data->x.o = neg_one; break;

144. The other operations will be deferred until later, now that we understand the basic ideas. But one more piece of code ought to be written before we move on, because it completes the execution stage for the simple cases already considered.

The ren_x and ren_a fields tell us whether the x and/or a fields contain valid information that should become officially known.

⟨ Finish execution of an operation 144 ⟩ ≡
  fin_ex: if (data->ren_x) data->x.known = true;
  else if (data->mem_x) data->x.known = true, data->x.addr.l &= -8;
  if (data->ren_a) data->a.known = true;
  if (data->loc.h & sign_bit)
    data->ra.o.l = 0;   /* no trips enabled for the operating system */
  if (data->interrupt & 0xffff) ⟨ Handle interrupt at end of execution stage 307 ⟩;
  die: data->owner = NULL;
  goto terminate;   /* this coroutine now fades away */
This code is used in section 130.
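The comparison logic of section 143 handles the signed case by looking at the sign bits first and then falling through to the unsigned comparison. Here is a minimal sketch of the same idea with return values in place of gotos; the octa struct mirrors the simulator's high and low tetrabytes, but the function names and the SIGN_BIT macro are assumptions made for this example.

```c
#include <stdint.h>

typedef struct { uint32_t h, l; } octa;   /* high and low tetrabytes */

#define SIGN_BIT 0x80000000u

/* Unsigned comparison: -1, 0, or +1 as y is less than, equal to, or
   greater than z. */
int cmp_unsigned(octa y, octa z)
{
    if (y.h < z.h) return -1;
    if (y.h > z.h) return 1;
    if (y.l < z.l) return -1;
    if (y.l > z.l) return 1;
    return 0;
}

/* Signed comparison: an operand with its sign bit set is smaller than one
   without; when the signs agree, the unsigned order is also the signed order. */
int cmp_signed(octa y, octa z)
{
    if ((y.h & SIGN_BIT) > (z.h & SIGN_BIT)) return -1;
    if ((y.h & SIGN_BIT) < (z.h & SIGN_BIT)) return 1;
    return cmp_unsigned(y, z);
}
```

The fall-through works because two's complement numbers of equal sign are ordered the same way whether they are read as signed or unsigned.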

145. The commission/deissue stage. Control blocks leave the reorder buffer either at the hot end (when they're committed) or at the cool end (when they're deissued). We hope most of them are committed, but from time to time our speculation is incorrect and we must deissue a sequence of instructions that prove to be unwanted. Deissuing must take priority over committing, because the dispatcher cannot do anything until the machine's cool state has stabilized.

Deissuing changes the cool state by undoing the most recently issued instructions, in reverse order. Committing changes the hot state by doing the least recently issued instructions, in their original order. Both operations are similar, so we assume that they take the same time; at most commit_max instructions are deissued and/or committed on each clock cycle.

⟨ Deissue the coolest instruction 145 ⟩ ≡
  {
    cool = (cool == reorder_top ? reorder_bot : cool + 1);
    if (verbose & issue_bit) {
      printf("Deissuing ");
      print_control_block(cool);
      if (cool->owner) {
        printf(" ");
        print_coroutine_id(cool->owner);
      }
      printf("\n");
    }
    if (cool->ren_x) rename_regs++, spec_rem(&cool->x);
    if (cool->ren_a) rename_regs++, spec_rem(&cool->a);
    if (cool->mem_x) mem_slots++, spec_rem(&cool->x);
    if (cool->set_l) spec_rem(&cool->rl);
    if (cool->owner) {
      if (cool->owner->lockloc)
        *(cool->owner->lockloc) = NULL, cool->owner->lockloc = NULL;
      if (cool->owner->next) unschedule(cool->owner);
    }
    cool_O = cool->cur_O;
    cool_S = cool->cur_S;
    deissues--;
  }
This code is used in section 67.

146.
⟨Commit the hottest instruction, or break if it's not ready 146⟩ ≡
  {
    if (nullifying) ⟨Nullify the hottest instruction 147⟩
    else {
      if (hot->i == get && hot->zz == rQ)
        new_Q = oandn(g[rQ].o, hot->x.o);
      else if (hot->i == put && hot->xx == rQ)
        hot->x.o.h |= new_Q.h, hot->x.o.l |= new_Q.l;
      if (hot->mem_x) ⟨Commit to memory if possible, otherwise break 256⟩;
      if (verbose & issue_bit) {
        printf("Committing ");
        print_control_block(hot);
        printf("\n");
      }
      if (hot->ren_x) rename_regs++, hot->x.up->o = hot->x.o, spec_rem(&(hot->x));
      if (hot->ren_a) rename_regs++, hot->a.up->o = hot->a.o, spec_rem(&(hot->a));
      if (hot->set_l) hot->rl.up->o = hot->rl.o, spec_rem(&(hot->rl));
      if (hot->arith_exc) g[rA].o.l |= hot->arith_exc;
      if (hot->usage) {
        g[rU].o.l++;
        if (g[rU].o.l == 0) {
          g[rU].o.h++;
          if ((g[rU].o.h & 0x7fff) == 0) g[rU].o.h -= 0x8000;
        }
      }
    }
    if (hot->interrupt >= H_BIT) ⟨Begin an interruption and break 317⟩;
  }

This code is used in section 67.

147. A load or store instruction is "nullified" if it is about to be captured by a trap interrupt. In such cases it will be the only item in the reorder buffer; thus nullifying is sort of a cross between deissuing and committing. (It is important to have stopped dispatching when nullification is necessary, because instructions such as incgamma and decgamma change rS, and we need to change it back when an unexpected interruption occurs.)

⟨Nullify the hottest instruction 147⟩ ≡
  {
    if (verbose & issue_bit) {
      printf("Nullifying ");
      print_control_block(hot);
      printf("\n");
    }
    if (hot->ren_x) rename_regs++, spec_rem(&hot->x);
    if (hot->ren_a) rename_regs++, spec_rem(&hot->a);
    if (hot->mem_x) mem_slots++;
    if (hot->set_l) spec_rem(&hot->rl);
    cool_O = hot->cur_O, cool_S = hot->cur_S;
    nullifying = false;
  }

This code is used in section 146.

148. Interrupt bits in rQ might be lost if they are set between a GET and a PUT. Therefore we don't allow PUT to zero out bits that have become 1 since the most recently committed GET.

⟨Global variables 20⟩ +≡
  octa new_Q;  /* when rQ increases in any bit position, so should this */
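The GET/PUT protection of section 148 can be tried out in isolation. The following is only a sketch, assuming that oandn(y, z) computes y AND NOT z as its name suggests and squeezing each octabyte into a single 64-bit integer; the names q_state, commit_get, raise_bits, and commit_put are hypothetical and do not appear in MMIX-PIPE.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical miniature of the rQ bookkeeping; not part of MMIX-PIPE. */
typedef struct {
  uint64_t rQ;     /* the interrupt request register */
  uint64_t new_Q;  /* bits that became 1 since the last committed GET */
} q_state;

/* Assumed semantics of oandn: y AND NOT z. */
static uint64_t and_not(uint64_t y, uint64_t z) { return y & ~z; }

/* Commit a GET of rQ; `seen` is the value that the GET delivered earlier. */
static void commit_get(q_state *s, uint64_t seen) {
  assert(s);
  s->new_Q = and_not(s->rQ, seen);
}

/* An interrupt request arrives: both rQ and new_Q increase. */
static void raise_bits(q_state *s, uint64_t bits) {
  assert(s);
  s->rQ |= bits;
  s->new_Q |= bits;
}

/* Commit a PUT to rQ; bits newer than the last GET cannot be zeroed. */
static void commit_put(q_state *s, uint64_t value) {
  assert(s);
  s->rQ = value | s->new_Q;
}
```

If a GET sees rQ = 5, and bits 2 and 8 arrive before the matching PUT of zero is committed, the PUT leaves rQ = 10 instead of discarding the two new requests.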


149. An instruction will not be committed immediately if it violates the basic security rule of MMIX: An instruction in a nonnegative location should not be performed unless all eight of the internal interrupts have been enabled in the interrupt mask register rK. Conversely, an instruction in a negative location should not be performed if the P_BIT is enabled in rK.

Such instructions take one extra cycle before they are committed. The nonnegative-location case turns on the S_BIT of both rK and rQ, leading to an immediate interrupt (unless the current instruction is trap, put, or resume).

⟨Check for security violation, break if so 149⟩ ≡
  {
    if (hot->loc.h & sign_bit) {
      if ((g[rK].o.h & P_BIT) && !(hot->interrupt & P_BIT)) {
        hot->interrupt |= P_BIT;
        g[rQ].o.h |= P_BIT;
        new_Q.h |= P_BIT;
        if (verbose & issue_bit) {
          printf(" setting rQ=");
          print_octa(g[rQ].o);
          printf("\n");
        }
        break;
      }
    } else if ((g[rK].o.h & 0xff) != 0xff && !(hot->interrupt & S_BIT)) {
      hot->interrupt |= S_BIT;
      g[rQ].o.h |= S_BIT;
      new_Q.h |= S_BIT;
      g[rK].o.h |= S_BIT;
      if (verbose & issue_bit) {
        printf(" setting rQ=");
        print_octa(g[rQ].o);
        printf(", rK=");
        print_octa(g[rK].o);
        printf("\n");
      }
      break;
    }
  }

This code is used in section 67.
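The security rule of section 149 boils down to a small predicate on three tetrabytes. Here is a minimal sketch of that predicate alone, assuming that sign_bit is 0x80000000 and that P_BIT and S_BIT are bits 0 and 1 of the high half, as the program's definitions indicate; the function name security_violation is hypothetical.

```c
#include <assert.h>

#define SIGN_BIT 0x80000000u  /* assumed value of the sign_bit macro */
#define P_BIT (1u << 0)
#define S_BIT (1u << 1)

/* Hypothetical helper, not part of MMIX-PIPE: which interrupt bit, if any,
   must be raised before an instruction at a location with high tetra loc_h
   may be committed, given the high tetra of rK and the instruction's
   pending interrupt bits. */
static unsigned security_violation(unsigned loc_h, unsigned rK_h,
                                   unsigned interrupt) {
  if (loc_h & SIGN_BIT) {  /* negative location */
    if ((rK_h & P_BIT) && !(interrupt & P_BIT)) return P_BIT;
    return 0;
  }
  /* nonnegative location: all eight internal interrupts must be enabled */
  if ((rK_h & 0xff) != 0xff && !(interrupt & S_BIT)) return S_BIT;
  return 0;
}
```

A result of zero means that the instruction may be committed at once; otherwise the returned bit is the one that the simulator would set, delaying the commit by a cycle.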

150. Branch prediction. An MMIX programmer distinguishes statically between "branches" and "probable branches," but many modern computers attempt to do better by implementing dynamic branch prediction. (See, for example, section 4.3 of Hennessy and Patterson's Computer Architecture, second edition.) Experience has shown that dynamic branch prediction can significantly improve the performance of speculative execution, by reducing the number of instructions that need to be deissued.

This simulator has an optional bp_table containing $2^{a+b+c}$ entries of n bits each, where n is between 1 and 8. Usually n is 1 or 2 in practice, but 8 bits are allocated per entry for convenience in this program. The bp_table is consulted and updated on every branch instruction (every B or PB instruction, but not JMP), for advice on past history of similar situations. It is indexed by the a least significant bits of the address of the instruction, the b most recent bits of global branch history, and the next c bits of both address and history (exclusive-ored).

A bp_table entry begins at zero and is regarded as a signed n-bit number. If it is nonnegative, we will follow the prediction in the instruction, namely to predict a branch taken only in the PB case. If it is negative, we will predict the opposite of the instruction's recommendation. The n-bit number is increased (if possible) if the instruction's prediction was correct, decreased (if possible) if the instruction's prediction was incorrect.

(Incidentally, a large value of n is not necessarily a good idea. For example, if n = 8 the machine might need 128 steps to recognize that a branch taken the first 150 times is not taken the next 150 times. And if we modify the update criteria to avoid this problem, we obtain a scheme that is rarely better than a simple scheme with smaller n.)

The values a, b, c, and n in this discussion are called bp_a, bp_b, bp_c, and bp_n in the program.

⟨External variables 4⟩ +≡
  Extern int bp_a, bp_b, bp_c, bp_n;  /* parameters for branch prediction */
  Extern char *bp_table;  /* either NULL or an array of 2^(a+b+c) items */
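The indexing rule and the remark about n = 8 can both be checked with a few lines of stand-alone code. This is a sketch under stated assumptions, not the simulator's own routine: instructions are taken to be four bytes long, so the two low address bits are ignored, and all of the function names below are hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: table index made of the a low address bits, then
   b history bits, then c bits of address and history exclusive-ored. */
static unsigned bp_index(unsigned loc, unsigned hist, int a, int b, int c) {
  unsigned word = loc >> 2;                         /* 4-byte instructions */
  unsigned lo_a = word & ((1u << a) - 1);           /* a low address bits */
  unsigned next_c = (word >> a) & ((1u << c) - 1);  /* next c address bits */
  unsigned h = hist & ((1u << (b + c)) - 1);        /* b+c history bits */
  return lo_a | ((h ^ (next_c << b)) << a);
}

/* One update of a signed n-bit counter kept in the low n bits of h;
   it saturates instead of wrapping around. */
static int bp_counter_update(int h, int n, int correct) {
  int nmask = (1 << n) - 1, npower = 1 << (n - 1);
  assert(1 <= n && n <= 8);
  if (correct) {
    int up = (h + 1) & nmask;
    return up == npower ? h : up;  /* already at the largest value */
  }
  return h == npower ? h : ((h - 1) & nmask);
}

/* Interpret the low n bits of h as a signed number. */
static int bp_counter_value(int h, int n) {
  return h - ((h & (1 << (n - 1))) << 1);
}

/* After `warmup` correct static predictions, count the wrong ones that
   occur before the table entry finally turns negative. */
static int steps_to_flip(int n, int warmup) {
  int h = 0, steps = 0, i;
  for (i = 0; i < warmup; i++) h = bp_counter_update(h, n, 1);
  while (bp_counter_value(h, n) >= 0) {
    h = bp_counter_update(h, n, 0);
    steps++;
  }
  return steps;
}
```

With n = 8 and 150 warm-up steps the entry saturates at 127, so steps_to_flip(8, 150) counts 128 mispredictions before the sign changes, whereas n = 2 needs only two.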

151. Branch prediction is made when we are either about to issue an instruction or peeking ahead. We look at the bp_table, but we don't want to update it yet.

⟨Predict a branch outcome 151⟩ ≡
  {
    predicted = op & 0x10;  /* start with the instruction's recommendation */
    if (bp_table) {
      register int h;
      m = ((head->loc.l & bp_cmask) << bp_b) + (head->loc.l & bp_amask);
      m = ((cool_hist & bp_bcmask) << bp_a) ^ (m >> 2);
      h = bp_table[m];
      if (h & bp_npower) predicted ^= 0x10;
    }
    if (predicted) peek_hist = (peek_hist << 1) + 1;
    else peek_hist <<= 1;
  }

This code is used in section 85.

152. We update the bp_table when an instruction is issued. And we store the opposite table value in cool->x.o.l, just in case our prediction turns out to be wrong.

⟨Record the result of branch prediction 152⟩ ≡
  if (bp_table) {
    register int reversed, h, h_up, h_down;
    reversed = op & 0x10;
    if (peek_hist & 1) reversed ^= 0x10;
    m = ((head->loc.l & bp_cmask) << bp_b) + (head->loc.l & bp_amask);
    m = ((cool_hist & bp_bcmask) << bp_a) ^ (m >> 2);
    h = bp_table[m];
    h_up = (h + 1) & bp_nmask;
    if (h_up == bp_npower) h_up = h;
    if (h == bp_npower) h_down = h;
    else h_down = (h - 1) & bp_nmask;
    if (reversed) {
      bp_table[m] = h_down, cool->x.o.l = h_up;
      cool->i = pbr + br - cool->i;  /* reverse the sense */
      bp_rev_stat++;
    } else {
      bp_table[m] = h_up, cool->x.o.l = h_down;  /* go with the flow */
      bp_ok_stat++;
    }
    if (verbose & show_pred_bit) {
      printf(" predicting ");
      print_octa(cool->loc);
      printf(" %s; bp[%x]=%d\n", reversed ? "NG" : "OK", m,
             bp_table[m] - ((bp_table[m] & bp_npower) << 1));
    }
    cool->x.o.h = m;
  }

This code is used in section 75.

153. The calculations in the previous sections need several precomputed constants, depending on the parameters a, b, c, and n.

⟨Initialize everything 22⟩ +≡
  bp_amask = ((1 << bp_a) - 1) << 2;  /* least a bits of instruction address */
  bp_cmask = ((1 << bp_c) - 1) << (bp_a + 2);  /* the next c address bits */
  bp_bcmask = (1 << (bp_b + bp_c)) - 1;  /* least b+c bits of history info */
  bp_nmask = (1 << bp_n) - 1;  /* least significant n bits */
  bp_npower = 1 << (bp_n - 1);  /* 2^(n-1), the sign bit of an n-bit number */

154.
⟨Global variables 20⟩ +≡
  int bp_amask, bp_cmask, bp_bcmask, bp_nmask, bp_npower;
  int bp_rev_stat, bp_ok_stat;  /* how often we overrode and agreed */
  int bp_bad_stat, bp_good_stat;  /* how often we failed and succeeded */

155. After a branch or probable branch instruction has been issued and the value of the relevant register has been computed in the reorder buffer as data->b.o, we're ready to determine if the prediction was correct or not.

⟨Cases for stage 1 execution 155⟩ ≡
  case br: case pbr:
    j = register_truth(data->b.o, data->op);
    if (j) data->go.o = data->z.o;
    else data->go.o = data->y.o;
    if (j == (data->i == pbr)) bp_good_stat++;
    else {  /* oops, misprediction */
      bp_bad_stat++;
      ⟨Recover from incorrect branch prediction 160⟩;
    }
    goto fin_ex;

See also sections 313, 325, 327, 328, 329, 331, and 356.
This code is used in section 132.


156. The register_truth subroutine is used by B, PB, CS, and ZS commands to decide whether an octabyte satisfies the conditions of the opcode.

⟨Internal prototypes 13⟩ +≡
  static int register_truth ARGS((octa, mmix_opcode));

157.
⟨Subroutines 14⟩ +≡
  static int register_truth(o, op)
    octa o;
    mmix_opcode op;
  {
    register int b;
    switch ((op >> 1) & 0x3) {
    case 0: b = o.h >> 31; break;  /* negative? */
    case 1: b = (o.h == 0 && o.l == 0); break;  /* zero? */
    case 2: b = (o.h < sign_bit && (o.h || o.l)); break;  /* positive? */
    case 3: b = o.l & 0x1; break;  /* odd? */
    }
    if (op & 0x8) return b ^ 1;
    else return b;
  }

158. The issued_between subroutine determines how many speculative instructions were issued between a given control block in the reorder buffer and the current cool pointer, when cc = cool.

⟨Internal prototypes 13⟩ +≡
  static int issued_between ARGS((control *, control *));

159.
⟨Subroutines 14⟩ +≡
  static int issued_between(c, cc)
    control *c, *cc;
  {
    if (c > cc) return c - 1 - cc;
    return (c - reorder_bot) + (reorder_top - cc);
  }

160. If more than one functional unit is able to process branch instructions and if two of them simultaneously discover misprediction, or if misprediction is detected by one unit just as another unit is generating an interrupt, we assume that an arbitration takes place so that only the hottest one actually deissues the cooler instructions.

Changes to the bp_table aren't undone when they were made on speculation in an instruction being deissued; nor do we worry about cases where the same bp_table entry is being updated by two or more active coroutines. After all, the bp_table is just a heuristic, not part of the real computation. We correct the bp_table only if we discover that a prediction was wrong, so that we will be less likely to make the same mistake later.

⟨Recover from incorrect branch prediction 160⟩ ≡
  i = issued_between(data, cool);
  if (i < deissues) goto die;
  deissues = i;
  old_tail = tail = head; resuming = 0;  /* clear the fetch buffer */
  ⟨Restart the fetch coroutine 287⟩;
  inst_ptr.o = data->go.o, inst_ptr.p = NULL;
  if (!(data->loc.h & sign_bit)) {
    if (inst_ptr.o.h & sign_bit) data->interrupt |= P_BIT;
    else data->interrupt &= ~P_BIT;
  }
  if (bp_table) {
    bp_table[data->x.o.h] = data->x.o.l;  /* this is what we should have stored */
    if (verbose & show_pred_bit) {
      printf(" mispredicted ");
      print_octa(data->loc);
      printf("; bp[%x]=%d\n", data->x.o.h,
             data->x.o.l - ((data->x.o.l & bp_npower) << 1));
    }
  }
  cool_hist = (j ? (data->hist << 1) + 1 : data->hist << 1);

This code is used in section 155.

161.
⟨External prototypes 9⟩ +≡
  Extern void print_stats ARGS((void));

162.
⟨External routines 10⟩ +≡
  void print_stats()
  {
    register int j;
    if (bp_table)
      printf("Predictions: %d in agreement, %d in opposition; %d good, %d bad\n",
             bp_ok_stat, bp_rev_stat, bp_good_stat, bp_bad_stat);
    else printf("Predictions: %d good, %d bad\n", bp_good_stat, bp_bad_stat);
    printf("Instructions issued per cycle:\n");
    for (j = 0; j <= dispatch_max; j++)
      printf("  %d   %d\n", j, dispatch_stat[j]);
  }

163. Cache memory. It's time now to consider MMIX's MMU, the memory management unit. This part of the machine deals with the critical problem of getting data to and from the computational units. In a RISC architecture all interaction between main memory and the computer registers is specified by load and store instructions; thus memory accesses are much easier to deal with than they would be on a machine with more complex kinds of interaction. But memory management is still difficult, if we want to do it well, because main memory typically operates at a much slower speed than the registers do. High-speed implementations of MMIX introduce intermediate "caches" of storage in order to keep the most important data accessible, and cache maintenance can be complicated when all the details are taken into account. (See, for example, Chapter 5 of Hennessy and Patterson's Computer Architecture, second edition.)

This simulator can be configured to have up to three auxiliary caches between registers and memory: An I-cache for instructions, a D-cache for data, and an S-cache for both instructions and data. The S-cache, also called a secondary cache, is supported only if both I-cache and D-cache are present. Arbitrary access times for each cache can be specified independently; we might assume, for example, that data items in the I-cache or D-cache can be sent to a register in one or two clock cycles, but the access time for the S-cache might be say 5 cycles, and main memory might require 20 cycles or more.

Our speculative pipeline can have many functional units handling load and store instructions, but only one load or store instruction can be updating the D-cache or S-cache or main memory at a time. (However, the D-cache can have several read ports; furthermore, data might be passing between the S-cache and memory while other data is passing between the reorder buffer and the D-cache.)

Besides the optional I-cache, D-cache, and S-cache, there are required caches called the IT-cache and DT-cache, for translation of virtual addresses to physical addresses. A translation cache is often called a "table lookaside buffer" or TLB; but we call it a cache since it is implemented in nearly the same way as an I-cache.

164. Consider a cache that has blocks of $2^b$ bytes each and associativity $2^a$; here $b \ge 3$ and $a \ge 0$. The I-cache, D-cache, and S-cache are addressed by 48-bit physical addresses, as if they were part of main memory; but the IT and DT caches are addressed by 64-bit keys, obtained from a virtual address by blanking out the lower s bits and inserting the value of n, where the page size s and the process number n are found in rV. We will consider all caches to be addressed by 64-bit keys, so that both cases are handled with the same basic methods.

Given a 64-bit key, we ignore the low-order b bits and use the next c bits to address the cache set; then the remaining $64 - b - c$ bits should match one of $2^a$ tags in that set. The case a = 0 corresponds to a so-called direct-mapped cache; the case c = 0 corresponds to a so-called fully associative cache. With $2^c$ sets of $2^a$ blocks each, and $2^b$ bytes per block, the cache contains $2^{a+b+c}$ bytes of data, in addition to the space needed for tags. Translation caches have b = 3 and they also usually have c = 0.

If a tag matches the specified bits, we "hit" in the cache and can use and/or update the data found there. Otherwise we "miss," and we probably want to replace one of the cache blocks by the block containing the item sought. The item chosen for replacement is called a victim. The choice of victim is forced when the cache is direct-mapped, but four strategies for victim selection are available when we must choose from among $2^a$ entries for a > 0:

• "Random" selection chooses the victim by extracting the least significant a bits of the clock.

• "Serial" selection chooses $0, 1, \ldots, 2^a - 1, 0, 1, \ldots, 2^a - 1, 0, \ldots$ on successive trials.

• "LRU (Least Recently Used)" selection chooses the victim that ranks last if items are ranked inversely to the time that has elapsed since their previous use.

• "Pseudo-LRU" selection chooses the victim by a rough approximation to LRU that is simpler to implement in hardware. It requires a bit table $r_1 \ldots r_{2^a-1}$. Whenever we use an item with binary address $(i_1 \ldots i_a)_2$ in the set, we adjust the bit table as follows:

  $r_1 \gets 1 - i_1,\quad r_{1i_1} \gets 1 - i_2,\quad \ldots,\quad r_{1i_1\ldots i_{a-1}} \gets 1 - i_a;$

  here the subscripts on r are binary numbers. (For example, when a = 3, the use of element $(010)_2$ sets $r_1 \gets 1$, $r_{10} \gets 0$, $r_{101} \gets 1$, where $r_{101}$ means the same as $r_5$.) To select a victim, we start with $l \gets 1$ and then repeatedly set $l \gets 2l + r_l$, a times; then we choose element $l - 2^a$. When a = 1, this scheme is equivalent to LRU. When a = 2, this scheme was implemented in the Intel 80486 chip.

⟨Type definitions 11⟩ +≡
  typedef enum { random, serial, pseudo_lru, lru } replace_policy;

165. A cache might also include a "victim" area, which contains the last $2^v$ victim blocks removed from the main cache area. The victim area can be searched in parallel with the specified cache set, thereby increasing the chance of a hit without making the search go slower. Each of the three replacement policies can be used also in the victim cache.
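The pseudo-LRU rule of section 164 is easy to misread, so a tiny stand-alone model may help. The sketch below keeps the bit table in an ordinary array indexed from 1, exactly as the subscripts of r suggest; the names plru_touch and plru_victim are hypothetical, and the simulator itself stores these bits in the rank fields of cache blocks instead.

```c
#include <assert.h>

/* Hypothetical model of the pseudo-LRU bit table r[1..2^a-1].
   Record a use of element elem, whose binary address is (i_1...i_a). */
static void plru_touch(int r[], int a, int elem) {
  int node = 1, k;
  assert(a >= 1 && elem >= 0 && elem < (1 << a));
  for (k = a - 1; k >= 0; k--) {
    int bit = (elem >> k) & 1;  /* i_1 first, i_a last */
    r[node] = 1 - bit;          /* point away from the element just used */
    node = 2 * node + bit;      /* the subscript grows by one binary digit */
  }
}

/* Select a victim: start with l = 1, set l = 2l + r[l] a times,
   then choose element l - 2^a. */
static int plru_victim(const int r[], int a) {
  int l = 1, k;
  for (k = 0; k < a; k++) l = 2 * l + r[l];
  return l - (1 << a);
}
```

With a = 3 and an all-zero table, plru_touch(r, 3, 2) sets r[1] = 1, r[2] = 0, and r[5] = 1, in agreement with the example for element (010)_2; the next victim is then element 4, which lies in the half of the set that was not just used.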

166. A cache also has a granularity $2^g$, where $b \ge g \ge 3$. This means that we maintain, for each cache block, a set of $2^{b-g}$ "dirty bits," which identify the $2^g$-byte groups that have possibly changed since they were last read from memory. Thus if g = b, an entire cache block is either dirty or clean; if g = 3, the dirtiness of each octabyte is maintained separately.

Two policies are available when new data is written into all or part of a cache block. We can write-through, meaning that we send all new data to memory immediately and never mark anything dirty; or we can write-back, meaning that we update the memory from the cache only when absolutely necessary. Furthermore we can write-allocate, meaning that we keep the new data in the cache, even if the cache block being written has to be fetched first because of a miss; or we can write-around, meaning that we keep the new data only if it was part of an existing cache block.

The I-cache, IT-cache, and DT-cache are read-only, so they do not need the facilities discussed in this section. Moreover, the D-cache and S-cache can be assumed to have the same granularity.

(In this discussion, "memory" is shorthand for "the next level of the memory hierarchy"; if there is an S-cache, the I-cache and D-cache write new data to the S-cache, not directly to memory.)

⟨Header definitions 6⟩ +≡
#define WRITE_BACK 1  /* use this if not write-through */
#define WRITE_ALLOC 2  /* use this if not write-around */

167. We have seen that many flavors of cache can be simulated. They are represented by cache structures, containing arrays of cacheset structures that contain arrays of cacheblock structures for the individual blocks. We use a full byte to store each dirty bit, and we use full integer words to store rank fields for LRU processing, etc.; memory economy is less important than simplicity in this simulator.

⟨Type definitions 11⟩ +≡
  typedef struct {
    octa tag;  /* bits of key not included in the cache block address */
    char *dirty;  /* array of 2^(b-g) dirty bits, one per granule */
    octa *data;  /* array of 2^(b-3) octabytes, the data in a cache block */
    int rank;  /* auxiliary information for non-random policies */
  } cacheblock;

  typedef cacheblock *cacheset;  /* array of 2^a or 2^v blocks */

  typedef struct {
    int a, b, c, g, v;  /* lg of associativity, blocksize, setsize, granularity, and victimsize */
    int aa, bb, cc, gg, vv;  /* associativity, blocksize, setsize, granularity, and victimsize (all powers of 2) */
    int tagmask;  /* -2^(b+c) */
    replace_policy repl, vrepl;  /* how to choose victims and victim-victims */
    int mode;  /* optional WRITE_BACK and/or WRITE_ALLOC */
    int access_time;  /* cycles to know if there's a hit */
    int copy_in_time;  /* cycles to copy a new block into the cache */
    int copy_out_time;  /* cycles to copy an old block from the cache */
    cacheset *set;  /* array of 2^c sets of arrays of cache blocks */
    cacheset victim;  /* the victim cache, if present */
    coroutine filler;  /* a coroutine for copying new blocks into the cache */
    control filler_ctl;  /* its control block */
    coroutine flusher;  /* a coroutine for writing dirty old data from the cache */
    control flusher_ctl;  /* its control block */
    cacheblock inbuf;  /* filling comes from here */
    cacheblock outbuf;  /* flushing goes to here */
    lockvar lock;  /* nonzero when the cache is being changed significantly */
    lockvar fill_lock;  /* nonzero when filler should pass data back */
    int ports;  /* how many coroutines can be reading the cache? */
    coroutine *reader;  /* array of coroutines that might be reading simultaneously */
    char *name;  /* "Icache", for example */
  } cache;

168.
⟨External variables 4⟩ +≡
  Extern cache *Icache, *Dcache, *Scache, *ITcache, *DTcache;

169. Now we are ready to define some basic subroutines for cache maintenance. Let's begin with a trivial routine that tests if a given cache block is dirty.

⟨Internal prototypes 13⟩ +≡
  static bool is_dirty ARGS((cache *, cacheblock *));

170.
⟨Subroutines 14⟩ +≡
  static bool is_dirty(c, p)
    cache *c;  /* the cache containing it */
    cacheblock *p;  /* a cache block */
  {
    register int j;
    register char *d = p->dirty;
    for (j = 0; j < c->bb; d++, j += c->gg)
      if (*d) return true;
    return false;
  }
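The relation between block size, granularity, and dirty bits can be seen in a stripped-down model that uses a bare array of flags instead of the cacheblock structure. This is only a sketch under that simplification; every name in it is hypothetical, and the write_back argument merely stands in for testing the WRITE_BACK bit of a cache's mode.

```c
#include <assert.h>

/* Hypothetical model: number of dirty flags per block, 2^(b-g). */
static int dirty_count(int b, int g) {
  assert(3 <= g && g <= b);
  return 1 << (b - g);
}

/* Index of the dirty flag that covers byte address addr in its block. */
static int dirty_index(unsigned addr, int b, int g) {
  return (int)((addr & ((1u << b) - 1)) >> g);
}

/* Record a write; under write-through nothing is ever marked dirty. */
static void mark_written(char *dirty, unsigned addr, int b, int g,
                         int write_back) {
  if (write_back) dirty[dirty_index(addr, b, g)] = 1;
}

/* A block is dirty if any of its granules is dirty. */
static int block_is_dirty(const char *dirty, int b, int g) {
  int j;
  for (j = 0; j < dirty_count(b, g); j++)
    if (dirty[j]) return 1;
  return 0;
}
```

For 64-byte blocks with octabyte granularity (b = 6, g = 3) there are eight flags, and a write to address 0x1234 touches flag 6; with g = b there is a single flag for the whole block.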

171. For diagnostic purposes we might want to display an entire cache block.

⟨Internal prototypes 13⟩ +≡
  static void print_cache_block ARGS((cacheblock, cache *));

172.
⟨Subroutines 14⟩ +≡
  static void print_cache_block(p, c)
    cacheblock p;
    cache *c;
  {
    register int i, j, b = c->bb >> 3, g = c->gg >> 3;
    printf("%08x%08x: ", p.tag.h, p.tag.l);
    for (i = j = 0; j < b; j++, i += ((j & (g - 1)) ? 0 : 1))
      printf("%08x%08x%c", p.data[j].h, p.data[j].l, p.dirty[i] ? '*' : ' ');
    printf(" (%d)\n", p.rank);
  }

173.
⟨Internal prototypes 13⟩ +≡
  static void print_cache_locks ARGS((cache *));

174.
⟨Subroutines 14⟩ +≡
  static void print_cache_locks(c)
    cache *c;
  {
    if (c) {
      if (c->lock)
        printf("%s locked by %s:%d\n", c->name, c->lock->name, c->lock->stage);
      if (c->fill_lock)
        printf("%sfill locked by %s:%d\n", c->name, c->fill_lock->name, c->fill_lock->stage);
    }
  }

175. The print_cache routine prints the entire contents of a cache. This can be a huge amount of data, but it can be very useful when debugging. Fortunately, the task of debugging favors the use of small caches, since interesting cases arise more often when a cache is fairly small.

⟨External prototypes 9⟩ +≡
  Extern void print_cache ARGS((cache *, bool));

176.
⟨External routines 10⟩ +≡
  void print_cache(c, dirty_only)
    cache *c;
    bool dirty_only;
  {
    if (c) {
      register int i, j;
      printf("%s of %s:", dirty_only ? "Dirty blocks" : "Contents", c->name);
      if (c->filler.next) {
        printf(" (filling ");
        print_octa(c->name[1] == 'T' ? c->filler_ctl.y.o : c->filler_ctl.z.o);
        printf(")");
      }
      if (c->flusher.next) {
        printf(" (flushing ");
        print_octa(c->outbuf.tag);
        printf(")");
      }
      printf("\n");
      ⟨Print all of c's cache blocks 177⟩;
    }
  }

177. We don't print the cache blocks that have an invalid tag, unless requested to be verbose.

⟨Print all of c's cache blocks 177⟩ ≡
  for (i = 0; i < c->cc; i++)
    for (j = 0; j < c->aa; j++)
      if ((!(c->set[i][j].tag.h & sign_bit) || (verbose & show_wholecache_bit)) &&
          (!dirty_only || is_dirty(c, &c->set[i][j]))) {
        printf("[%d][%d] ", i, j);
        print_cache_block(c->set[i][j], c);
      }
  for (j = 0; j < c->vv; j++)
    if ((!(c->victim[j].tag.h & sign_bit) || (verbose & show_wholecache_bit)) &&
        (!dirty_only || is_dirty(c, &c->victim[j]))) {
      printf("V[%d] ", j);
      print_cache_block(c->victim[j], c);
    }

This code is used in section 176.

178. The zap_cache routine invalidates all tags of a given cache, effectively restoring it to its initial condition.

⟨External prototypes 9⟩ +≡
  Extern void zap_cache ARGS((cache *));

179. We clear the dirty entries here, just to be tidy, although they could actually be left in arbitrary condition when the tags are invalid.

⟨External routines 10⟩ +≡
  void zap_cache(c)
    cache *c;
  {
    register int i, j;
    for (i = 0; i < c->cc; i++)
      for (j = 0; j < c->aa; j++) {
        clean_block(c, &(c->set[i][j]));
      }
    for (j = 0; j < c->vv; j++) {
      clean_block(c, &(c->victim[j]));
    }
  }

180. The clean_block routine simply initializes a given cache block.

⟨External prototypes 9⟩ +≡
  Extern void clean_block ARGS((cache *, cacheblock *));

181.
⟨External routines 10⟩ +≡
  void clean_block(c, p)
    cache *c;
    cacheblock *p;
  {
    register int j;
    p->tag.h = sign_bit; p->tag.l = 0;
    for (j = 0; j < c->bb >> 3; j++) p->data[j] = zero_octa;
    for (j = 0; j < c->bb >> c->g; j++) p->dirty[j] = false;
  }

182. The get_reader subroutine finds the index of an available reader coroutine for a given cache, or returns a negative value if no readers are available.

⟨Internal prototypes 13⟩ +≡
  static int get_reader ARGS((cache *));

183.
⟨Subroutines 14⟩ +≡
  static int get_reader(c)
    cache *c;
  {
    register int j;
    for (j = 0; j < c->ports; j++)
      if (c->reader[j].next == NULL) return j;
    return -1;
  }

184. The subroutine copy_block(c, p, cc, pp) copies the dirty items from block p of cache c into block pp of cache cc, assuming that the destination cache has a sufficiently large block size. (In other words, we assume that cc->b ≥ c->b.) We also assume that both blocks have compatible tags, and that both caches have the same granularity.

⟨Internal prototypes 13⟩ +≡
  static void copy_block ARGS((cache *, cacheblock *, cache *, cacheblock *));

185.
⟨Subroutines 14⟩ +≡
  static void copy_block(c, p, cc, pp)
    cache *c, *cc;
    cacheblock *p, *pp;
  {
    register int j, jj, i, ii, lim;
    register int off = p->tag.l & (cc->bb - 1);
    if (c->g != cc->g || p->tag.h != pp->tag.h || p->tag.l - off != pp->tag.l)
      panic(confusion("copy block"));
    for (j = 0, jj = off >> c->g; j < c->bb >> c->g; j++, jj++)
      if (p->dirty[j]) {
        pp->dirty[jj] = true;
        for (i = j << (c->g - 3), ii = jj << (c->g - 3), lim = (j + 1) << (c->g - 3);
             i < lim; i++, ii++)
          pp->data[ii] = p->data[i];
      }
  }

186. The choose_victim subroutine selects the victim to be replaced when we need to change a cache set. We need only one bit of the rank fields to implement the r table when policy = pseudo_lru, and we don't need rank at all when policy = random. Of course we use an a-bit counter to implement policy = serial. In the other case, policy = lru, we need an a-bit rank field; the least recently used entry has rank 0, and the most recently used entry has rank $2^a - 1$ = aa - 1.

⟨Internal prototypes 13⟩ +≡
  static cacheblock *choose_victim ARGS((cacheset, int, replace_policy));

187. ⟨Subroutines 14⟩ +≡
  static cacheblock *choose_victim(s, aa, policy)
    cacheset s;
    int aa;  /* setsize */
    replace_policy policy;
  {
    register cacheblock *p;
    register int l, m;
    switch (policy) {
    case random: return &s[ticks.l & (aa - 1)];
    case serial: l = s[0].rank; s[0].rank = (l + 1) & (aa - 1); return &s[l];
    case lru: for (p = s; p < s + aa; p++)
        if (p->rank == 0) return p;
      panic(confusion("lru victim"));  /* what happened? nobody has rank zero */
    case pseudo_lru: for (l = 1, m = aa >> 1; m; m >>= 1) l = l + l + s[l].rank;
      return &s[l - aa];
    }
  }

188. The note_usage subroutine updates the rank entries to record the fact that a particular block in a cache set is now being used.

⟨Internal prototypes 13⟩ +≡
  static void note_usage ARGS((cacheblock *, cacheset, int, replace_policy));

189. ⟨Subroutines 14⟩ +≡
  static void note_usage(l, s, aa, policy)
    cacheblock *l;  /* a cache block that's probably worth preserving */
    cacheset s;  /* the set that contains l */
    int aa;  /* setsize */
    replace_policy policy;
  {
    register cacheblock *p;
    register int j, m, r;
    if (aa == 1 || policy <= serial) return;
    if (policy == lru) {
      r = l->rank;
      for (p = s; p < s + aa; p++) if (p->rank > r) p->rank--;
      l->rank = aa - 1;
    } else {  /* policy == pseudo_lru */
      r = l - s;
      for (j = 1, m = aa >> 1; m; m >>= 1)
        if (r & m) s[j].rank = 0, j = j + j + 1;
        else s[j].rank = 1, j = j + j;
    }
    return;
  }

190. The demote_usage subroutine is sort of the opposite of note_usage; it changes the rank of a given block to least recently used.

⟨Internal prototypes 13⟩ +≡
  static void demote_usage ARGS((cacheblock *, cacheset, int, replace_policy));

191. ⟨Subroutines 14⟩ +≡
  static void demote_usage(l, s, aa, policy)
    cacheblock *l;  /* a cache block we probably don't need */
    cacheset s;  /* the set that contains l */
    int aa;  /* setsize */
    replace_policy policy;
  {
    register cacheblock *p;
    register int j, m, r;
    if (aa == 1 || policy <= serial) return;
    if (policy == lru) {
      r = l->rank;
      for (p = s; p < s + aa; p++) if (p->rank < r) p->rank++;
      l->rank = 0;
    } else {  /* policy == pseudo_lru */
      r = l - s;
      for (j = 1, m = aa >> 1; m; m >>= 1)
        if (r & m) s[j].rank = 1, j = j + j + 1;
        else s[j].rank = 0, j = j + j;
    }
    return;
  }
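The rank bookkeeping of choose_victim and note_usage can be tried out in isolation. The following is a minimal sketch, assuming plain int arrays in place of the rank fields of a cache set; the names lru_touch, lru_victim, plru_touch and plru_victim are hypothetical and are not part of MMIX-PIPE.

```c
/* rank[k] is the LRU rank of way k: 0 = least recently used,
   aa-1 = most recently used (a hypothetical stand-in for s[k].rank). */
void lru_touch(int *rank, int aa, int way)
{
    int k, r = rank[way];
    for (k = 0; k < aa; k++)
        if (rank[k] > r) rank[k]--;   /* every newer entry ages by one */
    rank[way] = aa - 1;               /* the touched way becomes newest */
}

int lru_victim(const int *rank, int aa)
{
    int k;
    for (k = 0; k < aa; k++)
        if (rank[k] == 0) return k;
    return -1;                        /* ranks were not a permutation */
}

/* bit[1..aa-1] is a heap-ordered tree of one-bit pointers; aa must be a
   power of 2. Touching a way makes every bit on its path point away. */
void plru_touch(int *bit, int aa, int way)
{
    int j = 1, m;
    for (m = aa >> 1; m; m >>= 1) {
        if (way & m) { bit[j] = 0; j = j + j + 1; }
        else         { bit[j] = 1; j = j + j; }
    }
}

int plru_victim(const int *bit, int aa)
{
    int l = 1, m;
    for (m = aa >> 1; m; m >>= 1) l = l + l + bit[l];
    return l - aa;                    /* leaves are numbered aa..2aa-1 */
}
```

With four ways, touching way 0 and then way 2 leaves the pseudo-LRU pointers aimed at way 1, while true LRU would pick whichever way has gone longest without a touch; the tree needs only aa - 1 bits instead of aa ranks.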

192. The cache_search routine looks for a given key α in a given cache, and returns a cache block if there's a hit; otherwise it returns NULL. If the search hits, the set in which the block was found is stored in global variable hit_set. Notice that we need to check more bits of the tag when we search in the victim area.

  #define cache_addr(c, alf) c->set[(alf.l & ~(c->tagmask)) >> c->b]

⟨Internal prototypes 13⟩ +≡
  static cacheblock *cache_search ARGS((cache *, octa));

193. ⟨Subroutines 14⟩ +≡
  static cacheblock *cache_search(c, alf)
    cache *c;  /* the cache to be searched */
    octa alf;  /* the key */
  {
    register cacheset s;
    register cacheblock *p;
    s = cache_addr(c, alf);  /* the set corresponding to alf */
    for (p = s; p < s + c->aa; p++)
      if (((p->tag.l ^ alf.l) & c->tagmask) == 0 && p->tag.h == alf.h) goto hit;
    s = c->victim;
    if (!s) return NULL;  /* cache miss, and no victim area */
    for (p = s; p < s + c->vv; p++)
      if (((p->tag.l ^ alf.l) & (-c->bb)) == 0 && p->tag.h == alf.h) goto hit;
    return NULL;  /* double miss */
  hit: hit_set = s;
    return p;
  }

194. ⟨Global variables 20⟩ +≡
  cacheset hit_set;

195. If p = cache_search(c, alf) hits and if we call use_and_fix(c, p) immediately afterwards, cache c is updated to record the usage of key alf. A hit in the victim area moves the cache block to the main area, unless the filler routine of cache c is active. A pointer to the (possibly moved) cache block is returned.

⟨Internal prototypes 13⟩ +≡
  static cacheblock *use_and_fix ARGS((cache *, cacheblock *));

196. ⟨Subroutines 14⟩ +≡
  static cacheblock *use_and_fix(c, p)
    cache *c;
    cacheblock *p;
  {
    if (hit_set != c->victim) note_usage(p, hit_set, c->aa, c->repl);
    else {  /* found in victim cache */
      note_usage(p, hit_set, c->vv, c->vrepl);
      if (!c->filler.next) {
        register cacheset s = cache_addr(c, p->tag);
        register cacheblock *q = choose_victim(s, c->aa, c->repl);
        note_usage(q, s, c->aa, c->repl);
        ⟨Swap cache blocks p and q 197⟩;
        return q;
      }
    }
    return p;
  }

197. We can simply permute the pointers inside the cacheblock structures of a cache, instead of copying the data, if we are careful not to let any of those pointers escape into other data structures.

⟨Swap cache blocks p and q 197⟩ ≡
  {
    octa t;
    register char *d = p->dirty;
    register octa *dd = p->data;
    t = p->tag; p->tag = q->tag; q->tag = t;
    p->dirty = q->dirty; q->dirty = d;
    p->data = q->data; q->data = dd;
  }
This code is used in sections 196 and 205.

198. The demote_and_fix routine is analogous to use_and_fix, except that we don't want to promote the data we found.

⟨Internal prototypes 13⟩ +≡
  static cacheblock *demote_and_fix ARGS((cache *, cacheblock *));

199. ⟨Subroutines 14⟩ +≡
  static cacheblock *demote_and_fix(c, p)
    cache *c;
    cacheblock *p;
  {
    if (hit_set != c->victim) demote_usage(p, hit_set, c->aa, c->repl);
    else demote_usage(p, hit_set, c->vv, c->vrepl);
    return p;
  }
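The pointer permutation used when two cache blocks trade places can be illustrated on its own. This is a minimal sketch, assuming a pared-down structure named block (hypothetical) that carries only a tag and the two pointers; the real cacheblock has further fields, such as rank, that the swap deliberately leaves in place.

```c
#include <stdint.h>

/* Hypothetical, pared-down stand-in for a cache block. */
typedef struct {
    uint64_t tag;
    char *dirty;        /* one flag per granule */
    uint64_t *data;     /* the octabytes of the block */
} block;

/* Exchange the contents of two blocks in constant time: only the tag
   and the two pointers move, never the arrays they point to. */
void swap_blocks(block *p, block *q)
{
    uint64_t t = p->tag;
    char *d = p->dirty;
    uint64_t *dd = p->data;
    p->tag = q->tag;     q->tag = t;
    p->dirty = q->dirty; q->dirty = d;
    p->data = q->data;   q->data = dd;
}
```

The cost is independent of the block size, which is why the caveat about not letting the pointers escape matters: any outside copy of p->data would silently refer to the other block after a swap.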

200. The subroutine load_cache(c, p) is called at a moment when c->lock has been set and c->inbuf has been filled with clean data to be placed in the cache block p.

⟨Internal prototypes 13⟩ +≡
  static void load_cache ARGS((cache *, cacheblock *));

201. ⟨Subroutines 14⟩ +≡
  static void load_cache(c, p)
    cache *c;
    cacheblock *p;
  {
    register int i;
    register octa *d;
    for (i = 0; i < c->bb >> c->g; i++) p->dirty[i] = false;
    d = p->data; p->data = c->inbuf.data; c->inbuf.data = d;
    p->tag = c->inbuf.tag;
    hit_set = cache_addr(c, p->tag);
    use_and_fix(c, p);  /* p not moved */
  }

202. The subroutine flush_cache(c, p, keep) is called at a "quiet" moment when c->flusher.next = NULL. It puts cache block p into c->outbuf and fires up the c->flusher coroutine, which will take care of sending the data to lower levels of the memory hierarchy. Cache block p is also marked clean.

⟨Internal prototypes 13⟩ +≡
  static void flush_cache ARGS((cache *, cacheblock *, bool));

203. ⟨Subroutines 14⟩ +≡
  static void flush_cache(c, p, keep)
    cache *c;
    cacheblock *p;  /* a block inside cache c */
    bool keep;  /* should we preserve the data in p? */
  {
    register octa *d;
    register char *dd;
    register int j;
    c->outbuf.tag = p->tag;
    if (keep) for (j = 0; j < c->bb >> 3; j++) c->outbuf.data[j] = p->data[j];
    else d = c->outbuf.data, c->outbuf.data = p->data, p->data = d;
    dd = c->outbuf.dirty, c->outbuf.dirty = p->dirty, p->dirty = dd;
    for (j = 0; j < c->bb >> c->g; j++) p->dirty[j] = false;
    startup(&c->flusher, c->copy_out_time);  /* will not be aborted */
  }

204. The alloc_slot routine is called when we wish to put new information into a cache after a cache miss. It returns a pointer to a cache block in the main area where the new information should be put. The tag of that cache block is invalidated; the calling routine should take care of filling it and giving it a valid tag in due time. The cache's filler routine should not be active when alloc_slot is called.

Inserting new information might also require writing old information into the next level of the memory hierarchy, if the block being replaced is dirty. This routine returns NULL in such cases if the cache is flushing a previously discarded block. Otherwise it schedules the flusher coroutine.

This routine returns NULL also if the given key happens to be in the cache. Such cases are rare, but the following scenario shows that they aren't impossible: Suppose the DT-cache access time is 5, the D-cache access time is 1, and two processes simultaneously look for the same physical address. One process hits in DT-cache but misses in D-cache, waiting 5 cycles before trying alloc_slot in the D-cache; meanwhile the other process missed in D-cache but didn't need to use the DT-cache, so it might have updated the D-cache.

A key value is never negative. Therefore we can invalidate the tag in the chosen slot by forcing it to be negative.

⟨Internal prototypes 13⟩ +≡
  static cacheblock *alloc_slot ARGS((cache *, octa));

205. ⟨Subroutines 14⟩ +≡
  static cacheblock *alloc_slot(c, alf)
    cache *c;
    octa alf;  /* key that probably isn't in the cache */
  {
    register cacheset s;
    register cacheblock *p, *q;
    if (cache_search(c, alf)) return NULL;
    s = cache_addr(c, alf);  /* the set corresponding to alf */
    if (c->victim) p = choose_victim(c->victim, c->vv, c->vrepl);
    else p = choose_victim(s, c->aa, c->repl);
    if (is_dirty(c, p)) {
      if (c->flusher.next) return NULL;
      flush_cache(c, p, false);
    }
    if (c->victim) {
      q = choose_victim(s, c->aa, c->repl);
      ⟨Swap cache blocks p and q 197⟩;
      q->tag.h |= sign_bit;  /* invalidate the tag */
      return q;
    }
    p->tag.h |= sign_bit;
    return p;
  }
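The trick of invalidating a tag by forcing it negative can be shown in a few lines. This is a minimal sketch, assuming a two-tetrabyte octa as used throughout the program and a SIGN_BIT constant of 0x80000000 (the value of the program's sign_bit macro is an assumption here); the function names are hypothetical.

```c
#include <stdint.h>

#define SIGN_BIT 0x80000000u          /* assumed value of sign_bit */

typedef struct { uint32_t h, l; } octa;   /* high and low tetrabytes */

/* A key is never negative, so a set sign bit can only mean "invalid". */
int tag_is_valid(octa t)
{
    return (t.h & SIGN_BIT) == 0;
}

/* Mark a slot as holding no usable data, keeping the other bits. */
void invalidate_tag(octa *t)
{
    t->h |= SIGN_BIT;
}

/* An invalidated tag can never equal a legitimate key. */
int tag_matches(octa t, octa key)
{
    return t.h == key.h && t.l == key.l;
}
```

Because the comparison in a search includes the high tetrabyte, no separate valid flag is needed: an invalidated slot simply fails every lookup until the caller stores a real tag.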

206. Simulated memory. How should we deal with the potentially gigantic memory of MMIX? We can't simply declare an array m that has $2^{48}$ bytes. (Indeed, up to $2^{63}$ bytes are needed, if we consider also the physical addresses $\ge 2^{48}$ that are reserved for memory-mapped input/output.)

We could regard memory as a special kind of cache, in which every access is required to hit. For example, such an "M-cache" could be fully associative, with $2^a$ blocks each having a different tag; simulation could proceed until more than $2^a - 1$ tags are required. But then the predefined value of a might well be so large that the sequential search of our cache_search routine would be too slow.

Instead, we will allocate memory in chunks of $2^{16}$ bytes at a time, as needed, and we will use hashing to search for the relevant chunk whenever a physical address is given. If the address is $2^{48}$ or greater, special routines called spec_read and spec_write, supplied by the user, will be called upon to do the reading or writing. Otherwise the 48-bit address consists of a 32-bit chunk address and a 16-bit chunk offset.

Chunk addresses that are not used take no space in this simulator. But if, say, 1000 such patterns occur, the simulator will dynamically allocate approximately 65MB for the portions of main memory that are used. Parameter mem_chunks_max specifies the largest number of different chunk addresses that are supported. This parameter does not constrain the range of simulated physical addresses, which cover the entire 256 large-terabyte range permitted by MMIX.

⟨Type definitions 11⟩ +≡
  typedef struct {
    tetra tag;  /* 32-bit chunk address */
    octa *chunk;  /* either NULL or an array of $2^{13}$ octabytes */
  } chunknode;

207. The parameter hash_prime should be a prime number larger than the parameter mem_chunks_max, preferably more than twice as large but not much bigger than that. The default values mem_chunks_max = 1000 and hash_prime = 2009 are set by MMIX_config unless the user specifies otherwise.

⟨External variables 4⟩ +≡
  Extern int mem_chunks;  /* this many chunks are allocated so far */
  Extern int mem_chunks_max;  /* up to this many different chunks per run */
  Extern int hash_prime;  /* larger than mem_chunks_max, but not enormous */
  Extern chunknode *mem_hash;  /* the simulated main memory */

208. The separately compiled procedures spec_read() and spec_write() have the same calling conventions as the general procedures mem_read() and mem_write().

⟨Subroutines 14⟩ +≡
  extern octa spec_read ARGS((octa addr));  /* for memory mapped I/O */
  extern void spec_write ARGS((octa addr, octa val));  /* likewise */
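To make the chunk arithmetic concrete, here is a minimal sketch of how a physical address might be split, assuming the same two-tetrabyte octa layout as the rest of the program. The helper names are hypothetical; the key and index formulas follow the arithmetic that the simulator's mem_read and mem_write routines apply to addr.h and addr.l.

```c
#include <stdint.h>

typedef struct { uint32_t h, l; } octa;   /* high and low tetrabytes */

/* Addresses of 2^48 or more go to the memory-mapped I/O routines. */
int is_io_address(octa a)
{
    return a.h >= (1u << 16);
}

/* 32-bit chunk address: top 16 bits of a.l combined with the 16 bits of a.h. */
uint32_t chunk_key(octa a)
{
    return (a.l & 0xffff0000u) + a.h;
}

/* Index of the octabyte inside its chunk: 16-bit byte offset divided by 8. */
uint32_t chunk_index(octa a)
{
    return (a.l & 0xffffu) >> 3;
}

/* Host memory consumed when n chunks of 2^16 bytes are in use. */
unsigned long bytes_for_chunks(unsigned long n)
{
    return n << 16;
}
```

For 1000 chunks, bytes_for_chunks gives 65,536,000 bytes, which is the "approximately 65MB" figure quoted in the text.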

209. If the program tries to read from a chunk that hasn't been allocated, the value zero is returned, optionally with a comment to the user.

Chunk address 0 is always allocated first. Then we can assume that a matching chunk tag implies a nonnull chunk pointer.

This routine sets last_h to the chunk found, so that we can rapidly read other words that we know must belong to the same chunk. For this purpose it is convenient to let mem_hash[hash_prime] be a chunk full of zeros, representing uninitialized memory.

⟨External prototypes 9⟩ +≡
  Extern octa mem_read ARGS((octa addr));

210. ⟨External routines 10⟩ +≡
  octa mem_read(addr)
    octa addr;
  {
    register tetra off, key;
    register int h;
    if (addr.h >= (1 << 16)) return spec_read(addr);
    off = (addr.l & 0xffff) >> 3;
    key = (addr.l & 0xffff0000) + addr.h;
    for (h = key % hash_prime; mem_hash[h].tag != key; h--) {
      if (mem_hash[h].chunk == NULL) {
        if (verbose & uninit_mem_bit)
          errprint2("uninitialized memory read at %08x%08x", addr.h, addr.l);
        h = hash_prime;
        break;  /* zero will be returned */
      }
      if (h == 0) h = hash_prime;
    }
    last_h = h;
    return mem_hash[h].chunk[off];
  }

211. ⟨External variables 4⟩ +≡
  Extern int last_h;  /* the hash index that was most recently correct */
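The probing scheme behind the memory lookup is ordinary open addressing that steps downward and wraps from slot 0 to slot prime - 1. The following is a minimal sketch under stated assumptions: node is a hypothetical stand-in for chunknode, and table_init, find_slot and claim_slot are illustrative names, not routines of the simulator. As in the text, chunk address 0 is allocated first and one extra slot holds an all-zero chunk.

```c
#include <stdint.h>
#include <stdlib.h>

#define CHUNK_WORDS (1 << 13)            /* octabytes per chunk */

typedef struct {
    uint32_t tag;                        /* 32-bit chunk address */
    uint64_t *chunk;                     /* NULL until allocated */
} node;

/* Slot 0 gets chunk address 0; slot prime is the all-zero sentinel. */
node *table_init(int prime)
{
    node *t = calloc((size_t)prime + 1, sizeof(node));
    if (!t) return NULL;
    t[0].chunk = calloc(CHUNK_WORDS, sizeof(uint64_t));
    t[prime].chunk = calloc(CHUNK_WORDS, sizeof(uint64_t));
    if (!t[0].chunk || !t[prime].chunk) {
        free(t[0].chunk); free(t[prime].chunk); free(t);
        return NULL;
    }
    return t;
}

/* Read-side probe: returns the slot of key, or prime if key is absent. */
int find_slot(const node *t, int prime, uint32_t key)
{
    int h;
    for (h = (int)(key % (uint32_t)prime); t[h].tag != key; h--) {
        if (t[h].chunk == NULL) return prime;   /* hit an empty slot */
        if (h == 0) h = prime;                  /* wrap; h-- gives prime-1 */
    }
    return h;
}

/* Write-side probe: allocates a chunk at the first empty slot if needed.
   The caller must keep the number of keys below prime, or this loops. */
int claim_slot(node *t, int prime, uint32_t key)
{
    int h;
    for (h = (int)(key % (uint32_t)prime); t[h].tag != key; h--) {
        if (t[h].chunk == NULL) {
            t[h].chunk = calloc(CHUNK_WORDS, sizeof(uint64_t));
            if (t[h].chunk == NULL) return -1;
            t[h].tag = key;
            break;
        }
        if (h == 0) h = prime;
    }
    return h;
}
```

Returning the sentinel slot for a missing key lets a reader index the chunk unconditionally and obtain zero, which is the reason the text reserves one slot beyond the hashed range.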

212. ⟨External prototypes 9⟩ +≡
  Extern void mem_write ARGS((octa addr, octa val));

213. ⟨External routines 10⟩ +≡
  void mem_write(addr, val)
    octa addr, val;
  {
    register tetra off, key;
    register int h;
    if (addr.h >= (1 << 16)) {
      spec_write(addr, val);
      return;
    }
    off = (addr.l & 0xffff) >> 3;
    key = (addr.l & 0xffff0000) + addr.h;
    for (h = key % hash_prime; mem_hash[h].tag != key; h--) {
      if (mem_hash[h].chunk == NULL) {
        if (++mem_chunks > mem_chunks_max)
          panic(errprint1("More than %d memory chunks are needed", mem_chunks_max));
        mem_hash[h].chunk = (octa *) calloc(1 << 13, sizeof(octa));
        if (mem_hash[h].chunk == NULL)
          panic(errprint1("I can't allocate memory chunk number %d", mem_chunks));
        mem_hash[h].tag = key;
        break;
      }
      if (h == 0) h = hash_prime;
    }
    last_h = h;
    mem_hash[h].chunk[off] = val;
  }

214. The memory is characterized by several parameters, depending on the characteristics of the memory bus being simulated. Let bus_words be the number of octabytes read or written simultaneously (usually bus_words is 1 or 2; it must be a power of 2). The number of clock cycles needed to read or write c * bus_words octabytes that all belong to the same cache block is assumed to be mem_addr_time + c * mem_read_time or mem_addr_time + c * mem_write_time, respectively.

⟨External variables 4⟩ +≡
  Extern int mem_addr_time;  /* cycles to transmit an address on memory bus */
  Extern int bus_words;  /* width of memory bus, in octabytes */
  Extern int mem_read_time;  /* cycles to read from main memory */
  Extern int mem_write_time;  /* cycles to write to main memory */
  Extern lockvar mem_lock;  /* is nonnull when the bus is busy */

215. One of the principal ways to write memory is to invoke a flush_to_mem coroutine, which is the Scache->flusher if there is an S-cache, or the Dcache->flusher if there is a D-cache but no S-cache.

When such a coroutine is started, its data->ptr_a will be Scache or Dcache. The data to be written will just have been copied to the cache's outbuf.
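The bus timing rule is easy to evaluate by hand, but a small helper makes the units explicit. This is a minimal sketch, assuming a hypothetical function bus_cycles; rounding a partial bus transfer up to a whole one is an assumption that mirrors the (count + bus_words - 1) / bus_words computation used elsewhere in the program, since the text states the formula only for exact multiples of bus_words.

```c
/* Cycles to move `octabytes` octabytes of one cache block over the bus.
   addr_time plays the role of mem_addr_time; xfer_time is either
   mem_read_time or mem_write_time, depending on the direction. */
int bus_cycles(int octabytes, int bus_words, int addr_time, int xfer_time)
{
    int c = (octabytes + bus_words - 1) / bus_words;   /* transfers needed */
    return addr_time + c * xfer_time;
}
```

For instance, with an address time of 4 cycles, a transfer time of 10 cycles and a two-octabyte bus, an eight-octabyte block costs 4 + 4 * 10 = 44 cycles.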

⟨Cases for control of special coroutines 126⟩ +≡
  case flush_to_mem: {
    register cache *c = (cache *) data->ptr_a;
    switch (data->state) {
    case 0: if (mem_lock) wait(1);
      data->state = 1;
    case 1: set_lock(self, mem_lock);
      data->state = 2;
      ⟨Write the dirty data of c->outbuf and wait for the bus 216⟩;
    case 2: goto terminate;  /* this frees mem_lock and c->outbuf */
    }
  }

216. ⟨Write the dirty data of c->outbuf and wait for the bus 216⟩ ≡
  {
    register int off, last_off, count, first, ii;
    register int del = c->gg >> 3;  /* octabytes per granule */
    octa addr;
    addr = c->outbuf.tag;
    off = (addr.l & 0xffff) >> 3;
    for (i = j = 0, first = 1, count = 0; j < c->bb >> c->g; j++) {
      ii = i + del;
      if (!c->outbuf.dirty[j]) i = ii, off += del, addr.l += del << 3;
      else while (i < ii) {
        if (first) {
          count++; last_off = off; first = 0;
          mem_write(addr, c->outbuf.data[i]);
        } else {
          if ((off ^ last_off) & (-bus_words)) count++;
          last_off = off;
          mem_hash[last_h].chunk[off] = c->outbuf.data[i];
        }
        i++; off++; addr.l += 8;
      }
    }
    wait(mem_addr_time + count * mem_write_time);
  }
This code is used in section 215.
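The way the flusher charges for bus writes can be isolated: a new bus transfer is counted whenever the octabyte offset leaves the bus-width group of the previous dirty octabyte. This is a minimal sketch, assuming a hypothetical helper count_bus_writes that receives the offsets of the dirty octabytes in increasing order.

```c
/* Count bus transfers needed to write the octabytes at the given
   offsets; bus_words must be a power of 2. Two offsets share one
   transfer exactly when they agree in all bits above log2(bus_words). */
int count_bus_writes(const int *off, int n, int bus_words)
{
    int k, count = 0, last = 0;
    for (k = 0; k < n; k++) {
        if (k == 0 || ((off[k] ^ last) & -bus_words)) count++;
        last = off[k];
    }
    return count;
}
```

With a two-octabyte bus, dirty offsets 0, 1, 2, 3 need two transfers, whereas offsets 1 and 2 also need two because they straddle a group boundary.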

217. Cache transfers. We have seen that the Dcache->flusher sends data directly to the main memory if there is no S-cache. But if both D-cache and S-cache exist, the Dcache->flusher is a more complicated coroutine of type flush_to_S. In this case we need to deal with the fact that the S-cache blocks might be larger than the D-cache blocks; furthermore, the S-cache might have a write-around and/or write-through policy, etc. But one simplifying fact does help us: We know that the flusher coroutine will not be aborted until it has run to completion.

Some machines, such as the Alpha 21164, have an additional cache between the S-cache and memory, called the B-cache (the "backup cache"). A B-cache could be simulated by extending the logic used here; but such extensions of the present program are left to the interested reader.

⟨Cases for control of special coroutines 126⟩ +≡
  case flush_to_S: {
    register cache *c = (cache *) data->ptr_a;
    register int block_diff = Scache->bb - c->bb;
    p = (cacheblock *) data->ptr_b;
    switch (data->state) {
    case 0: if (Scache->lock) wait(1);
      data->state = 1;
    case 1: set_lock(self, Scache->lock);
      data->ptr_b = (void *) cache_search(Scache, c->outbuf.tag);
      if (data->ptr_b) data->state = 4;
      else if (Scache->mode & WRITE_ALLOC) data->state = (block_diff ? 2 : 3);
      else data->state = 6;
      wait(Scache->access_time);
    case 2: ⟨Fill Scache->inbuf with clean memory data 219⟩;
    case 3: ⟨Allocate a slot p in the S-cache 218⟩;
      if (block_diff) ⟨Copy Scache->inbuf to slot p 220⟩;
    case 4: copy_block(c, &(c->outbuf), Scache, p);
      hit_set = cache_addr(Scache, c->outbuf.tag);
      use_and_fix(Scache, p);  /* p not moved */
      data->state = 5;
      wait(Scache->copy_in_time);
    case 5: if ((Scache->mode & WRITE_BACK) == 0) {  /* write-through */
        if (Scache->flusher.next) wait(1);
        flush_cache(Scache, p, true);
      }
      goto terminate;
    case 6: ⟨Handle write-around when flushing to the S-cache 221⟩;
    }
  }

218. ⟨Allocate a slot p in the S-cache 218⟩ ≡
  if (Scache->filler.next) wait(1);  /* perhaps an unnecessary precaution? */
  p = alloc_slot(Scache, c->outbuf.tag);
  if (!p) wait(1);
  data->ptr_b = (void *) p;
  p->tag = c->outbuf.tag;
  p->tag.l = c->outbuf.tag.l & (-Scache->bb);
This code is used in section 217.

219. We only need to read block_diff bytes, but it's easier to read them all and to charge only for reading the ones we needed.

⟨Fill Scache->inbuf with clean memory data 219⟩ ≡
  {
    register int off = (c->outbuf.tag.l & 0xffff) >> 3;
    register int count = block_diff >> 3;
    register int delay;
    if (mem_lock) wait(1);
    for (j = 0; j < Scache->bb >> 3; j++)
      if (j == 0) Scache->inbuf.data[j] = mem_read(c->outbuf.tag);
      else Scache->inbuf.data[j] = mem_hash[last_h].chunk[j + off];
    set_lock(&mem_locker, mem_lock);
    delay = mem_addr_time + (int) ((count + bus_words - 1) / (bus_words)) * mem_read_time;
    startup(&mem_locker, delay);
    data->state = 3;
    wait(delay);
  }
This code is used in section 217.

220. ⟨Copy Scache->inbuf to slot p 220⟩ ≡
  {
    register octa *d = p->data;
    p->data = Scache->inbuf.data;
    Scache->inbuf.data = d;
  }
This code is used in section 217.

221. Here we assume that the granularity is 8.

⟨Handle write-around when flushing to the S-cache 221⟩ ≡
  if (Scache->flusher.next) wait(1);
  Scache->outbuf.tag.h = c->outbuf.tag.h;
  Scache->outbuf.tag.l = c->outbuf.tag.l & (-Scache->bb);
  for (j = 0; j < Scache->bb >> Scache->g; j++) Scache->outbuf.dirty[j] = false;
  copy_block(c, &(c->outbuf), Scache, &(Scache->outbuf));
  startup(&Scache->flusher, Scache->copy_out_time);
  goto terminate;
This code is used in section 217.

222. The S-cache gets new data from memory by invoking a fill_from_mem coroutine; the I-cache or D-cache may also invoke a fill_from_mem coroutine, if there is no S-cache. When such a coroutine is invoked, it holds mem_lock, and its caller has gone to sleep. A physical memory address is given in data->z.o, and data->ptr_a specifies either Icache or Dcache. Furthermore, data->ptr_b specifies a block within that cache, determined by the alloc_slot routine. The coroutine simulates reading the contents of the specified memory location, places the result in the x.o field of its caller's control block, and wakes up the caller. It proceeds to fill the cache's inbuf and, ultimately, the specified cache block, before waking the caller again.

Let c = data->ptr_a. The caller is then c->fill_lock, if this variable is nonnull. However, the caller might not wish to be awoken or to receive the data (for example, if it has been aborted). In such cases c->fill_lock will be NULL; the filling action continues without the wakeup calls. If c = Scache, the S-cache will be locked and the caller will not have been aborted.

⟨Cases for control of special coroutines 126⟩ +≡
  case fill_from_mem: {
    register cache *c = (cache *) data->ptr_a;
    register coroutine *cc = c->fill_lock;
    switch (data->state) {
    case 0: data->x.o = mem_read(data->z.o);
      if (cc) {
        cc->ctl->x.o = data->x.o;
        awaken(cc, mem_read_time);
      }
      data->state = 1;
      ⟨Read data into c->inbuf and wait for the bus 223⟩;
    case 1: release_lock(self, mem_lock);
      data->state = 2;
    case 2: if (c != Scache) {
        if (c->lock) wait(1);
        set_lock(self, c->lock);
      }
      if (cc) awaken(cc, c->copy_in_time);  /* the second wakeup call */
      load_cache(c, (cacheblock *) data->ptr_b);
      data->state = 3;
      wait(c->copy_in_time);
    case 3: goto terminate;
    }
  }

223. If c's cache size is no larger than the memory bus, we wait an extra cycle, so that there will be two wakeup calls.

⟨Read data into c->inbuf and wait for the bus 223⟩ ≡
  {
    register int count, off;
    c->inbuf.tag = data->z.o;
    c->inbuf.tag.l &= -c->bb;
    count = c->bb >> 3, off = (c->inbuf.tag.l & 0xffff) >> 3;
    for (i = 0; i < count; i++, off++) c->inbuf.data[i] = mem_hash[last_h].chunk[off];
    if (count <= bus_words) wait(1 + mem_read_time);
    else wait((int) (count / bus_words) * mem_read_time);
  }
This code is used in section 222.
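Two small computations recur when a block is filled from memory: aligning the requested address down to a block boundary, and deciding how long to occupy the bus. This is a minimal sketch with hypothetical helper names (block_base, fill_wait_cycles); the block size bb is assumed to be a power of 2 measured in bytes, as in the program.

```c
#include <stdint.h>

/* Align a byte address down to the start of its bb-byte block.
   For a power of 2, -bb has exactly the high bits set, so it acts
   as a mask (the same idiom as `tag.l &= -c->bb`). */
uint32_t block_base(uint32_t addr, int bb)
{
    return addr & (uint32_t)-bb;
}

/* Cycles spent reading one block of bb bytes: a block that fits in a
   single bus transfer still waits one extra cycle, so that the two
   wakeup calls to the caller remain distinct. */
int fill_wait_cycles(int bb, int bus_words, int mem_read_time)
{
    int count = bb >> 3;                 /* octabytes in the block */
    if (count <= bus_words) return 1 + mem_read_time;
    return (count / bus_words) * mem_read_time;
}
```

A 64-byte block on a one-octabyte bus with a 10-cycle read time therefore waits 80 cycles, while an 8-byte block on a two-octabyte bus waits 11.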

224. The fill_from_S coroutine has the same conventions as fill_from_mem, except that the data comes directly from the S-cache if it is present there. This is the filler coroutine for the I-cache and D-cache if an S-cache is present.

⟨Cases for control of special coroutines 126⟩ +≡
  case fill_from_S: {
    register cache *c = (cache *) data->ptr_a;
    register coroutine *cc = c->fill_lock;
    p = (cacheblock *) data->ptr_c;
    switch (data->state) {
    case 0: p = cache_search(Scache, data->z.o);
      if (p) goto S_non_miss;
      data->state = 1;
    case 1: ⟨Start the S-cache filler 225⟩;
      data->state = 2;
      sleep;
    case 2: if (cc) {
        cc->ctl->x.o = data->x.o;  /* this data has been supplied by Scache->filler */
        awaken(cc, Scache->access_time);  /* we propagate it back */
      }
      data->state = 3;
      sleep;  /* when we awake, the S-cache will have our data */
    S_non_miss: if (cc) {
        cc->ctl->x.o = p->data[(data->z.o.l & (Scache->bb - 1)) >> 3];
        awaken(cc, Scache->access_time);
      }
    case 3: ⟨Copy data from p into c->inbuf 226⟩;
      data->state = 4;
      wait(Scache->access_time);
    case 4: if (c->lock) wait(1);
      set_lock(self, c->lock);
      Scache->lock = NULL;  /* we had been holding that lock */
      load_cache(c, (cacheblock *) data->ptr_b);
      data->state = 5;
      wait(c->copy_in_time);
    case 5: if (cc) awaken(cc, 1);  /* second wakeup call */
      goto terminate;
    }
  }

225. We are already holding the Scache->lock, but we're about to take on the Scache->fill_lock too (with the understanding that one is "stronger" than the other). For a short time the Scache->lock will point to us but we will point to Scache->fill_lock; this will not cause difficulty, because the present coroutine is not abortable.

⟨Start the S-cache filler 225⟩ ≡
  if (Scache->filler.next || mem_lock) wait(1);
  p = alloc_slot(Scache, data->z.o);
  if (!p) wait(1);
  data->ptr_c = Scache->filler_ctl.ptr_b = (void *) p;
  Scache->filler_ctl.z.o = data->z.o;
  set_lock(&Scache->filler, mem_lock);
  set_lock(self, Scache->fill_lock);
  startup(&Scache->filler, mem_addr_time);
This code is used in section 224.

226. The S-cache blocks might be wider than the blocks of the I-cache or D-cache, so the copying in this step isn't quite trivial.

⟨Copy data from p into c->inbuf 226⟩ ≡
  {
    register int off;
    c->inbuf.tag = data->z.o;
    c->inbuf.tag.l &= -c->bb;
    for (j = 0, off = (c->inbuf.tag.l & (Scache->bb - 1)) >> 3; j < c->bb >> 3; j++, off++)
      c->inbuf.data[j] = p->data[off];
    release_lock(self, Scache->fill_lock);
    set_lock(self, Scache->lock);
  }
This code is used in section 224.
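The non-trivial part of copying from a wide S-cache block into a narrower I-cache or D-cache buffer is locating the sub-block. This is a minimal sketch, assuming a hypothetical helper copy_subblock that works on bare arrays of octabytes; both block sizes are byte counts that are powers of 2, with s_bb >= c_bb.

```c
#include <stdint.h>

/* Copy the c_bb-byte sub-block containing `addr` out of an s_bb-byte
   source block. The address is first aligned to the smaller block,
   then reduced modulo the larger block to get an octabyte offset. */
void copy_subblock(const uint64_t *s_data, int s_bb,
                   uint64_t *inbuf, int c_bb, uint32_t addr)
{
    uint32_t base = addr & (uint32_t)-c_bb;              /* align to small block */
    int off = (int)((base & (uint32_t)(s_bb - 1)) >> 3); /* octabyte offset */
    int j;
    for (j = 0; j < c_bb >> 3; j++, off++)
        inbuf[j] = s_data[off];
}
```

For a 64-byte source block and a 16-byte destination, address 0x1238 aligns to 0x1230, which lies 48 bytes into the source block, so octabytes 6 and 7 are copied.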

227. The instruction PRELD X,$Y,$Z generates ⌊X/2^b⌋ commands if there are 2^b bytes per block in the D-cache. These commands will try to preload blocks $Y+$Z, $Y+$Z+2^b, ..., into the cache if it is not too busy.

Similar considerations apply to the instructions PREGO X,$Y,$Z and PREST X,$Y,$Z.

⟨Special cases of instruction dispatch 117⟩ +≡
    case preld: case prest: if (!Dcache) goto noop_inst;
      if (cool->xx >= Dcache->bb) cool->interim = true;
      cool->ptr_a = (void *) mem.up; break;
    case prego: if (!Icache) goto noop_inst;
      if (cool->xx >= Icache->bb) cool->interim = true;
      cool->ptr_a = (void *) mem.up; break;

228. If the block size is 64, a command like PREST 200,$Y,$Z is actually issued as four commands PREST 200,$Y,$Z, PREST 191,$Y,$Z, PREST 127,$Y,$Z, PREST 63,$Y,$Z. An interruption will then be able to resume properly. In the pipeline, the instruction PREST 200,$Y,$Z is considered to affect bytes $Y+$Z+192 through $Y+$Z+200, or fewer bytes if $Y+$Z is not a multiple of 64. (Remember that these instructions are only hints; we act on them only if it is reasonably convenient to do so.)

⟨Get ready for the next step of PRELD or PREST 228⟩ ≡
    head->inst = (head->inst & ~((Dcache->bb - 1) << 16)) - 0x10000;

This code is used in section 81.

229. ⟨Get ready for the next step of PREGO 229⟩ ≡
    head->inst = (head->inst & ~((Icache->bb - 1) << 16)) - 0x10000;

This code is used in section 81.

230. Another coroutine, called cleanup, is occasionally called into action to remove dirty data from the D-cache and S-cache. If it is invoked by starting in state 0, with its i field set to sync, it will clean everything. It can also be invoked in state 4, with its i field set to syncd and with a physical address in its z.o field; then it simply makes sure that no D-cache or S-cache blocks associated with that address are dirty.

Field x.o.h should be set to zero if items are expected to remain in the cache after being cleaned; otherwise field x.o.h should be set to sign_bit.

The coroutine that invokes cleanup should hold clean_lock. If that coroutine dies, because of an interruption, the cleanup coroutine will terminate prematurely.

We assume that the D-cache and S-cache have some sort of way to identify their first dirty block, if any, in access_time cycles.

⟨Global variables 20⟩ +≡
    coroutine clean_co;
    control clean_ctl;
    lockvar clean_lock;

231. ⟨Initialize everything 22⟩ +≡
    clean_co.ctl = &clean_ctl;
    clean_co.name = "Clean";
    clean_co.stage = cleanup;
    clean_ctl.go.o.l = 4;

232. ⟨Cases for control of special coroutines 126⟩ +≡
    case cleanup: p = (cacheblock *) data->ptr_b;
      switch (data->state) {
        ⟨Cases 0 through 4, for the D-cache 233⟩;
        ⟨Cases 5 through 9, for the S-cache 234⟩;
      case 10: goto terminate;
      }
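The stepping rule of sections 228 and 229 can be checked on a small standalone example. The sketch below is not part of MMIX-PIPE: the helper names next_preload_step, x_field, and block_offset are hypothetical, and the instruction word is just a 32-bit value whose X byte sits in bits 16 to 23. It applies the same mask-and-subtract expression as the simulator and also shows the offset xx & -bb of the block that each step addresses.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: advance a preload instruction word to its next step.
   bb is the cache block size in bytes and is assumed to be a power of two.
   The expression is the one used in sections 228 and 229. */
static uint32_t next_preload_step(uint32_t inst, uint32_t bb)
{
    assert(bb != 0 && (bb & (bb - 1)) == 0);
    return (inst & ~((bb - 1) << 16)) - 0x10000;
}

/* Hypothetical helper: extract the X field (bits 16..23). */
static unsigned x_field(uint32_t inst)
{
    return (inst >> 16) & 0xff;
}

/* Offset, relative to $Y+$Z, of the block addressed when the X field is xx:
   this is xx rounded down to a multiple of the block size. */
static uint32_t block_offset(unsigned xx, uint32_t bb)
{
    return xx & (0u - bb);
}
```

With bb = 64 and an initial X field of 200, repeated calls to next_preload_step give X fields 191, 127, and 63, and block_offset gives 192, 128, 64, and 0 for the four steps, matching the PREST 200 example in section 228.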

233. ⟨Cases 0 through 4, for the D-cache 233⟩ ≡
    case 0: if (Dcache->lock || (j = get_reader(Dcache) < 0)) wait(1);
      startup(&Dcache->reader[j], Dcache->access_time);
      set_lock(self, Dcache->lock);
      i = j = 0;
    Dclean_loop: p = (i < Dcache->cc ? &(Dcache->set[i][j]) : &(Dcache->victim[j]));
      if (p->tag.h & sign_bit) goto Dclean_inc;
      if (!is_dirty(Dcache, p)) {
        p->tag.h |= data->x.o.h; goto Dclean_inc;
      }
      data->y.o.h = i, data->y.o.l = j;
    Dclean: data->state = 1; data->ptr_b = (void *) p; wait(Dcache->access_time);
    case 1: if (Dcache->usher.next) wait(1);
      flush_cache(Dcache, p, data->x.o.h == 0);
      p->tag.h |= data->x.o.h;
      release_lock(self, Dcache->lock);
      data->state = 2; wait(Dcache->copy_out_time);
    case 2: if (!clean_lock) goto done;    /* premature termination */
      if (Dcache->usher.next) wait(1);
      if (data->i != sync) goto Sprep;
      data->state = 3;
    case 3: if (Dcache->lock || (j = get_reader(Dcache) < 0)) wait(1);
      startup(&Dcache->reader[j], Dcache->access_time);
      set_lock(self, Dcache->lock);
      i = data->y.o.h; j = data->y.o.l;
    Dclean_inc: j++;
      if (i < Dcache->cc && j == Dcache->aa) j = 0, i++;
      if (i == Dcache->cc && j == Dcache->vv) {
        data->state = 5; wait(Dcache->access_time);
      }
      goto Dclean_loop;
    case 4: if (Dcache->lock || (j = get_reader(Dcache) < 0)) wait(1);
      startup(&Dcache->reader[j], Dcache->access_time);
      set_lock(self, Dcache->lock);
      p = cache_search(Dcache, data->z.o);
      if (p) {
        demote_and_fix(Dcache, p);
        if (is_dirty(Dcache, p)) goto Dclean;
      }
      data->state = 9; wait(Dcache->access_time);

This code is used in section 232.

234. ⟨Cases 5 through 9, for the S-cache 234⟩ ≡
    case 5: if (self->lockloc) *(self->lockloc) = NULL, self->lockloc = NULL;
      if (!Scache) goto done;
      if (Scache->lock) wait(1);
      set_lock(self, Scache->lock);
      i = j = 0;
    Sclean_loop: p = (i < Scache->cc ? &(Scache->set[i][j]) : &(Scache->victim[j]));
      if (p->tag.h & sign_bit) goto Sclean_inc;
      if (!is_dirty(Scache, p)) {
        p->tag.h |= data->x.o.h; goto Sclean_inc;
      }
      data->y.o.h = i, data->y.o.l = j;
    Sclean: data->state = 6; data->ptr_b = (void *) p; wait(Scache->access_time);
    case 6: if (Scache->usher.next) wait(1);
      flush_cache(Scache, p, data->x.o.h == 0);
      p->tag.h |= data->x.o.h;
      release_lock(self, Scache->lock);
      data->state = 7; wait(Scache->copy_out_time);
    case 7: if (!clean_lock) goto done;    /* premature termination */
      if (Scache->usher.next) wait(1);
      if (data->i != sync) goto done;
      data->state = 8;
    case 8: if (Scache->lock) wait(1);
      set_lock(self, Scache->lock);
      i = data->y.o.h; j = data->y.o.l;
    Sclean_inc: j++;
      if (i < Scache->cc && j == Scache->aa) j = 0, i++;
      if (i == Scache->cc && j == Scache->vv) {
        data->state = 10; wait(Scache->access_time);
      }
      goto Sclean_loop;
    Sprep: data->state = 9;
    case 9: if (self->lockloc) release_lock(self, Dcache->lock);
      if (!Scache) goto done;
      if (Scache->lock) wait(1);
      set_lock(self, Scache->lock);
      p = cache_search(Scache, data->z.o);
      if (p) {
        demote_and_fix(Scache, p);
        if (is_dirty(Scache, p)) goto Sclean;
      }
      data->state = 10; wait(Scache->access_time);

This code is used in section 232.

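The order in which the cleanup coroutine of sections 233 and 234 visits cache blocks (all aa blocks of each of the cc sets, then the vv victim blocks) can be isolated from the locking and timing logic. The following sketch assumes only the three size parameters; the names next_block and count_blocks are hypothetical and do not appear in MMIX-PIPE. The index arithmetic mirrors the code at the labels Dclean_inc and Sclean_inc.

```c
#include <assert.h>

/* Advance (i, j) to the next block in cleanup order.
   Blocks (i, j) with i < cc are set[i][j]; blocks (cc, j) are victim[j].
   Returns 0 when every block has been visited, 1 otherwise. */
static int next_block(int *i, int *j, int cc, int aa, int vv)
{
    (*j)++;
    if (*i < cc && *j == aa) { *j = 0; (*i)++; }   /* move on to the next set */
    if (*i == cc && *j == vv) return 0;            /* past the last victim block */
    return 1;
}

/* Count how many blocks a full cleaning pass examines. */
static int count_blocks(int cc, int aa, int vv)
{
    int i = 0, j = 0, n = 1;      /* block (0, 0) is examined first */
    assert(cc > 0 && aa > 0 && vv >= 0);
    while (next_block(&i, &j, cc, aa, vv)) n++;
    return n;
}
```

A pass over a cache with cc sets, associativity aa, and vv victim blocks therefore examines cc*aa + vv blocks; the simulator additionally saves (i, j) in data->y.o so that the scan can resume after a dirty block has been flushed.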

235. Virtual address translation. Special arrays of coroutines and control blocks come into play when we need to implement MMIX's rather complicated page table mechanism for virtual address translation. In effect, we have up to ten control blocks outside of the reorder buffer that are capable of executing instructions just as if they were part of that buffer. The "opcodes" of these non-abortable instructions are special internal operations called ldptp and ldpte, for loading page table pointers and page table entries.

Suppose, for example, that we need to translate a virtual address for the DT-cache in which the virtual page address (a4 a3 a2 a1 a0)_1024 of segment i has a4 = a3 = 0 and a2 ≠ 0. Then the rules say that we should first find a page table pointer p2 in physical location 2^13 (r + b_i + 2) + 8a2, then another page table pointer p1 in location p2 + 8a1, and finally the page table entry p0 in location p1 + 8a0. The simulator achieves this by setting up three coroutines c0, c1, c2 whose control blocks correspond to the pseudo-instructions

    LDPTP x,[2^63 + 2^13 (r + b_i + 2)],8a2
    LDPTP x,x,8a1
    LDPTE x,x,8a0

where x is a hidden internal register and the other quantities are immediate values. Slight changes to the normal functionality of LDO give us the actions needed to implement LDPTP and LDPTE. Coroutine c_j corresponds to the instruction that involves a_j and computes p_j; when c0 has computed its value p0, we know how to translate the original virtual address.

The LDPTP and LDPTE commands return zero if their y operand is zero or if the page table does not properly match rV.

    #define LDPTP PREGO    /* internally this won't cause confusion */
    #define LDPTE GO

⟨Global variables 20⟩ +≡
    control IPTctl[5], DPTctl[5];      /* control blocks for I and D page translation */
    coroutine IPTco[10], DPTco[10];    /* each coroutine is a two-stage pipeline */
    char *IPTname[5] = {"IPT0", "IPT1", "IPT2", "IPT3", "IPT4"};
    char *DPTname[5] = {"DPT0", "DPT1", "DPT2", "DPT3", "DPT4"};

236. ⟨Initialize everything 22⟩ +≡
    for (j = 0; j < 5; j++) {
      DPTco[2 * j].ctl = &DPTctl[j]; IPTco[2 * j].ctl = &IPTctl[j];
      if (j > 0) DPTctl[j].op = IPTctl[j].op = LDPTP, DPTctl[j].i = IPTctl[j].i = ldptp;
      else DPTctl[0].op = IPTctl[0].op = LDPTE, DPTctl[0].i = IPTctl[0].i = ldpte;
      IPTctl[j].loc = DPTctl[j].loc = neg_one;
      IPTctl[j].go.o = DPTctl[j].go.o = incr(neg_one, 4);
      IPTctl[j].ptr_a = DPTctl[j].ptr_a = (void *) &mem;
      IPTctl[j].ren_x = DPTctl[j].ren_x = true;
      IPTctl[j].x.addr.h = DPTctl[j].x.addr.h = -1;
      IPTco[2 * j].stage = DPTco[2 * j].stage = 1;
      IPTco[2 * j + 1].stage = DPTco[2 * j + 1].stage = 2;
      IPTco[2 * j].name = IPTco[2 * j + 1].name = IPTname[j];
      DPTco[2 * j].name = DPTco[2 * j + 1].name = DPTname[j];
    }
    ITcache->filler_ctl.ptr_c = (void *) &IPTco[0];
    DTcache->filler_ctl.ptr_c = (void *) &DPTco[0];


237. Page table calculations are invoked by a coroutine of type fill_from_virt, which is used to fill the IT-cache or DT-cache. The calling conventions of fill_from_virt are analogous to those of fill_from_mem or fill_from_S: A virtual address is supplied in data->y.o, and data->ptr_a points to a cache (ITcache or DTcache), while data->ptr_b is a block in that cache. We wake up the caller, who holds the cache's fill_lock, as soon as the translation of the given address has been calculated, unless the caller has been aborted. (No second wakeup call is necessary.)

⟨Cases for control of special coroutines 126⟩ +≡
    case fill_from_virt: { register cache *c = (cache *) data->ptr_a;
      register coroutine *cc = c->fill_lock;
      register coroutine *co = (coroutine *) data->ptr_c;    /* &IPTco[0] or &DPTco[0] */
      octa aaaaa;
      switch (data->state) {
      case 0: ⟨Start up auxiliary coroutines to compute the page table entry 243⟩;
        data->state = 1;
      case 1: if (data->b.p) {
          if (data->b.p->known) data->b.o = data->b.p->o, data->b.p = NULL;
          else wait(1);
        }
        ⟨Compute the new entry for c->inbuf and give the caller a sneak preview 245⟩;
        data->state = 2;
      case 2: if (c->lock) wait(1);
        set_lock(self, c->lock);
        load_cache(c, (cacheblock *) data->ptr_b);
        data->state = 3; wait(c->copy_in_time);
      case 3: data->b.o = zero_octa; goto terminate;
      }
    }

238. The current contents of rV, the special virtual translation register, are kept unpacked in several global variables page_r, page_s, etc., for convenience. Whenever rV changes, we recompute all these variables.

⟨Global variables 20⟩ +≡
    int page_n;              /* the 10-bit n field of rV, times 8 */
    int page_r;              /* the 27-bit r field of rV */
    int page_s;              /* the 8-bit s field of rV */
    int page_b[5];           /* the 4-bit b fields of rV; page_b[0] = 0 */
    octa page_mask;          /* the least significant s bits */
    bool page_bad = true;    /* does rV violate the rules? */

239. ⟨Update the page variables 239⟩ ≡
    { octa rv;
      rv = data->z.o;
      page_bad = (rv.l & 7 ? true : false);
      page_n = rv.l & 0x1ff8;
      rv = shift_right(rv, 13, 1);
      page_r = rv.l & 0x7ffffff;
      rv = shift_right(rv, 27, 1);
      page_s = rv.l & 0xff;
      if (page_s < 13 || page_s > 48) page_bad = true;
      else if (page_s < 32) page_mask.h = 0, page_mask.l = (1 << page_s) - 1;
      else page_mask.h = (1 << (page_s - 32)) - 1, page_mask.l = 0xffffffff;
      page_b[4] = (rv.l >> 8) & 0xf;
      page_b[3] = (rv.l >> 12) & 0xf;
      page_b[2] = (rv.l >> 16) & 0xf;
      page_b[1] = (rv.l >> 20) & 0xf;
    }

This code is used in section 329.

240. Here's how we compute a tag of the IT-cache or DT-cache from a virtual address, and how we compute a physical address from a translation found in the cache.

    #define trans_key(addr) incr(oandn(addr, page_mask), page_n)

⟨Internal prototypes 13⟩ +≡
    static octa phys_addr ARGS((octa, octa));

241. ⟨Subroutines 14⟩ +≡
    static octa phys_addr(virt, trans)
      octa virt, trans;
    { octa t;
      t = trans; t.l &= -8;    /* zero out the protection bits */
      return oplus(t, oand(virt, page_mask));
    }

242. Cheap (and slow) versions of MMIX leave the page table calculations to software. If the global variable no_hardware_PT is set true, fill_from_virt begins its actions in state 1, not state 0. (See the RESUME_TRANS operation.)

⟨External variables 4⟩ +≡
    Extern bool no_hardware_PT;

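The unpacking of rV in section 239 and the two address computations of sections 240 and 241 are easier to follow with a single 64-bit integer in place of the simulator's octa pairs. The sketch below is an illustration under that assumption, not the simulator's code: the struct page_vars and the functions unpack_rv, trans_key64, and phys_addr64 are hypothetical names, and setting the mask to zero for a bad rV is a choice made here (the simulator simply leaves page_mask alone in that case).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the unpacked fields of rV. */
typedef struct {
    unsigned b[5];    /* b[1..4] are the 4-bit b fields; b[0] = 0 */
    unsigned s;       /* the 8-bit s field */
    uint32_t r;       /* the 27-bit r field */
    unsigned n8;      /* the 10-bit n field, times 8 */
    uint64_t mask;    /* the least significant s bits */
    bool bad;         /* does rV violate the rules? */
} page_vars;

static page_vars unpack_rv(uint64_t rv)
{
    page_vars v;
    v.bad = (rv & 7) != 0;
    v.n8 = (unsigned) (rv & 0x1ff8);
    v.r = (uint32_t) ((rv >> 13) & 0x7ffffff);
    v.s = (unsigned) ((rv >> 40) & 0xff);
    v.b[0] = 0;
    v.b[4] = (unsigned) ((rv >> 48) & 0xf);
    v.b[3] = (unsigned) ((rv >> 52) & 0xf);
    v.b[2] = (unsigned) ((rv >> 56) & 0xf);
    v.b[1] = (unsigned) ((rv >> 60) & 0xf);
    if (v.s < 13 || v.s > 48) { v.bad = true; v.mask = 0; }
    else v.mask = ((uint64_t) 1 << v.s) - 1;
    return v;
}

/* Tag used in a translation cache: page part of the address, plus n times 8. */
static uint64_t trans_key64(uint64_t addr, const page_vars *v)
{
    return (addr & ~v->mask) + v->n8;
}

/* Physical address: translation without protection bits, plus page offset. */
static uint64_t phys_addr64(uint64_t virt, uint64_t trans, const page_vars *v)
{
    return (trans & ~(uint64_t) 7) + (virt & v->mask);
}
```

For instance, an rV value with b fields 1, 2, 3, 4, s = 13, r = 5, and n = 7 yields a mask of 0x1fff, so the low 13 bits of a virtual address pass through to the physical address unchanged.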

243. Note: The operating system is supposed to ensure that changes to the page table entries do not appear in the pipeline when a translation cache is being updated. The internal LDPTP and LDPTE instructions use only the "hot state" of the memory system.

⟨Start up auxiliary coroutines to compute the page table entry 243⟩ ≡
    aaaaa = data->y.o;
    i = aaaaa.h >> 29;                         /* the segment number */
    aaaaa.h &= 0x1fffffff;                     /* the address within segment i */
    aaaaa = shift_right(aaaaa, page_s, 1);     /* the page address */
    for (j = 0; aaaaa.l != 0 || aaaaa.h != 0; j++) {
      co[2 * j].ctl->z.o.h = 0; co[2 * j].ctl->z.o.l = (aaaaa.l & 0x3ff) << 3;
      aaaaa = shift_right(aaaaa, 10, 1);
    }
    if (page_b[i + 1] < page_b[i] + j) ;       /* address too large */
        /* nothing needs to be done, since data->b.o is zero */
    else {
      if (j == 0) j = 1, co[0].ctl->z.o = zero_octa;
      ⟨Issue j pseudo-instructions to compute a page table entry 244⟩;
    }

This code is used in section 237.

244. The first stage of coroutine c_j is co[2 * j]. It will pass the jth control block to the second stage, co[2 * j + 1], which will load page table information from memory (or hopefully from the D-cache).

⟨Issue j pseudo-instructions to compute a page table entry 244⟩ ≡
    j--;
    aaaaa.l = page_r + page_b[i] + j;
    co[2 * j].ctl->y.p = NULL;
    co[2 * j].ctl->y.o = shift_left(aaaaa, 13);
    co[2 * j].ctl->y.o.h += sign_bit;
    for ( ; ; j--) {
      co[2 * j].ctl->x.o = zero_octa; co[2 * j].ctl->x.known = false;
      co[2 * j].ctl->owner = &co[2 * j];
      startup(&co[2 * j], 1);
      if (j == 0) break;
      co[2 * (j - 1)].ctl->y.p = &co[2 * j].ctl->x;
    }
    data->b.p = &co[0].ctl->x;

This code is used in section 243.

245. At this point the translation of the given virtual address data->y.o is the octabyte data->b.o. Its least significant three bits are the protection code p = p_r p_w p_x; its page address field is scaled by 2^s. It is entirely zero, including the protection bits, if there was a page table failure.

⟨Compute the new entry for c->inbuf and give the caller a sneak preview 245⟩ ≡
    c->inbuf.tag = trans_key(data->y.o);
    c->inbuf.data[0] = data->b.o;
    if (cc) {
      cc->ctl->z.o = data->b.o;
      awaken(cc, 1);
    }

This code is used in section 237.
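The decomposition of a virtual address in section 243 into a segment number and up to five base-1024 digits a0, a1, ... can be sketched with plain 64-bit arithmetic. This is an illustrative stand-in, not the simulator's code: split_page_address and address_too_large are hypothetical names, and the digits are stored as a_j themselves, whereas the simulator stores 8*a_j in each coroutine's z operand.

```c
#include <assert.h>
#include <stdint.h>

/* Split a virtual address into its segment number and page-address digits.
   s is the page-size exponent from rV (13 <= s <= 48).
   Returns j, the number of digits needed; a[0..j-1] are filled in. */
static int split_page_address(uint64_t virt, unsigned s, unsigned *seg, uint32_t a[5])
{
    uint64_t page;
    int j;
    assert(s >= 13 && s <= 48);
    *seg = (unsigned) (virt >> 61);                 /* the segment number */
    page = (virt & 0x1fffffffffffffffULL) >> s;     /* the page address */
    for (j = 0; page != 0; j++) {
        a[j] = (uint32_t) (page & 0x3ff);           /* next base-1024 digit */
        page >>= 10;
    }
    return j;
}

/* The size test of section 243: segment seg may use only the page table
   levels between b[seg] and b[seg + 1]. */
static int address_too_large(const unsigned b[5], unsigned seg, int j)
{
    assert(seg < 4);
    return b[seg + 1] < b[seg] + (unsigned) j;
}
```

When j comes out as zero (page address 0), the simulator still issues one LDPTE with a0 = 0, which is why section 243 sets j = 1 in that case.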

246. The write buffer. The dispatcher has arranged things so that speculative stores into memory are recorded in a doubly linked list leading upward from mem. When such instructions finally are committed, they enter the "write buffer," which holds octabytes that are ready to be written into designated physical memory addresses (or into the D-cache and/or S-cache). The "hot state" of the computation is reflected not only by the registers and caches but also by the instructions that are pending in the write buffer.

⟨Type definitions 11⟩ +≡
    typedef struct {
      octa o;                /* data to be stored */
      octa addr;             /* its physical address */
      tetra stamp;           /* when last committed (mod 2^32) */
      internal_opcode i;     /* is this write special? */
    } write_node;

247. We represent the buffer in the usual way as a circular list, with elements write_tail + 1, write_tail + 2, ..., write_head.

The data will sit at least holding_time cycles before it leaves the write buffer. This speeds things up when different fields of the same octabyte are being stored by different instructions.

⟨External variables 4⟩ +≡
    Extern write_node *wbuf_bot, *wbuf_top;        /* least and greatest write buffer nodes */
    Extern write_node *write_head, *write_tail;    /* front and rear of the write buffer */
    Extern lockvar wbuf_lock;     /* is the data in write_head being written? */
    Extern int holding_time;      /* minimum holding time */
    Extern lockvar speed_lock;    /* should we ignore holding_time? */

248. ⟨Global variables 20⟩ +≡
    coroutine write_co;    /* coroutine that empties the write buffer */
    control write_ctl;     /* its control block */

249. ⟨Initialize everything 22⟩ +≡
    write_co.ctl = &write_ctl;
    write_co.name = "Write";
    write_co.stage = write_from_wbuf;
    write_ctl.ptr_a = (void *) &mem;
    write_ctl.go.o.l = 4;
    startup(&write_co, 1);
    write_head = write_tail = wbuf_top;

250. ⟨Internal prototypes 13⟩ +≡
    static void print_write_buffer ARGS((void));

251. ⟨Subroutines 14⟩ +≡
    static void print_write_buffer()
    {
      printf("Write buffer");
      if (write_head == write_tail) printf(" (empty)\n");
      else { register write_node *p;
        printf(":\n");
        for (p = write_head; p != write_tail; p = (p == wbuf_bot ? wbuf_top : p - 1)) {
          printf("m["); print_octa(p->addr); printf("]="); print_octa(p->o);
          if (p->i == stunc) printf(" unc");
          else if (p->i == sync) printf(" sync");
          printf(" (age %d)\n", ticks.l - p->stamp);
        }
      }
    }

252. The entire present state of the pipeline computation can be visualized by printing first the write buffer, then the reorder buffer, then the fetch buffer. This shows the progression of results from oldest to youngest, from sizzling hot to ice cold.

⟨External prototypes 9⟩ +≡
    Extern void print_pipe ARGS((void));

253. ⟨External routines 10⟩ +≡
    void print_pipe()
    {
      print_write_buffer();
      print_reorder_buffer();
      print_fetch_buffer();
    }
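The circular-list convention of section 247, in which both write_head and write_tail move downward and wrap from the bottom node to the top node, can be tried out in isolation. The sketch below is a simplified stand-in rather than the simulator's code: the node type wnode, the capacity WBUF_SIZE, and the functions prev, wbuf_put, and wbuf_get are hypothetical, and each node holds plain integers instead of octabytes, stamps, and opcodes.

```c
#include <assert.h>
#include <stddef.h>

#define WBUF_SIZE 4    /* hypothetical: holds at most WBUF_SIZE - 1 items */

typedef struct { unsigned long addr, data; } wnode;

static wnode wbuf[WBUF_SIZE];
static wnode *const wbuf_bot = &wbuf[0];
static wnode *const wbuf_top = &wbuf[WBUF_SIZE - 1];
static wnode *write_head = &wbuf[WBUF_SIZE - 1];    /* front (oldest item) */
static wnode *write_tail = &wbuf[WBUF_SIZE - 1];    /* rear; empty when head == tail */

/* Step downward through the circular list, wrapping from bottom to top. */
static wnode *prev(wnode *p)
{
    return p == wbuf_bot ? wbuf_top : p - 1;
}

/* Append an item; returns 0 if the buffer is full. */
static int wbuf_put(unsigned long addr, unsigned long data)
{
    wnode *p = prev(write_tail);
    if (p == write_head) return 0;      /* the write buffer is full */
    write_tail->addr = addr;
    write_tail->data = data;
    write_tail = p;                     /* the new item is now write_tail + 1 */
    return 1;
}

/* Remove the oldest item; returns 0 if the buffer is empty. */
static int wbuf_get(wnode *out)
{
    if (write_head == write_tail) return 0;
    *out = *write_head;
    write_head = prev(write_head);
    return 1;
}
```

One node is always left unused so that the full and empty states can be told apart, which is why a buffer of WBUF_SIZE nodes holds at most WBUF_SIZE - 1 pending writes.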

254. The write_search routine looks to see if any instructions ahead of a given place in the mem list of the reorder buffer are storing into a given physical address, or if there's a pending instruction in the write buffer for that address. If so, it returns a pointer to the value to be written. If not, it returns NULL. If the answer is currently unknown, because at least one possibly relevant physical address has not yet been computed, the subroutine returns the special code value DUNNO.

The search starts at the x.up field of a control block for a store instruction, otherwise at the ptr_a field of the control block.

The i field in the write buffer is usually st or pst, inherited from a store or partial store command. It may also be sync (from SYNC 1 or SYNC 3) or stunc (from STUNC).

    #define DUNNO ((octa *) 1)    /* an impossible non-NULL pointer */

⟨Internal prototypes 13⟩ +≡
    static octa *write_search ARGS((control *, octa));

255. ⟨Subroutines 14⟩ +≡
    static octa *write_search(ctl, addr)
      control *ctl;
      octa addr;
    { register specnode *p = (ctl->mem_x ? ctl->x.up : (specnode *) ctl->ptr_a);
      register write_node *q = write_tail;
      addr.l &= -8;
      for ( ; p != &mem; p = p->up) {
        if (p->addr.h == -1) return DUNNO;
        if ((p->addr.l & -8) == addr.l && p->addr.h == addr.h)
          return (p->known ? &(p->o) : DUNNO);
      }
      for ( ; ; ) {
        if (q == write_head) return NULL;
        if (q == wbuf_top) q = wbuf_bot; else q++;
        if (q->addr.l == addr.l && q->addr.h == addr.h) return &(q->o);
      }
    }

256. When we're committing new data to memory, we can update an existing item in the write buffer if it has the same physical address, unless that item is already in the process of being written out. Increasing the value of holding_time will increase the chance that this economy is possible, but it will also increase the number of buffered items when writes are to different locations.

A store instruction that sets any of the eight interrupt bits rwxnkbsp will not affect memory, even if it doesn't cause an interrupt.

When "store" is followed by "store uncached" at the same address, or vice versa, we believe the most recent hint.

⟨Commit to memory if possible, otherwise break 256⟩ ≡
    { register write_node *q = write_tail;
      if (hot->interrupt & (F_BIT + 0xff)) goto done_with_write;
      if (hot->i != sync)
        for ( ; ; ) {
          if (q == write_head) break;
          if (q == wbuf_top) q = wbuf_bot; else q++;
          if (q->i == sync) break;
          if (q->addr.l == hot->x.addr.l && q->addr.h == hot->x.addr.h &&
              (q != write_head || !wbuf_lock)) goto addr_found;
        }
      { register write_node *p = (write_tail == wbuf_bot ? wbuf_top : write_tail - 1);
        if (p == write_head) break;    /* the write buffer is full */
        q = write_tail; write_tail = p;
        q->addr = hot->x.addr;
      }
    addr_found: q->o = hot->x.o;
      q->stamp = ticks.l;
      q->i = hot->i;
    done_with_write: spec_rem(&(hot->x));
      mem_slots++;
    }

This code is used in section 146.
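The three possible answers of write_search in sections 254 and 255 (a pointer to the pending value, NULL, or DUNNO) can be illustrated with a much simpler data structure. The sketch below is an assumption-laden simplification, not the simulator's routine: it replaces the speculative mem list and the write buffer by one hypothetical array of pending_store records ordered from youngest to oldest, and search_pending is a name invented here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record for one store that has not yet reached memory. */
typedef struct {
    uint64_t addr;       /* physical address, meaningful if addr_known */
    uint64_t value;      /* octabyte to be stored, meaningful if value_known */
    int addr_known;
    int value_known;
} pending_store;

#define DUNNO ((uint64_t *) 1)    /* an impossible non-NULL pointer, as in section 254 */

/* Look for the youngest pending store to the octabyte containing addr.
   s[0] is the youngest store, s[n-1] the oldest. */
static uint64_t *search_pending(pending_store *s, size_t n, uint64_t addr)
{
    size_t k;
    addr &= ~(uint64_t) 7;                       /* octabyte-align the address */
    for (k = 0; k < n; k++) {
        if (!s[k].addr_known) return DUNNO;      /* it might hit this one; can't tell yet */
        if ((s[k].addr & ~(uint64_t) 7) == addr)
            return s[k].value_known ? &s[k].value : DUNNO;
    }
    return NULL;                                 /* nothing pending for this octabyte */
}
```

A caller that receives DUNNO must wait and retry, a caller that receives NULL may go to the cache or memory, and any other pointer can be dereferenced to forward the pending value.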

257. A special coroutine whose duty is to empty the write buffer is always active. It holds the wbuf_lock while it is writing the contents of write_head. It holds Dcache->fill_lock while waiting for the D-cache to fill a block.

⟨Cases for control of special coroutines 126⟩ +≡
    case write_from_wbuf: p = (cacheblock *) data->ptr_b;
      switch (data->state) {
      case 4: ⟨Forward the new data past the D-cache if it is write-through 263⟩;
        data->state = 5;
      case 5: if (write_head == wbuf_bot) write_head = wbuf_top; else write_head--;
      write_restart: data->state = 0;
      case 0: if (self->lockloc) *(self->lockloc) = NULL, self->lockloc = NULL;
        if (write_head == write_tail) wait(1);    /* write buffer is empty */
        if (write_head->i == sync) ⟨Ignore the item in write_head 264⟩;
        if (ticks.l - write_head->stamp < holding_time && !speed_lock)
          wait(1);    /* data too raw */
        if (!Dcache || (write_head->addr.h & 0xffff0000)) goto mem_direct;    /* not cached */
        if (Dcache->lock || (j = get_reader(Dcache) < 0)) wait(1);    /* D-cache busy */
        startup(&Dcache->reader[j], Dcache->access_time);
        ⟨Write the data into the D-cache and set state = 4, if there's a cache hit 262⟩;
        data->state = ((Dcache->mode & WRITE_ALLOC) && write_head->i != stunc ? 1 : 3);
        wait(Dcache->access_time);
      case 1: ⟨Try to put the contents of location write_head->addr into the D-cache 261⟩;
        data->state = 2; sleep;
      case 2: data->state = 0; sleep;    /* wake up when the D-cache has the block */
      case 3: ⟨Handle write-around when writing to the D-cache 259⟩;
      mem_direct: ⟨Write directly from write_head to memory 260⟩;
      }

258. ⟨Local variables 12⟩ +≡
    register cacheblock *p, *q;

259. The granularity is guaranteed to be 8 in write-around mode (see MMIX_config). Although an uncached store will not be stored in the D-cache (unless it hits in the D-cache), it will go into a secondary cache.

⟨Handle write-around when writing to the D-cache 259⟩ ≡
    if (Dcache->usher.next) wait(1);
    Dcache->outbuf.tag.h = write_head->addr.h;
    Dcache->outbuf.tag.l = write_head->addr.l & (-Dcache->bb);
    for (j = 0; j < Dcache->bb >> Dcache->g; j++) Dcache->outbuf.dirty[j] = false;
    Dcache->outbuf.data[(write_head->addr.l & (Dcache->bb - 1)) >> 3] = write_head->o;
    Dcache->outbuf.dirty[(write_head->addr.l & (Dcache->bb - 1)) >> Dcache->g] = true;
    set_lock(self, wbuf_lock);
    startup(&Dcache->usher, Dcache->copy_out_time);
    data->state = 5; wait(Dcache->copy_out_time);

This code is used in section 257.

260. ⟨Write directly from write_head to memory 260⟩ ≡
    if (mem_lock) wait(1);
    set_lock(self, wbuf_lock);
    set_lock(&mem_locker, mem_lock);    /* a coroutine of type vanish */
    startup(&mem_locker, mem_addr_time + mem_write_time);
    mem_write(write_head->addr, write_head->o);
    data->state = 5; wait(mem_addr_time + mem_write_time);

This code is used in section 257.

261. A subtlety needs to be mentioned here: While we're trying to update the D-cache, another instruction might be filling the same cache block (although not because of the same physical address). Therefore we goto write_restart here instead of saying wait(1).

⟨Try to put the contents of location write_head->addr into the D-cache 261⟩ ≡
    if (Dcache->filler.next) goto write_restart;
    if ((Scache && Scache->lock) || (!Scache && mem_lock)) goto write_restart;
    p = alloc_slot(Dcache, write_head->addr);
    if (!p) goto write_restart;
    if (Scache) set_lock(&Dcache->filler, Scache->lock)
    else set_lock(&Dcache->filler, mem_lock);
    set_lock(self, Dcache->fill_lock);
    data->ptr_b = Dcache->filler_ctl.ptr_b = (void *) p;
    Dcache->filler_ctl.z.o = write_head->addr;
    startup(&Dcache->filler, Scache ? Scache->access_time : mem_addr_time);

This code is used in section 257.
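The index arithmetic that section 259 uses to place one octabyte into the D-cache's output buffer can be examined separately from the coroutine machinery. The sketch below is only an illustration: block_slot and locate_octabyte are hypothetical names, bb is the block size in bytes, and g is the base-2 logarithm of the granularity, both assumed to describe powers of two as in the simulator's cache parameters.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result record: where an octabyte lands within a cache block. */
typedef struct {
    uint32_t tag_l;          /* low half of the block's tag (block-aligned address) */
    unsigned data_index;     /* which octabyte of the block */
    unsigned dirty_index;    /* which dirty flag covers that octabyte */
} block_slot;

static block_slot locate_octabyte(uint32_t addr_l, uint32_t bb, unsigned g)
{
    block_slot r;
    uint32_t offset;
    assert(bb >= 8 && (bb & (bb - 1)) == 0);    /* block size: power of two, >= 8 */
    offset = addr_l & (bb - 1);                 /* byte offset within the block */
    r.tag_l = addr_l & (0u - bb);               /* same as addr_l & -bb */
    r.data_index = offset >> 3;                 /* octabytes are 8 bytes wide */
    r.dirty_index = offset >> g;                /* one dirty flag per 2^g bytes */
    return r;
}
```

With a granularity of 8 (g = 3), as section 259 says is guaranteed in write-around mode, data_index and dirty_index coincide, so marking exactly one dirty flag covers exactly the octabyte that was written.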


262. Here it is assumed that Dcache->access_time is enough to search the D-cache and update one octabyte in case of a hit. The D-cache is not locked, since other coroutines that might be simultaneously reading the D-cache are not going to use the octabyte that changes. Perhaps the simulator is being too lenient here.

⟨Write the data into the D-cache and set state = 4, if there's a cache hit 262⟩ ≡
    p = cache_search(Dcache, write_head->addr);
    if (p) {
      p = use_and_fix(Dcache, p);
      set_lock(self, wbuf_lock);
      data->ptr_b = (void *) p;
      p->data[(write_head->addr.l & (Dcache->bb - 1)) >> 3] = write_head->o;
      p->dirty[(write_head->addr.l & (Dcache->bb - 1)) >> Dcache->g] = true;
      data->state = 4; wait(Dcache->access_time);
    }

This code is used in section 257.

263. ⟨Forward the new data past the D-cache if it is write-through 263⟩ ≡
    if ((Dcache->mode & WRITE_BACK) == 0) {    /* write-through */
      if (Dcache->usher.next) wait(1);
      flush_cache(Dcache, p, true);
    }

This code is used in section 257.

264. ⟨Ignore the item in write_head 264⟩ ≡
    {
      set_lock(self, wbuf_lock);
      data->state = 5;
      wait(1);
    }

This code is used in section 257.


265. Loading and storing. A RISC machine is often said to have a "load/store architecture," perhaps because loading and storing are among the most difficult things a RISC machine is called upon to do.

We want memory accesses to be efficient, so we try to access the D-cache at the same time as we are translating a virtual address via the DT-cache. Usually we hit in both caches, but numerous cases must be dealt with when we miss. Is there an elegant way to handle all the contingencies? Alas, the author of this program was unable to think of anything better than to throw lots of code at the problem, knowing full well that such a spaghetti-like approach is fraught with possibilities for error.

Instructions like LDO x, y, z operate in two pipeline stages. The first stage computes the virtual address y + z, waiting if necessary until y and z are both known; then it starts to access the necessary caches. In the second stage we ascertain the corresponding physical address and hopefully find the data in the cache (or in the speculative mem list or the write buffer).

An instruction like STB x, y, z shares some of the computation of LDO x, y, z, because only one byte is being stored but the other seven bytes must be found in the cache. In this case, however, x is treated as an input, and mem is the output. The second stage of a store command can begin even though x is not known during the first stage.

Here's what we do at the beginning of stage 1.

#define ld_st_launch 7 /* state when load/store command has its memory address */

⟨Cases to compute the virtual address of a memory operation 265⟩ ≡
  case preld: case prest: case prego:
    data->z.o = incr(data->z.o, data->xx & -(data->i == prego ? Icache : Dcache)->bb);
      /* (I hope the adder is fast enough) */
  case ld: case ldunc: case ldvts: case st: case pst: case syncd: case syncid:
  start_ld_st: data->y.o = oplus(data->y.o, data->z.o);
    data->state = ld_st_launch; goto switch1;
  case ldptp: case ldpte: if (data->y.o.h) goto start_ld_st;
    data->x.o = zero_octa; data->x.known = true; goto die; /* page table fault */
This code is used in section 132.

266. #define PRW_BITS (data->i < st ? PR_BIT : data->i == pst ? PR_BIT + PW_BIT :
           (data->i == syncid && (data->loc.h & sign_bit)) ? 0 : PW_BIT)

⟨Special cases for states in the first stage 266⟩ ≡
  case ld_st_launch: if ((self + 1)->next) wait(1); /* second stage must be clear */
    ⟨Handle special cases for operations like prego and ldvts 289⟩;
    if (data->y.o.h & sign_bit)
      ⟨Do load/store stage 1 with known physical address 271⟩;
    if (page_bad) {
      if (data->i == st || (data->i < preld && data->i > syncid))
        data->interrupt |= PRW_BITS;
      goto fin_ex;
    }
    if (DTcache->lock || (j = get_reader(DTcache)) < 0) wait(1);
    startup(&DTcache->reader[j], DTcache->access_time);
    ⟨Look up the address in the DT-cache, and also in the D-cache if possible 267⟩;
    pass_after(DTcache->access_time); goto passit;
See also sections 310, 326, 360, and 363.
This code is used in section 130.

267. When stage 2 of a load/store command begins, the state will depend on what transpired in stage 1. For example, data->state will be DT_miss if the virtual address key can't be found in the DT-cache; then stage 2 will have to compute the physical address the hard way.

The data->state will be DT_hit if the physical address is known via the DT-cache, but the data may or may not be in the D-cache. The data->state will be hit_and_miss if the DT-cache hits and the D-cache doesn't. And data->state will be ld_ready if data->x.o is the desired octabyte (for example, if both caches hit).

#define DT_miss 10 /* second stage state when DT-cache doesn't hold the key */
#define DT_hit 11 /* second stage state when physical address is known */
#define hit_and_miss 12 /* second stage state when D-cache misses */
#define ld_ready 13 /* second stage state when data has been read */
#define st_ready 14 /* second stage state when data needn't be read */
#define prest_win 15 /* second stage state when we can fill a block with zeroes */

⟨Look up the address in the DT-cache, and also in the D-cache if possible 267⟩ ≡
  p = cache_search(DTcache, trans_key(data->y.o));
  if (!Dcache || Dcache->lock || (j = get_reader(Dcache)) < 0 ||
      (data->i >= st && data->i <= syncid))
    ⟨Do load/store stage 1 without D-cache lookup 270⟩;
  startup(&Dcache->reader[j], Dcache->access_time);
  if (p) ⟨Do a simultaneous lookup in the D-cache 268⟩
  else data->state = DT_miss;
This code is used in section 266.
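The first-stage address computation y + z of section 265 works on octabytes that the simulator stores as two 32-bit halves. The following is a minimal sketch of such an addition, assuming a hypothetical type octa_sketch and function oplus_sketch (these names are illustrative and are not the MMIX-ARITH definitions); the assertion cross-checks the result against native 64-bit arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for an octabyte kept as two tetrabytes. */
typedef struct { uint32_t h, l; } octa_sketch;

/* Add two 64-bit values held as (high, low) pairs, modulo 2^64. */
static octa_sketch oplus_sketch(octa_sketch y, octa_sketch z)
{
    octa_sketch x;
    x.h = y.h + z.h;
    x.l = y.l + z.l;
    if (x.l < y.l) x.h++;   /* the low half wrapped around, so carry */
    /* Cross-check against a native 64-bit addition. */
    assert((((uint64_t)x.h << 32) | x.l) ==
           ((((uint64_t)y.h << 32) | y.l) + (((uint64_t)z.h << 32) | z.l)));
    return x;
}
```

Detecting the carry by comparing the low half of the sum with one operand avoids any need for a wider integer type, which is presumably why a split representation can stay portable.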


268. We assume that it is possible to look up a virtual address in the DT-cache at the same time as we look for a corresponding physical address in the D-cache, provided that the lower b + c bits of the two addresses are the same. (They will always be the same if b + c ≤ page_s; otherwise the operating system can try to make them the same by "page coloring" whenever possible.) If both caches hit, the physical address is known in max(DTcache->access_time, Dcache->access_time) cycles.

If the lower b + c bits of the virtual and physical addresses differ, the machine will not know this until the DT-cache has hit. Therefore we simulate the operation of accessing the D-cache, but we go to DT_hit instead of to hit_and_miss because the D-cache will experience a spurious miss.

#define max(x, y) ((x) < (y) ? (y) : (x))

⟨Do a simultaneous lookup in the D-cache 268⟩ ≡
  { octa *m;
    ⟨Update DT-cache usage and check the protection bits 269⟩;
    data->z.o = phys_addr(data->y.o, p->data[0]);
    m = write_search(data, data->z.o);
    if (m == DUNNO) data->state = DT_hit;
    else if (m) data->x.o = *m, data->state = ld_ready;
    else if (Dcache->b + Dcache->c > page_s &&
        ((data->y.o.l ^ data->z.o.l) & ((Dcache->bb << Dcache->c) - (1 << page_s))))
      data->state = DT_hit; /* spurious D-cache lookup */
    else {
      q = cache_search(Dcache, data->z.o);
      if (q) {
        if (data->i == ldunc) q = demote_and_fix(Dcache, q);
        else q = use_and_fix(Dcache, q);
        data->x.o = q->data[(data->z.o.l & (Dcache->bb - 1)) >> 3];
        data->state = ld_ready;
      } else data->state = hit_and_miss;
    }
    pass_after(max(DTcache->access_time, Dcache->access_time)); goto passit;
  }
This code is used in section 267.
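The "spurious lookup" test of section 268 can be isolated as a small predicate. The sketch below assumes that b is the base-2 logarithm of the block size in bytes and c the base-2 logarithm of the number of cache sets, so that the low b + c address bits select a set and a byte within it; the function name spurious_lookup and its parameter list are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Nonzero if the virtual and physical addresses disagree in a cache
   index bit that lies above the page offset, in which case a D-cache
   lookup made with the virtual address would be looking in the wrong set. */
static int spurious_lookup(uint32_t virt_lo, uint32_t phys_lo,
                           int b, int c, int page_s)
{
    assert(b >= 0 && c >= 0 && page_s >= 0 && b + c < 32 && page_s < 32);
    if (b + c <= page_s) return 0;  /* index bits all lie in the page offset */
    /* Bits page_s .. b+c-1: same value as (bb << c) - (1 << page_s). */
    uint32_t index_mask = ((uint32_t)1 << (b + c)) - ((uint32_t)1 << page_s);
    return ((virt_lo ^ phys_lo) & index_mask) != 0;
}
```

With 64-byte blocks (b = 6), 256 sets (c = 8), and 8192-byte pages (page_s = 13), only bit 13 can differ, so page coloring in this configuration amounts to matching a single bit.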

269. The protection bits p_r p_w p_x in a translation cache are shifted four positions right from the interrupt codes PR_BIT, PW_BIT, PX_BIT. If the data is protected, we abort the load/store operation immediately; this protects the privacy of other users.

⟨Update DT-cache usage and check the protection bits 269⟩ ≡
  p = use_and_fix(DTcache, p);
  j = PRW_BITS;
  if (((p->data[0].l << PROT_OFFSET) & j) != j) {
    if (data->i == syncd || data->i == syncid) goto sync_check;
    if (data->i != preld && data->i != prest)
      data->interrupt |= j & ~(p->data[0].l << PROT_OFFSET);
    goto fin_ex;
  }
This code is used in sections 268, 270, and 272.

270. ⟨Do load/store stage 1 without D-cache lookup 270⟩ ≡
  { octa *m;
    if (p) {
      ⟨Update DT-cache usage and check the protection bits 269⟩;
      data->z.o = phys_addr(data->y.o, p->data[0]);
      if (data->i >= st && data->i <= syncid) data->state = st_ready;
      else {
        m = write_search(data, data->z.o);
        if (m && m != DUNNO) data->x.o = *m, data->state = ld_ready;
        else data->state = DT_hit;
      }
    } else data->state = DT_miss;
    pass_after(DTcache->access_time); goto passit;
  }
This code is used in section 267.
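The protection test of section 269 compares the permissions an instruction needs with the three low-order bits of a translation cache entry. Here is a minimal sketch of that comparison, assuming the bit assignments PR_BIT = 1 << 7, PW_BIT = 1 << 6, PX_BIT = 1 << 5 and a shift of 5; the _SK names and the function prot_fault_bits are hypothetical, introduced only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

#define PX_BIT_SK (1u << 5)      /* execute-permission interrupt code */
#define PW_BIT_SK (1u << 6)      /* write-permission interrupt code */
#define PR_BIT_SK (1u << 7)      /* read-permission interrupt code */
#define PROT_OFFSET_SK 5         /* shift from entry bits to interrupt codes */

/* Return the interrupt bits to raise; zero means the access is allowed. */
static unsigned prot_fault_bits(uint32_t entry_lo, unsigned needed)
{
    const unsigned all = PR_BIT_SK | PW_BIT_SK | PX_BIT_SK;
    assert((needed & ~all) == 0);
    unsigned granted = (entry_lo << PROT_OFFSET_SK) & all;
    return needed & ~granted;    /* permissions needed but not granted */
}
```

A nonzero result corresponds to the case in which the original code aborts the operation and records the missing permissions in data->interrupt.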


271. ⟨Do load/store stage 1 with known physical address 271⟩ ≡
  { octa *m;
    if (!(data->loc.h & sign_bit)) {
      if (data->i == syncd || data->i == syncid) goto sync_check;
      if (data->i != preld && data->i != prest) data->interrupt |= N_BIT;
      goto fin_ex;
    }
    data->z.o = data->y.o; data->z.o.h -= sign_bit;
    if (data->i >= st && data->i <= syncid) {
      data->state = st_ready; pass_after(1); goto passit;
    }
    m = write_search(data, data->z.o);
    if (m) {
      if (m == DUNNO) data->state = DT_hit;
      else data->x.o = *m, data->state = ld_ready;
    } else if ((data->z.o.h & 0xffff0000) || !Dcache) {
      if (mem_lock) wait(1);
      set_lock(&mem_locker, mem_lock);
      data->x.o = mem_read(data->z.o);
      data->state = ld_ready;
      startup(&mem_locker, mem_addr_time + mem_read_time);
      pass_after(mem_addr_time + mem_read_time); goto passit;
    }
    if (Dcache->lock || (j = get_reader(Dcache)) < 0) {
      data->state = DT_hit; pass_after(1); goto passit;
    }
    startup(&Dcache->reader[j], Dcache->access_time);
    q = cache_search(Dcache, data->z.o);
    if (q) {
      if (data->i == ldunc) q = demote_and_fix(Dcache, q);
      else q = use_and_fix(Dcache, q);
      data->x.o = q->data[(data->z.o.l & (Dcache->bb - 1)) >> 3];
      data->state = ld_ready;
    } else data->state = hit_and_miss;
    pass_after(Dcache->access_time); goto passit;
  }
This code is used in section 266.


272. The program for the second stage is, likewise, rather long-winded, yet quite similar to the cache manipulations we have already seen several times.

Several instructions might be trying to fill the DT-cache for the same page. (A similar situation faced us in the write_from_wbuf coroutine.) The second stage therefore needs to do some translation cache searching just as the first stage did. In this stage, however, we don't go all out for speed, because DT-cache misses are rare.

#define DT_retry 8 /* second stage state when DT-cache should be searched again */
#define got_DT 9 /* second stage state when DT-cache entry has been computed */

⟨Special cases for states in later stages 272⟩ ≡
  square_one: data->state = DT_retry;
  case DT_retry: if (DTcache->lock || (j = get_reader(DTcache)) < 0) wait(1);
    startup(&DTcache->reader[j], DTcache->access_time);
    p = cache_search(DTcache, trans_key(data->y.o));
    if (p) {
      ⟨Update DT-cache usage and check the protection bits 269⟩;
      data->z.o = phys_addr(data->y.o, p->data[0]);
      if (data->i >= st && data->i <= syncid) data->state = st_ready;
      else data->state = DT_hit;
    } else data->state = DT_miss;
    wait(DTcache->access_time);
  case DT_miss: if (DTcache->filler.next)
      if (data->i == preld || data->i == prest) goto fin_ex; else goto square_one;
    if (no_hardware_PT)
      if (data->i == preld || data->i == prest) goto fin_ex; else goto emulate_virt;
    p = alloc_slot(DTcache, trans_key(data->y.o));
    if (!p) goto square_one;
    data->ptr_b = DTcache->filler_ctl.ptr_b = (void *) p;
    DTcache->filler_ctl.y.o = data->y.o;
    set_lock(self, DTcache->fill_lock);
    startup(&DTcache->filler, 1);
    data->state = got_DT;
    if (data->i == preld || data->i == prest) goto fin_ex; else sleep;
  case got_DT: release_lock(self, DTcache->fill_lock);
    j = PRW_BITS;
    if (((data->z.o.l << PROT_OFFSET) & j) != j) {
      if (data->i == syncd || data->i == syncid) goto sync_check;
      data->interrupt |= j & ~(data->z.o.l << PROT_OFFSET);
      goto fin_ex;
    }
    data->z.o = phys_addr(data->y.o, data->z.o);
    if (data->i >= st && data->i <= syncid) goto finish_store;
    /* otherwise we fall through to ld_retry below */
See also sections 273, 276, 279, 280, 299, 311, 354, 364, and 370.
This code is used in section 135.

273. The second stage might also want to fill the D-cache (and perhaps the S-cache) as we get the data. Several load instructions might be trying to fill the same cache block. So we should go back and look in the D-cache again if we miss and cannot allocate a slot immediately.

A PRELD or PREST instruction, which is just a "hint," doesn't do anything more if the caches are already busy.

⟨Special cases for states in later stages 272⟩ +≡
  ld_retry: data->state = DT_hit;
  case DT_hit: if (data->i == preld || data->i == prest) goto fin_ex;
    ⟨Check for a hit in pending writes 278⟩;
    if ((data->z.o.h & 0xffff0000) || !Dcache)
      ⟨Do load/store stage 2 without D-cache lookup 277⟩;
    if (Dcache->lock || (j = get_reader(Dcache)) < 0) wait(1);
    startup(&Dcache->reader[j], Dcache->access_time);
    q = cache_search(Dcache, data->z.o);
    if (q) {
      if (data->i == ldunc) q = demote_and_fix(Dcache, q);
      else q = use_and_fix(Dcache, q);
      data->x.o = q->data[(data->z.o.l & (Dcache->bb - 1)) >> 3];
      data->state = ld_ready;
    } else data->state = hit_and_miss;
    wait(Dcache->access_time);
  case hit_and_miss: if (data->i == ldunc) goto avoid_D;
    ⟨Try to get the contents of location data->z.o in the D-cache 274⟩;


274. ⟨Try to get the contents of location data->z.o in the D-cache 274⟩ ≡
  ⟨Check for prest with a fully spanned cache block 275⟩;
  if (Dcache->filler.next) goto ld_retry;
  if ((Scache && Scache->lock) || (!Scache && mem_lock)) goto ld_retry;
  q = alloc_slot(Dcache, data->z.o);
  if (!q) goto ld_retry;
  if (Scache) set_lock(&Dcache->filler, Scache->lock)
  else set_lock(&Dcache->filler, mem_lock);
  set_lock(self, Dcache->fill_lock);
  data->ptr_b = Dcache->filler_ctl.ptr_b = (void *) q;
  Dcache->filler_ctl.z.o = data->z.o;
  startup(&Dcache->filler, Scache ? Scache->access_time : mem_addr_time);
  data->state = ld_ready;
  if (data->i == preld || data->i == prest) goto fin_ex; else sleep;
This code is used in section 273.

275. If a prest instruction makes it to the hot seat, we have been assured by the user of PREST that the current values of bytes in virtual addresses data->y.o − (data->xx & −Dcache->bb) through data->y.o + (data->xx & (Dcache->bb − 1)) are irrelevant. Hence we can pretend that we know they are zero. This is advantageous if it saves us from filling a cache block from the S-cache or from memory.

⟨Check for prest with a fully spanned cache block 275⟩ ≡
  if (data->i == prest &&
      (data->xx >= Dcache->bb || ((data->y.o.l & (Dcache->bb - 1)) == 0)) &&
      ((data->y.o.l + (data->xx & (Dcache->bb - 1)) + 1) ^ data->y.o.l) >= Dcache->bb)
    goto prest_span;
This code is used in section 274.

276. ⟨Special cases for states in later stages 272⟩ +≡
  prest_span: data->state = prest_win;
  case prest_win: if (data != old_hot || Dlocker.next) wait(1);
    if (Dcache->lock) goto fin_ex;
    q = alloc_slot(Dcache, data->z.o); /* OK if Dcache->filler is busy */
    if (q) {
      clean_block(Dcache, q);
      q->tag = data->z.o; q->tag.l &= -Dcache->bb;
      set_lock(&Dlocker, Dcache->lock);
      startup(&Dlocker, Dcache->copy_in_time);
    }
    goto fin_ex;

277. ⟨Do load/store stage 2 without D-cache lookup 277⟩ ≡
  {
  avoid_D: if (mem_lock) wait(1);
    set_lock(&mem_locker, mem_lock);
    startup(&mem_locker, mem_addr_time + mem_read_time);
    data->x.o = mem_read(data->z.o);
    data->state = ld_ready; wait(mem_addr_time + mem_read_time);
  }
This code is used in section 273.
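The condition in section 275 is compact enough that a standalone version may help when experimenting with particular values. The sketch below is a direct transcription of the three-part test into a hypothetical function prest_spans_block, under the assumption that bb (the block size in bytes) is a power of two; it does not model the surrounding pipeline state.

```c
#include <assert.h>
#include <stdint.h>

/* Transcription of the "fully spanned cache block" test:
   y_lo is the low tetrabyte of the address, xx the byte count field,
   bb the cache block size in bytes (assumed to be a power of two). */
static int prest_spans_block(uint32_t y_lo, unsigned xx, uint32_t bb)
{
    assert(bb != 0 && (bb & (bb - 1)) == 0);
    return (xx >= bb || (y_lo & (bb - 1)) == 0)
        && (((y_lo + (xx & (bb - 1)) + 1) ^ y_lo) >= bb);
}
```

The exclusive-or in the second clause detects whether adding the residual length carries out of the low-order offset bits, that is, whether the range reaches the next block boundary.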

278. ⟨Check for a hit in pending writes 278⟩ ≡
  { octa *m = write_search(data, data->z.o);
    if (m == DUNNO) wait(1);
    if (m) {
      data->x.o = *m;
      data->state = ld_ready;
      wait(1);
    }
  }
This code is used in section 273.


279. The requested octabyte will arrive sooner or later in data->x.o. Then a load instruction is almost done, except that we might need to massage the input a little bit.

⟨Special cases for states in later stages 272⟩ +≡
  case ld_ready: if (self->lockloc) *(self->lockloc) = NULL, self->lockloc = NULL;
    if (data->i >= st) goto finish_store;
    switch (data->op >> 1) {
    case LDB >> 1: case LDBU >> 1: j = (data->z.o.l & 0x7) << 3; i = 56; goto fin_ld;
    case LDW >> 1: case LDWU >> 1: j = (data->z.o.l & 0x6) << 3; i = 48; goto fin_ld;
    case LDT >> 1: case LDTU >> 1: j = (data->z.o.l & 0x4) << 3; i = 32;
    fin_ld: data->x.o = shift_right(shift_left(data->x.o, j), i, data->op & 0x2);
    default: goto fin_ex;
    case LDHT >> 1: if (data->z.o.l & 4) data->x.o.h = data->x.o.l;
      data->x.o.l = 0; goto fin_ex;
    case LDSF >> 1: if (data->z.o.l & 4) data->x.o.h = data->x.o.l;
      if ((data->x.o.h & 0x7f800000) == 0 && (data->x.o.h & 0x7fffff)) {
        data->x.o = load_sf(data->x.o.h);
        data->state = 3; wait(denin_penalty);
      } else data->x.o = load_sf(data->x.o.h);
      goto fin_ex;
    case LDPTP >> 1:
      if ((data->x.o.h & sign_bit) == 0 || (data->x.o.l & 0x1ff8) != page_n)
        data->x.o = zero_octa;
      else data->x.o.l &= -(1 << 13);
      goto fin_ex;
    case LDPTE >> 1: if ((data->x.o.l & 0x1ff8) != page_n) data->x.o = zero_octa;
      else data->x.o = incr(oandn(data->x.o, page_mask), data->x.o.l & 0x7);
      data->x.o.h &= 0xffff; goto fin_ex;
    case UNSAVE >> 1: ⟨Handle an internal UNSAVE when it's time to load 336⟩;
    }

280. ⟨Special cases for states in later stages 272⟩ +≡
  finish_store: data->state = st_ready;
  case st_ready: switch (data->i) {
    case st: case pst: ⟨Finish a store command 281⟩;
    case syncd: data->b.o.l = (Dcache ? Dcache->bb : 8192); goto do_syncd;
    case syncid: data->b.o.l = (Icache ? Icache->bb : 8192);
      if (Dcache && Dcache->bb < data->b.o.l) data->b.o.l = Dcache->bb;
      goto do_syncid;
    }

281. Store instructions have an extra complication, because some of them need to check for overflow.

⟨Finish a store command 281⟩ ≡
  data->x.addr = data->z.o;
  if (data->b.p) wait(1);
  switch (data->op >> 1) {
  case STUNC >> 1: data->i = stunc;
  default: data->x.o = data->b.o; goto fin_ex;
  case STSF >> 1: data->b.o.h = store_sf(data->b.o);
    data->interrupt |= exceptions;
    if ((data->b.o.h & 0x7f800000) == 0 && (data->b.o.h & 0x7fffff)) {
      if (data->z.o.l & 4) data->x.o.l = data->b.o.h;
      else data->x.o.h = data->b.o.h;
      data->state = 3; wait(denout_penalty);
    }
  case STHT >> 1: if (data->z.o.l & 4) data->x.o.l = data->b.o.h;
    else data->x.o.h = data->b.o.h;
    goto fin_ex;
  case STB >> 1: case STBU >> 1: j = (data->z.o.l & 0x7) << 3; i = 56; goto fin_st;
  case STW >> 1: case STWU >> 1: j = (data->z.o.l & 0x6) << 3; i = 48; goto fin_st;
  case STT >> 1: case STTU >> 1: j = (data->z.o.l & 0x4) << 3; i = 32;
  fin_st: ⟨Insert data->b.o into the proper field of data->x.o, checking for arithmetic exceptions if signed 282⟩;
    goto fin_ex;
  case CSWAP >> 1: ⟨Finish a CSWAP 283⟩;
  case SAVE >> 1: ⟨Handle an internal SAVE when it's time to store 342⟩;
  }
This code is used in section 280.
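The "massaging" of a loaded octabyte in section 279 amounts to a left shift that brings the addressed field to the top, followed by a right shift that brings it to the bottom with or without sign extension. Below is a minimal sketch of that idea using a native 64-bit integer instead of the simulator's two-tetrabyte representation; the function extract_field and its parameters are hypothetical, and MMIX's big-endian byte numbering is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Extract the byte, wyde, or tetra (width 1, 2, or 4) that contains
   address addr from the octabyte holding it. Byte 0 is the most
   significant byte, following big-endian numbering. */
static uint64_t extract_field(uint64_t octabyte, uint64_t addr,
                              int width, int is_unsigned)
{
    assert(width == 1 || width == 2 || width == 4);
    int i = 64 - 8 * width;                               /* 56, 48, or 32 */
    int j = (int)((addr & 7 & ~(uint64_t)(width - 1)) << 3);
    uint64_t x = octabyte << j;          /* field now occupies the top bits */
    uint64_t r = x >> i;                 /* field now occupies the low bits */
    if (!is_unsigned && (x >> 63))
        r |= ~(uint64_t)0 << (64 - i);   /* portable sign extension */
    return r;
}
```

For instance, with the octabyte 0x0123456789abcdef, an unsigned byte load at offset 1 yields 0x23, while a signed tetra load at offset 5 yields 0xffffffff89abcdef. The sign extension is written out explicitly because right-shifting a negative signed value is implementation-defined in C.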


282. ⟨Insert data->b.o into the proper field of data->x.o, checking for arithmetic exceptions if signed 282⟩ ≡
  { octa mask;
    if (!(data->op & 2)) { octa before, after;
      before = data->b.o; after = shift_right(shift_left(data->b.o, i), i, 0);
      if (before.l != after.l || before.h != after.h) data->interrupt |= V_BIT;
    }
    mask = shift_right(shift_left(neg_one, i), j, 1);
    data->b.o = shift_right(shift_left(data->b.o, i), j, 1);
    data->x.o.h ^= mask.h & (data->x.o.h ^ data->b.o.h);
    data->x.o.l ^= mask.l & (data->x.o.l ^ data->b.o.l);
  }
This code is used in section 281.

283. The CSWAP operation has four inputs ($X, $Y, $Z, rP) as well as three outputs ($X, M8[A], rP). To keep from exceeding the capacity of the control blocks in our pipeline, we wait until this instruction reaches the hot seat, thereby allowing us nonspeculative access to rP.

⟨Finish a CSWAP 283⟩ ≡
  if (data != old_hot) wait(1);
  if (data->x.o.h == g[rP].o.h && data->x.o.l == g[rP].o.l) {
    data->a.o.l = 1; /* data->a.o.h is zero */
    data->x.o = data->b.o;
  } else {
    g[rP].o = data->x.o; /* data->a.o is zero */
    if (verbose & issue_bit) {
      printf(" setting rP="); print_octa(g[rP].o); printf("\n");
    }
  }
  data->i = cswap; /* cosmetic change, affects the trace output only */
  goto fin_ex;
This code is used in section 281.
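The field insertion of section 282 uses the identity x ^ (mask & (x ^ b)), which copies the bits of b into x wherever mask is 1 and leaves x alone elsewhere. The following sketch applies the same idea to a native 64-bit integer; the function insert_field, its parameters, and the overflow flag are hypothetical names for illustration, and overflow here simply means that the value does not survive sign extension from the field width.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store the low `width` bytes of value into the octabyte `old` at the
   position selected by addr (big-endian byte numbering). If overflow is
   non-NULL and the store is signed, *overflow reports whether value
   does not fit in the field. */
static uint64_t insert_field(uint64_t old, uint64_t value, uint64_t addr,
                             int width, int is_signed, int *overflow)
{
    assert(width == 1 || width == 2 || width == 4);
    int i = 64 - 8 * width;
    int j = (int)((addr & 7 & ~(uint64_t)(width - 1)) << 3);
    if (overflow) {
        *overflow = 0;
        if (is_signed) {
            uint64_t low = (value << i) >> i;          /* zero-extended field */
            uint64_t fill = ((low >> (63 - i)) & 1) ? ~(uint64_t)0 << (64 - i) : 0;
            *overflow = ((low | fill) != value);       /* lost information? */
        }
    }
    uint64_t mask = (~(uint64_t)0 << i) >> j;          /* 1s over the field */
    uint64_t field = (value << i) >> j;                /* value moved into place */
    return old ^ (mask & (old ^ field));
}
```

The masked exclusive-or merges the field without a separate clear-then-or step, which mirrors the two `^=` statements applied to the high and low halves in the original.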

284. The fetch stage. Now that we've mastered the most difficult memory operations, we can relax and apply our knowledge to the slightly simpler task of filling the fetch buffer. Fetching is like loading/storing, except that we use the I-cache instead of the D-cache. It's slightly simpler because the I-cache is read-only. Further simplifications would be possible if there were no PREGO instruction, because there is only one fetch unit. However, we want to implement PREGO with reasonable efficiency, in order to see if that instruction is worthwhile; so we include the complications of simultaneous I-cache and IT-cache readers, which we have already implemented for the D-cache and DT-cache.

The fetch coroutine is always present, as the one and only coroutine with stage number zero.

In normal circumstances, the fetch coroutine accesses a cache block containing the instruction whose virtual address is given by inst_ptr (the instruction pointer), and transfers up to fetch_max instructions from that block to the fetch buffer. Complications arise if the instruction isn't in the cache, or if we can't translate the virtual address because of a miss in the IT-cache. Moreover, inst_ptr is a spec variable whose value might not even be known; if inst_ptr.p is nonnull, we don't know what to fetch.

⟨External variables 4⟩ +≡
  Extern spec inst_ptr; /* the instruction pointer (aka program counter) */
  Extern octa *fetched; /* buffer for incoming instructions */

285. The fetch coroutine usually begins a cycle in state fetch_ready, with the most recently fetched octabytes in positions fetch_lo, fetch_lo + 1, ..., fetch_hi − 1 of a buffer called fetched. Once that buffer has been exhausted, the coroutine reverts to state 0; with luck, the buffer might have more data by the time the next cycle rolls around.

⟨Global variables 20⟩ +≡
  int fetch_lo, fetch_hi; /* the active region of that buffer */
  coroutine fetch_co;
  control fetch_ctl;


286. ⟨Initialize everything 22⟩ +≡
  fetch_co.ctl = &fetch_ctl;
  fetch_co.name = "Fetch";
  fetch_ctl.go.o.l = 4;
  startup(&fetch_co, 1);

287. ⟨Restart the fetch coroutine 287⟩ ≡
  if (fetch_co.lockloc) *(fetch_co.lockloc) = NULL, fetch_co.lockloc = NULL;
  unschedule(&fetch_co);
  startup(&fetch_co, 1);
This code is used in sections 85, 160, 308, 309, and 316.

288. Some of the actions here are done not only by the fetcher but also by the first and second stages of a prego operation.

#define wait_or_pass(t) if (data->i == prego) { pass_after(t); goto passit; } else wait(t)

⟨Simulate an action of the fetch coroutine 288⟩ ≡
  switch0: switch (data->state) {
    new_fetch: data->state = 0;
    case 0: ⟨Wait, if necessary, until the instruction pointer is known 290⟩;
      data->y.o = inst_ptr.o;
      data->state = 1; data->interrupt = 0; data->x.o = data->z.o = zero_octa;
    case 1: start_fetch: if (data->y.o.h & sign_bit)
        ⟨Begin fetch with known physical address 296⟩;
      if (page_bad) goto bad_fetch;
      if (ITcache->lock || (j = get_reader(ITcache)) < 0) wait(1);
      startup(&ITcache->reader[j], ITcache->access_time);
      ⟨Look up the address in the IT-cache, and also in the I-cache if possible 291⟩;
      wait_or_pass(ITcache->access_time);
    ⟨Other cases for the fetch coroutine 298⟩
  }
This code is used in section 125.

289. ⟨Handle special cases for operations like prego and ldvts 289⟩ ≡
  if (data->i == prego) goto start_fetch;
See also section 352.
This code is used in section 266.

290. ⟨Wait, if necessary, until the instruction pointer is known 290⟩ ≡
  if (inst_ptr.p) {
    if (inst_ptr.p != UNKNOWN_SPEC && inst_ptr.p->known)
      inst_ptr.o = inst_ptr.p->o, inst_ptr.p = NULL;
    wait(1);
  }
This code is used in section 288.

291. #define got_IT 19 /* state when IT-cache entry has been computed */
#define IT_miss 20 /* state when IT-cache doesn't hold the key */
#define IT_hit 21 /* state when physical instruction address is known */
#define Ihit_and_miss 22 /* state when I-cache misses */
#define fetch_ready 23 /* state when instructions have been read */
#define got_one 24 /* state when a "preview" octabyte is ready */

⟨Look up the address in the IT-cache, and also in the I-cache if possible 291⟩ ≡
  p = cache_search(ITcache, trans_key(data->y.o));
  if (!Icache || Icache->lock || (j = get_reader(Icache)) < 0)
    ⟨Begin fetch without I-cache lookup 295⟩;
  startup(&Icache->reader[j], Icache->access_time);
  if (p) ⟨Do a simultaneous lookup in the I-cache 292⟩
  else data->state = IT_miss;
This code is used in section 288.

292. We assume that it is possible to look up a virtual address in the IT-cache at the same time as we look for a corresponding physical address in the I-cache, provided that the lower b + c bits of the two addresses are the same. (See the remarks about "page coloring," when we made similar assumptions about the DT-cache and D-cache.)

⟨Do a simultaneous lookup in the I-cache 292⟩ ≡
  {
    ⟨Update IT-cache usage and check the protection bits 293⟩;
    data->z.o = phys_addr(data->y.o, p->data[0]);
    if (Icache->b + Icache->c > page_s &&
        ((data->y.o.l ^ data->z.o.l) & ((Icache->bb << Icache->c) - (1 << page_s))))
      data->state = IT_hit;   /* spurious I-cache lookup */
    else {
      q = cache_search(Icache, data->z.o);
      if (q) {
        q = use_and_fix(Icache, q);
        ⟨Copy the data from block q to fetched 294⟩;
        data->state = fetch_ready;
      } else data->state = Ihit_and_miss;
    }
    wait_or_pass(max(ITcache->access_time, Icache->access_time));
  }

This code is used in section 291.

293. ⟨Update IT-cache usage and check the protection bits 293⟩ ≡
  p = use_and_fix(ITcache, p);
  if (!(p->data[0].l & (PX_BIT << PROT_OFFSET))) goto bad_fetch;

This code is used in sections 292 and 295.

294. At this point inst_ptr.o equals data->y.o.

⟨Copy the data from block q to fetched 294⟩ ≡
  if (data->i != prego) {
    for (j = 0; j < Icache->bb >> 3; j++) fetched[j] = q->data[j];
    fetch_lo = (inst_ptr.o.l & (Icache->bb - 1)) >> 3;
    fetch_hi = Icache->bb >> 3;
  }

This code is used in sections 292 and 296.

295. ⟨Begin fetch without I-cache lookup 295⟩ ≡
  {
    if (p) {
      ⟨Update IT-cache usage and check the protection bits 293⟩;
      data->z.o = phys_addr(data->y.o, p->data[0]);
      data->state = IT_hit;
    } else data->state = IT_miss;
    wait_or_pass(ITcache->access_time);
  }

This code is used in section 291.
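The page-coloring test of section 292 can be isolated as a small predicate. The following is a minimal sketch, not part of MMIX-PIPE: the name `lookup_is_spurious` and its parameter list are hypothetical, and it assumes that the block size is 2^b bytes and that the cache has 2^c sets, so that `Icache->bb << Icache->c` equals 2^(b+c).

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical helper (not in MMIX-PIPE): returns true when probing the
   I-cache with the virtual address would select the wrong set, because
   the virtual and physical addresses differ in a set-index bit that lies
   above the page offset.
   b = log2(block size in bytes), c = log2(number of sets),
   page_s = log2(page size); assumes b + c < 32. */
static bool lookup_is_spurious(uint32_t virt_lo, uint32_t phys_lo,
                               int b, int c, int page_s)
{
    if (b + c <= page_s)
        return false;               /* all index bits lie inside the page offset */
    uint32_t mask = ((uint32_t)1 << (b + c)) - ((uint32_t)1 << page_s);
    return ((virt_lo ^ phys_lo) & mask) != 0;
}
```

With b = 5, c = 9, and 8K-byte pages (page_s = 13), only bit 13 of the address matters: a translation that changes that bit makes the simultaneous lookup spurious, and the fetch coroutine falls back to the IT_hit state.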

296. ⟨Begin fetch with known physical address 296⟩ ≡
  {
    if (data->i == prego && !(data->loc.h & sign_bit)) goto fin_ex;
    data->z.o = data->y.o; data->z.o.h -= sign_bit;
  known_phys: if (data->z.o.h & 0xffff0000) goto bad_fetch;
    if (!Icache) ⟨Read from memory into fetched 297⟩;
    if (Icache->lock || (j = get_reader(Icache)) < 0) {
      data->state = IT_hit; wait_or_pass(1);
    }
    startup(&Icache->reader[j], Icache->access_time);
    q = cache_search(Icache, data->z.o);
    if (q) {
      q = use_and_fix(Icache, q);
      ⟨Copy the data from block q to fetched 294⟩;
      data->state = fetch_ready;
    } else data->state = Ihit_and_miss;
    wait_or_pass(Icache->access_time);
  }

This code is used in section 288.


297. ⟨Read from memory into fetched 297⟩ ≡
  { octa addr;
    addr = data->z.o;
    if (mem_lock) wait(1);
    set_lock(&mem_locker, mem_lock);
    startup(&mem_locker, mem_addr_time + mem_read_time);
    addr.l &= -(bus_words << 3);
    fetched[0] = mem_read(addr);
    for (j = 1; j < bus_words; j++)
      fetched[j] = mem_hash[last_h].chunk[((addr.l & 0xffff) >> 3) + j];
    fetch_lo = (data->z.o.l >> 3) & (bus_words - 1); fetch_hi = bus_words;
    data->state = fetch_ready;
    wait(mem_addr_time + mem_read_time);
  }

This code is used in section 296.

298. ⟨Other cases for the fetch coroutine 298⟩ ≡
  case IT_miss: if (ITcache->filler.next)
      if (data->i == prego) goto fin_ex; else wait(1);
    if (no_hardware_PT) ⟨Insert dummy instruction for page table emulation 302⟩;
    p = alloc_slot(ITcache, trans_key(data->y.o));
    if (!p)   /* hey, it was present after all */
      if (data->i == prego) goto fin_ex; else goto new_fetch;
    data->ptr_b = ITcache->filler_ctl.ptr_b = (void *) p;
    ITcache->filler_ctl.y.o = data->y.o;
    set_lock(self, ITcache->fill_lock);
    startup(&ITcache->filler, 1);
    data->state = got_IT;
    if (data->i == prego) goto fin_ex; else sleep;
  case got_IT: release_lock(self, ITcache->fill_lock);
    if (!(data->z.o.l & (PX_BIT << PROT_OFFSET))) goto bad_fetch;
    data->z.o = phys_addr(data->y.o, data->z.o);
  fetch_retry: data->state = IT_hit;
  case IT_hit: if (data->i == prego) goto fin_ex; else goto known_phys;
  case Ihit_and_miss:
    ⟨Try to get the contents of location data->z.o in the I-cache 300⟩;

See also section 301.
This code is used in section 288.

299. ⟨Special cases for states in later stages 272⟩ +≡
  case IT_miss: case Ihit_and_miss: case IT_hit: case fetch_ready: goto switch0;

300. ⟨Try to get the contents of location data->z.o in the I-cache 300⟩ ≡
  if (Icache->filler.next) goto fetch_retry;
  if ((Scache && Scache->lock) || (!Scache && mem_lock)) goto fetch_retry;
  q = alloc_slot(Icache, data->z.o);
  if (!q) goto fetch_retry;
  if (Scache) set_lock(&Icache->filler, Scache->lock)
  else set_lock(&Icache->filler, mem_lock);
  set_lock(self, Icache->fill_lock);
  data->ptr_b = Icache->filler_ctl.ptr_b = (void *) q;
  Icache->filler_ctl.z.o = data->z.o;
  startup(&Icache->filler, Scache ? Scache->access_time : mem_addr_time);
  data->state = got_one;
  if (data->i == prego) goto fin_ex; else sleep;

This code is used in section 298.


301. The I-cache filler will wake us up with the octabyte we want, before it has filled the entire cache block. In that case we can fetch one or two instructions before the rest of the block has been loaded.

⟨Other cases for the fetch coroutine 298⟩ +≡
  bad_fetch: if (data->i == prego) goto fin_ex;
    data->interrupt |= PX_BIT;
  swym_one: fetched[0].h = fetched[0].l = SWYM << 24;
    goto fetch_one;
  case got_one: fetched[0] = data->x.o;   /* a "preview" of the new cache data */
  fetch_one: fetch_lo = 0; fetch_hi = 1;
    data->state = fetch_ready;
  case fetch_ready: if (self->lockloc)
      *(self->lockloc) = NULL, self->lockloc = NULL;
    if (data->i == prego) goto fin_ex;
    for (j = 0; j < fetch_max; j++) {
      register fetch *new_tail;
      if (tail == fetch_bot) new_tail = fetch_top;
      else new_tail = tail - 1;
      if (new_tail == head) break;   /* fetch buffer is full */
      ⟨Install a new instruction into the tail position 304⟩;
      tail = new_tail;
      if (sleepy) {
        sleepy = false; sleep;
      }
      inst_ptr.o = incr(inst_ptr.o, 4);
      if (fetch_lo == fetch_hi) goto new_fetch;
    }
    wait(1);

302. ⟨Insert dummy instruction for page table emulation 302⟩ ≡
  {
    if (cache_search(ITcache, trans_key(inst_ptr.o))) goto new_fetch;
    data->interrupt |= F_BIT;
    sleepy = true;
    goto swym_one;
  }

This code is used in section 298.

303. ⟨Global variables 20⟩ +≡
  bool sleepy;   /* have we just emitted the page table emulation call? */

304. At this point we check for egregiously invalid instructions. (Sometimes the dispatcher will actually allow such instructions to occupy the fetch buffer, for internally generated commands.)

⟨Install a new instruction into the tail position 304⟩ ≡
  tail->loc = inst_ptr.o;
  if (inst_ptr.o.l & 4) tail->inst = fetched[fetch_lo++].l;
  else tail->inst = fetched[fetch_lo].h;
  tail->interrupt = data->interrupt;
  i = tail->inst >> 24;
  if (i >= RESUME && i <= SYNC && (tail->inst & bad_inst_mask[i - RESUME]))
    tail->interrupt |= B_BIT;
  tail->noted = false;
  if (inst_ptr.o.l == breakpoint.l && inst_ptr.o.h == breakpoint.h)
    breakpoint_hit = true;

This code is used in section 301.

305. The commands RESUME, SAVE, UNSAVE, and SYNC should not have nonzero bits in the positions defined here.

⟨Global variables 20⟩ +≡
  int bad_inst_mask[4] = {0xfffffe, 0xffff, 0xffff00, 0xfffff8};
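The validity test of sections 304 and 305 can be tried in isolation. The following is a minimal sketch under stated assumptions: the function name `is_egregiously_invalid` is hypothetical, the opcode values are the ones listed in the index (RESUME = #f9 through SYNC = #fc), and the masks are copied from bad_inst_mask.

```c
#include <stdint.h>
#include <stdbool.h>

/* Opcode values as given in the index of MMIX-PIPE. */
enum { RESUME = 0xf9, SAVE = 0xfa, UNSAVE = 0xfb, SYNC = 0xfc };

/* Bits that must be zero in RESUME, SAVE, UNSAVE, and SYNC, in that order. */
static const uint32_t bad_inst_mask[4] = {0xfffffe, 0xffff, 0xffff00, 0xfffff8};

/* Hypothetical helper (not in MMIX-PIPE): true if the 32-bit instruction
   word would have B_BIT set when it is installed in the fetch buffer. */
static bool is_egregiously_invalid(uint32_t inst)
{
    uint32_t op = inst >> 24;
    return op >= RESUME && op <= SYNC
        && (inst & bad_inst_mask[op - RESUME]) != 0;
}
```

For instance, `RESUME 1` (#f9000001) passes the test, while `RESUME 2` (#f9000002) and `SYNC 8` (#fc000008) are flagged; opcodes outside the range RESUME..SYNC are never flagged here.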


306. Interrupts. The scariest thing about the design of a pipelined machine is the existence of interrupts, which disrupt the smooth flow of a computation in ways that are difficult to anticipate. Fortunately, however, the discipline of a reorder buffer, which forces instructions to be committed in order, allows us to deal with interrupts in a fairly natural way. Our solution to the problems of dynamic scheduling and speculative execution therefore solves the interrupt problem as well.

MMIX has three kinds of interrupts, which show up as bit codes in the interrupt field when an instruction is ready to be committed: H_BIT invokes a trip handler, for TRIP instructions and arithmetic exceptions; F_BIT invokes a forced-trap handler, for TRAP instructions and unimplemented instructions that need to be emulated in software; E_BIT invokes a dynamic-trap handler, for external interrupts like I/O signals or for internal interrupts caused by improper instructions. In all three cases, the pipeline control has already been redirected to fetch new instructions starting at the correct handler address by the time an interrupted instruction is ready to be committed.

307. Most instructions come to the following part of the program, if they have finished execution with any 1s among the eight trip bits or the eight trap bits.

If the trip bits aren't all zero, we want to update the event bits of rA, or perform an enabled trip handler, or both. If the trap bits are nonzero, we need to hold onto them until we get to the hot seat, when they will be joined with the bits of rQ and probably cause an interrupt. A load or store instruction with nonzero trap bits will be nullified, not committed.

Underflow that is exact and not enabled is ignored, in accordance with the IEEE standard conventions. (This applies also to underflow triggered by RESUME_SET.)

#define is_load_store(i) (i >= ld && i <= cswap)

⟨Handle interrupt at end of execution stage 307⟩ ≡
  {
    if ((data->interrupt & 0xff) && is_load_store(data->i)) goto state_5;
    j = data->interrupt & 0xff00;
    data->interrupt -= j;
    if ((j & (U_BIT + X_BIT)) == U_BIT && !(data->ra.o.l & U_BIT)) j &= ~U_BIT;
    data->arith_exc = (j & ~data->ra.o.l) >> 8;
    if (j & data->ra.o.l) ⟨Prepare for exceptional trip handler 308⟩;
    if (data->interrupt & 0xff) goto state_5;
  }

This code is used in section 144.
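The treatment of trip bits in section 307 can be illustrated on plain integers. This is a minimal sketch, not the simulator's code: the function names are hypothetical, `ra` stands for the low tetrabyte of rA, and the bit values U_BIT = 1 << 10 and X_BIT = 1 << 8 are taken from the index.

```c
#include <stdint.h>

enum { X_BIT = 1 << 8, U_BIT = 1 << 10 };

/* Hypothetical helper: given the trip bits j raised by an instruction,
   drop an underflow that is exact (no X_BIT) and not enabled in rA. */
static uint32_t filter_underflow(uint32_t j, uint32_t ra)
{
    if ((j & (U_BIT + X_BIT)) == U_BIT && !(ra & U_BIT))
        j &= ~(uint32_t)U_BIT;
    return j;
}

/* Hypothetical helper: exceptions that are not enabled in rA are merely
   recorded; shifting right by 8 moves them from the enable-bit positions
   to the event-bit positions, as in the computation of arith_exc. */
static uint32_t event_bits(uint32_t j, uint32_t ra)
{
    return (j & ~ra) >> 8;
}
```

An exact underflow with the U enable bit off therefore leaves no trace, whereas an inexact underflow sets both the U and X event bits when neither is enabled.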

308. Since execution is speculative, an exceptional condition might not be part of the "real" computation. Indeed, the present coroutine might have already been deissued.

⟨Prepare for exceptional trip handler 308⟩ ≡
  {
    i = issued_between(data, cool);
    if (i < deissues) goto die;
    deissues = i;
    old_tail = tail = head; resuming = 0;   /* clear the fetch buffer */
    ⟨Restart the fetch coroutine 287⟩;
    cool_hist = data->hist;
    for (i = j & data->ra.o.l, m = 16; !(i & D_BIT); i <<= 1, m += 16) ;
    data->go.o.h = 0, data->go.o.l = m;
    inst_ptr.o = data->go.o, inst_ptr.p = NULL;
    data->interrupt |= H_BIT;
    goto state_4;
  }

This code is used in section 307.

309. ⟨Prepare to emulate the page translation 309⟩ ≡
  i = issued_between(data, cool);
  if (i < deissues) goto die;
  deissues = i;
  old_tail = tail = head; resuming = 0;   /* clear the fetch buffer */
  ⟨Restart the fetch coroutine 287⟩;
  cool_hist = data->hist;
  inst_ptr.p = UNKNOWN_SPEC;
  data->interrupt |= F_BIT;

This code is used in section 310.
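The loop in section 308 that selects the trip handler can be read as "find the leftmost enabled exception and multiply its rank by 16." A minimal sketch follows; the function name and the guard for an empty argument are additions made for illustration and are not in MMIX-PIPE, where the caller has already verified that j & rA is nonzero.

```c
#include <stdint.h>

enum { D_BIT = 1 << 15 };   /* leftmost of the eight trip bits, per the index */

/* Hypothetical helper: `enabled` holds the raised-and-enabled trip bits in
   positions 8..15.  Returns the handler address 16, 32, ..., 128 of the
   leftmost one, or 0 if none is set (a guard added for this sketch only). */
static uint32_t trip_handler_address(uint32_t enabled)
{
    uint32_t m = 16;
    if (!(enabled & 0xff00))
        return 0;
    for (uint32_t i = enabled; !(i & D_BIT); i <<= 1)
        m += 16;
    return m;
}
```

Thus an enabled D exception (bit 15) leads to location 16, and an enabled X exception (bit 8) leads to location 128; when several are enabled at once, the leftmost wins.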

310. We need to stop dispatching when calling a trip handler from within the reorder buffer, lest we issue an instruction that uses g[255] or rB as an operand.

⟨Special cases for states in the first stage 266⟩ +≡
  emulate_virt: ⟨Prepare to emulate the page translation 309⟩;
  state_4: data->state = 4;
  case 4: if (dispatch_lock) wait(1);
    set_lock(self, dispatch_lock);
  state_5: data->state = 5;
  case 5: if (data != old_hot) wait(1);
    if ((data->interrupt & F_BIT) && data->i != trap) {
      inst_ptr.o = g[rT].o, inst_ptr.p = NULL;
      if (is_load_store(data->i)) nullifying = true;
    }
    if (data->interrupt & 0xff) {
      g[rQ].o.h |= data->interrupt & 0xff;
      new_Q.h |= data->interrupt & 0xff;
      if (verbose & issue_bit) {
        printf(" setting rQ=");
        print_octa(g[rQ].o);
        printf("\n");
      }
    }
    goto die;

311. The instructions of the previous section appear in the switch for coroutine stage 1 only. We need to use them also in later stages.

⟨Special cases for states in later stages 272⟩ +≡
  case 4: goto state_4;
  case 5: goto state_5;

312. ⟨Special cases of instruction dispatch 117⟩ +≡
  case trap: if ((flags[op] & X_is_dest_bit) &&
        cool->xx < cool_G && cool->xx >= cool_L)
      goto increase_L;
    if (!g[rT].up->known) goto stall;
    inst_ptr = specval(&g[rT]);   /* traps and emulated ops */
    cool->need_b = true, cool->b = specval(&g[255]);
  case trip: cool->ren_x = true, spec_install(&g[255], &cool->x);
    cool->x.known = true, cool->x.o = inst_ptr.o;
    if (i == trip) cool->x.o = cool->go.o = zero_octa;
    cool->ren_a = true, spec_install(&g[i == trap ? rBB : rB], &cool->a);
    break;

313. ⟨Cases for stage 1 execution 155⟩ +≡
  case trap: data->interrupt |= F_BIT; data->a.o = data->b.o; goto fin_ex;
  case trip: data->interrupt |= H_BIT; data->a.o = data->b.o; goto fin_ex;

314. The following check is performed at the beginning of every cycle. An instruction in the hot seat can be externally interrupted only if it is ready to be committed and not already marked for tripping or trapping.

⟨Check for external interrupt 314⟩ ≡
  g[rI].o = incr(g[rI].o, -1);
  if (g[rI].o.l == 0 && g[rI].o.h == 0) {
    g[rQ].o.l |= INTERVAL_TIMEOUT, new_Q.l |= INTERVAL_TIMEOUT;
    if (verbose & issue_bit) {
      printf(" setting rQ=");
      print_octa(g[rQ].o);
      printf("\n");
    }
  }
  trying_to_interrupt = false;
  if (((g[rQ].o.h & g[rK].o.h) || (g[rQ].o.l & g[rK].o.l)) && cool != hot &&
      !(hot->interrupt & (E_BIT + F_BIT + H_BIT)) && !doing_interrupt &&
      !(hot->i == resum)) {
    if (hot->owner) trying_to_interrupt = true;
    else {
      hot->interrupt |= E_BIT;
      ⟨Deissue all but the hottest command 316⟩;
      inst_ptr.o = g[rTT].o; inst_ptr.p = NULL;
    }
  }

This code is used in section 64.

315. ⟨Global variables 20⟩ +≡
  bool trying_to_interrupt;   /* encouraging interruptible operations to pause */
  bool nullifying;   /* stopping dispatch to nullify a load/store command */
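The condition tested at the start of every cycle in section 314 can be written as a standalone predicate. The sketch below is illustrative only: the function name and its flattened parameter list are hypothetical, the `octa` type mimics the two-tetrabyte structure of MMIX-ARITH, and the parameter `reorder_buffer_empty` stands for the test cool == hot, which is assumed here to mean that no instruction occupies the hot seat.

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct { uint32_t h, l; } octa;   /* high and low tetrabytes */

enum { H_BIT = 1 << 16, F_BIT = 1 << 17, E_BIT = 1 << 18 };

/* Hypothetical predicate: may the instruction in the hot seat be marked
   for a dynamic trap (or, if still executing, be asked to pause)? */
static bool external_interrupt_pending(octa rQ, octa rK,
                                       bool reorder_buffer_empty,
                                       uint32_t hot_interrupt,
                                       int doing_interrupt,
                                       bool hot_is_resum)
{
    bool requested = (rQ.h & rK.h) || (rQ.l & rK.l);   /* rQ masked by rK */
    return requested && !reorder_buffer_empty
        && !(hot_interrupt & (E_BIT + F_BIT + H_BIT))
        && !doing_interrupt && !hot_is_resum;
}
```

When the predicate holds, the simulator either sets trying_to_interrupt (if the hot instruction still has an owner coroutine) or sets E_BIT and redirects fetching to rTT.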


316. It's possible that the command in the hot seat has been deissued, but only if the simulator has done so at the user's request. Otherwise the test `i >= deissues' here will always succeed.

The value of cool_hist becomes flaky here. We could try to keep it strictly up to date, but the unpredictable nature of external interrupts suggests that we are better off leaving it alone. (It's only a heuristic for branch prediction, and a sufficiently strong prediction will survive one-time glitches due to interrupts.)

⟨Deissue all but the hottest command 316⟩ ≡
  i = issued_between(hot, cool);
  if (i >= deissues) {
    deissues = i;
    tail = head; resuming = 0;   /* clear the fetch buffer */
    ⟨Restart the fetch coroutine 287⟩;
    if (is_load_store(hot->i)) nullifying = true;
  }

This code is used in section 314.

317. Even though an interrupted instruction has officially been either "committed" or "nullified," it stays in the hot seat for two or three extra cycles, while we save enough of the machine state to resume the computation later.

⟨Begin an interruption and break 317⟩ ≡
  {
    if (!(hot->interrupt & H_BIT)) g[rK].o = zero_octa;   /* trap */
    if (((hot->interrupt & H_BIT) && hot->i != trip) ||
        ((hot->interrupt & F_BIT) && hot->i != trap) ||
        (hot->interrupt & E_BIT)) doing_interrupt = 3, suppress_dispatch = true;
    else doing_interrupt = 2;   /* trip or trap started by dispatcher */
    break;
  }

This code is used in section 146.

318. If a memory failure occurs, we should set rF here, either in case 2 or case 1. The simulator doesn't do anything with rF at present.

⟨Perform one cycle of the interrupt preparations 318⟩ ≡
  switch (doing_interrupt--) {
  case 3: ⟨Set resumption registers (rB, $255) or (rBB, $255) 319⟩;
    break;
  case 2: ⟨Set resumption registers (rW, rX) or (rWW, rXX) 320⟩;
    break;
  case 1: ⟨Set resumption registers (rY, rZ) or (rYY, rZZ) 321⟩;
    if (hot == reorder_bot) hot = reorder_top;
    else hot--;
    break;
  }

This code is used in section 64.

319. ⟨Set resumption registers (rB, $255) or (rBB, $255) 319⟩ ≡
  j = hot->interrupt & H_BIT;
  g[j ? rB : rBB].o = g[255].o;
  if (j) g[255].o = zero_octa;
  else g[255].o = g[hot->interrupt & F_BIT ? rT : rTT].o;
  if (verbose & issue_bit) {
    if (j) {
      printf(" setting rB="); print_octa(g[rB].o);
      printf(", $255=0\n");
    } else {
      printf(" setting rBB="); print_octa(g[rBB].o);
      printf(", $255="); print_octa(g[255].o); printf("\n");
    }
  }

This code is used in section 318.

320. Here's where we manufacture the "ropcodes" for resumption.

#define RESUME_AGAIN 0   /* repeat the command in rX as if in location rW - 4 */
#define RESUME_CONT 1    /* same, but substitute rY and rZ for operands */
#define RESUME_SET 2     /* set r[X] to rZ */
#define RESUME_TRANS 3   /* install (rY, rZ) into IT-cache or DT-cache, then RESUME_AGAIN */
#define pack_bytes(a, b, c, d) ((((((unsigned) (a) << 8) + (b)) << 8) + (c)) << 8) + (d)

⟨Set resumption registers (rW, rX) or (rWW, rXX) 320⟩ ≡
  j = pack_bytes(hot->op, hot->xx, hot->yy, hot->zz);
  if (hot->interrupt & H_BIT) {   /* trip */
    g[rW].o = incr(hot->loc, 4);
    g[rX].o.h = sign_bit, g[rX].o.l = j;
    if (verbose & issue_bit) {
      printf(" setting rW="); print_octa(g[rW].o);
      printf(", rX="); print_octa(g[rX].o); printf("\n");
    }
  } else {   /* trap */
    g[rWW].o = hot->go.o;
    g[rXX].o.l = j;
    if (hot->interrupt & F_BIT) {   /* forced */
      if (hot->i != trap) j = RESUME_TRANS;   /* emulate page translation */
      else if (hot->op == TRAP) j = 0x80;   /* TRAP */
      else if (flags[internal_op[hot->op]] & X_is_dest_bit)
        j = RESUME_SET;   /* emulation */
      else j = 0x80;   /* emulation when r[X] is not a destination */
    } else {   /* dynamic */
      if (hot->interim)
        j = (hot->i == frem || hot->i == syncd || hot->i == syncid ?
             RESUME_CONT : RESUME_AGAIN);
      else if (is_load_store(hot->i)) j = RESUME_AGAIN;
      else j = 0x80;   /* normal external interruption */
    }
    g[rXX].o.h = (j << 24) + (hot->interrupt & 0xff);
    if (verbose & issue_bit) {
      printf(" setting rWW="); print_octa(g[rWW].o);
      printf(", rXX="); print_octa(g[rXX].o); printf("\n");
    }
  }

This code is used in section 318.
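The layout of rXX produced in section 320 can be checked with a few lines of C. This is a minimal sketch: `rxx_high` is a hypothetical helper that mirrors the assignment to g[rXX].o.h, and the macro is the one from the text with an extra pair of outer parentheses added so that it can be used safely inside larger expressions.

```c
#include <stdint.h>

/* pack_bytes as in section 320, with outer parentheses added for safety. */
#define pack_bytes(a, b, c, d) \
    (((((((unsigned)(a) << 8) + (b)) << 8) + (c)) << 8) + (d))

enum { RESUME_AGAIN = 0, RESUME_CONT = 1, RESUME_SET = 2, RESUME_TRANS = 3 };

/* Hypothetical helper: high tetrabyte of rXX, holding the ropcode in its
   top byte and the eight trap bits of the interrupt field in its low byte. */
static uint32_t rxx_high(uint32_t ropcode, uint32_t interrupt)
{
    return (ropcode << 24) + (interrupt & 0xff);
}
```

The low tetrabyte of rXX is simply the interrupted instruction, reassembled from its four bytes; for example, opcode #20 with X = 1, Y = 2, Z = 3 packs to #20010203. A ropcode of #80 makes the high tetrabyte negative, which is how RESUME later recognizes that nothing needs to be inserted.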

321. ⟨Set resumption registers (rY, rZ) or (rYY, rZZ) 321⟩ ≡
  j = hot->interrupt & H_BIT;
  if ((hot->interrupt & F_BIT) && hot->op == SWYM) g[rYY].o = hot->go.o;
  else g[j ? rY : rYY].o = hot->y.o;
  if (hot->i == st || hot->i == pst) g[j ? rZ : rZZ].o = hot->x.o;
  else g[j ? rZ : rZZ].o = hot->z.o;
  if (verbose & issue_bit) {
    if (j) {
      printf(" setting rY="); print_octa(g[rY].o);
      printf(", rZ="); print_octa(g[rZ].o); printf("\n");
    } else {
      printf(" setting rYY="); print_octa(g[rYY].o);
      printf(", rZZ="); print_octa(g[rZZ].o); printf("\n");
    }
  }

This code is used in section 318.

322. Whew; we've successfully interrupted the computation. The remaining task is to restart it again, as transparently as possible.

The RESUME instruction waits for the pipeline to drain, because it has to do such drastic things. For example, an interrupt may be occurring at this very moment, changing the registers needed for resumption.

⟨Special cases of instruction dispatch 117⟩ +≡
  case resume: if (cool != old_hot) goto stall;
    inst_ptr = specval(&g[cool->zz ? rWW : rW]);
    if (!(cool->loc.h & sign_bit)) {
      if (cool->zz) cool->interrupt |= K_BIT;
      else if (inst_ptr.o.h & sign_bit) cool->interrupt |= P_BIT;
    }
    if (cool->interrupt) {
      inst_ptr.o = incr(cool->loc, 4); cool->i = noop;
    } else {
      cool->go.o = inst_ptr.o;
      if (cool->zz) {
        ⟨Magically do an I/O operation, if cool->loc is rT 372⟩;
        cool->ren_a = true, spec_install(&g[rK], &cool->a);
        cool->a.known = true, cool->a.o = g[255].o;
        cool->ren_x = true, spec_install(&g[255], &cool->x);
        cool->x.known = true, cool->x.o = g[rBB].o;
      }
      cool->b = specval(&g[cool->zz ? rXX : rX]);
      if (!(cool->b.o.h & sign_bit)) ⟨Resume an interrupted operation 323⟩;
    }
    break;

323. Here we set cool->i = resum, since we want to issue another instruction after the RESUME itself.

The restrictions on inserted instructions are designed to insure that those instructions will be the very next ones issued. (If, for example, an incgamma instruction were necessary, it might cause a page fault and we'd lose the operand values for RESUME_SET or RESUME_CONT.)

A subtle point arises here: If RESUME_TRANS is being used to compute the page translation of virtual address zero, we don't want to execute the dummy SWYM instruction from virtual address -4! So we avoid the SWYM altogether.

⟨Resume an interrupted operation 323⟩ ≡
  {
    cool->xx = cool->b.o.h >> 24, cool->i = resum;
    head->loc = incr(inst_ptr.o, -4);
    switch (cool->xx) {
    case RESUME_SET: cool->b.o.l = (SETH << 24) + (cool->b.o.l & 0xff0000);
      head->interrupt |= cool->b.o.h & 0xff00;
      resuming = 2;
    case RESUME_CONT: resuming += 1 + cool->zz;
      if (((cool->b.o.l >> 24) & 0xfa) != 0xb8) {   /* not syncd or syncid */
        m = cool->b.o.l >> 28;
        if ((1 << m) & 0x8f30) goto bad_resume;
        m = (cool->b.o.l >> 16) & 0xff;
        if (m >= cool_L && m < cool_G) goto bad_resume;
      }
    case RESUME_AGAIN: resume_again: head->inst = cool->b.o.l;
      m = head->inst >> 24;
      if (m == RESUME) goto bad_resume;   /* avoid uninterruptible loop */
      if (!cool->zz &&
          m > RESUME && m <= SYNC && (head->inst & bad_inst_mask[m - RESUME]))
        head->interrupt |= B_BIT;
      head->noted = false; break;
    case RESUME_TRANS: if (cool->zz) {
        cool->y = specval(&g[rYY]), cool->z = specval(&g[rZZ]);
        if ((cool->b.o.l >> 24) != SWYM) goto resume_again;
        cool->i = resume; break;   /* see "subtle point" above */
      }
    default: bad_resume: cool->interrupt |= B_BIT, cool->i = noop;
      resuming = 0; break;
    }
  }

This code is used in section 322.
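The opcode filter applied to RESUME_CONT and RESUME_SET in section 323 is compact enough to deserve a worked example. The sketch below is an illustration under stated assumptions: the function name is hypothetical, `inst` stands for the low tetrabyte of rX or rXX, and only the test on the leading hexadecimal digit of the opcode is reproduced (the register-range test on the X field is omitted).

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical helper: true if the inserted instruction would be sent to
   bad_resume because of the leading hex digit of its opcode.  The mask
   0x8f30 has bits 4, 5, 8, 9, 10, 11, and 15 set, so opcodes of the forms
   #4x, #5x, #8x through #bx, and #fx are rejected, except for the four
   opcodes #b8, #b9, #bc, #bd selected by the (op & 0xfa) == 0xb8 test,
   which the comment in section 323 identifies as syncd and syncid. */
static bool cont_opcode_rejected(uint32_t inst)
{
    uint32_t op = inst >> 24;
    if ((op & 0xfa) == 0xb8)
        return false;               /* exempt from the digit test */
    return ((1u << (op >> 4)) & 0x8f30) != 0;
}
```

Under this reading an inserted opcode such as #20 or #e0 (SETH, used by RESUME_SET) is accepted, while opcodes beginning with the digits 4, 5, 8 through b, or f are refused unless they fall under the exemption.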

324. ⟨Insert special operands when resuming an interrupted operation 324⟩ ≡
  {
    if (resuming & 1) {
      cool->y = specval(&g[rY]);
      cool->z = specval(&g[rZ]);
    } else {
      cool->y = specval(&g[rYY]);
      cool->z = specval(&g[rZZ]);
    }
    if (resuming >= 3) {   /* RESUME_SET */
      cool->need_ra = true, cool->ra = specval(&g[rA]);
    }
    cool->usage = false;
  }

This code is used in section 103.

325.
#define do_resume_trans 17   /* state for performing RESUME_TRANS actions */

⟨Cases for stage 1 execution 155⟩ +≡
  case resume: case resum: if (data->xx != RESUME_TRANS) goto fin_ex;
    data->ptr_a = (void *) ((data->b.o.l >> 24) == SWYM ? ITcache : DTcache);
    data->state = do_resume_trans;
    data->z.o = incr(oandn(data->z.o, page_mask), data->z.o.l & 7);
    data->z.o.h &= 0xffff;
    goto resume_trans;

326. ⟨Special cases for states in the first stage 266⟩ +≡
  case do_resume_trans: resume_trans: { register cache *c = (cache *) data->ptr_a;
      if (c->lock) wait(1);
      if (c->filler.next) wait(1);
      p = alloc_slot(c, trans_key(data->y.o));
      if (p) {
        c->filler_ctl.ptr_b = (void *) p;
        c->filler_ctl.y.o = data->y.o;
        c->filler_ctl.b.o = data->z.o;
        c->filler_ctl.state = 1;
        schedule(&c->filler, c->access_time, 1);
      }
      goto fin_ex;
    }

327. Administrative operations. The internal instructions that handle the register stack simply reduce to things we already know how to do. (Well, the internal instructions for saving and unsaving do sometimes lead to special cases, based on data->op; for the most part, though, the necessary mechanisms are already present.)

⟨Cases for stage 1 execution 155⟩ +≡
  case noop:
    if (data->interrupt & F_BIT) goto emulate_virt;
  case jmp: case pushj: case incrl: case unsave: goto fin_ex;
  case sav:
    if (!(data->mem_x)) goto fin_ex;
  case incgamma: case save: data->i = st; goto switch1;
  case decgamma: case unsav: data->i = ld; goto switch1;

328. We can GET special registers ≥ 21 (that is, rA, rF, rP, rW-rZ, or rWW-rZZ) only in the hot seat, because those registers are implicit outputs of many instructions. The same applies to rK, since it is changed by TRAP and by emulated instructions.

⟨Cases for stage 1 execution 155⟩ +≡
  case get:
    if (data->zz >= 21 || data->zz == rK) {
      if (data != old_hot) wait(1);
      data->z.o = g[data->zz].o;
    }
    data->x.o = data->z.o;
    goto fin_ex;


329. A PUT is, similarly, delayed in the cases that hold dispatch_lock. This program does not restrict the 1 bits that might be PUT into rQ, although the contents of that register can have drastic implications.

⟨Cases for stage 1 execution 155⟩ +≡
  case put:
    if (data->xx >= 15 && data->xx <= 20) {
      if (data != old_hot) wait(1);
      switch (data->xx) {
      case rV: ⟨Update the page variables 239⟩; break;
      case rQ:
        new_Q.h |= data->z.o.h & ~g[rQ].o.h;
        new_Q.l |= data->z.o.l & ~g[rQ].o.l;
        data->z.o.l |= new_Q.l;
        data->z.o.h |= new_Q.h;
        break;
      case rL:
        if (data->z.o.h != 0) data->z.o.h = 0, data->z.o.l = g[rL].o.l;
        else if (data->z.o.l > g[rL].o.l) data->z.o.l = g[rL].o.l;
      default: break;
      case rG: ⟨Update rG 330⟩; break;
      }
    } else if (data->xx == rA && (data->z.o.h != 0 || data->z.o.l >= 0x40000))
      data->interrupt |= B_BIT;
    data->x.o = data->z.o;
    goto fin_ex;

330. When rG decreases, we assume that up to commit_max marginal registers can be zeroed during each clock cycle. (Remember that we're currently in the hot seat, and holding dispatch_lock.)

⟨Update rG 330⟩ ≡
  if (data->z.o.h != 0 || data->z.o.l >= 256 || data->z.o.l < g[rL].o.l || data->z.o.l < 32)
    data->interrupt |= B_BIT;
  else if (data->z.o.l < g[rG].o.l) {
    data->interim = true; /* potentially interruptible */
    for (j = 0; j < commit_max; j++) {
      g[rG].o.l--;
      g[g[rG].o.l].o = zero_octa;
      if (data->z.o.l == g[rG].o.l) break;
    }
    if (j == commit_max) {
      if (!trying_to_interrupt) wait(1);
    } else data->interim = false;
  }
This code is used in section 329.

331. Computed jumps put the desired destination address into the go field.

⟨Cases for stage 1 execution 155⟩ +≡
  case go: data->x.o = data->go.o; goto add_go;
  case pop: data->x.o = data->y.o;
    data->y.o = data->b.o; /* move rJ to y field */
  case pushgo: add_go:
    data->go.o = oplus(data->y.o, data->z.o);
    if ((data->go.o.h & sign_bit) && !(data->loc.h & sign_bit))
      data->interrupt |= P_BIT;
    data->go.known = true;
    goto fin_ex;
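The pacing rule of section 330 can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions: the helper `put_rg_passes` is hypothetical and not part of MMIX-PIPE; it ignores interrupts and counts how many passes through the zeroing loop (one pass per clock cycle) are needed to lower rG from `old_g` to `new_g`.

```c
#include <assert.h>

/* Hypothetical helper (not in MMIX-PIPE): number of passes through the
   zeroing loop of section 330, assuming no interrupt intervenes.
   Each pass zeroes at most commit_max marginal registers. */
static int put_rg_passes(unsigned old_g, unsigned new_g, int commit_max)
{
  int passes = 0;
  assert(commit_max > 0 && new_g <= old_g);
  while (old_g > new_g) {
    int j;
    for (j = 0; j < commit_max; j++) {
      old_g--;                      /* g[old_g] would be set to zero here */
      if (old_g == new_g) break;    /* target reached within this pass */
    }
    passes++;                       /* otherwise the simulator does wait(1) */
  }
  return passes;
}
```

Under these assumptions the count equals the distance rG has to travel divided by commit_max, rounded up.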

332. The instruction UNSAVE z generates a sequence of internal instructions that accomplish the actual unsaving. This sequence is controlled by the instruction currently in the fetch buffer, which changes its X and Y fields until all global registers have been loaded. The first instructions of the sequence are UNSAVE 0,0,z; UNSAVE 1,rZ,z-8; UNSAVE 1,rY,z-16; ...; UNSAVE 1,rB,z-96; UNSAVE 2,255,z-104; UNSAVE 2,254,z-112; etc. If an interrupt occurs before these instructions have all been committed, the execution register will contain enough information to restart the process.

After the global registers have all been loaded, UNSAVE continues by acting rather like POP. An interrupt occurring during this last stage will find rS < rO; a context switch might then take us back to restoring the local registers again. But no information will be lost, even though the register from which we began unsaving has long since been replaced.

⟨Special cases of instruction dispatch 117⟩ +≡
  case unsave:
    if (cool->interrupt & B_BIT) cool->i = noop;
    else {
      cool->interim = true;
      op = LDOU; /* this instruction needs to be handled by load/store unit */
      cool->i = unsav;
      switch (cool->xx) {
      case 0:
        if (cool->z.p) goto stall;
        ⟨Set up the first phase of unsaving 334⟩; break;
      case 1: case 2: ⟨Generate an instruction to unsave g[yy] 333⟩; break;
      case 3: cool->i = unsave, cool->interim = false, op = UNSAVE;
        goto pop_unsave;
      default: cool->interim = false, cool->i = noop, cool->interrupt |= B_BIT; break;
      }
    }
    break; /* this takes us to dispatch_done */

333. ⟨Generate an instruction to unsave g[yy] 333⟩ ≡
  cool->ren_x = true;
  spec_install(&g[cool->yy], &cool->x);
  new_O = new_S = incr(cool_O, -1);
  cool->z.o = shift_left(new_O, 3);
  cool->ptr_a = (void *) mem.up;
This code is used in section 332.

334. ⟨Set up the first phase of unsaving 334⟩ ≡
  cool->ren_x = true; spec_install(&g[rG], &cool->x);
  cool->ren_a = true; spec_install(&g[rA], &cool->a);
  new_O = new_S = shift_right(cool->z.o, 3, 1);
  cool->set_l = true; spec_install(&g[rL], &cool->rl);
  cool->ptr_a = (void *) mem.up;
This code is used in section 332.

335. ⟨Get ready for the next step of UNSAVE 335⟩ ≡
  switch (cool->xx) {
  case 0: head->inst = pack_bytes(UNSAVE, 1, rZ, 0); break;
  case 1:
    if (cool->yy == rP) head->inst = pack_bytes(UNSAVE, 1, rR, 0);
    else if (cool->yy == 0) head->inst = pack_bytes(UNSAVE, 2, 255, 0);
    else head->inst = pack_bytes(UNSAVE, 1, cool->yy - 1, 0);
    break;
  case 2:
    if (cool->yy == cool_G) head->inst = pack_bytes(UNSAVE, 3, 0, 0);
    else head->inst = pack_bytes(UNSAVE, 2, cool->yy - 1, 0);
    break;
  }
This code is used in section 81.

336. ⟨Handle an internal UNSAVE when it's time to load 336⟩ ≡
  if (data->xx == 0) {
    data->a.o = data->x.o; data->a.o.h &= 0xffffff; /* unsaved rA */
    data->x.o.l = data->x.o.h >> 24; data->x.o.h = 0; /* unsaved rG */
    if (data->a.o.h || (data->a.o.l & 0xfffc0000)) {
      data->a.o.h = 0, data->a.o.l &= 0x3ffff;
      data->interrupt |= B_BIT;
    }
    if (data->x.o.l < 32) {
      data->x.o.l = 32;
      data->interrupt |= B_BIT;
    }
  }
  goto fin_ex;
This code is used in section 279.

337. Of course SAVE is handled essentially like UNSAVE, but backwards.

⟨Special cases of instruction dispatch 117⟩ +≡
  case save:
    if (cool->xx < cool_G) cool->interrupt |= B_BIT;
    if (cool->interrupt & B_BIT) cool->i = noop;
    else if (((cool_S.l - cool_O.l) & lring_mask) == cool_L && cool_L != 0)
      ⟨Insert an instruction to advance gamma 113⟩
    else {
      cool->interim = true;
      cool->i = sav;
      switch (cool->zz) {
      case 0: ⟨Set up the first phase of saving 338⟩; break;
      case 1:
        if (cool_O.l != cool_S.l) ⟨Insert an instruction to advance gamma 113⟩
        cool->zz = 2; cool->yy = cool_G;
      case 2: case 3: ⟨Generate an instruction to save g[yy] 339⟩; break;
      default: cool->interim = false, cool->i = noop, cool->interrupt |= B_BIT; break;
      }
    }
    break;

338. If an interrupt occurs during the first phase, say between two incgamma instructions, the value cool->zz = 1 will get things restarted properly. (Indeed, if context is saved and unsaved during the interrupt, many incgamma instructions may no longer be necessary.)

⟨Set up the first phase of saving 338⟩ ≡
  cool->zz = 1;
  cool->ren_x = true;
  spec_install(&l[(cool_O.l + cool_L) & lring_mask], &cool->x);
  cool->x.known = true, cool->x.o.h = 0, cool->x.o.l = cool_L;
  cool->set_l = true; spec_install(&g[rL], &cool->rl);
  new_O = incr(cool_O, cool_L + 1);
This code is used in section 337.

339. ⟨Generate an instruction to save g[yy] 339⟩ ≡
  op = STOU; /* this instruction needs to be handled by load/store unit */
  cool->mem_x = true, spec_install(&mem, &cool->x);
  cool->z.o = shift_left(cool_O, 3);
  new_O = new_S = incr(cool_O, 1);
  if (cool->zz == 3 && cool->yy > rZ) ⟨Do the final SAVE 340⟩
  else cool->b = specval(&g[cool->yy]);
This code is used in section 337.
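The stepping rule of section 335, which rewrites the X and Y fields of the UNSAVE in the fetch buffer, can be traced with a small stand-alone model. This is a sketch, not the simulator's code: `next_unsave` and `unsave_steps` are hypothetical names, and the special register codes rR = 6, rP = 23, rZ = 27 are the values MMIX assigns to those registers.

```c
#include <assert.h>

enum { rR = 6, rP = 23, rZ = 27 };   /* MMIX special register codes */

/* Hypothetical helper: given the X and Y fields of the current internal
   UNSAVE and the current value G of rG, compute the X and Y fields of the
   next one. Returns 0 when xx is 3, because no further step follows. */
static int next_unsave(int xx, int yy, int G, int *nx, int *ny)
{
  assert(G >= 32 && G <= 255);
  switch (xx) {
  case 0: *nx = 1; *ny = rZ; return 1;
  case 1:
    if (yy == rP) { *nx = 1; *ny = rR; }        /* skip from rP down to rR */
    else if (yy == 0) { *nx = 2; *ny = 255; }   /* specials done, globals next */
    else { *nx = 1; *ny = yy - 1; }
    return 1;
  case 2:
    if (yy == G) { *nx = 3; *ny = 0; }          /* all globals loaded */
    else { *nx = 2; *ny = yy - 1; }
    return 1;
  default: return 0;
  }
}

/* Count the internal steps from UNSAVE 0,0 through UNSAVE 3,0 inclusive. */
static int unsave_steps(int G)
{
  int xx = 0, yy = 0, n = 1;
  while (next_unsave(xx, yy, G, &xx, &yy)) n++;
  return n;
}
```

In this model the count is one step for the rG/rA octabyte, twelve for the special registers rZ down to rB, one per global register from 255 down to G, and one for the final POP-like step.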

340. The final SAVE instruction not only stores rG and rA, it also places the final address in global register X.

⟨Do the final SAVE 340⟩ ≡
  {
    cool->i = save;
    cool->interim = false;
    cool->ren_a = true;
    spec_install(&g[cool->xx], &cool->a);
  }
This code is used in section 339.

341. ⟨Get ready for the next step of SAVE 341⟩ ≡
  switch (cool->zz) {
  case 1: head->inst = pack_bytes(SAVE, cool->xx, 0, 1); break;
  case 2:
    if (cool->yy == 255) head->inst = pack_bytes(SAVE, cool->xx, 0, 3);
    else head->inst = pack_bytes(SAVE, cool->xx, cool->yy + 1, 2);
    break;
  case 3:
    if (cool->yy == rR) head->inst = pack_bytes(SAVE, cool->xx, rP, 3);
    else head->inst = pack_bytes(SAVE, cool->xx, cool->yy + 1, 3);
    break;
  }
This code is used in section 81.

342. ⟨Handle an internal SAVE when it's time to store 342⟩ ≡
  {
    if (data->interim) data->x.o = data->b.o;
    else {
      if (data != old_hot) wait(1); /* we need the hottest value of rA */
      data->x.o.h = g[rG].o.l << 24;
      data->x.o.l = g[rA].o.l;
      data->a.o = data->y.o;
    }
    goto fin_ex;
  }
This code is used in section 281.

343. More register-to-register ops. Now that we've finished most of the hard stuff, we can relax and fill in the holes that we left in the all-register parts of the execution stages.

First let's complete the fixed point arithmetic operations, by dispensing with multiplication and division.

⟨Cases to compute the results of register-to-register operation 137⟩ +≡
  case mulu: data->x.o = omult(data->y.o, data->z.o);
    data->a.o = aux;
    goto quantify_mul;
  case mul: data->x.o = signed_omult(data->y.o, data->z.o);
    if (overflow) data->interrupt |= V_BIT;
  quantify_mul: aux = data->z.o;
    for (j = mul0; aux.l || aux.h; j++) aux = shift_right(aux, 8, 1);
    data->i = j; break; /* j is mul0 or mul1 or ... or mul8 */
  case divu: data->x.o = odiv(data->b.o, data->y.o, data->z.o);
    data->a.o = aux; break;
  case div:
    if (data->z.o.l == 0 && data->z.o.h == 0) {
      data->interrupt |= D_BIT; data->a.o = data->y.o;
      data->i = set; /* divide by zero needn't wait in the pipeline */
    } else {
      data->x.o = signed_odiv(data->y.o, data->z.o);
      if (overflow) data->interrupt |= V_BIT;
      data->a.o = aux;
    }
    break;
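The quantify_mul loop of section 343 picks one of the internal opcodes mul0 through mul8 according to how many bytes of the multiplier z are significant. A minimal sketch of that classification on a plain 64-bit integer follows; `mul_class` is a hypothetical name, and the return value stands for the offset from mul0 rather than the opcode itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of 8-bit unsigned right shifts needed to
   reduce z to zero, that is, the position of its highest nonzero byte.
   The result k in 0..8 corresponds to the internal opcode mul0 + k. */
static int mul_class(uint64_t z)
{
  int k = 0;
  while (z != 0) {
    z >>= 8;
    k++;
  }
  return k;
}
```

A multiplier that fits in one byte therefore selects mul1, while a multiplier with any of its top eight bits set selects mul8.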

344. Next let's polish off the bitwise and bytewise operations.

⟨Cases to compute the results of register-to-register operation 137⟩ +≡
  case sadd:
    data->x.o.l = count_bits(data->y.o.h & ~data->z.o.h)
                + count_bits(data->y.o.l & ~data->z.o.l);
    break;
  case mor: data->x.o = bool_mult(data->y.o, data->z.o, data->op & 0x2); break;
  case bdif: data->x.o.h = byte_diff(data->y.o.h, data->z.o.h);
    data->x.o.l = byte_diff(data->y.o.l, data->z.o.l); break;
  case wdif: data->x.o.h = wyde_diff(data->y.o.h, data->z.o.h);
    data->x.o.l = wyde_diff(data->y.o.l, data->z.o.l); break;
  case tdif:
    if (data->y.o.h > data->z.o.h) data->x.o.h = data->y.o.h - data->z.o.h;
  tdif_l:
    if (data->y.o.l > data->z.o.l) data->x.o.l = data->y.o.l - data->z.o.l;
    break;
  case odif:
    if (data->y.o.h > data->z.o.h) data->x.o = ominus(data->y.o, data->z.o);
    else if (data->y.o.h == data->z.o.h) goto tdif_l;
    break;

345. The conditional set (CS) instructions are, rather surprisingly, more difficult to implement than the zero set (ZS) instructions, although the ZS instructions do more. The reason is that dynamic instruction dependencies are more complicated with CS. Consider, for example, the instructions

  LDO x,a,b;  FDIV y,c,d;  CSZ y,x,0;  INCL y,1.

If the value of x is zero, the INCL instruction need not wait for the division to be completed. (We do not, however, abort the division in such a case; it might invoke a trip handler, or change the inexact bit, etc. Our policy is to treat common cases efficiently and to treat all cases correctly, but not to treat all cases with maximum efficiency.)

⟨Cases to compute the results of register-to-register operation 137⟩ +≡
  case zset:
    if (register_truth(data->y.o, data->op)) data->x.o = data->z.o;
    /* otherwise data->x.o is already zero */
    goto fin_ex;
  case cset:
    if (register_truth(data->y.o, data->op))
      data->x.o = data->z.o, data->b.p = NULL;
    else if (data->b.p == NULL) data->x.o = data->b.o;
    else {
      data->state = 0; data->need_b = true; goto switch1;
    }
    break;

346. Floating point computations are mostly handled by the routines in MMIX-ARITH, which record anomalous events in the global variable exceptions. But we consider the operation trivial if an input is infinite or NaN; and we may need to increase the execution time when denormals are present.

  #define ROUND_OFF 1
  #define ROUND_UP 2
  #define ROUND_DOWN 3
  #define ROUND_NEAR 4
  #define is_denormal(x) ((x.h & 0x7ff00000) == 0 && ((x.h & 0xfffff) || x.l))
  #define is_trivial(x) ((x.h & 0x7ff00000) == 0x7ff00000)
  #define set_round cur_round = (data->ra.o.l < 0x10000 ? ROUND_NEAR : data->ra.o.l >> 16)

⟨Cases to compute the results of register-to-register operation 137⟩ +≡
  case fadd: set_round; data->x.o = fplus(data->y.o, data->z.o);
  fin_bflot:
    if (is_denormal(data->y.o)) data->denin = denin_penalty;
  fin_uflot:
    if (is_denormal(data->x.o)) data->denout = denout_penalty;
  fin_flot:
    if (is_denormal(data->z.o)) data->denin = denin_penalty;
    data->interrupt |= exceptions;
    if (is_trivial(data->y.o) || is_trivial(data->z.o)) goto fin_ex;
    if (data->i == fsqrt && (data->z.o.h & sign_bit)) goto fin_ex;
    break;
  case fsub: data->a.o = data->z.o;
    if (fcomp(data->z.o, zero_octa) != 2) data->a.o.h ^= sign_bit;
    set_round; data->x.o = fplus(data->y.o, data->a.o);
    data->i = fadd; /* use pipeline times for addition */
    goto fin_bflot;
  case fmul: set_round; data->x.o = fmult(data->y.o, data->z.o); goto fin_bflot;
  case fdiv: set_round; data->x.o = fdivide(data->y.o, data->z.o); goto fin_bflot;
  case fsqrt: data->x.o = froot(data->z.o, data->y.o.l); goto fin_uflot;
  case fint: data->x.o = fintegerize(data->z.o, data->y.o.l); goto fin_uflot;
  case fix: data->x.o = fixit(data->z.o, data->y.o.l);
    if (data->op & 0x2) exceptions &= ~W_BIT; /* unsigned case doesn't overflow */
    goto fin_flot;
  case flot: data->x.o = floatit(data->z.o, data->y.o.l, data->op & 0x2, data->op & 0x4);
    data->interrupt |= exceptions; break;

347. ⟨Special cases of instruction dispatch 117⟩ +≡
  case fsqrt: case fint: case fix: case flot:
    if (cool->y.o.l > 4) goto illegal_inst;
    break;
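The macros is_denormal, is_trivial, and set_round of section 346 operate on an octa split into two tetras. As an illustration only, the same tests can be written for a single 64-bit word; the names `is_denormal64`, `is_trivial64`, and `round_mode` below are hypothetical, and `ra_low` stands for the low tetrabyte of rA.

```c
#include <assert.h>
#include <stdint.h>

enum { ROUND_OFF = 1, ROUND_UP = 2, ROUND_DOWN = 3, ROUND_NEAR = 4 };

/* Exponent field zero and fraction nonzero: a denormal (subnormal) number. */
static int is_denormal64(uint64_t x)
{
  return (x & 0x7ff0000000000000ULL) == 0
      && (x & 0x000fffffffffffffULL) != 0;
}

/* Exponent field all ones: infinity or NaN, treated as a trivial operand. */
static int is_trivial64(uint64_t x)
{
  return (x & 0x7ff0000000000000ULL) == 0x7ff0000000000000ULL;
}

/* Rounding mode taken from rA: a zero mode field means round to nearest. */
static int round_mode(uint32_t ra_low)
{
  assert(ra_low < 0x40000);          /* rA holds only 18 significant bits */
  return ra_low < 0x10000 ? ROUND_NEAR : (int) (ra_low >> 16);
}
```

Note that zero itself is neither denormal nor trivial under these tests, and that the sign bit plays no part in either classification.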

348. ⟨Cases to compute the results of register-to-register operation 137⟩ +≡
  case feps: j = fepscomp(data->y.o, data->z.o, data->b.o, data->op != FEQLE);
    if (j == 2) data->i = fcmp;
    else if (is_denormal(data->y.o) || is_denormal(data->z.o))
      data->denin = denin_penalty;
    switch (data->op) {
    case FUNE:
      if (j == 2) goto cmp_pos;
      else goto cmp_zero;
    case FEQLE: goto cmp_fin;
    case FCMPE:
      if (j) goto cmp_zero_or_invalid;
    }
  case fcmp: j = fcomp(data->y.o, data->z.o);
    if (j < 0) goto cmp_neg;
  cmp_fin:
    if (j == 1) goto cmp_pos;
  cmp_zero_or_invalid:
    if (j == 2) data->interrupt |= I_BIT;
    goto cmp_zero;
  case funeq:
    if (fcomp(data->y.o, data->z.o) == (data->op == FUN ? 2 : 0)) goto cmp_pos;
    else goto cmp_zero;

349. ⟨External variables 4⟩ +≡
  Extern int frem_max;
  Extern int denin_penalty, denout_penalty;

350. The floating point remainder operation is especially interesting because it can be interrupted when it's in the hot seat.

⟨Cases to compute the results of register-to-register operation 137⟩ +≡
  case frem:
    if (is_trivial(data->y.o) || is_trivial(data->z.o)) {
      data->x.o = fremstep(data->y.o, data->z.o, 2500);
      goto fin_ex;
    }
    if ((self + 1)->next) wait(1);
    data->interim = true;
    j = 1;
    if (is_denormal(data->y.o) || is_denormal(data->z.o)) j += denin_penalty;
    pass_after(j);
    goto passit;

351. ⟨Begin execution of a stage-two operation 351⟩ ≡
  j = 1;
  if (data->i == frem) {
    data->x.o = fremstep(data->y.o, data->z.o, frem_max);
    if (exceptions & E_BIT) {
      data->y.o = data->x.o;
      if (trying_to_interrupt && data == old_hot) goto fin_ex;
    } else {
      data->state = 3;
      data->interim = false;
      data->interrupt |= exceptions;
      if (is_denormal(data->x.o)) j += denout_penalty;
    }
    wait(j);
  }
This code is used in section 135.

352. System operations. Finally we need to implement some operations for the operating system; then the hardware simulation will be done!

A LDVTS instruction is delayed until it reaches the hot seat, because it changes the IT and DT caches. The operating system should use SYNC after LDVTS if the effects are needed immediately; the system is also responsible for ensuring that the page table permission bits agree with the LDVTS permission bits when the latter are nonzero. (Also, if write permission is taken away from a page, the operating system must have previously used SYNCD to write out any dirty bytes that might have been cached from that page; SYNCD will be inoperative after write permission goes away.)

⟨Handle special cases for operations like prego and ldvts 289⟩ +≡
  if (data->i == ldvts) ⟨Do stage 1 of LDVTS 353⟩;

353. ⟨Do stage 1 of LDVTS 353⟩ ≡
  {
    if (data != old_hot) wait(1);
    if (DTcache->lock || (j = get_reader(DTcache)) < 0) wait(1);
    startup(&DTcache->reader[j], DTcache->access_time);
    data->z.o.h = 0, data->z.o.l = data->y.o.l & 0x7;
    p = cache_search(DTcache, data->y.o); /* N.B.: Not trans_key(data->y.o) */
    if (p) {
      data->x.o.l = 2;
      if (data->z.o.l) {
        p = use_and_fix(DTcache, p);
        p->data[0].l = (p->data[0].l & -8) + data->z.o.l;
      } else {
        p = demote_and_fix(DTcache, p);
        p->tag.h |= sign_bit; /* invalidate the tag */
      }
    }
    pass_after(DTcache->access_time); goto passit;
  }
This code is used in section 352.

354. ⟨Special cases for states in later stages 272⟩ +≡
  case ld_st_launch:
    if (ITcache->lock || (j = get_reader(ITcache)) < 0) wait(1);
    startup(&ITcache->reader[j], ITcache->access_time);
    p = cache_search(ITcache, data->y.o); /* N.B.: Not trans_key(data->y.o) */
    if (p) {
      data->x.o.l |= 1;
      if (data->z.o.l) {
        p = use_and_fix(ITcache, p);
        p->data[0].l = (p->data[0].l & -8) + data->z.o.l;
      } else {
        p = demote_and_fix(ITcache, p);
        p->tag.h |= sign_bit; /* invalidate the tag */
      }
    }
    data->state = 3; wait(ITcache->access_time);
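The effect of LDVTS on a cached translation entry, as described in sections 353 and 354, can be summarized in a small model. This is a sketch under assumptions: the struct `tc_entry` and the function `ldvts_apply` are hypothetical stand-ins for a cache block's first data tetra and its tag, not the simulator's own types.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the parts of a translation cache block that
   LDVTS touches: the low tetra of data[0] and a validity flag. */
typedef struct {
  uint32_t data0_low;   /* low three bits hold the protection bits */
  int valid;            /* cleared when the tag is invalidated */
} tc_entry;

/* Apply LDVTS with the low tetra y_low of operand y to a matching entry.
   Nonzero permission bits replace the old ones; zero bits invalidate. */
static void ldvts_apply(tc_entry *e, uint32_t y_low)
{
  uint32_t perm = y_low & 0x7;
  if (perm != 0)
    e->data0_low = (e->data0_low & ~(uint32_t) 7) + perm;
  else
    e->valid = 0;
}
```

In the simulator itself the result register additionally receives 2 if the key was found in the DT-cache and 1 if it was found in the IT-cache; the model above leaves that bookkeeping out.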

355. The SYNC operation interacts with the pipeline in interesting ways. SYNC 0 and SYNC 4 are the simplest; they just lock the dispatch and wait until they get to the hot seat, after which the pipeline has drained. SYNC 1 and SYNC 3 put a "barrier" into the write buffer so that subsequent store instructions will not merge with previous stores. SYNC 2 and SYNC 3 lock the dispatch until all previous load instructions have left the pipeline. SYNC 5, SYNC 6, and SYNC 7 remove things from caches once they get to the hot seat.

⟨Special cases of instruction dispatch 117⟩ +≡
  case sync:
    if (cool->zz > 3) {
      if (!(cool->loc.h & sign_bit)) goto privileged_inst;
      if (cool->zz == 4) freeze_dispatch = true;
    } else {
      if (cool->zz != 1) freeze_dispatch = true;
      if (cool->zz & 1) cool->mem_x = true, spec_install(&mem, &cool->x);
    }
    break;

356. ⟨Cases for stage 1 execution 155⟩ +≡
  case sync:
    switch (data->zz) {
    case 0: case 4:
      if (data != old_hot) wait(1);
      halted = (data->zz != 0); goto fin_ex;
    case 2: case 3: ⟨Wait if there's an unfinished load ahead of us 357⟩;
      release_lock(self, dispatch_lock);
    case 1: data->x.addr = zero_octa; goto fin_ex;
    case 5:
      if (data != old_hot) wait(1);
      ⟨Clean the data caches 361⟩;
    case 6:
      if (data != old_hot) wait(1);
      ⟨Zap the translation caches 358⟩;
    case 7:
      if (data != old_hot) wait(1);
      ⟨Zap the instruction and data caches 359⟩;
    }
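The dispatch-time decisions for SYNC in section 355 amount to a small decision table. The sketch below restates them as a pure function; `sync_plan` and `plan_sync` are hypothetical names, and `kernel_mode` stands for the test that the instruction's location has its sign bit set.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of what dispatch decides for SYNC zz. */
typedef struct {
  bool privileged;        /* rejected as a privileged instruction */
  bool freeze_dispatch;   /* no further instructions are dispatched */
  bool barrier;           /* a barrier entry goes into the write buffer */
} sync_plan;

static sync_plan plan_sync(int zz, bool kernel_mode)
{
  sync_plan p = { false, false, false };
  if (zz > 3) {
    if (!kernel_mode) { p.privileged = true; return p; }
    if (zz == 4) p.freeze_dispatch = true;
  } else {
    if (zz != 1) p.freeze_dispatch = true;
    if (zz & 1) p.barrier = true;
  }
  return p;
}
```

The table makes one asymmetry visible: SYNC 1 is the only unprivileged variant that does not freeze dispatch, since it only needs to separate earlier stores from later ones.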

357. ⟨Wait if there's an unfinished load ahead of us 357⟩ ≡
  {
    register control *cc;
    for (cc = data; cc != hot; ) {
      cc = (cc == reorder_top ? reorder_bot : cc + 1);
      if (cc->owner && (cc->i == ld || cc->i == ldunc || cc->i == pst)) wait(1);
    }
  }
This code is used in section 356.

358. Perhaps the delay should be longer here.

⟨Zap the translation caches 358⟩ ≡
  if (DTcache->lock || (j = get_reader(DTcache)) < 0) wait(1);
  startup(&DTcache->reader[j], DTcache->access_time);
  set_lock(self, DTcache->lock);
  zap_cache(DTcache);
  data->state = 10; wait(DTcache->access_time);
This code is used in section 356.

359. ⟨Zap the instruction and data caches 359⟩ ≡
  if (!Icache) {
    data->state = 11; goto switch1;
  }
  if (Icache->lock || (j = get_reader(Icache)) < 0) wait(1);
  startup(&Icache->reader[j], Icache->access_time);
  set_lock(self, Icache->lock);
  zap_cache(Icache);
  data->state = 11; wait(Icache->access_time);
This code is used in section 356.

360. ⟨Special cases for states in the first stage 266⟩ +≡
  case 10:
    if (self->lockloc) *(self->lockloc) = NULL, self->lockloc = NULL;
    if (ITcache->lock || (j = get_reader(ITcache)) < 0) wait(1);
    startup(&ITcache->reader[j], ITcache->access_time);
    set_lock(self, ITcache->lock);
    zap_cache(ITcache);
    data->state = 3; wait(ITcache->access_time);
  case 11:
    if (self->lockloc) *(self->lockloc) = NULL, self->lockloc = NULL;
    if (wbuf_lock) wait(1);
    write_head = write_tail, write_ctl.state = 0; /* zap the write buffer */
    if (!Dcache) {
      data->state = 12; goto switch1;
    }
    if (Dcache->lock || (j = get_reader(Dcache)) < 0) wait(1);
    startup(&Dcache->reader[j], Dcache->access_time);
    set_lock(self, Dcache->lock);
    zap_cache(Dcache);
    data->state = 12; wait(Dcache->access_time);
  case 12:
    if (self->lockloc) *(self->lockloc) = NULL, self->lockloc = NULL;
    if (!Scache) goto fin_ex;
    if (Scache->lock) wait(1);
    set_lock(self, Scache->lock);
    zap_cache(Scache);
    data->state = 3; wait(Scache->access_time);

361. ⟨Clean the data caches 361⟩ ≡
  if (self->lockloc) *(self->lockloc) = NULL, self->lockloc = NULL;
  ⟨Wait till write buffer is empty 362⟩;
  if (clean_co.next || clean_lock) wait(1);
  set_lock(self, clean_lock);
  clean_ctl.i = sync;
  clean_ctl.state = 0;
  clean_ctl.x.o.h = 0;
  startup(&clean_co, 1);
  data->state = 13;
  data->interim = true;
  wait(1);
This code is used in section 356.

362. ⟨Wait till write buffer is empty 362⟩ ≡
  if (write_head != write_tail) {
    if (!speed_lock) set_lock(self, speed_lock);
    wait(1);
  }
This code is used in sections 361 and 364.

363. The cleanup process might take a huge amount of time, so we must allow it to be interrupted. (Servicing the interruption might, of course, put more stuff into the cache.)

⟨Special cases for states in the first stage 266⟩ +≡
  case 13:
    if (!clean_co.next) {
      data->interim = false; goto fin_ex; /* it's done! */
    }
    if (trying_to_interrupt) goto fin_ex; /* accept an interruption */
    wait(1);

364. Now we consider SYNCD and SYNCID. When control comes to this part of the program, data->y.o is a virtual address and data->z.o is the corresponding physical address; data->xx + 1 is the number of bytes we are supposed to be syncing; data->b.o.l is the number of bytes we can handle at once (either Icache->bb or Dcache->bb or 8192).

We need a more elaborate scheme to implement SYNCD and SYNCID than we have used for the "hint" instructions PRELD, PREGO, and PREST, because SYNCD and SYNCID are not merely hints. They cannot be converted into a sequence of cache-block-size commands at dispatch time, because we cannot be sure that the starting virtual address will be aligned with the beginning of a cache block. We need to realize that the bytes specified by SYNCD or SYNCID might cross a virtual page boundary, possibly with different protection bits on each page. We need to allow for interrupts. And we also need to keep the fetch buffer empty until a user's SYNCID has completely brought the memory up to date.

⟨Special cases for states in later stages 272⟩ +≡
  do_syncid: data->state = 30;
  case 30: if (data != old_hot) wait(1);
    if (!Icache) {
      data->state = (data->loc.h & sign_bit ? 31 : 33);
      goto switch2;
    }
    ⟨Clean the I-cache block for data->z.o, if any 365⟩;
    data->state = (data->loc.h & sign_bit ? 31 : 33);
    wait(Icache->access_time);
  case 31: if (self->lockloc) *(self->lockloc) = NULL, self->lockloc = NULL;
    ⟨Wait till write buffer is empty 362⟩;
    if (((data->b.o.l - 1) & ~data->y.o.l) < data->xx) data->interim = true;
    if (!Dcache) goto next_sync;
    ⟨Clean the D-cache block for data->z.o, if any 366⟩;
    data->state = 32;
    wait(Dcache->access_time);
  case 32: if (self->lockloc) *(self->lockloc) = NULL, self->lockloc = NULL;
    if (!Scache) goto next_sync;
    ⟨Clean the S-cache block for data->z.o, if any 367⟩;
    data->state = 35;
    wait(Scache->access_time);
  do_syncd: data->state = 33;
  case 33: if (data != old_hot) wait(1);
    if (self->lockloc) *(self->lockloc) = NULL, self->lockloc = NULL;
    ⟨Wait till write buffer is empty 362⟩;
    if (((data->b.o.l - 1) & ~data->y.o.l) < data->xx) data->interim = true;
    if (!Dcache)
      if (data->i == syncd) goto fin_ex;
      else goto next_sync;
    ⟨Use cleanup on the cache blocks for data->z.o, if any 368⟩;
    data->state = 34;
  case 34: if (!clean_co.next) goto next_sync;
    if (trying_to_interrupt && data->interim && data == old_hot) {
      data->z.o = zero_octa;   /* anticipate RESUME_CONT */
      goto fin_ex;   /* accept an interruption */
    }
    wait(1);
  next_sync: data->state = 35;
  case 35: if (self->lockloc) *(self->lockloc) = NULL, self->lockloc = NULL;
    if (data->interim) ⟨Continue this command on the next cache block 369⟩;
    data->go.known = true;
    goto fin_ex;

365. ⟨Clean the I-cache block for data->z.o, if any 365⟩ ≡
  if (Icache->lock || (j = get_reader(Icache)) < 0) wait(1);
  startup(&Icache->reader[j], Icache->access_time);
  set_lock(self, Icache->lock);
  p = cache_search(Icache, data->z.o);
  if (p) {
    demote_and_fix(Icache, p);
    clean_block(Icache, p);
  }
This code is used in section 364.

366. ⟨Clean the D-cache block for data->z.o, if any 366⟩ ≡
  if (Dcache->lock || (j = get_reader(Dcache)) < 0) wait(1);
  startup(&Dcache->reader[j], Dcache->access_time);
  set_lock(self, Dcache->lock);
  p = cache_search(Dcache, data->z.o);
  if (p) {
    demote_and_fix(Dcache, p);
    clean_block(Dcache, p);
  }
This code is used in section 364.

367. ⟨Clean the S-cache block for data->z.o, if any 367⟩ ≡
  if (Scache->lock) wait(1);
  set_lock(self, Scache->lock);
  p = cache_search(Scache, data->z.o);
  if (p) {
    demote_and_fix(Scache, p);
    clean_block(Scache, p);
  }
This code is used in section 364.

368. ⟨Use cleanup on the cache blocks for data->z.o, if any 368⟩ ≡
  if (clean_co.next || clean_lock) wait(1);
  set_lock(self, clean_lock);
  clean_ctl.i = syncd;
  clean_ctl.state = 4;
  clean_ctl.x.o.h = data->loc.h & sign_bit;
  clean_ctl.z.o = data->z.o;
  schedule(&clean_co, 1, 4);
This code is used in section 364.

369. We use the fact that cache block sizes are divisors of 8192.

⟨Continue this command on the next cache block 369⟩ ≡
  {
    data->interim = false;
    data->xx -= ((data->b.o.l - 1) & ~data->y.o.l) + 1;
    data->y.o = incr(data->y.o, data->b.o.l);
    data->y.o.l &= -data->b.o.l;
    data->z.o.l = (data->z.o.l & -8192) + (data->y.o.l & 8191);
    if ((data->y.o.l & 8191) == 0) goto square_one;   /* maybe crossed a page boundary */
    if (data->i == syncd) goto do_syncd;
    else goto do_syncid;
  }
This code is used in section 364.

370. If the first page lacks proper protection, we still must try the second, in the rare case that a page boundary is spanned.

⟨Special cases for states in later stages 272⟩ +≡
  sync_check: if ((data->y.o.l ^ (data->y.o.l + data->xx)) >= 8192) {
      data->xx -= (8191 & ~data->y.o.l) + 1;
      data->y.o = incr(data->y.o, 8192);
      data->y.o.l &= -8192;
      goto square_one;
    }
    goto fin_ex;
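The block-stepping arithmetic of sections 364, 369 and 370 can be tried out in isolation. The following is only a sketch under stated assumptions: sync_steps is a hypothetical helper that is not part of MMIX-PIPE, it works on the low 32 bits of the virtual address only, and it models neither the coroutine states nor the interrupt handling. It counts how many cache blocks a request touches and how often the walk reaches a new 8192-byte page, where the simulator returns to square_one to translate the address again.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of MMIX-PIPE.
   y  : low 32 bits of the starting virtual address
   xx : number of bytes to sync, minus 1 (as in data->xx)
   b  : bytes handled at once; must be a power of two dividing 8192
   Returns the number of block-sized steps; *page_crossings counts how
   often the walk arrives at the start of a new 8192-byte page. */
static int sync_steps(uint32_t y, uint32_t xx, uint32_t b, int *page_crossings)
{
  int steps = 1;
  assert(b != 0 && (b & (b - 1)) == 0 && 8192 % b == 0);
  *page_crossings = 0;
  /* (b-1) & ~y is one less than the number of bytes from y to the end
     of its block; if that is below xx, the request spills over */
  while (((b - 1) & ~y) < xx) {
    xx -= ((b - 1) & ~y) + 1;      /* bytes covered in the current block */
    y = (y + b) & ~(b - 1);        /* start of the next block */
    if ((y & 8191) == 0) ++*page_crossings;
    steps++;
  }
  return steps;
}
```

For instance, a four-byte request (xx = 3) starting at address 8190 with 64-byte blocks needs two steps, and the second step begins on a new page.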

371. Input and output. We're done implementing the hardware, but there's still a small matter of software remaining, because we sometimes want to pretend that a real operating system is present without actually having one loaded. This simulator therefore implements a special feature: If RESUME 1 is issued in location rT, the ten special I/O traps of MMIX-SIM are performed instantaneously behind the scenes.

Of course all claims of accurate simulation go out the door when this feature is used.

  #define max_sys_call Ftell

⟨Type definitions 11⟩ +≡
  typedef enum {
    Halt, Fopen, Fclose, Fread, Fgets, Fgetws, Fwrite, Fputs, Fputws, Fseek, Ftell
  } sys_call;

372. ⟨Magically do an I/O operation, if cool->loc is rT 372⟩ ≡
  if (cool->loc.l == g[rT].o.l && cool->loc.h == g[rT].o.h) {
    register unsigned char yy, zz;
    octa ma, mb;
    if (g[rXX].o.l & 0xffff0000) goto magic_done;
    yy = g[rXX].o.l >> 8, zz = g[rXX].o.l & 0xff;
    if (yy > max_sys_call) goto magic_done;
    ⟨Prepare memory arguments ma = M[a] and mb = M[b] if needed 380⟩;
    switch (yy) {
    case Halt: ⟨Either halt or print warning 373⟩; break;
    case Fopen: g[rBB].o = mmix_fopen(zz, mb, ma); break;
    case Fclose: g[rBB].o = mmix_fclose(zz); break;
    case Fread: g[rBB].o = mmix_fread(zz, mb, ma); break;
    case Fgets: g[rBB].o = mmix_fgets(zz, mb, ma); break;
    case Fgetws: g[rBB].o = mmix_fgetws(zz, mb, ma); break;
    case Fwrite: g[rBB].o = mmix_fwrite(zz, mb, ma); break;
    case Fputs: g[rBB].o = mmix_fputs(zz, g[rBB].o); break;
    case Fputws: g[rBB].o = mmix_fputws(zz, g[rBB].o); break;
    case Fseek: g[rBB].o = mmix_fseek(zz, g[rBB].o); break;
    case Ftell: g[rBB].o = mmix_ftell(zz); break;
    }
  magic_done: g[255].o = neg_one;   /* this will enable interrupts */
  }
This code is used in section 322.

373. ⟨Either halt or print warning 373⟩ ≡
  if (!zz) halted = true;
  else if (zz == 1) {
    octa trap_loc;
    trap_loc = incr(g[rWW].o, -4);
    if (!(trap_loc.h || trap_loc.l >= 0x90))
      print_trip_warning(trap_loc.l >> 4, incr(g[rW].o, -4));
  }
This code is used in section 372.

374. ⟨Global variables 20⟩ +≡
  char arg_count[] = {1, 3, 1, 3, 3, 3, 3, 2, 2, 2, 1};

375. The input/output operations invoked by TRAPs are done by subroutines in an auxiliary program module called MMIX-IO. Here we need only declare those subroutines, and write three primitive interfaces on which they depend.

376. ⟨Global variables 20⟩ +≡
  extern octa mmix_fopen ARGS((unsigned char, octa, octa));
  extern octa mmix_fclose ARGS((unsigned char));
  extern octa mmix_fread ARGS((unsigned char, octa, octa));
  extern octa mmix_fgets ARGS((unsigned char, octa, octa));
  extern octa mmix_fgetws ARGS((unsigned char, octa, octa));
  extern octa mmix_fwrite ARGS((unsigned char, octa, octa));
  extern octa mmix_fputs ARGS((unsigned char, octa));
  extern octa mmix_fputws ARGS((unsigned char, octa));
  extern octa mmix_fseek ARGS((unsigned char, octa));
  extern octa mmix_ftell ARGS((unsigned char));
  extern void print_trip_warning ARGS((int, octa));

377. ⟨Internal prototypes 13⟩ +≡
  int mmgetchars ARGS((char *, int, octa, int));
  int mmputchars ARGS((unsigned char *, int, octa));
  char stdin_chr ARGS((void));
  octa magic_read ARGS((octa));
  void magic_write ARGS((octa, octa));
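The decoding step at the start of section 372 can be illustrated on its own. This is a minimal sketch, assuming that only the low tetrabyte of rXX (the saved TRAP instruction) is given: the enum values and the arg_count table are copied from sections 371 and 374, while decode_trap is a hypothetical helper that does not occur in MMIX-PIPE.

```c
#include <assert.h>
#include <stdint.h>

enum sys_call { Halt, Fopen, Fclose, Fread, Fgets, Fgetws,
                Fwrite, Fputs, Fputws, Fseek, Ftell };
#define max_sys_call Ftell

/* number of arguments of each call, as in section 374 */
static const char arg_count[] = {1, 3, 1, 3, 3, 3, 3, 2, 2, 2, 1};

/* Hypothetical helper, not part of MMIX-PIPE: split the low tetrabyte
   of rXX into the Y field (which call) and the Z field (which file).
   Returns the argument count of the call, or -1 if the instruction is
   not one of the special I/O traps. */
static int decode_trap(uint32_t rxx_low, unsigned *yy, unsigned *zz)
{
  assert(yy && zz);
  if (rxx_low & 0xffff0000u) return -1;  /* opcode or X field nonzero */
  *yy = rxx_low >> 8;
  *zz = rxx_low & 0xff;
  if (*yy > max_sys_call) return -1;     /* beyond Ftell */
  return arg_count[*yy];
}
```

A value of 0x0301 in the low tetrabyte, for example, decodes to Fread on file handle 1 with three arguments; in the simulator itself both failure cases simply jump to magic_done.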

378. We need to cut through all the complications of buffers and caches in order to do magical I/O. The magic_read routine finds the current octabyte in a given physical address by looking at the write buffer, D-cache, S-cache, and memory until finding it.

⟨Subroutines 14⟩ +≡
  octa magic_read(addr)
    octa addr;
  {
    register write_node *q;
    register cacheblock *p;
    for (q = write_tail; ; ) {
      if (q == write_head) break;
      if (q == wbuf_top) q = wbuf_bot;
      else q++;
      if (q->addr.l == addr.l && q->addr.h == addr.h) return q->o;
    }
    if (Dcache) {
      p = cache_search(Dcache, addr);
      if (p) return p->data[(addr.l & (Dcache->bb - 1)) >> 3];
      if (Scache) {
        p = cache_search(Scache, addr);
        if (p) return p->data[(addr.l & (Scache->bb - 1)) >> 3];
      }
    }
    return mem_read(addr);
  }

379. The magic_write routine changes the octabyte in a given physical address by changing it wherever it appears in a buffer or cache. Any "dirty" or "least recently used" status remains unchanged. (Yes, this is magic.)

⟨Subroutines 14⟩ +≡
  void magic_write(addr, val)
    octa addr, val;
  {
    register write_node *q;
    register cacheblock *p;
    for (q = write_tail; ; ) {
      if (q == write_head) break;
      if (q == wbuf_top) q = wbuf_bot;
      else q++;
      if (q->addr.l == addr.l && q->addr.h == addr.h) q->o = val;
    }
    if (Dcache) {
      p = cache_search(Dcache, addr);
      if (p) p->data[(addr.l & (Dcache->bb - 1)) >> 3] = val;
      if (((Dcache->inbuf.tag.l ^ addr.l) & Dcache->tagmask) == 0 && Dcache->inbuf.tag.h == addr.h)
        Dcache->inbuf.data[(addr.l & (Dcache->bb - 1)) >> 3] = val;
      if (((Dcache->outbuf.tag.l ^ addr.l) & Dcache->tagmask) == 0 && Dcache->outbuf.tag.h == addr.h)
        Dcache->outbuf.data[(addr.l & (Dcache->bb - 1)) >> 3] = val;
      if (Scache) {
        p = cache_search(Scache, addr);
        if (p) p->data[(addr.l & (Scache->bb - 1)) >> 3] = val;
        if (((Scache->inbuf.tag.l ^ addr.l) & Scache->tagmask) == 0 && Scache->inbuf.tag.h == addr.h)
          Scache->inbuf.data[(addr.l & (Scache->bb - 1)) >> 3] = val;
        if (((Scache->outbuf.tag.l ^ addr.l) & Scache->tagmask) == 0 && Scache->outbuf.tag.h == addr.h)
          Scache->outbuf.data[(addr.l & (Scache->bb - 1)) >> 3] = val;
      }
    }
    mem_write(addr, val);
  }

380. The conventions of our imaginary operating system require us to apply the trivial memory mapping in which segment i appears in a 2^32-byte page of physical addresses starting at 2^32·i.

⟨Prepare memory arguments ma = M[a] and mb = M[b] if needed 380⟩ ≡
  if (arg_count[yy] == 3) {
    octa arg_loc;
    arg_loc = g[rBB].o;
    if (arg_loc.h & 0x9fffffff) mb = zero_octa;
    else arg_loc.h >>= 29, mb = magic_read(arg_loc);
    arg_loc = incr(g[rBB].o, 8);
    if (arg_loc.h & 0x9fffffff) ma = zero_octa;
    else arg_loc.h >>= 29, ma = magic_read(arg_loc);
  }
This code is used in section 372.
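The trivial memory mapping of section 380 amounts to one mask test and one shift. As a rough sketch, assuming an octa is a pair of 32-bit tetrabytes h and l as in MMIX-ARITH, the hypothetical function trivial_map below (not a routine of MMIX-PIPE) performs the same test and shift that section 380 applies to arg_loc.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t h, l; } octa;  /* high and low tetrabytes */

/* Hypothetical helper, not part of MMIX-PIPE. The mask 0x9fffffff keeps
   the sign bit and the low 29 bits of the high tetrabyte; only bits 30
   and 29, the segment number, may be nonzero in a mappable address.
   Returns 1 and stores the physical address, or 0 if the virtual
   address lies outside the first 2^32 bytes of segments 0 to 3. */
static int trivial_map(octa virt, octa *phys)
{
  assert(phys);
  if (virt.h & 0x9fffffffu) return 0;  /* negative, or offset >= 2^32 */
  phys->h = virt.h >> 29;              /* segment i becomes page i */
  phys->l = virt.l;
  return 1;
}
```

Segment 2, which begins at virtual address #4000000000000000, therefore lands at physical address #0000000200000000; in the simulator an unmappable argument address makes ma or mb zero instead of raising an error.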

381. The subroutine mmgetchars(buf, size, addr, stop) reads characters starting at address addr in the simulated memory and stores them in buf, continuing until size characters have been read or some other stopping criterion has been met. If stop < 0 there is no other criterion; if stop = 0 a null character will also terminate the process; otherwise addr is even, and two consecutive null bytes starting at an even address will terminate the process. The number of bytes read and stored, exclusive of terminating nulls, is returned.

⟨Subroutines 14⟩ +≡
  int mmgetchars(buf, size, addr, stop)
    char *buf;
    int size;
    octa addr;
    int stop;
  {
    register char *p;
    register int m;
    octa a, x;
    if (((addr.h & 0x9fffffff) || (incr(addr, size - 1).h & 0x9fffffff)) && size) {
      fprintf(stderr, "Attempt to get characters from off the page!\n");
      return 0;
    }
    for (p = buf, m = 0, a = addr, a.h >>= 29; m < size; ) {
      x = magic_read(a);
      if ((a.l & 0x7) || m > size - 8) ⟨Read and store one byte; return if done 382⟩
      else ⟨Read and store up to eight bytes; return if done 383⟩
    }
    return size;
  }

382. ⟨Read and store one byte; return if done 382⟩ ≡
  {
    if (a.l & 0x4) *p = (x.l >> (8 * ((~a.l) & 0x3))) & 0xff;
    else *p = (x.h >> (8 * ((~a.l) & 0x3))) & 0xff;
    if (!*p && stop >= 0) {
      if (stop == 0) return m;
      if ((a.l & 0x1) && *(p - 1) == '\0') return m - 1;
    }
    p++, m++, a = incr(a, 1);
  }
This code is used in section 381.

383. ⟨Read and store up to eight bytes; return if done 383⟩ ≡
  {
    *p = x.h >> 24;
    if (!*p && (stop == 0 || (stop > 0 && x.h < 0x10000))) return m;
    *(p + 1) = (x.h >> 16) & 0xff;
    if (!*(p + 1) && stop == 0) return m + 1;
    *(p + 2) = (x.h >> 8) & 0xff;
    if (!*(p + 2) && (stop == 0 || (stop > 0 && (x.h & 0xffff) == 0))) return m + 2;
    *(p + 3) = x.h & 0xff;
    if (!*(p + 3) && stop == 0) return m + 3;
    *(p + 4) = x.l >> 24;
    if (!*(p + 4) && (stop == 0 || (stop > 0 && x.l < 0x10000))) return m + 4;
    *(p + 5) = (x.l >> 16) & 0xff;
    if (!*(p + 5) && stop == 0) return m + 5;
    *(p + 6) = (x.l >> 8) & 0xff;
    if (!*(p + 6) && (stop == 0 || (stop > 0 && (x.l & 0xffff) == 0))) return m + 6;
    *(p + 7) = x.l & 0xff;
    if (!*(p + 7) && stop == 0) return m + 7;
    p += 8, m += 8, a = incr(a, 8);
  }
This code is used in section 381.

384. The subroutine mmputchars(buf, size, addr) puts size characters into the simulated memory starting at address addr.

⟨Subroutines 14⟩ +≡
  int mmputchars(buf, size, addr)
    unsigned char *buf;
    int size;
    octa addr;
  {
    register unsigned char *p;
    register int m;
    octa a, x;
    if (((addr.h & 0x9fffffff) || (incr(addr, size - 1).h & 0x9fffffff)) && size) {
      fprintf(stderr, "Attempt to put characters off the page!\n");
      return 0;
    }
    for (p = buf, m = 0, a = addr, a.h >>= 29; m < size; ) {
      if ((a.l & 0x7) || m > size - 8) ⟨Load and write one byte 385⟩
      else ⟨Load and write eight bytes 386⟩;
    }
  }

385. ⟨Load and write one byte 385⟩ ≡
  {
    register int s = 8 * ((~a.l) & 0x3);
    x = magic_read(a);
    if (a.l & 0x4) x.l ^= (((x.l >> s) ^ *p) & 0xff) << s;
    else x.h ^= (((x.h >> s) ^ *p) & 0xff) << s;
    magic_write(a, x);
    p++, m++, a = incr(a, 1);
  }
This code is used in section 384.

386. ⟨Load and write eight bytes 386⟩ ≡
  {
    x.h = (*p << 24) + (*(p + 1) << 16) + (*(p + 2) << 8) + *(p + 3);
    x.l = (*(p + 4) << 24) + (*(p + 5) << 16) + (*(p + 6) << 8) + *(p + 7);
    magic_write(a, x);
    p += 8, m += 8, a = incr(a, 8);
  }
This code is used in section 384.

387. When standard input is being read by the simulated program at the same time as it is being used for interaction, we try to keep the two uses separate by maintaining a private buffer for the simulated program's StdIn. Online input is usually transmitted from the keyboard to a C program a line at a time; therefore an fgets operation works much better than fread when we prompt for new input. But there is a slight complication, because fgets might read a null character before coming to a newline character. We cannot deduce the number of characters read by fgets simply by looking at strlen(stdin_buf).

⟨Subroutines 14⟩ +≡
  char stdin_chr()
  {
    register char *p;
    while (stdin_buf_start == stdin_buf_end) {
      printf("StdIn> ");
      fflush(stdout);
      fgets(stdin_buf, 256, stdin);
      stdin_buf_start = stdin_buf;
      for (p = stdin_buf; p < stdin_buf + 254; p++)
        if (*p == '\n') break;
      stdin_buf_end = p + 1;
    }
    return *stdin_buf_start++;
  }

388. ⟨Global variables 20⟩ +≡
  char stdin_buf[256];     /* standard input to the simulated program */
  char *stdin_buf_start;   /* current position in that buffer */
  char *stdin_buf_end;     /* current end of that buffer */
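Sections 382 and 385 both select a byte inside a big-endian octabyte with the shift amount 8 * ((~a.l) & 0x3). The sketch below isolates that idiom; get_byte and put_byte are hypothetical names introduced for illustration, and the octa type is assumed to be a pair of 32-bit tetrabytes as in MMIX-ARITH.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t h, l; } octa;  /* high and low tetrabytes */

/* Hypothetical helper: fetch the byte that addr_low selects within x.
   Address bit 2 chooses the tetrabyte; bits 1 and 0, complemented,
   give the shift, because byte 0 is the most significant one. */
static unsigned get_byte(octa x, uint32_t addr_low)
{
  int s = 8 * ((~addr_low) & 0x3);
  return ((addr_low & 0x4 ? x.l : x.h) >> s) & 0xff;
}

/* Hypothetical helper: replace that byte by c, using the same xor
   trick as section 385 (xor the difference into the right lane). */
static octa put_byte(octa x, uint32_t addr_low, unsigned char c)
{
  int s = 8 * ((~addr_low) & 0x3);
  if (addr_low & 0x4) x.l ^= (((x.l >> s) ^ c) & 0xffu) << s;
  else x.h ^= (((x.h >> s) ^ c) & 0xffu) << s;
  assert(get_byte(x, addr_low) == c);   /* round-trip sanity check */
  return x;
}
```

With x.h = 0x01020304 and x.l = 0x05060708, address offsets 0 through 7 select the bytes 1 through 8 in order, which is the big-endian layout the simulator relies on.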

h Deissue all but the hottest command 316 i Used in section 314. MMIX-ARITH x6. otherwise goto stall 82 i Used in section 75. h Check for suÆcient rename registers and memory slots. h Cases 0 through 4. x: octa . a: octa . 344. h Begin execution of a stage-two operation 351 i Used in section 135. fgets : char (). <stdio. 350 i Used in section 132. 222. 348. for the D-cache 233 i Used in section 232. or goto stall 111 i Used in section 75. 224. h Convert relative address to absolute address 84 i Used in section 75. 140. if any 367 i Used in section 364. Names of the sections. otherwise break 256 i Used in section 146.h>. h Commit the hottest instruction. 49 i Used in section 44. h Allocate a slot p in the S-cache 218 i Used in section 217. 141. <stdio. if any 366 i Used in section 364. magic write : p: void ( ). octa ( ). h Begin fetch without I-cache lookup 295 i Used in section 291. <stdio. h Check for external interrupt 314 i Used in section 64. h Check for prest with a fully spanned cache block 275 i Used in section 274.h>. 328. x384. 325. for the S-cache 234 i Used in section 232. h Continue this command on the next cache block 369 i Used in section 364. h Check for security violation. h Cases 5 through 9. h Cases for control of special coroutines 126. 217. 142. x379. h Commit to memory if possible. h Begin an interruption and break 317 i Used in section 146. break if so 149 i Used in section 67. <string. x384. 232. 345. h Clean the D-cache block for data~ z:o. 257 i Used in section 125.h>. 313. h Check for a hit in pending writes 278 i Used in section 273.327 MMIX-PIPE: NAMES OF THE SECTIONS 389.h>. h Assign a functional unit if available. stdout : FILE . 139. h Copy Scache~ inbuf to slot p 220 i Used in section 217. 215. register unsigned char . h Copy the data from block q to fetched 294 i Used in sections 292 and 296. h Clean the data caches 361 i Used in section 356. or break if it's not ready 146 i Used in section 67. m: register int . 
h Copy data from p into c~ inbuf 226 i Used in section 224. h Deissue the coolest instruction 145 i Used in section 67. x 384. 356 i Used in section 132. h Clean the S-cache block for data~ z:o. incr : l: tetra .h>. h Begin execution of an operation 132 i Used in section 130. 329. h Cases for stage 1 execution 155. stdin : FILE . <stdio. h Begin fetch with known physical address 296 i Used in section 288. <stdio. strlen : int ( ). 138. h: tetra . 327. x17. h Declare mmix opcode and internal opcode 47. h Compute the new entry for c~ inbuf and give the caller a sneak preview 245 i Used in section 237.h>. h Cases to compute the results of register-to-register operation 137. 343. 237. fread : size t ( ). h Cases to compute the virtual address of a memory operation 265 i Used in section 132. x17.h>. if any 365 i Used in section 364. . 331. x384. printf : int ( ). <stdio. ush : int ( ). 143. 346. h Commit and/or deissue up to commit max instructions 67 i Used in section 64. h Clean the I-cache block for data~ z:o.

h Do the . h Do a simultaneous lookup in the I-cache 292 i Used in section 291. and the internal opcode. h Dispatch one cycle's worth of instructions 74 i Used in section 64. h Dispatch an instruction to the cool block if possible. i 80 i Used in section 75. f . h Do load/store stage 1 without D-cache lookup 270 i Used in section 267. h Do stage 1 of LDVTS 353 i Used in section 352. h Do load/store stage 1 with known physical address 271 i Used in section 266. h Do load/store stage 2 without D-cache lookup 277 i Used in section 273. h Do a simultaneous lookup in the D-cache 268 i Used in section 267.MMIX-PIPE: NAMES OF THE SECTIONS h Determine the ags. otherwise goto stall 328 101 i Used in section 75.

374. h Header de. 98. 77. 305. 161. 150. 51. 284. 154. 162.nal SAVE 340 i Used in section 339. 66. h Finish execution of an operation 144 i Used in section 130. h Handle an internal SAVE when it's time to store 342 i Used in section 281. 168. h Finish a store command 281 i Used in section 280. 60. 38. h Global variables 20. h Handle interrupt at end of execution stage 307 i Used in section 144. h Get ready for the next step of UNSAVE 335 i Used in section 81. h Forward the new data past the D-cache if it is write-through 263 i Used in section 257. 238. 388 i Used in section 3. 180. 211. 115. 41. 285. h Either halt or print warning 373 i Used in section 372. h External routines 10. h Execute all coroutines scheduled for the current time 125 i Used in section 64. 194. 253 i Used in section 3. 50. 235. h Fill Scache~ inbuf with clean memory data 219 i Used in section 217. 48. 209. 88. 127. h External prototypes 9. h Generate an instruction to save g[yy ] 339 i Used in section 337. 176. 210. 352 i Used in section 266. h Handle write-around when ushing to the S-cache 221 i Used in section 217. 69. 54. 148. 248. h Handle write-around when writing to the D-cache 259 i Used in section 257. 53. 178. h Generate an instruction to unsave g[yy ] 333 i Used in section 332. 303. 376. 83. 136. 252 i Used in sections 3 and 5. 315. 39. 70. h Handle an internal UNSAVE when it's time to load 336 i Used in section 279. h Get ready for the next step of SAVE 341 i Used in section 81. 36. 86. 59. h Handle special cases for operations like prego and ldvts 289. 230. 181. h Get ready for the next step of PRELD or PREST 228 i Used in section 81. 213. 107. h Get ready for the next step of PREGO 229 i Used in section 81. h External variables 4. 212. 175. 247. 242. 214. 65. 29. 179. 99. 78. h Finish a CSWAP 283 i Used in section 281. 349 i Used in sections 3 and 5. 207.

236. and 337. 7. 231. 116. 87. h Ignore the item in write head 264 i Used in section 257. h Insert an instruction to decrease gamma 114 i Used in section 120. h Insert an instruction to advance beta and L 112 i Used in section 110. 129. 26. 71. 128. 79. 119. h Initialize everything 22. 153. h Insert an instruction to advance gamma 113 i Used in sections 110. 57. 249. 286 i Used in section 10. 8. 89. 61. 52. 166 i Used in sections 3 and 5.nitions 6. .

329 MMIX-PIPE: NAMES OF THE SECTIONS h Insert data~ b:o into the proper .

eld of data~ x:o. h Install default . h Insert special operands when resuming an interrupted operation 324 i Used in section 103. h Insert dummy instruction for page table emulation 302 i Used in section 298. checking for arithmetic exceptions if signed 282 i Used in section 281. h Install a new instruction into the tail position 304 i Used in section 301.

h Install register X as the destination. or insert an internal command and goto dispatch done if X is marginal 110 i Used in section 101. h Install the operand .elds in the cool block 100 i Used in section 75.

34. h Issue j pseudo-instructions to compute a page table entry 244 i Used in section 243. and 316. 90. and try to dispatch it if j < dispatch max 75 i Used in section 74. h Redirect the fetch if control changes at this inst 85 i Used in section 75. 184. 202. 309. 62. 45. return if done 383 i Used in section 381. 158. h Nullify the hottest instruction 147 i Used in section 146. 192. h Other cases for the fetch coroutine 298. 173. h Predict a branch outcome 151 i Used in section 85. 258 i Used in section 10. . and also in the I-cache if possible 291 i Used in section 288. h Magically do an I/O operation. 55. h Prepare for exceptional trip handler 308 i Used in section 307. h Make sure cool L and cool G are up to date 102 i Used in section 101. h Look up the address in the DT-cache. 32. 240. 190. 188. h Perform one cycle of the interrupt preparations 318 i Used in section 64. 42. h Internal prototypes 13. 169. h Resume an interrupted operation 323 i Used in section 322. 92. 254. h Read from memory into fetched 297 i Used in section 296. 182. 30. 250. h Print all of c's cache blocks 177 i Used in section 176. 200. 27. if cool~ loc is rT 372 i Used in section 322. 96. and also in the D-cache if possible 267 i Used in section 266. h Recover from incorrect branch prediction 160 i Used in section 155. h Local variables 12. 156. h Perform one machine cycle 64 i Used in section 10. h Load and write one byte 385 i Used in section 384. 72. h Read data into c~ inbuf and wait for the bus 223 i Used in section 222. return if done 382 i Used in section 381. h Read and store up to eight bytes.elds of the cool block 103 i Used in section 101. 24. 377 i Used in section 3. h Look at the head instruction. h Pass data to the next stage of the pipeline 134 i Used in section 130. 160. h Issue the cool instruction 81 i Used in section 75. 195. 124. h Read and store one byte. h Prepare memory arguments ma = M[a] and mb = M[b] if needed 380 i Used in section 372. 
h Prepare to emulate the page translation 309 i Used in section 310. 18. h Record the result of branch prediction 152 i Used in section 75. 301 i Used in section 288. h Load and write eight bytes 386 i Used in section 384. 171. 204. h Look up the address in the IT-cache. 308. h Restart the fetch coroutine 287 i Used in sections 85. 94. 186. 198.

rZZ) 321 i Used in section 318. h Set up the . rXX) 320 i Used in section 318. h Set cool~ b from register X 106 i Used in section 103. h Set resumption registers (rW. h Set resumption registers (rB. h Set cool~ z as an immediate wyde 109 i Used in section 103. h Set cool~ z from register Z 104 i Used in section 103. $255) 319 i Used in section 318. h Set resumption registers (rY. $255) or (rBB. rX) or (rWW. h Set cool~ y from register Y 105 i Used in section 103.MMIX-PIPE: NAMES OF THE SECTIONS 330 h Set cool~ b and/or cool~ ra from special register 108 i Used in section 103. h Set things up so that the results become known when they should 133 i Used in section 132. rZ) or (rYY.

rst phase of saving 338 i Used in section 337. h Set up the .

h Simulate later stages of an execution pipeline 135 i Used in section 125.rst phase of unsaving 334 i Used in section 332. h Simulate an action of the fetch coroutine 288 i Used in section 125. h Simulate the .

280. 279. 364. h Special cases for states in later stages 272. 273. h Special cases for states in the . 354. 311. 276. 370 i Used in section 135. 299.rst stage of an execution pipeline 130 i Used in section 125.

122. 363 i Used in section 130. 355 i Used in section 101. 332. 326. 312. 322. 360. 118. 119. 121. 310. 120.rst stage 266. h Special cases of instruction dispatch 117. h Start the S-cache . 337. 227. 347.

19. 387 i Used in section 3. 378. 31. 159. 73. 379. 157. 208. h Start up auxiliary coroutines to compute the page table entry 243 i Used in section 237. 21. 170. 43. h Swap cache blocks p and q 197 i Used in sections 196 and 205. h Try to put the contents of location write head~ addr into the D-cache 261 i Used in section 257. 251. 174. 191. 91. h Subroutines 14. 25. 95. h Type de. 97. 46. 189. 193. 35. 255. 381. 183.ller 225 i Used in section 224. 63. 199. 201. h Try to get the contents of location data~ z:o in the D-cache 274 i Used in section 273. 185. 196. 205. 187. 241. h Try to get the contents of location data~ z:o in the I-cache 300 i Used in section 298. 172. 203. 384. 93. 56. 33. 28.

h Update DT-cache usage and check the protection bits 269 i Used in sections 268. 37. 68. h Update rG 330 i Used in section 329. h Use cleanup on the cache blocks for data~ z:o. h Update IT-cache usage and check the protection bits 293 i Used in sections 292 and 295. h Undo data structures set prematurely in the cool block and break 123 i Used in section 75. h Update the page variables 239 i Used in section 329. 246. 270. 44. 17.nitions 11. 76. and 272. if any 368 i Used in section 364. h Wait for input data if necessary. 164. h Wait if there's an un. 206. 167. 371 i Used in sections 3 and 5. 23. 40. set state = 1 if it's there 131 i Used in section 130.

nished load ahead of us 357 i Used in section 356. .

if there's a cache hit 262 i Used in section 257. h Wait. h Zap the translation caches 358 i Used in section 356.331 MMIX-PIPE: NAMES OF THE SECTIONS h Wait till write bu er is empty 362 i Used in sections 361 and 364. h Write the dirty data of c~ outbuf and wait for the bus 216 i Used in section 215. h Zap the instruction and data caches 359 i Used in section 356. h Write the data into the D-cache and set state = 4. if necessary.h 5 i . until the instruction pointer is known 290 i Used in section 288. h Write directly from write head to memory 260 i Used in section 257. h mmix-pipe.

MMIX-SIM

1. Introduction. This program simulates a simplified version of the MMIX computer. Its main goal is to help people create and test MMIX programs for The Art of Computer Programming and related publications. It provides only a rudimentary terminal-oriented interface, but it has enough infrastructure to support a cool graphical user interface, which could be added by a motivated reader. (Hint, hint.)

MMIX is simplified in the following ways:

• There is no pipeline, and there are no caches. Thus, commands like SYNC and SYNCD and PREGO do nothing.

• Simulation applies only to user programs, not to an operating system kernel. Thus, all addresses must be nonnegative; "privileged" commands such as PUT rK,z or RESUME 1 or LDVTS x,y,z are not allowed; instructions should be executed only from addresses in segment 0 (addresses less than #2000000000000000). Certain special registers remain constant: rF = 0, rK = #ffffffffffffffff, rQ = 0, rT = #8000000500000000, rTT = #8000000600000000, rV = #369c200400000000.

• No trap interrupts are implemented, except for a few special cases of TRAP that provide rudimentary input-output.

• All instructions take a fixed amount of time, given by the rough estimates stated in the MMIX documentation. For example, MUL takes 10υ, LDB takes μ + υ; all times are expressed in terms of μ and υ, "mems" and "oops." The clock register rC increases by 2^32 for each μ and 1 for each υ. But the interval counter rI decreases by 1 for each instruction, and the usage counter rU increases by 1 for each instruction.

D.E. Knuth: MMIXware, LNCS 1750, pp. 332-421, 1999. © Springer-Verlag Berlin Heidelberg 1999

2. To run this simulator, assuming UNIX conventions, you say `mmix ⟨options⟩ progfile args...', where progfile is an output of the MMIXAL assembler, args... is a sequence of optional command-line arguments passed to the simulated program, and ⟨options⟩ is any subset of the following:

• -t<n> Trace each instruction the first n times it is executed. (The notation <n> in this option, and in several other options and interactive commands below, stands for a decimal integer.)

• -x<x> Trace each instruction that raises an arithmetic exception belonging to the given bit pattern. (The notation <x> in this option, and in several other commands below, stands for a hexadecimal integer.) The exception bits are DVWIOUZX as they appear in rA, namely #80 for D (integer divide check), #40 for V (integer overflow), ..., #01 for X (floating inexact). The option -x by itself is equivalent to -xff, tracing all eight exceptions.

• -r Trace details of the register stack. This option shows all the "hidden" loads and stores that occur when octabytes are written from the ring of local registers into memory, or read from memory into that ring. It also shows the full details of SAVE and UNSAVE operations.

• -l<n> List the source line corresponding to each traced instruction, filling gaps of length n or less. For example, if one instruction came from line 10 of the source file and the next instruction to be traced came from line 12, line 11 would be shown also, provided that n ≥ 1. If <n> is omitted it is assumed to be 3.

• -s Show statistics of running time with each traced instruction.

• -P Show the program profile (that is, the frequency counts of each instruction that was executed) when the simulation ends.

• -L<n> List the source lines corresponding to each instruction that appears in the program profile, filling gaps of length n or less. This option implies -P.

• -v Be verbose: Turn on all options. (More precisely, the -v option is shorthand for -t9999999999 -r -s -l10 -L10.)

• -q Be quiet: Cancel all previously specified options.

• -i Go into interactive mode before starting the simulation.

• -I Go into interactive mode when the simulated program halts or pauses for a breakpoint.

• -b<n> Set the buffer size of source lines to max(72, n).

• -c<n> Set the capacity of the local register ring to max(256, n); this number must be a power of 2.

• -f<filename> Use the named file for standard input to the simulated program.

• -D<filename> Prepare the named file for use by other simulators, instead of actually doing a simulation.

• -? Print the "Usage" message, which summarizes the command-line options.

The author recommends -t2 -l -L for initial offline debugging.

While the program is being simulated, an interrupt signal (usually control-C) will cause the simulator to break and go into interactive mode after tracing the current instruction, even if -i and -I were not specified on the command line.

3. In interactive mode, the user is prompted `mmix>' and a variety of commands can be typed online. Any command-line option can be given in response to such a prompt (including the `-' that begins the option), and the following operations are also available:

• Simply typing ⟨return⟩ or n⟨return⟩ to the mmix> prompt causes one MMIX instruction to be executed and traced; then the user is prompted again.

• c continues simulation until the program halts or reaches a breakpoint. (Actually the command is `c⟨return⟩', but we won't bother to mention the ⟨return⟩ in the following description.)

• q quits (terminates the simulation), after printing the profile (if it was requested) and the final statistics.

• s prints out the current statistics (the clock times and the current instruction location). We have already discussed the -s option on the command line, which causes these statistics to be printed automatically; but a lot of statistics can fill up a lot of file space, so users may prefer to see the statistics only on demand.

• l<n><t>, g<n><t>, $<n><t>, rA<t>, rB<t>, ..., rZZ<t>, and M<x><t> will show the current value of a local register, global register, dynamically numbered register, special register, or memory location. Here <t> specifies the type of value to be displayed; if <t> is `!', the value will be given in decimal notation; if <t> is `.' it will be given in floating point notation; if <t> is `#' it will be given in hexadecimal, and if <t> is `"' it will be given as a string of eight one-byte characters. Just typing <t> by itself will repeat the most recently shown value, perhaps in another format; for example, the command `l10#' will show local register 10 in hexadecimal notation, then the command `!' will show it in decimal and `.' will show it as a floating point number. If <t> is empty, the previous type will be repeated; the default type is decimal. Register rA is equivalent to g22, according to the numbering used in GET and PUT commands.

The `<t>' in any of these commands can also have the form `=<value>', where the value is a decimal or floating point or hexadecimal or string constant. (The syntax rules for floating point constants appear in MMIX-ARITH. A string constant is treated as in the BYTE command of MMIXAL, but padded at the left with zeros if fewer than eight characters are specified.) This assigns a new value before displaying it. For example, `l10=.1e3' sets local register 10 equal to 100; `g250="ABCD",#a' sets global register 250 equal to #000000414243440a; `M1000=-Inf' sets M[#1000]₈ = #fff0000000000000, the representation of -∞. Special registers other than rI cannot be set to values disallowed by PUT. Marginal registers cannot be set to nonzero values.

The command `rI=250' sets the interval counter to 250; this will cause a break in simulation after 250 instructions have been executed.

• +<n><t> shows the next n octabytes following the one most recently shown, in format <t>. For example, after `l10#' a subsequent `+30' will show l11, l12, ..., l40 in hexadecimal notation. After `g200=3' a subsequent `+30' will set g201, g202, ..., g230 equal to 3, but a subsequent `+30!' would merely display g201 through g230 in decimal notation. Memory addresses will advance by 8 instead of by 1. If <n> is empty, the default value n = 1 is used.

• @<x> sets the address of the next tetrabyte to be simulated, sort of like a GO command.

• t<x> says that the instruction in tetrabyte location x should always be traced, regardless of its frequency count.

• u<x> undoes the effect of t<x>.

• b[rwx]<x> sets breakpoints at tetrabyte x; here [rwx] stands for any subset of the letters r, w, and/or x, meaning to break when the tetrabyte is read, written, and/or executed. For example, `bx1000' causes a break in the simulation just after the tetrabyte in #1000 is executed; `b1000' undoes this breakpoint; `brwx1000' causes a break just after any simulated instruction loads, stores, or appears in tetrabyte number #1000.

• T, D, P, S changes the "current segment" to either Text_Segment, Data_Segment, Pool_Segment, or Stack_Segment, respectively, namely to #0, #2000000000000000, #4000000000000000, or #6000000000000000. The current segment, initially #0, is added to all memory addresses in M, @, t, u, and b commands.

• B lists all current breakpoints and tracepoints.

• i<filename> reads a sequence of interactive commands from the specified file, one command per line, ignoring blank lines. This feature can be used to set many breakpoints or to display a number of key registers, etc. Included lines that begin with % or i are ignored; therefore an included file cannot include another file. Included lines that begin with a blank space are reproduced in the standard output, otherwise ignored.

• h (help) reminds the user of the available interactive commands.

4. Rudimentary I/O. Input and output are provided by the following ten primitive system calls:

• Fopen(handle, name, mode). Here handle is a one-byte integer, name is a string, and mode is one of the values TextRead, TextWrite, BinaryRead, BinaryWrite, BinaryReadWrite. An Fopen call associates handle with the external file called name and prepares to do input and/or output on that file. It returns 0 if the file was opened successfully; otherwise returns the value -1. If mode is TextWrite or BinaryWrite, any previous contents of the named file are discarded. If mode is TextRead or TextWrite, the file consists of "lines" terminated by "newline" characters, and it is said to be a text file; otherwise the file consists of uninterpreted bytes, and it is said to be a binary file.

Text files and binary files are essentially equivalent in cases where this simulator is hosted by an operating system derived from UNIX; in such cases files can be written as text and read as binary or vice versa. But with other operating systems, text files and binary files often have quite different representations, and certain characters with byte codes less than ' ' are forbidden in text. Within any MMIX program, the newline character has byte code #0a = 10.

At the beginning of a program three handles have already been opened: The "standard input" file StdIn (handle 0) has mode TextRead, the "standard output" file StdOut (handle 1) has mode TextWrite, and the "standard error" file StdErr (handle 2) also has mode TextWrite. When this simulator is being run interactively, lines of standard input should be typed following a prompt that says `StdIn> '. The standard output and standard error files of the simulated program are intermixed with the output of the simulator itself.

The input/output operations supported by this simulator can perhaps be understood most easily with reference to the standard library <stdio> that comes with the C language, because the conventions of C have been explained in hundreds of books. If we declare an array FILE *file[256] and set file[0] = stdin, file[1] = stdout, and file[2] = stderr, then the simulated system call Fopen(handle, name, mode) is essentially equivalent to the C expression

    (file[handle] ? (file[handle] = freopen(name, mode_string[mode], file[handle]))
                  : (file[handle] = fopen(name, mode_string[mode]))) ? 0 : -1,

if we predefine the values mode_string[TextRead] = "r", mode_string[TextWrite] = "w", mode_string[BinaryRead] = "rb", mode_string[BinaryWrite] = "wb", and mode_string[BinaryReadWrite] = "wb+".

• Fclose(handle). If the given file handle has been opened, it is closed (no longer associated with any file). Again the result is 0 if successful, or -1 if the file was already closed or unclosable. The C equivalent is

    fclose(file[handle]) ? -1 : 0

with the additional side effect of setting file[handle] = NULL.

• Fread(handle, buffer, size). The file handle should have been opened with mode TextRead, BinaryRead, or BinaryReadWrite. The next size characters are read into MMIX's memory starting at address buffer. If an error occurs, the value -1 - size is returned; otherwise, if the end of file does not intervene, 0 is returned; otherwise the negative value n - size is returned, where n is the number of characters successfully read and stored. The statement

    fread(buffer, 1, size, file[handle]) - size

has the equivalent effect in C, in the absence of file errors.

• Fgets(handle, buffer, size). The file handle should have been opened with mode TextRead, BinaryRead, or BinaryReadWrite. Characters are read into MMIX's memory starting at address buffer, until either size - 1 characters have been read and stored or a newline character has been read and stored; the next byte in memory is then set to zero. If an error or end of file occurs before reading is complete, the memory contents are undefined and the value -1 is returned; otherwise the number of characters successfully read and stored is returned. The equivalent in C is

    fgets(buffer, size, file[handle]) ? strlen(buffer) : -1

if we assume that no null characters were read in; null characters may, however, precede a newline, and they are counted just like other characters.

• Fgetws(handle, buffer, size). This command is the same as Fgets, except that it applies to wyde characters instead of one-byte characters. Up to size - 1 wyde characters are read; a wyde newline is #000a. The C version, using conventions of the ISO multibyte string extension (MSE), is approximately

    fgetws(buffer, size, file[handle]) ? wcslen(buffer) : -1

where buffer now has type wchar_t *.

• Fwrite(handle, buffer, size). The file handle should have been opened with one of the modes TextWrite, BinaryWrite, or BinaryReadWrite. The next size characters are written from MMIX's memory starting at address buffer. If no error occurs, 0 is returned; otherwise the negative value n - size is returned, where n is the number of characters successfully written. The statement

    fwrite(buffer, 1, size, file[handle]) - size

together with fflush(file[handle]) has the equivalent effect in C.

• Fputs(handle, string). The file handle should have been opened with one of the modes TextWrite, BinaryWrite, or BinaryReadWrite. One-byte characters are written from MMIX's memory to the file, starting at address string, up to but not including the first byte equal to zero. The number of bytes written is returned, or -1 on error. The C version is

    fputs(string, file[handle]) >= 0 ? strlen(string) : -1,

together with fflush(file[handle]).

• Fputws(handle, string). The file handle should have been opened with one of the modes TextWrite, BinaryWrite, or BinaryReadWrite. Wyde characters are written from MMIX's memory to the file, starting at address string, up to but not including the first wyde equal to zero. The number of wydes written is returned, or -1 on error. The C+MSE version is

    fputws(string, file[handle]) >= 0 ? wcslen(string) : -1

together with fflush(file[handle]), where string now has type wchar_t *.

• Fseek(handle, offset). The file handle should have been opened with one of the modes BinaryRead, BinaryWrite, or BinaryReadWrite. This operation causes the next input or output operation to begin at offset bytes from the beginning of the file, if offset ≥ 0, or at -offset - 1 bytes before the end of the file, if offset < 0. (For example, offset = 0 "rewinds" the file to its very beginning; offset = -1 moves forward all the way to the end.) The result is 0 if successful, or -1 if the stated positioning could not be done. The C version is

    fseek(file[handle], offset < 0 ? offset + 1 : offset, offset < 0 ? SEEK_END : SEEK_SET) ? -1 : 0.

If a file in mode BinaryReadWrite is used for both reading and writing, an Fseek command must be given when switching from input to output or from output to input.

• Ftell(handle). The file handle should have been opened with mode BinaryRead, BinaryWrite, or BinaryReadWrite. This operation returns the current file position, measured in bytes from the beginning, or -1 if an error has occurred. In this case the C function ftell(file[handle]) has exactly the same meaning.

Although these ten operations are quite primitive, they provide the necessary functionality for extremely complex input/output behavior. For example, every function in the stdio library of C, with the exception of the two administrative operations remove and rename, can be implemented as a subroutine in terms of the six basic operations Fopen, Fclose, Fread, Fwrite, Fseek, and Ftell.

Notice that the MMIX function calls are much more consistent than those in the C library. The first argument is always a handle; the second, if present, is always an address; the third, if present, is always a size. The result returned is always nonnegative if the operation was successful, negative if an anomaly arose. These common features make the functions reasonably easy to remember.

5. The ten input/output operations of the previous section are invoked by TRAP commands with X = 0, Y = Fopen or Fclose or ... or Ftell, and Z = Handle. If there are two arguments, the second argument is placed in $255. If there are three arguments, the address of the second is placed in $255; the second argument is M[$255]₈ and the third argument is M[$255 + 8]₈. The returned value will be in $255 when the system call is finished. (See the example below.)

6. The user program starts at symbolic location Main. At this time the global registers are initialized according to the GREG statements in the MMIXAL program, and $255 is set to the numeric equivalent of Main. Local register $0 is initially set to the number of command-line arguments; and local register $1 points to the first such argument, which is always a pointer to the program name. Each command-line argument is a pointer to a string; the last such pointer is M[$0 ≪ 3 + $1]₈, and M[$0 ≪ 3 + $1 + 8]₈ is zero. (Register $1 will point to an octabyte in Pool_Segment, and the command-line strings will be in that segment too.) Location M[Pool_Segment] will be the address of the first unused octabyte of the pool segment.

Registers rA, rB, rD, rE, rF, rH, rI, rJ, rM, rP, rQ, and rR are initially zero, and rL = 2.

A subroutine library loaded with the user program might need to initialize itself. If an instruction has been loaded into tetrabyte M[#90]₄, the simulator actually begins execution at #90 instead of at Main; in this case $255 holds the location of Main. (The routine at #90 can pass control to Main without increasing rL, if it starts with the slightly tricky sequence

    PUT rW,$255; PUT rB,$255; SETML $255,#F700; PUT rX,$255

and eventually says RESUME; this RESUME command will restore $255 and rB. But the user program should not really count on the fact that rL is initially 2.)

7. The main program ends when MMIX executes the system call TRAP 0, which is often symbolically written `TRAP 0,Halt,0' to make its intention clear. The contents of $255 at that time are considered to be the value "returned" by the main program, as in the exit statement of C; a nonzero value indicates an anomalous exit. All open files are closed when the program ends.

Here. for example. is a complete program that copies a text .MMIX-SIM: RUDIMENTARY I/O 340 8.

le to the standard output. given the name of the .

8 openit: s=argv[1] STOU s.OpenIt GETA t.3 PBNN t.TextRead OCTA Buffer.#a.1 quit: exit(-1) TRAP 0.2F fputs("!\n".0 LOC (@+3)&-4 align to tetrabyte BYTE " filename".Arg1 copyit: TRAP 0.StdErr GETA t.stderr) TRAP 0.le to be copied.file[3]) TRAP 0.0 LDOU s.0.Arg0 fopen(argv[1].Arg0 LDA t.Fputs.argv) { CMP t.1F fputs("Usage: ".s fputs(argv[1].Halt.1F fputs("Can't open file ".0 BYTE "Usage: ".StdErr LDOU t.stderr) TRAP 0.Buf_Size LOC #200 main(argc.Fputs."r".Fputs.StdErr NEG t.1.stderr) TRAP 0.CopyIt if (no error) goto copyit GETA t.file[3]) .Fread.3 items=fread(buffer.0 LOC (@+3)&-4 align to tetrabyte BYTE "!".buf_size.stderr) JMP Quit goto quit BYTE "Can't open file ".Fputs.Fopen. * SAMPLE t argc argv s Buf_Size Buffer Arg0 Arg1 Main Quit 1H 2H OpenIt 1H 2H CopyIt PROGRAM: COPY A GIVEN FILE TO STANDARD OUTPUT IS $255 IS $0 IS $1 IS $2 IS 1000 LOC Data_Segment LOC @+Buf_Size GREG @ OCTA 0.#a.stderr) TRAP 0.argv.2 if (argc==2) goto openit PBZ t.stderr) TRAP 0.StdErr SET t.2F fputs(" filename\n".argv.StdErr GETA t.argc. It includes all necessary error checking.0 fputs(argv[0].0 LDA t.Fputs.

.1F readerr: fputs("Trouble r.Arg1 items=fwrite(buffer.StdOut t.stdout) 0.Arg1+8 t.buf_size.1.stdout) 0.stderr) Quit goto quit } "Trouble reading!".ReadErr if (ferror(file[3])) goto readerr t...1F trouble: fputs("Trouble w.items.1.Halt.stderr) Quit goto quit "Trouble writing StdOut!".Fwrite.Arg1 n=fwrite(buffer.Fwrite.CopyIt if (items >= buf_size) goto copyit t..StdOut t.EndIt if (items < buf_size) goto endit t.!".341 Trouble 1H EndIt ReadErr 1H MMIX-SIM: BN LDA TRAP PBNN GETA JMP BYTE INCL BN STO LDA TRAP BN TRAP GETA JMP BYTE RUDIMENTARY I/O t.0 .0 exit(0) t.0 t.#a.#a.Trouble if (n < items) goto trouble 0.!".Buf_Size t.

9. Basics. To get started, we define a type that provides semantic sugar.

⟨Type declarations 9⟩ ≡
    typedef enum { false, true } bool;

See also sections 10, 16, 38, 39, 54, 55, 59, 64, and 135.
This code is used in section 141.

10. This program for the 64-bit MMIX architecture is based on 32-bit integer arithmetic, because nearly every computer available to the author at the time of writing (1999) was limited in that way. It uses subroutines from the MMIX-ARITH module, assuming only that type tetra represents unsigned 32-bit integers. The definition of tetra given here should be changed, if necessary, to agree with the definition in that module.

⟨Type declarations 9⟩ +≡
    typedef unsigned int tetra;   /* for systems conforming to the LP-64 data model */
    typedef struct { tetra h, l; } octa;   /* two tetrabytes makes one octabyte */
    typedef unsigned char byte;   /* a monobyte */

11. We declare subroutines twice, once with a prototype and once with the oldstyle C conventions. The following hack makes this work with new compilers as well as the old standbys.

⟨Preprocessor macros 11⟩ ≡
    #ifdef __STDC__
    #define ARGS(list) list
    #else
    #define ARGS(list) ()
    #endif

See also sections 43 and 46.
This code is used in section 141.

12. ⟨Subroutines 12⟩ ≡
    void print_hex ARGS((octa));
    void print_hex(o)
      octa o;
    {
      if (o.h) printf("%x%08x", o.h, o.l);
      else printf("%x", o.l);
    }

See also sections 13, 15, 17, 20, 26, 27, 42, 45, 47, 50, 82, 83, 91, 114, 117, 120, 137, 140, 143, 148, 154, 160, 162, 165, and 166.
This code is used in section 141.

13. Most of the subroutines in MMIX-ARITH return an octabyte as a function of two octabytes; for example, oplus(y, z) returns the sum of octabytes y and z. Division inputs the high half of a dividend in the global variable aux and returns the remainder in aux.

⟨Subroutines 12⟩ +≡
    extern octa zero_octa;    /* zero_octa.h = zero_octa.l = 0 */
    extern octa neg_one;      /* neg_one.h = neg_one.l = -1 */
    extern octa aux, val;     /* auxiliary data */
    extern bool overflow;     /* flag set by signed multiplication and division */
    extern int exceptions;    /* bits set by floating point operations */
    extern int cur_round;     /* the current rounding mode */
    extern char *next_char;   /* where a scanned constant ended */
    extern octa oplus ARGS((octa y, octa z));                /* unsigned y + z */
    extern octa ominus ARGS((octa y, octa z));               /* unsigned y - z */
    extern octa incr ARGS((octa y, int delta));              /* unsigned y + δ (δ is signed) */
    extern octa oand ARGS((octa y, octa z));                 /* y ∧ z */
    extern octa shift_left ARGS((octa y, int s));            /* y ≪ s, 0 ≤ s ≤ 64 */
    extern octa shift_right ARGS((octa y, int s, int uns));  /* y ≫ s, signed if !uns */
    extern octa omult ARGS((octa y, octa z));                /* unsigned (aux, x) = y × z */
    extern octa signed_omult ARGS((octa y, octa z));         /* signed x = y × z */
    extern octa odiv ARGS((octa x, octa y, octa z));         /* unsigned (x, y)/z; aux = (x, y) mod z */
    extern octa signed_odiv ARGS((octa y, octa z));          /* signed x = y/z */
    extern int count_bits ARGS((tetra z));                   /* x = ν(z) */
    extern tetra byte_diff ARGS((tetra y, tetra z));         /* half of BDIF */
    extern tetra wyde_diff ARGS((tetra y, tetra z));         /* half of WDIF */
    extern octa bool_mult ARGS((octa y, octa z, bool xor));  /* MOR or MXOR */
    extern octa load_sf ARGS((tetra z));                     /* load short float */
    extern tetra store_sf ARGS((octa x));                    /* store short float */
    extern octa fplus ARGS((octa y, octa z));                /* floating point x = y ⊕ z */
    extern octa fmult ARGS((octa y, octa z));                /* floating point x = y ⊗ z */
    extern octa fdivide ARGS((octa y, octa z));              /* floating point x = y ⊘ z */
    extern octa froot ARGS((octa, int));                     /* floating point x = √z */
    extern octa fremstep ARGS((octa y, octa z, int delta));  /* floating point x rem z = y rem z */
    extern octa fintegerize ARGS((octa z, int mode));        /* floating point x = round(z) */
    extern int fcomp ARGS((octa y, octa z));                 /* -1, 0, 1, or 2 if y < z, y = z, y > z, y ∥ z */
    extern int fepscomp ARGS((octa y, octa z, octa eps, int sim));  /* x = sim ? [y ∼ z (ε)] : [y ≈ z (ε)] */
    extern octa floatit ARGS((octa z, int mode, int unsgnd, int shrt));  /* fix to float */
    extern octa fixit ARGS((octa z, int mode));              /* float to fix */
    extern void print_float ARGS((octa z));                  /* print octabyte as floating decimal */
    extern int scan_const ARGS((char *buf));                 /* val = floating or integer constant; returns the type */

14. Here's a quick check to see if arithmetic is in trouble.

    #define panic(m) { fprintf(stderr, "Panic: %s!\n", m); exit(-2); }

⟨Initialize everything 14⟩ ≡
    if (shift_left(neg_one, 1).h != 0xffffffff)
      panic("Incorrect implementation of type tetra");

See also sections 18, 24, 32, 41, 77, and 147.
This code is used in section 141.

15. Binary-to-decimal conversion is used when we want to see an octabyte as a signed integer. The identity $\lfloor (an+b)/10 \rfloor = \lfloor a/10 \rfloor\, n + \lfloor ((a \bmod 10)\,n + b)/10 \rfloor$ is helpful here.

    #define sign_bit ((unsigned) 0x80000000)

⟨Subroutines 12⟩ +≡
    void print_int ARGS((octa));
    void print_int(o)
      octa o;
    {
      register tetra hi = o.h, lo = o.l, r, t;
      register int j;
      char dig[20];
      if (lo == 0 && hi == 0) printf("0");
      else {
        if (hi & sign_bit) {
          printf("-");
          if (lo == 0) hi = -hi;
          else lo = -lo, hi = ~hi;
        }
        for (j = 0; hi; j++) {   /* 64-bit division by 10 */
          r = ((hi % 10) << 16) + (lo >> 16);
          hi = hi / 10;
          t = ((r % 10) << 16) + (lo & 0xffff);
          lo = ((r / 10) << 16) + (t / 10);
          dig[j] = t % 10;
        }
        for ( ; lo; j++) {
          dig[j] = lo % 10;
          lo = lo / 10;
        }
        for (j--; j >= 0; j--) printf("%c", dig[j] + '0');
      }
    }

16. Simulated memory. Chunks of simulated memory, 2K bytes each, are kept in a tree structure organized as a treap, following ideas of Vuillemin, Aragon, and Seidel [Communications of the ACM 23 (1980), 229-239; IEEE Symp. on Foundations of Computer Science 30 (1989), 540-546]. Each node of the treap has two keys: One, called loc, is the base address of 512 simulated tetrabytes; it follows the conventions of an ordinary binary search tree, with all locations in the left subtree less than the loc of a node and all locations in the right subtree greater than that loc. The other, called stamp, can be thought of as the time the node was inserted into the tree; all subnodes of a given node have a larger stamp. By assigning time stamps at random, we maintain a tree structure that almost always is fairly well balanced.

Each simulated tetrabyte has an associated frequency count and source file reference.

⟨Type declarations 9⟩ +≡
    typedef struct {
      tetra tet;                /* the tetrabyte of simulated memory */
      tetra freq;               /* the number of times it was obeyed as an instruction */
      unsigned char bkpt;       /* breakpoint information for this tetrabyte */
      unsigned char file_no;    /* source file number, if known */
      unsigned short line_no;   /* source line number, if known */
    } mem_tetra;
    typedef struct mem_node_struct {
      octa loc;                 /* location of the first of 512 simulated tetrabytes */
      tetra stamp;              /* time stamp for treap balancing */
      struct mem_node_struct *left, *right;   /* pointers to subtrees */
      mem_tetra dat[512];       /* the chunk of simulated tetrabytes */
    } mem_node;

17. The stamp value is actually only pseudorandom, based on the idea of Fibonacci hashing [see Sorting and Searching, Section 6.4]. This is good enough for our purposes, and it guarantees that no two stamps will be identical.

⟨Subroutines 12⟩ +≡
    mem_node *new_mem ARGS((void));
    mem_node *new_mem()
    {
      register mem_node *p;
      p = (mem_node *) calloc(1, sizeof(mem_node));
      if (!p) panic("Can't allocate any more memory");
      p->stamp = priority;
      priority += 0x9e3779b9;   /* ⌊2^32 (φ - 1)⌋ */
      return p;
    }

18. Initially we start with a chunk for the pool segment, since the simulator will be putting command-line information there before it runs the program.

⟨Initialize everything 14⟩ +≡
    mem_root = new_mem();
    mem_root->loc.h = 0x40000000;
    last_mem = mem_root;

19. ⟨Global variables 19⟩ ≡
    tetra priority = 314159265;   /* pseudorandom time stamp counter */
    mem_node *mem_root;           /* root of the treap */
    mem_node *last_mem;           /* the memory node most recently read or written */

See also sections 25, 31, 40, 48, 52, 56, 61, 65, 76, 110, 113, 121, 129, 139, 144, and 151.
This code is used in section 141.

20. The mem_find routine finds a given tetrabyte in the simulated memory, inserting a new node into the treap if necessary.

⟨Subroutines 12⟩ +≡
    mem_tetra *mem_find ARGS((octa));
    mem_tetra *mem_find(addr)
      octa addr;
    {
      octa key;
      register int offset;
      register mem_node *p = last_mem;
      key.h = addr.h;
      key.l = addr.l & 0xfffff800;
      offset = addr.l & 0x7fc;
      if (p->loc.l != key.l || p->loc.h != key.h)
        ⟨Search for key in the treap, setting last_mem and p to its location 21⟩;
      return &p->dat[offset >> 2];
    }

21. ⟨Search for key in the treap, setting last_mem and p to its location 21⟩ ≡
    { register mem_node **q;
      for (p = mem_root; p; ) {
        if (key.l == p->loc.l && key.h == p->loc.h) goto found;
        if ((key.l < p->loc.l && key.h <= p->loc.h) || key.h < p->loc.h) p = p->left;
        else p = p->right;
      }
      for (p = mem_root, q = &mem_root; p && p->stamp < priority; p = *q) {
        if ((key.l < p->loc.l && key.h <= p->loc.h) || key.h < p->loc.h) q = &p->left;
        else q = &p->right;
      }
      *q = new_mem();
      (*q)->loc = key;
      ⟨Fix up the subtrees of *q 22⟩;
      p = *q;
    found: last_mem = p;
    }

This code is used in section 20.

22. At this point we want to split the binary search tree p into two parts based on the given key, forming the left and right subtrees of the new node *q. The effect will be as if key had been inserted before all of p's nodes.

⟨Fix up the subtrees of *q 22⟩ ≡
    { register mem_node **l = &(*q)->left, **r = &(*q)->right;
      while (p) {
        if ((key.l < p->loc.l && key.h <= p->loc.h) || key.h < p->loc.h)
          *r = p, r = &p->left, p = *r;
        else *l = p, l = &p->right, p = *l;
      }
      *l = *r = NULL;
    }

This code is used in section 21.

23. Loading an object file. To get the user's program into memory, we read in an MMIX object, using modifications of the routines in the utility program MMOtype. Complete details of mmo format appear in the program for MMIXAL; a reader who hopes to understand this section ought to at least skim that documentation. Here we need to define only the basic constants used for interpretation.

#define mm 0x98  /* the escape code of mmo format */
#define lop_quote 0x0  /* the quotation lopcode */
#define lop_loc 0x1  /* the location lopcode */
#define lop_skip 0x2  /* the skip lopcode */
#define lop_fixo 0x3  /* the octabyte-fix lopcode */
#define lop_fixr 0x4  /* the relative-fix lopcode */
#define lop_fixrx 0x5  /* extended relative-fix lopcode */
#define lop_file 0x6  /* the file name lopcode */
#define lop_line 0x7  /* the file position lopcode */
#define lop_spec 0x8  /* the special hook lopcode */
#define lop_pre 0x9  /* the preamble lopcode */
#define lop_post 0xa  /* the postamble lopcode */
#define lop_stab 0xb  /* the symbol table lopcode */
#define lop_end 0xc  /* the end-it-all lopcode */

24. We do not load the symbol table. (A more ambitious simulator could implement MMIXAL-style expressions for interactive debugging, but such enhancements are left to the interested reader.)

⟨Initialize everything 14⟩ +≡
  mmo_file = fopen(mmo_file_name, "rb");
  if (!mmo_file) {
    register char *alt_name = (char *) calloc(strlen(mmo_file_name) + 5, sizeof(char));
    sprintf(alt_name, "%s.mmo", mmo_file_name);
    mmo_file = fopen(alt_name, "rb");
    if (!mmo_file) {
      fprintf(stderr, "Can't open the object file %s or %s!\n", mmo_file_name, alt_name);
      exit(-3);
    }
    free(alt_name);
  }
  byte_count = 0;

25. ⟨Global variables 19⟩ +≡
  FILE *mmo_file;  /* the input file */
  int postamble;  /* have we encountered lop_post? */
  int byte_count;  /* index of the next-to-be-read byte */
  byte buf[4];  /* the most recently read bytes */
  int yzbytes;  /* the two least significant bytes */
  int delta;  /* difference for relative fixup */
  tetra tet;  /* buf bytes packed big-endianwise */

26. The tetrabytes of an mmo file are stored in friendly big-endian fashion, but this program is supposed to work also on computers that are little-endian. Therefore we read four successive bytes and pack them into a tetrabyte, instead of reading a single tetrabyte.

#define mmo_err { \
    fprintf(stderr, "Bad object file! (Try running MMOtype.)\n"); \
    exit(-4); \
  }

⟨Subroutines 12⟩ +≡
  void read_tet ARGS((void));
  void read_tet()
  {
    if (fread(buf, 1, 4, mmo_file) != 4) mmo_err;
    yzbytes = (buf[2] << 8) + buf[3];
    tet = (((buf[0] << 8) + buf[1]) << 16) + yzbytes;
  }

27. ⟨Subroutines 12⟩ +≡
  byte read_byte ARGS((void));
  byte read_byte()
  {
    register byte b;
    if (!byte_count) read_tet();
    b = buf[byte_count];
    byte_count = (byte_count + 1) & 3;
    return b;
  }

28. ⟨Load the preamble 28⟩ ≡
  read_tet();  /* read the first tetrabyte of input */
  if (buf[0] != mm || buf[1] != lop_pre) mmo_err;
  if (ybyte != 1) mmo_err;
  if (zbyte == 0) obj_time = 0xffffffff;
  else {
    j = zbyte - 1;
    read_tet(); obj_time = tet;  /* file creation time */
    for ( ; j > 0; j--) read_tet();
  }
This code is used in section 32.
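The byte-order-independent packing done by read_tet, together with the test for the escape byte, can be tried out without any file input. This is a minimal sketch rather than the simulator's own code; the names MM, pack_tetra, and lopcode_of are hypothetical, and fixed-width types from <stdint.h> stand in for byte and tetra.

```c
#include <assert.h>
#include <stdint.h>

#define MM 0x98  /* escape code of mmo format, as in section 23 */

/* Pack four bytes, most significant first.  The result does not depend
   on the byte order of the host machine. */
static uint32_t pack_tetra(const unsigned char b[4]) {
    uint32_t yz = ((uint32_t)b[2] << 8) + b[3];
    return ((((uint32_t)b[0] << 8) + b[1]) << 16) + yz;
}

/* Return the lopcode (second byte) if the tetrabyte begins with the
   escape byte; return -1 for an ordinary tetrabyte. */
static int lopcode_of(uint32_t tet) {
    int lop;
    if ((tet >> 24) != MM) return -1;
    lop = (int)((tet >> 16) & 0xff);
    assert(lop >= 0 && lop <= 0xff);
    return lop;
}
```

For example, the bytes 98 09 01 01 pack to 0x98090101, whose lopcode is 9 (lop_pre), the form that section 28 expects of the first tetrabyte.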

29. ⟨Load the next item 29⟩ ≡
  { read_tet();
  loop: if (buf[0] == mm) switch (buf[1]) {
    case lop_quote: if (yzbytes != 1) mmo_err;
      read_tet(); break;
    ⟨Cases for lopcodes in the main loop 33⟩
    case lop_post: postamble = 1;
      if (ybyte || zbyte < 32) mmo_err;
      continue;
    default: mmo_err;
    }
    ⟨Load tet as a normal item 30⟩;
  }
This code is used in section 32.

30. In a normal situation, the newly read tetrabyte is simply supposed to be loaded into the current location. We load not only the current location but also the current file position, if cur_line is nonzero and cur_loc belongs to segment 0.

#define mmo_load(loc, val) ll = mem_find(loc), ll->tet = val

⟨Load tet as a normal item 30⟩ ≡
  { mmo_load(cur_loc, tet);
    if (cur_line) {
      ll->file_no = cur_file;
      ll->line_no = cur_line;
      cur_line++;
    }
    cur_loc = incr(cur_loc, 4); cur_loc.l &= -4;
  }
This code is used in section 29.

31. ⟨Global variables 19⟩ +≡
  octa cur_loc;  /* the current location */
  int cur_file = -1;  /* the most recently selected file number */
  int cur_line;  /* the current position in cur_file, if nonzero */
  octa tmp;  /* an octabyte of temporary interest */
  tetra obj_time;  /* when the object file was created */

32. ⟨Initialize everything 14⟩ +≡
  cur_loc.h = cur_loc.l = 0;
  cur_file = -1;
  cur_line = 0;
  ⟨Load the preamble 28⟩;
  do ⟨Load the next item 29⟩ while (!postamble);
  ⟨Load the postamble 37⟩;
  fclose(mmo_file);
  cur_line = 0;

33. We have already implemented lop_quote, which falls through to the normal case after reading an extra tetrabyte. Now let's consider the other lopcodes in turn.

#define ybyte buf[2]  /* the next-to-least significant byte */
#define zbyte buf[3]  /* the least significant byte */

⟨Cases for lopcodes in the main loop 33⟩ ≡
  case lop_loc: if (zbyte == 2) {
      j = ybyte;  /* save the Y byte, because read_tet overwrites buf */
      read_tet(); cur_loc.h = (j << 24) + tet;
    } else if (zbyte == 1) cur_loc.h = ybyte << 24;
    else mmo_err;
    read_tet(); cur_loc.l = tet;
    continue;
  case lop_skip: cur_loc = incr(cur_loc, yzbytes); continue;
See also sections 34, 35, and 36.
This code is used in section 29.

34. Fixups load information out of order, when future references have been resolved. The current file name and line number are not considered relevant.

⟨Cases for lopcodes in the main loop 33⟩ +≡
  case lop_fixo: if (zbyte == 2) {
      j = ybyte;  /* save the Y byte, because read_tet overwrites buf */
      read_tet(); tmp.h = (j << 24) + tet;
    } else if (zbyte == 1) tmp.h = ybyte << 24;
    else mmo_err;
    read_tet(); tmp.l = tet;
    mmo_load(tmp, cur_loc.h);
    mmo_load(incr(tmp, 4), cur_loc.l);
    continue;
  case lop_fixr: delta = yzbytes; goto fixr;
  case lop_fixrx: j = yzbytes; if (j != 16 && j != 24) mmo_err;
    read_tet(); delta = tet;
    if (delta & 0xfe000000) mmo_err;
  fixr: tmp = incr(cur_loc, -(delta >= 0x1000000 ? (delta & 0xffffff) - (1 << j) : delta) << 2);
    mmo_load(tmp, delta);
    continue;
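The address computation at the label fixr packs several steps into one expression, so a worked version may help. The following is a sketch under stated assumptions, not the simulator's code: fixr_offset is a hypothetical name, long arithmetic replaces the octabyte routine incr, and multiplication by 4 replaces the left shift so that negative values are handled portably.

```c
#include <assert.h>

/* Byte offset, relative to cur_loc, of the instruction to be fixed up.
   delta is the tetrabyte count read from the object file and j (16 or 24)
   is the width of the relative address field.  When bit 24 of delta is
   set, its low 24 bits minus 2^j denote a negative count. */
static long fixr_offset(long delta, int j) {
    long d;
    assert(j == 16 || j == 24);
    d = (delta >= 0x1000000L) ? (delta & 0xffffffL) - (1L << j) : delta;
    return -d * 4;  /* convert tetrabytes to bytes, counting backward */
}
```

With delta = 5 the fixup lands 20 bytes before cur_loc; with delta = 0x100fffe and j = 16 the count is -2, so the fixup lands 8 bytes after cur_loc.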

35. The space for file names isn't allocated until we are sure we need it.

⟨Cases for lopcodes in the main loop 33⟩ +≡
  case lop_file: if (file_info[ybyte].name) {
      if (zbyte) mmo_err;
      cur_file = ybyte;
    } else {
      if (!zbyte) mmo_err;
      file_info[ybyte].name = (char *) calloc(4 * zbyte + 1, 1);
      if (!file_info[ybyte].name) {
        fprintf(stderr, "No room to store the file name!\n"); exit(-5);
      }
      cur_file = ybyte;
      for (j = zbyte, p = file_info[ybyte].name; j > 0; j--, p += 4) {
        read_tet();
        *p = buf[0]; *(p + 1) = buf[1]; *(p + 2) = buf[2]; *(p + 3) = buf[3];
      }
    }
    cur_line = 0; continue;
  case lop_line: if (cur_file < 0) mmo_err;
    cur_line = yzbytes; continue;

36. Special bytes are ignored (at least for now).

⟨Cases for lopcodes in the main loop 33⟩ +≡
  case lop_spec: while (1) {
      read_tet();
      if (buf[0] == mm) {
        if (buf[1] != lop_quote || yzbytes != 1) goto loop;  /* end of special data */
        read_tet();
      }
    }

37. Since a chunk of memory holds 512 tetrabytes, the ll pointer in the following loop stays in the same chunk (namely, the first chunk of segment 3, also known as Stack_Segment).

⟨Load the postamble 37⟩ ≡
  aux.h = 0x60000000; aux.l = 0x18;
  ll = mem_find(aux);
  (ll - 1)->tet = 2;  /* this will ultimately set rL = 2 */
  (ll - 5)->tet = argc;  /* and $0 = argc */
  (ll - 4)->tet = 0x40000000;
  (ll - 3)->tet = 0x8;  /* and $1 = Pool_Segment + 8 */
  G = zbyte; L = 0;
  for (j = G + G; j < 256 + 256; j++, ll++, aux.l += 4) read_tet(), ll->tet = tet;
  inst_ptr.h = (ll - 2)->tet, inst_ptr.l = (ll - 1)->tet;  /* Main */
  (ll + 2 * 12)->tet = G << 24;
  g[255] = incr(aux, 12 * 8);  /* we will UNSAVE from here, to get going */
This code is used in section 32.

38. Loading and printing source lines. The loaded program generally contains cross references to the lines of symbolic source files, so that the context of each instruction can be understood. The following sections of this program make such information available when it is desired.

Source file data is kept in a file_node structure:

⟨Type declarations 9⟩ +≡
  typedef struct {
    char *name;  /* name of source file */
    int line_count;  /* number of lines in the file */
    long *map;  /* pointer to map of file positions */
  } file_node;

39. In partial preparation for the day when source files are in Unicode, we define a type Char for the source characters.

⟨Type declarations 9⟩ +≡
  typedef char Char;  /* bytes that will become wydes some day */

40. ⟨Global variables 19⟩ +≡
  file_node file_info[256];  /* data about each source file */
  int buf_size;  /* size of buffer for source lines */
  Char *buffer;

41. As in MMIXAL, we prefer source lines of length 72 characters or less, but the user is allowed to increase the limit. (Longer lines will silently be truncated to the buffer size when the simulator lists them.)

⟨Initialize everything 14⟩ +≡
  if (buf_size < 72) buf_size = 72;
  buffer = (Char *) calloc(buf_size + 1, sizeof(Char));

42. The first time we are called upon to list a line from a given source file, we make a map of starting locations for each line. Source files should contain at most 65535 lines. We assume that they contain no null characters.

⟨Subroutines 12⟩ +≡
  void make_map ARGS((void));
  void make_map()
  {
    long map[65536];
    register int k, l;
    register long *p;
    ⟨Check if the source file has been modified 44⟩;
    for (l = 1; l < 65536 && !feof(src_file); l++) {
      map[l] = ftell(src_file);
    loop: if (!fgets(buffer, buf_size, src_file)) break;
      if (buffer[strlen(buffer) - 1] != '\n') goto loop;
    }
    file_info[cur_file].line_count = l;
    file_info[cur_file].map = p = (long *) calloc(l, sizeof(long));
    if (!p) panic("No room for a source-line map");
    for (k = 1; k < l; k++) p[k] = map[k];
  }

43. We want to warn the user if the source file has changed since the object file was written. The standard C library doesn't provide the information we need; so we use the UNIX system function stat, in hopes that other operating systems provide a similar way to do the job.

⟨Preprocessor macros 11⟩ +≡
#include <sys/types.h>
#include <sys/stat.h>

44. ⟨Check if the source file has been modified 44⟩ ≡
  { struct stat stat_buf;
    if (stat(file_info[cur_file].name, &stat_buf) >= 0)
      if ((tetra) stat_buf.st_mtime > obj_time)
        fprintf(stderr, "Warning: File %s was modified; it may not match the program!\n",
          file_info[cur_file].name);
  }
This code is used in section 42.

45. Source lines are listed by the print_line routine, preceded by 12 characters containing the line number. If a file error occurs, nothing is printed, not even an error message; the absence of listed data is itself a message.

⟨Subroutines 12⟩ +≡
  void print_line ARGS((int));
  void print_line(k)
    int k;
  {
    char buf[11];
    if (k >= file_info[cur_file].line_count) return;
    if (fseek(src_file, file_info[cur_file].map[k], SEEK_SET) != 0) return;
    if (!fgets(buffer, buf_size, src_file)) return;
    sprintf(buf, "%d:    ", k);
    printf("line %.6s %s", buf, buffer);
    if (buffer[strlen(buffer) - 1] != '\n') printf("\n");
    line_shown = true;
  }

46. ⟨Preprocessor macros 11⟩ +≡
#ifndef SEEK_SET
#define SEEK_SET 0  /* Set file pointer to "offset" */
#endif

47. The show_line routine is called when we want to output line cur_line of source file number cur_file, assuming that cur_line ≠ 0. Its job is primarily to maintain continuity, by opening or reopening the src_file if the source file changes, and by connecting the previously output lines to the new one. Sometimes no output is necessary, because the desired line has already been printed.

⟨Subroutines 12⟩ +≡
  void show_line ARGS((void));
  void show_line()
  {
    register int k;
    if (shown_file != cur_file) ⟨Prepare to list lines from a new source file 49⟩
    else if (shown_line == cur_line) return;  /* already shown */
    if (cur_line > shown_line + gap + 1 || cur_line < shown_line) {
      if (shown_line > 0)
        if (cur_line < shown_line) printf("--------\n");  /* indicate upward move */
        else printf("     ...\n");  /* indicate the gap */
      print_line(cur_line);
    } else for (k = shown_line + 1; k <= cur_line; k++) print_line(k);
    shown_line = cur_line;
  }
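The continuity rule applied by show_line reduces to a small decision on three integers. The sketch below isolates that decision under the assumption that the source file itself has not changed; the enumeration listing_action and the function choose_listing are hypothetical names introduced for illustration only.

```c
#include <assert.h>

typedef enum {
    SHOW_NOTHING,            /* the line was listed already */
    SHOW_RUN,                /* list lines shown_line+1 through cur_line */
    SHOW_AFTER_GAP,          /* list only cur_line, after skipping ahead */
    SHOW_AFTER_UPWARD_MOVE   /* list only cur_line, after moving backward */
} listing_action;

/* Decide how cur_line should be listed, given the line most recently
   listed from the same file (0 if none) and the gap parameter. */
static listing_action choose_listing(int shown_line, int cur_line, int gap) {
    assert(cur_line > 0 && shown_line >= 0 && gap >= 0);
    if (shown_line == cur_line) return SHOW_NOTHING;
    if (cur_line < shown_line) return SHOW_AFTER_UPWARD_MOVE;
    if (cur_line > shown_line + gap + 1) return SHOW_AFTER_GAP;
    return SHOW_RUN;
}
```

In show_line itself the two marker lines are printed only when shown_line is positive, so the first line listed from a newly opened file appears without a marker.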

48. ⟨Global variables 19⟩ +≡
  FILE *src_file;  /* the currently open source file */
  int shown_file = -1;  /* index of the most recently listed file */
  int shown_line;  /* the line most recently listed in shown_file */
  int gap;  /* minimum gap between consecutively listed source lines */
  bool line_shown;  /* did we list anything recently? */
  bool showing_source;  /* are we listing source lines? */
  int profile_gap;  /* the gap when printing final frequencies */
  bool profile_showing_source;  /* showing_source within final frequencies */

49. ⟨Prepare to list lines from a new source file 49⟩ ≡
  { if (!src_file) src_file = fopen(file_info[cur_file].name, "r");
    else freopen(file_info[cur_file].name, "r", src_file);
    if (!src_file) {
      fprintf(stderr, "Warning: I can't open file %s; source listing omitted.\n",
        file_info[cur_file].name);
      showing_source = false;
      return;
    }
    printf("\"%s\"\n", file_info[cur_file].name);
    shown_file = cur_file;
    shown_line = 0;
    if (!file_info[cur_file].map) make_map();
  }
This code is used in section 47.

50. Here is a simple application of show_line. It is a recursive routine that prints the frequency counts of all instructions that occur in a given subtree of the simulated memory and that were executed at least once. The subtree is traversed in symmetric order; therefore the frequencies appear in increasing order of the instruction locations.

⟨Subroutines 12⟩ +≡
  void print_freqs ARGS((mem_node *));
  void print_freqs(p)
    mem_node *p;
  {
    register int j;
    octa cur_loc;
    if (p->left) print_freqs(p->left);
    for (j = 0; j < 512; j++) if (p->dat[j].freq)
      ⟨Print frequency data for location p->loc + 4 * j 51⟩;
    if (p->right) print_freqs(p->right);
  }

51. An ellipsis (...) is printed between frequency data for nonconsecutive instructions, unless source line information intervenes.

⟨Print frequency data for location p->loc + 4 * j 51⟩ ≡
  { cur_loc = incr(p->loc, 4 * j);
    if (showing_source && p->dat[j].line_no) {
      cur_file = p->dat[j].file_no, cur_line = p->dat[j].line_no;
      line_shown = false;
      show_line();
      if (line_shown) goto loc_implied;
    }
    if (cur_loc.l != implied_loc.l || cur_loc.h != implied_loc.h)
      if (profile_started) printf("         0.        ...\n");
  loc_implied: printf("%10d. %08x%08x: %08x (%s)\n",
      p->dat[j].freq, cur_loc.h, cur_loc.l, p->dat[j].tet, info[p->dat[j].tet >> 24].name);
    implied_loc = incr(cur_loc, 4); profile_started = true;
  }
This code is used in section 50.

52. ⟨Global variables 19⟩ +≡
  octa implied_loc;  /* location following the last shown frequency data */
  bool profile_started;  /* have we printed at least one frequency count? */

53. ⟨Print all the frequency counts 53⟩ ≡
  { printf("\nProgram profile:\n");
    shown_file = cur_file = -1;
    shown_line = cur_line = 0;
    gap = profile_gap;
    showing_source = profile_showing_source;
    implied_loc = neg_one;
    print_freqs(mem_root);
  }
This code is used in section 141.

54. Lists. This simulator needs to deal with 256 different opcodes, so we might as well enumerate them now.

⟨Type declarations 9⟩ +≡
  typedef enum {
    TRAP, FCMP, FUN, FEQL, FADD, FIX, FSUB, FIXU,
    FLOT, FLOTI, FLOTU, FLOTUI, SFLOT, SFLOTI, SFLOTU, SFLOTUI,
    FMUL, FCMPE, FUNE, FEQLE, FDIV, FSQRT, FREM, FINT,
    MUL, MULI, MULU, MULUI, DIV, DIVI, DIVU, DIVUI,
    ADD, ADDI, ADDU, ADDUI, SUB, SUBI, SUBU, SUBUI,
    IIADDU, IIADDUI, IVADDU, IVADDUI, VIIIADDU, VIIIADDUI, XVIADDU, XVIADDUI,
    CMP, CMPI, CMPU, CMPUI, NEG, NEGI, NEGU, NEGUI,
    SL, SLI, SLU, SLUI, SR, SRI, SRU, SRUI,
    BN, BNB, BZ, BZB, BP, BPB, BOD, BODB,
    BNN, BNNB, BNZ, BNZB, BNP, BNPB, BEV, BEVB,
    PBN, PBNB, PBZ, PBZB, PBP, PBPB, PBOD, PBODB,
    PBNN, PBNNB, PBNZ, PBNZB, PBNP, PBNPB, PBEV, PBEVB,
    CSN, CSNI, CSZ, CSZI, CSP, CSPI, CSOD, CSODI,
    CSNN, CSNNI, CSNZ, CSNZI, CSNP, CSNPI, CSEV, CSEVI,
    ZSN, ZSNI, ZSZ, ZSZI, ZSP, ZSPI, ZSOD, ZSODI,
    ZSNN, ZSNNI, ZSNZ, ZSNZI, ZSNP, ZSNPI, ZSEV, ZSEVI,
    LDB, LDBI, LDBU, LDBUI, LDW, LDWI, LDWU, LDWUI,
    LDT, LDTI, LDTU, LDTUI, LDO, LDOI, LDOU, LDOUI,
    LDSF, LDSFI, LDHT, LDHTI, CSWAP, CSWAPI, LDUNC, LDUNCI,
    LDVTS, LDVTSI, PRELD, PRELDI, PREGO, PREGOI, GO, GOI,
    STB, STBI, STBU, STBUI, STW, STWI, STWU, STWUI,
    STT, STTI, STTU, STTUI, STO, STOI, STOU, STOUI,
    STSF, STSFI, STHT, STHTI, STCO, STCOI, STUNC, STUNCI,
    SYNCD, SYNCDI, PREST, PRESTI, SYNCID, SYNCIDI, PUSHGO, PUSHGOI,
    OR, ORI, ORN, ORNI, NOR, NORI, XOR, XORI,
    AND, ANDI, ANDN, ANDNI, NAND, NANDI, NXOR, NXORI,
    BDIF, BDIFI, WDIF, WDIFI, TDIF, TDIFI, ODIF, ODIFI,
    MUX, MUXI, SADD, SADDI, MOR, MORI, MXOR, MXORI,
    SETH, SETMH, SETML, SETL, INCH, INCMH, INCML, INCL,
    ORH, ORMH, ORML, ORL, ANDNH, ANDNMH, ANDNML, ANDNL,
    JMP, JMPB, PUSHJ, PUSHJB, GETA, GETAB, PUT, PUTI,
    POP, RESUME, SAVE, UNSAVE, SYNC, SWYM, GET, TRIP
  } mmix_opcode;

55. We also need to enumerate the special names for special registers.

⟨Type declarations 9⟩ +≡
  typedef enum {
    rB, rD, rE, rH, rJ, rM, rR, rBB,
    rC, rN, rO, rS, rI, rT, rTT, rK,
    rQ, rU, rV, rG, rL, rA, rF, rP,
    rW, rX, rY, rZ, rWW, rXX, rYY, rZZ
  } special_reg;

56. ⟨Global variables 19⟩ +≡
  char *special_name[32] = {"rB", "rD", "rE", "rH", "rJ", "rM", "rR", "rBB",
    "rC", "rN", "rO", "rS", "rI", "rT", "rTT", "rK",
    "rQ", "rU", "rV", "rG", "rL", "rA", "rF", "rP",
    "rW", "rX", "rY", "rZ", "rWW", "rXX", "rYY", "rZZ"};

57. Here are the bit codes for arithmetic exceptions. These codes, except H_BIT, are defined also in MMIX-ARITH.

#define X_BIT (1 << 8)  /* floating inexact */
#define Z_BIT (1 << 9)  /* floating division by zero */
#define U_BIT (1 << 10)  /* floating underflow */
#define O_BIT (1 << 11)  /* floating overflow */
#define I_BIT (1 << 12)  /* floating invalid operation */
#define W_BIT (1 << 13)  /* float-to-fix overflow */
#define V_BIT (1 << 14)  /* integer overflow */
#define D_BIT (1 << 15)  /* integer divide check */
#define H_BIT (1 << 16)  /* trip */

58. The bkpt field associated with each tetrabyte of memory has bits associated with forced tracing and/or breaking for reading, writing, and/or execution.

#define trace_bit (1 << 3)
#define read_bit (1 << 2)
#define write_bit (1 << 1)
#define exec_bit (1 << 0)

59. To complete our lists of lists, we enumerate the rudimentary operating system calls that are built in to MMIXAL.

#define max_sys_call Ftell

⟨Type declarations 9⟩ +≡
  typedef enum {
    Halt, Fopen, Fclose, Fread, Fgets, Fgetws,
    Fwrite, Fputs, Fputws, Fseek, Ftell
  } sys_call;

60. The main loop. Now let's plunge in to the guts of the simulator, the master switch that controls most of the action.

⟨Perform one instruction 60⟩ ≡
  { if (resuming) loc = incr(inst_ptr, -4), inst = g[rX].l;
    else ⟨Fetch the next instruction 63⟩;
    op = inst >> 24; xx = (inst >> 16) & 0xff; yy = (inst >> 8) & 0xff; zz = inst & 0xff;
    f = info[op].flags; yz = inst & 0xffff;
    x = y = z = a = b = zero_octa; exc = 0; old_L = L;
    if (f & rel_addr_bit) ⟨Convert relative address to absolute address 70⟩;
    ⟨Install operand fields 71⟩;
    if (f & X_is_dest_bit) ⟨Install register X as the destination, adjusting the register stack if necessary 80⟩;
    w = oplus(y, z);
    if (loc.h >= 0x20000000) goto privileged_inst;
    switch (op) {
      ⟨Cases for individual MMIX instructions 84⟩;
    }
    ⟨Check for trip interrupt 122⟩;
    ⟨Update the clocks 127⟩;
    ⟨Trace the current instruction, if requested 128⟩;
    if (resuming && op != RESUME) resuming = false;
  }
This code is used in section 141.

61. Operands x and a are usually destinations (results), computed from the source operands y, z, and/or b.

⟨Global variables 19⟩ +≡
  octa w, x, y, z, a, b, ma, mb;  /* operands */
  octa *x_ptr;  /* destination */
  octa loc;  /* location of the current instruction */
  octa inst_ptr;  /* location of the next instruction */
  tetra inst;  /* the current instruction */
  int old_L;  /* value of L before the current instruction */
  int exc;  /* exceptions raised by the current instruction */
  int tracing_exceptions;  /* exception bits that cause tracing */
  int rop;  /* ropcode of a resumed instruction */
  int round_mode;  /* the style of floating point rounding just used */
  bool resuming;  /* are we resuming an interrupted instruction? */
  bool halted;  /* did the program come to a halt? */
  bool breakpoint;  /* should we pause after the current instruction? */
  bool tracing;  /* should we trace the current instruction? */
  bool stack_tracing;  /* should we trace details of the register stack? */
  bool interacting;  /* are we in interactive mode? */
  bool interact_after_break;  /* should we go into interactive mode? */
  bool tripping;  /* are we about to go to a trip handler? */
  bool good;  /* did the last branch instruction guess correctly? */
  tetra trace_threshold;  /* each instruction should be traced this many times */

62. ⟨Local registers 62⟩ ≡
  register mmix_opcode op;  /* operation code of the current instruction */
  register int xx, yy, zz, yz;  /* operand fields of the current instruction */
  register tetra f;  /* properties of the current op */
  register int i, j, k;  /* miscellaneous indices */
  register mem_tetra *ll;  /* current place in the simulated memory */
  register char *p;  /* current place in a string */
See also section 75.
This code is used in section 141.

63. ⟨Fetch the next instruction 63⟩ ≡
  { loc = inst_ptr;
    ll = mem_find(loc);
    inst = ll->tet;
    cur_file = ll->file_no;
    cur_line = ll->line_no;
    ll->freq++;
    if (ll->bkpt & exec_bit) breakpoint = true;
    tracing = breakpoint || (ll->bkpt & trace_bit) || (ll->freq <= trace_threshold);
    inst_ptr = incr(inst_ptr, 4);
  }
This code is used in section 60.

64. Much of the simulation is table-driven, based on a static data structure called the op_info for each operation code.

⟨Type declarations 9⟩ +≡
  typedef struct {
    char *name;  /* symbolic name of an opcode */
    unsigned char flags;  /* its instruction format */
    unsigned char third_operand;  /* its special register input */
    unsigned char mems;  /* how many μ it costs */
    unsigned char oops;  /* how many υ it costs */
    char *trace_format;  /* how it appears when traced */
  } op_info;
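The field extraction at the start of the main loop can be exercised on its own. The following is a minimal sketch, not the simulator's code: the structure inst_fields and the function split_inst are hypothetical, and uint32_t stands in for the type tetra.

```c
#include <assert.h>
#include <stdint.h>

/* The fields of one 32-bit MMIX instruction. */
typedef struct {
    int op, xx, yy, zz;  /* the four bytes, most significant first */
    int yz;              /* the Y and Z bytes taken together */
} inst_fields;

/* Split an instruction into its opcode and operand fields. */
static inst_fields split_inst(uint32_t inst) {
    inst_fields f;
    f.op = (int)(inst >> 24);
    f.xx = (int)((inst >> 16) & 0xff);
    f.yy = (int)((inst >> 8) & 0xff);
    f.zz = (int)(inst & 0xff);
    f.yz = (int)(inst & 0xffff);
    assert(f.yz == ((f.yy << 8) | f.zz));
    return f;
}
```

For the tetrabyte 0x20010203 this gives op = 0x20, xx = 1, yy = 2, zz = 3, and yz = 0x0203.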

65. For example, the flags field of info[op] tells us how to obtain the operands from the X, Y, and Z fields of the current instruction. Each entry records special properties of an operation code, in binary notation: #1 means Z is an immediate value, #2 means rZ is a source operand, #4 means Y is an immediate value, #8 means rY is a source operand, #10 means rX is a source operand, #20 means rX is a destination, #40 means YZ is part of a relative address, #80 means a push or pop or unsave instruction.

The trace_format field will be explained later.

#define Z_is_immed_bit 0x1
#define Z_is_source_bit 0x2
#define Y_is_immed_bit 0x4
#define Y_is_source_bit 0x8
#define X_is_source_bit 0x10
#define X_is_dest_bit 0x20
#define rel_addr_bit 0x40
#define push_pop_bit 0x80
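To see how a flags byte is read, consider the following sketch. It is an illustration only: the function operand_kind is hypothetical, and the macros simply repeat the bit values listed in section 65 so that the example is self-contained.

```c
#include <assert.h>

#define Z_is_immed_bit  0x1
#define Z_is_source_bit 0x2
#define Y_is_immed_bit  0x4
#define Y_is_source_bit 0x8
#define X_is_dest_bit   0x20

/* Classify one operand field of a flags byte: 'I' if the field is an
   immediate value, 'R' if it names a source register, '-' if unused. */
static char operand_kind(unsigned flags, unsigned immed_bit, unsigned source_bit) {
    assert(immed_bit != 0 && source_bit != 0);
    if (flags & immed_bit) return 'I';
    if (flags & source_bit) return 'R';
    return '-';
}
```

A flags value of #2a therefore describes an instruction that reads registers Y and Z and writes register X, whereas #29 describes the corresponding form in which Z is an immediate value.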

f"FUNE" . 10. "%l = %y * %z = %x" g. # 2a. 0. 0.z = %x" g. 5. f"FCMPE"#. 2a. f"FREM" .z = %x" g. # 25.z (%. 66. "%l = %. 0.x" g. "%l = %(int%) %. 0. 0. f"FMUL" . 0. "%l = %. 4. 0. # 2a.y %(*%) %. 4. h Info for logical and control commands 69 ig. "%l = [%. 2a.z = %. 0. "%l = %(fix%) %.z = %. ## 25. f"FLOTU" . 1. "%l = [%. 4. 0. 0. 26. f"FADD" . 4.y(||)%. "%l = %(sflot%) %z = %.x" g. f"FLOT" . f"FSUB" . "%l = %.z (%. 0. 0.y %(+%) %. rE . 4.x" g. 0. ## 2a. 0. 10. h Info for load/store commands 68 i.y(||)%.z = %. "%r" g.z] = %x" g. f"SFLOTI" . # 26.x" g. 0. "%l = %(flot%) %#z = %.z = %. f"FLOTUI" . 4. "%l = %(sflot%) %z = %. # 2a. 0. "%l = %#y * %#z = %#x.y %(rem%) %. # 2a.x" g. 4. 255 . . # 25. 0. "%l = %(flot%) %z = %. 4. 1.b)) = %x" g.x" g. f"FCMP"#. 0. 4. 1. 2a. 0. "%l = %. 0. 0. "%l = %. 0. "%l = %(sqrt%) %. # 0a. f"FSQRT"#. f"FLOTI" . rE . "%l = %(sflot%) %#z = %. "%l = %(sflot%) %z = %. f"FDIV" . 40. # 25. f"FIXU" . 0.y %(/%) %. "%l = %y * %z = %x" g.z = %. 0.y(==)%. f"SFLOTU" . 0.z (%.z = %. 4. 0. # 2a. 4. 0. 0. # 26. 4. 0. 0.x" g. "%l = %. 0. 4.z] = %x" g. h Info for branch commands 67 i. # 2a.b)] = %x" g. # 29. 0.x" g.x" g. 0.y cmp %. 4. 0. f"FUN" . 40. 0. rH=%#a" g. 0. 0. 0. 0. f"FINT" . "%l = %.x" g. "%l = %(flot%) %z = %. "%l = [%. # 2a. f"MULI" . 2a.ne Z is immed bit # 1 Z is source bit # 2 Y is immed bit # 4 Y is source bit # 8 X is source bit # 10 X is dest bit # 20 rel addr bit # 40 push pop bit # 80 h Global variables 19 i + op info info [256] = fh Info for arithmetic commands 66 i. ##26. f"FEQL" . 0.y cmp %.x" g. ## 26. 0. 0. f"MUL" . f"MULU" . rE .b)] = %x" g. 0. 1.z = %#x" g.x" g. 0.y %(-%) %.x" g. 0.x" g. "%l = [%. 0. # 26.y(==)%. 0. 4. 26.x" g. "%l = %(fix%) %. 0. f"FEQLE" . 10. 4. f"SFLOT" . 2a. h Info for arithmetic commands 66 i  f"TRAP" . # 26.z = %. 0. f"SFLOTUI" . f"FIX" . 4. "%l = %(flot%) %z = %. 0.

1. ##2a. f"16ADDUI" . 1. # 29. 1. 0. "%l = %y >> %#z = %x" g. 0. 2a. 0. 0. 1. 1. 0. 60. 1. 1. f"2ADDUI"#. 1. "%l = %#y + %#z = %#x" g. "%l = %#y .365 MMIX-SIM: THE MAIN LOOP f"MULUI"#. 0. 0. f"SLI" .%z = %x" g. "%l = %#y <<1+ %z = %#x" g. # 2a. ags : unsigned char . 0. 1.%z = %x" g. 29. 0. 0. trace format : char . f"SUBUI" . f"CMP" . 29. 0. "%l = %y cmp %z = %x" g. f"SUBI" . f"8ADDU" . 1. 0. 1. "%l = %#y >> %z = %#x" g This code is used in section 65. "%l = %#y <<3+ %#z = %#x" g. f"SR" . #2a. 0. 29. 0.%z = %x" g. 1. 1. f"DIV" . 0. 1. 0. ##2a. rR=%#a" g. 0. # 2a. 0. x64. 2a. "%l = %#y <<4+ %z = %#x" g. f"NEG" . f"CMPUI" . f"SLUI" . 25. #26. f"8ADDUI" . 0. "%l = %y >> %z = %x" g. 29. f"SL" . 0. f"ADDI" . f"2ADDU" . 0. 0. 0. 0. 60. 0. "%l = %#y <<4+ %#z = %#x" g. 1. f"4ADDU" . ##26. "%l = %y / %z = %x. # 2a. 60. "%l = %#y * %z = %#x. ##2a. 29. 0. 1. # 29. 1. 0. 0. x55. "%l = %y cmp %z = %x" g. f"SUB" . "%l = %y . 0. rE = 2. f"NEGUI" . f"ADDUI"#. 0. 1. "%l = %y + %z = %x" g. "%l = %#y <<2+ %z = %#x" g. 1. 0. rR=%a" g. f"ADDU" . rH=%#a" g. 1. op : register mmix opcode . 0. f"CMPI" . 1. 0. f"DIVI" . 0. 0. 0. # 29. 0. 0. #2a. 1. 0. "%l = %#y . 10.%#z = %#x" g. "%l = %y . 1. 0. rR=%#a" g. # 29. op info = struct . 1.%#z = %#x" g. "%l = %#y <<3+ %z = %#x" g. 0. 0. # 29. 0. 29. 2a. rD = 1. "%l = %y / %z = %x. 1. "%l = %#y + %z = %#x" g. 0. 60. 0. # 25. 0. "%l = %y << %z = %x" g. "%l = %y . "%l = %#y >> %#z = %#x" g. 0. ##2a. rD . f"CMPU" . "%l = %y << %#z = %x" g. x 62. "%l = %y . f"SUBU" . "%l = %#b%0y / %z = %#x. # 29. 0. rR=%a" g. f"16ADDU" . 0. 0. 0. f"SRUI" . f"SLU" . "%l = %y . 1. "%l = %#y << %z = %#x" g. 1. "%l = %#y cmp %z = %x" g. "%l = %#y cmp %#z = %x" g. "%l = %#y <<1+ %#z = %#x" g. 0. "%l = %#y <<2+ %#z = %#x" g. 29. ##2a. x55. f"ADD" . . 0. 0. 0. 0. 0. f"NEGU" . 0.%z = %#x" g. 0. # 29. 1. f"DIVUI"#. 0. 0. "%l = %y + %z = %x" g. rD . 0. f"SRU" .%z = %x" g. 1.%z = %#x" g. 0. 0. 29. # 29. # 2a. "%l = %#y << %#z = %#x" g. 0. 0. "%l = %#b%0y / %#z = %#x. 1. 1. 1. 
x64. f"NEGI" . f"DIVU" . 0. 0. 0. f"4ADDUI"#. #2a. x64. 0. ##2a. 0. # 29. "%l = %y . 0. f"SRI" .

1. 0. 0. 0. 1. 1. ## 50. 0. 0. 0. 0. "%b even? %t%g" g. 0. 1. f"BNP" . 0. 1. 0. 0. f"PBNNB"#. # 39. # 39. "%b<=0? %t%g" g. "%b==0? %t%g" g. 0. 0. "%b<0? %t%g" g. 0. # 50. "%b>0? %t%g" g. 0. 366 . f"PBNZ" . f"BZB" . "%l = %y==0? %z: %b = %x" g. 0. 1. "%l = %y<0? %z: %b = %x" g. 1. 1. 0. 0. 0. 0. "%l = %y==0? %z: %b = %x" g. 1. f"PBNPB"#. 0. 0. 0. f"CSEVI" . ##50. 0. 1. # 50. # 3a. 0. 0. f"CSODI"#. 1. "%l = %y!=0? %z: %b = %x" g. 0.MMIX-SIM: THE MAIN LOOP 67. 0. f"PBP" . 0. f"CSPI" . 0. "%b<0? %t%g" g. f"CSEV" . h Info for branch commands 67 i  f"BN" . f"BNN" . "%b odd? %t%g" g. f"CSP" . # 39. 0. 50. "%b even? %t%g" g. 0. 1. 0. "%b>0? %t%g" g. 50. "%l = %y!=0? %z: %b = %x" g. f"PBN" . "%l = %y>=0? %z: %b = %x" g. 0. "%b!=0? %t%g" g. 50. f"BNZB" . f"CSOD" . 0. f"CSNPI" . # 39. 1. 50. 1. "%b==0? %t%g" g. 1. "%b odd? %t%g" g. 1. "%l = %y<=0? %z: %b = %x" g. 1. 0. 0. 0. 0. "%b odd? %t%g" g. 1. f"BNZ" . "%b==0? %t%g" g. 0. 50. 0. f"PBOD" . 50. # 3a. 0. 0. 0. "%l = %y<=0? %z: %b = %x" g. 39. 0. "%l = %y<0? %z: %b = %x" g. 0. 1. 0. 1. ##50. 0. 1. 3a. 0. "%b even? %t%g" g. 0. f"BNPB"#. 1. # 50. "%l = %y odd? %z: %b = %x" g. "%b even? %t%g" g. 1. 0. 0. f"BODB" . 50. f"PBEV" . 0. 1. 0. 1. 0. 1. # 50. 0. 0. 1. "%l = %y even? %z: %b = %x" g. "%b!=0? %t%g" g. "%l = %y even? %z: %b = %x" g. 0. 0. f"CSZI" . 50. 1. 0. "%b==0? %t%g" g. 1. f"BZ" . # 50. f"PBODB"#. 0. 0. 50. f"BNNB" . "%b<0? %t%g" g. "%b>=0? %t%g" g. 0. 0. # 3a. # 39. 0. f"PBNN" . f"PBZB"#. "%b<=0? %t%g" g. 0. f"BP" . f"PBNP" . 1. "%b>=0? %t%g" g. 0. ##3a. "%l = %y odd? %z: %b = %x" g. 1. 39. 0. 0. 1. 1. # 39. 1. # 50. 0. 0. "%b>0? %t%g" g. 0. 0. f"CSNP" . 1. 0. 0. f"CSNNI" . 1. 0. f"BEV" . f"PBNZB"#. # 50. 0. ##50. f"PBNB"#. "%b<=0? %t%g" g. "%b<=0? %t%g" g. # 3a. 1. 0. f"PBPB" . 0. 1. "%b!=0? %t%g" g. "%b>=0? %t%g" g. f"BPB" . 0. f"CSZ" . "%l = %y>=0? %z: %b = %x" g. "%b>=0? %t%g" g. # 50. 1. # 50. "%b odd? %t%g" g. 0. ##3a. 1. f"BOD" . "%l = %y>0? %z: %b = %x" g. f"CSNZ" . 50. # 50. f"CSN" . # 50. # 50. f"BNB" . 
f"CSNZI" . "%l = %y>0? %z: %b = %x" g. 0. 1. # 50. 0. 0. "%b<0? %t%g" g. 1. # 50. f"CSNI" . f"PBEVB"#. 0. "%b!=0? %t%g" g. 50. 0. 0. 1. 1. 0. # 50. f"PBZ" . 1. 0. 0. f"BEVB"#. # 50. 3a. 0. 50. 0. f"CSNN" . 1. 0. "%b>0? %t%g" g. 1. 0. 0. 1. 0.

# 29. 0. # 29. 29. 0. "%l = %y>=0? %z: 0 = %x" g. 0. 0. 0. f"ZSNP" . 2a. f"ZSNPI" . 1. 1. 0. 0. 0. # 29. # 29. 1. 1. 29. 0. "%l = %y!=0? %z: 0 = %x" g. "%l = %y<=0? %z: 0 = %x" g. "%l = %y odd? %z: 0 = %x" g. 1. 0. 29. "%l = %y==0? %z: 0 = %x" g. "%l = %y even? %z: 0 = %x" g. 1. f"ZSP" . f"ZSNN" . "%l = %y<=0? %z: 0 = %x" g. 0. 2a. 0. f"ZSZI" . 0. 1. "%l = %y odd? %z: 0 = %x" g. "%l = %y even? %z: 0 = %x" g This code is used in section 65. THE MAIN LOOP . 0. 1. f"ZSOD" . f"ZSPI" . 0. # 2a. 0. 2a. f"ZSNNI"#. f"ZSEVI" . "%l = %y>0? %z: 0 = %x" g.367 MMIX-SIM: f"ZSN" . "%l = %y<0? %z: 0 = %x" g. 0. "%l = %y>0? %z: 0 = %x" g. # 2a. 0. 0. 0. 1. 1. 1. f"ZSODI" . f"ZSEV" . 0. 0. f"ZSNZ" . 0. f"ZSZ" . 1. "%l = %y>=0? %z: 0 = %x" g. 0. "%l = %y!=0? %z: 0 = %x" g. 1. 0. 1. 0. 0. "%l = %y==0? %z: 0 = %x" g. ##2a. "%l = %y<0? %z: 0 = %x" g. f"ZSNZI"#. f"ZSNI"#. 1. 0. 0. ##2a. 1. ##2a. 0. # 29. 0. 0.

1. 1. 1. 29. 0. # 39. M8[%#w]=%#a" g. f"LDHT" . 3a. "%l = M1[%#y%?+] = %#x" g. 1. 1. 1. 0. #2a. "%l = M8[%#y+%#z] = %#x" g. "%l = M4[%#y%?+] = %x" g. 0. 0. 1. "%l = M2[%#y+%#z] = %x" g. 1.MMIX-SIM: THE MAIN LOOP 68. f"STTI" . f"LDT" . 1. f"LDOI" . 1. # 29. 0. 0. 1. M8[%#w]=%#a" g. "M4[%#y%?+] = %b. f"STOUI" . 0. "M8[%#y%?+] = %#b" g. 1. 1. 1. 1. 1. "[%#y%?+ .x" g. 0. 0. f"PRELDI"#. 2a. h Info for load/store commands 68 i  f"LDB" . f"STTU" . ## 29. # 29. 19. f"LDO" . 2a. 0. 29. 0. %r" g. "%l = M8[%#y%?+] = %#x" g. 1. # 1a. 0. 2.. f"LDVTSI"#. 1. 09. "%l = [M8[%#y+%#z]==%a] = %x. 1. 0. 1. 0. 29.. ## 29. "%l = M8[%#y+%#z] = %x" g. ## 29. 1. M8[%#w]=%#a" g. 1. 2. f"LDWI" . 1.. 1. f"STT" . f"STWU" . ##1a. 1. 1. 368 . 1. "%l = M2[%#y+%#z] = %#x" g. 1. 0. # 29. %r" g. f"LDSFI"#. 1. "" g. 1. f"GO" . f"LDBUI" . M8[%#w]=%#a" g. 0. # 2a. "%l = M4[%#y+%#z]<<32 = %#x" g. 0. "M8[%#y+%#z] = %b" g. "%l = M8[%#y%?+] = %x" g. 0. M8[%#w]=%#a" g. 1. # 19. 0. %#x]" g. "%l = M4[%#y+%#z] = %x" g. 1. "%l = (M4[%#y%?+]) = %. f"STBI" . -> %#y+%#z" g. %#x]" g. f"STW" . 1. 1. ##2a. f"LDTUI" . 1. 1. 1. 0. 0. 0. 29. 1. "M2[%#y%?+] = %b. 1a. "M4[%#y%?+] = %#b. 0. 1. "%l = M4[%#y+%#z] = %#x" g. f"STO" . "M2[%#y%?+] = %#b. 1. 1. "%l = M2[%#y%?+] = %#x" g. f"LDUNC" . 1. #0a. 1. M8[%#w]=%#a" g. # 2a. f"STTUI" . f"STWI" .x" g. "M1[%#y%?+] = %#b. 09. 0. "%l = M8[%#y%?+] = %#x" g. "%l = M1[%#y%?+] = %x" g. M8[%#w]=%#a" g. M8[%#w]=%#a" g. 0. # 19. f"LDOU" . 0. 0. "M4[%#y+%#z] = %b. 1. 1. "[%#y+%#z . 0. f"GOI" . "%l = M2[%#y%?+] = %x" g. # 19. "%l = M8[%#y+%#z] = %#x" g. f"LDSF" . 1. #0a. # 29. 1. "M1[%#y+%#z] = %#b. 1. # 19. 0. 1. 1. 1. f"PRELD" . 1. 0. f"STOU" . 1. 1. 1. "%l = M1[%#y+%#z] = %#x" g. 1. 0. "" g. f"CSWAP" . 0. f"PREGO" . 1. "M2[%#y+%#z] = %b. f"LDTU" . "M1[%#y+%#z] = %b. 1. f"PREGOI" . f"LDUNCI"#. 1. 3. 0. ##2a. # 19. "%l = M4[%#y%?+] = %#x" g. f"LDBU" . f"LDWUI"#. 1. 1. 1. 1. 1. "%l = %#x. 19. M8[%#w]=%#a" g. 0. 1. 0. 2. 1. f"LDW" . 1. f"LDWU" . "M1[%#y%?+] = %b. 2a. 0. 0. 1. # 29. # 1a. 
-> %#y%?+" g. # 2a. 1.. # 2a. 1. 1. M8[%#w]=%#a" g. f"LDBI" . 0. "%l = [M8[%#y%?+]==%a] = %x. 0. "[%#y%?+ . 1. 2. 1. f"LDHTI" . 0. ##1a. f"STBU" . 1. # 29. f"STOI" . #2a. 1. 1. 0. f"LDVTS" . M8[%#w]=%#a" g. 0. "M8[%#y+%#z] = %#b" g. 1. M8[%#w]=%#a" g. 0. f"LDOUI"#. "%l = %#x. "M2[%#y+%#z] = %#b. # 1a. 1. 1. "%l = M1[%#y+%#z] = %x" g. 0. # 1a. 2a. 3. 1. 1. f"STB" . %#x]" g. 0. "%l = M4[%#y%?+]<<32 = %#x" g. # 2a. 0. f"STBUI" . 1. f"LDTI" . f"STWUI"#. 0. 0. "M4[%#y+%#z] = %#b. 0. 1a. # 19. 0. f"CSWAPI"#. "[%#y+%#z . 0. 0. 0. 0. 0. %#x]" g. "M8[%#y%?+] = %b" g. "%l = (M4[%#y+%#z]) = %. 0.

1. 1. "[%#y%?+ . rL=%a. %#x]" g.b. 0. # 1a. 0a. "[%#y+%#z . # 1a. 0. 0. "M8[%#y+%#z] = %#b" g. "%lrO=%#b. 0. f"STUNCI" . ##8a. # 19. 1. 1. M8[%#w]=%#a" g. "M8[%#y%?+] = %#b" g. f"STHT" . 0. 1. 0. 0. 1.. f"PUSHGOI" . M8[%#w]=%#a" g. 1. 1. # 19. f"SYNCD" . 1. 0. 09. f"STSFI"#. f"SYNCDI"#. 0. "[%#y+%#z . f"PRESTI" . %#x]" g. 0. f"SYNCID" . 0. %#x]" g. 0a. "M8[%#y%?+] = %b" g.. 1. f"SYNCIDI" . 0. 1. "[%#y+%#z . f"STCOI" . f"STHTI" . 0. 0. . 1. -> %#y%?+" g This code is used in section 65. 0. 09. f"PUSHGO" . "%lrO=%#b. rL=%a. 1. 0. "%(M4[%#y+%#z]%) = %. # 09. 1. rJ=%#x. f"PREST" .. "M4[%#y+%#z] = %#b>>32.369 MMIX-SIM: THE MAIN LOOP f"STSF" .b.. f"STUNC" . 1. 0. 89. %#x]" g. 1. %#x]" g. ##0a. 0. f"STCO" . 1. 1. 0. "M4[%#y%?+] = %#b>>32. 0. # 19.. 0. "[%#y%?+ .. %#x]" g. 1. ##0a. -> %#y+%#z" g. M8[%#w]=%#a" g. "[%#y%?+ . "%(M4[%#y%?+]%) = %. ## 09. 1. 1. M8[%#w]=%#a" g. 1. 1a. rJ=%#x. 0. "M8[%#y+%#z] = %b" g. 3. 3. 0. 0.

# 29. "%l = %#y | %z = %#x" g. f"SETMH" . 0. 0. 29. 0. 0. f"SADDI"#. f"MXORI"#. 0. # 30. # 30. 0. f"XOR" . "%l = %#y wdif %#z = %#x" g. 0. # 30. 0. 0. 0. 0. "%l = %#y odif %z = %#x" g. "%l = %#z" g. 0. 0. ## 29. 1. 0. 2a. 0. 0. 0. 0. 0. 0. # 30. # 2a. 1. 1. "%l = %#y |~ %#z = %#x" g. f"MXOR" . f"XORI" . 0. "%l = %#y tdif %z = %#x" g. 1. 1. 0. f"NXORI"#. 29. 20. 1. 0. 1. 1.MMIX-SIM: THE MAIN LOOP 69. 1. 0. f"NANDI"#. 1. 0. 0. 2a. 1. 1. 0. f"INCML"#. # 20. f"ORH" . # 29. f"ANDN" . h Info for logical and control commands 69 i  f"OR" . "%l = %#y & %#z = %#x" g. 2a. "%l = %#y & %z = %#x" g. f"WDIFI"#. # 30. 0. 1. "%l = %#y mor %z = %#x" g. "%l = %#y \\ %#z = %#x" g. 0. "%l = %#y + %#z = %#x" g. 0. 1. 0. f"ANDNL" . 0. "%l = %#y + %#z = %#x" g. "%l = %#y ~& %#z = %#x" g. # 30. "%l = %#y tdif %#z = %#x" g. # 2a. 2a. "%l = %#y wdif %z = %#x" g. 0. 0. 0. f"SETL" . 0. f"ANDI" . f"ANDNI" . 0. "%l = %#y ~^ %z = %#x" g. 0. 0. 2a. 0. 0. f"NORI" . 0. # 29. 0. 1. "%l = %#y mxor %z = %#x" g. 0. "%l = %#y | %#z = %#x" g. 0. f"BDIF" . 0. 1. 0. "%l = %#y | %#z = %#x" g. 0. 30. # 29. f"AND" . # 2a. "%l = %#y \\ %z = %#x" g. f"MUXI" . "%l = %#y ~| %#z = %#x" g. f"INCL" . 0. f"ANDNMH" . rM . f"MUX" . 1. f"ODIF" . "%l = %#y ^ %z = %#x" g. f"WDIF" . "%l = %#y ~^ %#z = %#x" g. 1. 0. 0. 0. 1. f"BDIFI"#. 0. 0. 1. 0. 0. 0. 0. "%l = %#y |~ %z = %#x" g. # 30. f"ORML"#. 0. 1. # 29. "%l = %#b? %#y: %z = %#x" g. 1. "%l = %#y \\ %#z = %#x" g. 0. f"TDIFI"#. 0. # 20. 1. 0. 1. # 29. rM . f"ORMH" . "%l = %#y bdif %#z = %#x" g. f"MORI" . f"ANDNML" . 2a. 0. 1. 1. "%l = %#y mxor %#z = %#x" g. 2a. 1. 1. f"INCH" . "%l = %#y bdif %z = %#x" g. 1. # 30. 1. 0. 0. f"SETH" . 0. 1. 0. # 29. "%l = %#y \\ %#z = %#x" g. 0. "%l = nu(%#y%?\\) = %x" g. f"NAND" . f"TDIF" . "%l = nu(%#y\\%#z) = %x" g. # 29. 0. ##2a. 0. 0. f"ANDNH" . 0. f"ORL" . 1. 0. # 29. f"NOR" . f"ORI" . f"ORN" . # 30. ## 29. ##2a. 1. 1. 0. # 30. f"NXOR" . 0. 0. 0. "%l = %#z" g. 0. 29. "%l = %#y odif %#z = %#x" g. 1. 1. 0. "%l = %#z" g. 0. 1. # 29. 0. 0. 
1. 0. 1. "%l = %#y | %#z = %#x" g. f"ODIFI"#. 1. 1. 0. 1. 1. "%l = %#y ~& %z = %#x" g. 0. "%l = %#y | %#z = %#x" g. "%l = %#y + %#z = %#x" g. f"ORNI" . 0. 2a. "%l = %#y ~| %z = %#x" g. 0. 0. 0. "%l = %#y | %#z = %#x" g. 0. 0. "%l = %#y \\ %#z = %#x" g. f"SADD" . 1. 0. 0. "%l = %#z" g. f"SETML" . 1. ##2a. "%l = %#y mor %#z = %#x" g. # 29. 1. # 20. 370 . 30. 1. "%l = %#y \\ %#z = %#x" g. 2a. 1. 0. f"MOR" . "%l = %#y + %#z = %#x" g. f"INCMH" . # 2a. "%l = %#y ^ %#z = %#x" g. "%l = %#b? %#y: %#z = %#x" g.

  {"JMP", 0x40, 0, 0, 1, "-> %#z"},
  {"JMPB", 0x40, 0, 0, 1, "-> %#z"},
  {"PUSHJ", 0xc0, 0, 0, 1, "%lrO=%#b, rL=%a, rJ=%#x, -> %#z"},
  {"PUSHJB", 0xc0, 0, 0, 1, "%lrO=%#b, rL=%a, rJ=%#x, -> %#z"},
  {"GETA", 0x60, 0, 0, 1, "%l = %#z"},
  {"GETAB", 0x60, 0, 0, 1, "%l = %#z"},
  {"PUT", 0x02, 0, 0, 1, "%s = %r"},
  {"PUTI", 0x01, 0, 0, 1, "%s = %r"},
  {"POP", 0x80, rJ, 0, 3, "%lrL=%a, rO=%#b, -> %#y%?+"},
  {"RESUME", 0x00, 0, 0, 5, "{%#b} -> %#z"},
  {"SAVE", 0x20, 0, 20, 1, "%l = %#x"},
  {"UNSAVE", 0x82, 0, 20, 1, "%#z: rG=%x, ..., rL=%a"},
  {"SYNC", 0x01, 0, 0, 1, ""},
  {"SWYM", 0x00, 0, 0, 1, ""},
  {"GET", 0x20, 0, 0, 1, "%l = %s = %#x"},
  {"TRIP", 0x0a, 255, 0, 5, "rY=%#y, rZ=%#z, rB=%#b, g[255]=0"}
This code is used in section 65.

70. ⟨Convert relative address to absolute address 70⟩ ≡
  {
    if ((op & 0xfe) == JMP) yz = inst & 0xffffff;
    if (op & 1) yz -= (op == JMPB ? 0x1000000 : 0x10000);
    y = inst_ptr; z = incr(loc, yz << 2);
  }
This code is used in section 60.

71. ⟨Install operand fields 71⟩ ≡
  if (resuming && rop != RESUME_AGAIN)
    ⟨Install special operands when resuming an interrupted operation 126⟩
  else {
    if (f & 0x10) ⟨Set b from register X 74⟩;
    if (info[op].third_operand) ⟨Set b from special register 79⟩;
    if (f & 0x1) z.l = zz;
    else if (f & 0x2) ⟨Set z from register Z 72⟩
    else if ((op & 0xf0) == SETH) ⟨Set z as an immediate wyde 78⟩;
    if (f & 0x4) y.l = yy;
    else if (f & 0x8) ⟨Set y from register Y 73⟩;
  }
This code is used in section 60.

b: octa, §61. f: register tetra, §62. incr: octa (), MMIX-ARITH §6. info: op_info [], §65. inst: tetra, §61. inst_ptr: octa, §61. JMP = 0xf0, §54. JMPB = 0xf1, §54. l: tetra, §10. loc: octa, §61. op: register mmix_opcode, §62. RESUME_AGAIN = 0, §125. resuming: bool, §61. rJ = 4, §55. rM = 5, §55. rop: int, §61. SETH = 0xe0, §54. third_operand: unsigned char, §64. y: octa, §61. yy: register int, §62. yz: register int, §62. z: octa, §61. zz: register int, §62.
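The relative-address conversion of section 70 can be tried in isolation. The following is a minimal sketch, not part of MMIX-SIM: the function name branch_target and the constants OP_JMP and OP_JMPB are hypothetical, and a plain 64-bit integer stands in for the simulator's octa type.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical opcode constants for this sketch. */
enum { OP_JMP = 0xf0, OP_JMPB = 0xf1 };

/* Compute the absolute target of a relative-address instruction.
   op   : opcode byte (odd opcodes are the "backward" variants)
   inst : the full 32-bit instruction word
   loc  : address of the instruction itself */
uint64_t branch_target(uint8_t op, uint32_t inst, uint64_t loc)
{
    int64_t yz = inst & 0xffff;                       /* 16-bit YZ field */
    assert((inst >> 24) == op);                       /* opcode is the top byte */
    if ((op & 0xfe) == OP_JMP) yz = inst & 0xffffff;  /* JMP uses 24 bits */
    if (op & 1)                                       /* backward: subtract the field range */
        yz -= (op == OP_JMPB) ? 0x1000000 : 0x10000;
    return loc + (uint64_t)(yz * 4);                  /* offsets count tetrabytes */
}
```

A backward branch therefore encodes its negative offset as a positive field value, and the low bit of the opcode supplies the missing sign.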

72. There are 256 global registers, g[0] through g[255]; the first 32 of them are used for the special registers rA, rB, etc. There are lring_mask + 1 local registers, usually 256 but the user can increase this to a larger power of 2 if desired.

The current values of rL, rG, rO, and rS are kept in separate variables called L, G, O, and S for convenience. (In fact, O and S actually hold the values rO/8 and rS/8, modulo lring_size.)

⟨Set z from register Z 72⟩ ≡
  {
    if (zz >= G) z = g[zz];
    else if (zz < L) z = l[(O + zz) & lring_mask];
  }
This code is used in section 71.

73. ⟨Set y from register Y 73⟩ ≡
  {
    if (yy >= G) y = g[yy];
    else if (yy < L) y = l[(O + yy) & lring_mask];
  }
This code is used in section 71.

74. ⟨Set b from register X 74⟩ ≡
  {
    if (xx >= G) b = g[xx];
    else if (xx < L) b = l[(O + xx) & lring_mask];
  }
This code is used in section 71.

75. ⟨Local registers 62⟩ +≡
  register int G, L, O;    /* accessible copies of key registers */

76. ⟨Global variables 19⟩ +≡
  octa g[256];       /* global registers */
  octa *l;           /* local registers */
  int lring_size;    /* the number of local registers (a power of 2) */
  int lring_mask;    /* one less than lring_size */
  int S;             /* congruent to rS >> 3 modulo lring_size */

77. Several of the global registers have constant values, because of the way MMIX has been simplified in this simulator.

Special register rN has a constant value identifying the time of compilation. (The macro ABSTIME is defined externally in the file abstime.h, which should have just been created by ABSTIME; ABSTIME is a trivial program that computes the value of the standard library function time(NULL). We assume that this number, which is the number of seconds in the "UNIX epoch," is less than 2^32. Beware: Our assumption will fail in February of 2106.)

  #define VERSION 1          /* version of the MMIX architecture that we support */
  #define SUBVERSION 0       /* secondary byte of version number */
  #define SUBSUBVERSION 1    /* further qualification to version number */

⟨Initialize everything 14⟩ +≡
  g[rK] = neg_one;
  g[rN].h = (VERSION << 24) + (SUBVERSION << 16) + (SUBSUBVERSION << 8);
  g[rN].l = ABSTIME;    /* see comment and warning above */
  g[rT].h = 0x80000005;
  g[rTT].h = 0x80000006;
  g[rV].h = 0x369c2004;
  if (lring_size < 256) lring_size = 256;
  lring_mask = lring_size - 1;
  if (lring_size & lring_mask) panic("The number of local registers must be a power of 2");
  l = (octa *) calloc(lring_size, sizeof(octa));
  if (!l) panic("No room for the local registers");
  cur_round = ROUND_NEAR;

78. In operations like INCH, we want z to be the yz field, shifted left 48 bits. We also want y to be register X, which has previously been placed in b; then INCH can be simulated as if it were ADDU.

⟨Set z as an immediate wyde 78⟩ ≡
  {
    switch (op & 3) {
    case 0: z.h = yz << 16; break;
    case 1: z.h = yz; break;
    case 2: z.l = yz << 16; break;
    case 3: z.l = yz; break;
    }
    y = b;
  }
This code is used in section 71.

79. ⟨Set b from special register 79⟩ ≡
  b = g[info[op].third_operand];
This code is used in section 71.

ABSTIME = macro, <abstime.h>. ADDU = 0x22, §54. b: octa, §61. calloc: void *(), <stdlib.h>. cur_round: int, MMIX-ARITH §30. h: tetra, §10. INCH = 0xe4, §54. info: op_info [], §65. l: tetra, §10. neg_one: octa, MMIX-ARITH §4. octa = struct, §10. op: register mmix_opcode, §62. panic = macro (), §14. rA = 21, §55. rB = 0, §55. rK = 15, §55. rN = 9, §55. ROUND_NEAR = 4, §100. rT = 13, §55. rTT = 14, §55. rV = 18, §55. third_operand: unsigned char, §64. time: time_t (), <time.h>. xx: register int, §62. y: octa, §61. yy: register int, §62. yz: register int, §62. z: octa, §61. zz: register int, §62.
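The wyde placement of section 78 is easy to check on a 64-bit integer. The sketch below is an illustration only; the names wyde_operand and inc_wyde are hypothetical, and the position argument plays the role of op & 3 (0 for the high wyde as in SETH or INCH, 3 for the low wyde as in SETL or INCL).

```c
#include <assert.h>
#include <stdint.h>

/* Place a 16-bit immediate into one of the four wyde positions of an
   octabyte; position 0 is the most significant wyde. */
uint64_t wyde_operand(unsigned pos, uint16_t yz)
{
    assert(pos < 4);
    return (uint64_t)yz << (16 * (3 - pos));
}

/* An INCH-style update: once the immediate is positioned, the
   instruction is an ordinary unsigned addition to the old value. */
uint64_t inc_wyde(uint64_t old, unsigned pos, uint16_t yz)
{
    return old + wyde_operand(pos, yz);   /* wraps modulo 2^64 */
}
```

Treating the positioned immediate as the z operand is what lets the increment instructions share the ADDU case of the master switch.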

80. ⟨Install register X as the destination, adjusting the register stack if necessary 80⟩ ≡
  if (xx >= G) {
    sprintf(lhs, "$%d=g[%d]", xx, xx);
    x_ptr = &g[xx];
  } else {
    while (xx >= L) ⟨Increase rL 81⟩;
    sprintf(lhs, "$%d=l[%d]", xx, (O + xx) & lring_mask);
    x_ptr = &l[(O + xx) & lring_mask];
  }
This code is used in section 60.

81. ⟨Increase rL 81⟩ ≡
  {
    if (((S - O) & lring_mask) == L && L != 0) stack_store();
    l[(O + L) & lring_mask] = zero_octa;
    L = g[rL].l = L + 1;
  }
This code is used in section 80.

82. The stack_store routine advances the "gamma" pointer in the ring of local registers, by storing the oldest local register into memory location rS and advancing rS.

  #define test_store_bkpt(ll) if ((ll)->bkpt & write_bit) breakpoint = tracing = true

⟨Subroutines 12⟩ +≡
  void stack_store ARGS((void));
  void stack_store()
  {
    register mem_tetra *ll = mem_find(g[rS]);
    register int k = S & lring_mask;
    ll->tet = l[k].h; test_store_bkpt(ll);
    (ll + 1)->tet = l[k].l; test_store_bkpt(ll + 1);
    if (stack_tracing) {
      tracing = true;
      if (cur_line) show_line();
      printf(" M8[#%08x%08x]=l[%d]=#%08x%08x, rS+=8\n", g[rS].h, g[rS].l, k, l[k].h, l[k].l);
    }
    g[rS] = incr(g[rS], 8); S++;
  }

83. The stack_load routine is essentially the inverse of stack_store.

  #define test_load_bkpt(ll) if ((ll)->bkpt & read_bit) breakpoint = tracing = true

⟨Subroutines 12⟩ +≡
  void stack_load ARGS((void));
  void stack_load()
  {
    register mem_tetra *ll;
    register int k;
    S--; g[rS] = incr(g[rS], -8);
    ll = mem_find(g[rS]);
    k = S & lring_mask;
    l[k].h = ll->tet; test_load_bkpt(ll);
    l[k].l = (ll + 1)->tet; test_load_bkpt(ll + 1);
    if (stack_tracing) {
      tracing = true;
      if (cur_line) show_line();
      printf(" rS-=8, l[%d]=M8[#%08x%08x]=#%08x%08x\n", k, g[rS].h, g[rS].l, l[k].h, l[k].l);
    }
  }

ARGS = macro (), §11. bkpt: unsigned char, §16. breakpoint: bool, §61. cur_line: int, §31. G: register int, §75. g: octa [], §76. h: tetra, §10. incr: octa (), MMIX-ARITH §6. L: register int, §75. l: octa *, §76. l: tetra, §10. lhs: char [], §139. lring_mask: int, §76. mem_find: mem_tetra *(), §20. mem_tetra = struct, §16. O: register int, §75. printf: int (), <stdio.h>. read_bit = macro, §58. rL = 20, §55. rS = 11, §55. S: int, §76. show_line: void (), §47. sprintf: int (), <stdio.h>. stack_tracing: bool, §61. tet: tetra, §16. tracing: bool, §61. true = 1, §9. write_bit = macro, §58. x_ptr: octa *, §61. xx: register int, §62. zero_octa: octa, MMIX-ARITH §4.
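The interplay between a fixed ring of registers and the memory stack can be illustrated with a much smaller model. The sketch below is a deliberate simplification and not the simulator's own bookkeeping: it tracks an explicit top counter instead of O and L, uses a four-entry ring, and replaces simulated memory with a plain array. All names (regstack, push_local, pop_local) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define RING_SIZE 4u                  /* tiny ring for illustration; a power of 2 */
#define RING_MASK (RING_SIZE - 1)
#define SPILL_MAX 64u

typedef struct {
    uint64_t ring[RING_SIZE];         /* plays the role of l[] */
    uint64_t spill[SPILL_MAX];        /* stand-in for the memory stack below rS */
    unsigned S;                       /* entries already spilled (like rS/8) */
    unsigned top;                     /* entries pushed so far (like O + L) */
} regstack;

/* Push a value; when the ring is full, first store the oldest entry
   to memory and advance S, which is the job of stack_store. */
void push_local(regstack *r, uint64_t v)
{
    if (r->top - r->S == RING_SIZE) {
        assert(r->S < SPILL_MAX);
        r->spill[r->S] = r->ring[r->S & RING_MASK];
        r->S++;
    }
    r->ring[r->top & RING_MASK] = v;
    r->top++;
}

/* Pop a value; when the ring holds nothing live, first reload the most
   recently spilled entry and retreat S, which is the job of stack_load. */
uint64_t pop_local(regstack *r)
{
    assert(r->top > 0);
    if (r->top == r->S) {
        r->S--;
        r->ring[r->S & RING_MASK] = r->spill[r->S];
    }
    r->top--;
    return r->ring[r->top & RING_MASK];
}
```

Because every index is reduced with RING_MASK, spilling and reloading always touch the slot at position S modulo the ring size, just as stack_store and stack_load use k = S & lring_mask.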

84. Simulating the instructions. The master switch branches in 256 directions, one for each MMIX instruction.

Let's start with ADD, since it is somehow the most typical case: not too easy, and not too hard. The task is to compute x = y + z, and to signal overflow if the sum is out of range. Overflow occurs if and only if y and z have the same sign but the sum has a different sign.

Overflow is one of the eight arithmetic exceptions. We record such exceptions in a variable called exc, which is set to zero at the beginning of each cycle and used to update rA at the end.

The main control routine has put the input operands into octabytes y and z. It has also made x_ptr point to the octabyte where the result should be placed.

⟨Cases for individual MMIX instructions 84⟩ ≡
  case ADD: case ADDI: x = w;    /* w = oplus(y, z) */
    if (((y.h ^ z.h) & sign_bit) == 0 && ((y.h ^ x.h) & sign_bit) != 0) exc |= V_BIT;
  store_x: *x_ptr = x; break;
See also sections 85, 86, 87, 88, 89, 90, 92, 93, 94, 95, 96, 97, 101, 102, 104, 106, 107, 108, and 124.
This code is used in section 60.

85. Other cases of signed and unsigned addition and subtraction are, of course, similar. Overflow occurs in the calculation x = y - z if and only if it occurs in the calculation y = x + z.

⟨Cases for individual MMIX instructions 84⟩ +≡
  case SUB: case SUBI: case NEG: case NEGI: x = ominus(y, z);
    if (((x.h ^ z.h) & sign_bit) == 0 && ((x.h ^ y.h) & sign_bit) != 0) exc |= V_BIT;
    goto store_x;
  case ADDU: case ADDUI: case INCH: case INCMH: case INCML: case INCL: x = w; goto store_x;
  case SUBU: case SUBUI: case NEGU: case NEGUI: x = ominus(y, z); goto store_x;
  case IIADDU: case IIADDUI: case IVADDU: case IVADDUI:
  case VIIIADDU: case VIIIADDUI: case XVIADDU: case XVIADDUI:
    x = oplus(shift_left(y, ((op & 0xf) >> 1) - 3), z);
    goto store_x;
  case SETH: case SETMH: case SETML: case SETL: case GETA: case GETAB: x = z; goto store_x;

86. Let's get the simple bitwise operations out of the way too.

⟨Cases for individual MMIX instructions 84⟩ +≡
  case OR: case ORI: case ORH: case ORMH: case ORML: case ORL:
    x.h = y.h | z.h; x.l = y.l | z.l; goto store_x;
  case ORN: case ORNI:
    x.h = y.h | ~z.h; x.l = y.l | ~z.l; goto store_x;
  case NOR: case NORI:
    x.h = ~(y.h | z.h); x.l = ~(y.l | z.l); goto store_x;
  case XOR: case XORI:
    x.h = y.h ^ z.h; x.l = y.l ^ z.l; goto store_x;
  case AND: case ANDI:
    x.h = y.h & z.h; x.l = y.l & z.l; goto store_x;
  case ANDN: case ANDNI: case ANDNH: case ANDNMH: case ANDNML: case ANDNL:
    x.h = y.h & ~z.h; x.l = y.l & ~z.l; goto store_x;
  case NAND: case NANDI:
    x.h = ~(y.h & z.h); x.l = ~(y.l & z.l); goto store_x;
  case NXOR: case NXORI:
    x.h = ~(y.h ^ z.h); x.l = ~(y.l ^ z.l); goto store_x;

ADD = 0x20, §54. ADDI = 0x21, §54. ADDU = 0x22, §54. ADDUI = 0x23, §54. AND = 0xc8, §54. ANDI = 0xc9, §54. ANDN = 0xca, §54. ANDNH = 0xec, §54. ANDNI = 0xcb, §54. ANDNL = 0xef, §54. ANDNMH = 0xed, §54. ANDNML = 0xee, §54. exc: int, §61. GETA = 0xf4, §54. GETAB = 0xf5, §54. h: tetra, §10. IIADDU = 0x28, §54. IIADDUI = 0x29, §54. INCH = 0xe4, §54. INCL = 0xe7, §54. INCMH = 0xe5, §54. INCML = 0xe6, §54. IVADDU = 0x2a, §54. IVADDUI = 0x2b, §54. l: tetra, §10. NAND = 0xcc, §54. NANDI = 0xcd, §54. NEG = 0x34, §54. NEGI = 0x35, §54. NEGU = 0x36, §54. NEGUI = 0x37, §54. NOR = 0xc4, §54. NORI = 0xc5, §54. NXOR = 0xce, §54. NXORI = 0xcf, §54. ominus: octa (), MMIX-ARITH §5. op: register mmix_opcode, §62. oplus: octa (), MMIX-ARITH §5. OR = 0xc0, §54. ORH = 0xe8, §54. ORI = 0xc1, §54. ORL = 0xeb, §54. ORMH = 0xe9, §54. ORML = 0xea, §54. ORN = 0xc2, §54. ORNI = 0xc3, §54. SETH = 0xe0, §54. SETL = 0xe3, §54. SETMH = 0xe1, §54. SETML = 0xe2, §54. shift_left: octa (), MMIX-ARITH §7. sign_bit = macro, §15. SUB = 0x24, §54. SUBI = 0x25, §54. SUBU = 0x26, §54. SUBUI = 0x27, §54. V_BIT = macro, §57. VIIIADDU = 0x2c, §54. VIIIADDUI = 0x2d, §54. w: octa, §61. x: octa, §61. x_ptr: octa *, §61. XOR = 0xc6, §54. XORI = 0xc7, §54. XVIADDU = 0x2e, §54. XVIADDUI = 0x2f, §54. y: octa, §61. z: octa, §61.
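The sign-based overflow tests of sections 84 and 85 need only the high halves of the operands. Here is a small self-contained sketch that follows the same idea with a two-tetrabyte struct. The helper names add_overflows and sub_overflows are hypothetical, and the carry and borrow handling is written out by hand rather than borrowed from MMIX-ARITH.

```c
#include <assert.h>
#include <stdint.h>

#define SIGN_BIT 0x80000000u               /* top bit of the high tetrabyte */

typedef struct { uint32_t h, l; } octa;    /* high and low 32 bits */

/* Signed addition: overflow iff y and z agree in sign but the sum does not. */
int add_overflows(octa y, octa z, octa *sum)
{
    assert(sum != 0);
    sum->l = y.l + z.l;
    sum->h = y.h + z.h + (sum->l < y.l);   /* carry out of the low half */
    return ((y.h ^ z.h) & SIGN_BIT) == 0 && ((y.h ^ sum->h) & SIGN_BIT) != 0;
}

/* Signed subtraction: x = y - z overflows iff y = x + z overflows,
   so the same test is applied with x and z as the addends. */
int sub_overflows(octa y, octa z, octa *diff)
{
    assert(diff != 0);
    diff->l = y.l - z.l;
    diff->h = y.h - z.h - (y.l < z.l);     /* borrow from the high half */
    return ((diff->h ^ z.h) & SIGN_BIT) == 0 && ((diff->h ^ y.h) & SIGN_BIT) != 0;
}
```

In both functions the wrapped result is still stored, mirroring the way the simulator records the exception bit and then proceeds to store_x.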

87. The less simple bit manipulations are almost equally simple, given the subroutines of MMIX-ARITH. The MUX operation has three inputs; in such cases the inputs appear in y, z, and b.

  #define shift_amt (z.h || z.l >= 64 ? 64 : z.l)

⟨Cases for individual MMIX instructions 84⟩ +≡
  case SL: case SLI: x = shift_left(y, shift_amt);
    a = shift_right(x, shift_amt, 0);
    if (a.h != y.h || a.l != y.l) exc |= V_BIT;
    goto store_x;
  case SLU: case SLUI: x = shift_left(y, shift_amt); goto store_x;
  case SR: case SRI: case SRU: case SRUI: x = shift_right(y, shift_amt, op & 0x2); goto store_x;
  case MUX: case MUXI: x.h = (y.h & b.h) | (z.h & ~b.h); x.l = (y.l & b.l) | (z.l & ~b.l);
    goto store_x;
  case SADD: case SADDI: x.l = count_bits(y.h & ~z.h) + count_bits(y.l & ~z.l); goto store_x;
  case MOR: case MORI: x = bool_mult(y, z, false); goto store_x;
  case MXOR: case MXORI: x = bool_mult(y, z, true); goto store_x;
  case BDIF: case BDIFI: x.h = byte_diff(y.h, z.h); x.l = byte_diff(y.l, z.l); goto store_x;
  case WDIF: case WDIFI: x.h = wyde_diff(y.h, z.h); x.l = wyde_diff(y.l, z.l); goto store_x;
  case TDIF: case TDIFI: if (y.h > z.h) x.h = y.h - z.h;
  tdif_l: if (y.l > z.l) x.l = y.l - z.l; goto store_x;
  case ODIF: case ODIFI: if (y.h > z.h) x = ominus(y, z);
    else if (y.h == z.h) goto tdif_l;
    goto store_x;

88. When an operation has two outputs, the primary output is placed in x and the auxiliary output is placed in a.

⟨Cases for individual MMIX instructions 84⟩ +≡
  case MUL: case MULI: x = signed_omult(y, z);
  test_overflow: if (overflow) exc |= V_BIT;
    goto store_x;
  case MULU: case MULUI: x = omult(y, z); a = g[rH] = aux; goto store_x;
  case DIV: case DIVI: x = signed_odiv(y, z); a = g[rR] = aux; goto test_overflow;
  case DIVU: case DIVUI: x = odiv(b, y, z); a = g[rR] = aux; goto store_x;

89. The floating point routines of MMIX-ARITH record exceptional events in a variable called exceptions. Here we simply merge those bits into the exc variable. The U_BIT is not exactly the same as "underflow," but the true definition of underflow will be applied when exc is combined with rA.

⟨Cases for individual MMIX instructions 84⟩ +≡
  case FADD: x = fplus(y, z);
  fin_float: round_mode = cur_round;
  store_fx: exc |= exceptions; goto store_x;
  case FSUB: a = z; if (fcomp(a, zero_octa) != 2) a.h ^= sign_bit;
    x = fplus(y, a); goto fin_float;
  case FMUL: x = fmult(y, z); goto fin_float;
  case FDIV: x = fdivide(y, z); goto fin_float;
  case FREM: x = fremstep(y, z, 2500); goto fin_float;
  case FSQRT: x = froot(z, y.l);
  fin_unifloat: if (y.h || y.l > 4) goto illegal_inst;
    round_mode = (y.l ? y.l : cur_round); goto store_fx;
  case FINT: x = fintegerize(z, y.l); goto fin_unifloat;
  case FIX: x = fixit(z, y.l); goto fin_unifloat;
  case FIXU: x = fixit(z, y.l); exceptions &= ~W_BIT; goto fin_unifloat;
  case FLOT: case FLOTI: case FLOTU: case FLOTUI:
  case SFLOT: case SFLOTI: case SFLOTU: case SFLOTUI:
    x = floatit(z, y.l, op & 0x2, op & 0x4); goto fin_unifloat;

a: octa, §61. aux: octa, MMIX-ARITH §4. b: octa, §61. BDIF = 0xd0, §54. BDIFI = 0xd1, §54. bool_mult: octa (), MMIX-ARITH §29. byte_diff: tetra (), MMIX-ARITH §27. count_bits: int (), MMIX-ARITH §26. cur_round: int, MMIX-ARITH §30. DIV = 0x1c, §54. DIVI = 0x1d, §54. DIVU = 0x1e, §54. DIVUI = 0x1f, §54. exc: int, §61. exceptions: int, MMIX-ARITH §32. FADD = 0x04, §54. false = 0, §9. fcomp: int (), MMIX-ARITH §84. FDIV = 0x14, §54. fdivide: octa (), MMIX-ARITH §44. FINT = 0x17, §54. fintegerize: octa (), MMIX-ARITH §86. FIX = 0x05, §54. fixit: octa (), MMIX-ARITH §88. FIXU = 0x07, §54. floatit: octa (), MMIX-ARITH §89. FLOT = 0x08, §54. FLOTI = 0x09, §54. FLOTU = 0x0a, §54. FLOTUI = 0x0b, §54. FMUL = 0x10, §54. fmult: octa (), MMIX-ARITH §41. fplus: octa (), MMIX-ARITH §46. FREM = 0x16, §54. fremstep: octa (), MMIX-ARITH §93. froot: octa (), MMIX-ARITH §91. FSQRT = 0x15, §54. FSUB = 0x06, §54. g: octa [], §76. h: tetra, §10. illegal_inst: label, §107. l: tetra, §10. MOR = 0xdc, §54. MORI = 0xdd, §54. MUL = 0x18, §54. MULI = 0x19, §54. MULU = 0x1a, §54. MULUI = 0x1b, §54. MUX = 0xd8, §54. MUXI = 0xd9, §54. MXOR = 0xde, §54. MXORI = 0xdf, §54. ODIF = 0xd6, §54. ODIFI = 0xd7, §54. odiv: octa (), MMIX-ARITH §13. ominus: octa (), MMIX-ARITH §5. omult: octa (), MMIX-ARITH §8. op: register mmix_opcode, §62. overflow: bool, MMIX-ARITH §4. rH = 3, §55. round_mode: int, §61. rR = 6, §55. SADD = 0xda, §54. SADDI = 0xdb, §54. SFLOT = 0x0c, §54. SFLOTI = 0x0d, §54. SFLOTU = 0x0e, §54. SFLOTUI = 0x0f, §54. shift_left: octa (), MMIX-ARITH §7. shift_right: octa (), MMIX-ARITH §7. sign_bit = macro, §15. signed_odiv: octa (), MMIX-ARITH §24. signed_omult: octa (), MMIX-ARITH §12. SL = 0x38, §54. SLI = 0x39, §54. SLU = 0x3a, §54. SLUI = 0x3b, §54. SR = 0x3c, §54. SRI = 0x3d, §54. SRU = 0x3e, §54. SRUI = 0x3f, §54. store_x: label, §84. TDIF = 0xd4, §54. TDIFI = 0xd5, §54. true = 1, §9. U_BIT = macro, §57. V_BIT = macro, §57. W_BIT = macro, §57. WDIF = 0xd2, §54. WDIFI = 0xd3, §54. wyde_diff: tetra (), MMIX-ARITH §28. x: octa, §61. y: octa, §61. z: octa, §61. zero_octa: octa, MMIX-ARITH §4.
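The overflow test for SL in section 87 shifts the result back arithmetically and compares it with the original operand. The sketch below reproduces that idea on 64-bit integers. The helper names clamp_shift, shl, sar, and signed_shl are hypothetical, and the shifts are guarded by hand because C leaves a shift by 64 undefined, which is presumably one reason the text clamps the amount and delegates to subroutines.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp a 64-bit shift count to at most 64, in the spirit of shift_amt. */
static unsigned clamp_shift(uint64_t z) { return z >= 64 ? 64u : (unsigned)z; }

/* Left shift that yields 0 for a count of 64. */
static uint64_t shl(uint64_t y, unsigned s) { return s >= 64 ? 0 : y << s; }

/* Arithmetic right shift built from unsigned operations only. */
static uint64_t sar(uint64_t x, unsigned s)
{
    uint64_t fill = (x >> 63) ? ~UINT64_C(0) : 0;   /* copies of the sign bit */
    if (s >= 64) return fill;
    if (s == 0) return x;
    return (x >> s) | (fill << (64 - s));
}

/* Signed shift left: overflow is reported when shifting back
   arithmetically fails to recover the original operand. */
uint64_t signed_shl(uint64_t y, uint64_t z, int *overflow)
{
    unsigned s = clamp_shift(z);
    uint64_t x = shl(y, s);
    assert(overflow != 0);
    *overflow = (sar(x, s) != y);
    return x;
}
```

The round trip catches both lost high bits and a changed sign, so no separate sign test is needed.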

90. We have now done all of the arithmetic operations except for the cases that compare two registers and yield a value of -1 or 0 or 1.

  #define cmp_zero store_x    /* x is 0 by default */

⟨Cases for individual MMIX instructions 84⟩ +≡
  case CMP: case CMPI: if ((y.h & sign_bit) > (z.h & sign_bit)) goto cmp_neg;
    if ((y.h & sign_bit) < (z.h & sign_bit)) goto cmp_pos;
  case CMPU: case CMPUI: if (y.h < z.h) goto cmp_neg;
    if (y.h > z.h) goto cmp_pos;
    if (y.l < z.l) goto cmp_neg;
    if (y.l == z.l) goto cmp_zero;
  cmp_pos: x.l = 1; goto store_x;
  cmp_neg: x = neg_one; goto store_x;
  case FCMPE: k = fepscomp(y, z, b, true);
    if (k) goto cmp_zero_or_invalid;
  case FCMP: k = fcomp(y, z);
    if (k < 0) goto cmp_neg;
  cmp_fin: if (k == 1) goto cmp_pos;
  cmp_zero_or_invalid: if (k == 2) exc |= I_BIT;
    goto cmp_zero;
  case FUN: if (fcomp(y, z) == 2) goto cmp_pos; else goto cmp_zero;
  case FEQL: if (fcomp(y, z) == 0) goto cmp_pos; else goto cmp_zero;
  case FEQLE: k = fepscomp(y, z, b, false);
    goto cmp_fin;
  case FUNE: if (fepscomp(y, z, b, true) == 2) goto cmp_pos; else goto cmp_zero;

91. We have now done all the register-register operations except for the conditional commands. Conditional commands and branch commands all make use of a simple subroutine that determines whether a given octabyte satisfies the condition of a given opcode.

⟨Subroutines 12⟩ +≡
  int register_truth ARGS((octa, mmix_opcode));
  int register_truth(o, op)
    octa o;
    mmix_opcode op;
  { register int b;
    switch ((op >> 1) & 0x3) {
    case 0: b = o.h >> 31; break;    /* negative? */
    case 1: b = (o.h == 0 && o.l == 0); break;    /* zero? */
    case 2: b = (o.h < sign_bit && (o.h || o.l)); break;    /* positive? */
    case 3: b = o.l & 0x1; break;    /* odd? */
    }
    if (op & 0x8) return b ^ 1;
    else return b;
  }

92. The b operand will be zero on the ZS operations; it will be the contents of register X on the CS operations.

⟨Cases for individual MMIX instructions 84⟩ +≡
  case CSN: case CSNI: case CSZ: case CSZI:
  case CSP: case CSPI: case CSOD: case CSODI:
  case CSNN: case CSNNI: case CSNZ: case CSNZI:
  case CSNP: case CSNPI: case CSEV: case CSEVI:
  case ZSN: case ZSNI: case ZSZ: case ZSZI:
  case ZSP: case ZSPI: case ZSOD: case ZSODI:
  case ZSNN: case ZSNNI: case ZSNZ: case ZSNZI:
  case ZSNP: case ZSNPI: case ZSEV: case ZSEVI:
    x = register_truth(y, op) ? z : b; goto store_x;

ARGS = macro (), §11. b: octa, §61. CMP = 0x30, §54. CMPI = 0x31, §54. CMPU = 0x32, §54. CMPUI = 0x33, §54. CSEV = 0x6e, §54. CSEVI = 0x6f, §54. CSN = 0x60, §54. CSNI = 0x61, §54. CSNN = 0x68, §54. CSNNI = 0x69, §54. CSNP = 0x6c, §54. CSNPI = 0x6d, §54. CSNZ = 0x6a, §54. CSNZI = 0x6b, §54. CSOD = 0x66, §54. CSODI = 0x67, §54. CSP = 0x64, §54. CSPI = 0x65, §54. CSZ = 0x62, §54. CSZI = 0x63, §54. exc: int, §61. false = 0, §9. FCMP = 0x01, §54. FCMPE = 0x11, §54. fcomp: int (), MMIX-ARITH §84. fepscomp: int (), MMIX-ARITH §50. FEQL = 0x03, §54. FEQLE = 0x13, §54. FUN = 0x02, §54. FUNE = 0x12, §54. h: tetra, §10. I_BIT = macro, §57. k: register int, §62. l: tetra, §10. mmix_opcode = enum, §54. neg_one: octa, MMIX-ARITH §4. octa = struct, §10. op: register mmix_opcode, §62. sign_bit = macro, §15. store_x: label, §84. true = 1, §9. x: octa, §61. y: octa, §61. z: octa, §61. ZSEV = 0x7e, §54. ZSEVI = 0x7f, §54. ZSN = 0x70, §54. ZSNI = 0x71, §54. ZSNN = 0x78, §54. ZSNNI = 0x79, §54. ZSNP = 0x7c, §54. ZSNPI = 0x7d, §54. ZSNZ = 0x7a, §54. ZSNZI = 0x7b, §54. ZSOD = 0x76, §54. ZSODI = 0x77, §54. ZSP = 0x74, §54. ZSPI = 0x75, §54. ZSZ = 0x72, §54. ZSZI = 0x73, §54.
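The opcode decoding inside register_truth can be exercised directly with opcode values. The following sketch restates it on a 64-bit integer; condition_holds and cond_select are hypothetical names, and the opcode numbers used in the usage note are the ones listed in the index above (for example BN = 0x40 and CSZ = 0x62).

```c
#include <assert.h>
#include <stdint.h>

/* Evaluate a register condition. Bits 1 and 2 of the opcode select the
   basic test (negative, zero, positive, odd) and bit 3 negates it. */
int condition_holds(uint64_t o, unsigned op)
{
    int b;
    assert(op <= 0xff);                               /* opcodes are single bytes */
    switch ((op >> 1) & 0x3) {
    case 0:  b = (int)(o >> 63);             break;   /* negative? */
    case 1:  b = (o == 0);                   break;   /* zero? */
    case 2:  b = (o != 0 && (o >> 63) == 0); break;   /* positive? */
    default: b = (int)(o & 1);               break;   /* odd? */
    }
    return (op & 0x8) ? b ^ 1 : b;
}

/* Conditional set in the style of CS and ZS: the result is z when the
   test on y holds; otherwise it is the old register value (CS) or 0 (ZS). */
uint64_t cond_select(uint64_t y, unsigned op, uint64_t z, uint64_t otherwise)
{
    return condition_holds(y, op) ? z : otherwise;
}
```

For instance, cond_select(0, 0x62, 9, 1) models CSZ and yields 9, while cond_select(3, 0x72, 9, 0) models ZSZ and yields 0. The low bit of the opcode (immediate form, or backward branch) plays no part in the test.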

93. Didn't that feel good, when 32 opcodes reduced to a single case? We get to do it one more time. Happiness!

⟨Cases for individual MMIX instructions 84⟩ +≡
  case BN: case BNB: case BZ: case BZB:
  case BP: case BPB: case BOD: case BODB:
  case BNN: case BNNB: case BNZ: case BNZB:
  case BNP: case BNPB: case BEV: case BEVB:
  case PBN: case PBNB: case PBZ: case PBZB:
  case PBP: case PBPB: case PBOD: case PBODB:
  case PBNN: case PBNNB: case PBNZ: case PBNZB:
  case PBNP: case PBNPB: case PBEV: case PBEVB:
    x.l = register_truth(b, op);
    if (x.l) {
      inst_ptr = z;
      good = (op >= PBN);
    } else good = (op < PBN);
    if (good) good_guesses++;
    else bad_guesses++, g[rC].l += 2;    /* penalty is 2υ for bad guess */
    break;

94. Memory operations are next on our agenda. The memory address, y + z, has already been placed in w.

⟨Cases for individual MMIX instructions 84⟩ +≡
  case LDB: case LDBI: case LDBU: case LDBUI:
    i = 56; j = (w.l & 0x3) << 3; goto fin_ld;
  case LDW: case LDWI: case LDWU: case LDWUI:
    i = 48; j = (w.l & 0x2) << 3; goto fin_ld;
  case LDT: case LDTI: case LDTU: case LDTUI:
    i = 32; j = 0; goto fin_ld;
  case LDHT: case LDHTI: i = j = 0;
  fin_ld: ll = mem_find(w); test_load_bkpt(ll);
    x.h = ll->tet;
    x = shift_right(shift_left(x, j), i, op & 0x2);
  check_ld: if (w.h & sign_bit) goto privileged_inst;
    goto store_x;
  case LDO: case LDOI: case LDOU: case LDOUI: case LDUNC: case LDUNCI:
    w.l &= -8; ll = mem_find(w);
    test_load_bkpt(ll); test_load_bkpt(ll + 1);
    x.h = ll->tet; x.l = (ll + 1)->tet;
    goto check_ld;
  case LDSF: case LDSFI: ll = mem_find(w); test_load_bkpt(ll);
    x = load_sf(ll->tet); goto check_ld;

b: octa, §61. bad_guesses: int, §139. BEV = 0x4e, §54. BEVB = 0x4f, §54. BN = 0x40, §54. BNB = 0x41, §54. BNN = 0x48, §54. BNNB = 0x49, §54. BNP = 0x4c, §54. BNPB = 0x4d, §54. BNZ = 0x4a, §54. BNZB = 0x4b, §54. BOD = 0x46, §54. BODB = 0x47, §54. BP = 0x44, §54. BPB = 0x45, §54. BZ = 0x42, §54. BZB = 0x43, §54. g: octa [], §76. good: bool, §61. good_guesses: int, §139. h: tetra, §10. i: register int, §62. inst_ptr: octa, §61. j: register int, §62. l: tetra, §10. LDB = 0x80, §54. LDBI = 0x81, §54. LDBU = 0x82, §54. LDBUI = 0x83, §54. LDHT = 0x92, §54. LDHTI = 0x93, §54. LDO = 0x8c, §54. LDOI = 0x8d, §54. LDOU = 0x8e, §54. LDOUI = 0x8f, §54. LDSF = 0x90, §54. LDSFI = 0x91, §54. LDT = 0x88, §54. LDTI = 0x89, §54. LDTU = 0x8a, §54. LDTUI = 0x8b, §54. LDUNC = 0x96, §54. LDUNCI = 0x97, §54. LDW = 0x84, §54. LDWI = 0x85, §54. LDWU = 0x86, §54. LDWUI = 0x87, §54. ll: register mem_tetra *, §62. load_sf: octa (), MMIX-ARITH §39. mem_find: mem_tetra *(), §20. op: register mmix_opcode, §62. PBEV = 0x5e, §54. PBEVB = 0x5f, §54. PBN = 0x50, §54. PBNB = 0x51, §54. PBNN = 0x58, §54. PBNNB = 0x59, §54. PBNP = 0x5c, §54. PBNPB = 0x5d, §54. PBNZ = 0x5a, §54. PBNZB = 0x5b, §54. PBOD = 0x56, §54. PBODB = 0x57, §54. PBP = 0x54, §54. PBPB = 0x55, §54. PBZ = 0x52, §54. PBZB = 0x53, §54. privileged_inst: label, §107. rC = 8, §55. register_truth: int (), §91. shift_left: octa (), MMIX-ARITH §7. shift_right: octa (), MMIX-ARITH §7. sign_bit = macro, §15. store_x: label, §84. test_load_bkpt = macro (), §83. tet: tetra, §16. w: octa, §61. x: octa, §61. y: octa, §61. z: octa, §61.
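The two shift amounts used by the small loads of section 94 can be seen at work on a single tetrabyte. In the sketch below, load_field is a hypothetical helper: the tetrabyte containing the address is placed in the high half of a 64-bit value, shifted left by j bits so that the wanted field reaches the top, and then shifted right by i bits, arithmetically for the signed loads. Big-endian byte order within the tetrabyte is assumed, as in MMIX.

```c
#include <assert.h>
#include <stdint.h>

/* Extract a byte (size 1), wyde (size 2), or tetra (size 4) from the
   tetrabyte `tet` that contains address `addr`. */
uint64_t load_field(uint32_t tet, uint64_t addr, unsigned size, int is_unsigned)
{
    unsigned i, j;
    uint64_t x;
    assert(size == 1 || size == 2 || size == 4);
    i = 64 - 8 * size;                          /* 56, 48, or 32 */
    j = (unsigned)(addr & (4 - size)) << 3;     /* masks 3, 2, 0: bit offset in the tetra */
    x = ((uint64_t)tet << 32) << j;             /* wanted field is now at the very top */
    if (is_unsigned) return x >> i;             /* zero fill */
    /* arithmetic shift right by i, written with unsigned operations */
    return (x >> i) | ((x >> 63) ? ~UINT64_C(0) << (64 - i) : 0);
}
```

Note that the address mask also performs the alignment: a wyde load from an odd address silently uses the wyde that contains it.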

95. ⟨Cases for individual MMIX instructions 84⟩ +≡
  case STB: case STBI: case STBU: case STBUI:
    i = 56; j = (w.l & 0x3) << 3; goto fin_pst;
  case STW: case STWI: case STWU: case STWUI:
    i = 48; j = (w.l & 0x2) << 3; goto fin_pst;
  case STT: case STTI: case STTU: case STTUI:
    i = 32; j = 0;
  fin_pst: ll = mem_find(w);
    if ((op & 0x2) == 0) {
      a = shift_right(shift_left(b, i), i, 0);
      if (a.h != b.h || a.l != b.l) exc |= V_BIT;
    }
    ll->tet ^= (ll->tet ^ (b.l << (i - 32 - j))) & ((((tetra) -1) << (i - 32)) >> j);
    goto fin_st;
  case STSF: case STSFI: ll = mem_find(w);
    ll->tet = store_sf(b); exc = exceptions;
    goto fin_st;
  case STHT: case STHTI: ll = mem_find(w); ll->tet = b.h;
  fin_st: test_store_bkpt(ll);
    w.l &= -8; ll = mem_find(w);
    a.h = ll->tet; a.l = (ll + 1)->tet;   /* for trace output */
    goto check_st;
  case STCO: case STCOI: b.l = xx;
  case STO: case STOI: case STOU: case STOUI: case STUNC: case STUNCI:
    w.l &= -8; ll = mem_find(w);
    test_store_bkpt(ll); test_store_bkpt(ll + 1);
    ll->tet = b.h; (ll + 1)->tet = b.l;
  check_st: if (w.h & sign_bit) goto privileged_inst;
    break;

96. The CSWAP operation has elements of both loading and storing. We shuffle some of the operands around so that they will appear correctly in the trace output.

⟨Cases for individual MMIX instructions 84⟩ +≡
  case CSWAP: case CSWAPI: w.l &= -8; ll = mem_find(w);
    test_load_bkpt(ll); test_load_bkpt(ll + 1);
    a = g[rP];
    if (ll->tet == a.h && (ll + 1)->tet == a.l) {
      x.h = 0; x.l = 1;
      test_store_bkpt(ll); test_store_bkpt(ll + 1);
      ll->tet = b.h; (ll + 1)->tet = b.l;
      strcpy(rhs, "M8[%#w]=%#b");
    } else {
      b.h = ll->tet; b.l = (ll + 1)->tet;
      g[rP] = b;
      strcpy(rhs, "rP=%#b");
    }
    goto check_ld;
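The masked merge that section 95 uses for byte, wyde, and tetra stores can be tried in isolation. The following is a minimal sketch, not the simulator's own code: the names merge_partial and store_overflows are hypothetical, a plain uint32_t stands in for the simulator's tetra, and the range test is an assumed equivalent of the shift_left/shift_right comparison that sets V_BIT.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t tetra;   /* stand-in for the simulator's tetra type */

/* Merge the low (64 - i) bits of value into word and return the result.
   i is 56, 48, or 32 (byte, wyde, tetra); addr supplies the big-endian
   byte offset within the tetrabyte, as w.l does in section 95. */
static tetra merge_partial(tetra word, uint64_t value, int i, uint32_t addr)
{
    assert(i == 56 || i == 48 || i == 32);
    int j = (i == 56) ? (int)((addr & 0x3) << 3)
          : (i == 48) ? (int)((addr & 0x2) << 3)
          : 0;
    tetra bl = (tetra)value;                      /* plays the role of b.l */
    tetra mask = (((tetra)-1) << (i - 32)) >> j;  /* bits to be replaced */
    return word ^ ((word ^ (bl << (i - 32 - j))) & mask);
}

/* Nonzero if value does not fit in a signed field of (64 - i) bits;
   this is the situation in which a signed store would report overflow. */
static int store_overflows(int64_t value, int i)
{
    int bits = 64 - i;                            /* 8, 16, or 32 */
    int64_t lo = -((int64_t)1 << (bits - 1));
    int64_t hi = ((int64_t)1 << (bits - 1)) - 1;
    return value < lo || value > hi;
}
```

Storing the byte 0xAB at offset 1 of the tetrabyte 0x11223344 replaces only the second byte from the left, because byte 0 is the most significant one.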

97. The GET command is permissive, but PUT is restrictive.

⟨Cases for individual MMIX instructions 84⟩ +≡
  case GET: if (yy != 0 || zz >= 32) goto illegal_inst;
    x = g[zz];
    goto store_x;
  case PUT: case PUTI: if (yy != 0 || xx >= 32) goto illegal_inst;
    strcpy(rhs, "%z = %#z");
    if (xx >= 8) {
      if (xx <= 11) goto illegal_inst;   /* can't change rC, rN, rO, rS */
      if (xx <= 18) goto privileged_inst;
      if (xx == rA) ⟨Get ready to update rA 100⟩
      else if (xx == rL) ⟨Set L = z = min(z, L) 98⟩
      else if (xx == rG) ⟨Get ready to update rG 99⟩;
    }
    g[xx] = z; zz = xx; break;

98. ⟨Set L = z = min(z, L) 98⟩ ≡
  {
    x = z; strcpy(rhs, z.h ? "min(rL,%#x) = %z" : "min(rL,%x) = %z");
    if (z.l > L || z.h) z.h = 0, z.l = L;
    else old_L = L = z.l;
  }
This code is used in section 97.
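The PUT restrictions of section 97 amount to a small classification of the target register number. The sketch below is an illustration under assumptions: classify_put and the put_result enumeration are hypothetical names, and the additional value checks that rA, rL, and rG receive in sections 98 to 100 are deliberately left out.

```c
#include <assert.h>

enum put_result { PUT_OK, PUT_ILLEGAL, PUT_PRIVILEGED };

/* Classify a PUT whose X field is xx (special register number) and
   whose Y field is yy, following the order of tests in section 97. */
static enum put_result classify_put(int xx, int yy)
{
    assert(xx >= 0 && xx <= 255 && yy >= 0 && yy <= 255);
    if (yy != 0 || xx >= 32) return PUT_ILLEGAL;
    if (xx >= 8) {
        if (xx <= 11) return PUT_ILLEGAL;     /* rC, rN, rO, rS */
        if (xx <= 18) return PUT_PRIVILEGED;  /* special registers 12..18 */
    }
    return PUT_OK;  /* rA, rL, rG still need their own value checks */
}
```

Registers 0 through 7 and 19 through 31 are the only ones an ordinary program may change, which is why rG = 19, rL = 20, and rA = 21 fall through to the specialized checks.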

99. ⟨Get ready to update rG 99⟩ ≡
  {
    if (z.h != 0 || z.l > 255 || z.l < L || z.l < 32) goto illegal_inst;
    for (j = z.l; j < G; j++) g[j] = zero_octa;
    G = z.l;
  }
This code is used in section 97.

100.
#define ROUND_OFF 1
#define ROUND_UP 2
#define ROUND_DOWN 3
#define ROUND_NEAR 4

⟨Get ready to update rA 100⟩ ≡
  {
    if (z.h != 0 || z.l >= 0x40000) goto illegal_inst;
    cur_round = (z.l >= 0x10000 ? z.l >> 16 : ROUND_NEAR);
  }
This code is used in section 97.

101. Pushing and popping are rather delicate, because we want to trace them coherently.

⟨Cases for individual MMIX instructions 84⟩ +≡
  case PUSHGO: case PUSHGOI: inst_ptr = w; goto push;
  case PUSHJ: case PUSHJB: inst_ptr = z;
  push: if (xx >= L) {
      if (((S - O) & lring_mask) == L && L != 0) stack_store();
      xx = L++;
    }
    x.l = xx; l[(O + xx) & lring_mask] = x;   /* the "hole" records the amount pushed */
    sprintf(lhs, "l[%d]=%d, ", (O + xx) & lring_mask, xx);
    x = g[rJ] = incr(loc, 4);
    L -= xx + 1; O += xx + 1;
    b = g[rO] = incr(g[rO], (xx + 1) << 3);
  sync_L: a.l = g[rL].l = L; break;
  case POP: if (xx != 0 && xx <= L) y = l[(O + xx - 1) & lring_mask];
    if (g[rS].l == g[rO].l) stack_load();
    k = l[(O - 1) & lring_mask].l & 0xff;
    while ((tetra) (O - S) <= (tetra) k) stack_load();
    L = k + (xx <= L ? xx : L + 1);
    if (L > G) L = G;
    if (L > k) {
      l[(O - 1) & lring_mask] = y;
      if (y.h) sprintf(lhs, "l[%d]=#%x%08x, ", (O - 1) & lring_mask, y.h, y.l);
      else sprintf(lhs, "l[%d]=#%x, ", (O - 1) & lring_mask, y.l);
    } else lhs[0] = '\0';
    y = g[rJ]; z.l = yz << 2; inst_ptr = oplus(y, z);
    O -= k + 1; b = g[rO] = incr(g[rO], -((k + 1) << 3));
    goto sync_L;
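The test in section 100 that validates a new rA value and derives cur_round from it is easy to exercise on its own. This is a minimal sketch: round_mode_from_rA is a hypothetical helper, the two arguments stand for z.h and z.l, and the value -1 stands in for the jump to illegal_inst.

```c
#include <assert.h>
#include <stdint.h>

enum { ROUND_OFF = 1, ROUND_UP = 2, ROUND_DOWN = 3, ROUND_NEAR = 4 };

/* Return the rounding mode implied by a proposed rA value (zh, zl),
   or -1 if the value would be rejected as illegal. */
static int round_mode_from_rA(uint32_t zh, uint32_t zl)
{
    int mode;
    if (zh != 0 || zl >= 0x40000) return -1;
    mode = zl >= 0x10000 ? (int)(zl >> 16) : ROUND_NEAR;
    assert(mode >= ROUND_OFF && mode <= ROUND_NEAR);
    return mode;
}
```

Bits 16 and 17 of rA select the mode, and the all-zero setting of those two bits means round to nearest, which the code maps to ROUND_NEAR.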

102. To complete our simulation of MMIX's register stack, we need to implement SAVE and UNSAVE.

⟨Cases for individual MMIX instructions 84⟩ +≡
  case SAVE: if (xx < G || yy != 0 || zz != 0) goto illegal_inst;
    if (((S - O) & lring_mask) == L && L != 0) stack_store();
    l[(O + L) & lring_mask].l = L;
    O += L + 1; g[rO] = incr(g[rO], (L + 1) << 3);
    L = g[rL].l = 0;
    while (g[rO].l != g[rS].l) stack_store();
    for (k = G; ; ) {
      ⟨Store g[k] in the register stack 103⟩;
      if (k == 255) k = rB;
      else if (k == rR) k = rP;
      else if (k == rZ + 1) break;
      else k++;
    }
    O = S; g[rO] = g[rS];
    x = incr(g[rO], -8); goto store_x;

103. This part of the program naturally has a lot in common with the stack_store subroutine. (There's a little white lie in the section name; if k is rZ + 1, we store rG and rA, not g[k].)

⟨Store g[k] in the register stack 103⟩ ≡
  ll = mem_find(g[rS]);
  if (k == rZ + 1) x.h = G << 24, x.l = g[rA].l;
  else x = g[k];
  ll->tet = x.h; test_store_bkpt(ll);
  (ll + 1)->tet = x.l; test_store_bkpt(ll + 1);
  if (stack_tracing) {
    tracing = true;
    if (cur_line) show_line();
    if (k >= 32) printf(" M8[#%08x%08x]=g[%d]=#%08x%08x, rS+=8\n",
        g[rS].h, g[rS].l, k, x.h, x.l);
    else printf(" M8[#%08x%08x]=%s=#%08x%08x, rS+=8\n",
        g[rS].h, g[rS].l, k == rZ + 1 ? "(rG,rA)" : special_name[k], x.h, x.l);
  }
  S++; g[rS] = incr(g[rS], 8);
This code is used in section 102.

104. ⟨Cases for individual MMIX instructions 84⟩ +≡
  case UNSAVE: if (xx != 0 || yy != 0) goto illegal_inst;
    z.l &= -8; g[rS] = incr(z, 8);
    for (k = rZ + 1; ; ) {
      ⟨Load g[k] from the register stack 105⟩;
      if (k == rP) k = rR;
      else if (k == rB) k = 255;
      else if (k == G) break;
      else k--;
    }
    S = g[rS].l >> 3;
    stack_load();
    k = l[S & lring_mask].l & 0xff;
    for (j = 0; j < k; j++) stack_load();
    O = S; g[rO] = g[rS];
    L = k > G ? G : k;
    g[rL].l = L; a = g[rL];
    g[rG].l = G; break;

105. ⟨Load g[k] from the register stack 105⟩ ≡
  g[rS] = incr(g[rS], -8);
  ll = mem_find(g[rS]);
  test_load_bkpt(ll); test_load_bkpt(ll + 1);
  if (k == rZ + 1) x.l = G = g[rG].l = ll->tet >> 24, a.l = g[rA].l = (ll + 1)->tet & 0x3ffff;
  else g[k].h = ll->tet, g[k].l = (ll + 1)->tet;
  if (stack_tracing) {
    tracing = true;
    if (cur_line) show_line();
    if (k >= 32) printf(" rS-=8, g[%d]=M8[#%08x%08x]=#%08x%08x\n",
        k, g[rS].h, g[rS].l, ll->tet, (ll + 1)->tet);
    else if (k == rZ + 1) printf(" (rG,rA)=M8[#%08x%08x]=#%08x%08x\n",
        g[rS].h, g[rS].l, ll->tet, (ll + 1)->tet);
    else printf(" rS-=8, %s=M8[#%08x%08x]=#%08x%08x\n",
        special_name[k], g[rS].h, g[rS].l, ll->tet, (ll + 1)->tet);
  }
This code is used in section 104.

106. The cache maintenance instructions don't affect this simulation, because there are no caches. But if the user has invoked them, we do provide a bit of information when tracing, indicating the scope of the instruction.

⟨Cases for individual MMIX instructions 84⟩ +≡
  case SYNCID: case SYNCIDI: case PREST: case PRESTI:
  case SYNCD: case SYNCDI: case PREGO: case PREGOI:
  case PRELD: case PRELDI: x = incr(w, xx); break;

107. Several loose ends remain to be nailed down.

⟨Cases for individual MMIX instructions 84⟩ +≡
  case GO: case GOI: x = inst_ptr; inst_ptr = w; goto store_x;
  case JMP: case JMPB: inst_ptr = z; break;
  case SYNC: if (xx != 0 || yy != 0 || zz > 6) goto illegal_inst;
    if (zz <= 3) break;
  case LDVTS: case LDVTSI: privileged_inst: strcpy(lhs, "!privileged");
    goto break_inst;
  illegal_inst: strcpy(lhs, "!illegal");
  break_inst: breakpoint = tracing = true;
    if (!interacting && !interact_after_break) halted = true;
    break;
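The "little white lie" of section 103 is that the last octabyte written by SAVE packs rG and rA together, and section 105 unpacks them again during UNSAVE. The following sketch shows only that packing. The helper names are hypothetical, and the octa structure here is a local stand-in with the same h and l fields as the simulator's.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t tetra;
typedef struct { tetra h, l; } octa;   /* high and low halves of an octabyte */

/* Pack G into the top byte of the high tetra and rA into the low tetra. */
static octa pack_rG_rA(int G, tetra rA)
{
    octa x;
    assert(G >= 32 && G <= 255);
    x.h = (tetra)G << 24;
    x.l = rA;
    return x;
}

/* Recover G and rA; only the low 18 bits of rA are kept, as in section 105. */
static int unpack_rG(octa x) { return (int)(x.h >> 24); }
static tetra unpack_rA(octa x) { return x.l & 0x3ffff; }
```

The mask 0x3ffff on the way back matches the check z.l < 0x40000 that PUT applies to rA, so a saved context can never install an rA value that PUT itself would refuse.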

108. Trips and traps. We have now implemented 253 of the 256 instructions: all but TRIP, TRAP, and RESUME.

The TRIP instruction simply turns H_BIT on in the exc variable; this will trigger an interruption to location 0.

The TRAP instruction is not simulated, except for the system calls mentioned in the introduction.

⟨Cases for individual MMIX instructions 84⟩ +≡
  case TRIP: exc |= H_BIT; break;
  case TRAP: if (xx != 0 || yy > max_sys_call) goto privileged_inst;
    strcpy(rhs, trap_format[yy]);
    g[rWW] = inst_ptr;
    g[rXX].h = sign_bit; g[rXX].l = inst;
    g[rYY] = y; g[rZZ] = z;
    z.h = 0; z.l = zz;
    a = incr(b, 8);
    ⟨Prepare memory arguments ma = M[a] and mb = M[b] if needed 111⟩;
    switch (yy) {
    case Halt: ⟨Either halt or print warning 109⟩; g[rBB] = g[255]; break;
    case Fopen: g[rBB] = mmix_fopen((unsigned char) zz, mb, ma); break;
    case Fclose: g[rBB] = mmix_fclose((unsigned char) zz); break;
    case Fread: g[rBB] = mmix_fread((unsigned char) zz, mb, ma); break;
    case Fgets: g[rBB] = mmix_fgets((unsigned char) zz, mb, ma); break;
    case Fgetws: g[rBB] = mmix_fgetws((unsigned char) zz, mb, ma); break;
    case Fwrite: g[rBB] = mmix_fwrite((unsigned char) zz, mb, ma); break;
    case Fputs: g[rBB] = mmix_fputs((unsigned char) zz, b); break;
    case Fputws: g[rBB] = mmix_fputws((unsigned char) zz, b); break;
    case Fseek: g[rBB] = mmix_fseek((unsigned char) zz, b); break;
    case Ftell: g[rBB] = mmix_ftell((unsigned char) zz); break;
    }
    x = g[255] = g[rBB]; break;

109. ⟨Either halt or print warning 109⟩ ≡
  if (!zz) halted = breakpoint = true;
  else if (zz == 1) {
    if (loc.h || loc.l >= 0x90) goto privileged_inst;
    print_trip_warning(loc.l >> 4, incr(g[rW], -4));
  } else goto privileged_inst;
This code is used in section 108.

110. ⟨Global variables 19⟩ +≡
  char arg_count[] = {1, 3, 1, 3, 3, 3, 3, 2, 2, 2, 1};
  char *trap_format[] = {
    "Halt(%z)",
    "$255 = Fopen(%!z,M8[%#b]=%#q,M8[%#a]=%p) = %x",
    "$255 = Fclose(%!z) = %x",
    "$255 = Fread(%!z,M8[%#b]=%#q,M8[%#a]=%p) = %x",
    "$255 = Fgets(%!z,M8[%#b]=%#q,M8[%#a]=%p) = %x",
    "$255 = Fgetws(%!z,M8[%#b]=%#q,M8[%#a]=%p) = %x",
    "$255 = Fwrite(%!z,M8[%#b]=%#q,M8[%#a]=%p) = %x",
    "$255 = Fputs(%!z,%#b) = %x",
    "$255 = Fputws(%!z,%#b) = %x",
    "$255 = Fseek(%!z,%b) = %x",
    "$255 = Ftell(%!z) = %x"};

111. ⟨Prepare memory arguments ma = M[a] and mb = M[b] if needed 111⟩ ≡
  if (arg_count[yy] == 3) {
    ll = mem_find(b); test_load_bkpt(ll); test_load_bkpt(ll + 1);
    mb.h = ll->tet; mb.l = (ll + 1)->tet;
    ll = mem_find(a); test_load_bkpt(ll); test_load_bkpt(ll + 1);
    ma.h = ll->tet; ma.l = (ll + 1)->tet;
  }
This code is used in section 108.

112. The input/output operations invoked by TRAPs are done by subroutines in an auxiliary program module called MMIX-IO. Here we need only declare those subroutines, and write three primitive interfaces on which they depend.

113. ⟨Global variables 19⟩ +≡
  extern void mmix_io_init ARGS((void));
  extern octa mmix_fopen ARGS((unsigned char, octa, octa));
  extern octa mmix_fclose ARGS((unsigned char));
  extern octa mmix_fread ARGS((unsigned char, octa, octa));
  extern octa mmix_fgets ARGS((unsigned char, octa, octa));
  extern octa mmix_fgetws ARGS((unsigned char, octa, octa));
  extern octa mmix_fwrite ARGS((unsigned char, octa, octa));
  extern octa mmix_fputs ARGS((unsigned char, octa));
  extern octa mmix_fputws ARGS((unsigned char, octa));
  extern octa mmix_fseek ARGS((unsigned char, octa));
  extern octa mmix_ftell ARGS((unsigned char));
  extern void print_trip_warning ARGS((int, octa));
  extern void mmix_fake_stdin ARGS((FILE *));

m = 0. is returned. register tetra x. octa a. a = addr . stop ) char buf . int . int )). continuing until size characters have been read or some other stopping criterion has been met. if stop = 0 a null character will also terminate the process. stop ) reads characters starting at address addr in the simulated memory and stores them in buf . register int m. octa . h Subroutines 12 i + int mmgetchars ARGS ((char .MMIX-SIM: TRIPS AND TRAPS 392 114. If stop < 0 there is no other criterion. int size . m < size . exclusive of terminating nulls. addr . size . int mmgetchars (buf . octa addr . f register char p. ) f ll = mem . size . int stop . for (p = buf . The subroutine mmgetchars (buf . register mem tetra ll . otherwise addr is even. The number of bytes read and stored. addr . and two consecutive null bytes starting at an even address will terminate the process.

g This code is used in section 114. return if done 115 i  f p = (x  (8  ((a:l) & # 3))) & # ff. if (:p ^ (stop  0 _ (stop > 0 ^ x < # 10000))) return m. if ((a:l & # 3) _ m > size 4) h Read and store one byte. (p + 3) = x & # ff. return if done 116 i g g return size .nd (a). (p + 2) = (x  8) & # ff. return if done 116 i  f p = x  24. . (p + 1) = (x  16) & # ff. x = ll~ tet . g p ++ . a = incr (a. 1). test load bkpt (ll ). if ((a:l & # 1) ^ (p 1)  '\0' ) return m 1. h Read and store up to four bytes. return if done 115 i else h Read and store up to four bytes. if (:(p + 1) ^ stop  0) return m + 1. if (:p ^ stop  0) f if (stop  0) return m. h Read and store one byte. m ++ . if (:(p + 2) ^ (stop  0 _ (stop > 0 ^ (x & # ffff)  0))) return m + 2. 115. 116.

a = incr (a. 117. int size . register tetra x. g This code is used in section 114. 4). octa addr . register int m. m < size . int . ) f ll = mem . The subroutine mmputchars (buf . size . for (p = buf . m += 4. addr ) unsigned char buf . a = addr . octa )). addr ) puts size characters into the simulated memory starting at address addr . size . += 4. m = 0.393 MMIX-SIM: TRIPS AND TRAPS if (:(p + 3) ^ stop p  0) return m + 3. h Subroutines 12 i + int mmputchars ARGS ((unsigned char . int mmputchars (buf . octa a. f g register unsigned char p. register mem tetra ll .

m ++ . test store bkpt (ll ). x10. p += 4. m += 4. ll~ tet = (((ll~ tet  s)  p) & # ff)  s.nd (a). h Load and write one byte 118 i  f register int s = 8  ((a:l) & # 3). incr : octa ( ). if ((a:l & # 3) _ m > size 4) h Load and write one byte 118 i else h Load and write four bytes 119 i. l: tetra . a = incr (a. h Load and write four bytes 119 i  f ll~ tet = (p  24) + ((p + 1)  16) + ((p + 2)  8) + (p + 3). 1). 4). MMIX-ARITH x6. g This code is used in section 117. g This code is used in section 117. ARGS = macro ( ). mem . x11. p ++ . 119. a = incr (a. g 118.

tet : tetra . x16. test load bkpt = macro ( ). mem tetra = struct . x10. x16. octa = struct .nd : mem tetra (). . test store bkpt = macro ( ). x10. x82. tetra = unsigned int . x 20. x83.
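Both mmgetchars and mmputchars rely on the same big-endian byte arithmetic within a tetrabyte: the shift amount 8 * ((~a.l) & 0x3) is 24 for byte 0 and 0 for byte 3. The sketch below isolates that arithmetic. The helper names get_byte, put_byte, and pack_four are hypothetical, and the simulated memory is reduced to a single uint32_t value.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t tetra;

/* Extract the byte at address addr from the tetrabyte x that holds it. */
static unsigned char get_byte(tetra x, uint32_t addr)
{
    return (unsigned char)((x >> (8 * ((~addr) & 0x3))) & 0xff);
}

/* Return x with the byte at address addr replaced by c, using the
   exclusive-or trick of section 118. */
static tetra put_byte(tetra x, uint32_t addr, unsigned char c)
{
    int s = 8 * (int)((~addr) & 0x3);
    tetra result = x ^ ((((x >> s) ^ c) & 0xff) << s);
    assert(get_byte(result, addr) == c);
    return result;
}

/* Assemble four consecutive bytes into one tetrabyte, as in section 119. */
static tetra pack_four(const unsigned char *p)
{
    return ((tetra)p[0] << 24) + ((tetra)p[1] << 16) + ((tetra)p[2] << 8) + p[3];
}
```

The exclusive-or form changes only the eight selected bits, so no separate mask for the untouched bytes is needed.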

120. When standard input is being read by the simulated program at the same time as it is being used for interaction, we try to keep the two uses separate by maintaining a private buffer for the simulated program's StdIn. Online input is usually transmitted from the keyboard to a C program a line at a time; therefore an fgets operation works much better than fread when we prompt for new input. But there is a slight complication, because fgets might read a null character before coming to a newline character. We cannot deduce the number of characters read by fgets simply by looking at strlen(stdin_buf).

⟨Subroutines 12⟩ +≡
  char stdin_chr ARGS((void));
  char stdin_chr()
  {
    register char *p;
    while (stdin_buf_start == stdin_buf_end) {
      if (interacting) {
        printf("StdIn> "); fflush(stdout);
      }
      fgets(stdin_buf, 256, stdin);
      stdin_buf_start = stdin_buf;
      for (p = stdin_buf; p < stdin_buf + 254; p++) if (*p == '\n') break;
      stdin_buf_end = p + 1;
    }
    return *stdin_buf_start++;
  }

121. ⟨Global variables 19⟩ +≡
  char stdin_buf[256];      /* standard input to the simulated program */
  char *stdin_buf_start;    /* current position in that buffer */
  char *stdin_buf_end;      /* current end of that buffer */

122. Just after executing each instruction, we do the following. Underflow that is exact and not enabled is ignored. (This applies also to underflow that was triggered by RESUME_SET.)

⟨Check for trip interrupt 122⟩ ≡
  if ((exc & (U_BIT + X_BIT)) == U_BIT && !(g[rA].l & U_BIT)) exc &= ~U_BIT;
  if (exc) {
    if (exc & tracing_exceptions) tracing = true;
    j = exc & (g[rA].l | H_BIT);   /* find all exceptions that have been enabled */
    if (j) ⟨Initiate a trip interrupt 123⟩;
    g[rA].l |= exc >> 8;
  }
This code is used in section 60.

123. ⟨Initiate a trip interrupt 123⟩ ≡
  {
    tripping = true;
    for (k = 0; !(j & H_BIT); j <<= 1, k++) ;
    exc &= ~(H_BIT >> k);   /* trips taken are not logged as events */
    g[rW] = inst_ptr;
    inst_ptr.h = 0; inst_ptr.l = k << 4;
    g[rX].h = sign_bit; g[rX].l = inst;
    if ((op & 0xe0) == STB) g[rY] = w, g[rZ] = b;
    else g[rY] = y, g[rZ] = z;
    g[rB] = g[255];
    g[255] = zero_octa;
  }
This code is used in section 122.

124. We are finally ready for the last case.

⟨Cases for individual MMIX instructions 84⟩ +≡
  case RESUME: if (xx || yy || zz) goto illegal_inst;
    inst_ptr = z = g[rW];
    b = g[rX];
    if (!(b.h & sign_bit)) ⟨Prepare to perform a ropcode 125⟩;
    break;
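The trip logic of sections 122 and 123 can be followed on plain integers. The sketch below makes one assumption that is not stated in this part of the text: the exception bits are taken to occupy bits 8 through 16, with X_BIT at bit 8 and H_BIT at bit 16. The check_trip function and the trip structure are hypothetical, and the register bookkeeping (rW, rX, rY, rZ, rB) is omitted.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit layout; only the bits used below are defined. */
#define X_BIT (1u << 8)
#define U_BIT (1u << 10)
#define V_BIT (1u << 14)
#define D_BIT (1u << 15)
#define H_BIT (1u << 16)

struct trip {
    int handler;      /* handler address k << 4, or -1 if no trip occurs */
    uint32_t exc;     /* exceptions left after the trip is taken */
    uint32_t rA;      /* rA after the remaining events are logged */
};

static struct trip check_trip(uint32_t exc, uint32_t rA)
{
    struct trip t;
    assert((exc & ~0x1ff00u) == 0);        /* only bits 8..16 may be set */
    t.handler = -1;
    if ((exc & (U_BIT + X_BIT)) == U_BIT && !(rA & U_BIT))
        exc &= ~U_BIT;                     /* exact, non-enabled underflow */
    if (exc) {
        uint32_t j = exc & (rA | H_BIT);   /* exceptions that are enabled */
        if (j) {
            int k;
            for (k = 0; !(j & H_BIT); j <<= 1, k++)
                ;                          /* find the leftmost enabled bit */
            exc &= ~(H_BIT >> k);          /* a trip taken is not logged */
            t.handler = k << 4;
        }
        rA |= exc >> 8;                    /* log the remaining events */
    }
    t.exc = exc;
    t.rA = rA;
    return t;
}
```

Because the loop scans from H_BIT downward, an explicit TRIP always wins, and among arithmetic exceptions the one nearest to H_BIT is served while the others are merely recorded in the event bits of rA.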

125. Here we check to see if the ropcode restrictions hold. If so, the ropcode will actually be obeyed on the next fetch phase.

#define RESUME_AGAIN 0   /* repeat the command in rX as if in location rW - 4 */
#define RESUME_CONT 1    /* same, but substitute rY and rZ for operands */
#define RESUME_SET 2     /* set r[X] to rZ */

⟨Prepare to perform a ropcode 125⟩ ≡
  {
    rop = b.h >> 24;   /* the ropcode is the leading byte of rX */
    switch (rop) {
    case RESUME_CONT: if ((1 << (b.l >> 28)) & 0x8f30) goto illegal_inst;
    case RESUME_SET: k = (b.l >> 16) & 0xff;
      if (k >= L && k < G) goto illegal_inst;
    case RESUME_AGAIN: if ((b.l >> 24) == RESUME) goto illegal_inst;
      break;
    default: goto illegal_inst;
    }
    resuming = true;
  }
This code is used in section 124.

126. ⟨Install special operands when resuming an interrupted operation 126⟩ ≡
  if (rop == RESUME_SET) {
    op = ORI;
    y = g[rZ];
    z = zero_octa;
    exc = g[rX].h & 0xff00;
    f = X_is_dest_bit;
  } else {   /* RESUME_CONT */
    y = g[rY];
    z = g[rZ];
  }
This code is used in section 71.

127. We don't want to count the UNSAVE that bootstraps the whole process.

⟨Update the clocks 127⟩ ≡
  if (g[rU].l || g[rU].h || !resuming) {
    g[rC].h += info[op].mems;               /* clock goes up by 2^32 for each μ */
    g[rC] = incr(g[rC], info[op].oops);     /* clock goes up by 1 for each υ */
    g[rU] = incr(g[rU], 1);                 /* usage counter counts total instructions simulated */
    g[rI] = incr(g[rI], -1);                /* interval timer counts down by 1 only */
    if (g[rI].l == 0 && g[rI].h == 0) tracing = breakpoint = true;
  }
This code is used in section 60.

128. Tracing. After an instruction has been executed, we often want to display its effect. This part of the program prints out a symbolic interpretation of what has just happened.

⟨Trace the current instruction, if requested 128⟩ ≡
  if (tracing) {
    if (showing_source && cur_line) show_line();
    ⟨Print the frequency count, the location, and the instruction 130⟩;
    ⟨Print a stream-of-consciousness description of the instruction 131⟩;
    if (showing_stats || breakpoint) show_stats(breakpoint);
    just_traced = true;
  } else if (just_traced) {
    printf(" ...............................................\n");
    just_traced = false;
    shown_line = -gap - 1;   /* gap will not be filled */
  }
This code is used in section 60.

129. ⟨Global variables 19⟩ +≡
  bool showing_stats;   /* should traced instructions also show the statistics? */
  bool just_traced;     /* was the previous instruction traced? */
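The ropcode restrictions of section 125 depend on fall-through between the switch cases, which is easy to misread. The following sketch restates them as a predicate. The function ropcode_ok is hypothetical, its arguments xh and xl stand for the two halves of rX, and the caller is assumed to have checked already that the sign bit of rX is clear.

```c
#include <assert.h>
#include <stdint.h>

enum { RESUME_AGAIN = 0, RESUME_CONT = 1, RESUME_SET = 2 };
#define RESUME_OPCODE 0xf9u   /* the RESUME opcode, as listed in the index */

/* Return 1 if the ropcode in rX = (xh, xl) may be obeyed, 0 if it
   would lead to illegal_inst. L and G are the values of rL and rG. */
static int ropcode_ok(uint32_t xh, uint32_t xl, int L, int G)
{
    int rop = (int)(xh >> 24);   /* the ropcode is the leading byte of rX */
    int k;
    assert(0 <= L && L <= G && G <= 255);
    switch (rop) {
    case RESUME_CONT:
        /* opcodes whose leading hex digit is 4, 5, 8, 9, a, b, or f are excluded */
        if ((1u << (xl >> 28)) & 0x8f30u) return 0;
        /* fall through */
    case RESUME_SET:
        k = (int)((xl >> 16) & 0xff);
        if (k >= L && k < G) return 0;   /* destination must not be marginal */
        /* fall through */
    case RESUME_AGAIN:
        if ((xl >> 24) == RESUME_OPCODE) return 0;
        return 1;
    default:
        return 0;
    }
}
```

Each stricter ropcode inherits the tests of the weaker ones below it, so RESUME_CONT is subject to all three checks while RESUME_AGAIN is subject only to the last.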

130. ⟨Print the frequency count, the location, and the instruction 130⟩ ≡
  if (resuming && op != RESUME) {
    switch (rop) {
    case RESUME_AGAIN: printf(" (%08x%08x: %08x (%s)) ",
        loc.h, loc.l, inst, info[op].name); break;
    case RESUME_CONT: printf(" (%08x%08x: %04xrYrZ (%s)) ",
        loc.h, loc.l, inst >> 16, info[op].name); break;
    case RESUME_SET: printf(" (%08x%08x: ..%02x..rZ (SET)) ",
        loc.h, loc.l, (inst >> 16) & 0xff); break;
    }
  } else {
    ll = mem_find(loc);
    printf("%10d. %08x%08x: %08x (%s) ", ll->freq, loc.h, loc.l, inst, info[op].name);
  }
This code is used in section 128.

131. This part of the simulator was inspired by ideas of E. H. Satterthwaite, Software: Practice and Experience 2 (1972), 197-217. Online debugging tools have improved significantly since Satterthwaite published his work, but good offline tools are still valuable; alas, today's algebraic programming languages do not provide tracing facilities that come anywhere close to the level of quality that Satterthwaite was able to demonstrate for ALGOL in 1970.

⟨Print a stream-of-consciousness description of the instruction 131⟩ ≡
  if (lhs[0] == '!') printf("%s instruction!\n", lhs + 1);   /* privileged or illegal */
  else {
    ⟨Print changes to rL 132⟩;
    if (z.l == 0 && (op == ADDUI || op == ORI)) p = "%l = %y = %#x";   /* LDA, SET */
    else p = info[op].trace_format;
    for ( ; *p; p++) ⟨Interpret character *p in the trace format 133⟩;
    if (exc) printf(", rA=#%05x", g[rA].l);
    if (tripping) tripping = false, printf(", -> #%02x", inst_ptr.l);
    printf("\n");
  }
This code is used in section 128.

132. Push, pop, and UNSAVE instructions display changes to rL and rO explicitly; otherwise the change is implicit, if L ≠ old_L.

⟨Print changes to rL 132⟩ ≡
  if (L != old_L && !(f & push_pop_bit)) printf("rL=%d, ", L);
This code is used in section 131.

133. Each MMIX instruction has a trace format string, which defines its symbolic representation. For example, the string for ADD is "%l = %y + %z = %x"; if the instruction is, say, ADD $1,$2,$3 with $2 = 5 and $3 = 8, and if the stack offset is 100, the trace output will be "$1=l[101] = 5 + 8 = 13".

Percent signs (%) induce special format conventions, as follows:

• %a, %b, %p, %q, %w, %x, %y, and %z stand for the numeric contents of octabytes a, b, ma, mb, w, x, y, and z, respectively; a "style" character may follow the percent sign in this case, as explained below.

• %( and %) are brackets that indicate the mode of floating point rounding. If round_mode = ROUND_NEAR, ROUND_OFF, ROUND_UP, ROUND_DOWN, the corresponding brackets are ( and ), [ and ], ^ and ^, _ and _. Such brackets are placed around a floating point operator; for example, floating point addition is denoted by `[+]' when the current rounding mode is rounding-off.

• %l stands for the string lhs, which usually represents the "left hand side" of the instruction just performed, formatted as a register number and its equivalent in the ring of local registers (e.g., `$1=l[101]') or as a register number and its equivalent in the array of global registers (e.g., `$255=g[255]'). The POP instruction uses lhs to indicate how the "hole" in the register stack was plugged.

• %r means to switch to string rhs and continue formatting from there. This mechanism allows us to use variable formats for opcodes like TRAP that have several variants.

• %t means to print either `Yes, ->loc' (where loc is the location of the next instruction) or `No', depending on the value of x.

• %g means to print ` (bad guess)' if good is false.

• %s stands for the name of special register g[zz].

• %? stands for omission of the following operator if z = 0. For example, the memory address of LDBI is described by `%#y%?+'; this means to treat the address as simply `%#y' if z = 0, otherwise as `%#y+%z'. This case is used only when z is a relatively small number (z.h = 0).

⟨Interpret character *p in the trace format 133⟩ ≡
  {
    if (*p != '%') fputc(*p, stdout);
    else {
      style = decimal;
    char_switch: switch (*++p) {
        ⟨Cases for formatting characters 134⟩;
      default: printf("BUG!!");   /* can't happen */
      }
    }
  }
This code is used in section 131.
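A reduced interpreter for a few of these conventions shows how a trace format string expands. This is a sketch under stated assumptions, not the simulator's routine: trace_format, trace_ctx, put, and put_num are hypothetical names, the output goes to a caller-supplied buffer instead of stdout, octabytes are reduced to unsigned long values, and only %l, %x, %y, %z, the `#' style, and %? are handled.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct trace_ctx {
    const char *lhs;        /* text substituted for %l */
    unsigned long x, y, z;  /* stand-ins for the octabytes x, y, z */
};

/* Append s to the string in out without exceeding cap bytes in total. */
static void put(char *out, size_t cap, const char *s)
{
    size_t n = strlen(out);
    if (n + 1 < cap) strncat(out, s, cap - n - 1);
}

/* Append v in decimal, or in hexadecimal prefixed by '#'. */
static void put_num(char *out, size_t cap, unsigned long v, int hex)
{
    char tmp[32];
    if (hex) snprintf(tmp, sizeof tmp, "#%lx", v);
    else snprintf(tmp, sizeof tmp, "%lu", v);
    put(out, cap, tmp);
}

/* Expand fmt into out (capacity cap) and return out. */
static char *trace_format(char *out, size_t cap, const char *fmt,
                          const struct trace_ctx *c)
{
    assert(cap > 0);
    out[0] = '\0';
    for (const char *p = fmt; *p; p++) {
        if (*p != '%') {
            char one[2] = { *p, '\0' };
            put(out, cap, one);
            continue;
        }
        int hex = 0;
        p++;
        if (*p == '#') { hex = 1; p++; }   /* style character */
        switch (*p) {
        case 'l': put(out, cap, c->lhs); break;
        case 'x': put_num(out, cap, c->x, hex); break;
        case 'y': put_num(out, cap, c->y, hex); break;
        case 'z': put_num(out, cap, c->z, hex); break;
        case '?':                          /* operator and z only if z != 0 */
            p++;
            if (c->z) {
                char one[2] = { *p, '\0' };
                put(out, cap, one);
                put_num(out, cap, c->z, 0);
            }
            break;
        default: put(out, cap, "BUG!!"); break;
        }
        if (*p == '\0') break;             /* malformed format: stop here */
    }
    return out;
}
```

With lhs set to "$1=l[101]", y = 5, z = 8, and x = 13, the ADD format from the text expands to the trace line quoted there; the LDBI address format yields the base address alone whenever z is zero.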

134. Octabytes are printed as decimal numbers unless a "style" character intervenes between the percent sign and the name of the octabyte: `#' denotes hexadecimal notation, prefixed by #; `0' denotes hexadecimal notation with no prefixed # and with leading zeros not suppressed; `.' denotes floating decimal notation; and `!' means to use the names StdIn, StdOut, or StdErr if the value is 0, 1, or 2.

⟨Cases for formatting characters 134⟩ ≡
    case '#': style = hex; goto char_switch;
    case '0': style = zhex; goto char_switch;
    case '.': style = floating; goto char_switch;
    case '!': style = handle; goto char_switch;

See also sections 136 and 138.
This code is used in section 133.

135. ⟨Type declarations 9⟩ +≡
    typedef enum { decimal, hex, zhex, floating, handle } fmt_style;

136. ⟨Cases for formatting characters 134⟩ +≡
    case 'a': trace_print(a); break;
    case 'b': trace_print(b); break;
    case 'p': trace_print(ma); break;
    case 'q': trace_print(mb); break;
    case 'w': trace_print(w); break;
    case 'x': trace_print(x); break;
    case 'y': trace_print(y); break;
    case 'z': trace_print(z); break;

137. ⟨Subroutines 12⟩ +≡
    fmt_style style;
    char *stream_name[] = {"StdIn", "StdOut", "StdErr"};

    void trace_print ARGS((octa));
    void trace_print(o)
      octa o;
    {
      switch (style) {
      case decimal: print_int(o); return;
      case hex: fputc('#', stdout); print_hex(o); return;
      case zhex: printf("%08x%08x", o.h, o.l); return;
      case floating: print_float(o); return;
      case handle: if (o.h == 0 && o.l < 3) printf(stream_name[o.l]);
        else print_int(o);
        return;
      }
    }

138. ⟨Cases for formatting characters 134⟩ +≡
    case '(': fputc(left_paren[round_mode], stdout); break;
    case ')': fputc(right_paren[round_mode], stdout); break;
    case 't': if (x.l) printf(" Yes, -> #"), print_hex(inst_ptr);
      else printf(" No");
      break;
    case 'g': if (!good) printf(" (bad guess)"); break;
    case 's': printf(special_name[zz]); break;
    case '?': p++; if (z.l) printf("%c%d", *p, z.l); break;
    case 'l': printf(lhs); break;
    case 'r': p = switchable_string; break;

139. #define rhs &switchable_string[1]

⟨Global variables 19⟩ +≡
    char left_paren[] = {0, '[', '^', '_', '('};    /* denotes the rounding mode */
    char right_paren[] = {0, ']', '^', '_', ')'};   /* denotes the rounding mode */
    char switchable_string[48];   /* holds rhs; position 0 is ignored */
      /* switchable_string must be able to hold any trap_format */
    char lhs[32];
    int good_guesses, bad_guesses;   /* branch prediction statistics */

140. ⟨Subroutines 12⟩ +≡
    void show_stats ARGS((bool));
    void show_stats(verbose)
      bool verbose;
    {
      octa o;
      printf("  %d instruction%s, %d mem%s, %d oop%s; %d good guess%s, %d bad\n",
          g[rU].l, g[rU].l == 1 ? "" : "s",
          g[rC].h, g[rC].h == 1 ? "" : "s",
          g[rC].l, g[rC].l == 1 ? "" : "s",
          good_guesses, good_guesses == 1 ? "" : "es", bad_guesses);
      if (!verbose) return;
      o = halted ? incr(inst_ptr, -4) : inst_ptr;
      printf("  (%s at location #%08x%08x)\n",
          halted ? "halted" : "now", o.h, o.l);
    }

141. Running the program. Now we are ready to fit the pieces together into a working simulator.

    #include <stdio.h>
    #include <stdlib.h>
    #include <ctype.h>
    #include <string.h>
    #include <signal.h>
    #include "abstime.h"
    ⟨Preprocessor macros 11⟩
    ⟨Type declarations 9⟩
    ⟨Global variables 19⟩
    ⟨Subroutines 12⟩

    int main(argc, argv)
      int argc;
      char *argv[];
    {
      ⟨Local registers 62⟩;
      mmix_io_init();
      ⟨Process the command line 142⟩;
      ⟨Initialize everything 14⟩;
      ⟨Load the command line arguments 163⟩;
      ⟨Get ready to UNSAVE the initial context 164⟩;
      while (1) {
        if (interrupt && !breakpoint) breakpoint = interacting = true, interrupt = false;
        else {
          breakpoint = false;
          if (interacting) ⟨Interact with the user 149⟩;
        }
        if (halted) break;
        do ⟨Perform one instruction 60⟩
        while ((!interrupt && !breakpoint) || resuming);
        if (interact_after_break) interacting = true, interact_after_break = false;
      }
    end_simulation: if (profiling) ⟨Print all the frequency counts 53⟩;
      if (interacting || profiling || showing_stats) show_stats(true);
    }

142. Here we process the command-line options; when we finish, *cur_arg should be the name of the object file to be loaded and simulated.

    #define mmo_file_name *cur_arg

⟨Process the command line 142⟩ ≡
    myself = argv[0];
    for (cur_arg = argv + 1; *cur_arg && (*cur_arg)[0] == '-'; cur_arg++)
      scan_option(*cur_arg + 1, true);
    if (!*cur_arg) scan_option("?", true);   /* exit with usage note */
    argc -= cur_arg - argv;   /* this is the argc of the user program */

This code is used in section 141.
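The effect of the option loop in section 142 on argc can be sketched in isolation. This is a minimal illustration, assuming a NULL-terminated argv; the function split_command_line and its counter are hypothetical stand-ins, and the counter merely tallies the arguments that the simulator would hand to scan_option.

```c
#include <assert.h>
#include <stddef.h>

/* Skip the leading "-option" arguments of a NULL-terminated argv.
   On return *prog is the object file name (or NULL if none was given),
   *option_count is the number of options seen, and the result is the
   argc that the simulated program should see. */
static int split_command_line(int argc, char **argv,
                              const char **prog, int *option_count)
{
    char **cur_arg;

    assert(argc >= 1 && argv != NULL && prog != NULL && option_count != NULL);
    *option_count = 0;
    for (cur_arg = argv + 1; *cur_arg && (*cur_arg)[0] == '-'; cur_arg++)
        ++*option_count;          /* the simulator calls scan_option here */
    *prog = *cur_arg;
    return argc - (int)(cur_arg - argv);
}
```

For the command line `mmix -t2 -s prog.mmo hello`, the sketch reports two options and a user argc of 2, since the object file name itself counts as the first argument of the simulated program.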

143. Careful readers of the following subroutine will notice a little white bug: A tracing specification like t1000000000 or even t0000000000 or even t!!!!!!!!!! is silently converted to t4294967295.

The -b and -c options are effective only on the command line, but they are harmless while interacting.

⟨Subroutines 12⟩ +≡
    void scan_option ARGS((char *, bool));
    void scan_option(arg, usage)
      char *arg;    /* command-line argument (without the `-') */
      bool usage;   /* should we exit with usage note if unrecognized? */
    {
      register int k;
      switch (*arg) {
      case 't': if (strlen(arg) > 10) trace_threshold = 0xffffffff;
        else if (sscanf(arg + 1, "%d", &trace_threshold) != 1) trace_threshold = 0;
        return;
      case 'e': if (!*(arg + 1)) tracing_exceptions = 0xff;
        else if (sscanf(arg + 1, "%x", &tracing_exceptions) != 1) tracing_exceptions = 0;
        return;
      case 'r': stack_tracing = true; return;
      case 's': showing_stats = true; return;
      case 'l': if (!*(arg + 1)) gap = 3;
        else if (sscanf(arg + 1, "%d", &gap) != 1) gap = 0;
        showing_source = true; return;
      case 'L': if (!*(arg + 1)) profile_gap = 3;
        else if (sscanf(arg + 1, "%d", &profile_gap) != 1) profile_gap = 0;
        profile_showing_source = true;   /* falls through: -L implies -P */
      case 'P': profiling = true; return;
      case 'v': trace_threshold = 0xffffffff; tracing_exceptions = 0xff;
        stack_tracing = true; showing_stats = true;
        gap = 10, showing_source = true;
        profile_gap = 10, profile_showing_source = true, profiling = true;
        return;
      case 'q': trace_threshold = tracing_exceptions = 0;
        stack_tracing = showing_stats = showing_source = false;
        profiling = profile_showing_source = false;
        return;
      case 'i': interacting = true; return;
      case 'I': interact_after_break = true; return;
      case 'b': if (sscanf(arg + 1, "%d", &buf_size) != 1) buf_size = 0; return;
      case 'c': if (sscanf(arg + 1, "%d", &lring_size) != 1) lring_size = 0; return;
      case 'f': ⟨Open a file for simulated standard input 145⟩; return;
      case 'D': ⟨Open a file for dumping binary output 146⟩; return;
      default: if (usage) {
          fprintf(stderr,
              "Usage: %s <options> progfile command-line-args...\n", myself);
          for (k = 0; usage_help[k][0]; k++) fprintf(stderr, usage_help[k]);
          exit(-1);
        } else for (k = 0; usage_help[k][1] != 'b'; k++) printf(usage_help[k]);
        return;
      }
    }
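The "little white bug" mentioned in section 143 is easy to reproduce in isolation. The following is a minimal sketch, assuming a 32-bit unsigned int and using the `%u` conversion rather than the `%d` of the original; the function name parse_trace_threshold is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* arg is the option text without the leading '-', for example "t25".
   Any argument longer than ten characters, whatever it contains, is
   treated as the largest threshold; this mirrors the length test in
   scan_option and is the source of the "little white bug". */
static unsigned int parse_trace_threshold(const char *arg)
{
    unsigned int n;

    assert(arg != NULL && arg[0] == 't');
    if (strlen(arg) > 10) return 0xffffffffu;
    if (sscanf(arg + 1, "%u", &n) != 1) return 0;   /* no number: threshold 0 */
    return n;
}
```

The length test guards against overflow of a ten-digit decimal number without examining the digits, which is why `t0000000000` and `t!!!!!!!!!!` both end up as 4294967295.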

144. ⟨Global variables 19⟩ +≡
    char *myself;       /* argv[0], the name of this simulator */
    char **cur_arg;     /* pointer to current place in the argument vector */
    bool interrupt;     /* has the user interrupted the simulation recently? */
    bool profiling;     /* should we print the profile at the end? */
    FILE *fake_stdin;   /* file substituted for the simulated StdIn */
    FILE *dump_file;    /* file used for binary dumps */
    char *usage_help[] = {
      " with these options: (<n>=decimal number, <x>=hex number)\n",
      "-t<n> trace each instruction the first n times\n",
      "-e<x> trace each instruction with an exception matching x\n",
      "-r    trace hidden details of the register stack\n",
      "-l<n> list source lines when tracing, filling gaps <= n\n",
      "-s    show statistics after each traced instruction\n",
      "-P    print a profile when simulation ends\n",
      "-L<n> list source lines with the profile\n",
      "-v    be verbose: show almost everything\n",
      "-q    be quiet: show only the simulated standard output\n",
      "-i    run interactively (prompt for online commands)\n",
      "-I    interact, but only after the program halts\n",
      "-b<n> change the buffer size for source lines\n",
      "-c<n> change the cyclic local register ring size\n",
      "-f<filename> use given file to simulate standard input\n",
      "-D<filename> dump a file for use by other simulators\n",
      ""};
    char *interactive_help[] = {
      "The interactive commands are:\n",
      "<return>  trace one instruction\n",
      "n         trace one instruction\n",
      "c         continue until halt or breakpoint\n",
      "q         quit the simulation\n",
      "s         show current statistics\n",
      "l<n><t>   set and/or show local register in format t\n",
      "g<n><t>   set and/or show global register in format t\n",
      "$<n><t>   set and/or show dynamic register in format t\n",
      "M<x><t>   set and/or show memory octabyte in format t\n",
      "+<n><t>   set and/or show n additional octabytes in format t\n",
      " <t> is ! (decimal) or . (floating) or # (hex) or \" (string)\n",
      "     or <empty> (previous <t>) or =<value> (change value)\n",
      "@<x>      go to location x\n",
      "b[rwx]<x> set or reset breakpoint at location x\n",
      "t<x>      trace location x\n",
      "u<x>      untrace location x\n",
      "T         set current segment to Text_Segment\n",
      "D         set current segment to Data_Segment\n",
      "P         set current segment to Pool_Segment\n",
      "S         set current segment to Stack_Segment\n",
      "B         show all current breakpoints and tracepoints\n",
      "i<file>   insert commands from file\n",
      "-<option> change a tracing/listing/profile option\n",
      "-?        show the tracing/listing/profile options  \n",
      ""};

145. ⟨Open a file for simulated standard input 145⟩ ≡
    if (fake_stdin) fclose(fake_stdin);
    fake_stdin = fopen(arg + 1, "r");
    if (!fake_stdin) fprintf(stderr, "Sorry, I can't open file %s!\n", arg + 1);
    else mmix_fake_stdin(fake_stdin);

This code is used in section 143.

146. ⟨Open a file for dumping binary output 146⟩ ≡
    dump_file = fopen(arg + 1, "wb");
    if (!dump_file) fprintf(stderr, "Sorry, I can't open file %s!\n", arg + 1);

This code is used in section 143.

147. ⟨Initialize everything 14⟩ +≡
    signal(SIGINT, catchint);   /* now catchint will catch the first interrupt */

148. ⟨Subroutines 12⟩ +≡
    void catchint ARGS((int));
    void catchint(n)
      int n;
    {
      interrupt = true;
      signal(SIGINT, catchint);   /* now catchint will catch the next interrupt */
    }
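The interrupt handling of sections 147 and 148 follows a common pattern: the handler only records that a signal arrived and re-arms itself, and the main loop polls the flag. The sketch below is a minimal illustration, assuming a `volatile sig_atomic_t` flag where the original uses a bool; the names interrupt_seen, catch_interrupt, take_interrupt, and install_interrupt_handler are hypothetical.

```c
#include <assert.h>
#include <signal.h>

/* Flag set by the handler and polled by the main loop. */
static volatile sig_atomic_t interrupt_seen = 0;

static void catch_interrupt(int signum)
{
    (void)signum;
    interrupt_seen = 1;
    signal(SIGINT, catch_interrupt);   /* re-arm for the next interrupt */
}

/* Return 1 if an interrupt arrived since the last call; clear the flag. */
static int take_interrupt(void)
{
    int seen = interrupt_seen;
    interrupt_seen = 0;
    return seen;
}

static void install_interrupt_handler(void)
{
    void (*previous)(int) = signal(SIGINT, catch_interrupt);
    assert(previous != SIG_ERR);
    (void)previous;
}
```

Re-installing the handler inside itself matters on systems where `signal` resets the disposition to the default once a signal has been delivered; elsewhere the extra call is harmless.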

149. ⟨Interact with the user 149⟩ ≡
    { register int repeating;
    interact: ⟨Put a new command in command_buf 150⟩;
      p = command_buf;
      repeating = 0;
      switch (*p) {
      case '\n': case 'n': breakpoint = tracing = true;   /* trace one inst and break */
      case 'c': goto resume_simulation;   /* continue until breakpoint */
      case 'q': goto end_simulation;
      case 's': show_stats(true); goto interact;
      case '-': k = strlen(p); if (p[k - 1] == '\n') p[k - 1] = '\0';
        scan_option(p + 1, false); goto interact;
      ⟨Cases that change cur_disp_mode 152⟩;
      ⟨Cases that define cur_disp_type 153⟩;
      ⟨Cases that set and clear tracing and breakpoints 161⟩;
      default: what_say: k = strlen(command_buf);
        if (k < 10 && command_buf[k - 1] == '\n') command_buf[k - 1] = '\0';
        else strcpy(command_buf + 9, "...");
        printf("Eh? Sorry, I don't understand `%s'. (Type h for help)\n",
            command_buf);
        goto interact;
      case 'h': for (k = 0; interactive_help[k][0]; k++) printf(interactive_help[k]);
        goto interact;
      }
    check_syntax: if (*p != '\n') {
        if (!*p) incomplete_str: printf("Syntax error: Incomplete command!\n");
        else {
          p[strlen(p) - 1] = '\0';
          printf("Syntax error; I'm ignoring `%s'!\n", p);
        }
      }
      while (repeating) ⟨Display and/or set the value of the current octabyte 156⟩
      goto interact;
    resume_simulation: ;
    }

This code is used in section 141.

150. ⟨Put a new command in command_buf 150⟩ ≡
    { register bool ready = false;
    incl_read: while (incl_file && !ready)
        if (!fgets(command_buf, command_buf_size, incl_file)) {
          fclose(incl_file);
          incl_file = NULL;
        } else if (command_buf[0] != '\n' && command_buf[0] != 'i' &&
                   command_buf[0] != '%')
          if (command_buf[0] == ' ') printf(command_buf);
          else ready = true;
      while (!ready) {
        printf("mmix> "); fflush(stdout);
        if (!fgets(command_buf, command_buf_size, stdin)) command_buf[0] = 'q';
        if (command_buf[0] != 'i') ready = true;
        else {
          command_buf[strlen(command_buf) - 1] = '\0';
          incl_file = fopen(command_buf + 1, "r");
          if (incl_file) goto incl_read;
          if (isspace(command_buf[1])) incl_file = fopen(command_buf + 2, "r");
          if (incl_file) goto incl_read;
          printf("Can't open file `%s'!\n", command_buf + 1);
        }
      }
    }

This code is used in section 149.

151. #define command_buf_size 1024   /* make it plenty long, for floating point tests */

⟨Global variables 19⟩ +≡
    char command_buf[command_buf_size];
    FILE *incl_file;            /* file of commands included by `i' */
    char cur_disp_mode = 'l';   /* 'l' or 'g' or '$' or 'M' */
    char cur_disp_type = '!';   /* '!' or '.' or '#' or '"' */
    bool cur_disp_set;          /* was the last <t> of the form =<val>? */
    octa cur_disp_addr;         /* the h half is relevant only in mode 'M' */
    octa cur_seg;               /* current segment offset */
    char spec_reg_code[] = {rA, rB, rC, rD, rE, rF, rG, rH, rI, rJ, rK, rL, rM,
      rN, rO, rP, rQ, rR, rS, rT, rU, rV, rW, rX, rY, rZ};
    char spec_regg_code[] = {0, rBB, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
      0, 0, 0, 0, 0, 0, rTT, 0, 0, rWW, rXX, rYY, rZZ};

152. ⟨Cases that change cur_disp_mode 152⟩ ≡
    case 'l': case 'g': case '$': cur_disp_mode = *p++;
      for (cur_disp_addr.l = 0; isdigit(*p); p++)
        cur_disp_addr.l = 10 * cur_disp_addr.l + *p - '0';
      goto new_mode;
    case 'r': p++; cur_disp_mode = 'g';
      if (*p < 'A' || *p > 'Z') goto what_say;
      if (*(p + 1) != *p) cur_disp_addr.l = spec_reg_code[*p - 'A'], p++;
      else if (spec_regg_code[*p - 'A'])
        cur_disp_addr.l = spec_regg_code[*p - 'A'], p += 2;
      else goto what_say;
      goto new_mode;
    case 'M': cur_disp_mode = *p;
      cur_disp_addr = scan_hex(p + 1, cur_seg); cur_disp_addr.l &= -8; p = next_char;
    new_mode: cur_disp_set = false;   /* the `=' is remembered only by `+' */
      repeating = 1;
      goto scan_type;
    case '+': if (!isdigit(*(p + 1))) repeating = 1;
      for (p++; isdigit(*p); p++)
        repeating = 10 * repeating + *p - '0';
      if (repeating) {
        if (cur_disp_mode == 'M') cur_disp_addr = incr(cur_disp_addr, 8);
        else cur_disp_addr.l++;
      }
      goto scan_type;

This code is used in section 149.

153. ⟨Cases that define cur_disp_type 153⟩ ≡
    case '!': case '.': case '#': case '"': cur_disp_set = false;
      repeating = 1;
    set_type: cur_disp_type = *p++; break;
    scan_type: if (*p == '!' || *p == '.' || *p == '#' || *p == '"') goto set_type;
      if (*p != '=') break;
      goto scan_eql;
    case '=': repeating = 1;
    scan_eql: cur_disp_set = true;
      val = zero_octa;
      if (*++p == '#') cur_disp_type = *p, val = scan_hex(p + 1, zero_octa);
      else if (*p == '"' || *p == '\'') goto scan_string;
      else cur_disp_type = (scan_const(p) > 0 ? '.' : '!');
      p = next_char;
      if (*p != ',') break;
      val.h = 0; val.l &= 0xff;
    scan_string: cur_disp_type = '"';
      ⟨Scan a string constant 155⟩; break;

This code is used in section 149.

154. ⟨Subroutines 12⟩ +≡
    octa scan_hex ARGS((char *, octa));
    octa scan_hex(s, offset)
      char *s;
      octa offset;
    {
      register char *p;
      octa o;
      o = zero_octa;
      for (p = s; isxdigit(*p); p++) {
        o = incr(shift_left(o, 4), *p - '0');
        if (*p >= 'a') o = incr(o, '0' - 'a' + 10);
        else if (*p >= 'A') o = incr(o, '0' - 'A' + 10);
      }
      next_char = p;
      return oplus(o, offset);
    }

155. ⟨Scan a string constant 155⟩ ≡
    while (*p == ',') {
      if (*++p == '#') {
        aux = scan_hex(p + 1, zero_octa), p = next_char;
        val = incr(shift_left(val, 8), aux.l & 0xff);
      } else if (isdigit(*p)) {
        for (k = *p++ - '0'; isdigit(*p); p++) k = (10 * k + *p - '0') & 0xff;
        val = incr(shift_left(val, 8), k);
      }
      else if (*p == '\n') goto incomplete_str;
    }
    if (*p == '\'' && *(p + 2) == *p) *p = *(p + 2) = '"';
    if (*p == '"') {
      for (p++; *p && *p != '\n' && *p != '"'; p++)
        val = incr(shift_left(val, 8), *p);
      if (*p && *p++ == '"')
        if (*p == ',') goto scan_string;
    }

This code is used in section 153.

156. ⟨Display and/or set the value of the current octabyte 156⟩ ≡
    {
      if (cur_disp_set) ⟨Set the current octabyte to val 157⟩;
      ⟨Display the current octabyte 159⟩;
      fputc('\n', stdout);
      repeating--;
      if (!repeating) break;
      if (cur_disp_mode == 'M') cur_disp_addr = incr(cur_disp_addr, 8);
      else cur_disp_addr.l++;
    }

This code is used in section 149.

157. ⟨Set the current octabyte to val 157⟩ ≡
    switch (cur_disp_mode) {
    case 'l': l[cur_disp_addr.l & lring_mask] = val; break;
    case '$': k = cur_disp_addr.l & 0xff;
      if (k < L) l[(O + k) & lring_mask] = val;
      else if (k >= G) g[k] = val;
      break;
    case 'g': k = cur_disp_addr.l & 0xff;
      if (k < 32) ⟨Set g[k] = val only if permissible 158⟩;
      g[k] = val; break;
    case 'M': if (!(cur_disp_addr.h & sign_bit)) {
        ll = mem_find(cur_disp_addr);
        ll->tet = val.h; (ll + 1)->tet = val.l;
      }
      break;
    }

This code is used in section 156.

158. Here we essentially simulate a PUT command, but we simply break if the PUT is illegal or privileged.

⟨Set g[k] = val only if permissible 158⟩ ≡
    if (k >= 9 && k != rI) {
      if (k <= 19) break;
      if (k == rA) {
        if (val.h != 0 || val.l >= 0x40000) break;
        cur_round = (val.l >= 0x10000 ? val.l >> 16 : ROUND_NEAR);
      } else if (k == rG) {
        if (val.h != 0 || val.l > 255 || val.l < L || val.l < 32) break;
        for (j = val.l; j < G; j++) g[j] = zero_octa;
        G = val.l;
      } else if (k == rL) {
        if (val.h == 0 && val.l < L) L = val.l;
        else break;
      }
    }

This code is used in section 157.

159. ⟨Display the current octabyte 159⟩ ≡
    switch (cur_disp_mode) {
    case 'l': k = cur_disp_addr.l & lring_mask;
      printf("l[%d]=", k); aux = l[k]; break;
    case '$': k = cur_disp_addr.l & 0xff;
      if (k < L) printf("$%d=l[%d]=", k, (O + k) & lring_mask),
        aux = l[(O + k) & lring_mask];
      else if (k >= G) printf("$%d=g[%d]=", k, k), aux = g[k];
      else printf("$%d=", k), aux = zero_octa;
      break;
    case 'g': k = cur_disp_addr.l & 0xff;
      printf("g[%d]=", k); aux = g[k]; break;
    case 'M': if (cur_disp_addr.h & sign_bit) aux = zero_octa;
      else {
        ll = mem_find(cur_disp_addr);
        aux.h = ll->tet; aux.l = (ll + 1)->tet;
      }
      printf("M8[#"); print_hex(cur_disp_addr); printf("]="); break;
    }
    switch (cur_disp_type) {
    case '!': print_int(aux); break;
    case '.': print_float(aux); break;
    case '#': fputc('#', stdout); print_hex(aux); break;
    case '"': print_string(aux); break;
    }

This code is used in section 156.

160. ⟨Subroutines 12⟩ +≡
    void print_string ARGS((octa));
    void print_string(o)
      octa o;
    {
      register int k, state, b;
      for (k = state = 0; k < 8; k++) {
        b = ((k < 4 ? o.h >> (8 * (3 - k)) : o.l >> (8 * (7 - k)))) & 0xff;
        if (b == 0) {
          if (state) printf("%s,0", state > 1 ? "\"" : ""), state = 1;
        } else if (b >= ' ' && b <= '~')
          printf("%s%c", state > 1 ? "" : state == 1 ? ",\"" : "\"", b), state = 2;
        else printf("%s#%x", state > 1 ? "\"," : state == 1 ? "," : "", b), state = 1;
      }
      if (state == 0) printf("0");
      else if (state > 1) printf("\"");
    }

161. ⟨Cases that set and clear tracing and breakpoints 161⟩ ≡
    case '@': inst_ptr = scan_hex(p + 1, cur_seg); p = next_char;
      halted = false; break;
    case 't': case 'u': k = *p;
      val = scan_hex(p + 1, cur_seg); p = next_char;
      if (val.h < 0x20000000) {
        ll = mem_find(val);
        if (k == 't') ll->bkpt |= trace_bit;
        else ll->bkpt &= ~trace_bit;
      }
      break;
    case 'b': for (k = 0, p++; !isxdigit(*p); p++)
        if (*p == 'r') k |= read_bit;
        else if (*p == 'w') k |= write_bit;
        else if (*p == 'x') k |= exec_bit;
      val = scan_hex(p, cur_seg); p = next_char;
      if (!(val.h & sign_bit)) {
        ll = mem_find(val);
        ll->bkpt = (ll->bkpt & -8) | k;
      }
      break;
    case 'T': cur_seg.h = 0; goto passit;
    case 'D': cur_seg.h = 0x20000000; goto passit;
    case 'P': cur_seg.h = 0x40000000; goto passit;
    case 'S': cur_seg.h = 0x60000000; goto passit;
    case 'B': show_breaks(mem_root);
    passit: p++; break;

This code is used in section 149.
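The three-state logic of print_string (section 160) is easier to follow when the output goes to a buffer. The following is a minimal sketch, assuming uint64_t in place of octa; the name format_octabyte_string is hypothetical, and the caller is assumed to supply at least 48 bytes, which exceeds the longest possible result.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Render the eight bytes of o, most significant first, as a mixture of
   quoted printable characters and numeric codes, skipping leading zeros.
   state is 0 before any output, 1 after a number, 2 inside quotes. */
static void format_octabyte_string(uint64_t o, char *out, size_t cap)
{
    int k, state = 0;
    size_t n = 0;

    assert(out != NULL && cap >= 48);
    out[0] = '\0';
    for (k = 0; k < 8; k++) {
        int b = (int)((o >> (8 * (7 - k))) & 0xff);
        if (b == 0) {
            if (state) {               /* an embedded or trailing zero byte */
                n += (size_t)sprintf(out + n, "%s,0", state > 1 ? "\"" : "");
                state = 1;
            }
        } else if (b >= ' ' && b <= '~') {   /* printable: open quotes if needed */
            n += (size_t)sprintf(out + n, "%s%c",
                                 state > 1 ? "" : state == 1 ? ",\"" : "\"", b);
            state = 2;
        } else {                             /* unprintable: show as hex code */
            n += (size_t)sprintf(out + n, "%s#%x",
                                 state > 1 ? "\"," : state == 1 ? "," : "", b);
            state = 1;
        }
    }
    if (state == 0) strcpy(out, "0");
    else if (state > 1) strcpy(out + n, "\"");
}
```

Leading zero bytes produce no output, so an octabyte holding a single character in its low-order byte is shown as a one-character string, while zeros after the first nonzero byte are listed explicitly.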

162. ⟨Subroutines 12⟩ +≡
    void show_breaks ARGS((mem_node *));
    void show_breaks(p)
      mem_node *p;
    {
      register int j;
      octa cur_loc;
      if (p->left) show_breaks(p->left);
      for (j = 0; j < 512; j++) if (p->dat[j].bkpt) {
        cur_loc = incr(p->loc, 4 * j);
        printf("  %08x%08x %c%c%c%c\n", cur_loc.h, cur_loc.l,
            p->dat[j].bkpt & trace_bit ? 't' : '-',
            p->dat[j].bkpt & read_bit ? 'r' : '-',
            p->dat[j].bkpt & write_bit ? 'w' : '-',
            p->dat[j].bkpt & exec_bit ? 'x' : '-');
      }
      if (p->right) show_breaks(p->right);
    }

163. We put pointers to the command-line strings in M[Pool_Segment + 8 × (k + 1)]₈ for 0 ≤ k < argc; the strings themselves are octabyte-aligned, starting at M[Pool_Segment + 8 × (argc + 2)]₈. The location of the first free octabyte in the pool segment is placed in M[Pool_Segment]₈.

⟨Load the command line arguments 163⟩ ≡
    x.h = 0x40000000, x.l = 0x8;
    loc = incr(x, 8 * (argc + 1));
    for (k = 0; k < argc; k++, cur_arg++) {
      ll = mem_find(x);
      ll->tet = loc.h, (ll + 1)->tet = loc.l;
      ll = mem_find(loc);
      mmputchars(*cur_arg, strlen(*cur_arg), loc);
      x.l += 8, loc.l += 8 + (strlen(*cur_arg) & -8);
    }
    x.l = 0; ll = mem_find(x); ll->tet = loc.h, (ll + 1)->tet = loc.l;

This code is used in section 141.
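The address arithmetic of section 163 can be checked with a small calculation. This is a minimal sketch, assuming 64-bit addresses held in uint64_t; the names POOL_SEGMENT, arg_pointer_slot, and layout_args are hypothetical, and the sketch only computes addresses without touching any simulated memory.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define POOL_SEGMENT 0x4000000000000000ULL

/* Address of the octabyte that holds the pointer to argument k. */
static uint64_t arg_pointer_slot(int k)
{
    return POOL_SEGMENT + 8 * (uint64_t)(k + 1);
}

/* Fill str_addr[0..argc-1] with the addresses of the argument strings and
   return the first free octabyte, the value stored at M[Pool_Segment].
   Each string occupies its length plus a terminating zero byte, rounded
   up to a multiple of 8. */
static uint64_t layout_args(int argc, char *const argv[], uint64_t str_addr[])
{
    uint64_t loc = POOL_SEGMENT + 8 * (uint64_t)(argc + 2);
    int k;

    assert(argc >= 0 && argv != NULL && str_addr != NULL);
    for (k = 0; k < argc; k++) {
        str_addr[k] = loc;
        loc += 8 + (strlen(argv[k]) & ~(size_t)7);
    }
    return loc;
}
```

The expression 8 + (len & -8) in the original always leaves room for at least one zero byte after the string, so a string of exactly eight characters takes two octabytes.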

164. ⟨Get ready to UNSAVE the initial context 164⟩ ≡
    x.h = 0, x.l = 0x90;
    ll = mem_find(x);
    if (ll->tet) inst_ptr = x;
    resuming = true;
    rop = RESUME_AGAIN;
    g[rX].l = ((tetra) UNSAVE << 24) + 255;
    if (dump_file) {
      x.l = 1;
      dump(mem_root);
      dump_tet(0), dump_tet(0);
      exit(1);
    }

This code is used in section 141.

165. The special option `-D<filename>' can be used to prepare binary files needed by the MMIX-in-MMIX simulator of Section 1.4.3´. This option puts big-endian octabytes into a given file; a location l is followed by one or more nonzero octabytes M₈[l], M₈[l + 8], M₈[l + 16], ..., followed by zero. The simulated simulator knows how to load programs in such a format (see exercise 1.4.3´–20), and so does the meta-simulator MMMIX.

⟨Subroutines 12⟩ +≡
    void dump ARGS((mem_node *));
    void dump_tet ARGS((tetra));
    void dump(p)
      mem_node *p;
    {
      register int j;
      octa cur_loc;
      if (p->left) dump(p->left);
      for (j = 0; j < 512; j += 2) if (p->dat[j].tet || p->dat[j + 1].tet) {
        cur_loc = incr(p->loc, 4 * j);
        if (cur_loc.l != x.l || cur_loc.h != x.h) {
          if (x.l != 1) dump_tet(0), dump_tet(0);
          dump_tet(cur_loc.h); dump_tet(cur_loc.l); x = cur_loc;
        }
        dump_tet(p->dat[j].tet);
        dump_tet(p->dat[j + 1].tet);
        x = incr(x, 8);
      }
      if (p->right) dump(p->right);
    }

166. ⟨Subroutines 12⟩ +≡
    void dump_tet(t)
      tetra t;
    {
      fputc(t >> 24, dump_file);
      fputc((t >> 16) & 0xff, dump_file);
      fputc((t >> 8) & 0xff, dump_file);
      fputc(t & 0xff, dump_file);
    }

136. 94. 104. h Cases for individual MMIX instructions 84. 90. 85.MMIX-SIM: NAMES OF THE SECTIONS 420 167. 138 i Used in section 133. 96. 95. h Cases for formatting characters 134. h Cases for lopcodes in the main loop 33. 34. 106. 101. Names of the sections. 97. 89. 92. h Cases that de. 86. 88. 35. 108. h Cases that change cur disp mode 152 i Used in section 149. 36 i Used in section 29. 87. 107. 93. 102. 124 i Used in section 60.


MMIXAL

D.E. Knuth: MMIXware, LNCS 1750, pp. 422-493, 1999. © Springer-Verlag Berlin Heidelberg 1999

1. Definition of MMIXAL. This program takes input written in MMIXAL, the MMIX assembly language, and translates it into binary files that can be loaded and executed on MMIX simulators. MMIXAL is much simpler than the "industrial strength" assembly languages that computer manufacturers usually provide, because it is primarily intended for the simple demonstration programs in The Art of Computer Programming. Yet it tries to have enough features to serve also as the back end of compilers for C and other high-level languages.

Instructions for using the program appear at the end of this document. First we will discuss the input and output languages in detail; then we'll consider the translation process, step by step; then we'll put everything together.

2. A program in MMIXAL consists of a series of lines, each of which usually contains a single instruction. However, lines with no instructions are possible, and so are lines with two or more instructions.

Each instruction has three parts called its label field, opcode field, and operand field; these fields are separated from each other by one or more spaces. The label field, which is often empty, consists of all characters up to the first blank space. The opcode field, which is never empty, runs from the first nonblank after the label to the next blank space. The operand field, which again might be empty, runs from the next nonblank character (if any) to the first blank or semicolon that isn't part of a string or character constant.

If the operand field is followed by a semicolon, possibly with intervening blanks, a new instruction begins immediately after the semicolon; otherwise the rest of the line is ignored. The end of a line is treated as a blank space for the purposes of these rules, with the additional proviso that string or character constants are not allowed to extend from one line to another.

The label field must begin with a letter or a digit; otherwise the entire line is treated as a comment. Popular ways to introduce comments, either at the beginning of a line or after the operand field, are to precede them by the character % as in TeX, or by // as in C++; MMIXAL is not very particular. However, Lisp-style comments introduced by single semicolons will fail if they follow an instruction, because they will be assumed to introduce another instruction.

3. MMIXAL has no built-in macro capability, nor does it know how to include header files and such things. But users can run their files through a standard C preprocessor to obtain MMIXAL programs in which macros and such things have been expanded. (Caution: The preprocessor also removes C-style comments, unless it is told not to do so.) Literate programming tools could also be used for preprocessing.

If a line begins with the special form `# ⟨integer⟩ ⟨string⟩', this program interprets it as a line directive emitted by a preprocessor. For example,

    # 13 "foo.mms"

means that the following line was line 13 in the user's source file foo.mms. Line directives allow us to correlate errors with the user's original file; we also pass them to the output, for use by simulators and debuggers.
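The field rules of section 2 can be sketched in a few lines of Python. This is only an illustration of the rules as stated above, not the scanner used by the MMIXAL program itself; the names `is_letter` and `split_instructions` are hypothetical, and error reporting is omitted.

```python
def is_letter(c):
    """MMIXAL letters: A-Z, a-z, ':', '_', and any code above 126."""
    return "A" <= c <= "Z" or "a" <= c <= "z" or c in ":_" or ord(c) > 126


def split_instructions(line):
    """Split one source line into (label, opcode, operand) triples."""
    result = []
    i, n = 0, len(line)
    while i < n:
        # Label field: all characters up to the first blank (often empty).
        start = i
        while i < n and not line[i].isspace():
            i += 1
        label = line[start:i]
        if label and not (is_letter(label[0]) or label[0].isdigit()):
            break  # not a letter or digit: the rest is a comment
        # Opcode field: first nonblank after the label to the next blank.
        while i < n and line[i].isspace():
            i += 1
        start = i
        while i < n and not line[i].isspace():
            i += 1
        opcode = line[start:i]
        if not opcode:
            break  # nothing to assemble here
        # Operand field: up to the first blank or semicolon that is not
        # inside a character constant 'c' or a string constant "...".
        while i < n and line[i].isspace():
            i += 1
        start = i
        while i < n and not line[i].isspace() and line[i] != ";":
            if line[i] == "'":
                i += 3  # skip the quoted character and the closing quote
            elif line[i] == '"':
                close = line.find('"', i + 1)
                i = n if close < 0 else close + 1
            else:
                i += 1
        i = min(i, n)
        result.append((label, opcode, line[start:i]))
        # A semicolon, possibly after blanks, starts another instruction;
        # anything else means the rest of the line is ignored.
        while i < n and line[i].isspace():
            i += 1
        if i < n and line[i] == ";":
            i += 1
        else:
            break
    return result
```

For instance, `split_instructions(' BZ $3,1F; TRAP')` yields two triples, one for `BZ` and one for `TRAP`, while a line that starts with `%` yields none.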

4. MMIXAL deals primarily with symbols and constants, which it interprets and combines to form machine language instructions and data. Constants are simplest, so we will discuss them first.

A decimal constant is a sequence of digits, representing a number in radix 10. A hexadecimal constant is a sequence of hexadecimal digits, preceded by #, representing a number in radix 16:

    ⟨digit⟩ → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
    ⟨hex digit⟩ → ⟨digit⟩ | A | B | C | D | E | F | a | b | c | d | e | f
    ⟨decimal constant⟩ → ⟨digit⟩ | ⟨decimal constant⟩⟨digit⟩
    ⟨hex constant⟩ → # ⟨hex digit⟩ | ⟨hex constant⟩⟨hex digit⟩

Constants whose value is 2^64 or more are reduced modulo 2^64.

5. A character constant is a single character enclosed in single quote marks; it denotes the ASCII or Unicode number corresponding to that character. For example, 'a' represents the constant #61, also known as 97. The quoted character can be anything except the character that the C library calls \n or newline; that character should be represented as #a.

    ⟨character constant⟩ → ' ⟨single byte character except newline⟩ '
    ⟨constant⟩ → ⟨decimal constant⟩ | ⟨hex constant⟩ | ⟨character constant⟩

Notice that ''' represents a single quote, the code #27; and '\' represents a backslash, the code #5c. MMIXAL characters are never "quoted" by backslashes as in the C language.

In the present implementation a character constant will always be at most 255, since wyde character input is not supported. But if the input were in Unicode one could write, say, 'א' or 'Ж' for #05d0 or #0416. The present program does not support Unicode directly because basic software for inputting and outputting 16-bit characters was still in a primitive state at the time of writing. But the data structures below are designed so that a change to Unicode will not be difficult when the time is ripe.

6. A string constant like "Hello" is an abbreviation for a sequence of one or more character constants separated by commas: 'H','e','l','l','o'. Any character except newline or the double quote mark " can appear between the double quotes of a string constant. Similarly, "高德纳" is an abbreviation for '高','德','纳' (namely #9ad8,#5fb7,#7eb3) when Unicode is supported.
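The constant rules of sections 4 to 6 are easy to state as code. The following Python sketch is an illustration under the rules above, not the program's own scanner; `constant_value` and `expand_string` are hypothetical names.

```python
MASK64 = (1 << 64) - 1  # constants are reduced modulo 2^64


def constant_value(text):
    """Value of a decimal, hexadecimal (#...), or character ('c') constant."""
    if len(text) == 3 and text[0] == "'" and text[2] == "'":
        if text[1] == "\n":
            raise ValueError("newline cannot be quoted; write #a instead")
        return ord(text[1])
    if text.startswith("#"):
        digits, base, allowed = text[1:], 16, "0123456789abcdefABCDEF"
    else:
        digits, base, allowed = text, 10, "0123456789"
    if not digits or any(ch not in allowed for ch in digits):
        raise ValueError(f"not an MMIXAL constant: {text!r}")
    return int(digits, base) & MASK64


def expand_string(text):
    """Expand a string constant into the character codes it abbreviates."""
    if len(text) < 3 or text[0] != '"' or text[-1] != '"':
        raise ValueError("a string constant needs at least one character")
    body = text[1:-1]
    if '"' in body or "\n" in body:
        raise ValueError("double quote and newline cannot appear in a string")
    return [ord(ch) for ch in body]
```

Note that no backslash escapes are interpreted, in keeping with the remark that MMIXAL characters are never quoted by backslashes.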

7. A symbol in MMIXAL is any sequence of letters and digits, beginning with a letter. A colon `:' or underscore symbol `_' is regarded as a letter, for purposes of this definition. All extended-ASCII characters like `é', whose 8-bit code exceeds 126, are also treated as letters.

    ⟨letter⟩ → A | B | ··· | Z | a | b | ··· | z | : | _ | ⟨character with code value > 126⟩
    ⟨symbol⟩ → ⟨letter⟩ | ⟨symbol⟩⟨letter⟩ | ⟨symbol⟩⟨digit⟩

In future implementations, when MMIXAL is used with Unicode, all wyde characters whose 16-bit code exceeds 126 will be regarded as letters; thus MMIXAL symbols will be able to involve Greek letters or Chinese characters or thousands of other glyphs.

8. A symbol is said to be fully qualified if it begins with a colon. Every symbol that is not fully qualified is an abbreviation for the fully qualified symbol obtained by placing the current prefix in front of it; the current prefix is always fully qualified. At the beginning of an MMIXAL program the current prefix is simply the single character `:', but the user can change it with the PREFIX command. For example,

    ADD x,y,z              % means ADD :x,:y,:z
    PREFIX Foo:            % current prefix is :Foo:
    ADD x,y,z              % means ADD :Foo:x,:Foo:y,:Foo:z
    PREFIX Bar:            % current prefix is :Foo:Bar:
    ADD :x,y,:z            % means ADD :x,:Foo:Bar:y,:z
    PREFIX :               % current prefix reverts to :
    ADD x,Foo:Bar:y,Foo:z  % means ADD :x,:Foo:Bar:y,:Foo:z

This mechanism allows large programs to avoid conflicts between symbol names, when parts of the program are independent and/or written by different users. The current prefix conventionally ends with a colon, but this convention need not be obeyed.

9. A local symbol is a decimal digit followed by one of the letters B, F, or H, meaning "backward," "forward," or "here":

    ⟨local operand⟩ → ⟨digit⟩ B | ⟨digit⟩ F
    ⟨local label⟩ → ⟨digit⟩ H

The B and F forms are permitted only in the operand field of MMIXAL instructions; the H form is permitted only in the label field. A local operand such as 2B stands for the last local label 2H in instructions before the current one, or 0 if 2H has not yet appeared as a label. A local operand such as 2F stands for the first 2H in instructions after the current one. Thus, in a sequence such as

    2H JMP 2F
    2H JMP 2B

the first instruction jumps to the second and the second jumps to the first.

Local symbols are useful for references to nearby points of a program, in cases where no meaningful name is appropriate. They can also be useful in special situations where a redefinable symbol is needed; for example, an instruction like

    9H IS 9B+1

will maintain a running counter.
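The prefix rule of section 8 and the local-symbol rule of section 9 can be modeled as follows. This Python sketch is a simplified illustration, not the symbol table of the program; `qualify`, `set_prefix`, and `LocalSymbols` are hypothetical names. It assumes that, for an instruction labeled dH, the caller looks up any dB operand first, then calls `define`, and only then records dF operands, so that a label never resolves references made on its own line.

```python
def qualify(symbol, prefix):
    """Fully qualified form of a symbol under the current prefix."""
    return symbol if symbol.startswith(":") else prefix + symbol


def set_prefix(operand, prefix):
    """New current prefix after `PREFIX operand`; the operand is a symbol too."""
    return qualify(operand, prefix)


class LocalSymbols:
    """Bookkeeping for the local symbols dB, dF, dH with d in 0..9."""

    def __init__(self):
        self.last = [0] * 10                    # value of dB; 0 before any dH
        self.waiting = [[] for _ in range(10)]  # users of dF not yet resolved

    def backward(self, d):
        """Value of the operand dB."""
        return self.last[d]

    def forward(self, d, user):
        """Record that `user` (any caller-chosen tag) refers to dF."""
        self.waiting[d].append(user)

    def define(self, d, value):
        """Handle the label dH; return the earlier users of dF it resolves."""
        resolved, self.waiting[d] = self.waiting[d], []
        self.last[d] = value
        return resolved
```

With this model the running counter `9H IS 9B+1` becomes `ls.define(9, ls.backward(9) + 1)`, repeated once per occurrence.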

10. Each symbol receives a value called its equivalent when it appears in the label field of an instruction; it is said to be defined after its equivalent has been established. A few symbols, like rA and ROUND_OFF and Fopen, are predefined because they refer to fixed constants associated with the MMIX hardware or its rudimentary operating system; otherwise every symbol should be defined exactly once. The two appearances of `2H' in the example above do not violate this rule, because the second `2H' is not the same symbol as the first.

A predefined symbol can be redefined (given a new equivalent). After it has been redefined it acts like an ordinary symbol and cannot be redefined again. A complete list of the predefined symbols appears in the program listing below.

Equivalents are either pure or register numbers. A pure equivalent is an unsigned octabyte, but a register number equivalent is a one-byte value, between 0 and 255. A dollar sign is used to change a pure number into a register number; for example, `$20' means register number 20.

11. Constants and symbols are combined into expressions in a simple way:

    ⟨primary expression⟩ → ⟨constant⟩ | ⟨symbol⟩ | ⟨local operand⟩ | @ | ( ⟨expression⟩ ) | ⟨unary operator⟩⟨primary expression⟩
    ⟨term⟩ → ⟨primary expression⟩ | ⟨term⟩⟨strong operator⟩⟨primary expression⟩
    ⟨expression⟩ → ⟨term⟩ | ⟨expression⟩⟨weak operator⟩⟨term⟩
    ⟨unary operator⟩ → + | - | ~ | $ | &
    ⟨strong operator⟩ → * | / | // | % | << | >> | &
    ⟨weak operator⟩ → + | - | | | ^

Each expression has a value that is either pure or a register number. The character @ stands for the current location, which is always pure. The unary operators +, -, ~, $, and & mean, respectively, "do nothing," "subtract from zero," "complement the bits," "change from pure value to register number," and "take the serial number." Only the first of these, +, can be applied to a register number. The last unary operator, &, applies only to symbols, and it is of interest primarily to system programmers; it converts a symbol to the unique positive integer that is used to identify it in the binary file output by MMIXAL.

Binary operators come in two flavors, strong and weak. The strong ones are essentially concerned with multiplication or division: x*y, x/y, x//y, x%y, x<<y, x>>y, and x&y stand respectively for (x × y) mod 2^64 (multiplication), ⌊x/y⌋ (division), ⌊2^64 x/y⌋ (fractional division), x mod y (remainder), (x × 2^y) mod 2^64 (left shift), ⌊x/2^y⌋ (right shift), and x ∧ y (bitwise and) on unsigned octabytes. Division is legal only if y > 0; fractional division is legal only if x < y. None of the strong binary operations can be applied to register numbers.

The weak binary operations x+y, x-y, x|y, and x^y stand respectively for (x + y) mod 2^64 (addition), (x − y) mod 2^64 (subtraction), x ∨ y (bitwise or), and x ⊕ y (bitwise exclusive or) on unsigned octabytes. These operations can be applied to register numbers only in four contexts: ⟨register⟩ + ⟨pure⟩, ⟨pure⟩ + ⟨register⟩, ⟨register⟩ − ⟨pure⟩ and ⟨register⟩ − ⟨register⟩. For example, if x denotes $1 and y denotes $10, then x+3 and 3+x denote $4, and y-x denotes the pure value 9.

Register numbers within expressions are allowed to be arbitrary octabytes, but a register number assigned as the equivalent of a symbol should not exceed 255.

(Incidentally, one might ask why the designer of MMIXAL did not simply adopt the existing rules of C for expressions. The primary reason is that the designers of C chose to give <<, >>, and & a lower precedence than +; but in MMIXAL we want to be able to write things like o<<24+x<<16+y<<8+z or @+yz<<2 or @+(#100-@)&#ff. Since the conventions of C were inappropriate, it was better to make a clean break, not pretending to have a close relationship with that language. The new rules are quite easily memorized, because MMIXAL has just two levels of precedence, and the strong binary operations are all essentially multiplicative by nature while the weak binary operations are essentially additive.)

12. A symbol is called a future reference until it has been defined. MMIXAL restricts the use of future references, so that programs can be assembled quickly in one pass over the input; therefore all expressions can be evaluated when the MMIXAL processor first sees them.

The restrictions are easily stated: Future references cannot be used in expressions together with unary or binary operators (except the unary +, which does nothing); moreover, future references can appear as operands only in instructions that have relative addresses (namely branches, probable branches, JMP, PUSHJ, GETA) or in octabyte constants (the pseudo-operation OCTA). Thus, for example, one can say JMP 1F or JMP 1B-4, but not JMP 1F-4.

13. We noted earlier that each MMIXAL instruction contains a label field, an opcode field, and an operand field. The label field is either empty or a symbol or local label; when it is nonempty, the symbol or local label receives an equivalent. The operand field is either empty or a sequence of expressions separated by commas; when it is empty, it is equivalent to the simple operand field `0'.

    ⟨instruction⟩ → ⟨label⟩⟨opcode⟩⟨operand list⟩
    ⟨label⟩ → ⟨empty⟩ | ⟨symbol⟩ | ⟨local label⟩
    ⟨operand list⟩ → ⟨empty⟩ | ⟨expression list⟩
    ⟨expression list⟩ → ⟨expression⟩ | ⟨expression list⟩ , ⟨expression⟩

The opcode field either contains a symbolic MMIX operation name (like ADD), or an alias operation, or a pseudo-operation. Alias operations are alternate names for MMIX operations whose standard names are inappropriate in certain contexts. Pseudo-operations do not correspond directly to MMIX commands, but they govern the assembly process in important ways.

There are two alias operations:

• SET $X,$Y is equivalent to OR $X,$Y,0; it sets register X to register Y. Similarly, SET $X,Y (when Y is not a register) is equivalent to SETL $X,Y.

• LDA $X,$Y,$Z is equivalent to ADDU $X,$Y,$Z; it loads the address of memory location $Y+$Z into register X. Similarly, LDA $X,$Y,Z is equivalent to ADDU $X,$Y,Z.

The symbolic operation names for genuine MMIX operations should not include the suffix I for an immediate operation or the suffix B for a backward jump; MMIXAL determines such things automatically. Thus, one never writes ADDI or JMPB in the source input to MMIXAL, although such opcodes might appear when a simulator or debugger or disassembler is presenting a numeric instruction in symbolic form.

    ⟨opcode⟩ → ⟨symbolic MMIX operation⟩ | ⟨pseudo-operation⟩
    ⟨symbolic MMIX operation⟩ → TRAP | FCMP | ··· | TRIP
    ⟨pseudo-operation⟩ → IS | LOC | PREFIX | GREG | LOCAL | BSPEC | ESPEC | BYTE | WYDE | TETRA | OCTA
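The two-level precedence of section 11 can be checked with a small recursive-descent evaluator. The Python sketch below is an illustration only: it handles pure values, omits register numbers, local operands, character constants, and the unary operators $ and &, and the names `evaluate`, `STRONG`, and `WEAK` are hypothetical.

```python
import re

MASK64 = (1 << 64) - 1
TOKEN = re.compile(
    r"#[0-9A-Fa-f]+|[0-9]+|[A-Za-z_:][A-Za-z_:0-9]*|<<|>>|//|[-+~*/%&|^()@]")


def _div(x, y):
    if y == 0:
        raise ValueError("division is legal only if y > 0")
    return x // y


def _frac(x, y):
    if x >= y:
        raise ValueError("fractional division is legal only if x < y")
    return (x << 64) // y


STRONG = {
    "*": lambda x, y: (x * y) & MASK64,
    "/": _div,
    "//": _frac,
    "%": lambda x, y: x % y,
    "<<": lambda x, y: (x << y) & MASK64 if y < 64 else 0,
    ">>": lambda x, y: x >> y if y < 64 else 0,
    "&": lambda x, y: x & y,
}
WEAK = {
    "+": lambda x, y: (x + y) & MASK64,
    "-": lambda x, y: (x - y) & MASK64,
    "|": lambda x, y: x | y,
    "^": lambda x, y: x ^ y,
}


def evaluate(text, symbols=None, here=0):
    """Evaluate a pure expression; `here` is the value of @."""
    symbols = symbols or {}
    tokens = TOKEN.findall(text)
    if "".join(tokens) != text:
        raise ValueError(f"cannot tokenize {text!r}")
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        pos += 1
        return tokens[pos - 1]

    def primary():
        tok = take()
        if tok == "(":
            value = expression()
            if take() != ")":
                raise ValueError("missing right parenthesis")
            return value
        if tok == "+":
            return primary()
        if tok == "-":
            return (-primary()) & MASK64
        if tok == "~":
            return (~primary()) & MASK64
        if tok == "@":
            return here & MASK64
        if tok[0] == "#":
            return int(tok[1:], 16) & MASK64
        if tok.isdigit():
            return int(tok) & MASK64
        if tok not in symbols:
            raise ValueError(f"undefined or misplaced token {tok!r}")
        return symbols[tok] & MASK64

    def term():  # strong operators bind tighter
        value = primary()
        while peek() in STRONG:
            op = take()
            value = STRONG[op](value, primary())
        return value

    def expression():  # weak operators combine terms
        value = term()
        while peek() in WEAK:
            op = take()
            value = WEAK[op](value, term())
        return value

    result = expression()
    if pos != len(tokens):
        raise ValueError("trailing input")
    return result
```

With these rules `1<<2+3` is (1<<2)+3, whereas C would parse the same text as 1<<(2+3); this is the difference that the parenthetical remark in section 11 is about.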

14. MMIX operations like ADD require exactly three expressions as operands. The first two must be register numbers. The third must be either a register number or a pure number between 0 and 255; in the latter case, ADD becomes ADDI in the assembled output. Thus, for example, the command "set register 1 to the sum of register 2 and register 3" could be expressed as

    ADD $1,$2,$3

or as, say,

    ADD x,y,y+1

if the equivalent of x is $1 and the equivalent of y is $2. The command "subtract 5 from register 1" could be expressed as

    SUB $1,$1,5

or as

    SUB x,x,5

but not as `SUBI $1,$1,5' or `SUBI x,x,5'.

MMIX operations like FLOT require either three operands (register, pure, register/pure) or only two (register, register/pure). In the first case the middle operand is the rounding mode, which is best expressed in terms of the predefined symbolic values ROUND_CURRENT, ROUND_OFF, ROUND_UP, ROUND_DOWN, ROUND_NEAR, for (0, 1, 2, 3, 4) respectively. In the second case the middle operand is understood to be zero (namely, ROUND_CURRENT).

MMIX operations like SETL or INCH, which involve a wyde immediate constant, require exactly two operands, (register, pure). The value of the second operand should fit in two bytes.

MMIX operations like BNZ, which mention a register and a relative address, also require two operands. The first operand should be a register number. The second operand should yield a result r in the range −2^16 ≤ r < 2^16 when the current location is subtracted from it and the difference is divided by four. The second operand might also be undefined; in that case, the eventual value must satisfy the restriction stated for defined values. The opcodes GETA and PUSHJ are similar, except that the first operand to PUSHJ might also be pure (see below). The JMP operation is also similar, but it has only one operand, and it allows the larger address range −2^24 ≤ r < 2^24.

MMIX operations that refer to memory, like LDO and STHT and GO, are treated like ADD if they have three operands, except that the first operand should be pure (not a register number) in the case of PRELD, PREGO, PREST, STCO, SYNCD, and SYNCID. These opcodes also accept a special two-operand form, in which the second operand stands for a base address and an immediate offset (see below).

The first operand of PUSHJ and PUSHGO can be either a pure number or a register number. In the first case (`PUSHJ 2,Sub' or `PUSHGO 2,Sub') the programmer might be thinking "let's push down two registers"; in the second case (`PUSHJ $2,Sub' or `PUSHGO $2,Sub') the programmer might be thinking "let's make register 2 the hole position for this subroutine call." Both cases result in the same assembled output.

The remaining MMIX opcodes are idiosyncratic:

    NEG r,p,z;
    PUT s,z;
    GET r,s;
    POP p,yz;
    RESUME xyz;
    SAVE r,0;
    UNSAVE r;
    SYNC xyz;
    TRAP x,y,z or TRAP x,yz or TRAP xyz;

SWYM and TRIP are like TRAP. Here s is an integer between 0 and 31, preferably given by one of the predefined symbols rA, rB, ... for special register codes; r is a register number; p is a pure byte; x, y, and z are either register numbers or pure bytes; yz and xyz are pure values that fit respectively in two and three bytes.

All of these rules can be summarized by saying that MMIXAL treats each MMIX opcode in the most natural way. When there are three operands, they affect fields X, Y, and Z of the assembled MMIX instruction; when there are two operands, they affect fields X and YZ; when there is just one operand, it affects field XYZ.

15. In all cases when the opcode corresponds to an MMIX operation, the MMIXAL instruction tells the assembler to carry out four steps: (1) Align the current location so that it is a multiple of 4, by adding 1, 2, or 3 if necessary; (2) Define the equivalent of the label field to be the current location, if the label is nonempty; (3) Evaluate the operands and assemble the specified MMIX instruction into the current location; (4) Increase the current location by 4.
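The relative-address rule of section 14 can be made concrete with a short Python sketch. It is an illustration, not the assembler's code; the function names are hypothetical. It assumes the standard MMIX encoding in which the backward variant of a branch has opcode one greater than the forward variant and stores r + 2^16 (or r + 2^24 for JMP) in its address field, with opcode values #4a for BNZ and #f0 for JMP. It also assumes that a distance not divisible by 4 is rejected.

```python
def relative_operand(here, target, bits):
    """Return (backward, field) for a relative address of the given width.

    r = (target - here) / 4 must satisfy -2^bits <= r < 2^bits.
    """
    delta = target - here
    if delta % 4:
        raise ValueError("relative address is not a multiple of 4")  # assumed check
    r = delta // 4
    if not -(1 << bits) <= r < (1 << bits):
        raise ValueError("relative address is out of range")
    return (1, r + (1 << bits)) if r < 0 else (0, r)


def assemble_branch(opcode, x, here, target):
    """Assemble a branch such as BNZ $x,target located at `here`."""
    backward, yz = relative_operand(here, target, 16)
    return ((opcode + backward) << 24) | (x << 16) | yz


def assemble_jmp(here, target):
    """Assemble JMP target located at `here` (opcode #f0, or #f1 if backward)."""
    backward, xyz = relative_operand(here, target, 24)
    return ((0xF0 + backward) << 24) | xyz
```

This is why the source program never needs to say JMPB or BNZB: the sign of r decides which variant is emitted.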

16. Now let's consider the pseudo-operations, starting with the simplest cases.

• ⟨label⟩ IS ⟨expression⟩ defines the value of the label to be the value of the expression, which must not be a future reference. The expression may be either pure or a register number.

• ⟨label⟩ LOC ⟨expression⟩ first defines the label to be the value of the current location, if the label is nonempty. Then the current location is changed to the value of the expression, which must be pure.

For example, `LOC #1000' will start assembling subsequent instructions or data in the location whose hexadecimal value is #1000. `X LOC @+500' defines X to be the address of the first of 500 bytes in memory; assembly will continue at location X + 500. The operation of aligning the current location to a multiple of 256, if it is not already aligned in that way, can be expressed as `LOC @+(256-@)&255'.

A less trivial example arises if we want to emit instructions and data into two separate areas of memory, but we want to intermix them in the MMIXAL source file. We could start by defining 8H and 9H to be the starting addresses of the instruction and data segments, respectively. Then, a sequence of instructions could be enclosed in `LOC 8B; ...; 8H IS @'; a sequence of data could be enclosed in `LOC 9B; ...; 9H IS @'. Any number of such sequences could then be combined. Instead of the two pseudo-instructions `8H IS @; LOC 9B' one could in fact write simply `8H LOC 9B' when switching from instructions to data.

• PREFIX ⟨symbol⟩ redefines the current prefix to be the given symbol (fully qualified). The label field should be blank.

17. The next pseudo-operations assemble bytes, wydes, tetrabytes, or octabytes of data.

• ⟨label⟩ BYTE ⟨expression list⟩ defines the label to be the current location, if the label field is nonempty; then it assembles one byte for each expression in the expression list, and advances the current location by the number of bytes. The expressions should all be pure numbers that fit in one byte.

String constants are often used in such expression lists. For example, if the current location is #1000, the instruction BYTE "Hello",0 assembles six bytes containing the constants 'H', 'e', 'l', 'l', 'o', and 0 into locations #1000, ..., #1005, and advances the current location to #1006.

• ⟨label⟩ WYDE ⟨expression list⟩ is similar, but it first makes the current location even, by adding 1 to it if necessary. Then it defines the label (if a nonempty label is present), and assembles each expression as a two-byte value. The current location is advanced by twice the number of expressions in the list. The expressions should all be pure numbers that fit in two bytes.

• ⟨label⟩ TETRA ⟨expression list⟩ is similar, but it aligns the current location to a multiple of 4 before defining the label; then it assembles each expression as a four-byte value. The current location is advanced by 4n if there are n expressions in the list. Each expression should be a pure number that fits in four bytes.

• ⟨label⟩ OCTA ⟨expression list⟩ is similar, but it first aligns the current location to a multiple of 8; it assembles each expression as an eight-byte value. The current location is advanced by 8n if there are n expressions in the list. Any or all of the expressions may be future references, but they should all be defined as pure numbers eventually.
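The alignment idiom `LOC @+(256-@)&255` and the layout rules for BYTE, WYDE, TETRA, and OCTA can be sketched together. The Python below is an illustration under the rules of sections 16 and 17, with hypothetical names `align` and `assemble_data`; it assumes big-endian byte order, as MMIX uses, and it does not model future references in OCTA.

```python
MASK64 = (1 << 64) - 1


def align(loc, size):
    """Round loc up to a multiple of size (a power of 2).

    Mirrors the expression @+(size-@)&(size-1): the strong operator &
    is applied before the weak operator +.
    """
    return (loc + ((size - loc) & (size - 1))) & MASK64


def assemble_data(loc, width, values):
    """Lay out data of the given width (1, 2, 4, or 8 bytes per value).

    Returns (label_value, new_location, memory), where memory maps
    each assembled address to one byte.
    """
    loc = align(loc, width)  # BYTE needs no alignment, since width is 1
    label = loc              # the label is defined after aligning
    memory = {}
    for value in values:
        if not 0 <= value < (1 << (8 * width)):
            raise ValueError(f"{value} does not fit in {width} byte(s)")
        for k, byte in enumerate(value.to_bytes(width, "big")):
            memory[loc + k] = byte
        loc += width
    return label, loc, memory
```

For example, laying out the six values of `BYTE "Hello",0` at #1000 leaves the current location at #1006, as stated above.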

18. Global registers are important for accessing memory in MMIX programs. They could be allocated by hand, and defined with IS instructions, but MMIXAL provides a mechanism that is usually much more convenient:

• ⟨label⟩ GREG ⟨expression⟩ allocates a new global register, and assigns its number as the equivalent of the label. At the beginning of assembly, the current global threshold G is $255. Each distinct GREG instruction decreases G by 1; the final value of G will be the initial value of rG when the assembled program is loaded.

The value of the expression will be loaded into the global register at the beginning of the program. If this value is nonzero, it should remain constant throughout the program execution; such global registers are considered to be base addresses. Two or more base addresses with the same constant value are assigned to the same global register number.

Base addresses can simplify memory accesses in an important way. Suppose, for example, five octabyte values appear in a data segment, and their addresses are called AA, BB, CC, DD, and EE:

    AA LOC @+8;BB LOC @+8;CC LOC @+8;DD LOC @+8;EE LOC @+8

Then if you say Base GREG AA, you will be able to write simply `LDO $1,AA' to bring AA into register $1, and `LDO $2,CC' to bring CC into register $2.

Here's how it works: Whenever a memory operation such as LDO or STB or GO has only two operands, the second operand should be a pure number whose value can be expressed as b + δ, where 0 ≤ δ < 256 and b is the value of a base address in one of the preceding GREG commands. The MMIXAL processor will find the closest base address and manufacture an appropriate command. For example, the instruction `LDO $2,CC' in the example of the preceding paragraph would be converted automatically to `LDO $2,Base,16'.

If no base address is close enough, an error message will be generated, unless this program is run with the -x option on the command line. The -x option inserts additional instructions if necessary, using global register 255, so that any address is accessible. For example, if there is no base address that allows LDO $2,FF to be implemented in a single instruction, but if FF equals Base+1000, then the -x option would assemble two instructions,

    SETL $255,1000; LDO $2,Base,$255

in place of LDO $2,FF. Caution: The -x feature makes the number of actual MMIX instructions hard to predict, so extreme care must be used if your style of coding includes relative branch instructions in dangerous forms like `BNZ x,@+8'.

This base address convention can be used also with the alias operation LDA. For example, `LDA $3,CC' loads the address of CC into register 3, by assembling the instruction `ADDU $3,Base,16'.

MMIXAL also allows a two-operand form for memory operations such as

    LDO $1,$2

to be an abbreviation for `LDO $1,$2,0'.

When MMIXAL programs use subroutines with a memory stack in addition to the built-in register stack, they usually begin with the instructions `sp GREG 0;fp GREG 0'; these instructions allocate a stack pointer sp=$254 and a frame pointer fp=$253. However, subroutine libraries are free to implement any conventions for global registers and stacks that they like.

19. Short programs rarely run out of global registers, but long programs need a mechanism to check that GREG hasn't been used too often. The following pseudo-instruction provides the necessary safety valve:

• LOCAL ⟨expression⟩ ensures that the expression will be a local register in the program being assembled. The expression should be a register number, and the label field should be blank. At the close of assembly, MMIXAL will report an error if the final value of G does not exceed all register numbers that are declared local in this way.

A LOCAL instruction need not be given unless the register number is 32 or more. (MMIX always considers $0 through $31 to be local, so MMIXAL implicitly acts as if the instruction `LOCAL $31' were present.)

20. Finally, there are two pseudo-instructions to pass information and hints to the loading routine and/or to debuggers that will be using the assembled program.

• BSPEC ⟨expression⟩ begins "special mode"; the ⟨expression⟩ should have a value that fits in two bytes, and the label field should be blank.

• ESPEC ends "special mode"; the operand field is ignored, and the label field should be blank.

All material assembled between BSPEC and ESPEC is passed directly to the output, but not loaded as part of the assembled program. Ordinary MMIX instructions cannot appear in special mode; only the pseudo-operations IS, PREFIX, BYTE, WYDE, TETRA, OCTA, GREG, and LOCAL are allowed. The operand of BSPEC should have a value that fits in two bytes; this value identifies the kind of data that follows. (For example, BSPEC 0 might introduce information about subroutine calling conventions at the current location, and BSPEC 1 might introduce line numbers from a high-level-language program that was compiled into the code at the current place. System routines often need to pass such information through an assembler to the operating system; hence MMIXAL provides a general-purpose conduit.)
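The GREG allocation and base-address search of section 18 can be sketched as follows. This Python is an illustration of the stated rules only; `greg` and `find_base` are hypothetical names, the `bases` dictionary is an assumed representation, and the -x expansion is not modeled.

```python
def greg(bases, value, g):
    """Process `GREG value`; return (register number, new threshold G).

    `bases` maps each nonzero base-address value to its register.
    A nonzero value that was seen before reuses the same register;
    a zero value always gets a fresh register.
    """
    if value != 0 and value in bases:
        return bases[value], g
    g -= 1
    if value != 0:
        bases[value] = g
    return g, g


def find_base(bases, address):
    """Find the closest base b with 0 <= address - b < 256.

    Returns (register, delta), or None if no base address is close enough.
    """
    best = None
    for value, register in bases.items():
        delta = address - value
        if 0 <= delta < 256 and (best is None or delta < best[1]):
            best = (register, delta)
    return best
```

In the example above, with `Base GREG AA` as the first GREG of the program, Base is $254 and `find_base` returns (254, 16) for CC, matching the conversion to `LDO $2,Base,16`; for an address equal to Base+1000 it returns None, which is the case where an error is reported unless -x is given.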

21. A program should begin at the special symbolic location Main (more precisely, at the address corresponding to the fully qualified symbol :Main). This symbol always has serial number 1, and it must always be defined.

Locations should not receive assembled data more than once. (More precisely, the loader will load the bitwise xor of all the data assembled for each byte position; but the general rule "do not load two things into the same byte" is safest.) All locations that do not receive assembled data are initially zero, except that the loading routine will put register stack data into segment 3, and the operating system may put command-line data and debugger data into segment 2. (The rudimentary MMIX operating system starts a program with the number of command-line arguments in $0, and a pointer to the beginning of an array of argument pointers in $1.) Segments 2 and 3 should not get assembled data, unless the user is a true hacker who is willing to take the risk that such data might crash the system.

22. Binary MMO output. When the MMIXAL processor assembles a file called foo.mms, it produces a binary output file called foo.mmo. (The suffix mms stands for "MMIX symbolic," and mmo stands for "MMIX object.") Such mmo files have a simple structure consisting of a sequence of tetrabytes. Some of the tetrabytes are instructions to a loading routine; others are data to be loaded.

Loader instructions are distinguished from tetrabytes of data by their first (most significant) byte, which has the special escape-code value #98, called mm in the program below. This code value corresponds to MMIX's opcode LDVTS, which is unlikely to occur in tetras of data. The second byte X of a loader instruction is the loader opcode, called the lopcode. The third and fourth bytes, Y and Z, are operands. Sometimes they are combined into a single 16-bit operand called YZ.

    #define mm 0x98
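The byte layout of a loader instruction can be illustrated by a small decoder. This Python sketch shows only the first-byte test and the X, Y, Z, YZ fields described above; `decode_tetra` and `yz` are hypothetical names. It deliberately ignores context: a real loader must also know which tetrabytes following a lopcode are its operands or quoted data, even when their first byte happens to be #98.

```python
MM = 0x98  # escape code that marks a loader instruction


def decode_tetra(tetra):
    """Classify one tetrabyte by its most significant byte.

    Returns ('lop', X, Y, Z) for a loader instruction, where X is the
    lopcode, or ('data', tetra) for anything else.
    """
    if ((tetra >> 24) & 0xFF) == MM:
        return ("lop", (tetra >> 16) & 0xFF, (tetra >> 8) & 0xFF, tetra & 0xFF)
    return ("data", tetra)


def yz(y, z):
    """Combine the Y and Z bytes into the 16-bit operand YZ."""
    return (y << 8) | z
```

For example, the tetrabyte #98090101 decodes as lopcode 9 with Y = 1 and Z = 1, while #61620000 is plain data.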

23. A small, contrived example will help explain the basic ideas of mmo format. Consider the following input file, called test.mms:

    % A peculiar example of MMIXAL
         LOC   Data_Segment      % location #2000000000000000
         OCTA  1F                % a future reference
    a    GREG  @                 % $254 is base address for ABCD
    ABCD BYTE  "ab"              % two bytes of data
         LOC   #123456789        % switch to the instruction segment
    Main JMP   1F                % another future reference
         LOC   @+#4000           % skip past 16384 bytes
    2H   LDB   $3,ABCD+1         % use the base address
         BZ    $3,1F; TRAP       % and refer to the future again
    # 3 "foo.mms"                % this comment is a line directive
         LOC   2B-4*10           % move 10 tetras before previous location
    1H   JMP   2B                % resolve previous references to 1F
         BSPEC 5                 % begin special data of type 5
         TETRA &a<<8             % four bytes of special data
         WYDE  a-$0              % two more bytes of special data
         ESPEC                   % end a special data packet
         LOC   ABCD+2            % resume the data segment
         BYTE  "cd",#98          % assemble three more bytes of data

It defines a silly program that essentially puts 'b' into register 3; the program halts when it gets to an all-zero TRAP instruction following the BZ. But the assembled output of this file illustrates most of the features of MMIX objects, and in fact test.mms was the first test file tried by the author when the MMIXAL processor was originally written.

The binary output file test.mmo assembled from test.mms consists of the following tetrabytes, shown in hexadecimal notation with brief comments. Fuller explanations appear with the descriptions of individual lopcodes below.

    98090101  lop_pre 1,1 (preamble, version 1, 1 tetra)
    36f4a363  (the file creation time)
    98012001  lop_loc #20,1 (data segment, 1 tetra)
    00000000  (low tetrabyte of address in data segment)
    00000000  (high tetrabyte of OCTA 1F)
    00000000  (low tetrabyte, will be fixed up later)
    61620000  ("ab", padded with trailing zeros)
    98010002  lop_loc 0,2 (instruction segment, 2 tetras)
    00000001  (high tetrabyte of address in instruction segment)
    2345678c  (low tetrabyte of address, after alignment)
    98060002  lop_file 0,2 (file name 0, 2 tetras)
    74657374  ("test")
    2e6d6d73  (".mms")
    98070007  lop_line 7 (line 7 of the current file)
    f0000000  (JMP 1F, will be fixed up later)
    98024000  lop_skip #4000 (advance 16384 bytes)
    98070009  lop_line 9 (line 9 of the current file)
    8103fe01  (LDB $3,a,1, uses base address a)
    42030000  (BZ $3,1F, will be fixed later)
    9807000a  lop_line 10 (stay on line 10)
    00000000  (TRAP)
    98010002  lop_loc 0,2 (instruction segment, 2 tetras)
    00000001  (high tetrabyte of address in instruction segment)
    2345a768  (low tetrabyte of address 1H)
    98050010  lop_fixrx 16 (fix 16-bit relative address)
    0100fff5  (fixup for location @-4*-11)
    98040ff7  lop_fixr #ff7 (fix @-4*#ff7)
    98032001  lop_fixo #20,1 (data segment, 1 tetra)
    00000000  (low tetrabyte of data segment address to fix)
    98060102  lop_file 1,2 (file name 1, 2 tetras)
    666f6f2e  ("foo.")
    6d6d7300  ("mms", padded with trailing zeros)
    98070004  lop_line 4 (line 4 of the current file)
    f000000a  (JMP 2B)
    98080005  lop_spec 5 (begin special data of type 5)
    00000200  (TETRA &a<<8)
    00fe0000  (WYDE a-$0)
    98012001  lop_loc #20,1 (data segment, 1 tetra)
    0000000a  (low tetrabyte of address in data segment)
    00006364  ("cd" with leading zeros, because of alignment)
    98000001  lop_quote (don't treat next tetrabyte as a lopcode)
    98000000  (BYTE #98, padded with trailing zeros)
    980a00fe  lop_post $254 (begin postamble, G is 254)
    20000000  (high tetrabyte of the initial contents of $254)
    00000008  (low tetrabyte of base address $254)
    00000001  (high tetrabyte of the initial contents of $255)
    2345678c  (low tetrabyte of $255, is address of Main)
    980b0000  lop_stab (begin symbol table)
    203a5040  (compressed form for symbol table as a ternary trie)
    50404020
    41204220
    43094408  (ABCD = #2000000000000008, serial 3)
    83404020
    4d206120
    69056e01  (Main = #000000012345678c, serial 1)
    2345678c
    81400f61
    fe820000  (a = $254, serial 2)
    980c000a  lop_end (end symbol table, 10 tetras)
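The lop_loc entries in this listing can be checked by hand: the new location is the value of the following Z tetras (high tetra first when Z = 2) plus 2^56 times Y, as the lopcode descriptions state. A minimal sketch of that rule follows; the function name lop_loc_address is an assumption made for illustration and is not part of MMIXAL or of any loader.

```c
#include <assert.h>
#include <stdint.h>

/* Address selected by lop_loc Y,Z: the next z tetras, read big-endian,
   plus 2^56 times y.  Hypothetical helper for checking the listing. */
uint64_t lop_loc_address(uint8_t y, int z, const uint32_t *tetras)
{
    uint64_t a;
    assert(z == 1 || z == 2);
    if (z == 2)
        a = ((uint64_t)tetras[0] << 32) | tetras[1];
    else
        a = tetras[0];
    return a + ((uint64_t)y << 56);
}
```

With Y = #20 and the single tetra 00000000 this gives #2000000000000000, the start of the data segment; with Y = 0 and the tetras 00000001, 2345678c it gives the address of Main.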

24. When a tetrabyte of the mmo file does not begin with the escape code, it is loaded into the current location λ, and λ is increased to the next higher multiple of 4. (If λ is not a multiple of 4, the tetrabyte actually goes into location λ ∧ (−4) = 4⌊λ/4⌋, according to MMIX's usual conventions.) The current line number is also increased by 1, if it is nonzero.

When a tetrabyte does begin with the escape code, its next byte is the lopcode defining a loader instruction. There are thirteen lopcodes:

• lop_quote: X = #00, YZ = 1. Treat the next tetra as an ordinary tetrabyte, even if it begins with the escape code.

• lop_loc: X = #01, Y = high byte, Z = tetra count (Z = 1 or 2). Set the current location to the 64-bit address defined by the next Z tetras, plus 2^56 Y. Usually Y = 0 (for the instruction segment) or Y = #20 (for the data segment). If Z = 2, the high tetra appears first.

• lop_skip: X = #02, YZ = delta. Increase the current location by YZ.

• lop_fixo: X = #03, Y = high byte, Z = tetra count (Z = 1 or 2). Load the value of the current location λ into octabyte P, where P is the 64-bit address defined by the next Z tetras plus 2^56 Y as in lop_loc. (The octabyte at P was previously assembled as zero because of a future reference.)

• lop_fixr: X = #04, YZ = delta. Load YZ into the YZ field of the tetrabyte in location P, where P is λ − 4YZ, namely the address that precedes the current location by YZ tetrabytes. (This tetrabyte was previously loaded with an MMIX instruction that takes a relative address: a branch, probable branch, JMP, PUSHJ, or GETA. Its YZ field was previously assembled as zero because of a future reference.)

• lop_fixrx: X = #05, Y = 0, Z = 16 or 24. Proceed as in lop_fixr, but load δ into tetrabyte P = λ − 4δ instead of loading YZ into P = λ − 4YZ. Here δ is the value of the tetrabyte following the lop_fixrx instruction; its leading byte will be either 0 or 1. If the leading byte is 1, δ should be treated as the negative number (δ ∧ #ffffff) − 2^Z when calculating the address P. (The latter case arises only rarely, but it is needed when fixing up a relative "future" reference that ultimately leads to a "backward" instruction. The value of δ that is xored into location P in such cases will change BZ to BZB, or JMP to JMPB, etc.; we have Z = 24 when fixing a JMP, Z = 16 otherwise.)

• lop_file: X = #06, Y = file number, Z = tetra count. Set the current file number to Y and the current line number to zero. If this file number has occurred previously, Z should be zero; otherwise Z should be positive, and the next Z tetrabytes are the characters of the file name in big-endian order. Trailing zeros follow the file name if its length is not a multiple of 4.

• lop_line: X = #07, YZ = line number. Set the current line number to YZ. If the line number is nonzero, the current file and current line should correspond to the source location that generated the next data to be loaded, for use in diagnostic messages. (The MMIXAL processor gives precise line numbers to the sources of tetrabytes in segment 0, which tend to be instructions, but not to the sources of tetrabytes assembled in other segments.)

• lop_spec: X = #08, YZ = type. Begin special data of type YZ. The subsequent tetrabytes, continuing until the next loader operation other than lop_quote, comprise the special data. A lop_quote instruction allows tetrabytes of special data to begin with the escape code.

• lop_pre: X = #09, Y = 1, Z = tetra count. A lop_pre instruction, which defines the "preamble," must be the first tetrabyte of every mmo file. The Y field specifies the version number of mmo format, currently 1; other version numbers may be defined later, but version 1 should always be supported as described in the present document. The Z tetrabytes following a lop_pre command provide additional information that might be of interest to system routines. If Z > 0, the first tetra of additional information records the time that this mmo file was created, measured in seconds since 00:00:00 Greenwich Mean Time on 1 Jan 1970.

• lop_post: X = #0a, Y = 0, Z = G (must be 32 or more). This instruction begins the postamble, which follows all instructions and data to be loaded. It causes the loaded program to begin with rG equal to the stated value of G, and with $G, $(G+1), ..., $255 initially set to the values of the next (256 − G) × 2 tetrabytes. These tetrabytes specify 256 − G octabytes in big-endian fashion (high half first).

• lop_stab: X = #0b, YZ = 0. This instruction must appear immediately after the (256 − G) × 2 tetrabytes following lop_post. It is followed by the symbol table, which lists the equivalents of all user-defined symbols in a compact form that will be described later.

• lop_end: X = #0c, YZ = tetra count. This instruction must be the very last tetrabyte of each mmo file. Furthermore, exactly YZ tetrabytes must appear between it and the lop_stab command. (Therefore a program can easily find the symbol table without reading forward through the entire mmo file.)

A separate routine called MMOtype is available to translate binary mmo files into human-readable form.

    #define lop_quote 0x0   /* the quotation lopcode */
    #define lop_loc   0x1   /* the location lopcode */
    #define lop_skip  0x2   /* the skip lopcode */
    #define lop_fixo  0x3   /* the octabyte-fix lopcode */
    #define lop_fixr  0x4   /* the relative-fix lopcode */
    #define lop_fixrx 0x5   /* extended relative-fix lopcode */
    #define lop_file  0x6   /* the file name lopcode */
    #define lop_line  0x7   /* the file position lopcode */
    #define lop_spec  0x8   /* the special hook lopcode */
    #define lop_pre   0x9   /* the preamble lopcode */
    #define lop_post  0xa   /* the postamble lopcode */
    #define lop_stab  0xb   /* the symbol table lopcode */
    #define lop_end   0xc   /* the end-it-all lopcode */
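The address arithmetic of lop_fixr and lop_fixrx is easy to get wrong by a sign, so here is a small sketch of how a loader might compute the target P. The function names fixr_target and fixrx_target are hypothetical; only the formulas come from the descriptions above.

```c
#include <assert.h>
#include <stdint.h>

/* Target of lop_fixr: the tetrabyte that lies yz tetrabytes before
   the current location lambda. */
uint64_t fixr_target(uint64_t lambda, uint16_t yz)
{
    return lambda - 4 * (uint64_t)yz;
}

/* Target of lop_fixrx.  Here delta is the tetrabyte that follows the
   instruction and z is its Z field (16 or 24). */
uint64_t fixrx_target(uint64_t lambda, uint32_t delta, int z)
{
    int64_t d;
    assert(z == 16 || z == 24);
    assert((delta >> 24) <= 1);      /* leading byte must be 0 or 1 */
    if (delta >> 24)                 /* leading byte 1: negative offset */
        d = (int64_t)(delta & 0xffffff) - ((int64_t)1 << z);
    else
        d = (int64_t)delta;
    return lambda - (uint64_t)(4 * d);   /* arithmetic modulo 2^64 */
}
```

In the test.mmo example the current location is #12345a768 when the fixups occur. The instruction lop_fixr #ff7 therefore points back to #12345678c, the JMP at Main. The tetrabyte 0100fff5 after lop_fixrx 16 stands for the offset −11, so it points forward to #12345a794, the BZ instruction. Xoring 0100fff5 into 42030000 yields 4303fff5, which is a BZB with the proper backward offset.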

25. Many readers will have noticed that MMIXAL has no facilities for relocatable output, nor does mmo format support such features. The author's first drafts of MMIXAL and mmo did allow relocatable objects, with external linkages, but the rules were substantially more complicated and therefore inconsistent with the goals of The Art of Computer Programming. The present design might actually prove to be superior to the current practice, now that computer memory is significantly cheaper than it used to be, because one-pass assembly and loading are extremely fast when relocatability and external linkages are disallowed. Different program modules can be assembled together about as fast as they could be linked together under a relocatable scheme, and they can communicate with each other in much more flexible ways. Debugging tools are enhanced when open-source libraries are combined with user programs, and such libraries will certainly improve in quality when their source form is accessible to a larger community of users.

26. Basic data types. This program for the 64-bit MMIX architecture is based on 32-bit integer arithmetic, because nearly every computer available to the author at the time of writing was limited in that way. Details of the basic arithmetic appear in a separate program module called MMIX-ARITH, because the same routines are needed also for the simulators. The definition of type tetra should be changed, if necessary, to conform with the definitions found in MMIX-ARITH.

⟨Type definitions 26⟩ ≡
    typedef unsigned int tetra;             /* assumes that an int is exactly 32 bits wide */
    typedef struct { tetra h, l; } octa;    /* two tetrabytes makes one octabyte */
    typedef enum { false, true } bool;
This code is used in section 136.

27. ⟨Global variables 27⟩ ≡
    extern octa zero_octa;   /* zero_octa.h = zero_octa.l = 0 */
    extern octa neg_one;     /* neg_one.h = neg_one.l = -1 */
    extern octa aux;         /* auxiliary output of a subroutine */
    extern bool overflow;    /* set by certain subroutines for signed arithmetic */
This code is used in section 136.

28. Most of the subroutines in MMIX-ARITH return an octabyte as a function of two octabytes; for example, oplus(y, z) returns the sum of octabytes y and z. Division inputs the high half of a dividend in the global variable aux and returns the remainder in aux.

⟨Subroutines 28⟩ ≡
    extern octa oplus ARGS((octa y, octa z));                 /* unsigned y + z */
    extern octa ominus ARGS((octa y, octa z));                /* unsigned y - z */
    extern octa incr ARGS((octa y, int delta));               /* unsigned y + delta (delta is signed) */
    extern octa oand ARGS((octa y, octa z));                  /* y AND z */
    extern octa shift_left ARGS((octa y, int s));             /* y << s, 0 <= s <= 64 */
    extern octa shift_right ARGS((octa y, int s, int uns));   /* y >> s, signed if !uns */
    extern octa omult ARGS((octa y, octa z));                 /* unsigned (aux, x) = y * z */
    extern octa odiv ARGS((octa x, octa y, octa z));          /* unsigned (x, y) / z; aux = (x, y) mod z */
This code is used in section 136.
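To make the idea of 64-bit arithmetic on 32-bit halves concrete, here is a minimal sketch of an unsigned addition with carry. It illustrates the technique only; the name oplus_sketch is hypothetical, and the real oplus is defined in MMIX-ARITH, which is not reproduced here. The sketch uses uint32_t instead of unsigned int so that the 32-bit assumption is explicit.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t tetra;                 /* exactly 32 bits wide */
typedef struct { tetra h, l; } octa;    /* high and low halves */

/* Unsigned 64-bit sum built from two 32-bit halves. */
octa oplus_sketch(octa y, octa z)
{
    octa x;
    x.l = y.l + z.l;                  /* wraps around modulo 2^32 */
    x.h = y.h + z.h + (x.l < y.l);    /* add a carry if the low sum wrapped */
    return x;
}
```

The comparison x.l < y.l detects the carry because an unsigned sum that wraps around is always smaller than either of its operands.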

29. Here's a rudimentary check to see if arithmetic is in trouble.

⟨Initialize everything 29⟩ ≡
    acc = shift_left(neg_one, 1);
    if (acc.h != 0xffffffff) panic("Type tetra is not implemented correctly");
See also sections 32, 61, 71, 84, 91, and 140.
This code is used in section 136.

30. Future versions of this program will work with symbols formed from Unicode characters, but the present code limits itself to an 8-bit subset. The type Char is defined here in order to ease the later transition: At present, Char is the same as char, but Char can be changed to a 16-bit type in the Unicode version. Other changes will also be necessary when the transition to Unicode is made; for example, some calls of fprintf will become calls of fwprintf, and some occurrences of %s will become %ls in print formats. The switchable type name Char provides at least a first step towards a brighter future with Unicode.

⟨Type definitions 26⟩ +≡
    typedef char Char;   /* bytes that will become wydes some day */

31. While we're talking about classic systems versus future systems, we might as well define the ARGS macro, which makes function prototypes available on ANSI C systems without making them uncompilable on older systems. Each subroutine below is declared first with a prototype, then with an old-style definition.

⟨Preprocessor definitions 31⟩ ≡
    #ifdef __STDC__
    #define ARGS(list) list
    #else
    #define ARGS(list) ()
    #endif
See also section 39.
This code is used in section 136.
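The effect of ARGS can be seen in a tiny example. The function twice is hypothetical and serves only to show the expansion. The definition below uses a modern parameter list rather than the old-style form used throughout the program, because recent C standards no longer accept old-style definitions.

```c
#include <assert.h>

#ifdef __STDC__
#define ARGS(list) list
#else
#define ARGS(list) ()
#endif

/* The doubled parentheses pass the whole parameter list as one macro
   argument.  On an ANSI C system this line becomes "int twice(int);"
   and on an older system it becomes "int twice();". */
int twice ARGS((int));

int twice(int n)
{
    return 2 * n;
}
```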

32. Basic input and output. Input goes into a buffer that is normally limited to 72 characters. This limit can be raised, by using the -b option when invoking the assembler; but short buffers will keep listings from becoming unwieldy, because a symbolic listing adds 19 characters per line.

⟨Initialize everything 29⟩ +≡
    if (buf_size < 72) buf_size = 72;
    buffer = (Char *) calloc(buf_size + 1, sizeof(Char));
    lab_field = (Char *) calloc(buf_size + 1, sizeof(Char));
    op_field = (Char *) calloc(buf_size, sizeof(Char));
    operand_list = (Char *) calloc(buf_size, sizeof(Char));
    err_buf = (Char *) calloc(buf_size + 60, sizeof(Char));
    if (!buffer || !lab_field || !op_field || !operand_list || !err_buf)
        panic("No room for the buffers");

33. ⟨Global variables 27⟩ +≡
    Char *buffer;         /* raw input of the current line */
    Char *buf_ptr;        /* current position within buffer */
    Char *lab_field;      /* copy of the label field of the current instruction */
    Char *op_field;       /* copy of the opcode field of the current instruction */
    Char *operand_list;   /* copy of the operand field of the current instruction */
    Char *err_buf;        /* place where dynamic error messages are sprinted */

34. ⟨Get the next line of input text, or break if the input has ended 34⟩ ≡
    if (!fgets(buffer, buf_size + 1, src_file)) break;
    line_no++;
    line_listed = false;
    j = strlen(buffer);
    if (buffer[j - 1] == '\n') buffer[j - 1] = '\0';   /* remove the newline */
    else if ((j = fgetc(src_file)) != EOF)
        ⟨Flush the excess part of an overlong line 35⟩;
    if (buffer[0] == '#') ⟨Check for a line directive 38⟩;
    buf_ptr = buffer;
This code is used in section 136.

35. ⟨Flush the excess part of an overlong line 35⟩ ≡
    {
        while (j != '\n' && j != EOF) j = fgetc(src_file);
        if (!long_warning_given) {
            long_warning_given = true;
            err("*trailing characters of long input line have been dropped");
            fprintf(stderr,
                "(say `-b <number>' to increase the length of my input buffer)\n");
        } else err("*trailing characters dropped");
    }
This code is used in section 34.

36. ⟨Global variables 27⟩ +≡
    int cur_file;              /* index of the current file in filename */
    int line_no;               /* current position in the file */
    bool line_listed;          /* have we listed the buffer contents? */
    bool long_warning_given;   /* have we given the hint about -b? */

37. We keep track of source file name and line number at all times, for error reporting and for synchronization data in the object file. Up to 256 different source file names can be remembered.

⟨Global variables 27⟩ +≡
    char *filename[257];   /* source file names, including those in line directives */
    int filename_count;    /* how many filename entries have we filled? */

38. If the current line is a line directive, it will also be treated as a comment by the assembler.

⟨Check for a line directive 38⟩ ≡
    {
        for (p = buffer + 1; isspace(*p); p++) ;
        for (j = *p++ - '0'; isdigit(*p); p++) j = 10 * j + *p - '0';
        for ( ; isspace(*p); p++) ;
        if (*p == '\"') {
            if (!filename[filename_count]) {
                filename[filename_count] = (Char *) calloc(FILENAME_MAX + 1, sizeof(Char));
                if (!filename[filename_count])
                    panic("Capacity exceeded: Out of filename memory");
            }
            for (p++, q = filename[filename_count]; *p && *p != '\"'; p++, q++) *q = *p;
            if (*p == '\"' && *(p - 1) != '\"') {   /* yes, it's a line directive */
                *q = '\0';
                for (k = 0; strcmp(filename[k], filename[filename_count]) != 0; k++) ;
                if (k == filename_count) filename_count++;
                cur_file = k;
                line_no = j - 1;
            }
        }
    }
This code is used in section 34.
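The scanning logic for line directives can be tried in isolation. The following sketch is a standalone variant, not the code of section 38: the function parse_line_directive and its parameters are hypothetical, it copies into a caller-supplied array instead of the filename table, and it is slightly stricter than the original because it insists on at least one digit.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Parse a line of the form   # <number> "<name>"
   On success store the number in *line and the name in name[] (which has
   room for cap characters including the final null) and return 1.
   Return 0 if the line is not a well-formed directive. */
int parse_line_directive(const char *buf, int *line, char *name, size_t cap)
{
    const char *p = buf;
    size_t n = 0;
    int j = 0;
    if (*p++ != '#') return 0;
    while (isspace((unsigned char)*p)) p++;
    if (!isdigit((unsigned char)*p)) return 0;
    while (isdigit((unsigned char)*p)) j = 10 * j + (*p++ - '0');
    while (isspace((unsigned char)*p)) p++;
    if (*p++ != '"') return 0;
    while (*p && *p != '"' && n + 1 < cap) name[n++] = *p++;
    if (*p != '"' || n == 0) return 0;   /* need a closing quote and a nonempty name */
    name[n] = '\0';
    *line = j;
    return 1;
}
```

MMIXAL itself stores j − 1 rather than j in line_no, because line_no is increased again when the next line is read; the directive therefore describes the line that follows it.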

39. Archaic versions of the C library do not define FILENAME_MAX.

⟨Preprocessor definitions 31⟩ +≡
    #ifndef FILENAME_MAX
    #define FILENAME_MAX 256
    #endif

40. ⟨Local variables 40⟩ ≡
    register Char *p, *q;   /* the place where we're currently scanning */
See also section 65.
This code is used in section 136.

41. The next several subroutines are useful for preparing a listing of the assembled results. In such a listing, which the user can request with a command-line option, we fill the leftmost 19 columns with a representation of the output that has been assembled from the input in the buffer. Sometimes the assembled output requires more than one line, because we have room to output only a tetrabyte per line.

The flush_listing_line subroutine is called when we have finished generating one line's worth of assembled material. Its parameter is a string to be printed between the assembled material and the buffer contents, if the input line hasn't yet been echoed. The length of this string should be 19 minus the number of characters already printed on the current line of the listing.

⟨Subroutines 28⟩ +≡
    void flush_listing_line ARGS((char *));
    void flush_listing_line(s)
        char *s;
    {
        if (line_listed) fprintf(listing_file, "\n");
        else {
            fprintf(listing_file, "%s%s\n", s, buffer);
            line_listed = true;
        }
    }

42. Only the three least significant hex digits of a location are shown on the listing, unless the other digits have changed. The following subroutine prints an extra line when a change needs to be shown.

⟨Subroutines 28⟩ +≡
    void update_listing_loc ARGS((int));
    void update_listing_loc(k)
        int k;   /* the location to display, mod 4 */
    {
        if (cur_loc.h != listing_loc.h || ((cur_loc.l ^ listing_loc.l) & 0xfffff000)) {
            fprintf(listing_file, "%08x%08x:", cur_loc.h, (cur_loc.l & -4) | k);
            flush_listing_line("  ");
        }
        listing_loc.h = cur_loc.h;
        listing_loc.l = (cur_loc.l & -4) | k;
    }

43. When bytes are assembled, they are placed into the hold_buf. More precisely, a byte assembled for a location that is j plus a multiple of 4 is placed into hold_buf[j]; two auxiliary variables, held_bits and listing_bits, are then increased by 1 << j. Furthermore, listing_bits is increased by #10 << j if that byte is a future reference to be resolved later.

The bytes are held until we need to output them. The listing_clear routine lists any that have been held but not yet shown. It should be called only when listing_bits ≠ 0.

⟨Global variables 27⟩ +≡
    octa cur_loc;                 /* current location of assembled output */
    octa listing_loc;             /* current location on the listing */
    unsigned char hold_buf[4];    /* assembled bytes */
    unsigned char held_bits;      /* which bytes of hold_buf are active? */
    unsigned char listing_bits;   /* which of them haven't been listed yet? */
    bool spec_mode;               /* are we between BSPEC and ESPEC? */
    tetra spec_mode_loc;          /* number of bytes in the current special output */

44. ⟨Subroutines 28⟩ +≡
    void listing_clear ARGS((void));
    void listing_clear()
    {
        register int j, k;
        for (k = 0; k < 4; k++) if (listing_bits & (1 << k)) break;
        if (spec_mode) fprintf(listing_file, "         ");
        else {
            update_listing_loc(k);
            fprintf(listing_file, " ...%03x: ", (listing_loc.l & 0xffc) | k);
        }
        for (j = 0; j < 4; j++)
            if (listing_bits & (0x10 << j)) fprintf(listing_file, "xx");
            else if (listing_bits & (1 << j)) fprintf(listing_file, "%02x", hold_buf[j]);
            else fprintf(listing_file, "  ");
        flush_listing_line("  ");
        listing_bits = 0;
    }

45. Error messages are written to stderr. If the message begins with '*' it is merely a warning; if it begins with '!' it is fatal; otherwise the error is probably serious enough to make manual correction necessary, yet it is not tragic. Errors and warnings appear also on the optional listing file.

    #define err(m) { report_error(m); if (m[0] != '*') goto bypass; }
    #define derr(m, p) { sprintf(err_buf, m, p); report_error(err_buf);
                         if (err_buf[0] != '*') goto bypass; }
    #define dderr(m, p, q) { sprintf(err_buf, m, p, q); report_error(err_buf);
                             if (err_buf[0] != '*') goto bypass; }
    #define panic(m) { sprintf(err_buf, "!%s", m); report_error(err_buf); }
    #define dpanic(m, p) { err_buf[0] = '!'; sprintf(err_buf + 1, m, p);
                           report_error(err_buf); }

⟨Subroutines 28⟩ +≡
    void report_error ARGS((char *));
    void report_error(message)
        char *message;
    {
        if (!filename[cur_file]) filename[cur_file] = "(nofile)";
        if (message[0] == '*')
            fprintf(stderr, "\"%s\", line %d warning: %s\n",
                    filename[cur_file], line_no, message + 1);
        else if (message[0] == '!')
            fprintf(stderr, "\"%s\", line %d fatal error: %s\n",
                    filename[cur_file], line_no, message + 1);
        else {
            fprintf(stderr, "\"%s\", line %d: %s!\n",
                    filename[cur_file], line_no, message);
            err_count++;
        }
        if (listing_file) {
            if (!line_listed) flush_listing_line("****************** ");
            if (message[0] == '*')
                fprintf(listing_file, "************ warning: %s\n", message + 1);
            else if (message[0] == '!')
                fprintf(listing_file, "******** fatal error: %s!\n", message + 1);
            else fprintf(listing_file, "********** error: %s!\n", message);
        }
        if (message[0] == '!') exit(-2);
    }

46. ⟨Global variables 27⟩ +≡
    int err_count;   /* this many errors were found */

47. Output to the binary obj_file occurs four bytes at a time. The bytes are assembled in small buffers, not output as single tetrabytes, because we want the output to be big-endian even when the assembler is running on a little-endian machine.

    #define mmo_write(buf)
        if (fwrite(buf, 1, 4, obj_file) != 4) dpanic("Can't write on %s", obj_file_name)

⟨Subroutines 28⟩ +≡
    void mmo_clear ARGS((void));
    void mmo_out ARGS((void));
    unsigned char lop_quote_command[4] = {mm, lop_quote, 0, 1};
    void mmo_clear()   /* clears hold_buf, when held_bits is nonzero */
    {
        if (hold_buf[0] == mm) mmo_write(lop_quote_command);
        mmo_write(hold_buf);
        if (listing_file && listing_bits) listing_clear();
        held_bits = 0;
        hold_buf[0] = hold_buf[1] = hold_buf[2] = hold_buf[3] = 0;
        mmo_cur_loc = incr(mmo_cur_loc, 4);
        mmo_cur_loc.l &= -4;
        if (mmo_line_no) mmo_line_no++;
    }
    unsigned char mmo_buf[4];
    int mmo_ptr;
    void mmo_out()   /* output the contents of mmo_buf */
    {
        if (held_bits) mmo_clear();
        mmo_write(mmo_buf);
    }
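The quoting step in mmo_clear can be shown on its own. The sketch below returns the tetrabytes that would be written for one data tetrabyte; the function emit_data is a hypothetical name, and it works on 32-bit values instead of the byte buffers used by the program.

```c
#include <assert.h>
#include <stdint.h>

/* Write into out[] the tetrabytes needed to load the data tetrabyte t,
   and return how many were written (1 or 2).  A lop_quote instruction
   comes first whenever t begins with the escape code #98. */
int emit_data(uint32_t t, uint32_t out[2])
{
    int n = 0;
    if ((t >> 24) == 0x98)
        out[n++] = 0x98000001u;   /* mm, lop_quote, 0, 1 */
    out[n++] = t;
    return n;
}
```

This reproduces the pair 98000001, 98000000 that test.mmo contains for the final BYTE #98 of the example.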

48. ⟨Subroutines 28⟩ +≡
    void mmo_tetra ARGS((tetra));
    void mmo_byte ARGS((unsigned char));
    void mmo_lop ARGS((char, char, char));
    void mmo_lopp ARGS((char, unsigned short));
    void mmo_tetra(t)   /* output a tetrabyte */
        tetra t;
    {
        mmo_buf[0] = t >> 24;
        mmo_buf[1] = (t >> 16) & 0xff;
        mmo_buf[2] = (t >> 8) & 0xff;
        mmo_buf[3] = t & 0xff;
        mmo_out();
    }
    void mmo_byte(b)
        unsigned char b;
    {
        mmo_buf[(mmo_ptr++) & 3] = b;
        if (!(mmo_ptr & 3)) mmo_out();
    }
    void mmo_lop(x, y, z)   /* output a loader operation */
        char x, y, z;
    {
        mmo_buf[0] = mm;
        mmo_buf[1] = x;
        mmo_buf[2] = y;
        mmo_buf[3] = z;
        mmo_out();
    }
    void mmo_lopp(x, yz)   /* output a loader operation with two-byte operand */
        char x;
        unsigned short yz;
    {
        mmo_buf[0] = mm;
        mmo_buf[1] = x;
        mmo_buf[2] = yz >> 8;
        mmo_buf[3] = yz & 0xff;
        mmo_out();
    }

49. The mmo_loc subroutine makes the current location in the object file equal to cur_loc.

⟨Subroutines 28⟩ +≡
    void mmo_loc ARGS((void));
    void mmo_loc()
    {
        octa o;
        if (held_bits) mmo_clear();
        o = ominus(cur_loc, mmo_cur_loc);
        if (o.h == 0 && o.l < 0x10000) {
            if (o.l) mmo_lopp(lop_skip, o.l);
        } else {
            if (cur_loc.h & 0xffffff) {
                mmo_lop(lop_loc, 0, 2);
                mmo_tetra(cur_loc.h);
            } else mmo_lop(lop_loc, cur_loc.h >> 24, 1);
            mmo_tetra(cur_loc.l);
        }
        mmo_cur_loc = cur_loc;
    }

50. Similarly, the mmo_sync subroutine makes sure that the current file and line number in the output file agree with cur_file and line_no.

⟨Subroutines 28⟩ +≡
    void mmo_sync ARGS((void));
    void mmo_sync()
    {
        register int j, k;
        register char *p;
        if (cur_file != mmo_cur_file) {
            if (filename_passed[cur_file]) mmo_lop(lop_file, cur_file, 0);
            else {
                mmo_lop(lop_file, cur_file, (strlen(filename[cur_file]) + 3) >> 2);
                for (j = 0, p = filename[cur_file]; *p; p++, j = (j + 1) & 3) {
                    mmo_buf[j] = *p;
                    if (j == 3) mmo_out();
                }
                if (j) {
                    for ( ; j < 4; j++) mmo_buf[j] = 0;
                    mmo_out();
                }
                filename_passed[cur_file] = 1;
            }
            mmo_cur_file = cur_file;
            mmo_line_no = 0;
        }
        if (line_no != mmo_line_no) {
            if (line_no >= 0x10000)
                panic("I can't deal with line numbers exceeding 65535");
            mmo_lopp(lop_line, line_no);
            mmo_line_no = line_no;
        }
    }
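The choice that mmo_loc makes between lop_skip and the two forms of lop_loc can be restated as a pure function on 64-bit values. This is a sketch under assumptions: emit_loc is a hypothetical name, it uses uint64_t instead of the octa type, and it returns the tetrabytes instead of writing them to a file.

```c
#include <assert.h>
#include <stdint.h>

/* Write into out[] the loader tetrabytes that move the object-file
   location from `from` to `to`; return how many were written (0 to 3). */
int emit_loc(uint64_t from, uint64_t to, uint32_t out[3])
{
    uint64_t gap = to - from;   /* computed modulo 2^64, like ominus */
    uint32_t hi = (uint32_t)(to >> 32);
    uint32_t lo = (uint32_t)to;
    int n = 0;
    if (gap < 0x10000) {
        if (gap) out[n++] = 0x98020000u | (uint32_t)gap;   /* lop_skip gap */
    } else if (hi & 0xffffff) {
        out[n++] = 0x98010002u;                            /* lop_loc 0,2 */
        out[n++] = hi;
        out[n++] = lo;
    } else {
        out[n++] = 0x98010001u | ((hi >> 24) << 8);        /* lop_loc Y,1 */
        out[n++] = lo;
    }
    return n;
}
```

A short forward gap costs a single tetrabyte, an address whose high tetra is zero apart from its top byte costs two, and any other address costs three. The three cases appear in test.mmo as 98024000, as 98012001 followed by one tetra, and as 98010002 followed by two tetras.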

51. ⟨Global variables 27⟩ +≡
    octa mmo_cur_loc;            /* current location in the object file */
    int mmo_line_no;             /* current line number in the mmo output so far */
    int mmo_cur_file;            /* index of the current file in the mmo output so far */
    char filename_passed[256];   /* has a filename been recorded in the output? */

52. Here is a basic subroutine that assembles k bytes starting at cur_loc. The value of k should be 1, 2, or 4, and cur_loc should be a multiple of k. The x_bits parameter tells which bytes, if any, are part of a future reference.

⟨Subroutines 28⟩ +≡
    void assemble ARGS((char, tetra, unsigned char));
    void assemble(k, dat, x_bits)
        char k;
        tetra dat;
        unsigned char x_bits;
    {
        register int j, jj, l;
        if (spec_mode) l = spec_mode_loc;
        else {
            l = cur_loc.l;
            ⟨Make sure cur_loc and mmo_cur_loc refer to the same tetrabyte 53⟩;
            if (!held_bits && !(cur_loc.h & 0xe0000000)) mmo_sync();
        }
        for (j = 0; j < k; j++) {
            jj = (l + j) & 3;
            hold_buf[jj] = (dat >> (8 * (k - 1 - j))) & 0xff;
            held_bits |= 1 << jj;
            listing_bits |= 1 << jj;
        }
        listing_bits |= x_bits;
        if (((l + k) & 3) == 0) {
            if (listing_file) listing_clear();
            mmo_clear();
        }
        if (spec_mode) spec_mode_loc += k;
        else cur_loc = incr(cur_loc, k);
    }

53. ⟨Make sure cur_loc and mmo_cur_loc refer to the same tetrabyte 53⟩ ≡
    if (cur_loc.h != mmo_cur_loc.h || ((cur_loc.l ^ mmo_cur_loc.l) & 0xfffffffc)) mmo_loc();
This code is used in section 52.
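The byte placement performed by the loop in assemble can be isolated as follows. The function place_bytes is a hypothetical stand-in that takes the hold buffer as a parameter and returns a mask in the style of held_bits; it leaves out the listing and output side effects of the real routine.

```c
#include <assert.h>
#include <stdint.h>

/* Place the k low-order bytes of dat (k = 1, 2, or 4) into hold[], most
   significant byte first, starting at byte offset loc mod 4.  Return the
   mask of positions that were filled. */
unsigned place_bytes(uint32_t loc, int k, uint32_t dat, unsigned char hold[4])
{
    unsigned bits = 0;
    int j, jj;
    assert(k == 1 || k == 2 || k == 4);
    assert(loc % (uint32_t)k == 0);   /* loc must be a multiple of k */
    for (j = 0; j < k; j++) {
        jj = (int)((loc + (uint32_t)j) & 3);
        hold[jj] = (unsigned char)((dat >> (8 * (k - 1 - j))) & 0xff);
        bits |= 1u << jj;
    }
    return bits;
}
```

Assembling 'c' at an address ending in #a and 'd' at the next address fills positions 2 and 3 of a zeroed buffer, which is why the example's BYTE "cd" appears in test.mmo as the tetrabyte 00006364.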

54. The symbol table. Symbols are stored and retrieved by means of a ternary search trie, following ideas of Bentley and Sedgewick. (See ACM–SIAM Symp. on Discrete Algorithms 8 (1997), 360–369; R. Sedgewick, Algorithms in C (Reading, Mass.: Addison–Wesley, 1998), §15.4.) Each trie node stores a character, and there are branches to subtries for the cases where a given character is less than, equal to, or greater than the character in the trie. There also is a pointer to a symbol table entry if a symbol ends at the current node.

⟨Type definitions 26⟩ +≡
  typedef struct ternary_trie_struct {
    unsigned short ch;                                /* the (possibly wyde) character stored here */
    struct ternary_trie_struct *left, *mid, *right;   /* downward in the ternary trie */
    struct sym_tab_struct *sym;                       /* equivalents of symbols */
  } trie_node;

55. We allocate trie nodes in chunks of 1000 at a time.

⟨Global variables 27⟩ +≡
  trie_node *trie_root;                         /* root of the trie */
  trie_node *op_root;                           /* root of subtrie for opcodes */
  trie_node *next_trie_node, *last_trie_node;   /* allocation control */
  trie_node *cur_prefix;                        /* root of subtrie for unqualified symbols */

56. ⟨Subroutines 28⟩ +≡
  trie_node *new_trie_node ARGS((void));
  trie_node *new_trie_node()
  {
    register trie_node *t = next_trie_node;
    if (t == last_trie_node) {
      t = (trie_node *) calloc(1000, sizeof(trie_node));
      if (!t) panic("Capacity exceeded: Out of trie memory");
      last_trie_node = t + 1000;
    }
    next_trie_node = t + 1;
    return t;
  }

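As a rough illustration of the ternary search trie described in section 54, here is a minimal sketch. The names tst_node and tst_find and the integer value field are assumptions made for this example only; MMIXAL's own nodes carry a sym pointer and are allocated in chunks rather than one at a time.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified node for illustration; value 0 means "no symbol ends here". */
typedef struct tst_node {
  char ch;
  struct tst_node *left, *mid, *right;
  int value;
} tst_node;

/* Find the node reached by key s in the middle subtrie of root,
   creating nodes on the way if they do not exist yet. */
tst_node *tst_find(tst_node *root, const char *s)
{
  tst_node *t = root;
  for (; *s; s++) {
    tst_node **slot = &t->mid;           /* descend one character level */
    while (*slot && (*slot)->ch != *s)   /* then branch left or right */
      slot = (*s < (*slot)->ch) ? &(*slot)->left : &(*slot)->right;
    if (!*slot) {
      *slot = calloc(1, sizeof(tst_node));
      assert(*slot);
      (*slot)->ch = *s;
    }
    t = *slot;
  }
  return t;
}
```

Keys that share a prefix, such as ADD and ADDU, share the nodes of that prefix; only the final node differs.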

57. The trie_search subroutine starts at a given node of the trie and finds a given string in its middle subtrie, inserting new nodes if necessary. The string ends with the first nonletter or nondigit; the location of the terminating character is stored in global variable terminator.

  #define isletter(c) (isalpha(c) || c == '_' || c == ':' || c > 126)

⟨Subroutines 28⟩ +≡
  trie_node *trie_search ARGS((trie_node *, Char *));
  Char *terminator;   /* where the search ended */
  trie_node *trie_search(t, s)
    trie_node *t;
    Char *s;
  {
    register trie_node *tt = t;
    register Char *p = s;
    while (1) {
      if (!isletter(*p) && !isdigit(*p)) {
        terminator = p;
        return tt;
      }
      if (tt->mid) {
        tt = tt->mid;
        while (*p != tt->ch) {
          if (*p < tt->ch) {
            if (tt->left) tt = tt->left;
            else {
              tt->left = new_trie_node();
              tt = tt->left;
              goto store_new_char;
            }
          } else {
            if (tt->right) tt = tt->right;
            else {
              tt->right = new_trie_node();
              tt = tt->right;
              goto store_new_char;
            }
          }
        }
        p++;
      } else {
        tt->mid = new_trie_node();
        tt = tt->mid;
      store_new_char: tt->ch = *p++;
      }
    }
  }

58. Symbol table nodes hold the serial numbers and equivalents of defined symbols. They also hold "fixup information" for undefined symbols; this will allow the loader to correct any previously assembled instructions that refer to such symbols when they are eventually defined.

In the symbol table node for a defined symbol, the link field has one of the special codes DEFINED or REGISTER or PREDEFINED, and the equiv field holds the defined value. The serial number is a unique identifier for all user-defined symbols.

In the symbol table node for an undefined symbol, the equiv field is ignored. The link field points to the first node of fixup information; that node is, in turn, a symbol table node that might link to other fixups. The serial number in a fixup node is either 0 or 1 or 2, meaning respectively "fixup the octabyte pointed to by equiv" or "fixup the relative address in the YZ field of the instruction pointed to by equiv" or "fixup the relative address in the XYZ field of the instruction pointed to by equiv."

  #define DEFINED (sym_node *) 1      /* code value for octabyte equivalents */
  #define REGISTER (sym_node *) 2     /* code value for register-number equivalents */
  #define PREDEFINED (sym_node *) 3   /* code value for not-yet-used equivalents */
  #define fix_o 0                     /* serial code for octabyte fixup */
  #define fix_yz 1                    /* serial code for relative fixup */
  #define fix_xyz 2                   /* serial code for JMP fixup */

⟨Type definitions 26⟩ +≡
  typedef struct sym_tab_struct {
    int serial;                     /* serial number of symbol; type number for fixups */
    struct sym_tab_struct *link;    /* DEFINED status or link to fixup */
    octa equiv;                     /* the equivalent value */
  } sym_node;

59. The allocation of new symbol table nodes proceeds in chunks, like the allocation of trie nodes. But in this case we also have the possibility of reusing old fixup nodes that are no longer needed.

  #define recycle_fixup(pp) pp->link = sym_avail, sym_avail = pp

⟨Global variables 27⟩ +≡
  int serial_number;
  sym_node *sym_root;                         /* root of the sym */
  sym_node *next_sym_node, *last_sym_node;    /* allocation control */
  sym_node *sym_avail;                        /* stack of recycled symbol table nodes */

60. ⟨Subroutines 28⟩ +≡
  sym_node *new_sym_node ARGS((bool));
  sym_node *new_sym_node(serialize)
    bool serialize;   /* should the new node receive a unique serial number? */
  {
    register sym_node *p = sym_avail;
    if (p) {
      sym_avail = p->link;
      p->link = NULL;
      p->serial = 0;
      p->equiv = zero_octa;
    } else {
      p = next_sym_node;
      if (p == last_sym_node) {
        p = (sym_node *) calloc(1000, sizeof(sym_node));
        if (!p) panic("Capacity exceeded: Out of symbol memory");
        last_sym_node = p + 1000;
      }
      next_sym_node = p + 1;
    }
    if (serialize) p->serial = ++serial_number;
    return p;
  }

61. We initialize the trie by inserting all the predefined symbols. Opcodes are given the prefix ^, to distinguish them from ordinary symbols; this character nicely divides uppercase letters from lowercase letters.

⟨Initialize everything 29⟩ +≡
  trie_root = new_trie_node();
  cur_prefix = trie_root;
  op_root = new_trie_node();
  trie_root->mid = op_root;
  trie_root->ch = ':';
  op_root->ch = '^';
  ⟨Put the MMIX opcodes and MMIXAL pseudo-ops into the trie 64⟩;
  ⟨Put the special register names into the trie 66⟩;
  ⟨Put other predefined symbols into the trie 70⟩;
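The allocation scheme of sections 59 and 60, chunked allocation combined with a stack of recycled nodes, can be sketched on its own. The type node and the functions new_node and recycle below are stand-ins invented for this illustration; they omit the equiv field and the serial-number assignment of the real new_sym_node.

```c
#include <assert.h>
#include <stdlib.h>

enum { CHUNK = 1000 };   /* chunk size, as in sections 55 and 60 */

typedef struct node {
  struct node *link;
  int serial;
} node;

static node *next_node, *last_node;  /* unused part of the current chunk */
static node *avail;                  /* stack of recycled nodes */

/* Push a node that is no longer needed onto the avail stack. */
void recycle(node *p)
{
  p->link = avail;
  avail = p;
}

/* Return a cleared node, preferring recycled nodes over fresh ones. */
node *new_node(void)
{
  node *p = avail;
  if (p) {                        /* reuse the most recently recycled node */
    avail = p->link;
    p->link = NULL;
    p->serial = 0;
    return p;
  }
  if (next_node == last_node) {   /* chunk exhausted, or none allocated yet */
    next_node = calloc(CHUNK, sizeof(node));
    if (!next_node) abort();
    last_node = next_node + CHUNK;
  }
  return next_node++;
}
```

A recycled node is handed out again before the chunk pointer advances, so short-lived fixup nodes do not consume fresh memory.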

62. Most of the assembly work can be table driven, based on bits that are stored as the "equivalents" of opcode symbols like ^ADD.

  #define rel_addr_bit 0x1       /* is YZ or XYZ relative? */
  #define immed_bit 0x2          /* should opcode be immediate if Z or YZ not register? */
  #define zar_bit 0x4            /* should register status of Z be ignored? */
  #define zr_bit 0x8             /* must Z be a register? */
  #define yar_bit 0x10           /* should register status of Y be ignored? */
  #define yr_bit 0x20            /* must Y be a register? */
  #define xar_bit 0x40           /* should register status of X be ignored? */
  #define xr_bit 0x80            /* must X be a register? */
  #define yzar_bit 0x100         /* should register status of YZ be ignored? */
  #define yzr_bit 0x200          /* must YZ be a register? */
  #define xyzar_bit 0x400        /* should register status of XYZ be ignored? */
  #define xyzr_bit 0x800         /* must XYZ be a register? */
  #define one_arg_bit 0x1000     /* is it OK to have zero or one operand? */
  #define two_arg_bit 0x2000     /* is it OK to have exactly two operands? */
  #define three_arg_bit 0x4000   /* is it OK to have exactly three operands? */
  #define many_arg_bit 0x8000    /* is it OK to have more than three operands? */
  #define align_bits 0x30000     /* how much alignment: byte, wyde, tetra, or octa? */
  #define no_label_bit 0x40000   /* should the label be blank? */
  #define mem_bit 0x80000        /* must YZ be a memory reference? */
  #define spec_bit 0x100000      /* is this opcode allowed in SPEC mode? */

⟨Type definitions 26⟩ +≡
  typedef struct {
    char *name;   /* symbolic opcode */
    short code;   /* numeric opcode */
    int bits;     /* treatment of operands */
  } op_spec;
  typedef enum {
    SET = 0x100, IS, LOC, PREFIX, BSPEC, ESPEC, GREG, LOCAL, BYTE, WYDE, TETRA, OCTA
  } pseudo_op;
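To see how such a bits value might be interrogated, consider the following sketch. The macro values are those of section 62; the two helper functions are hypothetical and are not MMIXAL's own code. The alignment helper assumes that the two-bit align_bits field counts 0 for byte through 3 for octa, in the order the comment lists them.

```c
#include <assert.h>

#define one_arg_bit   0x1000
#define two_arg_bit   0x2000
#define three_arg_bit 0x4000
#define many_arg_bit  0x8000
#define align_bits    0x30000

/* Hypothetical helper: may an opcode with these bits take n operands? */
int operand_count_ok(int bits, int n)
{
  assert(n >= 0);
  if (n <= 1) return (bits & one_arg_bit) != 0;
  if (n == 2) return (bits & two_arg_bit) != 0;
  if (n == 3) return (bits & three_arg_bit) != 0;
  return (bits & many_arg_bit) != 0;
}

/* Hypothetical helper: alignment field, assumed to mean
   0 = byte, 1 = wyde, 2 = tetra, 3 = octa. */
int alignment_field(int bits)
{
  return (bits & align_bits) >> 16;
}
```

For example, the value 0x240a2, which occurs many times in op_init_table, decomposes into tetra alignment, three_arg_bit, xr_bit, yr_bit, and immed_bit: exactly three operands, with X and Y required to be registers.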

63. ⟨Global variables 27⟩ +≡
  op_spec op_init_table[] = {
    {"TRAP", 0x00, 0x27554}, {"FCMP", 0x01, 0x240a8}, {"FUN", 0x02, 0x240a8}, {"FEQL", 0x03, 0x240a8},
    {"FADD", 0x04, 0x240a8}, {"FIX", 0x05, 0x26288}, {"FSUB", 0x06, 0x240a8}, {"FIXU", 0x07, 0x26288},
    {"FLOT", 0x08, 0x26282}, {"FLOTU", 0x0a, 0x26282}, {"SFLOT", 0x0c, 0x26282}, {"SFLOTU", 0x0e, 0x26282},
    {"FMUL", 0x10, 0x240a8}, {"FCMPE", 0x11, 0x240a8}, {"FUNE", 0x12, 0x240a8}, {"FEQLE", 0x13, 0x240a8},
    {"FDIV", 0x14, 0x240a8}, {"FSQRT", 0x15, 0x26288}, {"FREM", 0x16, 0x240a8}, {"FINT", 0x17, 0x26288},
    {"MUL", 0x18, 0x240a2}, {"MULU", 0x1a, 0x240a2}, {"DIV", 0x1c, 0x240a2}, {"DIVU", 0x1e, 0x240a2},
    {"ADD", 0x20, 0x240a2}, {"ADDU", 0x22, 0x240a2}, {"SUB", 0x24, 0x240a2}, {"SUBU", 0x26, 0x240a2},
    {"2ADDU", 0x28, 0x240a2}, {"4ADDU", 0x2a, 0x240a2}, {"8ADDU", 0x2c, 0x240a2}, {"16ADDU", 0x2e, 0x240a2},
    {"CMP", 0x30, 0x240a2}, {"CMPU", 0x32, 0x240a2}, {"NEG", 0x34, 0x26082}, {"NEGU", 0x36, 0x26082},
    {"SL", 0x38, 0x240a2}, {"SLU", 0x3a, 0x240a2}, {"SR", 0x3c, 0x240a2}, {"SRU", 0x3e, 0x240a2},
    {"BN", 0x40, 0x22081}, {"BZ", 0x42, 0x22081}, {"BP", 0x44, 0x22081}, {"BOD", 0x46, 0x22081},
    {"BNN", 0x48, 0x22081}, {"BNZ", 0x4a, 0x22081}, {"BNP", 0x4c, 0x22081}, {"BEV", 0x4e, 0x22081},
    {"PBN", 0x50, 0x22081}, {"PBZ", 0x52, 0x22081}, {"PBP", 0x54, 0x22081}, {"PBOD", 0x56, 0x22081},
    {"PBNN", 0x58, 0x22081}, {"PBNZ", 0x5a, 0x22081}, {"PBNP", 0x5c, 0x22081}, {"PBEV", 0x5e, 0x22081},
    {"CSN", 0x60, 0x240a2}, {"CSZ", 0x62, 0x240a2}, {"CSP", 0x64, 0x240a2}, {"CSOD", 0x66, 0x240a2},
    {"CSNN", 0x68, 0x240a2}, {"CSNZ", 0x6a, 0x240a2}, {"CSNP", 0x6c, 0x240a2}, {"CSEV", 0x6e, 0x240a2},
    {"ZSN", 0x70, 0x240a2}, {"ZSZ", 0x72, 0x240a2}, {"ZSP", 0x74, 0x240a2}, {"ZSOD", 0x76, 0x240a2},
    {"ZSNN", 0x78, 0x240a2}, {"ZSNZ", 0x7a, 0x240a2}, {"ZSNP", 0x7c, 0x240a2}, {"ZSEV", 0x7e, 0x240a2},
    {"LDB", 0x80, 0xa60a2}, {"LDBU", 0x82, 0xa60a2}, {"LDW", 0x84, 0xa60a2}, {"LDWU", 0x86, 0xa60a2},
    {"LDT", 0x88, 0xa60a2}, {"LDTU", 0x8a, 0xa60a2}, {"LDO", 0x8c, 0xa60a2}, {"LDOU", 0x8e, 0xa60a2},
    {"LDSF", 0x90, 0xa60a2}, {"LDHT", 0x92, 0xa60a2}, {"CSWAP", 0x94, 0xa60a2}, {"LDUNC", 0x96, 0xa60a2},
    {"LDVTS", 0x98, 0xa60a2}, {"PRELD", 0x9a, 0xa6022}, {"PREGO", 0x9c, 0xa6022}, {"GO", 0x9e, 0xa60a2},
    {"STB", 0xa0, 0xa60a2}, {"STBU", 0xa2, 0xa60a2}, {"STW", 0xa4, 0xa60a2}, {"STWU", 0xa6, 0xa60a2},
    {"STT", 0xa8, 0xa60a2}, {"STTU", 0xaa, 0xa60a2}, {"STO", 0xac, 0xa60a2}, {"STOU", 0xae, 0xa60a2},
    {"STSF", 0xb0, 0xa60a2}, {"STHT", 0xb2, 0xa60a2}, {"STCO", 0xb4, 0xa6022}, {"STUNC", 0xb6, 0xa60a2},
    {"SYNCD", 0xb8, 0xa6022}, {"PREST", 0xba, 0xa6022}, {"SYNCID", 0xbc, 0xa6022}, {"PUSHGO", 0xbe, 0xa6062},
    {"OR", 0xc0, 0x240a2}, {"ORN", 0xc2, 0x240a2}, {"NOR", 0xc4, 0x240a2}, {"XOR", 0xc6, 0x240a2},
    {"AND", 0xc8, 0x240a2}, {"ANDN", 0xca, 0x240a2}, {"NAND", 0xcc, 0x240a2}, {"NXOR", 0xce, 0x240a2},
    {"BDIF", 0xd0, 0x240a2}, {"WDIF", 0xd2, 0x240a2}, {"TDIF", 0xd4, 0x240a2}, {"ODIF", 0xd6, 0x240a2},
    {"MUX", 0xd8, 0x240a2}, {"SADD", 0xda, 0x240a2}, {"MOR", 0xdc, 0x240a2}, {"MXOR", 0xde, 0x240a2},
    {"SETH", 0xe0, 0x22080}, {"SETMH", 0xe1, 0x22080}, {"SETML", 0xe2, 0x22080}, {"SETL", 0xe3, 0x22080},
    {"INCH", 0xe4, 0x22080}, {"INCMH", 0xe5, 0x22080}, {"INCML", 0xe6, 0x22080}, {"INCL", 0xe7, 0x22080},
    {"ORH", 0xe8, 0x22080}, {"ORMH", 0xe9, 0x22080}, {"ORML", 0xea, 0x22080}, {"ORL", 0xeb, 0x22080},
    {"ANDNH", 0xec, 0x22080}, {"ANDNMH", 0xed, 0x22080}, {"ANDNML", 0xee, 0x22080}, {"ANDNL", 0xef, 0x22080},
    {"JMP", 0xf0, 0x21001}, {"PUSHJ", 0xf2, 0x22041}, {"GETA", 0xf4, 0x22081}, {"PUT", 0xf6, 0x22002},
    {"POP", 0xf8, 0x23000}, {"RESUME", 0xf9, 0x21000}, {"SAVE", 0xfa, 0x22080}, {"UNSAVE", 0xfb, 0x23a00},
    {"SYNC", 0xfc, 0x21000}, {"SWYM", 0xfd, 0x27554}, {"GET", 0xfe, 0x22080}, {"TRIP", 0xff, 0x27554},
    {"SET", SET, 0x22180}, {"LDA", 0x22, 0xa60a2},
    {"IS", IS, 0x101400}, {"LOC", LOC, 0x1400}, {"PREFIX", PREFIX, 0x141000},
    {"BYTE", BYTE, 0x10f000}, {"WYDE", WYDE, 0x11f000}, {"TETRA", TETRA, 0x12f000}, {"OCTA", OCTA, 0x13f000},
    {"BSPEC", BSPEC, 0x41400}, {"ESPEC", ESPEC, 0x141000}, {"GREG", GREG, 0x101000}, {"LOCAL", LOCAL, 0x141800}};
  int op_init_size;   /* the number of items in op_init_table */

64. ⟨Put the MMIX opcodes and MMIXAL pseudo-ops into the trie 64⟩ ≡
  op_init_size = (sizeof op_init_table) / sizeof(op_spec);
  for (j = 0; j < op_init_size; j++) {
    tt = trie_search(op_root, op_init_table[j].name);
    pp = tt->sym = new_sym_node(false);
    pp->link = PREDEFINED;
    pp->equiv.h = op_init_table[j].code, pp->equiv.l = op_init_table[j].bits;
  }
This code is used in section 61.

65. ⟨Local variables 40⟩ +≡
  register trie_node *tt;
  register sym_node *pp, *qq;

66. ⟨Put the special register names into the trie 66⟩ ≡
  for (j = 0; j < 32; j++) {
    tt = trie_search(trie_root, special_name[j]);
    pp = tt->sym = new_sym_node(false);
    pp->link = PREDEFINED;
    pp->equiv.l = j;
  }
This code is used in section 61.

67. ⟨Global variables 27⟩ +≡
  char *special_name[32] = {
    "rB", "rD", "rE", "rH", "rJ", "rM", "rR", "rBB",
    "rC", "rN", "rO", "rS", "rI", "rT", "rTT", "rK",
    "rQ", "rU", "rV", "rG", "rL", "rA", "rF", "rP",
    "rW", "rX", "rY", "rZ", "rWW", "rXX", "rYY", "rZZ"};

68. ⟨Type definitions 26⟩ +≡
  typedef struct {
    char *name;
    tetra h, l;
  } predef_spec;

69. ⟨Global variables 27⟩ +≡
  predef_spec predefs[] = {
    {"ROUND_CURRENT", 0, 0}, {"ROUND_OFF", 0, 1}, {"ROUND_UP", 0, 2}, {"ROUND_DOWN", 0, 3}, {"ROUND_NEAR", 0, 4},
    {"Inf", 0x7ff00000, 0},
    {"Data_Segment", 0x20000000, 0}, {"Pool_Segment", 0x40000000, 0}, {"Stack_Segment", 0x60000000, 0},
    {"D_BIT", 0, 0x80}, {"V_BIT", 0, 0x40}, {"W_BIT", 0, 0x20}, {"I_BIT", 0, 0x10},
    {"O_BIT", 0, 0x08}, {"U_BIT", 0, 0x04}, {"Z_BIT", 0, 0x02}, {"X_BIT", 0, 0x01},
    {"D_Handler", 0, 0x10}, {"V_Handler", 0, 0x20}, {"W_Handler", 0, 0x30}, {"I_Handler", 0, 0x40},
    {"O_Handler", 0, 0x50}, {"U_Handler", 0, 0x60}, {"Z_Handler", 0, 0x70}, {"X_Handler", 0, 0x80},
    {"StdIn", 0, 0}, {"StdOut", 0, 1}, {"StdErr", 0, 2},
    {"TextRead", 0, 0}, {"TextWrite", 0, 1}, {"BinaryRead", 0, 2}, {"BinaryWrite", 0, 3}, {"BinaryReadWrite", 0, 4},
    {"Halt", 0, 0}, {"Fopen", 0, 1}, {"Fclose", 0, 2}, {"Fread", 0, 3}, {"Fgets", 0, 4}, {"Fgetws", 0, 5},
    {"Fwrite", 0, 6}, {"Fputs", 0, 7}, {"Fputws", 0, 8}, {"Fseek", 0, 9}, {"Ftell", 0, 10}};
  int predef_size;

70. ⟨Put other predefined symbols into the trie 70⟩ ≡
  predef_size = (sizeof predefs) / sizeof(predef_spec);
  for (j = 0; j < predef_size; j++) {
    tt = trie_search(trie_root, predefs[j].name);
    pp = tt->sym = new_sym_node(false);
    pp->link = PREDEFINED;
    pp->equiv.h = predefs[j].h, pp->equiv.l = predefs[j].l;
  }
This code is used in section 61.

71. We place Main into the trie at the beginning of assembly, so that it will show up as an undefined symbol if the user specifies no starting point.

⟨Initialize everything 29⟩ +≡
  trie_search(trie_root, "Main")->sym = new_sym_node(true);

72. At the end of assembly we traverse the entire symbol table, visiting each symbol in lexicographic order and transmitting the trie structure to the output file. We detect any undefined future references at this time.

The order of traversal has a simple recursive pattern: To traverse the subtrie rooted at t, we

  traverse t->left, if the left subtrie is nonempty;
  visit t->sym, if this symbol table entry is present;
  traverse t->mid, if the middle subtrie is nonempty;
  traverse t->right, if the right subtrie is nonempty.

This pattern leads to a compact representation in the mmo file, usually requiring fewer than two bytes per trie node plus the bytes needed to encode the equivalents and serial numbers. Each node of the trie is encoded as a "master byte" followed by the encodings of the left subtrie, character, equivalent, middle subtrie, and right subtrie. The master byte is the sum of

  #80, if the character occupies two bytes instead of one;
  #40, if the left subtrie is nonempty;
  #20, if the middle subtrie is nonempty;
  #10, if the right subtrie is nonempty;
  #01 to #08, if the symbol's equivalent is one to eight bytes long;
  #09 to #0e, if the symbol's equivalent is 2^61 plus one to six bytes;
  #0f, if the symbol's equivalent is $0 plus one byte;

the character is omitted if the middle subtrie and the equivalent are both empty. The "equivalent" of an undefined symbol is zero, but stated as two bytes long. Symbol equivalents are followed by the serial number, represented as a sequence of one or more bytes in radix 128; the final byte of the serial number is tagged by adding 128. (Thus, serial number 2^14 − 1 is encoded as #7fff; serial number 2^14 is #010080.)

73. First we prune the trie by removing all predefined symbols that the user did not redefine.

⟨Subroutines 28⟩ +≡
  trie_node *prune ARGS((trie_node *));
  trie_node *prune(t)
    trie_node *t;
  {
    register int useful = 0;
    if (t->sym) {
      if (t->sym->serial) useful = 1;
      else t->sym = NULL;
    }
    if (t->left) {
      t->left = prune(t->left);
      if (t->left) useful = 1;
    }
    if (t->mid) {
      t->mid = prune(t->mid);
      if (t->mid) useful = 1;
    }
    if (t->right) {
      t->right = prune(t->right);
      if (t->right) useful = 1;
    }
    if (useful) return t;
    else return NULL;
  }

74. Then we output the trie by following the recursive traversal pattern.

⟨Subroutines 28⟩ +≡
  trie_node *out_stab ARGS((trie_node *));
  trie_node *out_stab(t)
    trie_node *t;
  {
    register int m = 0, j;
    register sym_node *pp;
    if (t->ch > 0xff) m += 0x80;
    if (t->left) m += 0x40;
    if (t->mid) m += 0x20;
    if (t->right) m += 0x10;
    if (t->sym) {
      if (t->sym->link == REGISTER) m += 0xf;
      else if (t->sym->link == DEFINED) ⟨Encode the length of t->sym->equiv 76⟩
      else if (t->sym->link || t->sym->serial == 1) ⟨Report an undefined symbol 79⟩;
    }
    mmo_byte(m);
    if (t->left) out_stab(t->left);
    if (m & 0x2f) ⟨Visit t and traverse t->mid 75⟩;
    if (t->right) out_stab(t->right);
  }
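The radix-128 serial number encoding of section 72 can be checked with a small standalone routine. This is a sketch under assumptions: the function name encode_serial and the output-buffer interface are invented for the illustration, whereas the assembler itself emits the bytes directly with mmo_byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: write serial in radix 128, most significant digit
   first, adding 0x80 to the final byte.
   Returns the number of bytes written (1 to 5). */
size_t encode_serial(uint32_t serial, uint8_t out[5])
{
  int m = 0;
  size_t n = 0;
  assert(serial < (UINT32_C(1) << 31));   /* serial numbers are nonnegative ints */
  while (m < 4 && serial >= (UINT32_C(1) << (7 * (m + 1)))) m++;
  for (; m >= 0; m--)
    out[n++] = (uint8_t)(((serial >> (7 * m)) & 0x7f) + (m ? 0 : 0x80));
  return n;
}
```

With this routine, 2^14 − 1 = 16383 yields the two bytes 7f ff and 2^14 = 16384 yields the three bytes 01 00 80, in agreement with the examples quoted in section 72.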

75. A global variable called sym_buf holds all characters on middle branches to the current trie node; sym_ptr is the first currently unused character in sym_buf.

⟨Visit t and traverse t->mid 75⟩ ≡
  {
    if (m & 0x80) mmo_byte(t->ch >> 8);
    mmo_byte(t->ch & 0xff);
    *sym_ptr++ = (m & 0x80 ? '?' : t->ch);   /* Unicode? not yet */
    m &= 0xf;
    if (m && t->sym->link) {
      if (listing_file) ⟨Print symbol sym_buf and its equivalent 78⟩;
      if (m == 15) m = 1;
      else if (m > 8) m -= 8;
      for (; m > 0; m--)
        if (m > 4) mmo_byte((t->sym->equiv.h >> (8 * (m - 5))) & 0xff);
        else mmo_byte((t->sym->equiv.l >> (8 * (m - 1))) & 0xff);
      for (m = 0; m < 4; m++)
        if (t->sym->serial < (1 << (7 * (m + 1)))) break;
      for (; m >= 0; m--)
        mmo_byte(((t->sym->serial >> (7 * m)) & 0x7f) + (m ? 0 : 0x80));
    }
    if (t->mid) out_stab(t->mid);
    sym_ptr--;
  }
This code is used in section 74.

76. ⟨Encode the length of t->sym->equiv 76⟩ ≡
  {
    register tetra x;
    if ((t->sym->equiv.h & 0xffff0000) == 0x20000000)
      m += 8, x = t->sym->equiv.h - 0x20000000;   /* data segment */
    else x = t->sym->equiv.h;
    if (x) m += 4;
    else x = t->sym->equiv.l;
    for (j = 1; j < 4; j++)
      if (x < (1 << (8 * j))) break;
    m += j;
  }
This code is used in section 74.

77. We make room for symbols up to 999 bytes long. Strictly speaking, the program should check if this limit is exceeded; but really!

⟨Global variables 27⟩ +≡
  Char sym_buf[1000];
  Char *sym_ptr;

78. The initial `:' of each fully qualified symbol is omitted here, since most users of MMIXAL will probably not need the PREFIX feature. One consequence of this omission is that the one-character symbol `:' itself, which is allowed by the rules of MMIXAL, is printed as the null string.

⟨Print symbol sym_buf and its equivalent 78⟩ ≡
  {
    *sym_ptr = '\0';
    fprintf(listing_file, " %s = ", sym_buf + 1);
    pp = t->sym;
    if (pp->link == DEFINED)
      fprintf(listing_file, "#%08x%08x", pp->equiv.h, pp->equiv.l);
    else if (pp->link == REGISTER)
      fprintf(listing_file, "$%03d", pp->equiv.l);
    else fprintf(listing_file, "?");
    fprintf(listing_file, " (%d)\n", pp->serial);
  }
This code is used in section 75.

79. ⟨Report an undefined symbol 79⟩ ≡
  {
    *sym_ptr = (m & 0x80 ? '?' : t->ch);   /* Unicode? not yet */
    *(sym_ptr + 1) = '\0';
    fprintf(stderr, "undefined symbol: %s\n", sym_buf + 1);
    err_count++;
    m += 2;
  }
This code is used in section 74.

80. ⟨Check and output the trie 80⟩ ≡
  op_root->mid = NULL;   /* annihilate all the opcodes */
  prune(trie_root);
  sym_ptr = sym_buf;
  if (listing_file) fprintf(listing_file, "\nSymbol table:\n");
  mmo_lop(lop_stab, 0, 0);
  out_stab(trie_root);
  while (mmo_ptr & 3) mmo_byte(0);
  mmo_lopp(lop_end, mmo_ptr >> 2);
This code is used in section 142.
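The length code computed in section 76 can be explored separately from the trie output. The function below is a hedged restatement of that logic: the name equiv_length_code is hypothetical, and the equivalent is passed as two 32-bit halves h and l in place of the octa structure.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the low four bits of the master byte for a defined,
   non-register symbol whose equivalent has high half h and low half l. */
int equiv_length_code(uint32_t h, uint32_t l)
{
  int m = 0, j;
  uint32_t x;
  if ((h & 0xffff0000u) == 0x20000000u) {   /* data segment: 2^61 plus a few bytes */
    m += 8;
    x = h - 0x20000000u;
  } else x = h;
  if (x) m += 4;       /* the high half contributes bytes */
  else x = l;          /* only the low half matters */
  for (j = 1; j < 4; j++)
    if (x < (UINT32_C(1) << (8 * j))) break;
  return m + j;        /* 1 to 8, or 9 to 14 for data-segment addresses */
}
```

A value such as Data_Segment itself, whose halves are 0x20000000 and 0, receives code 9: the 2^61 offset is implied and a single zero byte follows.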

81. Expressions. The most intricate part of the assembly process is the task of scanning and evaluating expressions in the operand field. Fortunately, MMIXAL's expressions have a simple structure that can be handled easily with a stack-based approach.

Two stacks hold pending data as the operand field is scanned and evaluated. The op_stack contains operators that have not yet been performed; the val_stack contains values that have not yet been used. After an entire operand list has been scanned, the op_stack will be empty and the val_stack will hold the operand values needed to assemble the current instruction.

82. Entries on op_stack have one of the constant values defined here, and they have one of the precedence levels defined here.

Entries on val_stack have equiv, link, and status fields; the link points to a trie node if the expression is a symbol that has not yet been subjected to any operations.

⟨Type definitions 26⟩ +≡
  typedef enum {
    negate, serialize, complement, registerize, inner_lp,
    plus, minus, times, over, frac, mod, shl, shr, and, or, xor,
    outer_lp, outer_rp, inner_rp
  } stack_op;
  typedef enum {
    zero, weak, strong, unary
  } prec;
  typedef enum {
    pure, reg_val, undefined
  } stat;
  typedef struct {
    octa equiv;        /* current value */
    trie_node *link;   /* trie reference for symbol */
    stat status;       /* pure, reg_val, or undefined */
  } val_node;

83. #define top_op op_stack[op_ptr - 1]       /* top entry on the operator stack */
    #define top_val val_stack[val_ptr - 1]    /* top entry on the value stack */
    #define next_val val_stack[val_ptr - 2]   /* next-to-top entry of the value stack */

⟨Global variables 27⟩ +≡
  stack_op *op_stack;    /* stack for pending operators */
  int op_ptr;            /* number of items on op_stack */
  val_node *val_stack;   /* stack for pending operands */
  int val_ptr;           /* number of items on val_stack */
  prec precedence[] = {unary, unary, unary, unary, zero, weak, weak, strong, strong, strong,
    strong, strong, strong, strong, weak, weak, zero, zero, zero};
                         /* precedences of the respective stack_op values */
  stack_op rt_op;        /* newly scanned operator */
  octa acc;              /* temporary accumulator */

84. ⟨Initialize everything 29⟩ +≡
  op_stack = (stack_op *) calloc(buf_size, sizeof(stack_op));
  val_stack = (val_node *) calloc(buf_size, sizeof(val_node));
  if (!op_stack || !val_stack) panic("No room for the stacks");

85. The operand field of an instruction will have been copied into a separate Char array called operand_list when we reach this part of the program.

⟨Scan the operand field 85⟩ ≡
  p = operand_list;
  val_ptr = 0;                          /* val_stack is empty */
  op_stack[0] = outer_lp, op_ptr = 1;   /* op_stack contains an "outer left parenthesis" */
  while (1) {
    ⟨Scan opening tokens until putting something on val_stack 86⟩;
  scan_close: ⟨Scan a binary operator or closing token, rt_op 97⟩;
    while (precedence[top_op] >= precedence[rt_op])
      ⟨Perform the top operation on op_stack 98⟩;
  hold_op: op_stack[op_ptr++] = rt_op;
  }
  operands_done:
This code is used in section 102.
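The two-stack scheme of sections 81 to 85 can be illustrated with a much smaller evaluator. The sketch below is not MMIXAL's scanner: it handles only unsigned decimal constants and the operators + - *, uses plain long arithmetic, and all names in it (eval_expr, prec_of, apply) are invented for the example. What it has in common with section 85 is the rule that a pending operator is performed whenever its precedence is greater than or equal to that of the newly scanned operator.

```c
#include <assert.h>
#include <ctype.h>

enum { P_ZERO, P_WEAK, P_STRONG };   /* hypothetical precedence levels */

static int prec_of(char op)
{
  switch (op) {
    case '+': case '-': return P_WEAK;
    case '*': return P_STRONG;
    default: return P_ZERO;          /* sentinel and end of input */
  }
}

static long apply(char op, long a, long b)
{
  switch (op) {
    case '+': return a + b;
    case '-': return a - b;
    default: return a * b;
  }
}

/* Evaluate unsigned decimal constants joined by + - *, e.g. "2+3*4". */
long eval_expr(const char *p)
{
  char op_stack[8];
  long val_stack[8];
  int op_ptr = 0, val_ptr = 0;
  op_stack[op_ptr++] = '(';          /* sentinel, in the role of outer_lp */
  for (;;) {
    long v = 0;
    char rt_op;
    assert(isdigit((unsigned char)*p));
    while (isdigit((unsigned char)*p)) v = 10 * v + (*p++ - '0');
    val_stack[val_ptr++] = v;
    rt_op = *p ? *p++ : '\0';        /* newly scanned operator */
    assert(rt_op == '\0' || prec_of(rt_op) != P_ZERO);
    /* perform pending operators that bind at least as tightly as rt_op */
    while (op_ptr > 1 && prec_of(op_stack[op_ptr - 1]) >= prec_of(rt_op)) {
      long b = val_stack[--val_ptr];
      long a = val_stack[--val_ptr];
      val_stack[val_ptr++] = apply(op_stack[--op_ptr], a, b);
    }
    if (rt_op == '\0') return val_stack[0];
    op_stack[op_ptr++] = rt_op;
  }
}
```

Because the comparison is "greater than or equal", operators of equal precedence are performed left to right, so 10-4-3 evaluates as (10-4)-3.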

86. A comment that follows an empty operand list needs to be detected here.

⟨Scan opening tokens until putting something on val_stack 86⟩ ≡
  scan_open: if (isletter(*p)) ⟨Scan a symbol 87⟩
  else if (isdigit(*p)) {
    if (*(p + 1) == 'F') ⟨Scan a forward local 88⟩
    else if (*(p + 1) == 'B') ⟨Scan a backward local 89⟩
    else ⟨Scan a decimal constant 94⟩;
  } else switch (*p++) {
    case '#': ⟨Scan a hexadecimal constant 95⟩; break;
    case '\'': ⟨Scan a character constant 92⟩; break;
    case '\"': ⟨Scan a string constant 93⟩; break;
    case '@': ⟨Scan the current location 96⟩; break;
    case '-': op_stack[op_ptr++] = negate;
    case '+': goto scan_open;
    case '&': op_stack[op_ptr++] = serialize; goto scan_open;
    case '~': op_stack[op_ptr++] = complement; goto scan_open;
    case '$': op_stack[op_ptr++] = registerize; goto scan_open;
    case '(': op_stack[op_ptr++] = inner_lp; goto scan_open;
    default: if (p == operand_list + 1) {   /* treat operand list as empty */
        operand_list[0] = '0', operand_list[1] = '\0', p = operand_list;
        goto scan_open;
      }
      if (*(p - 1)) derr("syntax error at character `%c'", *(p - 1));
      derr("syntax error after character `%c'", *(p - 2));
  }
This code is used in section 85.

87. ⟨Scan a symbol 87⟩ ≡
  {
    if (*p == ':') tt = trie_search(trie_root, p + 1);
    else tt = trie_search(cur_prefix, p);
    p = terminator;
  symbol_found: val_ptr++;
    pp = tt->sym;
    if (!pp) pp = tt->sym = new_sym_node(true);
    top_val.link = tt, top_val.equiv = pp->equiv;
    if (pp->link == PREDEFINED) pp->link = DEFINED;
    top_val.status = (pp->link == DEFINED ? pure : pp->link == REGISTER ? reg_val : undefined);
  }
This code is used in section 86.

88. ⟨Scan a forward local 88⟩ ≡
  {
    tt = &forward_local_host[*p - '0'];
    p += 2;
    goto symbol_found;
  }
This code is used in section 86.

89. ⟨Scan a backward local 89⟩ ≡
  {
    tt = &backward_local_host[*p - '0'];
    p += 2;
    goto symbol_found;
  }
This code is used in section 86.

90. Statically allocated variables forward_local_host[j] and backward_local_host[j] masquerade as nodes of the trie.

⟨Global variables 27⟩ +≡
  trie_node forward_local_host[10], backward_local_host[10];
  sym_node forward_local[10], backward_local[10];

91. Initially 0H, 1H, ..., 9H are defined to be zero.

⟨Initialize everything 29⟩ +≡
  for (j = 0; j < 10; j++) {
    forward_local_host[j].sym = &forward_local[j];
    backward_local_host[j].sym = &backward_local[j];
    backward_local[j].link = DEFINED;
  }

92. We have already checked to make sure that the character constant is legal.

⟨Scan a character constant 92⟩ ≡
  acc.h = 0, acc.l = *p;
  p += 2;
  goto constant_found;
This code is used in section 86.

93. ⟨Scan a string constant 93⟩ ≡
  acc.h = 0, acc.l = *p;
  if (*p == '\"') {
    p++;
    acc.l = 0;
    err("*null string is treated as zero");
  } else if (*(p + 1) == '\"') p += 2;
  else *p = '\"', *--p = ',';
  goto constant_found;
This code is used in section 86.

94. ⟨Scan a decimal constant 94⟩ ≡
  acc.h = 0, acc.l = *p - '0';
  for (p++; isdigit(*p); p++) {
    acc = oplus(acc, shift_left(acc, 2));
    acc = incr(shift_left(acc, 1), *p - '0');
  }
  constant_found: val_ptr++;
  top_val.link = NULL;
  top_val.equiv = acc;
  top_val.status = pure;
This code is used in section 86.

95. ⟨Scan a hexadecimal constant 95⟩ ≡
  if (!isxdigit(*p)) err("illegal hexadecimal constant");
  acc.h = acc.l = 0;
  for (; isxdigit(*p); p++) {
    acc = incr(shift_left(acc, 4), *p - '0');
    if (*p >= 'a') acc = incr(acc, '0' - 'a' + 10);
    else if (*p >= 'A') acc = incr(acc, '0' - 'A' + 10);
  }
  goto constant_found;
This code is used in section 86.

96. ⟨Scan the current location 96⟩ ≡
  acc = cur_loc;
  goto constant_found;
This code is used in section 86.

97. ⟨Scan a binary operator or closing token, rt_op 97⟩ ≡
  switch (*p++) {
    case '+': rt_op = plus; break;
    case '-': rt_op = minus; break;
    case '*': rt_op = times; break;
    case '/': if (*p != '/') rt_op = over;
      else p++, rt_op = frac;
      break;
    case '%': rt_op = mod; break;
    case '<': rt_op = shl; goto sh_check;
    case '>': rt_op = shr;
    sh_check: p++;
      if (*(p - 1) == *(p - 2)) break;
      derr("syntax error at `%c'", *(p - 2));
    case '&': rt_op = and; break;
    case '|': rt_op = or; break;
    case '^': rt_op = xor; break;
    case ')': rt_op = inner_rp; break;
    case '\0': case ',': rt_op = outer_rp; break;
    default: derr("syntax error at `%c'", *(p - 1));
  }
This code is used in section 85.

98. ⟨Perform the top operation on op_stack 98⟩ ≡
  switch (op_stack[--op_ptr]) {
    case inner_lp: if (rt_op == inner_rp) goto scan_close;
      err("*missing right parenthesis");
      break;
    case outer_lp: if (rt_op == outer_rp) {
        if (top_val.status == reg_val && (top_val.equiv.l > 0xff || top_val.equiv.h)) {
          err("*register number too large, will be reduced mod 256");
          top_val.equiv.h = 0, top_val.equiv.l &= 0xff;
        }
        if (!*(p - 1)) goto operands_done;
        else rt_op = outer_lp;
        goto hold_op;   /* comma */
      } else {
        op_ptr++;
        err("*missing left parenthesis");
        goto scan_close;
      }
    ⟨Cases for unary operators 100⟩
    ⟨Cases for binary operators 99⟩
  }
This code is used in section 85.

99. Now we come to the part where equivalents are changed by unary or binary operators found in the expression being scanned.

The most typical operator, and in some ways the fussiest one to deal with, is binary addition. Once we've written the code for this case, the other cases almost take care of themselves.

⟨Cases for binary operators 99⟩ ≡
  case plus: if (top_val.status == undefined)
      err("cannot add an undefined quantity");
    if (next_val.status == undefined)
      err("cannot add to an undefined quantity");
    if (top_val.status == reg_val && next_val.status == reg_val)
      err("cannot add two register numbers");
    next_val.equiv = oplus(next_val.equiv, top_val.equiv);
  fin_bin: next_val.status = (top_val.status == next_val.status ? pure : reg_val);
    val_ptr--;
  delink: top_val.link = NULL; break;

See also section 101.
This code is used in section 98.

100. #define unary_check(verb)
    if (top_val.status != pure) derr("can %s pure values only", verb)

⟨Cases for unary operators 100⟩ ≡
  case negate: unary_check("negate");
    top_val.equiv = ominus(zero_octa, top_val.equiv); goto delink;
  case complement: unary_check("complement");
    top_val.equiv.h = ~top_val.equiv.h, top_val.equiv.l = ~top_val.equiv.l;
    goto delink;
  case registerize: unary_check("registerize");
    top_val.status = reg_val; goto delink;
  case serialize: if (!top_val.link)
      err("can take serial number of symbol only");
    top_val.equiv.h = 0, top_val.equiv.l = top_val.link->sym->serial;
    top_val.status = pure; goto delink;

This code is used in section 98.

101. #define binary_check(verb)
    if (top_val.status != pure || next_val.status != pure)
      derr("can %s pure values only", verb)

⟨Cases for binary operators 99⟩ +≡
  case minus: if (top_val.status == undefined)
      err("cannot subtract an undefined quantity");
    if (next_val.status == undefined)
      err("cannot subtract from an undefined quantity");
    if (top_val.status == reg_val && next_val.status != reg_val)
      err("cannot subtract register number from pure value");
    next_val.equiv = ominus(next_val.equiv, top_val.equiv); goto fin_bin;
  case times: binary_check("multiply");
    next_val.equiv = omult(next_val.equiv, top_val.equiv); goto fin_bin;
  case over: case mod: binary_check("divide");
    if (top_val.equiv.l == 0 && top_val.equiv.h == 0)
      err("*division by zero");
    next_val.equiv = odiv(zero_octa, next_val.equiv, top_val.equiv);
    if (op_stack[op_ptr+1] == mod) next_val.equiv = aux;
    goto fin_bin;
  case frac: binary_check("compute a ratio of");
    if (next_val.equiv.h >= top_val.equiv.h &&
        (next_val.equiv.l >= top_val.equiv.l || next_val.equiv.h > top_val.equiv.h))
      err("*illegal fraction");
    next_val.equiv = odiv(next_val.equiv, zero_octa, top_val.equiv); goto fin_bin;
  case shl: case shr: binary_check("compute a bitwise shift of");
    if (top_val.equiv.h || top_val.equiv.l > 63) next_val.equiv = zero_octa;
    else if (op_stack[op_ptr+1] == shl)
      next_val.equiv = shift_left(next_val.equiv, top_val.equiv.l);
    else next_val.equiv = shift_right(next_val.equiv, top_val.equiv.l, true);
    goto fin_bin;
  case and: binary_check("compute bitwise and of");
    next_val.equiv.h &= top_val.equiv.h, next_val.equiv.l &= top_val.equiv.l;
    goto fin_bin;
  case or: binary_check("compute bitwise or of");
    next_val.equiv.h |= top_val.equiv.h, next_val.equiv.l |= top_val.equiv.l;
    goto fin_bin;
  case xor: binary_check("compute bitwise xor of");
    next_val.equiv.h ^= top_val.equiv.h, next_val.equiv.l ^= top_val.equiv.l;
    goto fin_bin;
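The status bookkeeping for addition and subtraction can be isolated from the arithmetic. The following is a minimal sketch, not part of MMIXAL itself: the type name val_status and the helpers add_status and sub_status are hypothetical, and they return an error code where the assembler would call err.

```c
/* Status codes, with the numeric values MMIXAL assigns to them. */
typedef enum { pure = 0, reg_val = 1, undefined = 2 } val_status;

/* Hypothetical helper: status of (next + top).
   Returns 0 and stores the result, or -1 where MMIXAL reports an error. */
int add_status(val_status next, val_status top, val_status *result)
{
  if (top == undefined || next == undefined) return -1;
  if (top == reg_val && next == reg_val) return -1;  /* two register numbers */
  *result = (top == next) ? pure : reg_val;          /* the fin_bin rule */
  return 0;
}

/* Hypothetical helper: status of (next - top). */
int sub_status(val_status next, val_status top, val_status *result)
{
  if (top == undefined || next == undefined) return -1;
  if (top == reg_val && next != reg_val) return -1;  /* pure minus register */
  *result = (top == next) ? pure : reg_val;          /* the fin_bin rule */
  return 0;
}
```

Under this rule a register number minus a register number is pure, while a register number plus or minus a pure value is again a register number.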


102. Assembling an instruction. Now let's move up from the expression level to the instruction level. We get to this part of the program at the beginning of a line, or after a semicolon at the end of an instruction earlier on the current line. Our current position in the buffer is the value of buf_ptr.

⟨Process the next MMIXAL instruction or comment 102⟩ ≡
  p = buf_ptr; buf_ptr = "";
  ⟨Scan the label field; goto bypass if there is none 103⟩;
  ⟨Scan the opcode field; goto bypass if there is none 104⟩;
  ⟨Copy the operand field 106⟩;
  buf_ptr = p;
  if (spec_mode && !(op_bits & spec_bit))
    derr("cannot use `%s' in special mode", op_field);
  if ((op_bits & no_label_bit) && lab_field[0]) {
    derr("*label field of `%s' instruction is ignored", op_field);
    lab_field[0] = '\0';
  }
  if (op_bits & align_bits) ⟨Align the location pointer 107⟩;
  ⟨Scan the operand field 85⟩;
  if (opcode == GREG) ⟨Allocate a global register 108⟩;
  if (lab_field[0]) ⟨Define the label 109⟩;
  ⟨Do the operation 116⟩;
  bypass:

This code is used in section 136.

103. ⟨Scan the label field; goto bypass if there is none 103⟩ ≡
  if (!*p) goto bypass;
  q = lab_field;
  if (!isspace(*p)) {
    if (!isdigit(*p) && !isletter(*p)) goto bypass;   /* comment */
    for (*q++ = *p++; isdigit(*p) || isletter(*p); p++, q++) *q = *p;
    if (*p && !isspace(*p)) derr("label syntax error at `%c'", *p);
  }
  *q = '\0';
  if (isdigit(lab_field[0]) && (lab_field[1] != 'H' || lab_field[2]))
    derr("improper local label `%s'", lab_field);
  for (p++; isspace(*p); p++) ;

This code is used in section 102.

104. We copy the opcode field to a special buffer because we might want to refer to the symbolic opcode in error messages.

⟨Scan the opcode field; goto bypass if there is none 104⟩ ≡
  q = op_field;
  while (isletter(*p) || isdigit(*p)) *q++ = *p++;
  *q = '\0';
  if (!isspace(*p) && *p && op_field[0])
    derr("opcode syntax error at `%c'", *p);
  pp = trie_search(op_root, op_field)->sym;
  if (!pp) {
    if (op_field[0]) derr("unknown operation code `%s'", op_field);
    if (lab_field[0]) derr("*no opcode; label `%s' will be ignored", lab_field);
    goto bypass;
  }
  opcode = pp->equiv.h, op_bits = pp->equiv.l;
  while (isspace(*p)) p++;

This code is used in section 102.

105. ⟨Global variables 27⟩ +≡
  tetra opcode;    /* numeric code for MMIX operation or MMIXAL pseudo-op */
  tetra op_bits;   /* flags describing an operator's special characteristics */

106. We copy the operand field to a special buffer so that we can change string constants while scanning them later.

⟨Copy the operand field 106⟩ ≡
  q = operand_list;
  while (*p) {
    if (*p == ';') break;
    if (*p == '\'') {
      *q++ = *p++;
      if (!*p) err("incomplete character constant");
      *q++ = *p++;
      if (*p != '\'') err("illegal character constant");
    } else if (*p == '\"') {
      for (*q++ = *p++; *p && *p != '\"'; p++, q++) *q = *p;
      if (!*p) err("incomplete string constant");
    }
    *q++ = *p++;
    if (isspace(*p)) break;
  }
  while (isspace(*p)) p++;
  if (*p == ';') p++;
  else p = "";   /* if not followed by semicolon, rest of the line is a comment */
  if (q == operand_list) *q++ = '0';   /* change empty operand field to `0' */
  *q = '\0';

This code is used in section 102.
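The test for an improper local label in section 103 is compact enough to deserve a standalone look: a label that starts with a digit is accepted only if it is exactly a digit followed by H. A small sketch follows; the function name improper_local_label is hypothetical, and the cast to unsigned char is a portability habit added here rather than something the original code does.

```c
#include <ctype.h>

/* Returns 1 if lab starts with a digit but is not exactly of the form dH,
   and 0 otherwise (including for ordinary symbolic labels). */
int improper_local_label(const char *lab)
{
  return isdigit((unsigned char)lab[0]) &&
         (lab[1] != 'H' || lab[2] != '\0');
}
```

So 2H passes, while 2B, 2HH, and a bare 2 are all rejected in the label field; 2B and 2F are meaningful only as operands.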


107. It is important to do the alignment in this step before defining the label or evaluating the operand field.

⟨Align the location pointer 107⟩ ≡
  {
    j = (op_bits & align_bits) >> 16;
    acc.h = -1, acc.l = -(1 << j);
    cur_loc = oand(incr(cur_loc, (1 << j) - 1), acc);
  }

This code is used in section 102.

108. ⟨Allocate a global register 108⟩ ≡
  {
    if (val_stack[0].equiv.l || val_stack[0].equiv.h) {
      for (j = greg; j < 255; j++)
        if (greg_val[j].l == val_stack[0].equiv.l &&
            greg_val[j].h == val_stack[0].equiv.h) {
          cur_greg = j; goto got_greg;
        }
    }
    if (greg == 32) err("too many global registers");
    greg--;
    greg_val[greg] = val_stack[0].equiv; cur_greg = greg;
  got_greg: ;
  }

This code is used in section 102.

109. If the label is, say, 2H, we will already have used the old value of 2B when evaluating the operands. Furthermore, an operand of 2F will have been treated as undefined, which it still is.

Symbols can be defined more than once, but only if each definition gives them the same equivalent value.

A warning message is given when a predefined symbol is being redefined, if its predefined value has already been used.

⟨Define the label 109⟩ ≡
  {
    sym_node *new_link = DEFINED;
    acc = cur_loc;
    if (opcode == IS) {
      cur_loc = val_stack[0].equiv;
      if (val_stack[0].status == reg_val) new_link = REGISTER;
    } else if (opcode == GREG)
      cur_loc.h = 0, cur_loc.l = cur_greg, new_link = REGISTER;
    ⟨Find the symbol table node, pp 111⟩;
    if (pp->link == DEFINED || pp->link == REGISTER) {
      if (pp->equiv.l != cur_loc.l || pp->equiv.h != cur_loc.h || pp->link != new_link) {
        if (pp->serial) derr("symbol `%s' is already defined", lab_field);
        pp->serial = ++serial_number;
        derr("*redefinition of predefined symbol `%s'", lab_field);
      }
    } else if (pp->link == PREDEFINED) pp->serial = ++serial_number;
    else if (pp->link) {
      if (new_link == REGISTER) err("future reference cannot be to a register");
      do ⟨Fix prior references to this label 112⟩ while (pp->link);
    }
    if (isdigit(lab_field[0])) pp = &backward_local[lab_field[0] - '0'];
    pp->equiv = cur_loc; pp->link = new_link;
    ⟨Fix references that might be in the val_stack 110⟩;
    if (listing_file && (opcode == IS || opcode == LOC))
      ⟨Make special listing to show the label equivalent 115⟩;
    cur_loc = acc;
  }

This code is used in section 102.

110. ⟨Fix references that might be in the val_stack 110⟩ ≡
  if (!isdigit(lab_field[0]))
    for (j = 0; j < val_ptr; j++)
      if (val_stack[j].status == undefined && val_stack[j].link->sym == pp) {
        val_stack[j].status = (new_link == REGISTER ? reg_val : pure);
        val_stack[j].equiv = cur_loc;
      }

This code is used in section 109.

111. ⟨Find the symbol table node, pp 111⟩ ≡
  if (isdigit(lab_field[0])) pp = &forward_local[lab_field[0] - '0'];
  else {
    if (lab_field[0] == ':') tt = trie_search(trie_root, lab_field + 1);
    else tt = trie_search(cur_prefix, lab_field);
    pp = tt->sym;
    if (!pp) pp = tt->sym = new_sym_node(true);
  }

This code is used in section 109.
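The alignment step of section 107 rounds the location up to the next multiple of 2^j by adding 2^j - 1 and then clearing the low j bits. Here is a minimal sketch of the same arithmetic on a native 64-bit integer instead of the octa pair; the function name align_loc is hypothetical, and the mask ~(unit - 1) is the two's complement equivalent of the original -(1 << j).

```c
#include <stdint.h>

/* Round cur_loc up to a multiple of 2^j (j is 0..3 for a two-bit align field). */
uint64_t align_loc(uint64_t cur_loc, unsigned j)
{
  uint64_t unit = (uint64_t)1 << j;
  return (cur_loc + unit - 1) & ~(unit - 1);
}
```

For example, with j = 2 (tetrabyte alignment) location 0x1001 moves up to 0x1004, while an already aligned location is left unchanged.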

112. ⟨Fix prior references to this label 112⟩ ≡
  {
    qq = pp->link;
    pp->link = qq->link;
    mmo_loc();
    if (qq->serial == fix_o) ⟨Fix a future reference from an octabyte 113⟩
    else ⟨Fix a future reference from a relative address 114⟩;
    recycle_fixup(qq);
  }

This code is used in section 109.

113. ⟨Fix a future reference from an octabyte 113⟩ ≡
  {
    if (qq->equiv.h & 0xffffff) {
      mmo_lop(lop_fixo, 0, 2);
      mmo_tetra(qq->equiv.h);
    } else mmo_lop(lop_fixo, qq->equiv.h >> 24, 1);
    mmo_tetra(qq->equiv.l);
  }

This code is used in section 112.

114. ⟨Fix a future reference from a relative address 114⟩ ≡
  {
    octa o;
    o = ominus(cur_loc, qq->equiv);
    if (o.l & 3)
      dderr("*relative address in location #%08x%08x not divisible by 4",
        qq->equiv.h, qq->equiv.l);
    o = shift_right(o, 2, 0);
    k = 0;
    if (o.h == 0)
      if (o.l < 0x10000) mmo_lopp(lop_fixr, o.l);
      else if (qq->serial == fix_xyz && o.l < 0x1000000) {
        mmo_lop(lop_fixrx, 0, 24); mmo_tetra(o.l);
      } else k = 1;
    else if (o.h == 0xffffffff)
      if (qq->serial == fix_xyz && o.l >= 0xff000000) {
        mmo_lop(lop_fixrx, 0, 24); mmo_tetra(o.l & 0x1ffffff);
      } else if (qq->serial == fix_yz && o.l >= 0xffff0000) {
        mmo_lop(lop_fixrx, 0, 16); mmo_tetra(o.l & 0x100ffff);
      } else k = 1;
    else k = 1;
    if (k) dderr("relative address in location #%08x%08x is too far away",
      qq->equiv.h, qq->equiv.l);
  }

This code is used in section 112.
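The case analysis in section 114 amounts to choosing a fixup form from the signed distance, in tetrabytes, between the referring instruction and the newly defined label. The sketch below restates that decision with a native signed integer; the enum fixup_form and the function choose_fixup are hypothetical names, and the parameter is_xyz stands for the test qq->serial == fix_xyz (anything else is treated as a YZ reference).

```c
#include <stdint.h>

typedef enum { USE_FIXR, USE_FIXRX_24, USE_FIXRX_16, TOO_FAR } fixup_form;

/* o is the signed offset in tetrabytes; is_xyz is nonzero for a
   three-byte (XYZ) reference and zero for a two-byte (YZ) reference. */
fixup_form choose_fixup(int64_t o, int is_xyz)
{
  if (o >= 0) {
    if (o < 0x10000) return USE_FIXR;                 /* short forward offset */
    if (is_xyz && o < 0x1000000) return USE_FIXRX_24; /* long forward, XYZ only */
    return TOO_FAR;
  }
  if (is_xyz) return o >= -0x1000000 ? USE_FIXRX_24 : TOO_FAR;
  return o >= -0x10000 ? USE_FIXRX_16 : TOO_FAR;
}
```

Only the choice of form is modeled here; the actual program also emits the loader words and masks the offset as shown in the section above.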

115. ⟨Make special listing to show the label equivalent 115⟩ ≡
  if (new_link == DEFINED) {
    fprintf(listing_file, "(%08x%08x)", cur_loc.h, cur_loc.l);
    flush_listing_line(" ");
  } else {
    fprintf(listing_file, "($%03d)", cur_loc.l & 0xff);
    flush_listing_line("             ");
  }

This code is used in section 109.

116. ⟨Do the operation 116⟩ ≡
  future_bits = 0;
  if (op_bits & many_arg_bit) ⟨Do a many-operand operation 117⟩
  else switch (val_ptr) {
  case 1: if (!(op_bits & one_arg_bit))
      derr("opcode `%s' needs more than one operand", op_field);
    ⟨Do a one-operand operation 129⟩;
  case 2: if (!(op_bits & two_arg_bit))
      if (op_bits & one_arg_bit)
        derr("opcode `%s' must not have two operands", op_field)
      else derr("opcode `%s' must have more than two operands", op_field);
    ⟨Do a two-operand operation 124⟩;
  case 3: if (!(op_bits & three_arg_bit))
      derr("opcode `%s' must not have three operands", op_field);
    ⟨Do a three-operand operation 119⟩;
  default: derr("too many operands for opcode `%s'", op_field);
  }

This code is used in section 102.
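The dispatch in section 116 is driven entirely by four flag bits in op_bits. A minimal sketch of the acceptance test follows, using the bit values that MMIXAL defines for these flags; the function name arity_ok is hypothetical, and it returns a yes/no answer where the assembler issues a specific error message.

```c
#define one_arg_bit   0x1000
#define two_arg_bit   0x2000
#define three_arg_bit 0x4000
#define many_arg_bit  0x8000

/* Returns 1 if an opcode with flags op_bits accepts n operands, else 0. */
int arity_ok(unsigned op_bits, int n)
{
  if (op_bits & many_arg_bit) return n >= 1;  /* BYTE, WYDE, TETRA, OCTA */
  switch (n) {
  case 1: return (op_bits & one_arg_bit) != 0;
  case 2: return (op_bits & two_arg_bit) != 0;
  case 3: return (op_bits & three_arg_bit) != 0;
  default: return 0;                          /* too many operands */
  }
}
```

An opcode may set several of these bits at once, which is how one mnemonic can accept both a two-operand and a three-operand form.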


117. The many-operand operators are BYTE, WYDE, TETRA, and OCTA.

⟨Do a many-operand operation 117⟩ ≡
  for (j = 0; j < val_ptr; j++) {
    ⟨Deal with cases where val_stack[j] is impure 118⟩;
    k = 1 << (opcode - BYTE);
    if ((val_stack[j].equiv.h && opcode < OCTA) ||
        (val_stack[j].equiv.l > 0xffff && opcode < TETRA) ||
        (val_stack[j].equiv.l > 0xff && opcode < WYDE))
      if (k == 1) err("*constant doesn't fit in one byte")
      else derr("*constant doesn't fit in %d bytes", k);
    if (k < 8) assemble(k, val_stack[j].equiv.l, 0);
    else if (val_stack[j].status == undefined)
      assemble(4, 0, 0xf0), assemble(4, 0, 0xf0);
    else assemble(4, val_stack[j].equiv.h, 0), assemble(4, val_stack[j].equiv.l, 0);
  }

This code is used in section 116.

118. ⟨Deal with cases where val_stack[j] is impure 118⟩ ≡
  if (val_stack[j].status == reg_val)
    err("*register number used as a constant")
  else if (val_stack[j].status == undefined) {
    if (opcode != OCTA) err("undefined constant");
    pp = val_stack[j].link->sym;
    qq = new_sym_node(false);
    qq->link = pp->link;
    pp->link = qq;
    qq->serial = fix_o;
    qq->equiv = cur_loc;
  }

This code is used in section 117.

119. ⟨Do a three-operand operation 119⟩ ≡
  ⟨Do the Z field 121⟩;
  ⟨Do the Y field 122⟩;
  assemble_X: ⟨Do the X field 123⟩;
  assemble_inst: assemble(4, (opcode << 24) + xyz, future_bits);
  break;

This code is used in section 116.

120. Individual fields of an instruction are placed into global variables z, y, x, yz, and/or xyz.

⟨Global variables 27⟩ +≡
  tetra z, y, x, yz, xyz;   /* pieces for assembly */
  int future_bits;          /* places where there are future references */

121. ⟨Do the Z field 121⟩ ≡
  if (val_stack[2].status == undefined) err("Z field is undefined");
  if (val_stack[2].status == reg_val) {
    if (!(op_bits & (immed_bit + zr_bit + zar_bit)))
      derr("*Z field of `%s' should not be a register number", op_field);
  } else if (op_bits & immed_bit) opcode++;   /* immediate */
  else if (op_bits & zr_bit)
    derr("*Z field of `%s' should be a register number", op_field);
  if (val_stack[2].equiv.h || val_stack[2].equiv.l > 0xff)
    err("*Z field doesn't fit in one byte");
  z = val_stack[2].equiv.l & 0xff;

This code is used in section 119.

122. ⟨Do the Y field 122⟩ ≡
  if (val_stack[1].status == undefined) err("Y field is undefined");
  if (val_stack[1].status == reg_val) {
    if (!(op_bits & (yr_bit + yar_bit)))
      derr("*Y field of `%s' should not be a register number", op_field);
  } else if (op_bits & yr_bit)
    derr("*Y field of `%s' should be a register number", op_field);
  if (val_stack[1].equiv.h || val_stack[1].equiv.l > 0xff)
    err("*Y field doesn't fit in one byte");
  y = val_stack[1].equiv.l & 0xff;
  yz = (y << 8) + z;

This code is used in section 119.

123. ⟨Do the X field 123⟩ ≡
  if (val_stack[0].status == undefined) err("X field is undefined");
  if (val_stack[0].status == reg_val) {
    if (!(op_bits & (xr_bit + xar_bit)))
      derr("*X field of `%s' should not be a register number", op_field);
  } else if (op_bits & xr_bit)
    derr("*X field of `%s' should be a register number", op_field);
  if (val_stack[0].equiv.h || val_stack[0].equiv.l > 0xff)
    err("*X field doesn't fit in one byte");
  x = val_stack[0].equiv.l & 0xff;
  xyz = (x << 16) + yz;

This code is used in section 119.
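Sections 119 through 123 build the instruction word from the inside out: first z, then yz, then xyz, and finally the opcode byte on top. A minimal sketch of that packing follows; the function name pack_xyz is hypothetical, and the range checks and register-versus-immediate logic of the original are omitted.

```c
#include <stdint.h>

/* Combine an opcode and three one-byte fields into a 32-bit instruction. */
uint32_t pack_xyz(uint32_t opcode, uint32_t x, uint32_t y, uint32_t z)
{
  uint32_t yz  = ((y & 0xff) << 8) + (z & 0xff);
  uint32_t xyz = ((x & 0xff) << 16) + yz;
  return (opcode << 24) + xyz;
}
```

For instance, ADD has opcode 0x20, so ADD $1,$2,$3 packs to 0x20010203. When Z is a constant rather than a register the opcode is incremented first, giving 0x21010203 for ADD $1,$2,3.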


124. ⟨Do a two-operand operation 124⟩ ≡
  if (val_stack[1].status == undefined) {
    if (op_bits & rel_addr_bit)
      ⟨Assemble YZ as a future reference and goto assemble_X 125⟩
    else err("YZ field is undefined");
  } else if (val_stack[1].status == reg_val) {
    if (!(op_bits & (immed_bit + yzr_bit + yzar_bit)))
      derr("*YZ field of `%s' should not be a register number", op_field);
    if (opcode == SET) val_stack[1].equiv.l <<= 8, opcode = 0xc1;   /* change to OR */
    else if (op_bits & mem_bit)
      val_stack[1].equiv.l <<= 8, opcode++;   /* silently append ,0 */
  } else {   /* val_stack[1].status == pure */
    if (op_bits & mem_bit)
      ⟨Assemble YZ as a memory address and goto assemble_X 127⟩;
    if (opcode == SET) opcode = 0xe3;   /* change to SETL */
    else if (op_bits & immed_bit) opcode++;   /* immediate */
    else if (op_bits & yzr_bit) {
      derr("*YZ field of `%s' should be a register number", op_field);
    }
    if (op_bits & rel_addr_bit)
      ⟨Assemble YZ as a relative address and goto assemble_X 126⟩;
  }
  if (val_stack[1].equiv.h || val_stack[1].equiv.l > 0xffff)
    err("*YZ field doesn't fit in two bytes");
  yz = val_stack[1].equiv.l & 0xffff;
  goto assemble_X;

This code is used in section 116.

125. ⟨Assemble YZ as a future reference and goto assemble_X 125⟩ ≡
  {
    pp = val_stack[1].link->sym;
    qq = new_sym_node(false);
    qq->link = pp->link;
    pp->link = qq;
    qq->serial = fix_yz;
    qq->equiv = cur_loc;
    yz = 0;
    future_bits = 0xc0;
    goto assemble_X;
  }

This code is used in section 124.

126. ⟨Assemble YZ as a relative address and goto assemble_X 126⟩ ≡
  {
    octa source, dest;
    if (val_stack[1].equiv.l & 3)
      err("*relative address is not divisible by 4");
    source = shift_right(cur_loc, 2, 0);
    dest = shift_right(val_stack[1].equiv, 2, 0);
    acc = ominus(dest, source);
    if (!(acc.h & 0x80000000)) {
      if (acc.l > 0xffff || acc.h)
        err("relative address is more than #ffff tetrabytes forward");
    } else {
      acc = incr(acc, 0x10000);
      opcode++;
      if (acc.l > 0xffff || acc.h)
        err("relative address is more than #10000 tetrabytes backward");
    }
    yz = acc.l;
    goto assemble_X;
  }

This code is used in section 124.

127. ⟨Assemble YZ as a memory address and goto assemble_X 127⟩ ≡
  {
    octa o;
    o = val_stack[1].equiv, k = 0;
    for (j = greg; j < 255; j++)
      if (greg_val[j].h || greg_val[j].l) {
        acc = ominus(val_stack[1].equiv, greg_val[j]);
        if (acc.h <= o.h && (acc.l <= o.l || acc.h < o.h)) o = acc, k = j;
      }
    if (o.l <= 0xff && !o.h && k) yz = (k << 8) + o.l, opcode++;
    else if (!expanding) err("no base address is close enough to the address A")
    else ⟨Assemble instructions to put supplementary data in $255 128⟩;
    goto assemble_X;
  }

This code is used in section 124.
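The base-address search of section 127 looks for the allocated global register whose nonzero value lies closest below the target address, and succeeds when the remaining offset fits in one byte. The following sketch expresses the same search with native 64-bit unsigned arithmetic; the function name pick_base is hypothetical, and it returns a success flag where the assembler either reports an error or falls back to expansion.

```c
#include <stdint.h>

/* Search registers greg..254 for a base address within 255 bytes below addr.
   On success store (k << 8) + offset in *yz and return 1; otherwise return 0. */
int pick_base(const uint64_t greg_val[256], int greg, uint64_t addr, unsigned *yz)
{
  uint64_t best = addr;  /* smallest offset found so far */
  int k = 0;             /* register giving that offset; 0 means none */
  for (int j = greg; j < 255; j++)
    if (greg_val[j] != 0) {
      uint64_t off = addr - greg_val[j];  /* wraps mod 2^64 if base > addr */
      if (off <= best) { best = off; k = j; }
    }
  if (best <= 0xff && k) {
    *yz = ((unsigned)k << 8) + (unsigned)best;
    return 1;
  }
  return 0;
}
```

A base that lies above the address produces a huge wrapped offset, so it never beats the initial candidate and is skipped without a separate sign test.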


128. #define SETH 0xe0
#define ORH 0xe8
#define ORL 0xeb

⟨Assemble instructions to put supplementary data in $255 128⟩ ≡
  {
    for (j = SETH; j <= ORL; j++) {
      switch (j & 3) {
      case 0: yz = o.h >> 16; break;      /* SETH */
      case 1: yz = o.h & 0xffff; break;   /* SETMH or ORMH */
      case 2: yz = o.l >> 16; break;      /* SETML or ORML */
      case 3: yz = o.l & 0xffff; break;   /* SETL or ORL */
      }
      if (yz) {
        assemble(4, (j << 24) + (255 << 16) + yz, 0);
        j |= ORH;
      }
    }
    if (k) yz = (k << 8) + 255;    /* Y = $k, Z = $255 */
    else yz = 255 << 8, opcode++;  /* Y = $255, Z = 0 */
  }

This code is used in section 127.

129. ⟨Do a one-operand operation 129⟩ ≡
  if (val_stack[0].status == undefined) {
    if (op_bits & rel_addr_bit)
      ⟨Assemble XYZ as a future reference and goto assemble_inst 130⟩
    else if (opcode != PREFIX) err("the operand is undefined");
  } else if (val_stack[0].status == reg_val) {
    if (!(op_bits & (xyzr_bit + xyzar_bit)))
      derr("*operand of `%s' should not be a register number", op_field);
  } else {   /* val_stack[0].status == pure */
    if (op_bits & xyzr_bit)
      derr("*operand of `%s' should be a register number", op_field);
    if (op_bits & rel_addr_bit)
      ⟨Assemble XYZ as a relative address and goto assemble_inst 131⟩;
  }
  if (opcode > 0xff) ⟨Do a pseudo-operation and goto bypass 132⟩;
  if (val_stack[0].equiv.h || val_stack[0].equiv.l > 0xffffff)
    err("*XYZ field doesn't fit in three bytes");
  xyz = val_stack[0].equiv.l & 0xffffff;
  goto assemble_inst;

This code is used in section 116.

130. ⟨Assemble XYZ as a future reference and goto assemble_inst 130⟩ ≡
  {
    pp = val_stack[0].link->sym;
    qq = new_sym_node(false);
    qq->link = pp->link;
    pp->link = qq;
    qq->serial = fix_xyz;
    qq->equiv = cur_loc;
    xyz = 0;
    future_bits = 0xe0;
    goto assemble_inst;
  }

This code is used in section 129.

131. ⟨Assemble XYZ as a relative address and goto assemble_inst 131⟩ ≡
  {
    octa source, dest;
    if (val_stack[0].equiv.l & 3)
      err("*relative address is not divisible by 4");
    source = shift_right(cur_loc, 2, 0);
    dest = shift_right(val_stack[0].equiv, 2, 0);
    acc = ominus(dest, source);
    if (!(acc.h & 0x80000000)) {
      if (acc.l > 0xffffff || acc.h)
        err("relative address is more than #ffffff tetrabytes forward");
    } else {
      acc = incr(acc, 0x1000000);
      opcode++;
      if (acc.l > 0xffffff || acc.h)
        err("relative address is more than #1000000 tetrabytes backward");
    }
    xyz = acc.l;
    goto assemble_inst;
  }

This code is used in section 129.
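The loop in section 128 relies on a small trick: the opcode counter j starts at SETH, and as soon as one instruction has been emitted it is OR-ed with ORH, so every later nonzero wyde is merged with an OR-type instruction instead of a SET-type one. The sketch below reproduces that loop on a native 64-bit value and collects the generated words in an array; the function name load_255 and the output array are hypothetical, standing in for the calls to assemble.

```c
#include <stdint.h>

#define SETH 0xe0
#define ORH  0xe8
#define ORL  0xeb

/* Generate the instructions that place o in register $255.
   Stores at most four words in out and returns how many were stored. */
int load_255(uint64_t o, uint32_t out[4])
{
  int n = 0;
  for (unsigned j = SETH; j <= ORL; j++) {
    /* j & 3 selects the wyde: 0 = high, 1 = medium high, 2 = medium low, 3 = low */
    uint32_t yz = (uint32_t)(o >> (48 - 16 * (j & 3))) & 0xffff;
    if (yz) {
      out[n++] = (j << 24) + (255u << 16) + yz;
      j |= ORH;  /* from now on use ORH, ORMH, ORML, ORL */
    }
  }
  return n;
}
```

A value such as 0x12345678 therefore needs only two instructions, SETML $255,#1234 followed by ORL $255,#5678, because wydes that are zero are skipped.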


132. ⟨Do a pseudo-operation and goto bypass 132⟩ ≡
  switch (opcode) {
  case LOC: cur_loc = val_stack[0].equiv;
  case IS: goto bypass;
  case PREFIX: if (!val_stack[0].link) err("not a valid prefix");
    cur_prefix = val_stack[0].link; goto bypass;
  case GREG: if (listing_file) ⟨Make listing for GREG 134⟩;
    goto bypass;
  case LOCAL: if (val_stack[0].equiv.l > lreg) lreg = val_stack[0].equiv.l;
    if (listing_file) {
      fprintf(listing_file, "($%03d)", val_stack[0].equiv.l);
      flush_listing_line("             ");
    }
    goto bypass;
  case BSPEC: if (val_stack[0].equiv.l > 0xffff || val_stack[0].equiv.h)
      err("*operand of `BSPEC' doesn't fit in two bytes");
    mmo_loc(); mmo_sync();
    mmo_lopp(lop_spec, val_stack[0].equiv.l);
    spec_mode = true; spec_mode_loc = 0; goto bypass;
  case ESPEC: spec_mode = false; goto bypass;
  }

This code is used in section 129.

133. ⟨Global variables 27⟩ +≡
  octa greg_val[256];   /* initial values of global registers */

134. ⟨Make listing for GREG 134⟩ ≡
  if (val_stack[0].equiv.l || val_stack[0].equiv.h) {
    fprintf(listing_file, "($%03d=#%08x", cur_greg, val_stack[0].equiv.h);
    flush_listing_line("    ");
    fprintf(listing_file, "         %08x)", val_stack[0].equiv.l);
    flush_listing_line(" ");
  } else {
    fprintf(listing_file, "($%03d)", cur_greg);
    flush_listing_line("             ");
  }

This code is used in section 132.

135. Running the program. On a UNIX-like system, the command

  mmixal [options] sourcefilename

will assemble the MMIXAL program in file sourcefilename, writing any error messages on the standard error file. (Nothing is written to the standard output.) The options, which may appear in any order, are:

• -o objectfilename  Send the output to a binary file called objectfilename. If no -o specification is given, the object file name is obtained from the input file name by changing the final letter from `s' to `o', or by appending `.mmo' if sourcefilename doesn't end with s.

• -l listingname  Output a listing of the assembled input and output to a text file called listingname.

• -x  Expand memory-oriented commands that cannot be assembled as single instructions, by assembling auxiliary instructions that make temporary use of global register $255.

• -b bufsize  Allow up to bufsize characters per line of input.
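The default naming rule for the object file is easy to state as a function. This is a minimal sketch under stated assumptions: the name derive_obj_name, the explicit buffer size, and the error return are additions for illustration, whereas the assembler itself copies into a fixed buffer of FILENAME_MAX + 1 characters.

```c
#include <stdio.h>
#include <string.h>

/* Derive the default object file name from the source file name.
   Returns 0 on success, or -1 if the result would not fit in out. */
int derive_obj_name(const char *src, char *out, size_t size)
{
  size_t n = strlen(src);
  if (n > 0 && src[n - 1] == 's') {
    if (n + 1 > size) return -1;
    memcpy(out, src, n + 1);   /* copy including the terminating null */
    out[n - 1] = 'o';          /* final `s' becomes `o' */
  } else {
    if (n + 5 > size) return -1;
    snprintf(out, size, "%s.mmo", src);
  }
  return 0;
}
```

With this rule foo.mms becomes foo.mmo, prog.s becomes prog.o, and a name such as foo that does not end in s becomes foo.mmo.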


136. Here, finally, is the overall structure of this program.

  #include <stdio.h>
  #include <stdlib.h>
  #include <ctype.h>
  #include <string.h>
  #include <time.h>

  ⟨Preprocessor definitions 31⟩
  ⟨Type definitions 26⟩
  ⟨Global variables 27⟩
  ⟨Subroutines 28⟩

  int main(argc, argv)
    int argc;
    char *argv[];
  {
    register int j, k;   /* all-purpose integers */
    ⟨Local variables 40⟩;
    ⟨Process the command line 137⟩;
    ⟨Initialize everything 29⟩;
    while (1) {
      ⟨Get the next line of input text, or break if the input has ended 34⟩;
      while (1) {
        ⟨Process the next MMIXAL instruction or comment 102⟩;
        if (!*buf_ptr) break;
      }
      if (listing_file) {
        if (listing_bits) listing_clear();
        else if (!line_listed) flush_listing_line("                   ");
      }
    }
    ⟨Finish the assembly 142⟩;
  }

137. The space after "-b" is optional, because MMIX-SIM does not use a space in this context.

⟨Process the command line 137⟩ ≡
  for (j = 1; j < argc - 1 && argv[j][0] == '-'; j++)
    if (!argv[j][2]) {
      if (argv[j][1] == 'x') expanding = 1;
      else if (argv[j][1] == 'o') j++, strcpy(obj_file_name, argv[j]);
      else if (argv[j][1] == 'l') j++, strcpy(listing_name, argv[j]);
      else if (argv[j][1] == 'b' && sscanf(argv[j + 1], "%d", &buf_size) == 1) j++;
      else break;
    } else if (argv[j][1] != 'b' || sscanf(argv[j] + 2, "%d", &buf_size) != 1) break;
  if (j != argc - 1) {
    fprintf(stderr, "Usage: %s %s sourcefilename\n",
      argv[0], "[-x] [-l listingname] [-b buffersize] [-o objectfilename]");
    exit(-1);
  }
  src_file_name = argv[j];

This code is used in section 136.

138. ⟨Open the files 138⟩ ≡
  src_file = fopen(src_file_name, "r");
  if (!src_file) dpanic("Can't open the source file %s", src_file_name);
  if (!obj_file_name[0]) {
    j = strlen(src_file_name);
    if (src_file_name[j - 1] == 's') {
      strcpy(obj_file_name, src_file_name); obj_file_name[j - 1] = 'o';
    }
    else sprintf(obj_file_name, "%s.mmo", src_file_name);
  }
  obj_file = fopen(obj_file_name, "wb");
  if (!obj_file) dpanic("Can't open the object file %s", obj_file_name);
  if (listing_name[0]) {
    listing_file = fopen(listing_name, "w");
    if (!listing_file) dpanic("Can't open the listing file %s", listing_name);
  }

This code is used in section 140.

139. ⟨Global variables 27⟩ +≡
  char *src_file_name;                    /* name of the MMIXAL input file */
  char obj_file_name[FILENAME_MAX + 1];   /* name of the binary output file */
  char listing_name[FILENAME_MAX + 1];    /* name of the optional listing file */
  FILE *src_file, *obj_file, *listing_file;
  int expanding;   /* are we expanding instructions when base address fail? */
  int buf_size;    /* maximum number of characters per line of input */

140. ⟨Initialize everything 29⟩ +≡
  ⟨Open the files 138⟩;
  filename[0] = src_file_name;
  filename_count = 1;
  ⟨Output the preamble 141⟩;

141. ⟨Output the preamble 141⟩ ≡
  mmo_lop(lop_pre, 1, 1);
  mmo_tetra(time(NULL));
  mmo_cur_file = -1;

This code is used in section 140.
