This action might not be possible to undo. Are you sure you want to continue?

Welcome to Scribd! Start your free trial and access books, documents and more.Find out more

Patrick J. Burns

Copyright c 1998 Patrick J. Burns Permission to reproduce individual copies of this book for personal use is granted. Multiple copies may be created for nonproﬁt academic purposes—a nominal charge to cover the expense of reproduction may be made. Using this text for commercial instruction is prohibited without permission. There is no guarantee (express or implied) on the book or the accompanying software. The author takes no responsibility for damage that may result from their use.

S-PLUS is a registered trademark of MathSoft, Inc. S is a trademark of AT&T. Other trademarked products are also referred to.

Contents

1 Essentials 1.1 1.2 1.3 Objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Vectorization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Subscripting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . [ with positive numbers . . . . . . . . . . . . . . . . . . . . . . . [ with negative numbers . . . . . . . . . . . . . . . . . . . . . . . [ with characters . . . . . . . . . . . . . . . . . . . . . . . . . . . [ with logicals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . The $ operator . . . . . . . . . . . . . . . . . . . . . . . . . . . . [[ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Subscripted arrays . . . . . . . . . . . . . . . . . . . . . . . . . . [ with a matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . Empty subscripts . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.4 1.5 1.6 1.7 1.8 1.9 Object-Oriented Programming . . . . . . . . . . . . . . . . . . . Graphics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Precedence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Coercion Happens . . . . . . . . . . . . . . . . . . . . . . . . . . Looping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Function Arguments . . . . . . . . . . . . . . . . . . . . . . . . . 1 1 4 5 5 6 6 7 8 9 10 11 12 13 15 16 17 18 19 19 21 21 22 23 23 25

1.10 A Model of Computation . . . . . . . . . . . . . . . . . . . . . . 1.11 Things To Do . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.12 Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.13 Quotations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 Poetics 2.1 2.2 Abstraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Simplicity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii

iv 2.3 2.4 2.5

CONTENTS Consistency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Beauty . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Doing It . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Think safety . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Documentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . Minimize expense . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.6 2.7 2.8 2.9 Critique of S . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Things to Do . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . Quotations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 29 30 31 34 36 36 37 37 38 39 39 40 44 45 49 52 58 60 68 68 68 69 71 75 81 90 94 98

3 Ecology 3.1 3.2 3.3 3.4 3.5 3.6 3.7 3.8 3.9 Databases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Object Names . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Divide and Conquer . . . . . . . . . . . . . . . . . . . . . . . . . Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Source Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . Test Suites . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Time Monitoring . . . . . . . . . . . . . . . . . . . . . . . . . . . Memory Monitoring . . . . . . . . . . . . . . . . . . . . . . . . . Things to Do . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

3.10 Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.11 Quotations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 Vocabulary 4.1 4.2 4.3 4.4 4.5 4.6 4.7 4.8 4.9 Eﬃciency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . The Apply Family . . . . . . . . . . . . . . . . . . . . . . . . . . Pedestrian . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Input and Output . . . . . . . . . . . . . . . . . . . . . . . . . . Synonyms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Linguistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Numerics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107 Randomness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110 Graphics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113

4.10 Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113 4.11 Under-Documented . . . . . . . . . . . . . . . . . . . . . . . . . . 116 4.12 Deprecated . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117 S Poetry c 1998 Patrick J. Burns v1.0

CONTENTS

v

4.13 Things to Do . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117 4.14 Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . 118 4.15 Quotations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119 5 Choppy Water 5.1 5.2 5.3 5.4 5.5 5.6 121

Identity Crisis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121 Oﬀ the Wall . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127 Factors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130 Things to Do . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134 Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . 134 Quotations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134 135

6 Debugging 6.1 6.2 6.3 6.4

Masking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135 Dumping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137 Exploring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140 the dimnames bug . . . . . . . . . . . . . . . . . . . . . . . . . . 140 the print.matrix bug . . . . . . . . . . . . . . . . . . . . . . . . . 141 the trace bug . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144 the get bug . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147

6.5 6.6 6.7 6.8 6.9

Strategy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149 Bug Reports . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150 Things to Do . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152 Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . 152 Quotations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152 153

7 Unix for S Programmers 7.1 7.2 7.3 7.4 7.5 7.6

Environment Variables . . . . . . . . . . . . . . . . . . . . . . . . 153 Using Unix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155 Vocabulary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159 Things to Do . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164 Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . 164 Quotations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165 167

8 C for S Programmers 8.1 8.2

Comparison of S and C . . . . . . . . . . . . . . . . . . . . . . . 167 Loading Code into S . . . . . . . . . . . . . . . . . . . . . . . . . 171 S Poetry c 1998 Patrick J. Burns v1.0

vi 8.3 8.4 8.5 8.6 8.7 8.8 8.9

CONTENTS Arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173 C Code for S . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175 Debugging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179 Common C Mistakes . . . . . . . . . . . . . . . . . . . . . . . . . 181 Fortran . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182 Things to Do . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185 Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . 186

8.10 Quotations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186 9 S for S Programmers 9.1 9.2 9.3 9.4 9.5 9.6 9.7 9.8 9.9 187

Databases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187 Utilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188 Zero Length Objects . . . . . . . . . . . . . . . . . . . . . . . . . 195 Modes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206 Object-Oriented Programming . . . . . . . . . . . . . . . . . . . 223 Data Structures . . . . . . . . . . . . . . . . . . . . . . . . . . . . 228 The R Language . . . . . . . . . . . . . . . . . . . . . . . . . . . 231 Things to Do . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232

9.10 Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . 233 9.11 Quotations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233 10 Numbers 235

10.1 Changing Base . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235 10.2 Rational Numbers . . . . . . . . . . . . . . . . . . . . . . . . . . 242 10.3 Polygamma . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 256 10.4 Digamma . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262 10.5 Continued Fractions . . . . . . . . . . . . . . . . . . . . . . . . . 268 10.6 Things to Do . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 273 10.7 Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . 274 10.8 Quotations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 274 11 Character 275

11.1 Perl . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 275 11.2 Home Brew . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 278 11.3 Things to Do . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 282 11.4 Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . 282 S Poetry c 1998 Patrick J. Burns v1.0

CONTENTS

vii

11.5 Quotations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283 12 Arrays 12.2 Array Functions 285 . . . . . . . . . . . . . . . . . . . . . . . . . . . 287

12.1 Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 285 Stable Apply . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 287 Binding Arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . 288 12.3 Things to Do . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 290 12.4 Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . 290 13 Formulas 291

13.1 Lazy Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . 291 13.2 Manipulating Formulas . . . . . . . . . . . . . . . . . . . . . . . . 296 13.3 Mathematical Graphs . . . . . . . . . . . . . . . . . . . . . . . . 297 13.4 Things to Do . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 318 13.5 Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . 318 14 Functions 319

14.1 Numerical Integration . . . . . . . . . . . . . . . . . . . . . . . . 319 14.2 Interpolation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 322 14.3 Genetic Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . 327 14.4 Optimization via PORT . . . . . . . . . . . . . . . . . . . . . . . 338 Simple interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . 338 General interface . . . . . . . . . . . . . . . . . . . . . . . . . . . 344 14.5 Things to Do . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 353 14.6 Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . 353 14.7 Quotations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 354 15 Large Computations 355

15.1 Backups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 355 15.2 Reduce and Expand . . . . . . . . . . . . . . . . . . . . . . . . . 356 15.3 Things to Do . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 360 15.4 Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . 360 15.5 Quotations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 360 16 Esoterics 361

16.1 Magnitude . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 361 16.2 Dostoevsky . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 362 S Poetry c 1998 Patrick J. Burns v1.0

viii

CONTENTS 16.3 Things to Do . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 363 16.4 Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . 363

Epilogue Glossary

365 371

S Poetry c 1998 Patrick J. Burns v1.0

. . . . . . . . . if any. . .List of Tables 1. . 169 Operators with diﬀerent meanings in C than in S . 169 Terminology from various languages. . . . . version 3. . . . . .1 8. . . . . .1 8. . . . . . . .3 8. . . .2 8. . . . 173 Modes within C Code. . .4 Precedence table for the S language. . . . 16 Some C operators and approximate S equivalents. . . . . . . . . . . . . 176 ix . . . . . . . .

Burns v1.x LIST OF TABLES S Poetry c 1998 Patrick J.0 .

character and complex. Essentially everything in S— for instance. Then what is expressed in this sentence is far more complex than any meaning that could be expressed with the English package. The concept of mode should become clear as examples unfold. a call to a function—is an S object. Of fundamental importance is that S is a language.Chapter 1 Essentials Five great strengths of S are its variety of objects. object-oriented programming and graphics. its vector orientation. While the length of an object is generally straightforward. All objects possess a length and a mode. One viewpoint is that S has self-knowledge. This makes S much more useful than if it were merely a “package” for statistics or graphics or mathematics. If an object is atomic. The atomic modes to speak of are—numeric. A powerful and fairly unique part of S that will be addressed in later chapters is the possibility of computing on the language. For example. The most basic object is an atomic vector—often merely called a vector. 1. This self-awareness makes a lot of things possible in S that are not in other languages. Imagine if English were not a language. then each element has the same mode. but merely a collection of words that could only be used individually—a package.1 Objects I brieﬂy outline the important types of objects that exist in S. Each of these is explored before turning to other matters. the power of subscripting. 1 . the mode is more arbitrary. The length of a vector is the number of elements it contains. There is no such thing as an atomic vector that contains both numbers and logicals. logical. the length of a numeric or complex vector says how many numbers are contained in the object.

0 . For example.nan function.4-8. then a double quote within the string needs a backslash in front of it. S considers this a detail that should not concern the user. An example of a complex vector of length 2 is: c(1+3i. and inﬁnity minus inﬁnity. If double quotes are used. You can distinguish NaNs from ordinary NAs with the is. Likewise. numeric objects may contain NAs and Infs. In addition to the usual numbers. but in fact they eliminate a great deal of bother. Each string contains an arbitrary number of characters. Each element of a vector of mode character is a character string. You may initially think that these special values are burdensome. There are two ﬂavors of NA. In S there is no diﬀerence between writing 1 and 1.7i) Complex vectors have special values analogous to those for numeric objects. FALSE and NA. Only when you are interfacing to C or Fortran is it an issue. it is the escape. The other NA is NaN. > c("single ’ quote". The imaginary part of a complex number is written with an “i” at the end. single-precision ﬂoating-point numbers and integers. for numeric data. Burns v1. Objects of mode complex contain complex numbers.0. ’single \’ quote’) [1] "single ’ quote" "double \" quote" [3] "double \" quote" "single ’ quote" There are a few other characters that are formed by following a backslash with another character. Character strings are surrounded by either double quotes or single quotes. TRUE is also written T.2 CHAPTER 1. may be either positive or negative. ESSENTIALS Although S does distinguish between double-precision ﬂoating-point numbers. a single quote needs to be escaped if the string is delimited by single quotes. There are three logical values: TRUE. There is not a missing value for mode character—often the empty string is used where a missing value is ﬁtting. Just as zero is S Poetry c 1998 Patrick J. meaning “Not-a-Number”—these occur through mathematical operations such as zero divided by zero. The backslash is a special character—rather than meaning “backslash”. and similarly. "\\" means “backslash” and "\n" means “newline”. Double quotes are always used when character vectors are printed. A backslash followed by a triple of octal digits gives the corresponding ASCII code. An Inf is an inﬁnite value and. FALSE can be written F. + "double \" quote". the usual type means “missing value”. 2. There is one more object that is classiﬁed as atomic—NULL. ’double " quote’.

You can imagine each S object having a hook where the attributes list hangs—some objects have nothing there. The class attribute is the granddaddy attribute of them all. length and attributes deﬁne an object. columns. so that they can take on whatever nature you choose. OBJECTS 3 a very important number.) The availability of lists in S is a very powerful feature—it sets it apart from much other software. Since everything in an atomic vector must be of one type. an object that stands for “nothing” is a powerful instrument. One more thing allows an even richer set of objects in S. but I use it no matter what. Arrays provide a good example of attributes.1. By adding just the dim attribute.c(stevie=4. many of which will be discussed later. and the elements tell how many rows. which we will come to shortly. The product of the elements in the dim must equal the length of the object. and optionally a dimnames attribute. You have the ability to give objects any attributes. we get an object that has a quite diﬀerent nature. Its attributes are a list with 1 component named "names". This is what drives the object-oriented programming. To put various types into a single object.0 . Suppose we issue the command: jjcw <. the length of the dim is the dimension of the array. A matrix—a rectangular arrangement of elements—is a two-dimensional array. The names of an object is a vector of character strings that has the same length as the object. S Poetry c 1998 Patrick J. Chambers and Wilks (1988. Lists are objects of mode list and the length of a list is the number of components it contains. The dim is a vector of integers. the second component a character vector of length 3. but limited in applicability. others have quite involved lists there. The attributes of an object is a list in which each component has a name. Each component of a list is another S object. But we already have all the basics—mode. they are pleasantly simple. An S array is merely a vector that has a dim attribute. p112) use the word “component” only when it is named. munchkin=2) The object named jjcw is a numeric vector of length 2. The dimnames provide names along each dimension. Thus a matrix has a dim attribute of length 2. and the third component a list of length 24. which is a character vector of length 2. Each object may have a set of attributes. Burns v1. are in the array. For example. etc. the ﬁrst component may be a numeric vector of length 20.1. S understands some attributes internally—the names attribute is an example. you need a list. (Note that Becker. The shapes a bright container can contain! 1 There are additional types of objects in S. The S notion of array generalizes matrices from two dimensions to an arbitrary number of dimensions.

and element S Poetry c 1998 Patrick J.0 .name on page 279.2 Vectorization Many S functions are vectorized. ESSENTIALS Many objects are given names. Capital letters are distinct from lower-case characters. That users become unconscious of the fact that log(x) is shorthand for a whole series of operations is testimony to the power of vectorization. suppose x has length 2n.do not have “valid” names so special measures need to be used at times when working with them. Assignment functions.value which is essentially the last value that was not given a name. generally objects are given “valid” S names. if any. then the result is length 2 with the ﬁrst element being the square of the element in x and the second being its cube.Last. and taken for granted by experienced users. if you decide that you should have assigned the last command to an object name. Vectorization tends to be frightening to new users. See valid. The ﬁrst digit. Additionally the command x+2 returns a vector as long as x containing two plus the corresponding element of x. Burns v1. Drinking my juices up 2 The arithmetic operators in S are vectorized. More generally. in the name must be preceded by an alphabetic character. then x+y returns a vector in which each element is the sum of the corresponding elements in x and y. A valid name contains only alphanumeric characters and periods. The main source of confusion is when the two vectors are not the same length and neither has length 1.Last.. Then the result is the same length as x and element 2i of the result will be the cube of element 2i of x. then the result is again length 2 with the ﬁrst element being the square of the ﬁrst element of x and the second element being the cube of the second element of x. Consider the command: x ^ (2:3) If x is 1 long. then you can do something like: > jj <.value 1. One particular object is .4 CHAPTER 1. each element of the result is the natural logarithm of the corresponding element in x. If x has length 2. meaning that they operate on each of the elements of a vector. For example. Replication (as done by the rep function and implicitly done by the arithmetic operators) means to keep re-using the elements in order until the result is the proper length. If x and y are the same length. Although there are ways of using almost any name. The rule is that the shorter vector is replicated to be the length of the longer vector. The x+2 example is really a case of this—the length one vector containing 2 is replicated to the length of x.s. the command log(x) returns a vector that is the same length as x. For instance. like dim<.

A ﬁrst step in mastering S is to thoroughly understand subscripting. I often speak in terms of extraction. but replacement is analogous (except where noted). The result is usually the length of the numeric vector that is inside the brackets. The same is true when x has length 2n + 1.x[sub] x[sub] <.0 . The actual rule on the length of the result is that it is the length of the integer subscripting vector after zeros have been removed. but in this case you will also get a warning that the longer length is not an even multiple of the shorter length.1. Here is such a warning with a diﬀerent command: > 1:4 + 5:1 [1] 6 6 6 6 2 Warning messages: Length of longer object is not a multiple of the length of the shorter object in: 1:4 + 5:1 It is often the case that something is wrong when such a warning appears. 1. The subscripting numbers are coerced to be integer before subscripting is attempted. the command state.3 Subscripting The term subscripting refers to the extraction or replacement of parts of objects. If the expression that contains the subscripting is on the left-hand side of an assignment. Each possibility is described below. then the values are replaced. There is no constraint on the length of the subscripting vector or the number of times any particular index appears. [ with positive numbers A single square bracket with positive numbers extracts the elements with indices equal to the numbers. SUBSCRIPTING 5 2i − 1 of the result will be the square of element 2i − 1 of x.value # extraction # replacement In what follows. otherwise they are merely extracted. For example. value <. Burns v1. S Poetry c 1998 Patrick J. There are a number of ways of subscripting. Elements that are NA or larger than the length of the vector being subscripted create NAs (or empty strings) in the result.name[1:5] extracts the ﬁrst ﬁve state names.3.

[ with characters The subscripting vector can be character.6 CHAPTER 1. The command: state. then you will end up with both the original value and the new value with the shortened name: S Poetry c 1998 Patrick J. It is a general feature of S that if there is a ﬁnite population of character strings from which to pick.0 NA NA NA NA 34.34.4 [ with negative numbers When the vector inside the square brackets contains negative numbers. 50)] returns the names of all of the states except the ﬁrst and the 50th.4 > jj [1] 1.0 3.0 .name[-c(1. A missing value in the subscripting vector creates an error. ESSENTIALS If you perform a replacement and a subscript is larger than the length of the subscripted vector. Be careful when performing a replacement. then you can abbreviate: > jjcw stevie munchkin 4 2 > jjcw["stevie"] stevie 4 > jjcw["st"] stevie 4 DANGER. then all but the speciﬁed elements are selected. You do not need the names to match completely.0 2.1:5 > jj[10] <.0 5. If there is a mix of positive and negative subscripts. For example: > jj <. Elements of the subscripting vector that are zero. then the vector will be as long as the largest subscript and any elements not given a value will be NA. in which case it corresponds to the names of the vector. 4.0 . or beyond the length of the vector are ignored. Burns v1. repeated. If you do not use the full name. you will get an error. the strings in the subscripting vector only need enough of the ﬁrst part of each name so that it is uniquely identiﬁed.

it is implicitly replicated to be the same length as the vector being subscripted.rep(c(F. The naturalness and simplicity of this form of subscripting belie its great utility.1:100 > jjeven <. The result of the subscripting operation will be as long as the number of TRUE and NA elements in the subscripting vector. Here is a puzzle that you may run into at some point.rep(c(F. length=100) > jjthree <. F. T).6 > jjcw stevie munchkin st 4 2 6 7 [ with logicals The command x[x>0] is an example of subscripting with a logical vector.1. Here are a few tragically inadequate examples: > jjseq <. Burns v1.3. > jjwt2 dorothy harold munchkin stevie S Poetry c 1998 Patrick J. length=100) > jjseq[jjeven] [1] 2 4 6 8 10 12 14 16 18 20 22 [13] 26 28 30 32 34 36 38 40 42 44 46 [25] 50 52 54 56 58 60 62 64 66 68 70 [37] 74 76 78 80 82 84 86 88 90 92 94 [49] 98 100 > jjseq[jjeven & jjthree] # divisible by 6 [1] 6 12 18 24 30 36 42 48 54 60 66 72 78 84 90 > jjseq[jjeven | jjthree] [1] 2 3 4 6 8 9 10 12 14 15 16 [13] 20 21 22 24 26 27 28 30 32 33 34 [25] 38 39 40 42 44 45 46 48 50 51 52 [37] 56 57 58 60 62 63 64 66 68 69 70 [49] 74 75 76 78 80 81 82 84 86 87 88 [61] 92 93 94 96 98 99 100 24 48 72 96 96 18 36 54 72 90 DANGER. SUBSCRIPTING > jjcw["st"] <. An NA in the subscripting vector implies an NA in the result. When the subscripting vector is logical.0 . T).

Machine$double. The $ operator The $ operator is the most common way to extract a component from a list. However. the situation starts to become clear. On the left is the name of the list.8 CHAPTER 1.Machine$double.21:22 > length(jjwt2[jjsubw]) [1] 2 > length(21:22) [1] 2 Both sides of the assignment are the same length. As in subscripting with character vectors.Machine))] [1] "double. > jjsubw [1] F T NA F > jjwt2 dorothy harold munchkin stevie 14 21 2 4 The missing value in the subscripting vector counts in extraction.220446e-16 > . the component name only needs enough of the start to be uniquely identiﬁed. There is more than one component of .exponent" S Poetry c 1998 Patrick J.e". > . so we think that S must surely be going crazy. names(.eps" "double.220446e-16 > .replace function on page 73 contains examples that are perhaps more enlightening.21:22 Warning messages: Replacement length not a multiple of number of elements to replace in: jjwt2[jjsubw] <. but not in replacement. Burns v1.e": > names(.Machine that partially matches "double.0 .eps [1] 2. once we look at the subscripting vector.Machine)[grep("double. ESSENTIALS 14 15 2 4 > jjwt2[jjsubw] <. The match function can be used to perform similar operations with subscripting—see page 71.ep [1] 2.Machine$double.e NULL You may be surprised about the result of the last command. and on the right is the name of the component of interest. The p.

eps"]]. The statement x$compname which is equivalent to x[["compname"]] is distinct from x[[compname]]. Also use [[ if the subscripting object is the name of an object that contains what you want to subscript. Burns v1. When the latter is the correct formulation.0 .3. DANGER. the match has to be exact or NULL is returned. Although I’ve presented the [ operator as if it worked just on atomic vectors. It is of signiﬁcance that an error does not occur. > jj $abc: [1] 1 2 3 4 5 6 7 8 9 > jj$a [1] 1 2 3 4 5 6 7 8 9 > jj$a <. The [[ operator does everything that $ does and more.Machine$double. it in fact works on lists (and lots of other objects) also. then $ can not be used—you must use [[ with a numeric index. DANGER. If the component has no name. The former is 5 characters shorter and is prettier—that’s the entire diﬀerence.Machine[[1]] S Poetry c 1998 Patrick J. Look at the diﬀerence: > . The ﬁrst gives you the ﬁrst component of x. Beware the diﬀerence between x[[1]] and x[1].Machine[["double. The statement . compname will be a length 1 vector that is either a positive integer or a character string of the (start of the) name of one of the components of x. The caveat about abbreviating when doing a replacement applies here also.eps is precisely the same as . So don’t do it.1.3456 > jj $abc: [1] 1 2 3 4 5 6 7 8 9 $a: [1] 3456 [[ The $ operator could be removed from the S language without reducing functionality. SUBSCRIPTING 9 In such a case. and the second gives you a list of length 1 whose component is the ﬁrst component of x.

Dimension dropping is a common source of intermittent bugs. "Area"] will be an ordinary vector and not a matrix. 2.220446e-16 > . For example state.Machine[1] $double.10 [1] 2. and iris[1:2. "Pop")] If you subscript an array and one or more dimensions are only 1 long. Without the drop=F the code works ﬁne as long as the array keeps the shape that the programmer expected. 4] will just be a vector. When subscripting an array in a function deﬁnition. If this is not the behavior you want. ESSENTIALS The [[ operator always gets a single component (while [ may get any number).x77[1:5. DANGER. Note. then those dimensions are dropped.eps: [1] 2. The subscripting vector for each dimension can be any of the forms given for [ above. then use drop=F in the subscript. this is discussed on page 203. but there is trouble as soon as one of the subscripted dimensions has length 1. 4] will be a matrix while iris[1:2. 4. c("Area". that character subscripting vectors now refer to dimnames rather than names.x77[1:5.0 . Burns v1. though.220446e-16 CHAPTER 1. For example iris[1:2. Subscripted arrays Arrays and data frames have special subscripting capabilities—each dimension may be subscripted independently. 2. An example is: state. A specialized use of [[ allows the subscripting vector to have length more than 1. 1:2. drop=F] remains a three-dimensional array. it is good practice to explicitly specify whether drop should be TRUE S Poetry c 1998 Patrick J. The array iris is three-dimensional.

9.1] <. 1] <. you can coerce to an array. [ with a matrix The methods of subscripting arrays that I have described have the constraint that the result also looks like an array.] 3 The ﬁrst works while the second creates an error. A matrix has consecutive elements running down the columns. arrays may not be assigned an out of bounds value. 4:2) This ﬂexibility with subscripting can lead to bugs if commas are forgotten. The ﬁrst dimension of an array moves fastest and the last dimension moves slowest.9: Array subscript (5) out of bounds.as. Compare: > > > > jj <.1:3.1] [1.0 . jj > [1] 1 2 3 NA 9 jjm <. then the expression x[5:7] picks out the 5th row of the ﬁrst column and the ﬁrst two rows of the second column .] 2 [3.] 1 [2. If you want to make sure that assignments will be in bounds. jj[5] <. But we may want to select elements that S Poetry c 1998 Patrick J. So if x is a matrix with 5 rows. Data frames are really lists in which each component is the contents of a column. SUBSCRIPTING or FALSE.1. 11 When subscripting an array. should be at most 3 Dumped > [. Unlike ordinary vectors. A logical reason for this discrepancy is that increasing a dimension of an array is much more involved than increasing the length of a vector. so subscripting a data frame with a single subscripting vector is thinking of the object as this list. DANGER.9. Burns v1. jjm Error in jjm[5. An array is a vector with the elements wrapped into the dimensions.matrix(1:3). You can see the pattern by issuing a command like: array(1:24. you can always use just a single subscripting vector which treats the object as if it did not have dimensions.3. jjm[5.

new xmat } In this function rmin has an element corresponding to each row of the input xmat. The command S Poetry c 1998 Patrick J.NA would not work in the useful way it does. Suppose that we want to replace the smallest value in each row of a matrix. function(x) order(x)[1]) xmat[cbind(1:nrow(xmat). DANGER. but there is no one column to pick.12 CHAPTER 1.mathgraph function on page 312 provides another example of this form of subscripting. So we know that we want to subscript once from each row. "fjjrowmin"<function(xmat. Note that a logical matrix as a subscript is treated the same as if it were an ordinary vector. and the number of rows of the subscripting matrix is the number of elements being subscripted. Empty subscripts Subscripting vectors may be missing—meaning (perhaps ironically) everything is included as is. A two-column matrix is then created with cbind.apply(xmat. The incidmat. Here is a function that solves the smallest value replacement problem. ESSENTIALS appear “at random”. new=-Inf) { rmin <. The solution is to subscript with a matrix. 1. The most common use of this is when subscripting arrays. Here’s an example. rmin)] <. The number of columns of the subscripting matrix must equal the number of dimensions in the array being subscripted. This is a bug (in my opinion) since character subscripts are equivalent to positive numeric subscripts. The numbers in the subscripting matrix should be positive. It is possible to think of the problem in terms of a single subscripting vector. Otherwise xmat[xmat < 0] <.0 . but there may be information from the array that is useful. Burns v1. A matrix can be used only as the subscripter of an array. Each element of rmin is the position (column) within the row that contains the smallest value in that row of xmat. Many versions of S do not allow a subscripting matrix to be character.

] returns the ﬁrst 5 rows and all of the columns of state. 13 An empty subscript can be useful for ordinary vectors also.4. Burns v1. A missing subscripting vector is decidedly diﬀerent than a zero length vector such as NULL.rep(0. length(x)) DANGER. Thus it is appropriate that they look the same when printed.0 . Though the actual structure of a data frame and a matrix is very diﬀerent. Generic functions have S Poetry c 1998 Patrick J. then the result has zero length.1.0 replaces each element of x with 0. What follows is how version 3 does it. you don’t need to ﬁnd out what type of object it is. Thus it is often diﬀerent than x <.x77[1:5. You merely use print and the right thing happens. Programming is simpliﬁed for the reason above. When a data frame is printed. The command x[] <. If the subscripting vector is NULL (actually atomic and zero length). but leaves its attributes (such as names) alone. then try to remember the proper function to use on that type of object. Some functions are generic—print is an example—meaning that the action depends on the type of object given as an argument. it looks like a matrix. OBJECT-ORIENTED PROGRAMMING state. The print function is using object-orientation to make that happen. but carries a lot of weight. but the version 3 approach works in version 4. the concepts of data frame and matrix are almost identical. then do it. Version 4 of S has changed some of the mechanics of how object-orientation is achieved. inheritance allows a very productive way to leverage existing code.4 Object-Oriented Programming The idea of object-oriented programming is simple. 1. If you want to print an object.x77. Object-orientation simpliﬁes. Likewise it is possible for S objects to get dressed appropriately depending on what class of object they are. Furthermore. plus it also suggests a proper ensemble of functions to write for a particular application. Here’s the whole thing: if you told a group of people “dress for work”. It can also simplify programming. then you would expect each to put on clothes appropriate for that individual’s job.

This is the public view—that it looks like a matrix. Inheritance results from a class attribute longer than 1. so that the data frame method is used. objects of class "terms" are digested formulas (class "formula"). If an object’s class has length 2. Burns v1. Inheritance should be based on similarity of the structure of the objects. A data frame is implemented as a list with each component containing the contents of a column. A method is the function that is actually used when a generic function is called. then the ﬁrst class inherits from the second. ESSENTIALS one or more methods. So the print method for data frames (class "data. then the default method is indicated. but contains a speciﬁc type of data. "data.frame") You could write methods for some generic functions. If no element of the class matches a method. The class of an object is a character vector.default.0 . A generic print threatens ill. Remember that the class vector goes from speciﬁc to general.) In version 3 the name of a method is the name of the generic function followed by a period followed by the name of the class. It is the "class" attribute of an argument (usually the ﬁrst argument) that determines which of the possible methods will be used.frame") is print. Data frames could be changed so that the public view remains the same. Each element of the class is examined in turn until one matches a method.dframe". The loan function presented on page 226 is almost like this. they are not structurally very similar at all. Ideally the user of a class of objects (as opposed to the creator of the class) should be able to use only generic functions. I can use lapply on it to apply a function to each column. it means that OOP can stand for “Obfuscation-Oriented Programming” as well as “Object-Oriented Programming”. The idea of a data frame is that it be a rectangular arrangement of elements in which the type of element in one column need not be the same as the elements in other columns. For example. like [ (subscripting). so the decision was that there be no inheritance. I think there is room for improvement in many of the print methods that now exist (including ones that I have written). so the class could be something like: c("my. Version 4 allows more ﬂexible naming. then the default method is used. Since it is a list. It would be advantageous in spots if terms inherited from formulas since they are conceptually very similar. Data frames are a good example to take up. and hence need no S Poetry c 1998 Patrick J.data. However. an error occurs. If the object does not have a "class" attribute. but their implementation is diﬀerent so that my lapply trick would not work. Suppose that you want to create a class of object that looks like a data frame.frame and the default print method is print. Since what you see is not what you get. This leads to the issue of private versus public views. not on conceptual similarity. but not others that you plan to use. The private view is that it is implemented as a list. (Whenever the default method is called for and it does not exist.14 CHAPTER 1. The onus is on the programmer of the print method to eliminate as much confusion as possible. perhaps print and summary.

Two principles that S graphics follow is that they are device independent. You can create your own high-level graphics function from scratch by starting with a call to frame and then using whatever low-level functions you like. Device independence means that the same commands are used to create a graph on a PostScript printer as on a monitor running Motif—the only diﬀerence is what type of graphics device the user has set to listen for graphics commands. More on object-orientation starts on page 223.1. and contains a number of conceptual regions. of the private view. The surface of a device (to be concrete. Window devices produce graphics when you are running a window system on the network where S is running. 1. GRAPHICS 15 information about the precise structure of the object. think of a page of hard-copy output) is called a graphics frame. and you can use a method function (with few exceptions) as an ordinary function on any objects you choose. for example) merely add to the existing ﬁgure. and is only skimmed in this one. S does not enforce object-orientation in any way. telnet to a Unix machine where I run S.0 . Graphics devices are put into one of three categories. Examples of graphics parameters include cex (character expansion) which gives the size of the text relative to the standard font. Finally there are terminal devices that are used when S is run remotely. High-level functions. A graphics device is an S function that arranges for graphics to be rendered. that is. and that each graph can be built up with numerous commands. Hard-copy devices are for creating a physical picture. and get graphics on my Macintosh screen using tek4105. The topic deserves a book of its own. which performs Tektronics emulation. and mfcol which controls the number of plots per page of output. For example. Low-level functions (points and lines. the most common is the postscript device. The ability to add to plots means that complex graphics can be created easily. Within S Poetry c 1998 Patrick J. create an entire new ﬁgure. The state is described by a reasonably large number of graphics parameters. as opposed to graphics devices—are divided into high-level and low-level functions. the S-PLUS motif device is an example. I can slouch in front of my Macintosh at home. Here I describe the basics of how graphics work in S. Objects that you create need not have a class attribute. like plot and barplot.5. Graphics functions—functions that say what to draw. Some functions have an add argument so that they can function in either capacity. Burns v1.5 Graphics Graphics are an integral part of S. the postscript function makes it so that a ﬁle of PostScript commands will result when a graphics command is given. For instance. There should be a “Devices” help ﬁle that informs you of graphics devices available to you. A graphics device has a state that is queried and modiﬁed by the par function.

16 Operator $ [ [[ ^ : %% %whatever% * / + == < > >= <= != ! & | && || ~ <- _ <<; Associativity left to right left to right right to left right to left none left to right left to right left to right left to right right to left left to right none right to left left to right

CHAPTER 1. ESSENTIALS Task component subscript subscript exponentiation unary minus sequence in-built and user-deﬁned multiply and divide addition and subtraction comparison not and, or formula assignment statement end

Table 1.1: Precedence table for the S language, version 3. the frame there are one or more ﬁgures. Surrounding the set of ﬁgures is the outer margin; this often contains zero area. Within each ﬁgure is a plot area. Surrounding the plot area within the ﬁgure is the margin for that ﬁgure. The plot area is where the points in a scatter plot go. The margin contains titles, tick labels and axis labels. Some of the graphics functions are listed on page 113, and the par function is discussed brieﬂy on page 48.

1.6

Precedence

As in many languages, operators in S obey precedence. The command 6 + 8 * 9 will perform the multiplication before the addition. If you want the addition done ﬁrst, use parentheses: (6 + 8) * 9 For the most part, the precedence in S is intuitive, the one I have the hardest time with is the : operator. See table 1.1. When operators have equal precedence, almost all of them associate from left to right—that is, the leftmost operation is performed ﬁrst. The exceptions are the assignment operators <- and <<- plus exponentiation with ˆ which each associate from right to left. S Poetry c 1998 Patrick J. Burns v1.0

1.7. COERCION HAPPENS

17

Somewhat related to precedence is the brace. A pair of braces ({ }) bundles a number of statements into a single statement. For example, you need to use braces when you write a function that contains more than one statement. The same is true in if statements and loops.

1.7

Coercion Happens

The order of the atomic modes is: logical, numeric, complex, character. This is ordered by the amount of information possible. There are only three possible logical values, any logical value can be represented as a numeric value, any numeric value can be represented as a complex value, and any of the above plus the contents of the Library of Congress can be represented by a character value. Whenever an atomic vector is created with elements of more than one mode, then the result will have the mode of the most general element. For example, the mode of c(T, "cat") will be character and the mode of c(3, 4i) will be complex. The expression c(T, NA) implies that the mode of a solitary NA must have mode logical. This, by the way, implies that if a single NA is used as a subscripting vector, then that is a logical subscript, not a numeric one. Compare: > c(T, [1] T > c(T, [1] 1 NA) NA as.numeric(NA)) NA

Although most coercions are obvious, the ones involving logicals are not. When coercing logical to numeric or complex, a TRUE becomes 1, and a FALSE becomes 0. A number coerced to logical is TRUE unless it is zero (or NA). In many operations coercion occurs automatically. This is the grease that allows S to run smoothly (and sometimes a slimy trick if the result is not what you expected). A common use of coercion is in an expression like if(length(x)) ... where if is expecting a logical value but it gets a numeric one. The expression is equivalent to if(length(x) != 0) ... but more compact and almost as intuitive. A very useful coercion occurs in sum(x > 0); the result of x > 0 is a logical vector the same length as x, but sum expects numeric or complex arguments so it coerces the logical vector to numeric (the least general mode that it needs). The result is the number of positive values in x. S Poetry c 1998 Patrick J. Burns v1.0

18

CHAPTER 1. ESSENTIALS

DANGER. Notice: > 50 < "7" [1] T This shows that the coercion must be to character, and that you want to think carefully about comparisons on characters.

1.8

Looping

The only places where actual work gets done in S functions are inside calls to .C, .Fortran or .Internal. That is, in calls to C routines, Fortran routines, or the special routines (written in C) that understand S objects. The more directly you get to these endpoints, the more eﬃcient your code will be. It is well-known to users of S that using for loops and its cousins can be quite slow. Fundamentally, this is not the fault of the looping mechanism itself, but that a lot of calls to S functions are being made. There is a fair amount of overhead for each call to an S function; if a loop does hundreds or thousands of iterations, the overhead adds up to a noticeable amount. So the general rule is that the less that is in loops, the faster your code. Here are three situations: no loop is used; apply, lapply or a relative is used; a loop is used. No loop is required if the same operation is done on each element of an object. If you want to replace each negative element of a matrix with zero, then you could loop over the rows, loop over the columns and change each element individually. A faster, more direct way would be: xmat[xmat < 0] <- 0 If the same operation is done on sections of an object, then a call to apply or lapply or a similar function is probably a good choice. The apply function operates on arrays, and lapply operates on lists. In older versions of S, the apply family of functions was implemented as for loops, but version 3.2 of SPLUS changed them so that they are more eﬃcient than a for loop. When these internal apply functions are available to you, they can speed execution substantially compared to the equivalent for loop. A section on the apply family begins on page 75, in particular, see the sorting example on page 76. When one iteration requires results from previous iterations, there is usually no alternative to using a loop. Loops can be eliminated in a few special cases of this through the use of functions like cumsum and the S-PLUS functions cumprod, cummax, filter. When you do use a loop, try to put as little in the loop as S Poetry c 1998 Patrick J. Burns v1.0

1.9. FUNCTION ARGUMENTS

19

possible. Also try to loop as few times as possible—sometimes there is a choice of how to loop.

1.9

Function Arguments

Arguments to functions can be divided into required arguments and optional arguments. The ability to have optional arguments is yet another powerful feature of S. You can look at it either as getting more ﬂexibility without any complication, or as getting simplicity with no reduction of functionality. Since not all arguments need to be given in a call to a function, there needs to be a mechanism to match the function arguments to the arguments in the call. Arguments may be speciﬁed by name (and since there is a limited set of arguments for a function, the usual convention of abbreviation may be used). Arguments may also be speciﬁed by their position in the call. Here is the general rule for matching the call arguments to the function arguments: First, names that match exactly are paired up. Second, names that partially match are paired. Finally, remaining arguments in the call are matched to the remaining arguments in the function in order. Come live with me and be my love 3 There is a complication. The ... construct (called “ellipsis” but more commonly “three-dots”) allows a function to have an arbitrary number of arguments. There is a distinction between arguments that appear in the function before the three-dots and those that appear after it. Arguments before the three-dots are matched as usual—by partial name matching and/or by position. Arguments that appear after the three-dots are matched only by full name. All unmatched arguments go into the three-dots. Carefully written help ﬁles indicate arguments that may only be matched by full name by placing an equal sign after the argument name when it is described (the paste help is an example). It does not matter where these arguments appear in the call—there is no requirement that they be last.

1.10

A Model of Computation

It is advantageous to be able to visualize what is happening when an S function (or any program) runs. The description I’m about to give is not accurate in every detail, but should serve the purpose of helping you to picture what is happening between the time you hit the return key and you get back an S prompt. That maps are of time, not place 4 As in business, the three most important things in computing are location, location, location. There are three types of space that are important for S objects: disk-space, RAM and swap-space. Disk-space is where permanent S objects live. RAM (random access memory) is where S objects that are being S Poetry c 1998 Patrick J. Burns v1.0

20

CHAPTER 1. ESSENTIALS

used in a calculation must be. Swap-space is an extension of RAM for when things get rough. If we multiply the permanent dataset prim9 by 2, S ﬁrst ﬁnds prim9 in diskspace and makes a copy of it in RAM. Then the multiplication of each element is performed. The multiplications can take place because the computer knows the address in RAM of each element of the copy of prim9. More particularly, the situation is probably that it knows that all of the elements are lined up side by side, it knows how many there are, it knows how much space each one takes, and it knows the address of the ﬁrst one. We can envision RAM as a long line of boxes, where each has an address. At any one time some of these boxes are in use, and probably some are empty. When S needs to bring another object into RAM, the computer ﬁnds a place for the object and remembers the address where it was put. If there is not enough room in RAM for the new object, then the operating system guesses about what in RAM won’t be needed for a while, and puts that in the swap-space (storage area). At a point when something that is in swap-space is needed, then something else needs to be put in swap-space and what is needed is “swapped” or “paged” into that place in RAM. When computations are small, then everything ﬁts in RAM and there are no problems. If RAM ﬁlls up, then paging starts; this dramatically slows the progress of computations, as the computer spends most of its time trying to solve the puzzle of how to get all of the necessary ingredients into RAM at the same time. If the computation needs more memory than RAM plus the swapspace can hold, then the computer has to give up—sometimes the exit from this situation is less than graceful. Here is an example of an error message from running out of memory: Error in 1:ll: Unable to obtain requested dynamic memory Dumped

DANGER. Do not confuse this with trying to create an object that is larger than the current limit: Error in 1:ll: Trying to allocate a vector with too many elements (40000000) Dumped This latter error can be ﬁxed by increasing the object.size option. The former error can only be solved by changing the computations or using a machine that has more memory.

Humans tend to make a distinction between data like 2 and prim9, versus S Poetry c 1998 Patrick J. Burns v1.0

1.11. THINGS TO DO

21

operations like multiplication. But at the RAM level, functions are also just a collection of addresses. We are now in a position to examine why S is so slow, or more positively, why C is so fast. Each object in a C program is declared to be of a certain type, so the computer knows how much space each object (or at least each element of the object) takes. C is compiled so that it has a clear map of which addresses it needs to visit in which order before it starts. In comparison with this, S is on a treasure hunt—S goes to one spot, then gets a clue to where it has to go next, where it gets a new clue, and so on. So why doesn’t S do what C does? Flexibility. An input to an S function is not restricted to a particular type. S functions do not care when functions that they call are changed; the deﬁnition of functions at the time of use are used, not the deﬁnition at the time that the calling function was created. And so on, etcetera. S and C should not be thought of as competitors, with C winning because it is faster or with S winning because it is more ﬂexible. They should be thought of as associates, each with its own strengths, that should each be employed as appropriate.

1.11

Things To Do

Write a function that uses each form of subscripting. What happens when you subscript with complex numbers? Is this the most useful behavior? Write a command that uses at least one operator from each line of the precedence table. Read each S function available to you, and try to understand how it works.

1.12

Further Reading

Becker, Chambers and Wilks (1988) The New S Language is the fundamental document for the S language. Chambers and Hastie (1992) Statistical Models in S discusses object-orientation as it ﬁrst appeared in S—Appendix A may be of particular interest. Spector (1994) An Introduction to S and S-Plus provides a gentler introduction to S than this book as well as covering graphics more thoroughly. Venables and Ripley (1994, 1997) Modern Applied Statistics with S-Plus also discuss the S language in a few chapters and then go on to provide examples of statistical analyses in S. S Poetry c 1998 Patrick J. Burns v1.0

22 If you are a historian or you want to see how far S has come, then you can look at Becker and Chambers (1984) S: An Interactive Environment for Data Analysis and Graphics; and Becker and Chambers (1985) Extending the S System. These two books are obsolete for the purposes of computing.

1.13

1 2

Quotations

Theodore Roethke “I Knew a Woman” Henry David Thoreau “I Am a Parcel of Vain Strivings Tied” 3 Christopher Marlowe “The Passionate Shepherd to His Love” 4 Henry Reed “Lessons of the War: Judging Distances”

S Poetry c 1998 Patrick J. Burns v1.0

Chapter 2

Poetics

In which right thinking is advocated.

2.1

Abstraction

Abstraction is the heart of programming—all else is mere detail. A major symptom of the need to abstract is the occurrence of redundant code. You may immediately realize that you would rather not type the same statements over and over again, so abstracting those lines into a function saves your ﬁngers. But there is a more important reason. Suppose that you decide something needs to change in those lines—you found a bug or the scope of the functionality changed—then the single function holds a tremendous advantage over the repetitious code. In the repetitious case, you need to change every occurrence (yet more typing), and there will be a bug when you miss one—and that bug will be hard to ﬁnd. Capricious Rule 1 Whenever a group of at least three similar lines of code occurs more than once, a function shall be written to perform the task. The simplest form of abstraction in S is to take a task that you use one or more command lines to do, and make that into a function. In the example below we assume the user wants to look at the histogram and get the summary and variance for a number of datasets. > fjjexplore function(x) { hist(x) c(summary(x), var = var(x)) 23

24 CHAPTER 2. An example is the lib.load function provides indirection for the location of the object ﬁles.load and dyn.Location to poetry directory") real.dyn.dyn. The poet.306 9.load function is another example of indirection.791 9. which is going to be diﬀerent for each installation.314 9.045 9. var 8.load(real.794 0. A particular type of abstraction is indirection.paste(Poet. S Poetry c 1998 Patrick J. The library function is not limited to looking for libraries in one set place.Location")) stop("set Poet. Change it. DANGER. "poet. fname. some abstractions are more useful than others. This generally can be ﬁxed by bundling like objects together and possibly vectorizing. The poet.591 9.loc <. In order to be loaded. Max.0 . Burns v1.loc) invisible(ans) } There are object ﬁles that need to be loaded to use several of the functions described in this book.y) # a plot appears Min. Median Mean 3rd Qu.load")) ans <.load2(real.dyn. One of the beneﬁts of object-oriented programming is that it forces the programmer to abstract the problem. instead there is a two-step process which is ﬂexible and convenient. the directory where the ﬁles live needs to be known.loc object that states the directories where S libraries may be found. 1st Qu.load2 depending on the availability of the functions in speciﬁc versions—another instance of indirection. If your code reads like a phone book. something is wrong.Location. Subscripting in S is an example of powerful abstraction. It also allows an easy way of changing between dyn.load"<function(fname) { if(!exists("Poet.dyn.09961393 Every function embodies an abstraction.dyn. but at least there is hope of avoiding an amorphous mass. sep = "/") if(exists("dyn.loc) else ans <. The abstraction chosen may be good or poor of course. Several ways of approaching subscripting are all handled similarly—there is both generality and simplicity. POETICS } > fjjexplore(freeny.

0 . Till it bore an apple bright. I must confess that I have written a couple 500-line functions of which I am reasonably proud. For instance. When looking for show-stoppers. I think of laziness as the most important of the three. Capricious Rule 1 No function shall contain more than 37 lines of code. S Poetry c 1998 Patrick J. Even if the original plan appears feasible. you may ﬁnd a more streamlined attack on the problem. Functions that are simple are easily used and easily adapted. Each function should have a contract—it accepts certain inputs. impatience and hubris. positively most splendiferous mistake of beginning programmers is to try to do too much in a single function. 5 But why? Because small functions are easier to understand. Being a lazy person. “Good programmers are a little bit lazy: they sit back and wait for an insight rather than rushing forward with their ﬁrst idea.2 Simplicity Programming Perl lists three virtues of programmers: laziness. and agrees to produce a speciﬁc output. Impatience is useful to make computers do more work than we do (and to make them do all the boring stuﬀ). then the whole system should work.2.” Jon Bentley in Programming Pearls takes a slightly diﬀerent tack. This is the core idea of “modular programming”— that the code can be broken into pieces that can each be veriﬁed to do what they are supposed to. involved function is hard to adapt to a slightly diﬀerent purpose. And it grew both day and night. but these are exceptional cases in which a lot must be done. you are concentrating on the weakest portions of your plan. A likely example is that speed may not be adequate—moving strategic pieces of the code from S to C can save the day.2. Think in terms of building blocks. SIMPLICITY 25 2. Hubris forces the programmer to produce quality solutions. and look for ways to break it up into subtasks. simple functions are less often broken. Most importantly. Unix Power Tools credits Programming Perl as describing laziness as the “quality that makes you go to great eﬀort to reduce overall energy expenditure. The absolutely. it is a virtue to write a small function to do something that you are repeatedly doing on the command line. Burns v1. Because a long. I would like to add another virtue to the stack: simple-mindedness. Because the subfunctions that you create are going to be useful again. If all the pieces work.” Another good way to be lazy is to prove the feasibility of a large project before you spend a lot of time on programming details. Consider all of the things that need to be done to achieve your goal. and to attempt the impossible. Look for critical points that might not be doable within the given constraints—show-stoppers—and solve those ﬁrst. Simpletons avoid whatever is complex.

.0 . do not write functions that create or modify objects. Actually this is not quite true—if NULL is a valid input for y. Similarly details can be hidden from functions. For example. and return it from the function. "fjjmiss1" <function(x. The object contains a number of components that are used in prediction and so on. POETICS When changes are made. but in diﬀerent ways. } "fjjmiss2" <function(x. Side eﬀects destroy the transparency of a set of functions. y=NULL) { if(is. avoid using global variables as much as possible. There has been a proposal to ban the <<.. A very important thing not to do is write functions with side eﬀects. then testing can be focused near the changes.. } . } Assuming that the ellipses in the two functions are the same.26 CHAPTER 2. I second the motion. it should be used only when necessary because it can complicate programming. Data hiding is another technique to achieve simplicity. The most obvious example of data hiding is some of the print methods in S. It is much more transparent when all necessary variables are passed in as arguments.. Burns v1.. y) { if(missing(y)) { . then the two functions are diﬀerent.null(y)) { .. On the other end of the same stick.. yet its printed form is merely a terse summary of what the object represents. Instead collect all of the newly created information (probably in a list). Looking at these functions in isolation. Specifically. Although the missing function is a useful tool. an lm object represents a linear regression. the results will be the same. It is also likely that the function will be much more versatile. making it more likely that the user will make assumptions that are not true. fjjmiss1 has the advantage since it is more general.operator to make it a little less convenient to change objects behind the scenes. Let’s look at two functions that do the same thing. S Poetry c 1998 Patrick J.. } .

you want the interaction between functions to be smooth as well. However. Distrust everyone’s programming. A call to fjjmiss2 has no such complication. All of the names should be distinguishable by the ﬁrst few letters—a sort of anti-alliteration—while maintaining clear identiﬁcation of the meaning.” Think of consistency as abstraction on a more diﬀuse scale. so it is nice to have argument and component names that can be abbreviated eﬀectively. If we always use the same argument name for the same thing. Follow naming conventions to maintain consistency. Capricious Rule 1 The ﬁrst two letters of an argument name shall distinguish it from all of the other argument names. 2. The arguments for the function accepting a function should have its arguments all in capitals. Consistency makes it fast for users to learn the system. the arguments to lapply are: X. An exception is the naming of function arguments and list components. no surprise for the reader.stack function on page 228... especially your own. keeps some bugs from getting into the code. However. it is better to cultivate distrust actively than to suﬀer bouts of depression due to disappointment. CONSISTENCY 27 When we consider these functions being called by other functions. For example. Unlike poetry in a natural language where there is rhyme. You not only want each function to be as simple as possible. we really have abstracted the concept of that argument. and makes it easier to ﬁx bugs that do appear. “No surprise for the writer. FUN. While we are on the subject of virtues. then be smallminded when programming. the picture changes. there are few constraints when writing in S. Argument names in S are traditionally in lower case. We desire simplicity of use as well as simplicity of construction.3 Consistency If you believe that consistency is the hobgoblin of small minds.2. As people gain experience with computing this tends to be a natural occurrence. S Poetry c 1998 Patrick J. meter and so on. A function that calls fjjmiss1 needs an if statement that depends on whether or not y is to be missing. there is yet a ﬁfth to be had— distrust. an exception is when the function takes a function as an argument along with an arbitrary number of arguments for that function.3. An appropriate use of missing is in the [. . As Robert Frost said in “The Figure a Poem Makes”.0 . Burns v1.

Version 3 of S forces a naming convention on methods for generic functions. Give similar arguments in diﬀerent functions the same name. Capricious Rule 1 No variable shall be named tmp. All variables are temporary so tmp is not descriptive. Burns v1. x=prim9) The X of lapply and the x of mean (in this case set to be prim9) can peacefully coexist.integral function (page 320). POETICS The reason these arguments are capitalized is so their names will not conﬂict with argument names for the function passed in as FUN. you don’t need to follow this but you should decide what you are saying when you capitalize. You can make a call like: lapply(as. Unfortunately there are several functions that take functions as arguments that do not adhere to this convention.0 . mean. For example. An example of using this convention is the line. I think iter. For example iter. Variables within functions should have clearly meaningful names.28 CHAPTER 2. Note that S is not confused when the same name is used in nested functions.” anonymous (Eighteenth Century) S Poetry c 1998 Patrick J. Objects that live in database 0 usually start with a dot and have the next letter capitalized. Object names tend to be in lower case. and variables that start with “n” are more common than those starting with “i”.. “All ﬂesh is grass. It is bad practice to use the same name for more than one meaning within the function (an exception can be made to save space when dealing with large objects). How came this to pass? He heard the good pastor Say. Version 4 does not enforce the convention.list(seq(0. nnn and nnnn would be better named n3 and n4.01)). but it is probably wise not to stray from it without reason.1. On a Horse Who Bit a Clergyman The steed bit his master. Avoid names that can be easily confused. Thus it is tempting to have it stand for more than one thing in a single function.max is a better choice because it more clearly states its meaning.max is used in several functions to specify the maximum number of iterations allowed. Some other functions use niter for the same purpose. by=. An alternative method of handling the problem is given in the genopt function (page 331).

It is the prgrammer’s responsibility to maintain consistency. We get more ﬂexibility at the possible expense of consistency.4. But turning that on its side in the light of vectorization. [1] 1 1 2 > rep(1:5. > rep(1:5. eﬀectively abstracted function that conforms to all known conventions. 5)) 2 3 3 4 4 5 5 4:0) 1 2 2 2 3 3 4 Since computers are less amused by ambiguity than we.2. Burns v1. The times argument is what makes it transcend mere utility. The most common bug when using rep is to forget to name the length argument in the call. when the world is mudluscious 6 One of the most beautiful functions in S is rep. and they do. > rep(1:5. you are probably approaching the beautiful. [1] 1 1 1 rep(2. there are few constraints on its form. we can also give a vector for times that tells how many times to replicate each element in turn. The length argument states the length that the answer is to be.4 Beauty Once you have a simple. so that it is interpreted to be the times argument. it should be the case that these two meanings for times meet up nicely for length one objects. times says how many replications of the object we want. and if object-orientation is used. This function does a lot with three straightforward arguments. 2) [1] 1 2 3 4 5 1 2 3 4 5 > rep(1:5. Thus we can give commands like: > rep(1:5. leng=2) [1] 1 2 S Poetry c 1998 Patrick J. In its most common use. BEAUTY 29 Object-oriented programming is not enforced by S. len=4) [1] 1 2 3 4 > rep(1:5. len=7) [1] 1 2 3 4 5 1 2 DANGER. 2.0 .

Arbitrary limits are ugly and should be avoided. or it could have had a bunch more arguments that made the whole thing complicated.5 Doing It Much has been made of “top-down” programming. The maximum order of the polynomial solved by polyroot is 48. general names (like maybe fly or run) for inferior functions. you may paint yourself into some corners. The number of arguments that can be given to a C or Fortran routine through . Too many arguments means that too much is done and hence it is too complicated. 2. If the solution you have to a problem contains an arbitrary limit. old and of vast importance 7 The ﬁnal ﬂourish of creating a beautiful function is to give it a good name. The number of explanatory variables that can be given to the leaps function is limited to something like 31. by all means take advantage of it. In my experience it is more likely that the problem is only partially known (or knowable) in advance. I suspect that there are few worlds born whole as out of Zeus’ head. Instead it holds to a middle ground of both power and simplicity. but if taken to the extreme. If “top-down” is taken to mean think before you type. The number of characters in the name should be inversely proportional to the number of times it will be typed—a function that is typed only a few times a month can aﬀord to have a longer name than one typed several times a day. only details are left to be worked out.” A good image for code generation is one of punctuated evolution. As more is learned.0 . A descriptive name for an infrequently used function also makes it easier to ﬁnd. the proper structure starts to show itself. I think). Capricious Rule 1 All functions shall have three to seven arguments.C or . An apt slogan is one seen on bumperstickers: “Think Globally. Experimentation provides insight into what the real question is. POETICS rep could have been written so that it did less. Act Locally. It is sometimes professed that the way to approach a programming task is to specify the main structure and how everything relates to its neighbors in advance. Burns v1. Too few arguments means that the abstraction embodied by the function is not general enough. There are short intervals of massive changes interspersed with long periods of almost no S Poetry c 1998 Patrick J.30 CHAPTER 2. then I like it. Avoid using good. The name should clearly indicate the functionality.Fortran is limited to some arbitrary number (about 50 in most versions of S-PLUS. Disturbed some rythm. There are only three places that I can think of where they exist. This brings to mind the following rule. your pride (remember hubris?) should get in the way of taking it seriously. and how wide of a net can be cast. but if you run across one. S does a good job of not having them.

Do not propagate it over the internet (do as I say. then ﬁx the limitations that you’ve found—always trying to move toward simpler and more general. Burns v1. With trembling care. hone it until it is right. DOING IT 31 change at all. If you have a signiﬁcant piece of code. then breaking the compatibility by forcing the use of the new structure is best. give it to a few people to test and comment on. In the midst of all of this pondering on programming etiquette. with a later version being (hopefully) an improvement of the preceeding version. play with it a little.0 .5. This is a caution to get it right the ﬁrst time. This is a dark cloud over beautiful programming— either consistency. then thrown completely away and rewritten. It is seldom the case that programming is performed with no constraints. As a deadline worth millions looms. Obviously the aim is not to produce examples of poor coding for you to emulate. Write an initial version.2. It is safer to assign a name to the length of the vector. Unlike living creatures. so you can use 5 as the length of the vector—hard-code the value—but someone who changes the code and uses a diﬀerent length vector may not ﬁnd all of the locations where 5 means the length of this vector. Keep in mind that code evolves and adapts. but I’m sure it is closer to optimal than how software actually is written. Nobody ever does it because it seems extraordinarily wasteful. Think about it. There should be such a sign whereever software is under construction. knowing that most things break. The periods of change can be spurred either by new needs or by a better structure becoming evident. the choices tend to be reduced to retaining the same interaction (though internals might change). the length of a vector may be 5 in a function. there may not be time to create perfect functions. 9 For example. S Poetry c 1998 Patrick J. If the program is not wide-spread. So was I once myself a swinger of birches. computer programs unfortunately often need to exhibit backward-compatibility. 8 Think safety Many building construction sites have the prominent sign “THINK SAFETY”. not as I do). The main reason is to drive home that revising code should be standard practice. There is an adage (never followed) that software should be written once. Do not sell it. truth may break in. then prosyletize. In many of the examples in this book two or more versions of a function are given. Take the gold and tidy your code later. It is not enough to get it right for this one use—think also of how minor changes to the code might introduce bugs. or having two systems that do basically the same thing. If fame has spread though. non-redundancy or backward-compatibility has to give way.

Do not change the match function’s default value of nomatch because there are numerous places that do not use the safe approach. nomatch=NA))) DANGER. Here is a method to do that. Burns v1. If the use of a function critically depends on the value of a default.32 CHAPTER 2. for the default value of the drop argument in subscripting to be FALSE. Safer would be: if(any(is. then the above construction will not work. drop = F) { cl <.7710749 > dim(. y)))) which depends on the default value of the nomatch argument to match. j. > jjpbx a b 1 1 -0. A common fragment of code is if(any(is.class(x) x <.7710749 2 2 1.0032299 > class(jjpbx) [1] "matrix" > jjpbx[1.na(match(x.value) NULL It seems perfectly natural to me that we may want an object of class matrix to remain a matrix when subscripted—that is. Here is an example of where thinking of safety would have helped. If someone decides to create their own version of match with the default value of nomatch changed to 0. then explicitly give the default for that argument. POETICS If you are adding arguments to a function that is already in use. y. say.na(match(x.] a b 1 -0.unclass(x) ans <.x[i. j. "[. If you do this. then calls based on the previous deﬁnition of the function will continue to work properly. i.matrix"<function(x.3230554 3 3 -1.0 . drop = drop] S Poetry c 1998 Patrick J.Last. then put the new arguments at the end.

] > dim(jjone. paste(rep(".cl ans } 33 Subscripting indeed works. S Poetry c 1998 Patrick J.d <. On the other hand. you want errors when they do occur to produce a message that will make the cause of the problem clear. ".d) [1] 1 2 > dim(drop(jjone. Most important is that a function should never be allowed to give a wrong answer. Burns v1.0 .length(dim(x)) if(n == 0) return(x) eval(parse(text = paste("x[".2.d)) NULL This will cause drop to fail on objects that do not have a drop argument in subscripting but do have a dim—that seems quite acceptable. n 1).drop=T to the deﬁnition of drop ﬁxes the problem: "drop"<function(x) { n <. Put error messages where appropriate (using the stop function). Here is how I decide where to put error checks: • If the problem will result in a wrong answer. DOING IT # avoid NULL matrices if(length(dim(ans))) class(ans) <. Better a totally undecipherable error or the computer crash than two times two equal ﬁve. low-level functions that aren’t intended to be called directly by the user should have few error checks.drop=T]"))) } > dim(drop(jjone. but now the drop function doesn’t work on these objects. In general. collapse = "").5. put in a check (and look around for more of the same). functions called a lot by inexperienced users should be heavily checked.d)) [1] 1 2 Adding the seven characters .". > jjone.jjpbx[1. You do not want to unduly clutter or weigh down functions with excessive error checking. This is harder than may be naively thought.

clarity and accuracy should take precedence over brevity. This S Poetry c 1998 Patrick J. . However. It ﬁlls in many of the items of interest and marks the spots where you need to put in information.) Writing the documentation clariﬁes how the function ﬁts with others. Once you know where you want an error message. (This is a dreadful way to program. then put in an error check. and comments in the code are two types. The other (which I term prose documentation) is an explanation that might appear as a user’s manual. There are a few possibilities: • The description is clear. and suggests tests for bugs as well as being a useful product in its own right. (In version 4 the documentation is a part of the S object rather than a separate ﬁle. I believe in short messages (so the user will actually read it). “matrix of explanatory data”. Prose documentation would tend to show how to use a group of functions in a coordinated manner as well as motivating why one would want to use the functionality. The description should include the type of object expected: “integer giving the number of . then put in an error check. you need to decide what it is to say. the principles remain the same.0 . but it is a good image to have when writing help ﬁles. Fine. When writing a help ﬁle. not “explanatory data”.34 CHAPTER 2. Documentation There are three forms of documentation that you can write for your S code. .”.” is preferable to “number of . • If the problem will create a reasonable error elsewhere but this is likely to be used by novices. Help ﬁles. Such thought is useful not only for placing error messages. You need a measure for what “clear” means—think of a reasonably intelligent person who is completely uninformed about the subject. I think writing a help ﬁle for a function is an integral part of writing the function. Here are some things to consider when writing a help ﬁle: • Describe each argument fully. . Implicit in the list above is that thought has been put into where errors come from.) The prompt function produces a ﬁle that is a template of the help ﬁle you want. it is a good goal to write it so that a person without a computer and with just the help ﬁles can conﬁdently write correct programs. Burns v1. You can not possibly know that your function is written as well as possible until you have written a thorough help ﬁle for it. but also to eliminate bugs. . POETICS • If the problem will result in an error farther on and the error condition will be vague or misleading.

When threedimensional. and to document it all in one step. No matter how clear and concise you think your explanations are. a good example will better answer the user’s questions. and explanation of how to use the function. While you do this. to debug the function. If possible. • The argument is hard to explain. Burns v1. and you have a clear map of what remains to be done to the function. The order should match that of functions with similar arguments. spell them out. but they should tend to be sparse. This is the section that is likely to be the most informative if done properly. • Provide examples that demonstrate the use and usefulness of the function. and nothing but. Also add comments if you S Poetry c 1998 Patrick J. This is a very expedient way to force me to think through what the function should be doing. then surely users will get it wrong. that I know what it should be but I’m not sure the function does that. If the argument can’t be explained. but those who claim to have no bugs in their code are either fools or liars.. For example. you might have a statement like “numerical array. give examples that the user can copy directly to get output—the inbuilt datasets are expedient for this. My feeling on comments in code is that there should be some. This might include references to literature.0 . • State all else that would be useful to the user. • A most important part of a help ﬁle is the BUGS section. the most used arguments should be near the front. wonder whether everything relevant is returned. even if these are just limitations on what they might be expected to do. or that the function does something more sensible than my description. as well as being more explicit for the user. and more likely the whole thing is ill-deﬁned. Write these down while they are fresh in your mind so that users are warned.5. In general. Put in comments when you are doing something clever—it is very easy to forget how clever you are. I realize that it’s fashionable to pretend that bugs do not exist. • The description is clear. details of the computing algorithm. Many functions as they are written contain known bugs.2. • If there are side eﬀects. DOING IT 35 gives more opportunity to ponder what happens if the input is diﬀerent from what is expected.. Do all of them have good names that are consistent with other function arguments? • Explain the output fully. An ill-conceived argument is a potent source of bugs.” I often ﬁnd that I don’t know what the “then” clause is. • Are the arguments in an appropriate order? Required arguments should be ﬁrst. but does not match what the function actually does. then . as appropriate. Rethink. Change the function or the description.

and there has been the desire to have as much backward-compatibility as possible. the time to train users. It is hard to overestimate the time required to maintain code. it is not a very convenient operation. On the whole it is a very poetic language. Part of the reason for focusing on these is that they are the easiest to measure. the time to maintain the code.36 CHAPTER 2. S Poetry c 1998 Patrick J. It allows and encourages a great deal of simplicity. all of the inconsistency in S is on the surface—the underlying language is incredibly consistent and simple. I don’t see any solution to this. Explain the bug fully so that it will be clear when the workaround can be removed. There are also some function arguments that have diﬀerent names in diﬀerent places. while computers get cheaper all the time. So some things that should logically be object-oriented are not. S does have its faults. the time to adapt the code to new uses. POETICS are working around a bug.6 Critique of S I give S high marks for having eﬀective abstractions and providing the ability to create further abstractions. This is important—I have several comments scattered through my code that inform of a workaround. Although you can specify that a particular name should come from a speciﬁc place. then you should consider rewriting the code so it is simpler rather than adding comments. some functionality is duplicated. some functions use formulas while similar functions do not. it is an important component of the expense. but do not contain enough information to know when the workaround can be abandoned. you merely need to be careful about naming objects. One is that the name-space is easily cluttered. All of this can frustrate a new user. Burns v1. The actual problem also includes the time to program the solution. and so on. Another weakness of S is that it has a lot of inconsistencies and redundancies. Almost all of this is because S has changed over the years. and so on. Often this optimization is too narrowly deﬁned as minimizing the memory and time usage of the code as it is run. 2. However. though. Remember that humans are expensive (and easily frustrated or bored). Minimize expense A programmer’s job is to minimize expense. If your code needs a lot of commenting to make it understandable.0 .

even though it was written in the late Paleolithic.2. THINGS TO DO 37 2. An example of his wisdom is that it is often useful to picture what a solution would look like. Write help for each of your important objects. but the techniques are suitable for any problem solving. A wonderful illustration of their spirit is that some of the examples of poor programming in the second edition are their code from the ﬁrst edition.7.8 Further Reading The books Programming Pearls and More Programming Pearls by Jon Bentley are very good at encouraging good programming. which I paraphrase: Rule of Style 1 If you have nothing to say. decide how to make it better. Find a beautiful S function (rep is not an allowable answer). The most memorable part of the book for me is his rules of style. It uses a form of Lisp as its language of choice—most of the examples could be translated into S without undue contortion. Structure and Interpretation of Computer Programs by Abelson. Here is a sample: “The fact that a four line comment is needed to explain what is going on should be enough to rewrite the code. Sussman and Sussman has abstraction as its main theme. It can be very useful to document non-functions as well as functions. and then work backwards from there. The Elements of Programming Style by Kernighan and Plauger has much wisdom that is still applicable. Then revise the functions so that as much commenting as possible can be eliminated. Put comments into your functions so that they are completely explained. An even better reason is that the comment is wrong. This book talks about how to solve mathematical problems. Burns v1. S Poetry c 1998 Patrick J.7 Things to Do Take a complicated function (preferably one of your own) and abstract the portions of what it does into several subfunctions. then don’t say it. 2. What is good about it? What would be good functionality for a function named fly? For each S function available to you. Repeat this on the same function as many times as you can. When real conceptual trouble strikes.” They often present some published code and then show how it can be improved. performing the abstraction diﬀerently each time. I occasionally pull out How to Solve It by George Polya.0 .

38 Rule of Style 2 If by some miracle you have two things to say.9 5 6 Quotations William Blake “A Poison Tree” e. cummings “In Just-” 7 Theodore Roethke “Moss-Gathering” 8 Robert Frost “Birches” 9 Edwin Arlington Robinson “Mr. Burns v1. 2.0 . e. Flood’s Party” S Poetry c 1998 Patrick J. say one thing and then the other.

3. but the usage is established. Then it turns to subjects more directly related to programming. (A better name would be “search path” since it is not an S list.Data") 39 . You can add to the search list with the attach function. Typically you give it a character string which names a directory containing S objects: attach(". for a more truthful explanation. When S needs to ﬁnd an object. This is a simpliﬁed version of how S searches for objects. then it looks in each database in the search list in turn. see page 210./other_directory/. So an object named x that lives in database 2 will mask an object named x that lives in database 3. This chapter starts with topics concerned with the use of S in general— databases and their organization. Once it ﬁnds an object with the correct name (and possibly the correct mode).. naming objects. it uses that object.Chapter 3 Ecology In which peaceful relations are encouraged.) You can see what databases are currently on the search list with the command: search() The ﬁrst database on the list is called the working database.1 Databases S has a search list where it looks for all of the objects that it needs. You need to have write permission for the working database in order to have S happy with you. and this is where assignments are written (it is possible to make assignments elsewhere).

loc is set. > x1 Error: Object "x1" not found > jjx <.2 Object Names When I ﬁrst came to S. If you want to add libraries of your own. do: library() To get some more information about a library. I don’t recommend using a data frame or a list as the working database. there are some superﬁcial bugs when lib. x2=letters[1:9]) > attach(jjx) > x1 [1] 1 2 3 4 5 6 This can be an excellent way to organize computation in some situations.loc object that is a character vector of the pathnames of the directories where libraries may be found. I was thrilled that the objects I created remained permanently—even after I quit S. but an optional argument allows you to select where in the search list the new one is to fall. then S-PLUS allows you to create a lib. Also the user does not need to know the path of the directory of functions.0 .First. Usually you give detach the number of the database that you want to disappear. 3. This was in contrast to systems where I needed S Poetry c 1998 Patrick J. ECOLOGY This will place the new database in position 2. Burns v1. In (most) versions of S. DANGER. you can do a command like: library(help="examples") and do library(examples) to attach the library.40 CHAPTER 3. however. I don’t know of any that are harmful. use detach. The columns of data frames and the named components of lists are the objects within the database. Libraries are mildly special directories of S objects.list(x1=1:6. To see what libraries are available to you. To remove databases from the search list. You can attach lists and data frames to serve as databases. In version 3 of S the only real diﬀerence between a library and an ordinary directory of S objects is that the .lib function will perform some startup actions when the library is attached.

time. Functions are small so throwing them away buys you very little disk-space. With proper naming conventions for your objects. Likewise. ye Mighty. but at some cost. it doesn’t appear in the words of any natural language I’m likely to learn soon. but does something slightly diﬀerent. and it can be thought of as deriving from “junk”.0 . By the time my directory contained a few hundred objects. permanent data are objects other than functions that you want to keep long-term. Other naming conventions that I’ve seen include starting the object name with foo. I start the name of a temporary function with fjj and start the name of other temporary objects with jj. or using triples of letters. Permanent functions are functions that you have written that you will continue to want to use. except that it returns 3 numbers: the total computer time. I can say: objects(pat="jj") To exclude the temporary functions. OBJECT NAMES 41 to explicitly state what I wanted to save before quitting. The problem with the ﬁrst three is that it is conceivable that the name of an object you want to keep may sneak into the convention. I began to learn the price to be paid for this convenience. or to be able to quickly decide what to throw away when the system administrator yells at you for hogging all of the disk-space? Look on my works.3. Some people might want to add a category of derived data. Temporary objects are those that will not aﬀect you adversely should they disappear.unix. I ﬁnd it very useful to diﬀerentiate between temporary functions and other temporary objects. permanent data. If I want to list all of my temporary objects. I say: objects(pat="^jj") (The “hat” symbol indicates the beginning of a word. you can ease the burden of object proliferation. Here’s my reasoning: jj is easy to type. I have a function called p. Wouldn’t it be nice to be able to discover the name of any particular object you want. tmp or junk. temporary functions and temporary data. but a matrix of numbers even with distinguishing dimnames soon becomes a mystery. The problem with the last is that it is not convenient to list them. A convention for temporary objects is most important.) I can remove all of these with remove(objects(pat="^jj")) S Poetry c 1998 Patrick J.time which is basically the same as unix. and despair! 10 The types of objects that you create can be divided into ﬁve categories: permanent functions. Burns v1.2. to represent data that can be reproduced. modiﬁed system functions. it seldom appears in abbreviations. For example. Functions also tend to have a longer useful life than data—I can read a function and know what it does. What I call modiﬁed system functions are those that are similar to an in-built function. the clock time and the memory size.

date") It is good to have a naming convention for modiﬁed system functions. then learn emacs. 138)] remove(jj) Note that jj may be one of the objects removed. and then wrap the call to remove around the same call to objects. and to re-execute them.matrix for example.summary reports various information on objects: > objects. Burns v1. S Poetry c 1998 Patrick J. For command line editing in S. f. the best method is to use S-mode in emacs. objects. Admirers of S-mode are many and fervent. and we do not want the dataset date to be part of the output. If you already know vi well and you have S-PLUS. what=c("data.mode extent fjjmem1 function function 3 fjjmem2 function function 3 fjjmem3 function function 3 fjjmem4 function function 3 jjm matrix integer 3 x 1 jjmat matrix double 5 x 5 In the example above we ask for objects in the working database (the default) that have names containing “jjm”. to the system name. "extent") ) data. Below we want to get all of the functions in the working database (and nothing else). then an alternative is to use the e ﬂag when you start S-PLUS: % Splus -e You can then hit the “escape” key.42 CHAPTER 3.summary(pat="jjm". This means that text editing is involved. If you typed the last statement but missed typing the second “j”. If you know neither. A more complicated removal might be handled like: jj <. ECOLOGY This brings us to an advertisement for some form of command line editing. A safer approach is to look at the result of the objects command. + "storage. and sort them by date: objects. On Unix there are essentially two choices for an editor—emacs or vi.class".objects(pat="^jj") jj <.summary(mode="function". Command line editing is the ability to recall past commands. 9.class storage. order="dataset. go back into past commands and edit them.0 .mode". In S-PLUS. then you might well be sorry. That’s okay—suicide is allowed. possibly after moodifying the command.jj[-c(4. A method I’ve seen is to append f.

out = F) } CODE NOTE.matrix written by x when x uses fy (got that?).exit(unlink(filenames)) dput(get(x.")) on.matrix functions in a group. there are circumstances in which it is decidedly valuable to avoid computations. so I opted not to do this. the line could be changed to: if(nargs() < 3) loc <.character(x) || length(x) > 1) x <. newlab). num = T) This uses find unless all three arguments are passed into the call.tempfile(paste("database". then there is no need to do the find command. sep = ".matrix functions. Burns v1. This also makes it easier to direct questions and comments to the right person."new" filenames <. We would have x. filenames[1].deparse(substitute(x)) loc <.3.find(x."old" if(is. However.na(c(old. Person x uses function fy written by person y . filenames[ 2]). num = T) if(any(is.old else oldlab <.new else newlab <. where = old).matrix written by y . old = loc[2]. The worst type of error might occur—everything looks ﬁne. filenames[2]) unix(paste("diff -c ". if you are in a group of S users and functions are likely to be shared. Consider the situation of two or three f. elaboration is politic.0 .numeric(old)) oldlab <. but the answer is wrong. filenames[1]) dput(get(x. S Poetry c 1998 Patrick J. OBJECT NAMES 43 This seems ﬁne if you are an isolated user. but fy sees the f.matrix instead of two f. A solution is for each individual to have their own preﬁx to use. function fy uses the f.numeric(new)) newlab <. where = new). Taking that into account. If both old and new are given.2.find(x.matrix and y. c( oldlab. "". the probability that all three arguments will be given is inﬁnitesimally small and the find call is not expensive. In this case. The diffmask function will tell you if and how two objects of the same name diﬀer: "diffmask"<function(x. new = loc[1]) { if(!is. However. new)))) stop("need two locations for object") if(is.

evaluates a call to it. bad. I use diffmask to see the current modiﬁcations. but—barring extraordinary circumstances—you will not change it in all of the other directories. Burns v1. ECOLOGY I ﬁnd diffmask quite useful—I wrote it for myself. You don’t want to copy the same functions into each directory (bad. not the book.q referred to in Shere might look like: . A . but you can have one directory where your general functions live. Suppose that you do copy a function into several diﬀerent directories. Here is a Unix script to set up an S directory in the current directory. Often I have a local version of a function that is also somewhere else on the search list because I am modifying or debugging the function. This is the redundancy problem once again. there comes a time of too much. Now suppose that you discover a bug in the function as you are using it in one of the directories.Data . Even without bugs.First function is used to perform tasks as an S session starts. extra features may be added to some versions of the function so that you end up with several similar functions that are non-trivial to get back into sync with each other.0 .q ﬁle contains a solution to the main problem with dividing your S objects into separate directories. mkdir mkdir # you Splus .First <S Poetry c 1998 Patrick J. it checks to see if there is a .Data by starting S from the directory.First function in the working database. and give the command Shere to Unix. Just before S gives you the ﬁrst prompt of the session. The most common solution is to create separate S data directories for the various projects you work on. 3. bad).Help may need to change the next line < $HOME/pS/dotFirst.44 CHAPTER 3. Almost everyone has a set of general functions that they always want to have available. and always attach that directory as S starts. and if so. After which you can use the new . but I haven’t yet discovered a good use for it. You will ﬁx the bug where you are working. It is likely that you will need to rediscover the bug in one or two more directories before the bug is extinct.Data/.3 Divide and Conquer No matter how organized your naming convention.) Here is what that dotFirst. then you merely need to go to the directory where you want to run S.Last.q Assuming that the script is called Shere and is executable. You may be wondering why I dislike the idea of copying functions around. The dotFirst. (There is also a . but on a diﬀerent scale.

) { in. and some of the options are directed at the S language itself.frames. then that line needs to be changed—in S-PLUS a statement like getenv("HOME") could be part of the solution. with freedom comes responsibility. Burns v1. There are quite a number of options each of which tends to be recognized by one or more functions. However.privateFirst() cat("the answer to 2 * 2 is". "\n") } 45 The statement involving . It is just that now we are attempting proper abstraction of groups of S objects. where = 1)) . Under this scheme you use ..privateFirst allows each individual directory to have its own start up actions.First. then you get a new option and the old option still retains its old value.. listed below. You may have noticed that abstraction is still the operable concept. You can always use soptions in lieu of options. In S-PLUS there is also a ..names <.privateFirst like you would normally use .3. The exact mechanism is not very important. objects may be assigned. The function soptions (as in “safe options”)./pS/dotFunctions") cat("attaching dotFunctions\n") if(exists(".0 . For example. Note that the attach statement presumes that all of the places where S is to be used are on the same level of the directory tree.4.)) S Poetry c 1998 Patrick J. and libraries attached. ENVIRONMENT function() { options(error = dump..size = 20000000) attach(". What is important is the eﬀect of dividing your objects rationally while maintaining convenience. object.local that can be used at a site so that a set of startup actions is performed for all of the users. When you intend to change an existing option but misspell it. "soptions"<function(.privateFirst".names(list(.. You can add your own options—there is no distinction between the options that come with S and those that you create. If this is not to be the case. 3. gives a warning when you are adding an option so that you will realize when you’ve made a mistake.4 Environment Your S session is controlled by a number of options that can be inspected and changed by the options function. 2 * 2.First.

Auto. oldnames. ECOLOGY oldnames <. This is a safety feature so that blunders along the lines of 1:Inf are not taken seriously. "\n") S Poetry c 1998 Patrick J. > cat(pi.names(options()) newnames <.names[!match(in. The object. paste(newnames.First.size option limits the size in bytes of an object when it is created.. The digits option controls the number of signiﬁcant digits that are printed. then no warning would have appeared and the user would probably not realize for a while that the option was not changed. "))) } ans <.names.141593 > cat(format(pi). Burns v1.in.size function that returns the size in bytes of its argument.46 CHAPTER 3.options(. See the options help ﬁle for more information on these options and to learn about other options.print) ans else invisible(ans) } An example of when this is useful is: > soptions(warning=2) Warning messages: new options added: warning in: soptions(warning = 2) The actual option is named warn and what should have been typed was: > soptions(warn=2) If the options function were used instead of soptions. Some of the more important options are now discussed. collapse = ". so this is a good candidate to be changed in your .) if(. nomatch = 0)] if(length(newnames)) { warning(paste("new options added: ". In many versions of S the default value is rather small.. "\n") 3.14159265358979 > print(pi) [1] 3. There is also an object. This example also points out a weakness of soptions—there will be no warning message if the warn option is not set properly.0 .

141593 > options(digits=14) > print(pi) [1] 3. The error option gives the action to be performed when an error during evaluation occurs. You can change the prompts that S gives by using the prompt and continue options. See the debugging chapter to learn why you care.calls and dump.frames()) # WRONG Your two main choices for error are dump. The two extremes are -1 to suppress warnings. but cat does not. it should typically be 80 when printing to a terminal. while the latter returns the stack of calls plus the state of each variable in each call at the time of the error. If you only want the column labels printed once at the start.frames) or options(error=expression(dump. as does format. want the prompt to include the name of the current directory. The former returns the stack of calls at the time of the error. You may.frames. This can be either a function (with no arguments) or an expression of a call to a function. "\n") 3. Burns v1. then set the length option to some large number.14159265358979 47 From this example you can see that print obeys the digits option. The warn option controls what happens when a warning is encountered. After an S session starts up. The ignore. ENVIRONMENT 3. and is used. The width and length options also concern printing. width gives the maximum number of characters to put on a line. for instance. and it is set at 55 for listings in this book.3.1415926535898 > cat(pi.frames())) but not options(error=dump. An object that is in the hash table is found quicker than objects that are not in it. each object that is used is tested to see if it should be put in the hash table.0 . for example.4. length speciﬁes the number of lines per page. The keep option states what should be kept in a certain hash table. to decide how often to print column headings when printing a matrix. and 2 to convert warnings to errors. So you can say options(error=dump.error function on page 215 is an instance where it is natural to set the error option to NULL. For example. Usually keep is set to "function" so that functions S Poetry c 1998 Patrick J.

1) S Poetry c 1998 Patrick J.1 4.1 2.4. keep can be set to NULL to disable hashing entirely. The graphics parameters mostly have names consisting of three characters.1 > par(c("mar". Some other more esoteric options that can be useful are expressions.0 .par)) old.1 4.1 2. You can set keep to "any" which will put all objects in the hash table. DANGER. Burns v1. See page 219 for a discussion of how options are implemented. but it controls the behavior of S graphics. echo and interrupt. the options and par functions do have diﬀerent conventions for the return value. In contrast.exit(par(old. options always returns a list.par <. This is generally a bad idea because you will soon use lots of RAM. par returns the value of the parameter. when a graphics parameter is being set.1)+. ECOLOGY and only functions are hashed. However. See the discussion of synchronize on page 103 for more on this hash table.1.48 CHAPTER 3. This could be used if you are going to access a large number of functions only once.1 4. Going the other way.1 $cex: [1] 1 When querying the value of a single graphics parameter. so the meaning of a graphics parameter is not completely discernible from its name—take this as a lesson of how not to name objects. You may have noticed that a function runs slower the ﬁrst time it is called in a session than subsequently. "cex")) $mar: [1] 5. Although generally analogous.par(mar=c(8. The par function is similar to options in many respects.1 4. but it can be a useful strategy to temporarily use this setting to hash some objects that will be accessed numerous times in the session. and only returns a list of values when two or more parameters are queried. then par does return a list (invisibly) so that the following idiom will work: on. > par("mar") [1] 5.

0 . Unix Power Tools by Peek. you will need to use an SCCS command before using source or its relatives. An object named my. SOURCE CONTROL 49 3. nothing needs to be added to ﬁles put under SCCS).5 Source Control It is wise (and more relaxing) to have your code under a control system. O’Reilly and Loukides (1993) has a brief introduction to SCCS (despite what they say. C and Fortran sources. The idea is that you have a history of versions of the code (or whatever) so that you can recover any particular version.rationalnum") If you want to replace the current deﬁnition of a function with the last version (or if you accidentally removed the function).q ﬁle. I have found it best to try out my changes to a function rather than immediately creating a new version. The two S functions called sccs and diffsccs make it trivial to put S objects under SCCS control.fun is put under control with the command: sccs(my. Details of the use of SCCS are available in the man page. Believe me. then using the xname argument to sccs and diffsccs is essential. Here is an example: diffsccs("/. To get a version farther back. My usual method when I am about to edit a function is to do: diffsccs(my. such as RCS.q which is a dump of the function. it is good to control help ﬁle. The main reasons to use such a system are: the new version that you just wrote may not work right so you want to go back to what did work. Furthermore there is a related ﬁle inside the SCCS directory to keep track of the versions (done in a memory eﬃcient manner). For these you need to learn more about SCCS. but there are others.5. I will talk about SCCS. the last point is no small bonus.3. "Div. and then edit the function.rationalnum".fun) to see if what is there is diﬀerent than the last SCCS version. you need to reproduce the state that existed at some speciﬁc time in the past. then I use sccs on it. The only preparation is to make sure that you have SCCS available on your machine and to create a subdirectory called SCCS in the directory where you want the control ﬁles to live. SCCS is an acronym for “Source Code Control System”. Start S in this same directory. If the S object has a name that is not nice for a ﬁlename. Burns v1. In addition to putting S objects under source control. Here are the listings for the two functions: S Poetry c 1998 Patrick J.fun. You may ﬁnd it useful to write some simple scripts or aliases to perform the tasks with names that make sense to you. you are protected from accidentally destroying what exists. If it is.fun) This creates a ﬁle in the current directory called my. hence the reason that SCCS is often not up-to-date. then use source or restore on the .

dd <.!is.paste(xname. xfile. out = F) } hcom <. xname = x.dump(x. ".dd) unix(paste("sccs delget". "q". sep = ".deparse(substitute(x)) do.") } else { xfile <.dd) { if(do. already. out = F)) warning(paste("file ". xfile). xname. xfile.d".character(x)) { if(length(x) > 1) stop("only one at a time") } else x <. out = T)). xfile. S Poetry c 1998 Patrick J. do.0 .") } if(new) { ok. xfile.out) unix(paste("sccs edit". new = !length(unix(paste("ls".dd) { data. xfile) } } # start of main function if(is.function(x. xfile) } else { dump(x. sep = "") if(unix(hcom.paste(xname. do. sep = "").d not found.dump(x.dump(x. ". do. ECOLOGY "sccs"<function(x. xfile).dd) unix(paste("sccs create". xfile).". out = F) } else { if(!already.dd) { xfile <.function(get(x)) if(do.out = F) { ok. out = F) ok. xname. "Q". out = F) unix(paste("rm . xfile).50 CHAPTER 3. sep = ". do you have a help file?".paste("test -f ". Burns v1.dump <.

The ending “. "diffsccs"<function(x.function(get(x)) xfile <. sep = "") invisible(unix(cmd. Actually this is misnamed since it doesn’t depend on SCCS at all. CODE NOTE.". It checks the diﬀerence between the current value of the object in S and a representation of it in a ﬁle in the current directory—it presumes that the ﬁle is not being changed behind the back of sccs. The function has two diﬀerent ways of testing for the existence of a ﬁle.0 . xname.dd <.s” is sometimes used for assembly code. but it is mainly a bunch of decisions. This looks messy.q” in the ﬁle names comes from QPE (S’s nom de guerre).character(x)) { if(length(x) > 1) stop("only one at a time") } else x <. suffix. The main substance of the function is knowing the SCCS commands.dump is an important feature.deparse(substitute(x)) do. Burns v1.q" } cmd <.dd) { data. The “. Most of the trickier parts will be covered in the next chapter.3.". SOURCE CONTROL sep = "")) invisible(xfile) } 51 CODE NOTE.!is.dump(x.Q" } else { dump(x.tempfile("dumpsccs") on. especially in a single function. xfile) suffix <. " ". xfile) suffix <. out = F)) } CODE NOTE. I advise you to pick one way of doing a task. xname = x) { if(is.paste("diff -c ". This is done for the purpose of demonstrating options. but switching between dump and data.5. S Poetry c 1998 Patrick J.exit(unlink(xfile)) if(do. xfile.

Here are some uses of verify. Implications that the documentation makes should be tested. A step in that direction is knowing that today’s answers are the same as last week’s. You have moved to a diﬀerent version of S. I think it is good to know these distinctions. The structure of the code may imply areas where it is most vulnerable. The test should deﬁnitely include speciﬁc examples in the documentation (and hence it becomes a test of the documentation also). That is. When you write a regression test. white box and regression. Tests for software are usually given less than their due. reincarnation deﬁnitely exists for bugs—it is not at all uncommon for a bug that had been ﬁxed to reappear. Also the bug may be an indication of a general weakness that the test might guard against. It seems ludicrous. Testing ranges from formal test suites (like those that verify runs) to the completely informal. Regression tests are tests of bugs that have been ﬁxed. The informal testing should be fodder for the formal tests. it is important that you test that the code is right. tests may be based on weaknesses or common misuse of some of those functions. Documentation of the code—especially if written by some one other than the programmer—is a good source. You are making your code publicly available and it is desirable that the user has some assurance that they are getting good answers on their setup. not that you don’t get a particular wrong answer.52 CHAPTER 3. and you want to ensure that you get the same answers.6 Test Suites Correct answers are nice. try to get the test to hit as much functionality as feasible. First oﬀ. A particularly vivid example is that the person may want to know if the code works under both S and R. you want the test to be: 2 * 2 == 4 not 2 * 2 != 5 # poor test S Poetry c 1998 Patrick J. or a diﬀerent version of the operating system. Given knowledge of which S functions are used. probably the informal testing of functions as you write them is the most important testing that is performed. though it is not necessarily clear into which category a particular test falls. Although I think it is very important to have formal test suites for functions that you depend upon. If there is a speciﬁcation for the code. You have made some eﬃciency improvements to your functions. The verify function—given below—does this. but these are actually very valuable.0 # good test . I’ll now discuss three types of tests: black box. or a diﬀerent architecture. The tester merely knows what the code is supposed to do. ECOLOGY 3. White-box testing means that the tester has knowledge of the code. Test the most common uses. The name “black-box test” comes from the idea that the person writing the test does not know anything about how the code is written—code need not even exist when the test is drafted. Once that is done. Burns v1. then this is obviously a starting place for the tests.

Expert programmers are always skeptical. then a “garbage” input would have been the unquoted name of a function. Give the function garbage. arrays that have one or more dimensions only 1 long. For instance it is easy to get the index wrong when translating between S and C. This should give you some appreciation for the diﬃculty of testing a large piece of software.3. there are some general principles that are good to follow. 11 Let’s think about what sort of informal testing should be done for the sccs function listed above. we may well have written the sccs function so that it only accepted a character vector containing the names of objects. So a simple. Bugs like this are the result of some global variable not operating properly. zero length inputs. To do this. it should be inputs that a naive user might feed the function. We want to make sure that the same behavior occurs when the input changes between these two forms. or to get the size of something wrong by not including both ends. TEST SUITES 53 Although there is no truly eﬀective way to test software. singular matrices. Alternatively.6. S Poetry c 1998 Patrick J. Test along “seams”. Testing should be continuous. We may consider testing sccs thoroughly.out both TRUE and FALSE. it was important to make sure that soptions (page 45) worked when there were no argument names. This should not just be random garbage.0 . possibly with character data as a special case. of course. Another potential trouble spot is when the name of the object is weird and we need to use the xname argument. We would want to test it with new being both TRUE and FALSE. The ﬁrst thing. etc. A class of bugs to look for—with either black or white box tests—is “one-oﬀ” bugs. ’tis our turn now To hear such tunes as killed the cow. is to make sure that it works at all—so test it on a function that hasn’t been put under control. and speciﬁcally that a character vector of length 1 can be properly put under control. Another class of bug is “ﬁrst-only”. and with already. Sometimes a routine will work only the ﬁrst time it is used in a session. but work subsequently. it might not work the ﬁrst time. For instance. For example. Just what this means depends on the situation. If we had written it that way. Burns v1. but it can mean values of 1 or 0 when positive integers are expected. One potentially ﬂaky spot is that we allow as input both a character string containing the name of the object and the name itself. then change that same function and use sccs again to make sure that it updates okay. we would want to test it on objects with and without troublesome names that were functions and non-functions. little function that doesn’t do very much is going to require on the order of 10 test cases to cover all possibilities. they forever question if everything is making sense. Still along this line we should look at what happens when the object that we want to put under SCCS control is character.

The other method of verify is for objects of class verify and it compares the current answers to the original answers to see if they have changed (signiﬁcantly). ECOLOGY verify is a generic function with two methods. To start. Burns v1. Names are optional for the commands—but convenient when there are a large number of commands in the test.sin. Test. and the second is a list of the objects to be used in the commands.x : 1992 Machine: julia Date: Sat Aug 24 12:08:37 GMT 1591 Once you have this original test object. SunOS 4. plus the name of the test or its index number: > jjverif2 <.verify(c(sin="sin(x)". It is mandatory to have names for the second argument. where each string is an S command. The default method expects a vector of character strings. Test. The default method returns an object of class "verify".x : 1992 Machine: revery Date: Fri Dec 10 12:07:37 EST 1830 The answers from each test are available as input to subsequent tests by the name of Test.sin. SunOS 4.54 CHAPTER 3.ran)"). we give verify two arguments—the ﬁrst is a character vector of commands. ran="runif(9)"). ran="runif(9)".1 Release 1 for Sun SPARC.ran)" Names of data: x Specifics: version: Version 3. + list(x=1:4)) > jjverif Passed: sin ran NA NA Commands: sin ran "sin(x)" "runif(9)" Names of data: x Specifics: version: Version 3. In the test below.verify(c(sin="sin(x)".0 . give verify a character vector containing the commands that you want to test. > jjverif <. list(x=1:4)) > jjverif2 Passed: sin ran more NA NA NA Commands: sin ran more "sin(x)" "runif(9)" "outer(Test. + more="outer(Test. The result of the new call to verify will S Poetry c 1998 Patrick J.1 Release 1 for Sun SPARC. Both objects given as arguments have names. you can use it as an argument to verify in whatever new environment you like.

2. 16))) Test. 1997) Test. Create a clean directory with nothing in the .2 * Test. .08. rep(variousnum.3.loan _ loan(2000.loan.transcribe _ transcribe("state.name". "trig") > source("poet.q") As you write this ﬁle.Data.3 Release 1 for Silicon Graphics Iris.ratnum Test. you need to make the character vector to feed to verify.10) Test. At this stage you have the opportunity to see if the results are as they should be.something. and if so. IRIX 5.". with each one assigned to Test.6.verif. Make a copy of the ﬁle so that you retain the original (to make additions). Burns v1. rep(100.5)) Test.ratnum . then how. testing transcribe is also a test of perl because transcribe calls perl.". There is the task of creating the initial vector of character strings of commands. > !head poet. 2. > verify(jjverif) Passed: sin ran T T Commands: sin ran "sin(x)" "runif(9)" Names of data: x Specifics: version: Version 3.numbase _ numberbase(-9:9. The suite as a whole should test all of the signiﬁcant functionality. TEST SUITES 55 tell you if any of the answers are diﬀerent. you can use source to evaluate the ﬁle of statements. I’ll give the method I’ve used to do this. Write a ﬁle containing the commands that you want to test. rep(16.0 . Once you are pleased. it may be that you decide that the old tests were wrong and that the new one is giving the correct answers. For example. and edit the copy by taking out the assignments."_") Test.op _ Test. Using an underscore S Poetry c 1998 Patrick J.verif.pg1 _ polygamma(1:10. Objects that are to be passed into verify—like variousnum in my example—are assigned in this clean directory. At some point. and anything that seems prone to being wrong.ratnum _ rationalnum(variousnum.ratnum.loan _ update(Test.update.2 : 1995 Machine: azcan Date: Thu Oct 2 12:14:09 EST 1879 The commands that are in your test suite should each cover as much functionality as possible—just as a line of good poetry packs the maximum amount of meaning.q Test.

1./spo. + sep="\n") > names(jjpv) <.verif.var sym. > print(poet.x : 1992 Machine: plum Date: Mon Sep 17 08:37:51 EST 1883 So now poet. SunOS 4.grab > q() Now.default.jjvn > poet.test/poet.init stack.q poet.int lagrange NA NA NA NA NA NA global. Burns v1. go to the home of the code so that the veriﬁcation object will live in the same directory.init que.test/.0 . > !cp poet. work seamlessly.pop NA NA NA NA NA que. I added a cat statement in S Poetry c 1998 Patrick J.form twice. list(variousnum=get("variousnum".scan(".grab".verif. This really was done under S-PLUS version 3.verify(jjpv.verif <. but the test will system terminate using the usual verify. what=""./spo.verif is the veriﬁcation object that users of the code can use at their whim with a command like: > print(verify(poet. > jjpv <..verif. short=T) DANGER.grab > # edit poet.1 Release 1 for Sun SPARC.56 CHAPTER 3. You can use another copy of the ﬁle to get the names to give to the commands. You can then try this out on verify to see how it goes.verif. It will.Data"))) Scan the commands into S so that each command is a separate element of the resulting character vector. ECOLOGY makes it easy to do the editing as long as any assignments within the test proper are not underscores. of course.verif).2 NA NA NA NA Names of data: variousnum Specifics: version: Version 3.loan numbase ratnum ratnum.pop quad.addr symsqrt stack. short=T) Passed: transcribe loan update.op NA NA NA NA NA NA pg1 pg2 pg3 digam1 digam2 digam3 digam4 digam5 NA NA NA NA NA NA NA NA expint1 expint2 mathgra getpath line..verif. + where=".

seed)) { old.seed <.random. sep = ".seed <<.seed } n <.seed") if(length(random. you can do: verify(poet.verify.seed on.attr(x. This bug was ﬁxed (I believe) in 3.all.. Burns v1. and it ran ﬁne.1:n for(i in 1:n) { if(length(data)) { ans[[i]] <." )]] <.length(x) data <. if you do not have an ANSI C compiler.exit(. ans[[ i]]) data[[paste("Test".names(x) if(!length(tnam)) tnam <.Random. Symptoms are that strange things happen in a loop. For example.3.2.attr(x.equal(x[[i]]. the verify method for objects of class "verify". TEST SUITES 57 the loop. "random. and changing something like putting in calls to cat make it disappear or change.eval(parse(text = commands[i]).6.vector("list".old. "commands") random.seed) .seed <<.verif[-28]) to remove the one test that demands this. "data") passed <. "verify. tnam[i]. The default method is similar. n) tnam <.xnam <. There is also subscripting of verify objects so that you can run only part of a test.0 . though it has an extra argument for data.Random.ans <.verify"<function(x) { commands <.seed <. data) } else { ans[[i]] <.eval(parse(text = commands[i])) } passed[[i]] <. Below is the listing for verify.attr(x.ans[[i]] } S Poetry c 1998 Patrick J.Random.

0 .unlist(passed) names(passed) <. I have a personal function that reports the total computer time and the elapsed time—I ﬁnd that more useful. So.time(rationalnum(1:10. ECOLOGY if(all(unlist(lapply(passed. the third number is the elapsed (wall) time rounded to the nearest second. An important feature is that it takes care of getting the random seed right if the test involves any functions that use the S random generator. adds each result to the list of available variables.list(version = if(exists( "version")) version else NULL. 3)) S Poetry c 1998 Patrick J.7200012 0. passed = passed. unix.7300014 0. it will be faster (and probably more memory eﬃcient) to name the object. Just give the command you want to time as the argument to unix. mode)) == "logical")) passed <. so assignment merely adds the pointer for the name. in general.0000000 0. The object is already there.7 Time Monitoring There is one operation in S that is essentially instantaneous—assignment.length(data) .0000000 > unix. 3)) [1] 0. if an object is used twice or more in a function. data = data.0000000 > unix. Burns v1. and compares the current results to the old results.time.xnam length(data) <. date = date()) attributes(ans) <.00000000 0.00000000 > unix.0000000 0.1600001 9.time(for(i in 1:100) rationalnum(1:10. class = "verify") ans } Mainly this function loops over the commands in the test suite. A simple tool in S to monitor the time used by function calls is unix. The last two numbers are zero for almost all commands. commands = commands.00000000 0.58 CHAPTER 3. specifics = specifics.1499999 8.time(for(i in 1:100) rationalnum(1:10.seed = random.time.0000000 0.00000000 0. 3)) [1] 3.time returns ﬁve numbers—the ﬁrst two are diﬀerent kinds of computer time and the last two are computer times for subprocesses. evaluates them. When I do count the clock 12 > unix.n specifics <.list(names = xnam. machine = unix("hostname"). random.time(for(i in 1:100) rationalnum(1:10. 3)) [1] 3.0000000 0. 3.04000092 0.seed.

so we seem to be getting accurate timing (though 4 seconds is a little slim still). give it a vector of character strings naming the functions that you want to monitor. "c". In order to get timing that is at all accurate. There are poor uses of this function as well as good ones—the “Code Tuning” chapter of Bentley (1986) discusses the issue from the point of view of a language like C. For instance: > interlude(c("qr". The function is rather involved. such as disk access time. and have similar timing. The interlude function monitors the number of calls to and the execution time of selected functions.0000000 0. The 30-dimensional problem then ran in about 30 minutes. This is useful to see where the time goes in long calculations. but it took eighteen hours to complete on a 30-dimensional problem when my calculations based on the 2-dimensional case showed that it should only take a couple hours. I used interlude to striking advantage in one case. This can be used either to see how much time particular functions take. Sometimes you have very little idea of how often various functions get called.0000000 59 In the ﬁrst call the total time is just a fraction of a second.1800001 8. You might notice that the elapsed time is a lot more than the computer time. Sometimes the elapsed time is greater even when the machine is doing nothing else.3. The usual trick is to use a for loop to call the command some number of times. This is often called “proﬁling”. In this case it is because there are other users on the machine. but an examination of the process showed that memory use was not excessive and paging was not the cause.0000000 0. you need times that are several seconds. To use interlude.7000008 0. so a discussion of the code for it is deferred until page 220. This implies that your command uses a signiﬁcant amount of time that is not being accounted for. He points out that it is important to get the big picture right before mucking around in the details. I wrote a rank. Then I put interlude on the case. "seq")) Warning messages: 1: assigning "qr" on database 0 masks an object of the S Poetry c 1998 Patrick J. The purpose in the ﬁrst case is often to see where the computation time is going with the view of speeding things up.7. It is feasible to think that there are cases where knowing the number of calls to some function can suggest a more eﬃcient algorithm. and it showed that rank was the source of the problem (there were lots of calls to it). or how many times they are called.0 . but that name is already taken in S. "rep".fast function that ignored missing values (there were none in my application) and ties (which were rare and unimportant). I immediately thought of paging as the reason. The last three commands are all the same. TIME MONITORING [1] 3. Burns v1. A function given a 2dimensional problem ran in a few minutes.

0 . The default is to remove all the functions. DANGER.00000000 0 NA c 0. type: > summary. The trace function uses this same mechanism—it will not work to use both trace and interlude on a function at the same time.lm(formula = freeny. S Poetry c 1998 Patrick J.interlude() total. 3.interlude whenever you like.x) Execute as many commands as you like. You can remove some or all of the functions from being monitored with uninterlude. For accurate timing.y ~ freeny.time qr 0.08333325 Use summary.8 Memory Monitoring The amount of memory that an S function takes to run is generally the most important measure of its eﬃciency. and when you want to see the results. It makes a mess. You do not want to edit functions that are under interlude since you will get the interlude changes mixed into your copy of the function. The amount of memory a function uses depends—sometimes heavily—on the state of memory at the time that the function is called. DANGER. you should avoid monitoring functions that are used by other functions being monitored. ECOLOGY same name on database 5 2: assigning "c" on database 0 masks an object of the same name on database 7 3: assigning "rep" on database 0 masks an object of the same name on database 5 4: assigning "seq" on database 0 masks an object of the same name on database 7 > jj <. Burns v1. Seemingly trivial changes to a function can sometimes have dramatic eﬀects on memory usage. It is also hard to understand.00000000 0 NA seq 0.01666665 9 0.time number.00185185 rep 0.08333325 1 0.calls mean.60 CHAPTER 3. interlude creates temporary versions of the functions that will last no longer than the present session.

y) The ans.lm. then there are a few things that you can do. The ﬁrst thing to try is to get out of S and start a fresh session—the command may work since the new S session is using less memory.xmat <.3. Burns v1.0 .x2) and you are running out of memory. The lm function does quite a lot of data manipulation before it gets to the actual computation of the regression. There is one idiom that is guaranteed to waste memory. Before the for loop. huge.x1. and you want it to be memory eﬃcient.alt <. ans grows in size so it generally needs to move from where it has been living to a new location. If this doesn’t work. then see if you can break the task into pieces that can be performed individually. huge. Here is an example: # AVOID THIS ans <.y ~ huge.8. The problem is that this code fragments memory. This frees up the space it was using.c(ans. For large objects the alternative method will use substantially less memory. DANGER. that is not a problem—it works ﬁne. ans takes up a certain number of bytes (the size of the S header). The remainder of this section presumes that you have a programming task that will be done repeatedly. huge.NULL for(i in 1:n) { ans <. suppose that you are doing a linear regression > ans <. For example. and then call the computational function directly: > huge. The example above can be written much more eﬃciently: S Poetry c 1998 Patrick J.xmat. On each pass through the loop.qr(huge.cbind("(Intercept)"=1. but probably has all you need. but many objects will be too large for the hole so there will be numerous small holes in the memory which means that more total memory has to be used. It is the maximum memory that a function uses during the call that is important—it doesn’t matter if that usage is throughout the entire function or just at one line of it.lm(huge.x2) > ans.alt object is without a few of the ﬂourishes that ans has. sum(n:i)) } If you are worrying that this won’t work because on the ﬁrst iteration we are concatenating NULL with something. We can do the data manipulation by hand. MEMORY MONITORING 61 If you have a speciﬁc task that you are only doing once and you are running out of memory.x1 + huge.fit.

backwards = F) { nobs <. in this example the truly best way would be to ﬁgure out the formula.rev(iseq) for(i in iseq) { this.x <.209.x[1:i. To test the memory use in these two cases.dim(x)[1] ans <. If this is the case and the loop is not long.size()) } The fjjcomma function is just a call to the S-PLUS format function so that commas are placed in numbers.var(this. There are less callous ways of fragmenting memory.0 .384" S Poetry c 1998 Patrick J. > fjjloop function(x. ] this. and avoid the loop altogether. and they shrink in size when backwards is TRUE. then the data grows in size throughout the loop.sum(n:i) } CHAPTER 3.v) } fjjcomma(memory. Function fjjloop provides a simple example of a type of operation that is common in some of my work.x) ans[i] <. Burns v1. we issue each command as the ﬁrst in a session of S. But if the loop is long and you are working with large objects. then I just use the adding-on technique. > # in fresh session > fjjloop(jjbig) [1] "4.v <.) The key characteristic is that the data changes size throughout the course of the loop.numeric(n) for(i in 1:n) { ans[i] <. ECOLOGY Of course.62 # better ans <. you will want to ﬁnd an alternative.2 or later. When the backwards argument is FALSE.sum(this. then the second version will not work.2:nobs if(backwards) iseq <. It is also of importance that the order in which the iterations are performed can be changed.numeric(nobs) iseq <. (It needs S-PLUS version 3. When you don’t know what the ﬁnal length of the answer will be.

part %*% t(b)) } Here are the objects to test the function with. MEMORY MONITORING > # in fresh session > fjjloop(jjbig. b %*% a) cbind(part. S functions guarantee that they will not change the objects that were passed in as arguments (unless the programmer speciﬁcally programs that to happen). "\n") that are scattered through the code. Burns v1. memory. while fresh memory needs to be used when this.0 . You also need to use objects that are large enough (maybe on the order of a megabyte) so that you can see how many copies are being created. b) { part <. each time you call the function you need to have just started the S session (possibly followed by a speciﬁc set of commands).x can ﬁt where the old one was. Garbage is created in loops as objects are continually created and destroyed. The ﬁrst attempt is compact and simple to understand: fjjaugsym1 function(a.997. The primary diagnostic for memory use is statements like cat("at location 2. S Poetry c 1998 Patrick J. Often the main expenditure of memory is because there are numerous copies of objects. Suppose that we have two matrices A and B where A is symmetric and we want to get the larger matrix that is partitioned as: A BA AB BAB If these matrices are large.x is growing in size. One of the main things to do in managing memory is to try to reduce the number of copies of objects that are likely to be large. back=T) [1] "1. (jjbig is 100 by 100 in this example.rbind(a. There are two main reasons for memory growth in S—garbage and copies. but some garbage remains. S has a garbage compactor that recovers most of the memory.544" 63 Clearly the backwards order uses substantially less memory. To use these reliably.size().8.3. Let’s look at some possible deﬁnitions of functions to do this and how they use memory. then memory may be of concern. memory is".) The reason that backwards is superior is that on each iteration the new this. by the way. so each function tends to make its own copy of its arguments.

da[1]) dnb <.b %*% part dna <. oseq] <. nseq] <. The second attempt is signiﬁcantly longer.size(jja)) [1] "320.size(T)) [1] "4. leaving about 3 megabytes as “wasted”.124" Let’s do some accounting now.961.1:da[1] nseq <. "fjjaugsym2"<function(a. Burns v1. it is useful to create objects at least this large so that signiﬁcant changes in memory will be easily visible.dim(a) db <.1:db[1] + da[1] ans[oseq.fjjaugsym1(jja. If your function does anything at all. ECOLOGY When testing memory use.t(part) ans[oseq.dim(b) newsize <. oseq] <.size(jj)) [1] "720.size(jjb)) [1] "160. nseq] <. b) { da <.b %*% a ans[nseq.rep("". c(newsize.124" CHAPTER 3.part ans[nseq.array(NA. then there is going to be memory used in addition to the inputs and output—the aim is to keep the amount small. newsize)) oseq <.a part <. but the logic is still simple.dimnames(b)[[1]] S Poetry c 1998 Patrick J. and then do replacements.dimnames(a)[[1]] if(!length(dna)) dna <.124" > fjjcomma(object.sum(db) ans <.0 .64 > dim(jja) [1] 200 200 > dim(jjb) [1] 100 200 > fjjcomma(object.400" > fjjcomma(object. The idea is to create a matrix that is the size of the result. # New S Session > jj <.part part <. Inputs total about half a megabyte and there was about a half of a megabyte used before we got the ﬁrst S prompt. jjb) > fjjcomma(memory. So about 2 megabytes are accounted for.

However in this test.x). Some of the lines in the function can be permuted which may have consequences for memory use. Obviously the same amount is going to be written to memory—the issue is how eﬃciently memory is recycled. While memory isn’t cleaned up well within a function. it actually uses more memory—I don’t understand why.0 . though this value is seldom used.456" So this saves 1.470.size(T)) [1] "3. + fjjaugsym1(var(freeny. Here are the function deﬁnitions: S Poetry c 1998 Patrick J. db[1]) newdn <. MEMORY MONITORING if(!length(dnb)) dnb <. matrix(1:8. There’s a catch—the memory you save must more than compensate for any extra copies that you create when you increase the depth of the function calls.equal(fjjaugsym2(var(freeny.list(newdn. but I didn’t ﬁnd a better order than the one given. The subfunction principle and the NULL value for the loop were tried separately and together. A logical improvement to this latest version is to change the NA in the call to array to as. newdn) ans } 65 The ﬁrst thing to do is check to make sure that it does the same thing as the previous version: > all.double(NA). I didn’t try very hard.2)). all of the memory is released (except that used by the result) when the function exits. Loops in S return a value. Encapsulating most of the contents of a loop into a subfunction often helps relieve memory growth. This can be used to advantage at times in managing memory by putting some calculations into a subfunction.3.0 of S-PLUS eliminates the return value of loops. Below I show the results of an attempt (slightly unsuccessful) at reducing memory use of a test function. At times memory use can be reduced by making the return value of a loop be something small (like NULL) rather than the large object that happens to be the value. Burns v1.x).8. dnb) if(sum(nchar(newdn))) dimnames(ans) <.c(dna. Version 4. jjb) > fjjcomma(memory.fjjaugsym2(jja. matrix(1:8.rep("".5 megabytes over fjjaugsym1. This way ans will be initialized to be its ﬁnal size instead of half as big.2))) [1] T Now test the memory use with our large objects: # New S Session > jj <.

Burns v1.vector("list". "memory.x %*% x NULL } NULL } > fjjmem3 function(iter = 10. n).function(n) { S Poetry c 1998 Patrick J. n = 100) { subfun <. rep(5.x %*% x } NULL } > fjjmem2 function(iter = 10.0 .vector("list". n = 100) { ans <.size is". memory. memory. n = 100) { ans <. i.outer(rep(1. rep(5. ECOLOGY > fjjmem1 function(iter = 10. "\n") x <. i. "memory.subfun(n) } NULL } > fjjmem4 function(iter = 10.outer(rep(1. iter) for(i in 1:iter) { cat("i is".66 CHAPTER 3. n)) x %*% x } ans <. "memory.size(). n)) ans[[i]] <.size(). i.size is". n = 100) { subfun <.function(n) { x <. "\n") x <.outer(rep(1. n)) ans[[i]] <. n). rep(5. memory.size is". iter) for(i in 1:iter) { cat("i is".size().vector("list". "\n") ans[[i]] <. n). iter) for(i in 1:iter) { cat("i is".

3.8. MEMORY MONITORING x <- outer(rep(1, n), rep(5, n)) x %*% x } ans <- vector("list", iter) for(i in 1:iter) { cat("i is", i, "memory.size is", memory.size(), "\n") ans[[i]] <- subfun(n) NULL } NULL }

67

Here are the results using S-PLUS version 3.3 on a Silicon Graphics. Function fjjmem1 performed the same as fjjmem2: # > i i i i i i i i i i # > i i i i i i i i i i # > i i new S session fjjmem2(n=300) is 1 memory.size is 544584 is 2 memory.size is 6381384 is 3 memory.size is 7823176 is 4 memory.size is 7823176 is 5 memory.size is 9264968 is 6 memory.size is 10706760 is 7 memory.size is 12148552 is 8 memory.size is 12148552 is 9 memory.size is 13590344 is 10 memory.size is 13590344 new S session fjjmem3(n=300) # subfun is 1 memory.size is 544584 is 2 memory.size is 6401864 is 3 memory.size is 7843656 is 4 memory.size is 7843656 is 5 memory.size is 8564552 is 6 memory.size is 9285448 is 7 memory.size is 10006344 is 8 memory.size is 10727240 is 9 memory.size is 11448136 is 10 memory.size is 12169032 new S session fjjmem4(n=300) # subfun and NULL return is 1 memory.size is 544584 is 2 memory.size is 6401864 S Poetry c 1998 Patrick J. Burns v1.0

68 i i i i i i i i is is is is is is is is 3 memory.size is 7843656 4 memory.size is 8564552 5 memory.size is 9285448 6 memory.size is 10006344 7 memory.size is 10727240 8 memory.size is 11448136 9 memory.size is 12169032 10 memory.size is 12889928

The subfunction saved a minor amount of memory. Notice that putting the NULL at the end of the for loop, which is supposed to reduce memory use, actually increased it when the subfunction is used. This is part of the frustration. There were no diﬀerences in memory among these four functions using S-PLUS version 3.1 on a Sparc (the memory use there was slightly higher than here).

3.9

Things to Do

Decide what naming convention for S objects makes the most sense for you, and rename your objects to ﬁt it. Decide how your S objects should be divided, and do it. Put your important ﬁles and S objects under source code control, and create verify objects for your important S functions. Write a version of soptions that guarantees there will be a warning message printed when a new option is added. Which functionality is better? Test the memory and time use of some of your functions.

3.10

Further Reading

The literature on software testing is wide and varied, and contains no magic bullets that I know of. One interesting little book is Davis (1995) 201 Principles of Software Development.

3.11

10 11

Quotations

Percy Bysshe Shelley “Ozymandias” A. E. Housman “Terrence, This is Stupid Stuﬀ . . .” 12 William Shakespeare (Sonnet 12) S Poetry c 1998 Patrick J. Burns v1.0

Chapter 4

Vocabulary

This chapter introduces some of the more important functions to know. Most important is that you can ﬁnd information on functions for yourself. If you are running S-PLUS on a window system, then you should use the help.start function before requesting help. There are two commands for getting help on a function or other object—the following are equivalent: help(match) ?match You do not need to surround the name with quotes, although you may. Neither do you need quotes when you ask for the name of a help ﬁle that is not an S object: help(Syntax) In addition to the “Syntax” and “Devices” help ﬁles, the two most useful help ﬁles of this sort in S-PLUS are “Release.Notes” and “Command.edit”. You do need quotes around “illegal” names: > ?break ? only meaningful if followed by name or call Dumped > ?? ? only meaningful if followed by name or call Dumped > ?%% Syntax error: sp.op ("%%") used illegally at this point: ?%% > ?"?" > ? # same as last command 69

70

CHAPTER 4. VOCABULARY

For the most part, help and ? are synonyms, but ? takes the special form “methods” which gives you a list of possible methods for a generic function. (It just looks at the names, it doesn’t check if they really are methods or not.) > ?methods(summary) The following are possible methods for summary Select any for which you want to see documentation: 1: summary.aov 2: summary.aovlist 3: summary.data.frame 4: summary.default 5: summary.factor 6: summary.gam 7: summary.glm 8: summary.interlude 9: summary.lm 10: summary.loess 11: summary.mlm 12: summary.model.matrix 13: summary.ms 14: summary.nls 15: summary.ordered 16: summary.tree 17: summary.varcomp Selection: The result is a menu. Give the number of any object you want to get help for, type 0 to exit. See more on the menu function on page 106. S-PLUS has a methods function which mimics the “?methods” form, but you get a character vector instead of a menu of options. The methods function also has a class argument so that you can discover which generic functions have methods for a speciﬁc class: > methods(class="numberbase") .Data .Data "numberbase.numberbase" "print.numberbase" Functions are categorized in the S-PLUS Reference Manual, in the S-PLUS help window, and in Appendix 3 of Becker, Chambers and Wilks (1988). Use one of these when you know what type of functionality you want but do not know the name of the function that performs it. The categorization could use some improvement, but is still useful. If your system is properly conﬁgured, you can get hard-copy versions of help ﬁles (including your own) via the help function. S Poetry c 1998 Patrick J. Burns v1.0

4.1. EFFICIENCY

71

4.1

Eﬃciency

Improving the eﬃciency of your S functions can be well worth some eﬀort. Here are a few functions that can often be used to make your functions more direct. But remember that large eﬃciency gains can be made by using a better algorithm, not just by coding the same algorithm better. when the cities lie at the monster’s feet there are left the mountains. 13 Whenever you have a number of elements that are to be selected from a group of elements, match should come to mind. It can be used to avoid some loops. The match function has three arguments x, table and nomatch (the S-PLUS version also has an incomparables argument). It returns a numeric vector as long as x. The ith element of the result is the index of table that matches the ith element of x. nomatch states what should be used when the element of x doesn’t match anything in table. Although straightforward, match can be employed in a variety of ways—I always use the fact that the result has the same length as x as a hint about how to use it in a given situation. Suppose that we have two character vectors many.strings and good.strings and we want the elements of many.strings that are in good.strings. This is done with: many.strings[match(good.strings, many.strings, nomatch=0)] A speciﬁc example is: > letters[match(c("e","D","ff","z"), letters, nomatch=0)] [1] "e" "z" Let’s think about what is happening. match gives indices for its second argument— which is many.strings which is the vector being subscripted—so we have hope of getting the right subscripts. We know that the correct answer can’t be longer than good.strings, and since that is the ﬁrst argument to match, that must be true. So it is certainly a feasible solution, and in fact it isn’t hard to see that it really is doing what we want. Now suppose that we have a named vector the.data and we want to remove any elements that have names in bad.ones (see the soptions function on page 45 for a similar situation). Here is one way of getting this: the.data[!match(names(the.data), bad.ones, nomatch=0)] match returns a numeric vector that contains zeros for elements that do not match and a non-zero for those that do match a bad.ones element. The “bang” operator coerces this to logical so the subscripting vector is logical and what we want. Here is an example: S Poetry c 1998 Patrick J. Burns v1.0

72

CHAPTER 4. VOCABULARY

> jjwt dorothy harold munchkin stevie 14 15 2 4 > jjwt[!match(names(jjwt), c("stevie","esther"), nomatch=0)] dorothy harold munchkin 14 15 2

DANGER. An almost equivalent version is: the.data[ -match(bad.ones, names(the.data), nomatch=0)] This works only as long as at least one match exists. First, let’s look at what happens when it is okay. match gives indices for the names of the.data that are in bad.ones plus possibly some zeros; these are made negative and used in subscripting. The subscripting ignores the zeros and removes elements corresponding to the negative subscripts. When there are no matches, then the subscripting vector is all negative zeros, which are thought to be positive zeros, so the result of the whole operation is a vector of length zero, rather than the whole of the.data as we want.

DANGER. If the elements of the table argument are not unique, only the ﬁrst occurrence is used. > match(c("here","aa","b","here"), c("here",letters,"here")) [1] 1 NA 3 1 An example of when this might come into play is selecting all of the rows of a matrix that have a column matching some values. As here where we want the rows of the matrix where the ﬁrst column contains either a 1 or a 2. > jjmatr <- cbind(rep(1:3,4), rep(1:4,rep(3,4))) > jjmatr[match(1:2, jjmatr[,1]),] # wrong [,1] [,2] [1,] 1 1 [2,] 2 1 > jjmatr[!is.na(match(jjmatr[,1], 1:2, no=NA)),] # right [,1] [,2] [1,] 1 1 [2,] 2 1 [3,] 1 2 [4,] 2 2 [5,] 1 3 [6,] 2 3 S Poetry c 1998 Patrick J. Burns v1.0

4.1. EFFICIENCY [7,] [8,] 1 2 4 4

73

Here is a function that uses match. "p.replace"<function(x, old, new) { if(length(old) != length(new)) stop("old and new must be same length") x.ind <- match(x, old, nomatch = 0) x[as.logical(x.ind)] <- new[x.ind[x.ind != 0]] x } This substitutes values from the new argument for values in the old argument. Notice that the name begins with “p.” which is my personal preﬁx for modiﬁed in-built functions (see page 41). There is an in-built replace, but even if there hadn’t been, I would have wanted to put my preﬁx on since “replace” seems such a common name. An example of its use is: > p.replace(c("The", "corgi", "pranced", "by", "the", "corgi"), + old=c("corgi", "pranced"), new=c("kitty", "strolled")) [1] "The" "kitty" "strolled" "by" "the" "kitty" For substituting characters within strings, see the transcribe function on page 276. The match function is used in many spots, including the to.base10 function (page 236). Functionality related to that of match exists in pmatch, charmatch (both discussed on page 94) and grep (page 83). I ﬁnd the outer function one of the most interesting in S. The statement outer(x, y) produces the equivalent of the mathematical statement xy (resulting in a matrix) when x and y are vectors. outer is much more general than this though. First oﬀ, it can use any function that is vectorized for two arguments, not just the default multiplication. An example that might be surprising is jj2let <- outer(letters, letters, paste, sep="") S Poetry c 1998 Patrick J. Burns v1.0

74

CHAPTER 4. VOCABULARY

This produces a 26 by 26 matrix containing each two letter combination. (Additional arguments to the function can be given—the sep="" is passed to the paste call.) The ﬁrst two arguments to outer may be arrays as well as plain vectors, the result is an array with dimension equal to c(dim(as.array(x)), dim(as.array(y))) So the command outer(jj2let, letters, paste, sep="") produces a 26 by 26 by 26 array of all the combinations of three letters. There are times when using outer cuts down on execution time, but increases memory use—one of those unfortunate times when you must decide between evils. The interpolator.lagrange function (page 325) and exp.integral (page 272) provide examples of outer. sweep also involves arrays. If you want each element of a vector to be subtracted from the corresponding row of a matrix, then you can just rely on vectorization and automatic replication to do that: > jjmat [,1] [,2] [,3] [,4] [,5] [1,] 1 4 7 10 13 [2,] 2 5 8 11 14 [3,] 3 6 9 12 15 > jjmat - 1:3 [,1] [,2] [,3] [,4] [,5] [1,] 0 3 6 9 12 [2,] 0 3 6 9 12 [3,] 0 3 6 9 12 However, if you want the operation to be relative to columns, then sweep comes to the rescue: > sweep(jjmat, 2, 1:5, "-") [,1] [,2] [,3] [,4] [,5] [1,] 0 2 4 6 8 [2,] 1 3 5 7 9 [3,] 2 4 6 8 10 The “2” gives the dimension to work on—like the MARGIN argument to apply (see the next section). The 1:5 is the vector to use as the second argument to the function, and subtraction is the function being used. S Poetry c 1998 Patrick J. Burns v1.0

name[1:9]. 4. lapply. the appropriate apply function is faster—often dramatically so—than the equivalent loop. mean) lapply(jjl. When you use apply. state. This produces a list that has one component for each category in state. The second two are the same in the sense that a function (rather than the name of a function) is being passed into lapply as the FUN argument. The apply function is for arrays. function(x) sum(x)/length(x)) The ﬁrst two are the same in the sense that the same function is applied. In general. "mean") lapply(jjl. There’s an example of this on page 28.region. the MARGIN argument speciﬁes the dimensions that you want to keep. sapply and tapply all apply a function to sections of some object. These are used in lieu of a for loop.2.name vector. Each apply function also accepts an arbitrary number of additional arguments to be given to FUN.region is a category that corresponds to the state. and each of the components is a vector of the state names within the region. and any function that is vectorized on two arguments.4. then use split. So a statement like S Poetry c 1998 Patrick J. An example is > split(state.2 The Apply Family The functions apply. When you have an atomic vector and you want to make it into a list with each element of the vector in some component of the list.region[1:9]) $Northeast: [1] "Connecticut" $South: [1] "Alabama" "Arkansas" "Delaware" "Florida" $"North Central": character(0) $West: [1] "Alaska" "Arizona" "California" "Colorado" state.0 . All three of the following are essentially equivalent: lapply(jjl. THE APPLY FAMILY 75 sweep can handle higher dimensional arrays. Each of the four apply functions have an argument named FUN which can be either a function or the name of a function. Burns v1.

When the function used in apply returns a vector. sort) } Note that it seems unlikely that you would really want to do such an operation. i]) } x } fjjcsort2 function(x) { apply(x. Here is a function to sort the columns of a matrix using a loop and an equivalent function using apply. The stable. fjjcsort2(freeny.apply function on page 287 can be used to keep the dimensions of the result in the same order as those of the input. VOCABULARY saves the second dimension while the ﬁrst dimension (and any others) are “collapsed” into the function call. S Poetry c 1998 Patrick J.x. sort) will return the transpose of what you would naively expect. "fjjcsort1"<function(x) { for(i in 1:ncol(x)) { x[.sort(x[. 1. DANGER. 2. For higher dimensional arrays.equal(fjjcsort1(freeny.x)) [1] T > dim(jjm) [1] 20 1000 > dim(jjm2) [1] 20 10000 Using two diﬀerent sizes of matrix we time these two versions. myfun) CHAPTER 4. we want to test that they do the same thing—no matter how obvious. > all. the MARGIN can have length greater than one. So a command like: apply(freeny. then that becomes the ﬁrst dimension of the array that is returned. In recent versions of S-PLUS. the apply functions oﬀer a speed advantage over the equivalent for loop. 2. i] <. As always. these again are the dimensions to be saved.76 apply(xmat. Burns v1.x).0 .

but there is a trick that works (that I wish I’d thought of myself). lapply applies a function to each component of a list: > lapply(list(a=jjm.49 46 7459368 > p.time(for(i in 1:10) fjjcsort2(jjm)) comp time clock time memory 34. Burns v1.2.xatt ans } After making sure that it does what we want. dim) $a: [1] 5 3 $b: [1] 3 5 S Poetry c 1998 Patrick J.time(for(i in 1:10) fjjcsort3(jjm)) comp time clock time memory 9.time(fjjcsort3(jjm2)) comp time clock time memory 13.time(fjjcsort1(jjm2)) comp time clock time memory 45. There is an indication.130001 9 2875944 > p.02 35 3695144 > p.unix. we time it like the others: > p.unix. b=t(jjm)).time(for(i in 1:10) fjjcsort1(jjm)) comp time clock time memory 42.time(fjjcsort2(jjm2)) comp time clock time memory 39.24 43 3088936 > p. though.unix.unix.x[order(col(x). THE APPLY FAMILY > p. x)] attributes(ans) <. "fjjcsort3"<function(x) { xatt <. Always it is going to be more eﬃcient if you can vectorize the problem.0 .unix. This problem seems unamenable to doing it all at once.25 14 14856744 Now we have some substantial speed improvement.4. that memory use could be a problem with the apply version.attributes(x) ans <.12 40 21688872 77 We can see that apply does indeed improve the speed.unix.

frame(jjm)). but the output above was from Version 3. b=t(jjm)). lapply and sapply can take a vector as their ﬁrst argument.] 3 5 sapply produces a matrix when all of the results are the same length. + function(x) dim(x)) [[1]]: [1] 5 3 [[2]]: [1] 5 3 This bug has sometimes been attributed to the fact that lapply is internal in recent versions of S-PLUS. but generally returns a simpliﬁed object. VOCABULARY sapply is used in the same situations as lapply. DANGER.data. as. The following example assumes that we have two lists of matrices and we want the result to be a list whose components are the products of the corresponding components of the original lists.0 . then the result of sapply is the same as unlist of the same call to lapply. b=t(jjm)).1 where lapply uses a for loop. dim)) a1 a2 b1 b2 5 3 3 5 > sapply(list(a=jjm. Burns v1. > lapply(list(jjm.data. Internal generic functions (see page 225) do not work correctly in lapply or sapply. When the function being applied returns a single number in each case. S Poetry c 1998 Patrick J. If the function returns vectors that are all one length. as.] 5 3 [2. then these two are diﬀerent: > unlist(lapply(list(a=jjm. One use of this is to apply functions where you want more than one of the arguments to change. dim) a b [1.78 CHAPTER 4. then sapply is the same as lapply.frame(jjm)). dim) [[1]]: [1] 5 3 [[2]]: NULL The workaround is to write a wrapper function: > lapply(list(jjm. If the results vary in length.

but with a missing value because there are no male cats. list(jjgender. y=jjblist) 79 The trick is to write a function whose ﬁrst argument is used to subscript the other arguments. jjtype). jjtype). Burns v1. "cat"][[1]] munchkin stevie 2 4 S Poetry c 1998 Patrick J. "cat"] [[1]]: munchkin stevie 2 4 > jjlistmat["female".tapply(jjwt.value) [1] "numeric" This is a typical numeric matrix. function(i. Here we look at the mean of jjwt within each combination: > tapply(jjwt. mean) cat corgi female 3 14 male NA 15 > mode(. 2 14 male NULL. The following three vectors are used in the discussion of tapply. This is sometimes called a ragged array.Last.4. but there is not necessarily an equal number of points in each combination of categories. the result of the following command gives us a matrix of the same shape again.0 . but this time the mode of the matrix is list: > jjlistmat <. list(jjgender. x. 0 15 > jjlistmat["female".2. y) + x[[i]] %*% y[[i]]. x=jjalist. > jjwt dorothy harold munchkin stevie 14 15 2 4 > jjgender [1] "female" "male" "female" "female" > jjtype [1] "corgi" "corgi" "cat" "cat" tapply is used in situations where you have one or more categories. + function(x) x) > jjlistmat cat corgi female numeric. However. THE APPLY FAMILY > lapply(seq(along=jjalist).

4 of S-PLUS added a simplify argument to tapply which you can set to FALSE to ensure that you always get an object of mode list. The last two results are essentially equivalent. We can make the list that split creates look more inviting by changing the category we give it: > split(jjwt. This is object-oriented and can do some things that tapply can’t. jjtype))) $"1": munchkin stevie 2 4 $"3": dorothy 14 $"4": harold 15 The names of this list show that tapply did return the indices for jjlistmat. the ﬁrst argument could be optional when FUN is missing since its values are not used. paste(jjgender. and hence still required.80 > jjlistmat["female".) Here we use tapply without FUN to produce a vector to be used by split: > split(jjwt. Burns v1.0 . then the result is a vector of indices into the array that is created when FUN is given. When FUN is not given. The FUN argument is optional in tapply. list(jjgender. S Poetry c 1998 Patrick J. (Logically. jjtype)) $"female cat": munchkin stevie 2 4 $"female corgi": dorothy 14 $"male corgi": harold 15 An alternative to tapply that is in S-PLUS is by. tapply(jjwt. VOCABULARY Version 3. "corgi"] [[1]]: dorothy 14 CHAPTER 4. but it is used for its length.

4.0 .order(x) xsort <. The order function produces the vector of indices that will sort a vector. go back to page 29. The apparition of these faces in the crowd. A more common example is to sort the rows of a matrix relative to one of the columns: S Poetry c 1998 Patrick J. few of these functions will be discussed more than brieﬂy. x[order(x)] is the same as sort(x) order is useful when you want to sort several vectors in parallel: xord <. The rep function—one of the most useful in S—was discussed earlier.3.3 Pedestrian First is a list of the truly pedestrian. If you missed it. cummin and kronecker. Functions that are S-PLUS additions that may not be in other versions are cumprod. cummax. PEDESTRIAN 81 4. ysort probably isn’t sorted but it is in the order that made x sorted.x[xord] ysort <. 14 abs sqrt exp log log10 sum prod cumsum cumprod min max pmin pmax cummin cummax range gamma lgamma round signif trunc ceiling floor sin cos tan asin acos atan sinh cosh tanh asinh acosh atanh Re Im Mod Arg Conj length c list unlist names attributes attr array dim dimnames unique duplicated matrix nrow ncol row col rbind cbind t crossprod solve backsolve eigen svd qr chol kronecker scale interp approx fft Most of the mathematical functions accept complex as well as numeric inputs. That is. Burns v1.y[xord] xsort is actually sorted.

shorter arguments are replicated.".000 7. ] CHAPTER 4. The command paste(1:9. y.000 2. and the result is a vector of substrings.000 9.000 8. substring takes a character vector. For example nchar(state. sep=". These arguments are replicated to each be as long as the longest. VOCABULARY If the order to use depends on more than one vector—with secondary vectors breaking ties—then order is also used: xyzord <.order(x.rev(order(x)) paste is very useful for dealing with character data. collapse=" ") results in the single string "1.1]). A character vector is created which is as long as the longest argument.000" nchar returns a vector which is the count of the number of characters in each element of its argument.000 6. To reverse the order of a vector or list.82 xmat[order(xmat[.000 5. use the rev function. Burns v1.000 3.bigfirst <. So you might have xord. The collapse argument turns the result into a single string. Note that backslash-something counts as just one character.000 4. and a numeric vector of the last character to be used (which defaults to some large number). The sep argument is a character string that is placed at each spot that is pasted together. DANGER. To break a string into individual characters do: S Poetry c 1998 Patrick J. not the number of keystrokes that you need to type to get it.0 . a numeric vector of the ﬁrst character to be used. Any number of arguments can be given to it plus the two special arguments sep and collapse (which must be given by their full names).name) is a numeric vector of length 50 containing the number of letters (and spaces) in each state name. z) An example of this use is in the fjjcsort3 function on page 77. "000".

4.3. PEDESTRIAN substring(x, 1:nchar(x), 1:nchar(x))

83

If for a particular element last is less than first, then the result for that element will be the empty string. In particular, last can be zero. grep returns the indices of a character vector (second argument) that match the pattern (ﬁrst argument). When you use the grep function, it is tempting to think of Unix wildcarding as with Unix ls. This is especially true when using the pattern argument to objects which creates a call to grep. But instead you need to think of regular expressions. DANGER. The behavior of grep may be somewhat machine dependent.

> jjgreptest [1] "abc"

"a.bc"

"a\\bc"

"a\tbc" "aabbcc"

> grep(".", jjgreptest) # finds everyone [1] 1 2 3 4 5 > grep("\.", jjgreptest) # still finds everyone [1] 1 2 3 4 5 > grep("\\.", jjgreptest) # finds the dot [1] 2 Now I search for backslashes: > grep("\\\\\\\\", jjgreptest) [1] 3 There are 8 backslashes in the command above—escape characters grow in powers of two. > grep("\\\t", jjgreptest) [1] 4

DANGER. If there are any newlines in your strings, you are in trouble: > jjgreptest2 [1] "abc" "a.bc" "a\\bc" > length(jjgreptest2) [1] 5 > grep("bc", jjgreptest2) [1] 1 2 3 5 6

"a\nbc" "aabbcc"

S Poetry c 1998 Patrick J. Burns v1.0

84

CHAPTER 4. VOCABULARY

It is a little unusual to see something in the 6th element of a 5-element vector.

A couple of specialized S functions that deal with character strings are abbreviate and make.names. Access Unix functionality with the unix function. The ﬁrst argument is a character string that gives the Unix command. The output argument is an important control: > jjd1 <- unix("date") > jjd1 [1] "Wed Mar 26 15:22:23 PST 1874" > jjd2 <- unix("date", out=F) Mon Nov 28 15:22:45 GMT 1757 > jjd2 [1] 0 When output is TRUE (the default), then the output of the Unix command is captured in an S character vector (one element for each line of the output) and returned. If output is FALSE, then the output of the command goes to standard-out and the return value in S is the status of the Unix command. The Unix convention is that 0 means no problem and a non-zero value means some type of error. So S code might look something like: if(status <- unix(cmd, out=F)) stop(paste("Unix trouble, code", status)) If you have several commands, you need to put them all into one string with the commands separated by semicolons or newlines: > cmds <- c("date", "echo $SHOME") > unix(paste(cmds, collapse=";"), out=F) Sun Oct 14 20:51:20 EST 1894 /usr/lang/s [1] 0 There is also an input argument to unix. This is seldom used since usually inputs are contained in the command argument. The perl function (page 275) provides a good example of the use of the input argument. In S-PLUS there is also the unix.shell function which allows you to select the shell that will be used rather than being forced to use the Bourne shell: > unix("echo $shell") [1] "" > unix.shell("echo $shell", shell="/bin/sh") S Poetry c 1998 Patrick J. Burns v1.0

4.3. PEDESTRIAN [1] "" > unix.shell("echo $shell", shell="/bin/csh") [1] "/bin/csh"

85

Arrays are generally created with the array function. You may create matrices either with array or with matrix. The advantage of array is that it is slightly faster (matrix calls array). The advantages of matrix are that you only need to know one of the dimensions and you can choose to ﬁll along either columns or rows. The diag function also creates matrices. The behavior of this function is highly dependent on its argument. if you give it a single number, it returns an identity matrix of that size: > diag(3) [,1] [,2] [,3] [1,] 1 0 0 [2,] 0 1 0 [3,] 0 0 1 If you give it a vector, it returns a square matrix with that vector along its diagonal: > diag(c(4.3, 5.7, -2.9)) [,1] [,2] [,3] [1,] 4.3 0.0 0.0 [2,] 0.0 5.7 0.0 [3,] 0.0 0.0 -2.9 If you give it a matrix, it returns the diagonal of the matrix: > diag(freeny.x) [1] 8.79636 4.70217 > diag(as.matrix(3)) [1] 3

5.83112 12.98060

There are arguments to control the shape of the result, and there is an assignment form to change the diagonal of a matrix. DANGER. When an array with three or more dimensions is given to diag, then it is treated as a simple vector (creates a large diagonal matrix) rather than extracting some deﬁnition of the diagonal from the array.

S Poetry c 1998 Patrick J. Burns v1.0

86

CHAPTER 4. VOCABULARY

lower.tri returns a logical matrix the same size as its argument. Its value is TRUE for all elements below the diagonal. The S-PLUS (version 3.2 and later) version of the function has a diag argument that allows the diagonal elements to be TRUE also. Similar functionality can be achieved with row and col; for example, to get the super-diagonal do: > row(jjmat) - col(jjmat) == -1 [,1] [,2] [,3] [,4] [,5] [1,] F T F F F [2,] F F T F F [3,] F F F T F [4,] F F F F T [5,] F F F F F A command like solve(A) %*% x # avoid this is an ineﬃcient use of solve. The reason that solve is not called something like invert is precisely this case. We don’t really care about the inverse of A, we only want the ﬁnal answer. The more eﬃcient command is: solve(A, x) backsolve is available when you have a triangular set of equations to solve. Given one or more vectors that can be thought of as categories, the table function counts up the number of occurrences in each combination. It is called “table” because of the statistical concept of a contingency table, but nonstatisticians may think of it as a binning function. > table(jjgender) female male 3 1 > table(jjtype) cat corgi 2 2 > table(jjgender, jjtype) cat corgi female 2 1 male 0 1 Somewhat related is the cut function. While table counts the number of occurrences of categories, cut breaks numerical data into categories. This is another sense of binning. S Poetry c 1998 Patrick J. Burns v1.0

4.3. PEDESTRIAN

87

DANGER. Do not abbreviate the breaks argument of cut to break: > cut(jjwt, break=seq(0,25,by=5)) Syntax error: "=" used illegally at this point: cut(jjwt, break= break is a reserved word in S, so the parser is ever on the lookout for it. Any of the following forms are acceptable though: > cut(jjwt, brea=seq(0,25,by=5)) [1] 3 3 1 1 attr(, "levels"): [1] " 0+ thru 5" " 5+ thru 10" "10+ thru 15" [4] "15+ thru 20" "20+ thru 25" > cut(jjwt, breaks=seq(0,25,by=5)) [1] 3 3 1 1 attr(, "levels"): [1] " 0+ thru 5" " 5+ thru 10" "10+ thru 15" [4] "15+ thru 20" "20+ thru 25" > cut(jjwt, "break"=seq(0,25,by=5)) [1] 3 3 1 1 attr(, "levels"): [1] " 0+ thru 5" " 5+ thru 10" "10+ thru 15" [4] "15+ thru 20" "20+ thru 25"

Order dependent counting is done with the S-PLUS function rle (run-length encoding). The diff function performs diﬀerencing of successive elements. When x is a vector, then diff(x) is equivalent to x[-1] - x[-length(x)] however, diff has arguments to control the number of diﬀerences, and the lag. Also diff does each column of a matrix separately. aperm stands for array dimension permutation. Consider the three-dimensional array: S Poetry c 1998 Patrick J. Burns v1.0

88 > jja , , 1 [1,] [2,] , , 2 [1,] [2,] , , 3 [1,] [2,] , , 4 [,1] [,2] [,3] [1,] 19 21 23 [2,] 20 22 24 > dim(jja) [1] 2 3 4 [,1] [,2] [,3] 13 15 17 14 16 18 [,1] [,2] [,3] 7 9 11 8 10 12 [,1] [,2] [,3] 1 3 5 2 4 6

CHAPTER 4. VOCABULARY

aperm is usually used with two arguments—the array to be permuted and the permutation. The ith element of the permutation vector is the dimension in the current array that will be the ith dimension in the resulting array. > aperm(jja, c(2,3,1)) , , 1 [1,] [2,] [3,] , , 2 [1,] [2,] [3,] [,1] [,2] [,3] [,4] 2 8 14 20 4 10 16 22 6 12 18 24 [,1] [,2] [,3] [,4] 1 7 13 19 3 9 15 21 5 11 17 23

If the ith element of the vector that you have contains the dimension in the result that is the ith current dimension, then use order on the vector: > aperm(jja, order(c(2,3,1))) S Poetry c 1998 Patrick J. Burns v1.0

4.3. PEDESTRIAN

89

, , 1 [1,] [2,] [3,] [4,] , , 2 [1,] [2,] [3,] [4,] , , 3 [1,] [2,] [3,] [4,] [,1] [,2] 5 6 11 12 17 18 23 24 [,1] [,2] 3 4 9 10 15 16 21 22 [,1] [,2] 1 2 7 8 13 14 19 20

Let’s review. If the values are for the input and the indices are for the result, then the vector is appropriate to use as the perm argument. If the values are for the result and the indices are for the input, then you need to use order. Be careful when testing with three-dimensional arrays since (2, 3, 1) and (3, 1, 2) are the only permutations that do not map to themselves. > jjperm3 [,1] [,2] [,3] [,4] [,5] [,6] [1,] 1 1 2 2 3 3 [2,] 2 3 1 3 1 2 [3,] 3 2 3 1 2 1 > jjperm3o <- apply(jjperm3, 2, order) > jjperm3o [,1] [,2] [,3] [,4] [,5] [,6] [1,] 1 1 2 3 2 3 [2,] 2 3 1 1 3 2 [3,] 3 2 3 2 1 1 > apply(jjperm3 != jjperm3o, 2, all) [1] F F F T T F > jjperm3[,.Last.value] [,1] [,2] [1,] 2 3 [2,] 3 1 [3,] 1 2 S Poetry c 1998 Patrick J. Burns v1.0

90

CHAPTER 4. VOCABULARY

The .Machine object is a list of the machine constants, like largest integer and epsilon. It is often useful to pass pieces of this into C or Fortran code that you have written to eliminate machine dependency of your S functions. pi is another number that is a part of the S language.

4.4

Input and Output

Commonly a more frustrating part of computing is getting data from one program to another. S is no exception. Essentially all data that have not already been made into S objects must come into S via scan, which reads ASCII data. The use of scan ranges from very simple, to exceptionally complicated. A strange, but at times useful, way to use scan is to read from standard input rather than from a ﬁle—cutting numbers from some window and pasting to the S session is one example. In the remainder of the discussion of scan, the input will be assumed to be a ﬁle. If a simple vector is to be read in, then you merely need to give the name of the ﬁle and possibly an indication of what mode to give the data (numeric is the default). Otherwise, the ﬁle is assumed to be in the form of records and ﬁelds. Typically there is one record per line in the ﬁle, but this is not mandatory. By default, ﬁelds are assumed to be separated by white-space, that is, by spaces, tabs or newlines. The sep argument to scan controls the separation character, and this default is denoted by sep="". (Since a separator of no characters doesn’t make sense, the empty string was available for a compound choice.) If the ﬁle contains strings containing spaces, then the default separator will not work. Tabs and semicolons are popular choices for separators. Only the ﬁrst character of the sep argument is used. Once past the problem of the separator, the key argument to scan is what. The what argument provides a template for the resulting object. Here are three examples: jjnum <- scan("numbers.txt") jjchar <- scan("text.txt", "", sep="\t") jjrecords <- scan("records.txt", list(name="", age=0, NULL, volume=0), sep=";") The ﬁrst example is with a ﬁle that contains a bunch of numbers separated by white-space; the result, jjnum, is a numeric vector. The second example yields a character vector and the separator is a tab, presumably to allow spaces within strings. The ﬁnal example will result in a list with four components. The ﬁrst component will be named "name", will be character, and will be read from the ﬁrst ﬁeld of each record. The second component is numeric and corresponds to the second ﬁeld in the ﬁle. The third ﬁeld is not read because of the NULL in the third component of the what argument, and this component will be NULL in jjrecords. The fourth component is once again numeric. If the ﬁle contained S Poetry c 1998 Patrick J. Burns v1.0

4. possibly even in conjunction. It grows the objects incrementally so it fragments memory some. You can read everything into S—probably all as character—and straighten it out with S commands. Fortran uses a “d” rather than an “e” to indicate the exponent. You will probably have problems if you try to read in numbers that were double-precision Fortran. You can use tools like Perl. then you can save some memory by creating the what argument to be the same size as the ﬁnal result. Some ﬁles don’t conform to the records and ﬁelds format. The easiest if you have S-PLUS is to use the widths argument to scan.line allows records to span more than one line in the ﬁle. If you are reading in a large object and you know how big it is. The convenience of read. As scan reads the ﬁle.table is diminished by the fact that it will decide on its own to use a ﬁeld as the row names of the result. If there are some parts of the ﬁle that you don’t want to read in.table to read the ﬁle and return a data frame. The main purpose of this argument is when there is header information that does not look like the actual data. INPUT AND OUTPUT 91 more than four ﬁelds per record. S Poetry c 1998 Patrick J. This wastes memory in two ways.0 .) If your ﬁle is very nicely arranged in records and ﬁelds. The ﬁeld will then not appear in the data frame. awk and sed to convert the ﬁle to a more hospitable format. then the statement above would be wrong. and multi. it almost surely will overshoot the size that it needs. DANGER. The flush argument allows you to ignore unused ﬁelds. The n argument says how many items in all (records times ﬁelds) are to be read. Burns v1. skip says how many lines at the top of the ﬁle not to read. scan does not understand the “d” and assumes that the value is not a valid number.fields which creates a new ﬁle that contains the ﬁelds interspersed with a separator. You need to convert (carefully) the “d”s to “e”s before using scan. The other way is to use make. or if not all records were on a single line. and it seems impossible to get what you want.4. There are two strategies you can use. then use the skip and n arguments. I suggest that you always give a value for the row. (Or possibly read them into S as character. and the last time that it grows. There is no free lunch. and then do the conversion.names argument to avoid surprises. then there are two ways of getting the data read in. If your ﬁle is in ﬁxed format (meaning each ﬁeld takes a certain number of characters). It does accept ﬁxed format ﬁles. which takes a vector of the ﬁeld widths. it keeps increasing the size of where it is putting the data. then you can use read. DANGER.

get function to import data from SAS. The name of the argument should deﬁnitely be x (see the discussion of arguments to generic functions on page 128).. you want . form in the argument list of your print methods. dump.restore.myfun() print(jj) You also want to have the ... and many print methods use cat to do part of the printing. S-PLUS (and statlib) has a sas. data.dump.0 . S Poetry c 1998 Patrick J. then the backslash-n will be visible within the string (and the string will generally be surrounded by quotes). If you print a character string with a backslash-n in it. see the discussion starting on page 96 regarding data. to those calls. VOCABULARY For transferring S objects around. then the text following the backslash-n will be on the next line of output—cat interprets the backslash-n rather than just reproducing it..print(myfun()) will work the same as jj <.92 CHAPTER 4. Often your print method will call print and you can pass . but in most other respects are quite diﬀerent. > print("\\blah\nblah") [1] "\\blah\nblah" > cat("\\blah\nblah") \blah blah> Notice that the S prompt is directly after the string that was catted. print is a generic function. The reason to have the return value be the input is so that jj <. because your method may be called by another print call that uses arguments that your method does not have. This is because there was not a backslash-n at the end of the string. Burns v1. An example would be if an object of your class is in a list that is printed. Even if you don’t have print calls. When you are writing a print method.. There are also several commercial programs that allow easy transfer between a number of packages including S. But if you cat that same string. print and cat both create printed output. print produces representations of S objects.. while cat is a low-level function that interprets character strings. the last line of the function should be: invisible(x) This makes the return value x while avoiding an inﬁnite loop of printing. source and restore.

write. Although write is likely to fool you more than help you.000000" " 3. Unfortunately in version 3 of S there is no convenient way to send the output both to the screen and to the ﬁle. The echo option doesn’t take eﬀect until after the option call. the S-PLUS function write. if TRUE. Burns v1. means that the command is repeated.] " 3.4. Notice the blank spaces so that the decimal points line up: > format(c(222. There are numerous additional arguments to cat to control its output. pi))) [. you can control whether or not numbers are written in scientiﬁc notation among other features. then each element of the second argument and so on.out") > freeny.table writes a matrix or a data frame to an ASCII ﬁle as you would expect it to be written.matrix(format(c(222. The sink function allows you to send the output from S commands to a ﬁle. INPUT AND OUTPUT 93 While print always prints to standard-out.table is useful.] "222.1] [1.out") sink("jj. Thus numbers printed by cat have their full precision.2 and later. The format function is mainly used to turn numbers into suitably aligned character strings. We can see what happens after the second sink command (with no arguments) which changes the output back to normal. In S-PLUS version 3. cat takes an arbitrary number of arguments—each element of the ﬁrst argument is printed. S Poetry c 1998 Patrick J.000000" [2. It returns strings that all have the same number of characters.4. Here is an example of how to put both the S commands and their results into a ﬁle: > options(echo=T) > sink("jj.x > sink() > 1:3 1:3 [1] 1 2 3 > options(echo=F) options(echo=F) The echo option.141593" > as. pi)) [1] "222. Note that cat does not obey the digits option.141593" See the justify function on page 278 for similar functionality. and hence it ends up in the sink ﬁle. you can tell cat to send its characters to a ﬁle. as print and format do.0 .

ok argument. you can not nest sink commands. "Mont". In the older versions. and several state names begin with "New". "New"). The following commands have ambiguous matches for "Ala" and "New" since both "Alaska" and "Alabama" partially match "Ala". the main diﬀerence occurs when there is an ambiguous match. VOCABULARY DANGER. "")) [1] 27 > pmatch("". state. 4. "Mont". There are two additional diﬀerences between them. nomatch=Inf) [1] Inf 26 Inf pmatch considers an ambiguous match to be nomatch while charmatch always returns 0 for an ambiguous match. then match won’t do. are – 15 When you have a vector of strings to match and they are possibly abbreviated. nomatch=0) [1] 0 The second is that pmatch has a duplicates.) They are almost identical in eﬀect. "").4. "Mont". and charmatch in S-PLUS. "New"). Understanding the diﬀerences can speed programming and eliminate bugs. "b".0 . In particular. state. the dump command will work okay but your sink will be wiped out—output will go to standard-out as usual. letters) [1] 1 2 NA S Poetry c 1998 Patrick J.name) [1] 0 26 0 > pmatch(c("Ala". c(letters. "a"). (These two functions were produced at about the same time in two diﬀerent locations. Where the meanings. > pmatch(c("Ala". If you are using a version of S prior to S-PLUS version 3. "New"). each call to sink rescinds the previous one.94 CHAPTER 4. state. > pmatch(c("a". The ﬁrst is that pmatch does not match empty strings: > charmatch("". We can ﬁnd no scar. c(letters. What does work is pmatch. But internal diﬀerence. Burns v1.5 Synonyms This section discusses functions that are similar to another. dump uses sink so if you use dump while a sink is in eﬀect.name.name) [1] NA 26 NA > charmatch(c("Ala".

"jj45")) rm(jja. You may have forgotten to use any or all. remove takes a character vector of the names of the objects to be deleted. SYNONYMS > pmatch(c("a". and always steps 1 or -1. then the sequence has zero length also—this is very useful in for loops. the ifelse function. T)) "true" else "false" [1] "false" Warning messages: Condition has 2 elements: only the first used in: if(c(F. 95 The rm function is a short cut for remove.4. "a"). and the operators &&. The two following commands would do the same thing: remove(c("jja". These do not evaluate their second argument S Poetry c 1998 Patrick J. starting point (the default is 1). jj45) rm removes objects only from the working database. The important part is that if x has length zero. and length. remove has arguments where and frame so that you can select the location of the deletions. Burns v1. if takes a logical value that is meant to be just one long. T)) "true" else "false" Do not ignore such warnings—it means you have at least done something silly and more likely that there is a bug. The statement seq(along=x) returns a sequence starting at 1 that is as long as x. letters) [1] 1 2 1 charmatch forces duplicates to be okay. then the ﬁrst value is used and a warning is issued: > if(c(F. "b". if the length is greater than one. The unabbrev. letters.value function (page 218) uses pmatch. which it calls. A handy argument is along. rm just takes the unquoted names of the objects.0 . seq allows arbitrary step sizes (the default is 1).5. "a"). || and | are all related. dup=T) [1] 1 2 1 > charmatch(c("a". The if construct. The seq function is a generalization of the : operator. The : operator is usually given integers. "b". The && (and control operator) and || (or control operator) operators go with if and length one logicals. ending point (the default is 1). &.

Suppose we have two (length one) logical values called logical.0 Let’s summarize. ifelse is very S in being vectorized. restore is almost like source. data.na(x) | x < 0] <. for instance || does not need to look at the second argument if the ﬁrst one is TRUE. there is the xor function which performs a vectorized exclusive-or operation.val1 and logical.dump also produces ASCII ﬁles. 1) # correct if(logical.0 .vec2. Most often data. so it is not appropriate that they should reserve judgment on evaluating their second argument.restore.96 CHAPTER 4. then use write.vec1 && logical. so it needs to use & and |. but it is less useful than you might think.val1 & logical. can be done more directly with x[is. 0. 1) # wrong ifelse(logical. For instance. the more problem there is with restoring a dump ﬁle.val2.val1 & logical.vec2) 0 else 1 # wrong if(logical. The ifelse function performs a vectorized decision.vec2.val2. 0. On the other hand.na(x) | x < 0. The larger the size of the objects.dump.dump should be used for other types of objects. Burns v1.vec1 && logical. The reason that there are two choices is that ﬁles created by dump can take an enormous amount of memory (and hence time) when they are read back into S. x) which changes missing values and negatives in x to 0. Then here are several statements and their acceptability: if(logical. VOCABULARY if they already know the answer. and we have two logical vectors with length greater than one called logical. Since functions are generally small and commonly read by humans. but the format is decidedly tailored towards machines rather than humans. The most common is to use source.vec2) 0 else 1 # wrong ifelse(logical. dump is usually used for them.vec1 and logical. then it must be read back into S with data.vec2.ifelse(is. 0.) dump creates representations that are (relatively) easy for humans to comprehend—the format is S commands.val2) 0 else 1 # correct ifelse(logical. 1) # slightly inefficient There are two ways of putting ASCII representations of S objects into ﬁles.vec1 & logical. By the way. the statement: x <. but there is a S Poetry c 1998 Patrick J. There are several ways to read ﬁles created by dump.vec1 & logical.table (page 93) or cat. If a ﬁle was produced by data.val1 && logical. (If you want just the values without retaining the characteristics of the S object. & and | are vectorized “and” and “or” operators. 0.val2) 0 else 1 # slightly inefficient if(logical.

but can return something other than NULL for objects that do not have a class.mode and mode always give the same answer. which.na.na can save a substantial amount of memory. "single" or "integer". data. 10:1) [1] 1 2 3 4 5 5 4 3 2 1 > min(1:10. storage.inf and which.na is the indices of the result of is. If x has length one million and contains ten missing values. For example: % S < myfile.4. 10:1) [1] 1 S Poetry c 1998 Patrick J.C or . If you expect an object to be large and to have few missing values. This looks to see if it is one of the types of object that was in common use before objectorientation came along. "array" for arrays that are not matrices.5.na is one million while the result of which.na instead of is. The possibilities are "matrix". plus categories yield the class they would have if they were factors. "ts" for time series. then the length of the result of is. Because restore starts up a new session. In S-PLUS there are functions called which. source acts as if the contents of the ﬁle were typed to S. Essentially. Compare: > pmin(1:10. These are like is. except that they return a vector of numeric indices rather than a logical vector. etcetera. Storage mode is generally only of interest when using . There are times when you want a vectorized form that gives the maximum or minimum for each location among a number of vectors—the pmax and pmin functions do this. Yet another way to read a dump ﬁle into S is to redirect the ﬁle while you are in Unix.class does the same if the object has a "class" attribute.0 . So restore always puts the objects into the working database for the current directory.na that are TRUE.Fortran.na is only ten long. then using which.mode is one of "double". Burns v1. The result of which.q # Unix command This also puts the objects into the working database for the current directory. while source puts the objects into the current working database. The class function returns the value of the "class" attribute of an object. In that case storage.na.nan. but restore starts up a new session of S and reads the ﬁle. The max and min functions each return a single number that is the maximum or minimum of all of the numbers in all of the arguments. unless the mode of the object is numeric. there are going to be implications for memory usage. SYNONYMS 97 diﬀerence that can be important.

The following fragment should give an indication of when this might occur: for(i in 1:n) get(paste("data". hence one use for exists). It is not so uncommon to have a bug because max is used where pmax is needed. These are especially hard to discover and track down.emacs(get("[<-. as its name implies. A few pointers are given on placing calls to stop and warning on page 33. Using as. sep="")) It is used when the name does not parse to a valid object name: get("dim<-") get("%*%") get("valid_name_in_other_language") "[<-. 4. But get has several uses. warning is used within functions to create a warning. Most of the time get is unnecessary because just giving the name of the object gets it.mathgraph")) S Poetry c 1998 Patrick J. This is useful to avoid an error when the object does not exist. the length of the result of deparse is usually 1 (but not guaranteed to be). The deparse and as. i.6 Linguistics Use stop to create an error condition in a function. If the problem is not serious enough to warrant an error but is still worth mentioning. Arguments can be used to direct the search.character where deparse is needed can cause some subtle bugs. as. then use warning. It is used when what you have is a character string containing the object name.mathgraph" <. VOCABULARY DANGER.load function on page 24. deparse.character functions both turn non-character objects into something of mode character.0 . warnings is used to look at the warnings that were created if there were a lot of them. The exists function takes a character string containing the name of an object and returns a logical value that states whether or not the object was found. Burns v1. The get function goes further than exists and returns the object if it is found (an error results if it isn’t found. An example is the poet. Do not confuse warning with warnings. rather than the name itself.character thinks in terms of vectors and always returns an object that has the same length as the input.dyn. A little extra thought whenever you use max or min can be a good thing.98 CHAPTER 4. returns a character string that can be parsed by S.

i. it is a synonym for <-). i. sep="") <. immediate=T) This says that you want x but you only care about it for this one instance so it should not be cached for future reference.data/x") # probably not what you want get("x".").0 . The assign function is in a sense the opposite of get.1:i # THIS WON’T WORK The assign function is necessary in this case.data") # more likely right You can sometimes save some memory with a statement like: get("x". sep=""))) You can use quotes around a name that is being assigned to: "%myop%" <. sep=""). and is used for many of the same reasons. DANGER.6. and you want one of the ones that is not ﬁrst: get("x". i. even if the name is not a valid S name.4. Use it when what you have is a character string of the name rather than the name itself. You can not have a function on the left of an assignment operator that returns a character string of the names: paste("data". where=4) It is used when the object is in a database that is not currently on the search list. for(i in 1:n) assign(paste("data". assign(paste("data". LINGUISTICS 99 It is used when there is more than one instance of an object name on the search list. i. or that at least is not guaranteed to be on it. sep=". The cache that we are avoiding here starts when a statement is evaluated and is destroyed again when we get back to the S prompt—it is not the same cache that the keep option controls.fjj So assign is not necessary here. but there are times when you must use it. Do not include any of the path for the database there: get("/usr/me/good. Burns v1. The ﬁrst argument needs to be a string that is the name of the object as S knows it. scan(paste("file". You can use it whenever an assignment is made (that is. where="/usr/me/good. DANGER. 1:i) S Poetry c 1998 Patrick J.

where=1) > x $a: [1] 1 > get("x$new") [1] 3 You need to assign the entire object in one go: > assign("x".100 CHAPTER 4. where=1. An example is: assign("jj". DANGER. this is generally a database later in the search list: assign("jj". 3) does not do anything to a component of an object named x.X". VOCABULARY A common use of assign is to assign an object to a location other than the usual location. c(x. A statement like assign("x$new". immediate=T) This form of statement can be very useful. but it is a deﬁnite violation of the no side eﬀects rule—use this sparingly and with good reason. where=4) Within a function. Rather it creates an object with the unfriendly name x$new: > x $a: [1] 1 > assign("x$new". where=1) > x $a: [1] 1 $new: [1] 3 S Poetry c 1998 Patrick J. x. this is usually assignments to frame 1 or database 0: assign("fjJ. 3. jj. frame=1) Within a function you can also force the assignment to be written to disk immediately rather than only being written as you get the S prompt back. new=3).0 . From the prompt. jj. Burns v1.

and the function returns its value (and performs its side eﬀects). of course.exit(par(old. then old.exit the graphics parameter will be permanently changed if an error (or interrupt) occurs between the time that the parameter is changed and it is changed back. then it would be possible (though highly unlikely) that the parameter get permanently changed.0 . and deleting all of the on. and then make sure that the parameter is returned to its previous value when the function quits. If we had the on. In this case. but no matter since the graphics parameter wasn’t changed anyway. The purpose of these two lines is to set a graphics parameter to a certain value.6. without on.filename)) # the file is created by some command S Poetry c 1998 Patrick J. This removes a ﬁle—typically a ﬁle that was created within the function calling unlink.exit. This allows you to perform actions as the function exits—often some cleanup—without getting in the way of the return value. we could almost get by without on. think that I must have these commands in reverse order.par)) old. Let’s analyze what happens with our on.exit(unlink(the.exit call second. A characteristic use is: the. The interlude function listed on page 220 creates functions that use on.par <. Oh build your ship of death. LINGUISTICS 101 Often you want to create a local version of the object and then use assign.exit command ﬁrst.exit.filename <.exit is unlink. Fancier use of on. If the function exits normally. Burns v1. your little ark 16 Here is a typical example: on. If an error occurs during the call to par. and it’s not going to break—I’ll get to that. If an error occurs after the call to par. then the on.exit expression will fail (you’ll get an “Error during wrapup” message).tempfile("Spfun") on. However. But we do want the on. and then return whatever it is that we want (in other cases there is no way to do the necessary cleanup and get the proper return value).exit by resetting the graphics parameter after we do all of the graphics.exit example.exit expression is used to change it back. A common function to be used in on.exit statements. An unusual (but very useful) function is on. on.4.par doesn’t exist so the on.exit makes sure that the action is performed whether the function exits normally or exits due to an error.exit includes adding to the actions to be performed.par(cex=2) You. then the parameter is changed. changed back.

This is very useful for telling you what the object represents. A trivial example is: > fjjcopy function(x) list(name = deparse(substitute(x)). but at this point you can just think of this as the magic potion that performs this operation. Burns v1. Many objects have a component or an attribute named call which is an image of the command that created the object. The call component or attribute is generally created with the match.x) $name: [1] "freeny.x" $sum: S Poetry c 1998 Patrick J.411 > fjjcopy(1:8) $name: [1] "1:8" $sum: [1] 36 We will learn more about substitute later (page 324).call()) > fjjcopy(freeny. It might be used like: if(inherits(x. > fjjcopy function(x) list(name = deparse(substitute(x)). The sccs function on page 49 uses this.call function. call = match.. VOCABULARY inherits in its most common use returns a number (that can be interpreted as a logical) indicating if a certain string is one of the elements of the object’s class.102 CHAPTER 4. sum = sum(x).0 .x) $name: [1] "freeny. "factor")) .x" $sum: [1] 1282. sum = sum(x)) > fjjcopy(freeny. More involved use is also possible.. Labeling plots is a common example. Use the idiom deparse(substitute(x)) to create a character string of the name that was used for x in the call to a function.

as. use do. as does c. ")) [1] "1. The synchronize function is used to ensure that hash tables associated with a database are up-to-date. but rather a language object. sep=". Burns v1.mathgraph on page 308. The ﬁrst is the hash table of names of objects within the database. 10. c(jj. In such a case. 7" The genopt function on page 331 shows do. then there is always the eval-parse-text idiom. As assignments are made in an S session. There are two hash tables involved. Note that the result of match. 3)) > do. This is only a concern if more than one S session is using the same database and at least one of them is writing to it.411 $call: fjjcopy(x = freeny. LINGUISTICS [1] 1282.6. Session 1 creates a new object: S1> jjnew <. If you know how to perform an operation but there doesn’t seem to be a way to directly do the command with what you have.1:10 S1> jjnew [1] 1 2 3 4 5 6 7 8 9 10 Session 2 doesn’t see the new object until after the call to synchronize: S2> jjnew Error: Object "jjnew" not found S Poetry c 1998 Patrick J.0 . Below we have two S sessions running and both using the same working database. but the session doesn’t know about changes that are made behind its back. A dumb example is ans <."(x)".call("paste".".sep=""))) The fjjcheckratop function on page 254 and bind.x) 103 Let me lobby you to use this convention in your functions. or that is the easiest thing to get. this hash table is revised. And I plucked a hollow reed 17 Sometimes you have a list of the arguments to give to a function.eval(parse(text=paste("fjjmain.call in a more realistic setting.call: > jj <.4.list(sample(10.array on page 289 provide further examples.call is not a character string. method.

In this example both sessions start with jj being 1 through 10. S1> jj [1] 1 2 3 4 5 S1> jj[5] <. Generally only functions are put into this hash table (but see page 47). there is no need for synchronize.104 S2> synchronize(1) NULL S2> jjnew [1] 1 2 3 4 5 CHAPTER 4. VOCABULARY 6 7 8 9 10 The other hash table that can get out of date is the hash table of objects (rather than object names) that is controlled by the keep option. it doesn’t see the revised version that is in the database until it calls synchronize: S2> fjj function(x) 1/x S2> synchronize(1) NULL S2> fjj function(x) 2/x^2 synchronize can also be used without an argument.function(x) 2/x^2 S1> fjj function(x) 2/x^2 Because session 2 has fjj in its object hash table. But it is diﬀerent if both sessions have used function fjj and the ﬁrst session changes the deﬁnition: S1> fjj function(x) 1/x S1> fjj <. Burns v1.0 .55555 S1> jj [1] 1 2 [9] 9 10 S2> jj [1] [9] 6 7 8 9 10 3 4 55555 6 7 8 1 9 2 10 3 4 55555 6 7 8 The second session sees the change in jj. Suppose that you have a for loop that takes a long time to perform each iteration like: S Poetry c 1998 Patrick J.

1. 1) switch(ans. 10). You can force lcf. type = "n".substring(readline().long.0 . For more on this topic. q = . The readline function captures a line of text. Q = break. 50. xlab = "") polygon(sample(10. Y = { plot(1. xlim = c(0.ans[[i]] <. ylim = c(0.6. This is generally used in functions that interact with the user. y = . replace = T)) } .long. y[[i]]) } 105 As this stands.ans[[i]] <.function(x[i]. sample( 10. S Poetry c 1998 Patrick J. or you may want to have the results as they get computed.complicated. y[[i]]) synchronize() } In this case. ylab = "". replace = T).ans (which we are assuming resides in the working database) will not be changed (written to disk) until the entire loop is ﬁnished. axes = F. Here is an example: "fjjartshow"<function() { cat("type ’y’ to get picture ’n’ to quit\n") repeat { cat("Do picture? ") ans <. LINGUISTICS for(i in seq(along = lcf. it would have been better to use assign with immediate=T.complicated. N = .ans) { lcf. n = .ans to be updated at each iteration with: for(i in seq(along = lcf. 10).ans) { lcf.function(x[i]. but if lots of things change in the loop. You may be concerned that the machine will crash before the end of the loop. lcf. Burns v1. see the chapter on large computations starting on page 355. 50. synchronize becomes an easier solution.4.

sieve = 10) { repeat { switch(menu(c("(another) plot". help("polygon")) } invisible() } menu takes a character vector of choices and optionally a title to be printed above each menu. But in the case of fjjartshow the input is neither S language nor arbitrary text. corners. but merely a simple choice. That is. there are some undocumented features.0 . sample(sieve. title = "type 0 to exit") + 1. and the help ﬁle is not chosen at all: > fjjartshow2() S Poetry c 1998 Patrick J. Here is a revised artshow: "fjjartshow2"<function(corners = 50. replace = T)) } . sieve). ylim = c(0. There should be a balance between the users’ current convenience and the programmer’s future options. VOCABULARY cat("Say what?")) } } CODE NOTE. break. In the instructions for using the function there appear to be only two valid responses. sieve). yet the function actually allows more. Here is a session where just one plot is created. The user will be shown a numbered menu of the choices and prompted to select one. xlab = "") polygon(sample(sieve. axes = F. Undocumented features are ﬁne as long as you are not going to want to change them. The menu function is the proper tool for this case. { plot(1. xlim = c(0. Burns v1. "help on polygon"). type = "n". It is conventional to have 0 mean “exit” which is what this function does. corners.106 CHAPTER 4. ylab = "". replace = T ). If the text that is to be read is S language. then it is better to use parse to capture the text than to use readline.

) DANGER. Many (but not all) of the functions described in the rest of this section are S-PLUS functions.0 . 4)$integral [1] 0. qr performs a QR decomposition.7 Numerics There are several functions for matrix decompositions. Inﬁnite limits are allowed.. The function that you give to integrate must be vectorized. > integrate(function(x) pmin(x.tol.. while choleski returns a list that includes the condition number and the pivoting (reordering for numerical accuracy).4. 2).7. lower. (The line. . and see page 219 for an explanation of why that works.. Choleski decompositions are computed with either chol or choleski. often return something invisibly. this is the decomposition on which the least squares regressions are based.5736171 Warning messages: subdivision limit reached in: integrate. The invisible function does this. 1.f(f. Note. you can get wrong answers. rel. 1. that invisibility is a trickier concept than you might think—see how the soptions function on page 45 does it. You can numerically integrate an S function with integrate. If it isn’t. Burns v1. NUMERICS type 0 to exit 1: (another) plot 2: help on polygon Selection: 1 type 0 to exit 1: (another) plot 2: help on polygon Selection: 0 > 107 Periodically you want the return value of a function not to be printed automatically. 4)$integral [1] 5. for example. chol merely returns the triangular matrix. Graphics functions. subdivisions. and eigen does eigen decompositions. 4.499997 S Poetry c 1998 Patrick J.integral function on page 320 is an alternative with advantages and disadvantages. Singular value decompositions are done with svd. 2). though. upper. > integrate(function(x) min(x.

1. aux = . upper.(sin(x) * y)/x^2 But you can get an actual function from deriv: > deriv(expression(sin(x) * y/x). lower.tol.value). list( NULL.(sin(x)) * y . and s is an additional argument to the function. lower. "x")) .value <.grad[. 2: integrate(function(x. 3.sin(x)/x^2 > D(expression(sin(x) * y/x). has the more human face: > D(expression(sin(x)/x). DANGER. function(x. What is happening is that the s that we pass in is being used as the value of the subdivisions argument to integrate. Symbolic derivatives of some simple functions are given by deriv or D.(.expr2/( S Poetry c 1998 Patrick J. The integrate function doesn’t follow the capitalization convention on its arguments that is discussed on page 27 so it is possible that the extra arguments to your function can collide with arguments of integrate.108 CHAPTER 4. "x") (cos(x) * y)/x . 1)) 3: integrate. y) NULL) function(x. VOCABULARY Don’t count on there being a warning message when you get it wrong..expr2 <. s) 1: The call fails. s) x*s.. f.). upper.grad <.. Burns v1. rel.(((cos(x)) * y)/x) . lower. Here is an example where we are integrating with respect to x. and the traceback doesn’t enlighten us at all. y) { . c(length(.0 D .f(f.expr2/x . > integrate(function(x. subdivisions..array(0. 1). "x". "x") cos(x)/x . s=20) Error in qf15(f. "x"] <. aux = optargs(list(. upper.: singularity encountered Dumped > traceback() Message: singularity encountered 5: stop("singularity encountered") 4: qf15(f.

fft accepts arrays as well as vectors. In addition to the statistical operation of spectral analysis. This ﬁnds prime numbers. They are both based on PORT routines. The formula is applied to each row of the data frame. S Poetry c 1998 Patrick J. To ﬁnd some root of a non-polynomial univariate function. It takes a formula and a data frame. ivp. napsack and stepfun.0 .value. kronecker. They each have their little quirks. but the only essential diﬀerence between them is that nlminb allows box constraints to be put on the parameters while nlmin does not. Use polyroot to ﬁnd the roots of a polynomial.value } 109 This returns a vector of the expression which has a gradient attribute containing the derivative. Burns v1.7. use uniroot. The primes function may be found in the progexam library to S-PLUS.. The fft function performs a fast Fourier transform.grad . More commonly useful is the interp function that performs interpolation to a regular two-dimensional grid given three vectors.4. The S-PLUS Programmer’s Manual shows a factors function. The portoptgen function (see starting at page 344) provides an alternative. but it doesn’t appear in the library. optimize ﬁnds a local minimum of a univariate function.ab (for diﬀerential equations). this can be used for convolutions. In particular. so it performs higher dimensional transforms. peaks. The ms function provides optimization from a diﬀerent perspective. chull (for two-dimensional convex hulls). and the result is summed across rows. The approx function performs linear interpolation given two vectors. Both nlminb and nlmin minimize general nonlinear functions. and the degree can be as high as 48. contour and image. The coeﬃcients may be complex as well as numeric. this is often used to create output that is suitable as input to the graphics functions persp. "gradient") <. spline. NUMERICS x^2)) attr(. It seeks a minimum of that sum. Other numerical functionality is performed by vecnorm.

> fjjran1 function(n = 3) { # DO NOT DO THIS .Random.Random.Random. The random functions look for the ﬁrst value of .7764676 > fjjran1() [1] 0.2946985 0.6019266 [1] 0.5730458 0.seed <. or anywhere else that is not a database on the search list.5157684 0.9350564 0.seed even though new versions are being written to the S Poetry c 1998 Patrick J.7920594 0.7764676 [1] 0. As far as I know.seed1 <.7920594 0.4174643 DANGER.2946985 0.Random..Random.93036326 0.Random.. VOCABULARY 4.seed on the working database.Random.seed for(i in 1:n) { print(runif(3)) } invisible() } > fjjran1() [1] 0.7920594 0. When a random function is done.Random.seed <. So while we stay inside fjjran1 we only see one version of . but it is not a wise idea to assign anything to .seed that they see.94985548 > .seed that was not previously a random seed.7764676 [1] 0. it updates .5981965 0.8 Randomness Pseudo-random functions within S use a single random number generator.5701077 0.5157684 0.0 .6019266 [1] 0.Random. Burns v1.4174643 > runif(4) [1] 0.110 CHAPTER 4.5981965 0.5730458 0. Do not assign .jjsave.5157684 0. If you want to reproduce results of random functions.9350564 0.08411177 0. then you need to save the seed and restore it just before the call that is to reproduce the results: > jjsave. which is a vector of 12 small integers. which in this case is the value set in fjjran1. this is robust.seed within a function.Random.93932162 0.seed1 > runif(4) [1] 0.6019266 Here’s the explanation for what is happening.5730458 0.seed. The seed for the generator is stored in .seed > runif(4) [1] 0.9350564 0.5701077 0.

sd = jjsd) S Poetry c 1998 Patrick J.seed) > [1] 21 14 49 16 62 1 32 22 36 23 28 3 3 The example above shows that set.seed.Random.Random.5524101 Warning messages: One or more nonpositive parameters in: rnorm(3. 3.0 . qunif. RANDOMNESS 111 working database. sd=jjsd) # bad news [1] 0. This is the proper behavior—you wouldn’t want the seed to be changed when an error occurs. When a function that uses randomness is stopped on an error or interrupted. and end with a code for the distribution. > set. and sets the random seed to a certain value based on the integer given.4) > rnorm(3. Examples are runif. then you couldn’t be sure that your ﬁx to the function eliminated the bug—it could be that the new set of random numbers doesn’t exhibit the bug. dunif. > jjsd <. the probability functions give the probability of being less than or equal to the value given (probability distribution function).seed returns the same answer for arguments that are the same modulo 1024. mean = 1. print(. ‘d’ is for density.4. These functions are vectorized. print(. 0.seed) > [1] 21 14 49 16 62 1 32 22 36 23 28 > set. and ‘q’ is for quantile. ‘q’. ‘d’. If the seed did change. mean=1.Random. and the quantile functions return the value where the distribution function achieves the probability given. S contains functions for a number of probability distributions. ‘p’ is for probability.seed(1023).seed(-1).c(1. it sees the value of . ‘p’.4. the random seed will not be updated. You need to be careful when you get to the boundary of parameter spaces. pnorm. The random functions generate numbers from the distribution. qnorm.7728073 NA 0.seed that was updated on the last loop of the previous call to fjjran1. punif. and rnorm.8. Burns v1. This takes an integer that should be between 1 and 1024. DANGER. The ‘r’ stands for random. When the second call to fjjran1 starts. The function names begin with one of the four letters ‘r’. Usually there are four functions for each distribution. The technical reason for this is that assignments are not committed until the execution of the function is ﬁnished. dnorm. Another way of ensuring reproducibility is to use set.

574893 > rnorm(3) * jjsd + 1 # right [1] -0.5578254 > ppois(round(jj2).eps) > jj2 [1] 2 2 > ppois(jj2.8088468 The sample function is a random function of a diﬀerent sort.8088468 0. (A single integer stands for the integers up to it from 1.) > sample(names(jjwt). 1.91233952 DANGER. > sample(names(jjwt). prob=1:4) S Poetry c 1998 Patrick J. mean=1) * jjsd # wrong [1] 2.0 . rep=T.Machine$double. In its simplest use. 2 . VOCABULARY > rnorm(3.c(2. 10. sample will normalize it.112 CHAPTER 4.00000000 3.02761199 1.) > names(jjwt) [1] "dorothy" "harold" "munchkin" "stevie" > sample(names(jjwt)) [1] "harold" "munchkin" "dorothy" "stevie" > sample(10) [1] 7 2 5 10 3 9 1 6 8 4 An important argument is replace which tells whether or not items are to be replaced after they have been selected. (I remember that the default value of replace is FALSE since a permutation results when replace is missing. 10. rep=T) [1] "harold" "stevie" "harold" [5] "dorothy" "stevie" "dorothy" [9] "munchkin" "munchkin" "harold" "munchkin" There is also a probability argument that takes the relative probabilities for each item in the population.000239 0. rounding is often a good idea. Discrete distributions can be tricky. This need not sum to 1. 1.000000 6. Burns v1.5) [1] 0..8088468 0. > jj2 <.5) [1] 0. it returns a permutation of its argument.

tau and scale. which humans are not very good at doing.test.10 Statistics Functions to do simple summary statistics are mean.gof and ks. mtext. qqplot. Use dot charts or bar plots instead. Burns v1. General purpose graphics functions include barplot. plus a little more. and friedman.test. Pie charts are out of favor with statistical graphics experts— they essentially require us to estimate angles. quantile. kruskal. cor. wilcox. Among the most useful low-level graphics functions are title. points.test. Functions in S-PLUS that perform hypothesis tests are t.0 . symbols and pie. legend. segments. Some functions of a more statistical nature are boxplot. pairs. polygon. lines.test. abline.test. mantelhaen. and the S-PLUS function biplot. The trellis package is a relatively recent addition to the graphical capabilities in S. scale. locator will give the coordinates to locations that you select. axis.wt for covariance estimation with observation weights.9.test.test. var. The default method does what I expect you would think. var. dotchart. Two functions that are available on many graphics devices for interacting with plots are identify and locator.mve. prop. 4. matplot. tsplot. It is generic and there are numerous methods for it.test. box and frame. Plots of three variables are done with persp. contour and the S-PLUS function image. mcnemar. plot is the most conspicuous graphics function. text. median. The S-PLUS functions location.9 Graphics Here is another part of my cursory look at graphics in S. The trellis function wireframe is similar to persp.a perform robust estimation.4. rug.test. GRAPHICS [1] "stevie" "stevie" [5] "munchkin" "stevie" [9] "munchkin" "stevie" "stevie" "dorothy" "munchkin" "munchkin" 113 4.gof. S-PLUS also contains cov.m. S Poetry c 1998 Patrick J. arrows. chisq.test. cov. cor. identify writes something on the plot about the nearest datapoint to the place that you specify. hist. qqnorm. crosstabs is also an S-PLUS function. fisher. Goodness of ﬁt tests are performed with chisq. Primarily trellis is concerned with ways to plot data of several variables in a comprehensible manner.test.

or at least continuously distributed and symmetric about zero. Univariate ARIMA models are ﬁt with arima.fracdiff function ﬁts (long-memory) fractionally diﬀerenced ARIMA models.qr would be a good choice. Notice that one of the functions to ﬁt regression was lm as in “linear model”. A situation that I have in mind is if the function you are writing does iterative regressions. VOCABULARY There are two ways to perform ordinary least squares regression—either via lm which takes a formula. Functions for multivariate analysis include cmdscale for classical multidimensional scaling. then lm. If you want to use a function that is quick at performing regressions on data that is known to be good (no missing values and so on). Burns v1. Multivariate analysis within S-PLUS also includes principal components with princomp. (There is an anova function—which does mean analysis of variance—but this is not the function you want to use to ﬁt an ANOVA model. or with lsfit which is an older function. Second that the errors can come from a variety of distributions. Analysis of variance (ANOVA) is accomplished with aov. Robust regression may be done via l1fit which performs least absolute deviation regression. The glm function ﬁts generalized linear models.fit. or with rreg which does M-estimates of regression. furthermore. Highbreakdown regression is done with the ltsreg (least trimmed squares regression) S-PLUS function. In addition. First. that function is called the link function. factor analysis with factanal and multivariate analysis of variance with manova.mle and its friends. S Poetry c 1998 Patrick J. that the expected response is some speciﬁed function of a linear combination of the explanatory variables. and cancor for canonical correlation. Generalized linear models are generalized in two ways. Logistic regression ﬁts within the scheme of generalized linear models. such as binomial.0 . Yet another area where S-PLUS has added a lot of functionality is in time series.fit function of S-PLUS. If you want to ﬁt a least squares model with the coeﬃcients constrained to be positive. The arima. and spectrum performs spectral analysis on univariate and multivariate series. the errors from this are distributed as a Gaussian. S-PLUS also has functions for making quality control charts.) Variance components can be estimated with the S-PLUS function varcomp. multivariate autocorrelation estimates (including cross-correlations) are available via acf and univariate robust seasonal decompositions are created with stl (vanilla S function). One description of regression is that the expected value of the response is some linear combination of the explanatory variables.114 CHAPTER 4. The lme and nlme functions (linear mixed eﬀects and nonlinear mixed eﬀects) provide related functionality in recent versions of S-PLUS. Survival analysis functions are available in S-PLUS and from statlib. ar ﬁts univariate and multivariate AR models. then use the nnls. Poisson or gamma.

Each smooth function is a function of one variable only.2) (4. These take a linear combination of the explanatory variables ﬁrst and then does a smooth. loess. The S-PLUS function ppreg ﬁts projection pursuit regression models. The nls function performs nonlinear least squares.1) that can be transformed to a linear form by taking logarithms. Then split S Poetry c 1998 Patrick J. Directly along this line are generalized additive models which are ﬁt with gam. The ace and avas S-PLUS functions ﬁt models that are similar to generalized additive models.smooth. Explicit functions for smoothing include smooth.10. A generalized additive model is just like a generalized linear model except that we ﬁnd a linear combination of smooth functions of each explanatory variable. smooth. The object is to have the response within each of the two groups as alike as possible. except that they do not have a link function.4. This is diﬀerent from ordinary regression only because the parameters do not enter linearly in the model. Trees are decidedly non-smooth. Several of these terms may be added together to approximate the response. we get far aﬁeld with tree which ﬁts classiﬁcation and regression trees. STATISTICS 115 Smoothing is an important element of modern statistics.spline and the S-PLUS functions ksmooth and supsmu. Here is a nonlinear equation z = eax1 ebx2 ec (4. The following formula is inherently nonlinear: y = axb + c (4. Much of the rest of the statistical functionality that I’ll talk about are some combination of the idea of generalized linear models and smoothing. At ﬁrst there is only one node—the whole dataset.0 . but they allow the response to be transformed via a smooth as well as the explanatory variables. The basic operation in growing a tree is splitting a node. You split that in two. Burns v1. b and c. rather than a linear combination of the variables themselves. Finally. A tree is a regression tree if the response takes on continuous values. For example a linear equation would be y = ax1 + bx2 + c where the parameters are a. and it is a classiﬁcation tree if the response is categorical. loess does local (robust) regressions so that the response is approximated locally in each region of the space of the explanatory variables.3) Nonlinear regression with box constraints on the parameters may be done in S-PLUS with nlregb. which means to ﬁnd the best combination of the explanatory variable and the location in the explanatory variable to break the data into two.

intersect. On the other hand. or for other reasons are less visible than I think they should be. Continue splitting nodes until you don’t want to anymore. S-PLUS contains a group of functions that do set operations on atomic vectors.objects which is like find except that it ﬁnds object names that contain a certain string rather than matching the string exactly. you only need to consider one variable at a time: Number of feet greater than 3? Yes. then you know that there are no numbers much bigger than that. Another secretive S-PLUS function is find. you could use merge. It was created to be used for statistics that are printed out. S Poetry c 1998 Patrick J.116 CHAPTER 4.objects looks in all of the databases on the search list. if you had a factor representing species. 4.answer with perhaps some noise thrown in. It is a generalization of the row and col functions to higher dimensional arrays. If you see numbers that are on the order of rounding error.fjj(input)) Here we think that the result of fjj(input) should be the same as real. Their names are union.0 .) Trees have become popular because in some respects they are very easy. and setdiff. The slice.11 Under-Documented This section presents functions that don’t have help ﬁles. zapsmall is a little-known and deviously clever little function. The other diﬀerence is that find. you might see a bunch of zeros and a few non-zero numbers. and then to decide which splits to throw away—ineﬃcient computationally. Weight less than 10 kg? No. VOCABULARY each of the two nodes you just created. It changes numbers that are very close to zero relative to the largest number (in absolute value) in the vector.levels to make a new factor representing genera or families. For example. Then must be a corgi.objects is like using objects with the pattern argument except that find. The merge. Burns v1. When you have your tree and you want to know how to classify a new case. (The most common current practice is to create more than enough nodes. or didn’t have help for some portion of their life. Using find.answer .levels function allows you to combine levels in a factor. not just one. but more eﬃcient statistically.index S-PLUS function is a relatively recent addition. but it can also be used for informal testing as in: zapsmall(real.objects only takes a string—you may not pass the unquoted name to it.

For each character in a character vector. Can you make them reasonably eﬃcient? Is object-orientation useful? Is a complement function feasible? S Poetry c 1998 Patrick J.45"). This doesn’t seem to be intended for public use.mark On-line Information on Functions. util Earnings and Market/Book Ratio for Utilities util. The S-PLUS functions union and intersect use atomic vectors to represent sets. Create a suite of functions that perform set operations on lists. Include a symmetric diﬀerence.list does the same thing as. AsciiToInt returns the corresponding ASCII code.13 Things to Do Create a vectorized function that divides each string in a vector into its individual characters. Use order instead. + AsciiToInt("/"). DEPRECATED 117 If you are poor and use S-PLUS via modem without a windowing system. though factors are no angels as can be seen starting from page 130. Burns v1. . The result that AsciiToInt returns is one numeric vector. And your exasperating pit-pat 18 sort. Set theory implies that lists are the proper objects for sets. > match(AsciiToInt("/tmp/myomy/Sfun. but it probably won’t go away soon. It holds no beneﬁt and some deﬁcits relative to the object-oriented versions..4. 4. ordered and their relatives do. nomatch=0) [1] 1 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 4. but a little less than order. then you will not have the help window system available to you.mktbook Earnings and Market/Book Ratio for Utilities An S-PLUS function that can come in handy for messing with character data is AsciiToInt. so that you may be able to ﬁnd some functionality even if you don’t know the function name. > topic("mark") invisible Mark Function as Non-Printing Question..12 Deprecated The category function and its relatives do the same thing that factor.earn Earnings and Market/Book Ratio for Utilities util.12. category was written before object-orientation came along.0 . Objects. and the name can be a source of confusion. The topic function searches for phrases within the titles of help ﬁles.

Burns v1. Books that describe some of the more esoteric statistical methods include McCullagh and Nelder (1989) Generalized Linear Models.118 CHAPTER 4. Friedman. Olshen and Stone (1984) Classiﬁcation and Regression Trees.1] [.] 1 6 11 [2.dump produces. which comes with high praise.] 5 10 15 > apply(jjm.] 3 8 13 [4.4] [. Seber and Wild (1989) Nonlinear Regression. Chambers and Hastie (1992) Statistical Models in S describes the modeling functions that were produced at Bell Labs. Is this a good thing to do? Learn what each function in your version of S does. 1.2] [. Make it so that the statement paste("data". see Venables and Ripley (1997) Modern Applied Statistics with S-Plus. "+") [.14 Further Reading For more on statistical functionality in S and S-PLUS.1] [. Hack at it until you understand it.] 1 2 3 4 5 [2. Create a function that will read in double-precision numbers that are written in scientiﬁc notation with a “d” instead of an “e”. Breiman. Ripley (1996) Pattern Recognition and Neural Networks. i.] 2 7 12 [3.0 . What do you want it to do? Figure out why the following happens: > jjm [.] 11 12 13 14 15 Learn the format that data. Bates and Watts (1988) Nonlinear Regression Analysis and Its Applications.] 4 9 14 [5.1:i will work.5] [1. VOCABULARY Make zapsmall work for complex numbers.3] [. Many of the graphical ideas embodied in the trellis package are discussed in Cleveland (1993) Visualizing Data.2] [. # THIS WON’T WORK 4. sep="") <.3] [1. S Poetry c 1998 Patrick J.] 6 7 8 9 10 [3. Hastie and Tibshirani (1990) Generalized Additive Models. Spector (1994) An Introduction to S and S-Plus also surveys the statistical capabilities of S-PLUS. Beran (1994) Statistics for Long-Memory Processes.

15. Lawrence “The Ship of Death” 17 William Blake “Piping Down the Valleys Wild” 18 Stevie Smith “No Categories!” S Poetry c 1998 Patrick J. H. Perishing Republic” Ezra Pound “In a Station of the Metro” 15 Emily Dickinson 258 (There’s a certain Slant of light) 16 D.0 . QUOTATIONS 119 Reading help ﬁles and snooping through directories for undocumented objects are the best ways to increase your vocabulary. Of course for the undocumented (and under-documented) functions. you will need to use the ultimate documentation—the code itself.4. Burns v1. Additions to the section are often announced on S-news. 4. I’ll not survey its contents because some uncooperative person would go and add something good after I made the list.15 13 14 Quotations Robinson Jeﬀers “Shine. Further functionality may be found in the S section of statlib.

0 . Burns v1.120 S Poetry c 1998 Patrick J.

these functions think of a vector as an object without attributes.vector(jjvec) [1] T > is. 5. which is the explosive part. A better name would have been is.list(1:26) > names(jjlistn) <.vector(jjvecn) [1] F > jjlist <. The common notion of “vector” is an atomic vector. But wait. This covers only programming—problems with statistics and with graphics are not given.vector. a linguist.1 Identity Crisis Benjamin Wharf. DANGER.simple.Chapter 5 Choppy Water In which we learn to forecast the seas.letters 121 .) An “empty” gasoline container is actually full of vapors.letters > is. you also get inconsistency about names: > jjvec <.jjlistn <.jjvecn <. He presented the example that lots of ﬁres are started by “empty” gasoline cans. DANGER. Two of the most confusing functions are is. (He worked for an insurance company. However.1:26 > names(jjvecn) <.as.vector and as. had the theory that our view of the world is shaped by the words of our language.

matrix(x) actually is a useful idiom—it coerces a data frame to a matrix. These.double test. The deﬁnition of is. Whenever you are tempted to use one of these functions. it is easier for a list to be a vector than for a vector to be a vector.matrix(x)) x <. DANGER. you should note that an object may have attributes and still pass the is. you want to assign to the mode or storage mode: storage.vector."double" # saves attributes DANGER.vector(jjlist) [1] T > is.xxx(x) is TRUE.xxx(x) is not guaranteed to be the same as x when is.array has similar behavior. you should deﬁnitely ponder if this is the right thing.as.0 . strip attributes oﬀ of the object as well as coercing the contents. Data frames ﬁt this deﬁnition. DANGER.vector and is. is. S Poetry c 1998 Patrick J. However. Burns v1. I’ve said it elsewhere.matrix can also be a source of confusion. but it is worth saying again that using drop=F when subscripting matrices and arrays in function deﬁnitions often prevents bugs. like as.mode(xmat) <.vector(jjlistn) [1] T CHAPTER 5.character.122 > is. is. DANGER. The following nonsense if(is. DANGER.vector always agree with each other. It is generally the case that as. Related to this are functions like as. To preserve the attributes.matrix is an object with a dim of length 2. You’ll be glad to hear that as. CHOPPY WATER So to paraphrase.double and as.

0 .] 13 13 S thinks of this as a certain number of positions being ﬁlled in jjm6 (arranged in the usual column-major order).matrix(1:6.2] [1. because S has a diﬀerent plan. and replicates the values on the right-hand side of the assignment to be the right length. then it is as if all of the other arguments are coerced to be lists with as.2) > jjm6 [. > jjm6 [. If one of the arguments is a list.3) > jjm4 <. DANGER.2] [1. b=5:7) [[1]]: [1] "fjd" [[2]]: [1] 3 4 $b1: [1] 5 $b2: [1] 6 S Poetry c 1998 Patrick J.T).1.1] [.] 2 5 [3. IDENTITY CRISIS > jjm6 <.] 123 Here is our plan with the command above: that each of the rows marked TRUE in jjm6 will end up being the same as the ﬁrst row of jjm4.] 3 6 > jjm4 [.5.jjm4[1.list.] <.3:4).10 + matrix(1:4.] 11 11 [2.] 2 5 [3.] 11 13 [2.F. You can use the c function to combine lists as well as atomic vectors.1] [.] 1 4 [2. It’s disappointment time. > c(list("fjd".] 12 14 > jjm6[c(T.1] [. Burns v1.2] [1.

DANGER. and S still retains that feature.1 [1] 19 > as. When going from a ﬂoating point number to an integer.value) [1] 18 S Poetry c 1998 Patrick J. and even rarer that they are useful. but sometimes fails. The only problem with them is our tendency to think that they should be exactly the same as their non-array counterparts.. In the slow. b=list(5:7)) [[1]]: [1] "fjd" [[2]]: [1] 3 4 $b: [1] 5 6 7 CHAPTER 5. truncating is much faster than rounding. The following example may be machinedependent. then it doesn’t matter which is used. numbers are essentially truncated. They almost are. Whether the coercion is through is. There is a little leeway—numbers that are very close to an integer are rounded up to the integer. S does an amazingly good job of it.Last. If the vector is only of length one. rather than creating the ﬁle.124 $b3: [1] 7 > c(list("fjd".integer.1)/.0 . Burns v1. it is the second of these that will be the proper formulation. > (2 .3:4). It is rare that these appear. tempfile is another Wharﬁan word—it should be called tempﬁlename to show that it merely returns the name of a temporary ﬁle. There are times when S creates 1-dimensional arrays—arrays where the dim has length 1. DANGER. DANGER. or an internal coercion. One way to get such a critter is to use t to transpose an object that is not an array. old days that was an important consideration. CHOPPY WATER In most cases. changing the storage mode to integer.integer(.

An object need not be “null” just because it has length zero. but the storage mode is double. and you type options(warning=2) # WRONG then you won’t get what you want because the name of the option is warn.integer is a test of the storage mode. integer. you need to say S Poetry c 1998 Patrick J. For instance if you want to turn warning messages into errors. DANGER. When you are changing an option. it is better to test if the length of the object is zero.null appears.0 . > storage. DANGER. then you have to get the name of the option precisely right. S hides the fact that numbers can be represented as double precision ﬂoating point. and so on.numeric. DANGER. it is good practice to round them directly. 4. IDENTITY CRISIS 125 Whenever you do computations to get numbers that are going to be coerced to integer. Most of the numbers that S creates are double precision. DANGER. On very rare occasions a number stored as an integer will act diﬀerently than if it had been stored as a double. Burns v1. not of the values in the object. 6) contains values that are logically integer. > is.null(numeric(0)) [1] F Often where is. The vector c(0. A complex vector does not test TRUE in is. The most common way to get integers is with the : operator.5.mode(1:10) [1] "integer" DANGER. Keep in mind that is.1.

in a sense it is “lead” rather than “lag”. .0 . So code like: if(!all..equal(as.na(longley.nan(0/0) [1] T NaN’s and NA’s are not distinguished by all. Furthermore. 0/0) [1] T > is. DANGER. DANGER.na(longley. The “not” operator can not be the ﬁrst character in a command because it will be interpreted as an escape to the operating system. > !is. . S Poetry c 1998 Patrick J.y) Badly placed ()’s > (!is.numeric(NA)) [1] F > is.numeric(NA).y)) 1947 1948 1949 1950 1951 1952 1953 1954 1955 1956 T T T T T T T T T T 1957 1958 1959 1960 1961 1962 T T T T T T The “Badly placed” message is a syntax error from Unix. given on page 45. Burns v1.nan(as. but if you are in a picky mood . guards against this problem. DANGER.126 options(warn=2) CHAPTER 5.equal function does not return a logical value when the objects are not equivalent. is not going to work properly.equal.equal(x. > all. it only changes the time points that the observations represent.. y)) . DANGER. This is quite picky. CHOPPY WATER The soptions function. The lag function does not change the actualy data. The all.

DANGER.na(longley.2 Oﬀ the Wall DANGER.y) 1947 1948 1949 1950 1951 1952 1953 1954 1955 1956 F F F F F F F F F F 1957 1958 1959 1960 1961 1962 F F F F F F DANGER. S Poetry c 1998 Patrick J. Do not modify the looping variable in a for loop. The problem is that the NA is replicated to the length of x and then the answer is that number of missing values.0 . DANGER. return. Probably the most common reserved words that you might want to use are F. S has a number of reserved words that you may not use as the name of an object.2.y == NA 1947 1948 1949 1950 1951 1952 1953 1954 1955 1956 NA NA NA NA NA NA NA NA NA NA 1957 1958 1959 1960 1961 1962 NA NA NA NA NA NA I’ve tried this myself a few times years after I knew better.na: > is. next.5. T. A natural thing to try when testing for missing values is: > longley. You can see an example of this problem on page 87. The proper way to get this done is to use is. If you type x<-2 and expect a logical vector containing the values of x that are less than -2. You will confuse at least yourself and possibly S also. and you have just destroyed x by assigning it the value 2. OFF THE WALL 127 5. As far as I know there is only one spot in S where a space makes a diﬀerence. you will be saddened to learn that you don’t get your logical vector. break. Burns v1.

the argument is evaluated and called x. though.. the double evaluation can be a bug—for instance. Burns v1. 10) This is another reminder that the word “vector” has more than one meaning in S. And what a congress of stinks! 19 DANGER. it re-evaluates the ﬁrst argument to get object. DANGER. in which case i is ﬁrst 1 and then 0. you say double(10) and you say integer(10) to create a vector of 10 integers. Now when we get to the method. except when the length of x is zero. The class says to use our method. you want corresponding arguments in the generic function and its methods to have the same name. If the method has object as the argument corresponding to x in the generic function. Let’s make it even worse. again. But list(10) does not create a list that is 10 long—it creates a list that has one component (that is the number 10). if randomness is involved. While we are on the subject. then use the following instead: for(i in seq(along = x)) . so that function becomes the operable one. When we call the generic function. Seldom what you want. real trouble develops. and the method also has x as an additional argument.. you need to use the circuitous statement: vector("list". To create a vector of doubles that is 10 long. At times.128 CHAPTER 5. At least for version 3. but it doesn’t evaluate x because it already knows what x is (though it is the wrong x as far as the method is concerned). CHOPPY WATER DANGER. The method needs object so it evaluates the ﬁrst argument. If there is a chance that the length of x will be zero. This is ﬁne. To initialize a list with length 10. DANGER.0 . A common code fragment goes like for(i in 1:length(x)) . another popular way to mess this up is to say: S Poetry c 1998 Patrick J. In the best circumstances this is merely ineﬃcient. Suppose that the ﬁrst argument to the generic function is x and the corresponding argument in one of the methods is object...

then the loading will occur each time that the function is called instead of once per session. Burns v1. DANGER. In particular. DANGER.. But when it is called hundreds of times. If you get it wrong.op) { ans <.C("myfun_Sp"))) dyn. b = { cat("doing b\n") 2 } S Poetry c 1998 Patrick J.C or symbol.loaded(symbol.2. make sure that you get the symbol right for is. "switch. and you can start paging or run out of memory all together.dump when making ASCII representations of objects that are not functions. then pieces of memory get eaten by the loaded code.load("myfun. The call component will be evaluated instead of remaining merely an image of what to evaluate.op.switch(this. Consider the following function. If you dump (using dump) an object that contains a call component or attribute.o") If you do this.For. then the restoration is not going to work. a = { cat("doing a\n") 1 } .. In version 3 of S-PLUS a common idiom to arrange for object code to be loaded automatically is something like: if(!is.loaded. 129 DANGER. OFF THE WALL for(i in length(x)) . This only does the last of the intended iterations.5. you want to spell the routine name correctly and you do not want to forget the symbol. your memory gets fragmented.0 . This isn’t so bad if it is only called a few times in a session.chartest"<function(this. This is another reason to use data.

> jj <. > jjtypen Harold Dorothy Munchkin Stevie "corgi" "corgi" "cat" "cat" > jjfac <.chartest("a") doing a [1] 1 > switch. and it exists at least in S-PLUS version 3.0 .chartest(" ") doing other [1] 3 > switch.chartest("b") doing b [1] 2 > switch. CHOPPY WATER cat("doing other\n") 3 } ) ans } > switch. Unfortunately. 5. this section is not likely to cover all of the problems.4 and earlier. Examples will use the object jjfac. Burns v1. { CHAPTER 5.switch.130 .chartest("") > jj > mode(jj) [1] "missing" And the nothing that it does is a little bizarre. When switch dispatches on an empty string.chartest Now give this a few values: > switch. then it does nothing instead of doing the default code.chartest("") It is ﬁne except when we give it an empty string. This is a bug.3 Factors There are a number of sticky bits with objects of class factor.factor(jjtypen) S Poetry c 1998 Patrick J.

3 : 1996 Obviiously. so are names.) DANGER.x : 1992 Though note that the names are printed here. SunOS 4. If factors used quotes. someone ﬁxed something.4 Release 1 for Silicon Graphics Iris.1 Release 1 for Sun SPARC. The ﬁrst thing that is strange is that the names are not printed even though they are there.3. Factors are printed as strings with no quotes around them. Here is reasonable behavior from a version of S-PLUS: > factor(factor(jjtypen. (This is stolen from a message to S-News by John Maindonald. IRIX 5. Now with a later version: > factor(factor(jjtypen. > jjfac [1] corgi corgi cat cat > unclass(jjfac) Harold Dorothy Munchkin Stevie 2 2 1 1 attr(. "levels"): [1] "cat" "corgi" > names(jjfac) [1] "Harold" "Dorothy" "Munchkin" "Stevie" This is a design decision that is sensible once you consider the alternatives.5. FACTORS 131 DANGER. DANGER. exclude="cat")) Harold Dorothy Munchkin Stevie corgi corgi NA NA > version Version 3. S Poetry c 1998 Patrick J. exclude="cat")) Harold Dorothy Munchkin Stevie NA NA NA NA Levels: [1] "1" > version Version 3. Burns v1.0 . So if the names were printed. it would be rather ambiguous at times which corresponded to names and which to the data. then they would be hard to distinguish from character data. unlike in other cases.

character(jjfac) [1] "corgi" "corgi" "cat" "cat" > paste(names(jjfac). as. Burns v1. Shera="cat".0 . DANGER. We need to explicitly coerce to character.) > paste(names(jjfac). (Z. Coercion from factor is not necessarily smooth and enlightened. > jjpet <. S Poetry c 1998 Patrick J.c(Sybil="cat". it is not what some of us would have in mind. There is not a factor method for c (at the time of writing at least). CHOPPY WATER "Munchkin" "Stevie" There is a labels argument to factor. jjpetfac) Harold Dorothy Munchkin Stevie Sybil Shera Fred 2 2 1 1 1 1 2 > unclass(. in this case it is the same as the names. "is a". Todd Taylor pointed this out to S-news. but this is not the same as the result of the labels function on the factor. The labels function is generic and is meant to give a name to each datum in the object.character(jjfac)) [1] "Harold is a corgi" "Dorothy is a corgi" [3] "Munchkin is a cat" "Stevie is a cat" Those involved are happier with this. "is a". Fred="chinchilla") > jjpetfac <.132 > labels(jjfac) [1] "Harold" "Dorothy" CHAPTER 5.value) Harold Dorothy Munchkin Stevie Sybil Shera Fred 2 2 1 1 1 1 2 We really want to take this more scenic route. > as. The labels argument tells how to make the levels attribute of the factor.Last.factor(jjpet) > jjpetfac [1] cat cat chinchilla > c(jjfac. DANGER. jjfac) [1] "Harold is a 2" "Dorothy is a 2" [3] "Munchkin is a 1" "Stevie is a 1" Though this answer may arguably be true.

and everything is ﬁne. > ordered(as. > jjo [1] 90 90 100 90 110 100 110 > ordered(jjo) [1] 90 90 100 90 110 100 110 90 < 100 < 110 jjo is a numeric vector. S Poetry c 1998 Patrick J.factor(jjtypen. The sorting is being done in dictionary order rather than numeric order. Burns v1. as. FACTORS > factor(c(as. "cat")) > unclass(jjfac2) Harold Dorothy Munchkin Stevie 1 1 2 2 attr(. However. The levels may be permuted because codes sorts them.character(jjo)) [1] 90 90 100 90 110 100 110 100 < 110 < 90 Now we get 90 being greater than 110.0 .character(jjpetfac))) [1] corgi corgi cat cat [5] cat cat chinchilla 133 DANGER. the same character vector doesn’t give the same answer. You want to pay a little attention when you are creating ordered factors.3.character(jjfac). "levels"): [1] "corgi" "cat" > codes(jjfac2) [1] 2 2 1 1 Here codes gives a diﬀerent code than what is actually in the object since the levels are not in alphabetic order. levels=c("corgi". Be careful when you use the codes function.5. We make an ordered factor with it. > jjfac2 <. DANGER.

0 .6 19 Quotations Theodore Roethke “Root Cellar” S Poetry c 1998 Patrick J. When a data frame is created.numeric is tempting.ordered(jjo) > jjof [1] 90 90 100 90 110 100 110 90 < 100 < 110 > as. A reason that many of the problems with factors come up so often is that data frames are excessively fond of factors—they favor factors over character data. 5.4 Things to Do Create a new class of objects for categorical data that has none of the problems that factors do. All sorts of trouble might ensue if it were changed so that it did work since the object underlying factors is a numeric vector. There is a special idiom for coercing an ordered factor that represents numbers into those numbers.numeric(as. 5.numeric(jjof) [1] 1 1 2 1 3 2 3 > as. DANGER. but not right. Burns v1. 5. Look through your functions for these trouble spots. often a character vector is coerced to be a factor.5 Further Reading The S-news list. > jjof <.character(jjof)) [1] 90 90 100 90 110 100 110 Just using as.134 DANGER.

You might think that this is slow because you have to back up and ﬁll in. then you can save yourself a few occurrences of errors due to missing braces by typing the braces before ﬁlling them in. If you get a syntax error. • The command is incomplete. there are three possible results: • There is a syntax error. but you save a lot of time by not having as many syntax errors. When S reads a line of your input (that is. It can also save you from getting the logic wrong. When a crop is so thin 20 When you write code in an editor. and by not needing to count braces as you type. there is really no alternative to puzzling over the line in question with the hint that S gives you. So at one point you have: if(length(x)) { } else { } then you go back and ﬁll in the empty braced expressions. Once a comand is complete. • The command is complete.Chapter 6 Debugging In which we learn to bail the boat. S will go on to evaluate the command.1 Masking Consider the two commands: 135 . parses it). 6.

Random. > masked() [1] ".)[[1]]) Warning messages: assigning "c" masks an object of the same name on database 5 The result of this is a diﬀerent warning message.] 4 5 6 [3.) cat(list(.. Now the c that we just created will be used instead of the in-built c function (and trouble will ensue).matrix(1:9. consider the following command: # DO NOT DO THIS > c <. Here is what is going on.seed" "c" We see our c function that is masking the real one (it would also appear if it weren’t a function). The S-PLUS function masked can be used to see what objects in the current directory have the same names as objects in the rest of the databases on the search list. A danger of masking functions becomes clear when we try to get rid of our version of the c function. a. (The names T and F are prohibited. In addition to c.. ignored one of mode "integer" in: print.3] [1. other likely names that are good to avoid are t. Since the c that we created is not a function.structure(x. quote = quote.Random.) In contrast. S is just being friendly when it issues the warning. 3) > t(c) [.] 7 8 9 Warning messages: Looking for object "c" of mode "function". but we have created another object named c that S sees ﬁrst. C and s.function(.. There is an in-built function named c that is used in the process of printing the transposed matrix. it continues searching until it ﬁnds a function named c.2] [. We can avoid such warnings by renaming our object. It can seem particularly mysterious when such warnings start appearing a long time after the assignment is made. . Unless you really mean to replace the in-built function.0 . and uses that.. it merely means that we have used a random function with this working database. rename your function immediately. but on the second we get a mysterious warning message.] 1 2 3 [2.seed is also masked—this is ﬁne.136 CHAPTER 6. S Poetry c 1998 Patrick J. prefix = prefix) Everything seems ﬁne at the ﬁrst command.1] [. DEBUGGING > c <. Burns v1.

Random. you can get “object not found” errors.calls is used. In order to use debugger except as a synonym for traceback. but you don’t understand why it is happening. it can be dump. DUMPING > rm(c) Error in remove(list): need names of objects to remove. you need to set the error option to dump. or dump. and other sorts of errors.seed" 137 When the bizarre occurs. It can be NULL.2 Dumping So what can go wrong? You can get a syntax error. The debugger function has the same sort of interface as browser.frames creates an object named last. When dump. 6. then last.frames like: options(error=dump.6. Each of dump. this is often enough information to pinpoint the problem. consider masking as a likely source of the problem. If traceback is not enough to comprehend the error. In version 3 there are basically three choices for this option.dump in the working database.2. but also all of the objects in the frame for each function call. Both traceback and debugger use last. traceback is essentially just a printing method to show the function calls. Almost all errors except syntax errors and system terminating result in the error action.frames is used.dump (by default). the default.dump contains the string of function calls that was in eﬀect at the time of the error. When dump.calls and dump. then S performs the action given by the error option.frames) or options(error=expression(dump. then debugger can be used. Do this with the command: S Poetry c 1998 Patrick J. I think of traceback as the ﬁrst thing to do after an error dump. Although very simple.calls. then you will need to recreate the error in order to look at objects with debugger. which is seldom useful. then last.0 .frames. This is when you should turn a warning into an error. When a dumpable error happens. not the objects themselves Dumped > remove("c") > masked() [1] ". This allows you to look at objects within speciﬁc frames as they were at the time of the error.frames())) If this was not done prior to the time of the error.dump contains not only the function calls. There are times when you get a warning that you know is an indication of trouble. Burns v1.

what is important is that you not assume code works unless you have evidence. I use browser a lot to test functions as I write them. is ﬂat 21 I have considered making a version of fix that automatically puts a call to browser in the deﬁnition of a new function.operator so that you can use those objects for testing.3 Exploring When you don’t have an error condition. Burns v1. If not.default function has a test in it to see if S is being used interactively. write some more. Write part of the function. but either something is wrong or you are not sure that everything is right. I suspect he was a formidable tennis opponent. This function allows you to look at the objects in the frame of a function as they exist at the time of the call to browser. 6. a BATCH job can be ruined if you leave a call to browser in one of the functions it uses. You can use traceback when you are at a browser prompt to see who called whom. make sure that the result so far is as expected. check that. they’re taking him to prison for the colour of his hair 22 When you are in browser. This seems a punishment more ﬁtting of the crime. then you need some means of exploring the computations. you can assign objects to the working database with the <<.0 .138 options(warn=2) CHAPTER 6. A very good function for this is browser. DANGER. Thus. DEBUGGING You can then recreate the warning. This is a boy that pats the ﬂoor to see if the world is there.default can be changed to something like the following: if(!interactive()) { warning("for interactive use") return(NULL) } This makes it so that browser is ignored when it is called in a non-interactive setting except that a warning is issued. The browser. (It is reputed that Charlie Chaplin didn’t care if he won a volley of tennis— the important bit was looking graceful while he played. which will now be an error. It is often natural at this point to interrupt (hit control-backslash) the browsing instead S Poetry c 1998 Patrick J. and use the same tools as with any other error. then an error occurs.) It’s not important that every line you write works right oﬀ. I ﬁnd this more productive (and fun) than trying to decipher errors from a function that I don’t check as I go. The top of browser.

You can interrupt the browsing. myarg2)) You can then step through the calculations and look at objects as you go. like trace. and it kills itself rather than risking the possibility of corrupting something permanently. Almost always a system termination is the result of a problem with a call to . See talk of synchronize on page 103. The problem is that the assignments are not committed. S Poetry c 1998 Patrick J. DANGER. trace creates modiﬁed versions of the functions being traced. Usually (but not necessarily) the termination occurs during the call to C or Fortran. creates modiﬁed versions of functions that are being inspected. but unlike trace the functions that inspect creates disappear right away so there is no chance of inadvertently getting the modiﬁed version. If you do.C or . then your variables won’t be there. This works in any language.Fortran that has been added locally. You give inspect a command to evaluate. inspect. A system termination means that there is a bug somewhere. there are a few things to check. You generally need to use the techniques of this section to discover where a “system terminating” happens. An advantage over browser is that the computations are not interrupted. so it is quite popular. Burns v1. This is discussed on page 179.6. Version 3. like: inspect(myfun(myarg1. Once you believe that you know the call that is causing the problem. This is reminiscent of debuggers (like dbx) for C and Fortran code. System termination occurs when S detects an inconsistency.0 . An example of a good use for trace is that you can trace all of the methods of a generic function. Another exploration technique is the old standby of putting print and cat statements in function deﬁnitions. You want to be careful when editing functions that you do not edit a traced version because you will end up with a bunch of garbage in the deﬁnition that you will need to get rid of later.3.2 of S-PLUS introduced the inspect function. EXPLORING 139 of going through the whole computation. The trace function allows you to get information each time a traced function is called. so that you can be sure that the functions being called are the ones that you expect. but you need to synchronize the database before you do. so you can see a lot of information with minimal eﬀort.

The obvious ﬁrst step was to look for local functions that contained . As you might expect.matrix(1:6. > dimnames(jjmat)[[1]] <. Burns v1. it took several iterations to pare away pieces of the function in order to narrow down the location of the problem.140 CHAPTER 6.1:2 > jjmat a b c 1 1 3 5 2 2 4 6 > rbind(jjmat. Here are some examples of actual debugging in S. The sharp-eyed will recognize what the problem is in the output of the next command. c("a". > jjmat <. jjmat) System terminating: bad address But in fact the behavior is signiﬁcantly diﬀerent as you can see.Fortran. 2) > dimnames(jjmat) <. I got a “system terminating” while using a fairly involved function. > dimnames(jjmat) [[1]]: [1] 1 2 [[2]]: [1] "a" "b" "c" S Poetry c 1998 Patrick J."c")) > jjmat a b c 1 1 3 5 2 2 4 6 > rbind(jjmat. they have been streamlined to avoid some of the dead ends. the dimnames bug The presentation of this example deﬁnitely starts toward the end of the process. jjmat) a b c 1 1 3 5 2 2 4 6 1 1 3 5 2 2 4 6 Now we do something that should change nothing at all. Here is a simple example where everything works. DEBUGGING 6. but there were none.4 Examples The best way to learn to debug is to see it done.C or ."b".0 .list(1:2.

0032299 Warning messages: Looking for object "print" of mode "function".) What we’ve learned is that the S Poetry c 1998 Patrick J. > jjpbx <. the print.matrix(data.0032299 > print(jjpbx.6. but you are on your own after that. One idea is to trace all of the functions that contain the word “print” in their name.0 .3230554 3 3 -1.7710749 2 2 1. EXAMPLES A broader hint is provided by this command: > mode(dimnames(jjmat)[[1]]) [1] "numeric" 141 The bug is that when a component of the dimnames of an array is assigned a numeric vector. Burns v1. digits=2) a b 1 1 -0. and not even a warning to force into an error. and when we try to print it out with only a few digits.matrix bug Here we have a matrix.0032299 There is no error.as. we see that the digits argument is not respected.3230554 3 3 -1.7710749 2 2 1. the coercion to character does not take place. The bug is insidious since not all operations are aﬀected by it. This example shows oﬀ none of the debugging tools in S. You can try using trace or inserting cat statements to identify just where the “system terminating” occurs.matrix(jjpbx. digits=2) On entry: print. digits = 2) a b 1 1 -0.7710749 2 2 1. b=rnorm(3))) > jjpbx a b 1 1 -0. > trace(find. So we need some other means to ferret out where the problem lies.4. ignored one of mode "{" (The warning message is indicating that there is a bug in trace that will give us another opportunity to practice debugging.objects("print")) > print(jjpbx. There aren’t any that are especially useful.frame(a=1:3.3230554 3 3 -1.

matrix function. numeric = if(missing(digits)) . if(length(clab)) { nd <. ignored one of mode "{" > print(unclass(jjpbx). 2) if(abbreviate. Below is the fragment of print.00 Warning messages: Looking for object "print" of mode "function".0 . collab = clab. complex = 2 * (if(missing( digits)) . Burns v1. character = max(nchar(x)) + 2.Options$ digits else digits). DEBUGGING print. digits=2) a b 1 1 -0.142 CHAPTER 6. The task is now to understand what is happening inside print. right = right) We place a call to browser just before the call to prmatrix.matrix function is used.abbreviate(clab. and that the bug disappears when we unclass the matrix.77 2 2 1.labels && max(nchar(clab )) > nd + 3) clab <.matrix. and then try printing to see what happens. nd + 3)) } prmatrix(x.Options$digits else digits. This is the key step in the debugging. ignored one of mode "{" > untrace() This is substantial progress—the bug has been isolated to the print. and it is the only function used that is currently being traced. Below we see that the object has class "matrix". rowlab = dn[[1]]. but was arrived at by lucky accident more than by the inevitable discovery of our debugging strategy. > class(jjpbx) [1] "matrix" Warning messages: Looking for object "print" of mode "function".32 3 3 -1. quote = quote.matrix that seems to be pertinent.switch(mode(x). S Poetry c 1998 Patrick J. c(nd.

00 attr(. eval. EXAMPLES 143 > print(jjpbx.default(nframe. quote = quote) from 3 3: print.structure(x. but still the function doesn’t work.default(jjpbx.frame. parent) from 4 4: browser.matrix. Now we try the same thing using print. eval. parent) from 6 6: browser. b(2)> digits [1] 2 b(2)> traceback() 5: eval(i. . . prefix = prefix) from 2 2: print.matrix(x.matrix(jjpbx.0032299 The digits object is as we expect. which seems wrong.frame. from 3 3: browser() from 2 2: print.default. quote = quote) b(4)> digits NULL b(4)> traceback() 7: eval(i. if(!is. Burns v1. from 5 5: browser() from 4 4: print. digits=2) Called from: print. a.32 3 3 -1. We solve the bug by adding the following if statement to the top of print.matrix. quote = quote. "class"): [1] "matrix" This time digits is NULL.4. digits = 2) from 1 1: from 1 b(4)> 0 a b 1 1 -0. message = paste("Called from:". digits=2) Called from: print.matrix(jjpbx.default(nframe.null(digits)) { S Poetry c 1998 Patrick J. digits = 2 .0 . but the function works.77 2 2 1.7710749 2 2 1. digits = 2) from 1 1: from 1 b(2)> 0 a b 1 1 -0.matrix(x.3230554 3 3 -1. A little thought and investigation shows that the problem is that digits needs to be fed to options. which is not happening when the computations go directly to print.6.default(jjpbx. > print. message = paste("Called from:".

144 CHAPTER 6. F. "S_put") cat("On entry: ") std. . and we check that the digits option really is put back to the default value. but by a diﬀerent name. S Poetry c 1998 Patrick J. > print(jjpbx.. My ﬁrst hypothesis was that there’s a problem tracing functions whose body is not wrapped in braces.options(digits = digits) } } Now we get what we want.print > trace("fjj") > fjj(fjj) function(x. but tracing the print function does not.77 2 2 1.32 3 3 -1. "S_put") } UseMethod("print") } This works ﬁne. where = 0).Internal(assign(".. DEBUGGING if((length(digits) != 1 || digits < 1) || digits > 20) warning("Bad value for digits") else { on. where = 0).) { if(. Burns v1. digits=2) a b 1 1 -0.00 > options("digits") $digits: [1] 7 the trace bug Here we investigate the problem of tracing print that we spotted in the last example. T. Tracing a function like that worked ﬁne so let’s try tracing the print function.trace() . > fjj <.Traceon) { .0 .Traceon".Internal(assign(".Traceon".exit(options(d)) d <.

S Poetry c 1998 Patrick J.Internal(assign(".Traceon) { . T.0 ..6. F.trace() .Internal(assign(". where = 0). where = 0). T.Traceon". "S_put") cat("On entry: ") std. where = 0).Traceon".) { if(.4. where = 0). "S_put") } } Warning messages: Looking for object "print" of mode "function". ignored one of mode "{" > untrace() 145 Now it is obviously the case that the name “print” is the culprit. which appears to be the only possible source of confusion. But our curiosity gets the better of us and we continue the investigation in order to understand how the print argument confuses the production of a print function.Internal(assign(". argument. EXAMPLES > trace("print") > print { if(. > trace("print") > print On entry: print(print) function(x.Traceon". . "S_put") } UseMethod("print") } > untrace() It is now possible to trace print without the extraneous warning. The trace function has an argument named print. F.trace() . Burns v1.Traceon". This has no eﬀect on the use of the function. A solution is to add a dot to the end of the print argument in trace so that it becomes the print..Traceon) { .Internal(assign(". "S_put") cat("On entry: ") std. So we are in the highly unlikely situation of conﬁdently producing a ﬁx for the problem while having only a superﬁcial understanding of how the problem arises.

mode = "function") d> step stopped in trace (frame 3). at: get(name.of.expression({ NULL NULL .get(name. at: n <. We continue to explore.. local = loc. d> eval name [1] "print" d> eval print [1] T d> eval what [1] "print" d> step stopped in trace (frame 3).length(fun) d> eval fun [1] T Here is the trouble. mode = "function") d> step stopped in trace (frame 3).146 CHAPTER 6. index = c(6. d> eval mode((print) Syntax error: end.args[1]. > inspect(trace("print")) entering function trace stopped in trace (frame 3). d> step Here I omit a number of step commands in order to get to the interesting part.0 . at: fun <.. DEBUGGING We use inspect to examine the computation (using the original deﬁnition of trace). The fun object is getting the print argument rather than the print function even though we are trying to restrict the get to functions. 3.file ("\n") used illegally at this point: mode((print) Calls at time of error: 7: 6: 5: 4: 3: error = function() from 6 parse(text = s) from 5 eval.it(str. 14.tracer(what = TR. 4)) from 3 trace("print") from 1 S Poetry c 1998 Patrick J. Burns v1. at: exprs <.GENERIC.frame) from 4 debug.

. Now we have to regard our ﬁx for trace as a workaround for the get bug. The function does indeed exhibit the problem. the problem is that there is a bug in get.. EXAMPLES 2: inspect(trace("print")) from 1 1: from 1 Dumping frames . Finally. "fjjget"<function(print = T.6. Dumped local frame (frame of error) is parse (frame 6) d> quit d> eval mode(print) [1] "logical" d> eval get("print". mode = "function") { print # this line to evaluate it ans <. > fjjget() [1] T > fjjget(mode="function") [1] T > fjjget(mode="logical") [1] T > fjjget(mode="numeric") [1] T > fjjget(print=print. We have now solved the mystery of how trace is going wrong—there is nothing wrong with trace. mode = mode) ans } The print argument needs to be used before the call to get in order for it not to be of mode argument. so inspect essentially calls itself on the error. the get bug We now try to zero in on the real problem with get. The ﬁrst step is to create a simple function that reproduces the behavior. mode="function") [1] T 147 The ﬁrst attempt to look at the mode of print resulted in an error due to a typo. . we conﬁrm that get is doing the wrong thing.) S Poetry c 1998 Patrick J. mode="numeric") function(x. so we tell inspect to quit in order to get back to the function that we do care about.0 .get("print". Burns v1.4. We don’t care about this error...

mode = "any". immediate = F) { code <. mode = .Internal(get(name.if(immediate) 20 else 0 if(missing(where)) { if(missing(frame)) . where). code + 1) } We try a new function that restricts the search to a frame to see what part of the code is broken. Burns v1.. mode. frame). "S_get". "S_get". frame = 2) ans } The following example shows that this works okay.default to see what we are dealing with. code + 2) } else . frame. DEBUGGING It appears to be the case that the mode argument is ignored entirely when the object exists in the current frame.0 . Let’s look at get. inherit). T. T. T. code) else . mode = "numeric") S Poetry c 1998 Patrick J. mode="numeric") Error in get. where = NULL.default("print".: Object "print" of mode numeric not found Dumped > debugger() Message: Object "print" of mode numeric not found 1: 2: fjjget2(print = print.Internal(get(name. > get.148 UseMethod("print") CHAPTER 6. It creates an error.get. inherit = F. mode. > fjjget2(print=print. mode = "function") { print # this line to evaluate it ans <.Internal(get(name. mode. > fjjget2 function(print = T. so we go into the debugger to verify the value of print. "S_get". mode = mode.default("print".default function(name.

0 . mode = mode. or the bug does not occur in which case you have proved there are more necessary conditions than are in the simpliﬁcation. the sillier the error. and resist those foul urges to ‘just change it until it works’ ”. STRATEGY 3: get. frame = 2) Selection: 2 Frame of fjjget2(print = print. mode = mode. When you do this.default("print".5 Strategy Remember to breathe. I once spent three full days ﬁguring out a bug in some Fortran. frame = 2) Selection: 0 NULL > 149 Our conclusion is that the mode argument to get. mode = "numeric") 3: get. when we thought there was a bug in get that was causing trouble in trace. we can’t do anything about it anyway.6. For nothing can be sole or whole That has not been rent. while the real bug is here where it looks right. mode = "numeric") d(2)> print function(x. . The primary thing that you want to do when debugging is to continually simplify the problem.5. remember the following rule. there can be one of two outcomes. 23 Bentley (1986) oﬀers these words of advise: “understand the code at all times.default("print".. For example. For example. We could continue to see just what goes wrong. This narrows down where the problem can be.default is ignored if the search is unrestricted and the object is found in the current frame. Burns v1. either the problem still occurs in which case you have a simpler problem to solve.) UseMethod("print") d(2)> 0 1: 2: fjjget2(print = print. 6. but since the function is internal.. The urges are foul because this tactic is likely to cause other bugs down the line. There are times of frustration when everything is right except the answer. we wrote a simple function that imitated the pertinent part of the situation. Capricious Rule 1 The harder a bug is to ﬁnd. When this happens to you. It turned out that the problem was merely that a number that had been scaled S Poetry c 1998 Patrick J.

3. You might try to give me an out by saying that the code wasn’t mine. State what operating system is in use. Ocassionally. Pentium. you may ﬁnd a bug in code that comes from somewhere else. direct way to reproduce the problem. • State as best you know the limits of the problem.0 . e. • State what is wrong. SunOS 4. A workaround can be a kludgy ﬁx to the problem function. and unfortunately for the trees) to spot mistakes on paper than it is on a computer screen. or only use objects that are easily obtained by the readers of the report. the more eﬀective your report. e. Burns v1.g.6 Bug Reports Most of the time when programming. and I couldn’t see it. The example should either be self-contained. switch to searching for the stupidest mistakes possible. give a traceback of the error. State the version of S.. then whatever software and hardware is involved with the graphics should also be identiﬁed. or a way to get the intended result while avoiding the problem. Sparc. DEBUGGING at the start was not scaled back to the original units. The easier you make it to ﬁx the bug. however..g. 6. When it is taking a long time to ﬁnd a bug. it may not be obvious to the reader. S Poetry c 1998 Patrick J. Once I realized that. Often it helps to make a hardcopy of the code— it can be easier (for some psychological reason beyond my understanding. Here are the ingredients of a good bug report: • Identify the software and hardware involved. e. Even if it seems obvious to you. and it is good practice to be more speciﬁc. • Give a simple.1. It would have been four days if it were my own because I would be even more sure that everything was right. Always include the general type of machine. S-PLUS 3..g. I felt like a fool because several of the symptoms pointed directly there—but it was too simple. “The problem seems to only occur when there is at least one missing value in argument blah of function blahblah. For example. If the problem involves graphics. you might say.3. You should then send a bug report to the responsible party. • Include any workarounds. Random numbers are suitable if you give the seed so that they are reproducible. Give this information even if you think it is superﬂuous— it may not be. • State what good behavior would be.” • If you think you know how to ﬁx the problem. • If the problem results in an error condition.150 CHAPTER 6. give your ﬁx. the problems that arise are self-imposed.

0 . Here is a minimal list of things to include in a question to S-news: • The version of S or S-PLUS that you are using. you can consider disguising the data so that conﬁdentiality is not broken. This clearly needs work. then reports like this will make no dent in his day. including the machine type. When I try to use “genopt”. and that he has ample time to work on it. but it is diﬀerent than you had thought. there is virtually no chance of Pat divining what the reporter witnessed. Sometimes the data that you are using when the bug occurs is conﬁdential. Try using diﬀerent data. S Poetry c 1998 Patrick J. reproduce and ﬁx the bug as easily as possible. but you should not raise your hopes too high about the bug being found. you can send the report without data that reproduces the problem. Many times this exercise will show you that the bug is in your way of thinking rather than in the software. It is always educational—I don’t think there is a better way to learn a language than tracking down bugs.6. BUG REPORTS 151 Focus on the part about getting a simple example. At best I will realize that there was some twist that I missed. Here is an example bug report: Dear Pat. It is often the case that there is a simpler way to achieve the goal that may have nothing to do with your speciﬁc question. Other times it will result that there is a bug. it breaks. If all else fails. but the same sort of principles apply when asking questions on S-news or similar venues. I have (of course) presented the easy case. Assuming the best of circumstances that Pat is highly motivated to have “genopt” as clean as possible. and you may not be able to ﬁnd a way to consistently reproduce the problem. If Pat is short on either pride or time. • Why you want to do it.6. Include enough detail so that it will be clear to the reader. Reporting such behavior can be useful. I make it a policy to never submit a bug report the same day that I write it. Some bugs are intermittent. This is a bit of a digression. and the whole report was oﬀ the mark. If that doesn’t work. • What you want to do. At worst I can merely check over it with a fresh mind to ensure that others will understand it. It bears repeating that the object is to allow the receiver to identify. Burns v1.

8 Further Reading Programming Pearls by Jon Bentley includes some debugging advise. What do you want it to do? What happens when there are bugs in it that cause an error condition? Send bug reports for any problems that you know about.0 . like a story by P. Burns v1.152 6. Wodehouse.7 Things to Do Write your own error action function like dump. If you are overly frustrated. consider reading something humorous. 6. G. Elizabeths” 22 A. E.9 20 21 Quotations Cecil Day-Lewis “A Failure” Elizabeth Bishop “Visits to St. Housman “Oh Who is That Young Sinner” 23 William Butler Yeats “Crazy Jane Talks with the Bishop” S Poetry c 1998 Patrick J. 6.frames.

To access the value of a variable. By convention environment variables are all in upper-case. So you could issue a Unix command like ls $HOME to list the ﬁles in your home directory. Since an escape to Unix creates a subprocess of S. other commands (like starting S) result in a number of processes. But there are a few things that can be helpful. Simple Unix commands result in one process being created. The child process inherits the environment of the parent. Files are pretty intuitive except that they are abstracted to include more than you might expect. though toolkits do exist so that you can get much of the functionality discussed here.Chapter 7 Unix for S Programmers You generally need to know very little about your operating system in order to use S. The child can then change its environment. When we start S (for example). The program that gives us a Unix prompt is a process. The environment is the collection of environment variables and the values that they have. you can see S’s environment with the S command: > !env 153 . and may spawn children of its own. then you may wish to skip this chapter. put a $ in front of the variable name. Each process has a parent. the process started for S is the child of the process that gave us the Unix prompt.1 Environment Variables The two fundamental entities in Unix are processes and ﬁles. If you are not using some variant of Unix. S uses a number of environment variables. 7.

seanet.Functions:/byron/splus/3.3/sgi/splus/. The S WORK and S PATH variables provide what is to be on the search list as S starts up.3/sgi SLIC_RSHCMD=rsh %s env SHOME=$SHOME Splus LICENSE server start S_CC_FLAGS= S_CLHISTFILE=/byron/users/burns/.3 of S-PLUS on an SGI machine.3/s gi/s/.Data TERM=vt100 USER=burns XAPPLRESDIR=/usr/lib/X11/app-defaults XFILESEARCHPATH=/byron/splus/3.3/sgi/s tat/. UNIX FOR S PROGRAMMERS (or possibly !printenv).Functions:/byron/splus/3.0 LESS=-PPaging with ’less’ MANPATH=/usr/local/man:/usr/share/catman:/usr/man:/usr/catman PATH=/usr/lang/gnu/bin:/usr/lang/gnu/lib:/usr/sbin:/byron/users/bur ns/bin:/usr/local/bin:/bin:/usr/bin/X11:/usr/bsd:.seanet.0 .Datasets:/byron/splus/3.Datasets S_POSTSCRIPT_PRINT_COMMAND=lp S_PRINTGRAPH_METHOD=postscript S_PRINT_ORIENTATION=landscape S_SCRIPT_SCCS_MAJOR=1 S_SCRIPT_SCCS_MINOR=73 S_SHELL=/bin/csh S_WORK=. This is using version 3.com SHELL=/bin/csh SHOME=/byron/splus/3. and everything in S PATH is then S Poetry c 1998 Patrick J. Here is an edited version of the result with only the most pertinent variables shown.Datasets:/byron/splus/3.154 CHAPTER 7.Data:/byron/users/burns/.Functions:/byron/splus/3. They include variables to control printing.3/sgi/splus/lib/X11/app-defaults/%N:/ usr/lib/X11/%T/%N Most of the S speciﬁc variables start with S . The ﬁrst thing in S WORK that exists is used as the working database.3/sgi/stat/.3 /sgi/splus/.com:0 HOME=/byron/users/burns LD_LIBRARY_PATH=/usr/lang/SC1.:/usr/etc:/etc:/u sr/bin PWD=/user/users/burns/spo REMOTEHOST=pburns. > !env DISPLAY=pburns.3/sgi/s/.Splus_history S_DEFAULT=NEW S_DL2_EXTRAS= S_FRONT_END_RUNNING=1 S_LASERJET_DPI=300 S_LASERJET_PRINT_COMMAND=lp -Ppoe S_LD_FLAGS= S_LIB_DYNLOAD2= -lF77 -lI77 -lU77 -lisam -lelf -lmld -lm -lbsd -lsun -lc S_LIB_STATIC= -lF77 -lI77 -lU77 -lisam -lelf -lmld -lm -lbsd -lsun -lc S_PAGER=less S_PATH=/byron/splus/3. loading object code and so on. Burns v1.

but on the order of 1010 that contain 6 S Poetry c 1998 Patrick J.2 Using Unix Generally the ﬁrst thing you do each time you meet up with Unix is to type a password. 7. S : Copyright AT&T. This should contain S commands. never have the password the same as the login name.0 . % setenv S_SILENT_STARTUP 1 % Splus error off > setenv is the C shell command to set an environment variable.Data error off > A variable that can be useful in special situations is S SILENT STARTUP. There are “shell” variables as well as environment variables. Burns v1. There are only about half a million “words” that contain 4 lower-case letters. USING UNIX put on the search list. Version 3. but should contain some non-alphabetic characters. In particular. Here are some rules for picking a password: • Do not use a name or a word as a password. cat("error off\n")’ % Splus S-PLUS : Copyright (c) 1988. 1995 MathSoft. Hackers can automatically try each item in a list as a guess for your password—a dictionary or a list of users is a likely choice. Inc. but which can be set is S FIRST. Always maintain a secure password.2. • They should also not be too short. and the . The Bourne Shell has a diﬀerent mechanism.7. the commands are executed when S starts. This suppresses the copyright printing.First function (if it exists) is not executed. % setenv S_FIRST ’options(error=NULL).3 Release 1 for Silicon Graphics Iris. IRIX 5. • Do not use an identiﬁcation number or birthday as a password—a hacker could discover these and try them.2 : 1995 Working data will be in . • Passwords should be easy to type (so it isn’t easy to observe the password as you type it in). The distinction is that the children of a process inherit the environment variables but not the shell variables. 155 A variable that doesn’t appear here.

!number repeats the command with that number. for example. but I’ll just give the most basic commands.0 . The most common shell for interactive use is the C shell. you are really conversing with a shell. UNIX FOR S PROGRAMMERS characters from both lower. I (you’ll be shocked to learn) generally create an acronym out of a line of poetry. 1010 is. Burns v1. the slow smokeless burning of decay 24 becomes “ts2bod”. The C shell has a history mechanism that can be very handy. Combining two words in some way. A shell is the outer clothing of Unix and may be changed.and upper-case letters. Three important “ﬁles” are standard-in. but there are diﬀerences in such things as history mechanisms and ﬂow control. and the most common shell for programming is the Bourne shell. and then make sure that you get the exact same thing in the second command (rule out typos)..) which means the directory directly above the current directory. and to change it if you suspect it could have been. Half a million is not too many for some hackers to try. standard-out S Poetry c 1998 Patrick J. or using an acronym are two common ways of making a good password. the digits and a few symbols. For example to copy a bunch of ﬁles to the current directory. Most things are the same in all of the shells. I stated earlier that Unix ﬁles are abstracted to include more than our usual sense of ﬁles.. and the dot (. So you can do something like: % ls jj* % rm !$ to ﬁrst make sure that there is nothing untoward in the expansion. you could give a Unix command like: % cp . The reason that you can’t use an S command like > attach("~/./other_dir/*. For example.c . !21 repeats command number 21.Data") # this won’t work is that the tilde construct is a C shell device. it is important to not let it be compromised. but attach uses the Bourne shell to look for the ﬁle.) which means the current directory. When you are at a Unix prompt.156 CHAPTER 7. !! repeats the last command. Two universal abbreviations concerning path names is the double dot (. Once you have a good password. Never put your password in an email message or similar place—a hacker can search for occurrences of “password”. Some amazing things can be done with it. !$ gets the last “word” of the last command. Alternatives include the Korn shell and the Bourne-again shell. and !* gets all but the ﬁrst word of the last command.

The & character puts processes into the background. meaning that the process continues to compute but you get the prompt back so that you can perform other tasks. For example S Poetry c 1998 Patrick J. A Unix pipe is related to redirection—the standard-out from one process is used as the standard-in of another. Commands in Unix may be put into the background.out then you will get the prompt back right away.in before the prompt will come back. The command % S < c. Standard-out is where most programs put their output. The command % S < commands. Here is an example: % setenv S_FIRST ’invisible()’ % setenv S_SILENT_STARTUP 1 % set jj=7 % echo $jj "+" $jj "/3" | Splus [1] 9. In contrast if you type: % Splus < c. The backquote is used to substitute in the result of a command.7.0 .out is almost like using the S BATCH utility except that BATCH takes care of some details. Unix recognizes three types of quotes. for example % S < commands. These can be changed or redirected to actual ﬁles. so standard-in is redirected.in > c.333333 % A more practical instance of this is shown on page 191 in the discussion of CHAPTER and data. Burns v1.in > c. USING UNIX 157 and standard-error.dump. and standard-error is where error messages usually appear.q makes S look in commands.out is in the foreground so you need to wait until S has done all of the commands in c.out & [1] 21806 % jobs [1] + Running Splus < c.in > c.q for its input. Standard-in is where most Unix programs expect their input to come from.q > s.2. Both double quote and single quote are used to protect characters from being interpreted—read some Unix documentation to get the diﬀerence between " and ’.

If there are no characters between two colons. S Poetry c 1998 Patrick J. if any. so the command: % ls ‘Splus SHOME‘ Version local/ adm/ module/ bin/ newfun/ cmd/ s/ doc/ splus/ include/ stat/ library/ lists the contents of that directory. A less trivial example is: % alias pgb ’ps -aux | grep burns’ This creates the command pgb that gives a list of all of the processes that belong to my login name (more correctly all the processes whose listing from ps contains “burns”). that spot represents the current directory. This is precisely analogous to the search list in S. UNIX FOR S PROGRAMMERS returns the directory that is the home of S.0 . Another way of getting my pgb would be: % echo "ps -aux | grep burns" > pgb % chmod +x pgb It is customary to put your scripts into a directory called bin directly under your home directory. The PATH is a sequence of directories separated by colons. Burns v1. A trivial example is that someone comfortable with DOS can alias dir to be ls. it uses the environment variable PATH to decide where to look for the command.158 % Splus SHOME /byron/splus/3. Unix allows you to give an alias to a command. including whatever arguments there are. You can create your own Unix commands (called scripts) merely by putting the command(s) into a ﬁle and making that ﬁle executable.3/sgi CHAPTER 7. Another non-trivial alias is the following: % alias pd ’pushd \!*’ This (in the C shell) is as if you typed “pushd” when you type “pd”. When Unix looks for a command.

You can look at the “man page” for the command with the man command. which is basically equivalent to ps -ef in the other major ﬂavor. VOCABULARY 159 7.in b. I then look for the main process of the S job and note its id number. My most common use of this is: % S BATCH b. ps lists the processes that are current on the machine. I often want to kill an S batch job.out # (output from what S is doing.out The grep command is used to search for particular strings in lines of ﬁles. An especially powerful feature of tail is the f ﬂag which continuously prints lines as they are added to the end of the ﬁle until you interrupt it (control-c). Burns v1. The kill command takes process id numbers and attempts to kill that process (and its children). but the most common is to print out the entire contents of a ﬁle. respectively. I use my pgb command to list the processes that I own. then use head or tail.3 Vocabulary In each case below. The cat command has several uses. S Poetry c 1998 Patrick J.7. For example % man cat % man man % man csh will give you help on the cat command.out % tail -f b. To do this.3. If you want to look at only the top or the bottom of the ﬁle. I give only a hint of the full functionality. the man command and the C shell (including its commands). So a typical sequence of commands might be: % S BATCH b. ps -ef /byron/splus/3.out % tail -f b.in b. One reason to list the processes is to kill oﬀ one of them. It uses regular expressions to do the matching. One common set of ﬂags to use is ps -aux with one ﬂavor of Unix.0 . control-c to get prompt) % ps -ef | grep burns burns 21741 21740 0 burns 21806 21741 80 burns 21812 21741 7 burns 21810 21806 0 % kill -9 21806 % !ps ps -ef | grep burns 19:15:40 19:37:13 19:40:57 19:37:13 pts/1 pts/1 pts/1 pts/1 0:00 2:39 0:00 0:00 -csh /byron/splus/3.

Each ﬁle in Unix has a set of permissions attached to it. Here’s the idea: you have blocks which depend on other blocks. to write.Audit chmod also will take a three digit octal representing the permissions. The ln command creates links.Audit To see what the permissions are on a ﬁle. and to execute. the group and everyone.Audit ﬁle in the working database unwriteable with a command like: % chmod -w . Links are a way of saving disk-space (and a means of abstraction). For example. The diff command is used to see what is diﬀerent between two ﬁles. if you want to block the S auditing process. w and x with either a plus or minus in front. Such permissions can be given to the owner of the ﬁle. which may depend on yet more blocks. The chmod command is used to change permissions.0 . rather than having a separate help ﬁle for each. Burns v1. The digit positions refer to the user. There can be permission to read. The permissions of ﬁles as they are created is controlled by umask. so extra force is necessary. The make command is extraordinary software. UNIX FOR S PROGRAMMERS 0 19:15:40 pts/1 0 19:42:18 pts/1 4 19:42:18 pts/1 0:00 -csh 0:00 grep burns 0:00 ps -ef The 9 ﬂag on kill is needed because S batch jobs don’t die easily. you can use the l ﬂag of ls: % ls -l . you can make the . 2 is write and 1 is execute. the numbers are the sum of the permissions where 4 is read. for example every weekday at ﬁve o’clock. The diffsccs function (page 51) is an example of its use. The at command allows you to start processes at an arbitrary time. You might for instance want to start an S batch job two hours after you leave when you presume that everyone will be oﬀ the system. S Poetry c 1998 Patrick J.160 burns 21741 21740 burns 21817 21741 burns 21816 21741 CHAPTER 7.Data/.Audit To make it writeable again. the owner’s group.Data/. On several platforms SPLUS links help ﬁles that are for more than one function. etc. you would do: % chmod +w .Data/. chmod understands the letters r. A link essentially just gives more than one name to a ﬁle. The crontab command is used to start jobs on a regular schedule. and to all users.

It is important to know that tabs are an important character in Makeﬁles. -name ’sccs*’ -print Functionality for ﬁnding out about diﬀerent versions of a Unix command in the current environment is performed with which and whereis. make can be used to give the commands that creates the object ﬁle that can be loaded into S. The command % find . just type the command: S Poetry c 1998 Patrick J. One example of when make is useful is if you have some C or Fortran code that depends on a bunch of libraries and you want to dyn. This is a traditional use of make.3. If you want to look for ﬁles that match a certain pattern. DANGER. If you are using vi. Thus making it trivial to recreate the object ﬁle after you have edited your code. To see the more important (in someone’s eyes) settings. The stty command is used to control terminal commands.q” in the current directory and all its subdirectories and then prints the location of any that it ﬁnds.7.z The which command states which version of the command will be used. and :se nolist turns it oﬀ. but its general principle is expansive enough that it can be used for a variety of purposes. make provides such functionality.load it into S. but a simple use of it is to ﬁnd ﬁles that you don’t know the location of. The magic to be performed is stated in a ﬁle named Makefile. VOCABULARY 161 If one or more of the blocks change. you want everything updated that needs to be (but unnecessary work should be avoided). while whereis gives the locations where the command (or its documentation) is found on the current path. Burns v1. % which ps /bin/ps % whereis ps ps: /bin/ps /usr/bin/ps /sbin/ps /usr/share/catman/u_man/cat1/ps.0 . A common mistake is to forget to give the directory from which to start.q -print looks for ﬁles named “sccs. the owls were bearing the farm away 25 The find command does more than you can imagine. -name sccs. you can see the actual characters in a ﬁle with the command :se list. then you can give find a wildcard pattern (as opposed to a regular expression) but you must put quotes around it: find .

unix(cmds[i]. out = F) == 0 extras <.c(exists = fbe. size = NA) if(!fbe || !any(extras)) return(ans) cmds <. out = F) == 0 } ans } Here are some examples of its use: S Poetry c 1998 Patrick J. dsusp = ^@. intr = ^C. The command is % stty erase where after the “erase” there is a space.0 . The test command is quite useful in Unix programming. old-swtch = ^@. filename) for(i in seq(along = extras)) { if(!extras[i]) next ans[i + 1] <. write = NA. UNIX FOR S PROGRAMMERS % stty speed 9600 baud.c(directory. "test -s"). "test -w". "test -r".paste(c("test -d". "filetest"<function(filename. read = F. "-o -d".162 CHAPTER 7. write = F. Some versions of stty allow you to change the number of rows and columns for the display so that programs like editors will work properly. filename). execute. directory = NA. "test -x". execute = F. directory = F. brkint -inpck icrnl onlcr tab3 echo echoe echok echoke Add a -a ﬂag to see all of the settings. read = NA. size) ans <. write. filename.unix(paste("test -f". read. A common setting to change is the key that is used to erase characters. and then you hit the key that you want to use. The following S function uses test to give information on ﬁle names. execute = NA. Burns v1. size = F) { fbe <. -parity hupcl clocal line = 1.

write=T) exists directory read write execute size T T NA T NA NA > filetest("not_there". These two commands implement the notion of a directory stack.q") exists directory read write execute size T NA NA NA NA NA > filetest("SCCS". VOCABULARY > filetest("polygamma. Suppose we start in our home directory and give the command: % pushd proj1 ~/proj1 ~ We are now in the proj1 directory and home is the second directory in the stack. We can add to the stack by giving a new directory to pushd: % pushd ~/proj2 ~/proj2 ~/proj1 ~ % pushd ~/proj1 ~/proj2 ~ pushd with no arguments switches the top two directories in the stack. If we do: % pushd +2 ~ ~/proj1 ~/proj2 then the third directory is brought to the top (and the stack is treated circularly). size=T. dir=T.3. the pushd and popd commands in the C shell are still handy. dir=T) exists directory read write execute size F NA NA NA NA NA It is used in portoptgen on page 348. and dirs displays the directory stack. by the way. The “2” can be generalized to higher numbers. 163 Although the use of window systems reduces the need. and popd removes the top directory from the stack. % pushd ~/proj1 ~ ~/proj2 % popd ~ ~/proj2 % dirs ~ ~/proj2 popd removes the top directory. pushd adds to the stack (among other things).0 . S Poetry c 1998 Patrick J. Burns v1.7.

I’ll admit that it isn’t always easy to go directly to that answer. but the journey that you take is often as useful as getting the answer to your original question. Read the man page of interesting commands. You might type: % apropos tree ftw. such as making substitutions.4 Things to Do Change your password if it is poor or known to others. Unless you are a Unix expert. tfind. Burns v1. Perl is a language that encompasses awk. Your last wish should always be for more wishes. The top of a shar ﬁle generally states how to unpack it. O’Reilly and Loukides is a very good book for learning Unix.manage binary search trees and you will get back a brief description of a number of commands. twalk (3C) . you can rule the world.164 CHAPTER 7. These are both good programs. Much of the S software on statlib is in shar ﬁles. UNIX FOR S PROGRAMMERS The shar command is a way of bundling a number of ﬁles into one ﬁle. If you know both Perl and S. 7. I suggest that you learn Perl instead. The apropos command is used to try to ﬁnd documentation on commands that relate to a certain subject. but if you don’t know them.5 Further Reading Unix Power Tools by Peek. tdelete. nftw (3C) . awk and sed are used to manipulate text. Check the permissions on your ﬁles and see if they are how you want them.clone a file tree using symbolic links tsearch.0 .walk a file tree tlink (1) . and give a command like: % /bin/sh sharfile to unpack the ﬁle sharfile. sed and Unix scripts. A shar ﬁle contains the ﬁles to be created and Unix (Bourne shell) instructions on what to name the ﬁles and where the boundaries of the ﬁles are. or you might get nothing back. S Poetry c 1998 Patrick J. 7. most of which are likely to have nothing to do with your intention. You should be in the directory where you want the ﬁles to live. the answer to any Unix question that you have is likely to be in this book. It presumes a little competence with Unix but goes a long way toward teaching the whole of it.

6 24 25 Quotations Robert Frost “The Wood-Pile” Dylan Thomas “Fern Hill” S Poetry c 1998 Patrick J.6.7. Burns v1. QUOTATIONS 165 7.0 .

0 .166 S Poetry c 1998 Patrick J. Burns v1.

I think each language has done the right thing—zero-base is good for machines where C has its strength. In S. If you know C but not S. 8. while this is automatic in S.Chapter 8 C for S Programmers C is a natural partner for S. The most bothersome discrepancy is that indexing in C is zerobased while indexing in S is one-based. Nothing is vectorized in C. Names in S and C have similar forms except for one important diﬀerence. Thus C is much faster at calculations than S is—changing strategic portions of S code into C can often cause execution time to drop from minutes to seconds. DANGER. C also requires the user to take care of allocating and freeing memory. Another example is the ability of C to operate on bits. So saying x[1] gets the ﬁrst element in S but the second element in C. and one-base is good for humans. C demands that each command end with a semicolon (there is no penalty for a C programmer who puts such semicolons in S code). C has a number of operators that S does not. The most important of these are the increment operators like i++ and i += n. 167 . periods are the only non-alphanumeric character allowed in names. Although this diﬀerence is a fertile source of bugs. In C.1 Comparison of S and C The big diﬀerence between C and S is that C is compiled while S is interpreted. This science is then like gathering ﬂowers of the weed 26 S code and C code look very similar (this is not accidental). then the next section can help you to learn S. there are some striking diﬀerences. C requires that everything has its type declared. However.

never the address. Pointers are what tend to cause the most confusion to novice C programmers. and the arguments are declared outside the list. old. old = *xpointer. The ANSI standard mainly extended and clariﬁed the language. new. The original value at the xpointer address is given to old. then you can “dereference” it (get the value at that address) with the * operator. Furthermore. xpointer = &new. C distinguishes lower-case from upper-case letters. and there’s no need to panic. If you have a pointer (which is the address of something). The other is ANSI C. The oldstyle (K&R) contains just the argument names within the argument list. In S you can only grab hold of the values. Like S. *xpointer = new. double *x. S Poetry c 1998 Patrick J. then the value of new is placed at that address. new. the period is an operator in C. but the most visible change is in how arguments to functions are given. old. There really isn’t all that much to them though. The new-style (ANSI) contains the declarations within the argument list: double my_fun( double *x.0 .) A code fragment is: double *xpointer. There is a limit to the length of a name in C. n) long n. (* also means multiplication as it does in S. but which is meant is distinguished by context. Here both old and new are double-precision numbers and xpointer is a pointer to a double.168 CHAPTER 8. most of my example C functions follow the old-style. and the underscore is an operator in S. but the ANSI standard requires that that length be at least 31 characters. A pointer is an object that contains the address of some value. as in: double my_fun(x. An example that is subtly diﬀerent from the last is: double *xpointer. A feature of C that is entirely novel relative to S is pointers. old = *xpointer. Burns v1. long n) Being old-fashioned. There are two ﬂavors of C. C FOR S PROGRAMMERS it is underscores that are the only non-alphanumeric character. the original version is known as K&R C after Kernighan and Ritchie who wrote it and the book describing it. You can also get the address of an object with the & operator (this is sort of the inverse of the dereferencing operator).

-> _ & ^ | ? : = # C Meaning structure member structure pointer member character in names address of or bitwise and bitwise exclusive or bitwise or ifelse trinary operator ifelse trinary operator assignment introduce preprocessor command S Meaning character in names assignment to right assignment vectorized and exponentiation vectorized or help sequence argument matching introduce comment Table 8.h. However. in the ﬁrst example the value at one address is changed. The & operator is most commonly used in function calls where the function is expecting a pointer. Operator .1.0 . Flow control is almost identical in the two languages. The second example would be unusual—the ﬁrst example would more often be correct.8. if any.2: Operators with diﬀerent meanings in C than in S This still has the same value for old and xpointer still points to the same value.1 lists some of the operators in C. braces are used precisely the same to group multiple statements into a single statement. but %% in S works with ﬂoating point numbers also. Table 8.2 lists some operators that have the potential for confusion between C and S.1: Some C operators and approximate S equivalents. COMPARISON OF S AND C C Operator [ . Burns v1. The modulo operator % in C works only on integers. while in the second it is addresses that change. but you have a value. In particular. and table 8. Exponentiation in C is done with the pow function in math. The format and to some extent the functionality of switch diﬀers in C and S. -> & * ++ -* / % << >> Task array subscript structure member structure pointer member address of dereference increment decrement multiply ﬂoating or integer divide modulo bit shift S Equivalent [ $ $ none none none none * / %/% %% none 169 Table 8. A notable exception is that the control mechanisms for for loops are distinctly diﬀerent. C has a do-while loop that is absent in S (repeat can be used to do S Poetry c 1998 Patrick J.

An example of a for loop in S is: for(i in 1:10) { x[i] <. Case h shows how the C switch is fundamentally diﬀerent than that of S.) If you want more than one statement in either the ﬁrst or third part. The third part states what to do after each iteration. } This is a bizarre example. default: y -= 8. while S demands a single expression for a case (multiple statements need to be surrounded by braces). C FOR S PROGRAMMERS the same thing). Case b shows that one diﬀerence with S is that there can be any number of statements for a C case. but shows the rules. Case d is empty which means that it is identical to the next non-empty case—this is like S except that S doesn’t have this feature for numeric variables. Burns v1. break. /* fall through */ case ’n’: y += 9. The second part is a test.0 . case ’d’: case ’e’: y += 2. The ﬁrst part is the initialization. i++) { x[i] = i. z = 6. case ’h’: z += 8. The switch variable can be either a character or integer—in this case it is character. you can put commas between the statements. Here’s an example: for(i=0. where the loop is exited the ﬁrst time that the test is false (so it is possible that the body of the loop is never entered). It is typical that each case end with break. C does not have a repeat. In C it is not local to the loop. One more diﬀerence is that next in S is continue in C (I favor S in this regard). } So there are three parts inside the parentheses of a C for. ij=0. Hence the order of the cases in a C switch can make a diﬀerence. i++. but this is easily replaced with while(1). An example of the format of a switch statement in C is: switch(x) { case ’b’: y += 3.i } while a similar for loop in C would be: for(i=0. return(y).170 CHAPTER 8. and—as we’ve seen—there need not even be a unique variable. it goes into case n also. break. S Poetry c 1998 Patrick J. Since there is no break or return in case h. i < 10. i < 10. (Notice that the resulting vector in C will be diﬀerent than the one in S. The body of the switch is one expression which is divided by a number of case statements. break. ij += n) Another diﬀereence is that the iteration variable in an S for is local to the loop.

Although code using S. The angle brackets mean that the preprocessor is free to look in a number of locations for the ﬁle. If you learn C especially for S.2 Loading Code into S The C functions that can be called from S are of a restricted class. If you have existing C code that you want to use in S. You can have anything in a comment except */ (that is.h is often called from S. *double (double in S).0 . The preprocessor performs its commands to change the source ﬁle before the compiler sees it. All of the arguments must be pointers (which makes sense if you consider that S vectors translate to C arrays). there is a trick—you can put the code inside an if that is always FALSE.8.h ﬁle is the header ﬁle that comes with S—it contains a number of deﬁnitions that can be used in C code. it should not be diﬃcult to write a wrapper function in C so that it obeys the requirements that S has.h ﬁle into the source ﬁle.2. or **char (character in S). and many more things—there is no equivalent feature in S. The C preprocessor is a powerful feature of C that inserts include ﬁles. The next step is to compile the source code.C on page 177. LOADING CODE INTO S 171 C has a unique way of specifying comments—they begin with /* and end with */. The arguments may be *long (integer or logical in S). and are very useful. The S. A deﬁnite advantage to this form of comment delimiting is that it is easy to comment out a section of code. Fight this impulse—structures in C are the equivalent of lists in S. then you can use the utility created for the purpose. it need not be. The # symbol is used in preprocessor commands. 8. but this is large enough that it should seldom be of concern. And often does.c The COMPILE utility is meant to perform an appropriate compilation on whatever platform you are using. the ﬁrst line of a C source ﬁle might be: #include <S. There is an exception—see the discussion of the pointers argument to . makes platform dependent versions. An example is: Splus COMPILE myfun. For example. Note that structures are not allowed as arguments. *float (single in S). However. but if you have S-PLUS. there is the tendency to avoid the use of structures since they may not be passed in. The function is typically declared as void since return values are thrown away. comments don’t nest). You can do this directly. Burns v1.h> This inserts the contents of the S. There is a limit on the number of arguments allowed. There is no equivalent in S to do this. S Poetry c 1998 Patrick J.

merely the S session. you are extending or changing the symbol table. Picturing what is happening with the symbols S Poetry c 1998 Patrick J.load will take a vector of ﬁle names—the order of the ﬁles can matter because it is like a series of individual calls to dyn. and the script with which users start S can point to this new executable.For("fjj") to see how your system does it.load is the simplest of the three.0 .load. you can try using the Unix command ld to create a ﬁle that contains all of the necessary symbols so that using dyn.load. When you load code into S. and often hard to solve.shared are diﬀerent ways to solve the same problem. a symbol is a translation of a routine name based on platform speciﬁc rules. In S-PLUS there are three functions that do dynamic loading: dyn. Static loading means that you combine the executable as it stands with additional code and get a new (larger) executable. Not all of these will be available on a particular platform. The other good use for static loading is when dynamic loading is unavailable or doesn’t work for the code at hand. Troubles with dynamic loading are fairly common.load. It is not likely that dynamically loading a self-contained ﬁle will fail. the more likely problems will arise. Burns v1.load2 and dyn. An explanation of how LOAD works is on page 191. C symbols are diﬀerent than Fortran symbols.load2 or dyn. dyn. A symbol table associates each symbol that it knows to an address.shared.load.C("fjj") symbol. These functions allow symbols that are deﬁned in neither the source ﬁle nor the S session to be pulled from libraries.) dyn. Static loading is done with the LOAD utility.load2.load2 to load related symbols—all the symbols must be put into the S symbol table with a single call to dyn. The two main divisions of loading code are static loading and dynamic loading. In S-PLUS you can do: symbol. It takes a character string which is the name of an object ﬁle. dyn. First oﬀ. (Actually dyn. The more complex the situation with symbols. you can not make two diﬀerent calls to dyn. C FOR S PROGRAMMERS The S executable (the ﬁle that contains the stuﬀ that makes S actually go) contains a symbol table.load is a possibility. If using dyn. all of the symbols in the ﬁle need to be deﬁned either in the ﬁle or in the S session. One good use of static loading is if everyone at a site needs access to a speciﬁc set of routines that are thoroughly debugged. Generally a symbol is the routine name with underscores before and/or after it.load. Dynamic loading is when the executable is already executing and you add the code to this instance of execution—the executable is not changed.load. Unlike dyn. These routines can be statically loaded into S. The disadvantage of static loading is that a new executable is created—the size of which is measured in megabytes.load2 and dyn.shared doesn’t work.172 CHAPTER 8.

a pointer). There is plenty of room to trip over the nomenclature between S and C. 8. This generally happens when the same function is in two diﬀerent places. A tool that can help is the NM utility discussed on page 192. a C programmer may well think that a “character vector” is a character string. Burns v1. 1 type rectangle of 1 type hyperrectangle of 1 type collection. C provides a very good way of keeping this from happening. then you will need a wrapper function in C to convert back and forth between the two conventions. can be a substantial help in solving a loading problem. One loading problem that can arise is when there are duplicate symbols.3 lists terminology from some languages for a few simple data structures. Table 8. To reduce the possibility that these remaining names will conﬂict with something.. This is the way that Fortran handles S Poetry c 1998 Patrick J. If you declare as many functions as possible to be static. For instance. However. For example. but it is easy to do.3 Arrays It is common for C programmers to think of matrices as pointers to pointers so that they can be subscripted like: cmat[i][j] However. then I advise treating matrices in C the same way that S does. an S matrix passed into C is merely a C array (i. Many of the in-built C routines in S begin with a capital S and an underscore. My personal convention for names that are going to be used in S is to put “ Sp” at the end.0 . you can put a personal preﬁx or suﬃx on the name. ARRAYS object single value ordered. then there will only be a few symbols that can possibly conﬂict. You need to do some arithmetic to get a speciﬁc row and column. All C functions except those that are used in a . This means that you can have a C function called sort and it will not conﬂict with any other “sort” as long as you declare it static in your C source ﬁle.3. If you have code that uses pointers to pointers for matrices.e. while S programmers know that a “character vector” is a vector containing character strings. several types S vector vector matrix array list C scalar array Fortran scalar vector matrix array Perl scalar array. so my sort routine—if it were not static—would be called “sort Sp”.3: Terminology from various languages. there could well be a C function called “S sort”.C call in S can be declared static. list 173 structure Table 8. If you are writing your own numerical code. there is the possibility that there are two diﬀerent functions with the same name.8.

dim(qmat) dx <. static double quad_form(double *Q.matrix(x)) if(length(dq) != 2 || dq[1] != dq[2]) stop("qmat must be a square matrix") S Poetry c 1998 Patrick J. double quad_form(double*. It is simple to generalize to higher dimensional arrays. double *x. long *xdim. ij++) { ans = ans + x[i] * Q[ij] * x[j]. ii. i++. for(i=0. double ans = 0.dim(as. double *ans) { long i. ii=0. C FOR S PROGRAMMERS matrices also. each of whose columns is an x in the quadratic form.0. Here is an example that computes quadratic forms x Qx where Q is an n by n matrix and x is a vector. "quad.0 . i++) { for(j=0. n). j++. for(i=0. double *x. x + ii. n. j. j < n. } void quad_form_Sp(double *Q. } } return(ans). Here is the S function that calls this code. ii += n) { ans[i] = quad_form(Q. } } Stepping through the matrices is made eﬃcient by incrementing the subscripts rather than always performing a multiplication. The S function generalizes this to allow a number of quadratic forms to be evaluated in one call by passing in a matrix x. ij. i < n.form"<function(qmat. so Fortran matrix code can be called from C when using this convention. long). ij = i * n. long n) { long i.174 CHAPTER 8. i < xdim[1]. double*. Burns v1. x) { dq <. n = xdim[0].

as. The specialsok argument in S-PLUS allows inﬁnite values. double(dx[2]))[[4]] } 175 This function saves a little time compared to the usual S operation by using one operation instead of two matrix multiplications.4 C Code for S One type of functionality that the S header provides is for handling missing values and special values. S will copy the S Poetry c 1998 Patrick J. Burns v1. DOUBLE. If you really want to evaluate a number of quadractic forms with the same kernel. the names to use for the modes are listed in table 8. If you want to allow missing values in the arguments to your C function (or Fortran subroutine) that is called from S. na set3 and inf set.) The ﬁrst two arguments to each of these is a pointer to the value of interest. is na and na set3 use the symbols Is NA and Is NaN to distinguish between missing values and not-a-numbers. This is similar to malloc. then you need to use the NAOK argument to . C CODE FOR S if(dx[1] != dq[2]) stop("qmat and x do not match") if(!is. is inf.C("quad_form_Sp"))) poet. but does not have a corresponding free statement. If your routine uses workspace.loaded(symbol. then some more eﬃciency can be squeezed out by modifying the looping in quad form. na set.dyn. so you don’t want to have side eﬀects in the arguments. The memory that is allocated by S alloc is automatically freed when the S memory frame containing the C call is exited.4.0 .C. and the storage mode of the object (back in S). you would say: na_set3(x + i.double(x). so you are guaranteed that the space will be freed. C code that will be called by S and needs to allocate memory can use the S alloc C function. The C functions include is na. You can see some of the C functions that handle special values in the code for the digamma function on page 262.integer(dx). It is freed even if there is an abnormal exit from the C code. then it is better to allocate the workspace in C with S alloc rather than passing it in through the call to . as.o") .double(qmat).8. Neither of these arguments may be abbreviated. Is_NaN). If the kernel matrix is symmetric and large.4.load("quad_form. as. So to set the ith element (i + 1th in S) of x to be a not-a-number.C("quad_form_Sp". then this will save a lot of time. 8. (Actually these are generally macros rather than functions.C.

These start with PROBLEM followed by a string.0 . iwork = (long *)S_alloc(2 * n. say: seed_out((long *) NULL). bad. x is %g". followed by arguments to be placed in the string (if any). This updates the stored random seed. Before you can generate any random numbers. so that the same “random” numbers will not be generated repeatedly. you need the line: seed_in((long *) NULL). This makes the random seed available. double *work. An example of an error statement is: PROBLEM "really bad" RECOVER(NULL_ENTRY). The two functions that generate random numbers are unif rand and norm rand. both of which return a double. sizeof(long)).4: Modes within C Code. and you need not have a newline at the end of the string. x WARNING(NULL_ENTRY). S Poetry c 1998 Patrick J. An example of a warning is: PROBLEM "bad. i. followed by another magic phrase indicating if it is an error or a warning.176 S storage mode logical integer single double complex CHAPTER 8. You can issue the equivalent of stop and warning statements in your C code called by S. object in the latter case. sizeof(double)). Note that there are no parentheses (they are in the deﬁnition of the PROBLEM macro and its mates). S-PLUS makes available routines so that you can use the S random number generator in your C code. Burns v1. C FOR S PROGRAMMERS Name in C LGL INT REAL DOUBLE COMPLEX C declaration long long float double complex Table 8. After all random numbers have been generated. i is %d. Use of S alloc might look like: long *iwork. work = (double *)S_alloc(n.

if(count >= curlen) { S_realloc(*ans. rsd=pars[2]. curlen.h> void rand_seq_Sp(pars. } The random part of this function is easy. #include <S. while(1) { this_rand = norm_rand() * rsd + rmn. the old length. break. but here it is a pointer to a pointer to double.0 . Burns v1.C. double threshold=pars[0]. curlen += incr. It produces a sequence of random Gaussian deviates that terminates the ﬁrst time that one of the random numbers is greater than a speciﬁed value. if(this_rand > threshold) break. incr=ipars[0]. } (*ans)[count++] = this_rand. *ans = (double *)S_alloc(incr. Not so transparent is what is happening with ans. seed_in((long *) NULL).8. the new length. sizeof(double)). double *pars. curlen + incr. Generally ans would be a pointer to double. sizeof(double)). curlen. *alen. exiting" WARNING(NULL_ENTRY). Reasonably easy is the use of S realloc—when the space allocated by S alloc is no longer large enough. Its arguments are the pointer to the space. larger space and takes care of transferring the old values to the new location.4. alen) long *ipars. safety=ipars[1]. { long count=0. It also introduces a couple more tricks for C programs in S—the S realloc function and the pointers argument to . ipars. curlen = incr. ans. S realloc allocates a new. } } seed_out((long *) NULL). **ans. The reason is that the length of ans is not known in advance. so it uses the pointers S Poetry c 1998 Patrick J. double this_rand. if(count >= safety) { PROBLEM "reached maximum length. rmn=pars[1]. *alen = count. and the size of the elements. C CODE FOR S 177 The example below shows how to use the random functions.

C that is a pointer corresponds to two arguments in C—the ﬁrst C argument is a pointer to a pointer.C the ﬁrst element of threshold and several other variables are extracted when combining variables together for arguments to the C function. it is less useful than might be thought. S Poetry c 1998 Patrick J. I’m not going to give an example of its use.C("rand_seq_Sp". (You still have the overhead of the S function call. Chambers and Wilks (1988) and Spector (1994). mean[1]. A C function that is included with S is call S which allows C code used by S to go back to S and evaluate an S function.C S function.dyn. sd = 1.178 CHAPTER 8. parameters = as. "mean". F. and the second argument is a pointer to long that gives the length of the object. Each argument to . safety[1])). This ensures that a misspeciﬁed S argument will not completely mess up the C call. See the PORT optimization example on page 340. don’t forget to set the length of the variable-length objects in C. increment = 1000.double(c(threshold[1].integer(c(increment[1]. so you can’t gain speed by looping in C instead of S. mean = 0. Although it does have its place.o") ans <. safety = 10000) { if(!is. Burns v1.match. C FOR S PROGRAMMERS argument to the .load("rand_seq. sequence = double(0). DANGER. as.call() ans } Notice that there is one less argument to . pointers = c(F. T))[c("sequence". "parameters")] names(ans$parameters) <. Here is the S function that calls this code: "fjjrand.) Examples of how to use call S are in Becker. "stand dev") ans$call <. An alternative strategy is to check the length of each variable.loaded(symbol.0 .C (excluding the pointers argument) than there is in the C function.. S-PLUS gives you routines that help you when calling Fortran from C. When using the pointers argument. sd[1])).seq"<function(threshold. In the call to .c("threshold".C("rand_seq_Sp"))) poet.

Fortran whose storage mode is not coerced. then there is a bug. Usually. Each argument in a . Don’t do it. S Poetry c 1998 Patrick J.5 Debugging If you are unlucky. but forget to edit the S function. there are likely to be a number of frustrating hours wasted trying to track down the problem. and it kills itself to prevent permanent damage in case of corruption. Some day when you try to make this go on some other machine where the long-int identity doesn’t hold.C call and the C function is to edit the C function and recompile. Likewise. or an argument mismatch.C or . The second most common is “bus error” which is an indication of serious confusion.8. one of the ﬁrst things you will see when you use an S function that calls some of your C or Fortran code is something like: System terminating: bad address and you are thrown out of S and back into the operating system. though. Remember that the more subtle and silly the mistake. The most common is “bad address” which is often an indication of running oﬀ the end of an array. A system terminating can be caused from a corrupted version of an in-built function—and it need not be apparent what the problem is. DANGER. DANGER.Fortran calls is that there is the wrong number of arguments or they are not declared properly. A common way to have the number of arguments diﬀerent in the . You should report this to your provider of S. The system terminating gives an indication of what went wrong.C or . The most common problem with .C or . it is a bug. If not. Storage mode integer in S corresponds to long in C.Fortran call needs to be either coerced to the correct storage mode or to have had its storage mode coerced directly before the call. I never heard that sound by day. locally written code that is loaded into S can cause a problem that only later results in a system termination. 27 If you see a system terminating in an S session in which no local code is used and you are not masking any in-built functions. the harder it is to ﬁnd. the termination occurs immediately. The system terminating happens when S detects that something bad has happened.0 . On many machines a long is identical to an int so it is possible to declare S integers as int in C and have the program run. DEBUGGING 179 8. Burns v1. Capricious Rule 1 Your skin crawls when you see an argument to .5.

The C function printf is the most likely candidate to use to print messages. So if the buﬀer never ﬁlls.3 If you are inserting print statements because the function is hanging and you want to know how far it is getting. If you have S-PLUS and you are using S. and it also typically contains locations for any subsequent arguments.180 CHAPTER 8. There are times when optimization during compilation—especially a high level of optimization—causes bugs to appear. The ﬁrst argument to printf is a character string which may contain excaped characters such as backslash-n. then you will need to ﬂush the output. try out the code. C puts output into a buﬀer and only really writes once the buﬀer is full. i. C FOR S PROGRAMMERS On Unix the lint command looks for mistakes in C code. then you can use a command like: lint -I‘Splus SHOME‘/include myfun. The usual command to ﬂush printf output is: fflush(stdout). Burns v1.0 . The routine is: modify the C code. I make an initial debugging statement in the C code that prints a version number. Of course I need to remember to change the version number when I edit the code in order for this trick to be eﬀective. The locations for the additional arguments are marked with a symbol that starts with % and is followed by a letter that indicates the type of object to be printed.c This tells lint the appropriate place in which to look for the S header ﬁle.h. I sometimes forget either to compile the code or load the code so it appears that my last change had no eﬀect when in fact the last change is not being used. you won’t see your output. My strategy is to use the standard level of optimization except on ﬁles that are primarily arithmetic. like mismatches of type and variable numbers of arguments.9 x sub i is 9.) When I’m in a disciplined mood. x[i]). load the code into S. Once I’ve done this several times and frustration is setting in. To save time. and I generally time the computations to see if the optimization really works. A tried and true debugging technique is to insert print statements into the code. compile the C code. S Poetry c 1998 Patrick J. (stdout is deﬁned in S.h. which might result in the output: i is 0 i is 1 x sub i is 4. Here’s an example: printf("i is %d x sub i is %f\n".

0 . In: if(i). An extra pair of parentheses to guarantee the computation order would S Poetry c 1998 Patrick J.6. the code will compile without a problem. DANGER. ans1 is probably what was intended. Integer division doesn’t commute like ﬂoating point division does. If the intention is to set j to 0 if i is not. The most common mistake is to leave out semicolons. Burns v1. i < n. i <= n. j = 0. but will not do what was probably intended. ans1 = n * (n + 1) / 2L.8. then do nothing. ans2 = n / 2L * (n + 1). ans2 is not the same as ans1 unless n is even. then that ﬁrst semicolon has got to go.6 Common C Mistakes DANGER. and ﬁnally always set j to 0. you might try the more subtle: for(i=0. COMMON C MISTAKES 181 8. ans1. i++) You really want: for(i=0. ans2. not a ﬂoating point number. i++) /* RIGHT */ /* STILL WRONG */ DANGER. Consider the code fragment: long n. The code as it stands will test the value of i. i++) /* WRONG */ Once you are past that. A more serious mistake can be putting a semicolon where it doesn’t belong. It is easy to get the indexing wrong on an array. i <= n. This is seldom a serious mistake since the compiler will catch most instances. Be careful when performing division with integers—the result is an integer. It is natural to translate the S expression: for(i in 1:n) into C as: for(i=1.

however. C. A squabble between Fortran and S is that S doesn’t handle Fortran input/output. or it may involve extensive changes to S Poetry c 1998 Patrick J. but not do the right thing. The comma operator evaluates each statement but has the value of the right-most statement.7 Fortran I give Fortran less emphasis partly from personal taste (though I admit that newer versions of Fortran are deﬁnite improvements on older ones). If you have the source code for the Fortran. then the easiest thing to do is to comment out all of the input/output statements. If you have a statement containing: xmat[i. The code will compile. /* WRONG */ will once again compile. Another common mistake is to use = where == is desired. DANGER. It is usually faster and safer to have a little hassle interfacing existing Fortran to S than to write new C code. since it is a syntax error. CHAPTER 8. 8.j] /* WRONG */ then it is wrong (but it will compile). So the above is equivalent to xmat[j]. However Fortran has the one big advantage that lots of algorithms are programmed in it. Do not use print instead of printf. This step may be as simple as just commenting out lines. returns a value from assignment. and partly because there are more complications when interfacing Fortran to S than when interfacing C. like S.) DANGER. Burns v1. C FOR S PROGRAMMERS DANGER. After this statement j will always be 5 no matter what it is before the statement. but have severe problems in the execution. (You will have the opportunity for such bugs in version 4.182 be a good idea. The statement: if(j=4) j++.0 . Such a mistake is not possible in version 3 of S.

Fortran call needs to be in all lower-case. S Poetry c 1998 Patrick J. and all of the names are visible. You do not need to print all of the values in the variable—it will print the ﬁrst few values if you choose. These are often useful for debugging purposes. You merely have to do the best that you can to try to avoid collisions. DANGER.7. For example. S already has three or four. do not use “sort”. intpr for integers (and logicals). } S does come with some Fortran subroutines that produce printed output in S format.0 . though most compilers are nice enough to do the counting for you if you give it a negative 1. The names of the routines are dblepr for double-precision data. -1.h> s_wsFe() { fprintf(stderr. An example call is: call dblepr(’x is’. Each of these allows a message also. then you may need to ensure that S has a number of symbols loaded that it generally doesn’t have by writing C functions and loading them. In particular. The third argument is the name of the variable that you want to print. The second argument is the length of the message. one function might look like: #include <S. I often put: implicit none statements into Fortran routines (when the compiler accepts it) and then declare all objects explicitly. If you don’t have the source code. x. and the fourth argument is the number of elements to be printed. the name of the subroutine that you are calling in the . FORTRAN 183 have data passed in through arguments rather than read in. but S needs to have it that way. It is a lot easier to get name conﬂicts in Fortran names because the name space is so much more limited (you only have 6 characters to work with). This can uncover problems that are not otherwise apparent. nx) The ﬁrst argument is a message. Fortran doesn’t care about the diﬀerence. Burns v1. At least in some versions of S.8. and realpr for single-precision numbers. "Trying to use Fortran I/O\n").

2. The ﬁfth argument is the number of integers to print—this must be 0. n) m99 = "x value is negative" m99l = len(m99) m2 = "r1 is more than 2" m2l = len(m2) do i=1. m2l character*100 m99. The next two arguments are the integers. 0.1 or 2. The remaining arguments are the reals to print. 1 or 2.0) The ﬁrst four arguments to xerror and xerrwv are the same: a message. 1) . Here is exciting code that does important stuﬀ.summary help.0) then x(i) xerrwv(m2. x. an error number.For("xerrsp"))) poet.loaded(symbol. n if(x(i) call endif if(x(i) xe = call endif end do return end . again this must be 0. m99l. 0.184 CHAPTER 8.o") S Poetry c 1998 Patrick J. "fjjxerrsp"<function(x) { if(!is. 1. m99l.0 . the length of the message. You have a choice of two Fortran subroutines—xerror and xerrwv. 1. You will notice in the code above that there is a variable to hold the real value to be printed by xerrwv so that a real is passed in rather than a double precision. m2 real xe call dblepr(’x values are’. C FOR S PROGRAMMERS S-PLUS has facilities for returning error messages from your Fortran code called from S.lt.dyn.load("xerrsp. Below is the S function that calls this code. and an error severity. 0. xe. The xerror subroutine only has these arguments. -1. 2.gt. The eighth argument is the number of reals to print. but xerrwv has six more to allow you to print up to two intergers and two real values. 0. Burns v1. subroutine xerrsp(x. You can learn about the S functions in the xerror.0) then xerror(m99. n) implicit none integer n double precision x(n) integer i. m2l. 0. 99.

.. You can call a Fortran routine in C by using the symbol name rather than just the routine name—C really only cares about the address of the function. as. What is the diﬀerence in execution speed? How much ﬂexibility are you S Poetry c 1998 Patrick J. 8. THINGS TO DO xerror.. thus it doesn’t need the concept of “pointer”.clear() ans <.8 Things to Do Find portions of S functions that would beneﬁt by being coded in C.8. as. r1 = 3 error number = 2 error message summary message start x value is negative r1 is more than 2 other errors not individually tabulated = [1] -1 0 1 2 3 nerr 99 2 0 level 1 1 count 1 1 If you are using Fortran code. The other thing to note is that all of the arguments to the Fortran routine must be pointers. r1 is more than 2 in message above. The S. (Fortran only uses pointers.0 .. then there is going to be an interface between Fortran and C somewhere since S is written in C.h include ﬁle contains macros to get this right.summary() ans } An example of what happens is: > fjjxerrsp(-1:3) x values are [1] -1 0 1 2 3 recoverable error in.8.integer(length(x)))[[1]] if(options()$warn >= 0) xerror. and do it. x value is negative error number = 99 185 recoverable error in.double(x). Burns v1.) The PORT optimization on page 340 is an example of how to call Fortran from C.. You can push the interface down into C code that you write.Fortran("xerrsp".

There is a raft of other books on C. and it deﬁnitely is not the worst (I think I have that one also). I make no claims that this is the best book on C. Burns v1. 8. 2nd Edition.186 sacriﬁcing? Can you change it so that you lose less ﬂexibility while retaining most of the speed improvement? 8.10 26 27 Quotations Louis Zukofsky “It’s Hard to See but Think of a Sea” Robert Francis “By Night” S Poetry c 1998 Patrick J.0 . This is a terse book—perhaps overly so—but carefully written. but it is reasonably good. A quite interesting book is Libes (1993) Obfuscated C and Other Mysteries. One that I happen to have is Darnell and Margolis (1991) C: A Software Engineering Approach.9 Further Reading The fountainhead of C is Kernighan and Ritchie (1988) The C Programming Language.

There is also the possibility of a . I’m not sure of the circumstances under which these get created. Once started. the same audit ﬁle is used throughout a session. you can check to see that it is not listed in nonfile and remove it. Parts are of less than general interest.Chapter 9 S for S Programmers This chapter covers topics about S that are not explained in the previous chapters. It is the .1 Databases When you start S. there are occasions when a nonfile ﬁle is written with a name like 34352 that is not indexed in the ﬁle. DANGER. no matter what the working database is.Data directory to use as the working database. In general you can think of this as merely holding ﬁles that each contain one of your S objects. the actual ﬁle nonfile that is a will be called something like 3 and there is a ﬁle called dictionary to translate between the S object name and the ﬁle name. where the name of the ﬁle and the name of the S object are the same.Help subdirectory—see page 193.Help subdirectory of the . For names that are troublesome for the operating system.Data.Cat.Audit in the working database as a session starts that is used. 187 .Audit ﬁle is the location where the audit trail of S commands is kept. Each object with help has a corresponding ﬁle in . At least in some versions of S-PLUS. The help ﬁles live in the . 9. The . I’m guessing that it is perhaps when there is some sort of interruption.Help. If you see a suspicious ﬁle like this. it looks for a . Sometimes these objects can be quite large. However. Many of these are alluded to in the section below on utilities.Data directories that may be of interest. there are other entities living in .

in b. This reduces the amount of memory that the database uses. which is called Sqpe. One of the things that this script allows is an extra argument which is the name of a utility. 9. It is frustrating to check a batch job after a weekend and ﬁnd that it didn’t do anything because of a typo or other minor error.0 . Note that what I am writing as S here should be replaced with the command you use to start S. such as missing parentheses.in is the name of the input ﬁle for the batch job.out already exists. the previous contents will be deleted. The result will either be an expression or an error. BATCH jobs are good for running long computations. If b. you still don’t know for sure that it will work—you may need to attach a directory. If there is no error.out where b. you are really starting a Unix script that does a number of things. if you want to kill one of the jobs. See also my ﬁx for browser. I like to put date() at the end of the ﬁle also. Burns v1. or may have misspelled a name—but I often ﬁnd this check worth the eﬀort. A partial guard against this is to perform an S command like: parse(file="b. one of which is to start the executable (the actual language). S Poetry c 1998 Patrick J.in") where b.2 Utilities When you give the command to start S. A utility is an executable that lives in the cmd subdirectory of SHOME—the utility is executed instead of the S language.188 CHAPTER 9. S FOR S PROGRAMMERS In recent S-PLUS versions the databases of in-built objects contain one ﬁle that holds all of the objects (called BIG) and a ﬁle to index the objects (called BIGIN). then starting your input ﬁles with !hostname date() can help you keep all of the jobs straight. If you often run several BATCH jobs on a variety of machines.default on page 138. you can tell which process to kill even if you have more than one batch job running on the machine in question.out is where the results go. For instance.in is a ﬁle of S commands and b. It is invoked like % S BATCH b. It will be syntax errors that are found. The BATCH utility is probably the most commonly used.

2.Data attaching dotFunctions Harold weighs harold 15 kilograms.3 Release 1 for Silicon Graphics Iris. but Dorothy weighs {jjwt["dor"]}. This takes a ﬁle that is generally in a typesetting language like TEX or troﬀ that contains some specially marked sections of S commands.out1 REPORT will run in batch: input from file rep. Version 3.9. The ﬁrst problem is that all of the printing when S-PLUS starts gets put into the output ﬁle. but Dorothy weighs dorothy 14. it is very expedient at times. Though simple.in. S : Copyright AT&T.0 . UTILITIES The command % S SHOME 189 simply returns the path of where S lives. The utility is ﬁred up and then we view the resulting ﬁle: % Splus REPORT rep. The result is a diﬀerent ﬁle with the output resulting from the S commands replacing those commands.in rep. Burns v1.out1. The S commands are surrouned by braces. % cat rep. Here is a trivial example. The REPORT utility is probably underutilized. The cat function comes to the rescue: S Poetry c 1998 Patrick J. output on file rep. 1995 MathSoft.in Harold weighs { jjwt["harold"] } kilograms. so we need to be a little more careful about our commands. What gets put into the output ﬁle is everything that S says in response to the commands.in) looks like: % cat rep. IRIX 5. We get two problems in one go. You can get a sense of all of the utilities available to you by doing: % ls ‘S SHOME‘/cmd Not all of the ﬁles there are utilities. The second problem is that we are getting the names of the quantities that we want as well as the numbers themselves.2 : 1995 Working data will be in . but many are. Our (naive) input ﬁle (called rep. Inc.out1 S-PLUS : Copyright (c) 1988.

190 CHAPTER 9. The solution to the startup printing is solved with: % setenv S_SILENT_STARTUP 1 % setenv S_FIRST ’’ The second command is optional. sep="")}. sep="") } kilograms. output on file rep. so this is only a partial solution. so S is going to want to evaluate just about the whole ﬁle.in2 rep.3 under some Unix systems). but in my case my . "\n".out3 Harold weighs 15 kilograms.out2 Harold weighs 15 kilograms. but Dorothy weighs {cat(jjwt["dor"].out3. but {\it Dorothy weighs \{cat(jjwt["dor"]. % cat rep. Note that the backslash-n in the cat calls is of importance. "\n". There are occurrences of backslash-{ in TEX ﬁles.First or have the S FIRST environment variable do the necessary actions. If the commands will need your .out3 REPORT will run in batch: input from file rep. A workaround would be to change the backslash-{ combinations in the ﬁle that are not intended for S to something else.in3. The help ﬁle for REPORT states that you can follow the -e with a character to use in place of the backslash.out2 REPORT will run in batch: input from file rep.0 . Burns v1. "\n".out2. sep="") } kilograms. % Splus REPORT rep. but that functionality seems to be broken (at least as of S-PLUS version 3. and this command avoids the call to . sep="")}. Now our output ﬁle looks ﬁne. The solution to this is to use the -e ﬂag to REPORT.} % Splus REPORT -e rep. then you will need to change the . % cat rep. A bare -e means that the introductory brace for S commands is preceded by a backslash. then change them back after S has S Poetry c 1998 Patrick J.First prints something. output on file rep. % cat rep. but Dorothy weighs 14.in3 Harold weighs \{ cat(jjwt["harold"]. You may be worrying that TEX ﬁles are going to be full of braces.First and it prints something. but {\it Dorothy weighs 14. "\n".} DANGER.in3 rep.in2 Harold weighs { cat(jjwt["harold"].in2.First. S FOR S PROGRAMMERS % cat rep.

UTILITIES processed the ﬁle.Sqpe ﬁle is not going to be small. so you probably don’t want many of them around.9. If it exists.o If all is well.restore(unix(’ls $(DDOBJS)’))" | Splus which is made part of the install target.funs echo "poet.Sqpe. There is more discussion of loading code into S starting on page 171.objs: install. The DDOBJS variable in my case is declared to be: S Poetry c 1998 Patrick J. then the usual executable is used. The local. You can redo the make install command any time that you have added ﬁles to the Makefile.0 . Once the Makefile is edited. The result of CHAPTER is to create a Makefile that allows easy updating.Sqpe in the current directory that will contain the usual S executable along with the additional code given in the command. DANGER. A command with it might look like: % S LOAD mycode.h include ﬁle is found. There are locations in the Makefile where you put the S dump ﬁles and the help ﬁles. Burns v1. One thing that it does is make sure that the S. I have added the target: install. If COMPILE doesn’t work properly. The script that starts S looks in the current directory for a ﬁle named local. or that you have changed existing dump or help ﬁles.dump. The purpose of the CHAPTER utility is to make it easy to create a library of S objects. The Makefile that CHAPTER creates does not handle objects created by data. then this is used as the S executable.2. A utility added to S-PLUS is COMPILE which is used to compile source ﬁles in a way that makes the resulting object ﬁles suitable to be loaded into S. this will create a ﬁle named local. This is really a call to make. then you merely need to do: % make install and the S objects and help ﬁles are all put in the right place.o my_other_code.data. 191 The LOAD utility performs static loading into S. If it doesn’t exist. In the Makefile for the code that goes with this book that was made by CHAPTER. it is probably because of a problem with your local installation.

After that an arbitrary number of ﬁles can be given that contain help. Symbols marked “U” are not part of the object ﬁle.ctemplate. then those symbols have to be known to S.stack S Poetry c 1998 Patrick J.Help perl.verif.Data/.o:00000230 T digamma. but the other functions do appear and are marked with a “T”. but CHAPTER can ease the pain in this case also. make it look like: .192 CHAPTER 9. Another S-PLUS utility is HINSTALL (it is undocumented). Here is an example: % Splus NM digamma.d transcribe.o: U digamma.stret8 _digamma_complex_Sp _digamma_real_Sp _log _set_inf _set_na _test_inf _test_na The C functions that are deﬁned as static within the ﬁle do not appear. Burns v1.Q These are the objects that the sccs function used data.FN stack This line says what object the help ﬁle is for. meaning that they are deﬁned. The S-PLUS utility NM is a convenient way to use the Unix nm command to examine the symbols in a ﬁle of object code.FN stack .o: U digamma. To add the print method for stack to the help ﬁle.d The ﬁrst argument after HINSTALL is the help directory where the ﬁles are to go.o: U . The advantage of Splus NM over nm is that the former has the same format on all platforms.o: U digamma.0 .o: U digamma. Near the top of a help ﬁle you will ﬁnd a line like: .o: U digamma.Q portoptgen. so if this ﬁle is to be loaded into S. An advantage of using HINSTALL is that it automatically takes care of a ﬁle that is for more than one object. S FOR S PROGRAMMERS DDOBJS=poet.FN print. and is used like: % Splus HINSTALL .dump on.o:00000000 T digamma.o digamma. The situation is complicated when C or Fortran code is also involved. This installs help ﬁles.

The ADDKEYWORD utility is meant to facilitate adding keywords to the list of acceptable ones.Cat. I imagine that HINSTALL will go extinct.Data (This utility fell outside of the capitalization convention. Burns v1. See page 135 for a discussion of masking. When you add help ﬁles to the . % Splus MASKED Searching user directory . UTILITIES 193 Mostly this FN command is ignored.2. the titles and the keywords. See the list in the help ﬁle for prompt.Help directory. The help ﬁles in version 3 of S are formatted by the Unix nroff command. S Poetry c 1998 Patrick J.Cat. It is used like: % mkdir .9. Because version 4 of S changes the way help works. That is.Data/.findsum S-PLUS function that performs the same task as the utility. it is possible that the function won’t do. The category part of the S-PLUS help window system looks for the special ﬁle in each database on the search list that contains the information from the help ﬁles based on the function names.Data . then the keywords in the new help ﬁles won’t be recognized until you update the ﬁle. Some machines do not have nroff so S will look in a .findsum .0 .Help directory. However. Update it with the command: % Splus help.Help directory for preformatted help ﬁles before looking in the .) The keywords in the help ﬁle can not be arbitrary—only a recognized set of keywords are used. It is reasonably obvious what to do once you ﬁnd the right ﬁle. My one attempt at using this utility met with failure. but HINSTALL knows about it and does the right thing.Cat.Data S-PLUS has added the MASKED utility that shows objects in the working database that mask other objects. but if things are really messed up. There is also a help.Help subdirectory of a database. just putting FN lines in the ﬁle does nothing to make help for the additional objects unless you use HINSTALL.. The CATHELP utility will put the formatted ﬁles into the . you’ll notice.Help % Splus CATHELP . S-PLUS also has a masked function. it did usefully point to the ﬁle that needs to be changed. You can also use this if the extra added speed of getting help is worth the extra space taken up..

If you want none of the ﬁle saved. The default ﬁle is the one that would be written to when you start S in the current directory—that is.Audit) to most recent 100000 characters Audit file truncated to 99913 characters This saved the most recent part of the ﬁle and threw away the older part.Data/.NA 3461: jjn Above we start up AUDIT and look at the commands that “get” an object named jjn.0 . Burns v1. (The magic to turn auditing oﬀ is given on page 160. This is used by the history function. so the output is abridged): audit: N GP rw . You can give it a number which tells the approximate number of characters to save.194 CHAPTER 9. Below we view the names that were both put and got (there are a lot.Data/. Run TRUNC_AUDIT to truncate it. the . % S AUDIT Reading audit file ".Data/jjv .. S FOR S PROGRAMMERS One of the things that S does behind the scenes is to keep an audit ﬁle of the commands given to S.Data/jjn 3463: jjn 3462: jjn[jjn < 0] <.Data/jjsy rw .Audit ﬁle that is in the . S Poetry c 1998 Patrick J. See help file for complete instructions. can be examined with the AUDIT utility.. you will get a message like: > q() Warning: Audit file is 519851 characters long.Data that S will use as the working directory—but you can specify the ﬁle of your choice. This is telling you to run a Unix command like: % Splus TRUNC_AUDIT Truncating audit file (. you would say: % Splus TRUNC_AUDIT 0 The AUDIT utility provides a means of viewing what an audit ﬁle contains.Audit" 3652 statements. 352 identifiers audit: G jjn .Data/jjmat rw . and has potential to be used for other purposes also.) If the audit ﬁle grows to more than a certain size.

NA 3268: jjn <.3. The command % Splus LICENSE users produces a list of those currently using S-PLUS licenses.1:5 > jjn[jjn < 0] numeric(0) S Poetry c 1998 Patrick J. Finally two more utilities that may be of interest are EXEC and CSH. The LICENSE utility of S-PLUS is primarily for system administration. but I haven’t found it. DANGER.9. Should we believe in nothing? 28 Here is a common situation where zero length objects occur.3 Zero Length Objects Objects of zero length are not uncommon in S.1:5 audit: q Type “q” to exit the AUDIT program. and CSH gives you a C-shell with the S environment. % Splus EXEC env 9. The EXEC utility executes a Unix command in the environment of S. > jjn <. So you can decide who to harass when you can’t get a license.0 . They can be unnerving for the uninitiated. I haven’t ever gotten the E command of AUDIT to work. Perhaps it does work on some machine. ZERO LENGTH OBJECTS 195 Perhaps one of the more useful commands is the ability to backtrack a command number: audit: B 3463 jjn ~~~~~~~~~ 3463: jjn 3462: jjn[jjn < 0] <. but their existence adds a great deal of versatility to the language. Burns v1. but there is one function of general interest.

> jjlist $a: [1] "aaa" "aa" S Poetry c 1998 Patrick J. > jjn[jjn < 0] <. the elements would be numbers. A NULL object contains nothing.0 . S FOR S PROGRAMMERS Even when doing a replacement here.196 CHAPTER 9. Its vacancy glitters round us everywhere. 29 A common occurrence of NULL is as the result of the extraction from a list of a component that doesn’t exist. Burns v1. > character() character(0) > length(character()) [1] 0 > length("") [1] 1 > nchar(character()) numeric(0) > nchar("") [1] 0 The character() object contains no strings.NA > jjn [1] 1 2 3 4 5 A source of confusion is the diﬀerence between a zero length character vector and the empty string. there is no problem. and it is not speciﬁed what it would contain if it did have something. An object that is numeric(0) contains nothing. Recursive objects can have zero length also: > list() list() > length(list()) [1] 0 > parse(text="") expression() An object printed as NULL (mode null) is a special zero length object. while "" is one string that has no characters in it. but if it contained anything.

9. there is a diﬀerence between a component that does not exist and a component whose value is NULL.3.list(NULL) # change component > jjlist $a: [1] "aaa" "aa" $b: NULL $c: [1] "ccc" "ccc" > jjlist[[2]] <.NULL # no change > jjlist $a: [1] "aaa" "aa" $b: [1] "bb" "bbb" $c: [1] "ccc" "ccc" > jjlist[2] <.0 . S Poetry c 1998 Patrick J. To change the value of a component to NULL. Burns v1. use double brackets and assign NULL. ZERO LENGTH OBJECTS $b: [1] "bb" 197 "bbb" $c: [1] "ccc" "ccc" > jjlist$d NULL Though it is often irrelevant.NULL # delete component > jjlist $a: [1] "aaa" "aa" $c: [1] "ccc" "ccc" So to remove a component. use single brackets and a list whose component is NULL. > jjlist[2] <.

198 CHAPTER 9.0 .991149 49. "fjjcylinder"<function(r = ((2 * V)/pi/h)^0.5.000402 2.5 $V: [1] 5.5) $r: [1] 1 2 3 $h: [1] 3. Burns v1. S FOR S PROGRAMMERS The fjjcylinder function returns the radius. h = h. h=3.497787 21.480084 > fjjcylinder(h=3. > fjjcylinder(r=4) Error in fjjcylinder(r = 4): Recursive occurrence of default argument "h" Dumped Where we’re caught is when S notices that it is being asked to resolve a circular reference. there has to be some sort of trouble. height and volume of a cylinder.954410 2.5 * pi * r^2 * h) { list(r = r.5 $V: [1] 21 22 23 Obviously if only one argument is given. h = (2 * V)/pi/r^2.5. Now we look at the state of the objects at the time of the error. V = V) } > fjjcylinder(r=1:3. > debugger() Message: Recursive occurrence of default argument "h" 1: 2: fjjcylinder(r = 4) Selection: 2 Frame of fjjcylinder(r = 4) d(2)> ? S Poetry c 1998 Patrick J.045361 $h: [1] 3. V = 0. and it computes the third. Give the function any two of these. V=21:23) $r: [1] 1.

0 .c("x".missing(x) ans["y"] <.3. call = match. "z") ans2 <.9. ZERO LENGTH OBJECTS 1: V 2: h 3: r d(2)> mode(r) [1] "numeric" d(2)> mode(h) [1] "unknown" d(2)> mode(V) [1] "unknown" d(2)> missing(r) [1] F d(2)> missing(h) [1] T d(2)> missing(V) [1] T 199 Note that not all versions of S (or S-PLUS) will give you the same answers in this example. "y". There are really two senses of what a missing object is. Burns v1. The second sense is that “missing” means an object of mode missing.call()) } > fjjtestmiss1(1. For example. the call fjjcylinder(r=4) means that h and V will be missing.missing(x) ans2["y"] <.missing(y) ans["z"] <. The ﬁrst and most proper is that it is an object that corresponds to an argument of a function that was not given in the call.missing(z) list(before = ans.ans ans["x"] <.missing(y) ans2["z"] <. after = ans2.logical(3) names(ans) <. z) { ans <.2) $before: x y z F F T S Poetry c 1998 Patrick J."missing" ans2["x"] <. y.missing(z) mode(y) <. Here is a little test case: > fjjtestmiss1 function(x.

"z") ans2 <.missing(z) list(before = ans. y = 2) CHAPTER 9. "y". but it appeals to my sense of irony.missing(x) ans2["y"] <. h = (2 * V)/pi/r^2. "cylinder"<function(r = ((2 * V)/pi/h)^0.missing(y) ans["z"] <. call = match.call()) } I’m not sure why you would ever want to do such a thing.0 . V = 0. with no default: fjjtestmiss1() Dumped So you can not make a missing argument missing. as in the fjjcylinder(r=4) call where V can’t be computed. y. z) { ans <. The proper approach is: > fjjtestmiss2 function(x. Burns v1.missing(z) if(!missing(y)) mode(y) <. A mode of unknown means that S is left confused about the situation.5.ans ans["x"] <. Nobody knew where I was and now I am no longer there."missing" ans2["x"] <.logical(3) names(ans) <. S FOR S PROGRAMMERS > fjjtestmiss1() Error in fjjtestmiss1: Argument "y" is missing. after = ans2.5 * pi * r^2 * h) { if(nargs() != 2) S Poetry c 1998 Patrick J.missing(y) ans2["z"] <.missing(x) ans["y"] <.c("x".200 $after: x y z F T T $call: fjjtestmiss1(x = 1. 30 There is a way to make a constraint-based function like fjjcylinder behave better.

4). These additional arguments are the attributes of the object."whim" > dput(jjstruc) structure(. Some language objects are recursive. . 2. Burns v1. each producing a diﬀerent type of object. 2. h = h. The structure function has a .2) > jjmat2 S Poetry c 1998 Patrick J.Data = c(1.4 Modes There are numerous modes. We know in this case that we need exactly two arguments. 4). 2) > dput(jjmat) structure(.1:4 > attr(jjmat2. V = V) } 201 The nargs function returns the number of arguments that are actually passed into the call for the function it is in. 4) > jjstruc <.Data argument and takes an arbitrary number of additional arguments.Dim = c(2.Data = c(1. 2. Although not a mode.1:4 > class(jjstruc) <. 2)) > attributes(jjmat) $dim: [1] 2 2 > jjmat2 <. We could allow all three arguments to be given. The modes are divided into three categories: atomic. class = "whim") > jjmat <. 9. Here are some examples: > dput(1:4) c(1. > cylinder(r=4) Error in cylinder(r = 4): must give exactly two arguments Dumped This is a more rational error message than that from fjjcylinder. and perform some sort of adjustment to make the three quantities follow the constraint.matrix(1:4. 3. “structures” are a way that objects are described. 3.9. MODES stop("must give exactly two arguments" ) list(r = r.0 . 3. Recursive objects may contain other objects of the same mode.4. ". recursive and language.Dim") _ c(2. Atomic (which includes mode null as well as those given in chapter 1) does not overlap with the other two categories.

functions are recursive. These are functions that appear on the “wrong” side of an assignment.] So . Using an assignment function may cause a local version of the object to be created. A special type of function is an assignment function.Data" "/usr/lang/s3. > find("freeny. the body is not named. the dput function is what does the real work in the dump function. Burns v1. This is a recursive mode since expressions can contain expressions. The names of a function are the names of the arguments.x) <. 12) > find("freeny.x") [1] ".c(13. Objects of mode expression are (the equivalent of) parsed commands. Mode call represents objects that are calls to a function. then the new copy will be in the frame of the function.1] [. Assignment functions have their own protocol to follow—see the explanation of names<-.rationalnum on page 248. as on page 25. the length of a function is the number of lines contained in the body of the function. S thinks the length of a function is the number of arguments plus one. then the local version will be in the working database. The extra “component” is the body of the function. In common usage.] [2. 6) The assignment function in this case has the (invalid) name of dim<-. When you are at the S prompt. S FOR S PROGRAMMERS [1.x) <. and loan on page 226. S functions are objects of mode function. The rest of the call is the arguments given. 12) Warning messages: Invalid dimnames deleted in: dim(freeny.Datasets" > dim(freeny. By the way.c(4. The ﬁrst component of a call is the name (mode name) of the function being called.Dim is another name for the dim attribute (due I’m sure to some sort of historical aberration).1/s/.Datasets" When in a function.x") [1] "/usr/lang/s3.c(13.1/s/. for example: dim(x) <. Since the default of an argument to a function can be a function. > length(expression(sin(1:4))) S Poetry c 1998 Patrick J.0 . Examples of this are in fjjlog2 on page 220.2] 1 3 2 4 CHAPTER 9.202 [.

but it works nonetheless. The command x[[c(1.0 .9. The portoptgen function on page 348 uses it extensively to build an S function. Burns v1. Formulas use the tilde operator.2)]] This allows a mixture of numeric and character subscripts. Another equivalent form is: x[[list(1.2. This operator is particularly lazy—it does nothing but return a call to itself. It preserves what was typed while being in a form that can be taken apart and manipulated.recursive(jjform) [1] T > is. > jjform <.2)]]) [1] "numeric" 203 Here we see an example of a type of subscripting that was promised in chapter 1.language(jjform) [1] T > jjform[[1]] ~ > jjform[[2]] y S Poetry c 1998 Patrick J.4. This seems like either a contradiction or an inﬁnite loop.2.2.2)]] is equivalent to x[[1]][[2]][[2]] Working with language constructs is one of the more likely spots where this form of subscripting is useful. MODES [1] 1 > mode(expression(sin(1:4))[[1]]) [1] "call" > expression(sin(1:4))[[1]][[1]] sin > mode(expression(sin(1:4))[[1]][[1]]) [1] "name" > expression(sin(1:4))[[1]][[2]] 1:4 > mode(expression(sin(1:4))[[1]][[2]]) [1] "call" > mode(expression(sin(1:4))[[c(1.y ~ x1 + x2 > mode(jjform) [1] "call" > is.

58123 1970: 9.00868 1965: 9.39767 9. b=as.69405 1971: 9. return.expression(if(x>0) y else z)[[1]] > jjif if(x > 0) y else z > mode(jjif) S Poetry c 1998 Patrick J.list(a=0.71774 9.79236 1963: 8.06906 9.79137 8.03049 9.69958 9.60048 9. To stand in the doorway is to stand in more brief worlds than this one 31 > jjform[[2]] <. Burns v1.y > eval(jjln$b) 1Q 2Q 1962: 8.81486 8. > jjif <.90751 1964: 8.y > jjln$b freeny.10698 1966: 9.2)]] x1 CHAPTER 9.64496 9.y")) > jjln $a: [1] 0 $b: freeny. while. repeat.44223 1969: 9.as. and the object itself. Something of mode name is in the crack between the character string that names an object.31378 9.42150 9.18665 9.64390 9. S FOR S PROGRAMMERS See the chapter on formulas for more.79424 Names are elusive because they are almost always evaluated. break.68683 9.05871 9.23823 1967: 9. These modes include: if.77536 4Q 8.96161 9.81301 8.28436 9.96044 9. and next.35025 1968: 9.12685 9.48721 9.93673 9.52374 9.35835 9.name("z") > jjform z ~ x1 + x2 > jjln <.name("freeny.53980 9.17096 9. for.74924 3Q 8.0 .204 > jjform[[3]][[2]] x1 > jjform[[c(3. There are several modes that control ﬂow within the language.26487 9.

recursive(jjif) [1] T > is. MODES [1] "if" > length(jjif) [1] 3 > is. and to see language modes in action.9. In the second statement above the assignment to jjx is not made so it doesn’t exist rather than having the value NULL as in the ﬁrst statement. > jjbrk <. jjx) > jj <.1 > jjx <.0 .vector("break") > jjbrk break > is.atomic(jjbrk) [1] F > is.3 NULL > jjx Error: Object "jjx" not found The value of the else clause when it is missing is NULL. S Poetry c 1998 Patrick J.4. Burns v1.recursive(jjbrk) [1] F > is. Here we see how an if without an else works: > rm(jj.assign function on page 211 for a couple additional modes. See the find.language(jjbrk) [1] T > length(jjbrk) [1] 0 So an object of mode break is neither recursive nor atomic.if(jj == 0) 3 > jjx NULL > rm(jjx) > if(jj == 0) jjx <.language(jjif) [1] T > jjif[[1]] x > 0 > mode(jjif[[1]]) [1] "call" 205 So an if object has length 3 (2 when the “else” is missing) and each piece is usually some other language object.

Dumped There is no problem making objects with new names. but when it hits the comma. The ﬁrst statement will parse ﬁne. b) They are exactly alike except for the third character. 32 When S parses a statement. “quentled” is a verb in the past tense of the passive voice.206 CHAPTER 9. and argument separation is the only use in S that a comma has. If it sees ! as the ﬁrst character. The second statement will parse okay through the “a”. but the parser would not understand it if you tried to create new operators at whim. ++ as a new operator because that would change the syntax of the language. “nebrad” is an adjective modifying “pebﬁn” which is a noun and the object of the preposition “in. Consider the pseudo-English sentence “The wugbun was quentled in the nebrad pebﬁn. but that doesn’t hinder us from understanding the form of the sentence. then it interprets this as an escape to the operating system. then a syntax error results. > ++2 [1] 2 > +-++2 [1] -2 > 3//5 Syntax error: */ ("/") used illegally at this point: 3// S Poetry c 1998 Patrick J. This is the process of breaking the command into units of known type. the parser can’t ﬁt a character in sensibly.0 .” We parse this by realizing that “wugbun” is a noun and the subject of the sentence. S FOR S PROGRAMMERS 9. > parse(text="f + (a. say. b)") Syntax error: ". then it tries to ﬁt it into one of the operators != or ! (meaning “not”). You can not add. but if it sees it only later in a statement. If at some point." used illegally at this point: f + (a. it realizes there is a problem—the parenthesis can’t be thought of as the start of an argument list because of the + operator.” We don’t know what these words mean. Burns v1. b) f + (a.5 Evaluation The ﬁrst step S takes in order to perform a command is to parse it. I can’t blab such blither blubber. it looks at each character in turn and tries to make sense of it. Learning what the words mean and then understanding the sentence would be analogous to evaluation in S. S decides what are numbers and what are object names. Here are two statements: f (a.

(Quotes are not allowed. However. e2) > 3 %@#&*#% 5 [1] 3 5 2 Below is the deﬁnition of an “element of” operator that might be of use at times. Burns v1. There is no interpretation for something like // so a syntax error results. nomatch=0) > 0 > c(-1.function(x. of Freedom 14 15 30 S Poetry c 1998 Patrick J. 2) [1] F T F F F F F F F F Statisticians looking at the list of operators may be wondering why %in% that is used in analysis of variance formulas is not listed. 2) %e% 1:10 [1] F F T > 1:10 %e% c(-1.objects("%") $SHOME/s/. > find.Functions $SHOME/splus/. but the name of the operator must begin and end in %. Here is a list of all such operators from S-PLUS version 3.default" "%/%" $SHOME/s/.5.500 Deg.) > "%@#&*#%" <. pigment) Call: raov(formula = Moisture ~ Batch + Sample %in% Batch. data = pigment) Terms: Batch Sample %in% Batch Residuals Sum of Squares 1210. EVALUATION 207 Using ++ in S does parse—it is interpreted as two unary plus signs in a row. y. The characters that appear inside the % symbols are quite arbitrary.Functions "%*%.933 869. e2) rpois(e1.y) match(x.9.Functions "%o%" The %*% function is generic with only a default method appearing here. > "%e%" <.Functions "%%" "%*%" $SHOME/splus/. The functionality is there: > raov(Moisture ~ Batch + Sample %in% Batch.750 27.0 .3.function(e1. The following operator even contains a pound sign that elsewhere would signal the start of a comment. 34.Functions $SHOME/s/. S provides a way to add new operators. 34.

functions called within that function need to be evaluated. FALSE. We Jazz June.TRUE = exp(fjjb(x) + 3). The char. exp.208 CHAPTER 9. Functions used by fjj4 include: > fjjb function(x. TRUE. but with more nesting. Formulas use S parsing (so the precedence in formulas is the same as in S commands). exp. y = 2) { browser() x + y } > get("%+%") S Poetry c 1998 Patrick J. The function called on the command line will be: > fjj4 function(x. add. In the evaluation process. it can be very aesthetic.Internal functions satisfying certain conditions) that do not get their own frame. Hence there is a stack of memory frames that grows and shrinks throughout the evaluation. but the formulas are not evaluated in S.new = F.9574271 Estimated effects are balanced The %in% is not really a function. it is perhaps a little gratuitous.TRUE = exp(fjjb(x) %+% 3)) } CODE NOTE.paste(add. since this is the chapter where I’m promising to tell the truth. likewise the + in the formula is not really addition. TRUE. sep = ". Below are some examples of frames shown by traceback from a call to browser.logic <.") switch(char.new. S FOR S PROGRAMMERS Residual standard error: 0. I have to state that there are “quick calls” (which are . S creates a memory frame (also called an evaluation frame) for each function call as it is evaluated. Each of the frames has a unique number which gives the depth it is in the stack. Burns v1.use = F) { char.logic.logic variable provides a mechanism for ﬂattening out a messy nest of “if-else”s. especially if a portion of code corresponds to more than one case.FALSE = log(fjjb(x) + 3).0 . Actually. We Thin gin. and functions within those functions need to be evaluated. and so on.use. In this case where it is only two deep.FALSE = log(fjjb(x) %+% 3). FALSE. 33 As a function is evaluated.

default(nframe. parent) from 7 7: browser. eval.frame.0 . message = paste("Called from:".302585 > fjj4(5. e2) { . Burns v1. add=T) Called from: fjjb(x) b(5)> traceback() 8: eval(i. T) from 1 1: from 1 b(4)> 0 [1] 22026. T.47 S Poetry c 1998 Patrick J.frame.default(nframe. message = paste("Called from:".Internal. T.frame.9. "do_op". from 6 6: browser() from 5 5: fjjb(x) from 2 4: fjjb(x) %+% 3 from 2 3: log(fjjb(x) %+% 3) from 2 2: fjj4(5.Internal(e1 + e2. EVALUATION function(e1. eval.default(nframe. 5) } 209 The %+% operator is the same as the + operator except that the former has braces around the call to . parent) from 6 6: browser.302585 > fjj4(5. from 5 5: browser() from 4 4: fjjb(x) from 2 3: fjjb(x) %+% 3 from 2 2: fjj4(5. from 5 5: browser() from 4 4: fjjb(x) from 2 3: log(fjjb(x) + 3) from 2 2: fjj4(5) from 1 1: from 1 b(4)> 0 [1] 2.5. T) Called from: fjjb(x) b(4)> traceback() 7: eval(i. eval. T. parent) from 6 6: browser. > fjj4(5) Called from: fjjb(x) b(4)> traceback() 7: eval(i. message = paste("Called from:". add = T) from 1 1: from 1 b(5)> 0 [1] 2.

then it should go into frame 1. If it needs to outlive a single function call. S FOR S PROGRAMMERS From these tracebacks we can infer that exp and + are quick calls while log and %+% are not.offset if(exists(x. mode = mode..character(x) || length(x) != 1) stop("need single character string for x") parent <. but there is never a frame for exp. Frame 1 has the curious property of being its own parent. Database 0 starts when an S session starts and dies when S is exited. = "any". Frame 1 is generally underused. • Finally each database on the search list is searched in turn. If it will only be used during a function call. Frame 1 is best explained as lasting from the start of a command being evaluated until the prompt returns. • If the object is not found in any of these places. offset = 0) { if(!is. also referred to as frame 0. When a variable needs to be global. Burns v1. When S looks for a an object. The number after the “from” gives the frame number of the parent of each frame. When log is used. Note that not all frames are searched. mode. frame = parent)) return( . "whence"<function(x. It ﬁrst looks for the appropriate object in the memory frame of interest. the session database or the session frame—so many names for one place. then it should go into database 0. this is almost always the location where it should be put. • The third place to look is database 0.210 CHAPTER 9. then in frame 1. frame = 1)) S Poetry c 1998 Patrick J.sys.parent) if(exists(x. there is a frame corresponding to it. Often S knows that it needs a function so it will ignore objects with the same name that are not functions (it gives a warning if this happens). but can be logically or conveniently recreated each session. it looks in a number of places until it ﬁnds an object that satisﬁes the requirements. The places searched are: • The ﬁrst place S looks is the current frame. You want the variable to live only as long as required. Only when the variable needs to live beyond a single session.parent() . A negative return value means that the object is found in a frame. then in database 0. then an error occurs. then on the search list. • The second place where it looks is a funny critter called frame 1.0 . should it go into the working database. mode = mode.. The whence function returns the location where an object is found.

9.5. EVALUATION return(-1) if(exists(x, mode = mode., frame = 0)) return(0) find(x, mode = mode., numeric = T)[1] } The whence function is used on page 228 in the [.stack function.

211

The find.assign function takes an expression and attempts to return all of the variables that are assigned to in the expression.

"find.assign"<function(line) { switch(mode(line), "<-" = , "<<-" = { ans <- line[[1]] if(mode(ans) == "call") return(character(0)) else if(mode(line[[2]]) == "function") { this.fun <- line[[2]] return(c(ans, names(this.fun)[ - length( this.fun)], Recall(this.fun[[length(this.fun) ]]))) } else if(mode(line[[2]]) == "<-" || mode(line[[2]]) == "<<-") { return(c(ans, Recall(line[[2]]))) } else return(as.character(ans)) } , comment.expression = return(Recall(line[[1]])), "{" = { ans <- character(0) for(i in 1:length(line)) { ans <- c(ans, Recall(line[[i]])) } return(ans) } , "if" = return(c(Recall(line[[1]]), Recall(line[[2]]), Recall( line[[3]]))), "while" = { return(c(Recall(line[[1]]), Recall(line[[2]]))) } , "for" = {

S Poetry c 1998 Patrick J. Burns v1.0

212

**CHAPTER 9. S FOR S PROGRAMMERS
**

loopv <- substring(deparse(line)[1], 5) loopv <- substring(loopv, 1, match(32, AsciiToInt(loopv )) - 1) return(c(loopv, Recall(line[[3]]))) } , "repeat" = return(Recall(line[[1]])), call = { cnam <- as.character(line[[1]]) switch(cnam, assign = return(line[[2]]), switch = { ans <- character(0) for(i in 2:length(line)) { ans <- c(ans, Recall(line[[i]])) } return(ans) } , return(character(0))) } )

}

The basic idea of this function is dead simple—switch on the mode of the output, then recurse on pieces of the input using Recall (see page 239 for further explanation of Recall and recursion). The details can get a little messy, though. Although writing find.assign is a ﬁne exercise to understand S, its real purpose is as a subfunction to global.vars which attempts to return the variables that are global to a function. A mistake that is easy to make is to use a diﬀerent name in the body of a function than in the argument list for the same variable. For example: > global.vars(function(xmat) symsqrt(x)) [1] "x" This makes the misspelled name in the body a global variable. The situation is particularly dangerous when there is an object of that name on the search list—possibly created for debugging purposes. The global.vars function is an attempt to have a diagnostic for this. "global.vars"<function(fun) { if(is.character(fun)) fun <- get(fun) fnam <- names(fun) fnam <- fnam[ - length(fnam)] S Poetry c 1998 Patrick J. Burns v1.0

9.5. EVALUATION allv <- all.vars(fun, uniq = T) allv <- allv[!match(allv, c("NA", "T", "F", "Inf", "NULL", ".Internal", fnam), nomatch = 0)] body <- fun[[length(fun)]] if(mode(body) == "{") { anam <- character(0) for(i in 1:length(body)) { anam <- c(anam, find.assign( body[[i]])) } } else anam <- character(0) if(length(anam)) allv <- allv[ - match(anam, allv, nomatch = 0)] allv }

213

The all.vars function returns a vector of all of the variables in an expression, in this case we want just the unique names rather than each occurence of a name. Next we get rid of the common names we know are wrong and the argument names. Finally we get rid of the names that are assigned to in the body of the function. If you’ve been paying attention you should be a little concerned about this last move—on page 72 I said there was a problem with subscripting with negative numbers from match. But given the protection for the length of anam, there will be at least one match if that line of code is reached, so it should be ﬁne. S uses lazy evaluation. This means that arguments to functions are evaluated only when they are actually needed. Being used by another function doesn’t count—the object is passed into the call unevaluated. Once the value is needed, it is evaluated in the appropriate frame and the value is eﬀectively passed to all of the intermediate frames. An advantage of lazy evaluation is that unnecessary evaluation is avoided. Another is that the default to an argument can involve variables created inside the function as long as the variables exist by the time that the argument is needed. The down-side is that in certain situations there are problems with knowing where the evaluation should take place. The most common way to get this problem is when using formulas inside of functions—see page 291. The delay.eval function on page 217 slightly illuminates lazy evaluation. S is essentially a functional language, which means that variables in one function are not changed by functions called by the function. That is, side eﬀects are minimized, and generally only occur in a limited set of circumstances. S Poetry c 1998 Patrick J. Burns v1.0

214

CHAPTER 9. S FOR S PROGRAMMERS

Actually, it is possible with the assign function to change objects in other frames (one reason why S is not strictly a functional language). When objects are assigned to databases, they are not actually written to the disk until just before S gives another prompt. The assignments are said not to be committed until the prompt is given. For the most part, this is of no concern, but there are times when it matters. Instances where it does matter include large, repetitive computations (page 355), and browser calls (page 138). The immediate argument to assign insures that the commitment is made at the time of evaluation rather than held until later. See also the discussion of synchronize on page 103. There is a number of functions that provide information on the stack of memory frames at a given moment. Here is a list of them: > objects(5, pat="^sys") [1] "sys.call" "sys.calls" "sys.frame" [4] "sys.frames" "sys.function" "sys.nframe" [7] "sys.on.exit" "sys.parent" "sys.parents" [10] "sys.status" "sys.trace" We can use the fjj4 function again to explore what some of these functions do. > fjj4(6) Called from: fjjb(x) b(4)> traceback() 7: eval(i, eval.frame, parent) from 6 6: browser.default(nframe, message = paste("Called from:", from 5 5: browser() from 4 4: fjjb(x) from 2 3: log(fjjb(x) + 3) from 2 2: fjj4(6) from 1 1: from 1 b(4)> sys.nframe() [1] 4 b(4)> sys.parent() [1] 2 Although it is a little confusing with the browser complicating things, we get the answer that we are in frame number 4 and the parent to frame 4 is frame 2. b(4)> sys.call() fjjb(x) b(4)> sys.frame() S Poetry c 1998 Patrick J. Burns v1.0

9.5. EVALUATION $.Auto.print: [1] T $x: .Argument(x, x = ) $y: .Argument(, y = 2) b(4)> sys.function() function(x, y = 2) { browser() x + y }

215

The .Argument (which is a mode) objects are unevaluated arguments to the function. b(4)> sys.parents() [1] 1 1 2 2 4 5 6 4 sys.call is a close synonym of match.call. Either of them can be used to produce the call component or attribute of the results of functions. The diﬀerence is that match.call always gives the argument name: > fjj5 function(x) { match.call() } > fjj6 function(x) { sys.call() } > fjj5(y ~ x+z) fjj5(x = y ~ x + z) > fjj6(y ~ x+z) fjj6(y ~ x + z)

The ignore.error function allows a value to be given to a computation that produces an error, and the computation continues. This function provides an example of how to control evaluation as well as introducing the restart function. S Poetry c 1998 Patrick J. Burns v1.0

216

CHAPTER 9. S FOR S PROGRAMMERS

"ignore.error"<function(call, value = NULL) { nframe <- sys.nframe() flag <- paste("flag.ignorE.error", nframe, sep = ".") if(exists(flag, frame = 1) && get(flag, frame = 1)) { assign(flag, F, frame = 1) cat("(ignored)\n", file = "|stderr") return(value) } else assign(flag, T, frame = 1) restart(T) ans <- eval(call, nframe - 1) assign(flag, F, frame = 1) ans } Let’s go through the execution of this function. We start by ﬁnding where we are in the stack of frames. Next the variable name for the ﬂag is created—the funny capitalization helps to avoid name conﬂicts, but the important part is the frame number in case ignore.error is called in more than one level. If the ﬂag exists and is TRUE, then the ﬂag is changed to FALSE, the error message is appended, and the function exits with the return value. Otherwise the ﬂag is assigned to be TRUE, and restart is called so that evaluation will resume after an error. Now ﬁnally, the actual statement of interest is evaluated in the proper frame. If no error occurs, then evaluation within ignore.error continues—the ﬂag is turned to FALSE and the result is returned. If an error does occur, then the restart causes evaluation to resume at the beginning of the ignore.error call which means that the condition in the if will be TRUE. Note that you may be in for a wild ride if you use a function containing restart that is not properly debugged. Below is a function that uses ignore.error and an example of its use. When ignore.error is used, it is reasonable to set the error option to NULL to save time. "fjjigerr"<function(input) { this.mat <- base.mat <- matrix(c(2, 1, 1, 1), 2) lin <- length(input) ans <- array(NA, c(lin, 2), list(input, NULL)) for(i in 1:lin) { this.mat[4] <- base.mat[4]/input[i] ans[i, ] <- ignore.error(eigen( this.mat)$val, NA) S Poetry c 1998 Patrick J. Burns v1.0

9.5. EVALUATION } ans } > fjjigerr(-2:1) Error in eigen.default(this.mat): missing or infinite values in x Dumped (ignored) [,1] [,2] -2 2.350781 -0.8507811 -1 2.302776 -1.3027756 0 NA NA 1 2.618034 0.3819660

217

The for loop in fjjigerr uses 1:lin where lin is length(input). There is no assurance that input will have a length, so the loop might fail—see page 128. In this case it isn’t tragic since it is just a temporary function, and it will fail on array anyway. However, it is good to keep these things in mind for when they do count. The delay.eval function plays on lazy evaluation to allow evaluation to be performed in a diﬀerent memory frame than usual. "delay.eval"<function(expr, frames = 1) { eval(substitute(expr), local = sys.parent() + frames) } For example, the default value for ncol.arg in diag.default is n. The n is local to diag.default and not part of the environment of the frame calling diag.default. We can use the n inside a call to delay.eval though. > args(diag.default) function(x = 1, nrow.arg, ncol.arg = n) NULL > diag(1, nrow=3, ncol=2*n) Error: Object "n" not found Dumped > diag(1, nrow=3, ncol=delay.eval(2*n)) [,1] [,2] [,3] [,4] [,5] [,6] [1,] 1 0 0 0 0 0 [2,] 0 1 0 0 0 0 [3,] 0 0 1 0 0 0 S Poetry c 1998 Patrick J. Burns v1.0

218

CHAPTER 9. S FOR S PROGRAMMERS

The unabbrev.value function is meant to allow an abbreviated function argument. That is, the argument can be one of a few strings that would need to be given in full without some mechanism like this. "unabbrev.value"<function(x, choices) { err <- F if(!is.character(x) || length(x) != 1) { err <- T emsg <- paste("need single character string for", deparse(substitute(x))) } else { xnum <- pmatch(x, choices, nomatch = 0 ) if(xnum == 0) { err <- T emsg <- paste( "unknown or ambiguous choice for ", deparse(substitute(x)), ": ", x, sep = "") } } if(err) { eval(call("stop", emsg), local = sys.parent()) } else choices[xnum] } A particular feature of this function is that the error message contained in emsg—if err is TRUE—will come from the function with the argument, not from unabbrev.value. An example of the use of this function is in line.integral on page 320. The .Internal function calls C code that is specially written to understand S. You can not write your own .Internal functions, and you can not have access to the code that they use. However, a brief explanation of a .Internal call might help you sometime. Here is an example: > get("*") function(e1, e2) .Internal(e1 * e2, "do_op", T, 4) The .Internal function has four arguments. The ﬁrst argument is an image of how the call must have looked, with appropriate substitutions made. S uses S Poetry c 1998 Patrick J. Burns v1.0

However there are a few occasions where ﬁnesse is desirable..00 > unix.00000000 0.39 106.6931472 1.Options object is a named list where the name is the option and the component is the value of the option.00 0. > fjjlog1 function(. the * function uses case 4 of the do op code.00 0.Options object that lives in database 0.00000000 [5] 0. then we might well be happier with the warnings.42 2. (Note there were really 1000 warnings.22000122 0.3862944 > unix. If this were always the case. For example.00000000 S Poetry c 1998 Patrick J.opt <.0986123 1. Here is a revision of the log function so that a warning is not given when negative numbers are given to it.) } Now we test that it works and time it relative to the original. The second argument names the section of internal S code that is to be used. EVALUATION 219 this image internally—it is more than cosmetic. The third argument is a logical value which states whether the arguments are to be evaluated or not. and the fourth argument is the speciﬁc part of that section that is to be used.02999973 3. you should only change options through the options function (or a modiﬁcation like soptions given on page 45)..00000000 There were 50 warnings (use warnings() to see them) If we had to put up with this much loss of speed.9. but S saved only the ﬁrst 50 of them.00000000 0. Options are implemented by using the . but it is easily created and modiﬁed appropriately as S starts up.) But a diﬀerent revision performs much better: > unix. but then one would be wrong.0000000 0.opt)) old.) { on.. Burns v1.00000000 [5] 0.time(for(i in 1:1000) fjjlog2(-4:4)) [1] 3..0 .time(for(i in 1:1000) fjjlog1(-4:4)) [1] 101.01999998 1. This is a logical place for it since it must have persistence.5. The . One might think that arguments should always be evaluated.options(warn = -1) log(. then there would be no reason to care how options are implemented. For almost all purposes.94000244 0.exit(options(old. The second argument is of signiﬁcance relative to the groups of the object-oriented programming.time(for(i in 1:1000) log(-4:4)) [1] 0. > fjjlog1(-4:4) [1] NA NA NA NA -Inf [6] 0.

There is no need of on.00000000 The statement above gives us the timing.) { .Options object is created due to the action of the $<.Auto.00000000 0.. but it is a little bit complicated to implement because not everything is printed.0 .220 The revised function is: > fjjlog2 function(..time(print(mean(rnorm(100)))) [1] 0.00999999 0. but we don’t get to see the result of the computation.time exits. S FOR S PROGRAMMERS This creates a local version of .Options$warn <. A simple and logical idea. The reason is that the last statement of print methods is invisible(x) so that the object being printed is returned but there can’t be an inﬁnite loop of printing. Suppose that we want to time a statement: > unix. "interlude"<function(x) { S Poetry c 1998 Patrick J.00000000 0. When you type the name of an object at an S prompt.02105694 Now we see the result of the statement. We need a call to print around the statement being timed in order to see its result: > unix.exit because the local version will die when fjjlog2 exits.Auto..00000000 0.print in frame 1 to FALSE so at the point that unix.function.00000000 0.00000000 Here is the deﬁnition of the interlude function whose use was explained on page 59. but not the timing.00999999 0. automatic printing is not in eﬀect.00999999 0. Burns v1.-1 log(.. There is a special object named .time(print(mean(rnorm(100))))) [1] 0. we need two calls to print: > print(unix.2123258 [1] 0. the object is printed.Options with the change to warn. That call to invisible has set . To see both results.time(mean(rnorm(100))) [1] 0. The new version of the .print that controls automatic printing.00000000 0.) } CHAPTER 9.

Burns v1.interludE.final) == 3) .interludE.Interlude".list[[.iname.interludE.intersect(names(ilist).substitute(.init) == 3) .exit({ .init) .onelist if(makefun[i]) { thisfun <.get(".iexpr thisbody[[1]] <.time <.interludE. 0) if(length(.interludE.interludE.length(thisfun) thisbody <. 5). where = 0) } } assign(".interludE.interludE."interlude" } onelist <.init <.deparse(substitute(x)) if(exists("..Interlude".interludE.Interlude".5.final <.0 . ilist. where = 0) } Let’s start at the bottom and work towards the top.get(i) thisn <. length(x)) names(makefun) <.thisl <.interludE.proc. . S Poetry c 1998 Patrick J. where = 0)) ilist <.interludE.time = rep(0.get(".interludE.interludE. on..thisl assign(".interludE.init <.c(.interludE.time(). where = 0) else { ilist <."{" lenex <. 0) .interludE.list.F for(i in x) { ilist[[i]] <.c(.final <.name <.length(iexpr) makefun <.character(x)) x <.interludE.list(0) class(ilist) <.9. This object is a list in which each component holds the time and number of calls for each of the functions being proﬁled.thisl$ncalls <.thisbody assign(i.thisl$total.proc.name]] .Interlude to the session database. . 0.thisl$ncalls + 1 .rep(T..name]] <.interludE. 0. thisfun.time + ( .interludE. where = 0) } ).interludE.. The last thing that interlude does is to assign .Interlude".x if(length(old <.final . ncalls = 0) iexpr <.interludE.interludE. NULL) mode(iexpr) <. where = 0) .expression(NULL. x))) makefun[old] <.final.Interlude". list(iname = i)) thisbody[[lenex]] <.list <.list[[..init.interludE.list(total. EVALUATION 221 if(!is.time() if(length(.thisl$total.thisfun[[thisn]] thisfun[[thisn]] <.

list[[. where = 0) .final <.interludE.interludE.interludE. Burns v1.thisl$ncalls + 1 . then ﬁlling this object in with the name of the function being modiﬁed.final .5 > interlude(sqrt) Warning messages: assigning "sqrt" on database 0 masks an object of the same name on database 5 > sqrt function(x) { .list[[ .thisl$ncalls <.exit({ .interludE.thisl assign(". > sqrt function(x) x^0.222 CHAPTER 9.list <.thisl$total. 0.interludE.interludE.final.interludE.proc. 0) .time + ( .c( .Interlude.interludE.init) == 3) .interludE. The next thing to do is to modify the function to be proﬁled if necessary and assign the modiﬁed function to the session database.interludE.interludE.final <.interludE.0 .interludE.name]] .get(".interludE.interludE. and with the body of the real function (remember that the last component of a function is its body—the previous components are its arguments). 0) if(length(.final) == 3) .interludE.interludE. Here is an example of a function after interlude has gotten hold of it. 0.interludE..interludE.time <. where = 0) } ) S Poetry c 1998 Patrick J.name]] <.thisl$total.time() if(length(.init.thisl <. The ﬁrst thing is to initialize the proper component of the list that will become . Modifying the existing function is done by creating an object of mode “brace”.c( .init) ."sqrt" on.Interlude".interludE. S FOR S PROGRAMMERS Now look at the for loop—this loops over each function to be proﬁled.Interlude".interludE.list.init <.name <. .interludE.

Signals are a way to get a program to change its behavior while it is running.Program if you like. You can use your own . Although it is possible to change the keystrokes for signals initiated by the user. Almost all generic functions look like: S Poetry c 1998 Patrick J. The .init <. You probably won’t want to hit control-backslash twice since the second signal will carry you from the S prompt to the Unix prompt.9.proc.5 } 223 So the usual function has just the last line of the modiﬁed function. and control-z (background). The signals are sent by holding down the “control” key and hitting another character.interludE.exit call with a bunch of commands in it. There are three new commands in the function: the name of the function is captured. and will leave you at a browser prompt if that’s where you are. In contrast control-c will leave you at an S prompt if you are at one already. control-backslash (a stronger interrupt). It is a Unix convention that control-d will exit you from a program—S follows that convention. control-d (a quit signal).6.Program object controls what S does with the expressions that it gets. examples are ﬂoating point exceptions and segmentation violations.time function contains the current amount of time that the S session has consumed.Program essentially parses then evaluates the expressions (plus some other details like deciding what to print). Typically it is the class of the ﬁrst argument which controls the method that will be dispatched. The most common signals in Unix are control-c (interrupt). Burns v1. The in-built . an on. There are other types of signals that are initiated by programs. 9. Controlbackslash is used in S to immediately exit from a call to browser (or debugger) and get back to the S prompt. This is a bit like putting a new foundation under S. I will refer to them by their common mappings. and the current time is captured.6 Object-Oriented Programming The UseMethod function is the key ingredient of a generic function. Control-d is equivalent to the S command q().time() x^0. Use control-c to stop an S command and get back to the prompt. Use control-backslash when you are at a browser prompt and you want to get back to an S prompt rather than have the function continue.0 . Often a call to UseMethod is the entire body of the function. OBJECT-ORIENTED PROGRAMMING . so we get how much time the function uses by subtracting the result of proc. (The proc.) More of this sort of function modiﬁcation is exhibited in the section on Lagrange interpolation starting on page 322. and is not something that everyone will want to do.time at the beginning from the result at the end.

) UseMethod("cavort") CHAPTER 9..foo(2..4. y. then the name of the argument needs to be given as the second argument to UseMethod. An example is: > residuals function(object. y) if(length(class(x))) UseMethod("%myop%") else UseMethod("%myop%". . Burns v1.. . 4.jjz) Called from: gambol. but there is no point as long as it is always used as an operator.) UseMethod("residuals") > resid function(object. Another use of this argument is to allow nicknames for the generic function. A look with browser in the frame of a generic function reveals some new objects. S FOR S PROGRAMMERS The ﬁrst argument to UseMethod is the name of the generic function. jjz) S Poetry c 1998 Patrick J. z) Generic functions include UseMethod. z. y) This deﬁnes a generic operator where it ﬁrst checks to see if it can dispatch using x and if not. > gambol function(x.0 .) UseMethod("gambol". Version 4 of S will change what you want to do here.224 > cavort function(x. . it is good form to include it.) UseMethod("residuals") If you want an argument other than the ﬁrst to be the one whose class controls the generic function. . Perhaps the simplest logical case where it isn’t is: "%myop%" <function(x.. however they need not be a bare call to UseMethod.. The %myop% function violates the rule that a generic function should always have the three-dots as an argument.. It is especially important to give the generic name when the name includes a period—there could be ambiguity of the method to use otherwise.. > gambol(2. it dispatches using y.. Although this is optional.

max. any.6.Group 4: z 5: . Although all of the generic functions that you write will use UseMethod.Generic 6: x 7: y 8: z b(2)> . jjzb) Called from: gambol.Group [1] "" b(2)> . OBJECT-ORIENTED PROGRAMMING b(2)> ? 1: . See the Math example with rational numbers on page 252 for a look at using group methods. gamma and so on. The useful groups are “Math”. Burns v1. min. The Ops are the operators like addition and subtraction.Group and . .foo" b(2)> .foo(2.Method objects are self-explanatory. The . > gambol(2.) So you can write a dim. that may be of use.Class object gives the classes.Generic [1] "gambol" 225 The objects . including the present one.Class 2: . The . These functions are called internal generic functions.Class [1] "foo" b(2)> . “Summary” and “Ops”. The Math functions include the trigonometric functions. functions that are a call to .foo function which will be used as a method of dim for class foo even though dim is not obviously generic.Internal.9.0 .Method. (Each group consists of the functions that call the same routine in their .Method [1] "gambol.Internal are also generic. the "bar" class of the input is not used because there is no method corresponding to that class.Generic keep track of all of the information that may be necessary during evaluation. I’ve been lying to you again. S Poetry c 1998 Patrick J.Method 3: . The Summary functions are all. prod.Generic and .Class [1] "foo" "zax" b(2)> class(z) [1] "bar" "foo" "zax" Groups provide a labor saving mechanism by allowing the programmer to write a single method for a number of generic functions. range and sum. jjzb) b(2)> .Class. 5. 5. In the case below. .

year <.frame") ans } All that loan does is return a data frame that is a little bit specialized for what we want. Inheritance is the part of object-orientation that allows you to build on top of existing classes.double(NA).attr(object.month <.date") <.month. "last.name) attr(ans. S FOR S PROGRAMMERS For the most part. payment = as. the key line is assigning the class to have length greater than one.c("loan". as described on page 128.length(payment) new.name[month].loan"<function(object. "data.character(month)) month <. Burns v1.1) %/% 12 + last. interest = as. An exception is that NextMethod may be used in a method.date[ S Poetry c 1998 Patrick J. though there is an example of its use on page 252. "last.data. Next comes a method of the update function for loan objects. As it’s name implies. In some situations this is extremely powerful.frame(month = month.name ans <. "update.226 CHAPTER 9. I have seldom found NextMethod useful.attr(object.name) <. month.c(month = month. year = year) class(ans) <.last. year = year. Do note that the names of arguments in methods should be the same as those in the generic function. methods have no distinguishing features—they look like ordinary functions. "rate") npay <.date") rate <.pmatch(month.date <. The input in addition to the loan object is a vector of the next payments to be made. month. rate. The loan function creates an object that contains the details of the loan as it is ﬁrst made. Nextmethod is synonymous with the next method available for the generic function.double(NA)) attr(ans. "loan"<function(amount.date["month"] + 1:npay new. payment) { last. year) { names(month.(new. principal = amount. I’m sure that some would argue that inheritance is the main point of object-orientation.month . let’s produce a “loan” object that inherits from data frames. As an example.rate if(is.0 . In terms of inheritance. "rate") <.

21 275 227 That inheritance comes into this.month .rbind(object.prin <. "last.loan(45000.1) %% 12 + 1 last.6.month[ npay].77 262. The ﬁrst line of loan that puts names onto month. 2) new.new.inter <.50 262.0 . .payment[i] } ans <. payment = payment)) attr(ans. interest = new.43 275 4 May 1902 44962.prin * rate)/12.last.numeric(npay) for(i in 1:npay) { new.month <. is not entirely obvious—that’s part of the power of it.07.year. "rate") <. .28 275 6 July 1902 44936.prin + this. > jjl <. 1902) > jjl month year principal interest payment 1 February 1902 45000 NA NA > update(jjl.c(month = new.name[new. We have used subscripting. "Feb". > loan(45000.inter[i] <.prin[i] <.28 262.prin <.00 NA NA 2 March 1902 44987.50 275 3 April 1902 44974. Burns v1.07. year = new. Each of those three functions is non-trivial.prin. It would be handy to write a summary method for the loan class—the inherited method is decidedly not what we want.date") <.inter .rate ans } With just these two functions we have everything that we really need.56 262.year[npay]) attr(ans. rbind and print methods inherited from data frames without really noticing it. year = new. "principal"] new.this.name allows us to use either the number of the month or (an abbreviation of) the month name.(new. data.month].round(( last.35 275 5 June 1902 44949. OBJECT-ORIENTED PROGRAMMING "year"] new. but we didn’t have to worry about them in our application.frame(month = month. 2.last.object[nrow(object).93 262. 5)) month year principal interest payment 1 February 1902 45000. principal = new.inter <.9. rep(275.inter. 1902) month year principal interest payment 1 February 1902 45000 NA NA S Poetry c 1998 Patrick J.prin <.

Stacks are often used because they are fundamental structures in the language and hence eﬃcient—this is not the case here. length. "[. Notice that there is a dot at the end of the length. but after the function exits. More importantly this implementation breaks the expectation about side eﬀects in S—subscripting a stack causes it to change.stack"<S Poetry c 1998 Patrick J.0 . I do this with great trepidation for two reasons.vector("list". Burns v1. size = size. = 64. 9. which is decidedly non-standard in S. For example.228 CHAPTER 9. initial = NULL. S FOR S PROGRAMMERS A new version of month. I remain skeptical.7 Data Structures An important topic in computer science is data structures. the same month.) if(size <. In large measure S programmers do not need to worry about these because S already provides tools equivalent to these structures. CODE NOTE. Below I present an implementation of stacks in S. binary trees. Because of partial argument name matching.length(initial)) { if(size > length. argument. named lists or vectors can perform similar functionality to a hash table. though.initial } atl <. "stack"<function(length. the argument name would conﬂict with the length function and we would get annoying warning messages. and linked lists. Examples of useful data structures are hash tables. Without the dot.list(class = "stack". update = update) attributes(ans) <. The following function is the “pop”. we can think of the argument name being length without the dot. && is. My hope is that more good will result from this example than abuse. update = T) { ans <.name is created in the frame of loan.logical(update ) && !update) stop("stack overflow") ans[1:size] <.atl ans } The stack function returns an object of class stack. Stacks are a common data structure in computer science.name as usual will be the one seen.

"size") <. offset = 1) attr(x.1 if(xloc < 0) assign(xname. x. The item is not erased from the list. "[<-. i) { if(!missing(i)) stop("only empty subscripting allowed") size <.2 * length(x) } else stop("stack overflow") } else { length(x) <. and the size of the stack is changed. The next function pushes an item onto the stack.7.unclass(x)[[size]] xname <.attr(x.deparse(substitute(x)) xloc <. "size") if(size == length(x)) { update <. This is a less troublesome function since a side eﬀect is expected. i.attr(x.size .stack"<function(x.whence(xname.logical(update)) { if(update) { length(x) <. "size") if(!size) return(NULL) ans <. 1)) ans } 229 This performs the unpleasant deed of assigning the changed object behind the scenes.attr(x.length(x) + update } attributes(x) <. "update") xat <. possibly in a new location.0 .9. where = min(xloc. x. though that may be a good idea for applications in which the items on the stack are large. Burns v1.xat S Poetry c 1998 Patrick J. The last item in the stack is put into ans. value) { if(!missing(i)) stop("only empty subscripting allowed") size <. DATA STRUCTURES function(x. frame = abs(xloc)) else assign(xname.attributes(x) if(is.

"size") <.value attr(x.default(jjstk) [[1]]: NULL [[2]]: NULL [[3]]: NULL attr(.) { size <. "size") if(size) print(unclass(x)[1:size].stack"<function(x.) else cat("(empty stack)\n") cat("Class:".0 . > jjstk <.. When update is logical.stack(3) > print. The update argument can be either logical or numeric. or growing the stack is disallowed. then either the length is doubled. If it is numeric.. S FOR S PROGRAMMERS # length change destroys attributes } x[[size + 1]] <. class(x).. "update"): [1] T > jjstk (empty stack) S Poetry c 1998 Patrick J. "print. "class"): [1] "stack" attr(.attr(x. . Burns v1. then this is the number of components to be added to the list each time it must be lengthened. "size"): [1] 0 attr(. Of course we need a print method.size + 1 x } The main complication here is that the list underlying the stack may need to be lengthened.230 CHAPTER 9.. "\n") invisible(x) } Here is a simple example of how a stack works. .

9. However. This is to some degree a matter of taste. and there are all sorts of things to be tested concerning the side eﬀects.-1:4 R> fjj. THE R LANGUAGE Class: stack > jjstk[] <.8.function() { S Poetry c 1998 Patrick J.1:3 > jjstk [[1]]: [1] 1 2 3 4 [[2]]: [1] 1 2 3 Class: stack > jjstk[] [1] 1 2 3 > jjstk [[1]]: [1] 1 2 3 4 Class: stack > jjstk[] [1] 1 2 3 4 > jjstk[] NULL 231 Some would argue that an error should result from popping an empty stack. One might be concerned if adding a NULL item onto the stack works properly (it does). code that works in S will do the same thing in R. 9. there are some signiﬁcant diﬀerences. One of the most drastic is that R inherits a diﬀerent idea of how to search for objects due to its lineage to Lisp. Here is an example using R.r function () { subfun <. Queues can be deﬁned analogously. Burns v1.1:4 > jjstk[] <.0 . R> x <. At this point the garden-variety use of the functions seems to be okay.8 The R Language To a great extent.s. I think it is better in general to not create an error when the result is not demonstrably wrong—speciﬁc applications can make it an error if need be.

r() [1] 1 0 1 4 9 16 CHAPTER 9.s.r() [1] 529 576 625 And the same example in S. That x is a mate to subfun because they are deﬁned in the same environment. Discover the modes that I haven’t talked about. 9. Find some .vars.Internal calls that do not evaluate their arguments.232 x * x } x <. S FOR S PROGRAMMERS The formatting of the function is diﬀerent. Create a function that fully checks the input ﬁle for BATCH.s.s. While in R the x that is deﬁned in the function is used.0 . and learn what they do.assign and global. S> x <. Generalize loan so that it is worthy of its name.function() { x * x } x <.9 Things to Do Decide which pieces of code that you have written would be improved by making it object-oriented.r function() { subfun <. Find and ﬁx the bugs in find. S Poetry c 1998 Patrick J. In S it is the global x that is used (see page 210 if you need to review). Do it. Why don’t they? The loan function uses a quite general name for a rather speciﬁc case. but the signiﬁcant diﬀerence is the answer.-1:4 S> fjj.23:25 subfun() } S> fjj. Burns v1.23:25 subfun() } R> fjj.

Chambers and Hastie (1992) Statistical Models in S has some further discussion of S.10. Burns v1.9.0 . 9. Chambers and Wilks (1988) The New S Language contains a thorough discussion of the S language. A discussion of basic data structures is in Knuth (1973). FURTHER READING 233 Why didn’t I use functions named pop and push to implement stacks? Should I have? Is there a better rule for the location of assignment for a stack that has been popped? Implement a class for deques.10 Further Reading Becker.11 28 29 Quotations David Wagoner “Looking for Mountain Beavers” Wallace Stevens “Evening Without Angels” 30 Gwendolyn Brooks “Boy Breaking Glass” 31 Philip Mead “The Soldier” 32 Theodor Geisel “Fox in Sox” 33 Gwendolyn Brooks “We Real Cool” S Poetry c 1998 Patrick J. You can ﬁnd out about R via statlib. in particular. 9. Appendix A explains the object-orientation in S.

0 .234 S Poetry c 1998 Patrick J. Burns v1.

1 Changing Base It is not unusual to want to view numbers in a base other than base 10—or in more technical terms—to change radix. 10. We’ll look at an implementation of this called numberbase. 8) [1] 1 2 3 4 5 6 7 10 11 12 13 14 15 16 17 20 21 [18] 22 23 24 (base 8) > numberbase(jj. 10. of numerical computations. 16) [1] 1 2 3 4 5 6 7 8 9 a b c d e f 10 11 [18] 12 13 14 (base 16) > numberbase(c("10011101". octal and hexadecimal are particularly common bases to use. mostly half-baked. "1101010001").Chapter 10 Numbers Here are some examples. Binary. 2) [1] 1 10 11 100 101 110 111 1000 [9] 1001 1010 1011 1100 1101 1110 1111 10000 [17] 10001 10010 10011 10100 (base 2) > numberbase(jj.numberbase(1:20) > jj [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 [18] 18 19 20 (base 10) > numberbase(jj. Note that this function is limited to integers. Here it is in action: > jj <. old=2) 235 .

base10(attr(x.numberbase(ans.default"<function(x. "value")..raw * mult) S Poetry c 1998 Patrick J. ncr.raw <.raw. oldbase = 10) { ans <. mult.function(raw. Dividing the computations like this clariﬁes the scheme slightly—the function names crystalize the abstractions. ncr) this.nchar(raw) this. oldbase) if(newbase != 10) ans <. Once we look into these lower-level functions. newbase = 10) { from. Burns v1..1 digested <.to. digested + this. digested) { ncr <. The default method doubly cheats by calling the generic function again.0 .) UseMethod("numberbase") It has two methods written for it—a default method and the rather perverse idea of a method for objects of the class that it creates: "numberbase. NUMBERS The numberbase function itself is a typical generic function: "numberbase"<function(x. "to.sub <.raw <. newbase = 10. newbase) ans } "numberbase.raw == 36.base10.ifelse(this.base10(x. newbase) } These are nice and simple. but only because they have a couple of slaves that do all of the work. letters.match(this. .base10"<function(raw. "-")) . we see how the system works.236 [1] 157 849 (base 10) CHAPTER 10.substring(raw. oldbase) { to. c(0:9.numberbase"<function(x. digested.

(We should worry for an instant what happens in the call to substring when ncr is 1. Then the characters are converted to their numeric value starting with the one’s place and moving to the higher value places.transcribe(raw. and it has a "value" attribute that contains the numeric value.sub) takes care of doing the messy details of the work within the loop. Burns v1.character(raw)) stop(paste("can not handle data of mode".current$raw todo <.10.) The structure of a numberbase object is that it is a vector of the character representation in the base of interest. ans is made into an object of class "numberbase".Machine$double. digested = digested) } # start of main function if(is. and it has an attribute called "base" that states the base of the representation."numberbase" ans } 237 The ﬁrst step is to convert numbers to character strings.rep(0. length(raw)) while(any(todo)) { current <. but it works okay.1. mode(raw))) raw <. "value") <. Elements that have the largest number of characters in their representation control the length of the while loop.digested attr(ans.sub(raw = raw[ todo]. "A-Z". 1.0 .multiple * oldbase digested[todo] <.to.nchar(raw) > 0 digested <.raw) > .as.nchar(raw) > 0 } ans <.base10. "a-z") multiple <. ncr .10 class(ans) <. Finally.character(round(raw)) } else if(!is.eps)) { warning("rounding non-integers") } raw <.current$digested raw[todo] <.as.numeric(raw)) { if(any(abs(round(raw) . S Poetry c 1998 Patrick J. CHANGING BASE list(raw = substring(raw.character(digested) attr(ans. mult = multiple. "base") <. The subfunction (called to. digested = digested[todo]) multiple <.1).base10.1 todo <.

paste("-".digit <.c(0:9.round(raw) ans <. newbase) { from.which. sep = "") S Poetry c 1998 Patrick J.from.5 || newbase < 1.eps) stop("need single integer between 2 and 36 as base" ) raw <.digit + 1] ans[!zero] <.wna] } if(any(abs(round(raw) .base10. sep = "") ans } # start of main function if(length(newbase) != 1 || newbase > 36. let me emphasize that you should use one style throughout a project—we can expect three times as much debugging of to.character(length(raw)) this.digit.as. letters)[ this.raw) > . this.238 CHAPTER 10.raw[ .0 .raw == 0 if(all(zero)) return(character(0)) ans <.numeric(raw) wna <.base10 and from.Machine$double. ans[raw < 0].sub <. "from. Again.sub(abs(raw). rather than being called from a loop as is the subfunction in to.raw raw <. Burns v1. newbase).eps)) warning("rounding non-integers") raw <.base10 uses recursion. round(newbase)) ans[nchar(ans) == 0] <.base10.base10 than if we had stuck to one style.raw[!zero] %% newbase if(newbase > 10) this.digit <.base10.base10 could have been written with precisely the same structure as to.paste(Recall(raw[!zero] %/% newbase.5 || abs(round(newbase) newbase) > .Machine$ double."0" ans[raw < 0] <. but it is written in a diﬀerent style to display features of the language. The subfunction in from.function(raw. newbase) { zero <.base10"<function(raw. NUMBERS from.na(raw) if(length(wna)) { realraw <.base10.

character(length(realraw)) realans[wna] <."NA" realans[ . you can put a call to browser just before the call to paste in order to investigate. and calls itself on the portion that still needs further work.sub is to return if the condition is met that means it is at the bottom of the recursion.realans raw <.Machine$integer. and resources won’t be strained.base10 except there is no loop.. it wouldn’t work if it called itself recursively by name since the new call wouldn’t search the frame where the function is deﬁned.base10..10.newbase class(ans) <. (Page 210 tells the rules for object searching. then using recursion is good—the code is likely to be simple and easy to understand. and may break on larger problems.realraw } attr(ans. It then does some work. CHANGING BASE if(length(wna)) { realans <. If the recursion is guaranteed not to go too deep. then you do not need to change the deﬁnition of the function if you use Recall—you do if you hard-code the name in the deﬁnition.0 . If you are confused about how this works."numberbase" ans } 239 Except for some messing about with missing values.max. The second example below illustrates what happens when recursion goes too deep in S: > numberbase(. A recursive algorithm can always be converted to an iterative one. If the recursion is deep. 2) Error: Expressions nested beyond limit (256) -increase limit with options(expressions=.) S Poetry c 1998 Patrick J. "base") <. If you change the function name.sub is deﬁned within another function.sub calls itself recursively through the Recall function. There are two reasons to use Recall.raw attr(ans.) Many problems have a very natural recursive solution that is easy for us to comprehend.1. Since from. from. Unfortunately. As in all recursive functions. "value") <.max.base10.Machine$integer. the ﬁrst thing done in from.ans ans <. Burns v1.wna] <.base10. The second reason is that Recall ensures that the function will be found. then the code will be ineﬃcient. recursion does not tend to be natural in computer languages. 16) [1] 7fffffff (base 16) > numberbase(. the main actions are the same in this function as in to. S functions may call themselves recursively by using their own name or by using Recall.

")\n".max. attr(x.0 . Since integer.vector(x) names(xv) <. The above functions seem to work reasonably well. 2) [1] 1111111111111111111111111111111 (base 2) CHAPTER 10. "base").. quote = F) cat(paste("(base ". Burns v1. Additional arguments are allowed to come in even though they are not used. However. . old=2) [1] 56 157 849 (base 10) This is an example of a test of “garbage” inputs (page 53)—the input is not precisely what was expected. though we may want to change the expressions option within the function. sep = "")) invisible(x) } The most important aspects are that it follows the convention that it returns its argument invisibly.as. and that that argument is named x (see page 92).function(raw.240 only 27 of 83 frames dumped Dumped > options(expressions=350) > numberbase(. indicating the base is a requirement. Of course. "1101010001"). "print. if we upgrade the functionality to include ﬂoating point numbers.Machine$integer.names(x) print(xv. Here is the print method for numberbase objects. A new deﬁnition for to. we should worry about the recursion. digested.sub <. 10.base10.max is the worst case.numberbase"<function(x. so we need to ﬁx this. However.base10. Quoting of the character strings is turned oﬀ so that the result will look like numbers and not character strings.) { xv <. mult. there is no reason to give up the recursive algorithm. but it is well within the realm of reason."10011101". It’s decidedly impolite for missing values to transform themselves into normal looking data. S Poetry c 1998 Patrick J. NUMBERS The expressions option is a safety valve to keep unintentional recursion from creating an inﬁnite loop.sub makes it better: to.. informal testing uncovered the following behavior: > numberbase(c(NA.

ncr .as.ifelse(this. "-")) .. xbase.attr(xnb. "okay\n") } } We can test every combination of bases with a loop like: > for(i bases 2 bases 2 bases 2 in 2:36) fjjchecknumberbase(-1000:1000.vector( this. digested + this.) { xnb <. old = xbase) this. i.raw * mult) bad <. "fjjchecknumberbase"<function(x. . c(0:9. "and". new=i) and 2 okay and 3 okay and 4 okay S Poetry c 1998 Patrick J.substring(raw. new = i. digested = digested) } 241 One check on whether numberbase works or not. ncr. new = xbase.. Burns v1. letters.this.nb <.1 digested <.raw < 36 digested[bad] <.nchar(raw) this.numberbase(xnb. CHANGING BASE oldbase) { ncr <.1.nb). digested. "base") xnb <.0 .raw >= oldbase & this..numberbase(as. i)) else cat("bases".numberbase(x.raw <.raw == 36. . xbase.raw <.back <.) xbase <. is to go to a base and then use the characters for that base to go back to the original base and see if anything is diﬀerent. ncr) this. old = i) if(any(this.. "and".raw. 1.back != xnb)) stop(paste( "bad conversion between bases".NA list(raw = substring(raw.match(this.10.1).vector(xnb) for(i in 2:36) { this.

10. Do you dance. If you want precise computations with non-integer numbers (and your problem is suitable).eps)) . of course. then you can use rational numbers where the numerator and denominator are each integers. den=1:9) > jjr [1] 1 1/2 1/3 1/4 1/5 1/6 1/7 1/8 1/9 > jjr * 2 . it is only extreme cases where trouble is likely to be. but the problem persists.Machine$double. NUMBERS . is that computations are typically done with ﬂoating point numbers of ﬁnite length so that small numerical errors usually occur with each operation. NA.0 . . Here we create a class of objects for rational numbers. As long as the input is numeric. so I couldn’t see why a big.base10: Missing value where logical needed: if(any(abs(round(raw) . we are (rather accidentally) protected from special values causing problems because an error results if we include any: > numberbase(c(-10:10. We don’t know if it always works.999998 and 5. but we at least know that it works on the numbers that we gave it. S hides this better than that Fortran program did because S typically computes in double precision and prints in single precision. though it would be a nicety to have special values work.rationalnum(num=1. Inf)) Error in to. do you dance? 34 > jjr <. Dumped In summary.1 / jjr [1] 1 -1 -7/3 -7/2 -23/5 -17/3 -47/7 -31/4 [9] -79/9 The primary function that creates rational numbers is: "rationalnum"<function(numerator.raw) > . It was a program to do the quadratic equation. we can have reasonable faith that numberbase performs correctly.2 Rational Numbers I was disillusioned once my ﬁrst computer program (in Fortran) gave me an answer. and it produced answers like 1. Minnaloushe. . fancy computer couldn’t know it also.999999. I knew very well that the correct answers were 2 and 6. The reason. denominator) { S Poetry c 1998 Patrick J..242 CHAPTER 10. Burns v1. Given that it works for these..

0 .0 x$numerator[out] <.0 } if(any(out <. we see that the "rationalnum" class consists of a list of three components—the vector of numerators. but there is a catch—you need to make sure that lazy evaluation isn’t going to get in the way. names = anam) class(ans) <.sign(x) x[x == 0] <. RATIONAL NUMBERS n <. the vector of denominators.is.length( denominator)) if(ln != n) numerator <.infinite(x$denominator))) { x$denominator[out] <.infinite(x$ numerator)] <.rationalnum(ans) } 243 There are assignments of the arguments to max. If the argument is not used in the call.rep(denominator. Burns v1.rationalnum"<function(x) { signnz <. and a vector of names (which may be NULL). denominator = denominator. length = n ) if(ld != n) denominator <. After the function takes care of some bureaucracy.1 x } # start of main function nas <.10.is.length(numerator).names(denominator) ans <.is.names(numerator) if(!length(anam)) anam <.infinite(x$numerator))) { S Poetry c 1998 Patrick J. But the soul of the function lies elsewhere in reduce."rationalnum" reduce. length = n) anam <.list(numerator = numerator.na(x$numerator) | is.na(x$ denominator) # take care of Inf before coercing to integer if(any(out <.1 x$denominator[out & is.rep(numerator.max(ln <. You can put assignments virtually anywhere.2.function(x) { x <. ld <.rationalnum: "reduce. then it will not be evaluated and hence the assignment not made.

common.as.integer(ans)) ans[out] <.na(z) ans[zz] <.as.round(y) okay <.div"<function(x.finite(y) x <.244 CHAPTER 10.!is.NA x } The purpose of this function is to express each number in the form in which the numerator and denominator have no common factor.integer(x$numerator/fact * signnz(x$denominator)) x$denominator <.common. NUMBERS x$numerator[out] <.okay & !zz x[okay] <.z <.div which uses Euclid’s algorithm to compute the greatest common divisor for each numerator-denominator pair.round(x) y <.integer(abs(x$denominator/ fact)) x$numerator[nas] <.0 .NA ans } S Poetry c 1998 Patrick J.x while(any(okay)) { z[okay] <.0 } x$numerator <.finite(x) | !is.integer(x$denominator) fact <.length(y))) stop("x and y different lengths") out <.great. This again is mainly taken up with details like handling missing values and inﬁnity.y != 0 okay[out] <. x$denominator) x$numerator <.abs(as.common. y) { if(length(x) != (n <.z == 0 | is.y[okay] y[okay] <.z[okay] } ans <.x[okay] %% y[okay] zz <.integer(x$numerator) x$denominator <. The interesting part is in great.div(x$numerator.y[zz] okay <.F ans <.as.as.sign(x$numerator[ out]) * signnz(x$denominator[ out]) x$denominator[out] <. "great. Burns v1.

Because the v is zero.rationalnum function(x."NA" out[!xna & x$numerator < 0 & dz] <.x$denominator == 0 nz <.) { out <. rep(0. 8). [1] 1 2 3 4 1 6 1 4 3 [17] 1 6 1 4 3 2 1 12 1 [33] 3 2 1 12 1 2 3 4 1 [49] 1 2 3 4 1 6 1 4 3 [65] 1 6 1 4 3 2 1 12 1 [81] 3 2 1 12 1 2 3 4 1 [97] 1 2 3 4 100)) 2 1 12 2 3 4 6 1 4 2 1 12 2 3 4 6 1 4 1 1 3 1 1 3 2 6 2 2 6 2 3 4 1 4 1 12 3 4 1 4 1 12 The print method for the rational number class is deﬁned as: > print.div(1:100. 12 mod 8 = 4 to get the pair (8.0 . rep(12. x$denominator.na(x$ denominator) d1 <. The idea of the algorithm is that the greatest common divisor of u and v is the same as the greatest common divisor of v and u mod v . This leads to the pair (4.base10 function discussed earlier. RATIONAL NUMBERS 245 This function.is.10.common.div(c(-4:4).. then 8 mod 12 is 8. Burns v1.."0" out[!xna & nz & dz] <. sep = "") names(out) <.9)) [1] 4 3 2 1 0 1 2 3 4 Here is a more typical problem: > great.2. .x$numerator[d1] out[xna] <. p 316) wants the greatest common denominator of 0 with 0 to be 0."NA" dz <. we know we are done and the answer is 4.. like the to. if we start with the pair (8.common. .) S Poetry c 1998 Patrick J."-Inf" out[!xna & x$numerator > 0 & dz] <. 0). and the greatest common denominator of u with 0 to be the absolute value of u. 4). For instance.paste(x$numerator.x$numerator == 0 out[!xna & nz & !dz] <.na(x$numerator) | is. "/". uses vectorized operations and has a loop that stops when enough computing has been performed on the worst-case element.."Inf" print(out. > great. Knuth (1981. so we have the pair (12. quote = F. Next.x$denominator == 1 out[d1] <. 12).x$names xna <.

However. Several functions use as.x[good] ints <. numeric = { ans <. Here is addition: "+. and in consequence the function doesn’t really do much.rationalnum.x == ceiling(x) if(count <. Otherwise. we can convert integers. 1) } .sum(!ints)) warning(paste(count. great. we can eke out a little more eﬃciency by deleting the ﬁrst two lines. We are opening the door to unforeseen uses at the expense of only a couple dozen extra keystrokes.NA good <. Now we come to the arithmetic operators. stop(paste("can not coerce to rationalnum from mode". but nothing else.na(x) x <.rationalnum for class "rationalnum". so we could have made it an ordinary function. print it without quotes and return the original invisibly. If we already have rational numbers. Here is the default method for this function: "as. mode(x)))) } Note that there are no other methods for this function. All of the rest is details for missing values and such.default"<function(x) { if(inherits(x. Coercion to rational numbers is not easy. and creating a method of as. e2) { S Poetry c 1998 Patrick J.0 .x ans[] <.!is.246 invisible(x) } CHAPTER 10.rationalnum"<function(e1. "NA(s) created coercing to rationalnum" )) ans[good][ints] <. Since we are making it generic.rationalnum to make sure that an object is of class "rationalnum" when appropriate.x[ints] rationalnum(ans. it is believable that other methods could be desired. "rationalnum")) return(x) switch(mode(x). NUMBERS The ﬁrst line and the last two lines are really what the function is about—paste together a character string of the right thing. Burns v1.

e2) S Poetry c 1998 Patrick J. we would want the coercion to tend to go the other way—coerce rational to numeric and then proceed (but if the non-rational numbers can be coerced to rational. we plug what we need into rationalnum which will take care of reducing all of the fractions. The minus operator is decidedly unlike the addition operator. RATIONAL NUMBERS if(missing(e2)) return(e1) e1 <.rationalnum(e1) if(missing(e2)) { e1$numerator <. we let the usual operators do all of the work. 4) but if a user does that. Finally.2.10. As the comment says.rationalnum(e1) e2 <. The real reason we care is that we will end up in this method if only one of the two arguments has class "rationalnum".names(e2) rationalnum(num1 * den2 + e2$numerator * e1$ denominator. e1$denominator * den2) } 247 The ﬁrst thing is to see if this is a unary operator. Next.names(e1) den2 <. To make something more than a toy. It can be a deep problem of what to do when the two arguments to a binary operator are of diﬀerent types. there is the perverse way—the method is called directly: "+. then we would want to do that).e2$denominator names(den2) <.as.e1$numerator names(num1) <.rationalnum(e2) # propagate names. e2) { e1 <. do what seems to be unnecessary—coerce vectors to be rational numbers.rationalnum"<function(e1..as.e1$numerator return(e1) } e1 + ( . let default ops do the details num1 <. then they can get what they deserve.as. Burns v1. Why would we be inside a method for rational numbers if we don’t have rational numbers to begin with? Well.0 . and the binary operator cheats its way out. "-. This time the unary operator does something. We need to make sure that both are of this class before we proceed. in which case we don’t need to do anything.rationalnum"(1. Then we spend a few lines arranging for the names of the rational numbers to end up in the result.

> jjr <.rationalnum"<function(x) x$names From this we can see that “names” is confounded.value x } This is the typical form of an assignment function. NUMBERS The multiplication and division operators are very similar to addition. which is what is contained in the names component that the above function returns.as. and there are the names of the list that is the implementation of the rational number class. den=1:26) > jjr S Poetry c 1998 Patrick J. Burns v1. We have the names of the numbers. you should be upset by this deﬁnition—it presupposes that there is a length. and we can write a method for this also: "names<-.rationalnum (which is given below).rationalnum"<function(x. > names(jjr) [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" [13] "m" "n" "o" "p" "q" "r" "s" "t" "u" "v" "w" "x" [25] "y" "z" > names(unclass(jjr)) [1] "numerator" "denominator" "names" We need to keep these two straight. and a second argument named value which is the value given to the appropriate part of x (the object inside the parentheses to the left of the assignment operator). The return value of the function is x. One argument—often named x—that is the object being changed. Now we delve into the housekeeping functions: "names.0 . The names function has an assignment form.rationalnum(num=1. value) { value <.248 } CHAPTER 10. except that they do not allow the possibility of a unary operation.character(value) if(length(value) != length(x)) stop("bad length for names") x$names <. From what you have seen.

value) { length(x$numerator) <. From the deﬁnition of names<-.na function: "is. We need to know which values are missing. We would like the length of a rational number object to be the number of numbers represented by the object.10. S is smart about making the right thing happen.rationalnum"<function(x) length(x$numerator) An assignment form is also possible: "length<-. "length. as with ﬂoating point objects.0 .2.value length(x$denominator) <.rationalnum"<function(x.na.value x } DANGER. Burns v1.value if(length(x$names)) length(x$names) <. Changing the meaning of length—and to a lesser extent names— should be undertaken only with sound reason and with some trepidation. but that is not the case.letters > jjr a b c d e f g h i j k l 1 1/2 1/3 1/4 1/5 1/6 1/7 1/8 1/9 1/10 1/11 1/12 m n o p q r s t u v 1/13 1/14 1/15 1/16 1/17 1/18 1/19 1/20 1/21 1/22 w x y z 1/23 1/24 1/25 1/26 249 As this example shows. If all of the holes aren’t ﬁlled.rationalnum"<S Poetry c 1998 Patrick J. there is room for disaster.names(jjr2) <. so we need a method for the is.rationalnum it seems that we would be trying to assign the whole of jjr2 to the names of jjr.jjr > names(jjr) <. RATIONAL NUMBERS [1] 1 1/2 1/3 1/4 1/5 1/6 1/7 1/8 1/9 1/10 [11] 1/11 1/12 1/13 1/14 1/15 1/16 1/17 1/18 1/19 1/20 [21] 1/21 1/22 1/23 1/24 1/25 1/26 > jjr2 <.

den x$numerator <.x$numerator[i] x$denominator <.nan.value$denominator num[i] <.as.rationalnum"<function(x) { ans <. Burns v1.na(ans)] <.F names(ans) <.character(i)) i <.x$names den[i] <.x$denominator num <. i. value) { value <.nan(x) } This in turn needs to know about NaN’s: "is.na(x$numerator) | is. The assignment form uses a safer approach by putting the names onto the numerator and denominator.names(num) <.rationalnum(value) den <.rationalnum"<function(x. NUMBERS function(x) { is.0 .pmatch(i.250 CHAPTER 10.num S Poetry c 1998 Patrick J.value$numerator x$denominator <. dup = T) x$numerator <.na(x$denominator) | is.rationalnum"<function(x. i) { if(is.x$numerator names(den) <. and then depending entirely on the default subscripting behavior: "[<-.x$names ans } Subscripting is extremely important functionality—here is a method for that: "[.x$denominator[i] x$names <.x$names[i] x } Notice that we have to handle character subscripts on our own since the names are not as S expects. x$names.x$numerator == 0 & x$denominator == 0 # both infinity should not occur ans[is.

rat$names if(length(this.c(den.rat$denominator) this.rat$numerator) den <.rationalnum"<function(.names <..) { dots <. Here’s how: "as.c(num. but I can’t think of an alternative that is very appetizing.nam <.c(nam.vector(x$numerator/x$denominator) names(ans) <.nam class(ans) <. this.T else this.den <. of course.names) ans$names <.rep(""."rationalnum" ans } This uses the idiom of concatenating within a loop that I claimed earlier you should not use.numeric. this.names(x) ans } A method for c is often quite useful.F for(i in seq(along = dots)) { this..10.2. With some work we could avoid it. RATIONAL NUMBERS x$names <.nam <.nam <.list(numerator = num.nam) } ans <. Burns v1..as.names <.) num <. length( this. denominator = den ) if(has.rationalnum"<function(x) { ans <.nam)) has.NULL has.rat <.0 .list(. "c. this. want to be able to change from rational numbers to numeric vectors.names(den) x } 251 We.this. It presents the problem that it should allow an arbitrary number of arguments..as. Furthermore it seems extremely rare that S Poetry c 1998 Patrick J.rat)) nam <.rationalnum(dots[[i]]) num <.

41666667 0. 12) > exp(jjr2) [1] 0.25000000 0.0869040 1.8464817 0. So a logical thing to do for these functions is to coerce the rational numbers to numeric and then use the usual function.rationalnum: > c(jj3.08333333 0.Generic object is a character string that gives the name of the generic function that was called. .names is to keep the names lined up properly.7788008 0.50000000 Warning messages: coercing rational numbers to numeric in: Math. NA. Here is an example: jjr2 <.5168968 1.9200444 [6] 1.1813604 1.6592406 0. Burns v1.6487213 Warning messages: coercing rational numbers to numeric in: Math.: can not coerce to rationalnum from mode logical Dumped The provinces of his body revolted 35 A Math group method for rational numbers is useful: "Math.00000000 0.16666667 [9] 0. So Math..2840254 1.7165313 0.as.rationalnum does three simple things: warn that coercion is taking place.16666667 [5] 0.33333333 0.25000000 0.0 . call the next (default) method for the function. do the coercion.) { warning("coercing rational numbers to numeric") x <. The following statement shows that we can stand to improve as. The bother with has..Generic) } The Math group consists of a number of functions like sin and exp. jj3) Error in switch(mode(x).rati\ onalnum(jjr2) S Poetry c 1998 Patrick J.08333333 0.3956124 [11] 1.rati\ onalnum(jjr2) > abs(jjr2) [1] 0. that is. The main complication in this function is that some but not all of the arguments may have names.numeric(x) NextMethod(. Most of these are not closed under rational numbers.rationalnum"<function(x.41666667 0.rationalnum(-5:6. then the exact answer need not be a rational number. if a rational number is given as an argument.33333333 0. NUMBERS the loop would be very long.0000000 1.252 CHAPTER 10. The .

0.0 .abs(x$denominator) x } After this is deﬁned. Here is a function that helps with that job. our trick of doing the same operation on numerator and denominator is not going to work. 2]) if(test) { all. "abs.Inf. den) ratnum <. to match the default method.10.2. we would need to treat incomparables also. NA.equal(anum[. 4. 10. test = T) { anum <. 0/0). 2. we get: > abs(jjr2) [1] 5/12 1/3 [11] 5/12 1/2 1/4 1/6 1/12 0 1/12 1/6 1/4 1/3 The default method for unique is implemented as selecting those elements that are not duplicated. In writing a method of unique for rational numbers.paste(x$numerator. anum[. Inf. -7.abs(x$numerator) x$denominator <. 1. x$denominator ) x$names <. -4. ratnum$ num/ratnum$den) S Poetry c 1998 Patrick J. Burns v1. RATIONAL NUMBERS 253 We have gone too far—there is no need to coerce to numeric to get the absolute value.NULL x[!duplicated(xchar)] } With rational numbers we have the opportunity to get very precise answers that are wrong. "/". -2.rationalnum"<function(x) { xchar <. Checking the veracity of the computations is an important part of the programming.rationalnum"<function(x) { x$numerator <. "unique. 2].expand. "fjjcheckrationalnum"<function(num = c( . 7. 1]/anum[.grid(num. -1. The following is a cheap way out. The solution is to write a method for abs—this speciﬁc method will take precedence over the group method.rationalnum(anum[. 1]. -10. den = num.

numeric(ra)) else cbind(nn. as. deparse( substitute(r2))))) nn <.get(op)(n1. deparse(substitute(r1))))) nn <.254 } else ratnum } CHAPTER 10. and either compare them to see if they are all the same.grid call gives us a data frame containing every combination of numbers in num with numbers in den.eval(parse(text = paste(op. r1. and then we capture the vector of rational numbers for further testing. or return the numbers. Burns v1. as.as. 1.numeric(r1) if(unary) { ra <.fjjcheckrationalnum(test=F) > length(jjrt) [1] 225 > class(jjrt) [1] "rationalnum" The test passes. S Poetry c 1998 Patrick J.as.missing(r2) n1 <. n2) } if(test) all. NUMBERS The expand.0 . 2. a prime number and some composite numbers. The function then either tests the value of the rational numbers versus their true value. test = T) { unary <.numeric(ra)) } This has the same sort of form: do the computation with both rational numbers and numeric vectors.numeric(r2) ra <. or returns the rational numbers. op. r2.eval(parse(text = paste(deparse( substitute(r1)). > fjjcheckrationalnum() [1] T > jjrt <. This is a function to test operations on rational numbers: "fjjcheckratop"<function(op.equal(nn. The default value for num gives all of the special values plus 0.get(op)(n1) } else { n2 <.

inf(den) & !nans & !nas out <.1 x } # start of main function num <.function(x) { x <.rationalnum.div(num[!out].inf(den)) | (num != 0 & den == 0)) & !nas & !nans zeros <.round(x$denominator) nans <. jjrt) [1] "Missing value mismatches: 93 in current. RATIONAL NUMBERS > fjjcheckratop("+".common. The second call gives us the numbers so that we can investigate.10.inf(num) & is.sign(x) x[x == 0] <. jjrt.(is.integer(num[!out]/fact * signnz(den[!out])) x$denominator[!out] <.x$denominator[nans] <as.rationalnum(Inf.fjjcheckratop("+". The revised version is: "reduce. den[!out]) x$numerator[!out] <. jjrt.integer(0) x$numerator[infs] <.F nans[is. 1) > jjri [1] Inf > jjri + jjri [1] NA The problem is that inﬁnity is represented as 1/0 but it would be better if it were ∞/1.(Inf * signnz(den[infs]) * S Poetry c 1998 Patrick J.T nas <.is.na(nans)] <.rationalnum"<function(x) { signnz <.integer(abs(den[!out ]/fact)) x$numerator[nans] <.as. The trouble boils down to: > jjri <.great.0 .inf(den))) nans[is.inf(num) & !is. F) 255 in target" Now we see trouble. 61 > jjrplus <.nan(den)] <. This led to a rewrite of reduce. Burns v1.na(den)) & !nans infs <.2.((is.as.round(x$numerator) den <.na(num) | is.nas | nans | infs | zeros fact <.nan(num) | is.((num == 0 & den == 0) | (is. jjrt.

Burns v1. This version accepts that there are special cases to be addressed. "trigam") [1] 1. There is an easy translation between the two forms.3 Polygamma The digamma function.6449341 0. 10. 1:4) [1] 1.6449341 0.3949341 0.493939 -24. this diﬀerence is subtle enough to give you headaches if you are checking your answers. but it is more ﬁrmly rooted in the problem.2213230 > polygamma(1.2838230 0.644934 -2.3949341 0. DANGER. and so on. You will notice that I’m fairly liberal in my use of parentheses in logical statements—it’s easier this way to be sure that S and I are both thinking the same. this is the ﬁrst derivative of the lgamma function. Some people use Γ(x + 1) instead of Γ(x).886266 S Poetry c 1998 Patrick J. Our polygamma S function speciﬁes the order of the derivative of ψ via either a number or a character string.0 x$numerator[nas] <. The trigamma function is the ﬁrst derivative. often denoted ψ (x).1 x$numerator[zeros] <. the tetragamma function is the second derivative. and attacks them head on.404114 6. Thankfully the bug has saved us from staying with the original version. The ﬁrst version tried to get by with minimal attention to the weird cases—as a result it had code that was hard to understand (as well as being wrong). That it works correctly is of course a plus.2838230 0.0 . > polygamma(1:5.256 signnz(num[infs])) x$denominator[infs | zeros] <.2213230 > polygamma(1:5. However.6449341 0.NA x } CHAPTER 10. is deﬁned as ψ (x) = d loge Γ(x) dx (10.6449341 0. NUMBERS I like this version much better. 1) [1] 1. and wrong answers if you are not. Each polygamma function is a derivative of ψ (x) of some order.1) That is.

as.NA if(length(ans) == length(x)) attributes(ans) <..dyn. n = n. Burns v1.is. c("trigamma".0.loaded(symbol. as.pmatch(n. "low"]). dup=T) if(any(is.as. "x"]) alldat[xna.cbind(x = x.integer(dim(alldat)[1]).numeric(x) if(any(x <= 0. POLYGAMMA 257 The ﬁrst two commands are the same—they merely use diﬀerent conventions for indicating which polygamma function to compute.double(alldat[. "tetragamma". na. "pentagamma". as. "terms"]).0 . terms = 5) { if(!length(x)) return(x) if(is.na(n))) stop("unknown or ambiguous name") } else { n <.C("polygamma_Sp". low = low. "high"]). high = high.o") ans <. low = 0. "x"] <. as.10.double(alldat[.rm = T)) stop("only positive values allowed") alldat <.integer(alldat[. high = 100. "n"] + 1)))[[ 1]] ans[xna] <. "hexagamma").double(gamma(alldat[.30 xatt <.round(n) if(any(n < 1)) stop("n must be a positive integer") } terms[terms > 30] <. as.C("polygamma_Sp"))) poet. "x"]).na(alldat[.character(n)) { n <.load("polygamma. terms = terms) xna <.5 * alldat[xna. "n"]).integer(alldat[.0001.3. The last call is for a single x value at four diﬀerent orders of the derivative. n. as. as.attributes(x) x <. "low"] if(!is. Here is the S code: "polygamma"<function(x.double(alldat[.xatt ans S Poetry c 1998 Patrick J.

416930476e17. then just return it—it is often much cleaner when functions work with zero length inputs.0/870.0/14322.258 } CHAPTER 10.038778101e26. Burns v1. and then replaced by an arbitrary number—we’ll put the missing values back in the end. 8615841276005. so we use cbind to do all of the necessary replications and issuing of warnings in a single line of our code. #include <math. -1. S Poetry c 1998 Patrick J.929657934e16.652877648e28.849876930e30.371165521e13. i < *len. Below is the C code that computes the values. Another kludge follows in which the locations of missing values are noted. -23749461029. 43867.386542750e32. *n.208662652e23.0.0/30. -1. *nfact.0/42. The call to cbind is a kludge of signiﬁcance. -1.0/798. 5.0/2730. n[i].0. -1. -3617.0.0/30. and scream if it isn’t right.0.0.0. NUMBERS Here are the steps that the function performs. high[i].883323190e14. double polygamma().0. 2.0.0/138. -691. *low. Test the validity of the input. 8.0.0. then convert to the numeric order of the derivative. -2.0. for(i=0.0/66. and then clean up the result before returning.0/6. If n is given as character.115074864e21. This is listed in a logical order.h> void polygamma_Sp(x. *high. -236364091.0/6. } } /* Bernoulli numbers of even order from 2 to 60 */ static double bernou[30] = {1. low. *terms.0/6. -1. 4. 8553103.0.0/330. We want the function to be vectorized in all of the arguments.0/6. 2577687858367. -7709321041217.139994926e34}.0.0. nfact) long *len. high. Save the attributes of the input x so that they can be given to the output. 2.0 .0/2730. i++) { x[i] = polygamma(x[i]. If the length of the input is zero. -174611. { long i.500866746e24. 854513. len. terms. Such compilers would be upset with the order of these functions. 3. -5. 7. Finally we call C where the real work is done.0/510.033807185e19.0. -4.0/510. but note that some compilers insist that static objects be deﬁned in a ﬁle before they are used. terms[i]. double *x. low[i].0. 7. n. We’re not interested in the matrix that results—we only want everyone to be the same length. nfact[i]). -2.0. 1.

nexp).0 + nd * . double t0. while(x < high) { ans = ans + asign * nfact * pow(x. low. Burns v1.nd). . x = x + 1. nexp.0 * i + nd + 3.0) / (2. Here’s the idea of the algorithm that the polygamma C S Poetry c 1998 Patrick J. } ans = ans + asign * ser.10.0 * i + 3. high. high.0) * (2.nd . i < terms. ser = 0.0). double asign. ans = 0.0. return(ans). i++) { if(n ==1) { t0 = t0 * x2_inv.0 + nd * .1. double x. nd = (double) n. terms. -2. if(x < low) { return(asign * nfact / nd * pow(x. n.5 / x).0) * t0 * x2_inv.nd) * (1.0) / (2. x2_inv. asign = (n % 2) ? 1. { /* * polygamma function of a real positive x * no checks are made here on the suitability * of arguments */ long i. POLYGAMMA 259 static double polygamma(x.5 / x)).0. } nexp = . terms. for(i=0. low. } t0 = nfact / nd * pow(x.0 * i + 4.0. } else { t0 = (2.0 .0. nfact. nfact) long n. ser = t0 * ( 1. .0 * i + nd + 2.0 : -1. x2_inv = pow(x.3. } ser = ser + bernou[i] * t0. } polygamma Sp is called from S—it merely vectorizes the call to the real computing function.0.

1.92.5120891127) polygamma(x.c(1.pg } > fjjcheckpolygam() [1] 1.5120891127) polygamma(x.c(1.4141726631.168104e-09 [4] -1. We have a formula for the function at x in terms of the function at x + 1. 2. 3) pg <.c(1.. Let’s revise the function so that we can control the computations and investigate further. n) .430390e-11 [4] -1.6789231293. 1.11. 1. 1.11.8170975731. > fjjcheckpolygam function() { x <. -0.4141726631.4426631755. -1. If x is neither big nor small. If x is close to zero. > fjjcheckpolygam function(. -0.260 CHAPTER 10.252483e-10 1.319442e-10 2. If x is large.98. 1.11. 3.8170975731.11. NUMBERS function uses.090934e-10 3..98) n <. 3) pg <. 3. We can achieve some measure of comfort by seeing that our answers match some of those listed in Abramowitz and Stegun (1964). -1.c(1. 2.168104e-09 [4] -1.956035e-11 1. 4.4426631755.126171e-09 -9. 1.3602088083. 1. 4. then we have a good approximation.3602088083. then use the recurrence formula n times where x + n is larger than the lower bound for “big”.6789231293. All of the results of the following function should be close to zero. 1.090941e-10 3.) . 2. 1. 0.98. .. but the values for the tetragamma (third and ﬁfth) are a little worrisome. 2.090934e-10 3.pg } > fjjcheckpolygam(terms=9) [1] 1. 1. 0.713751e-11 S Poetry c 1998 Patrick J. Burns v1.c(1. n. 0.) { x <.0 .076084e-11 2.98) n <.095.041279e-11 In general this looks good. 1. 0.c(1.126171e-09 -9.095. Checking that we are getting correct answers is a little problematic.076084e-11 2.076095e-11 9. then we have a good approximation. 1..041279e-11 > fjjcheckpolygam(high=200) [1] 1.319442e-10 2.92.

. 1000.. 10. 10.796090e-07 [4] 4. high=900))).) { if(length(n) != 1 || length(mult) != 1) stop("bad input") partials <.097891e-11 1) 2) 3) 20) We can do a transformation to see about how many signiﬁcant digits we get: > round(-log10(abs(fjjpolygammult(c(0. 10.7 6. Another check is to see if the multiplication formula works.730390e-16 0.270955e-05 4. n) partials <.362287e-07 [4] -1.partials)/ abs(partials)) } One side of the equation is subtracted from the other so that the result should be near zero.663615e-07 -1. n.072273e-05 -2.5. so just making sure that we use it properly is good enough. 1e6).3 6. 5. 100. 10. 1e6).5. 100. 1) [1] 9. [1] 2. .049608e-16 -6.5. 5.407812e-08 -3. Here is how we do: > fjjpolygammult(c(. [1] -4.598018e-16 5.5. and the diﬀerence is normalized so we get the relative diﬀerence.1))/ mult. . 100.) . The formula is: ψ (n) (mx) = 1 mn+1 m−1 ψ ( n) x + k=0 k m (10. 1000.636471e-05 -1.574678e-11 -4. The “bad input” error message would be too terse if this were a function that would be used much.10. 1e6).364444e-07 4.955094e-10 1.. 5. mult) drop((polygamma(x * mult.000000e+00 2.860500e-03 -2.000000e+00 1. 5. 1e6).partials %*% rep(mult^( . (0:(mult .6 7.5. [1] -1.668093e-13 > fjjpolygammult(c(.104482e-16 0..513223e-07 6. "+"). [1] 6.polygamma(outer(x. But this is a private function that is just for verifying.8 12. mult = 2. 1e6).413523e-16 [4] -1.0 . > fjjpolygammult function(x. 100. n.117582e-16 > fjjpolygammult(c(. Burns v1.377635e-13 > fjjpolygammult(c(.3. 2. 100. 1000. but changing high does give us the accuracy contained in the table. 1000. 10. POLYGAMMA 261 Increasing the number of terms has little eﬀect.(n + 1)).2) The following function vectorizes this with the help of the outer function.124138e-16 [4] -1. 5.7 4.8 S Poetry c 1998 Patrick J. + 1000.

load("digamma.. however. 5000. > digamma(-2:5) [1] NA NA NA -0.] -0.000101002777700007 [3.0 .0000000i "digamma"<function(x) { if(!is.000101002777700007 [2.9145915+0.000101004999611128 [6.C("digamma_real_Sp"))) poet. im=4:0)) [1] 1.1079807+1.9208073i 0. 10.5772157 [5] 0.9227843 1.9946503+0.2561177+0. 2.4041297i [3] 0.3766740i [5] 1.] -0.6957963i 1. but it comes afterwards because complex as well as numeric arguments are allowed in the S implementation.o") if(is.] -0.000101004999832995 [7.3915363+1. show the value of S to investigate the quality of the computations.1] [1.] -0. S Poetry c 1998 Patrick J.000101004860945850 [4.C("digamma_complex_Sp".5061177 > digamma(complex(re=0:4.] -0.length(x) if(!n) return(x) ans <. The following examines the eﬀect of changes in the high argument for tetragamma of 100: > print(as.000101004999833349 The diﬀerences in these values shows that it is probably too much to hope for to have one single choice for the parameters that control the algorithm.loaded(symbol. 1000.matrix(polygamma(100. 200.complex(x)) { n <.4227843 0.100. specialsok = T. NUMBERS Values near 100 seem to be the most troublesome.dyn.000101004991152816 [5. high=c(50.2561177 1.4 Digamma The digamma function is more useful and simpler than the polygamma.] -0. NAOK = T. This does. + 400. 1e5))).262 CHAPTER 10. digits=15) [.] -0. Burns v1.

attributes(x) ans } else { storage.333333333333334e-02. i++) { if(is_na(x + i."double" .575757575757576e-03.C("digamma_real_Sp".607510546398047e+03 S Poetry c 1998 Patrick J.968253968253968e-03. } } static double stirling[] = { -8.almost = complex(n)) ans <. 8.166666666666667e-03.333333333333333e-03.053954330270120e+00.4. as.432598039215686e-01.almost + log(ans$z) attributes(ans) <.mode(x) <.333333333333333e-02. DOUBLE. DOUBLE)) . DOUBLE. -3. 3. Is_NaN). 4.integer(n). 4. NAOK = T. 2. -3.10. double *x.ans$ans. -7. double digamma_real_pos(). x. else if(is_inf(x + i.109279609279609e-02. else if(x[i] <= 0. DIGAMMA z = as.645621212121212e+01. ans.integer(length(x)))[[1]] } } 263 This function contains only a test of the mode of the input to decide which of two C functions to use.814601449275362e+02. { long i.complex(x).h> void digamma_real_Sp(x. DOUBLE)) inf_set(x + i. specialsok = T. -8. else x[i] = digamma_real_pos(x[i]). -2. as. Burns v1.0 . for(i=0. 2. n) long *n.0) na_set3(x + i. Here’s the C code: #include <math. 1).h> #include <S. i < *n.

CHAPTER 10. x = x + 1. /* the usual shortcut of *z for z[] is not a good idea here */ { long j.1. } void digamma_complex_Sp(z. x_inv. x_pow = x_pow * x_inv. while(x < upper) { ans = ans . } x_inv = 1. upper = 19. return(ans). n.0 + x) + euler_one. i++) { ans = ans + stirling[i] * x_pow. double upper. upper.NO CHECKS HERE */ if(x < lower) { ans = . for(i=0. lower = 1. i < 12.264 }.0e-8.0 .5 * x_inv. x_pow. Burns v1.0 / x. ans.0 / x. } return(ans).1. /* euler = -.0 / x . ans[]. x_inv = x_inv * x_inv. x_pow = x_inv. ans = ans + log(x) . } ans = 0.422784335098467139393488.5.0.0 / (1. i. euler_one..577215664901532860606512. { long i. NUMBERS static double digamma_real_pos(x) double x. complex z[]. /* expects x to be positive and finite .0. ans) long *n. */ euler_one = . S Poetry c 1998 Patrick J.1. double lower.

ans[j]..re * y.im.im = ans[j].re * y. inverse_complex().re .x. } S Poetry c 1998 Patrick J.im .im = ans[j]. z_inv). j < *n. z_pow = mult_complex(z_pow. COMPLEX)) { continue.re + stirling[i] * z_pow. return(ans).re = ans[j].re + x.0 .re = ans[j].im + stirling[i] * z_pow.re = 0. ans. z[j]. complex mult_complex().im * y.re. i < 12.5. y) complex x. } z_inv = inverse_complex(z[j]). continue.im = 0.re < upper) { z_inv = inverse_complex(z[j]). i++) { ans[j]. COMPLEX.re = z[j]. while(z[j]. z_pow = z_inv.0.im * y. } ans[j]. z_inv). 265 for(i=0.z_inv. Burns v1.im.im. DIGAMMA complex z_inv. z_pow.re . Is_NaN).re. for(j=0.4. ans.im.0. j++) { if(is_na(z + j.10.re .re = ans[j]. z_inv = mult_complex(z_inv.im . ans[j]. { complex ans.5 * z_inv. ans[j]. } if(is_inf(z + j.im..re. upper = 19. ans[j].0.im = ans[j]. } /* ans still needs log(z) added to it */ } } static complex mult_complex(x.5 * z_inv.z_inv. ans[j]. ans[j].im = x.re + 1. COMPLEX)) { na_set3(ans + j.re = x. y.

h. NUMBERS The computing algorithms are analogous to the polygamma algorithm. it is a non-trivial task to verify that our computations are correct. Another diﬀerence is that the complex version has no lower cutoﬀ—it iterates until the real part is large enough.266 static complex inverse_complex(x) complex x.im / den. double den. Note that the .im * x. ans."double" This is not such a bad thing to happen—it doesn’t seem worth creating an error for this. We need a complex logarithm also. An introduction to dealing with special values in C is on page 175.C call needs additional arguments so that special values can be passed into C. though. but that can be done at the end in S.re = x. As with the polygamma functions. One thing we know for all real numbers is that the complex version should match the real version: S Poetry c 1998 Patrick J. Burns v1. { complex ans. Complex arithmetic operators are not available to us.0 . This code provides an example of how to test and set special values in C code written for S.re / den.im = .re * x.im + x. ans. The comment in digamma real pos about checks on appropriate inputs is to help ensure that good code will be written when this routine is stolen for another purpose. all we need is complex multiplication and inversion.x. In this case. To perform this algorithm. so we need to make our own.mode(x) <.re. Hence it will be slow to compute values for numbers with large negative real parts. the parameters are hard-coded in C. return(ans). The C routines called by S take care of special values with the macros that live in S. Here is a test of garbage input: > digamma(letters) [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA [18] NA NA NA NA NA NA NA NA NA Warning messages: 26 missing values generated coercing from character to numeric in: storage. den = x. } CHAPTER 10.

4.complex(10:20))) [1] 0. 2.000000e+00 0.9i. Here is a similar test with complex numbers: > fjjcheckdigamcom function() { z <. 1.digamma(as.6001618527) digamma(x) .4227843351.808776e-11 We only have 10 digits from Abramowitz and Stegun except for the ﬁrst value which is Euler’s constant.41907i.517271e-06i This also gives as good of results as we can expect.306153e-06-1.016396547. > fjjcheckdigam function() { x <. -0.digamz } > fjjcheckdigamcom() [1] 2.055.440892e-16 4.31332+1.c(1.537704e-07-3.574195e-06i [2] 2.4i) digamz <. 1. 0.574552e-11 2.293676e-11+2.51127i) digamma(z) .000000e+00 [10] 0.000000e+00 267 This is not an especially good test to see if we get the right answers (since both algorithms could be wrong).216597e-06i [3] -2.000000e+00 0. 100) digx <.78533+1.440892e-16 0.0 . so the test passes. 1+8. but it is a good test to make sure that we haven’t made a blunder in one of the C functions—the complex arithmetic would have been a likely spot.000000e+00 [4] 0.965.545408e-12 [4] -1. DIGAMMA > Mod(digamma(10:20) .42179i.1294144191+1. 0. 4. S Poetry c 1998 Patrick J. Here is a function that returns the diﬀerence between the result of digamma and values from a table in Abramowitz and Stegun (1964).4+5.000000e+00 0.330669e-15 -1.000000e+00 0.c(2. 56.000000e+00 [7] 4.3999605371.455547e-11 3.c(2+10i.4902094448.57721566490153.c(-0. Burns v1.000000e+00 0. 1.digx } > fjjcheckdigam() [1] -3.10.4. 1.533662e-12 2. 2.

5) + log(2) . When both types of S Poetry c 1998 Patrick J.155134e-16 2.110223e-16 [7] 9.000000e+00 [10] 4.930137e-16 [4] 2.440892e-16 We can also use the reﬂection formula as a test.000000e+00 1. (10.110223e-15 So far we haven’t found a reason to disbelieve digamma.5 Continued Fractions Continued fractions are a way of approximating a function.881784e-16 1. The equation is ψ (2z ) = 1 1 1 ψ (z ) + ψ z + 2 2 2 + log(2) (10.930137e-16 [10] 1. They often represent an alternative to a series approximation to the function. + im=rnorm(10.081668e-16 9. We can use it to reassure ourselves.220446e-16 1.digamma(2 * z)) } > fjjdigamdup(complex(re=1:10. sd=1000). im=2:11)) [1] 3.110223e-16 0.950904e-16 0.5 * digamma(z) + 0.complex(re=rnorm(10.digamma(1 z)) } > fjjdigamreflect(jj <.110223e-16 1.220446e-16 4.440892e-16 [4] 8.0 .776357e-15 9.4) 10. The equation is ψ (1 − z ) = ψ (z ) + π cot(πz ) and the test function is: > fjjdigamreflect function(z) { Mod(digamma(z) + pi/tan(pi * z) . sd=1000))) [1] 8.140185e-16 2. NUMBERS There is a duplication formula for the digamma.3) Once again in the test function.220446e-16 2.268 CHAPTER 10. we subtract one side of the equation from the other so that the result should be near zero: > fjjdigamdup function(z) { Mod(0. Burns v1.110223e-16 [7] 1.5 * digamma(z + 0.

CONTINUED FRACTIONS 269 approximation exist for a function.nrow(num) if(n < 2) S Poetry c 1998 Patrick J.as. b) { n <. "continue. it can be the case that the continued fraction converges fast in one part of the domain and the series converges fast in the other.matrix(den) if(any(dim(num) != dim(den))) stop("num and den do not match") n <. Burns v1.length(a) if(length(b) != n) stop("a and b do not match") if(n < 2) stop("not enough terms") bottom <.b[n] for(i in n:2) { bottom <.as.0 .10.6) which gives the opportunity to add an ellipsis since most continued fractions of interest are inﬁnite.5.b[i . The ﬁrst thing that we do is write a simple function to compute a continued fraction: "fjjcontinue. A continued fraction can be written as a1 (10. den) { num <.5) a2 b1 + b2 + b a3 3 but for typographical convenience is usually written a1 a2 a3 b1 + b2 + b3 + (10. so it is of little value.matrix(num) den <. Below is one way of vectorizing the computations—I’m not convinced that it is the best way.fraction"<function(num.fraction"<function(a.1] + a[i]/bottom } a[1]/bottom } This has the distinct disadvantage that it is not vectorized.

and the number of rows is the number of terms in the continued fractions.fraction. Real values of z must be positive.1. length = outlen) tseq <. ] <. ] + num[i.den[i . "+") * tseq num[1. ] for(i in n:2) { bottom <.8) This is relatively easy to translate into S: "fjjexp.integral"<function(z.0:(nterms . As always. "+") exp( . the number of columns is the length of the result.den[n.1.rep(z.0 . bottom } num[1. Although this function could be called directly.outer(tseq . Burns v1.270 CHAPTER 10. den) } The outer function is instrumental in creating the matrices that are required by continue. length = outlen) n <. Press et al (1992) give the continued fraction in which we are interested: En (z ) = e−z n 2(n + 1) i(n + i − 1) 1 ··· ··· z + n− z + n + 2− z + n + 4− z + n + 2i− (10. NUMBERS stop("not enough terms") bottom <. length(n)) z <.fraction(num.7) where n is a non-negative integer. n. it is more likely that another function that returns a mathematical function will call it.1) num <.outer(2 * tseq.. we turn to the tables in Abramowitz and Stegun (1964) to see if we are doing it right: S Poetry c 1998 Patrick J. z + n. ]/bottom ]/ } This function takes two matrices of the same size. It is deﬁned as ∞ En (z ) = 1 e−zt dt tn (10.max(length(z). One function that can be computed well with a continued fraction is the exponential integral.rep(n.1 den <.z) * continue. n. nterms = 9) { outlen <.

0.682575e-05 -9.636779i.enzl } > fjjcheckexpintcomp(100) S Poetry c 1998 Patrick J.073663e-07 2.756406e-07 -2.01.559773595.111485e-08 2.01) n <. 3.0 . 0. 0.527789e-08 > fjjcheckexpint(100) [1] -4. 0.953909e-08 2.01. 0.c(0.219228+0.582066e-08 3. n. 4.01.328954e-08 -2. 0.946083i.031852e-08 [10] -3.990076e-03 -1.397079e-03 -2. 0.05241438.879388e-07 2. 0. 3.188219e-06 [7] -5. We can also check a few points for complex numbers: > fjjcheckexpintcomp function(nterms) { z <.223099826. Burns v1. 0. 0.632293e-08 > fjjcheckexpint(200) [1] -4.627360e-08 [7] -4.1463199.998339e-08 1. 20) enx <.01. 2.5+0.integral(z.337404+0.95. 0.268147e-10 [4] -3.052079. 1.c(-2+0.enx } > fjjcheckexpint() [1] -1.1098682. CONTINUED FRACTIONS > fjjcheckexpint function(nterms = 9) { x <.098227e-10 -2.084973.01. 20.998339e-08 1.961532+0.038522e-05 -1.257321e-06 -1. 0.489412e-04 [4] -7.098228e-10 -2.238392e-10 [4] -4.c(-4.6i) n <. 0.031852e-08 [10] -3. 1.632293e-08 271 The continued fraction is reputed to converge slowest for small numbers. 2.01.0359929. 1i.integral(x. It appears that using 100 terms in the continued fraction is a reasonable compromise between speed and accuracy. 0. 1. 1.1082179.953909e-08 2. n.320013e-10 -2.5. 1.627358e-08 [7] -4.582066e-08 3.5.c(1.0181539) fjjexp.10. length(z)) enzl <. 0.111485e-08 2.320013e-10 -2. nterms = nterms) + log(z ) .c(1. 0.4902766. 1.01. nterms = nterms) .2i.107696e-07 2. 4.01.090723e-06 -1. The algorithm in Press et al (1992) uses a series approximation instead of the continued fraction for numbers less than 1. 0.533094e-05 -9.3283824. -0.99. 10.rep(1. 0.198577e-05 [10] -6.218215i) fjjexp. 10. 1.

709903e-08+7.709903e-08+7. but we do have an algorithm that works for both numeric and complex arguments that was quick to create.outer(tseq . n.712842e-07i > fjjcheckexpintcomp(5000) [1] -3.616037e-07-2. but we are left in the uncomfortable situation of not knowing where in the complex plane the approximation is going to be poor.616037e-07-2.attributes(x) x <.036718e-08i [3] -2.length(x) if(length(n) == 1) n <.036718e-08i [3] -2.364546e-05i [2] 7.1.580492e-01+8.0 ..0:(nterms .268037e-04-5. the function can be made to behave more like other mathematical functions. then we would want to create code (in C) that would ensure that we got good accuracy for every value. x + n. S Poetry c 1998 Patrick J. NUMBERS We now have good evidence that the function is coded properly.272 [1] -3. "exp. ] <.rep(n.710208e-08+7. The restriction put on n may cause some annoyance.exp( .532514e-02i [2] 7.x) * continue.1) num <. nterms = 100) { if(!length(x)) return(x) xatt <.vector(x) outlen <. but more likely it will catch blunders. "+") ans <.as.integral"<function(x.fraction(num.712842e-07i > fjjcheckexpintcomp(1000) [1] 8.712842e-07i CHAPTER 10.xatt ans } If we needed an accurate calculation of the exponential integral function. We don’t have that here.616037e-07-2. Finally. n. den) attributes(ans) <. "+") * tseq num[1. outlen) else if(length(n) != outlen) stop("n must be one long or the length of x") n <.1 den <. Burns v1.round(n) tseq <.outer(2 * tseq.648335e-07i [2] 7.853327e-07-1.036693e-08i [3] -2.

713. missing values and inﬁnity. Try to write a better version of continue. c(353. even without a proper ﬁx? Find and ﬁx any other problems with rational numbers. and ﬁll in any other holes in the implementation.div function to use C code. 213)) > > jjr3[1] * jjr3[2] * jjr3[3] [1] 1/233315703 > jjr3[1] * jjr3[2] * jjr3[3] * jjr3[4] [1] -1/1843362813 How would you ﬁx this problem? Are there changes that can be made to reduce the problem. Is it worth changing to C code? Which algorithm works better? A description of the binary algorithm is given in Knuth (1981). THINGS TO DO 273 10.rationalnum uses. Create a function that does not have such a limitation. Is it robust to diﬀerences in white-space? Make rational numbers work within arrays.10. Create functionality for rational complex numbers. Young and Gregory (1973. Include matrix multiplication. 927. Write functions that allow inﬁnite-precision arithmetic. Make a class of objects of rational numbers in an arbitrary base. volume 1) has a chapter on roots of polynomials. S Poetry c 1998 Patrick J. Revise the great.6 Things to Do Extend the functionality of numberbase to non-integers. Is there a better way of organizing the computations? Teach numberbase to do mathematics. Test them for speed and memory use. There is no error given when there is overﬂow in the denominator or numerator in rationalnum objects: > jjr3 <. then what base should the answer be? How will you test the results? Write a function based on scan that will read in rational numbers printed in the same format that print. The polyroot S-PLUS function ﬁnds the roots of a polynomial of order at most 48.0 .common. If the bases of two numbers in an operation are diﬀerent.fraction. data frames and so on. Write both a Euclidean algorithm version and a binary algorithm version of the C code.6. Burns v1. Try out at least three ways of evaluating polynomials.rationalnum(1.

instead of each quantity being a single number. 10. B. It presents pretty much all you are likely to want to know about rational arithmetic. Approximations. Hamming (1973) Numerical Methods for Scientists and Engineers. 10. Hamming discusses the polygamma functions. H. Auden “In Memory of W.7 Further Reading Knuth (1981) Seminumerical Algorithms contains a great deal of interesting material related to mathematical computing. you have a range of numbers that the quantity is known to lie in. Wall (1948).274 Create functions that do symbolic mathematics. Yeats” S Poetry c 1998 Patrick J. Press et al (1992) Numerical Recipes in C. Other topics include radix conversion and continued fractions. Abramowitz and Stegun (1964) Handbook of Mathematical Functions contains much information on mathematical functions including the digamma and polygamma functions and the exponential integral.0 .8 34 35 Quotations William Butler Yeats “The Cat and the Moon” W. Second Edition provides many numerical algorithms. In particular. Other books that discuss numerical computation include Young and Gregory (1973) A Survey of Numerical Mathematics. Burns v1. tables of values and relationships between functions are just some of what is included. they have one for the exponential integral for real values. Write a suite of functions to do interval arithmetic. gives an in-depth treatment of continued fractions. That is. but uses the Γ(x + 1) form.

trace = F) { ploc <. "}") else perl.attributes(x) ans } 275 .post).".post <.". preface = "". "while(<>) {") if(print) perl. c(perl.Chapter 11 Character Here the focus is character data.c(". No belly and no bowels.pre <. out = F) } ans <. It assumes that the same action will be performed on each element of the input x.unix(paste("perl". Only consonants and vowels.\tprint $_ . x) if(length(ans) == length(x)) attributes(ans) <. 36 11. preface.1 Perl This is a function that provides a simple interface to Perl.unix(paste("cat". "}") cat(file = ploc. cmd.post <.c(".pre. print = T. sep = "\n") if(trace) { foo <.c("#!/usr/bin/perl". ploc).tempfile("sperl") on. cmd.exit(unlink(ploc)) perl. "perl"<function(x. ploc). perl.

"A-Z". > jjsn <. I have presented this in reverse order. delete. new = "". but this is probably the most natural for S. new.paste(c("c". sep. squash)]. squash = F.paste(old. Burns v1. sep = "") if(length(ptext) != 1) stop("old and new must each be single strings") perl(x. " . ptext) } In good textbook fashion. Finally the script is run. complement = F. collapse = "") if(complement) old <. This function starts by creating a ﬁle that contains a Perl script. delete = F. "\\n". and the answer is given the attributes of the input if the lengths match. Switching between upper and lower case is one use for transcribe. "a-z") [1] "massachusetts" "michigan" [4] "mississippi" "missouri" [7] "nebraska" "nevada" [10] "new jersey" "new mexico" > transcribe(jjsn. then the script is printed—this is useful for debugging purposes. flags. sep = "/") { flags <. "A-Z") [1] "MASSACHUSETTS" "MICHIGAN" [4] "MISSISSIPPI" "MISSOURI" [7] "NEBRASKA" "NEVADA" [10] "NEW JERSEY" "NEW MEXICO" Other amusements include: S Poetry c 1998 Patrick J. old.state.name[21:32] > jjsn [1] "Massachusetts" "Michigan" [4] "Mississippi" "Missouri" [7] "Nebraska" "Nevada" [10] "New Jersey" "New Mexico" > transcribe(jjsn. sep. sep. In reality. "s")[c(complement. sep = "") ptext <. a version of transcribe came ﬁrst—that was the task at hand—then the Perl part of the function was abstracted out and transcribe rewritten. "transcribe"<function(x.0 "Minnesota" "Montana" "New Hampshire" "New York" "minnesota" "montana" "new hampshire" "new york" "MINNESOTA" "MONTANA" "NEW HAMPSHIRE" "NEW YORK" .paste("tr". "d". The transcribe function uses perl.276 CHAPTER 11. CHARACTER There are other ways to use Perl of course.". "a-z". old. If trace is TRUE.

"substifile"<function(filenames. sep = "") S Poetry c 1998 Patrick J.paste("perl -p -i". backup. squash=T) [1] "Mschsts" "Mchgn" "Mnst" "Msp" [5] "Msr" "Mntn" "Nbrsk" "Nvd" [9] "Nw Hmpshr" "Nw Jrsy" "Nw Mxc" "Nw Yrk" > transcribe(jjsn. collapse = " "). "aeiou". and vice versa. + "a-z".name". but to change ﬁles. sep.11. old. new. old. " . "_". Here the intent is not to change S objects. Burns v1. PERL > transcribe(jjsn. "aeiou". sep. comp=T) [1] "_a__a__u_e___" "_i__i_a_" "_i__e_o_a" [4] "_i__i__i__i" "_i__ou_i" "_o__a_a" [7] "_e__a__a" "_e_a_a" "_e___a____i_e" [10] "_e___e__e_" "_e___e_i_o" "_e___o__" > transcribe(jjsn. "aeiou".". you can use transcribe to change names valid in C to valid S names. "_") [1] "state_name" This is its use in portoptgen of page 348.name" > transcribe("state. backup = ". sep. " -e \"s".") [1] "state. paste(filenames.1. new)))) stop("sep character in old or new") cmd <. sep = "/". comp=T. "aeiou". The substifile function uses Perl in a diﬀerent fashion. ". ". "a-z".0 . delete=T). delete=T) [1] "Msschstts" "Mchgn" "Mnnst" "Msssspp" [5] "Mssr" "Mntn" "Nbrsk" "Nvd" [9] "Nw Hmpshr" "Nw Jrsy" "Nw Mxc" "Nw Yrk" > transcribe(transcribe(jjsn. > transcribe("state_name". new.jj") { if(length(old) != 1 || length(new) != 1) stop("old and new must be single strings") if(nchar(sep) != 1) stop("sep needs to be a single character") if(any(AsciiToInt(sep) == AsciiToInt(c(old. "_". delete=T) [1] "aaue" "iia" "ieoa" "iiii" "ioui" "oaa" "eaa" [8] "eaa" "eaie" "eee" "eeio" "eo" 277 When you’re done playing. "_".\" ".

half <.nchar(x) blanks <. type = "r") { type <. right = paste(blanks. blank. if followed by a string. left = paste(x. then backup copies of the original ﬁles are created.nchar(blanks) %/% 2 paste(substring(blanks. max(ncx) . quite a lot can be done with its in-built functions.unabbrev. c("right".half).paste(rep(" ". 1.h header ﬁle. The substifile function is used later (in portoptgen. blanks. 1. S Poetry c 1998 Patrick J.as.278 unix(cmd.character(x) ncx <. and ﬁnally the command is executed. An alternative to Perl for manipulating character strings is C. The p ﬂag to Perl invokes a while loop like the one that the perl function creates. but you have no control over how it does it.0 . The i ﬂag means to do the changes in place. x. Burns v1. adding the string as a suﬃx to the ﬁlenames. "center". The format function does this. substring(blanks. page 348) to modify template ﬁles to be speciﬁc to a particular case. CHARACTER A vector of ﬁle names is taken as well as a string to be replaced and a string to substitute in. out = F) } CHAPTER 11. Most often of use are paste. "left")) x <.2 Home Brew Although S is relatively weak on functionality for character data. sep = ""). "justify"<function(x. substring and nchar. 11. then the command is pasted together. which is the exit status of the command.value(type. Some checks are made to ensure that the input is okay. Basic functionality is obtained with the strings.substring(blanks. max(ncx)). collapse = "") blanks <. Here is a function to justify a vector of character strings. The return value of substifile is the return value of the call to unix with output=F. x.ncx) switch(type. sep = ""). center = { blank.

] "income level " [4. "l")) [.name"<function(x) { xnum <. HOME BREW blank. "valid. 65:90.0 .c(48:57.rep(NA.] " market potential " There is a description of “valid” S names on page 4.x)[[2]].s.] "market potential " > as.1] [1.11.] "price index " [3. All the rest of the function is quite simple.table <. AsciiToInt) the. table = the.unlist(lapply(xmatch.] "lag quarterly revenue" [2.] " market potential" > as.matrix(justify(dimnames(freeny. Burns v1.] " income level" [4.x)[[2]]. 46. > as. sep = "") } ) } 279 The blanks variable is ﬁrst created as a single string of a certain number of blank spaces.] "lag quarterly revenue" [2. Below is an illustration of the three types of justiﬁcation. function(z) any(z == 0) || all(z < 11) || !length(z))) ans <. nomatch = 0) good <.table.1] [1.lapply(x.1] [1. 97:122) xmatch <.half + 1). function(z) all(z > 10))) bad <. length(x)) S Poetry c 1998 Patrick J.x)[[2]].matrix(justify(dimnames(freeny.] "lag quarterly revenue" [2.] " price index" [3. "c")) [.2.] " price index " [3. "r")) [.] " income level " [4.matrix(justify(dimnames(freeny. match.lapply(xnum.unlist(lapply(xmatch. then it is replicated to the length of x via substring. here is a function that determines if each element of a character vector is a valid name.

The ﬁrst thing is to use lapply to eﬀectively vectorize AsciiToInt. The find.name(manifest)] These can be checked with diffsccs by knowing the naming convention. Although this does get the answer. Suppose that you have an object manifest that is a vector of the names of all the functions you want to have controlled.names(x) ans } . Notice that special attention needs to be given to the empty string—this falls under the heading of “test along seams” discussed on page 53. i.T ans[ugly][!xug] <. CHARACTER ans[good] <.F ugly <.unlist(lapply(xmatch[ugly].name(manifest)]) { cat("\n".ok = F) { string.I.code <. Page 301 tells why we would want to do this.AsciiToInt(string) S Poetry c 1998 Patrick J. Some of these may be operators or assignment functions. You can issue a command like: for(i in manifest[valid. > find. All of the rest is dealing with subscripting with logicals.I.of function does a very speciﬁc thing—take a string and ﬁnd the location of calls to the I function in it.T ans[bad] <.s. and have a diﬀerent name on the dump ﬁle.of function(string. function(z) z[z != 11][1] > 11)) ans[ugly][xug] <.F } names(ans) <. "\n") print(diffsccs(i)) } Then view the names of the strange ones with: manifest[!valid. then ﬁnd where the calls end. we are straining ourselves for such a simple request.s.!good & !bad if(any(ugly)) { xug <. Burns v1.280 CHAPTER 11.0 . nesting. So here’s the game: Find where the “I”s are that correspond to calls to the function. A use of this function is to make sure that a group of functions are under proper source control.

2.string. F) if(!any(Istart)) return(NULL) closes <.0 .of(jjs3) S Poetry c 1998 Patrick J.2] [1.par <.Istart. .string.AsciiToInt(")") opens <. diff(ans[.closes) Istart.num).par Istart <.lev)[this. HOME BREW if(!any(eyes <.1] [.of(jjs1) NULL > jjs2 [1] "~ jj1 * I((2*jjI)+1)" > find.num[i] this.I.lev])) + loci } if(!nesting. min(seq(along = this. drop = F] } ans } 281 The hard part is ﬁnding the end of the call.num <.seq(along = Istart)[Istart] ans <.array(0.I.string.] 9 20 > jjs3 [1] "~ jj1 * 2:4" > find.AsciiToInt("(") close.I. The for loop ﬁnds all of the calls.lev)) stop("no closing parenthesis for I") ans[i.par <. 2)) for(i in 1:length(Istart.par counts <. Here are some examples: > jjs1 [1] "~ jj1 + 2:4 * jjI" > find.eyes & c(opens[-1].of(jjs2) [.cumsum(opens .code == open.ok && nrow(ans) > 1) { ans <. Notice the drop = F to ensure that we always end up with a matrix.11.loci] == counts[loci] if(!any(this.num)) { loci <.lev <. 2]) > 0). Burns v1.code == close.code == AsciiToInt("I"))) return(NULL) open.c(0. The counts variable tells how many opening parentheses haven’t yet had a closing parenthesis to match it. ] <.counts[-1: .ans[c(T. then later the nested calls are removed if required. c(length(Istart.

11. Christiansen and Schwartz. so that it can be a simple call to your function. Burns v1.name using the perl function.of. Write a function that takes a vector and returns a list where each component of the result contains the “words” in the corresponding element of the input. S Poetry c 1998 Patrick J.] 3 18 [2.] 24 29 Notice that the call in jjs4 that has a space between the “I” and the “(” is not found by the function.0 .I.] 10 17 [3. How best can it be achieved? The comparison operators like < work with character data where the ordering comes from ASCII.I.282 CHAPTER 11. Try improving my version of the function using only “native” functionality.] 3 18 [2. How should the reserved words of S be handled? Write a function that generalizes find.] 24 29 > find.2] [1. Create a function for playing “hangman”.s.I. CHARACTER NULL > jjs4 [1] "~ I((2)* I(jee+3)) + 2+I(3^x) + I (y^2)" > find. T) [. Create a function that will compare character vectors using an arbitrary ordering of characters.2] [1. But this isn’t really much of a problem because the space won’t be there if the string is parsed and then deparsed.of(jjs4.3 Things to Do Make a list of desirable functionality for character data.1] [. How general can you make it? Rewrite valid. plus Learning Perl by Schwartz provide a good explanation of the Perl language.of(jjs4) [. Use this to create a function that will sort according to the arbitrary ordering.4 Further Reading The two books Programming Perl by Wall. 11.1] [.

Burns v1.5.5 36 Quotations John Crowe Ransom “Survey of Literature” S Poetry c 1998 Patrick J. QUOTATIONS 283 11.11.0 .

284 S Poetry c 1998 Patrick J. Burns v1.0 .

or its inverse. > jjfv <. 12. tol = 1e-10) { dx <. The Choleski decomposition provides one square root. "symsqrt"<function(x.chol(jjfv) > > max(abs(t(jjfvc) %*% jjfvc .1) where the prime denotes the transpose.t(x)) > tol * max(abs(x)))) stop("need symmetric matrix") xeig <. See also the section in the chapter on C about dealing with arrays in C code—it starts on page 173.dim(x) if(dx[1] != dx[2]) stop("need square matrix") if(any(abs(x .eigen(x. This is the particular square root that is upper triangular.x) > jjfvc <. inverse = F.var(freeny.387779e-17 Sometimes it is desirable to have the symmetric square root. Here is a function that returns it.jjfv)) [1] 1.Chapter 12 Arrays The focus is on arrays that have more than two dimensions. sym = T) 285 .1 Matrices The square root of a symmetric matrix A is deﬁned as any matrix R such that A=RR (12. but we start with matrices.

> jjsv1 <. Now test it to see that it does return a square root.] -0.x)) > jjsv1 lag quarterly revenue price index income level [1.jjfv)) [1] 1. in fact.06958443 -0. a whole group of square roots. ARRAYS if(inverse) xeig$vectors %*% (xeig$values^-0.] -0. and then checked to see that they really are square roots of the matrix in question. > jjfvs <.rmatsqrt(var(freeny.0 .19116893 0. Burns v1.286 CHAPTER 12.526557e-16 > max(abs(jjfvs .03956840 -0. shown to be diﬀerent. and then puts the matrix back together again. "rmatsqrt"<function(x) { xc <. These two are then multiplied to produce the random square root. then creates an orthogonal matrix out of a matrix of random numbers.] -0.02901060 [3.5 * t( xeig$vectors)) else xeig$vectors %*% (xeig$values^0.09945160 -0.10834290 0.t(jjfvs))) [1] 0 There are.symsqrt(jjfv) > max(abs(jjfvs %*% jjfvs .08713387 S Poetry c 1998 Patrick J.chol(x) x[] <. and that the result is symmetric. The following function returns a random square root.rmatsqrt(var(freeny.5 * t( xeig$vectors)) } This performs an eigen decomposition of the matrix.07706364 [2. Below two diﬀerent square roots are produced.runif(dim(x)[1]^2) orthmat <.x)) > jjsv2 <.svd(x)$v orthmat %*% xc } This ﬁnds one square root with chol. In eﬀect it uses a sparse matrix technique for the diagonal matrix of eigenvalues.21783334 0. takes the square root (or inverse square root) of each eigenvalue.

equal(t(jjsv2) %*% jjsv2.] -0.1119885 0.3] [1.equal(t(jjsv1) %*% jjsv1.] 2 9 6 [3.047556536 [2.01156840 market potential [1.2] [.] -0.009923954 > jjsv2 lag quarterly revenue price index income level [1. sorting the rows of a matrix will give you the transpose of what one would naively expect.] 12 13 9 4 15 [3.x)) [1] T 287 12.3] [.08579325 -0.] 1 4 5 [2.x)) [1] T > all.] -0.05075554 -0. ARRAY FUNCTIONS [4.03863553 > all.4] [.] 7 11 8 5 6 > apply(jjmr. > jjmr <.06152129 0.03474957 [2.1831166 0.06170711 [4.02301239 [3.03854587 -0.] -0.1] [.] 14 3 1 2 10 [2.] -0.2. var(freeny.021705228 [3. var(freeny.] -0. Stable Apply When the function used in apply returns a vector.] 3 12 7 S Poetry c 1998 Patrick J. then those vectors ﬁll along the ﬁrst dimension of the result.12.] -0.036462742 [4.1560023 0.] -0.07836504 -0. sort) [.] -0. 3) > jjmr [.05489805 market potential [1.5] [1.2] [.03051149 [4.1706170 0.] -0. For example. 1.0 .06217413 [2.] -0.1] [. Burns v1.matrix(sample(15).] -0.06180107 [3.] -0.2 Array Functions This section contains functions that make it easier to use higher-dimensional arrays.04130825 -0.

FUN. 1.array(1:6. Binding Arrays An operation that is fairly common with higher-dimensional arrays is to bind them together—similar to rbind and cbind with matrices.3] [.4] [.5] [1.] 5 6 7 8 11 Here is the deﬁnition of stable.. 3) . > stable. jjm4. MARGIN.] 1 2 3 10 14 [2..apply function keeps the dimensions of the output in the same order as the input.jjm3 + 7 > bind.length(dim(X)) if(length(MARGIN) != ldx .apply(X.1] [.) { ldx <. 1 [1..apply"<function(X.3)) > jjm4 <.1) { warning("stability not performed") return(apply(X.] [5.) if(length(dim(ans)) != ldx) ans else aperm(ans.0 . .2] [.apply(jjmr.2] [.3] 1 3 5 S Poetry c 1998 Patrick J.MARGIN]. The only real operation except feeding the problem to apply is to permute the dimensions with aperm (which you can read about on page 87).array(jjm3. c(2.)) } ans <..] 10 14 13 15 8 11 CHAPTER 12. MARGIN. MARGIN.] 4 9 12 13 15 [3. Burns v1. . sort) [.apply. "stable. . MARGIN))) } You can see from this that I’ve overstated the case somewhat—the dimensions are left intact only when a single dimension is being “collapsed”.288 [4. . FUN..] [. order(c((1:ldx)[ . FUN. ARRAYS The stable. Here are a couple of examples..1] [. > jjm3 <.

"bind.ldx == 1) { dx <.2.c(dimnames(x)[[margin ]]. { ldx <ldy <if(ldx 2 4 6 289 margin) length(dx <. dimnames(y)[[margin]]) S Poetry c 1998 Patrick J.dx[margin] + dy[margin] newdimnames <.dx newdim[margin] <.c(dy.6] [1. list( NULL))) } else stop("bad value for margin") } if(any(dx[ . 1) ldx <.array(jjm3. if(length( dimnames(x))) c( dimnames(x). dx.] .3] [1.array"<function(x. 1) x <.dim(y)) != ldy) stop("length of dimensions not equal") margin <.] 9 11 13 > bind. ARRAY FUNCTIONS [2. Here’s the deﬁnition.] 1 3 5 8 10 12 [2.margin])) stop("arrays not conformable") newdim <.round(margin) if(margin < 1) stop("bad value for margin") if(margin > ldx) { if(margin .12.margin] != dy[ .5] [.4] [. list( NULL))) y <. Burns v1. dy.1] [. 2) [.0 .ldx + 1 dy <. if(length( dimnames(x))) c( dimnames(y).2] [. y.1] [.] 2 4 6 9 11 13 This last command is the same as a cbind call. jjm4. 2 [.array(y.dimnames(x) newdimnames[[margin]] <.array(x.] 8 10 12 [2.c(dx.3] [.2] [.dim(x)) length(dy <. .

paste(rep( ".". "] <. margin . paste(rep(". ARRAYS ans <. Burns v1. 12.4 Further Reading A discussion of computations with matrices is Golub and Van Loan (1983). Create a function that generalizes a matrix computation to three-dimensional arrays. collapse = " "). "dx[margin] + 1:dy[margin]". "1:dx[margin]". paste(rep(".0 . then use the eval-parse-text idiom twice to put the inputs into the proper locations of the answer. 12.". collapse = " ").array(x[1].paste("ans[".margin). newdim.". collapse = " "). ldx . Check how you are doing by looking at the abind function from Statlib submitted by Tony Plate and Rich Heiberger.margin). margin .".x") eval(parse(text = cmd)) cmd <. "] <.paste("ans[".array.1 ). collapse = " "). A particular one is that it only accepts two arrays. S Poetry c 1998 Patrick J. ldx . but essentially all that is done is to create the answer to be the right size and shape.290 CHAPTER 12.y") eval(parse(text = cmd)) ans } There is a lot of messing around with details.3 Things to Do Find and ﬁx the bugs in bind. paste(rep(".1 ). newdimnames) cmd <.

We create a formula containing x and y in our function where they are perfectly behaved objects. Here is a simpliﬁed example of how the problem might arise. "fjjlz1"<function(x. Although terms is useful in many situations. and this can happen an arbitrary number of frames away from where the formula is given. formulas can oﬀer a clean interface to the user. Formulas have their own form of lazy evaluation that is distinct from but similar to the lazy evaluation of function arguments.1 Lazy Evaluation Formulas are the main trouble spot with lazy evaluation. A formula is a call to the tilde operator. for instance. The contents of a formula are not evaluated when the formula is evaluated. It is only when the data are really needed that evaluation will happen. A brief explanation of this is on page 203. This operator does essentially nothing—the result is a call to the operator that has class "formula". However. it is not necessary to use it with formulas—the mathematical graph example later in this chapter. It is never necessary to have a function take a formula—the same functionality can be achieved through other means—however. 13. Many of the modeling functions use the terms function which digests a formula. the formula is passed to another function where it is actually used. y) { 291 .Chapter 13 Formulas Formulas provide a mechanism for giving a function an arbitrarily complex input. does not use terms.

x.4326354 Perhaps the preferred solution is to keep the data and the formula together. FORMULAS The result is none too good. 7 residual Residual standard error: 0. > fjjlz2(jjx. One is to make global variables of those that appear in the formula.476645 0.function(form) { lm(form) } form <. jjy) Call: lm(formula = form) Coefficients: (Intercept) x1 x2 0.9723329 -0.0 .y ~ x subfun(form) } CHAPTER 13. By the time the variables within the formula are needed. "fjjlz2"<function(x. > fjjlz1(jjx. S Poetry c 1998 Patrick J. y. jjy) Error in subfun(form): Object "y" not found Dumped There are a couple of remedies.248944 Degrees of freedom: 10 total.y ~ x assign("x". frame = 1) assign("y".function(form) { lm(form) } form <. Burns v1. S doesn’t know the proper place to look for them.292 subfun <. frame = 1) subfun(form) } We get an answer instead of an error. y) { subfun <. The statistical modeling functions that take formulas expect a data frame of data.

983659 3 3 -0. data = df) } form <.2 y 1 1 0. y = y) subfun(form.0 . Let’s use debugger to see where the trouble lies. jjy) 3: subfun(form. > fjjlz3(jjx. data = df) 5: eval(m.y ~ x df <.07007709 1. df) } Now we try it. jjy) Error in subfun(form. LAZY EVALUATION "fjjlz3"<function(x.default(formula = form.53393595 2. df) { lm(form. Burns v1. > debugger() Message: Object "x" not found 1: 2: fjjlz3(jjx. jjy) d(2)> ? 1: df 2: y 3: x 4: form 5: subfun d(2)> df x.768500 5 5 0.frame(x = x.23720871 5. df): Object "x" not found Dumped 293 This would be the preferred technique if it worked.49557847 3.1 x. df) Selection: 2 Frame of fjjlz3(jjx.frame. y) { subfun <.634129 S Poetry c 1998 Patrick J.function(form.03948787 3. sys.569356 2 2 -0.366048 4 4 0.data.parent()) 6: model.13. df) 4: lm(form. data = df) 7: subfun(form.1.

9723329 -0. (Usually this is the desired behavior. df) } This new version presumes that y will be a vector.00915360 9.function(form. > fjjlz4(jjx. jjy) Call: lm(formula = form. (Thus the dual use of the word “frame”.y ~ x df <. data = df) Coefficients: (Intercept) x1 x2 0.) "fjjlz4"<function(x. df) { lm(form. once again we get an answer. df) { S Poetry c 1998 Patrick J.data. Burns v1. FORMULAS The problem is that data. Now.16859131 7.248944 Degrees of freedom: 10 total.4326354 There is another mechanism for passing the data along with the formula since modeling functions typically take an integer indicating the frame in which the objects are to be found as well as a data frame of the objects. "fjjlz5"<function(x.722343 7 -1.25087103 5. and then puts x into the data frame “by hand” in case it is a matrix.476645 0.0 . data = df) } form <.92886521 7. y) { subfun <.767544 CHAPTER 13.657451 9 1. but makes a column for each column of the matrix.frame(y = y) df$x <.x subfun(form.374802 10 -1.294 6 7 8 9 10 6 -0.) Here’s our function to do that.frame doesn’t put a matrix in as a single item.function(form. 7 residual Residual standard error: 0.272687 8 1.23286270 10. but obviously not always. y) { subfun <.

An alternative way of doing it would have been to have the command inside subfun be: lm(form.\ class(x) ("argument") Dumped This is a problem with the lazy evaluation of arguments. > fjjlz5(jjx. . data = df) Coefficients: S Poetry c 1998 Patrick J.. df) { lm(form. sys.parent()) Now for the test of the function. LAZY EVALUATION lm(form.nframe() is inside the call to subfun. A call to sys. sys.nframe()) } 295 We want to point to the frame of fjjlz5.: Invalid data.function(form. it is evaluated in the frame of fjjlz5 so it is correct. Even though the sys. Burns v1.frame()) } Both x and y need to be evaluated.default(Terms.13. > fjjlz6(jjx. y) { subfun <. data = sys.x eval(y) form <. So we ﬁx things up.0 . data = df) } x <. And assigning a variable to itself does indeed make a diﬀerence.frame (giving the frame rather than just the frame number) would work also. jjy) Call: lm(formula = form. This uses one method for each. data = df) } form <.matrix.y ~ x subfun(form.1.y ~ x subfun(form. The mode of something from the formula is still argument—meaning it hasn’t been evaluated yet— rather than something that a numerical routine might be happy with. "fjjlz6"<function(x. jjy) Error in model.

296 (Intercept) x1 x2 0.form[[2]] form[[2]] <. A formula is really an object of mode call (the call is to the tilde operator). The operators in formulas often have a special meaning. when both sides of the formula are given.as.9723329 -0. The ﬁrst component of a call is the function. 7 residual Residual standard error: 0. and subsequent components are the arguments. FORMULAS 13. For example in y ~ x1 + x2 x1 is not being added to x2—it is that both are to be included.0 . the tilde can be either unary or binary. then changes the second component. the formula has length 2. and the second component contains the right-hand side. A common operation is to add the left-hand side to a formula if it doesn’t exist—the following code does this. If that is what you want. The second and third components of a formula should be either of mode name or of mode call.476645 0. The formula y ~ x^2 will not be interpreted as the square of x. Burns v1. where form is a formula: if(length(form) == 2) { form[[3]] <. S Poetry c 1998 Patrick J. Thus. then the formula has length 3—the second component is the left-hand side and the third component is the right-hand side.4326354 CHAPTER 13. In operator terms. When only the right-hand side is given. then write: y ~ I(x^2) The I construct means to interpret functions in the usual sense of S rather than the “formula” sense.name("z") } This code puts the component originally in the second position into a new third component.248944 Degrees of freedom: 10 total. then the “hat” operator as well as addition is special. If you use a formula that is given to the terms function.2 Manipulating Formulas A formula need not have a left-hand (response) side.

0 .4). Edges can be either directed or undirected. There is also the plus operator that concatenates the terms together. MATHEMATICAL GRAPHS 297 13. The star operator makes a star by connecting each element of the ﬁrst vector with all of the elements in the second. and a collection of edges that each connect two of the nodes. the usual replication rules hold. > mathgraph(~ 1:3 / 2:4 + mathgraph(~ c(3. The form of the graph is given to mathgraph via a formula. Burns v1. > mathgraph(~ 1:3 / 2:4) [1] node 1 <-> node 2 [2] node 2 <-> node 3 [3] node 3 <-> node 4 class: mathgraph > mathgraph(~ 1:3 * 2:4) [1] node 1 <-> node 2 [2] node 2 <-> node 2 [3] node 3 <-> node 2 [4] node 1 <-> node 3 [5] node 2 <-> node 3 [6] node 3 <-> node 3 [7] node 1 <-> node 4 [8] node 2 <-> node 4 [9] node 3 <-> node 4 class: mathgraph > mathgraph(~ 1:3 / 4) [1] node 1 <-> node 4 [2] node 2 <-> node 4 [3] node 3 <-> node 4 class: mathgraph This displays the two main operators in mathgraph formulas. The mathgraph function creates an object representing a mathematical graph.13. dir=T)) S Poetry c 1998 Patrick J. A graph that contains only directed edges is called a directed graph. The slash operator connects each element in the ﬁrst vector with the corresponding element in the second—as the last example shows.3. Let’s start with some examples.3 Mathematical Graphs A mathematical graph is a collection of nodes (also called vertices).1) / c(2. Obviously a graph can be arbitrarily complex. so it makes some sense to use a formula to describe it.

Burns v1. then a few additions are put on. The directed argument allows the edges that are created to be either directed or undirected. data = data) adir <.directed attr(ans.parent()) { if(missing(formula)) { ans <. they can be character strings also.NULL } else { ans <. directed = F.na(adir)] <. "directed") <. S Poetry c 1998 Patrick J. "directed") adir[is.0 . The data argument provides a way around problems with lazy evaluation as discussed on page 291.name[2:4] * state. The nodes need not be numbers. the second term is of a directed graph.adir } class(ans) <. The real work is done by build.298 [1] [2] [3] [4] [5] node node node node node 1 <-> node 2 2 <-> node 3 3 <-> node 4 3 -> node 2 1 -> node 4 CHAPTER 13. In this case. we allow the possibility of an empty graph. but also it can be another mathgraph object.name[12:13]) [1] Alaska <-> Idaho [2] Arizona <-> Idaho [3] Arkansas <-> Idaho [4] Alaska <-> Illinois [5] Arizona <-> Illinois [6] Arkansas <-> Illinois class: mathgraph Here is the deﬁnition of the mathgraph function.attr(ans."mathgraph" ans } First.mathgraph. data = sys. FORMULAS class: mathgraph This example shows that a term can not only be a call to * or /. "mathgraph"<function(formula.build.mathgraph(formula. > mathgraph(~ state.

dim(this.eval(this." " } else { this.ev) adir <.ev <.eval(this. rep(le1. "*" = { e1 <. e2) ans <.c(adir.op <.mathgraph"<function(formula.parse[[1]]) == 1) { this.parse[[1]][[2]].c(adir.c(adir.parse[[1]][[1]]) } switch(this.e1[rep(1:le1.character(this.unique(e2) le1 <. MATHEMATICAL GRAPHS The build. "/" = { e1 <.rbind(ans.as.ev.length(e2) e1 <. data) e1 <. "directed")) } else stop(paste( "do not know how to handle term:".parse(text = raw) if(length(this. Burns v1.cbind(e1.ev) adir <.ev)[1])) } . data) e2 <. raw)) } S Poetry c 1998 Patrick J.3. "mathgraph")) { ans <.parse[[1]][[3]]. data) { cterm <.rbind(ans. e2) ans <.parse <.deparse(formula[[2]]) allterms <.eval(this. data) this. rep(NA.length(e1) le2 <.parse. rep(NA.ev)[1])) } . sep = "+") ans <.unique(e1) e2 <.ev.mathgraph function looks like: 299 "build. dim(this.allterms[[i]] this.ev) adir <.eval(this.unpaste(cterm.13. data) if(inherits(this. attr(this.op <.ev <.e2[rep(1:le2. { this.ev <.rbind(ans. data) e2 <. this.parse[[1]][[3]].parse[[1]][[2]].logical(0) for(i in seq(along = allterms)) { raw <. le2)] e2 <. this.op.eval(this.0 .NULL adir <.cbind(e1. this. le2))] this.

nams <. "print. It represents the graph with a two-column matrix where each row stands for an edge and the entries are the two nodes connected by the edge. "\n") } else { cat("mathgraph()\n") } invisible(x) } As always. class(x). xu[. prefix. Burns v1.. format( 1:length(x)). FORMULAS This breaks a character representation of the formula into the individual terms.mathgraph"<function(x. sep = "") } else { the.nams <. if the last line of a print method is not invisible(x).names(x))) { the. Some of the code (including the call to justify which is given on page 278) refers to the names of the graph— S Poetry c 1998 Patrick J. then creates the representation for each term. "<->").300 ) } attr(ans. sep = "") } out <. "] ".. 1].unclass(x) if(length(the. 2]) cat(out.nams <.node.nams.nams. prefix.paste("[". xu[ . "directed"). prefix. ifelse(attr(x. "directed") <. sep = "\n") cat("\nclass:".) { if(length(unclass(x))) { xu <.paste(the.node = if(is. .paste(justify( the. "r").adir ans } CHAPTER 13. This essentially just pastes the nodes for each edge together with a symbol for the directions of the edge between. then it is wrong. " ". The print method for mathgraph produces a fairly clear representation of the graph that also matches the structure of the object quite closely.character(xu)) "" else "node".0 . " ->".node.

so we end up with something that doesn’t even parse.op.1:3 > mathgraph(~ jj1 * jj1+1) Error in switch(this. We need to revise build.mathgraphI". Another part of the code indicates numerical subscripting which we will also get to later. 1:nI. sep = ".of(cterm) if(length(eyes)) { # calls to I() need to be evaluated nI <. and it shouldn’t. The way around this is to surround operations that are to be interpreted in the usual sense instead of the formula sense with a call to I.mathgraph.substring(cterm. The intention there was to add one to each of the numbers in jj1. but the plus sign is interpreted as introducing a second term in the formula rather than as performing addition. sys. data) assign(Inames[i].0 .I. MATHEMATICAL GRAPHS 301 we will get to names shortly. this doesn’t give us much satisfaction either.paste("Build. eyes[. deparse(formula[[2]])) eyes <. 2] . this doesn’t even come close—the formula is broken into pieces between plus signs. this.ext <. > jj1 <.eval(parse(text = Iexpr[i]).nrow(eyes) Inames <.of. nchar( S Poetry c 1998 Patrick J.paste(collapse = "".matrix(c(0.mathgraph to handle I.1) for(i in 1:nI) { this.val. > mathgraph(~ jj1 * I(jj1+1)) Syntax error: end. Burns v1.nframe().find. Here is the ﬁrst part of the revised version of build. function(formula.val <. eyes[. data) { cterm <.3. frame = 1) } eye. Let’s be a little more adventurous. t(eyes).file ("\n") used illegally at this point: jj1 * I(jj1 Dumped However.13.") Iexpr <. Actually. 1] + 2.: do not know how to handle term: 1 Dumped The statement above doesn’t work.

paste(allI. and put into variables in frame 1. by = 2 )] <.of of page 280. This has nothing to do with I. then each of them is evaluated in the appropriate frame using the eval-parse-text idiom. deparse returns a string for each line that it thinks there should be. 2 * nI + 1.strings. When there are instances of I to handle. but I’m not sure that it would be especially useful. collapse = "") } allterms <. nrow = 2) allI. sep = "+") You may notice that the ﬁrst line (that uses deparse) has been changed.Inames allI.character(2 * nI + 1) allI.0 . eye.strings <. ] + 1. Burns v1. eye.nframe makes sure that there will not be collisions in the names if there are nested occurrences of this code.strings[seq(1.strings[2 * (1:nI)] <. Now we can try our example again. ] .1) cterm <.ext[ 1. but rather ﬁxes a bug when the formula is long. This last operation could be abstracted into a separate function.substring(cterm.ext[2. > mathgraph(~ jj1 * I(jj1+1)) [1] node 1 <-> node 2 [2] node 2 <-> node 2 [3] node 3 <-> node 2 [4] node 1 <-> node 3 [5] node 2 <-> node 3 [6] node 3 <-> node 3 [7] node 1 <-> node 4 [8] node 2 <-> node 4 [9] node 3 <-> node 4 class: mathgraph > mathgraph(~ I(jj1*2) / I (jj1+3) + jj1 / I(jj1+1)) [1] node 2 <-> node 4 [2] node 4 <-> node 5 [3] node 6 <-> node 6 [4] node 1 <-> node 2 [5] node 2 <-> node 3 [6] node 3 <-> node 4 S Poetry c 1998 Patrick J.302 CHAPTER 13. which returns a matrix of the start and end of each call to I (if any). FORMULAS cterm) + 2). and the names for the evaluated I expressions are substituted in.unpaste(cterm.I. The cterm string is broken into pieces. The call to sys. this starts with a call to find. After that. So the paste command ensures that we have just a single string.

given the structure.mathgraph"<function(x) { dimnames(unclass(x))[[1]] } "names<-.mathgraph"<function(x. The names for the class can just be the row names of the matrix underlying the object. Burns v1. the unclassing is not necessary since there are no mathgraph methods for dimnames. For graphs that means subscripting on the edges.cl x } This makes the assignment form very easy as well.unclass(x) if(length(x)) dim(x)[1] else 0 } A simpler way to write this function would be to replace the if statement by the line: length(x)/2 However. we don’t want our functions to break if code is added later. "names. However.class(x) x <. At this point. Subscripting is a very powerful feature of S. MATHEMATICAL GRAPHS class: mathgraph 303 It is sensible to name edges as well as nodes. value) { cl <.value class(x) <. The names of the mathgraph object is natural for this. "length.unclass(x) dimnames(x)[[1]] <.0 . it might be slightly safer and not much less eﬃcient to leave it as is.mathgraph"<function(x) { x <. and we want subscripting to work for any class that we create if at all sensible.3. We want the length of the graph to be the number of edges contained in the graph.13. S Poetry c 1998 Patrick J.

value) { if(!inherits(value.0 .mathgraph"<function(x.sum(rep(i. Burns v1. length = nrow(x))) if(nrow(value) != ilen) stop("replacement value not correct length") xdir <.class(x) x <. "mathgraph")) stop("need mathgraph on right-hand side") cl <.mathgraph"<function(x. "directed") attr(x. drop = F] attr(x. .unclass(x) xdir <.dimnames(x)[[1]] x[i. The assignment form of subscripting follows the same pattern. FORMULAS "[. "directed") if(is.x[i. S Poetry c 1998 Patrick J.attr(x.cl x } The lines involving ilen are to avoid the problem described on page 122.character(i)) names(xdir) <.unclass(value) ilen <. i. "directed") <.attr(value. ] <.attr(x.xdir class(x) <.304 CHAPTER 13.logical(i)) ilen <. "directed") if(is. "directed") <. we don’t want to lose the matrixness in case the result has only one edge.character(i)) names(xdir) <. Note the drop = F.dimnames(x)[[1]] x <.xdir[i] class(x) <. Let’s see how they work.value xdir[i] <. but it is inheritance nonetheless.unclass(x) value <.length(i) if(is.cl x } This is easy to write since we inherit the subscripting from matrices. Not in the technical sense since matrices are not object-oriented in version 3. "[<-. i) { cl <.class(x) x <.

We can create a generic function to turn non-directed graphs into directed ones."f")] f node 6 <-> node 1 j node 3 <-> node 6 f node 6 <-> node 1 class: mathgraph > jjmg2[c(F.13."j".jjmg1 > jjmg2 d node 4 <-> node 1 e node 1 <-> node 2 f node 2 <-> node 3 g node 3 <-> node 4 h node 3 <-> node 4 i node 2 <-> node 6 j node 3 <-> node 6 class: mathgraph Sometimes we want to deal only with directed graphs.T)] e node 5 <-> node 1 g node 2 <-> node 4 i node 2 <-> node 6 class: mathgraph So subscripting is very similar to subscripting a typical vector.0 . Replacement is also similar to the case with vectors. MATHEMATICAL GRAPHS > jjmg2 d node e node f node g node h node i node j node 305 4 5 6 2 3 2 3 <-> <-> <-> <-> <-> <-> <-> node node node node node node node 1 1 1 4 4 6 6 class: mathgraph > jjmg2[2:3] e node 5 <-> node 1 f node 6 <-> node 1 class: mathgraph > jjmg2[c("f". Burns v1.3. So the result looks like: S Poetry c 1998 Patrick J. > jjmg2[2:4] <.

mathgraph"<function(x) { dir <. This function could be used by a special type of mathematical graph S Poetry c 1998 Patrick J.rep(T. Burns v1.attr(x..0 . 2:1]) attr(ans.default"<function(x. CHAPTER 13. .. "directed") <."mathgraph" ans } This takes all of the input edges and adds another edge in reverse order for each undirected edge. but doesn’t exist. x[!dir. Here is the method for mathgraph. However. Note that the class of the output is not guaranteed to be the same as that of the input.306 > jjmg1 a node 1 <-> node 2 b node 2 <-> node 3 c node 3 <-> node 4 class: mathgraph > alldirected(jjmg1) a node 1 -> node 2 b node 2 -> node 3 c node 3 -> node 4 a node 2 -> node 1 b node 3 -> node 2 c node 4 -> node 3 class: mathgraph The default method merely creates an error. nrow(ans)) class(ans) <. "alldirected. FORMULAS "alldirected.unclass(x) ans <. "directed") if(all(dir)) return(x) x <. when it is written.) { stop("do not know how to handle") } In a sense this is redundant since an error will occur if the default method is needed. it means that the programmer has thought over the situation.rbind(x.

> > > > > a b c d e f g h i j jjmg3 <. MATHEMATICAL GRAPHS 307 that inherits from mathgraph.0 . It is subtle distinctions like this that are the hardest part of object-oriented programming. Burns v1. By not passing the class of the input to the output. jjmg3) a node 1 <-> node 2 b node 2 <-> node 3 S Poetry c 1998 Patrick J. > c(jjmg4. This also allows us to retain subclasses of mathgraph if appropriate.c("subA". "mathgraph") class(jjmg4) <. jjmg3) d node 4 <-> node 1 e node 1 <-> node 2 f node 2 <-> node 3 g node 3 <-> node 4 h node 3 <-> node 4 i node 2 <-> node 6 j node 3 <-> node 6 a node 1 <-> node 2 b node 2 <-> node 3 c node 3 <-> node 4 class: subA mathgraph > c(jjmg1. We can combine mathgraph objects by calling mathgraph again.jjmg1 jjmg4 <. Whether you have a monument or a pile of rubble can depend on getting these decisions right.jjmg2 class(jjmg3) <.3.13. we are making sure that we don’t end up with an object that doesn’t make sense—it could be that that special class of graph doesn’t make sense for directed graphs. "mathgraph") mathgraph(~ jjmg3 + jjmg4) node 1 <-> node 2 node 2 <-> node 3 node 3 <-> node 4 node 4 <-> node 1 node 1 <-> node 2 node 2 <-> node 3 node 3 <-> node 4 node 3 <-> node 4 node 2 <-> node 6 node 3 <-> node 6 class: mathgraph But it is more convenient to have a method for the c function.c("subA".

commontail(lapply(dots.c("subB". But it isn’t necessarily in this case—if the subclass of mathgraph represents trees. dots) adir <.unlist(lapply(dots. FORMULAS class: mathgraph > jjmg5 <. there is the possibility of trouble that will be hard to track down..0 . For many types of object this is the proper behavior. nomatch = 0) ) stop("not all mathgraph objects") amat <. class)) if(!match("mathgraph".class amat } There is a check at the end to make sure that the directed attribute matches the rest of the object. "mathgraph") > c(jjmg5. The deﬁnition of the c method is: "c.class <. Burns v1.call("rbind".. "directed") <.list(.jjmg1 > class(jjmg5) <. S Poetry c 1998 Patrick J.mathgraph will be the most speciﬁc set of classes that all of the inputs have in common.308 c a b c node node node node 3 1 2 3 <-> <-> <-> <-> node node node node 4 2 3 4 CHAPTER 13. function(x) attr(x.adir class(amat) <.class. "directed"))) if(dim(amat)[1] != length(adir)) stop("garbled object") attr(amat.do.) { dots <.the. the combination of two or more trees doesn’t guarantee that the result is a tree.) the. jjmg3) a node 1 <-> node 2 b node 2 <-> node 3 c node 3 <-> node 4 a node 1 <-> node 2 b node 2 <-> node 3 c node 3 <-> node 4 class: mathgraph So the class of the result of a call to c..mathgraph"<function(. the.. Without the check.

the order is reversed in each component to make it easier. i) y[i].ans.ans) == 1) { ans <. Inside the for loop there is a decision based on there being a unique element in all of the components.unclass(x) xdup <.lens) if(min.class(x) x <. "directed") cl <.c(this.lens <. dir)) x <.lapply(x.0 .unique(unlist(lapply(x. drop = F] dir <.len) { this.NULL x <. function(y. Once we know that there is really some work to do. MATHEMATICAL GRAPHS 309 We wouldn’t mind having an optional argument to determine how to treat the class. ans) } else { break } } ans } You’ll notice that lapply plays prominently in this function. 2].mathgraph should be is the commontail function.mathgraph"<function(x) { dir <.duplicated(paste(x[.mathgraph.ans <. rev) for(i in 1:min. i = i))) if(length(this.attr(x.min(the.dir[!xdup] S Poetry c 1998 Patrick J.x[!xdup. then we can use unique.len <. Burns v1.len == 0) return(NULL) ans <. length)) min. but version 3 (at least) of S won’t allow it (the default method of c gets used when the optional argument is in the call). x[. Its deﬁnition is: > commontail function(x) { the.unlist(lapply(x. "unique. 1]. If we don’t want to allow edges of the same type between the same nodes.13.3. . The mechanism that decides what the class resulting from c.

dir class(x) <. and sorting of the edges.mathgraph"<function(x.order(x[.apply(x[!dir. j if there is an edge from node i to node j . "sort. "directed") <. One useful representation of mathematical graphs is an adjacency matrix. 2]) x <.mathgraph function makes it easier to look at printed graphs.310 attr(x. There is a “1” in location i.cl x } CHAPTER 13. The sort.dir[ord] } class(x) <. Note that unclass is necessary because we are subscripting in the sense of the matrix underlying the mathgraph object. 1. FORMULAS This uses the usual trick of unique which is to delete duplicated items.stable. "directed") cl <. So if all of the edges are undirected.x[ord. This is a square matrix in which the dimension is the number of nodes. then the matrix is symmetric. general = F) { S Poetry c 1998 Patrick J. so you will probably need to type the full name.unclass(x) if(nodes && !all(dir)) { x[!dir. sort) } if(edges) { ord <. Note that sort is not generic in (at least) lots of versions. nodes = T. ] attr(x. edges = T) { dir <.class(x) x <. not in the sense of subscripting the edges.attr(x. "directed") <. Here’s a function to transform a mathgraph object into the equivalent adjacency matrix.0 .cl x } . Burns v1. "adjamat. x[.mathgraph"<function(x. This sort routine allows both sorting of the nodes within an undirected edge. 1]. drop = F]. ] <.

nnode).. "call") <. "has.name[2:4])) Alabama Alaska Arizona Arkansas Alabama 0 1 0 0 Alaska 1 0 1 0 S Poetry c 1998 Patrick J.13.dx } ans[x] <. list(nnam.names class(ans) <.name[1:3]/state.character(x) if(ischar) { nnam <. c(nnode.paste("node".numeric(x)) stop("nodes must be character or numeric") nnode <.match(x.dim(x) x <.1 if(general) { stop("general version not implemented") } attr(ans.match.0 ."adjamat" ans } Making the function above a method for a generic function is sensible. 2:1]] <.names <.call() attr(ans.3. MATHEMATICAL GRAPHS 311 x <. "directed") ischar <. nnam)) if(ischar) { dx <.names <.has. nnam) dim(x) <.length(nnam) has.array(0..F } ans <.) UseMethod("adjamat") An example where the nodes are character is: > adjamat(mathgraph(~ state. Burns v1.attr(x. .T } else { if(!is.unclass(x) xdir <.names") <.max(x) nnam <.unique(x) nnode <. "adjamat"<function(x. 1:nnode) has.1 if(any(!xdir)) ans[x[!xdir.is.

Neither are very frugal with space.mathgraph"<function(x. FORMULAS Arizona 0 1 0 1 Arkansas 0 0 1 0 attr(.1:nedge cnam <. As is common. Here is the function to transform a mathgraph object into an incidence matrix. However. "call"): adjamat.character(x).is. "incidmat. so incidence matrices are usually going to be bigger than adjacency matrices.c(nodes = is.312 CHAPTER 13.match(x.dimnames(x)[[1]] has. "class"): [1] "adjamat" Another representation of a graph is an incidence matrix. Burns v1.name[1:3]/ state. "directed") nedge <.unique(x) nnode <. there are typically more edges than nodes in a graph. expand = T. an unimportant case like this involves a major part of the function.mathgraph(x = mathgraph( ~ state.length(nnam) dx <.name[2:4])) attr(. This matrix has as many rows as nodes in the graph.dim(x) x <. general = F) { x <.character(x) if(ischar) { nnam <.unclass(x) xdir <. "edge"). eseq) } ischar <.dim(x)[1] eseq <. This is simpliﬁed by taking out the code that makes sure that loops (an edge that begins and ends on the same node) behave properly.paste(ifelse(xdir. and the number of columns is equal to the number of directed edges plus twice the number of undirected edges. nnam) S Poetry c 1998 Patrick J.0 . "has. "arc". edges = T) if(!length(cnam)) { has.F cnam <.names["edges"] <. Incidence matrices have an advantage over adjacency matrices in that the latter destroy information about the edges—we can’t retain the names of the edges in an adjacency matrix without essentially keeping the original mathgraph object.names"): [1] T attr(.attr(x.names <.

this listing leaves out how loops are handled—not because it is hard.match(eseq. -1. "has.names") <. 1:nnode) } loops <. The else clause is similar.xdir) ans <. eseq)] <. reseq)] <. cnam)) ans[cbind(x[.1 ans[cbind(x[!xdir.0 .ifelse( xdir. length(cnam)).array(0.numeric(x)) stop("nodes must be character or numeric") nnode <.x[. and handle loops along with everything else in only a couple more lines than you see here. 1].match.1 ans[cbind(x[.call() class(ans) <. reseq[!xdir] + 1)] <. 1) } attr(ans. 2 . list(nnam. 2]. using a matrix as the object inside the subscripts.13. Expanding means that undirected edges are broken into two directed edges—so cnam is replicated accordingly.max(x) nnam <.has. rep(eseq. eseq)] <. This is then used to put values in the appropriate places.-1 ans[cbind(x[. The problem with that was that it didn’t work for all cases. reseq[!xdir] + 1)] <. list( nnam.-1 ans[cbind(x[!xdir.array(0. 1]. The reseq object is the indices corresponding to the ﬁrst column representing each edge. 2 xdir)) ans[cbind(x[. 2].3. S Poetry c 1998 Patrick J. reseq)] <. 1] == x[. 1]. 2].1 } else { ans <."incidmat" ans } After some initial bureaucracy. cnam)) reseq <. nedge).paste("node". MATHEMATICAL GRAPHS dim(x) <. Burns v1. My ﬁrst attempt at this function tried to be clever.dx } else { 313 if(!is. the work really gets underway in the if(expand). As stated earlier. 2] if(expand && !all(xdir)) { cnam <. "call") <.names attr(ans. but because it is a lot of code for not much gain.rep(cnam. c(nnode. c(nnode.

S Poetry c 1998 Patrick J. This is very similar to getpath. the code gets ever longer.mathgraph(x = mathgraph( ~ 1:3/2:4)) attr(.names"): nodes edges F F attr(. because simple solutions are hidden in the volumes of code. "has.0 . FORMULAS • You can subscribe always to cleverness. expand = F) attr(. Attempts to ﬁx these bugs may lead to further knots of code that can easily create new bugs.incidmat which is shown here. tedious. For example. "class"): [1] "incidmat" > incidmat(mathgraph(~ 1:3/2:4). thus making code that is long. "call"): incidmat. > incidmat(mathgraph(~ 1:3/2:4)) edge 1 edge 1 edge 2 edge 2 edge 3 edge 3 node 1 1 -1 0 0 0 0 node 2 -1 1 1 -1 0 0 node 3 0 0 -1 1 1 -1 node 4 0 0 0 0 -1 1 attr(. "class"): [1] "incidmat" Mathematical graphs are built not for their own sake. we call it to use the incidmat. in which case your code may have surprising bugs. • You can avoid cleverness. we may be interested in a path from one node to some other node. ex=F) edge 1 edge 2 edge 3 node 1 1 0 0 node 2 1 1 0 node 3 0 1 1 node 4 0 0 1 attr(.314 So there are two traps to fall into: CHAPTER 13.mathgraph(x = mathgraph( ~ 1:3/2:4). "has.mathgraph function.adjamat function does this when the graph is represented by an adjacency matrix. and perhaps ineﬃcient. The getpath. Don’t do either. The incidmat function is generic.names"): nodes edges F F attr(. Burns v1. "call"): incidmat. As more functionality is needed. but to learn things about them.

row(x[.0 . newe. nomatch = NA) bad.newn[newt] newe <.match(end.rep(1.c("start".character(c(end.names <.unclass(x) dircheck <.newn[newt] newe <.NULL unchecked[this.eseq[x[tset[this. start))) { start <.5 )] newt <.incidmat"<function(x. start. Burns v1. collapse = " and "). dim(x)[1]) %*% x if(any(dircheck)) stop("need matrix for directed graph") has.newe[newt] } else newn <. newn.newe[newt] newt <. MATHEMATICAL GRAPHS 315 "getpath.names") if(has.names.na(c(start.1:dim(x)[2] repeat { this.NULL if(is.names <.(1:length(unchecked))[unchecked][1] newe <. nomatch = 0) == 0 newn <.start prev <. end) S Poetry c 1998 Patrick J.match(end.c(tset[!unchecked].NULL if(has. node.3.1:dim(x)[1] eseq <.attr(x.in)) stop(paste(paste(bad.index] <. newe] < -0. nomatch = NA) end <.13. "not right")) } tset <.5] if(length(newe)) { newn <. "has. end))] if(length(bad.match(newn. tset. drop = F])[as.names <.F if(n <. ] > 0.length(newn)) { if(endind <.0 edges <. node.0 unchecked <.in.in <.index <.vector(x[.dimnames(x)[[1]] else node.names["nodes"]) node. nomatch = 0)) { # have a path tset <. end) { if(start == end) return(mathgraph()) x <.T nseq <.!duplicated(newn) newn <.dimnames(x)[[2]] else enames <. "end")[is.index].match(start.names["edges"]) enames <.names.

mathgraph( ~ node. n)) } if(!any(unchecked)) return(NULL) } } By seeing a repeat loop in which there is a great deal of creating and enlarging objects.c(prev[!unchecked].c(this.index] != start) { this. n)) unchecked <. Here is getpath in action.index]] path <.index]) edges <.names["nodes"]) { ans <. FORMULAS prev <. Burns v1.mathgraph pass the matrix of the mathgraph object down into C to do the work and pass the answer back. it will take substantially longer to code also.this.index <.1:length(tset) path <.index. A good implementation would have getpath.node <. tset[this. "Alabama".pseq[tset == prev[this.c(edges. newe[endind]) pseq <. dir = T) } if(has.index].name[24:25] + + state.name[22:23]*state.c(edges[!unchecked].names[tset[path]]. dir=T) > getpath(jjmgs2. A C implementation is going to be miles more eﬃcient than the S code.names["edges"]) { names(ans) <. path) } if(has.name[23].mathgraph makes an incidence matrix of the graph and then uses the incidmat method of getpath.c(tset.enames[edges[path]] } return(ans) } tset <. rep(T. however. You’d be right. at which point S would construct the appropriate object to return. dir = T) } else { ans <. The current getpath.name[2]/state. "Missouri") [1] Alabama -> Alaska S Poetry c 1998 Patrick J.0 .names[prev[path]]/ node.name[2:4] + + state. newe) prev <. you might have the idea that S is not the ideal place for this code. rep(tset[this. > jjmgs2 <.length(tset) this.end while(prev[this.c(unchecked. newn) edges <.c(prev.mathgraph( ~ prev[path]/tset[path].mathgraph(~ state.name[1:3]*state.index <.316 CHAPTER 13.

04. 1.98 * py if(!all(xdir)) segments(px[x[!xdir.. py * 1. 1.sin((2 * 0:(maxx .cos((2 * 0:(maxx . node.) { x <. MATHEMATICAL GRAPHS [2] [3] Alaska -> Minnesota Minnesota -> Missouri 317 class: mathgraph Of course it is natural to think of mathematical graphs visually.04). node. py[x[! xdir.0. . "directed") if(ischar <.names <.98 * px py <.1) * pi)/maxx) py <. xlab = "".names) invisible() } S Poetry c 1998 Patrick J. py[x[xdir.07.0 .unique(x) maxx <. .3.length(node. axes = F.1) * pi)/maxx) plot(px...) box() px <.13.character(x)) { node. 2]].names) dim(x) <. xlim = c(-1.04.match(x. 1]].07. 1]].. Here is a quick and decidedly dirty plot method for mathgraph objects. 1]].dx } else { maxx <. 1]]. 2]]) if(any(xdir)) arrows(px[x[xdir.as.dim(x) x <. "plot.is.unclass(x) xdir <. ylab = "".names <. py. py[x[!xdir.0.mathgraph"<function(x.names) dx <. ylim = c(-1. py[x[xdir.attr(x. Burns v1.character(1:maxx) } px <. 2]]. px[x[!xdir. px[x[xdir. 2]] ) text(px * 1.max(x) node.04). which means that you should be able to plot them.

Write a plot.0 . functions similar in spirit to getpath.318 CHAPTER 13.4 Things to Do There is a bug in the version of unique. Write a function that can beneﬁt from having at least one argument be a formula. What is the bug? How would you ﬁx it? Write adjamat. What diﬀerences would there be in an implementation without formulas? 13. Create a new class that inherits from mathgraph that has capacities for the edges. Ghare and Moore (1979) Applications of Graph Theory Algorithms. a telephone system. That is. S Poetry c 1998 Patrick J. Is subscripting feasible for classes adjamat and incidmat? Are there operations that should be added to the formulas for mathgraph? How should adjacency and incidence matrix representations be coerced to mathgraph objects? What other additions should be made to the mathgraph implementation? In many applications the edges of a graph have a capacity—for example. How do you test these functions? Write a function that generates random graphs. Burns v1.5 Further Reading A book on mathematical graphs is Chachra.mathgraph listed on page 309—trust me. FORMULAS 13.mathgraph that does what we really want to happen. Can you allow the user to have control over the exact output? Write a suite of functions to perform various operations with mathematical graphs.adjamat. you shouldn’t trust me.incidmat and incidmat.

length = EVALS) this.seq[1]) } ans } This function takes a function. EVALS)])) * ( this.) ans <.integral"<function(FUN. The following function performs contour integrals in the complex plane.. mode = "function") ans <..0.seq.ans[c(1.ans <.ans) .length(POINTS) if(n < 2) stop("POINTS must have length at least 2") if(is.this. . a vector of points that are the vertices of the contour along which the integration takes place.FUN(this.seq[2] .seq(POINTS[i .Chapter 14 Functions This chapter concentrates on functions that have functions as arguments. "fjjline. . 14.0 for(i in 2:n) { this.1].. POINTS[ i].seq <..) { n <.get(FUN.5 * sum(this. and the number of function 319 . EVALS = 100. POINTS.1 Numerical Integration Numerical integration (also known as quadrature) is often useful.character(FUN)) FUN <.ans + (sum(this. or that return functions.

283185i From theory we know that the true value of this integral is 2πi so we can see how close the numerical integration is to the right answer—it seems to be trying to get it right. and the function in this example is not.character(FUN)) FUN <. c("trapezoid". "line.320 CHAPTER 14.1. FUNCTIONS evaluations to do in each segment of the contour.integral(function(z) 1. With this experience in hand. + im=c(-1. each point that is evaluated is given the same weight.. simpson = { # want odd number of evaluation points if(EVALS %% 2 == 0) EVALS <EVALS + 1 } ) if(is. > fjjline. complex(re=c(1.1.283049i > pi*2i [1] 0+6. The problem is that the integrand is required to be vectorized..25601e-16+6.integral"<function(FUN. mode = "function") S Poetry c 1998 Patrick J.-1. "simpson")) n <.get(FUN. we call the method in the previous version the “trapezoid” method.1. The method of integration is quite naive—in a closed contour like the example below.0 . EVALS = 100. METHOD = "simpson".integral are all in upper case so that their names will not interfere with the names of arguments of the function being integrated. Burns v1. + im=c(-1. Now try another integral for which we know the correct answer. we can attempt a better version. complex(re=c(1.-1.1).value(METHOD. The names of the arguments to fjjline.-1.unabbrev.length(POINTS) if(n < 2) stop("POINTS must have length at least 2") switch(METHOD.1))) [1] NA This time we are met with failure. but is not terribly accurate. POINTS. Additional arguments for the function being integrated are also accepted.integral(function(z) 1/z.2).-1))) [1] 1. . > fjjline.) { METHOD <. Not wanting to appear naive.

1))/3 * (this.integral(function(z) 1/z.0.FUN(this.ans) != EVALS) stop("FUN not properly vectorized") ans <.integral and the SPLUS function integrate.integral(function(z) 1.-1. NUMERICAL INTEGRATION ans <. POINTS[ i].. EVALS)])) * ( this.5 * sum(this. > line.ans + switch(METHOD.1]. trapezoid = { (sum(this.1))) Error in line.14.1. length = EVALS . complex(re=c(1. Burns v1.ans[c(1.) if(length(this. the function can give answers that are the correct length but are wrong.1..2).seq[1]) } ) } ans } The result of both of the previous examples is now much improved.ans * c(1. .-1. + im=c(-1.seq[2] this.-1.integral(function(z): FUN not properly vectorized Dumped 321 Note that the check on vectorization is not foolproof—if you use max where you should use pmax.283185i > line.1.1.480297e-16+6. so the results can be compared.2). + im=c(-1.0 for(i in 2:n) { this.283185i > pi * 2i [1] 0+6. rep(c(4.1).seq[2] this. Real-valued integrations can be done by both line. complex(re=c(1.ans) .ans <.seq. 2). for instance.seq(POINTS[i . In the following S Poetry c 1998 Patrick J.0 . simpson = { sum(this.-1))) [1] 1.seq[1]) } .seq <. length = EVALS) this.

Burns v1.diff(pnorm(c(-1. One of the better methods is Lagrange interpolation. 1)$integral . and then to approximate the function at other points within the interval in which the exact values are known.220446e-16 > line.integral(dnorm. then integrate is likely to do much better. 1). However. + c(-1. Simpson’s method is signiﬁcantly less accurate than the adaptive technique that integrate employs. FUNCTIONS case.unix.integral for the typical problem.2 Interpolation Function interpolation is useful when it is expensive to evaluate the function. then the Lagrange interpolation formula for the ˜ is approximating function f n ˜(x) = f k=1 lk (x)f (xk ) (14. if the integrand is not nicely behaved. However. 14. EV=5000)) comp time clock time memory 6.integral takes only about half as much time as integrate for this problem. 1). EV=5000) . If we know the value of function f at n points xi .integral(dnorm.89 7 1763672 > p. The strategy is to evaluate the function exactly at a few points.85 13 2511336 The diﬀerence in both accuracy and computation time will vary from problem to problem—my suspicion is that this probably overstates the time advantage of line. > p.249756e-14 Even with a lot of evaluations.2) S Poetry c 1998 Patrick J. -1. 1)) comp time clock time memory 12. it could be proﬁtable to substitute line. c(-1.unix. In applications where speed is more important than accuracy. we are subtracting a very accurate way of evaluating a value from the numerical integration. line. -1.time(for(i in 1:100) line.0 .1))) [1] -2.integral for integrate. > integrate(dnorm.1))) [1] -7.diff(pnorm(c(-1.time(for(i in 1:100) integrate(dnorm.1) where lk (x) = i=k i=k (x − xi ) (xk − xi ) (14.322 CHAPTER 14.

prod) tab <.0 . length(x. Burns v1.know.expression(if(length(z) != 1) stop( "z must have length 1").know <. NULL.know. prod))) mode(body) <.14. sum(y. "fjjinterpolator."{" S Poetry c 1998 Patrick J. NULL.2. 1.know. What we do want is a way of creating an S function that approximates a function of our choice.know * num)/den) } Obviously we don’t really care about having a function that approximates the sine—S already has a function that approximates the sine very well.1 num <. "-") diag(tab) <. The next task is to translate Lagrange’s formula into S. 1.function(z) { } body <.sin(jjsx) Then we write the function to do interpolation with them: function(x) { x.jjsx y.outer(rep(x.apply(tab. "-") diag(tab) <.seq(0. x. y) { if(length(x) != length(y)) stop("x and y must be the same length") ans <.know <.jjsy if(length(x) != 1) stop("x must have length 1") if(x < min(x.apply(tab.know) || x > max(x. First we get the known values: > jjsx <. prod) sum((y. The following function takes a vector of x values and a vector of corresponding values of the mathematical function of interest and returns an S function that is the Lagrange interpolation of the data. NULL. 1. len=5) > jjsy <.1 den <. pi/2. x.outer(x.know)). INTERPOLATION 323 To get a function that interpolates for the sine function we need to start with a vector of x-values and the sine of those values.lagrange"<function(x.know)) stop("x not in interval") tab <.den * apply( tab. NULL.

47477308499758. y/den) body[[4]] <.1")[[1 ]] ans[[length(ans)]] <. Body my house my horse my hound 37 In the middle of the function there are three methods of adding commands to body. and mode expression has exhausted its usefulness at this point. 5). x = x)) body[[5]] <. "-").68193882509428. "y.call("assign".lagrange(jjsx.outer(x.substitute(tab <. Skipping to the end. x.785398163397448. 1.den". c(0.392699081698724. c(0. "-") diag(tab) <.substitute(if(z < minx || z > maxx) stop("z not in interval of known x values" ). 0.0 . jjsy) function(z) { if(length(z) != 1) stop("z must have length 1") if(z < 0 || z > 1. The start of this is to create body that is an expression of the correct length and has a couple of the commands ﬁlled in.outer(rep(z. or you can use the text argument in parse. list(minx = min(x).17809724509617. FUNCTIONS body[[2]] <. prod) body[[3]] <. other names are left as is.324 CHAPTER 14.body ans } The result of this function will be a function of one argument. body is attached to ans and ans is returned. 0. Burns v1. so the ans variable is assigned a one-argument function—the argument will be called z. 1. 1.den".5707963267949).43336516385593. > fjjinterpolator. Then body is coerced to be of mode “brace”—this is the mode we want it to end up being. "-") S Poetry c 1998 Patrick J. When a command is to be created and there is nothing to substitute in.5707963267949) stop("z not in interval of known x values") assign("y. Values of variables that are in the list are substituted into the expression. x. then you can use call. substitute provides very useful functionality—it takes an expression as its ﬁrst argument and a named list as its second. -2. Here is the result with the sine data.outer(rep(z.apply(tab. 1. xlen).1 den <.parse(text = "diag(tab) <. -6.75206097146613)) tab <. list(xlen = length(x). 7. maxx = max(x))) tab <. What the function actually does comes later.

jjsx. "-") tab <. ans) S Poetry c 1998 Patrick J.aperm(tab. so now outer creates a threedimensional array.length(z). This merely adds another dimension for the values in the input. attributes(ans) <. rep(length(den). Once this function is written and we know that it works.NA tab <.1.1 den <. zlen)). prod) zlen <. zlen)). NULL.apply(tab. we can use it as a template. c(1. c(length(den). c(1. function(z) { out <. prod).1 sum(y.outer(jjsx.aperm(tab. "interpolator. Burns v1.(z < min(jjsx) | z > max(jjsx)) z[out] <. 3. 2)).2. "-") diag(tab) <. ans <rep(1. 3. tab[row(tab) == col(tab)] <.den * apply(tab. length(jjsy)) %*% ((jjsy * num)/ den)) } The single-value version required outer to create a matrix.NA.den * num). prod) drop(rep(1. num <.1 num <.0 . ilen) %*% (y. zlen)).lagrange"<function(x. jjsx.attributes(z). zlen)). 1. tabz <array(rep(z.apply(tab. NULL. So we go to work and come up with a function to interpolate sine that is vectorized.expression(zlen <.outer(array(rep(z. rep(ilen. 3). c(ilen. NULL.length(z) tab <. INTERPOLATION diag(tab) <.function(z) { } body <. 1.14. y) { if(length(x) != length(y)) stop("x and y must be the same length") ans <. prod)) } 325 The major shortcoming of this is that it is not vectorized. c(1. 3). c(1. z[zout] <. 2)) tab[row(tab) == col(tab)] <.apply(tab. tab <. NULL.

"-") diag(tab) <.0 . list(x = x)) ans[[length(ans)]] <.il function(z) { zlen <. c(1.den".68193882509428. it is good practice to check if the answer seems sensible. 1.equal(ans(x).substitute(ilen <. list(minx = min(x). 7.47477308499758.5707963267949) z[zout] <. rep(ilen.array(rep(z. 3. zlen)) tab <. -6.logical(test) || !test) stop("function didn’t build correctly") ans } This includes a test at the end to see if the resulting function looks reasonable.body test <.aperm(tab. ilen) %*% (y.den * num) S Poetry c 1998 Patrick J.x = length(x))) body[[3]] <.rep(1.NA assign("y. y) if(!is.outer(tabz. 0.17809724509617.(z < 0 | z > 1. prod) ans <.43336516385593.length(z) ilen <. list( length. zlen)).il <.apply(tab.length.(z < minx | z > maxx). c(ilen.outer(tabz. jjsy) > fjjsin.substitute(zout <. "-"). > fjjsin. prod) body[[5]] <. 2)) tab[row(tab) == col(tab)] <.lagrange(jjsx. "y.1 num <.785398163397448. 3).outer(x. c(0. 1. We know that the interpolation should be exact at the input points.call("assign". x. Here we try it out. and this is an easy thing to check.all."{" body[[2]] <. 1.5 zout <.1 den <.75206097146613)) tabz <.substitute(tab <.x. 1. y/den) body[[7]] <. c(0.interpolator. Burns v1.392699081698724. When we humans solve a problem. "-") tab <.5707963267949).326 CHAPTER 14. FUNCTIONS mode(body) <. x. There’s no reason that programs shouldn’t do the same.apply(tab. maxx = max( x))) tab <.den". 0. c(1. -2.

14.3. The following function implements a crude genetic algorithm. . so gradient techniques are very slow. • The objective function is not diﬀerentiable.) { S Poetry c 1998 Patrick J. GENETIC ALGORITHMS attributes(ans) <. + fjjsin.. though. • The number of parameters is very large.attributes(z) ans } 327 We can see how it does by plotting the error of the Lagrange interpolation and the error of linear interpolation. the main ingredients for a genetic algorithm are sex and violence. Someone once called this the Oedipus algorithm—kill your father.sin(jjx1). a child is created by mixing the elements from the two parents. marry your mother. pi/2. MUTATE = 0.02. TRACE = T. There needs to be a mechanism for children to be born based on a random selection of attributes of members of the population. Then there needs to be a way to decide who is to live so that the population size remains constant (or at least relatively constant). len=200) > matplot(jjx1. As in Hollywood movies. jjsy.. The basic form of it. JITTERS = 3. cbind(approx(jjsx. Burns v1.0 . This function operates on logical vectors. resembles many of the algorithms that I use in practice. 14. BIRTHS = 100.3. POP. PROB = 0. xout=jjx1)$y. "fjjgenop1"<function(FUN. The best two of the three survive. Interpolation is not something to do thoughtlessly. > jjx1 <. Reasons to use a genetic algorithm rather than something like a Newton-Raphson or conjugate gradient technique are: • There are multiple local optima.il(jjx1)) . If a child survives. type="l") Note that you can get terrible performance from interpolation if your function does not behave the way that the interpolation method presumes of functions. The basic idea is to have a population of solutions that improves relative to the objective through a “survival of the ﬁttest” mechanism. as traditional genetic algorithms do. Two members of the population are selected at random. then it is “jittered” in an attempt to ﬁnd a better solution nearby.seq(0.3 Genetic Algorithms Genetic algorithms are a type of random algorithm that are used for optimization (among other things).

prob = PROB) if(all(cloc)) cloc[sample(npar. out] <.sample(c(T.matrix(POP)) stop("bad input for POP") if(!is.F else if(all(!cloc)) cloc[sample(npar.c(MUTATE. .obj <.popsize npar <.child. "\n") for(i in 1:BIRTHS) { parents <.. S Poetry c 1998 Patrick J. 1 .parents[1] else out <.sample(c(F. "\n") if(survive) { if(objective[parents[1]] > objective[parents[ 2]]) out <.funevals + 1 survive <.obj < min(objective)) cat("new minimum\n") POP[.c(PROB.logical(POP)) stop("this is for logical populations only") popsize <.) funevals <.ncol(POP) objective <. rep = T. parents[1]] cloc <.nrow(POP) if(PROB <= 0 || PROB >= 1) stop("bad value for PROB") PROB <.FUN(child.T child[cloc] <. parents[2]] child.MUTATE) if(TRACE) cat("objectives are from".parents[2] if(TRACE && child. if( survive) "(improve)". Burns v1. min(objective).obj for(i in seq(length = JITTERS)) { cloc <.get(FUN) if(!is.obj < max(objective[parents]) if(TRACE) cat("parents:". max( objective).POP[.. . rep = F) child <. T).Random. npar. 1)] <.obj). format(child.child. "to"..sample(popsize.FUN(POP[.seed if(is. F). i]. 1)] <. 1 .child objective[out] <.character(FUN)) FUN <. "child:"..328 CHAPTER 14.PROB) if(MUTATE <= 0 || MUTATE >= 1) stop("bad value for MUTATE") MUTATE <. rep = T.) } funevals <. npar..seed <. 2.0 . format(objective[parents]). FUNCTIONS random.numeric(popsize) for(i in 1:popsize) { objective[i] <.POP[cloc.

obj.24 1.obj <.54 2..sum(x * jsa20) > jsa20 [1] 2.) funevals <.jchild objective[out] <. GENETIC ALGORITHMS prob = MUTATE) if(all(!cloc)) cloc[sample(npar. objective = objective[ord].10 1.66 1.FUN(jchild. out] <.39 1. "\n") if(jchild.0 .obj <jchild.child.10 3.18 The following function keeps track of how well fjjgenop1 does on the problem for a number of evaluations.3.00 3.89 1.01 1. so that is the one we’ll use. jchild.69 1.35 2. .". Burns v1.funevals + 1 if(jchild.seed. They studied the simple problem of maximizing p g (x) = i=1 ai xi xi ∈ {0.15 1. funevals = funevals. "new objective".child <.obj } } } } ord <. call = match.jsa20 function(x) .09 1.43 2.seed = random. 1)] <. > fjj.02 3.obj < min(objective)) cat(" new minimum\n") } POP[. 1} All of the ai are positive so the function is obviously maximized when all of the xi are 1 (or TRUE in our function)..order(objective) list(population = POP[.43 [11] 1. S Poetry c 1998 Patrick J.child jchild[cloc] <. ord]. random.!child[cloc] jchild.obj < child. Jennison and Sheehan give the complete details only for their 20 dimensional problem.12 1.14.36 1. Here is the S function to be minimized.03 1.T jchild <.call()) } 329 The main purpose for writing this function is to compare how well it does on one of the problems studied by Jennison and Sheehan (1995).obj) { if(TRACE) { cat("jitter successful.

this.fjjgenop1(fjj. The performance of fjjgenop1 seems adequate to me. The call to summary indicates the distribution of the number of function evaluations used.call()) } The results look reasonable using the defaults. rep = T.res1$numbest) 18 19 20 1 22 77 > summary(gentest1. So jittering is useful even in this artiﬁcially simple problem. 420 420 420 420 420 420 The result is worse while using more function evaluations. 77 of the 100 evaluations found the correct answer.) { funev <.numbest <. 306 321 330 330 336 351 When jittering is used. FUNCTIONS "fjjgentest1"<function(trials = 100. 1st Qu.) funev[i] <. To test this. Max. 1]) } list(funevals = funev.sum(this. 400). I set the number of jitters to zero and the number of births to 400. call = match. This performance is much better than that reported by Jennison and Sheehan with the traditional genetic algorithm that they used. . Median Mean 3rd Qu. Max.res0.4$funeval) Min.one$funeval numbest[i] <. 1st Qu. The ﬁrst thing to look at is the number of elements of the best solution that were TRUE in each of the trials. I hypothesized that the jittering might be slowing down the algorithm. matrix(sample(c(F. the number of function evaluations is random.res0.4$numbest) 17 18 19 20 3 4 31 62 > summary(gentest1.0 . Burns v1..jsa20. > table(gentest1. 20). numbest = numbest.numeric(trials) for(i in 1:trials) { this..one$popula[... Median Mean 3rd Qu.res1$funeval) Min. .330 CHAPTER 14. and only one time did the best solution contain more than one FALSE.one <. > table(gentest1. T). S Poetry c 1998 Patrick J. Since the problem is so simple.

Another addition in genopt is a purely random phase. This avoids the possibility of name conﬂicts altogether. Burns v1.. lower = . The genopt function has the argument add.0 .upper[pars > upper] pars } if(is.args) go..args = NULL. lower. Thus.Random.. if desired.args that is expected to be a list of the additional arguments. Without further ado.3.14. is a more useful function that minimizes a function via a genetic algorithm. First oﬀ.control( . add. upper = Inf. here is the deﬁnition of genopt.).list(population)) { objective <.args <.lower[pars < lower] pars[pars > upper] <. listed below. "genopt"<function(fun. control = genopt. genopt saves the random seed that it started with..get(fun. this is easy to do. In the latter case it is of importance that the function being optimized is not re-evaluated for the population. scale = dcontrol["eps"]. In fjjgenop1 the argument names were all in upper case to avoid conﬂicts with additional arguments for the function being optimized. Box constraints are allowed on the parameters. add. The population argument to genopt can be a matrix whose columns are each a parameter vector in the initial population.population$funevals S Poetry c 1998 Patrick J. so that the computations can be reproduced. or it can be the result of a previous call to genopt. The improvement in the population resulting from the random phase can greatly improve the eﬃcacy of the genetic phase. GENETIC ALGORITHMS 331 The genopt function.c(list(NULL). Parameter vectors are created by some random mechanism that does not care about the parameters within the population. and decidedly useful enough to be worth the eﬀort. so the eﬀect is like asking for a large number of births but getting intermediate results.function(pars. The creation of the random vectors in genopt is quite ad hoc—a better method should be used for speciﬁc applications. mode = "function") fun.seed <. genopt can be called repeatedly with the population equal to the result of the last call. A parameter vector that has a better objective than the worst member of the population replaces that member. With so many arguments capitalized.rectify <.character(fun)) fun <. . it presumes that the parameters are real-valued rather than binary—a much more common case..Inf.seed if(is. and frees up the three-dots for a diﬀerent task.) { random.population$objective funevals <. upper) { pars[pars < lower] <. population. the solution becomes almost as ugly as the problem.

upper) objective[i] <.na(funevals)) { funevals <. ]] + dcontrol["scale.ncol(population) if(is. FUNCTIONS population <.matrix(population)) stop("bad input population") popsize <.range[2. ].do. length = npar) upper <.args) } funevals <.control$dcontrol trace <. lower.rep(lower.fun. i].do.this. 1.apply(population.args[[1]] <go.control$icontrol dcontrol <.order(objective)[popsize] population[. maxind] <. i] <.numeric(funevals) || is.range <. fun.null(popsize) || length(objective) != popsize) stop("bad input population") if(!is.obj <.n"]) { par.0 warning("funevals starting at 0") } } else { if(!is. format(max(objective)).fun. "to".args) if(this. ]) this. format(minobj).0 . ]] <par. par.range[2.range[2.obj < maxobj) { maxind <.rectify(population[. par.call("fun".icontrol["trace"] minobj <.rep(upper.range[2.range[1.nrow(population) lower <. "\n") } if(icontrol["random. length = npar) if(any(upper < lower)) stop("upper element smaller than lower") for(i in 1:popsize) { population[.ncol(population) objective <.range[2.max(objective) for(i in 1:icontrol["random. Burns v1.range[1. par.range[1.min(objective) npar <.args[[1]] <.runif(npar.call("fun".population$population popsize <.args[[1]] objective[maxind] <.popsize } icontrol <.obj S Poetry c 1998 Patrick J. range) par.nrow(population) if(trace) { cat("objectives go from". par. ] == par.n"]) { fun. fun. ] == par.numeric(popsize) npar <.min"] maxobj <.332 CHAPTER 14.

format(minobj).child objective[out] <.population[.obj if(trace && child.obj[1] > parent.do. "parents:".T child[cloc] <. rep = T.sample(c(T.objective[parents] survive <.parents[1] else out <.min"]] <.population[cloc. format(maxobj). GENETIC ALGORITHMS maxobj <.obj <.obj[2])) { if(parent. Burns v1. parents[2]] fun.min"] scale <. 1 .dcontrol["prob"] prob <. "\n") } } njit <.14.obj cat("new minimum\n") } 333 S Poetry c 1998 Patrick J.obj).child.F else if(all(!cloc)) cloc[sample(npar.child. fun. "\n") } if(survive || (child.child child. F). format(child. "to".3.obj) if(trace) { cat(i.rep(lower.icontrol["jitters.obj < minobj) { minobj <.obj <.rep(scale.args) funevals <.child.obj == parent.n"] lower <.args[[1]] <. length = npar) upper <. parents[1]] cloc <.sample(popsize.prob) maxeval <. out] <. prob = prob) if(all(cloc)) cloc[sample(npar.call("fun". npar.obj[1] && child.obj. "child:". 2) child <. 1)] <.rep(upper. if(survive) "(improve)".obj < max(parent.obj == parent.funevals + 1 parent.0 .c(prob. 1)] <.obj[2]) out <. parent. length = npar) if(any(upper < lower)) stop("upper element smaller than lower") scale[scale < dcontrol["scale.dcontrol["scale.max(objective) } } if(trace) { cat("objectives go from".icontrol["maxeval"] for(i in 1:icontrol["births"]) { if(funevals >= maxeval) break parents <.parents[2] population[. length = npar) prob <.

obj).3. maxeval = Inf) { dcon <.1.0 . random.min) S Poetry c 1998 Patrick J. Often. lower.call()) } It would be much nicer if each birth and the jittering were abstracted into functions.obj < child. scale.min = 1e-12. scale).population[. The diﬃculty with this is how information is to be passed in and out of the functions.jchild. ord]. Burns v1. trace = T. scale.obj <.seed = random.obj < minobj) { cat("new minimum\n") minobj <.obj) { child <.n = 0. FUNCTIONS for(i in seq(length = njit)) { fun. objective = objective[ ord]. An important feature of genopt is that it uses the “control” paradigm that was introduced into S with some of the statistical modeling functions (Chambers and Hastie.334 CHAPTER 14. upper) jchild. prob = prob.obj if(trace) { cat("jitter successsful:". as in genopt.args ) funevals <.n = 3.seed. This list can be created with a separate function—in this case called genopt.call("fun".funevals + 1 if(jchild.do. child.go. out] <jchild child.c(eps = eps.rectify( rnorm(npar. call = match.control"<function(births = 100. format( jchild. jitters.min = scale. "genopt.obj <. This is a convenient and powerful convention. funevals = funevals.control. fun.obj } } } } } } ord <.jchild <.args[[1]] <. "\n") if(jchild.order(objective) list(population = population[. There is a control argument that expects an object containing a number of items that control the computation. the three-dots of the main function allows individual control arguments to be given. 1992). random.objective[out] <jchild. eps = 0. prob = 0.

This function would not need to change if the optimization were put into C. trace=F) > summary(jj2$objective) Min. 1st Qu. there is an S function called napsack for one class of knap-sack problems.jsa20.14. + upper=1. upper=1. Suppose that we still want to minimize fjj. lower=0.c(births = births. The Jennison and Sheehan problem can be made a little harder by letting the parameters be real-valued between 0 and 1.jsa20 but there is the constraint that the sum of the parameters can be no more than 10.1]) Min. but we’re not doing so terrible with 650 function evaluations.control if no new control arguments were necessary (unused ones pose no problem). trace = trace. 1st Qu.08 -31. Burns v1. Here’s an approach to solving this problem with genopt. Such a function could use genopt.09 -28.72 -33 -32.genopt(fjj. jj1.n = random.7201 0. A new control function would be required if new controls were used.8724 0. Median Mean 3rd Qu.jsa20. The genopt function can be used as a model for a function that specializes the optimization.44 -32. Median Mean 3rd Qu.06 -29. The ﬁrst command indicates what the true minimum is.n.0 .16 > summary(jj2$population[. This new problem is known as a knap-sack problem. in C it matters greatly. random. instead of only 0 or 1.20). the division between integer and ﬂoating-point numbers is unnecessary. jitters. dcontrol = dcon) } 335 For use within S. 0. S Poetry c 1998 Patrick J. trace=F) > summary(jj1$objective) Min.9248 1 1 > jj2$funeval [1] 646 This is obviously harder than the binary problem. Median Mean 3rd Qu. -33. Max.49 -32.3. The problem can be made more realistic by adding a constraint to the problem.78 > jj1 <. An inexact but often eﬀective way of dealing with constraints is to add a penalty to the objective if constraints are violated.9713 0.45 -30. maxeval = maxeval) list(icontrol = icon. matrix(runif(400). 1st Qu.32 > jj2 <. > -sum(jsa20) [1] -35. Max.genopt(fjj. However. lower=0. GENETIC ALGORITHMS icon <. Max.32 -29.49 -25. -31.n = jitters.n.

> fjj. sum) [1] 10.747117 9.jsa20sum.65 -21.jsa20sum.31 -20. upper=1.157180 10. -22.32 -22.sum(x * jsa20) <. The default cost is almost surely too small. FUNCTIONS Here is a new function to be optimized that includes a penalty for the constraint. + lower=0. cost = 2) <.65 -21.1 -21.074452 9. Genetic algorithms don’t fail on account of discontinuity. lower=0. so the cost obviously can’t be inﬁnite when using this trick with derivative-based optimizers.692068 11. but not egregiously violated.024000 9.476254 9.020323 9.020323 10. jj3. Median Mean 3rd Qu.344031 10.84 S Poetry c 1998 Patrick J.genopt(fjj. -22.59 At this point the constraint isn’t obeyed very well.genopt(fjj.639238 9.250866 10. > jj4 <.322911 10.476254 10.880965 [11] 9.963033 10.14 > jj3 <.803568 9.842347 > summary(jj3$objective) Min.250866 10.594859 10.11 -21.676853 [6] 10. matrix(runif(400).057599 > summary(jj4$objective) Min.858581 9. 1st Qu. > -sum(sort(jsa20)[11:20]) [1] -24. Here we see what the true optimum is.640020 [11] 10.20).731130 10. so we give a diﬀerent value of cost to fjj. Max.0 .640379 9.jsa20sum.057599 11.95 -19. Max. Burns v1. + add.934612 9. trace=F) > apply(jj4$population.048758 [16] 10. sum(x) . 2.074452 11.319002 10.23 -21.819179 10.979223 10.854145 [6] 10. but you still don’t want the cost to be excessively high because you want to guide the algorithm toward the right place.640020 [16] 9. An inﬁnite cost causes a discontinuity.344031 10.336 CHAPTER 14. trace=F. and the objectives aren’t so bad.arg=list(cost=50).max(0.991200 11. sum) [1] 9.limit) + violation * cost Naively you may think that the cost parameter should be inﬁnite.436489 10. limit { objective violation objective } = 10. random=100) > apply(jj3$population. 1st Qu.62 -21. and then evaluate how well we are doing with the constraint and the objective. upper=1.894391 9. Median Mean 3rd Qu.24 -20. 2.594859 10..jsa20sum function(x.850953 10.

582101 > summary(jj5$objective) Min.genopt(fjj. > jj5 <.923369 [11] 9.392638 9.14.808857 9. Another popular way is for the population not to move much at all.jsa20sum. GENETIC ALGORITHMS > jj4$funeval [1] 442 337 Often a much better solution can be found by inserting into a population the best solution from each of two or more independent runs of the genetic algorithm. lower=0.961821 9.arg=list(cost=50)) > apply(jj6$population.811276 9.975258 9.33 -22. jjp1. + add. jj5$pop[.995824 [21] 9.536091 [16] 9. To be eﬀective a genetic algorithm requires a great number of function evaluations.919993 [6] 9. -22.979015 9.756353 9.46 For many applications.679190 [11] 9.890572 9.961821 [16] 9.92 -21. then the overhead of the genetic algorithm in S will be negligible.972682 9. The one problem that a genetic algorithm can have is that it doesn’t go to the optimum. As the population size goes from tiny to huge. 2. S is the wrong place for a genetic algorithm. + trace=F.genopt(fjj.82 -22.852910 9.287558 9.97 -19.876357 9. matrix( + runif(400). Median Mean 3rd Qu. -21.16 -21.jsa20sum. matrix(runif(400).20)) > jj6 <.800468 9. 2. 1st Qu. you will go from the ﬁrst of these problems to the second.881737 9. 1st Qu.761804 9. Max.570757 9. sum) [1] 9.975258 9.848886 9.871977 9.400245 9.78 -19. Burns v1.931718 9.16 -22. Max. But if each evaluation will take a long time anyway.947428 9. + add.24 -18. trace=F. upper=1.72 -20.1]. sum) [1] 9.932042 9. So choosing the population size is important—the proper choice depends on the particular problem and on the use to which the result is put. Median Mean 3rd Qu.20). This is the strategy with jj4 and jj5 that arrives at jj6.899633 9. upper=1. birth=500.0 .3.966458 > summary(jj6$objective) Min. + lower=0.934612 9.59 > jjp1 <. then the optimization should be in C or another compiled language. random=100.538543 9.1].985085 9.arg=list(cost=50)) > apply(jj5$population.785121 9. S Poetry c 1998 Patrick J.206575 [6] 9.964027 9.701156 8.51 -19. random=100. One way this occurs is that the population initially moves well but too soon everyone in the population is very similar—premature convergence.809348 9.cbind(jj4$pop[. If the function being optimized does not involve heavy computing.969054 9.970647 9.

The main thing to get right in a genetic algorithm is the genetics. start. control = portopt. the vector of parameters over which the optimization occurs and the length of the vector. I tend to use population sizes of 20 to 50. A drawback of these two functions is that they are slow because they use S functions. 14. I’ve chosen a population of size 10 in an application.Inf. then the algorithm will be ineﬀective. This comes out of Bell Labs and was written by David Gay and associates. but it gives us a foothold on the problem. length = p) S Poetry c 1998 Patrick J.length(start) if(length(scale) != 1 && length(scale) != p) stop("bad length for ’scale’ argument") if(any(scale) <= 0) stop("nonpositive scale vector") scale <. lower = . Both of the S-PLUS functions nlminb and nlmin are based on PORT routines.address.) { if(fun. .control(.338 CHAPTER 14. scale = 1. Burns v1. but I have no basis in fact for doing so.address == 0) stop("symbol not loaded") p <. The development of genetic algorithms is limited primarily by our imaginations. One good reason to use S for genetic algorithms is because S makes it easy to use whatever is best at representing the solutions.. FUNCTIONS On one extreme.rep(scale. and each function evaluation is reasonably expensive. The portopt1 and portoptgen functions discussed below provide a way of optimizing a function written in C or Fortran but getting the results in S... In cases where the result needs to be close to the true optimum and the cost of function evaluations is insigniﬁcant in comparison.).. This is not terribly useful functionality. upper = Inf. "portopt1"<function(fun. and because the number of parameters can be very large. then a population size of hundreds is probably not too many. In run-of-the-mill situations. Simple interface The portopt1 function allows you to minimize a C or Fortran function that has just two arguments. A genetic algorithm is used to get a starting point for a derivative-based optimizer because there are local minima.0 .4 Optimization via PORT The PORT library is some of the best optimization software that is freely available. If what you are using in the population to represent each solution does not map well into the objective to be optimized.

339 S Poetry c 1998 Patrick J. as. "step.tol". as. as. "scale.77 + (p * (p + 23))/2 init <. length = p) upper <. "objective". "scale.rbind(lower = lower. "step. Burns v1..integer(p).integer(lv).is. "diff.tol".max")] dnam <. "sing. "both X and relative function convergence". "x.fac". "iter.Fortran("divset".dcvec[!dmiss] names(upper) <. as. iv = as.tol".icontrol[c("eval.double(init$v).0 .integer(fun.double(start))[c("parameters".mod".double(scale).1e+30 lower[lower < .switch(ans$iv[1] . v = double(lv))[c("iv".big upper[upper > big] <.integer(init$iv).integer(liv)."double" ans <.14. as. npar = as. "singular convergence". length = p) if(any(lower > upper)) stop("box constraints are infeasible") liv <.c("abs.2.control$icontrol init$iv[17:18] <. "limits")] msg <.4.max". as. "X convergence". iv = integer(liv).big] <. v = as.names(start) limits <. "sing. upper = upper) storage.rep(upper.address). scale = as. limits = limits. parameters = as. 39:42)[!dmiss]] <.control$dcontrol[dnam] dmiss <.tol". "absolute function convergence".C("portopt_one_Sp".mode(limits) <.na(dcvec) init$v[c(31:37..step".max".grad") dcvec <.tol".rep(lower.big lower <.integer(icontrol). "iv". OPTIMIZATION VIA PORT if((length(lower) != 1 && length(lower) != p) || (length( upper) != 1 && length(upper) != p)) stop("bad length for lower or upper") big <. as.. "rel. "scale. objective = double(1). "v")] if(init$iv[1] != 12) stop("abnormal return from divset") icontrol <. "relative function convergence".integer(lv).integer(liv).59 + p lv <.min".integer(2).

objective.h> void portopt_one_Sp(liv. if(trace) { S Poetry c 1998 Patrick J.msg ans$call <. Then the optimizer is invoked through the call to C. "iteration limit reached". stop("negative component in scale vector"). *scale. limits. stop("invalid output code 17"). icontrol. iv. v. lv. and ﬁnally the result is ﬁxed up for the return. trace = icontrol[2].0 . { long i. DOUBLE) || is_inf(objective. npar. par) long *liv. DOUBLE)) { iv[1] = 1. *par. v). par. double (**f_addr)(). Burns v1. *iv. f_addr. if(is_na(objective. stop("invalid output code 13").call() ans$iv <. } while(iv[0] < 3 && iternum < iter_max) { if(iv[30] != iternum) { iternum = iv[30]. iter_max = iv[17]. scale. and then changes control arguments as needed. FUNCTIONS "false convergence". *objective = (**f_addr)(par. *objective. iternum=-1. if(iter_max) { F77_CALL(mnb0)(npar. Then it calls a Fortran function that initializes the two special arrays for the optimizer. npar). stop("LIV is too small"). *icontrol. scale. stop("cannot complete optimization")) ans$convergence <. stop("LV is too small"). *lv. double *v. stop("invalid output code 12"). "function evaluation limit reached". lv. It starts out checking for misspeciﬁcation of arguments and getting those arguments ready to be used in the optimization. *limits. stop("invalid output code 11"). limits. *npar. iv. objective.NULL ans } This is really a simple function despite its length. liv.340 CHAPTER 14. The novel part is the C code that arranges for the optimization to take place. #include <S.match.

Finally it notiﬁes the optimizer if trouble is afoot. fflush(stdout). iv.max = 150.step = NA. DOUBLE) || is_inf(objective.fac = NA.tol = NA. eval. abs. It then calls the PORT optimizer. npar). it is a pointer to the function. scale. Next it calls the function to be optimized with the parameter vector that the optimizer set.mod = NA. that is.max = 5000. scale. The portopt. } } F77_CALL(mnb0)(npar.grad = NA. npar). The number that is contained in the S vector is the address of the function. sing.max = eval. liv. The operative part of this function is a while loop that does four things. iter. the declaration would remain the same. portopt1 follows the convention of a control argument that is supplied by a “something. sing.tol = NA. x. S always thinks in terms of vectors—if there were addresses for a number of functions contained in the S vector passed into C.min = NA.max. v).c(npar = NA.control” function.control"<function(eval. so we have a pointer to a pointer to the function. Perhaps you are puzzled by the declaration: double (**f_addr)().control function is a little diﬀerent in that it uses missing values for some of its defaults because the real defaults are created elsewhere. par. step. scale. It updates the iteration number if necessary and prints the iteration information if asked to. "portopt. if(is_na(objective.tol = NA. *objective = (**f_addr)(par. *objective).4. scale. lv. the Fortran function mnb0. S Poetry c 1998 Patrick J. limits.tol = NA. trace = T) { icontrol <. I’ve found the extra bother to be well worth it—there are a surprising number of instances in which doing nothing is desired. DOUBLE)) { iv[1] = 1. Whatever S passes into C is a pointer to that information. } } } *objective = (**f_addr)(par. step. iternum. rel. Burns v1. OPTIMIZATION VIA PORT 341 printf("iter: %d objective: %g\n".max = NA. } The ﬁrst thing you may notice is that the code is complicated by a test to see if the maximum number of iterations is zero.tol = NA. diff.0 .14. objective.

342 CHAPTER 14.3.character(symbol) .tol. dcontrol = dcontrol) } The symbol. 2.tol = x. 6. scale. trace = trace) dcontrol <. i < 3. i++) { dif = data[i] ans = ans + dif } for(i=3. { long i.8}. n) long *n.tol = rel. x. } pars[0].tol. double *pars.step = sing. for(i=0. i++) { dif = data[i] ans = ans + dif } return(ans).grad = diff. At this stage of testing.grad) list(icontrol = icontrol. Now compile the code. dyn load it and use it in the optimization function.fac.tol. S Poetry c 1998 Patrick J.mod.max.0. integer(length(symbol)). 8. ans=0.mod = scale. we really do want to know the right answer beforehand. -. length(symbol))[[2]] } As a test.fac = scale. step.min. sing.tol = sing. double jjtest1(pars.max = iter. It returns the addresses of a vector of symbols—zero means that the symbol is not currently loaded (or you didn’t get the symbol right). static double data[6] = { 3.address function(symbol) { symbol <. I use a computer intensive way of calculating two means. Burns v1. symbol.min = step. rel.loaded SPLUS function. sing. step.tol. 9.as.tol.max. double dif.tol = abs.step.address function is a slight modiﬁcation of the is.4. * dif. scale. FUNCTIONS iter.7.0 .max = step. symbol. diff.tol = scale.c(abs. i < 6. pars[1].1.2. scale. * dif.C("S_get_entries".

48 $limits: [.C("jjtest1")).o make -f /usr/lang/sp/newfun/lib/S_makefile jjtest1.400020 $objective: [1] 64.c > dyn. Here is the test function translated into Fortran. c(0.3 343 The answers are correct. It is true that this is a little unnatural in C.c targets= jjtest1.300002 3.9/3 [1] 6.address = symbol.o cc -c -I${SHOME-‘Splus SHOME‘}/include -O2 jjtest1.load("jjtest1.0 . OPTIMIZATION VIA PORT > !Splus COMPILE jjtest1. tdat(6) tdat(1) = 3. Burns v1. n) implicit none integer n double precision pars(n) integer i double precision dif.23 iter: 1 objective: 178.1] [. and the algorithm converged okay. If you were paying attention.14.C( "jjtest1")).2] lower -1e+30 -1e+30 upper 1e+30 1e+30 $convergence: [1] "relative function convergence" $call: portopt1(fun.4. function jjtest2(pars.4 > 18.3 S Poetry c 1998 Patrick J. 0)) > 10.48 iter: 3 objective: 64.o") > portopt1(symbol.48 $parameters: [1] 6.2/3 [1] 3.0)) iter: 0 objective: 218. but it allows Fortran to be used as well as C. start = c(0.4 tdat(2) = 6.address(symbol.address(symbol.277 iter: 2 objective: 64. jjtest2. then you have already chastised me for passing a pointer for the parameter length in the calls to the function to be optimized in portopt one Sp.

a C function that not only has a vector of parameters over which to be optimized. data) long n.8 CHAPTER 14.C function. in this case it is much more ﬂexible in what it can do. c(0. Let’s start with an example. *lengs.pars(2) jjtest2 = jjtest2 + dif * dif continue return end 10 20 It is used like: > portopt1(symbol. but data inputs also. { S Poetry c 1998 Patrick J. and returns an S function that makes that call.c #include <S. lengs. For example.2 -. > !cat jjtest3. 6 dif = tdat(i) . like the code in the last subsection.0 do 10 i=1.h> double jjtest3(pars. General interface The portoptgen function provides a way to build an S function that minimizes a function in a compiled language that has several arguments. and go through how it works later.7 2. FUNCTIONS jjtest2 = 0. The C function jjtest3 is the function that we want to minimize.For("jjtest2")). double *pars. but put into a function that insures that the code computing a speciﬁc function is loaded and then calls portopt1.344 tdat(3) tdat(4) tdat(5) tdat(6) = = = = 9. n. This.0)) I foresee portopt1—if it is useful at all—not being used much directly. *data.0 . Burns v1.1 8.address(symbol. is a computationally intensive way to ﬁnd the means of numbers in some groups. portoptgen creates a ﬁle of C code containing a function to be called by the . However. 3 dif = tdat(i) .pars(1) jjtest2 = jjtest2 + dif * dif continue do 20 i=4.

for example). This call also uses two of the optional arguments. lengs="long". The names of this vector are used by S and C as variable names. nonpo="n". the number of parameters. j. } count = top_count. It makes it more confusing. double dif. we indicate that in the nonpointers argument to portoptgen.4. n="long". OPTIMIZATION VIA PORT long i. In general. ans = ans + dif * dif. Now that we know what the function looks like. An improvement would be to ﬁnd a better name in place of “nonpointer”. c(v.portoptgen("jjtest3".c S Poetry c 1998 Patrick J. or S storage modes (“integer”). make="n") Creating file port_opt_jjtest3. and hence more likely that bugs will occur. Burns v1. i < n. so we don’t want to have this as an argument to the S function—S can assign it within the function. it is not a good idea to force users into a double negative as in nonpointer=F means “pointer”. } 345 This function has arguments that are the array of parameters over which the optimization takes place (they will be the means). top_count=0. This is merely the length of the vector of parameters. Arguments that are in make are not arguments to the S function that is created (hence they need to be made inside the function). we are ready to use portoptgen. Here is the top of the ﬁle of C code. In the call to portoptgen above. and an array containing all of the data. j++) { dif = data[j] . at least.c The ﬁrst argument to portoptgen is the name of the function to minimize. for(i=0. an array giving the number of points in each group. But in this case the ease of use overrules this—in my mind. All of the variables are presumed to be passed by address into the C function unless noted in nonpointers. for(j=count. j < top_count.14. two things happened: the S function was put into fjjtest3.0 . The second argument is a character vector containing the declarations for the arguments to the function—these can either be C declarations (“long”.pars[i]. > fjjtest3 <. ans=0. i++) { top_count += lengs[i]. The n variable is also in the make argument to portoptgen.2="double". Since n in jjtest3 is not a pointer. } return(ans). data = "double"). and a ﬁle of C code was created. count=0.0. > !head -20 port_opt_jjtest3.

trace = icontrol[3].) In a lot of situations. v. double *data. lengs.0 . lv. data ) long *liv. lower = . objective. if(is_na(objective.C. n. lengs.o’) # # need to ensure that port_opt_jjtest3_Sp is loaded. possibly with # if(!is. . liv. There is nothing that needs to be done to this except compiling it. *n. limits. objective. long *npar = icontrol. v).loaded(symbol.loaded(symbol. *objective. if(iter_max) { F77_CALL(mnb0)(npar.C(’jjtest3’))) # dyn. icontrol. The arguments to jjtest3 are passed in along with the other arguments needed for the optimizer. *iv.) { # need to ensure that jjtest3 is loaded. *scale.load(’port_opt_jjtest3.control(. long *lengs. iter_max = iv[17]. control = portopt. { long i. iternum=-1. Comments at the top indicate what needs to be done. (There is lots more of the function..). double *v.2) if(length(scale) != 1 && length(scale) != p) stop("bad length for ’scale’ argument") .length(v.h> void port_opt_jjtest3_Sp(liv. ..load(’jjtest3. extern double jjtest3(). limits. > fjjtest3 function(v. v_2.Inf. *icontrol. upper = Inf. scale. and the call to jjtest3 is done correctly. iv. (Note that v. *objective = jjtest3(v_2. data). DOUBLE) || It is a function that is called by fjjtest3 through .) The S function that portoptgen returns does usually need a little work before it is usable. Burns v1. *limits. scale = 1. long *n. FUNCTIONS #include <S. you can probably just S Poetry c 1998 Patrick J. . iv. lv..2 = NULL.o’) # # need to fix up n p <. data = NULL.2 is translated to v 2 in C. scale. *lv. v_2. possibly with # if(!is..C(’port_opt_jjtest3_Sp’))) # dyn. double *v_2.346 CHAPTER 14. lengs = NULL.

c(3. 1:8) iter: 0 objective: 60 iter: 1 objective: 34. 5). the fortran argument is a logical that tells whether cfun is Fortran or C. we can compute our means. S Poetry c 1998 Patrick J. data = 1:8) Here are all of the arguments to portoptgen. make = F. Any variable that appears in the make argument to portoptgen will need to be taken care of. Once the S function editing is done and the C code compiled. lengs = c(3.999999 $objective: [1] 12 $limits: [.load to take care of the issue of loading.4. nonpointers = F.1624 iter: 3 objective: 12 iter: 4 objective: 12 iter: 5 objective: 12 $parameters: [1] 2. args. parameter = 1.0 . > fjjtest3(c(1. 3). As you might guess.2] lower -1e+30 -1e+30 upper 1e+30 1e+30 $convergence: [1] "relative function convergence" $call: fjjtest3(v.329 iter: 2 objective: 24.1] [. DANGER. The function as returned by portoptgen will create an error when any of these is attempted to be used (in the . OPTIMIZATION VIA PORT 347 uncomment the lines involving dyn. > args(portoptgen) function(cfun.000000 5. The parameter argument indicates which argument to cfun is the one over which optimization is to occur.14. 5). It would usually be good form to make the arguments to the S function that are also arguments to cfun be required arguments—that is.2 = c(1. 3).C call). fortran = F) NULL There are two that we haven’t discussed yet. The return value of the cfun function that portoptgen uses must be double precision. Burns v1. remove the NULL default value.

These will be replaced with text that depends on the function that is being optimized.character(cfun) || length(cfun) > 1) S Poetry c 1998 Patrick J. v). trace = icontrol[3].stemplate. args. Another ingredient is portoptgen. icontrol. scale. v. parameter = 1. The portoptgen. > portoptgen function(cfun." [15] "\t\t\tobjective. iternum=-1. *icontrol." [16] "\t\t*objective = cFcALL" This should be familar from the C code shown above except that there are funny bits like “cFsUF”. then— when it was correct—reading it into S with the command: portoptgen. liv. iv.c". The ﬁrst ingredient is an object that contains the template for the C code. limits. > portoptgen. *iv. limits. " [4] "cFaRGS" [5] ")" [6] "long *liv. cFpAR.ctemplate[1:16] [1] "#include <S. lv. Burns v1. scale." [8] "cFdEC" [9] "{" [10] "\tlong i. make = F. how it all works. obje ctive.348 CHAPTER 14. This is the model on which the return value of portoptgen is built. Here are the ﬁrst few elements of it.0 ." [12] "\textern double cFfUNDEC().h>" [2] "void" [3] "port_opt_cFsUF_Sp(liv. FUNCTIONS Now.ctemplate object was created by typing in a ﬁle. *limits. *objective. It is quite similar to the portopt1 function. *scale. *lv. iv." [13] "\tif(iter_max) {" [14] "\t\tF77_CALL(mnb0)(npar. sep = "\n") The only trick is that you want to have a double backslash in the ﬁle anywhere that you would have a backslash in C. nonpointers = F.ctemplate <. what = "". Here is the deﬁnition of portoptgen.scan("portopt_gen. lv. fortran = F) { if(!is." [7] "double *v. iter_max = iv[17]." [11] "\tlong *npar = icontrol.

"objective". ".names(args))) names(args) <. "icontrol".cargs sam <.outnames. "iv".bnam[nonam] } if(any(is. c("double". "character".4. nomatch = 0) if(any(cam)) { cargs[cam == 1] <.args cam <. 1:length(args). stop("invalid input for make")) S Poetry c 1998 Patrick J. arglen) names(make) <.0 ."integer" sargs[sam == 3] <.names(args) make[maknam] <. "_") if(any(duplicated(names(cargs)))) stop("duplicated names in arguments") c."single" sargs[sam == 2] <.rep(make. c."float" cargs[cam == 2] <."character" names(sargs) <."char" } names(cargs) <. "integer". length = arglen) } .outnames <. logical = { make <. "liv". "complex"). nomatch = 0) sargs[sam == 1] <. "_".paste("v". c("single".transcribe(names(cargs).na(match(args.length(args) switch(mode(make). "character"). "limits") if(sum(match(names(cargs).transcribe(names(sargs). "integer". "v".outnames.make make <. "long". nomatch = 0))) stop(paste("Can’t have C argument names in:". nomatch = NA)))) stop("bad type in ’args’ argument") cargs <. "float". c("float"."long" cargs[cam == 3] <.". "long". paste(c. "))) sargs <.") arglen <. "scale". OPTIMIZATION VIA PORT 349 stop("cfun must be a single string") if(fortran) nonpointers <. "single". "char"). sep = "") if(!length(anam <. Burns v1.14.rep(F.c("lv".F bnam <.match(cargs. "char".nchar(anam) == 0 names(args)[nonam] <.T if(length(make) != arglen) stop("bad input for make") } . collapse = ". ". character = { maknam <.match(sargs.bnam else { nonam <.

test["dir"]) stop(paste(cfile.350 CHAPTER 14. FUNCTIONS names(make) <. sep = "") cfile.paste(cargs.test["read"]) stop(paste(cfile. Burns v1.name.rep(nonpointers. ") cfun.m] } if(sargs[parameter] != "double") stop("parameters need to be double precision") if(make[parameter]) stop("parameter must be passed in. dir = T. nomatch = NA) if(is.nonpointers nonpointers <.name.test["wr"]) stop(paste(cfile.name. names(cargs).test["exists"]) { if(cfile.dec <.na(par. sep = "". nomatch = 0)) { par. arglen) names(nonpointers) <.rep(F.names(sargs)[par.test <.filetest(cfile. wr = T.cn <. cfun. "\n") } else { cat("Creating file". collapse = "\n") if(fortran) cfun. ")". logical = { nonpointers <. sep = "") S Poetry c 1998 Patrick J. collapse = ". names(args).name.m <. length = arglen) } . " *".name <. character = { pointnam <.c".m)) stop("bad value for ’parameter’ argument") parameter <.names(args) nonpointers[pointnam] <.name. "is a directory")) if(!cfile.numeric(parameter)) parameter <. cfile. not made") switch(mode(nonpointers).0 .". cfile. ". "\n") } cfun. stop("invalid input for nonpointers")) # # # create and modify file of C code # cfile. cfun.T if(length(nonpointers) != arglen) stop("bad input for nonpointers") } .args <. names(sargs).paste(names(cargs). ".paste("F77_CALL(".paste("port_opt_".name. "is unreadable")) cat("Overwriting file". r = T) if(cfile.names(sargs)[parameter] else if(!match(parameter.match(parameter.names(sargs) if(is. "is unwriteable")) if(!cfile.

ctemplate. sep = "\n") substifile(cfile. names( make)[make])) } the.name.comm <. collapse = ".double(init$v).c(new. file = cfile. cfun.as. ifelse(make. paste( "# if(!is. "cFdEC". cfun.comm. sargs.name(parameter) the.C("portopt_one_Sp". paste("stop(’fix up". sep = "".c(new.call(parse(text = paste("foo(". limits = limits.name) . sep = "")).1). possibly with").comm[[1]]. "cFaRGS".C(’port_opt_".loaded(symbol.integer(init$iv). "}"))) ans[[ans.c(the. nchar( cfile.name. "is loaded.len <.args) substifile(cfile. as. "objective". "cFsUF".integer(liv). 1.cfun cfun.integer(lv). collapse = ".dotC <.loaded(symbol.as. "") ans <. names(cargs). "cFcALL". Burns v1.portoptgen. names(sargs). sep = "").eval(parse(text = c("function() {". cfun. cfun.sargs <. scale = as. paste("# dyn.len]] <.0 . substring(cfile.fun)]. ". length(new.par <. as. cfun. paste( "# need to ensure that port_opt_".name.load(’".names(sargs)[!make] ans <.name.call <. 2)]] <. "# ".14. "# ". as. "(’". cfun. cfun. "(".name. "cFpAR".fun[ .c(paste("# need to ensure that".sargs) + 1) names(new.comm <. "=". paste(names(sargs). objective = double(1). the.integer(icontrol). sep = "")) if(any(make)) { the. "_Sp is loaded. paste("# dyn. paste( "as. paste(ifelse(nonpointers. "(". sep = ""). paste("# if(!is. cfun. names(sargs))] cat(portoptgen.load(’". "o’)".fun <.comm.". names(sargs). "iv". v = as.length(ans) ans[[c(ans.".vector("function". ")"))) S Poetry c 1998 Patrick J. ans) the. "limits")]) new.sargs. "cFfUNDEC".". 2.. "*". possibly with".cn. cfun) # # # get S function right # new.call) substifile(cfile. iv = as.o’)". sep = "").name. "’)"). "’)))".name. ans[[ans. 1.paste(cfun.stemplate ans.par) substifile(cfile. cfun. ").c(the.cn) substifile(cfile.fun) <.dotC <.double(scale))[c("parameters". ").comm <.name. cfun.cn <.dec) substifile(cfile. sep = ""). ")". "").len.expression(ans <. "). sep = "") cfun. OPTIMIZATION VIA PORT 351 else cfun.4.len]]) new. paste("# need to fix up".length(new. "_Sp’)))". if(fortran) "For" else "C".names(cargs)[match(parameter.

Should.length(ans) dotC. upper = Inf. It is then pasted some more.dotC[[1]])[match(parameter. So the replacement into ans is picking out the second part of the second part of the ﬁrst statement in ans. The ﬁrst section makes sure that the arguments are right. as.dotC[[1]] ans } There’s a lot to explain here—we’ll just go in order.paste("port_opt_". 2)]]. parsed. It starts out by testing if the ﬁle that would be used is suitable—this uses filetest of page 162.0 . or can be ﬁxed up to be right.comm starts out life as a character vector that has been pasted together.352 CHAPTER 14..dotC[[1]]))] <"parameters" the.len. 2. FUNCTIONS names(new. scale = 1. The ﬁrst few lines of portoptgen. and putting in the name of the parameter. Next the two objects of mode “brace” are concatenated together with c to form the new body of ans. This move to a function is basically so we have a container to hold the comments—they tend to run through your hands if you don’t have a proper bucket.dotC[[c(1.len <.length(This. In this case it is more than a good idea because we don’t want to have the side eﬀect of creating a permanent ﬁle. control = portopt.call". Next comes the. 2)]] <. and evaluated to form a function. Then the template is catted into the ﬁle.control(.character(ans[[length(ans)]])) ans[[c(ans. Now comes: S Poetry c 1998 Patrick J. and then substituted into the ﬁle with substifile of page 277.Be. 2)]] <.C. dotC.num)]] <.) { p <..num <.Gone) if(length(scale) != 1 && length(scale) != p) Recall that the last component of a function is the body of the function. In general this is a good idea so that the user doesn’t need to wait around merely to be told that there was an error. Here we come across the statement: ans.match("the. The last section creates the S function to be returned. 2.the... 2. 2.len. sep = "") ans.dotC[[c(1. and then abort the function.. 1. the. The second section builds the C code.c(the.name(parameter) Subscripting with a vector or list inside [[ is described on page 203.stemplate (which is the same as ans here) are: function(lower = .as.comm which creates the comments at the start of the ans that is returned.). new. Burns v1.dotC[[1]][-1]) the. "_Sp". . names(new. 2.dotC[[c(1.Inf.length(ans) ans[[c(ans. the required strings pasted together. 2)]] <.len <. cfun. All of the error creation is near the top of the function.

THINGS TO DO new. "") ans <. ans) 353 This creates a function that has arguments that are those that we want to add to the answer. S Poetry c 1998 Patrick J.sargs.fun <. One such book is Mathews (1987)..sargs) + 1) names(new.fun)]. and so on are all abstracted (there are separate functions for each). If gradients or Hessians are computable.6 Further Reading You can learn about integration in the complex plane from a text on complex analysis. then portopt1 and portoptgen can be modiﬁed to use diﬀerent PORT routines. This follows the same pattern—we want to add to a call. Discussions of numerical integration are in numerical analysis books. one that I happen to have is Bak and Newman (1982) Complex Analysis.lagrange or portoptgen that creates functions that perform some other task. 14. Placing the call to . his book is (1975) Adaptation in Natural and Artiﬁcial Systems. Rewrite genopt so that births.5.c(new.C and place it in ans.length(new.fun) <.” John Holland introduced genetic algorithms. Interpolation is discussed in many numerical analysis books. 14. except that we take away the body of new.C. but I haven’t found anything that gives a good explanation of how the optimization is performed. for example. Burns v1. Gay (1990) gives details about how to use some of the PORT routines.C is easy since the spot in ans where it should go is the character string "the.integral.c(new. E. length(new. Goldberg (1989) Genetic Algorithms in Search.5 Things to Do Add an integration method to line.fun[ .14. Design a genetic algorithm for an optimization problem that you have. Hildebrand (1974) Introduction to Numerical Analysis. I also like his article (1989) “Zen and the art of genetic algorithms.vector("function".call".fun. Then the two functions are concatenated together. Write a function similar to interpolate.0 . jittering. The ﬁnal task to accomplish is to create the correct call to . Optimization & Machine Learning. A good source on genetic algorithms is D. so we create another call and use c to combine them.

354 14.0 . Burns v1.7 37 Quotations May Swenson “Question” S Poetry c 1998 Patrick J.

dof) if(i %% 10 == 0) assign(backup. dof)) } ans <. "call") <. unix("echo $$"). This saves partial results in case the job stops for some reason. trials = 100) { backup <.match. where = 1.simbk. Here is an example of the technique.Chapter 15 Large Computations How to live well on nothing a year. 15.1 Backups When the computations are large in the sense of time-consuming but the answer is not especially large.numeric(trials) attr(ans. I use a backup scheme.paste("sbk.sub <.function(n. dof. sep = ".call() for(i in 1:trials) { ans[i] <. "fjjsimbk"<function(n.sub(n.") simbk. immediate = T) } ans } 355 .BACKUP". ans. dof) { sum(rt(n.

For all averred. In general.0 .BACKUP.5 <. On the other hand.356 CHAPTER 15.10. "call") fjjsimbk(n = 20. It is harder to apply where there is really just a single answer.10. we might make two diﬀerent calls to the function. fjjsimbk has hidden side eﬀects also. It could be that the other call has merely not ﬁnished yet.BACKUP. The ﬁrst statement creates an object name that will be essentially unique from call to call of the function. > attr(sbk.BACKUP. I had killed the bird 38 15.13843. dof = 5) > attr(sbk. An alternative approach would be to have the function not return the object. For example. Of course.BACKUP.2 Reduce and Expand I have created the generic functions reduce and expand to relieve memory problems. dof = 3) This scheme is useful for simulations and similar situations in which there are a number of results. The immediate=T is key—this makes sure that the object is actually written to disk.13517.value" [3] ". The second statement assigns the backup object to the working database at appropriate intervals. Burns v1.3) At a later time we can see what exists in our database: > objects() [1] ". if the answer is a large object.fjjsimbk(10. LARGE COMPUTATIONS In a real application the sub-function would not be so trivial (I would hope). I’d rather avoid this—the hidden side eﬀects can create confusion. The call attribute allows us to sort out which backup object is for which call. then it may be wiser not to have two copies of it.fjjsimbk(20.20. ts. but rather use the object that is assigned internally be the “result” of the function.13843" "ts. probably in two diﬀerent ﬁles input to diﬀerent BATCH jobs.Random.5) ts. but the side eﬀects are of secondary importance and are unlikely to cause trouble.First" ". My ﬁrst application was to minimize the use of disk-space by removing components of objects that can be recreated.13517" [5] "sbk. The two statements involving backup implement the idea. but only one “real” result from the function.seed" "sbk. or it may have crashed. To use this.5" Here we see that there is a backup object for each process.Last. we could create S Poetry c 1998 Patrick J. "call") fjjsimbk(n = 10.3 <.

. c(n.) x Another use for reduce is to save on memory in computations.0 . m. "fjjtwomat"<function(n. or expand an unreduced object with no harm done.twomat"<function(x) x$x %*% x$y The next function is a reduce method that stores the two matrices in new locations. The default methods for reduce and expand both just return the ﬁrst argument unchanged.. sep = ". k))) ans$seed <.. REDUCE AND EXPAND 357 an lm."twomat" ans } The function above produces an object of a speciﬁc class that holds two matrices.2. sep = ".seed ans <. Burns v1.paste("rEdUtwomat".default"<function(x. "%*%. Below is a matrix multiplication method for this class.0 S Poetry c 1998 Patrick J. "x". immediate = F) { # attempt to make unique name basename <. "reduce.15.paste(basename.") # # ensure that it is unique xname <. The residuals can be recreated by the expand method assuming that the original response is still available and unchanged. This allows you to try to reduce an already reduced object. c(m. y = array(rnorm(m * k).twomat"<function(x. "reduce. and remembers those locations. unix("echo $$" ).") count <.seed class(ans) <.Random.reduced class that saves space by deleting the residuals. and only expanded at the last minute when needed. m)).list(x = array(rnorm(n * m). k) { seed <. . The reduced object is passed in.

"class"): [1] "twomat. bnchar).y" "/user/users/burns/spo/. It merely gets the real S Poetry c 1998 Patrick J. sep = ". where = 1)) { count <.18173.nchar(basename) while(exists(xname.x" "/user/users/burns/spo/. 1. x$y. > jjtr $x: name db "rEdUtwomat. class(x)) x } Most of the work is in ensuring that the names are unique and the full name of the working database is known. immediate = immediate) assign(yname. (Actually the code takes the liberty of assuming that if the “x” name doesn’t exist. db = dbname) class(x) <. count. LARGE COMPUTATIONS bnchar <.Data" $seed: [1] 41 57 51 7 5 2 21 49 27 17 30 0 attr(. where = 1.") dbname <. db = dbname) x$y <.Data" $y: name db "rEdUtwomat.c("twomat. then neither does the “y” name.count + 1 basename <.search()[1] # need absolute path if(AsciiToInt(dbname)[1] != 47) { dbname <. and is the same size no matter how large the matrices it represents.") } yname <.paste(substring(basename. where = 1. immediate = immediate) x$x <. "x".paste(basename. dbname. sep = ".358 CHAPTER 15.") xname <.reduced". Burns v1.paste(unix("pwd"). sep = ".18173.) Below is an object of the reduced class—this is quite a small object. sep = "/") } assign(xname. "y".c(name = xname.paste(basename. x$x.reduced" "twomat" Here is the method for expanding the reduced object.0 . this may not be a good idea depending on the mechanism used for removing these objects.c(name = yname.

x$x yn <. the objects are merely marked to be written. Below are some ingredients for testing how much memory is saved. where = xn["db"]. expand(reduce(jjt. Burns v1.2.x" not found Dumped This shows the usefulness of the immediate argument. fjjcomma(memory.0 .reduced"<function(x) { xn <.x$y x$x <. expand(reduce(jjt.x cat("memory before fjjmul2 ". In the example above where an error occurs. immediate = T) x$y <. fjjcomma( memory.size()).twomat. > fjjmul1 function(x) { cat("memory at start ". REDUCE AND EXPAND matrices.class(x)[-1] x } 359 It is a good idea to check that the methods work in the sense that reducing and then expanding gets you back to the same thing. immediate = T) class(x) <. whe..size( )).get(yn["name"].15.equal(jjt. but do not actually exist at the time that expand demands them.: Object "rEdU\ twomat. "\n") ans <.get(xn["name"]. then the objects that the reduce method create are not actually written (in S jargon. "\n") ans } > fjjmul2 S Poetry c 1998 Patrick J. > all. im=F))) Error in get.18201. where = yn["db"].equal(jjt. fjjcomma( memory. not committed) until just before the next S prompt is given. "expand.size()). and peels oﬀ the ﬁrst element of the class.fjjmul2(y) cat("memory after fjjmul2 ".default(x$x["name"]. If immediate is FALSE. "\n") y <. im=T))) [1] T > all.

744 So we have saved one copy of the matrices in this example.475" And now we do the test. 15.321.744 Now.fjjtwomat(100.783.424 memory after fjjmul2 2.360 function(y) { sum("%*%"(expand(y))) } > jjtmbig <.296 memory before fjjmul2 834.0 . starting a new session each time.red) memory at start 633.size(jjtmbig)) [1] "480.296.reduced object. Burns v1. 200) > fjjcomma(object.5 38 Quotations Samuel Taylor Coleridge “The Rime of the Ancient Mariner” S Poetry c 1998 Patrick J.3 Things to Do Revise or add to the reduce. 15.296 memory before fjjmul2 1.000 memory after fjjmul2 2. 15.696 [1] -1066. 200.twomat example so that there is not a problem with stored components existing that are no longer pointed to by a twomat.272 [1] -1066.4 Further Reading Vanity Fair by William Makepeace Thackeray. > fjjmul1(jjtmbig. with the reduced version. > fjjmul1(jjtmbig) memory at start 633.

Proof: The demonstration uses the in-built matrix freeny. Consider the sequence: > attributes(attributes(freeny. I now eliminate the deﬁciency.x))) $names: [1] "names" > attributes(attributes(attributes( + attributes(freeny.x)))) $names: [1] "names" Obviously this continues ad inﬁnitum so the size of the object is bounded below by: > object. 16.Chapter 16 Esoterics Where we sail oﬀ the edge. Theorem 1 An S object with at least one attribute consumes an inﬁnite amount of space.1 Magnitude We have come all this way without a theorem.x.x)))) * Inf 361 . but the particular object—as long as it has an attribute—is immaterial.size(attributes(attributes(attributes( + freeny.x)) $names: [1] "dim" "dimnames" > attributes(attributes(attributes(freeny.

T. It gives the required answer: > 2 * 2 [1] 5 but is clumsy elsewhere. though it goes to the bother of writing a new function. then twice-two-makes-ﬁve is sometimes a most charming little thing. ESOTERICS 16. Mind you. 4) + 1 The next attempt is slightly better.0 . I quite agree that twice-two-makes-four is a most excellent thing. ♠ CHAPTER 16.Internal(e1 * e2. Burns v1.362 [1] Inf Q. My ﬁrst method is rather cheap and unsatisfactory.E.D. e2) . but if we are to give everything its due. > 2 * 0:9 [1] 1 3 5 7 9 11 13 15 17 19 Here’s the adjustment: > get("*") function(e1. Fyodor Dostoevsky Notes from the Underground translated by David Magarshack The object of the game is to make twice two equal ﬁve. dressed-up fellow who stands across your path with arms akimbo and spits at you. > twice(2) [1] 5 > twice(1:9) [1] 2 5 6 8 10 12 14 16 18 Behind the scenes is a little work involving two’s and ﬁve’s: S Poetry c 1998 Patrick J.2 Dostoevsky Twice-two-makes-four is a farcical. too. "do_op".

3 Things to Do Make 2 ∗ 2 = 5.3.0 . 16. THINGS TO DO "twice"<function(x) { 2 * x + dnorm(x .2.16.4 Further Reading Catch-22 by Joseph Heller. Develop a topology for S objects. sd = 1/sqrt(2 * pi)/5)/5 } 363 This deﬁnition of twice gives answers close to the usual except for inputs close to 2. S Poetry c 1998 Patrick J. 16. Burns v1.

0 . ESOTERICS S Poetry c 1998 Patrick J. Burns v1.364 CHAPTER 16.

can be potent if you create the right abstractions. abstraction. q() or just some human sleep 40 365 . Abstraction. So. that redundancy is your enemy. documentation and tests under a source code control scheme. Perhaps I haven’t mentioned abstraction enough. It isn’t hard to get something to happen. distrustful simpleton—then in some measure I have succeeded. lazy. impatient. that artistry in programming is worth seeking—that is. • Find algorithms that are as simple and general as possible. • Write a veriﬁcation suite. • Figure out how to test your code thoroughly. C has become a major programming language because it has efﬁcient abstractions that allow the user eﬀective control of the machine. Unix is a powerful operating system because it has simple but general abstractions. if you have become a proud. • Implement the algorithms as simply and generally as you can. what are the steps of programming? • Think about what functionality is desirable. Then think about how to generalize that. but it can be devilish to get the right thing to happen.Epilogue If I’ve convinced you that abstraction and consistency are your friends. it’s not elves exactly 39 It is no accident that abstraction is given top billing in the previous paragraph. abstraction. Notice that the actual code generation is but a fraction of the time involved (unless perhaps you are just learning the language). • Revise as necessary. • Document your code clearly and completely. Your code. too. • Put your code.

0 .366 Quotations 39 40 Robert Frost “Mending Wall” Robert Frost “After Apple Picking” S Poetry c 1998 Patrick J. Burns v1.

and Sussman. Olshen. and Moore. 1982. L. [13] Chambers. Addison-Wesley. Wadsworth.. and Watts.. [2] Abramowitz. and Stone. J. The New S Language.. R. [5] Becker. Structure and Interpretation of Computer Programs.. A. J. Chambers.. T. Wadsworth. 1988. A. [12] Chachra. [11] Breiman. J. C. M. J. and Chambers. Reading. J. New York.. D. 1986. [4] Bates. New York. Statistics for Long-Memory Processes.. and Wilks. 1992.. Reading. Belmont. A... 1988. Addison-Wesley. J. D. M. H. 367 . R. R. I. and Newman. M.. J.. R. Massachusetts. Wadsworth. Dover. H. California. Springer-Verlag. 1979. J. Ghare. Applications of Graph Theory Algorithms. Chapman and Hall. Handbook of Mathematical Functions. Second Edition. Friedman. P. John Wiley and Sons. Nonlinear Regression Analysis and Its Applications.. and Stegun. 1972. New York. A. Sussman. Chapman and Hall. J. M. J. Massachusetts. More Programming Pearls: confessions of a coder. M. D.Bibliography [1] Abelson. V. New York. M. Extending the S System. Classiﬁcation and Regression Trees. Massachusetts.. and Hastie. [8] Bentley. G. 1984. J. [3] Bak. M. MIT Press. 1984. 1985. 1994. L. Complex Analysis. J. [7] Becker. 1988. J. Chapman and Hall. L. Cambridge. Elvesier North Holland. S: An Interactive Environment for Data Analysis and Graphics. [6] Becker. A. [10] Beran. R. Statistical Models in S. G. and Chambers. [9] Bentley. New York. 1996.. Programming Pearls.

Burns v1. 1995. Introduction to Numerical Analysis. C: A Software Engineering Approach. D. Theoretical and empirical properties of the genetic algorithm as a numerical optimizer. [29] Knuth. [17] Gay. New Jersey. rep. 201 Principles of Software Development. 1973. Hobart Press. Dover. Usage summary for selected optimization routines. Massachusetts. Third Edition. Englewood Cliﬀs. 1987. and Sheehan. [23] Hildebrand. 1988. 1993. M. and Mathematics. Visualizing Data. H. and Plauger. 1996. Dover.368 BIBLIOGRAPHY [14] Cleveland. F. E. Michigan. Seminumerical Algorithms. Matrix Computations.0 . J.. Addison-Wesley. [16] Davis. and Van Loan. and Ritchie. Tech. S Poetry c 1998 Patrick J. E. 1990. [31] Mathews. Reading. Obfuscated C and Other Mysteries. E. ATT Bell Laboratories. Numerical Methods for Computer Science. C. 1973.. P. 296–318. P. 1993. [19] Goldberg. W. New York. M. 2nd Edition. New York. 1975. Johns Hopkins University Press. 1974. W. E. Chapman and Hall. Third Edition. [18] Goldberg. J. New York. 1974. M. T. [20] Golub. D. G. Massachusetts. AddisonWesley. B. B. H. D. Engineering. W. F. Prentice-Hall. Springer-Verlag. Addison-Wesley. pp. B. 80–85. Adaptation in Natural and Artiﬁcial Systems.. [15] Darnell. W. [30] Libes. [24] Holland. 1981.. New York. Maryland. In Proceedings of the 3rd International Conference on Genetic Algorithms (1989). D. 1990. The C Programming Language. A. [27] Kernighan. Generalized Additive Models. 1989. Journal of Statistical Graphics and Computing 4 (1995). [25] Jennison. A. Optimization & Machine Learning. Zen and the art of genetic algorithms. D. and Tibshirani. P. 1997. Numerical Methods for Scientists and Engineers. The Elements of Programming Style.. D. C. New Jersey.. Ann Arbor. University of Michigan Press. [22] Hastie. N. Genetic Algorithms in Search. Englewood Cliﬀs. [26] Kernighan. McGraw-Hill. D. New Jersey. and Margolis. [21] Hamming. Fundamental Algorithms. McGraw-Hill.. Second Edition. Mass. E. New York. R. Reading. Second Edition. John Wiley and Sons. Prentice-Hall. Murray Hill.. J. R. Reading. Baltimore. [28] Knuth.

1948. O’Reilly and Associates. S. W. and Flannery. [34] Nelder... Dover. Modern Applied Statistics with S-Plus. W. and Schwartz. New York. Second Edition. Computer J. L. Second Edition. J.. J. A.0 . Unix Power Tools. 1990. [41] Spector. F. M. R. Vetterling. C. California. B. T. Cambridge University Press.. [40] Seber. R. 1997... Chapman and Hall. [33] McCullagh.BIBLIOGRAPHY 369 [32] MathSoft. W. Burns v1. 1989. Cambridge. and Ripley. 1989. Van Nostrand. A simplex method for function minimization. B. 1996. Learning Perl. A. [38] Ripley. [45] Young. P. R. John Wiley and Sons. A. L. D. T. 308–313. An Introduction to S and S-Plus. Second Edition. How to Solve It. J. New York.. A Survey of Numerical Mathematics.. S.. 1973. Generalized Linear Models. 1996. O’Reilly. CA. Penguin. P. Nonlinear Regression. Teukolsky. G. S-PLUS Programmer’s Manual. Analytic theory of continued fractions. B. New York. [39] Schwartz.. M. [35] Peek. [44] Wall. S Poetry c 1998 Patrick J. Springer-Verlag. T. O’Reilly and Associates. Belmont. Numerical Recipes in C. A. [37] Press. Pattern Recognition and Neural Networks. Cambridge. T. and Loukides. H. and Nelder.. G. Sebastopol. Sebastopol. 1992. MathSoft. L. and Gregory. [36] Polya. Second Edition. California. and Mead. P. and Wild. Sebastopol. [43] Wall. 1993. Wadsworth. Cambridge University Press.. Inc. [42] Venables. H. California. R. O’Reilly and Associates.. 1994. J.. 7 (1965). Second Edition. 1994. Christiansen. Inc. New York. New York. Programming Perl. 1997.

Burns v1.370 BIBLIOGRAPHY S Poetry c 1998 Patrick J.0 .

This can either be speciﬁc as in Euclid’s algorithm for the greatest common divisor. C and Unix. a method of combining the likelihood and the number of parameters so that non-nested models can be compared. Akaike. argument matching 1) Comp. 2) Math. This is a trivial exercise except in languages like S that allow default values for arguments. similar to a macro in C. a recipe for a computation. alias 1) Unix. a symbol used in S. in fractional experimental designs. calculated in S with Arg. due to H. the process of deciding which arguments in a particular call corresponds to the formal arguments of the routine being called. an input into a function or subroutine. a name that expands into some (part of a) command. argument 1) Comp. for example. AIC has been criticized for suggesting models with too many parameters. AIC 1) Stat. a monthly time series of amount sold can be aggregated into an annual series. See also SIC. to combine consecutive observations in a time series.Glossary abscissa 1) Math. 371 . a number that represents a location in memory. numerous and serious bugs are to be expected. 2) Stat. some eﬀects are aliased (or confounded) with others. ampersand 1) the character “&”. a prototype of some software. alpha version 1) Comp. the x-coordinate. the angle of a complex number from the positive real axis. aggregate 1) Stat. alphanumeric 1) the alphabetic characters plus the ten digits. “An Information Criterion”. or be a general way of approaching a problem as in the EM algorithm. algorithm 1) Comp. See beta version. address 1) Comp.

module. an operator ◦ is associative if (x ◦ y ) ◦ z = x ◦ (y ◦ z ). This can refer either to the physical size of the graph. Pronounced “ASSkey”. and represents a (hyper) rectangle of objects. but untidy. null. Burns v1.372 GLOSSARY array 1) S. Also the function that retrieves or changes this list. Performed in S with “<-”. logical. All elements of an atomic vector are of the same type. library.letters The name of the names assignment function is “names<-”. assignment 1) Comp. but many programs (including S) only deal well with the ﬁrst 128 = 27 characters. S Poetry c 1998 Patrick J. an object that is one of the ﬁve modes numeric. a symbol not used in S version 3. See also ragged array. character. Each character uses 8 bits (1 byte). assembly code 1) Comp. asymptotic 1) Stat. or to the units represented by the graph. Examples of attributes include class. Also a function for creating those objects. assignment function 1) S. an object that has a dim attribute. an approximation of a quantity based on the limit as the number of observations goes to inﬁnity. ASCII 1) Comp. complex. the deﬁnition of a function that appears on the left of an assignment. 2) C. the process of giving a name to some entity. association 1) Math. and it associates from right to left if it evaluates as x ◦ (y ◦ z ).middle” would be the assignment function for fjj for class middle. a standard for encoding characters. For example: names(x) <. then “fjj<-.0 . a named list that is a subsidiary part of an object. the ratio of the x-axis to the y-axis. the asymptotic variance of an estimator. atomic 1) S. attributes 1) S. an operator ◦ associates from left to right if x ◦ y ◦ z is evaluated the same as (x ◦ y ) ◦ z . it is an acronym for American Standard Code for Information Interchange. but is used to access attributes in version 4. at sign 1) the character “@”. 2) Comp. an object that is the equivalent to a vector in S. This is done with functions attach. the process of adding a database to the search list. If fjj is a generic assignment function. a program in the machine language for a particular chip—very fast. aspect ratio 1) Comp. dim. For example. names. attach 1) S.

The axes are numbered 1 for the bottom. backquote 1) the symbol ‘. bang 1) Comp.373 audit 1) Comp. background color 1) S. the process of testing the speed of one computer (or piece of software) relative to another. A command can be put in the background by placing an ampersand at the end of the command. the color of a blank graph. but which has only been used by the developers. the state such that old code will run properly using a new version of the software. compare to interactive. S Poetry c 1998 Patrick J. the line in a graph along which one variable is zero. a record of the commands performed. backward-compatibility 1) Comp. a style of computing in which all instructions are submitted and then results are obtained.0 . 3 for the top. binary ﬁle 1) Comp. 2) S. batch 1) Comp.Data/. 2) Comp. Allowing a group of people to use the version in return for reporting bugs and giving general comments is the beta test. slang for the exclamation mark “!”. the information such as tick marks and tick labels that are put along an axis. beta test See beta version.Audit ﬁle. and 4 for the right. A task that is harder than it seems since the relative speed of diﬀerent tasks is often not very similar. in S this is denoted color 0. background 1) Unix. unused in S. An example is an S BATCH job. Generally. but not listening to standard-in. backslash 1) the symbol “ \ ” used as an escape character by many languages. used in Unix to surround commands that are to be interpreted. beta version 1) Comp. bias 1) Stat. Also called “screamer”. benchmark 1) Comp. generally with enough information that all states can be reproduced. the diﬀerence between the mean value of an estimator and the true (unknown) value that is being estimated. a process is in the background if it is running. object code. any ﬁle that is not in ASCII format is said to be a binary ﬁle. a ﬁle that is in a format suitable to some program as opposed to a format for being read by humans. in particular. a version of some software that is thought to be close to release quality. Often there is a trade-oﬀ between bias and variance when choosing an estimator. S keeps such a record in the . axis 1) Math. See also single quote. 2 for the left. Burns v1.

body 1) S. used in S and in C for subscripting. Points farther than d from the box are individually marked. parenthesis. binary tree 1) Math. bootstrap 1) Stat. very popular for programming. box constraint 1) Math. brace 1) one of the symbols “{” or “}”. constraints on a vector such that there is a minimum and a maximum for each element (independently of the other elements). the part of a function where the computations are. starting a simple process that allows a similar but more complex process to work. S Poetry c 1998 Patrick J. blue book 1) S. a relative of a mathematical graph in which each node has zero. bivariate 1) Stat. one or two children (other nodes). Sometimes called “square bracket” to distinguish from the more general use of the term “bracket” that includes braces and parentheses. bracket 1) one of the symbols “[” or “]”. It is the distinction of left and right that makes a binary tree diﬀerent from an ordinary mathematical tree. boxplot 1) Stat. the lines around the plot area of a ﬁgure. There are various deﬁnitions. where d is some number times the length of the box.0 . The “whiskers” stretch to the farthest datapoint that is within distance d of the near end of the box. a situation involving two random variables. acronym for Basic Linear Algebra Subroutine. 2) Comp. Burns v1. the amount of information contained in a binary digit. Becker. If your computer has the BLAS. The smallest case of multivariate. the mark within the box is the median. See also bracket. Chambers and Wilks (1988) The New S Language. 2) Comp.374 GLOSSARY binary number 1) Math. BLAS 1) Comp. Compare linear constraint. a graph indicating the distribution of some data. one of the most common Unix shells. and “left” children are distinguished from “right” children. composed only of the digits 0 and 1. it does not have a case of ennui but rather it has subroutines for doing some linear algebra tasks that are optimized for that machine. bit 1) Comp. but a common one (and the one used in S) is: The ends of the box are the ﬁrst and third quartiles. box 1) S. a binary digit. as opposed to the arguments. used in S and in C to bind several commands into one. a number in base 2. Bourne shell 1) Unix. a way of evaluating the variability of an estimate by resampling the input data.

C shell 1) Unix. Fortran and DOS are not. a way of assessing the realationship between two multivariate sets of data. 2) S. Almost always the term is used to mean something that is time eﬃcient—it can be a memory location. “medium”. “high”}. but not quite the same. cache 1) Comp. a process that appears to be random. a problem with some software or hardware. pejorative for feature. Pseudorandom numbers are an example of chaos. byte 1) Comp. slang for something that is ill-planned and/or illexecuted. “human”}. 3) S. “kitty”. a hash table. 2) Comp. Burns v1. a mode. canonical correlation 1) Stat. case-sensitive 1) Comp. an ASCII character rather like newline. category 1) Stat. bug 1) Comp. ceiling 1) Math. a storage area that is treated specially in some way.0 . a program is case-sensitive if capital letters are considered to be distinct from lower-case letters. central processing unit 1) Comp. a popular shell for interactive use. The levels can be ordered as in {“low”. in most cases consisting of 8 bits.375 brain-dead 1) Comp. a component or attribute of many S objects that is the call that created the object of which it is a part (the mode of this component or attribute is generally call). less popular for programming. a chunk of memory set aside for a speciﬁc purpose. buﬀer 1) Comp. but is really deterministic. call 1) S. Unix and C are casesensitive. the computer chip that does the actual computation. a lookup table. chaos 1) Math. Compare to continuous variable. or not as in {“corgi”. carriage return 1) Comp. S Poetry c 1998 Patrick J. Compare ﬂoor. Abbreviated as “CPU”. the smallest integer greater than or equal to a number. the action of invoking a function. an object of mode call contains the information for a speciﬁc call to a function. etc. then the buﬀer is marked empty so it can accept more characters. For example printed output in C uses a buﬀer—characters to be printed are put in the buﬀer and printing actually takes place when the buﬀer is full. an object that represents a category. a variable that takes on a ﬁnite number of values and the values are generally non-numeric. the preferred type of object is one that inherits from class factor. S. 2) S. the basic unit of memory for a computer.

the conversion of an object from one concept to another. characteristic 1) Comp. the storage order of the elements of a matrix in which all of the elements of the ﬁrst column are together.0 . Compare with permutation. combination 1) Math. a statistical model that describes a categorical variable by means of a recursive partition of the explanatory variables (where each partition is of a single explanatory variable). command 1) a non-technical term—something that will make the computer do something. a selection without replacement of a certain number of elements from a ﬁnite set where the order of the selection is ignored. the occurrence of more than one value being encoded to the same location in a hash table. a memory frame is the child of the frame that caused it to come into existence. each element of a character vector is a character string of arbitrary length.376 GLOSSARY character 1) S. coercion 1) Comp. Frame 1 is no one’s child. column-major order 1) Comp. command line editing 1) Comp. the class attribute of an object determines which method of a generic function is dispatched when the object is a particular argument in the function call. or through the -e ﬂag to S-PLUS. the ability to recall past commands. collision 1) Comp. a process is the child of the process that spawned it. an atomic mode. characteristic function 1) Stat. this is called “unsupervised learning” in the machine-learning literature. a technique for grouping datapoints by their similarity. clustering 1) Stat. such as from character to numeric. Burns v1. In S this can be accomplished with S-mode for emacs. S and Fortran store matrices (and higher dimensional arrays) in column-major order. edit them and execute the modiﬁed command. 2) see eigen. class 1) S. The other part is the mantissa. classiﬁcation tree 1) Stat. an attribute. See also regression tree. One use is to compute moments of the distribution. colon 1) the character “:” which is used in S to produce sequences. Fit in S with tree. Compare to row-major order. the exponent of a number stored in ﬂoating-point. The binomial coeﬃcient gives the number of combinations. the Fourier transform of the density function of a random variable. etc. or from double-precision ﬂoating-point to single-precision ﬂoating-point. S Poetry c 1998 Patrick J. 2) Unix. followed by the elements of the second column. child 1) S.

an item in a list or other recursive object. an atomic mode. conditional distribution 1) Stat. compile 1) Comp. Compacting usually occurs only inside of loops. the distribution of height given that weight is less than some speciﬁc amount. which often is not until S is about to give a prompt rather than at the time the assignment is made. text following a pound sign on a line is a comment. an object of this mode contains complex numbers. >=. compacting 1) S. commute 1) Math. small numbers mean good results. a number which is a real number plus a real number times the square root of −1. as opposed to when it is used which is called run-time. The result is a ﬁle of object code. <. the act of converting source code into the corresponding instructions for a particular computer chip and operating system. the time at which a program was compiled. text in software that is ignored in computations. By convention.0 . the condition number for inverting a matrix is often deﬁned as the ratio of largest to smallest singular value. and large or inﬁnite values mean poor results. a probability distribution that assumes some condition holds. For example. >. in which the source code is compiled into object code which can then be executed. condition number 1) Comp. The length of a list is the number of components it has. Compare to interpreted language. an assignment of an object to a permanent location is committed when it is actually written to disk. <=. complex 1) S. !=. The square root of −1 is often denoted “i”. term used to mean garbage collection. Burns v1. a language. component 1) S. C or Fortran for example. complex number 1) Math. For example. which is the case in S. Compare with element. commitment 1) S.377 comment 1) Comp. an operator ◦ commutes if x ◦ y = y ◦ x. In S. an operator that tests equality or inequality. compile-time 1) Comp. comparison operator 1) Comp. S Poetry c 1998 Patrick J. and is used to inform humans reading the code. a number that states the degree of ill-conditioning in a computation. In the interim S acts as if the object were in the speciﬁed location even though it isn’t physically there. compiled language 1) Comp. The comparison operators in S are ==.

conjugate 1) Math. a plot of some sort that is conditional on a value or set of values of a variable that is not represented in the plot.378 GLOSSARY conditioning plot 1) Comp. contingency table 1) Stat. More variables means higher dimensional arrays. an object—like the number 5. a graphic that shows the value of a function of two variables by drawing lines of equal value of the function. then you won’t be able to discern the health eﬀects of one of the drugs. when a categorical variable is being modeled. Compare to category. constraint 1) Math. a matrix that is a contingency table of the true categories versus the predicted categories from the model of the data. continuous variable 1) Stat. an object that is ﬁxed. in Bayesian statistics there are conjugate distributions. an attribute or component of some objects representing a statistical model that shows the set of contrasts that was used in the ﬁtting process. continuation 1) S. so a − bi is the conjugate of a + bi. contrast 1) Stat. constant 1) S. a condition placed on a problem. confounded 1) Stat.0 . a variable that can (conceptually) take on any value in some range of the real numbers. a number is the (complex) conjugate of another if it has the same real part but the opposite imaginary part. 2) where you eat when your kitchen table has a broken leg. a table of counts of the number of occurrences in the combination of levels of two or more categorical variables. confusion matrix 1) Stat. a coeﬃcient vector that sums to zero. Computed in S with Conj. 2) S. These are created in S with table.13—that is ﬁxed. then the table is represented as a matrix. two eﬀects are confounded if the data at hand can not distinguish if it is only one or the other that has inﬂuence on the response. a means of extending a function over more of the complex plane. Important because they are orthogonal to the vector of all 1’s which corresponds to the mean. contour plot 1) Comp. not just totally confounded versus not confounded at all. Two common classes of constraint are box constraint and linear constraint.) S Poetry c 1998 Patrick J. The coplot function does this in S. 2) Math. 2) Math. When there are two variables. (These are not necessarily contrasts in the sense of meaning 1. Confounding is often a matter of degree. Burns v1. if you only observe people who either use both alcohol and tobacco or use neither. but not necesarily known. an incomplete statement may be given to S and the continuation of the command given on one or more additional lines. For example. 2) Stat. The default continuation prompt is “+ ”. meaning that the computations simplify when the two distributions are used together as prior and likelihood.

cursor 1) Comp. control operator 1) S. QR decomposition. The problem was ﬁxed by “debugging” the relay that failed. Compare to matrix.0 . [The entomology of this word is reputed to be that a moth ﬂew into the Mark I computer in 1945. cross-validation 1) Stat. decomposition 1) Math. and often using SQL. the available observations are randomly divided into 10 groups. a program designed to store information—generally in the form of ﬁelds and records. the act of keeping a series of backups of objects such as ﬁles of code or documentation.379 control 1) Comp. the marker on a computer screen that indicates where input is to go. the action when using a window system of marking some text in one spot and making that text appear in another spot. the process of tracking down and ﬁxing a problem. CPU 1) Comp. or the spot where the mouse is pointing. For example. the speciﬁcation of the type of object that a variable is to be. If this story isn’t true. data frame 1) S. This is done in compiled languages like C and Fortran. respectively. Read carefully when this term comes up—there are some confusing uses of it. declaration 1) Comp. S Poetry c 1998 Patrick J.frame that represents a rectangle of values in which the type of value may be diﬀerent from column to column. cut and paste 1) Comp. one of the operators && or || which perform nonvectorized “and” and “or”. it should be. but essentially non-existent in S. slang for the unintentional or unexpected halt of a program or of hardware. crash 1) Comp.] decimal number 1) Math. the value of an optional argument that is not given in the call. for example. a process of estimating the quality of a model by comparing observations left out of the ﬁtting process with the prediction of that ﬁt for the observations. crashing the machine. an item on the search list. or something that might be. 2) Comp. a number written in base ten. database 1) S. an object of class data. debug 1) Comp. In Unix this is done with SCCS or RCS (and probably other methods). Burns v1. writing a matrix as the product of two or more other matrices of special types. default 1) S. the ﬁtting is performed 10 times with a diﬀerent group dropped each time. acronym for central processing unit.

deviance 1) Stat.380 GLOSSARY density 1) Stat. The form of the measurement depends on the assumed error distribution.”] dereference 1) C. dim 1) S. then the density is such that the integral between any two points of the density is the probability of the distribution landing in that interval. 3) Stat. distribution-free test 1) Stat. distribution function 1) Stat. the slope of a function at any given point. the act of getting the value pointed to by a pointer. a matrix that is zero except on the diagonal (where the row number is equal to the column number). Some times called “nonparametric”. a dependent variable is a function of one or more other variables. 2) Math. a response in a statistical model. directory 1) Comp. a function of x that is the probability that a random variable will be less than or equal to x. [I believe the name derives from “double-ended queue. a class of hypothesis tests which use ranks of the data. deparse 1) S. diagonal matrix 1) Math. Also the function that retrieves or changes this attribute. statistics and graphics intended to show how well a statistical model ﬁts the data—an important part of the modeling process. S Poetry c 1998 Patrick J. distribution 1) Stat. a function that deﬁnes a probability distribution. derivative 1) Math. If the distribution is continuous. Burns v1. a data structure similar to a stack or a queue except that both adding and deleting are allowed at both ends. diagnostics 1) Stat. then the density at a point is the probability of the distribution taking that value. An example is the Wilcoxon signed-rank test. If the distribution is discrete. This is in analogy to meaning 2. dependent 1) Stat.0 . to change a language object into one or more character strings that represents the object. two random variables are dependent if knowing information about one helps predict the other. deque 1) Comp. the attribute that describes the shape of an array. a description of how a particular random variable behaves. a location that contains ﬁles and (possibly) sub-directories. a measure of the distance of the data from its ﬁt to a model. but confusing because of meaning 1. done with the ∗ operator.

2) Comp. S Poetry c 1998 Patrick J. 2) S. eﬀect 1) Stat. echo 1) Comp. double-precision 1) Comp. dynamic graphics 1) Comp. See also backquote and single quote. These are created in S with dotchart. a technique for optimizing over n time points. edge 1) Math. eﬃciency 1) Comp. This generally goes to the object last. dot product 1) Math. usually consisting of 8 bytes. Compare static load. Given the state at time n − 1. twice the estimated coeﬃcient for a factor or interaction. so work backwards through time to the present. a plot that displays information as plotting symbols that are each some distance along their respective lines—used as an alternative to barplots and pie charts. some measure of the amount of computing time. 3) Comp. the act of the computer repeating input so it can be seen. Passwords are not echoed when they are typed in. graphics that change spontaneously or from input by the user. n u·v = i=1 ui vi where n is the length of the vectors. This is not the same as deﬁnition 1. computation performed with double-precision numbers. 2) S.0 . usually to display high-dimensional data.dump (these two representations are distinctly diﬀerent). the best decision to take is computable. the action of representing the state of S when an error occurs.dump. the variance of an estimator (usually) measured relative to the variance of the best possible estimator given the model. in 2k designs. memory or other resources that a program uses. a ﬂoating-point number.load and its relatives. a component of an object of class aov that is a vector of orthogonal single degree-of-freedom values. a part of a mathematical graph that connects two nodes. a number which is the sum of the product of corresponding elements of two vectors. Burns v1. the process of adding code to a program that is already running. double quote 1) the character ".381 dot plot 1) Comp. dump 1) S. This is performed in S with dyn. This can be done in S via sum(u*v). dynamic load 1) Comp. an error action as in a core dump. 2) Stat. an ASCII representation of an S object created by dump or data. dynamic programming 1) Comp.

0 . an approach to estimating a statistical model in which there is some form of missing data. eigenvector See eigenvalue. environment variable children. to temporarily leave a program to give an operating system command. then the model likelhood is maximized given these values (the M step). a bug. it does not mean “mistake”. Most statistical models specify the distribution of the errors. that is. See dump (meaning 2) for what happens in S. The length of an atomic vector is the number of elements that it contains. will perform some action. see memory frame. an item of an atomic vector. Version 3 of S knows nothing about events. done in S with the ! form. after the command has been parsed. See also numerical error . then all of the eigenvalues are real-valued. executable 1) Comp. (adjective) the condition when a ﬁle can be executed. (noun) the ﬁle that executes a program. ellipsis 1) the set of characters “. S Poetry c 1998 Patrick J. 2) Comp. Burns v1. If A is real-valued and symmetric.382 GLOSSARY eigenvalue 1) Math. Convergence can be slow. evaluation 1) S. Sqpe is the executable of S. EM algorithm 1) Stat. the actions that happen when a program hits an error condition. When an equation of the form Ax = λx is satisﬁed. a random disturbance. non-technically. such as a mouse click or the cursor entering a certain region. a character that allows the following character (or set of characters) to have a meaning other than usual. Eigenvalues are sometimes referred to as “characteristic values”. escape 1) Comp.” error 1) Stat. also known as three-dots (which see). but the code can be easier to write than alternatives (if there are any). element 1) S. event 1) Comp. evaluation frame 1) S. . acronym for “End Of File.”. error handling 1) Comp. 1) Unix. and x is an eigenvector of A. the process of doing the actual computation. The expected value of the missing data is taken (the E step). variable that a process passes on to its EOF 1) Comp. 2) Comp. but version 4 does. In this sense of the word. 3) Comp. . In S the back-slash is the escape character in character strings. a condition that causes some program to stop execution—see error handling and warning. an action that happens on a computer. then λ is an eigenvalue of the square matrix A. 2) Comp. or the diﬀerence between the data and a model of the data. These two steps are iterated until convergence.

acronym for Frequently Asked Questions. lgamma(n+1) is usually more useful. expression 1) S. area of an ASCII ﬁle that is to be read as a unit. a similar concept in any database. ﬁgure 1) S. one or more ﬁelds comprise a record. but doesn’t know in which direction to go. and each record would correspond to a diﬀerent individual.” exponential notation 1) the same as scientiﬁc notation. 3) Stat. used non-technically to mean a command or part of a command. First Out”. extraction See subscript. ﬁeld 1) Comp.” factorial 1) Math. S Poetry c 1998 Patrick J. feature 1) Comp. all of which are surrounded by the outer margin. an object of the operating system that holds information (and often has a name). 4) Math. FILO 1) Comp. a variable in a statistical model that is used to approximate the response. Last Out”. 2) Stat. See also regular expression. a categorical variable in an experimental design (hence meaning 1). acronym for “First In.383 expected value 1) Stat. is the product of the integers from 1 to n. In S this is found with gamma(n+1). state of a derivative-based optimizer in which it knows that it has not found an optimum. class of an object that represents a categorical variable. Burns v1. ﬁle 1) Comp. species and weight. n-factorial. FIFO 1) Comp. written n!. factor 1) S. a stack. For example. a recursive mode in S that is the parsed form of one or more commands. false convergence 1) Comp. but the surest way to achieve it is to get the sign of the gradient wrong. the mean of a probability distribution. a factor analysis model. a (hypothetical) latent variable in. a class of object used in glm and gam that describes the link function and error distribution. 2) Comp. This can be caused by various forms of ill-conditioning. euphemism for bug. family 1) S. There can be one or more ﬁgures in a frame. a queue. for example. 2) Comp. FAQ 1) Comp.0 . conceptual area of a graphics frame that contains a plot area surrounded by a margin. acronym for “First In. there could be two ﬁelds. as in “an integer can be factored uniquely as the product of prime numbers. This is also known as a “predictor” or the confusing “independent variable. explanatory variable 1) Stat. Because the numbers quickly get large. relating to a product.

The computations are performed when a call to the function is evaluated. or very speciﬁc. the most common probability distribution in Statistics.0 . a computer representation of a real number composed of a fractional part. and hence numerical error will usually result during computations. as in “computer modern font”. a logical variable that signals a certain condition. an optional argument to a Unix command. garbage collection 1) Comp. Burns v1. frame 1) S. but would not be considered one by a functional language purist. there are (at least) three types of “frame” in S: memory frame (also called “evaluation frame”).names in c. data frame. ﬂoating-point number 1) Comp. a correspondence from one set into another set. class of object that results from an expression that uses the tilde operator. as in “10 point italic computer modern font. mode of an object that deﬁnes computations. the act of freeing up memory that is no longer being used. 2) Comp. 2) Math. S Poetry c 1998 Patrick J. functional language 1) Comp. For example the “t” in ls -t. Gaussian distribution 1) Stat. This is opposed to a ﬁle which divides the ﬁelds by a delimitor such as a semicolon. a style of ASCII ﬁle in which the ﬁelds are each a speciﬁed number of characters and in a speciﬁc location relative to the left border.” formula 1) S. has. ﬂoor 1) Math. in programming. a set of descriptions of how characters are to be represented. few numbers are represented exactly. a class of computer languages that avoids side eﬀects.rationalnum on page 251 is a ﬂag. Also known as “Normal” and “bell curve”. ﬂag 1) Unix. the condition of RAM in which there are numerous unused portions. Operators generally have special meanings within formulas. the largest integer less than or equal to a number. S is essentially a functional language. introduced by a dash. Compare ceiling. For example. font 1) Comp. ﬁxed-width font 1) Comp. This can be either rather general. making the total use of memory greater. Used to provide an argument of arbitrary complexity. function 1) S. fragmented memory 1) Comp. and graphics frame. Because the representation is ﬁnite. a font in which each character is the same width. called the mantissa and an exponent. called the characteristic.384 GLOSSARY ﬁxed format 1) Comp.

graphics frame 1) S. some create a window and draw graphics in the window. 2) Comp. gradient 1) Math. global variable 1) Comp. The error distribution can be one of several. The error distribution can be one of several. gigabyte 1) Comp. or that describes the state of the graphics device. string or logical value that controls some aspect of how a graph will look. The Motif ﬂavor of X-Windows is an example. a program that renders graphics in some form. 2) S. etc. genetic algorithm 1) Comp. and an outer margin. graphics parameter 1) S.385 generalized additive model 1) Stat. (or “graphics”) a pictorial representation of information. graphical user interface 1) Comp. generally consisting of windows. if the graphics are printed. a system of how human and computer communicate. an S function that starts such a program. 230 bytes. (in S terminology) a variable that is not created locally. same as graphics parameter. A particular type of graph is a tree. A frame contains one or more ﬁgures. For example. The class of the relevant argument(s) determines which method of the generic function is used. graphics device 1) S. graph 1) Math. a statistical model in which the response is approximated by a link function of the sum of nonlinear smooths of each explanatory variable. menus. Fit in S with glm. the “Choppy Water” chapter contains a number of gotchas. and is not passed in as an argument. Fit in S with gam. a statistical model in which the response is approximated by a link function of a linear combination of the explanatory variables. gotcha 1) Short for “got you”. a form of random algorithm used for optimization that imitates a natural selection process. 109 bytes. 2) Comp. a number. Burns v1. graphical parameter 1) S. vector which is the ﬁrst derivative of a multivariate function. generic function 1) S. The par function queries and changes graphics parameters. generalized linear model 1) Stat.0 . a function whose behavior is determined by the class of one or more of its arguments. In S you must start a graphics device before creating graphics. this is equivalent to a single page. S Poetry c 1998 Patrick J. The postscript device translates the graphics into PostScript and then sends it to a printer or a ﬁle. a collection of nodes (also called vertices) and edges which each connect two nodes.

a part of a program that is not easily changed. greedy algorithm 1) Comp. y ˆ = Hy . hard-coded 1) Comp. S Poetry c 1998 Patrick J.” hash table 1) Comp. the hat matrix times the observation vector equals the ﬁtted observations. 2) Comp. GUI 1) Comp. often speciﬁcally the MLE. hat matrix 1) Stat. a person who “hacks” (meaning 1). a kludge (meaning 1). but for lots of problems it will not.386 GLOSSARY gray scale 1) Comp. The operation performed by * on matrices in S (the %*% operator is used for matrix multiplication). hang 1) Comp. 3) Comp. to illegally enter (or attempt to enter) a computer. a collection of generic functions that may have one method written for all of them. element by element product of matrices. the computations take muxh longer than expected. For example. Used in Statistics to mean an estimate. slang for programmer. hacker 1) Comp. For some problems this will produce the true optimum. This provides fast access to any record for which the key is known. Another example is a concept in a program that is always used as a constant rather than abstracted into a variable. and the inverse of every element in the set is also in the set. hash symbol 1) the character “#”. 2) Comp. (not strictly an algorithm.” Hadamard product 1) Math. the identity for the operation is in the set.0 . as in “S hangs when I execute this command. It is mainly the diagonal of the hat matrix that is of interest in diagnostics. but more a description of some algorithms) an optimization algorithm that always does the best it can on each step. acronym for Graphical User Interface. group 1) S. also called “pound sign. Burns v1. hat 1) slang for the caret character. hack 1) Comp.Internal function in S is hard-coded since it can not be changed by users.ˆ . that is. said of a computer or program when nothing happens any more. the matrix (in linear regression) that puts the “hat” on the vector of observations. to discover how some undocumented program works by experimenting with it. 2) Math. the functionality performed in a .” This is probably caused by one of three things: it is not doing anything. as in “S hacker. a table of values in which each record has a key on which a mathematical operation is performed to give an address for the record. a set for which there is an associative operation. but does not look ahead. there is an inﬁnite loop. the rendering of values by means of diﬀerent shades of gray. This can be done in S-PLUS with image.

3) Stat. as are some matrices. For instance. “line. Also called “self-adjoint. referring to base 16.” For real-valued matrices.” idempotent 1) Math. a matrix that is equal to the transpose of its complex conjugate. 2) Math. this can be approximated using the Hessian at the optimum. two distributions are independent if information on one gives you no information about the other.h ﬁle. In maximum likelihood estimation. the term “independent variable ” is used to mean an explanatory variable in analogy with meaning 2. an element or component number. high-level graphics function 1) S.387 header ﬁle 1) C. Hessian 1) Math. index 1) S. histogram 1) Stat. The S.h” that contains deﬁnitions of general use—intended to be included in code. high-level graphics parameter 1) S. this is the same as symmetric. matrix which is the second derivative of a multivariate function. a ﬁle usually ending in “. 0 and 1 are idempotent. a form of abstraction in which a variable is created to avoid hard-coding something. creating a variable for the ﬁrst part of a path to a ﬁle. 4 is the index into x. indirection 1) Comp. which can be used for C code to be called by S. hexadecimal 1) Math. matrix that measures the variability of estimated parameters. help ﬁle 1) S. an object such that the product of itself with itself equals itself. but confusing because of meaning 1.integral has a hook for additional methods of integration. a graphics parameter that may not be given to par. In the S command x[4]. Mathematically this is expressed by the density of the joint distribution being equal to the product of the densities of the marginal distributions. a plot showing the distribution of a variable by means of boxes that show the count of observations in a number of adjacent intervals. The corresponding term in Unix is “man page”. Hermitian matrix 1) Math. and then creates a graph. is an example. independent 1) Stat. a feature of a program that allows some functionality to be easily added later if desired. hook 1) Comp. information matrix 1) Stat. As in. a function that makes a new graphics frame.0 . Created in S by hist. Burns v1. the on-line documentation on individual objects. an independent variable is in the domain of a function. S Poetry c 1998 Patrick J.

a language that is not compiled. also “I/O”. one pass through an iterative computation. performing an operation repetitively in a loop while some things change. interquartile range 1) Stat. a value known to be an integer. For example. acronym for the interquartile range. a multivariate probability distribution of all of the variables in question. a value that is not printed automatically. so that the interpretation into machine instructions is performed on the ﬂy. such as control-backslash. Compare to ﬂoating-point. . 2) Comp. K 1) Comp. interpreted language 1) Comp. Control-c is the usual key binding for the interrupt signal. and then asks for more input. S Poetry c 1998 Patrick J. ±2. 2) Comp. Opposed to batch. to place text in a certain alignment. S is an interactive language since it takes input from the user. one of the numbers 0. a signal that tells a program to quit doing what it is doing at the moment.388 GLOSSARY inheritance 1) Comp. IQR 1) Stat. . julian date 1) Comp. integer programming 1) Comp. justify 1) Comp. iteration 1) Comp. responds to it. sometimes a vector with storage mode integer. the feature of object-oriented programming in which a subclass of objects can use some of the methods for the main class. an integer that represents the number of days from a certain date. functionality related to information entering or leaving a program. integer 1) Math. a program that converses with the user. Sometimes “interrupt” is used in a less technical sense to include other signals as well. In S inheritance results when a class attribute has length greater than one. joint distribution 1) Stat. input/output 1) Comp. an optimization problem with (linear) constraints in which the elements of the answer must be integer-valued. Compare with marginal distribution. sometimes a ﬂoating-point number that is logically integer. ±1. slang for kilobyte.0 . left justify means to place along the left margin. . Compare compiled language. Compare to recursion. interrupt 1) Comp. invisible 1) S. 3) S. the distance between the ﬁrst quartile (25th percentile) and the third quartile (75th percentile) of a probability distribution or empirical sample. S is an interpreted language. interactive 1) Comp. Burns v1.

for example. S Poetry c 1998 Patrick J.” kilobyte 1) Comp. length 1) S.389 kernel 1) Unix. lazy evaluation 1) Comp. but has other statistical deﬁciencies and is a little more expensive to compute. least absolute deviations 1) Stat. Also known as LAD. available from netlib. a fundamental property of an S object. least sum of absolute errors. knap-sack problem 1) Math.0 . but it often fails to be statistically robust. strategy in which objects are not actually evaluated until necessary. natural language. the part that is the same no matter what shell is used. Useful because of its computational simplicity in many situations. LSAE. it’s sort of cute. a part of a graph that explains the meaning of the symbols used in the main part of the graph. LAPACK 1) Comp. The S-PLUS kronecker function generalizes the concept beyond multiplication. a speciﬁc class of optimization problem. language 1) S. 2) Stat. a piece of programming that is so ugly. the heart of the Unix system. principle of estimation in which the sum of the absolute value of the estimated errors is minimized. kludge 1) Comp. least squares 1) Stat. 1024 = 210 bytes. The prototypical problem is to maximize the value of what goes into your knap-sack while keeping the weight below a certain limit. interpreted language. a piece of programming that is ugly. 2) S. This typically overcomes some of the statistical robustness problems of least squares. 1000 bytes. the association of a key or key combination on the keyboard to a speciﬁc meaning such as “erase” or “interrupt. a generic function that creates labels for various objects. Burns v1. Kronecker product 1) Math. a product of two matrices in which the result has dimensions that are the products of the dimensions of the matrices being multiplied. See also functional language. an argument to factor and ordered that controls the value of the levels attribute of the output. legend 1) Comp. it states how many numbers the object contains. and its theoretical optimality given a Gaussian distribution. a package of Fortran routines for performing linear algebra. a function that produces the weights for a moving average-type (kernel) smooth. S uses lazy evaluation. principle of estimation where the sum of squares of the estimated errors is minimized. 2) Comp. L1 . key binding 1) Comp. In numeric vectors. 2) Comp. Some LAPACK symbols are in S-PLUS. an object that possesses one of several modes that correspond to the language itself rather than to data. compiled language. labels 1) S.

load 1) Comp. “3. Abbreviated as “LP”. 2) Stat. A user’s object code is linked against a library to get symbols that are deﬁned in the library. a set of compiled routines. to combine ﬁles of object code together so that symbols from one that are used in another are resolved. For example. 2) Comp. such as generalized additive models. in a generalized linear model. an optimization problem in which the objective function is a linear function of the variables. a ﬁle name that points to a ﬁle with another name. A bug of omission.1” is a literal for the number 3. the function of the linear combination of the explanatory variables that approximates the expectation of the response. likelihood 1) Stat. static load. See dynamic load. linear programming 1) Comp. the density of a statistical model thought of as a function of the parameters of the distribution with the data known. The labels argument to factor and ordered can be used to control the resulting levels attribute.390 GLOSSARY levels 1) S. as opposed to a variable. a constraint on vectors x that can be written as cx < a or cx ≤ a or cx = a. linked list 1) Comp. the value of the likelihood (previous meaning) or its logarithm when it is maximized. or to save space. linear constraint 1) Math. This can be done statically or dynamically.1.0 . library 1) S. something that a program might be expected to do but doesn’t. A function that performs a similar task in other models. link 1) Comp. This can be used for abstraction. A particularly simple and common set of linear constraints is a box constraint. attribute of a categorical object that is a character vector containing the possible values that the variable can take on. Specializations include doubly linked lists in which each item has a pointer to the previous item as well as the next item. in a language. limitation 1) Comp. link function 1) Stat. 2) Unix. and circular lists in which the ﬁnal item has a pointer to the ﬁrst item. The maximum likelihood estimate is the value of the parameters that maximizes this function. and there are linear constraints on the variables. a data structure in which each item in the list contains a pointer to the location of the next item. a directory containing S objects that is set up so that the library function can attach it. to make compiled routines available to a program by making sure that the symbol table of the program contains the routines. S Poetry c 1998 Patrick J. Burns v1. text that represents some speciﬁc thing. literal 1) Comp.

masking 1) S. Axis labels and titles are in the margin. macro 1) C. excludes data frames. 2) S. the conceptual part of a graph that surrounds the plot area but is within a ﬁgure. the univariate probability distribution of a variable (within a multivariate context). (This is the deﬁnition is. an object masks another object if they have the same name (and sometimes the same mode).) 3) S. See also outer margin. The order of bits for ASCII characters is the same on all computers. a table of values used in a computing strategy in which values are stored and retrieved rather than being computed each time. an object that has a dim attribute of length two and is also an array.) S Poetry c 1998 Patrick J. a probability distribution that has a higher probability of points far from the mean than a Gaussian distribution with the same variance. lookup table 1) Comp. but adds to a graph instead. Such distributions are problematic for least squares estimators. LP 1) Comp. The exponential part is called the characteristic. something that changes between types of computers. axis. abline. matrix 1) Math.0 . man page 1) Unix. long-tailed distribution 1) Stat.matrix uses. acronym for linear programming. Examples include points. Compare with joint distribution. Burns v1. an object that has a dim of length two. and the ﬁrst is earlier on the search list than the second. so ASCII is not machine dependent (which is the main point of ASCII). (This is the deﬁnition as. For example. something that is True or False.391 logical 1) Comp. includes data frames. mantissa 1) Comp. margin 1) S. the order of bits in a ﬂoating-point number is diﬀerent on some machines than on others. machine dependent 1) Comp. In S a third logical value is NA. an atomic mode. the fractional part of a number in a ﬂoating-point number. marginal distribution 1) Stat. mathematical graph 1) Math. a deﬁnition that is substituted into the code in the preprocessing. 2) S. See graph. On-line help for a Unix command. a function that does not switch to a new graphics frame. low-level graphics function 1) S.matrix uses. a rectangular display of numbers that can be treated as a single entity.

a device in interactive computing that provides a number of choices to the user from which the user is to select. mean 1) Stat. 1. the collection of values that are local to a particular function during an evaluation. See also not-a-number. a measure of the central location of a probability distribution. Burns v1. Also called “evaluation frame. acronym for Maximum Likelihood Estimate. slang for megabyte. This is often RAM. mode 1) S. storage of objects. method 1) S. or to least absolute deviations. it is the 50th percentile. for empirical data it is the sum of the observations divided by the number of observations. a principle of estimation which uses the most probable value of a set of parameters given the data and a statistical model. missing value 1) S. One way to estimate this for continuous distributions is to ﬁnd the maximum of a density estimate of the data.000.0 . at least half the data are bigger than or equal to the median. MLE 1) Stat. a value written NA that represents a value that is unknown. 2) Stat.000 bytes. For empirical data.” menu 1) Comp. In some circumstances.” megabyte 1) Comp. missing 1) S. memory frame 1) S. models exist such that the maximum likelihood estimator is equivalent to least squares. The median is not uniquely deﬁned for all datasets. a function that performs the computation of a generic function for a speciﬁc class. 220 bytes. but may be swap-space. memory 1) Comp. S Poetry c 1998 Patrick J. “My S function gobbled 55 megs. It is a least absolute deviation estimate for a single sample. a fundamental property of an S object which states what type of object it is. a measure of the central location of a probability distribution. median 1) Stat. etc.392 GLOSSARY maximum likelihood estimation 1) Stat. mode of an object that is an argument which was not included in the call. micro.1) a preﬁx meanining one-millionth (10−6 ). and at least half are smaller than or equal to it. As in. Almost every function call results in a new frame as it is evaluated. For a single sample it is a least squares estimate. 2) Comp. the most likely value of a probability distribution. meg 1) Comp. disk-space. For discrete distributions use table.

P. equivalent to the absolute value for real numbers. Box says. Also the function that retrieves or sets this attribute.393 model 1) Stat. NA 1) S. Not to be confused with the names attribute. Unix arranges for its processes to be multitasked. NaN 1) Comp. how missing values and not-a-numbers are printed.nan. an attribute of many objects that labels the elements or components of the object. a common way that not-a-number is printed. some models are useful. pertaining to more than one variable. as in “π seconds are a nanocentury. a hypothetical picture of how some particular set of data is generated. unary. a synonym for simulation. evaluation by means of a random mechanism. Basically.0 . In S the Mod function returns this. Examples are English. mode of an object that is a variable within an expression. E. a multivariate integral can be approximated by sampling uniformly over the integration space and averaging the function values of those points—this is Monte Carlo integration. G. Opposed to univariate. discipline in programming that allows pieces of the code to work independently of each other. nano.1) a preﬁx meaning one-billionth (10−9 ). Using the same resources for more than one job by a time-sharing mechanism. Monte Carlo 1) Stat. usually consisting of a functional description plus a probability distribution for the errors from the functional part. multivariate 1) Stat. For example. but includes bivariate. but can be distinguished from ordinary missing values with is. French and Northern Sahaptin. multitasking 1) Comp. a number which can be thought of as the remainder after a division. modulo 1) Math. The %% operator performs this in S. Burns v1. name 1) S. names 1) S. “All models are wrong.” modular programming 1) Comp.” natural language 1) a language that humans have developed to communicate with each other. S Poetry c 1998 Patrick J. modulus 1) Math. Opposed to computer language. monadic 1) Math. the distance of a complex number from the origin. In S they are printed NA.

a number which gives the “size” of something. the approximation of an integral by some means. numeric 1) S. S Poetry c 1998 Patrick J. newline 1) Comp.394 GLOSSARY netlib 1) Comp. the deﬁnition is a convoluted statement involving polynomial-time algorithms. and inﬁnity minus inﬁnity are examples. There are various norms for vectors. null 1) S. The world wide web address is http://www.nan to distinguish where not-a-numbers are. Often printed as NaN. It is diﬀerent than carriage return. it is an object. This contains the storage modes double. the diﬀerence between an actual result as computed compared to what the answer would be if computations were performed with numbers of inﬁnite precision. Also called “vertex”. one of a class of “numbers” that represents a limit that does not exist. Much of the software is numeric. node 1) Math. numerical integration 1) Comp. nonparametric 1) Stat. not-a-number 1) Comp. a means of estimation that does not depend on optimizing parameters.0 . almost everything in S is an object—if it has a mode. on-line repository of a large amount of software and other information. Burns v1. S has the function is. To be sure that you have the real solution is too hard in all but the smallest of problems. The practical eﬀect is that if you have a problem that is NP-complete. many smooths are nonparametric. Note that “error” here is not used in the sense of a bug. Not a trivial task since it involves dividing by approximately zero. but in S printed as NA. and for matrices. numerical error 1) Comp. NP-complete 1) Comp. approximating the derivative of a function at speciﬁc locations by performing arithmetic on function values. norm 1) Math. mode of the object of zero length that is printed NULL. numerical diﬀerentiation 1) Comp. single and integer. then the best you are likely to do is get a reasonable approximation to the solution.netlib. Also called “quadrature. 2) Stat. Complex numbers are excluded.” object 1) S. a character that means a new line of text is to be started. a distribution-free procedure.org/. Zero divided by zero. Represented in S and in C by backslash-n. atomic mode that represents real numbers. a part of a mathematical graph that is usually represented as a point.

but may also happen with ﬂoating-point numbers. overload 1) Comp. the coneptual area that is at the very edges of a graphics frame. Generic functions in S are overloaded. Typically the ﬁle name ends with . $ and ! (when it means “not”).0 . object-oriented programming 1) Comp. if the transpose of A is equal to the inverse of A. orthogonal vectors 1) Math. In S this is accomplished with generic functions. outlier 1) Stat.o in Unix. operator 1) S. a datapoint that is far enough from its predicted value in a statistical model that it strains the credibility of the model for that point. a style of software in which the action of some “message” (or function) depends on the type of object that receives the message. optional argument 1) Comp. pertaining to numbers in base 8. a function that does not need to use the usual format when called.395 object code 1) Comp. OOP 1) Comp. In S at least. two vectors are orthogonal if their dot product is zero. a square matrix A is orthogonal (also called AA=I that is. has non-zero area) when there is more than one ﬁgure in the frame. operating system 1) Comp. the action of making the same thing do more than one job. the compiled version of some code in a compiled language. S Poetry c 1998 Patrick J. then usually some default value is used for it. methods for generic functions and the class attribute. Burns v1. an argument to a function that need not be given. octal 1) Math. This is usually only used (that is. orthogonal matrix “unitary”) if 1) Math. Some call this “orthonormal” and allow an orthogonal matrix B to be such that B B is some diagonal matrix. Examples are +. When it isn’t given. the condition when an operation causes the answer to go out of bounds of the range of the number representation. but with integers the value is capricious. option 1) S. Unix and MacOS are operating systems. These are queried and set by the options function. one of several values that control some aspect of the S session. outer margin 1) S. overﬂow 1) Comp. software that controls the general operation of a computer. acronym for Object-Oriented Programming. This most commonly happens with integers since the range is relatively small. overﬂow with ﬂoating-point numbers is little problem since the value gets set to inﬁnity. See also margin.

paste 1) Comp. pivot 1) Comp. the process that caused the process in question to come into existence. There are n-factorial permutations of n distinguishable objects.396 GLOSSARY p-value 1) Stat. see swapping. It is similar to the bootstrap. non-technical term meaning the type of hardware. members of the owner’s group and users at large to read. Pentium machine running Linux.1) preﬁx meaning 10−12 . See also combination. permutation 1) Math. the probability that something more extreme than the test statsitic would be observed if the null hypothesis were true.0 . S Poetry c 1998 Patrick J. parent 1) S. pico. a command that uses the | operator to take the standard-out from the process on the left as the standard-in for the process on the right. a test of “no eﬀect” that is performed by permuting observations and comparing the real statistic with the distribution of statistics from the permuted data. but is limited in applicability. the memory frame that caused the memory frame in question to come into existence. platform 1) Comp. Can also be used in the broader sense that includes braces and brackets. one of numerous ways of writing a sparse or repetitious matrix in a small amount of space. see cut and paste. and possibly operating system. permutation test 1) Stat. packed 1) Comp. parenthesis 1) either of the characters “(” or “)”. 2) Unix. in hypothesis testing. the process of breaking a group of characters into parts with respect to the syntax of the language. but commonly the reordering of computations in order to ensure the computation is as numerically stable as possible. paging 1) Comp. For example. as in picosecond. 2) Both S and Unix have paste routines to combine text. a particular ordering of objects. there are various meanings. It has the advantages that it is valid under very broad assumptions and it is easy to compute. which are respectively an “opening parenthesis” and a “closing parenthesis”. permission 1) Unix. parse 1) Comp. A small p-value is evidence against the null hypothesis being true (if the conditions assumed for the calculation of the p-value are close enough to being true). pipe 1) Unix. Burns v1. write and execute the ﬁle. the ability of the owner of a ﬁle.

the portion of a graph where the actual data go. This is equivalent to all of the eigenvalues of A being non-negative. S Poetry c 1998 Patrick J. pound sign 1) The character “#”. posterior distribution 1) Stat. synonym for explanatory variable.397 plot area 1) S. the process of removing an item from a stack. the designation of points by angle and distance from the origin. principal components 1) Stat. but not C. pointer 1) C. graphics frame. a symmetric matrix A is positive deﬁnite if x Ax > 0 for all x. a process that changes the text of a source ﬁle before it is compiled. pop 1) Comp. preprocessor 1) C. PostScript 1) Comp. surrounded by the margin. See posterior distribution. prior distribution 1) Stat. the order in which calculations take place in an otherwise ambiguous context. to handle machine dependencies. for example. also called “hash symbol”. This is used. a multivariate technique that creates a set of new variables (the principal components) that are uncorrelated with each other and are ordered such that the variance of each principal component is no larger than the preceeding one. For example * takes precedence over + in S and in C. used for comments in many languages including S and Unix. polar coordinates 1) Math. a language that describes how text and graphics are to be rendered. predictor 1) Stat. an object that holds the address of the real object of interest. the distribution of the model parameters after taking the data into account. precedence 1) Comp. The pound sign is used to indicate preprocessor commands. the hypothesized distribution of the model parameters before the data are consulted. Compare to rectangular coordinates. positive deﬁnite 1) Math.0 . The ﬁrst principal component has as large of variance as possible. in Bayesian statistics. This is equivalent to all of the eigenvalues of A being positive. The posterior is a combination of the prior distribution and the likelihood. code that can easily be adapted to numerous computers. a symmetric matrix A is positive semideﬁnite if x Ax ≥ 0 for all x. See also ﬁgure. in Bayesian statistics. portable 1) Comp. Burns v1. positive semideﬁnite 1) Math.

The default prompt in S is “> ”. For example. a value such that a random variable is less than the value a given percent of the time. a data structure in which items are added at one end and removed from the other. acronym for quadratic programming. For empirical data there are a number of ways of calculating the quantiles which are slightly diﬀerent from each other. the process of adding an item to a stack. the least squares hat matrix is a projection matrix. a plot to compare an empirical distribution to a theoretical probability distribution (or another empirical distribution). Points in the tails are much more variable than those in the middle. quartile 1) Stat. a class of algorithms for converting a rectangular matrix into an orthogonal matrix times an upper triangular matrix. QQplot 1) Stat. Compare to stack. quadratic programming 1) Math. See also interquartile range. QP 1) Comp. Also called “FIFO” for “ﬁrst in. univariate numerical integration.0 . 50th or 75th percentile. If the points are along a straight line. projection pursuit 1) Stat. There are a few such techniques in use. Functions in S for this include qqnorm. push 1) Comp. projection 1) Math. a speciﬁc way of mapping a space into a smaller subspace. a multivariate technique that searches for “interesting” linear combinations of the variables. quadrature 1) Comp. the solution to a problem such as the minimum over all x of g x − . The 95 percent quantile of a standard Gaussian distribution is approximately 1. quantile 1) Stat. This is convenient and numerically stable for least squares regression calculations. a quantile that is at the 25th. pseudorandom number 1) Comp. qqplot and ppoints.398 GLOSSARY process 1) Unix. queue 1) Comp. The series is hoped to possess the most important empirical features of a truly random series of numbers. then the two distributions match (except possibly diﬀerent means and variances). prompt 1) Comp. Burns v1. characters used to indicate that an interactive program is ready for input from the user.64. one of a series of numbers generated by a mathematical function. executing a command starts one or more processes.5x H x subject to a set of linear constraints on x. S Poetry c 1998 Patrick J. the basic “being” in Unix. ﬁrst out”. QR decomposition 1) Comp. for example projection pursuit regression.

binary. an algorithm that uses (psuedo) random numbers. 2) Math. range 1) Stat. a language that is very similar to S—much S code can run properly in R.Random. the minimum and maximum of a dataset. there are 2π radians in a circle. acronym for random access memory. It is free software available via statlib. Examples are genetic algorithms and simulated annealing. the location where data needs to be in order to be accessed by a central processing unit. the spot in the ordered sample that an observation falls. Burns v1. while one that shows the history S Poetry c 1998 Patrick J. Examples include decimal. more generally (and accurately) a memory bank that can be accessed by address rather than sequentially. These are often used for distribution-free tests and statistically robust estimates. a monitor that shows current temperature is real-time. rank 1) Math. a data structure in which there is some number of dimensions (as in an array). RAM 1) Comp. unlike in a genetic algorithm or simulated annealing.seed. and hence often arrives at diﬀerent solutions on diﬀerent runs. For example. Computed in S with rank. random seed 1) Comp. In S the random seed is kept in the object . Computationally the rank is a much fuzzier concept due to numerical error—there is not a universally applicable algorithm to get the rank of a matrix.0 . set of values taken on by a function. or the distance between the two. used loosely to mean an algorithm that uses randomness. ragged array 1) Comp. In S all of the objects involved in a computation must be held in RAM. real-time 1) Comp. an S function that does not create a separate memory frame. also called “main memory”. R 1) Comp. These are discussed further on page 79. The trigonometric functions in S use radians. the maximum number of independent linear combinations that can be made with rows or with columns of a matrix. an object that holds the state of a pseudorandom number generator. the base of the representation of a number. radian 1) Math.399 quick call 1) S. In mathematics there is a deﬁnite rank to a matrix. but not all of the “cells” contain the same number of items. calculations that pertain to the situation at the moment that the calculations are available. random access memory 1) Comp. random algorithm 1) Comp. 2) Comp. 2) Stat. and was originally developed by Robert Gentleman and Ross Ihaka. 2) Comp. a unit of measure for angles. ties create fractional values. hexadecimal. radix 1) Math. but all of the randomness is independent.

Burns v1. This is also confusingly called “dependent variable”. a test of the quality of software that checks if a bug occurs that was previously found in the software. Examples of reserved words in S are return. a special memory location where access by the central processing unit is extremely fast. can have a component that is a list. but recursive algorithms tend to be ineﬃcient in terms of computing resources. This is using subscripting as an assignment function. regression tree 1) Stat. The time lag longer than which the computation is not “real-time” is application dependent. T. S Poetry c 1998 Patrick J. then an error results. record 1) Comp. for example. For example standard-out can be redirected to a ﬁle named my. ﬁt in S with tree. recursive object 1) S. regression test 1) Comp. the ls command. the pattern matching facility used by grep and several other commands. register 1) Comp. 2) It is certainly feasible that this phrase could appear in a statistical context. required argument 1) Comp. in a database. for example. redirect 1) Unix. a coordinate system in which the axes are linear and perpendicular to each other. an argument to a function that must be given in its calls. Some problems are easily solved with recursion. the use of subscripts on the left of an assignment to change the values in part of an object. recursion 1) Comp. the same as a classiﬁcation tree except that the response is continuous rather than categorical. reserved word 1) Comp. This is not the same as the wildcarding that Unix uses with. When it isn’t given.400 GLOSSARY of yesterday’s temperature is not. a variable that is approximated in a statistical model by a function of one or more explanatory variables. an object of one of several modes such as list and expression that can contain an object of the same mode. NULL. A list. a collection of ﬁelds constituting one observation.0 . response 1) Stat. Compare to polar coordinates. break. rectangular coordinates 1) Math.out. a computation that is self-referential—a recursive function calls itself. the action of changing how standard-in. standard-out and/or standard-error behave. A recursion can always be recast as an iterative process. a group of characters that can not be used as a variable name in a language because they mean something particular in the language. replacement 1) S. regular expression 1) Unix.

rug 1) Comp. root 1) Math. the process of dropping the least signiﬁcant digits at some particular location relative to the decimal point. as opposed to a vector or matrix. then it is not clear what to do and there is more than one convention—round to even. 3) Comp. S-news 1) a mailing list for discussion of and questions about S and S-PLUS. S Poetry c 1998 Patrick J. a single number. row-major order 1) Comp. For example. The opposite of unstable. RTFM 1) Comp.0 . Often this refers to code that is smart enough to retain as much precision as possible. For example. scalar 1) Math. code that works even given extreme conditions. run-time 1) Comp. a selection of observations from some population. sample 1) Stat. marks along an axis of a plot indicating the marginal distribution of the data along that axis. a value that makes a function zero.wustl. as opposed to when it is compiled. Usually used in the sense of a selection from a known population with a speciﬁc random mechanism. the mechanism to subscribe is to send email to: S-news-request@wubios. an algorithm that works even under adverse conditions. Compare side eﬀect. the value that is left in place of a function call after it is evaluated. robust 1) Stat. and round up are the two most common.401 return value 1) Comp. then the second row. Compare truncate. a node in a tree that is treated specially as the primary node. an estimate or hypothesis test that is relatively reliable even when (some) assumptions are violated. the order of elements in a matrix in which the ﬁrst row is ﬁlled.edu with the body of the message equal to: subscribe s-news You should get back conﬁrmation in an explanatory message. same as numerical error. which is called compile-time. Compare column-major ordering. etc. the time at which a program is used. 2) Comp. genetic algorithms are said to be robust because they work even with multiple local optima and if the objective is not diﬀerentiable. rounding 1) Math. If a 5 is the single digit to be dropped. 2) Math. rounding error 1) Comp. Burns v1. a suggestion to Read The Manual. the median is robust to outliers while the mean is not. The IMSL C routines use row-major ordering. At press time.

for objects. SIC 1) Stat.402 GLOSSARY scientiﬁc notation 1) Math. used as in “most signiﬁcant digits” or “least signiﬁcant digits” (or bits) to mean the digits representing the left-most or right-most places. sometimes denoted as BIC (but there is another slightly diﬀerent BIC also). selection 1) Comp. in order. a speciﬁc method of solving a linear programming problem. This. Burns v1. It is a “shell” because it covers the kernel which is unchangeable. the number of digits in a number that are not just random noise. like AIC. S (and many other programs) would write it as 1. script 1) Unix. this often suggests smaller models than does AIC. 2) Comp.1234 × 103 . search list 1) S. signiﬁcant digits 1) Stat. The semantics aﬀects the evaluation of an expression. The analogue in Unix is the path variable. semantics 1) Comp. an algorithm introduced by Nelder and Mead (1965) to solve general optimization problems without the assumption of diﬀerentiability. a special message given to a program to change what it should do. Compare syntax. shell 1) Unix. session 1) S. 2) Comp. the “Schwarz Information Criterion”. Examples are the Bourne shell and the C shell.234 × 102 or as . each step replaces one of the solutions with a new one until a local minimum is found. a program written in a shell language. see subscript. the collection of databases that an S session will search. depending on the convention used. seed 1) Comp. a number written as a number within a certain range times 10 to some power. is a way of combining the likelihood and the number of parameters so that non-nested models may be compared.0 . see random seed. the meaning attached to expressions in a language. An interrupt signal is an example.234e2. the program that controls the interaction with the user. The algorithm starts with one more solution vector than there are parameters. simplex algorithm 1) Comp. S Poetry c 1998 Patrick J.4 is written in scientiﬁc notation as 1. The number 123. side eﬀect 1) Comp. In practice. signal 1) Comp. any action of a program (function) except returning a value. a process that starts when S is started (interactively or in batch) and ends when S is exited. Compare return value.” used in S and in C to separate commands. semicolon 1) the character “.

object code. S Poetry c 1998 Patrick J. See also compile.1). computations done with single-precision numbers. stable 1) Comp. almost always on the same key as the double quote. source code 1) Comp. and the size of the random jumps tends to be smaller and smaller as the optimization progresses. single-precision 1) Comp. 2) Comp. a function that is a piece-wise polynomial of a certain degree. generally using 4 bytes. the process of making S objects from an ASCII representation of the objects. single quote 1) the character ’. the decomposition of a rectangular matrix into the product of an orthogonal matrix times a diagonal matrix times another orthogonal matrix. acronym for “Structured Querry Language”. but need not be. singular matrix 1) Math. not to be confused with the backquote. a class of optimization algorithm that randomly explores around the current best value. Used in S and in C to mean division. a non-linear model of data. 2) Stat. This usually refers to code for a compiled language. spline 1) Math. The analogy is to a material ﬁnding its minimum energy state as it cools. a number-like entity: an inﬁnity or not-a-number. a square matrix that is numerically not invertible. a procedure that is robust (meaning 2). This is done with the source function. 1) Comp. data. smooth 1) Stat. A competitor to genetic algorithms. The distinction between singular and non-singular is fuzzy with ﬁniteprecision numbers.403 simulated annealing 1) Comp.restore and redirection of standard-in. singular value decomposition 1) Math. often with the constraint that the function and a certain number of its derivatives are continuous. 2) Comp. sparse matrix special value 1) Math. the software as written in a language like C or Fortran that created a program. SQL 1) Comp. Burns v1. This is most commonly univariate. but more loosely may refer to use of restore. source 1) S. often created with a nonparametric method. a ﬂoating-point number. slash 1) the character “/”. but may also refer to code from an interpreted language like S. the lingua franca of database management programs. and in Unix within path names. a probability distribution that is equivalent to a standard Gaussian divided by a uniform(0.0 . a square matrix that is not invertible (has determinant zero). a matrix that contains a large number of zeros.

also called FILO for “ﬁrst in.0 . static load 1) Comp. S Poetry c 1998 Patrick J. standard error 1) Stat. but has found little practical use except for the Gaussian and Cauchy distributions which are special cases. and items are popped oﬀ of the stack. a family of distributions that is very interesting from a theoretical view. statlib 1) a repository of software and other information pertaining to Statistics. Usually this is the keyboard. In S this is done with the LOAD utility.cmu. deﬁned as the square root of the variance. Compare dynamic load. standard-out 1) Unix. a data structure in which items are added to and extracted from the same place. pop. storage mode 1) S. Usually this is the screen. stack 1) Comp. slang for the asterisk. standard-error 1) Unix. The possibilities are integer. the return value of a Unix program that indicates the error state—zero. An example is input from a keyboard. standard-in 1) Unix. Often used to mean multiplication. last out”. deﬁned as the square root of the variance. sub-types of mode numeric. means no error. any sorting method that ensures that. star 1) Comp. a sequence of characters. standard deviation 1) Stat. Compare queue. the addition of code (symbols) to the executable of a program when it is not running. in the case of ties in the variables used to sort. the place where error messages go.404 GLOSSARY stable distribution 1) Stat. stream 1) Comp. a new. Usually this is the screen. Accessible on the world wide web with http://lib.stat. but it has other meanings as well. The image is of a spring-loaded container for plates—new items are pushed onto the stack. double and single. larger executable is created. the place where information goes. the variability of an estimate. Burns v1. by convention. the character “*”.edu/. status 1) Unix. a conceptually unending series of pieces of information. observations within a tie retain their original order. It includes a large section of contributed software for S. See also push. the variability of a probability distribution. That is. the place where information comes from. string 1) Comp. stable sort 1) Comp.

tera.405 subscript 1) Comp. a table kept by a program that relates symbols to addresses. In this sense. symbol table. the rules governing how expressions are made in a language. symbol table 1) Comp. mark along the axis of a graph indicating a value for that location. swap-space 1) Comp. opposed to superscript. Compare semantics. an ambiguous concept on time-sharing computers since there is a diﬀerence between the total elapsed time from start to ﬁnish of a task (“wall time”) and the time actually expended on the task (“computer time”). lookup table. Burns v1.0 . such as simpliﬁcation and factoring. system terminating 1) S.” used to accept an arbitrary number of arguments in a function. as in terabytes of data.. mathematical computation. tilde 1) the character “˜” used in S to denote a formula. the process of switching information between RAM and swap-space so that all information required for the pending computation is in RAM. a character string that identiﬁes a compiled routine— often with underscores added to the routine’s name. the mark put at the location of a datapoint in a plot. tick 1) Comp. Counts as white space. without regard to what the expressions mean.. portion of disk-space reserved for overﬂow from RAM. table See contingency table. the extraction (or replacement) of part of an object. In S and C this is done with brackets.Fortran) has happened. symbol 1) Comp. S Poetry c 1998 Patrick J. swapping 1) Comp. Compare Hermitian matrix. a square matrix that is equal to its own transpose. symmetric matrix 1) Math.C or . where some of the “numbers” are variable names. 2) S. the action that S takes when it thinks something bad (usually inside calls to . 2) text that is written lower (and generally in a smaller font) than the regular text. the construct “. Also known as “paging”. tab 1) character written as backslash-t in S and C that represents some number of blank spaces. syntax 1) Comp. symbolic computation 1) Comp. The syntax aﬀects the parsing of an expression. S kills itself to make sure that no permanent data become corrupted. three-dots 1) S.1) preﬁx meaning 1012 (or 240 ). time 1) Comp.

unary 1) Math. univariate 1) Stat. tridiagonal matrix 1) Math. operation in which the rows of a matrix are interchanged with its columns. Generally. a situation involving a single distribution. this is of little consequence. lines. a matrix in which there are all zeros above the diagonal (lower triangular). underscore 1) the character “ ”. The main title is written at the top of the ﬁgure. the event in which a non-zero number becomes indistinguishable from zero due to ﬁnite-precision arithmetic. generally evenly spaced time points. a dataset where observations are at a sequence of time points.0 . trace 1) Math. system of graphics functions that focus on displaying multivariate data in a simple and comprehensible manner. the process of modifying functions so that they produce some information when called. truncate 1) Math. Compare to diagonal matrix. one of two high-level graphics parameters that label a graph. token 1) Comp. Also called “monadic”. a (connected) mathematical graph that contains no loops. trellis 1) S. for example “unary minus” to change the sign of a number. the sum of the elements on the diagonal of a matrix. used in S to mean assignment. 2) S. underﬂow 1) Comp. Compare to multivariate. type 1) S. an operation on a single entity. a high-level graphics parameter that speciﬁes how the data are to be represented in the graph. a relative of rounding. used in C within variable names. Burns v1. classiﬁcation tree. triangular matrix 1) Math. used for debugging. The titles appear in the margin. an element of a parsed expression. title 1) S. tree 1) Math. the sub title is at the bottom. regression tree. a matrix in which the only non-zero elements are on the diagonal and the diagonals immediately above and below the diagonal. or below the diagonal (upper triangular). Comare rounding. See also binary tree. Choices include points. transpose 1) Math.406 GLOSSARY time series 1) Stat. but the least signiﬁcant digits or bits are just dropped with no change to what is left. S Poetry c 1998 Patrick J. highdensity or none.

and is equivalent to an S matrix being treated as a simple vector. an ordered collection of numbers. a temporary ﬁx for a bug. vertex (plural vertices or vertexes. 2) S.vector use).vector and is. vectorization 1) S. as opposed to its side eﬀects. Confusingly. but merely issues a message.0 is based on Version 3 of S. the columns of a matrix stacked on top of each other to form a vector. workaround 1) Comp. a function is vectorized when the output can contain an arbitrary number of answers. dividing by a number close to zero is an unstable operation. For example.407 unstable 1) Comp. wall time 1) see time. S Poetry c 1998 Patrick J. an error-like condition that does not interrupt computation. word 1) Comp. swap-space. a method of avoiding a bug by going a diﬀerent route. wildcard 1) Unix. vector 1) S. computations are vectorized by some (super) computers. the version of S that has existed since the early 1990’s. a byte. a computation that can be inaccurate due to numerical error or some other phenomenon. warning 1) Comp. white space 1) Comp. Chambers and Hastie (1992) Statistical Models in S. 2) Comp. “tab”. any of the characters “space”. version 3 1) S. This is indicated by the “vec” operator. often used in the sense of what is returned by a function (return value). The opposite of robust (meaning 2). virtual memory 1) Comp. white book 1) S. utility 1) S. 4) Math. 2) Comp.) 1) Math. Compare to regular expression. Examples are BATCH and LOAD. and sometimes “newline”. Burns v1. version 4 1) S. an object with no attributes (the deﬁnition that as. S-PLUS Version 4. 3) S. an atomic object (common usage). an operating system command started by S. an object of one of many modes that allows an arbitrary length.0 . a part of a mathematical graph that is usually represented by a point—also called node. The style of speciﬁcation of multiple ﬁle names used by ls and some other commands. the version of S ﬁrst appearing in 1998 that allows the equal sign to mean assignment and numerous other changes. 3) Math. value 1) Comp.

408 working database GLOSSARY 1) S. (pronounced WIZZ-ee-Wig) an acronym for “What you see is what you get.0 . WYSIWYG 1) Comp. the database that is ﬁrst on the search list.” S Poetry c 1998 Patrick J. Burns v1.

270.First. 122. 178 specialsok argument: 175 .Machine: 8.local: 45 .: 371 algorithm: 371 alias: 371 alias (Unix): 158 . 155.Argument mode: 215 .Last: 44 . 274.Group: 225 .Help: 187.Cat.stack: 228 409 [. 139. 252 . 9 %e%: 207 %in%: 207 &: 95 &&: 95 BIG: 188 BIGIN: 188 nonfile: 187 abbreviate: 84 abbreviation: 6. 367 abs. 281. see unabbrev.print: 45.Help: 193 . 110.mathgraph: 303 [. 125 <<-: 26. M.value disallowed: 19 Abelson. 27.privateFirst: 45 : operator: 95.Audit: 160. 220 . 193 . 111 .Generic: 225. H.Method: 225 . 9. 21.Fortran: 30. 367 abind: 290 Abramowitz. 19. 32.Index ˜ : 291.C: 30. 208.Random. 267.First.Last.Auto. 194 .: 260. 220 .mathgraph: 310 aggregate: 371 AIC: 371 Akaike.stack: 229 [[: 9–10.Options: 219.verify: 57 [<-.rationalnum: 253 abscissa: 371 abstraction: 23–24.rationalnum: 250 [. 183 . 30.rationalnum: 247 .Internal: 18. 218. 138 = versus ==: 182 ? operator: 69. 203 diﬀerence with [: 10 with vector: 203. 70 [: 5–8.mathgraph: 304 [<-.seed: 57. 187. 187–188 . 239 . 258 pointers argument: 177. 190 . 139. see formula +. 352 $: 8. 365 add argument: 15 ADDKEYWORD: 193 address: 20. 10–13 with drop: 10. 276. 225 .value: 4 .Class: 225 . 304 [. 90. 371 adjacency matrix: 310 adjamat: 311 adjamat. Harold: 37.rationalnum: 246 -.First: 44.Data: 39.lib: 40 .

90. 223 background color: 373 backquote: 157. Burns v1. 356 batch: 373 S Poetry c 1998 Patrick J. 105.equal: 57. 36.double: 122 as. 355 assignment: 214.vector: 121. 75–77. 400 argument name: 27 capitalized: 27 same as function: 145 ARIMA: 114 arithmetic operators: 4 array: 372 generalization of row: 116 one-dimensional: 124 permuting: 87 ragged: 79 S: 3 subscripted as vector: 11 array function: 85 as. 280 aspect ratio: 372 assembly code: 51. H.name: 296 as.: 252 AUDIT: 194–195 audit: 187. 288. 126. W. 287.0 . 372 assign: 99–101.: 353.rationalnum: 251 INDEX as. 372 speed of: 58 assignment function: 202. 132–134 as. 325 apply: 18.numeric: 134 as. 372 attach: 372 attach: 39. 372 asymptotic: 372 at (Unix): 160 at sign: 372 atomic: 1. 201. 122 ASCII: 2. 451 Bak. 157. 324. 121. 96. changing: 235–242 BATCH: 138. 129. 372 create local object: 202. 93. 326. 372 AsciiToInt: 117.vars: 212 alldirected. 141 as.matrix: 122. J. 156 attributes: 3. 373 backsolve: 86 backup: 355 backward-compatibility: 31. 214. 122. 359 all. 220. 248.numeric. 373 backslash: 2. 228 association: 16. 373 bad address: 179. 323 approx: 109 apropos (Unix): 164 arbitrary limits: 30 arbitrary number of arguments: 19 argument: 371 arbitrary number of: 19 capturing as written: 102 creating: 352 matching: 19. 122. 188. 371 optional: 19. 367 bang: 373 barplot: 15 base.rationalnum: 246 as. 373 stopping: 160 augmented matrix: 63 autocorrelation: 114 awk (Unix): 164 axis: 373 label: 16 background: 373 background (Unix): 157.character: 98.410 all: 95 all. 395 required: 19. 372 attributes: 361 Auden. 279. 159.mathgraph: 306 alpha version: 371 alphanumeric: 371 ampersand: 371 analysis of variance: 114 and: 95 anonymous: 28 anova: 114 ANSI C: 168 any: 95 aov: 114 aperm: 87–89. 243.

180. 224 buﬀer: 375 bug: 23. 22. 301 building blocks: 25 bus error: 179 by: 80 byte: 375 411 c: 123. Gwendolyn: 200. 139. 103 BLAS: 374 blue book: 374 body: 221. 208 browser. checking: 188 Bates. 367 Becker. 374 combining: 352 bootstrap: 374 Bourne shell: 156. 223. 152. E. 191. 214. 178. 374 box: 374 box constraint: 109. 263. 149. 397 print: 182 printf: 180. 98. 56. 374 benchmark: 373 Bentley. 140– 149. 391 malloc: 175 missing value: 175 modulo operator: 169 mult complex: 264 na set3: 175. 233. 397 pointer to pointer: 173 portopt one Sp: 340 preprocessor: 171. 36. 374 bracket: 374 brain-dead: 375 break: 87.0 . 367 Beran. 136. William: 25. 266 S Poetry c 1998 Patrick J. 204 breathe: 149 Breiman. 21. 263 is na: 175. 182 PROBLEM: 176 RECOVER: 176 S.INDEX batch ﬁle. 59. Jon: 37.default: 138 browser: 138. Elizabeth: 138 bit: 374 bivariate: 374 black-box test: 52 Blake.array: 289 binning: 86 Bishop. Jan: 118. 352. 142. 208. 264 name: 167 norm rand: 176 operators: 169 optimization: 180 pointer: 168.mathgraph: 299. G. 380 ﬄush: 180 for: 170 header ﬁle: 171 inf set: 263 int: 179 integer division: 181 inverse complex: 264 is inf: 175.: 393 boxplot: 374 brace: 17. 130. 263 long: 179 macro: 175. Rick: 3. 374 Box. P. 185.h ﬁle: 171. 367 Brooks. 367. 35. Burns v1. 331. 375 ﬁrst-only: 53 intermittent: 151 one-oﬀ: 53 silly: 149 bug report: 150–151 build. 353 C language: 167–186 ANSI: 168 call S: 178 comma: 182 continue: 170 debugging: 179–180 declaration: 168. 367 beta test: 373 beta version: 373 bias: 373 binary ﬁle: 373 binary number: 374 binary tree: 374 bind. Doug: 118. 171 dereference: 168. 70. Leo: 118.

John: 3. Samuel Taylor: 356 collision: 376 column-major order: 376 combination: 376 comma in C: 182 in S: 206 command line editing: 42. 30. 356 call mode: 202. 233. 160 chol: 107. 118. 377 commontail: 309 commute: 377 compacting: 63. 92. 179 carriage return: 375 case-sensitive: 4. 27. 286 choleski: 107 Christiansen. 141. 367. V. 334. 70. 352. 377 comparison operator: 18. 377 component: 3. 129. 28. 375 history: 156 c.: 318. Burns v1. 377 COMPILE: 171. 93. 97 classiﬁcation tree: 115. Tom: 282. 376 character: 196 characteristic: 376 characteristic function: 376 charmatch: 94. 285. 96. 37. 25.mathgraph: 307. 319. 116 col generalized: 116 Coleridge. 247. 21. 296 call S (C): 178 canonical correlation: 375 capitalized argument name: 27 Capricious Rule 1: 23. 376 Cleveland. 22. 324 call quick: 399 call component: 102.h: 278 structure: 171 switch: 170 unif rand: 176 WARNING: 176 C shell: 156. 262. 376 to integer: 124 col: 86. 369 chull: 109 class: 376 class: 14. 407 chaos: 375 Chaplin. 377 S Poetry c 1998 Patrick J. 191 compile: 377 compile-time: 377 compiled language: 377 complex: 377 complex number: 2. 149.0 . 125. 215. 313 clustering: 376 codes: 133 coercion: 17–18. 189 cat (Unix): 159 categorization of functions: 70 category: 375 category: 117 CATHELP: 193 cbind: 258 ceiling: 375 central processing unit: 375 cex parameter: 15 Chachra. 374. 177 S realloc: 177 seed in: 176 seed out: 176 special value: 175 static: 173 stdout: 180 strings. 368 cleverness: 35. 367 Chambers.412 S alloc: 175. 178.rationalnum: 251 cache: 375 call: 375 combining: 353 call: 218. 376 comment: 35. 139. 95 child: 376 chmod (Unix): 158. Charlie: 138 INDEX CHAPTER: 191 character: 2. 308 c. William: 118. 375 cat: 47. 359. 377 multiline: 171 commitment: 214.

331. 377 conditional distribution: 377 conditioning plot: 378 confounded: 378 confusion matrix: 378 conjugate: 378 conjugate gradient: 327 consistency: 27–29. 226. 302 413 S Poetry c 1998 Patrick J.fraction: 269. 368 Day-Lewis. 368 data frame: 11.: 29 cumprod: 18 cumsum: 18 cursor: 379 cut: 86 cut and paste: 379 cylinder: 200 D: 108 Darnell.frame: 141.restore: 96 database: 39–40. 378 box: 109. 379 attaching: 40 containing matrix: 294 data hiding: 26 data. 374 linear: 390 penalty for: 335 constrast: 378 contingency table: 378 continuation: 47. 280 control argument: 334. Cecil: 135 dblepr (Fortran): 183 dbx: 139 debug: 379 debugger: 137. Burns v1. 96. 223. 191 data.INDEX condition number: 107. 408 database 0: 28. 148. 100.0 . 202. 254. 299. 129. 96. 379 decomposition: 379 default: 379 delay.dump: 49. 341 control key: 223 control operator: 95. 51. Alan: 68. 379 convention naming: 27 convergence false: 383 convex hull: 109 convolution: 109 coordinates polar: 397 rectangular: 400 correlation auto: 114 canonical: 375 cross: 114 CPU: 379 crash: 379 critique of S: 36 crontab (Unix): 160 cross-correlation: 114 cross-validation: 379 CSH: 195 cummax: 18 cummings. 365 constant: 378 constrained nonlinear regression: 115 constraint: 200. P. 379 working: 39. e. 270 continued fractions: 268–272 continuous variable: 378 contour: 109 contour integral: 320 contour plot: 378 control: 379 source: 48–51. 171. 378 continue (C): 170 continue option: 47 continue. 293 debugging: 135–152 example: 140–149 in C: 179–180 strategy: 149–150 decimal number: 379 declaration: 167.eval: 217 density: 380 deparse: 380 deparse: 98. 294 data. 168. 122. 102.class: 97 data.: 186. 210 date julian: 388 Davis. e.

262–268 digits signiﬁcant: 402 digits option: 46. 129 dump. 331 INDEX documentation: 34–36 Dostoevesky. 137 dyn. 137 dump. 202. 160 diﬀerencing: 87 diﬀerential equation: 109 diﬀerentiation: see derivative diﬀerentiation numerical: 394 diffmask: 43 diffsccs: 49. 225 dimnames: 3. Burns v1.load2: 24. 381 dynamic programming: 381 echo: 381 edge: 297. 286 eigenvalue: 382 eigenvector: 382 element: 1. 308. 51. 141 directory: 380 directory stack: 163 disk-space: 19 distribution: 380 conditional: 377 Gaussian: 384 joint: 388 long-tailed: 391 marginal: 391 posterior: 397 prior: 397 slash: 403 stable: 404 distribution function: 380 distribution-free test: 380 distrust: 27. 216.load: 24. 304 drop function: 33 dump: 381 error: 137 dump: 49. 280 digamma: 267 digamma function: 256. 380 deriv: 108 derivative: 380 symbolic: 108 detach: 40 deviance: 380 diag: 85 diag. 140.q: 44 double: 128 double negative: 345 double quote: 157. 381 eigen: 107. 94. 281. 96. 143 dim: 3. 172 dynamic graphics: 381 dynamic load: 172. 172 dyn. 32. 122.shared: 172 dyn. 53. Fyodor: 362 dot plot: 381 dot product: 381 dotFirst. 201 drop argument: 10. 381 editing: 135 eﬀect: 381 eﬃciency: 71. 380 Dickinson. 381 double-precision: 381 reading: 91 dput: 43.414 dependent: 380 deque: 380 dereference (C): 168.load. Emily: 94 diff: 87 diﬀ (Unix): 43. 51. 129. 124. 382 EOF: 382 erase key: 162 error: 382 dump: 137 S Poetry c 1998 Patrick J.calls: 47.default: 217 diagnostics: 380 diagonal matrix: 286.frames: 47.0 . 382 ellipsis: 382 EM algorithm: 382 emacs editor: 42 empty string: 196 environment (Unix): 153 environment variable (Unix): 153– 155.call: 103. 318 do.

299 evaluation: 206. see memory frame event: 382 exclusive or: 96 EXEC: 195 executable: 158.0 .jsa20sum: 336 fjjartshow: 105 fjjartshow2: 106 fjjcheckdigam: 267 fjjcheckdigamcom: 267 fjjcheckexpint: 270 fjjcheckexpintcomp: 271 fjjchecknumberbase: 241 fjjcheckpolygam: 260 fjjcheckrationalnum: 253 fjjcheckratop: 254 fjjcomma: 62 fjjcylinder: 198–200 fjjdigamdup: 268 fjjdigamreflect: 268 fjjexp. 218 numerical: 124. 137. 383 factor: 117 factor analysis: 114 factorial: 383 415 false convergence: 383 family: 383 FAQ: 383 fast Fourier transform: 109 feature: 383 undocumented: 106 ﬄush (C): 180 fft: 109 ﬁeld: 90. 301 find. see subscripting factanal: 114 factor: 130–134. 215.objects: 116. 216 escape: 153.integral: 272 expand: 356 expand. 57. 210 ﬁnd (Unix): 161 find. 217. 98.twomat.of: 280. 218. 98.lagrange: 323 fjjline. 210 exp: 209 exp. The: 27 ﬁle: 153.integral: 270 fjjgenop1: 327 fjjgentest1: 329 fjjinterpolator. 348. 206 Euclid’s algorithm: 244 Euler’s constant: 267 eval: 57. 382 exists: 24. 207 ﬁrst-only bug: 53 fix: 138 ﬁxed format: 91. 34. 254. 242. 394 silly: 149 rounding: 401 error option: 47. 324 expressions option: 240 extraction: 383. 382 in string: 2 key: 42 to Unix: 126. 383 Figure a Poem Makes. 217.grid: 254 expand. 116. 384 ﬁxed-width font: 384 fjj. 383 permission: 160 filetest: 162. 383 FIFO: 383 ﬁgure: 16. 352 FILO: 383. 389 evaluation frame: 382. 103.I.integral: 319 S Poetry c 1998 Patrick J.reduced: 359 expected value: 383 expense minimizing: 36 explanatory variable: 383 exponential integral: 270 exponential notation: 383 expression: 383 regular: 400 expression: 202.assign: 211 find.INDEX handling: 382 ignoring: 215 message: 33. Burns v1. see stack filter: 18 find: 43. 382 lazy: 213. 156.

Robert: 179 Friedman. 146–148 get. 368 Golub. 210. 170. 210. 61.0 .default: 149 getenv: 45 getpath.control: 334 Gentleman. 208. 385 genopt: 331. 291–318. 352 pedestrian: 81 function mode: 202 functional language: 213. 240 Gaussian distribution: 384 Gay.base10: 238 Frost. 384 manipulating: 296 Fortran: 182–185 dblepr: 183 implicit none: 183 input/output: 182 intpr: 183 realpr: 183 xerror: 184 xerrwv: 184 foul urges: 149 Fourier transform: 109 fragmented memory: 61.adjamat: 314 getpath. 348– 353 categorization of: 70 length of: 202. Theodor: 206 generalized additive model: 115. 129. 98. 368 gotcha: 385 gradient: 109. 369 ﬂesh is grass: 28 ﬂoating-point number: 384 ﬂoor: 384 font: 384 ﬁxed-width: 384 fools and liars: 35 for: 18. Jerome: 118. 335 genopt. see data frame evaluation: see memory frame graphics: 15.: 270. 75. David: 338. 99. 65–68.incidmat: 314 Ghare. 367 from. 365 function: 384 arguments: 19. 127–129. 62. M. 271. 367 gigabyte: 385 glm: 114 global variable: 26.: 353. 385 generating random number in C: 176 generic function: 13. 352 INDEX as return value: 322–326. P. see memory frame frame 0: see database 0 frame 1: 100. 385 global. 353. 384 gam: 115 gambol: 224 garbage collection: 384 garbage compacting: 63 garbage test: 53. 212.: 318. 384 frame: 384 data: 379. Robert: 27.416 fjjlog2: 220 fjjmiss1: 26 fjjmiss2: 26 fjjpolygammult: 261 fjjrowmin: 12 fjjtwomat: 357 ﬂag: 384 Flannery. 385 graph: 385 mathematical: 297–317 graphical user interface: 385 S Poetry c 1998 Patrick J. 31. D.vars: 212 Goldberg. Robert: 399 get: 43. 274. Gene: 290. 156. E. Burns v1. 62. 368 Geisel. 204 for (C): 170 format: 47. 62. 385 generalized linear model: 114. 223. 220 frame function: 15 Francis. 278 formula: 203. 385 memory: 392. B. 385 argument of: 128 debugging: 139 group: 225 internal: 225 operator: 224 genetic algorithm: 327–338. 93.

387 installing: 192 keyword: 193 multiple objects in: 192 writing: 34–35 help. 95. 387 Heiberger. 391 graphics parameter: 15. R.findsum: 193 help. 228.error: 215 ignored object warning: 136 Ihaka. 369 grep (Unix): 159 grep function: 83 group: 386 of generic functions: 225 GUI: 386 hack: 386 hacker: 155. 385 high-level: 387 gray scale: 386 great. 407 hat: 386 hat matrix: 386 head (Unix): 159 header ﬁle (C): 171. 118. 273 greedy algorithm: 386 Gregory. Rich: 290 Heller. 227.: 353.INDEX graphics: 15–16. 385 graphics frame: 15.mathgraph: 312 increment (C): 167 independent: 387 index: 387 indirection: 24. 368. 368 HOME: 153 hook: 387 horse who bit: 28 hostname (Unix): 188 Housman. 96. 233. 388 inherits: 102 input/output: 388 S Poetry c 1998 Patrick J. E. 301 I/O: 388 idempotent: 387 identify: 113 if unnesting: 208 if: 17. 226. 105. 387 high-level graphics parameter: 387 Hildebrand. 214. 70 help ﬁle: 49. 304. John: 353. Ross: 399 image: 109 immediate argument: 99. Joseph: 363 help: 69. 386 Hastie. 274.: 273. 138 hubris: 25. 387 Inf: 2 inf set (C): 263 information matrix: 387 inheritance: 14. 386 hash symbol: 386 hash table: 47. 368 hang: 386 hard-coded: 31.0 .: 53. 113 graphics device: 15. 387 low-level: 15. Burns v1.: 274. 367. F. 100. A. R. 96 ignore. 306. 104. 103.div: 244 greatest common divisor: 245. 386 Hadamard product: 386 Hamming. 30 hypothesis tests: 113 I: 296. 204 ifelse: 95. 385 graphics function high-level: 15. 368 HINSTALL: 192 histogram: 387 history: 194 hobgoblin of small minds: 27 Holland. 359 impatience: 25 implicit none (Fortran): 183 incidmat. B. Trevor: 21.common.start: 69 Hermitian matrix: 387 Hessian: 387 hexadecimal: 387 417 high-level graphics function: 15. 334.

104 kernel: 389 Kernighan. 389 least squares: 389 legend: 389 length: 1. 309 last. 279. 263 iter.na. Donald: 233. 368 key binding: 389 keyword in help ﬁle: 193 kill (Unix): 159 kilobyte: 389 kludge: 389 knap-sack problem: 335.max argument: 28 iteration: 388 ivp. 146. Robinson: 71 INDEX Jennison. 217. 159.array: 122 is.nan. 107. 368 jjtest3: 344 joint distribution: 388 julian date: 388 justify: 278–279. 335.integer: 124. 388 intpr (Fortran): 183 inverse of a matrix: 86 inverse complex (C): 264 invisible: 388 invisible: 45. H.0 .rationalnum: 249 is. 389 compiled: 377 functional: 213. 263 is na (C): 175. 273. 127 is. 245.nan: 2 is. Burns v1. 223. Chris: 329. 220 IQR: 388 is.dump: 137 Lawrence. 384 interpreted: 388 modes: 201 natural: 393 LAPACK: 389 lapply: 14. 295. 147 int (C): 179 integer: 125. 388 justify: 278 K: 388 keep option: 47. 27.ab: 109 Jeﬀers. 389 of a function: 202. 389 Knuth. Brian: 37.lagrange: 325 interpreted language: 388 interquartile range: 388 interrupt: 223 interrupt signal: 138.numeric: 125 is. 321 integration numerical: 107. 18. 389 ld (Unix): 172 leaps: 30 least absolute deviations: 114.rationalnum: 250 is. 125 is. 75.max: 240 integrate: 107. 92. 394 interactive: 388 interactive function: 138 interlude: 59. 122 is inf (C): 175. 352 S Poetry c 1998 Patrick J. 322–327 interpolator. 220 intermittent bug: 151 internal generic function: 225 interp: 109 interpolation: 109. 388 integer: 128 integer division (C): 181 integer programming: 388 integer. 389 labels: 389 labels: 132 lag: 126 Lagrange interpolation: 322–327 language: 1.matrix: 122 is. D. 274. 77. 368 kronecker product: 109. 78. 186. 319–322.loaded: 129 is.vector: 121. 243.: 101 laziness: 25 lazy evaluation: 213. 291.na: 97.418 Fortran: 182 inspect: 139.

Mike: 49.: 186. see vectorization. P.0 . 45.loc: 24. 404 loan: 226. 125. Dan: 186. 137.integral: 320.rationalnum: 252 mathematical graphs: 297–317 Mathews. 381 static: 172. 40 LICENSE: 195 likelihood: 390 limit arbitrary: 30 limitation: 390 line. 395 marginal distribution: 391 Margolis. 231 list: 3. 191 load: 390 dynamic: 172. 212 match. 369 low-level graphics function: 15. 96 logistic regression: 114 long (C): 179 long-tailed distribution: 391 lookup table: 391 looping: 18–19. 391 subscripting with: 7 logical operator: 95. 391 Magarshack.tri: 86 LP: 391 machine dependent: 391 macro (C): 175. 391 match: 32. 285–391 adjacency: 310 augmenting: 63 confusion: 378 S Poetry c 1998 Patrick J. John: 131 make (Unix): 160. David: 362 Maindonald. Burns v1. 161. 391 lower case: 276 lower. 391 outer: 16. 258 length: 128.fit. 193. 368 mathgraph: 297 matrix: 107. 122. 135–137. 71–73. 164. 368 Marlowe.qr: 61 ln (Unix): 160 LOAD: 172. 215 Math group: 225 Math.rationalnum: 249 levels: 390 levels: 132 lgamma: 256 lib.INDEX zero: 3. 40 Libes. Christopher: 19 MASKED: 193 masked: 136. 195–200. 227 local regression: 115 local.mathgraph: 303 length. 322 linear constraint: 390 linear programming: 390 lines: 15 link: 390 link (Unix): 160 link function: 114. 368 library: 390 library: 24.call: 102.names: 84 malloc (C): 175 man (Unix): 159 man page: 391 mandatory space: 127 mantissa: 391 margin: 16. John: 353. 193 masking: 39. 219 log10: 261 419 logical: 2. 129 length option: 47 length.Sqpe: 191 locator: 113 loess: 115 log: 209. 128 literal: 390 lm. see search list attaching: 40 combining: 123 component of: 3 list: 3. 191 make. 390 linked list: 390 lint (Unix): 180 Lisp: 37. see for speed of: 18 Loukides. see apply.

Peter: 118. splendiferous: 25 mixed eﬀects model: 114 MLE: 392 mnb0 (Fortran): 341 mode: 1. 62.: 369. 243. 142 matrix function: 85 max: 97. 199 missing value: 258. 98. 211 model: 393 modular programming: 25. 208–211. 129. 356–360 virtual: 407 memory frame: 137. 200 while: 204 mode: 97. 201–205.Argument: 215 “brace”: 221. Burns v1. 384 in loops: 65 inﬁnite: 361 monitoring: 60. 98 minimizing expense: 36 INDEX missing: 392 missing function: 26. 392 menu: 392 menu: 106 merge. 392 . 226. 296 next: 204 repeat: 204 return: 204 unknown: 199. 214– 215. J. 321 maximum likelihood estimation: 392 McCullagh. 393 modulo: 393 modulo operator: 169 modulus: 393 monadic: 393 Monte Carlo: 393 Moore. 392 fragmented: 61. 296 categories of: 201 expression: 202 for: 204 function: 202 if: 204 list: 3 missing: 130.: 318. 367 motif: 15 ms: 109 mult complex (C): 264 multiple S sessions: 103 multitasking: 393 multivariate: 393 multivariate analysis: 114 NA: 2. 402 mean: 392 median: 392 meg: 392 megabyte: 392 memory: 20. 324 break: 204 call: 202. 218. 17. 204. 126. 91. 392 argument of: 128 methods function: 70 mfcol parameter: 15 micro-: 392 min: 97. 406 tridiagonal: 406 matrix class: 32. 392. 199 name: 202. 369 Mead.0 . R. 63 reducing: 60–68. 294–296.420 diagonal: 286. 217. 285. 127 na set (C): 263 S Poetry c 1998 Patrick J. M. 199 missing mode: 130. 228. see NA in C: 175 mistake. 380 hat: 386 Hermitian: 387 in C: 173 in data frame: 294 information: 387 inverting: 86 orthogonal: 395 S: 3 singular: 403 sparse: 403 square root: 285 symmetric: 405 triangular: 107.levels: 116 method: 14. Philip: 204 Mead.

223–227. 377 control: 95.mathgraph: 303 names. J. 164. 252 nlmin: 109. 367 Newton-Raphson: 327 next: 170 NextMethod: 226.rationalnum: 248 names<-. 296 names: 121 names: 3 names as character: 99 names. 338 nlminb: 109. 96 user deﬁned: 207 Ops group: 225 optimization: 327–353 optimization of C code: 180 optimize: 109 option: 45–48. 143 echo: 93 error: 47. 196. 393 as character: 98 conventions for: 27 in C: 167 invalid: 98. 236. 369. Richard: 118. 395 continue: 47 digits: 46. 264 name: 40–44. 223 one-oﬀ bug: 53 OOP: 395. John: 118.exit: 43. 379 generic: 224 in C: 169 logical: 95. 394 O’Reilly. 367 on. Burns v1. 216 expressions: 240 keep: 99 keep: 47. see inheritance object.INDEX na set3 (C): 175. 319–322. 104 S Poetry c 1998 Patrick J. 369 object: 394 object code: 395 object-oriented programming: 13– 15. 394 Newman. 394 numerical integration: 107. 201 natural language: 393 nchar: 82. 204. 278 Nelder. 393.0 . 46 objects: 1–4 objects: 41. 395. 99. 220. 42 objects. see object-oriented programming operating system: 395 operator: 395 comparison: 18. 137.size function: 46 object. 338 nlregb: 115 nls: 115 NM: 173. 29. 101. see generic function. 241 numeric: 394 numerical diﬀerentiation: 394 numerical error: 124. 402 netlib: 394 newline: 2. 277 name mode: 202. D.rationalnum: 248 NaN: 2. 24. 192 nm (Unix): 192 node: 297. 125. 197 null: 394 421 numberbase: 235. see NA nano-: 393 napsack: 335 nargs: 43.: 353. 96.size option: 20.mathgraph: 303 names<-. 394 nonlinear regression: 115 nonparametric: 394 norm: 394 norm rand (C): 176 not operator: 126 not-a-number: 394. 242. see NA NP-complete: 394 nroﬀ (Unix): 193 NULL: 2.summary: 42 octal: 395 Oedipus algorithm: 327 Olshen. Tim: 49. 202 of function: 30 valid: 4.

stemplate: 348 positive deﬁnite: 397 S Poetry c 1998 Patrick J. 106. 258 pmax: 97. 348 portoptgen. 74. 299.: 37.replace: 73 p. 396 random: 112 INDEX permutation test: 396 permuting an array: 87 persp: 109 phone book code: 24 pi: 90 pico-: 396 pie: 113 pie chart: 113 pigment data: 207 pipe (Unix): 157. 369 polygamma: 256 polygamma functions: 256–262 polygon: 106 polynomial: 109 polyroot: 30.C: 177. 206.load: 24 poet. 125 or: 95 exclusive: 96 order: 81. 278. 125. 98. 325 outer margin: 16. 113 plot area: 16. 109 pop: 228. 89. 74.unix.mathgraph: 317 pmatch: 94. 47. 301.0 . J. 397 pointer to pointer (C): 173 pointers argument to .size: 46 prompt: 47 warn: 46. 254. 369 penalty for constraint: 335 Perl: 164. 107 poet. Tony: 290 platform: 396 Plauger. Jerry: 49.verif: 55 pointer (C): 168. 178 points: 15 polar coordinates: 397 Polya. 82. 98. 103. George: 37. 220 width: 47 optional argument: 19. P. 164. 48. 341 portopt one Sp (C): 340 portoptgen: 338.restore: 191 poet. 302 PATH: 158 pedestrian functions: 81 Peek. 338–353 portable: 397 portopt. 395 outlier: 395 overﬂow: 395 overload: 395 p-value: 396 p. 95. 261. 344. 82.ctemplate: 348 portoptgen. 321 pmin: 97. 288 ordered: 117 organizing objects: 44–45 orthogonal matrix: 395 orthogonal vectors: 395 outer: 73. 324 password: 155 paste: 396 paste: 45. 275–278 perl: 275 permission: 396 ﬁle: 160 permutation: 89. Burns v1.422 length: 47 object. 347. 396 pivot: 396 places searched for object: 210 Plate. 397 plot. 137. 270.data. 396 parse: 57. 188.control: 341 portopt1: 338.time: 41 package: 1 packed: 396 paging: 396 par: 15. 323. 98. 395 options: 45.dyn. 397 popd (Unix): 163 PORT: 109. 345. 101 parent: 396 parent of frame: 210 parenthesis: 396 parse: 135. 368 plot: 15. 48.

398 parent: 153 product dot: 381 Hadamard: 386 kronecker: 109. 182 prior distribution: 397 private view: 14 prmatrix: 142 probability distribution: 111 PROBLEM (C): 176 proc.rationalnum: 245 print. 93.default: 143 print.numberbase: 240 print. 397 table of: 16 predictor: 397 preprocessor (C): 171.: 270. 399 RAM: 19. 399 random access memory: 399 random algorithm: 399 random number generating in C: 176 generator: 110 random permutation: 112 random seed: 399 range: 399 rank: 399 rank: 59 S Poetry c 1998 Patrick J. 92. H. 399 radian: 399 radix: 399 changing: 235–242 ragged array: 79.0 . 398 quick call: 208. 193 prompt option: 47 prose documentation: 34 ps (Unix): 159 pseudorandom number: 398. 148. 20. 369 prim9: 20 prime numbers: 109 primes: 109 principal components: 114. 145. 319–322.INDEX positive semideﬁnite: 397 posterior distribution: 397 PostScript: 397 postscript: 15 pound sign: 397 Pound.matrix: 141–143 print. 403 R: 231–232. 398 quantile: 398 quartile: 398 questions on S-news: 151 queue: 231. 274. 381 single: 2. 399 quote double: 2.time: 223 process (Unix): 153. Burns v1. 144. 398 pushd (Unix): 163 QP: 398 QQplot: 398 qr: 107 QR decomposition: 398 quad. 210.form: 174 quadratic form: 174 quadratic programming: 398 quadrature: 107. see random number public view: 14 push: 229. 389 proﬁling time use: 59 progexam library: 109 programming steps of: 365 projection: 398 423 projection pursuit: 398 projection pursuit regression: 115 prompt: 398 prompt function: 34. 271. 220 print (C): 182 print method writing: 92 print. Ezra: 81 ppois: 112 precedence: 16–17. 139. 397 Press.stack: 230 printf (C): 180. W. 157. 157.mathgraph: 300 print. 397 princomp: 114 print: 47.

36. 238–240. 127.table: 91 reading data: 90 readline: 105. 400 reduce: 356 reduce. see subscripting replication: 4. John Crowe: 275 rational numbers: 242–256 rationalnum: 242. Dennis: 186. 177 S FIRST: 155.0 . 309 Ripley. 400 reserved word: 87. 180. 365 Reed. 128 root: 109.424 Ransom. Edwin Arlington: 31 robust: 401 statistical: 113 Roethke. 401 row: 86. 400 RECOVER (C): 176 rectangular coordinates: 400 recursion: 211.h ﬁle: 171. 266 S alloc (C): 175. 30 repeat: 204 replacement: 400. 118. 95 rep: 29. 191. 106 real-time: 399 realpr (Fortran): 183 Recall: 211. 157. Henry: 19 register: 400 regression: 114 constrained: 114 constrained nonlinear: 115 least absolute deviation: 114 least trimmed squares: 114 local: 115 logistic: 114 M-estimation: 114 nonlinear: 115 projection pursuit: 115 regression test: 52. 252 rbind: 140 RCS: 48 read. 400 resid: 224 INDEX residuals: 224 response: 400 restart: 215 restore: 97 return: 204 return value: 401 rev: 82. 190 S PATH: 154 S realloc (C): 177 S SILENT STARTUP: 155.get: 92 S Poetry c 1998 Patrick J. 369 Ritchie. 248. Brian: 21. 30. 116 row generalized: 116 row-major order: 401 RTFM: 401 rug: 401 rules of style: 37 run-length encoding: 87 run-time: 401 S-mode: 42 S-news: 401 questions to: 151 S.rationalnum: 243. Theodore: 3. 368 rle: 87 rm: 95 rmatsqrt: 286 rnorm: 111 Robinson. Burns v1.twomat: 357 redundancy: 23. 401 rounding error: 124. 190 S WORK: 154 s wsFe: 183 safe programming: 31–33 sample: 401 sample: 112 sapply: 78 SAS: 92 sas. 44. 29 REPORT: 189–191 required argument: 19. 400 regular expression: 400 remove: 42. 400 recursive modes: 201 recursive object: 400 redirection: 97. 157. 401 rounding: 124. 255 reduce. 185. 400 regression tree: 115. 239 record: 90.

apply: 288 stack: 228–231.mathgraph: 310 source: 403 source: 55. 356 signal: 223. 80 SQL: 403 Sqpe: 188 square root symmetric: 285 stable distribution: 404 stable sort: 404 stable. 369 splendiferous mistake: 25 spline: 403 split: 75. A. 368 shell: 402 Bourne: 156. 369 sed (Unix): 164 seed: 402 random: 399 seed in (C): 176 seed out (C): 176 selection: 402 semantics: 402 semicolon: 167 seq: 95. 178. F. 374 C: 156. 126 sort: 81 sort stable: 404 sort. Stevie: 117 smooth: 403 smooth: 115 solve: 86 soptions: 45. 189 SIC: 402 side eﬀect: 402 admonishment against: 26. 402 interrupt: 138. 369 scientiﬁc notation: 402 script: 402 search: 39 search for an object: 210 search list: 39. 53. 403 single-precision: 403 singular matrix: 403 singular value decomposition: 107. Percy Bysshe: 41 Shere: 44 SHOME: 158. 118.: 118.index: 116 small minds hobgoblin of: 27 Smith. 128 session: 402 session database: 210.seed: 111 Shakespeare. 404 S Poetry c 1998 Patrick J. Burns v1. 402 not on: 99 seasonal decomposition: 114 Seber. see database 0 set operations: 116 set. 210. 91 SCCS: 48 sccs: 49. 94 slash: 403 slice. 159 signiﬁcant digits: 402 silly error: 149 simplex algorithm: 402 425 simplicity: 25 Simpson’s method: 320 simulated annealing: 403 sin: 323 single quote: 157. 403 sink: 93. 154.INDEX scalar: 401 scan: 90. 375 shell (Unix): 156 shell variable: 155 Shelley. 335. Phil: 21. 403 C: 175 specialsok argument: 175 Spector.0 .list: 117 sort. William: 58 shar (Unix): 164 Sheehan. 188. Nuala: 329. 280 space mandatory: 127 sparse matrix: 403 special value: 266. Randal: 282. G. 192 Schwartz. 96 source code: 403 source control: 48–51.

122. 137. 367 Sussman. 125 stream: 404 string: 404 empty: 196 strings. 324 substring: 82. 295 sys. 98. 140. Z. 12. 404 standard-out: 156. 75 Swenson. 137. 301 INDEX sum: 17 summary: 330 Summary group: 225 summary. Charles: 118. 276. 254. 405 swapping: 405 sweep: 74. 280 with negative numbers: 6 with positive numbers: 5 with [[: 9 substifile: 277. 278. 295 system terminating: 56. 80 Taylor. Burns v1. 404 star: 404 state.: 260.For: 129 symbolic computation: 405 symbolic derivative: 108 symbolic link (Unix): 160 symmetric matrix: 405 symmetric square root: 285 symsqrt: 285 synchronize: 103–105.parent: 214.h ﬁle: 278 structure: 201 structure (C): 171 stty (Unix): 161 style. 405 tab: 405 table contingency: 378 hash: 103. 352 substitute: 102. 404 storage. 208 switch (C): 170 symbol: 173. 404 standard-in: 156. Wallace: 196 Stone. 348. 301.call: 214 sys. Todd: 132 tek4105: 15 S Poetry c 1998 Patrick J. May: 324 switch: 129.426 directory: 163 stack: 228 standard deviation: 404 standard error: 404 standard-error: 157. 139.interlude: 60 surprise: 27 Sussman. 220.frame: 214.region: 75 static (C): 173 static load: 172. 217.mode: 97. 274.nframe: 214. Julie: 37. 270. 311 state. 367 stop: 33. Gerald: 37. 122. 206 sys.0 . 405 symbol. rules of: 37 subscript: 405 subscripting: 5–13 drop argument: 10. 179. 304 empty: 12 of arrays: 10 of matrices: 32 with arrays: 11 with characters: 6 with logicals: 7.C: 129 symbol.address: 342 symbol. 218 storage mode: 179. 139 syntax: 405 syntax error: 135. 302 sys. 295. 405 table: 86. 281. 386 symbol: 172. 404 statlib: 404 status: 404 stdout (C): 180 Stegun. 330 tail (Unix): 159 tapply: 79. I. 170. 405 duplicate: 173 symbol table: 172. 267. 367 steps of programming: 365 Stevens. 32. 122. 367 svd: 107 swap-space: 19.name: 75.

rationalnum: 253 uniroot: 109 univariate: 406 Unix: 153–164 alias: 158 apropos: 164 at: 160 awk: 164 background: 157 Bourne shell: 156 C shell: 156 cat: 159 chmod: 158. 141. 124 tennis: 138 tera-: 405 terminology. Rob: 118. 139. 406 time use: 58–60. 224. 285. 147 traceback: 137.sub: 240 token: 406 top-down programming: 30 topic: 117 trace: 406 trace: 60. 291. 321 unable to obtain requested dynamic memory: 20 unary: 406 unclass: 303. 142. 405 throw away ﬁrst version: 31 Tibshirani. 145. 251. 160 crontab: 160 diﬀ: 43. Burns v1. 240 regression: 52 white-box: 52 test (Unix): 162 test suite: 51–58 tetragamma: 256 Teukolsky. 208 transcribe: 276. 278.value: 218. A. Henry David: 4 three-dots: 19.: 270. 160 S Poetry c 1998 Patrick J.0 . 406 triangular matrix: 107. S. 369 TEX: 189. 190 text: see character justify: 278–279 Thackeray.base10.mathgraph: 309 unique. William Makepeace: 360 Theorem 1: 361 think globally: 30 safety: 31–33 Thomas. 405. 310 underﬂow: 406 underscore: 406 understand the code: 149 undocumented feature: 106 unif rand (C): 176 uninterlude: 60 unique. 406 tridiagonal matrix: 406 trigamma: 256 troﬀ (Unix): 189 TRUNC AUDIT: 194 truncate: 406 truncating numbers: 124 twice: 362 twice-two: 362 umask (Unix): 160 unabbrev. 219 timing: 58 title: 406 graphics: 16 tmp: 28 to. table comparing: 173 terms: 291 test black-box: 52 garbage: 53.INDEX tempfile: 43. 348 427 transpose: 406 trapezoid method: 320 tree: 406 binary: 374 classiﬁcation: 376 regression: 400 tree: 115 trellis: 113. Dylan: 161 Thoreau. 138. 331. 277. 274. 76–77.base10: 236 to. 368 tick: 405 tick label: 16 tilde: 203. 144. see formula time: 405 time series: 114. 271.

Benjamin: 121 whence: 210. 121. William: 21. S. Donald: 118. 369 vi (Unix): 42. 367 Wharf. 369 warn option: 137 warn option: 46.nan: 97 while: 204. 398 ps: 159 pushd: 163 sed: 164 shar: 164 shell: 156 stty: 161 tail: 159 test: 162 troﬀ: 189 umask: 160 vi: 42.: 270.inf: 97 which. 219.name: 279 value: 407 special: 403 Van Loan. 14. 269.0 .time: 41. 396 popd: 163 process: 153. Charles: 290. 118. 407 vertex: 297. 161.loan: 226 upper case: 276 urges foul: 149 UseMethod: 223.s.na: 97 which. J. 407 INDEX valid. 382 ﬁnd: 161 grep: 159 head: 159 hostname: 188 kill: 159 ld: 172 lint: 180 ln: 160 make: 160. 325. 34. T. H. 407 Venables. 161 whereis: 161 which: 161 unix function: 84. 278 unix.: 274. Larry: 282. 236 white book: 407 white space: 407 white-box test: 52 width option: 47 Wild. 368 varcomp: 114 variance components: 114 vector: 1. 161 virtual memory: 407 virtues: 25 Wagoner. 320.shell function: 84 unix. 101 unpaste: 299 unstable: 407 update. W. 224 utility: 188–195. 220 warning: 407 coerce to error: 137 warning: 98 WARNING (C): 176 warnings: 98 Watts. 58.: 118. 369 Wall. 191 man: 159 nm: 192 nroﬀ: 193 parent process: 153 pipe: 157. 274. 57 version: 57 version 3: 13. 28.428 environment: 153 environment variable: 153–155. C. 407 version 4: 13. 125. 271. David: 195 Wall. 28. 228 whereis (Unix): 161 which (Unix): 161 which. 280. 220 unknown mode: 199. 275. 200 unlink: 43. Burns v1. 407 vector function: 128 vectorization: 4–5. 14. 369 verify: 51–53. 407 Vetterling. 369 S Poetry c 1998 Patrick J.

summary: 184 xerrwv (Fortran): 184 xor: 96 Yeats. William Butler: 149. 407 working database: 39.: 273. 408 write: 93 write. Louis: 167 429 S Poetry c 1998 Patrick J. 374 Wodehouse. 258 Zukofsky.: 152 word: 407 reserved: 400 workaround: 36. Burns v1. 202. G. 369 zapsmall: 116 zero: 109 zero length object: 3. P. D. 233. 21. Alan: 3. 125.INDEX wildcard: 407 Wilks.0 . 242 Young. 274. 70. 178. 367. 195–200.table: 93 wrong answer: 33 WYSIWIG: 408 xerror (Fortran): 184 xerror.

- Panduan Durusul Lughah Al Arabiyah 1amn
- 01_9727
- PEMBETUKANFIILMAR
- Panduan Durusul Lughah Al Arabiyah 1amn
- Mohammad Jahoorul Islam 2013 9
- Mohammad Jahoorul Islam 2013 9
- Four Books expert
- photoshoot
- photo shoot
- Materials Handbook
- Custom Act
- Dod Ethics Fail
- Collected Poems, Dylan Thomas - Dylan Thomas
- Pregnancy
- Public policy
- encyclkopedia of economy
- Sufi Mystical Poetry
- BAH_hsp_bm_f1
- Turn Down the Heat Why a 4 Degree Centrigrade Warmer World Must Be Avoided
- Jenkins Coc
- Kaspersky Global It-security-risks-survey Report Eng Final
- CreativityAndCognition09 Fp392 Andre
- Scholte
- WDR 2012 Web Small

programing sceinces

programing sceinces

- L2 - Basic C Elements_4
- 09templa
- C Basics
- JSP Syntax
- python_en_toc.pdf
- C++ Essentials
- Expressions_White_Paper
- Variable Length Codes
- Micro Manual LISP
- Opengles31 Quick Reference Card
- Scilab Computation
- BCA Syllabus - 1st and 2nd Semester - CBCS
- Oberon07.Report
- Gate Syllabus
- BASH_Programming_Part_1.pdf
- V Rajaraman Computer Oriented Numerical Methods
- class-1b
- Scalable Key Cryptography
- Syllabus MCA 12-15
- MCA MG Syllabus 1st Sem
- DLmaxBitRateSF32.txt
- GATE Syllabus
- 10.1.1.16.3037
- Employing Reverse Polish Notation in Encryption
- Usr Guide
- C Notes Complete
- c Programming
- Hecataeus Reference
- latex algorithms package
- Quick Guide for Complex Variable Toolbox

Are you sure?

This action might not be possible to undo. Are you sure you want to continue?

We've moved you to where you read on your other device.

Get the full title to continue

Get the full title to continue listening from where you left off, or restart the preview.

scribd