Introductory Guide to S-Plus

Final Version
B.D. Ripley, Professor of Applied Statistics, University of Oxford
24 August 1994

Preface
This guide was originally written for graduate students in Statistics at the University of Oxford. The first versions were based closely on notes by Dr. Bill Venables of the Department of Statistics at the University of Adelaide, but have been updated to reflect later versions of S, the extensions of S-Plus and local facilities. Several sections, in particular 4, 6 and 11, remain close to Dr. Venables’ original material.

This guide will no longer be updated, following the publication of Venables & Ripley (1994). [See p. 1. Where that takes a significantly better approach than earlier editions of these notes, the material formerly here has been dropped.]

The guide is to S-Plus, but much of it will be relevant to users of the underlying S. Extensions which are only in S-Plus include dynamic graphics (Section 6.3, brush and spin) and the classical statistics functions (Section 9). The terminology of this guide is intended to be precise, referring to S-Plus rather than S only for features unique to S-Plus.

These notes were written for a particular environment, S-Plus 3.2 on Sun SparcStations running the Open Windows windowing system. You will find a number of differences depending on your local environment.

It will help to have the accompanying library available; it should be in the same source as these notes. It can also be obtained by anonymous ftp (163.1.20.1) or from the library sources described in Section A.2. Alternatively, the library from Venables & Ripley (1994) can be used.

This guide may be freely copied and redistributed for any educational purpose (including commercial courses) provided its authorship (B.D. Ripley and W.N. Venables) is clearly stated. Where appropriate, a small charge to cover the costs of production and distribution, only, may be made. B.D. Ripley, University of Oxford, 24th August, 1994.


Contents

1 Introduction
    1.1 Starting and Finishing
    1.2 Getting Help
    1.3 Hardcopy Output
2 Datasets
3 A First Session
4 Simple Data Manipulation
    4.1 Vectors
    4.2 Vector Arithmetic
    4.3 Generating Regular Sequences of Numbers
    4.4 Logical Vectors. Missing Values
    4.5 Character Vectors
    4.6 Index Vectors. Selecting and Modifying Subsets of a Data Set
    4.7 Arrays
    4.8 Lists
    4.9 Data Frames
5 Reading data into S
    5.1 Writing out data
6 Graphics
    6.1 Some Basic Plotting Functions
    6.2 Interaction with Plots
    6.3 Brush and Spin
    6.4 Equally-scaled plots
    6.5 Graphical Parameters
7 Statistical Summaries
    7.1 Arithmetical Summaries
    7.2 Histograms and Stem-and-Leaf Plots
    7.3 Boxplots
8 Distributions
    8.1 Q-Q Plots
9 Classical Statistics
10 Handling Categorical Data
    10.1 The Function tapply and Ragged Arrays
11 Loops and Conditional Execution
12 Writing Your Own Functions
13 Statistical Models
    13.1 Model Formulas
    13.2 One-way Layouts
    13.3 Designed Experiments
    13.4 Generalized Linear Models
    13.5 Updating and Selecting Models
14 Multivariate Analysis

Appendix
A Libraries
    A.1 Library
    A.2 Sources of Libraries

1 Introduction
S is a statistical language developed at AT&T’s Bell Laboratories. S-Plus is a binary distribution of S, with added functions, produced by the StatSci Division of MathSoft in Seattle. The S system was radically re-designed in the 1988 release and known as ‘New S’. In August 1991 a new release of what is once again called S consisted of a moderate revision of ‘New S’ together with far-ranging extensions. S-Plus 3.0 was introduced in late 1991, based on that release of S, with numerous additional features. S-Plus 3.1 was released at the very end of 1992, and S-Plus 3.2 in very early 1994.

The main references are:

R.A. Becker, J.M. Chambers and A.R. Wilks (1988) The New S Language. Wadsworth & Brooks/Cole.

J.M. Chambers and T.J. Hastie (1992) Statistical Models in S. Wadsworth & Brooks/Cole.

It is not the intention of this guide to replace the books. Rather these notes are intended as a brief introduction to the capabilities of the S programming language and to how to perform some common statistical procedures within S. Users of S-Plus will need to consult both books, probably frequently. Both books contain some reference documentation, but the on-line versions (see Section 1.2) are later and definitive.

There are also manuals for S-Plus itself, whose organization differs from release to release. Other books include

W.N. Venables and B.D. Ripley (1994) Modern Applied Statistics with S-Plus. New York: Springer. ISBN 0-387-94350-1.

which goes far beyond the coverage of this guide, including many topics (such as robust statistics, non-linear regressions, modern regression, survival analysis, tree-based models, time series and spatial statistics) not covered here, as well as going into greater depth on what is covered.

1.1 Starting and Finishing
To start S-Plus, type the command

    Splus

After a short while (and, the first time, an initialization message) you get the S-Plus prompt > (which can be changed, but the default is assumed here). This is waiting for input from you.

Technically S is a function language with a very simple syntax. Like most Unix based packages it is case sensitive, so A and a are different variables. Elementary commands consist of either expressions or assignments. If an expression is given as a command, it is evaluated, printed, and the value is discarded. (In fact it is kept in the hidden variable .Last.value and so can be retrieved from the ‘bin’.) An assignment also evaluates an expression and passes the value to a variable but the result is not printed automatically. An expression can be as simple as 2 + 3 or a complex function call. Assignments are indicated by the assignment operator <- or _. (As the first needs two keystrokes, lazy typists use the second. However, the first is easier to read.) For example, typing 2 + 3 prints [1] 5. The [1] states that the answer is starting at the first element of a vector.

Commands are separated either by a semi-colon, ;, or by a newline. If a command is not complete at the end of a line, S will give a different prompt, namely + on second and subsequent lines and continue to read input until the command is syntactically complete.

S can be extended by writing new functions, which then can be used exactly as built-in functions (and can even replace them). How to write your own functions is covered in Section 12.

1.2 Getting Help
S has an inbuilt help facility similar to the man facility of Unix. To get more information on any specific named function or dataset, give its name as the argument of help, in the form help(name).

Help uses a window which overlays your main window. The pager accepts a number of options, including the space bar for the next page and q to quit. (Other useful options go to the top and go back a page.) If you prefer, a separate help window (which can be left up) can be obtained by an extra argument to help. Another way to get help is by ?name.

Short help is given by the function args.

S-Plus also has a window-based help facility, started by the function help.start. Click with the left mouse button on items to select categories and items. The help window can be left up, or removed by the function help.off.

For a feature specified by special characters, and in a few other cases, the argument must be enclosed in double quotes, making it a ‘character string’, for example help("[[").

It is not advisable to quit S-Plus windows from the frame menu.

1.3 Hardcopy Output
Graphics are printed by holding down the right button on the menu in a graphics window (see Section 6) and releasing over the print item. This will print on the nearest laser printer (or that selected by your printer environment variable).

To record a session, cut-and-paste to a textedit window, then remove your mistakes (if any) and save as a Unix file.

2 Datasets
Datasets are stored in a directory .Data. They are permanent, so all the objects you create are retained until explicitly deleted. (As the directory name begins with a dot it will normally be hidden in file listings from Unix by ls.) If there is a directory .Data in the current directory when S is invoked, that directory is used rather than the default one. This provides one way to organize your S work, using separate directories for each project.

In S, to get a list of names of the objects currently defined use the command

    objects()

Your own functions are also stored in .Data. To find out whether an object is a function or dataset, and what is in it, just type its name at the prompt. This prints out the function or dataset. In the later versions of S it may print a short summary of the object. To get the full details, use

    print.default(object)

When S looks for an object, it searches in turn through a sequence of directories known as the search list. Usually the first entry in the search list is the .Data sub-directory of the current working directory. The names of the directories currently on the search list can be found by the function

    search()

The names of the objects held in any directory on the search list can be displayed by giving the objects function an argument. For example objects(2) lists the contents of the second directory in the search list. Normally the second, third and fourth directories are built-in functions, and the fifth, sixth and seventh contain standard datasets. Extra search directories can be added to this list with the attach function and removed with the detach function, details of which can be found in the manuals or the help facility. Note that attached directories are searched after the .Data directory in the order last attached to first attached.

To remove objects permanently the function rm is available, for example

    rm(x, y, z)

The function remove can be used to remove objects with non-standard names.

Warning
Objects in your .Data directory will take precedence over system objects of the same name. This is a frequent cause of rather obscure errors, and can cause apparently correct behaviour but erroneous results. Avoid using names such as c, s, t or tree for your own objects. If you get peculiar errors, clean up your .Data directory and try again!

S keeps a record of commands in the file .Audit in the .Data directory. This is a hidden file and can grow rather large. Use (from the Unix command line)

    Splus TRUNC_AUDIT 0

occasionally to clean out the audit file entirely (or omit the 0 to keep the last 0.5Mb).
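The object-management commands of this section can be sketched as follows. This is only an illustration: the object name tmp.demo is hypothetical, and the listing printed by objects() and search() will depend on your own directories.

```r
tmp.demo <- c(1, 2, 3)      # create a (hypothetical) object in the working directory
objects()                   # names of the objects currently defined
search()                    # the directories on the search list
exists("tmp.demo")          # the object can be found
rm(tmp.demo)                # remove it permanently
exists("tmp.demo")          # it is no longer found
```

Removing scratch objects like this as you go is one way to avoid the name clashes described in the warning above.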

3 A First Session
The sample session given below is intended to show by example some of the capabilities of the system. Work through the session given by the commands on the left of the page. Some clues as to what is going on are given at the right hand side of the page.

Start the session.
Open the graphics window.
Add a library of functions and datasets (use q to quit).
Print out a data frame of the trees data,
so that we can use names diam etc.
Histogram as counts,
as probability density.
Stem-and-leaf plot.
Scatter plot.
Linear regression,
summary of fit,
analysis of variance table,
plot line on scatter plot.
Move mouse to plot and click with left button to see what height is. Click middle button to quit.
Set up 1 row, 2 cols for plots:
plots of fitted values and residuals vs fitted value.
One plot again.
Normal probability plot of residuals
and of Studentized residuals,
line through quartiles.
All pair-wise scatter plots.
Rotate points in 3D, select and de-select points, then click to end.
Multiple regression. Try functions as before,
to avoid any confusion.
Set up a matrix.
Look at pattern of all three. Use mouse to highlight points and check their identity, then click to finish.
Find the ‘odd’ states.
Finish session.
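A minimal sketch of the regression steps annotated in the session (linear regression, summary of fit, analysis of variance table). The data frame trees.demo and its values are invented for illustration and are not the trees data used in the session; only the standard modelling functions lm, summary, anova and coef are assumed.

```r
# Hypothetical stand-in for the trees data: the values are invented for illustration
trees.demo <- data.frame(diam   = c(8, 10, 12, 14, 16),
                         height = c(65, 70, 76, 80, 87))

fit <- lm(height ~ diam, data = trees.demo)   # linear regression
summary(fit)                                  # summary of fit
anova(fit)                                    # analysis of variance table
coef(fit)                                     # intercept and slope of the fitted line
```

The plotting steps of the session are left out of the sketch because they depend on the graphics device in use.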


4 Simple Data Manipulation
The basic data objects in S are vectors, arrays, lists and data frames.

4.1 Vectors
S operates on named data structures. The simplest such structure is the vector, which is a single entity consisting of an ordered collection of numbers. To set up a vector named x, say, consisting of five numbers, namely 10.4, 5.6, 3.1, 6.4 and 21.7, use the S command

    x <- c(10.4, 5.6, 3.1, 6.4, 21.7)

This is an assignment statement using the function c, taking an arbitrary number of vector arguments and whose value is the vector of its arguments. A number occurring by itself in an expression is taken as a vector of length one. Assignments can also be made in the other direction, using the obvious change in the assignment operator. So the same assignment could be made using

    c(10.4, 5.6, 3.1, 6.4, 21.7) -> x

If an expression is used as a complete command, the value is printed and lost. So now if we were to use the command

    1/x

the reciprocals of the five values would be printed (and, of course, the value of x would be unchanged). The further assignment

    y <- c(x, 0, x)

creates a vector y with 11 entries, consisting of two copies of x with a zero in the middle.

4.2 Vector Arithmetic
Vectors can be used in arithmetic expressions, in which case the operations are performed element-by-element. Vectors occurring in the same expression need not all be of the same length. If they are not, the value of the expression is a vector with the same length as the longest vector which occurs in the expression. Shorter vectors in the expression are recycled as often as need be (perhaps fractionally) until they match the length of the longest vector. In particular a constant is simply repeated. So with the above assignments the command

    v <- 2*x + y + 1

generates a new vector v of length 11 constructed by adding together, element-by-element, 2*x repeated 2.2 times, y repeated just once, and 1 repeated 11 times.

The elementary arithmetic operators are the usual +, -, *, / and ^ for raising to a power. In addition all of the common arithmetic functions are available. log, exp, sin, cos, tan, sqrt, and so on, all have their usual meaning. max and min select the largest and smallest elements of a vector respectively. range is a function whose value is a vector of length two, namely c(min(x), max(x)). The element-by-element maximum and minimum of two or more vectors are given by pmax and pmin. length(x) is the number of elements in x, sum(x) gives the total of the elements in x and prod(x) their product.
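The recycling rule and the summary functions just listed can be sketched as follows. The helper vector z and the result w are hypothetical names; z is chosen with length 10 so that x is recycled a whole number of times.

```r
x <- c(10.4, 5.6, 3.1, 6.4, 21.7)   # the vector from Section 4.1
z <- c(x, x)                        # hypothetical helper: two copies of x, length 10
w <- 2*x + z + 1                    # 2*x is recycled twice, the constant 1 ten times
length(w)                           # 10, the length of the longest operand

range(x)                            # the same as c(min(x), max(x))
pmax(x, 6)                          # element-by-element maximum against a constant
sum(x)                              # total of the elements
prod(x)                             # product of the elements
```

Note that pmax works along the whole vector, whereas max(x, 6) would return a single number.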

Two statistical functions are mean(x), which evaluates to sum(x)/length(x), and var(x), which gives the value sum((x-mean(x))^2)/(length(x)-1), the sample variance. If the argument to var is an n x p matrix the value is a p x p sample covariance matrix obtained from regarding the rows as independent p-variate sample vectors.

sort(x) returns a vector of the same size as x with the elements arranged in increasing order. Other, more flexible, sorting facilities are available (see order, which produces a permutation to do the sorting, and sort.list).

4.3 Generating Regular Sequences of Numbers
S has a number of facilities for generating commonly used sequences of numbers. For example 1:30 is the vector c(1, 2, ..., 29, 30). The colon operator has highest priority within an expression, so, for example 2*1:15 is the vector c(2, 4, 6, ..., 28, 30). Put n <- 10 and compare the sequences 1:n-1 and 1:(n-1). The construction 30:1 may be used to generate a backwards sequence.

The function seq is a more general facility for generating sequences. It has five arguments, only some of which may be specified in any one call. The first two arguments, if given, specify the beginning and end of the sequence, and if these are the only two arguments given the result is the same as the colon operator. That is, seq(2, 10) is the same vector as 2:10.

Parameters to seq, and to many other S functions, can also be given in named form, in which case the order in which they appear is irrelevant. The first two parameters may be named from=value and to=value; thus seq(1, 30), seq(from=1, to=30) and seq(to=30, from=1) are all the same as 1:30. The next two parameters to seq may be named by=value and length=value, which specify a step size and a length for the sequence respectively. If neither of these is given, the default by=1 is assumed. For example

    s3 <- seq(-5, 5, by=0.2)

generates in s3 the vector c(-5.0, -4.8, -4.6, ..., 4.6, 4.8, 5.0). Similarly

    s4 <- seq(length=51, from=-5, by=0.2)

generates the same vector in s4.

The fifth parameter may be named along=vector, which if used must be the only parameter, and creates a sequence 1, 2, ..., length(vector), or the empty sequence if the vector is empty (as it can be).

A related function is rep which can be used for replicating a structure in various complicated ways. The simplest form is

    s5 <- rep(x, times=5)

which will put five copies of x end-to-end in s5.
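The summaries at the end of Section 4.2 and the sequence generators of Section 4.3 can be tried together. This sketch reuses the vector x of Section 4.1; nothing else is assumed beyond the functions named in the text.

```r
x <- c(10.4, 5.6, 3.1, 6.4, 21.7)    # the vector from Section 4.1
mean(x)                              # same value as sum(x)/length(x)
var(x)                               # same value as sum((x - mean(x))^2)/(length(x) - 1)
sort(x)                              # elements in increasing order
order(x)                             # the permutation that sorts x

n <- 10
1:n-1                                # the colon binds first: (1:n) - 1, i.e. 0, 1, ..., 9
1:(n-1)                              # 1, 2, ..., 9

s3 <- seq(-5, 5, by = 0.2)
s4 <- seq(length = 51, from = -5, by = 0.2)
length(s3)                           # 51 values from -5 to 5
s5 <- rep(x, times = 5)              # five copies of x end-to-end
```

Comparing 1:n-1 with 1:(n-1) is the quickest way to see why the priority of the colon operator matters.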

4.4 Logical Vectors. Missing Values
As well as numerical vectors, S allows manipulation of logical quantities. The elements of a logical vector have just two possible values, represented formally as F (for ‘false’) and T (for ‘true’). (FALSE and TRUE are also valid representations.)

Logical vectors are generated by conditions. For example

    temp <- x > 13

sets temp as a vector of the same length as x with values F corresponding to elements of x where the condition is not met and T where it is.

The logical operators are <, <=, >, >=, == for exact equality and != for inequality. In addition if c1 and c2 are logical expressions, then c1 & c2 is their intersection (and), c1 | c2 is their union (or) and !c1 is the negation of c1.

Logical vectors may be used in ordinary arithmetic, in which case they are coerced into numeric vectors, F becoming 0 and T becoming 1. However there are situations where logical vectors and their coerced numeric counterparts are not equivalent.

In some cases the components of a vector may not be completely known. When an element or value is “not available” or a “missing value” in the statistical sense, a place within a vector may be reserved for it by assigning it the special value NA. In general any operation on an NA becomes an NA. The motivation for this rule is simply that if the specification of an operation is incomplete, the result cannot be known and hence is not available.

The function is.na(x) gives a logical vector of the same size as x with value T if and only if the corresponding element in x is NA.

4.5 Character Vectors
Character quantities and character strings are used frequently in S, for example as plot labels. They are denoted by a sequence of characters delimited by the double quote character, e.g. "x-values", "New iteration results". Single quotes can also be used, in matching pairs.

Character strings may be collected into a vector by the c function; examples of their use will emerge frequently.

The function paste takes an arbitrary number of character string arguments and concatenates them into a single character string. Any numbers given among the arguments are coerced into character strings in the same way they would be if they were printed. The arguments are by default separated in the result by a single blank character, but this can be changed by the named parameter, sep=string, which changes it to string, possibly empty. For example

    labs <- paste(c("X", "Y"), 1:10, sep="")

makes labs the character vector

    c("X1", "Y2", "X3", "Y4", "X5", "Y6", "X7", "Y8", "X9", "Y10")

Note in particular that recycling of short vectors takes place here too.
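A short sketch of logical vectors, missing values and paste as described above. The vector z with one missing value is a hypothetical example; x is the vector of Section 4.1.

```r
x <- c(10.4, 5.6, 3.1, 6.4, 21.7)    # the vector from Section 4.1
temp <- x > 13                       # logical vector, one value per element of x
sum(temp)                            # coerced to 0/1: counts the elements above 13

z <- c(1, NA, 3)                     # hypothetical vector with one missing value
is.na(z)                             # marks the missing position
z + 1                                # any operation on an NA gives an NA

labs <- paste(c("X", "Y"), 1:10, sep = "")   # c("X", "Y") is recycled along 1:10
labs
```

Using sum on a logical vector, as here, is a common idiom for counting how many elements satisfy a condition.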

In the paste example, c("X", "Y") is repeated 5 times to match the sequence 1:10.

The elements of a vector can be named (as well as numbered) by assigning a character vector to its names attribute; e.g. a vector fruit can be given names such as "orange", "banana" and "apple" by an assignment to names(fruit).

4.6 Index Vectors. Selecting and Modifying Subsets of a Data Set
Elements of a vector may be extracted by specifying the element in square brackets. More generally, subsets of a vector (or any expression that evaluates to a vector) may be selected by appending to the name of the vector an index vector in square brackets. Such index vectors can be any of four distinct types:

1. A logical vector. In this case the index vector must be of the same length as the vector from which elements are to be selected. Values corresponding to T in the index vector are selected and those corresponding to F omitted. For example

    y <- x[!is.na(x)]

creates (or re-creates) an object y which will contain the non-missing values of x, in the same order. Note that if x has missing values, y will be shorter than x. Also

    (x+1)[(!is.na(x)) & x>0] -> z

creates an object z and places in it the values of the vector x+1 for which the corresponding value in x was both non-missing and positive.

2. A vector of positive integral quantities. In this case the values in the index vector must lie in the set {1, 2, ..., length(x)}. The corresponding elements of the vector are selected and concatenated, in that order, in the result. The index vector can be of any length and the result is of the same length as the index vector. For example x[6] is the sixth component of x and

    x[1:10]

selects the first 10 elements of x (assuming length(x) >= 10). Also

    c("x", "y")[rep(c(1,2,2,1), times=4)]

(an admittedly unlikely thing to do) produces a character vector of length 16 consisting of "x", "y", "y", "x" repeated four times.

3. A vector of negative integral quantities. In this case the index vector specifies the values to be excluded rather than included. Thus

    y <- x[-(1:5)]

gives y all but the first five elements of x.

4. A vector of character strings. This possibility only applies where an object has a names attribute to identify its components. In this case a subvector of the names vector may be used in the same way as the positive integral labels in 2. For example, for a vector fruit whose names include "apple" and "orange",

    lunch <- fruit[c("apple", "orange")]

This option is particularly useful in connection with data frames (see Section 4.9).

An indexed expression can also appear on the receiving end of an assignment, in which case the assignment operation is performed only on those elements of the vector. The expression must be of the form vector[index vector] as having an arbitrary expression in place of the vector name would not make sense. The vector assigned must match the length of the index vector, and in the case of a logical index vector it must again be the same length as the vector it is indexing. For example

    x[is.na(x)] <- 0

replaces any missing values in x by zeros and

    y[y < 0] <- -y[y < 0]

has the same effect as

    y <- abs(y)

4.7 Arrays
An array can be considered as a multiply subscripted collection of data entries of the same type, for example numeric, logical or character string. An array is defined by having a dimension vector, a vector of positive integers. If its length is k then the array is k-dimensional. The values in the dimension vector give the upper limits for each of the k subscripts. The lower limits are always 1.

Suppose, for example, z is a vector of 1500 elements. The assignment

    dim(z) <- c(3, 5, 100)

allows z to be treated as a 3 x 5 x 100 array.

Other functions such as matrix and array are available for simpler and more natural looking assignments in special cases, e.g. array(z, c(3, 5, 100)).

The values in the data vector give the values in the array in the same order as they would occur in Fortran, that is, with the first subscript moving fastest and the last subscript slowest. For example if the dimension vector for an array, say a, is c(3, 4, 2) then there are 3 x 4 x 2 = 24 entries in a and the data vector holds them in the order a[1,1,1], a[2,1,1], ..., a[2,4,2], a[3,4,2]. To make life easier, matrix has a parameter byrow for data presented by row rather than by column.
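The four kinds of index vector in Section 4.6 and the storage order of arrays just described can be sketched as follows. All data values are invented for illustration; in particular the values given to fruit are assumptions, and the objects a and m are hypothetical.

```r
x <- c(10.4, NA, 3.1, -6.4, 21.7)    # hypothetical data with one missing value
y <- x[!is.na(x)]                    # logical index: keep the non-missing values
x[is.na(x)] <- 0                     # indexed assignment: replace missing values by zero
x[c(1, 5)]                           # positive index: first and fifth elements
x[-(1:2)]                            # negative index: all but the first two

fruit <- c(5, 10, 1)                 # assumed values, for illustration only
names(fruit) <- c("orange", "banana", "apple")
lunch <- fruit[c("apple", "orange")] # character index: select by name

a <- 1:24
dim(a) <- c(3, 4, 2)                 # treat the 24 values as a 3 x 4 x 2 array
a[2, 1, 1]                           # second entry of the data vector
a[3, 4, 2]                           # last entry of the data vector
m <- matrix(1:6, 2, 3, byrow = TRUE) # data presented by row rather than by column
```

Checking a[2,1,1] against a[1,2,1] is a quick way to confirm that the first subscript moves fastest.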

in that order. stands for the entire array, which is the same as omitting the subscripts entirely and using alone. Arrays may be used in arithmetic expressions and the result is an array formed by element-byelement operations on the data vector. The dimension vectors of operands generally need to be the same, and this becomes the dimension vector of the result. So if , and are all similar arrays, then

makes a similar array with data vector the result of the evident element-by-element operations. The matrix multiplication operator is . There are extensive matrix manipulation facilities, including transposes and eigenvalue, Cholesky, QR and singular-value decompositions. See help on , , , and . Any dimension of an array can be given a set of names using to use the facilities of data frames. , but is usually easier

Matrices can be built up from given vectors and matrices by the functions and . Informally, forms matrices by binding together vectors or matrices horizontally, or column-wise, and vertically, or row-wise. 4.8 Lists An S list is an object consisting of an ordered collection of objects known as its components. There is no particular need for the components to be of the same mode or type, and, for example, a list could consist of a numeric vector, a logical value, a matrix, a character array, a function, and so on. is a list, Components are always numbered and may always be referred to as such. If then the function gives the number of (top level) components it has, specified as , and so on. Components of lists may also be named, and in this case the component may be referred to either by giving the component name as a character string in place of the number in double square brackets, or, more conveniently, by giving an expression of the form

for the same thing. This is a very useful convention as it makes it easier to get the right component if you forget the number, and is strongly advised. You can find out the names of the components by

I ” E” q n I i)”¢w”i I )” k ”i I i)”æV”i I )” i I ¢wi I ” k i I æVi I ” i I x i i i ” ” x” ” ” i” ” q n q n q n q o n qo n qo n qo n qo o n y i” x 4)¢tu ` 㠔” ¡ ›ÿ q aæi n I

name component name

y

names

qS 4B7

y€€€u q @ thh„&4A 2 `

7 DD 3 "a¤4W

7 H @ 4D BI A H q 3 CR E} E8 ` s‡ "D W AD @

ž

¥ x

ˆß e"ˆ

y …€h€„€hu"A @‡Q3 q 2 y€€€u q @ …hh„"A 2 `

i "a¤4W 7 DD 3 7DD 3 4E"4W q n y 7 D D E3 q W u a8n W ‡ A D q q o n a46¤4…¦aEasaC aBpan

o

Ž j ž j ¥ß xßi ml êÎr¦fGarE{Ž 

y€€€u q @ 2 …hh„"A 63

u 7 ‘4D H I A 

§



á

4.8 Lists

11

Individual elements of an array may be referenced by giving the name of the array followed by the subscripts in square brackets, separated by commas. More generally, subsections of an array may be specified by giving a sequence of index vectors in place of subscripts; however if any index position is given an empty index vector, then the full range of that subscript is taken. Thus is a array with dimension vector and data vector , , , , , , , ,
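The ways of referring to list components can be sketched as follows. This is only an illustration: the object emp and all of its component names are hypothetical, and the code is written in S syntax that should also run unchanged in R.

```r
# Hypothetical example: the object name and all component names are made up.
emp <- list(name = "Fred", wife = "Mary", no.children = 3,
            child.ages = c(4, 7, 9))

n.comp <- length(emp)            # number of top-level components
first  <- emp[[1]]               # component selected by number
same   <- emp[["name"]]          # the same component selected by name
also   <- emp$name               # the more convenient $ form
second.age <- emp$child.ages[2]  # a component can be subscripted further
sub    <- emp[1]                 # single brackets give a sub-list, not the component
comp.names <- names(emp)         # names of the components
```

Note that emp[1] is itself a list of length one, whereas emp[[1]] is the character string stored in the first component.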

Listing the component names generates much less output than printing the object, which would achieve the same purpose. The names of components may be abbreviated down to the minimum number of letters needed to identify them uniquely. Most of the datasets are in fact lists (or can be treated as lists), so a component of one of these datasets can be referred to by name in this way. Similarly, many S functions return lists of results.

It is important to distinguish [[ ]] from [ ]. “[[ ]]” is the operator used to select a single element of a list, whereas “[ ]” is a general subscripting operator for vectors. Fortunately, numbered components are needed very rarely.

New lists may be formed from existing objects by the function list. An assignment in which each argument has the form name = object sets up a list using the existing objects for the components and giving them names as specified by the argument names (which can be chosen freely). If these names are omitted, the components are numbered only.

Lists can be attach-ed as well as directories, and this allows their components to be accessed as if they were stand-alone entities. It is wise to detach after use to avoid any nasty surprises.

4.9 Data Frames

Data frames were introduced in the August 1991 release of S, and can be thought of as closely-coupled lists of data vectors of the same length. Unlike matrices, the data vectors can be of different types, including character data. Both the rows and columns can be labelled. The columns of a data frame can be treated as components of a list, and the structure can be treated as a two-dimensional array.
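A minimal sketch of the two views of a data frame, as a list of columns and as a two-dimensional array, is given below. The data frame df and its row and column labels are hypothetical, invented purely for illustration.

```r
# Hypothetical data frame with row and column labels.
df <- data.frame(height = c(150, 160, 170),
                 weight = c(52, 61, 70),
                 row.names = c("ann", "bob", "cy"))

# List view: columns are components.
h <- df$height
w <- df[["weight"]]

# Array view: rows and columns are selected by index or by label.
one.row <- df[2, ]               # second row, still a data frame
cell    <- df["bob", "weight"]   # a single entry selected by labels
d       <- dim(df)               # number of rows and columns
```

Selecting a row by the array view keeps its row label, so one.row is still labelled "bob".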

A data frame can be created from vectors and matrices by the data.frame function. If the columns are not named, they pick up the names of the vectors. Note that the row labels are carried along. Character vectors given to data.frame are automatically treated as factors (see Section 10), unless specified within an I function.

Data frames can be attach-ed just as lists can, and this allows their columns to be accessed as if they were named vectors.
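Building a data frame from vectors and a matrix can be sketched as follows. The objects u, v and m are hypothetical; the point is only to show how column names are picked up when none are supplied.

```r
# Hypothetical vectors and matrix used as building blocks.
u <- c(1.5, 2.5, 3.5)
v <- c(10, 20, 30)
m <- cbind(a = 1:3, b = 4:6)

d1 <- data.frame(u, v)           # unnamed columns pick up the vector names
d2 <- data.frame(first = u, m)   # a named vector followed by matrix columns

n1 <- names(d1)
n2 <- names(d2)
```

Here n1 holds the names of the two vectors and n2 holds the supplied name followed by the column names of the matrix.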


5 Reading data into S

Data objects will usually be read as values from external files. This is done most conveniently with the scan function, which can also be used to read a vector from the keyboard. Input is terminated by a blank input line (from the terminal only, despite the documentation) or by EOF (ctrl-D in Unix). To read in a character vector we specify the vector type by the second argument. To read from a file, specify its name as the first argument.

Now suppose that multiple data vectors of equal length are to be read in parallel. For example, suppose that there are three vectors in a file, the first of mode character and the remaining two of mode numeric. Use scan to read in the three vectors as a list, giving as the second argument a dummy list structure that establishes the mode of the three vectors to be read. The result is a list whose (named) components are the three vectors read in. Matrices are usually read by row. The argument skip to scan can be used to skip header rows of files.

Data frames can be read from a file by the read.table function. The data file should be a table in one of a number of formats:

1. A file, such as the one listed on page 39, which has a first row naming the columns, followed by the table of numeric data.

2. A file laid out like the listing of a data frame. This has a first header line, and rows which contain the row label followed by the data for the columns. Note that the header has one less entry than subsequent rows.

3. A table without any header. The row and column labels are then given default values. However, if there exists a character column without duplicates, the first such is taken as the row labels and removed as a column.

Sometimes it is necessary to read in character strings which contain spaces. This can be done by separating the fields in the file by, for example, tabs or commas, and naming the separator when the file is read. Here "\t" is the usual Unix abbreviation for a tab character. This device also applies to scan.

5.1 Writing out data

There are many ways to write out data from S. To write directly to a file there are several functions, including, from S-Plus 3.2, write.table, which is usually the simplest method. This can write a data frame, matrix or vector, and further arguments can be found in the help page. By default it writes out comma-separated items on rows, but the separator can be changed to space or tab ("\t" in Unix).

The function write writes a vector, by default in five columns for numeric data, and in one column for character data. To write out a matrix row by row, apply write to the transpose of the matrix and set the number of columns to that of the matrix.

A further function converts data to a line of characters, and can be used to construct custom reports.
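The reading and writing functions of this section can be sketched as a round trip through temporary files. All file contents and object names below are hypothetical. The code is written to run in R as well as S, and the argument names of write.table and read.table are assumed to be those shown; they may differ between S-Plus releases, so check the help pages.

```r
# Parallel vectors: one character and two numeric fields per line.
f1 <- tempfile()
cat("ann 150 52", "bob 160 61", "cy 170 70", file = f1, sep = "\n")
v <- scan(f1, what = list(name = "", height = 0, weight = 0))

# A numeric matrix written row by row, then read back by row.
f2 <- tempfile()
x <- cbind(c(150, 160, 170), c(52, 61, 70))
write(t(x), file = f2, ncolumns = ncol(x))
m <- matrix(scan(f2), ncol = 2, byrow = TRUE)

# A data frame written with an explicit separator and read back.
f3 <- tempfile()
d <- data.frame(id = c("a", "b", "c"), height = c(1.5, 2.5, 3.5))
write.table(d, file = f3, sep = ",")
back <- read.table(f3, header = TRUE, sep = ",")
```

The dummy list given as the second argument of scan fixes both the modes and the names of the components of v.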


6 Graphics

The graphical facilities are central to S. The steps involved are as follows:

1. The type of terminal, or device, is declared to S at the beginning of the session.

2. A command is issued to construct a plot from data. For example, a call to plot with two arguments specifies a simple point plot where the arguments are vectors giving the x- and y-coordinates of the points respectively. (The command includes a default automatic choice of axes, scales, titles and plotting characters, all of which can be overridden with additional graphical parameters that could be included as named arguments in the command.)

6.1 Graphical Parameters

Functions producing graphical output usually have optional additional named arguments that can be specified to override some default parameter settings and hence modify the characteristics of a plot. A short list of the main ones is as follows:

axes
    If false, all axes are suppressed. By default, axes are automatically constructed.

type
    Type of plot desired. Values for type are: "p" for points only (the default for function plot), "l" for lines only, "b" for both points and lines (the lines miss the points), "s" for step functions (a variant changes level just before the next point rather than immediately), "o" for overlaid points and lines, "h" for high density vertical line plotting, and "n" for no plotting (but axes are still found and set).

xlab, ylab
    Give labels for the x- and/or y-axes (default: the names, including suffices, of the x and y coordinate vectors).

main, sub
    sub specifies a title to appear under the x-axis label and main a title for the top of the plot in larger letters (default: both empty).

xlim, ylim
    Approximate minimum and maximum values for x- and/or y-axes settings. These values are automatically rounded to make them “pretty” for axis labelling.

Other graphical parameters control the background characteristics of all subsequent plots and are usually specified by a call to the function par. There are a great number of these parameters and the help page for par gives a complete list of them and their meanings. Some of the more commonly adjusted ones are as follows:

lty
    Line type. If lines are being plotted, a variety of line types is available; 1 means a solid line, and larger values indicate a variety of broken line forms.

mfrow, mfcol
    Multiple frames on the one plot. Instead of plotting just one graph per screen, each screen (or page) will contain an array of graphs forming a grid with the given numbers of rows and columns. If mfrow is used the screen is filled row-by-row and if mfcol is used it is filled column-by-column. Useful if many graphs are to be inspected simultaneously and high resolution is not necessary.

pch
    Specify the character to be used for plotting points (the default differs between graphics terminals and PostScript).

pty
    Specify the type of plotting region currently in effect. Possible values are "s" to generate a square plotting region and "m" (the default) to generate a maximal size plotting region.

6.2 Some Basic Plotting Functions

The elementary plotting functions are as follows:

plot
    Scatter plot of points with x- and y-coordinates given by the two main parameters. The pair may be replaced by a single list with components labelled x and y, called a ‘plot list’. Graphical parameters are particularly useful.

points
    Add points to an existing plot (possibly using a different plotting character). Follows on from a plot command.

lines
    Add lines to an existing plot. Similar to points. Note that applying lines to the result of spline will join the points of a plot by a cubic spline interpolation function. (See the help on spline for further information.)

text
    Add text to a plot at points given by a pair of coordinate vectors. Normally the labels are an integer or character vector, in which case each label is plotted at the corresponding point. The default labels are the index numbers. Note: this function is often used after a call to plot with type="n". That graphics parameter suppresses the plotting of points but sets up the axes, and the text function supplies special characters (in this case just the integers by default) for the points.

abline
    Draw a line in intercept and slope form across an existing plot. h= may be used to specify y-coordinates for the heights of horizontal lines to go across a plot, and similarly v= for the x-coordinates for vertical lines. An lmobject may also be supplied.

6.3 Interaction with Plots

S-Plus allows users to interact with plots, by identifying points and by adding information at places selected by mouse clicks. The functions are:

identify
    On a current plot, clicking the LEFT mouse button places the appropriate string from the labels near the point which has been clicked on. Click the MIDDLE mouse button to finish. If the labels are omitted identify uses index numbers, and it always returns the indices of selected points.

locator
    Returns a list of vector coordinates of points clicked by the LEFT mouse button. Click the MIDDLE mouse button to finish. A variant does the same, but also plots the points.

legend
    Add a legend box at a mouse-selected point (one LEFT click). See the help page for the box contents and other options.

locator is often used with text to add annotation to plots.

6.4 Brush and Spin

These are S-Plus enhancements to allow dynamic manipulation of graphs. Spin allows three columns chosen from a matrix of data vectors to be rotated in space. Use the left mouse button to select three of the variables, then use the cross-shaped pad to rotate the point cloud. Finally click on the button provided to finish.

Brush includes spin and a further plot. Additionally one can ‘brush’ by selecting points with the left mouse button, and de-selecting them with the middle button. One can mark points in different ways, with the four symbols, and even label points if the corresponding option is selected.

For example, with a dataset of 150 points in three groups of 50, select the first 50 points with one symbol and the last fifty with another. The intermediate nature of the middle 50 then stands out.

6.5 Equally-scaled plots

It is sometimes necessary to make geometrically-square plots, for example so that distances can be assessed accurately. This is somewhat tricky, but can be done by a function which adjusts the axis scales to be equal within the current window shape.

Figure 1: Screen dump of a window displaying brush on the data, with different highlights for the three groups.
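The graphical parameters of Section 6.1 and the basic plotting functions of Section 6.2 can be combined as in the sketch below. The data are hypothetical, and a PostScript device writing to a temporary file is assumed so that the code does not depend on a windowing system; any other device would do. The least-squares fit via lsfit is used here only to supply an intercept and slope for abline.

```r
# Hypothetical data for illustration only.
x <- 1:20
y <- 2 * x + 3 * sin(x)

postscript(file = tempfile())      # assumed device; substitute your own
par(mfrow = c(1, 2), pty = "s")    # two square plots side by side

# Left panel: points joined by broken lines, with labels and a title.
plot(x, y, type = "b", lty = 2, xlab = "x", ylab = "y",
     main = "Points and lines")
abline(h = mean(y))                # horizontal line at the mean of y

# Right panel: axes only, then index numbers plotted as labels.
plot(x, y, type = "n", xlim = c(0, 25), ylim = range(y))
text(x, y)
fit <- lsfit(x, y)                 # simple least-squares line
abline(fit$coefficients[1], fit$coefficients[2])

dev.off()                          # close the device
```

Because par(mfrow = ...) affects all subsequent plots on the device, it is set once before the first call to plot.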

7 Statistical Summaries

7.1 Arithmetical Summaries

Standard summaries are available. The function var will take a data matrix and give the variance-covariance matrix, and cor computes the correlation matrix, either from two vectors or a data matrix. There are also standard functions for other simple summaries, and functions that compute trimmed summaries. More sophisticated robust summaries are available, some of them via the library.

7.2 Histograms and Stem-and-Leaf Plots

The standard histogram function is hist, which plots a conventional histogram. More control is available via the extra parameters. The parameter probability=T gives a plot of unit area rather than cell counts, and nclass sets the number of bins. Densities can be estimated via the function density. See figure 2.

Figure 2: A histogram of hstart with two density estimates overlaid.

A stem-and-leaf plot is an enhanced histogram. Apart from giving a visual picture of the data, this gives more detail. The actual data, in sorted order, can be read off the plot (roughly). Sometimes the pattern of numbers (all odd?) gives clues. Quantiles can be computed (roughly) from the plot.

7.3 Boxplots

A boxplot is a way to look at the overall shape of a set of data. The central box shows the data between the quartiles, with the median represented by a line. ‘Whiskers’ go out to the extremes of the data, and very extreme points are shown by themselves. It is also possible to plot boxplots for groups side-by-side, for example by dividing a time-series into months and plotting the boxplots for each month on one plot. See figure 3. Other styles of boxplot are available; see the help page.

Figure 3: Boxplots for months of the data.
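A minimal sketch of the summaries and plots of this section follows. The data vector x and the grouping g are hypothetical, and a PostScript device on a temporary file is assumed for the plots.

```r
# Hypothetical data: ten observations in two groups of five.
x <- c(12, 15, 11, 19, 14, 16, 13, 18, 17, 25)
g <- rep(c("a", "b"), c(5, 5))
m <- cbind(u = x, v = rev(x))

vc  <- var(m)                  # variance-covariance matrix of the columns
cm  <- cor(m)                  # correlation matrix of the columns
r   <- cor(x, rev(x))          # correlation of two vectors
tm  <- mean(x, trim = 0.1)     # a trimmed mean

postscript(file = tempfile())  # assumed device; substitute your own
hist(x, probability = TRUE)    # unit area rather than cell counts
lines(density(x))              # overlay a density estimate
boxplot(split(x, g))           # side-by-side boxplots, one per group
dev.off()

stem(x)                        # stem-and-leaf plot, printed as text
```

The diagonal of vc holds the sample variances of the columns, so vc[1, 1] agrees with var(x).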

8 Distributions

S has functions built in to compute (approximately) the density, cumulative distribution function and quantile function (the inverse of the CDF) for many standard distributions. There are also functions to simulate samples from these distributions. The first letter of the name indicates the function: d, p, q and r respectively (for example dnorm, pnorm, qnorm and rnorm). Distributions available are:

Distribution         S name
beta                 beta
binomial             binom
Cauchy               cauchy
chisquare            chisq
exponential          exp
F                    f
gamma                gamma
geometric            geom
hypergeometric       hyper
log-normal           lnorm
logistic             logis
negative binomial    nbinom
normal               norm
normal range         nrange
Poisson              pois
stable               stab
T                    t
uniform              unif
Weibull              weibull
Wilcoxon             wilcox

The function sample re-samples from a data vector, with or without replacement.

8.1 Q-Q Plots

One of the best ways to compare the distribution of a sample with a distribution is to use a Q-Q plot, of which the normal probability plot is the best-known example. Q-Q plots can also be used to compare two samples. For a sample the quantile function is the inverse of the empirical CDF, that is

$$\operatorname{quantile}(p) = \min\{\, z : \text{proportion } p \text{ of the data} \le z \,\}$$

The function qqplot plots the quantile functions of two samples against each other, and so compares two samples. The function qqnorm replaces one of the samples by a sample at the quantiles of a standard normal distribution. This idea can be applied quite generally. For example, to test a sample against another distribution, we plot the sorted sample against the quantiles of that distribution evaluated at ppoints of the sample, where ppoints computes the appropriate set of probabilities for the plot.

The function qqline helps assess how straight a qqnorm plot is by plotting a straight line through the upper and lower quartiles. (See the example in Section 3.)
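The naming convention for the distribution functions, together with sample and the Q-Q plotting functions, can be sketched as follows. The simulated data, the seed, and the choice of 9 degrees of freedom for the t comparison are all assumptions made for illustration, and a PostScript device on a temporary file is assumed for the plots.

```r
set.seed(1)                    # arbitrary seed, for repeatability only

# d, p, q and r prefixes for the normal distribution.
d <- dnorm(0)                  # density at 0
p <- pnorm(1.96)               # cumulative probability
q <- qnorm(p)                  # quantile: inverts pnorm
z <- rnorm(50)                 # simulated sample of size 50

# Re-sampling a data vector.
s <- sample(z, 10)                    # without replacement
b <- sample(z, 50, replace = TRUE)    # with replacement

postscript(file = tempfile())  # assumed device; substitute your own
qqnorm(z)                      # normal probability plot
qqline(z)                      # line through the quartiles
pp <- ppoints(z)               # probabilities for the plot
plot(qt(pp, 9), sort(z))       # sample against t quantiles (9 df assumed)
qqplot(z, rt(50, 9))           # two samples against each other
dev.off()
```

Because qnorm is the inverse of pnorm, q recovers the value 1.96 up to rounding error.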

9 Classical Statistics

S-Plus 3.1 has a section on classical statistics. The same functions are used to perform tests and to calculate confidence intervals.

The data used here are the amounts of wear in a shoe experiment with 10 boys, an experiment reported in Box, Hunter & Hunter (1978), Statistics for Experimenters. There were two materials (A and B) that were randomly assigned to the left or right shoe. We can use these data to illustrate one-sample and paired and unpaired two-sample tests.

The sample size is rather small, and one might wonder about the validity of the t-distribution. An alternative for a randomized experiment such as this is to base inference on the permutation distribution of the test statistic. Figure 4 shows that the agreement is very good. (As the computation of this figure uses some subtle ideas in S, it is omitted: see Venables & Ripley (1994, Chapter 5).)

Figure 4: Histogram and empirical CDF of the permutation distribution of the t-test in the shoes example. The density and CDF of t_9 are shown overlaid.

A range of classical tests is provided, and many of these have alternative methods.
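One-sample, paired and unpaired two-sample t-tests can be sketched as below. The wear values are hypothetical stand-ins, not the data of the shoe experiment, and the vector names matA and matB are made up. The function t.test and its paired argument are assumed to behave as in current S-Plus and R; defaults such as the treatment of unequal variances may differ between systems.

```r
# Hypothetical wear measurements for two materials on the same eight boys.
matA <- c(10.1, 8.4, 11.3, 9.7, 12.0, 7.9, 10.6, 9.2)
matB <- c(10.6, 8.9, 11.2, 10.3, 12.5, 8.1, 11.0, 9.8)

one  <- t.test(matB - matA)                 # one-sample test on the differences
pair <- t.test(matA, matB, paired = TRUE)   # paired two-sample test
unp  <- t.test(matA, matB)                  # unpaired two-sample test

t.one  <- as.numeric(one$statistic)
t.pair <- as.numeric(pair$statistic)
ci     <- pair$conf.int                     # confidence interval from the same call
```

The paired test is the one-sample test applied to the differences, so t.one and t.pair agree apart from sign. The unpaired test ignores the pairing and is typically much less sensitive for data of this kind.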


This provides examples of each of S’s types of categorical data structure. There are two main structures, categories and factors. The latter were introduced in the August 1991 release, and have almost entirely superseded the use of categories. A factor is regarded as a vector over the set of levels which have no implied order. Thus sex, TV area and transport are all factors. However, TV area is coded by number rather than by the names of the companies. These variables can be declared as

Internally in S levels are numbered in alphabetical order, and when factors are used as treatments in designed experiments, the order of levels may matter. For example, if we want to contrast females with males (rather than vice versa) we need to specify the levels of the factor explicitly:

Social class is an ordered factor in that the classes are perceived as ordered, with “A” (professionals) regarded as highest. We can declare an order by

The first line orders the levels by the default (alphabetical) order. The second shows how the set of levels may be changed, in this case by reversing the existing ordering. Age is an ordered category for which it is necessary to specify the levels explicitly. Had been specified as a continuous variable, it could have been categorized using (whose help page gives other ways to produce the categories):

Britain is covered by 12 commercial TV companies, so this provides a simple geographical variable. Derived from occupation.

qT D I W I w¦‡ I

yEy6€jeaeö‘w m $€ ÔÁEGm söԇGEB$u ` ܀ ” € x ” € xx i€ ” € xim€ r r ” ‘y I rW I q ` ¦"‡ I —eW T D u 3R yyww ” Ü a4EÏaݔ š” Ýæ’u ` ” I W I w&f‡ x i ”  qT D r r

yEy6€jeaÜe€ö”‘w m $€ ÔÁEGm € x ” € xx r r ” ¨y x 4C y y oI q T 6y q Vå W I n p&C

W a5 `

yy€ 쀔 € v€ ƒ7CD SDC 66‘’„ÔÁ’$u ` s4afGE›” I W I wgY Q„&eW aI îaèGQ7  qT D 7u 3 R ` t m l Y D

€€ „h€

sex: age: TV area: social: transport: spend:

M, F –24, 25–44, 45–59, 60+ 1, , 12 A, B, C1, C2 car, bus, cycle, foot positive continuous

p

y I W I XT¢W6e"BA I 4t&efW aI îEr669 A I 4W q 3 R 97 3 Wu 3 R ` t m l W 3 R 7 3 y I W I pgq…6W EI Þal I "3 I Uq„ qT 7 „u 3 R ` t m D T 7 y I W I pUGQ„6W aI |Eés67 qT Y D 7u 3 R ` t m l Y D

o

i€ ” € xim€ ƒ7 CD SD söԇGEB$u G"asaC r qT D u 3 R ` tu q D 3 D q 3 R m l D I W I p¦"‡ I —eW EI …e¤e¤aerE†f‡ I ` R7u 7CD SDC ml y R7u 7CD SD I @ ` Q„Ô4afGE†aõaC I @ ` 6h‘4EsaC R7u 3R ` tu q D 3D q 3 R m l R I @ ` sh—efW aI …6"e¤a6†aÞC I @ ` 67

ƒ7 CD SD G"asaC ` tu qD 3D q 3R m l D EI …e¤e¤aerE†f‡ I u W m T D I ¦a5 ` El I W I q ` &f‡ I

r

s

n

Handling Categorical Data

27

10 Handling Categorical Data
Consider a (fictitious) survey of shoppers in Britain. Amongst the variables collected for each person surveyed are sex, age, TV area , social class , transport used for this trip to the shops, and total spend at supermarkets. The possible values of these variables are

10.1 The Function

Some of the functions for statistical models treat ordered factors in appropriate special ways.

To continue the previous example, suppose we have want to summarize spend by some of the factors To calculate the sample mean income for each age-group we can now use the special function :

giving a means vector with the components labeled by the levels

Suppose further we needed to calculate the standard errors of the mean spends. To do this we need to write an S function to calculate the standard error for any given vector. We discuss to calculate the functions more fully in 12, but since there is an inbuilt function sample variance, such a function is a very simple one-liner, specified by the assignment:

After this assignment, the standard errors are calculated by

and the values calculated are then

The same function can be used to handle more complicated indexing of a vector by multiple factors. For example, we might wish to split the spend by both age and sex.

The combination of a vector and a labelling factor is an example of what is called a ragged array, since the subclass sizes are possibly irregular. When the subclass sizes are all the same the indexing may be done implicitly and much more efficiently by using arrays, for which there is an analogous function.
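The following sketch shows indexing by two factors and the array analogue. It assumes the functions meant are `tapply` (with a list of factors) and `apply`; the data are invented, with one shopper in each age-by-sex cell:

```r
# Hypothetical data: one observation per age-by-sex cell.
age <- factor(c("-24", "25-44", "25-44", "45-59", "60+", "-24", "45-59", "60+"),
              levels = c("-24", "25-44", "45-59", "60+"))
sex <- factor(c("M", "F", "M", "F", "M", "F", "M", "F"))
spend <- c(12.5, 30.0, 26.0, 41.5, 18.0, 15.5, 38.5, 22.0)

# Mean spend for every combination of age and sex: a 4 x 2 matrix.
cell.means <- tapply(spend, list(age, sex), mean)
cell.means

# Once the data form a regular array, apply() summarizes over a margin:
# margin 1 gives one value per row (age group), margin 2 one per column (sex).
age.means <- apply(cell.means, 1, mean)
sex.means <- apply(cell.means, 2, mean)
```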


The pattern of our survey can be seen using a function which takes a listing of factors and returns the contingency table as an array.
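A small sketch of such a cross-classification, assuming the function meant is `table`; the factors are invented for illustration:

```r
# Hypothetical factors for eight shoppers.
age <- factor(c("-24", "25-44", "25-44", "45-59", "60+", "-24", "45-59", "60+"),
              levels = c("-24", "25-44", "45-59", "60+"))
sex <- factor(c("M", "F", "M", "F", "M", "F", "M", "F"))

# Counts of shoppers in each age-by-sex cell, returned as a two-way array.
counts <- table(age, sex)
counts
```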


11 Loops and Conditional Execution
Commands may be grouped together in braces. The value of the group is the result of the last expression in the group evaluated. Since such a group is also an expression it may, for example, itself be included in parentheses and used as part of an even larger expression, and so on. This facility is most often used with the control statements of this section.

The control statements are very close in spirit to those of the C programming language, and only a few are mentioned here. There is a conditional construction in which the controlling expr must evaluate to a logical value; the result of the entire expression is then evident.

There is also a loop construction built from a dummy name, a vector expression (often a sequence) and a body, which is often a grouped expression with its sub-expressions written in terms of the dummy name. The body is repeatedly evaluated as name ranges through the values in the vector result of the vector expression. A typical use is to produce separate plots of one variable versus another within the classes given by a vector of class indicators. (Note that there is a function which produces a list of vectors got by splitting a larger vector according to the classes specified by a factor.)

Other looping facilities include a statement taking a single expr and a statement taking a condition and an expr. A further statement can be used to terminate any loop abnormally, and another can be used to discontinue one particular cycle.

Loops in S are often memory-hungry, and care may be needed not to use up all of your computer's memory. Expert advice is necessary on work-arounds.
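The constructions described in this section can be sketched as follows. The keywords `if`/`else`, `for`, `while`, `repeat`, `break` and `next`, and the function `split`, are assumed to be the ones meant. The data and object names are invented:

```r
# Invented data: a response y observed in three classes given by ind.
ind <- factor(rep(c("a", "b", "c"), c(4, 4, 4)))
y <- c(2, 4, 6, 8, 1, 2, 3, 4, 0.5, 1, 1.5, 2)

# split() gives a list of vectors, one per class of the factor.
yc <- split(y, ind)

# for loop: the grouped expression in braces is evaluated once per value of i.
class.means <- numeric(length(yc))
for (i in 1:length(yc)) {
  class.means[i] <- mean(yc[[i]])
}

# Conditional: the value of the construction is that of the branch taken.
verdict <- if (class.means[1] > class.means[2]) "a larger" else "b larger"

# repeat loop, with next to skip one cycle and break to leave the loop.
n <- 0
repeat {
  n <- n + 1
  if (n %% 2 == 0) next   # skip even values
  if (n > 5) break        # stop at the first odd value above 5
}

# while loop: halve z until it falls below 1.
z <- 20
while (z >= 1) z <- z/2
```

Where a summary per class is all that is wanted, a function that works over the whole list or factor at once usually avoids the explicit loop.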


12 Writing Your Own Functions
As we have seen informally in §10.1, the S language allows the user to create his or her own functions. These are true S functions that are stored in a special internal form and may be used in further expressions and so on. In the process the language gains enormously in power, convenience and elegance. Most of the functions supplied as part of the S system are themselves written in S and thus do not differ materially from user-written functions. (However, increasingly such functions are being re-written as internal functions to gain efficiency.) Listing these functions (by printing their name without parentheses) is a very fruitful way to gain hints for writing your own functions. A function is defined by an assignment that names the arguments, arg1, arg2, ..., and gives an expression.

The expression is an S expression (usually a grouped expression) that uses the arguments to calculate a value. The value of the expression is the value returned for the function. A call to the function then supplies an expression, expr1, expr2, ..., for each argument, and may occur anywhere a function call is legitimate.
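A minimal sketch of a definition and a call, assuming the usual `function` keyword and the inbuilt `quantile`. The name `quart.diff` and the local object `quar` are chosen here for illustration:

```r
# Definition: name <- function(arguments) expression
quart.diff <- function(x) {
  quar <- quantile(x, c(0.25, 0.75))   # local to the function
  as.vector(quar[2] - quar[1])         # last value computed is returned
}

# Call: the function name followed by an expression for each argument.
d <- quart.diff(1:9)
d

# Printing the name without parentheses lists the function itself.
quart.diff
```

The object `quar` exists only while the function is being evaluated; it is not left behind after the call.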

For example, a function may first compute the quartiles and then return the last value computed, their difference. Note that any ordinary assignments done within the function are temporary and lost after exit from the function. Thus an object assigned inside the function is not left behind, and does not affect any other object of the same name. If global and permanent assignments are intended within a function, then the 'superassignment' operator can be used. See the documentation for details.

As a second example of a useful function, consider a function to evaluate the 'Huber proposal 2' robust estimator(s) of location and/or scale.

This allows either of the location and scale to be specified. Optional arguments are the tuning parameter, an initial value and a convergence tolerance. The first line removes all missing values. A built-in function is used to check whether a parameter is supplied. Two constants are then calculated as functions of the tuning parameter. The rest of the function is a loop. In general loops are inefficient in S and should be avoided if at all possible, but here we have no choice as the calculation is iterative. Finally the function returns two components, the location and scale.
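The description above can be illustrated by the following sketch. It is written from the description and from the standard definition of Huber's proposal 2, and is not the guide's own listing. The name `huber2`, the argument names and the iteration limit of 50 are assumptions; `missing`, `mad`, `pmin` and `pmax` are standard functions:

```r
huber2 <- function(y, k = 1.5, mu, s, initmu = median(y), tol = 1e-6)
{
  y <- y[!is.na(y)]          # remove all missing values
  n <- length(y)
  fit.mu <- missing(mu)      # estimate location unless it was supplied
  fit.s <- missing(s)        # estimate scale unless it was supplied
  mu0 <- if (fit.mu) initmu else mu
  s0 <- if (fit.s) mad(y) else s
  n1 <- if (fit.mu) n - 1 else n
  # Two constants depending only on k; beta is E[min(Z^2, k^2)] for normal Z.
  th <- 2 * pnorm(k) - 1
  beta <- th + k^2 * (1 - th) - 2 * k * dnorm(k)
  mu1 <- mu0
  s1 <- s0
  for (i in 1:50) {
    # Pull observations further than k scale units from the centre back in.
    yy <- pmin(pmax(mu0 - k * s0, y), mu0 + k * s0)
    if (fit.mu) mu1 <- sum(yy)/n
    if (fit.s) s1 <- sqrt(sum((yy - mu1)^2)/(n1 * beta))
    if (abs(mu0 - mu1) < tol * s0 && abs(s0 - s1) < tol * s0) break
    mu0 <- mu1
    s0 <- s1
  }
  list(mu = mu1, s = s1)     # the two components: location and scale
}

# Invented data with one gross outlier.
huber2(c(1, 2, 3, 4, 100))
```

The location estimate stays close to the bulk of the data, where the ordinary mean would be dragged towards the outlier.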


It is sometimes useful to be able to time commands. There are functions which return the total cpu time and the elapsed time taken by a command or sequence of commands enclosed in braces. Note: as these are functions, assignments inside them are in the frame of the function rather than permanent. Alternatively, record the time before and after a group of commands.
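Both approaches are sketched below using the R names `system.time` and `proc.time`. These names are an assumption as far as S-Plus 3.2 is concerned, where the timing functions may be spelled differently:

```r
# Time before and after a group of commands.
start <- proc.time()
total <- 0
for (i in 1:20000) total <- total + i
elapsed <- proc.time() - start
elapsed

# Timing a single expression directly.
tm <- system.time(for (i in 1:20000) sqrt(i))
tm
```

The result holds cpu times and the elapsed time, in seconds.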


13 Statistical Models
These facilities form the heart of the 1991 version of S. They are based on object-oriented extensions, so that generic functions know what to do with the results of various models. The two most basic notions are a data frame (§4.9) and a model formula.

13.1 Model Formulas
A model formula couples a y-vector with a model expressed in a terminology very similar to that of GLIM and GENSTAT. The basic form specifies the linear regression of a response on one or more explanatory variables. Factors are replaced by a set of indicator variables for the regression, and can interact via an operator (which is not the period, as this is a valid character in a variable name). The constructs available include:

  equivalent ways of writing the same model
  nested layout
  parallel lines
  line through the origin
  quadratic polynomial
  natural spline
  smooth function

The syntax of a linear-model fit takes a model formula and a data frame, where the names in the model formula refer to columns of the data frame, which can be omitted if it has already been attached. Information is extracted from a fit by the use of ancillary functions. There are no standard ancillary functions for standardized and Studentized residuals, but I have added them to the library.
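A short sketch of formulas and a linear-model fit, assuming the standard S modelling functions `lm`, `coef`, `resid`, `fitted` and `terms`. The data frame `d`, its columns and the helper `expand` are invented for illustration:

```r
# Two ways of writing the same model expand to the same terms.
expand <- function(f) attr(terms(f), "term.labels")
expand(y ~ a * b)
expand(y ~ a + b + a:b)

# Invented data frame with a response y and a regressor x.
d <- data.frame(x = 1:5, y = c(3.1, 4.9, 7.2, 8.8, 11.0))

# The names in the formula refer to columns of the data frame.
fit <- lm(y ~ x, data = d)

# Ancillary functions extract information from the fit.
coef(fit)      # intercept and slope
resid(fit)     # residuals
fitted(fit)    # fitted values

# Line through the origin: remove the intercept term.
fit0 <- lm(y ~ x - 1, data = d)
# Quadratic polynomial in x.
fit2 <- lm(y ~ poly(x, 2), data = d)
```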

13.2 One-way Layouts
The analysis of a one-way layout is best illustrated by an example. The table gives data on observed concentrations (ng/ml) of a chemical in groups of 10 patients after oral administration of almitrine bismesylate. The session involves a function to compute the standard deviation; making a factor from the doses; labelling the observations by dose; printing out the table and the parameters; setting up for the analysis of variance; and testing for linearity of response, also on log scale.

There is a 'clever' way to test for linearity using a re-parameterization of the factor as an ordered factor, for which the default parameterization is polynomial. (This relies on having levels in an arithmetic progression.) (As far as I can see, a further function call is necessary to get results for the individual coefficients.) This shows that the response can be regarded as quadratic in log(dose).
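The idea can be sketched as follows, assuming `ordered` creates the ordered factor, `aov` fits the layout, and `summary.lm` is the extra call that gives results for individual coefficients. The doses and responses are invented, and are not the almitrine data:

```r
# Invented one-way layout: four equally spaced doses, three patients each.
dose <- ordered(rep(c(1, 2, 3, 4), rep(3, 4)))
resp <- c(1.1, 0.9, 1.0, 2.1, 1.9, 2.0, 2.9, 3.1, 3.0, 4.2, 3.8, 4.0)

fit <- aov(resp ~ dose)
summary(fit)                  # overall analysis of variance table

# With an ordered factor the coefficients are the linear, quadratic and
# cubic polynomial contrasts; their t-values test linearity of response.
tvals <- coef(summary.lm(fit))
tvals
```

Small t-values for the quadratic and cubic terms, next to a large one for the linear term, suggest that a straight line in dose is adequate.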


The parameterization of linear models for designed experiments is a little tricky. The usual parameterization is to impose a 'sum to zero' constraint on the parameters for a factor. GLIM sets the parameter for the first level to zero, so that parameters for the other levels are differences between that level and the first. By default S uses the Helmert parameterization, which compares the second and subsequent levels to the average of lower levels. Either the usual parameterization or the GLIM parameterization can be made the default by a suitable setting.

Of course, the parameterization only affects the coefficients, not the fitted values, residuals and so on. The contrasts for a particular term in a fit can also be changed individually.
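The three parameterizations can be inspected directly. This sketch assumes the contrast functions `contr.helmert`, `contr.sum` and `contr.treatment`, the `contrasts` option, and the `contrasts<-` replacement function, as found in R; the factor `f` is invented. Note that R's own default differs from the S default described above:

```r
contr.helmert(3)     # each level against the average of lower levels
contr.sum(3)         # 'sum to zero' constraint
contr.treatment(3)   # GLIM style: first level set to zero

# Change the default for unordered and ordered factors, then restore it.
old <- options(contrasts = c("contr.sum", "contr.poly"))
getOption("contrasts")
options(old)

# Change the contrasts of one particular factor only.
f <- factor(c("a", "b", "c", "a", "b", "c"))
contrasts(f) <- contr.treatment(3)
contrasts(f)
```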

13.3 Designed Experiments
The central concept for designed experiments is a factor. Consider the famous Box-Cox poisons data (survival times (in hours) of animals with 3 poisons and 4 antidotes, from Box & Cox (1964), J. Roy. Statist. Soc. B26, 211–252 and Box, Hunter & Hunter (1977), Statistics for Experimenters). A function generates the rows, columns and so on; consult its help page for full details. The session enters the data in hours, plots the main effects (also using medians), draws box plots, and computes an additive fit, a fit with one degree of freedom for non-additivity, and a full fit.

Figure 5: Plots for Poison data. The panels show the mean and median of stimes for the factors treat and poison, stimes by treat and by poison, resid(poisons.aov) against fitted(poisons.aov) and against quantiles of the standard normal, and the log likelihood against lambda with a 95% interval.

These results indicate the need for transformation. One function protects its argument from expansion in a formula, and raising a sum of terms to the power n generally gives up to n-th order interactions.

There is no direct Box-Cox function, but we can do the operations by hand. They are quite slow (25 secs on a SparcStation IPC), due to the overhead of the repeated function calls. A more efficient way (4 secs) is to use a function from the library.

Now consider a Latin square. Six litters of six piglets were ranked in order of birthweight, providing a table, and each piglet given one of 6 dietary supplements in a Latin square. The weight gain (in kg) over 12 weeks is given in the table.

The last command gives t-values for the contrasts (diet ? - diet A).
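Doing the Box-Cox operations by hand can be sketched as below. This is a sketch of the standard profile log likelihood, not the guide's own listing; the function name `boxcox.ll`, the grid of lambda values and the data are invented:

```r
# Profile log likelihood of the Box-Cox parameter for a regression of y on x.
boxcox.ll <- function(y, x, lambda) {
  n <- length(y)
  ll <- numeric(length(lambda))
  for (i in 1:length(lambda)) {
    la <- lambda[i]
    # Box-Cox transform; the limit as lambda tends to 0 is log(y).
    z <- if (abs(la) < 1e-8) log(y) else (y^la - 1)/la
    rss <- sum(resid(lm(z ~ x))^2)
    # Gaussian log likelihood plus the Jacobian of the transformation.
    ll[i] <- -n/2 * log(rss/n) + (la - 1) * sum(log(y))
  }
  ll
}

# Invented data that are close to linear on the log scale.
x <- 1:10
e <- c(0.05, -0.04, 0.02, -0.03, 0.04, -0.05, 0.03, -0.02, 0.01, -0.01)
y <- exp(0.5 * x + e)

lambda <- seq(-1, 1, by = 0.5)
ll <- boxcox.ll(y, x, lambda)
cbind(lambda, ll)
```

One model is fitted per value of lambda, which is why the calculation is slow when the grid is fine.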



13.4 Generalized Linear Models
The linear-model functions have extensions: one fits generalized linear models, and another further extends this to allow semi-parametric smooth functions in the explanatory variables. We can, for example, fit the poisons data by a gamma GLM (note the capital 'G' in the family name) and produce an analysis of deviance table.
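A sketch of a gamma GLM, assuming the fitting function is `glm` with `family = Gamma`, and `anova` for the analysis of deviance. The survival times and treatment labels are invented, and are not the poisons data:

```r
# Invented positive responses for two treatments.
time <- c(3.1, 4.5, 4.6, 4.3, 8.2, 11.0, 8.8, 7.2)
treat <- factor(rep(c("A", "B"), c(4, 4)))

fit <- glm(time ~ treat, family = Gamma)   # note the capital 'G'
anova(fit, test = "F")                     # analysis of deviance table

# Residuals: deviance residuals by default, other types on request.
resid(fit)
resid(fit, type = "pearson")

# The family also carries other aspects of the fit, such as the link.
fit.log <- glm(time ~ treat, family = Gamma(link = "log"))
```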

Once again there is a whole range of ancillary functions, including one for residuals. The latter will produce four types of residuals, but uses deviance residuals by default. The argument that specifies the family is also used to specify other aspects of the fit such as the link function. With the binomial family the response can either be a factor (taken as first level vs the rest) or a matrix with two columns giving the number of successes and failures. There is a family allowing user-defined models, and a family generator allowing robust fitting. The scope for ingenuity is unlimited!

Binary Data
The following example is taken from D. Collett (1991) Modelling Binary Data, page 217. Numbers of rotifers falling out of suspension for two species (Polyartha major and Keratella cochlearis) are given for different fluid densities in the table.

An annotated session follows. Several points need further explanation.

The parameterizations need careful consideration. By default S uses a linear-model parameterization, contrasting each level with the average of the previous levels. This is less useful for GLMs. The first way out below is to remove the overall mean term, which forces separate means for each species. We can also change to the GLIM parameterization.

There is a catch here. By default the factor levels are numbered in alphabetical order, so we have to force the order we want (see §10).
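These points can be sketched with a binomial fit, assuming `glm` with `family = binomial`, a two-column response built with `cbind`, and the `levels` argument of `factor` to force the order. The counts are invented, and are not Collett's rotifer data:

```r
# Force the level order rather than accept the alphabetical default.
species <- factor(rep(c("pm", "kc"), c(4, 4)), levels = c("pm", "kc"))
dens <- rep(c(1.02, 1.03, 1.04, 1.05), 2)

# Invented counts: number falling out of suspension, out of 20 in each run.
fell <- c(2, 8, 15, 19, 1, 4, 10, 17)
total <- rep(20, 8)

# Two-column response: successes and failures.
# Removing the overall mean (- 1) gives a separate constant for each species.
fit <- glm(cbind(fell, total - fell) ~ species + dens - 1, family = binomial)
coef(fit)
```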

Figure 6: Plots for Rotifer data. The square symbols and dashed line indicate species Polyartha major. The horizontal axis is density and the vertical axis is pm.prop.

The annotations to the session are: list the data frame; compute the proportions; fit separate models for each species and plot them; now combine the two species; bare summaries.
[Code listing not recoverable from the source. Its surviving annotations are: "separate means for each species"; "note the parameterization used"; "these lines are rather crude, so try harder!"; "over-dispersion, but a common slope looks OK".]

Poisson Data

We consider the log-linear analysis of a contingency table. As this has two 'history' factors and two levels of the response, it could also be treated as binomial data. The response is the occurrence of coronary heart disease (chd). The table is of the form:

                            chd = no               chd = yes
    blood pressure          1    2    3    4       1    2    3    4
    serum cholesterol 1   117   85  119   67       2    3    8    7
                      2   121   98  209   99       3    2   11   12
                      3    47   43   68   46       3    1    6   11
                      4    22   20   43   33       4    3    6   11
The anova command gives an analysis of deviance for glm objects.
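As a rough illustration of what a deviance measures, the following sketch (in Python, not S-Plus) computes the deviance of the simplest log-linear model for the coronary heart disease table: the one in which the response is independent of the joint classification by the two history factors. The counts are copied from the table; the function name and data layout are illustrative assumptions, and this is not the fit from the original listing.

```python
import math

# Counts from the contingency table: rows are serum cholesterol levels 1-4,
# columns are blood pressure levels 1-4.
NO_CHD = [
    [117, 85, 119, 67],
    [121, 98, 209, 99],
    [47, 43, 68, 46],
    [22, 20, 43, 33],
]
CHD = [
    [2, 3, 8, 7],
    [3, 2, 11, 12],
    [3, 1, 6, 11],
    [4, 3, 6, 11],
]


def independence_deviance(no, yes):
    """Deviance (G^2) and degrees of freedom for the model in which the
    response is independent of the joint history classification."""
    n_no = sum(map(sum, no))
    n_yes = sum(map(sum, yes))
    total = n_no + n_yes
    g2 = 0.0
    for row_no, row_yes in zip(no, yes):
        for o_no, o_yes in zip(row_no, row_yes):
            cell = o_no + o_yes  # history margin, reproduced exactly by the model
            for observed, margin in ((o_no, n_no), (o_yes, n_yes)):
                expected = cell * margin / total
                if observed > 0:  # a zero count contributes nothing
                    g2 += 2.0 * observed * math.log(observed / expected)
    cells = sum(len(row) for row in no)
    df = (cells - 1) * (2 - 1)
    return g2, df


g2, df = independence_deviance(NO_CHD, CHD)
print(f"G^2 = {g2:.2f} on {df} degrees of freedom")
```

The same deviance arises if the data are treated as binomial with a single common probability of heart disease, which is one way to see why the two formulations mentioned above are interchangeable here.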
13.5 Updating and Selecting Models

There are a number of facilities to update models. The update function takes the result of a previous fit and changes the model in some way. The functions add1 and drop1 show the (approximate) effects of adding and dropping single terms, and step runs a fairly general stepwise fitting procedure. (Note that S-Plus 3.x has a separate function, stepwise, for multiple regression.)

14 Multivariate Analysis

S-Plus is particularly rich in functions for exploratory multivariate analysis. There are also functions for classical multivariate analysis.

Clustering

The workhorses here are dist, which computes distance matrices (these are also used by other functions), and hclust, which computes a cluster tree by single-, average- or complete linkage. The clustering functions cover:

    Distance matrix calculations
    Hierarchical clustering
    Create groups from a cluster tree
    Plot a cluster tree
    Label a cluster tree plot
    Re-order leaves of a cluster tree
    Extract part of a cluster tree
    "Model-based" clustering
    Auxiliary functions

Graphical Methods

This is a varied collection of functions for displaying multivariate data.

    Classical multi-dimensional scaling
    Chernoff's faces
    Minimal spanning tree
    Star plots
    Biplot (v 3.2)

Two analyses of socio-economic data on Swiss cantons:

[Code listing not recoverable from the source.]
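The three linkage rules mentioned under Clustering differ only in how the distance between two groups is derived from the distances between their members. The sketch below is written in Python for illustration; it is not the S-Plus implementation, and every name in it is hypothetical. It builds a distance matrix and then repeatedly merges the two closest groups.

```python
import math
from itertools import combinations


def distance_matrix(points):
    """Euclidean distances between all pairs of observations."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]


# How the distance between two groups is computed from member distances.
LINKAGE = {
    "single": min,                            # nearest pair of members
    "complete": max,                          # farthest pair of members
    "average": lambda ds: sum(ds) / len(ds),  # mean over all pairs
}


def agglomerate(dist, method="single"):
    """Return the merge history as (group_a, group_b, height) tuples."""
    link = LINKAGE[method]
    clusters = [(i,) for i in range(len(dist))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            d = link([dist[i][j] for i in a for j in b])
            if best is None or d < best[0]:
                best = (d, a, b)
        height, a, b = best
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
        merges.append((a, b, height))
    return merges


# Small made-up example: two tight pairs and one outlying point.
points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
merges = agglomerate(distance_matrix(points), method="single")
for a, b, height in merges:
    print(a, b, round(height, 3))
```

This brute-force version recomputes every group distance at each step, which is adequate for a handful of points but not how a production routine would be organised.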

  

Matrix Methods

These are the classical methods based on variance-covariance matrices.

    Mahalanobis distances
    Canonical correlation analysis
    Discriminant analysis
    Principal components analysis
    Principal components analysis (v 3.2)
    Principal components analysis (v 3.2)

An example of discriminant analysis with Fisher's iris data:

[Code listing not recoverable from the source.]

[Figure 7: Discriminant analysis. The second discriminant variable is plotted against the first discriminant variable, with points marked s, c and v.]
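To make concrete what "based on variance-covariance matrices" means, the following Python sketch (an illustration only, not S-Plus code; all names are hypothetical and the data are made up) computes a sample variance-covariance matrix and from it the squared Mahalanobis distance of each observation from the mean. It is restricted to two variables so that the matrix inverse can be written out directly.

```python
def mean_vector(rows):
    """Column means of a list of observations."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance_matrix(rows):
    """Sample variance-covariance matrix (divisor n - 1)."""
    n = len(rows)
    mu = mean_vector(rows)
    p = len(mu)
    return [
        [sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / (n - 1)
         for j in range(p)]
        for i in range(p)
    ]


def mahalanobis_sq_2d(x, center, cov):
    """Squared Mahalanobis distance for bivariate data.

    Uses the closed-form inverse of a 2 x 2 matrix, so it assumes exactly
    two variables and a non-singular covariance matrix.
    """
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = x[0] - center[0], x[1] - center[1]
    return (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det


# Made-up bivariate observations.
DATA = [(2.0, 1.0), (4.0, 3.0), (6.0, 2.0), (8.0, 6.0)]
center = mean_vector(DATA)
cov = covariance_matrix(DATA)
d2 = [mahalanobis_sq_2d(x, center, cov) for x in DATA]
print([round(v, 3) for v in d2])
```

With an identity covariance matrix the result reduces to the squared Euclidean distance, which is a convenient check on the formula.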

A Libraries

Libraries are a mechanism to add 'packages' of extra objects (functions and datasets) to S. To find out which libraries are available type

    library()

which on one of my systems gave:

[Listing not recoverable from the source.]

To find out more about a section, use

    library(help = name)

To use the library, invoke it by

    library(name)

which attaches it as a data directory at the end of the search list. Thus libraries cannot over-ride standard functions nor your own functions. To make a library over-ride the system functions, use

    library(name, first = T)

which attaches it at position 2 (after the .Data directory).

A.1 Library

This is a collection of useful functions and datasets for teaching at Oxford.

    Box-Cox plot for transformations
    Replacement for GLIM
    Equally-scaled plot function
    Calculate standardized residuals from a fit
    Calculate Studentized residuals from a fit

Datasets in the library are:

    US accidental deaths 1973-8
    Dataset on heat evolved in setting cements
    Dataset on performance of cpus
    Time series on UK lung deaths 1974-9, from Diggle
    As above, for males and females
    Remission times on leukaemia patients (censored)
    Forbes' dataset on boiling points, from Atkinson
    Dataset on times of Scottish hill races
    (Uncensored) survival times on leukaemia patients
    Time series on luteinizing hormone, from Diggle
    Body weight (kg) and brain weight (g) of mammals, from Weisberg
    Motorcycle impact data, from Silverman, JRSS B 1985
    Accelerated life testing on motorettes
    Time series of temperatures in Nottingham, 1920-1939
    Dataset on road deaths in the US
    Dataset relating permeability to physical measurements
    Dataset on rubber wear
    Ship damage incidents, from McCullagh & Nelder
    Black Cherry trees: heights, diameters and volumes

A.2 Sources of Libraries

Many S users have generously collected their functions and datasets together into libraries and made them publicly available. An archive of libraries is maintained at Carnegie-Mellon as a service to the statistical profession by Mike Meyer. To obtain details of its contents by e-mail, send a message to [...] with body [...]. Ftp to [...] with user [...]. [...] is also available.